Michael Anthony (MikeDoes) · PRO
129 followers · 57 following
http://www.ai4privacy.com
MikeDoesDo · MikeDoes · mazourik
AI & ML interests
Privacy, Large Language Models, Explainability
Recent Activity
reacted to their post 1 day ago
AI4Privacy datasets are being used to decide what data should never leave the device.

A new paper on privacy-preserving cloud computing uses the AI4Privacy PII-Masking-65K dataset to train models that classify text as private or public before it's ever sent to the cloud.

This is a subtle but important shift. Instead of encrypting everything or trusting the cloud by default, the authors ask a simpler question: can we detect sensitive text early enough to keep it local?

Using DistilBERT, trained partly on AI4Privacy PII data, the system learns to:
- route private text to local processing
- send non-sensitive text to the cloud
- train collaboratively using federated learning, without sharing raw data

The result:
- 99.9% accuracy in private vs public text detection
- near-centralized performance in downstream tasks like SMS spam detection
- privacy protection enforced by design, not policy

What stands out here is not just the model performance, but the architectural idea: privacy as a routing decision, backed by large-scale PII annotations.

This work reinforces a pattern we keep seeing: scalable privacy systems don't start with encryption, they start with good PII data.

Full paper: https://dl.acm.org/doi/full/10.1145/3773276.3774872

#Ai4Privacy #DataPrivacy #PIIMasking #FederatedLearning #PrivacyEngineering #OpenSourceAI #ResponsibleAI #AcademicResearch #LLMSecurity
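To make the routing idea concrete, here is a minimal sketch assuming a DistilBERT text classifier fine-tuned on AI4Privacy PII data that emits PRIVATE/PUBLIC labels. The checkpoint path, label names, and route helper are hypothetical illustrations, not the paper's actual implementation.

```python
# Minimal sketch of "privacy as a routing decision" (illustrative only).
# Assumes a DistilBERT classifier fine-tuned on AI4Privacy PII data with
# PRIVATE / PUBLIC labels; the checkpoint path below is a placeholder.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="path/to/distilbert-private-public",  # hypothetical fine-tuned checkpoint
)

def route(text: str) -> str:
    """Return 'local' for text flagged as private, 'cloud' otherwise."""
    prediction = classifier(text)[0]  # e.g. {"label": "PRIVATE", "score": 0.99}
    return "local" if prediction["label"] == "PRIVATE" else "cloud"

# Private text stays on the device; everything else may be sent to the cloud.
print(route("My IBAN is DE89 3704 0044 0532 0130 00"))  # expected: local
print(route("The weather will be sunny tomorrow."))     # expected: cloud
```

The point of the sketch is the design choice, not the model: the privacy decision is made once, up front, and everything downstream simply respects the route.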
liked a model 4 days ago: openai/privacy-filter
reacted to their post 5 days ago
MikeDoes's Spaces (2)
Terminal Visualiser (Running): Create and download styled terminal screenshots
TKG Visualiser (Running): Visualize workflows from TSV data