Michael Anthony (MikeDoes) · PRO
129 followers · 57 following
http://www.ai4privacy.com
MikeDoesDo · MikeDoes · mazourik
AI & ML interests
Privacy, Large Language Models, Explainability
Recent Activity
reacted to their post 1 day ago
AI4Privacy datasets are being used to decide what data should never leave the device.

A new paper on privacy-preserving cloud computing uses the AI4Privacy PII-Masking-65K dataset to train models that classify text as private or public before it's ever sent to the cloud.

This is a subtle but important shift. Instead of encrypting everything or trusting the cloud by default, the authors ask a simpler question: can we detect sensitive text early enough to keep it local?

Using DistilBERT, trained partly on AI4Privacy PII data, the system learns to:
- route private text to local processing
- send non-sensitive text to the cloud
- train collaboratively using federated learning, without sharing raw data

The result:
- 99.9% accuracy in private vs public text detection
- near-centralized performance in downstream tasks like SMS spam detection
- privacy protection enforced by design, not policy

What stands out here is not just the model performance, but the architectural idea: privacy as a routing decision, backed by large-scale PII annotations.

This work reinforces a pattern we keep seeing: scalable privacy systems don't start with encryption, they start with good PII data.

Full paper: https://dl.acm.org/doi/full/10.1145/3773276.3774872

#Ai4Privacy #DataPrivacy #PIIMasking #FederatedLearning #PrivacyEngineering #OpenSourceAI #ResponsibleAI #AcademicResearch #LLMSecurity
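To make the routing idea concrete, here is a minimal sketch assuming a DistilBERT text classifier fine-tuned on AI4Privacy PII data that emits PRIVATE/PUBLIC labels. The checkpoint path, label names, and route helper are hypothetical illustrations, not the paper's actual implementation.

```python
# Minimal sketch of "privacy as a routing decision" (illustrative only).
# Assumes a DistilBERT classifier fine-tuned on AI4Privacy PII data with
# PRIVATE / PUBLIC labels; the checkpoint path below is a placeholder.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="path/to/distilbert-private-public",  # hypothetical fine-tuned checkpoint
)

def route(text: str) -> str:
    """Return 'local' for text flagged as private, 'cloud' otherwise."""
    prediction = classifier(text)[0]  # e.g. {"label": "PRIVATE", "score": 0.99}
    return "local" if prediction["label"] == "PRIVATE" else "cloud"

# Private text stays on the device; everything else may be sent to the cloud.
print(route("My IBAN is DE89 3704 0044 0532 0130 00"))  # expected: local
print(route("The weather will be sunny tomorrow."))     # expected: cloud
```

The point of the sketch is the design choice, not the model: the privacy decision is made once, up front, and everything downstream simply respects the route.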
liked a model 4 days ago: openai/privacy-filter
reacted to their post 5 days ago
MikeDoes's Spaces (2)
Terminal Visualiser (Running): Create and download styled terminal screenshots
TKG Visualiser (Running): Visualize workflows from TSV data