This new preprint fine-tunes T5-small and Mistral-7B on the AI4Privacy PII-Masking-200K dataset and shows that lightweight models can approach, and sometimes match, much larger LLMs on privacy tasks.
The study tackles a real deployment question many teams face:
Is PII masking a model-size problem, or a data-quality problem?
Using AI4Privacy’s large-scale, standardized PII annotations, the authors systematically compare:
Encoder–decoder models (T5) vs
Decoder-only models (Mistral)
across accuracy, robustness, latency, and real-world conversational text.
What stood out:
Mistral-7B achieved higher recall and robustness on noisy, informal inputs, but at roughly 10× the latency
T5-small, trained on the same AI4Privacy data, delivered fast, structured, low-cost masking, making it viable for real-time systems
Dataset normalization (not model size) was one of the biggest drivers of performance gains
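For context, the task these models are fine-tuned for boils down to replacing annotated PII spans with typed placeholders. A minimal sketch of that masking step (the label names and span format here are illustrative, not the dataset's exact schema):

```python
def mask_pii(text: str, spans: list[tuple[int, int, str]]) -> str:
    """Replace each (start, end, label) span with a [LABEL] placeholder.

    Spans are applied right-to-left so earlier character offsets
    stay valid as the string shrinks or grows.
    """
    for start, end, label in sorted(spans, key=lambda s: s[0], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text

example = "Hi, I'm Jane Doe, reach me at jane@example.com."
spans = [(8, 16, "NAME"), (30, 46, "EMAIL")]
print(mask_pii(example, spans))  # → Hi, I'm [NAME], reach me at [EMAIL].
```

The hard part the paper measures is, of course, *finding* those spans in noisy conversational text; once a model emits spans (or the masked text directly, in the seq2seq setup), the substitution itself is this simple.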
The models were then deployed in a live Discord bot, where performance dropped under real-world conditions: a reminder that benchmarks alone aren't enough.
The takeaway is hard to ignore:
Privacy-preserving AI scales through data design, not just bigger models.
This work reinforces why open, well-curated datasets like AI4Privacy PII-Masking-200K are becoming foundational infrastructure for privacy-first AI, especially for teams that need self-hosted, transparent solutions.
📄 Read the paper: https://arxiv.org/abs/2512.18608