# Day 2: Sentiment & Zero-Shot Smackdown (Arabic Edition)
Today, I dove deeper into two powerful Hugging Face pipelines, Sentiment Analysis and Zero-Shot Classification, with a twist: I focused on Arabic and multilingual performance. The goal? To find models that actually handle Arabic well, dialects and all.
## Part 1: Sentiment Analysis Recap
Previously, I found that the default `pipeline("sentiment-analysis")` worked okay for English, but Arabic? Not so much. So today was all about finding a better Arabic sentiment model.
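For context, that default setup is just a one-liner. A minimal sketch (the English test sentence is my own, not from the original experiments):

```python
from transformers import pipeline

# With no model argument, the pipeline falls back to an English-only
# default checkpoint, which explains the weak Arabic results below.
classifier = pipeline("sentiment-analysis")

result = classifier("I absolutely loved this film!")[0]
print(result["label"], round(result["score"], 3))
```

The returned dict has a `label` and a confidence `score`, which is what the accuracy numbers below are judged against.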
### Models Tested
| Model | Result |
|---|---|
| default (`pipeline("sentiment-analysis")`) | Good on English; only ~55% accurate on Arabic |
| `Anwaarma/Improved-Arabert-twitter-sentiment-No-dropout` | Inaccurate, struggled with meaning |
| `Abdo36/Arabert-Sentiment-Analysis-ArSAS` | Slightly better than default, but weak dialect handling |
**Winner:** `CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment`

- Accuracy: 95–99% on clear Arabic sentences
- Dialect-friendly: correctly classified Egyptian slang such as "Ψ§ΩΩΨ§Ψ― Ψ³ΩΨ§Ω Ψ§ΩΨͺΩΩ ΨͺΩΩ Ψ¬Ψ§Ψ±ΩΨ§ ΨΉΨ³Ω" as Positive (97%)
- Weakness: lower performance on English (61%) and French (50%)
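Loading the winner is the same one-liner with an explicit model ID. A minimal sketch, assuming the hub ID from the results above (the test sentence, Arabic for "excellent service", is my own, not one of the original test cases):

```python
from transformers import pipeline

# Swap in the Arabic-specific checkpoint by its Hugging Face hub ID.
arabic_sentiment = pipeline(
    "sentiment-analysis",
    model="CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment",
)

result = arabic_sentiment("خدمة ممتازة")[0]  # "excellent service"
print(result["label"], round(result["score"], 3))
```

Everything else about the call stays the same; only the underlying checkpoint changes.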
## Key Takeaways

- Fine-tuned, language-specific models seriously outperform the defaults.
- Dialect support is a must-have for real-world, diverse data.
## Part 2: Zero-Shot Classification (Arabic & Multilingual Trials)
Could zero-shot models understand Arabic prompts and interpret labels in multiple languages? Let's see how they fared in a multilingual arena!
### 1. `morit/arabic_xlm_xnli`

- Inaccurate, even on Arabic-only prompts
- Misaligned labels and scores
### 2. `MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7` (winner)
| Scenario | Accuracy | Improvement |
|---|---|---|
| Arabic input → Arabic labels | 96.1% | +21% |
| Arabic input → English labels | 86% | +37% |
| English input → English labels | 84% | +9% |
| English input → Arabic labels | 84% | vs. ~30% default |
| Mixed labels (Arabic + English) | 92% | RTL handled properly |
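The Arabic-input, Arabic-label scenario from the table can be reproduced along these lines (the sample headline and candidate labels are my own illustrations, not the original test set):

```python
from transformers import pipeline

zero_shot = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
)

# "Oil prices rose sharply this week", with the labels
# economy / sports / politics, all in Arabic.
text = "ارتفعت أسعار النفط بشكل كبير هذا الأسبوع"
labels = ["اقتصاد", "رياضة", "سياسة"]

result = zero_shot(text, candidate_labels=labels)
# result["labels"] comes back sorted by score, highest first
print(result["labels"][0], round(result["scores"][0], 3))
```

Swapping `labels` for English strings (or a mix) is all it takes to test the other input → label pairings.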
**UI Smartness:**

- Keeps English labels left-aligned and Arabic labels right-aligned
- No visual bugs or mis-scored outputs from RTL quirks
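That alignment rule is easy to automate. A small sketch of how a UI might pick the direction, using Unicode bidirectional categories (illustrative helper names, not the actual UI code):

```python
import unicodedata

def is_rtl(label: str) -> bool:
    """True if the label's first strong character is right-to-left."""
    for ch in label:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew / Arabic letters
            return True
        if bidi == "L":           # Latin and other LTR scripts
            return False
    return False  # no strong character: default to LTR

def alignment(label: str) -> str:
    return "right" if is_rtl(label) else "left"

print(alignment("Positive"))  # left
print(alignment("إيجابي"))    # right ("Positive" in Arabic)
```

Checking the *first strong* character (rather than, say, the first character) is what keeps mixed Arabic + English label lists from flip-flopping on leading punctuation or digits.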
## Final Summary for Day 2
- Model selection matters: the right model can boost performance by 20–25%!
- Dialect support is key: generic models don't cut it in nuanced use cases.
- Language pairing (input → label) is critical for zero-shot reliability.
- Proper RTL handling helps avoid UI headaches and scoring issues.
## What's Next?
- Try out another Hugging Face pipeline: translation or summarization sound exciting!
- Keep expanding my language-aware model notebook with more dialects, labels, and real-world tests.