
πŸ“˜ Day 2: Sentiment & Zero-Shot Smackdown – Arabic Edition

Today, I dove deeper into two powerful Hugging Face pipelines β€” Sentiment Analysis and Zero-Shot Classification β€” with a twist: I focused on Arabic and multilingual performance. The goal? To find models that actually handle Arabic well, dialects and all. πŸŒπŸ”€


πŸ” Part 1: Sentiment Analysis Recap

Previously, I found that the default pipeline("sentiment-analysis") worked okay for English but... Arabic? Not so much. So today was all about discovering a better Arabic sentiment model.

πŸ§ͺ Models Tested:

| Model | Result |
| --- | --- |
| default (`pipeline("sentiment-analysis")`) | 👍 Good on English; 👎 ~55% accurate on Arabic |
| `Anwaarma/Improved-Arabert-twitter-sentiment-No-dropout` | 😐 Inaccurate, struggled with meaning |
| `Abdo36/Arabert-Sentiment-Analysis-ArSAS` | 🫤 Slightly better than the default, but weak dialect handling |

πŸ₯‡ Winner: CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment

  • βœ… Accuracy: 95–99% on clear Arabic sentences

  • πŸ’¬ Dialect-Friendly: Correctly classified Egyptian slang like
    "Ψ§Ω„ΩˆΨ§Ψ― Ψ³ΩˆΨ§Ω‚ Ψ§Ω„ΨͺΩˆΩƒ ΨͺΩˆΩƒ Ψ¬Ψ§Ψ±Ω†Ψ§ ΨΉΨ³Ω„" β†’ Positive 97%

  • ⚠️ Weakness: Lower performance on English (61%) and French (50%)
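Swapping the winning checkpoint into the pipeline is a one-liner. Here's a minimal sketch, assuming `transformers` is installed — the `summarize` helper and the exact label strings it formats are my own illustration, not output I've verified from this model:

```python
def build_arabic_sentiment(model="CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment"):
    # Lazy import: transformers is a heavy dependency, and the model
    # weights are downloaded on first use.
    from transformers import pipeline
    return pipeline("sentiment-analysis", model=model)

def summarize(results):
    # Pure helper: turn raw pipeline dicts into readable "label (score%)" strings.
    return [f"{r['label']} ({r['score']:.0%})" for r in results]

# Example usage (triggers the model download on first run):
# clf = build_arabic_sentiment()
# print(summarize(clf(["الواد سواق التوك توك جارنا عسل"])))
```

The keyword argument `model=` is all it takes to replace the English-centric default checkpoint with an Arabic-focused one.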

🧠 Key Takeaways

  • Fine-tuned, language-specific models seriously outperform the defaults.

  • Dialect support = must-have for real-world, diverse data.


πŸ” Part 2: Zero-Shot Classification β€” Arabic & Multilingual Trials

Could zero-shot models understand Arabic prompts and interpret labels in multiple languages? Let’s see how they fared in a multilingual arena! πŸ§ͺ🌐

1️⃣ morit/arabic_xlm_xnli

  • ❌ Inaccurate, even on Arabic-only prompts

  • ❌ Misaligned labels and scores

2️⃣ βœ… MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7

| Scenario | Accuracy | Improvement |
| --- | --- | --- |
| Arabic text → Arabic labels | ✅ 96.1% | +21% |
| Arabic text → English labels | ✅ 86% | +37% |
| English text → English labels | ✅ 84% | +9% |
| English text → Arabic labels | ✅ 84% | vs. ~30% with the default |
| Mixed labels (Arabic + English) | ✅ 92% | RTL handled properly |
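For reference, here's how the mDeBERTa checkpoint plugs into the zero-shot pipeline. A sketch, assuming `transformers` is available — the `rank` helper and the example sentence/labels are my own, but the output dict shape (`labels` sorted best-first alongside `scores`) is the pipeline's standard format:

```python
def build_zero_shot(model="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7"):
    # Lazy import: transformers is a heavy dependency, and the model
    # weights are downloaded on first use.
    from transformers import pipeline
    return pipeline("zero-shot-classification", model=model)

def rank(output):
    # Pure helper: pair each candidate label with its score.
    # The pipeline already returns them sorted highest-score first.
    return list(zip(output["labels"], output["scores"]))

# Example usage ("I love watching football" vs. sport/politics/economy labels):
# clf = build_zero_shot()
# out = clf("أحب مشاهدة كرة القدم", candidate_labels=["رياضة", "سياسة", "اقتصاد"])
# print(rank(out)[0])
```

Mixing Arabic text with English `candidate_labels` (or vice versa) needs no extra configuration — the model scores whatever label strings you pass.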

πŸ“Œ UI Smartness:

  • Keeps English labels left-aligned, Arabic right-aligned

  • No visual bugs or mis-scored outputs from RTL quirks βœ”οΈ
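The alignment behavior above can be reproduced with a tiny heuristic: look at the first character that has a strong Unicode bidi class. This is my own sketch (not the notebook's actual UI code), and it deliberately skips the full Unicode bidirectional algorithm:

```python
import unicodedata

def label_alignment(label: str) -> str:
    # Right-align a label whose first strong character is RTL
    # (bidi class "R" for Hebrew-script, "AL" for Arabic letters);
    # left-align on the first strong LTR character ("L").
    for ch in label:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            return "right"
        if bidi == "L":
            return "left"
    return "left"  # no strong character found (digits, punctuation, ...)

print(label_alignment("سياسة"))    # right
print(label_alignment("politics"))  # left
```

First-strong detection is the same rule most UI toolkits use for auto-direction, which is why mixed Arabic/English label lists render cleanly.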


🧠 Final Summary for Day 2

  • 🎯 Model selection matters: picking the right checkpoint boosted accuracy by 20% or more in these tests!

  • πŸ—£οΈ Dialect support is key β€” generic models don’t cut it in nuanced use cases.

  • πŸ” Language pairing (input ↔ label) is critical for zero-shot reliability.

  • πŸ§‘β€πŸ’» Proper RTL handling helps avoid UI headaches and scoring issues.


πŸ’‘ What’s Next?

  • πŸš€ Try out another Hugging Face pipeline: translation or summarization sound exciting!

  • πŸ“š Keep expanding my language-aware model notebook β€” with more dialects, labels, and real-world tests.