# Day 2: Sentiment & Zero-Shot Smackdown (Arabic Edition)
Today, I dove deeper into two powerful Hugging Face pipelines, Sentiment Analysis and Zero-Shot Classification, with a twist: I focused on Arabic and multilingual performance. The goal? To find models that actually handle Arabic well, dialects and all.
## Part 1: Sentiment Analysis Recap
Previously, I found that the default `pipeline("sentiment-analysis")` worked okay for English, but Arabic? Not so much. So today was all about finding a better Arabic sentiment model.
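For context, that default setup is just a one-liner. A minimal sketch (the English test sentence is my own, not from the original experiments):

```python
from transformers import pipeline

# With no model argument, the pipeline falls back to an English-only
# default checkpoint, which explains the weak Arabic results below.
classifier = pipeline("sentiment-analysis")

result = classifier("I absolutely loved this film!")[0]
print(result["label"], round(result["score"], 3))
```

The returned dict has a `label` and a confidence `score`, which is what the accuracy numbers below are judged against.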
### Models Tested
| Model | Result |
|---|---|
| default (`pipeline("sentiment-analysis")`) | Good on English; only ~55% accurate on Arabic |
| `Anwaarma/Improved-Arabert-twitter-sentiment-No-dropout` | Inaccurate, struggled with meaning |
| `Abdo36/Arabert-Sentiment-Analysis-ArSAS` | Slightly better than default, but weak dialect handling |
**Winner:** `CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment`

- Accuracy: 95–99% on clear Arabic sentences
- Dialect-friendly: correctly classified Egyptian slang such as "Ψ§ΩΩΨ§Ψ― Ψ³ΩΨ§Ω Ψ§ΩΨͺΩΩ ΨͺΩΩ Ψ¬Ψ§Ψ±ΩΨ§ ΨΉΨ³Ω" as Positive (97%)
- Weakness: lower performance on English (61%) and French (50%)
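Loading the winner is the same one-liner with an explicit model ID. A minimal sketch, assuming the hub ID from the results above (the test sentence, Arabic for "excellent service", is my own, not one of the original test cases):

```python
from transformers import pipeline

# Swap in the Arabic-specific checkpoint by its Hugging Face hub ID.
arabic_sentiment = pipeline(
    "sentiment-analysis",
    model="CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment",
)

result = arabic_sentiment("خدمة ممتازة")[0]  # "excellent service"
print(result["label"], round(result["score"], 3))
```

Everything else about the call stays the same; only the underlying checkpoint changes.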
## Key Takeaways

- Fine-tuned, language-specific models seriously outperform the defaults.
- Dialect support is a must-have for real-world, diverse data.
## Part 2: Zero-Shot Classification (Arabic & Multilingual Trials)
Could zero-shot models understand Arabic prompts and interpret labels in multiple languages? Let's see how they fared in a multilingual arena!
### 1. `morit/arabic_xlm_xnli`

- Inaccurate, even on Arabic-only prompts
- Misaligned labels and scores
### 2. `MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7` (winner)
| Scenario | Accuracy | Improvement |
|---|---|---|
| Arabic input → Arabic labels | 96.1% | +21% |
| Arabic input → English labels | 86% | +37% |
| English input → English labels | 84% | +9% |
| English input → Arabic labels | 84% | vs. ~30% default |
| Mixed labels (Arabic + English) | 92% | RTL handled properly |
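The Arabic-input, Arabic-label scenario from the table can be reproduced along these lines (the sample headline and candidate labels are my own illustrations, not the original test set):

```python
from transformers import pipeline

zero_shot = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
)

# "Oil prices rose sharply this week", with the labels
# economy / sports / politics, all in Arabic.
text = "ارتفعت أسعار النفط بشكل كبير هذا الأسبوع"
labels = ["اقتصاد", "رياضة", "سياسة"]

result = zero_shot(text, candidate_labels=labels)
# result["labels"] comes back sorted by score, highest first
print(result["labels"][0], round(result["scores"][0], 3))
```

Swapping `labels` for English strings (or a mix) is all it takes to test the other input → label pairings.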
**UI Smartness:**

- Keeps English labels left-aligned and Arabic labels right-aligned
- No visual bugs or mis-scored outputs from RTL quirks
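That alignment rule is easy to automate. A small sketch of how a UI might pick the direction, using Unicode bidirectional categories (illustrative helper names, not the actual UI code):

```python
import unicodedata

def is_rtl(label: str) -> bool:
    """True if the label's first strong character is right-to-left."""
    for ch in label:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):   # Hebrew / Arabic letters
            return True
        if bidi == "L":           # Latin and other LTR scripts
            return False
    return False  # no strong character: default to LTR

def alignment(label: str) -> str:
    return "right" if is_rtl(label) else "left"

print(alignment("Positive"))  # left
print(alignment("إيجابي"))    # right ("Positive" in Arabic)
```

Checking the *first strong* character (rather than, say, the first character) is what keeps mixed Arabic + English label lists from flip-flopping on leading punctuation or digits.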
## Final Summary for Day 2
- Model selection matters: the right model can boost performance by 20–25%!
- Dialect support is key: generic models don't cut it in nuanced use cases.
- Language pairing (input → label) is critical for zero-shot reliability.
- Proper RTL handling helps avoid UI headaches and scoring issues.
## What's Next?
- Try out another Hugging Face pipeline: translation or summarization sound exciting!
- Keep expanding my language-aware model notebook with more dialects, labels, and real-world tests.