# πŸ“˜ Day 2: Sentiment & Zero-Shot Smackdown – Arabic Edition
Today, I dove deeper into two powerful Hugging Face pipelines β€” **Sentiment Analysis** and **Zero-Shot Classification** β€” with a twist: I focused on Arabic and multilingual performance. The goal? To find models that _actually_ handle Arabic well, dialects and all. πŸŒπŸ”€
---
## πŸ” Part 1: Sentiment Analysis Recap
Previously, I found that the default `pipeline("sentiment-analysis")` worked okay for English but... Arabic? Not so much. So today was all about discovering **a better Arabic sentiment model**.
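For context, here's the minimal call I started from. A quick sketch (the example sentences are illustrative, not my exact test set):

```python
from transformers import pipeline

# The bare task name resolves to an English sentiment checkpoint
# (a DistilBERT fine-tuned on SST-2), which explains the weak Arabic results.
classifier = pipeline("sentiment-analysis")

print(classifier("I really enjoyed this course!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]  -- solid on English

print(classifier("هذا المنتج سيء جداً"))  # "This product is very bad"
# Often mislabeled or low-confidence on Arabic input
```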
### πŸ§ͺ Models Tested:
| Model | Result |
| -------------------------------------------------------- | ---------------------------------------------------------- |
| `default` (`pipeline("sentiment-analysis")`) | πŸ‘ Good on English <br>πŸ‘Ž ~55% accurate on Arabic |
| `Anwaarma/Improved-Arabert-twitter-sentiment-No-dropout` | 😐 Inaccurate, struggled with meaning |
| `Abdo36/Arabert-Sentiment-Analysis-ArSAS` | 🫀 Slightly better than default, but dialect handling weak |
### πŸ₯‡ Winner: `CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment`
- βœ… **Accuracy**: 95–99% on clear Arabic sentences
- 💬 **Dialect-Friendly**: Correctly classified Egyptian slang like `"الواد سواق التوك توك جارنا عسل"` (roughly: "the tuk-tuk driver kid next door is a sweetheart") → **Positive, 97%**
- ⚠️ **Weakness**: Lower performance on English (61%) and French (50%)
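Plugging the winner into the same pipeline is a one-line change. A minimal sketch (the output format follows the standard pipeline contract; the model's own label names may differ slightly):

```python
from transformers import pipeline

# Load the winning Arabic model explicitly instead of relying on the default.
arabic_sentiment = pipeline(
    "sentiment-analysis",
    model="CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment",
)

print(arabic_sentiment("الواد سواق التوك توك جارنا عسل"))
# Per the test above: positive at ~97%
```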
### 🧠 Key Takeaways
- **Fine-tuned, language-specific models** seriously outperform the defaults.
- **Dialect support = must-have** for real-world, diverse data.
---
## πŸ” Part 2: Zero-Shot Classification β€” Arabic & Multilingual Trials
Could zero-shot models understand Arabic prompts _and_ interpret labels in multiple languages? Let’s see how they fared in a multilingual arena! πŸ§ͺ🌐
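Both trials below used the same call shape, with only the model id swapped. A minimal sketch, using an illustrative Arabic prompt and labels rather than the exact notebook inputs:

```python
from transformers import pipeline

# Same setup for both trials; only the model id changes between runs.
zero_shot = pipeline(
    "zero-shot-classification",
    model="morit/arabic_xlm_xnli",  # trial 1; trial 2 swaps in the mDeBERTa model
)

result = zero_shot(
    "أحب مشاهدة مباريات كرة القدم",              # "I love watching football matches"
    candidate_labels=["رياضة", "سياسة", "طبخ"],  # sports, politics, cooking
)
print(result["labels"][0], round(result["scores"][0], 3))
```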
### 1️⃣ `morit/arabic_xlm_xnli`
- ❌ Inaccurate, even on Arabic-only prompts
- ❌ Misaligned labels and scores
### 2️⃣ βœ… `MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`
| Scenario | Accuracy | Improvement / Notes |
| ------------------------------- | ------- | -------------------- |
| Arabic → Arabic labels | ✅ 96.1% | +21% over default |
| Arabic → English labels | ✅ 86% | +37% over default |
| English → English labels | ✅ 84% | +9% over default |
| English → Arabic labels | ✅ 84% | vs. ~30% default |
| Mixed labels (Arabic + English) | ✅ 92% | RTL handled properly |
πŸ“Œ **UI Smartness**:
- Keeps English labels left-aligned, Arabic right-aligned
- No visual bugs or mis-scored outputs from RTL quirks βœ”οΈ
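To make the mixed-label scenario concrete, here's a sketch of the winning setup. The sentence and label set are illustrative stand-ins, not the exact notebook inputs:

```python
from transformers import pipeline

# Arabic input classified against a mixed Arabic + English label set.
zero_shot = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
)

result = zero_shot(
    "الفيلم كان ممتعاً وأنصح الجميع بمشاهدته",  # "The movie was fun; I recommend it to everyone"
    candidate_labels=["entertainment", "سياسة", "sports", "اقتصاد"],  # mixed EN + AR
)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```

The pipeline returns `labels` and `scores` sorted by score, so RTL label strings come back through the same arrays as everything else; the alignment quirks only show up at display time.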
---
## 🧠 Final Summary for Day 2
- 🎯 **Model selection matters**: the right model boosted accuracy by **20+ percentage points** in most of today's tests!
- πŸ—£οΈ **Dialect support** is key β€” generic models don’t cut it in nuanced use cases.
- πŸ” **Language pairing (input ↔ label)** is critical for zero-shot reliability.
- πŸ§‘β€πŸ’» **Proper RTL handling** helps avoid UI headaches and scoring issues.
---
## πŸ’‘ What’s Next?
- 🚀 Try out another Hugging Face pipeline: **translation** and **summarization** both sound exciting!
- πŸ“š Keep expanding my **language-aware model notebook** β€” with more dialects, labels, and real-world tests.