# πŸ“˜ Day 2: Sentiment & Zero-Shot Smackdown – Arabic Edition

Today, I dove deeper into two powerful Hugging Face pipelines β€” **Sentiment Analysis** and **Zero-Shot Classification** β€” with a twist: I focused on Arabic and multilingual performance. The goal? To find models that _actually_ handle Arabic well, dialects and all. πŸŒπŸ”€

---

## πŸ” Part 1: Sentiment Analysis Recap

Previously, I found that the default `pipeline("sentiment-analysis")` worked okay for English but... Arabic? Not so much. So today was all about discovering **a better Arabic sentiment model**.
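
For reference, here's a minimal sketch of the default setup I was retesting (the example sentences are my own illustrative picks, not the exact test data):

```python
from transformers import pipeline

# No model argument: the pipeline falls back to Hugging Face's default
# English sentiment checkpoint (DistilBERT fine-tuned on SST-2 at the time of writing)
classifier = pipeline("sentiment-analysis")

print(classifier("I love this library!"))
# [{'label': 'POSITIVE', 'score': 0.999...}]

# Arabic input: the English-centric default frequently mislabels sentences like this
print(classifier("Ω‡Ψ°Ψ§ Ψ§Ω„Ω…Ω†ΨͺΨ¬ Ψ±Ψ§Ψ¦ΨΉ Ψ¬Ψ―Ψ§Ω‹"))  # "This product is wonderful"
```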

### πŸ§ͺ Models Tested:

| Model                                                    | Result                                                     |
| -------------------------------------------------------- | ---------------------------------------------------------- |
| `default` (`pipeline("sentiment-analysis")`)             | πŸ‘ Good on English  <br>πŸ‘Ž ~55% accurate on Arabic         |
| `Anwaarma/Improved-Arabert-twitter-sentiment-No-dropout` | 😐 Inaccurate, struggled with meaning                      |
| `Abdo36/Arabert-Sentiment-Analysis-ArSAS`                | 🫀 Slightly better than default, but dialect handling weak |

### πŸ₯‡ Winner: `CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment`

- βœ… **Accuracy**: 95–99% on clear Arabic sentences
    
- πŸ’¬ **Dialect-Friendly**: Correctly classified Egyptian slang like  
    `"Ψ§Ω„ΩˆΨ§Ψ― Ψ³ΩˆΨ§Ω‚ Ψ§Ω„ΨͺΩˆΩƒ ΨͺΩˆΩƒ Ψ¬Ψ§Ψ±Ω†Ψ§ ΨΉΨ³Ω„"` β†’ **Positive 97%**
    
- ⚠️ **Weakness**: Lower performance on English (61%) and French (50%)
    

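Here's a minimal sketch of how to swap in the winning checkpoint (exact scores and label casing may vary slightly by `transformers` version):

```python
from transformers import pipeline

# Same pipeline task, just pointing at the Arabic-focused checkpoint
sentiment = pipeline(
    "sentiment-analysis",
    model="CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment",
)

# The Egyptian dialect example from the table above
print(sentiment("Ψ§Ω„ΩˆΨ§Ψ― Ψ³ΩˆΨ§Ω‚ Ψ§Ω„ΨͺΩˆΩƒ ΨͺΩˆΩƒ Ψ¬Ψ§Ψ±Ω†Ψ§ ΨΉΨ³Ω„"))
# Expected output, approximately: [{'label': 'positive', 'score': 0.97}]
```
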
### 🧠 Key Takeaways

- **Fine-tuned, language-specific models** seriously outperform the defaults.
    
- **Dialect support = must-have** for real-world, diverse data.
    

---

## πŸ” Part 2: Zero-Shot Classification β€” Arabic & Multilingual Trials

Could zero-shot models understand Arabic prompts _and_ interpret labels in multiple languages? Let’s see how they fared in a multilingual arena! πŸ§ͺ🌐
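
All of the trials below followed the same shape; here's a hedged sketch of the harness (the sample sentence and label sets are my own, not the exact test data):

```python
from transformers import pipeline

# Swap the model string for each checkpoint under comparison
zero_shot = pipeline("zero-shot-classification", model="morit/arabic_xlm_xnli")

text = "Ψ£Ψ­Ψ¨ Ω…Ψ΄Ψ§Ω‡Ψ―Ψ© Ω…Ψ¨Ψ§Ψ±ΩŠΨ§Ψͺ ΩƒΨ±Ψ© Ψ§Ω„Ω‚Ψ―Ω…"  # "I love watching football matches"
labels_ar = ["Ψ±ΩŠΨ§Ψ¶Ψ©", "Ψ³ΩŠΨ§Ψ³Ψ©", "Ψ§Ω‚ΨͺΨ΅Ψ§Ψ―"]   # sports, politics, economy
labels_en = ["sports", "politics", "economy"]

for labels in (labels_ar, labels_en):
    result = zero_shot(text, candidate_labels=labels)
    # result["labels"] comes back sorted by descending score
    print(list(zip(result["labels"], [round(s, 3) for s in result["scores"]])))
```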

### 1️⃣ `morit/arabic_xlm_xnli`

- ❌ Inaccurate, even on Arabic-only prompts
    
- ❌ Misaligned labels and scores
    

### 2️⃣ βœ… `MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`

| Scenario                        | Accuracy | Improvement / Notes   |
| ------------------------------- | -------- | --------------------- |
| Arabic β†’ Arabic labels          | βœ… 96.1% | +21% over the default |
| Arabic β†’ English labels         | βœ… 86%   | +37% over the default |
| English β†’ English labels        | βœ… 84%   | +9% over the default  |
| English β†’ Arabic labels         | βœ… 84%   | vs. ~30% default      |
| Mixed labels (Arabic + English) | βœ… 92%   | RTL handled properly  |
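
The mixed-label row is the scenario most likely to trip a model up, so here's a minimal sketch of that test (sentence and labels are illustrative):

```python
from transformers import pipeline

zero_shot = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
)

# Mixed Arabic + English candidate labels in a single call
result = zero_shot(
    "Ψ§Ω„Ω…Ω†ΨͺΨ¨ Ψ§Ω„Ω…Ψ΅Ψ±ΩŠ ΩΨ§Ψ² Ψ¨Ψ§Ω„Ω…Ψ¨Ψ§Ψ±Ψ§Ψ© Ψ£Ω…Ψ³",  # "The Egyptian team won yesterday's match"
    candidate_labels=["Ψ±ΩŠΨ§Ψ¶Ψ©", "politics", "economy", "ΩΩ†"],  # sports (ar), ..., art (ar)
)
print(result["labels"][0], round(result["scores"][0], 3))  # top label and its score
```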

πŸ“Œ **UI Smartness**:

- Keeps English labels left-aligned, Arabic right-aligned
    
- No visual bugs or mis-scored outputs from RTL quirks βœ”οΈ
    

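For the alignment decision itself, a simple heuristic is to inspect the first strongly-directional character of each label. This is my own illustrative sketch, not the exact code I used:

```python
import unicodedata

def is_rtl(label: str) -> bool:
    """Treat a label as RTL if its first strongly-directional character is right-to-left."""
    for ch in label:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):  # Hebrew / Arabic letters
            return True
        if direction == "L":          # Latin and other LTR scripts
            return False
    return False  # no strong direction found; default to LTR

for label in ["Ψ±ΩŠΨ§Ψ¶Ψ©", "politics"]:
    print(label, "->", "right-align" if is_rtl(label) else "left-align")
```
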
---

## 🧠 Final Summary for Day 2

- 🎯 **Model selection matters**: The right model can boost performance by **20–25%**!
    
- πŸ—£οΈ **Dialect support** is key β€” generic models don’t cut it in nuanced use cases.
    
- πŸ” **Language pairing (input ↔ label)** is critical for zero-shot reliability.
    
- πŸ§‘β€πŸ’» **Proper RTL handling** helps avoid UI headaches and scoring issues.
    

---

## πŸ’‘ What’s Next?

- πŸš€ Try out another Hugging Face pipeline: **translation** and **summarization** both sound exciting!
    
- πŸ“š Keep expanding my **language-aware model notebook** β€” with more dialects, labels, and real-world tests.