# 📘 Day 1: Sentiment & Zero-Shot Classification for Arabic Text
## ✍️ Language & Sentiment: An Interesting Discovery
Today I explored sentiment analysis using Hugging Face's pipeline and discovered that the same phrase has **different confidence scores** depending on the language.
Example:
- **"I love you"** → Positive 99% ✅
- **"أنا بحبك"** (Egyptian Arabic: "I love you") → Positive 55%
- **"أنا بحبك أوي"** (Egyptian Arabic: "I love you so much") → Detected as Negative ❌
- **"Je t’aime"** (French: "I love you") → Positive 99% ✅
This highlights how these models:
- Are heavily biased toward English
- Struggle with Arabic dialects (like Egyptian Arabic)
- Might not have seen enough emotionally expressive Arabic data during training
It made me realize how important **language diversity** and **cultural context** are in AI.
💡 This observation motivates me even more to contribute to Arabic AI tools and translations — maybe I can help bridge that gap.
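The comparison above can be reproduced with a short script. A minimal sketch, assuming the pipeline's default English sentiment checkpoint (`distilbert-base-uncased-finetuned-sst-2-english`); exact scores will differ with another model or version:

```python
from transformers import pipeline

# Assumption: the pipeline's default English sentiment model; scores
# will differ if another checkpoint is used.
sentiment = pipeline("sentiment-analysis")

phrases = [
    "I love you",      # English
    "أنا بحبك",        # Egyptian Arabic: "I love you"
    "أنا بحبك أوي",    # Egyptian Arabic: "I love you so much"
    "Je t'aime",       # French: "I love you"
]
for text in phrases:
    result = sentiment(text)[0]
    print(f"{text} -> {result['label']} ({result['score']:.0%})")
```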
---
## ✍️ Zero-Shot in Arabic: Right-to-Left Confusion & Multilingual Quirks
While experimenting with the `zero-shot-classification` pipeline using Arabic, I made a few surprising discoveries:
---
### 1️⃣ Arabic Labels Are More Accurate — But Appear Misleading
I tested the following:
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",  # "I love learning AI"
    candidate_labels=["تعليم", "رياضة", "طعام"]  # education, sports, food
)
```
Output:
```python
{
    'sequence': 'أنا أحب تعلم الذكاء الاصطناعي',
    'labels': ['تعليم', 'طعام', 'رياضة'],
    'scores': [0.754, 0.201, 0.044]
}
```
At first glance, it looks like the model chose رياضة (sports), but that is incorrect.
➡️ Because Arabic renders right-to-left (RTL), the printed label list can appear reversed when you read JSON-style output, even though the scores stay in left-to-right order. Unless you know this, the raw dictionary is easy to misread: the actual top label is تعليم (education), with the highest confidence.
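One way to sidestep this RTL ambiguity is to print each label/score pair on its own line instead of reading the inline list. A small sketch (`print_ranked` is a hypothetical helper, not part of the pipeline API):

```python
# Hypothetical helper: print one (label, score) pair per line so the
# bidirectional text rendering cannot visually scramble the ranking.
def print_ranked(output):
    for rank, (label, score) in enumerate(zip(output["labels"], output["scores"]), start=1):
        print(f"{rank}. {label}: {score:.3f}")

# The output dict from above
print_ranked({
    "sequence": "أنا أحب تعلم الذكاء الاصطناعي",
    "labels": ["تعليم", "طعام", "رياضة"],
    "scores": [0.754, 0.201, 0.044],
})
```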
---
### 2️⃣ English Labels with Arabic Text = Less Accurate, Less Confusing
Switching to English labels:
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",  # "I love learning AI"
    candidate_labels=["education", "sports", "politics"]
)
```
Result:
```python
'labels': ['education', 'sports', 'politics'],
'scores': [0.499, 0.287, 0.213]
```
✅ This result is less accurate (lower confidence), but easier to interpret — no RTL formatting confusion.
---
### 3️⃣ English Input and Labels = Most Accurate (as expected)
When both the text and labels are in English, the model performs better:
```python
classifier(
    "I love learning AI",
    candidate_labels=["education", "sports", "food"]
)
```
✅ Much higher and more confident scores — again confirming that these models are optimized for English.
---
### 4️⃣ Arabic Labels with English Input = Inaccurate & Low Confidence
I tested:
```python
classifier(
    "I love learning AI",
    candidate_labels=["طعام", "تعليم", "رياضة"]  # food, education, sports
)
```
Using explicit pairing:
```python
output = classifier("I love learning AI", candidate_labels=["طعام", "تعليم", "رياضة"])
for label, score in zip(output['labels'], output['scores']):
    print(f"{label}: {score}")
```
Output:
```python
طعام: 0.373
تعليم: 0.331
رياضة: 0.296
```
🟥 Despite **"تعليم" (education)** being the correct label, the model actually gave the highest score to **"طعام" (food)**.
This shows:
- The model **struggles** when the **input is English** but **labels are in Arabic**
- It may be biased toward more frequent or more easily matched embeddings (e.g., “food” might have more training examples)
- And it **does not align scores correctly** when different languages are used between input and labels
✅ Using matching languages (English-English or Arabic-Arabic) is safer and gives more reliable results.
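To catch a mismatched setup before running the classifier, a quick script check helps. A minimal sketch, assuming an Arabic-vs-Latin script heuristic as a stand-in for real language detection (`contains_arabic` and `check_language_match` are hypothetical helpers):

```python
import unicodedata

# Heuristic (an assumption, not real language detection): a string counts
# as "Arabic" if any character's Unicode name mentions ARABIC.
def contains_arabic(text):
    return any("ARABIC" in unicodedata.name(ch, "") for ch in text)

def check_language_match(text, labels):
    """Return the labels whose script doesn't match the input's."""
    text_is_arabic = contains_arabic(text)
    return [label for label in labels if contains_arabic(label) != text_is_arabic]

mismatched = check_language_match("I love learning AI", ["طعام", "تعليم", "رياضة"])
print("Script-mismatched labels:", mismatched)  # all three labels, in this case
```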
---
### 5️⃣ Mixed Labels with Arabic Input = A Funny Twist
Using a mix of Arabic and English labels:
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",  # "I love learning AI"
    candidate_labels=["education", "رياضة", "طعام"]  # education, sports, food
)
```
Result:
```python
'labels': ['طعام', 'رياضة', 'education'],
'scores': [0.733, 0.160, 0.105]
```
Interesting observations:
- The English label (education) appears at the end, and the visual order of the whole list is scrambled by the RTL layout.
- The reordering comes from the display layer's bidirectional text handling, not from the model itself; mixing scripts in one line makes the printed order especially hard to trust.
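If you do need to print mixed-direction labels inline, Unicode's directional isolate characters can stop bidi-aware renderers from reordering them. A sketch using U+2068 (FIRST STRONG ISOLATE) and U+2069 (POP DIRECTIONAL ISOLATE); `isolate` is a hypothetical helper, and how well it displays depends on your terminal:

```python
# U+2068 FIRST STRONG ISOLATE / U+2069 POP DIRECTIONAL ISOLATE:
# each wrapped segment is laid out independently, so mixed Arabic and
# English labels keep their logical order when printed on one line.
FSI, PDI = "\u2068", "\u2069"

def isolate(text):
    return f"{FSI}{text}{PDI}"

labels = ["طعام", "رياضة", "education"]
print(", ".join(isolate(label) for label in labels))
```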
### 🤫 A Use Case Left to You!
If you're reading this and want to explore further — try using English sentences with mixed-language labels and observe how the model responds. I'd love to hear what others discover!
### 💡 Final Reflection
This pipeline shows promising multilingual ability, but:
- It struggles with confidence outside of English.
- It applies RTL formatting even to mixed-language labels, which is unintuitive.
- It shows how interface and cultural design matter just as much as raw performance.
This made me realize that building better Arabic support isn't just about translation — it's also about fixing formatting, UI feedback, and deeper training data gaps.
---
## 🧠 Final Summary
- These pipelines **work** with Arabic to a surprising extent.
- But they are clearly **not optimized** for multilingual or dialectal input.
- Layout quirks like RTL rendering can mislead interpretation, while mismatched input-label languages and the bias toward English genuinely hurt accuracy.
✅ This reinforces my mission to:
- Document these edge cases
- Advocate for better Arabic support
- Eventually build and test tools for multilingual fairness
- I'm also wondering: what would happen if we picked a model tuned for Arabic and then used it for other languages?