|
# 📘 Day 1: Sentiment & Zero-Shot Classification for Arabic Text |
|
|
|
## ✍️ Language & Sentiment: An Interesting Discovery |
|
|
|
Today I explored sentiment analysis with Hugging Face's `pipeline` API and discovered that the same phrase receives **very different confidence scores** depending on the language.
|
|
|
Example: |
|
|
|
- **"I love you"** → Positive 99% ✅ |
|
- **"أنا بحبك"** → Positive 55% |
|
- **"أنا بحبك أوي"** → Detected as Negative ❌ |
|
- **"Je t’aime" (French)** → Positive 99% ✅ |
|
|
|
This highlights how these models: |
|
- Are heavily biased toward English |
|
- Struggle with Arabic dialects (like Egyptian Arabic) |
|
- Might not have seen enough emotionally expressive Arabic data during training |
|
|
|
It made me realize how important **language diversity** and **cultural context** are in AI. |
|
|
|
💡 This observation motivates me even more to contribute to Arabic AI tools and translations — maybe I can help bridge that gap. |
|
|
|
--- |
|
## ✍️ Zero-Shot in Arabic: Right-to-Left Confusion & Multilingual Quirks |
|
|
|
While experimenting with the `zero-shot-classification` pipeline using Arabic, I made a few surprising discoveries: |
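
All the zero-shot snippets below reuse a single `classifier` object. A minimal setup sketch, assuming the pipeline's default checkpoint (an English-centric NLI model; `facebook/bart-large-mnli` at the time of writing, though the default can change between library versions):

```python
from transformers import pipeline

# Default zero-shot checkpoint: an English-centric NLI model
# (facebook/bart-large-mnli at the time of writing; may change by version).
classifier = pipeline("zero-shot-classification")
```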
|
|
|
--- |
|
|
|
### 1️⃣ Arabic Labels Are More Accurate — But Appear Misleading |
|
|
|
I tested the following (the Arabic sentence means "I love learning AI"):
|
|
|
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["تعليم", "رياضة", "طعام"]
)
```
|
|
|
Output: |
|
```python
{
    'sequence': 'أنا أحب تعلم الذكاء الاصطناعي',
    'labels': ['تعليم', 'طعام', 'رياضة'],
    'scores': [0.754, 0.201, 0.044]
}
```
|
|
|
At first glance, it looks like the model chose رياضة (sports) — but that's incorrect. |
|
|
|
➡️ Due to Arabic's right-to-left (RTL) rendering, the printed label list can look reversed in JSON-style output, even though the underlying list (and its pairing with the scores) is still in descending-score order. The actual top label is تعليم (education), with the highest confidence. Unless you know about this rendering quirk, the raw dictionary output is easy to misread.
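
One way to sidestep the ambiguity is to read the top prediction programmatically rather than eyeballing the printed dict. A minimal sketch, reusing the same `classifier`:

```python
result = classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["تعليم", "رياضة", "طعام"]
)
# Labels come back sorted by descending score, so index 0 is the model's pick.
print(result["labels"][0], round(result["scores"][0], 3))  # تعليم 0.754
```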
|
---

### 2️⃣ English Labels with Arabic Text = Less Accurate, Less Confusing
|
|
|
Switching to English labels: |
|
|
|
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["education", "sports", "politics"]
)
```
|
|
|
Result: |
|
|
|
```python
'labels': ['education', 'sports', 'politics'],
'scores': [0.499, 0.287, 0.213]
```
|
|
|
✅ This result is less accurate (lower confidence), but easier to interpret — no RTL formatting confusion. |
|
|
|
--- |
|
|
|
### 3️⃣ English Input and Labels = Most Accurate (as expected)
|
|
|
When both the text and labels are in English, the model performs better: |
|
|
|
```python
classifier(
    "I love learning AI",
    candidate_labels=["education", "sports", "food"]
)
```
|
|
|
✅ Much higher confidence scores, again confirming that these models are optimized for English.
|
|
|
--- |
|
|
|
### 4️⃣ Arabic Labels with English Input = Inaccurate & Low Confidence
|
|
|
I tested: |
|
|
|
```python
classifier(
    "I love learning AI",
    candidate_labels=["طعام", "تعليم", "رياضة"]
)
```
|
|
|
Using explicit pairing: |
|
|
|
```python
output = classifier("I love learning AI", candidate_labels=["طعام", "تعليم", "رياضة"])
for label, score in zip(output['labels'], output['scores']):
    print(f"{label}: {score:.3f}")
```
|
|
|
Output: |
|
|
|
```
طعام: 0.373
تعليم: 0.331
رياضة: 0.296
```
|
|
|
🟥 Despite **"تعليم" (education)** being the correct label, the model actually gave the highest score to **"طعام" (food)**. |
|
|
|
This shows: |
|
|
|
- The model **struggles** when the **input is English** but **labels are in Arabic** |
|
|
|
- It may be biased toward more frequent or more easily matched embeddings (e.g., “food” might have more training examples) |
|
|
|
- And its ranking of labels becomes **unreliable** when the input and the labels are in different languages
|
|
|
|
|
✅ Using matching languages (English-English or Arabic-Arabic) is safer and gives more reliable results. |
|
|
|
--- |
|
|
|
### 5️⃣ Mixed Labels with Arabic Input = A Funny Twist
|
|
|
Using a mix of Arabic and English labels: |
|
|
|
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["education", "رياضة", "طعام"]
)
```
|
Result: |
|
```python
'labels': ['طعام', 'رياضة', 'education'],
'scores': [0.733, 0.160, 0.105]
```
|
|
|
Interesting discovery:

- The English label (education) sits at the end of the list because it received the lowest score (0.105); the labels are always sorted by descending score.
- On top of that, RTL/bidi rendering visually reorders the whole printed line, even when the labels are mixed-language, so the display order and the logical order are easy to confuse. The reordering happens in the display layer, not in the model.
- Content-wise, the model again put طعام (food) on top instead of the correct label.

To verify the real ranking without any rendering ambiguity, print one label per line with an explicit rank, as in the sketch below.
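
A minimal sketch, reusing the same `classifier` as above:

```python
result = classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["education", "رياضة", "طعام"]
)
# Explicit ranks make the true (score-sorted) order unambiguous,
# regardless of how the terminal renders mixed RTL/LTR text.
for rank, (label, score) in enumerate(zip(result["labels"], result["scores"]), start=1):
    print(f"{rank}. {label}: {score:.3f}")
```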
|
|
|
### 🤫 A Use Case Left to You!
|
|
|
If you're reading this and want to explore further — try using English sentences with mixed-language labels and observe how the model responds. I'd love to hear what others discover! |
|
|
|
### 💡 Final Reflection
|
|
|
This pipeline shows promising multilingual ability, but: |
|
|
|
- It struggles with confidence outside of English. |
|
|
|
- RTL rendering reorders even mixed-language label lists in the printed output, which is unintuitive.
|
|
|
- It shows how interface and cultural design matter just as much as raw performance. |
|
|
|
This made me realize that building better Arabic support isn't just about translation — it's also about fixing formatting, UI feedback, and deeper training data gaps. |
|
|
|
--- |
|
|
|
## 🧠 Final Summary |
|
|
|
- These pipelines **work** with Arabic to a surprising extent. |
|
- But they are clearly **not optimized** for multilingual or dialectal input. |
|
- Quirks like RTL rendering hurt readability, while mismatched input-label languages and the bias toward English severely impact accuracy.
|
|
|
✅ This reinforces my mission to: |
|
- Document these edge cases |
|
- Advocate for better Arabic support |
|
- Eventually build and test tools for multilingual fairness

And I'm left wondering: what would happen if we picked a model tuned specifically for Arabic and then used it for other languages?
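
A starting point for that experiment, as a sketch: the model name `joeddav/xlm-roberta-large-xnli` is a commonly cited multilingual zero-shot checkpoint on the Hub, but treat the choice (and its availability) as my assumption rather than a benchmarked recommendation.

```python
from transformers import pipeline

# Swap the English-centric default for a multilingual NLI checkpoint.
# The model name is an assumption; any multilingual zero-shot model would do.
classifier = pipeline(
    "zero-shot-classification",
    model="joeddav/xlm-roberta-large-xnli",
)

# Re-run the earlier Arabic test and compare confidence against the default.
result = classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["تعليم", "رياضة", "طعام"]
)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```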