📘 Day 1: Sentiment & Zero-Shot Classification for Arabic Text
✍️ Language & Sentiment: An Interesting Discovery
Today I explored sentiment analysis using Hugging Face's pipeline and discovered that the same phrase has different confidence scores depending on the language.
Example:
- "I love you" → Positive 99% ✅
- "أنا بحبك" → Positive 55%
- "أنا بحبك أوي" → Detected as Negative ❌
- "Je t’aime" (French) → Positive 99% ✅
This highlights how these models:
- Are heavily biased toward English
- Struggle with Arabic dialects (like Egyptian Arabic)
- Might not have seen enough emotionally expressive Arabic data during training
It made me realize how important language diversity and cultural context are in AI.
💡 This observation motivates me even more to contribute to Arabic AI tools and translations — maybe I can help bridge that gap.
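For anyone who wants to reproduce this comparison, here is a minimal sketch. In real usage, `classifier` would come from `pipeline("sentiment-analysis")` in Hugging Face transformers (whose default model is English-only); a stub stands in below so the snippet runs without downloading a model:

```python
# Real usage would be:
#   from transformers import pipeline
#   classifier = pipeline("sentiment-analysis")
def compare(classifier, texts):
    """Run each text through the classifier and collect (label, score) pairs."""
    results = {}
    for text in texts:
        out = classifier(text)[0]  # the pipeline returns a list with one dict per input
        results[text] = (out["label"], round(out["score"], 2))
    return results

# Stub standing in for the real pipeline (illustration only):
def fake_classifier(text):
    return [{"label": "POSITIVE", "score": 0.99}]

print(compare(fake_classifier, ["I love you", "أنا بحبك", "Je t'aime"]))
```

Swapping the stub for the real pipeline reproduces the confidence gap described above.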
✍️ Zero-Shot in Arabic: Right-to-Left Confusion & Multilingual Quirks
While experimenting with the zero-shot-classification pipeline on Arabic text, I made a few surprising discoveries:
1️⃣ Arabic Labels Are More Accurate — But Appear Misleading
I tested the following:
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["تعليم", "رياضة", "طعام"]
)
```
Output:
```python
{
    'sequence': 'أنا أحب تعلم الذكاء الاصطناعي',
    'labels': ['تعليم', 'طعام', 'رياضة'],
    'scores': [0.754, 0.201, 0.044]
}
```
At first glance, the output can look as if the model chose رياضة (sports), but that is a misreading.
➡️ Because Arabic is rendered right-to-left (RTL), the printed list of labels can appear visually reversed in a terminal or notebook, while the scores still read left-to-right. The two lists are parallel and sorted by descending score, so the actual top label is تعليم (education), with the highest confidence at 0.754.
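A safe way to avoid this misreading is to pair labels with scores by index instead of eyeballing the printed lists. A minimal sketch, using the output above as literal data:

```python
# Pipeline-style output, written out as a plain dict for illustration.
output = {
    'sequence': 'أنا أحب تعلم الذكاء الاصطناعي',
    'labels': ['تعليم', 'طعام', 'رياضة'],
    'scores': [0.754, 0.201, 0.044],
}

# The two lists are parallel: labels[i] goes with scores[i],
# regardless of how RTL text happens to render on screen.
best_index = output['scores'].index(max(output['scores']))
top_label = output['labels'][best_index]
print(top_label)  # position, not visual order, is what counts
```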
This RTL formatting issue can easily mislead someone reading the raw dictionary output.

2️⃣ English Labels with Arabic Text = Lower Confidence, Less Confusing
Switching to English labels:
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["education", "sports", "politics"]
)
```
Result:

```python
'labels': ['education', 'sports', 'politics'],
'scores': [0.499, 0.287, 0.213]
```
✅ The top label is still education, but the confidence is much lower (0.499 vs 0.754). On the plus side, the output is easier to interpret, with no RTL formatting confusion.
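If you want to keep Arabic labels but make the console output unambiguous, one option (my own workaround, not something the pipeline does for you) is to wrap each label in Unicode bidirectional isolate characters, which tell bidi-aware terminals to treat the label as a self-contained run:

```python
FSI = "\u2068"  # FIRST STRONG ISOLATE: start an isolated bidi run
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: end it

def isolate(label: str) -> str:
    """Wrap a label so surrounding text direction cannot visually reorder it."""
    return f"{FSI}{label}{PDI}"

labels = ['تعليم', 'طعام', 'رياضة']
scores = [0.754, 0.201, 0.044]
for label, score in zip(labels, scores):
    print(f"{isolate(label)}: {score:.3f}")
```

In a terminal that implements the Unicode bidi algorithm, each line then reads unambiguously as label-then-score.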
3️⃣ English Input and Labels = Most Accurate (as expected)
When both the text and labels are in English, the model performs better:
```python
classifier(
    "I love learning AI",
    candidate_labels=["education", "sports", "food"]
)
```
✅ Much higher and more confident scores — again confirming that these models are optimized for English.
4️⃣ Arabic Labels with English Input = Inaccurate & Low Confidence
I tested:
```python
classifier(
    "I love learning AI",
    candidate_labels=["طعام", "تعليم", "رياضة"]
)
```
Using explicit pairing:
```python
output = classifier("I love learning AI", candidate_labels=["طعام", "تعليم", "رياضة"])
for label, score in zip(output['labels'], output['scores']):
    print(f"{label}: {score}")
```
Output:

```
طعام: 0.373
تعليم: 0.331
رياضة: 0.296
```
🟥 Despite "تعليم" (education) being the correct label, the model actually gave the highest score to "طعام" (food).
This shows:
- The model struggles when the input is English but the labels are Arabic
- It may be biased toward more frequent or more easily matched embeddings (e.g., "food" may simply be better represented in its training data)
- Scores become unreliable when the input and the labels use different languages
✅ Using matching languages (English-English or Arabic-Arabic) is safer and gives more reliable results.
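One way to make this kind of check systematic: the zero-shot pipeline returns its labels sorted by descending score, so comparing the first label against the expected one is a quick correctness test. A small sketch using this section's numbers as literal data:

```python
def top_matches(output, expected_label):
    """Zero-shot outputs sort labels by descending score,
    so labels[0] is the model's top choice."""
    return output['labels'][0] == expected_label

# English input with Arabic labels (the mismatched case above):
mismatched = {
    'labels': ['طعام', 'تعليم', 'رياضة'],
    'scores': [0.373, 0.331, 0.296],
}
# Arabic input with Arabic labels (the matched case from section 1):
matched = {
    'labels': ['تعليم', 'طعام', 'رياضة'],
    'scores': [0.754, 0.201, 0.044],
}

print(top_matches(mismatched, 'تعليم'))  # False: the model picked 'food'
print(top_matches(matched, 'تعليم'))     # True
```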
5️⃣ Mixed Labels with Arabic Input = A Funny Twist
Using a mix of Arabic and English labels:
```python
classifier(
    "أنا أحب تعلم الذكاء الاصطناعي",
    candidate_labels=["education", "رياضة", "طعام"]
)
```
Result:

```python
'labels': ['طعام', 'رياضة', 'education'],
'scores': [0.733, 0.160, 0.105]
```
Interesting discovery:
- The lists are sorted by descending score, so education appears last because it received the lowest score (0.105), not because of the RTL layout.
- Mixing label languages actually made the model misclassify: it ranked طعام (food) highest even though the sentence is about learning AI.
- On top of that, the RTL rendering of a mixed-language list makes the raw output even harder to read at a glance.
🤫 A Use Case Left to You!
If you're reading this and want to explore further — try using English sentences with mixed-language labels and observe how the model responds. I'd love to hear what others discover!
💡 Final Reflection
This pipeline shows promising multilingual ability, but:
- Its confidence drops sharply outside of English.
- RTL rendering makes Arabic and mixed-language outputs easy to misread, even when the underlying scores are fine.
- Interface and cultural design matter just as much as raw model performance.
This made me realize that building better Arabic support isn't just about translation — it's also about fixing formatting, UI feedback, and deeper training data gaps.
🧠 Final Summary
- These pipelines work with Arabic to a surprising extent.
- But they are clearly not optimized for multilingual or dialectal input.
- Display quirks like RTL rendering can mislead interpretation, while mismatched input/label languages and bias toward English genuinely hurt accuracy.
✅ This reinforces my mission to:
- Document these edge cases
- Advocate for better Arabic support
- Eventually build and test tools for multilingual fairness

I'm also left wondering: what would happen if we picked a model tuned specifically for Arabic and then used it for other languages?