# πŸ“˜ Day 2: Sentiment & Zero-Shot Smackdown – Arabic Edition
Today, I dove deeper into two powerful Hugging Face pipelines β€” **Sentiment Analysis** and **Zero-Shot Classification** β€” with a twist: I focused on Arabic and multilingual performance. The goal? To find models that _actually_ handle Arabic well, dialects and all. πŸŒπŸ”€
---
## πŸ” Part 1: Sentiment Analysis Recap
Previously, I found that the default `pipeline("sentiment-analysis")` worked okay for English but... Arabic? Not so much. So today was all about discovering **a better Arabic sentiment model**.
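For context, here's the minimal call I started from. A quick sketch (the example sentences are illustrative, not my exact test set):

```python
from transformers import pipeline

# The bare task name resolves to an English sentiment checkpoint
# (a DistilBERT fine-tuned on SST-2), which explains the weak Arabic results.
classifier = pipeline("sentiment-analysis")

print(classifier("I really enjoyed this course!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]  -- solid on English

print(classifier("هذا المنتج سيء جداً"))  # "This product is very bad"
# Often mislabeled or low-confidence on Arabic input
```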
### πŸ§ͺ Models Tested:
| Model | Result |
| -------------------------------------------------------- | ---------------------------------------------------------- |
| `default` (`pipeline("sentiment-analysis")`) | πŸ‘ Good on English <br>πŸ‘Ž ~55% accurate on Arabic |
| `Anwaarma/Improved-Arabert-twitter-sentiment-No-dropout` | 😐 Inaccurate, struggled with meaning |
| `Abdo36/Arabert-Sentiment-Analysis-ArSAS` | 🫀 Slightly better than default, but dialect handling weak |
### πŸ₯‡ Winner: `CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment`
- βœ… **Accuracy**: 95–99% on clear Arabic sentences
- 💬 **Dialect-Friendly**: Correctly classified Egyptian slang like `"الواد سواق التوك توك جارنا عسل"` (roughly: "the tuk-tuk driver kid next door is a sweetheart") → **Positive, 97%**
- ⚠️ **Weakness**: Lower performance on English (61%) and French (50%)
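Plugging the winner into the same pipeline is a one-line change. A minimal sketch (the output format follows the standard pipeline contract; the model's own label names may differ slightly):

```python
from transformers import pipeline

# Load the winning Arabic model explicitly instead of relying on the default.
arabic_sentiment = pipeline(
    "sentiment-analysis",
    model="CAMeL-Lab/bert-base-arabic-camelbert-mix-sentiment",
)

print(arabic_sentiment("الواد سواق التوك توك جارنا عسل"))
# Per the test above: positive at ~97%
```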
### 🧠 Key Takeaways
- **Fine-tuned, language-specific models** seriously outperform the defaults.
- **Dialect support = must-have** for real-world, diverse data.
---
## πŸ” Part 2: Zero-Shot Classification β€” Arabic & Multilingual Trials
Could zero-shot models understand Arabic prompts _and_ interpret labels in multiple languages? Let’s see how they fared in a multilingual arena! πŸ§ͺ🌐
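Both trials below used the same call shape, with only the model id swapped. A minimal sketch, using an illustrative Arabic prompt and labels rather than the exact notebook inputs:

```python
from transformers import pipeline

# Same setup for both trials; only the model id changes between runs.
zero_shot = pipeline(
    "zero-shot-classification",
    model="morit/arabic_xlm_xnli",  # trial 1; trial 2 swaps in the mDeBERTa model
)

result = zero_shot(
    "أحب مشاهدة مباريات كرة القدم",              # "I love watching football matches"
    candidate_labels=["رياضة", "سياسة", "طبخ"],  # sports, politics, cooking
)
print(result["labels"][0], round(result["scores"][0], 3))
```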
### 1️⃣ `morit/arabic_xlm_xnli`
- ❌ Inaccurate, even on Arabic-only prompts
- ❌ Misaligned labels and scores
### 2️⃣ βœ… `MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7`
| Scenario | Accuracy | Improvement / Notes |
| ------------------------------- | ------- | -------------------- |
| Arabic → Arabic labels | ✅ 96.1% | +21% over default |
| Arabic → English labels | ✅ 86% | +37% over default |
| English → English labels | ✅ 84% | +9% over default |
| English → Arabic labels | ✅ 84% | vs. ~30% default |
| Mixed labels (Arabic + English) | ✅ 92% | RTL handled properly |
πŸ“Œ **UI Smartness**:
- Keeps English labels left-aligned, Arabic right-aligned
- No visual bugs or mis-scored outputs from RTL quirks βœ”οΈ
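To make the mixed-label scenario concrete, here's a sketch of the winning setup. The sentence and label set are illustrative stand-ins, not the exact notebook inputs:

```python
from transformers import pipeline

# Arabic input classified against a mixed Arabic + English label set.
zero_shot = pipeline(
    "zero-shot-classification",
    model="MoritzLaurer/mDeBERTa-v3-base-xnli-multilingual-nli-2mil7",
)

result = zero_shot(
    "الفيلم كان ممتعاً وأنصح الجميع بمشاهدته",  # "The movie was fun; I recommend it to everyone"
    candidate_labels=["entertainment", "سياسة", "sports", "اقتصاد"],  # mixed EN + AR
)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```

The pipeline returns `labels` and `scores` sorted by score, so RTL label strings come back through the same arrays as everything else; the alignment quirks only show up at display time.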
---
## 🧠 Final Summary for Day 2
- 🎯 **Model selection matters**: the right model boosted accuracy by **20+ percentage points** in most of today's tests!
- πŸ—£οΈ **Dialect support** is key β€” generic models don’t cut it in nuanced use cases.
- πŸ” **Language pairing (input ↔ label)** is critical for zero-shot reliability.
- πŸ§‘β€πŸ’» **Proper RTL handling** helps avoid UI headaches and scoring issues.
---
## πŸ’‘ What’s Next?
- 🚀 Try out another Hugging Face pipeline: **translation** and **summarization** both sound exciting!
- πŸ“š Keep expanding my **language-aware model notebook** β€” with more dialects, labels, and real-world tests.