✍️ Day 03 – Deep Dive into Summarization & Translation with Hugging Face 🤗
Today marks our exploration of two advanced generative NLP tasks beyond classification: Text Summarization and Machine Translation. We’ll compare default Hugging Face pipelines to language-specific models, emphasizing Arabic.
📝 1. Text Summarization
1.1 Overview
- Baseline: Run the default summarization pipeline on an English narrative.
- English-focused models: Compare facebook/bart-large-cnn, csebuetnlp/mT5_multilingual_XLSum, and Falconsai/text_summarization with length parameters.
- Arabic narrative: Assess multilingual vs. Arabic-specialized models on Arabic text.
1.2 Experiment 1: Default Pipeline on English Narrative
```python
from transformers import pipeline

# Default summarization pipeline (resolves to sshleifer/distilbart-cnn-12-6)
summarizer = pipeline("summarization")

# This is a short example
naruto_story = """
Born an orphan into the Hidden Leaf Village, Naruto Uzumaki's early life was shadowed by the terrifying Nine-Tailed Fox, a monstrous beast sealed within him."""

# Generate the summary
summary_default = summarizer(naruto_story)

# Print the result
print("--- Original Story ---")
print(naruto_story)
print("\n--- Default Summarizer Output ---")
print(summary_default[0]['summary_text'])
```
Model: sshleifer/distilbart-cnn-12-6 (default).
Input: A Naruto Uzumaki story (with/without an initial title line).
Key Observations:
Conciseness: The summary distilled only the core arc (orphan → Hokage).
Title Sensitivity: With the title line present, the model labeled Naruto as “The Seventh Hokage” and omitted his name; removing the title restored “Naruto.”
Omission of Details: Side characters (Sasuke, Jiraiya, etc.) and subplots were dropped due to aggressive compression.
Insight: Useful for quick overviews but lacks narrative richness and requires parameter tuning or fine-tuned models for detail retention.
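To make that parameter tuning concrete, here is a minimal sketch reusing the naruto_story variable from the snippet above (same Python session); the max_length and min_length values are illustrative assumptions, not tuned settings:

```python
from transformers import pipeline

# Default summarizer again; the length bounds below are illustrative, not tuned values
summarizer = pipeline("summarization")

summary_tuned = summarizer(
    naruto_story,        # story text defined in the previous snippet
    max_length=120,      # allow a longer summary than the default cap
    min_length=40,       # force the model to keep more of the narrative
    do_sample=False,     # deterministic decoding for comparable runs
)
print(summary_tuned[0]["summary_text"])
```

Raising min_length generally yields more detail, though it does not by itself prevent the omissions noted above.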
1.3 Experiment 2: Fine-Tuned English Models
1.3.1 facebook/bart-large-cnn
Pros: More verbose; includes “Naruto Uzumaki”.
Cons: Hallucinated details; for example, it misgendered Naruto as Kushina’s daughter.
1.3.2 csebuetnlp/mT5_multilingual_XLSum
- Issue: Severe hallucinations; treated the narrative like a news story, fabricating details (e.g., Konoha setting, BBC reporter).
1.3.3 Falconsai/text_summarization
```python
# Load the fine-tuned summarization model
summarizer = pipeline("summarization", model="Falconsai/text_summarization")

# Experiment with an increased max_length to retain more detail
summary_falconsai = summarizer(naruto_story, max_length=562, min_length=100, do_sample=False)

print("\n--- Fine-Tuned model on English Naruto Story ---")
print(summary_falconsai[0]['summary_text'])
```
Setup: max_length=562, min_length=100, do_sample=False.
Performance: Rich, coherent summary including multiple characters and plot points; minor truncation at the max_length cutoff.
Conclusion: For English narrative, Falconsai/text_summarization offers the best balance of detail and accuracy.
1.4 Experiment 3: Arabic Narrative Summarization
- Model: csebuetnlp/mT5_multilingual_XLSum (with min_length=100).
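For reference, a minimal sketch of how this Arabic run might look; the arabic_story string is an illustrative placeholder (an Arabic rendering of the English opening line), not the exact text used in the experiment:

```python
from transformers import pipeline

# Multilingual XL-Sum summarizer applied to Arabic input
arabic_summarizer = pipeline("summarization", model="csebuetnlp/mT5_multilingual_XLSum")

# Placeholder Arabic narrative (illustrative rendering of the English opening line)
arabic_story = "وُلد ناروتو أوزوماكي يتيماً في قرية الورق المخفية، وكانت طفولته المبكرة مظللة بالثعلب ذي الذيول التسعة المختوم بداخله."

summary_ar = arabic_summarizer(arabic_story, min_length=100, do_sample=False)
print(summary_ar[0]["summary_text"])
```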
Findings:
Hallucinations persisted; invented BBC Arabic interview segments.
Other Arabic or multilingual models similarly fabricated content.
English-tuned models produced garbled output on Arabic input.
Conclusion: Off-the-shelf Arabic summarization models on Hugging Face are currently unreliable and hallucinate frequently. Custom fine-tuning on Arabic narratives or larger Arabic LLMs may be required.
🌐 2. Machine Translation Deep Dive
2.1 Scope
Focus: Translate between English ↔ Modern Standard Arabic (MSA) and Arabic dialects.
Models Tested:
facebook/nllb-200-distilled-600M
Helsinki-NLP/opus-mt-ar-en
Helsinki-NLP/opus-mt-en-ar
Helsinki-NLP/opus-mt-mul-en
2.2 Experiment Results
| Model | MSA ↔ EN | Dialectal AR → EN | Notes |
|---|---|---|---|
| nllb-200-distilled-600M | Strong, fluent | Partial transliteration (“Yasta I am tired”) | Requires explicit language codes. |
| opus-mt-ar-en | Good formal AR → EN | Struggled; rendered slang literally or dropped it | Tends toward brevity. |
| opus-mt-en-ar | Weak EN → AR | N/A | Incomplete outputs; unreliable. |
| opus-mt-mul-en | Good formal AR → EN | Poor on dialects | Multilingual training offers no advantage on dialects. |
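As noted in the table, NLLB needs explicit language codes. A minimal sketch of how these models would typically be invoked, assuming FLORES-200 codes for NLLB and an illustrative MSA input sentence:

```python
from transformers import pipeline

# NLLB requires explicit FLORES-200 language codes (arb_Arab = Modern Standard Arabic, eng_Latn = English)
nllb = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="arb_Arab",
    tgt_lang="eng_Latn",
)
# Illustrative MSA input: "Where is the library?"
print(nllb("أين تقع المكتبة؟")[0]["translation_text"])

# The Helsinki-NLP OPUS-MT checkpoints are direction-specific, so no language codes are needed
opus_ar_en = pipeline("translation", model="Helsinki-NLP/opus-mt-ar-en")
print(opus_ar_en("أين تقع المكتبة؟")[0]["translation_text"])
```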
Conclusion: MSA translation is well-supported. Dialects remain a hurdle; NLLB shows promise via its recognition/transliteration of colloquialisms. Specialized fine-tuning or larger LLMs needed for robust dialect handling.
🧠 Final Summary for Day 3
Today’s deep dive revealed both the capabilities and current limitations of open-source models when applied to Arabic-centric tasks:
📝 Summarization: English summaries are generally handled well, especially by models like Falconsai/text_summarization, producing coherent and detailed outputs. However, Arabic summarization continues to struggle with hallucinations and fragmented narratives, underscoring the need for Arabic-specific fine-tuning and better cultural grounding.
🌐 Translation: Modern Standard Arabic (MSA) is reasonably well-supported across several models. In contrast, Arabic dialects remain a major challenge, often yielding transliterations or contextually inaccurate translations. Among tested models, facebook/nllb-200-distilled-600M showed the most potential, particularly when used with explicit language codes.
More broadly, these experiments highlight the ongoing hurdles posed by linguistic diversity, dialectal variation, and cultural nuance—even for advanced multilingual systems. This experience strengthens my motivation to keep learning and, ultimately, contribute to building more inclusive tools for Arabic-speaking communities. 🌍💡
🔭 Vision for Day 4
Tomorrow’s mission is to wrap up all text-focused pipelines, completing the core set of foundational NLP tasks before shifting gears into vision models.
📌 Pipelines to Explore:
Question Answering
Compare default vs. Arabic-optimized models
Test with both MSA and dialectal inputs
Evaluate performance on short vs. long contexts
Named Entity Recognition (NER)
Assess entity extraction accuracy in Arabic and English
Look for confusion or missed entities, especially with dialect-specific names or terms
Fill-Mask
Use models like bert-base-multilingual-cased and Arabic BERT variants
Observe predictions on varied inputs, including poetry, idioms, and slang (see the sketch after this list)
Text Generation
Experiment with gpt2, mGPT, and Arabic GPT models
Evaluate fluency, coherence, and hallucination tendencies
🔁 Goal: Continue comparing default models with fine-tuned alternatives.
💡 Mindset: We're not just running tests — we're mapping the current landscape of Arabic in open-source NLP.
🎯 Outcome: By the end of Day 4, we’ll have a comprehensive understanding of Hugging Face’s strengths and gaps in multilingual text processing.