docs: Day 3 notebook, polished log, final summary, and Day 4 vision added

✍️ Day 03 – Deep Dive into Summarization & Translation with Hugging Face 🤗

Today marks our exploration of two advanced generative NLP tasks beyond classification: Text Summarization and Machine Translation. We’ll compare default Hugging Face pipelines to language-specific models, emphasizing Arabic.


📝 1. Text Summarization

1.1 Overview

  • Baseline: Default summarization pipeline on English narrative.

  • English-focused models: Compare facebook/bart-large-cnn, csebuetnlp/mT5_multilingual_XLSum, and Falconsai/text_summarization with length parameters.

  • Arabic narrative: Assess multilingual vs. Arabic-specialized models on Arabic text.

1.2 Experiment 1: Default Pipeline on English Narrative

from transformers import pipeline

summarizer = pipeline("summarization")
# This is a short example
naruto_story = """
Born an orphan into the Hidden Leaf Village, Naruto Uzumaki's early life was shadowed by the terrifying Nine-Tailed Fox, a monstrous beast sealed within him."""

# Generate the summary
summary_default = summarizer(naruto_story)

# Print the result
print("--- Original Story ---")
print(naruto_story)
print("\n--- Default Summarizer Output ---")
print(summary_default[0]['summary_text'])

  • Model: sshleifer/distilbart-cnn-12-6 (default).

  • Input: A Naruto Uzumaki story (with/without an initial title line).

Key Observations:

  1. Conciseness: The summary distilled only the core arc (orphan → Hokage).

  2. Title Sensitivity: With the title line present, the model labeled Naruto as “The Seventh Hokage” and omitted his name; removing the title restored “Naruto.”

  3. Omission of Details: Side characters (Sasuke, Jiraiya, etc.) and subplots were dropped due to aggressive compression.

Insight: Useful for quick overviews but lacks narrative richness and requires parameter tuning or fine-tuned models for detail retention.

1.3 Experiment 2: Fine-Tuned English Models

1.3.1 facebook/bart-large-cnn

  • Pros: More verbose; includes “Naruto Uzumaki”.

  • Cons: Hallucinated details, misgendering Naruto as Kushina’s daughter.

1.3.2 csebuetnlp/mT5_multilingual_XLSum

  • Issue: Severe hallucinations; treated narrative like news, fabricating details (e.g., Konoha setting, BBC reporter).

1.3.3 Falconsai/text_summarization

# Load the fine-tuned summarization model
summarizer = pipeline("summarization", model="Falconsai/text_summarization")

# Increase max_length to retain more detail in the summary
# (note: don't rebind `summarizer` to its own output, or the pipeline is lost)
summary_falconsai = summarizer(naruto_story, max_length=562, min_length=100, do_sample=False)

print("\n--- Fine-Tuned Model on English Naruto Story ---")
print(summary_falconsai[0]['summary_text'])

  • Setup: max_length=562, min_length=100, do_sample=False.

  • Performance: Rich, coherent summary covering multiple characters and plot points; minor truncation at the max_length cutoff.

Conclusion: For English narrative, Falconsai/text_summarization offers the best balance of detail and accuracy.

1.4 Experiment 3: Arabic Narrative Summarization

  • Model: csebuetnlp/mT5_multilingual_XLSum (with min_length=100).
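A minimal sketch of this run. The Arabic passage below is a short illustrative placeholder, not the full narrative used in the notebook:

```python
from transformers import pipeline

# Multilingual summarizer fine-tuned on the XL-Sum dataset (covers Arabic)
summarizer = pipeline("summarization", model="csebuetnlp/mT5_multilingual_XLSum")

# Placeholder Arabic narrative: "Naruto was born an orphan in the Hidden
# Leaf Village, carrying the Nine-Tailed Fox sealed within him..."
arabic_story = "وُلد ناروتو يتيماً في قرية الورق المخفية، وكان يحمل بداخله الثعلب ذا الذيول التسعة المختوم في جسده، فعاش طفولته وحيداً منبوذاً من أهل القرية."

summary_ar = summarizer(arabic_story, min_length=100, do_sample=False)
print(summary_ar[0]["summary_text"])
```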

Findings:

  1. Hallucinations persisted; invented BBC Arabic interview segments.

  2. Other Arabic or multilingual models similarly fabricated content.

  3. English-tuned models produced garbled output on Arabic input.

Conclusion: Off-the-shelf Arabic summarization models on Hugging Face are currently unreliable, hallucinating content rather than condensing it. Custom fine-tuning on Arabic narratives or larger Arabic LLMs may be required.


🌐 2. Machine Translation Deep Dive

2.1 Scope

  • Focus: Translate between English ↔ Modern Standard Arabic (MSA) and Arabic dialects.

  • Models Tested:

    1. facebook/nllb-200-distilled-600M

    2. Helsinki-NLP/opus-mt-ar-en

    3. Helsinki-NLP/opus-mt-en-ar

    4. Helsinki-NLP/opus-mt-mul-en
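Loading these models is straightforward; the NLLB checkpoint additionally requires explicit FLORES-200 language codes, while the Helsinki-NLP pairs fix the languages in the model name. A sketch, using an illustrative dialectal sentence:

```python
from transformers import pipeline

# NLLB uses FLORES-200 language codes: arb_Arab = Modern Standard Arabic,
# eng_Latn = English. Omitting them is a common source of wrong-language output.
translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",
    src_lang="arb_Arab",
    tgt_lang="eng_Latn",
)

# Dialectal Arabic, roughly: "Hey man, I'm tired"
result = translator("يا أسطى، أنا تعبان")
print(result[0]["translation_text"])
```

The Helsinki-NLP models need only the model name, e.g. `pipeline("translation", model="Helsinki-NLP/opus-mt-ar-en")`.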

2.2 Experiment Results

| Model | MSA ↔ EN | Dialectal AR → EN | Notes |
|---|---|---|---|
| facebook/nllb-200-distilled-600M | Strong, fluent | Partial transliteration (“Yasta I am tired”) | Requires explicit language codes. |
| Helsinki-NLP/opus-mt-ar-en | Good formal AR → EN | Struggled; literal or omitted slang | Tends toward brevity. |
| Helsinki-NLP/opus-mt-en-ar | Weak EN → AR | N/A | Incomplete outputs; unreliable. |
| Helsinki-NLP/opus-mt-mul-en | Good formal AR → EN | Poor on dialects | Multilingual training offers no advantage on dialects. |

Conclusion: MSA translation is well-supported. Dialects remain a hurdle; NLLB shows promise via its recognition/transliteration of colloquialisms. Specialized fine-tuning or larger LLMs needed for robust dialect handling.


🧠 Final Summary for Day 3

Today’s deep dive revealed both the capabilities and current limitations of open-source models when applied to Arabic-centric tasks:

📝 Summarization: English summaries are generally handled well—especially by models like Falconsai/text_summarization—producing coherent and detailed outputs. However, Arabic summarization continues to struggle with hallucinations and fragmented narratives, underscoring the need for Arabic-specific fine-tuning and better cultural grounding.

🌐 Translation: Modern Standard Arabic (MSA) is reasonably well-supported across several models. In contrast, Arabic dialects remain a major challenge, often yielding transliterations or contextually inaccurate translations. Among tested models, facebook/nllb-200-distilled-600M showed the most potential, particularly when used with explicit language codes.

More broadly, these experiments highlight the ongoing hurdles posed by linguistic diversity, dialectal variation, and cultural nuance—even for advanced multilingual systems. This experience strengthens my motivation to keep learning and, ultimately, contribute to building more inclusive tools for Arabic-speaking communities. 🌍💡


🔭 Vision for Day 4

Tomorrow’s mission is to wrap up all text-focused pipelines, completing the core set of foundational NLP tasks before shifting gears into vision models.

📌 Pipelines to Explore:

  1. Question Answering

    • Compare default vs. Arabic-optimized models

    • Test with both MSA and dialectal inputs

    • Evaluate performance on short vs. long contexts

  2. Named Entity Recognition (NER)

    • Assess entity extraction accuracy in Arabic and English

    • Look for confusion or missed entities, especially with dialect-specific names or terms

  3. Fill-Mask

    • Use models like bert-base-multilingual-cased and Arabic BERT variants

    • Observe predictions on varied inputs, including poetry, idioms, and slang

  4. Text Generation

    • Experiment with gpt2, mGPT, and Arabic GPT models

    • Evaluate fluency, coherence, and hallucination tendencies
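The four pipelines above can be stood up in a few lines as a starting point. The checkpoints here are the Hugging Face defaults (plus the models named in the plan); the Arabic-optimized alternatives will be swapped in during tomorrow's comparisons:

```python
from transformers import pipeline

# Baseline pipelines for Day 4 (default checkpoints; Arabic-optimized
# alternatives to be compared against these)
qa = pipeline("question-answering")
ner = pipeline("ner", aggregation_strategy="simple")
fill = pipeline("fill-mask", model="bert-base-multilingual-cased")
generator = pipeline("text-generation", model="gpt2")

answer = qa(
    question="Where did Naruto grow up?",
    context="Naruto grew up alone in the Hidden Leaf Village.",
)
print(answer["answer"])

entities = ner("Naruto Uzumaki lives in Konoha.")
print(entities)

# mBERT uses the [MASK] token
predictions = fill("Naruto wants to become the [MASK].")
print(predictions[0]["token_str"])

generated = generator("Naruto trained hard because", max_new_tokens=20)
print(generated[0]["generated_text"])
```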


🔁 Goal: Continue comparing default models with fine-tuned alternatives.
💡 Mindset: We're not just running tests — we're mapping the current landscape of Arabic in open-source NLP.
🎯 Outcome: By the end of Day 4, we’ll have a comprehensive understanding of Hugging Face’s strengths and gaps in multilingual text processing.