Fully Fine-tuned NLLB Model for Bidirectional Odia ↔ German Translation
This is a fine-tuned version of facebook/nllb-200-distilled-600M, specialized for bidirectional translation between Odia (ory_Orya) and German (deu_Latn).
This model was developed as part of a thesis project focused on effective fine-tuning strategies for low-resource language pairs within the journalistic domain. It was fine-tuned on a carefully constructed hybrid dataset, combining a larger set of high-quality, human-validated translations with a smaller set of machine-translated sentences to expand lexical, contextual and grammatical coverage.
Live Demo:
- You can test this model live on its Hugging Face Spaces Gradio App.
Model Details
- Base Model: facebook/nllb-200-distilled-600M
- Languages: Odia (or), German (de)
- Fine-tuning Domain: Journalistic text sourced from contemporary Odia newspapers (Dharitri & Sambad).
- Developed by: Abhinandan Samal
- Thesis: Enhancing Contextual Understanding in Low-Resource Languages Using Multilingual Transformers
- University: IU International University of Applied Sciences
- Date: June 18, 2025
Fine-tuning Details
Training and Evaluation Data
The model was fine-tuned on a meticulously prepared parallel corpus. Initially, 3,676 unique parallel line pairs were collected. Each "line" in the corpus was designed to provide contextual information for the model, typically containing 2-3 sentences, although some lines consist of a single sentence.
The data originates from two specific Odia newspapers and encompasses a diverse range of news domains, including National, International, Lifestyle, Sports, Trade, Environmental, Science and Technology, Leisure, Commerce, Metro, State, and Editorial.
The curation process involved distinct quality control steps for each language:
- Odia Corpus Validation: All 3,676 lines on the Odia side of the parallel corpus underwent thorough evaluation and validation by a native Odia speaker (the author), ensuring high linguistic fidelity.
- German Corpus Curation:
- A high-quality subset of 2,000 German lines (corresponding to 2,000 of the original parallel pairs) was meticulously human-evaluated and corrected by a native German speaker. This segment forms a core, highly accurate dataset.
- The remaining 1,676 German lines (corresponding to the other original parallel pairs) were generated using Google Translate. These lines were utilized to broaden the model's exposure to a wider range of vocabulary and grammatical structures.
Following this rigorous curation, the corpus was transformed into a final bidirectional training dataset of 7,352 distinct training instances. This was achieved by creating two training examples from each parallel pair, using the task-specific prefixes translate Odia to German: and translate German to Odia:. The overall size of this dataset was carefully managed, selected as a practical upper limit dictated by the memory and computational constraints of the available single-GPU training environment (NVIDIA T4 on Google Colab Pro).
Here, you can check the dataset.
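As a rough illustration of how the bidirectional instances were constructed (the exact preprocessing script is not reproduced here, and the field names "odia" and "german" are assumed for demonstration), each parallel pair yields two prefixed training examples:

# Minimal sketch: turn each Odia-German pair into two prefixed training
# examples, one per translation direction. Field names are assumed.
def make_bidirectional_examples(pairs):
    examples = []
    for pair in pairs:
        examples.append({
            "input_text": "translate Odia to German: " + pair["odia"],
            "target_text": pair["german"],
        })
        examples.append({
            "input_text": "translate German to Odia: " + pair["german"],
            "target_text": pair["odia"],
        })
    return examples

# 3,676 parallel pairs -> 7,352 training instances (2 per pair).
pairs = [{"odia": "ଆଜି ପାଗ ବହୁତ ଭଲ ଅଛି।", "german": "Heute ist das Wetter sehr gut."}]
print(len(make_bidirectional_examples(pairs)))  # 2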
Training Procedure
The model was fine-tuned using PyTorch and the Hugging Face Seq2SeqTrainer.
Key Hyperparameters:
- Learning Rate: 2e-5
- Number of Epochs: 3
- Effective Batch Size: 16 (per_device_train_batch_size=4 with gradient_accumulation_steps=4)
- Optimizer: Adafactor
- Precision: Mixed Precision (fp16=True)
- Memory Optimization: gradient_checkpointing=True
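For reference, the sketch below shows how these hyperparameters map onto Hugging Face Seq2SeqTrainingArguments. It is a minimal illustration rather than the exact training script: the output_dir is assumed, and the tokenized datasets, data collator, and Seq2SeqTrainer wiring are not shown.

from transformers import Seq2SeqTrainingArguments

# Minimal sketch of training arguments mirroring the hyperparameters above.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-odia-german-bidirectional",  # assumed path
    learning_rate=2e-5,
    num_train_epochs=3,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,   # effective batch size of 16
    optim="adafactor",
    fp16=True,                       # mixed precision (requires a GPU)
    gradient_checkpointing=True,     # memory optimization
    predict_with_generate=True,      # needed for generation-based metrics
)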
Evaluation Results
The fine-tuned model's performance was rigorously evaluated against the original facebook/nllb-200-distilled-600M baseline on a held-out test set composed partially (77%) of human-validated sentence pairs. I report scores across three standard machine translation metrics: BLEU (higher is better), chrF (higher is better), and TER (Translation Edit Rate, where lower is better).
| Metric | Odia → German (Baseline) | Odia → German (Fine-Tuned) | German → Odia (Baseline) | German → Odia (Fine-Tuned) |
| --- | --- | --- | --- | --- |
| BLEU | 72.6444 | 65.1630 | 14.4641 | 21.2164 |
| chrF | 62.2442 | 78.9527 | 44.5058 | 48.5377 |
| TER | 63.0271 | 39.3919 | 106.0486 | 77.4971 |
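For transparency, the snippet below is a minimal sketch of how BLEU, chrF, and TER can be computed with the Hugging Face evaluate library; it is not the exact evaluation script used in the thesis, and the prediction/reference lists are placeholders.

import evaluate

# Placeholder predictions and references; in practice these come from running
# the fine-tuned model on the held-out test set.
predictions = ["Heute ist das Wetter sehr gut."]
references = [["Heute ist das Wetter sehr gut."]]

bleu = evaluate.load("sacrebleu")  # BLEU (higher is better)
chrf = evaluate.load("chrf")       # chrF (higher is better)
ter = evaluate.load("ter")         # TER (lower is better)

print(bleu.compute(predictions=predictions, references=references)["score"])
print(chrf.compute(predictions=predictions, references=references)["score"])
print(ter.compute(predictions=predictions, references=references)["score"])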
Interpretation of Results
The evaluation reveals two distinct and important outcomes for the two translation directions: substantial improvements in the Odia → German direction and a consistent positive trend for German → Odia, relative to their respective baselines.
Odia → German (Generating the High-Resource Language): While the BLEU score saw a slight decrease (from 72.64 to 65.16), the fine-tuned model achieved a notable increase in chrF (from 62.24 to 78.95) and a significant reduction in TER (from 63.03 to 39.39). The strong performance on chrF and TER suggests that the fine-tuned model generates more fluent and accurate translations that require considerably less post-editing, aligning well with human judgment despite the BLEU fluctuation.
- The BLEU score decreased slightly (72.64 → 65.16). Although both scores are very high (a BLEU above 60 is often taken to indicate quality approaching or exceeding human translation, though context matters), the fine-tuned version is less effective by this metric.
- The chrF score increased significantly (62.24 → 78.95), showing superior morphological and character-level correctness.
- The Translation Edit Rate (TER) fell significantly (63.03 → 39.39), signifying that the fine-tuned model's output requires substantially less human effort to correct and is structurally much more sound.
German → Odia (Generating the Low-Resource Language): The fine-tuned model improved consistently across all metrics relative to its baseline. The BLEU score increased from 14.46 to 21.22, the chrF score improved from 44.51 to 48.53, and TER dropped substantially from 106.05 to 77.50. These improvements indicate a more robust, higher-quality translation output in this direction.
This iteration of fine-tuning highlights a strong capacity for optimizing translation quality, particularly yielding a highly performant model for Odia → German. Future work will focus on further balancing and enhancing performance across both directions to achieve optimal bidirectional translation.
How to Use
The easiest way to use this model is with the translation pipeline from the transformers library. The model was trained to be bidirectional, and you can control the translation direction by specifying src_lang and tgt_lang in the call.
from transformers import pipeline
# Load the translation pipeline with your fine-tuned model
model_id = "abhinandansamal/nllb-200-distilled-600M-finetuned-odia-german-bidirectional"
translator = pipeline("translation", model=model_id, device_map="auto")
# --- Example 1: Translate Odia to German ---
odia_text = "ଆଜି ପାଗ ବହୁତ ଭଲ ଅଛି।"
german_translation = translator(
odia_text,
src_lang="ory_Orya",
tgt_lang="deu_Latn"
)
print(f"Odia Input: {odia_text}")
print(f"German Output: {german_translation[0]['translation_text']}")
# Expected Output: Heute ist das Wetter sehr gut.
# --- Example 2: Translate German to Odia ---
german_text = "Wie ist deine Gesundheit?"
odia_translation = translator(
german_text,
src_lang="deu_Latn",
tgt_lang="ory_Orya"
)
print(f"\nGerman Input: {german_text}")
print(f"Odia Output: {odia_translation[0]['translation_text']}")
# Expected Output: ତୁମର ସ୍ୱାସ୍ଥ୍ୟ ଅବସ୍ଥା କ'ଣ?
Note: While the model was trained with task prefixes (translate Odia to German:), using the translation pipeline with src_lang and tgt_lang arguments is the cleaner, recommended method for inference, as it abstracts this detail away.
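If you need lower-level control (for example, over generation parameters), the following is a minimal sketch using AutoTokenizer and AutoModelForSeq2SeqLM with the standard NLLB language codes; the generation settings shown are illustrative, not the exact configuration used in the thesis.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "abhinandansamal/nllb-200-distilled-600M-finetuned-odia-german-bidirectional"
# src_lang tells the tokenizer which source-language tag to prepend to the input.
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="ory_Orya")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

odia_text = "ଆଜି ପାଗ ବହୁତ ଭଲ ଅଛି।"
inputs = tokenizer(odia_text, return_tensors="pt")

# Force the decoder to start generation with the German language tag.
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),
    max_new_tokens=128,  # illustrative limit
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])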
Intended Use
This model is primarily intended for translating journalistic text between Odia and German. Given its training on articles from various news domains (e.g., National, International, Lifestyle, Sports, Science and Technology), it is suitable for academic research, cross-lingual information retrieval from news sources, and as a supportive tool for language learners focusing on news-related content in this specific language pair.
Limitations & Bias
- Domain Specificity: While encompassing various news domains, the model is not optimized for vastly different fields such as legal, medical, literary, or informal conversational text. Its performance is expected to be significantly lower on content outside the journalistic domain.
- Data-Inherited Bias: The model inherits stylistic and topical biases from its training data sources. Despite covering multiple news domains, the primary sources are two specific Odia newspapers. Furthermore, the inclusion of Google Translate-generated German lines in a portion of the training data may introduce or reinforce specific stylistic patterns inherent to machine translation outputs.
Achievements with Current Data Constraints
Despite the constraints in computational resources (single-GPU training on NVIDIA T4 via Google Colab Pro) and the relatively small, specialized dataset size (7,352 bidirectional lines), this fine-tuning process has achieved significant positive outcomes:
- Substantial Quality Improvement: The fine-tuned model demonstrates a marked improvement over the baseline, particularly evidenced by substantial gains in chrF and significant reductions in TER for both translation directions. This indicates a higher quality of translation that requires less post-editing and exhibits better character-level accuracy, showcasing the effectiveness of fine-tuning even with limited data.
- Practical Viability: The results highlight the practical feasibility of developing effective Neural Machine Translation systems for under-resourced language pairs like Odia-German, even when operating with initial data limitations and constrained resources.
Areas for Future Improvement
To further enhance the model's performance, generalizability, and address existing limitations, the following factors are key considerations for future development:
- Expanded High-Quality Data: Increasing the size and diversity of the human-validated parallel corpus, particularly from domains beyond journalism, would be crucial for improving robustness and reducing reliance on machine-translated data.
- Refined German Corpus Curation: Exploring strategies to further reduce the dependency on machine-translated content for the German side, potentially through more extensive human validation or alternative data acquisition methods.
- Addressing Directional Nuances: Further investigation into the specific performance characteristics of each translation direction (e.g., the BLEU score behavior in Odia → German) could lead to targeted optimizations for balanced bidirectional performance.
- Advanced Data Augmentation: Exploring more sophisticated data augmentation techniques could effectively expand the training data's diversity without necessarily requiring more manual collection.
- Model Architecture & Hyperparameter Optimization: Continued experimentation with different model architectures, fine-tuning strategies, and hyperparameter configurations could yield additional performance gains.
- Bias Mitigation: Proactive strategies to identify and mitigate potential biases inherited from the training data sources could improve fairness and broader applicability.
Citation
If you use this model or the associated methodology in your research, please cite the following thesis:
@mastersthesis{SamalThesis2025,
  author = {Abhinandan Samal},
  title  = {Enhancing Contextual Understanding in Low-Resource Languages Using Multilingual Transformers},
  school = {IU International University of Applied Sciences},
  year   = {2025}
}