NLLB-200 Fine-tuned for English-Tamazight Translation

A fine-tuned version of NLLB-200-distilled-600M for English ↔ Tamazight (Kabyle Latin script) translation, trained on a comprehensive dictionary dataset.

Model Description

This model is a fine-tuned version of facebook/nllb-200-distilled-600M specifically adapted for English-Tamazight translation. It was trained on ~9,000 translation pairs from a curated dictionary dataset containing vocabulary, verb conjugations, country names, and cultural phrases.

  • Developed by: Abdeljalil Ounaceur
  • Model type: Sequence-to-sequence transformer (fine-tuned NLLB)
  • Language(s): English (en), Tamazight/Kabyle Latin script (kab_Latn)
  • License: CC-BY-NC-4.0
  • Finetuned from: facebook/nllb-200-distilled-600M
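
The exact training script is not included in this card; the following is a minimal sketch of the kind of fine-tuning setup described above, using the Hugging Face Seq2SeqTrainer. The file name dictionary_pairs.csv, the column names en/kab, and all hyperparameters are illustrative assumptions rather than the actual training configuration.

from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint, src_lang="eng_Latn", tgt_lang="kab_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical CSV of translation pairs with "en" and "kab" columns
dataset = load_dataset("csv", data_files="dictionary_pairs.csv")

def preprocess(batch):
    # text_target tokenizes the labels with the target-language prefix
    return tokenizer(
        batch["en"],
        text_target=batch["kab"],
        max_length=64,
        truncation=True,
    )

tokenized = dataset.map(preprocess, batched=True, remove_columns=["en", "kab"])

args = Seq2SeqTrainingArguments(
    output_dir="nllb-tamazight-ft",
    per_device_train_batch_size=16,
    learning_rate=2e-5,   # illustrative value
    num_train_epochs=3,   # illustrative value
    save_total_limit=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()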

Intended Uses

Direct Use

  • English to Tamazight translation for dictionary terms and basic phrases
  • Tamazight to English translation
  • Research in Berber language NLP
  • Educational applications for Tamazight language learning
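
For quick experiments with these use cases, the transformers translation pipeline can also drive the model directly (a minimal sketch; generation settings are left at their defaults):

from transformers import pipeline

translator = pipeline(
    "translation",
    model="Abdeljalil-Ounaceur/nllb-tamazight-souss",
    src_lang="eng_Latn",
    tgt_lang="kab_Latn",
)
print(translator("water")[0]["translation_text"])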

Limitations

  • Experimental model: Performance is mixed, with improvements on dictionary terms but some degradation on general text
  • Domain specificity: Optimized for dictionary-style translations rather than natural conversation
  • Language variant: Some outputs may shift between Kabyle and Tachelhit variants
  • Catastrophic forgetting: Some original NLLB capabilities were lost during fine-tuning

How to Use

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Abdeljalil-Ounaceur/nllb-tamazight-souss")
model = AutoModelForSeq2SeqLM.from_pretrained("Abdeljalil-Ounaceur/nllb-tamazight-souss")

# Translate English to Tamazight
def translate(text, src_lang="eng_Latn", tgt_lang="kab_Latn"):
    # Set the source language so the tokenizer adds the correct prefix token
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")

    # Get the id of the target-language token to force at the start of decoding
    forced_bos_token_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    
    generated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=forced_bos_token_id,
        max_length=50,
        num_beams=4,
        early_stopping=True
    )
    
    return tokenizer.decode(generated_tokens[0], skip_special_tokens=True)

# Example usage
print(translate("house"))  # Expected: tamdint
print(translate("water"))  # Expected: aman
Model Details

  • Format: Safetensors
  • Model size: 615M parameters
  • Tensor type: F32
