NLLB-200 Fine-tuned for English-Tamazight Translation

A fine-tuned version of NLLB-200-distilled-600M for English ↔ Tamazight (Kabyle Latin script) translation, trained on a comprehensive dictionary dataset.

Model Description

This model is a fine-tuned version of facebook/nllb-200-distilled-600M specifically adapted for English-Tamazight translation. It was trained on ~9,000 translation pairs from a curated dictionary dataset containing vocabulary, verb conjugations, country names, and cultural phrases.

  • Developed by: Abdeljalil Ounaceur
  • Model type: Sequence-to-sequence transformer (fine-tuned NLLB)
  • Language(s): English (en), Tamazight/Kabyle Latin script (kab_Latn)
  • License: CC-BY-NC-4.0
  • Finetuned from: facebook/nllb-200-distilled-600M
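
The exact training script is not included in this card; the following is a minimal sketch of the kind of fine-tuning setup described above, using the Hugging Face Seq2SeqTrainer. The file name dictionary_pairs.csv, the column names en/kab, and all hyperparameters are illustrative assumptions rather than the actual training configuration.

from datasets import load_dataset
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

checkpoint = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(
    checkpoint, src_lang="eng_Latn", tgt_lang="kab_Latn"
)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Hypothetical CSV of translation pairs with "en" and "kab" columns
dataset = load_dataset("csv", data_files="dictionary_pairs.csv")

def preprocess(batch):
    # text_target tokenizes the labels with the target-language prefix
    return tokenizer(
        batch["en"],
        text_target=batch["kab"],
        max_length=64,
        truncation=True,
    )

tokenized = dataset.map(preprocess, batched=True, remove_columns=["en", "kab"])

args = Seq2SeqTrainingArguments(
    output_dir="nllb-tamazight-ft",
    per_device_train_batch_size=16,
    learning_rate=2e-5,   # illustrative value
    num_train_epochs=3,   # illustrative value
    save_total_limit=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()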

Intended Uses

Direct Use

  • English to Tamazight translation for dictionary terms and basic phrases
  • Tamazight to English translation
  • Research in Berber language NLP
  • Educational applications for Tamazight language learning
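
For quick experiments with these use cases, the transformers translation pipeline can also drive the model directly (a minimal sketch; generation settings are left at their defaults):

from transformers import pipeline

translator = pipeline(
    "translation",
    model="Abdeljalil-Ounaceur/nllb-tamazight-souss",
    src_lang="eng_Latn",
    tgt_lang="kab_Latn",
)
print(translator("water")[0]["translation_text"])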

Limitations

  • Experimental model: Performance is mixed, with improvements on dictionary terms but some degradation on general text
  • Domain specificity: Optimized for dictionary-style translations rather than natural conversation
  • Language variant: Some outputs may shift between Kabyle and Tachelhit variants
  • Catastrophic forgetting: Some original NLLB capabilities were lost during fine-tuning

How to Use

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the fine-tuned tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Abdeljalil-Ounaceur/nllb-tamazight-souss")
model = AutoModelForSeq2SeqLM.from_pretrained("Abdeljalil-Ounaceur/nllb-tamazight-souss")

# Translate English to Tamazight
def translate(text, src_lang="eng_Latn", tgt_lang="kab_Latn"):
    # Set the source language so the tokenizer adds the correct prefix token
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")

    # Get the id of the target-language token to force at the start of decoding
    forced_bos_token_id = tokenizer.convert_tokens_to_ids(tgt_lang)
    
    generated_tokens = model.generate(
        **inputs,
        forced_bos_token_id=forced_bos_token_id,
        max_length=50,
        num_beams=4,
        early_stopping=True
    )
    
    return tokenizer.decode(generated_tokens[0], skip_special_tokens=True)

# Example usage
print(translate("house"))  # Expected: tamdint
print(translate("water"))  # Expected: aman
Model Details

  • Format: Safetensors
  • Model size: 615M parameters
  • Tensor type: F32
