NLLB-200 Distilled 600M — Hindi → Kangri (v2)

This is a fine-tuned version of facebook/nllb-200-distilled-600M for Hindi → Kangri translation.

The model was trained on a curated parallel corpus of 49k Hindi-Kangri sentence pairs, with vocabulary and tokenizer extensions to support the kang_Deva language code.
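
The base NLLB-200 tokenizer has no kang_Deva language code, so the vocabulary must be extended before fine-tuning. Below is a minimal sketch of the general technique (an illustration, not the exact training script used for this model):

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Illustrative sketch: register "kang_Deva" as an additional special token
# on the base NLLB tokenizer, then resize the model's embedding matrix so
# the new token gets a trainable embedding.
base = "facebook/nllb-200-distilled-600M"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

tokenizer.add_special_tokens({"additional_special_tokens": ["kang_Deva"]})
model.resize_token_embeddings(len(tokenizer))

# Optional but common: initialize the new language embedding from a related
# language (here hin_Deva) so fine-tuning starts from a sensible point.
with torch.no_grad():
    emb = model.get_input_embeddings().weight
    new_id = tokenizer.convert_tokens_to_ids("kang_Deva")
    emb[new_id] = emb[tokenizer.convert_tokens_to_ids("hin_Deva")].clone()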

Model Details

  • Model Architecture: Transformer (Encoder-Decoder)
  • Base: facebook/nllb-200-distilled-600M
  • Languages:
    • Source: hin_Deva (Hindi)
    • Target: kang_Deva (Kangri in Devanagari script)
  • Tokenizer: SentencePiece with extended vocabulary for kang_Deva
  • Parameters: ~615M (float32, Safetensors format)
  • Direction Supported: Hindi → Kangri only (unidirectional)

How to Use

import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model_name = "cloghost/nllb-200-distilled-600M-hin-kang-v2"
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Use the first GPU if available, otherwise fall back to CPU.
device = 0 if torch.cuda.is_available() else -1
translator = pipeline(
    "translation",
    model=model,
    tokenizer=tokenizer,
    src_lang="hin_Deva",
    tgt_lang="kang_Deva",
    device=device,
)

# Sample Hindi input (roughly: "But the Himachali language is already being
# spoken. People have lived with it for centuries. The history of the Pahari
# language [reaches back to] the early period of Hindi literature, also known
# as the Siddha-Charan era.")
text = """मगर हिमाचली भाषा तो पहले से बोली जा रही है।
लोग सदियों से ही इसके संग जी रहे हैं।
पहाड़ी भाषा का इतिहास हिन्दी साहित्य के आदिकाल, जिसे सिद्ध चारण काल के नाम से भी जानते हैं
"""

translation = translator(text)
print(translation[0]["translation_text"])
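
The model can also be driven through generate() directly, where the target language is selected by forcing its language token as the first decoder token. A short sketch, assuming the extended tokenizer maps "kang_Deva" to a token id (as the pipeline usage above implies):

tokenizer.src_lang = "hin_Deva"
inputs = tokenizer(text, return_tensors="pt")
generated = model.generate(
    **inputs,
    # NLLB-style models pick the output language via forced_bos_token_id.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("kang_Deva"),
    max_length=256,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])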


Benchmark Scores

Evaluated on a clean, held-out test set of 5k sentence pairs:

Metric                  Value
BLEU                    26.03
BLEU-4                  14.11
ROUGE-1                 4.73%
ROUGE-L                 4.76%
METEOR                  43.63%
BERTScore F1            93.39%
BERTScore Precision     93.42%
BERTScore Recall        93.37%
chrF                    53.93
TER (lower is better)   56.96
Empty predictions       0
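
Corpus-level scores like BLEU and chrF can be recomputed with the Hugging Face evaluate library; a minimal scoring sketch (preds and refs below are placeholders, not the actual test data):

import evaluate

# Placeholder lists; substitute model outputs and gold Kangri references.
preds = ["..."]
refs = [["..."]]

print(evaluate.load("sacrebleu").compute(predictions=preds, references=refs)["score"])
print(evaluate.load("chrf").compute(predictions=preds, references=refs)["score"])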
