# English → Bangla Neural Machine Translation (NMT): en-bn-nmt
This model is a fine-tuned version of `shhossain/opus-mt-en-to-bn`. It has been trained on a combination of open and custom datasets to improve English-to-Bengali (Bangla) translation accuracy, especially for conversational, motivational, and real-life sentence structures.
## Model Description

The `en-bn-nmt` model is a neural machine translation model fine-tuned on the MarianMT architecture from Hugging Face. Its main goal is to translate English to Bangla in an expressive, human-like way, preserving emotional context and natural tone.
## Datasets Used
Training was done on a diverse set of datasets, including:
- Tatoeba sentence pairs (English-Bengali): 10,000+ pairs
- BanglaNMT dataset: 3,518 pairs
- Custom-curated dataset (~500 pairs), sourced from social platforms, YouTube transcriptions, and motivational content
This variety of sources was chosen to ensure the model can perform well in formal, informal, and spoken-language use cases.
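As an illustration of how such a mixed corpus might be assembled, here is a minimal, stdlib-only sketch. The tab-separated file layout, the deduplication step, and the 5% validation split are our assumptions for illustration, not the card's actual preprocessing code.

```python
# Hedged sketch: merging several parallel-corpus sources into one
# training set. Field layout and split ratio are assumptions.
import csv
import random


def load_pairs(path):
    """Read tab-separated English/Bangla sentence pairs from a file."""
    with open(path, encoding="utf-8") as f:
        return [(row[0].strip(), row[1].strip())
                for row in csv.reader(f, delimiter="\t") if len(row) >= 2]


def merge_and_split(sources, seed=42, val_fraction=0.05):
    """Merge sources, drop exact duplicate pairs, and carve out a validation set."""
    seen, pairs = set(), []
    for source in sources:
        for pair in source:
            if pair not in seen:       # exact-match deduplication across sources
                seen.add(pair)
                pairs.append(pair)
    random.Random(seed).shuffle(pairs)  # fixed seed for a reproducible split
    n_val = max(1, int(len(pairs) * val_fraction))
    return pairs[n_val:], pairs[:n_val]
```

Deduplicating before splitting matters here: Tatoeba and BanglaNMT overlap in common short sentences, and duplicates that straddle the train/validation boundary would inflate the reported BLEU.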
## Intended Uses & Limitations
Intended for:
- Language learners and educators
- Students and researchers
- App developers building English to Bengali tools
Not suitable for:
- Legal, medical, or sensitive document translation
- High-stakes production use without human review
## Training Details
- Platform: Google Colab
- GPU: T4 (via Google Colab Pro)
- Disk Usage: ~112 GB
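The card does not publish the exact hyperparameters. For orientation, a typical MarianMT fine-tuning configuration on this class of hardware might look like the following fragment; every value below is an illustrative assumption, not the card's actual setting.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative configuration only; the actual hyperparameters
# used for en-bn-nmt are not published in this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="en-bn-nmt",
    num_train_epochs=3,              # three epochs, matching the results table
    per_device_train_batch_size=16,  # assumption: fits a 16 GB T4
    learning_rate=2e-5,              # assumption: common MarianMT fine-tuning value
    evaluation_strategy="epoch",     # evaluate once per epoch
    predict_with_generate=True,      # generate full translations so BLEU can be computed
    fp16=True,                       # mixed precision on the T4
)
```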
## Training Results
| Training Loss | Epoch | Validation Loss | BLEU Score |
|---|---|---|---|
| 0.2006 | 1 | 0.191746 | 24.8904 |
| 0.1334 | 2 | 0.164062 | 29.9686 |
| 0.0970 | 3 | 0.154388 | 33.1481 |
- Fine-tuned model BLEU: 33.14806
## Evaluation Metrics
- Final BLEU Score: 33.14806
- Final Validation Loss: 0.15439
- Final Training Loss: 0.09700
## Sample Output Evaluation
We evaluated the model using hand-crafted sentence sets based on motivational, real-life, and spoken content. While BLEU scores are useful, we also performed manual assessments to compare fluency and correctness against reference translations.
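For readers unfamiliar with the metric: BLEU scores n-gram overlap between a hypothesis and a reference, with a brevity penalty for overly short outputs. The following stdlib-only sketch of sentence-level BLEU (uniform 4-gram weights, add-one smoothing) is purely illustrative; it will not reproduce the scores reported above, which come from the training pipeline's own tokenization and metric implementation.

```python
# Minimal sentence-level BLEU sketch: illustration only, not the
# metric implementation used to produce this card's reported scores.
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of the given order in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        total = max(sum(hyp_ngrams.values()), 1)
        # Add-one smoothing so one missing n-gram order does not zero the score.
        precisions.append((overlap + 1) / (total + 1))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

This is exactly why manual assessment is still needed: a translation can preserve meaning and tone yet score poorly on n-gram overlap with a single reference.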
### Sample Input → Output Example

English:
"Don't wait for opportunity. Create it."

Model output:
"সুযোগের জন্য অপেক্ষা করো না, এটা তৈরি কর"

Reference:
"সুযোগের অপেক্ষা করো না। নিজেকে তৈরি করো।"

(The model retains the intent but deviates slightly in tone; refinement is ongoing.)
## Usage Example

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "monirbishal/en-bn-nmt"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate(text: str) -> str:
    # Tokenize, generate the Bangla translation, and decode back to plain text.
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    translated = model.generate(**inputs)
    return tokenizer.decode(translated[0], skip_special_tokens=True)

print(translate("I have to go to sleep."))
```
## Future Improvements
- Improve idiomatic and conversational fluency
- Add reverse translation support (Bn → En)
- Include more complex sentence structures (narratives, dialogues, questions)
## License
This model is currently open for educational and research use.
We are working on assigning an appropriate license (likely Apache 2.0 or MIT).
If you use this model, please cite the original authors and datasets.
## Acknowledgements
- Tatoeba Project
- CSE BUET NLP Group
- Hugging Face for model hosting and architecture
## Open Source Notice
This project is released to the open-source community to promote better Bangla language technology for educational and real-world applications.
We welcome feedback, collaboration, and contributions.
## Citation
If you use this model, please cite it:
```bibtex
@misc{monirbishal_en_bn_nmt,
  title        = {English to Bengali Neural Machine Translation Model},
  author       = {Monir Bishal},
  howpublished = {\url{https://huggingface.co/monirbishal/en-bn-nmt}},
  year         = {2025}
}
```