# English → Bangla Neural Machine Translation (NMT): en-bn-nmt
This model is a fine-tuned version of `shhossain/opus-mt-en-to-bn`. It has been trained on a combination of open and custom datasets to improve English-to-Bengali (Bangla) translation accuracy, especially for conversational, motivational, and real-life sentence structures.
## Model Description

The `en-bn-nmt` model is a neural machine translation model fine-tuned on the MarianMT architecture from Hugging Face. Its main goal is to translate English to Bangla in an expressive, human-like way, preserving emotional context and natural tone.
## Datasets Used
Training was done on a diverse set of datasets, including:
- Tatoeba sentence pairs (English-Bengali): 10,000+ pairs
- BanglaNMT dataset: 3,518 pairs
- Custom-curated dataset (~500 pairs), sourced from social platforms, YouTube transcriptions, and motivational content
This variety of sources was chosen to ensure the model can perform well in formal, informal, and spoken-language use cases.
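As an illustration of how such a mixed corpus might be assembled, here is a minimal, stdlib-only sketch. The tab-separated file layout, the deduplication step, and the 5% validation split are our assumptions for illustration, not the card's actual preprocessing code.

```python
# Hedged sketch: merging several parallel-corpus sources into one
# training set. Field layout and split ratio are assumptions.
import csv
import random


def load_pairs(path):
    """Read tab-separated English/Bangla sentence pairs from a file."""
    with open(path, encoding="utf-8") as f:
        return [(row[0].strip(), row[1].strip())
                for row in csv.reader(f, delimiter="\t") if len(row) >= 2]


def merge_and_split(sources, seed=42, val_fraction=0.05):
    """Merge sources, drop exact duplicate pairs, and carve out a validation set."""
    seen, pairs = set(), []
    for source in sources:
        for pair in source:
            if pair not in seen:       # exact-match deduplication across sources
                seen.add(pair)
                pairs.append(pair)
    random.Random(seed).shuffle(pairs)  # fixed seed for a reproducible split
    n_val = max(1, int(len(pairs) * val_fraction))
    return pairs[n_val:], pairs[:n_val]
```

Deduplicating before splitting matters here: Tatoeba and BanglaNMT overlap in common short sentences, and duplicates that straddle the train/validation boundary would inflate the reported BLEU.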
## Intended Uses & Limitations
Intended for:
- Language learners and educators
- Students and researchers
- App developers building English to Bengali tools
Not suitable for:
- Legal, medical, or sensitive document translation
- High-stakes production use without human review
## Training Details
- Platform: Google Colab
- GPU: T4 (via Google Colab Pro)
- Disk Usage: ~112 GB
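The card does not publish the exact hyperparameters. For orientation, a typical MarianMT fine-tuning configuration on this class of hardware might look like the following fragment; every value below is an illustrative assumption, not the card's actual setting.

```python
from transformers import Seq2SeqTrainingArguments

# Illustrative configuration only; the actual hyperparameters
# used for en-bn-nmt are not published in this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="en-bn-nmt",
    num_train_epochs=3,              # three epochs, matching the results table
    per_device_train_batch_size=16,  # assumption: fits a 16 GB T4
    learning_rate=2e-5,              # assumption: common MarianMT fine-tuning value
    evaluation_strategy="epoch",     # evaluate once per epoch
    predict_with_generate=True,      # generate full translations so BLEU can be computed
    fp16=True,                       # mixed precision on the T4
)
```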
## Training Results
| Training Loss | Epoch | Validation Loss | BLEU Score |
|---|---|---|---|
| 0.2006 | 1 | 0.191746 | 24.8904 |
| 0.1334 | 2 | 0.164062 | 29.9686 |
| 0.0970 | 3 | 0.154388 | 33.1481 |
- Fine-tuned model BLEU: 33.14806
## Evaluation Metrics
- Final BLEU Score: 33.14806
- Final Validation Loss: 0.15439
- Final Training Loss: 0.09700
## Sample Output Evaluation
We evaluated the model using hand-crafted sentence sets based on motivational, real-life, and spoken content. While BLEU scores are useful, we also performed manual assessments to compare fluency and correctness against reference translations.
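For readers unfamiliar with the metric: BLEU scores n-gram overlap between a hypothesis and a reference, with a brevity penalty for overly short outputs. The following stdlib-only sketch of sentence-level BLEU (uniform 4-gram weights, add-one smoothing) is purely illustrative; it will not reproduce the scores reported above, which come from the training pipeline's own tokenization and metric implementation.

```python
# Minimal sentence-level BLEU sketch: illustration only, not the
# metric implementation used to produce this card's reported scores.
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of the given order in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_ngrams, ref_ngrams = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum((hyp_ngrams & ref_ngrams).values())  # clipped n-gram matches
        total = max(sum(hyp_ngrams.values()), 1)
        # Add-one smoothing so one missing n-gram order does not zero the score.
        precisions.append((overlap + 1) / (total + 1))
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

This is exactly why manual assessment is still needed: a translation can preserve meaning and tone yet score poorly on n-gram overlap with a single reference.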
### Sample Input → Output Example

English:
"Don't wait for opportunity. Create it."

Model output:
"সুযোগের জন্য অপেক্ষা করো না, এটা তৈরি কর"

Reference:
"সুযোগের অপেক্ষা করো না। নিজেকে তৈরি করো।"

(The model retains the intent but deviates slightly in tone; refinement is ongoing.)
## Usage Example

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "monirbishal/en-bn-nmt"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

def translate(text: str) -> str:
    # Tokenize, generate the Bangla translation, and decode back to plain text.
    inputs = tokenizer(text, return_tensors="pt", padding=True)
    translated = model.generate(**inputs)
    return tokenizer.decode(translated[0], skip_special_tokens=True)

print(translate("I have to go to sleep."))
```
## Future Improvements
- Improve idiomatic and conversational fluency
- Add reverse translation support (Bn → En)
- Include more complex sentence structures (narratives, dialogues, questions)
## License
This model is currently open for educational and research use.
We are working on assigning an appropriate license (likely Apache 2.0 or MIT).
If you use this model, please cite the original authors and datasets.
## Acknowledgements
- Tatoeba Project
- CSE BUET NLP Group
- Hugging Face for model hosting and architecture
## Open Source Notice
This project is released to the open-source community to promote better Bangla language technology for educational and real-world applications.
We welcome feedback, collaboration, and contributions.
## Citation
If you use this model, please cite it:
```bibtex
@misc{monirbishal_en_bn_nmt,
  title        = {English to Bengali Neural Machine Translation Model},
  author       = {Monir Bishal},
  howpublished = {\url{https://huggingface.co/monirbishal/en-bn-nmt}},
  year         = {2025}
}
```