MarianMT Indonesian-English Translation (Fine-tuned)

This model is a fine-tuned version of Helsinki-NLP/opus-mt-id-en for Indonesian-to-English translation.

Model Details

  • Base Model: Helsinki-NLP/opus-mt-id-en
  • Fine-tuned on: TED Talks parallel corpus (Indonesian-English)
  • Training Date: 2025-05-25
  • Languages: Indonesian (id) → English (en)
  • License: Apache 2.0

Training Configuration

  • Training Framework: PyTorch + Transformers
  • Training Data: TED Talks parallel corpus
  • Dataset Usage: 50% of full dataset
  • Training Parameters:
    • Learning Rate: 3e-5
    • Batch Size: 4 on GPU, 2 on CPU
    • Max Length: 128 tokens
    • Epochs: 5
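
The settings above can be collected into a small configuration object. The following is a minimal sketch, not the original training script: the names `FineTuneConfig` and `subsample` are hypothetical, and drawing the 50% subset by seeded random sampling is an assumption, since the card does not say how the subset was selected. With the Transformers `Seq2SeqTrainingArguments` class, these values would likely map onto `learning_rate`, `per_device_train_batch_size`, and `num_train_epochs`.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class FineTuneConfig:
    """Hyperparameters listed in this card (the class name is hypothetical)."""
    learning_rate: float = 3e-5
    gpu_batch_size: int = 4
    cpu_batch_size: int = 2
    max_length: int = 128          # maximum sequence length in tokens
    epochs: int = 5
    dataset_fraction: float = 0.5  # share of the full corpus used for training

    def batch_size(self, use_gpu: bool) -> int:
        """Return the batch size for the available hardware."""
        return self.gpu_batch_size if use_gpu else self.cpu_batch_size


def subsample(
    pairs: Sequence[Tuple[str, str]], fraction: float, seed: int = 0
) -> List[Tuple[str, str]]:
    """Draw a reproducible random subset of (Indonesian, English) pairs.

    Assumption: random sampling with a fixed seed; the card only states
    that 50% of the dataset was used.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)
    return rng.sample(list(pairs), int(len(pairs) * fraction))


config = FineTuneConfig()
```

Calling `config.batch_size(use_gpu=False)` returns the CPU batch size, so the same configuration can be reused on machines without a GPU.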

Usage

from transformers import MarianMTModel, MarianTokenizer

# Load model and tokenizer
tokenizer = MarianTokenizer.from_pretrained("dhintech/marian-id-en-meeting-translation")
model = MarianMTModel.from_pretrained("dhintech/marian-id-en-meeting-translation")

# Translate Indonesian to English
def translate(text):
    # Truncate long inputs to the 128-token limit used during fine-tuning
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
    outputs = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example usage
indonesian_text = "Selamat pagi, terima kasih sudah datang."
english_translation = translate(indonesian_text)
print(english_translation)
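
To translate several sentences at once, the same calls can be applied to small batches. This is a sketch under stated assumptions: `chunked` and `translate_batch` are hypothetical helper names, the default batch size of 4 simply mirrors the GPU training setting, and `tokenizer` and `model` are expected to be the objects loaded in the snippet above.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(texts, tokenizer, model, batch_size=4, max_length=128):
    """Translate a list of Indonesian sentences, `batch_size` at a time."""
    translations: List[str] = []
    for batch in chunked(texts, batch_size):
        # Pad to the longest sentence in the batch and truncate overlong inputs
        inputs = tokenizer(
            list(batch),
            return_tensors="pt",
            padding=True,
            truncation=True,
            max_length=max_length,
        )
        outputs = model.generate(**inputs, max_length=max_length, num_beams=4)
        # batch_decode returns one string per generated sequence
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

For example, `translate_batch(["Selamat pagi.", "Terima kasih."], tokenizer, model)` returns a list of English strings in the same order as the inputs.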

Example Translations

| Indonesian | English |
|---|---|
| Selamat pagi, terima kasih sudah datang. | Good morning, thank you for coming. |
| Teknologi AI berkembang sangat pesat. | AI technology is developing very rapidly. |
| Mari kita diskusikan hasil penelitian ini. | Let's discuss the results of this research. |

Performance

  • Optimized for conversational and presentation-style text
  • Best performance on formal Indonesian text
  • Model size: approximately 300 MB (72.3M parameters stored as F32)
  • Small enough to be a candidate for mobile deployment
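
The size figure can be checked with simple arithmetic, assuming the 72.3M parameter count and F32 tensor type reported on the model page. The sketch below uses a hypothetical helper, `estimate_weight_size_mb`; it counts raw weight bytes only, so the file on disk will be slightly larger, and the half-precision figure is an illustration rather than a tested export of this model.

```python
# Bytes needed to store one parameter in each tensor type
BYTES_PER_PARAM = {"F32": 4, "F16": 2}


def estimate_weight_size_mb(num_params: int, dtype: str = "F32") -> float:
    """Estimate raw weight size in megabytes (1 MB = 10**6 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e6


f32_size = estimate_weight_size_mb(72_300_000, "F32")  # about 289 MB
f16_size = estimate_weight_size_mb(72_300_000, "F16")  # about 145 MB
```

The F32 estimate of roughly 289 MB agrees with the approximate 300 MB figure; storing the weights in half precision would roughly halve that, which may matter on memory-constrained devices.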

Citation

@misc{marian-id-en-meeting-translation,
  title={MarianMT Indonesian-English Translation (Fine-tuned)},
  author={DhinTech},
  year={2025},
  publisher={Hugging Face},
  journal={Hugging Face Model Hub},
  howpublished={\url{https://huggingface.co/dhintech/marian-id-en-meeting-translation}}
}