Hebrew-English Translation Model
A fine-tuned MarianMT model for translating Hebrew into English, trained specifically on biblical text from the New World Translation of the Holy Scriptures.
Model Description
- Model type: MarianMT (Seq2Seq)
- Language: Hebrew → English
- Base model: Helsinki-NLP/opus-mt-mul-en
- Training data: New World Translation of the Holy Scriptures (Modern Hebrew translation)
- BLEU Score: 46.69 (test set)
- Character Accuracy: 26.57%
Dataset Information
The model was trained on the New World Translation of the Holy Scriptures dataset, which contains:
- Source: Modern Hebrew translation (not the original Biblical Hebrew)
- Target: English translation
- Dataset size: 30,693 training examples, 3,837 validation examples, 3,837 test examples
- Text type: Biblical scripture with religious terminology
Training Details
- Training epochs: 5.34 of a configured maximum of 20 (stopped early)
- Learning rate: 2e-5
- Batch size: 8
- Mixed precision: FP16
- Early stopping: enabled, with a patience of 3 evaluations on validation BLEU
- Training time: ~3.5 hours
- Hardware: GPU training
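The fractional epoch count can be related to optimizer steps with some arithmetic. The sketch below assumes a single GPU with no gradient accumulation; the card states neither. Under that assumption, each step consumes one batch of 8 training examples, and evaluation runs every 500 steps as in the training configuration. With these numbers, 5.34 epochs lines up with an evaluation at step 20,500, which is consistent with training having stopped at an evaluation boundary.

```python
import math

TRAIN_EXAMPLES = 30_693  # training split size reported in this card
BATCH_SIZE = 8           # per-device batch size; assumes 1 GPU and no gradient accumulation
EVAL_STEPS = 500         # evaluation interval from the training configuration

# The final partial batch still counts as a step, so round up.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)

def epochs_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer steps."""
    return step / steps_per_epoch

# Snap the reported epoch count to the nearest evaluation step.
reported_epochs = 5.34
stop_step = round(reported_epochs * steps_per_epoch / EVAL_STEPS) * EVAL_STEPS

print(f"steps per epoch: {steps_per_epoch}")            # steps per epoch: 3837
print(f"step {stop_step} -> epoch {epochs_at(stop_step):.2f}")  # step 20500 -> epoch 5.34
```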
Usage
Using the Model
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# Load model and tokenizer
model_name = "johnlockejrr/marianmt-he2en-nwt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
# Hebrew to English translation
hebrew_text = "שלום עולם"  # "Hello world"
inputs = tokenizer(hebrew_text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
english_translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(english_translation)
Using the Pipeline
from transformers import pipeline
translator = pipeline("translation", model="johnlockejrr/marianmt-he2en-nwt")
# Hebrew to English
hebrew_text = "בראשית ברא אלהים את השמים ואת הארץ"  # Genesis 1:1
result = translator(hebrew_text)
print(result[0]['translation_text'])
Interactive Translation
python inference.py --model_path ./hebrew_english_model --text "שלום עולם" --direction he2en
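The `inference.py` script is not included in this card. The following is a hypothetical minimal script that accepts the same three flags and reuses the generation settings from the example above (`max_length=128`, `num_beams=4`). It accepts only `he2en`, since that is the direction this model was trained for. The structure and help strings are assumptions, not the author's script.

```python
import argparse

def parse_args(argv=None):
    """Parse the flags shown in the command above (hypothetical re-creation)."""
    parser = argparse.ArgumentParser(description="Translate Hebrew text into English.")
    parser.add_argument("--model_path", required=True,
                        help="Local model directory or Hugging Face Hub model ID")
    parser.add_argument("--text", required=True, help="Hebrew text to translate")
    # Assumption: only Hebrew -> English is supported by this model.
    parser.add_argument("--direction", default="he2en", choices=["he2en"])
    return parser.parse_args(argv)

def translate(model_path: str, text: str) -> str:
    # Imported lazily so argument parsing works without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_path)
    inputs = tokenizer(text, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=128, num_beams=4)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

def main(argv=None):
    args = parse_args(argv)
    print(translate(args.model_path, args.text))

if __name__ == "__main__":
    main()
```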
Model Performance
Evaluation Metrics
- BLEU Score: 46.69 (test set)
- Character Accuracy: 26.57%
- Training Loss: 1.07
- Validation Loss: 1.11
Translation Examples
| Hebrew | English translation |
|---|---|
| שלום עולם | Hello world |
| בראשית ברא אלהים | In the beginning God created |
| אהבה | Love |
Limitations
- Domain Specificity: The model is trained only on biblical text and is likely to perform best on religious or scriptural content
- Modern Hebrew: The Hebrew training text is a Modern Hebrew translation, not the original Biblical Hebrew
- Context Sensitivity: Translation quality may vary depending on the context and complexity of the text
- Cultural Nuances: Some cultural and religious nuances may not be perfectly captured
Training Configuration
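from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="./hebrew_english_model",
    eval_strategy="steps",           # evaluate every eval_steps
    eval_steps=500,
    save_strategy="steps",           # must match eval_strategy for load_best_model_at_end
    save_steps=500,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=20,             # upper bound; early stopping ended training at ~5.34 epochs
    predict_with_generate=True,      # generate text during evaluation so BLEU can be computed
    fp16=True,
    load_best_model_at_end=True,
    metric_for_best_model="bleu",
    greater_is_better=True,
)

# early_stopping_patience is not a Seq2SeqTrainingArguments parameter;
# it is set on EarlyStoppingCallback, which is passed to the trainer's callbacks.
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)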
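With `load_best_model_at_end=True` and `metric_for_best_model="bleu"`, training keeps track of the best-scoring checkpoint. Early stopping with patience 3 (configured in Transformers through `EarlyStoppingCallback`) ends training after three consecutive evaluations without a BLEU improvement. The function below is a simplified, framework-independent sketch of that patience rule, not the Transformers source. Given `(step, score)` pairs, it returns the step at which training would stop and the step of the checkpoint that would be restored.

```python
from typing import Optional, Sequence, Tuple

def early_stopping(scores: Sequence[Tuple[int, float]], patience: int = 3,
                   threshold: float = 0.0) -> Tuple[Optional[int], Optional[int]]:
    """Patience-based early stopping for a higher-is-better metric such as BLEU.

    Returns (stop_step, best_step); stop_step is None if patience never ran out.
    """
    best_score = float("-inf")
    best_step: Optional[int] = None
    bad_evals = 0
    for step, score in scores:
        if score > best_score + threshold:
            best_score, best_step = score, step  # new best checkpoint; reset the counter
            bad_evals = 0
        else:
            bad_evals += 1                       # this evaluation did not improve on the best
            if bad_evals >= patience:
                return step, best_step
    return None, best_step
```

Under this rule, training stops on the third consecutive non-improving evaluation. With evaluations every 500 steps, the restored checkpoint is therefore always the one saved 1,500 steps before the stopping step.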
Dataset Preparation
The dataset was prepared from the New World Translation corpus with the following preprocessing:
- Text cleaning and normalization
- Length filtering (5-1000 characters)
- Length-ratio filtering (source/target length ratio between 0.3 and 3.0)
- Train/validation/test split (80/10/10)
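The card does not include the preprocessing code, so the following is a minimal sketch built on several assumptions. "Cleaning and normalization" is taken to mean Unicode NFC normalization plus whitespace collapsing. Lengths are measured in characters on both sides, and the ratio is Hebrew length divided by English length. For the split, 20% of the shuffled pairs are held out, rounded up, and then halved into validation and test sets. With 38,367 filtered pairs (the sum of the reported split sizes), this rule reproduces the reported 30,693 / 3,837 / 3,837 counts. The seed is arbitrary.

```python
import random
import unicodedata
from typing import List, Tuple

Pair = Tuple[str, str]  # (hebrew, english)

def normalize(text: str) -> str:
    """Assumed cleaning step: NFC normalization and whitespace collapsing."""
    return " ".join(unicodedata.normalize("NFC", text).split())

def keep_pair(src: str, tgt: str, min_len: int = 5, max_len: int = 1000,
              min_ratio: float = 0.3, max_ratio: float = 3.0) -> bool:
    """Length filter on both sides, then a src/tgt character-length ratio filter."""
    if not (min_len <= len(src) <= max_len and min_len <= len(tgt) <= max_len):
        return False
    ratio = len(src) / len(tgt)  # len(tgt) >= min_len > 0, so no division by zero
    return min_ratio <= ratio <= max_ratio

def split_80_10_10(pairs: List[Pair], seed: int = 42) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle, hold out ceil(20%) of pairs, then halve the held-out part."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_heldout = (2 * n + 9) // 10  # ceil(0.2 * n) using integer arithmetic
    n_val = n_heldout // 2
    train = shuffled[: n - n_heldout]
    val = shuffled[n - n_heldout : n - n_heldout + n_val]
    test = shuffled[n - n_heldout + n_val :]
    return train, val, test
```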
Citation
If you use this model in your research, please cite:
@misc{hebrew_english_translation_2025,
title={Hebrew-English Translation Model},
author={johnlockejrr},
year={2025},
url={https://huggingface.co/johnlockejrr/marianmt-he2en-nwt}
}
License
This model is released under the same license as the base model (Helsinki-NLP/opus-mt-mul-en) and the training dataset.
Acknowledgments
- Base model: Helsinki-NLP/opus-mt-mul-en
- Dataset: New World Translation of the Holy Scriptures
- Training framework: Hugging Face Transformers
Contact
For questions or issues, please open an issue on the Hugging Face model page.
Note: This model is specifically designed for biblical text translation and may not perform optimally on general Hebrew-English translation tasks.