English-Hebrew Translation Model

A fine-tuned MarianMT model for translating from English to Hebrew, specifically trained on biblical text from the New World Translation of the Holy Scriptures.

Model Description

Model type: MarianMT (Seq2Seq)
Language: English → Hebrew
Base model: Helsinki-NLP/opus-mt-en-he
Fine-tuned model: johnlockejrr/marianmt-en2he-nwt
Training data: New World Translation of the Holy Scriptures (Modern Hebrew translation)
BLEU Score: 40.68 (test set)
Character Accuracy: 32.21%

Dataset Information

The model was trained on the New World Translation of the Holy Scriptures dataset, which contains:

Source: English translation
Target: Modern Hebrew translation (not the original Biblical Hebrew)
Dataset size: 30,693 training examples, 3,837 validation examples, 3,837 test examples
Text type: Biblical scripture with religious terminology

Training Details

Training epochs: 28.1 of a configured 30 (stopped early)
Learning rate: 1e-5
Batch size: 8 (gradient accumulation: 4, effective batch size: 32)
Mixed precision: FP16
Early stopping: Enabled
Training time: ~3.5 hours
Hardware: GPU training
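The batch-size and epoch figures above can be turned into optimizer-step counts with a little arithmetic. The sketch below assumes a single GPU and that updates per epoch are counted as whole groups of accumulated batches; both are assumptions rather than details stated in this card, and the helper name `updates_per_epoch` is made up for illustration.

```python
import math


def updates_per_epoch(num_examples: int, per_device_batch: int,
                      grad_accum: int, num_devices: int = 1) -> int:
    """Approximate number of optimizer updates in one pass over the data."""
    # Number of forward/backward batches per epoch (last batch may be partial).
    batches = math.ceil(num_examples / (per_device_batch * num_devices))
    # One optimizer update is taken every `grad_accum` batches.
    return max(batches // grad_accum, 1)


# Values taken from the lists above; one GPU is an assumption.
effective_batch = 8 * 4 * 1                  # batch size x accumulation x devices
steps = updates_per_epoch(30_693, 8, 4)      # updates per epoch
stopped_at = round(steps * 28.1)             # approximate step where training stopped

print(effective_batch, steps, stopped_at)    # 32 959 26948
```

Under these assumptions, one epoch is roughly 959 updates, so training stopped somewhere near step 26,900.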

Usage

Using the Model

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "johnlockejrr/marianmt-en2he-nwt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# English to Hebrew translation
english_text = "In the beginning God created the heavens and the earth."
inputs = tokenizer(english_text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
hebrew_translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(hebrew_translation)
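To translate several sentences at once, the same calls can be wrapped in a small batching helper. This is a minimal sketch, not part of the released model: `chunked` and `translate_batch` are hypothetical names, and the function expects the `tokenizer` and `model` objects loaded above to be passed in.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate_batch(sentences: Sequence[str], tokenizer, model,
                    batch_size: int = 8, max_length: int = 128,
                    num_beams: int = 4) -> List[str]:
    """Translate sentences in batches and return the decoded strings in order."""
    translations: List[str] = []
    for batch in chunked(sentences, batch_size):
        # Pad to the longest sentence in the batch; truncate overly long inputs.
        inputs = tokenizer(list(batch), return_tensors="pt", padding=True,
                           truncation=True, max_length=max_length)
        outputs = model.generate(**inputs, max_length=max_length,
                                 num_beams=num_beams)
        translations.extend(
            tokenizer.batch_decode(outputs, skip_special_tokens=True)
        )
    return translations
```

With the objects from the example above, a call would look like `translate_batch(["In the beginning God created the heavens and the earth.", "Love your neighbor as yourself."], tokenizer, model)`.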

Using the Pipeline

from transformers import pipeline

translator = pipeline("translation", model="johnlockejrr/marianmt-en2he-nwt")

# English to Hebrew
english_text = "Love your neighbor as yourself."
result = translator(english_text)
print(result[0]['translation_text'])

Interactive Translation

python inference.py --model_path ./english_hebrew_model_improved --text "Hello world" --direction en2he

Model Performance

Evaluation Metrics

BLEU Score: 40.68 (test set)
Character Accuracy: 32.21%
Test Loss: 1.20
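This card does not say how character accuracy was computed. One possible definition, shown here only as an illustrative sketch, is 100 × (1 − character error rate), where the error rate is the total edit distance between model outputs and references divided by the total reference length. The function names are hypothetical, and the reported 32.21% may have been produced with a different formula.

```python
from typing import Sequence


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_accuracy(hypotheses: Sequence[str],
                       references: Sequence[str]) -> float:
    """Corpus-level character accuracy in percent (assumed definition)."""
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        return 0.0
    errors = sum(edit_distance(h, r) for h, r in zip(hypotheses, references))
    # Clamp at zero: very long wrong outputs can exceed the reference length.
    return max(0.0, 1.0 - errors / total_chars) * 100
```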

Translation Examples

English	Hebrew Translation
Hello world	שלום עולם
In the beginning God created	בראשית ברא אלהים
Love	אהבה

Limitations

Domain Specificity: This model was trained only on biblical text, so it performs best on religious and scriptural content.
Modern Hebrew: The target text is a Modern Hebrew translation, not the original Biblical Hebrew.
Context Sensitivity: Translation quality may vary depending on the context and complexity of the text.
Cultural Nuances: Some cultural and religious nuances may not be perfectly captured.

Training Configuration

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./english_hebrew_model_improved",
    eval_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=5,
    num_train_epochs=30,
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True,
    metric_for_best_model="bleu",
    greater_is_better=True,
    gradient_accumulation_steps=4,
    warmup_steps=1000
)
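The configuration above does not set a scheduler type, so the Trainer's default linear schedule with warmup would apply: the learning rate climbs from 0 to 1e-5 over the first 1,000 updates, then decays linearly toward 0 at the final planned update. The sketch below reproduces that shape in plain Python. The total of 28,770 updates is an estimate (about 959 updates per epoch × 30 epochs on one GPU), and `linear_warmup_lr` is a hypothetical helper, not a library function.

```python
def linear_warmup_lr(step: int, peak_lr: float = 1e-5,
                     warmup_steps: int = 1_000,
                     total_steps: int = 28_770) -> float:
    """Learning rate at a given optimizer step for linear warmup then decay."""
    if step < warmup_steps:
        # Warmup: scale linearly from 0 up to the peak rate.
        return peak_lr * step / max(warmup_steps, 1)
    # Decay: scale linearly from the peak down to 0 at total_steps.
    remaining = max(total_steps - step, 0)
    return peak_lr * remaining / max(total_steps - warmup_steps, 1)


for step in (0, 500, 1_000, 15_000, 28_770):
    print(step, f"{linear_warmup_lr(step):.2e}")
```

Under these assumptions, warmup covers roughly the first epoch, and because training stopped before epoch 30 the rate never decayed all the way to zero.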

Dataset Preparation

The dataset was prepared from the New World Translation corpus with the following preprocessing:

Text cleaning and normalization
Length filtering (5-1000 characters)
Length ratio filtering (0.3-3.0)
Train/validation/test split (80/10/10)
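The filtering and splitting steps listed above can be sketched as follows. Several details are assumptions rather than facts from this card: that the length limits apply to both sides of a pair, that the ratio is Hebrew length divided by English length, and that the split shuffles with a fixed seed. The names `keep_pair` and `split_pairs` are made up for illustration.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (english, hebrew)

MIN_CHARS, MAX_CHARS = 5, 1000
MIN_RATIO, MAX_RATIO = 0.3, 3.0


def keep_pair(english: str, hebrew: str) -> bool:
    """Return True if the pair passes the length and length-ratio filters."""
    en, he = english.strip(), hebrew.strip()
    if not (MIN_CHARS <= len(en) <= MAX_CHARS):
        return False
    if not (MIN_CHARS <= len(he) <= MAX_CHARS):
        return False
    ratio = len(he) / len(en)  # assumed direction: target / source
    return MIN_RATIO <= ratio <= MAX_RATIO


def split_pairs(pairs: Sequence[Pair],
                seed: int = 42) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle and split into roughly 80% train, 10% validation, 10% test."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 8 // 10
    n_val = (len(shuffled) - n_train) // 2
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```

Applied to 38,367 pairs (the sum of the three reported split sizes), this rounding yields 30,693 / 3,837 / 3,837 examples, which matches the dataset sizes listed earlier.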

Citation

If you use this model in your research, please cite:

@misc{english_hebrew_translation_2025,
  title={English-Hebrew Translation Model},
  author={johnlockejrr},
  year={2025},
  url={https://huggingface.co/johnlockejrr/marianmt-en2he-nwt}
}

License

This model is released under the same license as the base model (Helsinki-NLP/opus-mt-en-he) and the training dataset.

Acknowledgments

Base model: Helsinki-NLP/opus-mt-en-he
Dataset: New World Translation of the Holy Scriptures
Training framework: Hugging Face Transformers

Contact

For questions or issues, please open an issue on the Hugging Face model page.

Note: This model is specifically designed for biblical text translation and may not perform optimally on general English-Hebrew translation tasks.
