---
license: mit
language:
- he
- en
base_model:
- Helsinki-NLP/opus-mt-mul-en
library_name: transformers
tags:
- translation
---
# Hebrew-English Translation Model

A MarianMT model fine-tuned to translate Hebrew into English, trained on biblical text from the New World Translation of the Holy Scriptures.

## Model Description

- **Model type:** MarianMT (Seq2Seq)
- **Language:** Hebrew → English
- **Base model:** Helsinki-NLP/opus-mt-mul-en
- **Training data:** New World Translation of the Holy Scriptures (Modern Hebrew translation)
- **BLEU Score:** 46.69 (test set)
- **Character Accuracy:** 26.57%

## Dataset Information

The model was trained on the New World Translation of the Holy Scriptures dataset, which contains:
- **Source:** Modern Hebrew translation (not the original Biblical Hebrew)
- **Target:** English translation
- **Dataset size:** 30,693 training examples, 3,837 validation examples, 3,837 test examples
- **Text type:** Biblical scripture with religious terminology

## Training Details

- **Training epochs:** 5.34 (early stopping)
- **Learning rate:** 2e-5
- **Batch size:** 8
- **Mixed precision:** FP16
- **Early stopping:** Enabled with patience=3
- **Training time:** ~3.5 hours
- **Hardware:** GPU training
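
As a rough sanity check, the figures above can be tied together with a little arithmetic. The sketch below assumes a single GPU and no gradient accumulation (neither is stated in this card), and reuses the 30,693 training examples and the 500-step evaluation interval reported elsewhere in this card.

```python
import math

train_examples = 30_693   # training set size reported in this card
batch_size = 8            # per-device train batch size
eval_steps = 500          # evaluation interval from the training configuration
reported_epochs = 5.34    # epochs completed when early stopping fired

# Assumption: one device, no gradient accumulation, so one optimizer step per batch.
steps_per_epoch = math.ceil(train_examples / batch_size)

# Approximate number of optimizer steps completed.
approx_steps = reported_epochs * steps_per_epoch

# Early stopping is only checked when an evaluation runs, so the stopping
# step should be a multiple of eval_steps; take the nearest one.
nearest_eval_step = round(approx_steps / eval_steps) * eval_steps

print(steps_per_epoch)
print(round(approx_steps))
print(nearest_eval_step, round(nearest_eval_step / steps_per_epoch, 2))
```

Under these assumptions one epoch is 3,837 steps, and a stop at step 20,500 corresponds to about 5.34 epochs, which matches the reported value.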

## Usage

### Using the Model

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load model and tokenizer
model_name = "johnlockejrr/marianmt-he2en-nwt"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Hebrew to English translation
hebrew_text = "שלום עולם"
inputs = tokenizer(hebrew_text, return_tensors="pt", padding=True)
outputs = model.generate(**inputs, max_length=128, num_beams=4)
english_translation = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(english_translation)
```
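
For more than a handful of sentences, batching the inputs is usually faster than calling `generate` once per sentence. The helper below is a minimal sketch, not part of this repository: `translate_batch` and its `batch_size` parameter are illustrative names, and `truncation=True` is an added assumption to guard against overlong inputs.

```python
def translate_batch(texts, tokenizer, model, batch_size=16, max_length=128, num_beams=4):
    """Translate a list of Hebrew strings into English, batch by batch."""
    translations = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        # Pad to the longest sentence in the batch; truncate overlong inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        outputs = model.generate(**inputs, max_length=max_length, num_beams=num_beams)
        translations.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return translations

# Usage with the `tokenizer` and `model` loaded in the example above:
# print(translate_batch(["שלום עולם", "אהבה"], tokenizer, model))
```

The output list keeps the order of the input list, so results can be zipped back to their source sentences.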

### Using the Pipeline

```python
from transformers import pipeline

translator = pipeline("translation", model="johnlockejrr/marianmt-he2en-nwt")

# Hebrew to English
hebrew_text = "בראשית ברא אלהים את השמים ואת הארץ"
result = translator(hebrew_text)
print(result[0]['translation_text'])
```

### Interactive Translation

```bash
python inference.py --model_path ./hebrew_english_model --text "שלום עולם" --direction he2en
```

## Model Performance

### Evaluation Metrics
- **BLEU Score:** 46.69 (test set)
- **Character Accuracy:** 26.57%
- **Training Loss:** 1.07
- **Validation Loss:** 1.11
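
The card does not say how character accuracy is computed. One plausible reading, shown here purely as an assumption, is a position-by-position comparison of the predicted and reference strings. Under that definition a single inserted or dropped character misaligns everything after it, so the score can be low even for a translation that scores well on BLEU. `char_accuracy` is a hypothetical helper, not the evaluation code used for this model.

```python
def char_accuracy(prediction: str, reference: str) -> float:
    """Fraction of aligned character positions that match (assumed definition)."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0  # two empty strings are treated as a perfect match
    # zip stops at the shorter string; the unmatched tail counts as errors
    # because we divide by the longer length.
    matches = sum(p == r for p, r in zip(prediction, reference))
    return matches / longest

print(char_accuracy("In the beginning", "In the beginning"))  # identical strings
print(char_accuracy("In the beginning", "At the beginning"))  # two characters differ
print(char_accuracy("the beginning", " the beginning"))       # a leading space shifts the rest
```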

### Translation Examples

| Hebrew | English Translation |
|--------|-------------------|
| שלום עולם | Hello world |
| בראשית ברא אלהים | In the beginning God created |
| אהבה | Love |

## Limitations

1. **Domain Specificity:** This model is trained on biblical text and is likely to perform best on religious/scriptural content
2. **Modern Hebrew:** The Hebrew source text is a Modern Hebrew translation, not the original Biblical Hebrew
3. **Context Sensitivity:** Translation quality may vary depending on the context and complexity of the text
4. **Cultural Nuances:** Some cultural and religious nuances may not be perfectly captured

## Training Configuration

```python
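from transformers import EarlyStoppingCallback, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./hebrew_english_model",
    eval_strategy="steps",  # named `evaluation_strategy` in older transformers releases
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    weight_decay=0.01,
    save_total_limit=3,
    num_train_epochs=20,  # upper bound; early stopping ended training at ~5.34 epochs
    predict_with_generate=True,
    fp16=True,
    load_best_model_at_end=True,
    metric_for_best_model="bleu",
    greater_is_better=True,
)

# Early stopping is not a Seq2SeqTrainingArguments field. It is configured through
# a callback passed to the trainer: Seq2SeqTrainer(..., callbacks=[early_stopping]).
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)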
```

## Dataset Preparation

The dataset was prepared from the New World Translation corpus with the following preprocessing:
- Text cleaning and normalization
- Length filtering (5-1000 characters)
- Length ratio filtering (0.3-3.0)
- Train/validation/test split (80/10/10)
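
The filtering and splitting steps above can be sketched as follows. This is an illustration under stated assumptions, not the original preprocessing script: the card does not say whether the length limits apply to both sides, which way the ratio is taken, or how the split was seeded, so `keep_pair`, `split_pairs`, the source/target character ratio, and `seed=42` are all hypothetical choices.

```python
import random

def keep_pair(src, tgt, min_chars=5, max_chars=1000, min_ratio=0.3, max_ratio=3.0):
    """Return True if a (Hebrew, English) pair passes the length and ratio filters."""
    src, tgt = src.strip(), tgt.strip()
    # Assumption: the 5-1000 character limit applies to both sides.
    if not (min_chars <= len(src) <= max_chars and min_chars <= len(tgt) <= max_chars):
        return False
    # Assumption: the ratio is source length over target length, in characters.
    ratio = len(src) / len(tgt)
    return min_ratio <= ratio <= max_ratio

def split_pairs(pairs, seed=42):
    """Shuffle and split pairs into 80% train, 10% validation, 10% test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_train = len(pairs) * 8 // 10
    n_val = len(pairs) // 10
    train = pairs[:n_train]
    val = pairs[n_train:n_train + n_val]
    test = pairs[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test

raw_pairs = [("שלום עולם", "Hello world"), ("אב", "Father")]
filtered = [pair for pair in raw_pairs if keep_pair(*pair)]
print(len(filtered))
```

In this toy example the second pair is dropped because its Hebrew side is shorter than five characters.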

## Citation

If you use this model in your research, please cite:

```bibtex
@misc{hebrew_english_translation_2025,
  title={Hebrew-English Translation Model},
  author={johnlockejrr},
  year={2025},
  url={https://huggingface.co/johnlockejrr/marianmt-he2en-nwt}
}
```

## License

This model is released under the MIT license, as declared in the model card metadata. The base model (Helsinki-NLP/opus-mt-mul-en) and the training dataset are subject to their own license terms.

## Acknowledgments

- Base model: Helsinki-NLP/opus-mt-mul-en
- Dataset: New World Translation of the Holy Scriptures
- Training framework: Hugging Face Transformers

## Contact

For questions or issues, please open an issue on the Hugging Face model page.

---

**Note:** This model is specifically designed for biblical text translation and may not perform optimally on general Hebrew-English translation tasks.