# Model Card for yeniguno/marianmt-en-tr-kafkaesque
A fine-tuned MarianMT model that translates English prose into Turkish with a deliberate “Kafkaesque” flavour.
The checkpoint starts from the bilingual Helsinki-NLP/opus-mt-en-tr base model and is further trained on ~10 k parallel sentences taken from published Turkish & English versions of Franz Kafka’s works.
The goal was purely experimental:
Can a compact MT model be nudged toward a specific literary voice by exposing it to a small, style-consistent corpus?
## Model Details
| | |
|---|---|
| Base architecture | MarianMT (Transformer encoder-decoder) |
| Source language | en (contemporary English) |
| Target language | tr (modern Turkish) |
| Training corpus | 10,014 sentence pairs manually aligned from Turkish editions of Kafka's short stories and their authorised English translations |
| Framework | 🤗 Transformers ≥ 4.40 |
| License | Apache-2.0 for the model code and weights. ⚠️ Translations used for fine-tuning may still be under copyright; see "Data & Copyright" below |
## Intended Uses & Scope
| You can | You should not |
|---|---|
| Generate draft Turkish renderings of Kafka excerpts originally translated into English | Assume output is authoritative or publication-ready |
| Explore style-transfer / literary MT research | Rely on the model for technical, legal, or medical translation |
| Use as a starting point for further stylistic fine-tuning | Expect high accuracy outside Kafka's narrative domain |
## Training Procedure
- Hardware: 1× A100 40 GB (Google Colab Pro)
- Hyperparameters: 5 epochs, effective batch size 16, LR 5 × 10⁻⁵ with linear decay, 200 warm-up steps
- Early stopping: patience 3, evaluated every 500 steps, monitoring BLEU
- Best checkpoint: step 2 500
- Train loss ≈ 0.42 → Val loss ≈ 1.01
- SacreBLEU (500-sent dev) baseline 24.4 → tuned 31.8
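The learning-rate schedule above (linear decay with a 200-step warm-up) can be sketched as a plain function. Note that `total_steps=2500` is an assumption borrowed from the reported best-checkpoint step; the card does not state the full run length:

```python
def lr_at_step(step, base_lr=5e-5, warmup_steps=200, total_steps=2500):
    """Linear warm-up to base_lr, then linear decay to zero.

    total_steps=2500 is an assumption (the reported best checkpoint);
    the actual run may have been longer.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warm-up phase
        return base_lr * step / warmup_steps
    # Decay linearly from base_lr back to 0 by total_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))
```

Under these assumptions the rate peaks at 5 × 10⁻⁵ at step 200 and reaches zero at step 2,500.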
## Quick Start
```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "yeniguno/opus-mt-en-tr-kafkaesque"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# English source sentence to be translated into Turkish
english_text = "My neighbor, at the same peculiar hour each night, left his room with a small, locked bag in hand."

inputs = tokenizer(english_text, return_tensors="pt", padding=True)
output_ids = model.generate(**inputs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```
## Model tree for yeniguno/opus-mt-en-tr-kafkaesque

- Base model: Helsinki-NLP/opus-mt-tc-big-en-tr