Model Card for yeniguno/opus-mt-en-tr-kafkaesque

A fine-tuned MarianMT model that translates English prose into Turkish with a deliberate “Kafkaesque” flavour.
The checkpoint starts from the bilingual Helsinki-NLP/opus-mt-en-tr base model and is further trained on ~10 k parallel sentences taken from published Turkish & English versions of Franz Kafka’s works.
The goal was purely experimental:

Can a compact MT model be nudged toward a specific literary voice by exposing it to a small, style-consistent corpus?


Model Details

Base architecture: MarianMT (Transformer encoder–decoder)
Source language: en (contemporary English)
Target language: tr (modern Turkish)
Training corpus: 10,014 sentence pairs manually aligned from Turkish editions of Kafka’s short stories and their authorised English translations
Framework: 🤗 Transformers ≥ 4.40
License: Apache-2.0 for the model code and weights. ⚠️ Translations used for fine-tuning may still be under copyright; see “Data & Copyright” below.

Intended Uses & Scope

You can:
  • Generate draft Turkish renderings of Kafka excerpts originally translated into English
  • Explore style-transfer / literary MT research
  • Use the model as a starting point for further stylistic fine-tuning

You should not:
  • Assume output is authoritative or publication-ready
  • Rely on the model for technical, legal or medical translation
  • Expect high accuracy outside Kafka’s narrative domain

Training Procedure

  • Hardware: 1× A100 40 GB (Google Colab Pro)
  • Hyperparameters: 5 epochs, effective batch size 16, LR 5 × 10⁻⁵, linear decay, warm-up 200 steps
  • Early stopping: patience 3, evaluated every 500 steps, monitored on BLEU
  • Best checkpoint: step 2 500
    • Train loss ≈ 0.42 → Val loss ≈ 1.01
    • SacreBLEU (500-sent dev) baseline 24.4 → tuned 31.8
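As a rough sketch, the recipe above maps onto 🤗 `Seq2SeqTrainingArguments` as shown below. This is not the author’s actual training script: the `output_dir` is a placeholder, and the effective batch size of 16 is assumed to come from a per-device batch of 16 with no gradient accumulation.

```python
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

# Sketch only: output_dir is a placeholder, not the author's path.
args = Seq2SeqTrainingArguments(
    output_dir="opus-mt-en-tr-kafkaesque",
    num_train_epochs=5,
    per_device_train_batch_size=16,   # assumed: effective batch 16, no grad accumulation
    learning_rate=5e-5,
    lr_scheduler_type="linear",
    warmup_steps=200,
    eval_strategy="steps",            # spelled "evaluation_strategy" on older Transformers
    eval_steps=500,                   # evaluate (and save) every 500 steps
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="bleu",     # compute_metrics must return {"bleu": ...}
    predict_with_generate=True,
)
stopper = EarlyStoppingCallback(early_stopping_patience=3)
# Pass args and [stopper] to a Seq2SeqTrainer together with the tokenized
# sentence pairs and a SacreBLEU-based compute_metrics function.
```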

Quick Start

from transformers import MarianMTModel, MarianTokenizer

model_name = "yeniguno/opus-mt-en-tr-kafkaesque"  # English → Turkish, Kafka-flavoured
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

english_text = "My neighbor, at the same peculiar hour each night, left his room with a small, locked bag in hand."

inputs = tokenizer(english_text, return_tensors="pt", padding=True)
output_ids = model.generate(**inputs)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
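For longer passages, a common pattern is to translate sentence by sentence in small batches. The helper below is a minimal sketch; the `chunked` function, the batch size of 8, and the decoding settings are illustrative choices, not part of this model card.

```python
from typing import Iterator

def chunked(sentences: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield successive fixed-size batches from a list of sentences."""
    for start in range(0, len(sentences), batch_size):
        yield sentences[start:start + batch_size]

# Usage with the tokenizer and model loaded above (sketch):
# for batch in chunked(sentences, 8):
#     inputs = tokenizer(batch, return_tensors="pt", padding=True)
#     output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=128)
#     print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```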
Model size: 235M parameters (F32, Safetensors)
