Model Details
- Developed by: Nikolas Ziartis
- Institute: University of Cyprus
- Model type: MarianMT (Transformer-based Seq2Seq)
- Source language: Cypriot Greek (a variety of Modern Greek; no dedicated ISO 639-1 code, closest is el)
- Target language: Modern Standard Greek (ISO 639-1: el)
- Fine-tuned from: Helsinki-NLP/opus-mt-en-grk
- License: CC BY 4.0
Model Description
This model is a MarianMT transformer, fine-tuned via active learning to translate from the low-resource Cypriot Greek dialect into Modern Standard Greek. In nine iterative batches, we:
- Extracted high-dimensional embeddings for every unlabeled Cypriot sentence using the Greek LLM ilsp/Meltemi-7B-Instruct-v1.5.
- Applied k-means clustering to the embeddings to select the 50 "most informative" sentences per batch (a sketch of this step follows below).
- Had human annotators translate those 50 sentences into Standard Greek.
- Fine-tuned the MarianMT model on the accumulating parallel corpus, freezing and unfreezing layers to preserve learned representations (see the freezing sketch below).
The result is a system that accurately captures colloquial Cypriot expressions while producing fluent Modern Greek.
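For illustration, here is a minimal sketch of the selection step. It is not the project's actual code: `embed_sentences` and `select_batch` are hypothetical helpers, mean-pooling of the last hidden states of ilsp/Meltemi-7B-Instruct-v1.5 is an assumed choice of sentence embedding, and picking the sentence nearest each k-means centroid is one common way to realise a "most informative" (diversity-based) selection.

```python
# Hedged sketch of clustering-based batch selection; names and pooling are assumptions.
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

def embed_sentences(sentences, model_name="ilsp/Meltemi-7B-Instruct-v1.5"):
    """Hypothetical helper: one mean-pooled embedding per sentence.
    A 7B model is large; in practice this would run on a GPU."""
    tok = AutoTokenizer.from_pretrained(model_name)
    lm = AutoModel.from_pretrained(model_name, torch_dtype=torch.float16)
    lm.eval()
    vecs = []
    with torch.no_grad():
        for s in sentences:
            ids = tok(s, return_tensors="pt")
            hidden = lm(**ids).last_hidden_state        # (1, seq_len, dim)
            vecs.append(hidden.mean(dim=1).squeeze(0).float().numpy())
    return np.stack(vecs)

def select_batch(embeddings, sentences, k=50):
    """Cluster the pool and keep the sentence nearest each centroid,
    giving k diverse candidates to send to human annotators."""
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(embeddings)
    picks = []
    for centroid in km.cluster_centers_:
        idx = int(np.argmin(np.linalg.norm(embeddings - centroid, axis=1)))
        picks.append(sentences[idx])
    return picks
```

Taking one sentence per centroid spreads the annotation budget across the embedding space, which is the usual motivation for clustering-based active learning on an unlabeled pool.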
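The freeze/unfreeze step can likewise be sketched with standard Transformers calls. The exact per-batch schedule is not documented here; the snippet only shows the general pattern of freezing the encoder of the base checkpoint named above while leaving the decoder trainable.

```python
# Hedged sketch of layer freezing for MarianMT; the actual schedule is not documented here.
from transformers import MarianMTModel

model = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-grk")

# Freeze the encoder to preserve its pretrained representations ...
for param in model.model.encoder.parameters():
    param.requires_grad = False

# ... while the decoder and output projection stay trainable on the new dialect data.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {trainable:,}")
```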
Usage
```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "ZiartisNikolas/NMT-cypriot-dialect-to-greek"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

# Cypriot Greek input (roughly: "And of course I knew what was going on.")
src = ["Τζ̆αι φυσικά ήξερα ίνταμπου εγινίσκετουν."]
batch = tokenizer(src, return_tensors="pt", padding=True)
gen = model.generate(**batch)
print(tokenizer.batch_decode(gen, skip_special_tokens=True))
```