---
tags:
- translation
- nmt
- cypriot-greek
- greek
library_name: transformers
languages:
- cy
- el
license: cc-by-4.0
---

## Model Details

- **Developed by**: Nikolas Ziartis
- **Institute**: University of Cyprus
- **Model type**: MarianMT (Transformer-based Seq2Seq)
- **Source language**: Cypriot Greek (a variety of Modern Greek; it has no ISO 639-1 code of its own)
- **Target language**: Modern Standard Greek (ISO 639-1: el)
- **Fine-tuned from**: `Helsinki-NLP/opus-mt-en-grk`
- **License**: CC BY 4.0

## Model Description

This model is a MarianMT transformer, fine-tuned via active learning to translate from the low-resource Cypriot Greek dialect into Modern Standard Greek. Over nine iterative batches, we:

1. **Extracted high-dimensional embeddings** for every unlabeled Cypriot sentence using the Greek LLM `ilsp/Meltemi-7B-Instruct-v1.5`.
2. **Applied k-means clustering** to those embeddings to select the 50 "most informative" sentences per batch.
3. **Had human annotators** translate those 50 sentences into Standard Greek.
4. **Fine-tuned** the MarianMT model on the accumulating parallel corpus, freezing and unfreezing layers to preserve learned representations.

The result is a system that captures colloquial Cypriot expressions while producing fluent Modern Standard Greek.

## Usage

```python
from transformers import MarianMTModel, MarianTokenizer

# Load the fine-tuned checkpoint and its tokenizer from the Hub.
model_name = "ZiartisNikolas/NMT-cypriot-dialect-to-greek"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

src = ["Τί κανείς τζι εσιείς;"]  # "How are you?" in Cypriot Greek

# Tokenize the batch, generate translations, and decode them back to text.
batch = tokenizer(src, return_tensors="pt", padding=True)
gen = model.generate(**batch)
print(tokenizer.batch_decode(gen, skip_special_tokens=True))
```
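
The sentence-selection step from the Model Description (embed, cluster, pick) can be illustrated with a small, dependency-free sketch. It is not the author's pipeline: the model card does not say how a sentence is chosen from each cluster, so the sketch assumes the sentence nearest to each centroid is taken. The helper names (`farthest_first_init`, `kmeans`, `select_batch`), the deterministic farthest-first seeding, and the 2-D toy vectors are all hypothetical; in the real setup the inputs would be the `ilsp/Meltemi-7B-Instruct-v1.5` embeddings and `k` would be 50.

```python
import math


def farthest_first_init(points, k):
    """Deterministic seeding (an assumption, chosen for reproducibility):
    start at point 0, then keep adding the point farthest from all
    centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def nearest(p, centroids):
    """Index of the centroid closest to point p."""
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means; returns (centroids, cluster index per point)."""
    centroids = farthest_first_init(points, k)
    assign = [nearest(p, centroids) for p in points]
    for _ in range(iters):
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        # Assignment step: re-attach every point to its nearest centroid.
        new_assign = [nearest(p, centroids) for p in points]
        if new_assign == assign:  # converged
            break
        assign = new_assign
    return centroids, assign


def select_batch(embeddings, k):
    """Pick one sentence index per cluster: the member nearest its centroid."""
    centroids, assign = kmeans(embeddings, k)
    chosen = []
    for c in range(k):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(
                min(members, key=lambda i: math.dist(embeddings[i], centroids[c]))
            )
    return sorted(chosen)


# Toy stand-in for sentence embeddings: three well-separated groups.
toy = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [9.0, 0.0], [9.1, 0.0], [9.0, 0.1],
]
picked = select_batch(toy, k=3)
print(picked)  # → [0, 3, 6]
```

The indices returned by `select_batch` are the sentences that would be sent to annotators in that round. Picking one representative per cluster spreads the annotation budget across the embedding space rather than concentrating it on near-duplicate sentences. For real embedding matrices, a library implementation of k-means would be the more practical choice.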