---
language:
- ka
- en
license: apache-2.0
tags:
- translation
- evaluation
- comet
- mt-evaluation
- georgian
metrics:
- kendall_tau
- spearman_correlation
- pearson_correlation
model-index:
- name: Georgian-COMET
  results:
  - task:
      type: translation-evaluation
      name: Machine Translation Evaluation
    dataset:
      name: Georgian MT Evaluation Dataset
      type: Darsala/georgian_metric_evaluation
    metrics:
    - type: pearson_correlation
      value: 0.878
      name: Pearson Correlation
    - type: spearman_correlation
      value: 0.796
      name: Spearman Correlation
    - type: kendall_tau
      value: 0.603
      name: Kendall's Tau
base_model: Unbabel/wmt22-comet-da
datasets:
- Darsala/georgian_metric_evaluation
---

# Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation

This is a [COMET](https://github.com/Unbabel/COMET) evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It receives a triplet (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both the source and the reference.

## Model Description

Georgian-COMET is a fine-tuned version of [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) that has been optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. The model shows significant improvements over the base model when evaluating Georgian translations.

### Key Improvements over Base Model

| Metric   | Base COMET | Georgian-COMET | Improvement |
|----------|------------|----------------|-------------|
| Pearson  | 0.867      | **0.878**      | +1.1%       |
| Spearman | 0.759      | **0.796**      | +3.7%       |
| Kendall  | 0.564      | **0.603**      | +3.9%       |

## Paper

- **Base Model Paper**: [COMET-22: Unbabel-IST 2022 Submission for the Metrics Shared Task](https://aclanthology.org/2022.wmt-1.52) (Rei et al., WMT 2022)
- **This Model**: Paper coming soon

## Repository

[https://github.com/LukaDarsalia/nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research)

## License

Apache-2.0

## Usage (unbabel-comet)

Using this model requires unbabel-comet to be installed:

```bash
pip install --upgrade pip  # ensures that pip is current
pip install unbabel-comet
```

### Option 1: Direct Download from HuggingFace

```python
from comet import load_from_checkpoint
import requests
import os

# Download the model checkpoint
model_url = "https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt"
model_path = "georgian_comet.ckpt"

# Download if not already present
if not os.path.exists(model_path):
    response = requests.get(model_url)
    with open(model_path, 'wb') as f:
        f.write(response.content)

# Load the model
model = load_from_checkpoint(model_path)

# Prepare your data
data = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე."
    },
    {
        "src": "Schools and kindergartens were opened.",
        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები."
    }
]

# Get predictions
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```
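If you would rather not manage the download yourself, the `huggingface_hub` client can fetch and cache the same checkpoint. A minimal sketch, assuming only that the repository exposes the `model.ckpt` file referenced by the URL above:

```python
from comet import load_from_checkpoint
from huggingface_hub import hf_hub_download

# Download (and cache) the checkpoint from the Hub, then load it as in Option 1
model_path = hf_hub_download(repo_id="Darsala/georgian_comet", filename="model.ckpt")
model = load_from_checkpoint(model_path)
```

From there, `model.predict(...)` works exactly as shown in Option 1.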
### Option 2: Using comet CLI

First download the model checkpoint:

```bash
wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt
```

Then use it with the comet CLI:

```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
```

### Option 3: Integration with Evaluation Pipeline

```python
from comet import load_from_checkpoint
import pandas as pd

# Load model
model = load_from_checkpoint("georgian_comet.ckpt")

# Load your evaluation data
df = pd.read_csv("your_evaluation_data.csv")

# Prepare data in COMET format
data = [
    {
        "src": row["sourceText"],
        "mt": row["targetText"],
        "ref": row["referenceText"]
    }
    for _, row in df.iterrows()
]

# Get scores
scores = model.predict(data, batch_size=16)
print(f"Average score: {sum(scores['scores']) / len(scores['scores']):.3f}")
```

## Intended Uses

This model is intended to be used for **English-Georgian MT evaluation**. Given a triplet (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score between 0 and 1, where 1 represents a perfect translation.

### Primary Use Cases

1. **MT System Development**: Evaluate and compare different English-Georgian MT systems
2. **Quality Assurance**: Automated quality checks for Georgian translations
3. **Research**: Study MT evaluation for morphologically rich languages like Georgian
4. **Production Monitoring**: Track translation quality in production environments

### Out-of-Scope Use

- **Other Language Pairs**: This model is specifically fine-tuned for English-Georgian and may not perform well on other language pairs
- **Reference-Free Evaluation**: The model requires reference translations
- **Document-Level**: Optimized for sentence-level evaluation

## Training Details

### Training Data

- **Dataset**: 5,000 English-Georgian pairs from [corp.dict.ge](https://corp.dict.ge/)
- **MT Systems**: Translations from SMaLL-100, Google Translate, and Ucraft Translate
- **Scoring Method**: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3)
- **Details**: See [Darsala/georgian_metric_evaluation](https://huggingface.co/datasets/Darsala/georgian_metric_evaluation)

### Training Configuration

```yaml
regression_metric:
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.5e-05
    learning_rate: 1.5e-05
    loss: mse
    dropout: 0.1
    batch_size: 8
```

### Training Procedure

1. **Base Model**: Started from the Unbabel/wmt22-comet-da checkpoint
2. **Knowledge Distillation**: Used Claude Sonnet 4 scores as training targets
3. **Robustness**: Added Gaussian noise to the training scores to prevent overfitting (see the sketch below)
4. **Optimization**: 8 epochs with early stopping (patience=4) on validation Kendall's tau
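For illustration, the noise injection in step 3 amounts to perturbing each distilled score before it is used as a regression target. A minimal sketch; the example values and the helper name `add_label_noise` are hypothetical, only σ=3 comes from this card:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_label_noise(scores, sigma=3.0):
    """Perturb knowledge-distilled scores with Gaussian noise (sigma=3, as used for training)."""
    scores = np.asarray(scores, dtype=float)
    return scores + rng.normal(loc=0.0, scale=sigma, size=scores.shape)

# Hypothetical Claude Sonnet 4 scores for three segments
claude_scores = [87.0, 92.5, 64.0]
noisy_targets = add_label_noise(claude_scores)
print(noisy_targets)
```

Slightly perturbing the targets discourages the regression head from memorizing the teacher's exact scores.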
## Evaluation Results

### Test Set Performance

Evaluated on 400 human-annotated English-Georgian translation pairs:

| Metric   | Score | p-value |
|----------|-------|---------|
| Pearson  | 0.878 | < 0.001 |
| Spearman | 0.796 | < 0.001 |
| Kendall  | 0.603 | < 0.001 |

### Comparison with Other Metrics

| Metric              | Pearson   | Spearman  | Kendall   |
|---------------------|-----------|-----------|-----------|
| **Georgian-COMET**  | **0.878** | 0.796     | 0.603     |
| Base COMET          | 0.867     | 0.759     | 0.564     |
| LLM-Reference-Based | 0.852     | **0.798** | **0.660** |
| CHRF++              | 0.739     | 0.690     | 0.498     |
| TER                 | 0.466     | 0.443     | 0.311     |
| BLEU                | 0.413     | 0.497     | 0.344     |

## Languages Covered

While the base model (XLM-R) covers 100+ languages, this fine-tuned version is specifically optimized for:

- **Source Language**: English (en)
- **Target Language**: Georgian (ka)

For other language pairs, we recommend using the base [Unbabel/wmt22-comet-da](https://huggingface.co/Unbabel/wmt22-comet-da) model.

## Limitations

1. **Language Specific**: Optimized only for English→Georgian evaluation
2. **Domain**: Training data comes primarily from corp.dict.ge (general/literary domain)
3. **Reference Required**: Cannot perform reference-free evaluation
4. **Sentence Level**: Not optimized for document-level evaluation

## Citation

If you use this model, please cite:

```bibtex
@misc{georgian-comet-2025,
  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
  author={Luka Darsalia and Ketevan Bakhturidze and Saba Sturua},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/Darsala/georgian_comet}
}

@inproceedings{rei-etal-2022-comet,
  title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
  author = "Rei, Ricardo and C. de Souza, Jos{\'e} G. and Alves, Duarte and Zerva, Chrysoula and Farinha, Ana C and Glushkova, Taisiya and Lavie, Alon and Coheur, Luisa and Martins, Andr{\'e} F. T.",
  booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
  year = "2022",
  address = "Abu Dhabi, United Arab Emirates",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.wmt-1.52",
  pages = "578--585",
}
```

## Acknowledgments

- [Unbabel](https://unbabel.com/) team for the base COMET model
- [Anthropic](https://anthropic.com/) for Claude Sonnet 4, used in knowledge distillation
- [corp.dict.ge](https://corp.dict.ge/) for the Georgian-English corpus
- All contributors to the [nmt_metrics_research](https://github.com/LukaDarsalia/nmt_metrics_research) project
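For completeness, the correlations reported under Evaluation Results can be recomputed from segment-level predictions and human judgments with `scipy.stats`. A minimal sketch; the file name `test_set.csv` and the `humanScore` column are hypothetical, while the other column names follow Option 3 above:

```python
import pandas as pd
from scipy.stats import pearsonr, spearmanr, kendalltau
from comet import load_from_checkpoint

model = load_from_checkpoint("georgian_comet.ckpt")

# Hypothetical test file: one row per segment with source, MT output, reference, and a human score
df = pd.read_csv("test_set.csv")
data = [
    {"src": row["sourceText"], "mt": row["targetText"], "ref": row["referenceText"]}
    for _, row in df.iterrows()
]
predictions = model.predict(data, batch_size=16)["scores"]

# Correlate model predictions with human scores, segment by segment
for name, corr in [("Pearson", pearsonr), ("Spearman", spearmanr), ("Kendall", kendalltau)]:
    statistic, p_value = corr(df["humanScore"], predictions)
    print(f"{name}: {statistic:.3f} (p={p_value:.3g})")
```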