Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation

This is a COMET evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It receives a triplet of (source sentence, machine translation, reference translation) and returns a score that reflects the quality of the translation relative to both the source and the reference.

Model Description

Georgian-COMET is a fine-tuned version of Unbabel/wmt22-comet-da that has been optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. The model shows significant improvements over the base model when evaluating Georgian translations.

Key Improvements over Base Model

| Metric   | Base COMET | Georgian-COMET | Improvement |
|----------|------------|----------------|-------------|
| Pearson  | 0.867      | 0.878          | +1.1%       |
| Spearman | 0.759      | 0.796          | +3.7%       |
| Kendall  | 0.564      | 0.603          | +3.9%       |

Paper

Repository

https://github.com/LukaDarsalia/nmt_metrics_research

License

Apache-2.0

Usage (unbabel-comet)

Using this model requires unbabel-comet to be installed:

pip install --upgrade pip  # ensures that pip is current 
pip install unbabel-comet

Option 1: Direct Download from HuggingFace

from comet import load_from_checkpoint
import requests
import os

# Download the model checkpoint
model_url = "https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt"
model_path = "georgian_comet.ckpt"

# Download if not already present
if not os.path.exists(model_path):
    response = requests.get(model_url)
    response.raise_for_status()  # fail early if the download did not succeed
    with open(model_path, 'wb') as f:
        f.write(response.content)

# Load the model
model = load_from_checkpoint(model_path)

# Prepare your data
data = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე."
    },
    {
        "src": "Schools and kindergartens were opened.",
        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები."
    }
]

# Get predictions
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
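
The returned object carries both segment-level and corpus-level scores. Assuming a recent unbabel-comet release, they can be read via attribute access:

print(model_output.scores)        # list of per-segment quality scores
print(model_output.system_score)  # corpus-level score (average over segments)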

Option 2: Using comet CLI

First download the model checkpoint:

wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt

Then use it with comet CLI:

comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
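
Note that comet-score expects the three files to be line-aligned: line i of the source, translation, and reference files must all refer to the same segment, one segment per line.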

Option 3: Integration with Evaluation Pipeline

from comet import load_from_checkpoint
import pandas as pd

# Load model
model = load_from_checkpoint("georgian_comet.ckpt")

# Load your evaluation data
df = pd.read_csv("your_evaluation_data.csv")

# Prepare data in COMET format
data = [
    {
        "src": row["sourceText"],
        "mt": row["targetText"],
        "ref": row["referenceText"]
    }
    for _, row in df.iterrows()
]

# Get scores
scores = model.predict(data, batch_size=16)
print(f"Average score: {sum(scores['scores']) / len(scores['scores']):.3f}")

Intended Uses

This model is intended to be used for English-Georgian MT evaluation.

Given a triplet with (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score between 0 and 1 where 1 represents a perfect translation.

Primary Use Cases

  1. MT System Development: Evaluate and compare different English-Georgian MT systems (see the sketch after this list)
  2. Quality Assurance: Automated quality checks for Georgian translations
  3. Research: Study MT evaluation for morphologically rich languages like Georgian
  4. Production Monitoring: Track translation quality in production environments
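
For use case 1, the sketch below compares two MT systems on the same sources and references; the checkpoint path, sentences, and system outputs are illustrative only:

from comet import load_from_checkpoint

model = load_from_checkpoint("georgian_comet.ckpt")

# Illustrative hand-picked segments (not a real benchmark)
sources = ["The cat sat on the mat."]
references = ["კატა იჯდა ხალიჩაზე."]
system_a = ["კატა ზის ხალიჩაზე."]   # hypothetical MT system A output
system_b = ["კატა იჯდა ხალიჩაზე."]  # hypothetical MT system B output

def system_score(hypotheses):
    """Score one system's outputs against the shared sources and references."""
    data = [
        {"src": s, "mt": m, "ref": r}
        for s, m, r in zip(sources, hypotheses, references)
    ]
    return model.predict(data, batch_size=8).system_score

print("System A:", system_score(system_a))
print("System B:", system_score(system_b))

The system with the higher corpus-level score is the stronger one on this sample.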

Out-of-Scope Use

  • Other Language Pairs: This model is specifically fine-tuned for English-Georgian and may not perform well on other language pairs
  • Reference-Free Evaluation: The model requires reference translations
  • Document-Level: Optimized for sentence-level evaluation

Training Details

Training Data

  • Dataset: 5,000 English-Georgian pairs from corp.dict.ge
  • MT Systems: Translations from SMaLL-100, Google Translate, and Ucraft Translate
  • Scoring Method: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3)
  • Details: See Darsala/georgian_metric_evaluation

Training Configuration

regression_metric:
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.5e-05
    learning_rate: 1.5e-05
    loss: mse
    dropout: 0.1
    batch_size: 8
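
Assuming a standard unbabel-comet training setup, a configuration like the one above is saved as YAML and passed to the comet-train CLI; the config path below is illustrative:

comet-train --cfg configs/georgian_comet.yaml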

Training Procedure

  1. Base Model: Started from Unbabel/wmt22-comet-da checkpoint
  2. Knowledge Distillation: Used Claude Sonnet 4 scores as training targets
  3. Robustness: Added Gaussian noise to training scores to prevent overfitting (see the sketch after this list)
  4. Optimization: 8 epochs with early stopping (patience=4) on validation Kendall's tau
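
A minimal sketch of the label noising in step 3, assuming DA-style teacher scores on a 0-100 scale (the values and variable names are illustrative, not the actual training code):

import numpy as np

rng = np.random.default_rng(0)

# Illustrative teacher (Claude Sonnet 4) quality scores for four segments
teacher_scores = np.array([87.0, 92.5, 63.0, 78.25])

# Add Gaussian noise (sigma = 3) so the student does not overfit exact teacher values
noisy_targets = teacher_scores + rng.normal(loc=0.0, scale=3.0, size=teacher_scores.shape)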

Evaluation Results

Test Set Performance

Evaluated on 400 human-annotated English-Georgian translation pairs:

| Metric   | Score | p-value |
|----------|-------|---------|
| Pearson  | 0.878 | < 0.001 |
| Spearman | 0.796 | < 0.001 |
| Kendall  | 0.603 | < 0.001 |
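
Correlations of this kind can be computed with scipy.stats; the aligned score lists below are illustrative placeholders, not the actual test data:

from scipy.stats import pearsonr, spearmanr, kendalltau

# Hypothetical aligned lists: metric predictions vs. human judgments
metric_scores = [0.81, 0.64, 0.92, 0.55, 0.77]
human_scores = [80, 60, 95, 50, 70]

for name, fn in [("Pearson", pearsonr), ("Spearman", spearmanr), ("Kendall", kendalltau)]:
    stat, p = fn(metric_scores, human_scores)
    print(f"{name}: {stat:.3f} (p = {p:.3g})")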

Comparison with Other Metrics

| Metric              | Pearson | Spearman | Kendall |
|---------------------|---------|----------|---------|
| Georgian-COMET      | 0.878   | 0.796    | 0.603   |
| Base COMET          | 0.867   | 0.759    | 0.564   |
| LLM-Reference-Based | 0.852   | 0.798    | 0.660   |
| CHRF++              | 0.739   | 0.690    | 0.498   |
| TER                 | 0.466   | 0.443    | 0.311   |
| BLEU                | 0.413   | 0.497    | 0.344   |

Languages Covered

While the base model (XLM-R) covers 100+ languages, this fine-tuned version is specifically optimized for:

  • Source Language: English (en)
  • Target Language: Georgian (ka)

For other language pairs, we recommend using the base Unbabel/wmt22-comet-da model.

Limitations

  1. Language Specific: Optimized only for English→Georgian evaluation
  2. Domain: Training data primarily from corp.dict.ge (general/literary domain)
  3. Reference Required: Cannot perform reference-free evaluation
  4. Sentence Level: Not optimized for document-level evaluation

Citation

If you use this model, please cite:

@misc{georgian-comet-2025,
  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
  author={Darsalia, Luka and Bakhturidze, Ketevan and Sturua, Saba},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Darsala/georgian_comet}
}

@inproceedings{rei-etal-2022-comet,
  title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
  author = "Rei, Ricardo  and
    C. de Souza, Jos{\'e} G.  and
    Alves, Duarte  and
    Zerva, Chrysoula  and
    Farinha, Ana C  and
    Glushkova, Taisiya  and
    Lavie, Alon  and
    Coheur, Luisa  and
    Martins, Andr{\'e} F. T.",
  booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
  year = "2022",
  address = "Abu Dhabi, United Arab Emirates",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.wmt-1.52",
  pages = "578--585",
}
