---
language:
  - ka
  - en
license: apache-2.0
tags:
  - translation
  - evaluation
  - comet
  - mt-evaluation
  - georgian
metrics:
  - kendall_tau
  - spearman_correlation
  - pearson_correlation
model-index:
  - name: Georgian-COMET
    results:
      - task:
          type: translation-evaluation
          name: Machine Translation Evaluation
        dataset:
          name: Georgian MT Evaluation Dataset
          type: Darsala/georgian_metric_evaluation
        metrics:
          - type: pearson_correlation
            value: 0.878
            name: Pearson Correlation
          - type: spearman_correlation
            value: 0.796
            name: Spearman Correlation
          - type: kendall_tau
            value: 0.603
            name: Kendall's Tau
base_model: Unbabel/wmt22-comet-da
datasets:
  - Darsala/georgian_metric_evaluation
---

# Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation

This is a COMET evaluation model fine-tuned specifically for English-Georgian machine translation evaluation. It receives a triplet with (source sentence, translation, reference translation) and returns a score that reflects the quality of the translation compared to both source and reference.

## Model Description

Georgian-COMET is a fine-tuned version of Unbabel/wmt22-comet-da, optimized for evaluating English-to-Georgian translations through knowledge distillation from Claude Sonnet 4. It improves on the base model across all three correlation metrics reported below.

### Key Improvements over Base Model

| Metric   | Base COMET | Georgian-COMET | Improvement |
|----------|------------|----------------|-------------|
| Pearson  | 0.867      | 0.878          | +1.1%       |
| Spearman | 0.759      | 0.796          | +3.7%       |
| Kendall  | 0.564      | 0.603          | +3.9%       |

## Paper

## Repository

https://github.com/LukaDarsalia/nmt_metrics_research

## License

Apache-2.0

## Usage (unbabel-comet)

Using this model requires unbabel-comet to be installed:

```bash
pip install --upgrade pip  # ensures that pip is current
pip install unbabel-comet
```
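To confirm the installation (optional):

```bash
pip show unbabel-comet  # prints the installed version and location
```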

### Option 1: Direct Download from HuggingFace

```python
from comet import load_from_checkpoint
import requests
import os

# Download the model checkpoint
model_url = "https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt"
model_path = "georgian_comet.ckpt"

# Download if not already present (streamed, since the checkpoint is large)
if not os.path.exists(model_path):
    response = requests.get(model_url, stream=True)
    response.raise_for_status()
    with open(model_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=8192):
            f.write(chunk)

# Load the model
model = load_from_checkpoint(model_path)

# Prepare your data
data = [
    {
        "src": "The cat sat on the mat.",
        "mt": "კატა ზის ხალიჩაზე.",
        "ref": "კატა იჯდა ხალიჩაზე."
    },
    {
        "src": "Schools and kindergartens were opened.",
        "mt": "სკოლები და საბავშვო ბაღები გაიხსნა.",
        "ref": "გაიხსნა სკოლები და საბავშვო ბაღები."
    }
]

# Get predictions (set gpus=0 to run on CPU)
model_output = model.predict(data, batch_size=8, gpus=1)
print(model_output)
```
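The returned `Prediction` object exposes the per-segment scores as `model_output.scores` and their corpus-level average as `model_output.system_score` (attribute access is available in recent unbabel-comet releases).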

### Option 2: Using comet CLI

First download the model checkpoint:

```bash
wget https://huggingface.co/Darsala/georgian_comet/resolve/main/model.ckpt -O georgian_comet.ckpt
```

Then use it with comet CLI:

```bash
comet-score -s {source-inputs}.txt -t {translation-outputs}.txt -r {references}.txt --model georgian_comet.ckpt
```
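For example, if each file contains one segment per line (the filenames below are illustrative):

```bash
comet-score -s sources.en.txt -t hypotheses.ka.txt -r references.ka.txt --model georgian_comet.ckpt
```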

### Option 3: Integration with Evaluation Pipeline

```python
from comet import load_from_checkpoint
import pandas as pd

# Load model
model = load_from_checkpoint("georgian_comet.ckpt")

# Load your evaluation data
df = pd.read_csv("your_evaluation_data.csv")

# Prepare data in COMET format
data = [
    {
        "src": row["sourceText"],
        "mt": row["targetText"],
        "ref": row["referenceText"]
    }
    for _, row in df.iterrows()
]

# Get scores; system_score is the average of the per-segment scores
scores = model.predict(data, batch_size=16)
print(f"Average score: {scores.system_score:.3f}")
```
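To keep the per-segment scores alongside the inputs, you can write them back into the dataframe (a small sketch; the output filename is arbitrary):

```python
# Attach per-segment scores to the original rows and save the result
df["comet_score"] = scores.scores
df.to_csv("scored_evaluation_data.csv", index=False)
```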

## Intended Uses

This model is intended for English-Georgian MT evaluation.

Given a triplet of (source sentence in English, translation in Georgian, reference translation in Georgian), it outputs a single score between 0 and 1, where 1 represents a perfect translation.

### Primary Use Cases

1. **MT System Development**: Evaluate and compare different English-Georgian MT systems
2. **Quality Assurance**: Automated quality checks for Georgian translations
3. **Research**: Study MT evaluation for morphologically rich languages like Georgian
4. **Production Monitoring**: Track translation quality in production environments (see the sketch after this list)
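For the production-monitoring case, a minimal sketch could flag low-scoring segments for human review (the `flag_low_quality` helper and the 0.7 threshold are illustrative, not part of the released model):

```python
def flag_low_quality(samples, model, threshold=0.7):
    """Return (sample, score) pairs whose score falls below `threshold`.

    `samples` is a list of {"src", "mt", "ref"} dicts; the threshold is an
    illustrative cutoff that should be calibrated on your own data.
    """
    prediction = model.predict(samples, batch_size=16, gpus=0)
    return [
        (sample, score)
        for sample, score in zip(samples, prediction.scores)
        if score < threshold
    ]
```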

### Out-of-Scope Use

- **Other Language Pairs**: This model is specifically fine-tuned for English-Georgian and may not perform well on other language pairs
- **Reference-Free Evaluation**: The model requires reference translations
- **Document-Level**: Optimized for sentence-level evaluation

## Training Details

### Training Data

- **Dataset**: 5,000 English-Georgian pairs from corp.dict.ge
- **MT Systems**: Translations from SMaLL-100, Google Translate, and Ucraft Translate
- **Scoring Method**: Knowledge distillation from Claude Sonnet 4 with added Gaussian noise (σ=3); a sketch follows this list
- **Details**: See Darsala/georgian_metric_evaluation
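The noise injection mentioned above amounts to perturbing each distillation target before training. A minimal sketch, assuming the Claude-derived scores sit on a 0-100 scale (the clipping step is an added assumption):

```python
import numpy as np

rng = np.random.default_rng(42)

def add_label_noise(scores, sigma=3.0):
    # Perturb each distillation target with Gaussian noise (sigma=3 as in training)
    noisy = np.asarray(scores, dtype=float) + rng.normal(0.0, sigma, size=len(scores))
    # Keep targets within the assumed 0-100 range
    return np.clip(noisy, 0.0, 100.0)
```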

### Training Configuration

```yaml
regression_metric:
  init_args:
    nr_frozen_epochs: 0.3
    keep_embeddings_frozen: True
    optimizer: AdamW
    encoder_learning_rate: 1.5e-05
    learning_rate: 1.5e-05
    loss: mse
    dropout: 0.1
    batch_size: 8
```
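A configuration like this is typically passed to the `comet-train` CLI that ships with unbabel-comet; the config path below is illustrative:

```bash
comet-train --cfg configs/georgian_comet.yaml
```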

### Training Procedure

1. **Base Model**: Started from the Unbabel/wmt22-comet-da checkpoint
2. **Knowledge Distillation**: Used Claude Sonnet 4 scores as training targets
3. **Robustness**: Added Gaussian noise to training scores to prevent overfitting
4. **Optimization**: 8 epochs with early stopping (patience=4) on validation Kendall's tau; see the sketch below
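COMET trains on top of PyTorch Lightning, so the early stopping described above corresponds to a callback along these lines (a sketch; the monitored metric name `val_kendall` is hypothetical):

```python
from pytorch_lightning.callbacks import EarlyStopping

# Stop training after 4 epochs without improvement in validation Kendall's tau
early_stopping = EarlyStopping(monitor="val_kendall", patience=4, mode="max")
```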

## Evaluation Results

### Test Set Performance

Evaluated on 400 human-annotated English-Georgian translation pairs:

| Metric   | Score | p-value |
|----------|-------|---------|
| Pearson  | 0.878 | < 0.001 |
| Spearman | 0.796 | < 0.001 |
| Kendall  | 0.603 | < 0.001 |

### Comparison with Other Metrics

| Metric              | Pearson | Spearman | Kendall |
|---------------------|---------|----------|---------|
| Georgian-COMET      | 0.878   | 0.796    | 0.603   |
| Base COMET          | 0.867   | 0.759    | 0.564   |
| LLM-Reference-Based | 0.852   | 0.798    | 0.660   |
| CHRF++              | 0.739   | 0.690    | 0.498   |
| TER                 | 0.466   | 0.443    | 0.311   |
| BLEU                | 0.413   | 0.497    | 0.344   |

## Languages Covered

While the base model (XLM-R) covers 100+ languages, this fine-tuned version is specifically optimized for:

- **Source Language**: English (en)
- **Target Language**: Georgian (ka)

For other language pairs, we recommend using the base Unbabel/wmt22-comet-da model.

## Limitations

1. **Language Specific**: Optimized only for English→Georgian evaluation
2. **Domain**: Training data primarily from corp.dict.ge (general/literary domain)
3. **Reference Required**: Cannot perform reference-free evaluation
4. **Sentence Level**: Not optimized for document-level evaluation

## Citation

If you use this model, please cite:

```bibtex
@misc{georgian-comet-2025,
  title={Georgian-COMET: Fine-tuned COMET for English-Georgian MT Evaluation},
  author={Darsalia, Luka and Bakhturidze, Ketevan and Sturua, Saba},
  year={2025},
  publisher={HuggingFace},
  url={https://huggingface.co/Darsala/georgian_comet}
}

@inproceedings{rei-etal-2022-comet,
  title = "{COMET}-22: Unbabel-{IST} 2022 Submission for the Metrics Shared Task",
  author = "Rei, Ricardo  and
    C. de Souza, Jos{\'e} G.  and
    Alves, Duarte  and
    Zerva, Chrysoula  and
    Farinha, Ana C  and
    Glushkova, Taisiya  and
    Lavie, Alon  and
    Coheur, Luisa  and
    Martins, Andr{\'e} F. T.",
  booktitle = "Proceedings of the Seventh Conference on Machine Translation (WMT)",
  year = "2022",
  address = "Abu Dhabi, United Arab Emirates",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.wmt-1.52",
  pages = "578--585",
}
```

## Acknowledgments