mT5-Small (OPUS-100 English→Maltese)
This model is a fine-tuned version of google/mt5-small on the MLRS/OPUS-MT-EN-Fixed dataset. It achieves the following results on the test set:
- Loss: 0.5395
- BLEU:
  - Score: 0.5175
  - Brevity Penalty: 0.9870
  - Length Ratio: 0.9871
  - Translation Length: 41331
  - Reference Length: 41873
- ChrF:
  - Score: 75.9261
  - Char Order: 6
  - Word Order: 0
  - Beta: 2
- Gen Len: 51.54
Intended uses & limitations
The model is fine-tuned for English→Maltese translation and should be used for the same or a similar task. Any limitations present in the base model (google/mt5-small) are inherited.
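As a minimal usage sketch (assuming the standard transformers seq2seq API; the example sentence and generation length are illustrative choices, not values from the training setup):

```python
# Sketch: English→Maltese translation with the fine-tuned checkpoint.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "MLRS/mt5-small_opus100-eng-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Whether a task prefix is expected depends on the (unreleased) training script;
# plain input text is assumed here.
inputs = tokenizer("The weather is beautiful today.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)   # illustrative length limit
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```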
Training procedure
The model was fine-tuned using a customised script.
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adafactor (no additional optimizer arguments)
- lr_scheduler_type: linear
- num_epochs: 10.0
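The fine-tuning script itself is not included here; the sketch below only approximates the hyperparameters listed above using the transformers Seq2SeqTrainingArguments API. The output directory, evaluation/save strategy, and generation settings are placeholder assumptions and the customised script may differ.

```python
# Sketch: Seq2SeqTrainingArguments mirroring the hyperparameters listed above.
# output_dir and eval_strategy are assumptions, not taken from the original script.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_opus100-eng-mlt",   # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,            # effective train batch size of 128
    num_train_epochs=10.0,
    lr_scheduler_type="linear",
    optim="adafactor",
    seed=42,
    eval_strategy="epoch",                    # assumption: per-epoch validation, as in the results table
    predict_with_generate=True,
)
```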
Training results
Training Loss | Epoch | Step | Validation Loss | BLEU | BLEU Brevity Penalty | BLEU Length Ratio | BLEU Translation Length | BLEU Reference Length | ChrF Score | ChrF Char Order | ChrF Word Order | ChrF Beta | Gen Len |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.7973 | 1.0 | 7813 | 0.7208 | 0.4470 | 0.9888 | 0.9889 | 43489 | 43979 | 71.3941 | 6 | 0 | 2 | 54.3435 |
0.6534 | 2.0 | 15626 | 0.6406 | 0.4712 | 0.9931 | 0.9931 | 43675 | 43979 | 72.7865 | 6 | 0 | 2 | 54.1575 |
0.5785 | 3.0 | 23439 | 0.6027 | 0.4804 | 0.9937 | 0.9937 | 43703 | 43979 | 73.6939 | 6 | 0 | 2 | 54.501 |
0.5336 | 4.0 | 31252 | 0.5779 | 0.4900 | 0.9937 | 0.9937 | 43704 | 43979 | 74.1543 | 6 | 0 | 2 | 54.535 |
0.5034 | 5.0 | 39065 | 0.5617 | 0.4995 | 1.0 | 1.0004 | 43998 | 43979 | 74.7266 | 6 | 0 | 2 | 54.694 |
0.4797 | 6.0 | 46878 | 0.5501 | 0.4985 | 0.9897 | 0.9898 | 43530 | 43979 | 74.7707 | 6 | 0 | 2 | 54.3215 |
0.4576 | 7.0 | 54691 | 0.5458 | 0.5050 | 0.9921 | 0.9921 | 43632 | 43979 | 75.0066 | 6 | 0 | 2 | 54.259 |
0.4424 | 8.0 | 62504 | 0.5369 | 0.5062 | 0.9914 | 0.9915 | 43604 | 43979 | 75.0734 | 6 | 0 | 2 | 54.286 |
0.4287 | 9.0 | 70317 | 0.5358 | 0.5107 | 0.9875 | 0.9876 | 43434 | 43979 | 75.3841 | 6 | 0 | 2 | 54.1655 |
0.417 | 9.9988 | 78120 | 0.5350 | 0.5100 | 0.9868 | 0.9869 | 43404 | 43979 | 75.4033 | 6 | 0 | 2 | 54.058 |
Framework versions
- Transformers 4.48.1
- Pytorch 2.4.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:
@inproceedings{micallef-borg-2025-melabenchv1,
title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
author = "Micallef, Kurt and
Borg, Claudia",
editor = "Che, Wanxiang and
Nabende, Joyce and
Shutova, Ekaterina and
Pilehvar, Mohammad Taher",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
month = jul,
year = "2025",
address = "Vienna, Austria",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2025.findings-acl.1053/",
doi = "10.18653/v1/2025.findings-acl.1053",
pages = "20505--20527",
ISBN = "979-8-89176-256-5",
}
Evaluation results
- BLEU on MLRS/OPUS-MT-EN-Fixed (MELABench Leaderboard): 51.000
- ChrF on MLRS/OPUS-MT-EN-Fixed (MELABench Leaderboard): 75.400