
BERTu (Maltese News Categories)

This model is a fine-tuned version of MLRS/BERTu on the MLRS/maltese_news_categories dataset. It achieves the following results on the test set:

  • Loss: 0.1514
  • F1: 0.6052

Intended uses & limitations

The model is fine-tuned for a specific task, so it should only be used for that task or a closely related one. Any limitations present in the base model are inherited.
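For orientation, the snippet below is a minimal, hypothetical usage sketch based on the standard Transformers text-classification pipeline. The Maltese example sentence is illustrative only, and because the repository is gated you may first need to accept the access conditions and authenticate (for example with `huggingface-cli login`).

```python
from transformers import pipeline

# Hypothetical usage sketch: load the fine-tuned checkpoint with the standard
# text-classification pipeline. Label names come from the model configuration.
classifier = pipeline(
    "text-classification",
    model="MLRS/BERTu_maltese-news-categories",
    top_k=None,  # return scores for every category label
)

# Illustrative Maltese sentence (not from the training data).
print(classifier("Il-gvern ħabbar baġit ġdid għas-sena d-dieħla."))
```

Returning the scores for all labels (top_k=None) lets you inspect or threshold the per-category probabilities rather than taking only the highest-scoring one.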

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 2
  • optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: inverse_sqrt
  • lr_scheduler_warmup_ratio: 0.005
  • num_epochs: 200.0
  • early_stopping_patience: 20
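
The sketch below shows, under stated assumptions, how these hyperparameters could map onto a Transformers TrainingArguments configuration with an EarlyStoppingCallback. The output directory, the per-epoch evaluation/saving strategies, and the choice of F1 as the model-selection metric are assumptions; the actual customised script may differ.

```python
from transformers import TrainingArguments, EarlyStoppingCallback

# Hedged reconstruction of the hyperparameters reported on this card;
# the customised fine-tuning script may set additional options.
training_args = TrainingArguments(
    output_dir="BERTu_maltese-news-categories",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=2,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="inverse_sqrt",
    warmup_ratio=0.005,
    num_train_epochs=200.0,
    eval_strategy="epoch",        # assumption: per-epoch evaluation, consistent with the results table
    save_strategy="epoch",        # assumption: needed for load_best_model_at_end
    load_best_model_at_end=True,  # required for early stopping
    metric_for_best_model="f1",   # assumption: F1 is the selection metric reported above
)

# Early stopping with the reported patience of 20 evaluation rounds.
early_stopping = EarlyStoppingCallback(early_stopping_patience=20)
```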

Training results

| Training Loss | Epoch | Step  | Validation Loss | F1     |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 1.0   | 337   | 0.1525          | 0.1473 |
| 0.2654        | 2.0   | 674   | 0.1102          | 0.2997 |
| 0.1081        | 3.0   | 1011  | 0.0984          | 0.3558 |
| 0.1081        | 4.0   | 1348  | 0.0929          | 0.3846 |
| 0.0801        | 5.0   | 1685  | 0.0889          | 0.3935 |
| 0.0636        | 6.0   | 2022  | 0.0915          | 0.4238 |
| 0.0636        | 7.0   | 2359  | 0.0886          | 0.4707 |
| 0.0506        | 8.0   | 2696  | 0.0893          | 0.5307 |
| 0.0422        | 9.0   | 3033  | 0.0894          | 0.5242 |
| 0.0422        | 10.0  | 3370  | 0.0903          | 0.5166 |
| 0.0349        | 11.0  | 3707  | 0.0933          | 0.5229 |
| 0.0297        | 12.0  | 4044  | 0.0924          | 0.5512 |
| 0.0297        | 13.0  | 4381  | 0.0941          | 0.5428 |
| 0.0258        | 14.0  | 4718  | 0.0962          | 0.5798 |
| 0.0223        | 15.0  | 5055  | 0.0965          | 0.5618 |
| 0.0223        | 16.0  | 5392  | 0.0973          | 0.5852 |
| 0.0193        | 17.0  | 5729  | 0.0997          | 0.5900 |
| 0.0165        | 18.0  | 6066  | 0.1006          | 0.5874 |
| 0.0165        | 19.0  | 6403  | 0.1011          | 0.5824 |
| 0.015         | 20.0  | 6740  | 0.1037          | 0.5897 |
| 0.013         | 21.0  | 7077  | 0.1026          | 0.5919 |
| 0.013         | 22.0  | 7414  | 0.1040          | 0.5985 |
| 0.0115        | 23.0  | 7751  | 0.1053          | 0.5999 |
| 0.0102        | 24.0  | 8088  | 0.1059          | 0.5912 |
| 0.0102        | 25.0  | 8425  | 0.1069          | 0.6226 |
| 0.009         | 26.0  | 8762  | 0.1081          | 0.6065 |
| 0.0082        | 27.0  | 9099  | 0.1088          | 0.6001 |
| 0.0082        | 28.0  | 9436  | 0.1105          | 0.6129 |
| 0.0069        | 29.0  | 9773  | 0.1103          | 0.6199 |
| 0.0065        | 30.0  | 10110 | 0.1133          | 0.6117 |
| 0.0065        | 31.0  | 10447 | 0.1137          | 0.6100 |
| 0.0059        | 32.0  | 10784 | 0.1141          | 0.6053 |
| 0.005         | 33.0  | 11121 | 0.1162          | 0.6175 |
| 0.005         | 34.0  | 11458 | 0.1161          | 0.6095 |
| 0.0048        | 35.0  | 11795 | 0.1182          | 0.6111 |
| 0.0042        | 36.0  | 12132 | 0.1195          | 0.6027 |
| 0.0042        | 37.0  | 12469 | 0.1206          | 0.6080 |
| 0.004         | 38.0  | 12806 | 0.1212          | 0.6075 |
| 0.0037        | 39.0  | 13143 | 0.1220          | 0.6260 |
| 0.0037        | 40.0  | 13480 | 0.1265          | 0.6014 |
| 0.0033        | 41.0  | 13817 | 0.1246          | 0.6057 |
| 0.0031        | 42.0  | 14154 | 0.1232          | 0.6207 |
| 0.0031        | 43.0  | 14491 | 0.1261          | 0.6253 |
| 0.0029        | 44.0  | 14828 | 0.1256          | 0.6100 |
| 0.0027        | 45.0  | 15165 | 0.1261          | 0.6194 |
| 0.0025        | 46.0  | 15502 | 0.1272          | 0.6207 |
| 0.0025        | 47.0  | 15839 | 0.1279          | 0.6195 |
| 0.0023        | 48.0  | 16176 | 0.1293          | 0.6199 |
| 0.0021        | 49.0  | 16513 | 0.1315          | 0.6085 |
| 0.0021        | 50.0  | 16850 | 0.1315          | 0.6186 |
| 0.002         | 51.0  | 17187 | 0.1299          | 0.6117 |
| 0.002         | 52.0  | 17524 | 0.1312          | 0.6320 |
| 0.002         | 53.0  | 17861 | 0.1337          | 0.6232 |
| 0.0018        | 54.0  | 18198 | 0.1344          | 0.6135 |
| 0.0017        | 55.0  | 18535 | 0.1339          | 0.6201 |
| 0.0017        | 56.0  | 18872 | 0.1370          | 0.6221 |
| 0.0017        | 57.0  | 19209 | 0.1334          | 0.6133 |
| 0.0015        | 58.0  | 19546 | 0.1352          | 0.6199 |
| 0.0015        | 59.0  | 19883 | 0.1370          | 0.6189 |
| 0.0013        | 60.0  | 20220 | 0.1391          | 0.6155 |
| 0.0013        | 61.0  | 20557 | 0.1409          | 0.6143 |
| 0.0013        | 62.0  | 20894 | 0.1386          | 0.6218 |
| 0.0012        | 63.0  | 21231 | 0.1406          | 0.6225 |
| 0.0012        | 64.0  | 21568 | 0.1400          | 0.6134 |
| 0.0012        | 65.0  | 21905 | 0.1421          | 0.6221 |
| 0.0011        | 66.0  | 22242 | 0.1425          | 0.6224 |
| 0.0011        | 67.0  | 22579 | 0.1433          | 0.6235 |
| 0.0011        | 68.0  | 22916 | 0.1440          | 0.6294 |
| 0.0011        | 69.0  | 23253 | 0.1440          | 0.6230 |
| 0.001         | 70.0  | 23590 | 0.1443          | 0.6285 |
| 0.001         | 71.0  | 23927 | 0.1462          | 0.6279 |
| 0.0009        | 72.0  | 24264 | 0.1466          | 0.6281 |

Framework versions

  • Transformers 4.51.1
  • Pytorch 2.7.0+cu126
  • Datasets 3.2.0
  • Tokenizers 0.21.1

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}