BERTu (Maltese News Categories)

This model is a fine-tuned version of MLRS/BERTu on the MLRS/maltese_news_categories dataset. It achieves the following results on the test set:

Loss: 0.1514
F1: 0.6052

Intended uses & limitations

The model is fine-tuned for Maltese news category classification and should be used only for that task or a closely related one. It inherits any limitations of the base model, MLRS/BERTu.

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 32
eval_batch_size: 32
seed: 2
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: inverse_sqrt
lr_scheduler_warmup_ratio: 0.005
num_epochs: 200.0
early_stopping_patience: 20
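
To make the interaction between learning_rate, lr_scheduler_type and lr_scheduler_warmup_ratio concrete, here is a minimal sketch of one common form of the inverse square root schedule: linear warmup to the peak rate, then decay proportional to 1/sqrt(step). The exact variant used by the training script is not stated in this card, so the formula is an assumption. The function name inverse_sqrt_lr, the rounding of the warmup length, and the figure of 337 steps per epoch (read off the step column of the training results) are likewise illustrative assumptions.

```python
import math

PEAK_LR = 2e-5         # learning_rate from the list above
WARMUP_RATIO = 0.005   # lr_scheduler_warmup_ratio from the list above
MAX_EPOCHS = 200       # num_epochs from the list above
STEPS_PER_EPOCH = 337  # assumption: taken from the step column of the results table

# Assumption: the warmup length is the ratio applied to the maximum step budget.
WARMUP_STEPS = round(MAX_EPOCHS * STEPS_PER_EPOCH * WARMUP_RATIO)


def inverse_sqrt_lr(step, peak_lr=PEAK_LR, warmup_steps=WARMUP_STEPS):
    """Linear warmup to peak_lr, then decay proportional to 1/sqrt(step)."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # With no warmup, fall back to a fixed timescale so the decay is well defined.
    timescale = warmup_steps or 10_000
    shift = timescale - warmup_steps
    return peak_lr / math.sqrt((step + shift) / timescale)


if __name__ == "__main__":
    # Print the learning rate at a few points along the run.
    for step in (0, 100, WARMUP_STEPS, 4 * WARMUP_STEPS, 24264):
        print(f"step {step:>6}: lr = {inverse_sqrt_lr(step):.3e}")
```

Under these assumptions the warmup covers roughly the first epoch, and the rate halves each time the step count quadruples after that.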

Training results

Training Loss	Epoch	Step	Validation Loss	F1
No log	1.0	337	0.1525	0.1473
0.2654	2.0	674	0.1102	0.2997
0.1081	3.0	1011	0.0984	0.3558
0.1081	4.0	1348	0.0929	0.3846
0.0801	5.0	1685	0.0889	0.3935
0.0636	6.0	2022	0.0915	0.4238
0.0636	7.0	2359	0.0886	0.4707
0.0506	8.0	2696	0.0893	0.5307
0.0422	9.0	3033	0.0894	0.5242
0.0422	10.0	3370	0.0903	0.5166
0.0349	11.0	3707	0.0933	0.5229
0.0297	12.0	4044	0.0924	0.5512
0.0297	13.0	4381	0.0941	0.5428
0.0258	14.0	4718	0.0962	0.5798
0.0223	15.0	5055	0.0965	0.5618
0.0223	16.0	5392	0.0973	0.5852
0.0193	17.0	5729	0.0997	0.5900
0.0165	18.0	6066	0.1006	0.5874
0.0165	19.0	6403	0.1011	0.5824
0.015	20.0	6740	0.1037	0.5897
0.013	21.0	7077	0.1026	0.5919
0.013	22.0	7414	0.1040	0.5985
0.0115	23.0	7751	0.1053	0.5999
0.0102	24.0	8088	0.1059	0.5912
0.0102	25.0	8425	0.1069	0.6226
0.009	26.0	8762	0.1081	0.6065
0.0082	27.0	9099	0.1088	0.6001
0.0082	28.0	9436	0.1105	0.6129
0.0069	29.0	9773	0.1103	0.6199
0.0065	30.0	10110	0.1133	0.6117
0.0065	31.0	10447	0.1137	0.6100
0.0059	32.0	10784	0.1141	0.6053
0.005	33.0	11121	0.1162	0.6175
0.005	34.0	11458	0.1161	0.6095
0.0048	35.0	11795	0.1182	0.6111
0.0042	36.0	12132	0.1195	0.6027
0.0042	37.0	12469	0.1206	0.6080
0.004	38.0	12806	0.1212	0.6075
0.0037	39.0	13143	0.1220	0.6260
0.0037	40.0	13480	0.1265	0.6014
0.0033	41.0	13817	0.1246	0.6057
0.0031	42.0	14154	0.1232	0.6207
0.0031	43.0	14491	0.1261	0.6253
0.0029	44.0	14828	0.1256	0.6100
0.0027	45.0	15165	0.1261	0.6194
0.0025	46.0	15502	0.1272	0.6207
0.0025	47.0	15839	0.1279	0.6195
0.0023	48.0	16176	0.1293	0.6199
0.0021	49.0	16513	0.1315	0.6085
0.0021	50.0	16850	0.1315	0.6186
0.002	51.0	17187	0.1299	0.6117
0.002	52.0	17524	0.1312	0.6320
0.002	53.0	17861	0.1337	0.6232
0.0018	54.0	18198	0.1344	0.6135
0.0017	55.0	18535	0.1339	0.6201
0.0017	56.0	18872	0.1370	0.6221
0.0017	57.0	19209	0.1334	0.6133
0.0015	58.0	19546	0.1352	0.6199
0.0015	59.0	19883	0.1370	0.6189
0.0013	60.0	20220	0.1391	0.6155
0.0013	61.0	20557	0.1409	0.6143
0.0013	62.0	20894	0.1386	0.6218
0.0012	63.0	21231	0.1406	0.6225
0.0012	64.0	21568	0.1400	0.6134
0.0012	65.0	21905	0.1421	0.6221
0.0011	66.0	22242	0.1425	0.6224
0.0011	67.0	22579	0.1433	0.6235
0.0011	68.0	22916	0.1440	0.6294
0.0011	69.0	23253	0.1440	0.6230
0.001	70.0	23590	0.1443	0.6285
0.001	71.0	23927	0.1462	0.6279
0.0009	72.0	24264	0.1466	0.6281
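
The table stops at epoch 72 even though num_epochs is 200, which is consistent with early stopping. The sketch below replays the validation F1 column against a patience of 20. It assumes that early stopping monitored validation F1 and that any improvement, however small, resets the counter; neither detail is stated in this card. The helper name replay_early_stopping is hypothetical.

```python
# Validation F1 per epoch, copied from the table above (epochs 1 to 72).
VAL_F1 = [
    0.1473, 0.2997, 0.3558, 0.3846, 0.3935, 0.4238, 0.4707, 0.5307,
    0.5242, 0.5166, 0.5229, 0.5512, 0.5428, 0.5798, 0.5618, 0.5852,
    0.5900, 0.5874, 0.5824, 0.5897, 0.5919, 0.5985, 0.5999, 0.5912,
    0.6226, 0.6065, 0.6001, 0.6129, 0.6199, 0.6117, 0.6100, 0.6053,
    0.6175, 0.6095, 0.6111, 0.6027, 0.6080, 0.6075, 0.6260, 0.6014,
    0.6057, 0.6207, 0.6253, 0.6100, 0.6194, 0.6207, 0.6195, 0.6199,
    0.6085, 0.6186, 0.6117, 0.6320, 0.6232, 0.6135, 0.6201, 0.6221,
    0.6133, 0.6199, 0.6189, 0.6155, 0.6143, 0.6218, 0.6225, 0.6134,
    0.6221, 0.6224, 0.6235, 0.6294, 0.6230, 0.6285, 0.6279, 0.6281,
]


def replay_early_stopping(scores, patience):
    """Return (best_epoch, stop_epoch), both 1-indexed.

    stop_epoch is None if the patience is never exhausted.
    """
    best_score = float("-inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, score in enumerate(scores, start=1):
        if score > best_score:
            best_score, best_epoch = score, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch
    return best_epoch, None


if __name__ == "__main__":
    best_epoch, stop_epoch = replay_early_stopping(VAL_F1, patience=20)
    print(f"best epoch: {best_epoch}, stopped after epoch: {stop_epoch}")
```

With these inputs the replay finds the best validation F1 (0.6320) at epoch 52 and exhausts the patience at epoch 72, matching the last row of the table. Validation loss, by contrast, reaches its minimum at epoch 7, so a loss-based criterion with the same patience would have stopped much earlier.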

Framework versions

Transformers 4.51.1
Pytorch 2.7.0+cu126
Datasets 3.2.0
Tokenizers 0.21.1

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.

Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}
