mBERTu
A multilingual model for Maltese, pre-trained on Korpus Malti v4.0 using multilingual BERT (mBERT) as the initial checkpoint.
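As a sketch of how the model might be used, the snippet below loads it through the Hugging Face `transformers` `fill-mask` pipeline. It assumes the checkpoint is published on the Hub under the `MLRS/mBERTu` identifier; the Maltese example sentence is purely illustrative.

```python
# Hedged sketch: masked-token prediction with mBERTu via transformers.
# Assumes the Hub model ID "MLRS/mBERTu"; downloads weights on first run.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="MLRS/mBERTu")

# "Malta hija [MASK] sabiħa." — an illustrative Maltese sentence;
# [MASK] is the standard BERT mask token inherited from mBERT.
predictions = fill_mask("Malta hija [MASK] sabiħa.")

for p in predictions:
    # Each prediction carries the filled token and its probability.
    print(p["token_str"], round(p["score"], 4))
```

The same checkpoint can also be loaded with `AutoTokenizer`/`AutoModel` for fine-tuning on downstream tasks such as the parsing, tagging, NER, and sentiment tasks listed under Evaluation results.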
License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.
Citation
This work was first presented in Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and BERT Models for Maltese. Cite it as follows:
@inproceedings{BERTu,
title = "Pre-training Data Quality and Quantity for a Low-Resource Language: New Corpus and {BERT} Models for {M}altese",
author = "Micallef, Kurt and
Gatt, Albert and
Tanti, Marc and
van der Plas, Lonneke and
Borg, Claudia",
booktitle = "Proceedings of the Third Workshop on Deep Learning for Low-Resource Natural Language Processing",
month = jul,
year = "2022",
address = "Hybrid",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.deeplo-1.10",
doi = "10.18653/v1/2022.deeplo-1.10",
pages = "90--101",
}
Evaluation results
- Unlabelled Attachment Score on the Maltese Universal Dependencies Treebank (MUDT): 92.10 (self-reported)
- Labelled Attachment Score on the Maltese Universal Dependencies Treebank (MUDT): 87.87 (self-reported)
- UPOS Accuracy on the MLRS POS dataset: 98.66 (self-reported)
- XPOS Accuracy on the MLRS POS dataset: 98.58 (self-reported)
- Span-based F1 on WikiAnn (Maltese): 86.60 (self-reported)
- Macro-averaged F1 on the Maltese Sentiment Analysis Dataset: 76.79 (self-reported)