PortBERT: Navigating the Depths of Portuguese Language Models
PortBERT is a family of RoBERTa-based language models pre-trained from scratch on the Portuguese portion of CulturaX (the deduplicated mC4 and OSCAR 23 corpora). The models are designed to offer strong downstream performance on Portuguese NLP tasks while providing insight into the cost-performance trade-offs of training on different hardware backends (GPUs vs. TPUs).
We release two variants:
- PortBERT-base: 126M parameters, trained on 8× A40 GPUs (fp32)
- PortBERT-large: 357M parameters, trained on a TPUv4-128 pod (fp32)
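A minimal usage sketch with HuggingFace Transformers follows; the model identifier is a placeholder (the exact hub repository name is not stated here), and the mask token is assumed to follow the RoBERTa convention.

```python
# Minimal fill-mask sketch; "your-org/PortBERT-base" is a hypothetical model id.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="your-org/PortBERT-base")
print(fill_mask("Lisboa é a <mask> de Portugal."))  # <mask> per RoBERTa convention
```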
Model Details
Detail | PortBERT-base | PortBERT-large |
---|---|---|
Architecture | RoBERTa-base | RoBERTa-large |
Parameters | ~126M | ~357M |
Tokenizer | GPT-2 style (52k vocab) | Same |
Pretraining corpus | deduplicated mC4 and OSCAR 23 from CulturaX | Same |
Objective | Masked Language Modeling | Same |
Training time | ~27 days on 8× A40 | ~6.2 days on TPUv4-128 pod |
Precision | fp32 | fp32 |
Framework | fairseq | fairseq |
Downstream Evaluation (ExtraGLUE)
We evaluate PortBERT on ExtraGLUE, a Portuguese adaptation of the GLUE benchmark. Fine-tuning was conducted using HuggingFace Transformers, with NNI-based grid search over batch size and learning rate (28 configurations per task). Each task was fine-tuned for up to 10 epochs. Metrics were computed on validation sets due to the lack of held-out test sets.
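The sketch below illustrates one point of that batch-size/learning-rate grid with the Transformers `Trainer`. The model identifier is a placeholder, the English GLUE MRPC dataset stands in for the corresponding ExtraGLUE task (whose hub identifier is not given here), and the hyperparameter values are illustrative rather than the selected ones.

```python
# One hypothetical grid point of the fine-tuning setup described above.
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_ID = "your-org/PortBERT-base"  # placeholder, not a confirmed hub id

# English GLUE MRPC as a stand-in for its ExtraGLUE counterpart.
raw = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

encoded = raw.map(tokenize, batched=True)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Illustrative batch size and learning rate; the card reports 28 such
# configurations per task, each fine-tuned for up to 10 epochs.
args = TrainingArguments(
    output_dir="portbert-mrpc",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=10,
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via DataCollatorWithPadding
)
trainer.train()
print(trainer.evaluate())  # validation metrics, as in the protocol described above
```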
The AVG score is the unweighted mean of the following metrics:
- STSB Spearman
- STSB Pearson
- RTE Accuracy
- WNLI Accuracy
- MRPC Accuracy
- MRPC F1
Evaluation Results
Legend: Bold = best, italic = second-best per model size.
Model | STSB_Sp | STSB_Pe | STSB_Mean | RTE_Acc | WNLI_Acc | MRPC_Acc | MRPC_F1 | AVG |
---|---|---|---|---|---|---|---|---|
Large models | | | | | | | | |
XLM-RoBERTa_large | 90.00 | 90.27 | 90.14 | 82.31 | 57.75 | 90.44 | 93.31 | 84.01 |
EuroBERT-610m | 88.46 | 88.59 | 88.52 | 78.34 | 59.15 | 91.91 | 94.20 | 83.44 |
PortBERT_large | 88.53 | 88.68 | 88.60 | 72.56 | 61.97 | 89.46 | 92.39 | 82.26 |
BERTimbau_large | 89.40 | 89.61 | 89.50 | 75.45 | 59.15 | 88.24 | 91.55 | 82.23 |
Base models | | | | | | | | |
RoBERTaLexPT_base | 86.68 | 86.86 | 86.77 | 69.31 | 59.15 | 89.46 | 92.34 | 80.63 |
PortBERT_base | 87.39 | 87.65 | 87.52 | 68.95 | 60.56 | 87.75 | 91.13 | 80.57 |
RoBERTaCrawlPT_base | 87.34 | 87.45 | 87.39 | 72.56 | 56.34 | 87.99 | 91.20 | 80.48 |
BERTimbau_base | 88.39 | 88.60 | 88.50 | 70.40 | 56.34 | 87.25 | 90.97 | 80.32 |
XLM-RoBERTa_base | 85.75 | 86.09 | 85.92 | 68.23 | 60.56 | 87.75 | 91.32 | 79.95 |
EuroBERT-210m | 86.54 | 86.62 | 86.58 | 65.70 | 57.75 | 87.25 | 91.00 | 79.14 |
AlBERTina 100M PTPT | 86.52 | 86.51 | 86.52 | 70.04 | 56.34 | 85.05 | 89.57 | 79.01 |
AlBERTina 100M PTBR | 85.97 | 85.99 | 85.98 | 68.59 | 56.34 | 85.78 | 89.82 | 78.75 |
AiBERTa | 83.56 | 83.73 | 83.65 | 64.98 | 56.34 | 82.11 | 86.99 | 76.29 |
roBERTa PT | 48.06 | 48.51 | 48.29 | 56.68 | 59.15 | 72.06 | 81.79 | 61.04 |
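The AVG column can be recomputed directly from the six per-task metrics; the sketch below uses PortBERT_large's rounded scores from the table above, so the result may differ from the reported value by a rounding hair.

```python
# Recomputing AVG as the plain mean of the six metrics listed earlier,
# using PortBERT_large's (already rounded) scores from the results table.
metrics = {
    "STSB_Spearman": 88.53,
    "STSB_Pearson": 88.68,
    "RTE_Accuracy": 72.56,
    "WNLI_Accuracy": 61.97,
    "MRPC_Accuracy": 89.46,
    "MRPC_F1": 92.39,
}
avg = sum(metrics.values()) / len(metrics)
print(f"AVG = {avg:.2f}")  # close to the reported 82.26; inputs are rounded
```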
Fairseq Checkpoint
Get the fairseq checkpoint here.
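A hedged loading sketch with fairseq is given below, assuming the checkpoint follows the standard fairseq RoBERTa layout (a model.pt alongside the dictionary and GPT-2 BPE files in one directory); the local path is a placeholder for wherever the download is unpacked.

```python
# Loading the released checkpoint with fairseq's RoBERTa hub interface.
from fairseq.models.roberta import RobertaModel

portbert = RobertaModel.from_pretrained(
    "path/to/portbert-checkpoint",  # placeholder: directory with the downloaded files
    checkpoint_file="model.pt",
)
portbert.eval()

tokens = portbert.encode("Lisboa é a capital de Portugal.")
features = portbert.extract_features(tokens)  # shape: (1, seq_len, hidden_size)
print(features.shape)
```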
License
MIT License