PortBERT: Navigating the Depths of Portuguese Language Models

PortBERT is a family of RoBERTa-based language models pre-trained from scratch on the Portuguese portion of CulturaX (the deduplicated mC4 and OSCAR 23 corpora). The models are designed to offer strong downstream performance on Portuguese NLP tasks while providing insight into the cost-performance tradeoffs of training across hardware backends.

We release two variants:

  • PortBERT-base: 126M parameters, trained on 8× A40 GPUs (fp32)
  • PortBERT-large: 357M parameters, trained on a TPUv4-128 pod (fp32)
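
For quick experimentation, the models can be loaded with Hugging Face Transformers. The snippet below is a minimal sketch that assumes the base checkpoint is published under the repository id `PortBERT/PortBERT_base`; the repository id and the example sentence are illustrative.

```python
# Minimal sketch: masked-token prediction with PortBERT-base.
# The repository id "PortBERT/PortBERT_base" is an assumption; adjust to the published name.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="PortBERT/PortBERT_base")

# RoBERTa-style models use "<mask>" as the mask token.
for prediction in fill_mask("Lisboa é a <mask> de Portugal."):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```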

Model Details

| Detail | PortBERT-base | PortBERT-large |
|---|---|---|
| Architecture | RoBERTa-base | RoBERTa-large |
| Parameters | ~126M | ~357M |
| Tokenizer | GPT-2 style (52k vocab) | Same |
| Pretraining corpus | Deduplicated mC4 and OSCAR 23 from CulturaX | Same |
| Objective | Masked Language Modeling | Same |
| Training time | ~27 days on 8× A40 | ~6.2 days on TPUv4-128 pod |
| Precision | fp32 | fp32 |
| Framework | fairseq | fairseq |

Downstream Evaluation (ExtraGLUE)

We evaluate PortBERT on ExtraGLUE, a Portuguese adaptation of the GLUE benchmark. Fine-tuning was conducted using HuggingFace Transformers, with NNI-based grid search over batch size and learning rate (28 configurations per task). Each task was fine-tuned for up to 10 epochs. Metrics were computed on validation sets due to the lack of held-out test sets.
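
The evaluation scripts are not reproduced here; the sketch below only illustrates the kind of fine-tuning loop described above. The dataset id/config, column names, and the reduced 2×2 hyperparameter grid are assumptions, and NNI's search logic is replaced by a plain Python loop.

```python
# Illustrative fine-tuning sketch, not the exact evaluation scripts.
# The original setup searched 28 (batch size, learning rate) configurations with NNI.
import itertools
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          DataCollatorWithPadding, Trainer, TrainingArguments)

model_id = "PortBERT/PortBERT_base"                      # assumed repository id
dataset = load_dataset("PORTULAN/extraglue", "rte_pt")   # assumed dataset id/config
tokenizer = AutoTokenizer.from_pretrained(model_id)

def tokenize(batch):
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

encoded = dataset.map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

results = {}
for lr, bs in itertools.product([1e-5, 3e-5], [16, 32]):  # reduced grid for illustration
    model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
    args = TrainingArguments(output_dir=f"out/lr{lr}_bs{bs}", learning_rate=lr,
                             per_device_train_batch_size=bs, num_train_epochs=10,
                             save_strategy="no", report_to="none")
    trainer = Trainer(model=model, args=args,
                      train_dataset=encoded["train"], eval_dataset=encoded["validation"],
                      data_collator=DataCollatorWithPadding(tokenizer),
                      compute_metrics=accuracy)
    trainer.train()
    results[(lr, bs)] = trainer.evaluate()["eval_accuracy"]

print("best (learning rate, batch size):", max(results, key=results.get))
```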

The AVG column is the unweighted mean of the following six metrics (recomputed in the sketch after this list):

  • STSB Spearman
  • STSB Pearson
  • RTE Accuracy
  • WNLI Accuracy
  • MRPC Accuracy
  • MRPC F1
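
For reference, the AVG column can be recomputed directly from these six metrics. The small sketch below does so for PortBERT-base using the values from the results table; other rows may differ by ±0.01 since the table entries are themselves rounded.

```python
# Recompute the AVG column for PortBERT-base from the six metrics listed above
# (values copied from the results table below).
portbert_base = {
    "STSB Spearman": 87.39,
    "STSB Pearson": 87.65,
    "RTE Accuracy": 68.95,
    "WNLI Accuracy": 60.56,
    "MRPC Accuracy": 87.75,
    "MRPC F1": 91.13,
}
avg = sum(portbert_base.values()) / len(portbert_base)
print(f"AVG = {avg:.2f}")  # 80.57, matching the results table
```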

🧪 Evaluation Results

Legend: Bold = best, italic = second-best per model size.

| Model | STSB_Sp | STSB_Pe | STSB_Mean | RTE_Acc | WNLI_Acc | MRPC_Acc | MRPC_F1 | AVG |
|---|---|---|---|---|---|---|---|---|
| **Large models** | | | | | | | | |
| XLM-RoBERTa_large | **90.00** | **90.27** | **90.14** | **82.31** | 57.75 | *90.44* | *93.31* | **84.01** |
| EuroBERT-610m | 88.46 | 88.59 | 88.52 | *78.34* | *59.15* | **91.91** | **94.20** | *83.44* |
| PortBERT_large | 88.53 | 88.68 | 88.60 | 72.56 | **61.97** | 89.46 | 92.39 | 82.26 |
| BERTimbau_large | *89.40* | *89.61* | *89.50* | 75.45 | *59.15* | 88.24 | 91.55 | 82.23 |
| **Base models** | | | | | | | | |
| RoBERTaLexPT_base | 86.68 | 86.86 | 86.77 | 69.31 | *59.15* | **89.46** | **92.34** | **80.63** |
| PortBERT_base | *87.39* | *87.65* | *87.52* | 68.95 | **60.56** | 87.75 | 91.13 | *80.57* |
| RoBERTaCrawlPT_base | 87.34 | 87.45 | 87.39 | **72.56** | 56.34 | *87.99* | 91.20 | 80.48 |
| BERTimbau_base | **88.39** | **88.60** | **88.50** | *70.40* | 56.34 | 87.25 | 90.97 | 80.32 |
| XLM-RoBERTa_base | 85.75 | 86.09 | 85.92 | 68.23 | **60.56** | 87.75 | *91.32* | 79.95 |
| EuroBERT-210m | 86.54 | 86.62 | 86.58 | 65.70 | 57.75 | 87.25 | 91.00 | 79.14 |
| AlBERTina 100M PTPT | 86.52 | 86.51 | 86.52 | 70.04 | 56.34 | 85.05 | 89.57 | 79.01 |
| AlBERTina 100M PTBR | 85.97 | 85.99 | 85.98 | 68.59 | 56.34 | 85.78 | 89.82 | 78.75 |
| AiBERTa | 83.56 | 83.73 | 83.65 | 64.98 | 56.34 | 82.11 | 86.99 | 76.29 |
| roBERTa PT | 48.06 | 48.51 | 48.29 | 56.68 | *59.15* | 72.06 | 81.79 | 61.04 |

Fairseq Checkpoint

Get the fairseq checkpoint here.
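
If you prefer the original training framework, the checkpoint can be loaded through fairseq's RoBERTa hub interface. The sketch below makes assumptions about the checkpoint layout (the directory path, `model.pt`, and the bundled dictionary/BPE files are placeholders), so treat it as a starting point rather than a verified recipe.

```python
# Sketch: loading the original checkpoint with fairseq (paths and filenames are assumptions).
from fairseq.models.roberta import RobertaModel

portbert = RobertaModel.from_pretrained(
    "/path/to/portbert_base_fairseq",   # directory assumed to contain model.pt, dict.txt, BPE files
    checkpoint_file="model.pt",
)
portbert.eval()  # disable dropout for deterministic features

tokens = portbert.encode("Lisboa é a capital de Portugal.")
features = portbert.extract_features(tokens)  # shape: (1, sequence length, hidden size)
print(features.shape)
```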

📜 License

MIT License
