
Finnish ModernBERT Model Card

Finnish ModernBERT large is an encoder model following the ModernBERT architecture, pretrained on Finnish, Swedish, English, Code, Latin, and Northern Sámi. It was trained on 400B tokens on the LUMI supercomputer. The project aimed to train multilingual encoder models that support long contexts and all official languages of Finland¹. The model can theoretically extrapolate to a context length of 128,000 tokens.

¹Multiple Sámi languages are spoken in Finland, but Northern Sámi is the most widely spoken and was therefore included in the training data. English is not an official language of Finland, but it is widely used. Latin was included for potential clinical use.
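The checkpoint loads with the standard Hugging Face Transformers fill-mask workflow. The snippet below is a minimal usage sketch; the example sentence is illustrative and not taken from the model card, and a recent Transformers release with ModernBERT support is assumed.

```python
from transformers import pipeline

# Load the released checkpoint as a fill-mask pipeline.
fill_mask = pipeline("fill-mask", model="TurkuNLP/finnish-modernbert-large")

# Finnish example: "The capital of Finland is [MASK]."
masked = f"Suomen pääkaupunki on {fill_mask.tokenizer.mask_token}."
for prediction in fill_mask(masked):
    print(f"{prediction['token_str']:>12}  {prediction['score']:.3f}")
```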

Table of Contents

  1. Model Overview
  2. Training
  3. Training data
  4. Evaluation results
  5. Ethical Considerations and Limitations
  6. Acknowledgements
  7. Licence
  8. Citation information

Model Overview

| Hyperparameter | Value |
|---|---|
| n_parameters | 401M |
| n_layers | 28 |
| RoPE theta | 10,000 / 1,000,000 |
| vocab_size | 55,616 |
| sequence_length | 16,000 / 128,000 |

Training

Pretraining used Distributed Data Parallelism, AdamW with ZeroRedundancyOptimizer, and a warmup-stable-decay (WSD) learning rate schedule. The model was trained with a learning rate of 3e-4, a sequence length of 1024, and a RoPE theta of 10,000 for 350B tokens over 117,300 steps.
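The optimizer setup can be reproduced in PyTorch roughly as below. This is a minimal sketch, not the project's training code: the warmup length, weight decay, and the toy model and loop are illustrative assumptions, and only the warmup-stable part of the WSD schedule is shown (the decay phase is handled separately during annealing).

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.optim.lr_scheduler import LambdaLR

PEAK_LR = 3e-4        # peak learning rate from the card
WARMUP_STEPS = 2_000  # illustrative; the card does not state the warmup length

def warmup_stable(step: int) -> float:
    # Warmup-Stable part of the WSD schedule: linear warmup, then a constant
    # learning rate; the decay phase is run separately during annealing.
    return min(1.0, (step + 1) / WARMUP_STEPS)

if __name__ == "__main__":
    # Single-process group for illustration; real training launches one
    # process per GPU (e.g. via torchrun/srun on LUMI).
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("gloo", rank=0, world_size=1)

    model = torch.nn.Linear(16, 16)       # stand-in for the ModernBERT encoder
    model = DDP(model)                     # Distributed Data Parallelism
    optimizer = ZeroRedundancyOptimizer(   # shards AdamW optimizer state across ranks
        model.parameters(),
        optimizer_class=torch.optim.AdamW,
        lr=PEAK_LR,
        weight_decay=0.1,                  # illustrative value
    )
    scheduler = LambdaLR(optimizer, lr_lambda=warmup_stable)

    for step in range(5):                  # placeholder training loop
        loss = model(torch.randn(4, 16)).pow(2).mean()
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()

    dist.destroy_process_group()
```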

Long context training

The model was then trained with a learning rate of 5e-5, increasing the context length from 1024 to 16,000 tokens in six stages, with each sequence length trained for an equal number of tokens, for a total of 40B tokens over 16,560 steps. The RoPE theta of the global attention layers was increased to 1,000,000. Long documents were sampled from the original data according to the distribution below (a configuration sketch follows the table):

| Sequence length | % |
|---|---|
| <1000 | 21 |
| 1000-10000 | 78 |
| 10000-16000 | 1 |
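The RoPE theta and context length changes can be expressed through the Hugging Face configuration. The sketch below assumes the `global_rope_theta`, `local_rope_theta`, and `max_position_embeddings` fields of `ModernBertConfig` in recent Transformers releases; the released checkpoint should already carry the long-context values, so this is purely illustrative.

```python
from transformers import AutoConfig, AutoModelForMaskedLM

config = AutoConfig.from_pretrained("TurkuNLP/finnish-modernbert-large")

# Long-context setup described above: global attention layers get RoPE theta
# 1,000,000, local (sliding-window) layers keep 10,000, and the maximum
# trained sequence length is 16,000 tokens.
config.global_rope_theta = 1_000_000.0
config.local_rope_theta = 10_000.0
config.max_position_embeddings = 16_000

model = AutoModelForMaskedLM.from_pretrained(
    "TurkuNLP/finnish-modernbert-large", config=config
)
print(config.global_rope_theta, config.max_position_embeddings)
```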

Annealing

For the learning rate decay phase, the dataset was swapped to a high-quality subset. The RoPE theta and context length were kept the same as in long-context training. The model was annealed for 10B tokens over 4,139 steps using a 1-sqrt learning rate decay.
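The 1-sqrt decay can be written down compactly. The sketch below uses the common definition of the schedule and assumes the decay starts from the 5e-5 rate of the long-context phase and reaches zero after the 4,139 annealing steps; the card itself only names the schedule.

```python
ANNEAL_STEPS = 4_139  # annealing steps quoted above
BASE_LR = 5e-5        # learning rate of the long-context phase (assumed starting point)

def one_minus_sqrt_lr(step: int) -> float:
    """1-sqrt decay: lr(step) = base_lr * (1 - sqrt(step / total_steps))."""
    frac = min(step, ANNEAL_STEPS) / ANNEAL_STEPS
    return BASE_LR * (1.0 - frac ** 0.5)

for s in (0, 1_000, 2_000, 4_139):
    print(s, f"{one_minus_sqrt_lr(s):.2e}")
```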

Training data

All pretraining data (excluding the annealing data) underwent global exact deduplication and PII removal.
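Global exact deduplication keeps a single copy of each identical document across the whole corpus. The sketch below illustrates the idea with content hashing; the actual tooling and normalization rules used for this model are not described in the card.

```python
import hashlib
from typing import Iterable, Iterator

def exact_dedup(docs: Iterable[str]) -> Iterator[str]:
    """Yield each document once, dropping later exact duplicates.

    Stores SHA-256 digests of the normalized text so only fixed-size hashes
    are kept in memory, not the documents themselves.
    """
    seen: set[bytes] = set()
    for doc in docs:
        digest = hashlib.sha256(doc.strip().encode("utf-8")).digest()
        if digest not in seen:
            seen.add(digest)
            yield doc

corpus = ["Hei maailma!", "Hei maailma!", "Terve!"]
print(list(exact_dedup(corpus)))  # -> ['Hei maailma!', 'Terve!']
```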

Pretraining data

Data by language

| Language | Tokens | % |
|---|---|---|
| Code | 14.12B | 3.6 |
| English | 80.77B | 20.7 |
| Finnish | 209.09B | 53.6 |
| Latin | 0.94B | 0.3 |
| Northern Sámi | 1.07B | 0.3 |
| Swedish | 80.09B | 20.5 |
| Cross-lingual | 3.98B | 1.0 |
| Total | 390B | 100 |

Individual datasets

| Language | Dataset | Notes | Sampling fraction | Tokens |
|---|---|---|---|---|
| Code | Starcoder | GitHub issues | 0.83 | 12.8B |
| Code | SmolLM | PythonEdu (score 5) | 30 | 1.4B |
| English | British Library | - | 1 | 1.9B |
| English | Europarl | English subset | 5 | 0.06B |
| English | FineWeb-Edu fortified | - | 0.5 | 69.5B |
| English | Natural Instructions | - | 1 | 0.7B |
| English | peS2o | - | 0.13 | 51.9B |
| English | PubMed Central | - | 0.1 | 22.1B |
| English | PubMed Abstracts | - | 1 | 3.8B |
| English | Wikipedia | Dump 20241101 | 9 | 3.8B |
| Finnish | CC-fi | FinGPT | 4 | 10.8B |
| Finnish | CulturaX | Finnish subset | 3.7 | 16.9B |
| Finnish | HPLT 2.0 | Finnish subset | 3.7 | 19.1B |
| Finnish | nlfcl-fi | Finnish subset | 6 | 0.02B |
| Finnish | Europarl | Finnish subset | 6 | 0.12B |
| Finnish | Lönnrot | FinGPT | 6 | 0.13B |
| Finnish | Reddit-Fi | FinGPT | 6 | 0.11B |
| Finnish | Suomi24 | FinGPT | 6 | 3.27B |
| Finnish | Wikipedia | Dump 20241101 | 30 | 0.13B |
| Finnish | Yle | FinGPT | 30 | 0.22B |
| Finnish | Ylilauta | - | 30 | 0.22B |
| Latin | CulturaX | Latin subset | 30 | 0.03B |
| Northern Sámi | Glot500 | Northern Sámi subset | 30 | 0.004B |
| Northern Sámi | saami-web | - | 30 | 0.017B |
| Northern Sámi | SALT | - | 30 | 0.015B |
| Swedish | CulturaX | Swedish subset | 1.09 | 28.7B |
| Swedish | Europarl | Swedish subset | 5 | 0.05B |
| Swedish | fstc | - | 5 | 0.002B |
| Swedish | HPLT 2.0 | Swedish subset | 1.05 | 35.8B |
| Swedish | nlfcl-sv | Swedish subset | 5 | 0.014B |
| Swedish | Wikipedia | Dump 20241101 | 30 | 0.27B |
| Swedish | Yle | Swedish subset | 30 | 0.27B |
| Cross-lingual | Tatoeba | English-Finnish | 0.62 | 1.07B |
| Cross-lingual | OPUS | English-Northern Sámi | 30 | 5K |
| Cross-lingual | Tatoeba | English-Swedish | 0.57 | 1.15B |
| Cross-lingual | Tatoeba | Finnish-English | 0.62 | 1.06B |
| Cross-lingual | OPUS | Finnish-Northern Sámi | 30 | 12K |
| Cross-lingual | Tatoeba | Finnish-Swedish | 5.7 | 0.12B |
| Cross-lingual | OPUS | Northern Sámi-English | 30 | 5K |
| Cross-lingual | OPUS | Northern Sámi-Finnish | 30 | 12K |
| Cross-lingual | OPUS | Northern Sámi-Swedish | 30 | 0.8K |
| Cross-lingual | Tatoeba | Swedish-English | 0.58 | 1.15B |
| Cross-lingual | Tatoeba | Swedish-Finnish | 5.7 | 0.12B |
| Cross-lingual | OPUS | Swedish-Northern Sámi | 30 | 0.8K |

Annealing data

Details coming soon.

Evaluation results

A complete set of evaluations is coming soon. A limited set of results obtained with a modified version of EuroEval is presented in the tables below. For each model, five learning rates were tested against the validation set, and the F1 score was used to select the best one. Results are means over 10 iterations on bootstrapped versions of the training and test sets.
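The protocol (pick the learning rate with the best validation F1, then average over bootstrap resamples of the training and test sets) can be sketched generically. The candidate grid and the `train_and_eval` callable below are hypothetical stand-ins, not EuroEval's API.

```python
import random
from statistics import mean, stdev

LEARNING_RATES = [1e-5, 3e-5, 5e-5, 1e-4, 3e-4]  # illustrative candidate grid

def bootstrap(data: list, rng: random.Random) -> list:
    """Resample a dataset with replacement to its original size."""
    return [rng.choice(data) for _ in data]

def select_lr_and_evaluate(train, val, test, train_and_eval, n_iter: int = 10):
    """train_and_eval(train_set, eval_set, lr) -> F1 score; a hypothetical
    stand-in for one fine-tuning + evaluation run of an encoder model."""
    # 1) pick the learning rate with the best validation F1
    best_lr = max(LEARNING_RATES, key=lambda lr: train_and_eval(train, val, lr))
    # 2) report mean and spread over bootstrapped train/test pairs
    rng = random.Random(0)
    scores = [
        train_and_eval(bootstrap(train, rng), bootstrap(test, rng), best_lr)
        for _ in range(n_iter)
    ]
    return best_lr, mean(scores), stdev(scores)
```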

The results indicate that Finnish ModernBERT is competitive with other multilingual models in short contexts and performs best on tasks that do not involve token-level predictions.

Finnish

| Model | scala-fi | scandisent-fi | turku-ner-fi | tydiqa-fi | Params (M) |
|---|---|---|---|---|---|
| FacebookAI/xlm-roberta-large | mcc: 50.84±3.76 / macro_f1: 74.32±2.41 | mcc: 90.39±1.12 / macro_f1: 95.18±0.56 | micro_f1_no_misc: 84.31±1.35 / micro_f1: 81.93±1.07 | f1: 56.66±5.70 / em: 35.34±4.34 | 561.2 |
| TurkuNLP/bert-base-finnish-cased-v1 | mcc: 47.16±5.27 / macro_f1: 72.98±2.47 | mcc: 90.16±0.50 / macro_f1: 95.08±0.25 | micro_f1_no_misc: 82.04±1.33 / micro_f1: 79.35±0.94 | f1: 56.20±1.42 / em: 35.68±1.82 | 125.2 |
| TurkuNLP/bert-large-finnish-cased-v1 | mcc: 58.81±2.46 / macro_f1: 78.91±1.23 | mcc: 91.69±0.60 / macro_f1: 95.85±0.30 | micro_f1_no_misc: 77.57±1.43 / micro_f1: 74.50±1.74 | f1: 59.91±1.19 / em: 39.10±1.18 | 355.2 |
| TurkuNLP/finnish-modernbert-base | mcc: 24.81±6.66 / macro_f1: 61.46±3.62 | mcc: 84.59±1.80 / macro_f1: 92.26±0.89 | micro_f1_no_misc: 56.17±4.80 / micro_f1: 56.03±4.91 | f1: 30.04±1.27 / em: 14.22±1.25 | 143.4 |
| TurkuNLP/finnish-modernbert-large | mcc: 51.88±3.07 / macro_f1: 75.39±1.91 | mcc: 88.02±2.33 / macro_f1: 93.99±1.18 | micro_f1_no_misc: 71.11±1.83 / micro_f1: 70.47±1.44 | f1: 43.45±2.92 / em: 23.47±2.90 | 401.3 |
| TurkuNLP/finnish-modernbert-large-seq-len-1024-117300-annealed | mcc: 49.81±4.13 / macro_f1: 74.58±2.10 | mcc: 88.50±2.88 / macro_f1: 94.22±1.47 | micro_f1_no_misc: 71.16±2.41 / micro_f1: 70.58±2.01 | f1: 42.40±3.43 / em: 22.17±2.78 | 401.3 |
| TurkuNLP/finnish-modernbert-tiny | mcc: 4.94±1.95 / macro_f1: 51.89±1.24 | mcc: 76.15±1.93 / macro_f1: 88.05±0.97 | micro_f1_no_misc: 52.45±1.23 / micro_f1: 53.81±1.05 | f1: 29.63±0.42 / em: 14.59±0.58 | 51.6 |
| intfloat/multilingual-e5-large | mcc: 12.06±4.33 / macro_f1: 54.51±3.19 | mcc: 90.77±0.70 / macro_f1: 95.37±0.36 | micro_f1_no_misc: 80.55±1.28 / micro_f1: 78.08±1.14 | f1: 60.87±1.77 / em: 39.98±1.78 | 559.9 |

Swedish

| Model | scala-sv | scandiqa-sv | suc3 | swerec | Params (M) |
|---|---|---|---|---|---|
| AI-Sweden-Models/roberta-large-1160k | mcc: 76.24±1.30 / macro_f1: 87.74±0.72 | f1: 53.13±0.86 / em: 46.76±1.08 | micro_f1_no_misc: 79.27±2.28 / micro_f1: 76.65±2.03 | mcc: 77.43±0.65 / macro_f1: 76.11±1.73 | 355.4 |
| FacebookAI/xlm-roberta-large | mcc: 72.61±2.84 / macro_f1: 85.79±1.42 | f1: 47.91±1.23 / em: 41.40±1.00 | micro_f1_no_misc: 79.12±1.13 / micro_f1: 76.69±1.14 | mcc: 75.34±0.60 / macro_f1: 70.16±2.52 | 561.2 |
| TurkuNLP/finnish-modernbert-base | mcc: 58.79±2.50 / macro_f1: 78.96±1.22 | f1: 29.98±2.03 / em: 23.35±2.22 | micro_f1_no_misc: 51.67±3.10 / micro_f1: 53.42±3.09 | mcc: 63.10±3.20 / macro_f1: 62.47±4.03 | 143.4 |
| TurkuNLP/finnish-modernbert-large | mcc: 69.42±3.72 / macro_f1: 84.50±2.01 | f1: 34.26±0.85 / em: 27.46±0.86 | micro_f1_no_misc: 59.99±2.42 / micro_f1: 60.27±2.05 | mcc: 71.01±2.11 / macro_f1: 71.36±1.14 | 401.3 |
| TurkuNLP/finnish-modernbert-large-seq-len-1024-117300-annealed | mcc: 66.97±2.66 / macro_f1: 83.38±1.36 | f1: 38.83±2.12 / em: 32.53±2.09 | micro_f1_no_misc: 59.65±1.64 / micro_f1: 59.91±1.33 | mcc: 70.18±3.77 / macro_f1: 69.85±4.05 | 401.3 |
| TurkuNLP/finnish-modernbert-tiny | mcc: 11.31±3.88 / macro_f1: 54.81±2.30 | f1: 27.19±0.82 / em: 19.54±0.97 | micro_f1_no_misc: 48.06±2.18 / micro_f1: 49.55±1.87 | mcc: 63.73±1.75 / macro_f1: 63.98±1.64 | 51.6 |
| intfloat/multilingual-e5-large | mcc: 49.79±11.17 / macro_f1: 73.39±6.85 | f1: 52.23±0.90 / em: 44.44±1.34 | micro_f1_no_misc: 77.37±1.84 / micro_f1: 75.75±1.76 | mcc: 79.13±1.03 / macro_f1: 77.44±2.85 | 559.9 |

English

| Model | conll-en | scala-en | squad | sst5 | Params (M) |
|---|---|---|---|---|---|
| FacebookAI/xlm-roberta-large | micro_f1_no_misc: 88.74±1.06 / micro_f1: 88.12±0.94 | mcc: 34.33±15.56 / macro_f1: 64.04±9.79 | f1: 70.42±0.84 / em: 57.34±0.82 | mcc: 58.86±1.33 / macro_f1: 58.07±2.23 | 561.2 |
| TurkuNLP/finnish-modernbert-base | micro_f1_no_misc: 70.64±2.52 / micro_f1: 72.96±1.99 | mcc: 14.04±3.08 / macro_f1: 56.21±1.86 | f1: 29.36±6.50 / em: 18.20±5.63 | mcc: 33.81±3.80 / macro_f1: 46.50±2.77 | 143.4 |
| TurkuNLP/finnish-modernbert-large | micro_f1_no_misc: 79.73±1.29 / micro_f1: 80.90±1.11 | mcc: 50.98±3.90 / macro_f1: 74.94±2.06 | f1: 55.98±2.65 / em: 40.35±2.57 | mcc: 37.08±5.53 / macro_f1: 49.38±4.69 | 401.3 |
| TurkuNLP/finnish-modernbert-large-seq-len-1024-117300-annealed | micro_f1_no_misc: 79.15±0.60 / micro_f1: 80.20±0.47 | mcc: 46.82±5.34 / macro_f1: 72.62±2.64 | f1: 58.70±1.98 / em: 42.86±1.95 | mcc: 38.60±3.48 / macro_f1: 51.67±3.58 | 401.3 |
| TurkuNLP/finnish-modernbert-tiny | micro_f1_no_misc: 68.71±1.09 / micro_f1: 71.02±0.89 | mcc: 4.72±2.12 / macro_f1: 51.47±1.40 | f1: 12.00±0.47 / em: 4.96±0.43 | mcc: 21.24±4.35 / macro_f1: 40.46±2.94 | 51.6 |
| intfloat/multilingual-e5-large | micro_f1_no_misc: 90.83±0.49 / micro_f1: 90.08±0.41 | mcc: 37.27±8.82 / macro_f1: 68.10±4.43 | f1: 72.19±0.85 / em: 58.64±0.76 | mcc: 65.11±0.97 / macro_f1: 64.68±2.38 | 559.9 |
| microsoft/deberta-v3-base | micro_f1_no_misc: 91.05±0.53 / micro_f1: 90.46±0.54 | mcc: 64.68±1.29 / macro_f1: 81.85±0.67 | f1: 75.68±0.86 / em: 62.80±0.98 | mcc: 62.03±1.05 / macro_f1: 60.52±3.55 | 183.8 |

Ethical Considerations and Limitations

Finnish ModernBERT may produce representations that reflect biases and patterns present in its training data. The training data were not filtered for toxic, harmful, or offensive content, in order to support a wide range of use cases.

Acknowledgements

We thank CSC, the IT Center for Science in Finland, for the computational resources, and the Language Bank of Finland for additional resources for Finnish, Finland-Swedish, and Swedish. This research was also supported by the HPLT project and the Finnish Cultural Foundation.

Licence

Finnish ModernBERT large is released under the Apache 2.0 license.

Citation information

Preprint coming soon. If you need to cite this work, please use the citation below:

```bibtex
@misc{finnish_modernbert_2025,
    author    = {Reunamo, Akseli and Pyysalo, Sampo},
    title     = {Finnish-ModernBert: A Family of ModernBerts for Finnish languages},
    year      = 2025,
    url       = {https://huggingface.co/collections/TurkuNLP/finnish-modernberts-685bb5f2ab4d39d6480a16d4},
    publisher = {Hugging Face}
}
```