Synthetic baselines trained for our paper "Scaling Low-Resource MT via Synthetic Data Generation with LLMs", accepted as a main conference paper at EMNLP 2025.
AI & ML interests
At the University of Helsinki, we focus on:
- NLP for morphologically-rich languages
- Cross-lingual NLP
- NLP in the humanities
Organization Card
Helsinki-NLP is the language technology research group at the University of Helsinki. Here we publish various resources related to multilingual NLP, machine translation, and text simplification, to name a few application areas. We focus on wide language coverage, open data sets, and public pre-trained models.
Multilingual translation models trained on the Tatoeba Translation Challenge dataset (from OPUS) and a massively multilingual Bible corpus:
- Helsinki-NLP/opus-mt-tc-bible-big-aav-fra_ita_por_spa (Translation • 0.2B • Updated • 17 • 2)
- Helsinki-NLP/opus-mt-tc-bible-big-afa-en (Translation • 0.2B • Updated • 125 • 1)
- Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_nld (Translation • 0.2B • Updated • 8 • 2)
- Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_fra_por_spa (Translation • 0.2B • Updated • 484)
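The model IDs above appear to encode the source and target languages in their last two hyphen-separated fields, with underscores joining several languages on one side. The sketch below illustrates that reading in plain Python. The helper names `split_languages` and `with_target_token` are hypothetical, and the naming rule is an assumption inferred from the IDs listed here, not a documented specification. Multi-target OPUS-MT models generally expect a `>>xxx<<` target-language token at the start of the input; the exact token set varies per model, so check the individual model card before relying on it.

```python
def split_languages(model_id: str) -> tuple[list[str], list[str]]:
    """Return (source_languages, target_languages) for an OPUS-MT model ID.

    Assumption: the last two hyphen-separated fields of the model name are
    the source side and the target side, and underscores separate multiple
    language codes on either side.
    """
    name = model_id.rsplit("/", 1)[-1]        # drop the "Helsinki-NLP/" prefix
    source, target = name.split("-")[-2:]     # last two fields: source, target
    return source.split("_"), target.split("_")


def with_target_token(text: str, target_lang: str) -> str:
    """Prefix the input with the target-language token used by multi-target models."""
    return f">>{target_lang}<< {text}"


if __name__ == "__main__":
    model_id = "Helsinki-NLP/opus-mt-tc-bible-big-afa-deu_eng_nld"
    sources, targets = split_languages(model_id)
    print(sources, targets)
    print(with_target_token("Hello, world!", targets[0]))
```

The prefixed string is what would be passed to the model's tokenizer; single-target models such as the `opus-mt-synthetic-en-*` baselines would not need the token.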
models 1,534

Helsinki-NLP/opus-mt-synthetic-en-eu • Updated • 19 • 1
Helsinki-NLP/opus-mt-synthetic-en-mk • Updated • 11
Helsinki-NLP/opus-mt-synthetic-en-ka • Updated • 17
Helsinki-NLP/opus-mt-synthetic-en-so • Updated • 16
Helsinki-NLP/opus-mt-synthetic-en-is • Updated • 8 • 1
Helsinki-NLP/opus-mt-synthetic-en-uk • Updated • 17
Helsinki-NLP/opus-mt-synthetic-en-gd • Updated • 11
Helsinki-NLP/simple-finnish-gpt3-xl • Text Generation • 1B • Updated • 63 • 1
Helsinki-NLP/opus-mt-tc-bible-big-deu_eng_fra_por_spa-mul • Translation • 0.2B • Updated • 168 • 1
Helsinki-NLP/opus-mt-tc-bible-big-mul-deu_eng_fra_por_spa • Translation • 0.2B • Updated • 42 • 2

datasets 50

Helsinki-NLP/fineweb-edu-translated • Preview • Updated • 6.41k • 1
Helsinki-NLP/OpenSubtitles2024 • Viewer • Updated • 570M • 54 • 1
Helsinki-NLP/shroom • Preview • Updated • 42
Helsinki-NLP/mu-shroom • Viewer • Updated • 11.5k • 227 • 4
Helsinki-NLP/tatoeba_mt_train • Viewer • Updated • 13.7B • 5.65k • 1
Helsinki-NLP/tatoeba_mt • Updated • 3.12k • 60
Helsinki-NLP/un_pc • Viewer • Updated • 323M • 5.98k • 23
Helsinki-NLP/un_ga • Viewer • Updated • 1.11M • 407 • 3
Helsinki-NLP/opus_books • Viewer • Updated • 1.25M • 23.6k • 76
Helsinki-NLP/news_commentary • Viewer • Updated • 4.23M • 2.71k • 37