Model Card for llama-estllm-protype-0825

llama-estllm-protype-0825 is the first artifact produced by the EstLLM project. This release is intended for evaluation in a conversational, ChatbotArena-style setting on baromeeter.ai, thereby establishing a baseline for future improvements.

The model was produced by continued pre-training of Llama-3.1-8B on approximately 35B tokens, followed by supervised fine-tuning (SFT) and direct preference optimization (DPO).

Model Details

Model Description

  • Developed by: TartuNLP and TalTechNLP research groups
  • Funded by: Estonian Ministry of Education and Research, "Estonian Language Technology Program 2018-2027"
  • Model type: Causal Language Model, Instruction-following
  • Language(s) (NLP): Estonian, English
  • License: Llama 3.1 Community License Agreement
  • Finetuned from model: meta-llama/Llama-3.1-8B

Evaluation

Instruction-following

Every benchmark in this category is treated as a generative task: evaluation is performed on model responses obtained with greedy decoding (temperature 0), rather than on logits.

| Model (# parameters ↓) | IFEval-et* | Winogrande-et** | Trivia-et*** | Grammar-et**** |
|---|---|---|---|---|
| moonshotai/Kimi-K2-Instruct | 0.7891 | 0.8138 | 0.4225 | 0.916 |
| deepseek-ai/DeepSeek-V3-0324 | 0.7171 | 0.8042 | 0.27 | 0.364 |
| meta-llama/Llama-3.1-405B-Instruct | 0.7159 | 0.7878 | 0.4713 | 0.818 |
| meta-llama/Llama-3.3-70B-Instruct | 0.7705 | 0.7397 | 0.3875 | 0.797 |
| Qwen/Qwen2.5-72B-Instruct | 0.7407 | 0.7227 | 0.315 | 0.694 |
| google/gemma-3-27b-it | 0.7655 | 0.7510 | 0.325 | 0.817 |
| utter-project/EuroLLM-9B-Instruct | 0.5397 | 0.5846 | 0.3738 | 0.764 |
| meta-llama/Llama-3.1-8B-Instruct | 0.3797 | 0.5399 | 0.2888 | 0.657 |
| tartuNLP/llama-estlm-prototype-0825 | 0.5174 | 0.5812 | 0.425 | 0.692 |
| BSC-LT/salamandra-7b-instruct | 0.5195 | 0.2878 | 0.2875 | 0.594 |
| tartuNLP/Llammas | 0.3524 | 0.5037 | 0.2838 | 0.529 |
| Qwen/Qwen2.5-7B-Instruct | 0.4988 | 0.5473 | 0.2938 | 0.598 |

* inst_level_strict_acc

** 3-shot, accuracy

*** 0-shot, accuracy

**** 0-shot, accuracy, formatted as multiple-choice
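The zero-temperature protocol above amounts to greedy decoding: at every step the highest-scoring token is chosen, so responses are deterministic and comparable across runs. A minimal sketch with a toy scoring function (hypothetical, for illustration only; the actual evaluation presumably queries the model's logits):

```python
def greedy_decode(score_fn, prompt, max_new_tokens=5, eos="<eos>"):
    """Greedy (temperature-0) decoding: always pick the argmax next token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = score_fn(tokens)          # maps candidate token -> score
        next_token = max(scores, key=scores.get)  # deterministic argmax
        if next_token == eos:
            break
        tokens.append(next_token)
    return tokens

# Toy scorer (hypothetical): answers "jah" once, then emits end-of-sequence.
def toy_scores(tokens):
    if tokens[-1] != "jah":
        return {"jah": 1.0, "ei": 0.5, "<eos>": 0.1}
    return {"<eos>": 1.0, "jah": 0.2}

print(greedy_decode(toy_scores, ["Kas", "sobib", "?"]))
# → ['Kas', 'sobib', '?', 'jah']
```

Because the argmax is deterministic, repeated runs produce identical responses, which is why the benchmark scores above are single-run numbers.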

Translation

English to Estonian

| Model | wmt24pp (BLEU ↑) |
|---|---|
| tartuNLP/llama-estlm-prototype-0825 | 0.264 |
| utter-project/EuroLLM-9B-Instruct | 0.2602 |
| tartuNLP/Llammas | 0.1472 |
| meta-llama/Llama-3.1-8B-Instruct | 0.1406 |
| BSC-LT/salamandra-7b-instruct | 0.1201 |
| Qwen/Qwen2.5-7B-Instruct | 0.0476 |
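For reference, BLEU scores like those above are normally computed with a library such as sacreBLEU; the core of the metric is clipped n-gram precision combined with a brevity penalty. A simplified single-reference sketch on the [0, 1] scale used in the table (an assumption for illustration, not the exact evaluation code):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(hypothesis, reference, max_n=4):
    """Simplified BLEU in [0, 1]: geometric mean of clipped n-gram
    precisions (n = 1..max_n) times a brevity penalty, single reference."""
    hyp, ref = hypothesis.split(), reference.split()
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        overlap = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(overlap / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0:
        return 0.0  # any empty n-gram overlap zeroes the geometric mean
    geo_mean = math.exp(sum(math.log(p) for p in precisions) / max_n)
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / max(len(hyp), 1))
    return bp * geo_mean

print(bleu("see on hea tõlge", "see on hea tõlge"))  # → 1.0
print(bleu("täiesti vale", "see on hea tõlge"))      # → 0.0
```

Production evaluations use corpus-level BLEU with standardized tokenization (as in sacreBLEU), so the sketch above will not reproduce the table's numbers exactly.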

Limitations

This is an early prototype. Accordingly, it has limitations beyond those of the base Llama model:

  • Relatively short context window of 4096 tokens; the model is not expected to perform well beyond that.
  • Multi-turn conversations are not supported in this version.
  • Trained with the original Llama 3.1 system prompt, which has a hard-coded knowledge cut-off date.

Citation

TBA

Safetensors: 8.03B params, BF16
