Model Card for educa-ai-nemo-sft

Model Details

Model Description

educa-ai-nemo-sft is our supervised fine-tune (SFT) of the powerful mistralai/Mistral-Nemo-Instruct-2407. It was trained on our internal dataset, a unique mix of German and English instruction data covering many domains. While creating this dataset, we paid special attention to data points that can improve performance in the educational field (e.g., text analysis and supporting students in completing textual tasks).

This is a preliminary release and is subject to changes or updates. We also plan to publish an updated, preference-aligned version of this model in the near future.

Developed by: Digital Learning GmbH
Funded by: Digital Learning GmbH
Shared by: Digital Learning GmbH
Model type: Transformer Decoder LLM
Language(s) (NLP): English, French, German, Spanish, Italian, Portuguese, Russian, Chinese, Japanese
License: Apache License 2.0
Finetuned from model: mistralai/Mistral-Nemo-Instruct-2407

Uses

As noted above, this is a preliminary release: we are still benchmarking the model and improving our datasets for possible further training. We therefore do not yet recommend using this model in production, and we look forward to engaging with the community on possible downstream uses and improvements.

Bias, Risks, and Limitations

Refer to the base model's card (mistralai/Mistral-Nemo-Instruct-2407) for an overview of the general risks of using this model. Because this version was fine-tuned only with SFT, without any preference alignment, the model may produce harmful content. Use it at your own discretion, taking the potential risks into account.

How to Get Started with the Model

Refer to the base model's card for code examples. Note that this model uses a slightly different chat template than the base model: the system prompt is placed before the first user prompt, i.e., before the first [INST]. The updated template is included in the tokenizer config, so tokenizer.apply_chat_template formats prompts correctly.
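To make the template difference concrete, the snippet below is a minimal, hypothetical sketch of the prompt layout described above, written in plain Python so it runs without downloading the model. It is not the actual template: the helper name build_prompt is made up for illustration, and the exact special tokens, whitespace, and handling of multiple system messages are assumptions. The authoritative version is the template shipped in the tokenizer config.

```python
from typing import Dict, List


def build_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical approximation of this model's chat layout.

    Assumptions (not taken from the real template): the system prompt is
    emitted once, right after the BOS token "<s>" and before the first
    "[INST]"; user turns are wrapped in [INST]...[/INST]; assistant turns
    end with "</s>". Whitespace details may differ from the real template.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    prompt = "<s>"
    if system_parts:
        # The system text goes before the first [INST], not inside a user turn.
        prompt += "\n\n".join(system_parts)
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            continue  # already placed before the first [INST]
        if role == "user":
            prompt += f"[INST]{content}[/INST]"
        elif role == "assistant":
            prompt += f"{content}</s>"
        else:
            raise ValueError(f"unsupported role: {role!r}")
    return prompt


messages = [
    {"role": "system", "content": "You are a helpful tutor."},
    {"role": "user", "content": "Summarize the text."},
]
print(build_prompt(messages))
```

In practice, pass the same messages list to tokenizer.apply_chat_template (for example with tokenize=False and add_generation_prompt=True) so that the template from the tokenizer config is applied instead of this approximation.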

Training Details

Training Data

The model was trained on a mix of publicly available, permissively licensed data and, for the majority, unique internal datasets that we created ourselves. Our data includes examples up to 16,384 tokens long, intended to further enhance the model's long-context capability.

Evaluation

IMPORTANT: We performed all benchmarks using lighteval. The accuracy numbers obtained this way differ greatly from the base model's official benchmarks and from results obtained with other benchmark suites. For a fair comparison, we therefore also ran the same lighteval benchmarks on the base model under identical conditions. As of 2025-01-24, we are working on rerunning these benchmarks with a different suite and on running more German-specific benchmarks.

English Benchmarks

Benchmark	Mistral-Nemo-Instruct-2407	educa-ai-nemo-sft
HellaSwag (0-shot)	44.33%	38.65%
WinoGrande (0-shot)	55.49%	58.56%
OpenBookQA (0-shot)	40.60%	36.40%
CommonSenseQA (0-shot)	37.26%	39.31%
TruthfulQA (0-shot)	56.12%	59.94%
MMLU (5-shot)	30.10%	37.91%

Multilingual Benchmarks (MMLU)

Language	Mistral-Nemo-Instruct-2407	educa-ai-nemo-sft
French	30.32%	29.05%
German	27.69%	41.82%
Spanish	24.69%	30.25%
Italian	31.29%	34.81%
Portuguese	24.16%	28.81%
Chinese	34.80%	37.85%
Japanese	34.27%	35.18%
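To read the two tables as changes relative to the base model, the short script below computes educa-ai-nemo-sft minus Mistral-Nemo-Instruct-2407 for each row, in percentage points. It only reuses the lighteval accuracies reported above; the dictionary and function names are illustrative.

```python
from typing import Dict, Tuple

# (base model, educa-ai-nemo-sft) accuracies in %, copied from the tables above.
english: Dict[str, Tuple[float, float]] = {
    "HellaSwag (0-shot)": (44.33, 38.65),
    "WinoGrande (0-shot)": (55.49, 58.56),
    "OpenBookQA (0-shot)": (40.60, 36.40),
    "CommonSenseQA (0-shot)": (37.26, 39.31),
    "TruthfulQA (0-shot)": (56.12, 59.94),
    "MMLU (5-shot)": (30.10, 37.91),
}
mmlu_by_language: Dict[str, Tuple[float, float]] = {
    "French": (30.32, 29.05),
    "German": (27.69, 41.82),
    "Spanish": (24.69, 30.25),
    "Italian": (31.29, 34.81),
    "Portuguese": (24.16, 28.81),
    "Chinese": (34.80, 37.85),
    "Japanese": (34.27, 35.18),
}


def deltas(scores: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Fine-tuned minus base accuracy, in percentage points, rounded to 2 decimals."""
    return {name: round(sft - base, 2) for name, (base, sft) in scores.items()}


for title, table in (("English", english), ("MMLU by language", mmlu_by_language)):
    print(title)
    for name, delta in deltas(table).items():
        print(f"  {name}: {delta:+.2f} pp")
```

These are absolute differences, not relative improvements. German MMLU shows the largest change (+14.13 points), while HellaSwag (-5.68), OpenBookQA (-4.20), and French MMLU (-1.27) decrease.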

Model Card Authors

This model card was written by Lennard Michael Strohmeyer.

Model Card Contact

Lennard Michael Strohmeyer
