Model Card for Mistral-Small-3.1-24B-Base-2503 (TEXT ONLY)

This is the text-only variant of mistralai/Mistral-Small-3.1-24B-Base-2503. It also serves as the base model for mistralai/Devstral-Small-2505, for which no official base model was released.

Features:

Text-only, no multimodality.
128k context length.

How was the text-only model created? The vision encoder was removed and the model architecture was converted from mistral3 to mistral. The tokenizer was not modified.
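The conversion can be pictured as a filter-and-rename pass over the checkpoint's tensor names. The sketch below is illustrative only: the prefixes `vision_tower.`, `multi_modal_projector.`, and `language_model.` are assumptions about how a mistral3 checkpoint names its weights, not the exact keys used to produce this model, and the helper `to_text_only_keys` is hypothetical.

```python
# Hypothetical prefixes; check them against the real checkpoint's key names.
DROP_PREFIXES = ("vision_tower.", "multi_modal_projector.")
TEXT_PREFIX = "language_model."


def to_text_only_keys(state_dict):
    """Return a copy without vision weights and with the text prefix stripped."""
    converted = {}
    for name, tensor in state_dict.items():
        if name.startswith(DROP_PREFIXES):
            continue  # vision encoder and projector weights are discarded
        if name.startswith(TEXT_PREFIX):
            name = name[len(TEXT_PREFIX):]
        converted[name] = tensor
    return converted


# Placeholder strings stand in for real tensors.
example = {
    "vision_tower.patch_conv.weight": "v0",
    "multi_modal_projector.linear_1.weight": "p0",
    "language_model.model.embed_tokens.weight": "t0",
    "language_model.lm_head.weight": "t1",
}
print(sorted(to_text_only_keys(example)))
# → ['lm_head.weight', 'model.embed_tokens.weight']
```

A real conversion would also need to update the model configuration so that the architecture is declared as mistral rather than mistral3; that step is not shown here.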

Reproduced eval

Serve with vLLM:

vllm serve casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only
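Once the server is up, it can be queried over HTTP. The following is a minimal sketch that assumes the server listens on `localhost:8000` and exposes the same `/v1/completions` endpoint that the lm_eval commands in this card point at. The helper names `build_request` and `complete`, the prompt, and the sampling parameters are illustrative choices, not part of the original setup.

```python
import json
import urllib.request

# Assumption: same address the lm_eval commands in this card use.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only"


def build_request(prompt, max_tokens=32, temperature=0.0):
    """Build a POST request for the OpenAI-compatible completions endpoint."""
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        BASE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt):
    """Send the prompt and return the text of the first completion."""
    with urllib.request.urlopen(build_request(prompt)) as response:
        body = json.load(response)
    return body["choices"][0]["text"]


# Example call (requires the server to be running):
# print(complete("The capital of France is"))
```

Because this is a base model, plain text continuation through the completions endpoint is the natural way to prompt it.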

The reproduced results are summarized below; the full per-task output for each model follows.

Model	MMLU (0-shot) accuracy ± stderr
Small 3.1 24B Base (Text Only)	77.25% ± 0.33%
Small 3.1 24B Base (Multimodal)	77.34% ± 0.33%
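To put the gap between the two headline scores in context, the sketch below compares the difference against the reported standard errors. It assumes the two runs can be treated as independent so that their variances add; since both runs score the same questions, this is a rough check rather than a formal significance test.

```python
import math

# Headline MMLU accuracy and standard error, expressed as fractions.
text_only_acc, text_only_se = 0.7725, 0.0033
multimodal_acc, multimodal_se = 0.7734, 0.0033

# Assumption: the runs are independent, so the variances add.
diff = multimodal_acc - text_only_acc
combined_se = math.sqrt(text_only_se**2 + multimodal_se**2)
z_score = diff / combined_se

print(f"difference  = {diff:.4f}")
print(f"combined SE = {combined_se:.4f}")
print(f"z-score     = {z_score:.2f}")
```

Under that assumption the 0.09 percentage point gap is about 0.19 combined standard errors, which is consistent with the two variants performing the same on MMLU.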

Original Multimodal: Full MMLU (Reproduced)

lm_eval --model local-completions \
  --model_args "base_url=http://localhost:8000/v1/completions,model=mistralai/Mistral-Small-3.1-24B-Base-2503" \
  --tasks mmlu \
  --batch_size 128

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
mmlu	2	none		acc	↑	0.7734	±	0.0033
- humanities	2	none		acc	↑	0.6820	±	0.0062
- formal_logic	1	none	0	acc	↑	0.5714	±	0.0443
- high_school_european_history	1	none	0	acc	↑	0.8303	±	0.0293
- high_school_us_history	1	none	0	acc	↑	0.9363	±	0.0171
- high_school_world_history	1	none	0	acc	↑	0.9241	±	0.0172
- international_law	1	none	0	acc	↑	0.9091	±	0.0262
- jurisprudence	1	none	0	acc	↑	0.8148	±	0.0376
- logical_fallacies	1	none	0	acc	↑	0.8589	±	0.0274
- moral_disputes	1	none	0	acc	↑	0.8208	±	0.0206
- moral_scenarios	1	none	0	acc	↑	0.3844	±	0.0163
- philosophy	1	none	0	acc	↑	0.8296	±	0.0214
- prehistory	1	none	0	acc	↑	0.8704	±	0.0187
- professional_law	1	none	0	acc	↑	0.6095	±	0.0125
- world_religions	1	none	0	acc	↑	0.8713	±	0.0257
- other	2	none		acc	↑	0.8317	±	0.0064
- business_ethics	1	none	0	acc	↑	0.8200	±	0.0386
- clinical_knowledge	1	none	0	acc	↑	0.8679	±	0.0208
- college_medicine	1	none	0	acc	↑	0.7803	±	0.0316
- global_facts	1	none	0	acc	↑	0.6600	±	0.0476
- human_aging	1	none	0	acc	↑	0.7982	±	0.0269
- management	1	none	0	acc	↑	0.9029	±	0.0293
- marketing	1	none	0	acc	↑	0.9359	±	0.0160
- medical_genetics	1	none	0	acc	↑	0.8900	±	0.0314
- miscellaneous	1	none	0	acc	↑	0.9183	±	0.0098
- nutrition	1	none	0	acc	↑	0.8791	±	0.0187
- professional_accounting	1	none	0	acc	↑	0.6277	±	0.0288
- professional_medicine	1	none	0	acc	↑	0.8603	±	0.0211
- virology	1	none	0	acc	↑	0.5602	±	0.0386
- social sciences	2	none		acc	↑	0.8736	±	0.0059
- econometrics	1	none	0	acc	↑	0.6491	±	0.0449
- high_school_geography	1	none	0	acc	↑	0.8990	±	0.0215
- high_school_government_and_politics	1	none	0	acc	↑	0.9637	±	0.0135
- high_school_macroeconomics	1	none	0	acc	↑	0.8103	±	0.0199
- high_school_microeconomics	1	none	0	acc	↑	0.9034	±	0.0192
- high_school_psychology	1	none	0	acc	↑	0.9358	±	0.0105
- human_sexuality	1	none	0	acc	↑	0.8855	±	0.0279
- professional_psychology	1	none	0	acc	↑	0.8578	±	0.0141
- public_relations	1	none	0	acc	↑	0.7909	±	0.0390
- security_studies	1	none	0	acc	↑	0.8327	±	0.0239
- sociology	1	none	0	acc	↑	0.9154	±	0.0197
- us_foreign_policy	1	none	0	acc	↑	0.9300	±	0.0256
- stem	2	none		acc	↑	0.7545	±	0.0073
- abstract_algebra	1	none	0	acc	↑	0.4600	±	0.0501
- anatomy	1	none	0	acc	↑	0.8148	±	0.0336
- astronomy	1	none	0	acc	↑	0.9211	±	0.0219
- college_biology	1	none	0	acc	↑	0.9444	±	0.0192
- college_chemistry	1	none	0	acc	↑	0.5700	±	0.0498
- college_computer_science	1	none	0	acc	↑	0.7100	±	0.0456
- college_mathematics	1	none	0	acc	↑	0.6200	±	0.0488
- college_physics	1	none	0	acc	↑	0.6569	±	0.0472
- computer_security	1	none	0	acc	↑	0.8300	±	0.0378
- conceptual_physics	1	none	0	acc	↑	0.8170	±	0.0253
- electrical_engineering	1	none	0	acc	↑	0.7931	±	0.0338
- elementary_mathematics	1	none	0	acc	↑	0.7910	±	0.0209
- high_school_biology	1	none	0	acc	↑	0.9323	±	0.0143
- high_school_chemistry	1	none	0	acc	↑	0.7586	±	0.0301
- high_school_computer_science	1	none	0	acc	↑	0.8900	±	0.0314
- high_school_mathematics	1	none	0	acc	↑	0.5185	±	0.0305
- high_school_physics	1	none	0	acc	↑	0.6291	±	0.0394
- high_school_statistics	1	none	0	acc	↑	0.7593	±	0.0292
- machine_learning	1	none	0	acc	↑	0.6250	±	0.0460

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7734	±	0.0033
- humanities	2	none	acc	↑	0.6820	±	0.0062
- other	2	none	acc	↑	0.8317	±	0.0064
- social sciences	2	none	acc	↑	0.8736	±	0.0059
- stem	2	none	acc	↑	0.7545	±	0.0073

Text Only: Full MMLU

lm_eval --model local-completions \
  --model_args "base_url=http://localhost:8000/v1/completions,model=casperhansen/Mistral-Small-3.1-24B-Base-2503-Text-Only" \
  --tasks mmlu \
  --batch_size 128

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
mmlu	2	none		acc	↑	0.7725	±	0.0033
- humanities	2	none		acc	↑	0.6793	±	0.0062
- formal_logic	1	none	0	acc	↑	0.5397	±	0.0446
- high_school_european_history	1	none	0	acc	↑	0.8364	±	0.0289
- high_school_us_history	1	none	0	acc	↑	0.9363	±	0.0171
- high_school_world_history	1	none	0	acc	↑	0.9198	±	0.0177
- international_law	1	none	0	acc	↑	0.9008	±	0.0273
- jurisprudence	1	none	0	acc	↑	0.8148	±	0.0376
- logical_fallacies	1	none	0	acc	↑	0.8405	±	0.0288
- moral_disputes	1	none	0	acc	↑	0.8237	±	0.0205
- moral_scenarios	1	none	0	acc	↑	0.3765	±	0.0162
- philosophy	1	none	0	acc	↑	0.8264	±	0.0215
- prehistory	1	none	0	acc	↑	0.8704	±	0.0187
- professional_law	1	none	0	acc	↑	0.6108	±	0.0125
- world_religions	1	none	0	acc	↑	0.8713	±	0.0257
- other	2	none		acc	↑	0.8339	±	0.0064
- business_ethics	1	none	0	acc	↑	0.8300	±	0.0378
- clinical_knowledge	1	none	0	acc	↑	0.8679	±	0.0208
- college_medicine	1	none	0	acc	↑	0.7746	±	0.0319
- global_facts	1	none	0	acc	↑	0.6800	±	0.0469
- human_aging	1	none	0	acc	↑	0.8027	±	0.0267
- management	1	none	0	acc	↑	0.9029	±	0.0293
- marketing	1	none	0	acc	↑	0.9402	±	0.0155
- medical_genetics	1	none	0	acc	↑	0.8900	±	0.0314
- miscellaneous	1	none	0	acc	↑	0.9208	±	0.0097
- nutrition	1	none	0	acc	↑	0.8791	±	0.0187
- professional_accounting	1	none	0	acc	↑	0.6312	±	0.0288
- professional_medicine	1	none	0	acc	↑	0.8603	±	0.0211
- virology	1	none	0	acc	↑	0.5602	±	0.0386
- social sciences	2	none		acc	↑	0.8739	±	0.0059
- econometrics	1	none	0	acc	↑	0.6667	±	0.0443
- high_school_geography	1	none	0	acc	↑	0.8939	±	0.0219
- high_school_government_and_politics	1	none	0	acc	↑	0.9585	±	0.0144
- high_school_macroeconomics	1	none	0	acc	↑	0.8103	±	0.0199
- high_school_microeconomics	1	none	0	acc	↑	0.9076	±	0.0188
- high_school_psychology	1	none	0	acc	↑	0.9358	±	0.0105
- human_sexuality	1	none	0	acc	↑	0.8855	±	0.0279
- professional_psychology	1	none	0	acc	↑	0.8578	±	0.0141
- public_relations	1	none	0	acc	↑	0.7909	±	0.0390
- security_studies	1	none	0	acc	↑	0.8327	±	0.0239
- sociology	1	none	0	acc	↑	0.9104	±	0.0202
- us_foreign_policy	1	none	0	acc	↑	0.9400	±	0.0239
- stem	2	none		acc	↑	0.7520	±	0.0073
- abstract_algebra	1	none	0	acc	↑	0.4500	±	0.0500
- anatomy	1	none	0	acc	↑	0.8296	±	0.0325
- astronomy	1	none	0	acc	↑	0.9211	±	0.0219
- college_biology	1	none	0	acc	↑	0.9444	±	0.0192
- college_chemistry	1	none	0	acc	↑	0.5600	±	0.0499
- college_computer_science	1	none	0	acc	↑	0.7100	±	0.0456
- college_mathematics	1	none	0	acc	↑	0.6200	±	0.0488
- college_physics	1	none	0	acc	↑	0.6569	±	0.0472
- computer_security	1	none	0	acc	↑	0.8300	±	0.0378
- conceptual_physics	1	none	0	acc	↑	0.8213	±	0.0250
- electrical_engineering	1	none	0	acc	↑	0.7862	±	0.0342
- elementary_mathematics	1	none	0	acc	↑	0.7804	±	0.0213
- high_school_biology	1	none	0	acc	↑	0.9290	±	0.0146
- high_school_chemistry	1	none	0	acc	↑	0.7488	±	0.0305
- high_school_computer_science	1	none	0	acc	↑	0.8900	±	0.0314
- high_school_mathematics	1	none	0	acc	↑	0.5222	±	0.0305
- high_school_physics	1	none	0	acc	↑	0.6225	±	0.0396
- high_school_statistics	1	none	0	acc	↑	0.7500	±	0.0295
- machine_learning	1	none	0	acc	↑	0.6339	±	0.0457

Groups	Version	Filter	Metric		Value		Stderr
mmlu	2	none	acc	↑	0.7725	±	0.0033
- humanities	2	none	acc	↑	0.6793	±	0.0062
- other	2	none	acc	↑	0.8339	±	0.0064
- social sciences	2	none	acc	↑	0.8739	±	0.0059
- stem	2	none	acc	↑	0.7520	±	0.0073