---
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
library_name: vllm
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
---

# Model Card for Mistral-Small-3.1-24B-Base-2503

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) **adds state-of-the-art vision understanding** and enhances **long context capabilities up to 128k tokens** without compromising text performance. 
With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.  
This model is the base model of [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Mistral Small 3.1 in our [blog post](https://mistral.ai/news/mistral-small-3-1/).

## Key Features
- **Vision:** Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
- **Multilingual:** Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi.
- **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
- **Context Window:** A 128k context window.
- **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size.

## Benchmark Results

When available, we report numbers previously published by other model providers; otherwise, we re-evaluate them using our own evaluation harness.

### Pretrain Evals

| Model                          | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA   | GPQA Main (5-shot CoT)| MMMU      |
|--------------------------------|---------------|-----------------------|------------|-----------------------|-----------|
| **Small 3.1 24B Base**         | **81.01%**    | **56.03%**            | 80.50%     | **37.50%**            | **59.27%**|
| Gemma 3 27B PT                 | 78.60%        | 52.20%                | **81.30%** | 24.30%                | 56.10%    |

## Usage Examples

### vLLM (recommended)

We recommend using Mistral-Small 3.1 Base with the [vLLM library](https://github.com/vllm-project/vllm) to implement production-ready inference pipelines.
_Note_, however, that this is a pretrained-only checkpoint, so it does not work as an instruction-following model out of the box.
For a production-ready instruction model, please use [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

**_Installation_**

Make sure you install [`vLLM >= 0.8.1`](https://github.com/vllm-project/vllm/releases/tag/v0.8.1):

```
pip install vllm --upgrade
```

Doing so should automatically install [`mistral_common >= 1.5.4`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.4).

To check:
```
python -c "import mistral_common; print(mistral_common.__version__)"
```
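
As a quick sanity check, the sketch below loads the Tekken tokenizer from this repository directly with `mistral_common` and counts tokens for a sample string. The `tekken.json` filename and the attribute path to the raw tokenizer are assumptions based on current `mistral_common` releases:

```py
# Sketch: load the Tekken tokenizer standalone via mistral_common.
# Assumes the repo ships a `tekken.json` file and that the raw tokenizer is
# reachable as below; adjust if the layout differs.
from huggingface_hub import hf_hub_download
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tekken_file = hf_hub_download(
    repo_id="mistralai/Mistral-Small-3.1-24B-Base-2503", filename="tekken.json"
)
mistral_tokenizer = MistralTokenizer.from_file(tekken_file)
raw_tokenizer = mistral_tokenizer.instruct_tokenizer.tokenizer

print(raw_tokenizer.n_words)  # vocabulary size (~131k)
print(len(raw_tokenizer.encode("Mistral Small 3.1 has a 128k context window.", bos=True, eos=False)))
```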

You can also use a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
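
For example, serving the base model behind vLLM's OpenAI-compatible server from the Docker Hub image could look like the sketch below (the Mistral-format flags mirror vLLM's documented options; GPU count and parallelism settings are assumptions you should adapt to your hardware):

```
# Sketch only: serve the base model with the vllm/vllm-openai image and
# query it through the /v1/completions endpoint. Adapt GPUs and parallelism
# to your hardware; 24B parameters in bf16 need roughly 48 GB for the weights alone.
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:latest \
  --model mistralai/Mistral-Small-3.1-24B-Base-2503 \
  --tokenizer-mode mistral --config-format mistral --load-format mistral
```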

**_Example_**

```py
from vllm import LLM
from vllm.sampling_params import SamplingParams
from vllm.inputs.data import TokensPrompt
import requests
from PIL import Image
from io import BytesIO
from vllm.multimodal import MultiModalDataBuiltins

from mistral_common.protocol.instruct.messages import TextChunk, ImageURLChunk

model_name = "mistralai/Mistral-Small-3.1-24B-Base-2503"
sampling_params = SamplingParams(max_tokens=8192)

llm = LLM(model=model_name, tokenizer_mode="mistral")

url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
response = requests.get(url)
image = Image.open(BytesIO(response.content))

prompt = "The image shows a"

# Encode the interleaved image + text content with the underlying
# mistral-common tokenizer.
user_content = [ImageURLChunk(image_url=url), TextChunk(text=prompt)]

tokenizer = llm.llm_engine.tokenizer.tokenizer.mistral.instruct_tokenizer
tokens, _ = tokenizer.encode_user_content(user_content, False)

# Pass the pre-tokenized ids together with the PIL image as multimodal data.
prompt = TokensPrompt(
    prompt_token_ids=tokens, multi_modal_data=MultiModalDataBuiltins(image=[image])
)
outputs = llm.generate(prompt, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
# ' scene in Yosemite Valley and was taken at ISO 250 with an aperture of f/16 and a shutter speed of 1/18 second. ...'
```
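
Because this is a base (completion) model, plain text continuation also works without any image input. A minimal sketch reusing the `llm` and `sampling_params` objects from the example above (the prompt string is just an illustration):

```py
# Sketch: plain-text completion with the same LLM instance; vLLM's
# generate() also accepts a raw prompt string.
text_outputs = llm.generate(
    "The three largest cities in France are", sampling_params=sampling_params
)
print(text_outputs[0].outputs[0].text)
```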

### Transformers (untested)

Transformers-compatible model weights are also uploaded (thanks a lot, @cyrilvallez).
However, the Transformers implementation was **not thoroughly tested**; it has only been "vibe-checked".
Hence, we can only guarantee fully correct behavior when using the original weight format with vLLM (see above).
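
If you still want to try the Transformers weights, a minimal, untested sketch for text-only generation is shown below. It assumes the converted checkpoint resolves through the standard auto classes and that a recent `transformers` release with Mistral Small 3.1 support is installed:

```py
# Untested sketch: text-only generation from the Transformers-format weights.
# Assumes a recent transformers release with Mistral Small 3.1 support and
# that the converted checkpoint loads through the auto classes used here.
import torch
from transformers import AutoModelForImageTextToText, AutoTokenizer

model_id = "mistralai/Mistral-Small-3.1-24B-Base-2503"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(generated[0], skip_special_tokens=True))
```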