danielhanchen committed (verified)
Commit 98721da · Parent(s): 4db10e4

Create README.md

Files changed (1):
  1. README.md +130 -0
README.md ADDED
@@ -0,0 +1,130 @@
---
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
library_name: vllm
inference: false
extra_gated_description: >-
  If you want to learn more about how we process your personal data, please read
  our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
---

# Model Card for Mistral-Small-3.1-24B-Base-2503

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) **adds state-of-the-art vision understanding** and enhances **long context capabilities up to 128k tokens** without compromising text performance.
With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
This model is the base model of [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Mistral Small 3.1 in our [blog post](https://mistral.ai/news/mistral-small-3-1/).

## Key Features
- **Vision:** Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
- **Multilingual:** Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
- **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
- **Context Window:** 128k-token context window.
- **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size (see the sketch below).

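A quick way to confirm the advertised vocabulary size is to load the tokenizer and inspect it. The minimal sketch below uses the Hugging Face `AutoTokenizer` and assumes the tokenizer files shipped alongside the Transformers-compatible weights (see the Transformers section below); with the vLLM/`mistral_common` stack, the same tokenizer is used under the hood in the usage example further down.

```py
from transformers import AutoTokenizer

# Assumes Hugging Face tokenizer files are present in the model repository.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Small-3.1-24B-Base-2503")

print(len(tokenizer))                  # expected to be roughly 131k entries
print(tokenizer("The image shows a"))  # token ids for a short completion prefix
```
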
## Benchmark Results

When available, we report numbers previously published by other model providers; otherwise, we re-evaluate them using our own evaluation harness.

### Pretrain Evals

| Model                          | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA   | GPQA Main (5-shot CoT)| MMMU      |
|--------------------------------|---------------|-----------------------|------------|-----------------------|-----------|
| **Small 3.1 24B Base**         | **81.01%**    | **56.03%**            | 80.50%     | **37.50%**            | **59.27%**|
| Gemma 3 27B PT                 | 78.60%        | 52.20%                | **81.30%** | 24.30%                | 56.10%    |

## Usage Examples

### vLLM (recommended)

We recommend using Mistral-Small 3.1 Base with the [vLLM library](https://github.com/vllm-project/vllm).
_Note_, however, that this is a pretrained-only checkpoint and is therefore not ready to work as an instruction model out of the box.
For a production-ready instruction model, please use [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).

**_Installation_**

Make sure you install [`vLLM nightly`](https://github.com/vllm-project/vllm/):

```
pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
```

Doing so should automatically install [`mistral_common >= 1.5.4`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.4).

To check:
```
python -c "import mistral_common; print(mistral_common.__version__)"
```

You can also use a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or pull one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39), followed by a nightly install of vLLM as shown above.

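Once such a server is running (started from the Docker image or with `vllm serve`; launch flags for this model are not covered here), completions can be requested through the OpenAI-compatible `/v1/completions` endpoint that vLLM's server exposes. The sketch below is illustrative and assumes the server is listening locally on the default port 8000:

```py
import requests

# Assumes a vLLM OpenAI-compatible server is already running locally with this
# model, e.g. launched from the Docker image above, on the default port 8000.
response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "mistralai/Mistral-Small-3.1-24B-Base-2503",
        "prompt": "Mistral AI is a company that",  # plain completion: this is a base model
        "max_tokens": 64,
    },
    timeout=300,
)
print(response.json()["choices"][0]["text"])
```
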
**_Example_**

```py
from vllm import LLM
from vllm.sampling_params import SamplingParams
from vllm.inputs.data import TokensPrompt
import requests
from PIL import Image
from io import BytesIO
from vllm.multimodal import MultiModalDataBuiltins

from mistral_common.protocol.instruct.messages import TextChunk, ImageURLChunk

model_name = "mistralai/Mistral-Small-3.1-24B-Base-2503"
sampling_params = SamplingParams(max_tokens=8192)

# Load the model with the Mistral (Tekken) tokenizer.
llm = LLM(model=model_name, tokenizer_mode="mistral")

# Download the image that will accompany the text prefix.
url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
response = requests.get(url)
image = Image.open(BytesIO(response.content))

prompt = "The image shows a"

user_content = [ImageURLChunk(image_url=url), TextChunk(text=prompt)]

# Encode the image + text chunks with the underlying mistral_common tokenizer.
tokenizer = llm.llm_engine.tokenizer.tokenizer.mistral.instruct_tokenizer
tokens, _ = tokenizer.encode_user_content(user_content, False)

# Pass the pre-tokenized ids together with the raw image to vLLM.
prompt = TokensPrompt(
    prompt_token_ids=tokens, multi_modal_data=MultiModalDataBuiltins(image=[image])
)
outputs = llm.generate(prompt, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
# ' scene in Yosemite Valley and was taken at ISO 250 with an aperture of f/16 and a shutter speed of 1/18 second. ...'
```
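For text-only completion, a plain string prompt can also be passed directly to `LLM.generate`; the prompt and sampling settings below are illustrative:

```py
from vllm import LLM
from vllm.sampling_params import SamplingParams

llm = LLM(model="mistralai/Mistral-Small-3.1-24B-Base-2503", tokenizer_mode="mistral")

# The base model simply continues the prompt; no chat template is applied.
sampling_params = SamplingParams(max_tokens=64, temperature=0.7)
outputs = llm.generate("The three largest cities in France are", sampling_params)

print(outputs[0].outputs[0].text)
```
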
### Transformers (untested)

Transformers-compatible model weights are also uploaded (thanks a lot @cyrilvallez).
However, the Transformers implementation has **not been thoroughly tested**, only "vibe-checked".
Hence, we can only ensure 100% correct behavior when using the original weight format with vLLM (see above).
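
With that caveat in mind, below is a minimal, untested text-only sketch using Transformers. It assumes a recent `transformers` release that supports this architecture; the `AutoProcessor`/`AutoModelForImageTextToText` class choices are assumptions for this multimodal checkpoint, not a verified recipe.

```py
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "mistralai/Mistral-Small-3.1-24B-Base-2503"

# Untested sketch: class choices are assumptions for this multimodal checkpoint.
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Plain text completion with the base model (no chat template).
inputs = processor(text="The highest mountain in Europe is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output[0], skip_special_tokens=True))
```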