patrickvonplaten and pandora-s committed
Commit db4b1ea · verified · 0 parent(s)

Super-squash branch 'main' using huggingface_hub

Co-authored-by: pandora-s <[email protected]>

Files changed (5)
  1. .gitattributes +37 -0
  2. README.md +126 -0
  3. consolidated.safetensors +3 -0
  4. params.json +29 -0
  5. tekken.json +3 -0
.gitattributes ADDED
@@ -0,0 +1,37 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tekken[[:space:]](3).json filter=lfs diff=lfs merge=lfs -text
+ tekken.json filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,126 @@
+ ---
+ language:
+ - en
+ - fr
+ - de
+ - es
+ - pt
+ - it
+ - ja
+ - ko
+ - ru
+ - zh
+ - ar
+ - fa
+ - id
+ - ms
+ - ne
+ - pl
+ - ro
+ - sr
+ - sv
+ - tr
+ - uk
+ - vi
+ - hi
+ - bn
+ license: apache-2.0
+ library_name: vllm
+ inference: false
+ extra_gated_description: >-
+   If you want to learn more about how we process your personal data, please read
+   our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+ tags:
+ - transformers
+ ---
+
+ # Model Card for Mistral-Small-3.1-24B-Base-2503
+
+ Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) **adds state-of-the-art vision understanding** and enhances **long context capabilities up to 128k tokens** without compromising text performance.
+ With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
+ This model is the base model of [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).
+
+ For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
+
+ Learn more about Mistral Small 3.1 in our [blog post](https://mistral.ai/news/mistral-small-3-1/).
+
+ ## Key Features
+ - **Vision:** Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
+ - **Multilingual:** Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
+ - **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
+ - **Context Window:** A 128k context window.
+ - **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size.
+
+ ## Benchmark Results
+
+ When available, we report numbers previously published by other model providers; otherwise, we re-evaluate them using our own evaluation harness.
+
+ ### Pretrain Evals
+
+ | Model                          | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA   | GPQA Main (5-shot CoT)| MMMU      |
+ |--------------------------------|---------------|-----------------------|------------|-----------------------|-----------|
+ | **Small 3.1 24B Base**         | **81.01%**    | **56.03%**            | 80.50%     | **37.50%**            | **59.27%**|
+ | Gemma 3 27B PT                 | 78.60%        | 52.20%                | **81.30%** | 24.30%                | 56.10%    |
+
+ ## Usage Examples
+
+ ### vLLM (recommended)
+
+ We recommend using Mistral-Small 3.1 Base with the [vLLM library](https://github.com/vllm-project/vllm).
+ _Note_, however, that this is a pretrained-only checkpoint and thus not ready to work as an instruction model out of the box.
+ For a production-ready instruction model, please use [Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503).
+
+ **_Installation_**
+
+ Make sure you install [`vLLM nightly`](https://github.com/vllm-project/vllm/):
+
+ ```
+ pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
+ ```
+
+ Doing so should automatically install [`mistral_common >= 1.5.4`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.4).
+
+ To check:
+ ```
+ python -c "import mistral_common; print(mistral_common.__version__)"
+ ```
+
+ You can also use the ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or the one on [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39), followed by a nightly install of vLLM as shown above.
+
+ **_Example_**
+
+ ```py
+ from io import BytesIO
+
+ import requests
+ from PIL import Image
+
+ from vllm import LLM
+ from vllm.sampling_params import SamplingParams
+ from vllm.inputs.data import TokensPrompt
+ from vllm.multimodal import MultiModalDataBuiltins
+ from mistral_common.protocol.instruct.messages import TextChunk, ImageURLChunk
+
+ model_name = "mistralai/Mistral-Small-3.1-24B-Base-2503"
+ sampling_params = SamplingParams(max_tokens=8192)
+
+ llm = LLM(model=model_name, tokenizer_mode="mistral")
+
+ url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/yosemite.png"
+ response = requests.get(url)
+ image = Image.open(BytesIO(response.content))
+
+ prompt = "The image shows a"
+
+ user_content = [ImageURLChunk(image_url=url), TextChunk(text=prompt)]
+
+ tokenizer = llm.llm_engine.tokenizer.tokenizer.mistral.instruct_tokenizer
+ tokens, _ = tokenizer.encode_user_content(user_content, False)
+
+ prompt = TokensPrompt(
+     prompt_token_ids=tokens, multi_modal_data=MultiModalDataBuiltins(image=[image])
+ )
+ outputs = llm.generate(prompt, sampling_params=sampling_params)
+
+ print(outputs[0].outputs[0].text)
+ # ' scene in Yosemite Valley and was taken at ISO 250 with an aperture of f/16 and a shutter speed of 1/18 second. ...'
+ ```
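The 128k-token context window advertised in the README above has concrete memory implications. A back-of-the-envelope sketch (illustrative arithmetic only; bf16 precision is an assumption, the layer/head counts come from `params.json` in this commit):

```python
# Rough KV-cache footprint at full context, using params.json values:
# n_layers=40, n_kv_heads=8 (grouped-query attention), head_dim=128.
# bf16 (2 bytes/element) is an assumption, not something the card states.
n_layers, n_kv_heads, head_dim = 40, 8, 128
bytes_per_elem = 2           # bf16
context = 128 * 1024         # "128k" tokens

per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem  # keys + values
total_gib = per_token * context / 2**30

print(per_token)   # 163840 bytes (~160 KiB) per token
print(total_gib)   # 20.0 GiB for one full-length sequence
```

The 8-way KV-head grouping keeps this cache 4x smaller than full multi-head attention with 32 KV heads would require.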
consolidated.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bcbfc04ffa14c7ae9683222b34d5f44358c29e6ebc6d135c2c8f2c7e243c2946
+ size 48022792280
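The three lines above are a Git LFS pointer, not the weights themselves: Git stores only this stub, and the ~48 GB payload is fetched separately by LFS. A minimal sketch of parsing that pointer format (pointer text copied from the diff above):

```python
# Parse a Git LFS pointer (spec v1): one "key value" pair per line.
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bcbfc04ffa14c7ae9683222b34d5f44358c29e6ebc6d135c2c8f2c7e243c2946
size 48022792280
"""

fields = dict(line.split(" ", 1) for line in pointer.strip().splitlines())
algo, digest = fields["oid"].split(":", 1)

print(algo)                        # sha256
print(int(fields["size"]) / 1e9)   # 48.02279228 (~48 GB of weights)
```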
params.json ADDED
@@ -0,0 +1,29 @@
+ {
+     "dim": 5120,
+     "n_layers": 40,
+     "head_dim": 128,
+     "hidden_dim": 32768,
+     "n_heads": 32,
+     "n_kv_heads": 8,
+     "rope_theta": 1000000000.0,
+     "norm_eps": 1e-05,
+     "vocab_size": 131072,
+     "vision_encoder": {
+         "hidden_size": 1024,
+         "num_channels": 3,
+         "max_image_size": 1540,
+         "patch_size": 14,
+         "rope_theta": 10000.0,
+         "intermediate_size": 4096,
+         "num_hidden_layers": 24,
+         "num_attention_heads": 16,
+         "adapter_bias": false,
+         "mm_projector_id": "patch_merge",
+         "spatial_merge_size": 2,
+         "add_pre_mm_projector_layer_norm": true,
+         "image_token_id": 10,
+         "image_break_token_id": 12,
+         "image_end_token_id": 13,
+         "image_size": 1540
+     }
+ }
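A small sketch of reading these hyperparameters (a few keys from the file above are embedded so the snippet is self-contained; in practice you would `json.load` the downloaded `params.json`):

```python
import json

# Keys copied from params.json above; normally: params = json.load(open("params.json"))
params = json.loads("""
{"dim": 5120, "n_layers": 40, "head_dim": 128,
 "n_heads": 32, "n_kv_heads": 8, "vocab_size": 131072}
""")

# Grouped-query attention: 8 KV heads shared across 32 query heads.
print(params["n_heads"] // params["n_kv_heads"])   # 4 query heads per KV head

# Attention operates at n_heads * head_dim = 4096, projected to/from dim = 5120.
print(params["n_heads"] * params["head_dim"])      # 4096
```

Note that `vocab_size` (131072) matches the 131k Tekken vocabulary stated in the README, and the nested `vision_encoder` block configures the image tower added in 3.1.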
tekken.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c604f35d1035f534519622c0ec83fed6184978d4fdee92a5bd2a50bc05438094
+ size 14801330