patrickvonplaten pandora-s wmarsh-m committed on
Commit b547b53 · verified · 0 Parent(s)

Super-squash branch 'main' using huggingface_hub

Co-authored-by: pandora-s <[email protected]>
Co-authored-by: wmarsh-m <[email protected]>

Files changed (6)
  1. .gitattributes +36 -0
  2. README.md +426 -0
  3. SYSTEM_PROMPT.txt +19 -0
  4. consolidated.safetensors +3 -0
  5. params.json +29 -0
  6. tekken.json +3 -0
.gitattributes ADDED
@@ -0,0 +1,36 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tekken.json filter=lfs diff=lfs merge=lfs -text
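Reviewer note: the rules above decide which files in this commit are stored as Git LFS pointers rather than as plain blobs. The sketch below approximates that matching with Python's `fnmatch`; real gitattributes matching has extra rules (directory patterns, `**`), so this is only an illustration. `LFS_PATTERNS` is a hand-picked subset of the rules, and the helper name `lfs_tracked` is hypothetical.

```python
from fnmatch import fnmatch

# Subset of the patterns from the .gitattributes hunk (assumption: the other
# rules do not match any file in this commit).
LFS_PATTERNS = ["*.safetensors", "*.bin", "*.pt", "tekken.json"]

# Files listed under "Files changed" for this commit.
COMMIT_FILES = [
    ".gitattributes",
    "README.md",
    "SYSTEM_PROMPT.txt",
    "consolidated.safetensors",
    "params.json",
    "tekken.json",
]


def lfs_tracked(files, patterns):
    """Return the files whose name matches at least one pattern (fnmatch approximation)."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


print(lfs_tracked(COMMIT_FILES, LFS_PATTERNS))
```

Under these assumptions only `consolidated.safetensors` and `tekken.json` are routed through LFS, which agrees with the two pointer-style hunks (`version` / `oid` / `size`) further down in this commit.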
README.md ADDED
@@ -0,0 +1,426 @@
+ ---
+ language:
+ - en
+ - fr
+ - de
+ - es
+ - pt
+ - it
+ - ja
+ - ko
+ - ru
+ - zh
+ - ar
+ - fa
+ - id
+ - ms
+ - ne
+ - pl
+ - ro
+ - sr
+ - sv
+ - tr
+ - uk
+ - vi
+ - hi
+ - bn
+ license: apache-2.0
+ library_name: vllm
+ inference: false
+ base_model:
+ - mistralai/Mistral-Small-3.1-24B-Base-2503
+ extra_gated_description: >-
+   If you want to learn more about how we process your personal data, please read
+   our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
+ ---
+
+ # Model Card for Mistral-Small-3.1-24B-Instruct-2503
+
+ Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) **adds state-of-the-art vision understanding** and enhances **long context capabilities up to 128k tokens** without compromising text performance.
+ With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
+ This model is an instruction-finetuned version of: [Mistral-Small-3.1-24B-Base-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503).
+
+ Mistral Small 3.1 can be deployed locally and is exceptionally "knowledge-dense," fitting within a single RTX 4090 or a 32GB RAM MacBook once quantized.
+
+ It is ideal for:
+ - Fast-response conversational agents.
+ - Low-latency function calling.
+ - Subject matter experts via fine-tuning.
+ - Local inference for hobbyists and organizations handling sensitive data.
+ - Programming and math reasoning.
+ - Long document understanding.
+ - Visual understanding.
+
+ For enterprises requiring specialized capabilities (increased context, specific modalities, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.
+
+ Learn more about Mistral Small 3.1 in our [blog post](https://mistral.ai/news/mistral-small-3-1/).
+
+ ## Key Features
+ - **Vision:** Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text.
+ - **Multilingual:** Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, Farsi.
+ - **Agent-Centric:** Offers best-in-class agentic capabilities with native function calling and JSON output.
+ - **Advanced Reasoning:** State-of-the-art conversational and reasoning capabilities.
+ - **Apache 2.0 License:** Open license allowing usage and modification for both commercial and non-commercial purposes.
+ - **Context Window:** A 128k context window.
+ - **System Prompt:** Maintains strong adherence and support for system prompts.
+ - **Tokenizer:** Utilizes a Tekken tokenizer with a 131k vocabulary size.
+
+ ## Benchmark Results
+
+ When available, we report numbers previously published by other model providers; otherwise, we re-evaluate them using our own evaluation harness.
+
+ ### Pretrain Evals
+
+ | Model | MMLU (5-shot) | MMLU Pro (5-shot CoT) | TriviaQA | GPQA Main (5-shot CoT) | MMMU |
+ |--------------------------------|---------------|-----------------------|------------|------------------------|-----------|
+ | **Small 3.1 24B Base** | **81.01%** | **56.03%** | 80.50% | **37.50%** | **59.27%** |
+ | Gemma 3 27B PT | 78.60% | 52.20% | **81.30%** | 24.30% | 56.10% |
+
+ ### Instruction Evals
+
+ #### Text
+
+ | Model | MMLU | MMLU Pro (5-shot CoT) | MATH | GPQA Main (5-shot CoT) | GPQA Diamond (5-shot CoT) | MBPP | HumanEval | SimpleQA (TotalAcc) |
+ |--------------------------------|-----------|-----------------------|------------------------|------------------------|---------------------------|-----------|-----------|---------------------|
+ | **Small 3.1 24B Instruct** | 80.62% | 66.76% | 69.30% | **44.42%** | **45.96%** | 74.71% | **88.41%** | **10.43%** |
+ | Gemma 3 27B IT | 76.90% | **67.50%** | **89.00%** | 36.83% | 42.40% | 74.40% | 87.80% | 10.00% |
+ | GPT4o Mini | **82.00%** | 61.70% | 70.20% | 40.20% | 39.39% | 84.82% | 87.20% | 9.50% |
+ | Claude 3.5 Haiku | 77.60% | 65.00% | 69.20% | 37.05% | 41.60% | **85.60%** | 88.10% | 8.02% |
+ | Cohere Aya-Vision 32B | 72.14% | 47.16% | 41.98% | 34.38% | 33.84% | 70.43% | 62.20% | 7.65% |
+
+ #### Vision
+
+ | Model | MMMU | MMMU PRO | Mathvista | ChartQA | DocVQA | AI2D | MM MT Bench |
+ |--------------------------------|------------|-----------|-----------|-----------|-----------|-------------|-------------|
+ | **Small 3.1 24B Instruct** | 64.00% | **49.25%** | **68.91%** | 86.24% | **94.08%** | **93.72%** | **7.3** |
+ | Gemma 3 27B IT | **64.90%** | 48.38% | 67.60% | 76.00% | 86.60% | 84.50% | 7 |
+ | GPT4o Mini | 59.40% | 37.60% | 56.70% | 76.80% | 86.70% | 88.10% | 6.6 |
+ | Claude 3.5 Haiku | 60.50% | 45.03% | 61.60% | **87.20%** | 90.00% | 92.10% | 6.5 |
+ | Cohere Aya-Vision 32B | 48.20% | 31.50% | 50.10% | 63.04% | 72.40% | 82.57% | 4.1 |
+
+ ### Multilingual Evals
+
+ | Model | Average | European | East Asian | Middle Eastern |
+ |--------------------------------|------------|------------|------------|----------------|
+ | **Small 3.1 24B Instruct** | **71.18%** | **75.30%** | **69.17%** | 69.08% |
+ | Gemma 3 27B IT | 70.19% | 74.14% | 65.65% | 70.76% |
+ | GPT4o Mini | 70.36% | 74.21% | 65.96% | **70.90%** |
+ | Claude 3.5 Haiku | 70.16% | 73.45% | 67.05% | 70.00% |
+ | Cohere Aya-Vision 32B | 62.15% | 64.70% | 57.61% | 64.12% |
+
+ ### Long Context Evals
+
+ | Model | LongBench v2 | RULER 32K | RULER 128K |
+ |--------------------------------|-----------------|-------------|------------|
+ | **Small 3.1 24B Instruct** | **37.18%** | **93.96%** | 81.20% |
+ | Gemma 3 27B IT | 34.59% | 91.10% | 66.00% |
+ | GPT4o Mini | 29.30% | 90.20% | 65.80% |
+ | Claude 3.5 Haiku | 35.19% | 92.60% | **91.90%** |
+
+ ## Basic Instruct Template (V7-Tekken)
+
+ ```
+ <s>[SYSTEM_PROMPT]<system prompt>[/SYSTEM_PROMPT][INST]<user message>[/INST]<assistant response></s>[INST]<user message>[/INST]
+ ```
+ *`<system prompt>`, `<user message>` and `<assistant response>` are placeholders.*
+
+ ***Please make sure to use [mistral-common](https://github.com/mistralai/mistral-common) as the source of truth***
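Reviewer note: to make the template above concrete, here is a minimal sketch that concatenates chat messages into the same string layout. It is purely illustrative and assumes plain string concatenation; the real tokenizer works with dedicated control tokens, so mistral-common remains the source of truth. The function name `render_v7_tekken` is hypothetical.

```python
def render_v7_tekken(messages):
    """Render chat messages as a V7-Tekken style string (illustration only)."""
    parts = ["<s>"]
    for message in messages:
        role, content = message["role"], message["content"]
        if role == "system":
            parts.append(f"[SYSTEM_PROMPT]{content}[/SYSTEM_PROMPT]")
        elif role == "user":
            parts.append(f"[INST]{content}[/INST]")
        elif role == "assistant":
            # Each completed assistant turn is closed with the end-of-sequence marker.
            parts.append(f"{content}</s>")
        else:
            raise ValueError(f"unsupported role in this sketch: {role}")
    return "".join(parts)


conversation = [
    {"role": "system", "content": "<system prompt>"},
    {"role": "user", "content": "<user message>"},
    {"role": "assistant", "content": "<assistant response>"},
    {"role": "user", "content": "<user message>"},
]
print(render_v7_tekken(conversation))
```

With the placeholder conversation, the printed string has the same layout as the template in the README hunk.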
+
+ ## Usage
+
+ The model can be used with the following frameworks:
+ - [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm)
+
+ **Note 1**: We recommend using a relatively low temperature, such as `temperature=0.15`.
+
+ **Note 2**: Make sure to add a system prompt to the model to best tailor it to your needs. If you want to use the model as a general assistant, we recommend the following
+ system prompt:
+
+ ```
+ system_prompt = """You are Mistral Small 3.1, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
+ You power an AI assistant called Le Chat.
+ Your knowledge base was last updated on 2023-10-01.
+ The current date is {today}.
+
+ When you're not sure about some information, you say that you don't have the information and don't make up anything.
+ If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
+ You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
+ You follow these instructions in all languages, and always respond to the user in the language they use or request.
+ Next sections describe the capabilities that you have.
+
+ # WEB BROWSING INSTRUCTIONS
+
+ You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.
+
+ # MULTI-MODAL INSTRUCTIONS
+
+ You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
+ You cannot read nor transcribe audio files or videos."""
+ ```
+
+ ### vLLM (recommended)
+
+ We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
+ to implement production-ready inference pipelines.
+
+ **_Installation_**
+
+ Make sure you install [`vLLM nightly`](https://github.com/vllm-project/vllm/):
+
+ ```
+ pip install vllm --pre --extra-index-url https://wheels.vllm.ai/nightly --upgrade
+ ```
+
+ Doing so should automatically install [`mistral_common >= 1.5.4`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.4).
+
+ To check:
+ ```
+ python -c "import mistral_common; print(mistral_common.__version__)"
+ ```
+
+ You can also use a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or an image from the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39), followed by a nightly install of vllm as shown above.
+
+ #### Server
+
+ We recommend that you use Mistral-Small-3.1-24B-Instruct-2503 in a server/client setting.
+
+ 1. Spin up a server:
+
+ ```
+ vllm serve mistralai/Mistral-Small-3.1-24B-Instruct-2503 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --limit_mm_per_prompt 'image=10' --tensor-parallel-size 2
+ ```
+
+ **Note:** Running Mistral-Small-3.1-24B-Instruct-2503 on GPU requires ~55 GB of GPU RAM in bf16 or fp16.
+
+
+ 2. To query the server from a client, you can use a simple Python snippet.
+
+ ```py
+ import requests
+ import json
+ from huggingface_hub import hf_hub_download
+ from datetime import datetime, timedelta
+
+ url = "http://<your-server-url>:8000/v1/chat/completions"
+ headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
+
+ model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
+
+
+ def load_system_prompt(repo_id: str, filename: str) -> str:
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+     with open(file_path, "r") as file:
+         system_prompt = file.read()
+     today = datetime.today().strftime("%Y-%m-%d")
+     yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
+     model_name = repo_id.split("/")[-1]
+     return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
+
+
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
+
+ image_url = "https://huggingface.co/datasets/patrickvonplaten/random_img/resolve/main/europe.png"
+
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {
+         "role": "user",
+         "content": [
+             {
+                 "type": "text",
+                 "text": "Which of the depicted countries has the best food? Which the second and third and fourth? Name the country, its color on the map and one its city that is visible on the map, but is not the capital. Make absolutely sure to only name a city that can be seen on the map.",
+             },
+             {"type": "image_url", "image_url": {"url": image_url}},
+         ],
+     },
+ ]
+
+ data = {"model": model, "messages": messages, "temperature": 0.15}
+
+ response = requests.post(url, headers=headers, data=json.dumps(data))
+ print(response.json()["choices"][0]["message"]["content"])
+ # Determining the "best" food is highly subjective and depends on personal preferences. However, based on general popularity and recognition, here are some countries known for their cuisine:
+
+ # 1. **Italy** - Color: Light Green - City: Milan
+ #    - Italian cuisine is renowned worldwide for its pasta, pizza, and various regional specialties.
+
+ # 2. **France** - Color: Brown - City: Lyon
+ #    - French cuisine is celebrated for its sophistication, including dishes like coq au vin, bouillabaisse, and pastries like croissants and éclairs.
+
+ # 3. **Spain** - Color: Yellow - City: Bilbao
+ #    - Spanish cuisine offers a variety of flavors, from paella and tapas to jamón ibérico and churros.
+
+ # 4. **Greece** - Not visible on the map
+ #    - Greek cuisine is known for dishes like moussaka, souvlaki, and baklava. Unfortunately, Greece is not visible on the provided map, so I cannot name a city.
+
+ # Since Greece is not visible on the map, I'll replace it with another country known for its good food:
+
+ # 4. **Turkey** - Color: Light Green (east part of the map) - City: Istanbul
+ #    - Turkish cuisine is diverse and includes dishes like kebabs, meze, and baklava.
+ ```
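Reviewer note: the client snippet above passes a public image URL. For a local image, one common approach is to inline the bytes as a base64 data URL in the same `image_url` field. The sketch below shows that encoding step only. It assumes the server accepts `data:` URLs in `image_url` (an OpenAI-style convention not confirmed by this README), and the helper names `image_to_data_url` and `image_message` are hypothetical.

```python
import base64


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(text: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message with one text part and one inlined image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {
                "type": "image_url",
                "image_url": {"url": image_to_data_url(image_bytes, mime_type)},
            },
        ],
    }


# Placeholder bytes stand in for the contents of a real image file.
message = image_message("Describe this image.", b"abc")
print(message["content"][1]["image_url"]["url"])
```

In practice the bytes would come from reading a file in binary mode, and the resulting message would replace the user entry in `messages`.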
+
+ ### Function calling
+
+ Mistral-Small-3.1-24B-Instruct-2503 is excellent at function / tool calling tasks via vLLM. *E.g.:*
+
+ <details>
+ <summary>Example</summary>
+
+ ```py
+ import requests
+ import json
+ from huggingface_hub import hf_hub_download
+ from datetime import datetime, timedelta
+
+ url = "http://<your-url>:8000/v1/chat/completions"
+ headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
+
+ model = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
+
+
+ def load_system_prompt(repo_id: str, filename: str) -> str:
+     file_path = hf_hub_download(repo_id=repo_id, filename=filename)
+     with open(file_path, "r") as file:
+         system_prompt = file.read()
+     today = datetime.today().strftime("%Y-%m-%d")
+     yesterday = (datetime.today() - timedelta(days=1)).strftime("%Y-%m-%d")
+     model_name = repo_id.split("/")[-1]
+     return system_prompt.format(name=model_name, today=today, yesterday=yesterday)
+
+
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
+
+
+ tools = [
+     {
+         "type": "function",
+         "function": {
+             "name": "get_current_weather",
+             "description": "Get the current weather in a given location",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "city": {
+                         "type": "string",
+                         "description": "The city to find the weather for, e.g. 'San Francisco'",
+                     },
+                     "state": {
+                         "type": "string",
+                         "description": "The state abbreviation, e.g. 'CA' for California",
+                     },
+                     "unit": {
+                         "type": "string",
+                         "description": "The unit for temperature",
+                         "enum": ["celsius", "fahrenheit"],
+                     },
+                 },
+                 "required": ["city", "state", "unit"],
+             },
+         },
+     },
+     {
+         "type": "function",
+         "function": {
+             "name": "rewrite",
+             "description": "Rewrite a given text for improved clarity",
+             "parameters": {
+                 "type": "object",
+                 "properties": {
+                     "text": {
+                         "type": "string",
+                         "description": "The input text to rewrite",
+                     }
+                 },
+             },
+         },
+     },
+ ]
+
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {
+         "role": "user",
+         "content": "Could you please make the below article more concise?\n\nOpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership.",
+     },
+     {
+         "role": "assistant",
+         "content": "",
+         "tool_calls": [
+             {
+                 "id": "bbc5b7ede",
+                 "type": "function",
+                 "function": {
+                     "name": "rewrite",
+                     "arguments": '{"text": "OpenAI is an artificial intelligence research laboratory consisting of the non-profit OpenAI Incorporated and its for-profit subsidiary corporation OpenAI Limited Partnership."}',
+                 },
+             }
+         ],
+     },
+     {
+         "role": "tool",
+         "content": '{"action":"rewrite","outcome":"OpenAI is a FOR-profit company."}',
+         "tool_call_id": "bbc5b7ede",
+         "name": "rewrite",
+     },
+     {
+         "role": "assistant",
+         "content": "---\n\nOpenAI is a FOR-profit company.",
+     },
+     {
+         "role": "user",
+         "content": "Can you tell me what the temperature will be in Dallas, in Fahrenheit?",
+     },
+ ]
+
+ data = {"model": model, "messages": messages, "tools": tools, "temperature": 0.15}
+
+ response = requests.post(url, headers=headers, data=json.dumps(data))
+ print(response.json()["choices"][0]["message"]["tool_calls"])
+ # [{'id': '8PdihwL6d', 'type': 'function', 'function': {'name': 'get_current_weather', 'arguments': '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}'}}]
+ ```
+
+ </details>
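Reviewer note: the example stops at the point where the model returns `tool_calls`. A minimal sketch of the next step follows: look up each requested function, run it with the JSON-decoded arguments, and build the `tool` role message to append to `messages`. The `get_current_weather` body is a hypothetical stand-in (no real weather service is called), and `TOOL_REGISTRY` and `run_tool_calls` are illustrative names, not part of vLLM or mistral-common.

```python
import json


def get_current_weather(city: str, state: str, unit: str) -> dict:
    # Hypothetical stand-in: a real implementation would query a weather service.
    return {"city": city, "state": state, "unit": unit, "temperature": None}


# Maps tool names (as declared in `tools`) to local Python callables.
TOOL_REGISTRY = {"get_current_weather": get_current_weather}


def run_tool_calls(tool_calls):
    """Execute each tool call and return the matching `tool` role messages."""
    results = []
    for call in tool_calls:
        name = call["function"]["name"]
        arguments = json.loads(call["function"]["arguments"])  # arguments arrive as a JSON string
        output = TOOL_REGISTRY[name](**arguments)
        results.append(
            {
                "role": "tool",
                "content": json.dumps(output),
                "tool_call_id": call["id"],  # must echo the id issued by the model
                "name": name,
            }
        )
    return results


# The tool call printed at the end of the README example.
example_calls = [
    {
        "id": "8PdihwL6d",
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "arguments": '{"city": "Dallas", "state": "TX", "unit": "fahrenheit"}',
        },
    }
]
tool_messages = run_tool_calls(example_calls)
print(tool_messages[0]["tool_call_id"])
```

The returned messages have the same shape as the `"role": "tool"` entry in the example conversation, so they can be appended to `messages` before the next request.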
+
+ #### Offline
+
+ ```py
+ from vllm import LLM
+ from vllm.sampling_params import SamplingParams
+ from datetime import datetime, timedelta
+
+ # `model_name` was used below without being defined; set it to this repository's id
+ model_name = "mistralai/Mistral-Small-3.1-24B-Instruct-2503"
+
+ SYSTEM_PROMPT = "You are a conversational agent that always answers straight to the point, always end your accurate response with an ASCII drawing of a cat."
+
+ user_prompt = "Give me 5 non-formal ways to say 'See you later' in French."
+
+ messages = [
+     {
+         "role": "system",
+         "content": SYSTEM_PROMPT
+     },
+     {
+         "role": "user",
+         "content": user_prompt
+     },
+ ]
+
+ # note that running this model on GPU requires over 60 GB of GPU RAM
+ llm = LLM(model=model_name, tokenizer_mode="mistral")
+
+ sampling_params = SamplingParams(max_tokens=512, temperature=0.15)
+ outputs = llm.chat(messages, sampling_params=sampling_params)
+
+ print(outputs[0].outputs[0].text)
+ # Here are five non-formal ways to say "See you later" in French:
+
+ # 1. **À plus tard** - Until later
+ # 2. **À toute** - See you soon (informal)
+ # 3. **Salut** - Bye (can also mean hi)
+ # 4. **À plus** - See you later (informal)
+ # 5. **Ciao** - Bye (informal, borrowed from Italian)
+
+ # ```
+ #  /\_/\
+ # ( o.o )
+ #  > ^ <
+ # ```
+ ```
SYSTEM_PROMPT.txt ADDED
@@ -0,0 +1,19 @@
+ You are {name}, a Large Language Model (LLM) created by Mistral AI, a French startup headquartered in Paris.
+ You power an AI assistant called Le Chat.
+ Your knowledge base was last updated on 2023-10-01.
+ The current date is {today}.
+
+ When you're not sure about some information, you say that you don't have the information and don't make up anything.
+ If the user's question is not clear, ambiguous, or does not provide enough context for you to accurately answer the question, you do not try to answer it right away and you rather ask the user to clarify their request (e.g. "What are some good restaurants around me?" => "Where are you?" or "When is the next flight to Tokyo" => "Where do you travel from?").
+ You are always very attentive to dates, in particular you try to resolve dates (e.g. "yesterday" is {yesterday}) and when asked about information at specific dates, you discard information that is at another date.
+ You follow these instructions in all languages, and always respond to the user in the language they use or request.
+ Next sections describe the capabilities that you have.
+
+ # WEB BROWSING INSTRUCTIONS
+
+ You cannot perform any web search or access internet to open URLs, links etc. If it seems like the user is expecting you to do so, you clarify the situation and ask the user to copy paste the text directly in the chat.
+
+ # MULTI-MODAL INSTRUCTIONS
+
+ You have the ability to read images, but you cannot generate images. You also cannot transcribe audio files or videos.
+ You cannot read nor transcribe audio files or videos.
consolidated.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d446ca97599fa9d98b2e3744d8b83019837a2fe34a80f4353120b1e9b6249b1e
+ size 48022792280
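Reviewer note: the three lines above are a Git LFS pointer, not the weights themselves. The sketch below parses the pointer and turns the recorded size into rough figures. It assumes the checkpoint stores about 2 bytes per parameter (the README mentions bf16/fp16) and ignores the safetensors header, so the parameter count is an estimate only. `parse_lfs_pointer` is a hypothetical helper name.

```python
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:d446ca97599fa9d98b2e3744d8b83019837a2fe34a80f4353120b1e9b6249b1e
size 48022792280
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split each 'key value' line of an LFS pointer into a dict; size becomes an int."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    fields["size"] = int(fields["size"])
    return fields


pointer = parse_lfs_pointer(POINTER_TEXT)
size_gib = pointer["size"] / 2**30
# Assumption: 2 bytes per parameter (bf16/fp16), header overhead ignored.
approx_params_billions = pointer["size"] / 2 / 1e9

print(f"{size_gib:.1f} GiB, ~{approx_params_billions:.1f}B parameters at 2 bytes each")
# prints "44.7 GiB, ~24.0B parameters at 2 bytes each"
```

Under that assumption the file size agrees with the 24 billion parameters stated in the README. The README's ~55 GB GPU figure is larger than the weights alone, presumably because it also covers runtime memory such as the KV cache.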
params.json ADDED
@@ -0,0 +1,29 @@
+ {
+     "dim": 5120,
+     "n_layers": 40,
+     "head_dim": 128,
+     "hidden_dim": 32768,
+     "n_heads": 32,
+     "n_kv_heads": 8,
+     "rope_theta": 1000000000.0,
+     "norm_eps": 1e-05,
+     "vocab_size": 131072,
+     "vision_encoder": {
+         "hidden_size": 1024,
+         "num_channels": 3,
+         "max_image_size": 1540,
+         "patch_size": 14,
+         "rope_theta": 10000.0,
+         "intermediate_size": 4096,
+         "num_hidden_layers": 24,
+         "num_attention_heads": 16,
+         "adapter_bias": false,
+         "mm_projector_id": "patch_merge",
+         "spatial_merge_size": 2,
+         "add_pre_mm_projector_layer_norm": true,
+         "image_token_id": 10,
+         "image_break_token_id": 12,
+         "image_end_token_id": 13,
+         "image_size": 1540
+     }
+ }
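Reviewer note: with `n_heads` = 32 and `n_kv_heads` = 8, the config describes grouped-query attention, which shrinks the key/value cache. The sketch below estimates that cache from the values in `params.json`. It assumes a standard decoder KV cache (one key and one value vector per layer, KV head and token) stored at 2 bytes per value, and it reads the README's "128k" context as 131072 tokens; actual serving memory depends on the inference engine.

```python
# Values copied from params.json above.
params = {"n_layers": 40, "head_dim": 128, "n_heads": 32, "n_kv_heads": 8}

BYTES_PER_VALUE = 2  # assumption: bf16/fp16 cache entries


def kv_cache_bytes_per_token(p: dict, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to cache keys and values (factor 2) for one token across all layers."""
    return 2 * p["n_layers"] * p["n_kv_heads"] * p["head_dim"] * bytes_per_value


per_token = kv_cache_bytes_per_token(params)
context_tokens = 128 * 1024  # assumption: "128k" means 131072 tokens
total_gib = per_token * context_tokens / 2**30
queries_per_kv_head = params["n_heads"] // params["n_kv_heads"]

print(per_token, total_gib, queries_per_kv_head)
# prints "163840 20.0 4"
```

So each cached token costs 160 KiB, a full 128k-token sequence about 20 GiB, and every KV head is shared by 4 query heads. With one KV head per query head the cache would be 4 times larger under the same assumptions.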
tekken.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c604f35d1035f534519622c0ec83fed6184978d4fdee92a5bd2a50bc05438094
+ size 14801330