---
language:
- en
- fr
- de
- es
- pt
- it
- ja
- ko
- ru
- zh
- ar
- fa
- id
- ms
- ne
- pl
- ro
- sr
- sv
- tr
- uk
- vi
- hi
- bn
license: apache-2.0
library_name: vllm
inference: false
base_model:
- mistralai/Devstral-Small-2505
extra_gated_description: If you want to learn more about how we process your personal
  data, please read our <a href="https://mistral.ai/terms/">Privacy Policy</a>.
pipeline_tag: text2text-generation
tags:
- autoquant
- gguf
---

# Devstral-Small-2505

Devstral is an agentic LLM for software engineering tasks, built through a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌. Devstral excels at using tools to explore codebases, editing multiple files, and powering software engineering agents. The model achieves remarkable performance on SWE-Bench, which positions it as the #1 open-source model on this [benchmark](#benchmark-results).

It is fine-tuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503), so it has a long context window of up to 128k tokens. As a coding agent, Devstral is text-only: the vision encoder was removed from `Mistral-Small-3.1` before fine-tuning.

For enterprises requiring specialized capabilities (increased context, domain-specific knowledge, etc.), we will release commercial models beyond what Mistral AI contributes to the community.

Learn more about Devstral in our [blog post](https://mistral.ai/news/devstral).

## Key Features
- **Agentic coding**: Devstral is designed to excel at agentic coding tasks, making it a great choice for software engineering agents.
- **Lightweight**: With a compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB of RAM, making it an appropriate model for local deployment and on-device use.
- **Apache 2.0 License**: Open license allowing usage and modification for both commercial and non-commercial purposes.
- **Context Window**: A 128k context window.
- **Tokenizer**: Utilizes a Tekken tokenizer with a 131k vocabulary size.


## Benchmark Results

### SWE-Bench

Devstral achieves a score of 46.8% on SWE-Bench Verified, outperforming the prior open-source SoTA by 6%.

| Model | Scaffold | SWE-Bench Verified (%) |
|------------------|--------------------|------------------------|
| Devstral | OpenHands Scaffold | **46.8** |
| GPT-4.1-mini | OpenAI Scaffold | 23.6 |
| Claude 3.5 Haiku | Anthropic Scaffold | 40.6 |
| SWE-smith-LM 32B | SWE-agent Scaffold | 40.2 |


When evaluated under the same test scaffold (OpenHands, provided by All Hands AI 🙌), Devstral exceeds far larger models such as Deepseek-V3-0324 and Qwen3-235B-A22B.

![SWE Benchmark](assets/swe_bench.png)

## Usage

We recommend using Devstral with the [OpenHands](https://github.com/All-Hands-AI/OpenHands/tree/main) scaffold.
You can use it either through our API or by running it locally.

### API
Follow these [instructions](https://docs.mistral.ai/getting-started/quickstart/#account-setup) to create a Mistral account and get an API key.

Then run these commands to start the OpenHands Docker container.
```bash
export MISTRAL_API_KEY=<MY_KEY>

docker pull docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik

mkdir -p ~/.openhands-state && echo '{"language":"en","agent":"CodeActAgent","max_iterations":null,"security_analyzer":null,"confirmation_mode":false,"llm_model":"mistral/devstral-small-2505","llm_api_key":"'$MISTRAL_API_KEY'","remote_runtime_resource_factor":null,"github_token":null,"enable_default_condenser":true}' > ~/.openhands-state/settings.json

docker run -it --rm --pull=always \
    -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.39-nikolaik \
    -e LOG_ALL_EVENTS=true \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/.openhands-state:/.openhands-state \
    -p 3000:3000 \
    --add-host host.docker.internal:host-gateway \
    --name openhands-app \
    docker.all-hands.dev/all-hands-ai/openhands:0.39
```
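
Optionally, you can first confirm that your API key and the `devstral-small-2505` model identifier work by calling the API directly. The snippet below is a minimal sketch assuming the `mistralai` Python SDK (`pip install mistralai`); it is not required by OpenHands itself.

```python
# Minimal sketch: verify the API key works before (or after) starting OpenHands.
# Assumes `pip install mistralai` and the same model id as in settings.json above.
import os

from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

response = client.chat.complete(
    model="devstral-small-2505",
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
)
print(response.choices[0].message.content)
```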

### Local inference

The model can also be deployed with the following libraries:
- [`vllm (recommended)`](https://github.com/vllm-project/vllm): See [here](#vllm-recommended)
- [`mistral-inference`](https://github.com/mistralai/mistral-inference): See [here](#mistral-inference)
- [`transformers`](https://github.com/huggingface/transformers): See [here](#transformers)
- [`LMStudio`](https://lmstudio.ai/): See [here](#lmstudio)
- [`ollama`](https://github.com/ollama/ollama): See [here](#ollama)


### OpenHands (recommended)

#### Launch a server to deploy Devstral-Small-2505

Make sure you have launched an OpenAI-compatible server, such as vLLM or Ollama, as described above. Then you can use OpenHands to interact with `Devstral-Small-2505`.

For this tutorial, we spun up a vLLM server with the following command:
```bash
vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
```

The server address should be in the following format: `http://<your-server-url>:8000/v1`
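
To confirm the server is reachable before connecting OpenHands to it, you can list the served models through vLLM's OpenAI-compatible endpoint. The snippet below is a small sketch that assumes the server URL above and the placeholder `token` API key.

```python
# Sketch: confirm the vLLM server is up by listing the models it serves.
import requests

base_url = "http://<your-server-url>:8000/v1"  # replace with your server address
headers = {"Authorization": "Bearer token"}

models = requests.get(f"{base_url}/models", headers=headers).json()
print([m["id"] for m in models["data"]])  # should include mistralai/Devstral-Small-2505
```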
127
+
128
+ #### Launch OpenHands
129
+
130
+ You can follow installation of OpenHands [here](https://docs.all-hands.dev/modules/usage/installation).
131
+
132
+ The easiest way to launch OpenHands is to use the Docker image:
133
+ ```bash
134
+ docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
135
+
136
+ docker run -it --rm --pull=always \
137
+ -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
138
+ -e LOG_ALL_EVENTS=true \
139
+ -v /var/run/docker.sock:/var/run/docker.sock \
140
+ -v ~/.openhands-state:/.openhands-state \
141
+ -p 3000:3000 \
142
+ --add-host host.docker.internal:host-gateway \
143
+ --name openhands-app \
144
+ docker.all-hands.dev/all-hands-ai/openhands:0.38
145
+ ```
146
+
147
+
148
+ Then, you can access the OpenHands UI at `http://localhost:3000`.
149
+
150
+ #### Connect to the server
151
+
152
+ When accessing the OpenHands UI, you will be prompted to connect to a server. You can use the advanced mode to connect to the server you launched earlier.
153
+
154
+ Fill the following fields:
155
+ - **Custom Model**: `openai/mistralai/Devstral-Small-2505`
156
+ - **Base URL**: `http://<your-server-url>:8000/v1`
157
+ - **API Key**: `token` (or any other token you used to launch the server if any)
158
+
159
+ #### Use OpenHands powered by Devstral
160
+
161
+ Now you're good to use Devstral Small inside OpenHands by **starting a new conversation**. Let's build a To-Do list app.
162
+
163
+ <details>
164
+ <summary>To-Do list app</summary
165
+
166
+ 1. Let's ask Devstral to generate the app with the following prompt:
167
+
168
+ ```txt
169
+ Build a To-Do list app with the following requirements:
170
+ - Built using FastAPI and React.
171
+ - Make it a one page app that:
172
+ - Allows to add a task.
173
+ - Allows to delete a task.
174
+ - Allows to mark a task as done.
175
+ - Displays the list of tasks.
176
+ - Store the tasks in a SQLite database.
177
+ ```
178
+
179
+ ![Agent prompting](assets/tuto_open_hands/agent_prompting.png)
180
+
181
+
182
+ 2. Let's see the result
183
+
184
+ You should see the agent construct the app and be able to explore the code it generated.
185
+
186
+ If it doesn't do it automatically, ask Devstral to deploy the app or do it manually, and then go the front URL deployment to see the app.
187
+
188
+ ![Agent working](assets/tuto_open_hands/agent_working.png)
189
+ ![App UI](assets/tuto_open_hands/app_ui.png)
190
+
191
+
192
+ 3. Iterate
193
+
194
+ Now that you have a first result you can iterate on it by asking your agent to improve it. For example, in the app generated we could click on a task to mark it checked but having a checkbox would improve UX. You could also ask it to add a feature to edit a task, or to add a feature to filter the tasks by status.
195
+
196
+ Enjoy building with Devstral Small and OpenHands!
197
+
198
+ </details>
199
+
200
+
201
+ ### vLLM (recommended)
202
+
203
+ We recommend using this model with the [vLLM library](https://github.com/vllm-project/vllm)
204
+ to implement production-ready inference pipelines.
205
+
206
+ **_Installation_**
207
+
208
+ Make sure you install [`vLLM >= 0.8.5`](https://github.com/vllm-project/vllm/releases/tag/v0.8.5):
209
+
210
+ ```
211
+ pip install vllm --upgrade
212
+ ```
213
+
214
+ Doing so should automatically install [`mistral_common >= 1.5.5`](https://github.com/mistralai/mistral-common/releases/tag/v1.5.5).
215
+
216
+ To check:
217
+ ```
218
+ python -c "import mistral_common; print(mistral_common.__version__)"
219
+ ```
220
+
221
+ You can also make use of a ready-to-go [docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or on the [docker hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).
222
+
223
+ #### Server
224
+
225
+ We recommand that you use Devstral in a server/client setting.
226
+
227
+ 1. Spin up a server:
228
+
229
+ ```
230
+ vllm serve mistralai/Devstral-Small-2505 --tokenizer_mode mistral --config_format mistral --load_format mistral --tool-call-parser mistral --enable-auto-tool-choice --tensor-parallel-size 2
231
+ ```
232
+
233
+
234
+ 2. To ping the client you can use a simple Python snippet.
235
+
236
+ ```py
237
+ import requests
238
+ import json
239
+ from huggingface_hub import hf_hub_download
240
+
241
+
242
+ url = "http://<your-server-url>:8000/v1/chat/completions"
243
+ headers = {"Content-Type": "application/json", "Authorization": "Bearer token"}
244
+
245
+ model = "mistralai/Devstral-Small-2505"
246
+
247
+ def load_system_prompt(repo_id: str, filename: str) -> str:
248
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
249
+ with open(file_path, "r") as file:
250
+ system_prompt = file.read()
251
+ return system_prompt
252
+
253
+ SYSTEM_PROMPT = load_system_prompt(model, "SYSTEM_PROMPT.txt")
254
+
255
+ messages = [
256
+ {"role": "system", "content": SYSTEM_PROMPT},
257
+ {
258
+ "role": "user",
259
+ "content": [
260
+ {
261
+ "type": "text",
262
+ "text": "<your-command>",
263
+ },
264
+ ],
265
+ },
266
+ ]
267
+
268
+ data = {"model": model, "messages": messages, "temperature": 0.15}
269
+
270
+ response = requests.post(url, headers=headers, data=json.dumps(data))
271
+ print(response.json()["choices"][0]["message"]["content"])
272
+ ```
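
Because the endpoint is OpenAI-compatible, the same request can be made with the official `openai` Python client. This is an optional, equivalent sketch assuming `pip install openai`; it reuses the `SYSTEM_PROMPT` loaded in the snippet above.

```python
# Alternative sketch using the OpenAI-compatible client (assumes `pip install openai`).
from openai import OpenAI

client = OpenAI(base_url="http://<your-server-url>:8000/v1", api_key="token")

response = client.chat.completions.create(
    model="mistralai/Devstral-Small-2505",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},  # loaded as in the snippet above
        {"role": "user", "content": "<your-command>"},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)
```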
273
+
274
+ ### Mistral-inference
275
+
276
+ We recommend using mistral-inference to quickly try out / "vibe-check" Devstral.
277
+
278
+ #### Install
279
+
280
+ Make sure to have mistral_inference >= 1.6.0 installed.
281
+
282
+ ```bash
283
+ pip install mistral_inference --upgrade
284
+ ```
285
+
286
+ #### Download
287
+
288
+ ```python
289
+ from huggingface_hub import snapshot_download
290
+ from pathlib import Path
291
+
292
+ mistral_models_path = Path.home().joinpath('mistral_models', 'Devstral')
293
+ mistral_models_path.mkdir(parents=True, exist_ok=True)
294
+
295
+ snapshot_download(repo_id="mistralai/Devstral-Small-2505", allow_patterns=["params.json", "consolidated.safetensors", "tekken.json"], local_dir=mistral_models_path)
296
+ ```
297
+
298
+ #### Python
299
+
300
+ You can run the model using the following command:
301
+
302
+ ```bash
303
+ mistral-chat $HOME/mistral_models/Devstral --instruct --max_tokens 300
304
+ ```
305
+
306
+ You can then prompt it with anything you'd like.
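
If you prefer to drive the model from Python instead of the `mistral-chat` CLI, the sketch below shows one possible approach with `mistral_inference`. Treat it as an untested outline: it reuses `mistral_models_path` from the download step, and the exact signatures of `Transformer.from_folder` and `generate` may differ across `mistral_inference` versions.

```python
# Sketch: programmatic generation with mistral_inference (verify against your installed version).
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
from mistral_inference.generate import generate
from mistral_inference.transformer import Transformer

# Load the tokenizer and weights downloaded in the previous step.
tokenizer = MistralTokenizer.from_file(f"{mistral_models_path}/tekken.json")
model = Transformer.from_folder(mistral_models_path)

request = ChatCompletionRequest(
    messages=[UserMessage(content="Write a Python function that reverses a string.")]
)
tokens = tokenizer.encode_chat_completion(request).tokens

out_tokens, _ = generate(
    [tokens], model, max_tokens=300, temperature=0.15,
    eos_id=tokenizer.instruct_tokenizer.tokenizer.eos_id,
)
print(tokenizer.decode(out_tokens[0]))
```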
307
+
308
+ ### Transformers
309
+
310
+ To make the best use of our model with transformers make sure to have [installed](https://github.com/mistralai/mistral-common) ` mistral-common >= 1.5.5` to use our tokenizer.
311
+
312
+ ```bash
313
+ pip install mistral-common --upgrade
314
+ ```
315
+
316
+ Then load our tokenizer along with the model and generate:
317
+
318
+ ```python
319
+ import torch
320
+
321
+ from mistral_common.protocol.instruct.messages import (
322
+ SystemMessage, UserMessage
323
+ )
324
+ from mistral_common.protocol.instruct.request import ChatCompletionRequest
325
+ from mistral_common.tokens.tokenizers.mistral import MistralTokenizer
326
+ from mistral_common.tokens.tokenizers.tekken import SpecialTokenPolicy
327
+ from huggingface_hub import hf_hub_download
328
+ from transformers import AutoModelForCausalLM
329
+
330
+ def load_system_prompt(repo_id: str, filename: str) -> str:
331
+ file_path = hf_hub_download(repo_id=repo_id, filename=filename)
332
+ with open(file_path, "r") as file:
333
+ system_prompt = file.read()
334
+ return system_prompt
335
+
336
+ model_id = "mistralai/Devstral-Small-2505"
337
+ tekken_file = hf_hub_download(repo_id=model_id, filename="tekken.json")
338
+ SYSTEM_PROMPT = load_system_prompt(model_id, "SYSTEM_PROMPT.txt")
339
+
340
+ tokenizer = MistralTokenizer.from_file(tekken_file)
341
+
342
+ model = AutoModelForCausalLM.from_pretrained(model_id)
343
+
344
+ tokenized = tokenizer.encode_chat_completion(
345
+ ChatCompletionRequest(
346
+ messages=[
347
+ SystemMessage(content=SYSTEM_PROMPT),
348
+ UserMessage(content="<your-command>"),
349
+ ],
350
+ )
351
+ )
352
+
353
+ output = model.generate(
354
+ input_ids=torch.tensor([tokenized.tokens]),
355
+ max_new_tokens=1000,
356
+ )[0]
357
+
358
+ decoded_output = tokenizer.decode(output[len(tokenized.tokens):])
359
+ print(decoded_output)
360
+ ```
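
Note that a 24B-parameter model in full precision will not fit on most single GPUs. A common adjustment, not part of the snippet above and assuming `accelerate` is installed, is to load the weights in bfloat16 and let transformers place them across the available devices:

```python
# Optional sketch: half-precision, multi-device loading (assumes `pip install accelerate`).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory vs. fp32
    device_map="auto",           # spread layers across available GPUs/CPU
)
# Remember to move the inputs accordingly, e.g.:
# input_ids = torch.tensor([tokenized.tokens]).to(model.device)
```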
361
+
362
+ ### LMStudio
363
+ Download the weights from huggingface:
364
+
365
+ ```
366
+ pip install -U "huggingface_hub[cli]"
367
+ huggingface-cli download \
368
+ "mistralai/Devstral-Small-2505_gguf" \
369
+ --include "devstralQ4_K_M.gguf" \
370
+ --local-dir "mistralai/Devstral-Small-2505_gguf/"
371
+ ```
372
+
373
+ You can serve the model locally with [LMStudio](https://lmstudio.ai/).
374
+ * Download [LM Studio](https://lmstudio.ai/) and install it
375
+ * Install `lms cli ~/.lmstudio/bin/lms bootstrap`
376
+ * In a bash terminal, run `lms import devstralQ4_K_M.gguf` in the directory where you've downloaded the model checkpoint (e.g. `mistralai/Devstral-Small-2505_gguf`)
377
+ * Open the LMStudio application, click the terminal icon to get into the developer tab. Click select a model to load and select Devstral Q4 K M. Toggle the status button to start the model, in setting toggle Serve on Local Network to be on.
378
+ * On the right tab, you will see an API identifier which should be devstralq4_k_m and an api address under API Usage. Keep note of this address, we will use it in the next step.
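
Before wiring OpenHands to LM Studio, you can check that the local endpoint responds. The sketch below assumes the OpenAI-compatible API address noted in the last step (shown here as a placeholder) and the `devstralq4_k_m` identifier.

```python
# Sketch: quick check that LM Studio is serving the model (replace the address with yours).
import requests

base_url = "http://<lmstudio-api-address>/v1"  # API address from the LM Studio developer tab

payload = {
    "model": "devstralq4_k_m",
    "messages": [{"role": "user", "content": "Say hello in one word."}],
    "temperature": 0.15,
}
response = requests.post(f"{base_url}/chat/completions", json=payload)
print(response.json()["choices"][0]["message"]["content"])
```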
379
+
380
+ Launch Openhands
381
+ You can now interact with the model served from LM Studio with openhands. Start the openhands server with the docker
382
+
383
+ ```bash
384
+ docker pull docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik
385
+ docker run -it --rm --pull=always \
386
+ -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:0.38-nikolaik \
387
+ -e LOG_ALL_EVENTS=true \
388
+ -v /var/run/docker.sock:/var/run/docker.sock \
389
+ -v ~/.openhands-state:/.openhands-state \
390
+ -p 3000:3000 \
391
+ --add-host host.docker.internal:host-gateway \
392
+ --name openhands-app \
393
+ docker.all-hands.dev/all-hands-ai/openhands:0.38
394
+ ```
395
+
396
+ Click “see advanced setting” on the second line.
397
+ In the new tab, toggle advanced to on. Set the custom model to be mistral/devstralq4_k_m and Base URL the api address we get from the last step in LM Studio. Set API Key to dummy. Click save changes.
398
+
399
+
400
+ ### Ollama
401
+
402
+ You can run Devstral using the [Ollama](https://ollama.ai/) CLI.
403
+
404
+ ```bash
405
+ ollama run devstral
406
+ ```
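
Ollama also exposes an OpenAI-compatible endpoint (by default on port 11434), so you can point OpenHands or a quick script at it. The snippet below is a small sketch under that default-port assumption.

```python
# Sketch: query the locally running Ollama model through its OpenAI-compatible endpoint.
import requests

payload = {
    "model": "devstral",
    "messages": [{"role": "user", "content": "Summarize what you are good at in one sentence."}],
}
response = requests.post("http://localhost:11434/v1/chat/completions", json=payload)
print(response.json()["choices"][0]["message"]["content"])
```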