Update vllm eval results
README.md (CHANGED)
@@ -10,6 +10,13 @@ This model is an INT4 model with group_size 128 and symmetric quantization of deepseek-ai/DeepSeek-V3.1.
Please follow the license of the original model.

## How To Use

### vLLM usage

~~~bash
vllm serve Intel/DeepSeek-V3.1-int4-AutoRound
~~~

### INT4 Inference

Potential overflow/underflow issues have been observed on CUDA, primarily due to kernel limitations. For better accuracy, we recommend deploying the model on CPU or using [our INT4 mixed version](https://huggingface.co/Intel/DeepSeek-V3.1-int4-mixed-AutoRound).
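`vllm serve` exposes an OpenAI-compatible HTTP API, by default on port 8000. As a minimal sketch (not part of the model card), here is a chat-completion request against that endpoint using only the Python standard library; the port, prompt, and `max_tokens` value are illustrative assumptions:

```python
import json
from urllib import request

# vLLM's OpenAI-compatible chat endpoint; 8000 is the default port
# (assumes `vllm serve` was started without a --port override).
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "Intel/DeepSeek-V3.1-int4-AutoRound",
    "messages": [
        {"role": "user", "content": "Summarize INT4 quantization in one sentence."}  # illustrative prompt
    ],
    "max_tokens": 128,  # illustrative value
}
body = json.dumps(payload).encode("utf-8")

req = request.Request(URL, data=body, headers={"Content-Type": "application/json"})

# Uncomment once the server from `vllm serve` above is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same endpoint also works with the official `openai` Python client by pointing its `base_url` at `http://localhost:8000/v1`.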
@@ -168,6 +175,26 @@
autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
```

## Evaluation Results

| benchmark | backend | Intel/DeepSeek-V3.1-int4-AutoRound | deepseek-ai/DeepSeek-V3.1 |
| :-------: | :-----: | :--------------------------------: | :-----------------------: |
| mmlu_pro  |  vllm   |               0.7865               |          0.7965           |

```
# key dependency versions
torch         2.8.0
transformers  4.56.2
lm_eval       0.4.9.1
vllm          0.10.2rc3.dev291+g535d80056.precompiled

# eval cmd
CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
lm_eval --model vllm \
  --model_args pretrained=Intel/DeepSeek-V3.1-int4-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
  --tasks mmlu_pro \
  --batch_size 64
```

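The table above implies a small quality gap between the INT4 model and the BF16 baseline; a quick back-of-the-envelope check of the absolute drop and relative accuracy retention (plain arithmetic on the reported scores, nothing model-specific):

```python
# mmlu_pro scores from the table above
int4_acc = 0.7865  # Intel/DeepSeek-V3.1-int4-AutoRound
bf16_acc = 0.7965  # deepseek-ai/DeepSeek-V3.1 (baseline)

absolute_drop = bf16_acc - int4_acc       # accuracy points lost to quantization
relative_retention = int4_acc / bf16_acc  # fraction of baseline accuracy kept

print(f"absolute drop:      {absolute_drop:.4f}")
print(f"relative retention: {relative_retention:.2%}")
```

That is about one accuracy point, i.e. roughly 98.7% of the baseline mmlu_pro score retained.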
## Ethical Considerations and Limitations