INC4AI committed
commit 7a4cbc9 · verified · 1 parent: 3534960

Update vllm eval results

Files changed (1): README.md (+27 -0)

README.md CHANGED
@@ -10,6 +10,13 @@ This model is a int4 model with group_size 128 and symmetric quantization of [de
 Please follow the license of the original model.
 
 ## How To Use
+
+### vLLM usage
+
+~~~bash
+vllm serve Intel/DeepSeek-V3.1-int4-AutoRound
+~~~
+
 ### INT4 Inference
 Potential overflow/underflow issues have been observed on CUDA, primarily due to kernel limitations.
 For better accuracy, we recommend deploying the model on CPU or using [our INT4 mixed version](https://huggingface.co/Intel/DeepSeek-V3.1-int4-mixed-AutoRound)
 
@@ -168,6 +175,26 @@ autoround = AutoRound(model=model, tokenizer=tokenizer, device_map=device_map, n
 autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
 ```
 
+## Evaluation Results
+
+| benchmark | backend | Intel/DeepSeek-V3.1-int4-AutoRound | deepseek-ai/DeepSeek-V3.1 |
+| :-------: | :-----: | :--------------------------------: | :-----------------------: |
+| mmlu_pro | vllm | 0.7865 | 0.7965 |
+
+```
+# key dependency versions
+torch 2.8.0
+transformers 4.56.2
+lm_eval 0.4.9.1
+vllm 0.10.2rc3.dev291+g535d80056.precompiled
+
+# eval cmd
+CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
+lm_eval --model vllm \
+  --model_args pretrained=Intel/DeepSeek-V3.1-int4-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
+  --tasks mmlu_pro \
+  --batch_size 64
+```
+
 ## Ethical Considerations and Limitations
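
The `vllm serve` command added in this commit starts an OpenAI-compatible HTTP server, which listens on port 8000 by default. As a minimal client sketch, assuming the server is running locally, one could query it with just the Python standard library; the helper names (`build_chat_request`, `query`) and the prompt are illustrative, not part of this repository:

```python
# Minimal sketch of a client for the server started by:
#   vllm serve Intel/DeepSeek-V3.1-int4-AutoRound
# Assumes the default port 8000 and the OpenAI chat-completions schema
# that vLLM exposes at /v1/chat/completions.
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> bytes:
    # Build an OpenAI-style chat-completions payload as JSON bytes.
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }
    return json.dumps(payload).encode("utf-8")


def query(prompt: str, base_url: str = "http://localhost:8000") -> str:
    # POST the request and return the assistant message text.
    req = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=build_chat_request("Intel/DeepSeek-V3.1-int4-AutoRound", prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Any OpenAI-compatible client (for example the official `openai` Python package pointed at `http://localhost:8000/v1`) would work equally well; the standard-library version above just avoids an extra dependency.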