Update vllm eval results
README.md (CHANGED)
@@ -10,6 +10,13 @@ This model is an INT4 model with group_size 128 and symmetric quantization of deepseek-ai/DeepSeek-V3.1.
Please follow the license of the original model.

## How To Use

### vLLM usage

~~~bash
vllm serve Intel/DeepSeek-V3.1-int4-AutoRound
~~~

### INT4 Inference

Potential overflow/underflow issues have been observed on CUDA, primarily due to kernel limitations. For better accuracy, we recommend deploying the model on CPU or using [our INT4 mixed version](https://huggingface.co/Intel/DeepSeek-V3.1-int4-mixed-AutoRound).
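`vllm serve` exposes an OpenAI-compatible HTTP API, by default on port 8000. As a minimal sketch (not part of the model card), here is a chat-completion request against that endpoint using only the Python standard library; the port, prompt, and `max_tokens` value are illustrative assumptions:

```python
import json
from urllib import request

# vLLM's OpenAI-compatible chat endpoint; 8000 is the default port
# (assumes `vllm serve` was started without a --port override).
URL = "http://localhost:8000/v1/chat/completions"

payload = {
    "model": "Intel/DeepSeek-V3.1-int4-AutoRound",
    "messages": [
        {"role": "user", "content": "Summarize INT4 quantization in one sentence."}  # illustrative prompt
    ],
    "max_tokens": 128,  # illustrative value
}
body = json.dumps(payload).encode("utf-8")

req = request.Request(URL, data=body, headers={"Content-Type": "application/json"})

# Uncomment once the server from `vllm serve` above is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same endpoint also works with the official `openai` Python client by pointing its `base_url` at `http://localhost:8000/v1`.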
@@ -168,6 +175,26 @@
autoround.quantize_and_save(format="auto_round", output_dir="tmp_autoround")
```

## Evaluation Results

| benchmark | backend | Intel/DeepSeek-V3.1-int4-AutoRound | deepseek-ai/DeepSeek-V3.1 |
| :-------: | :-----: | :--------------------------------: | :-----------------------: |
| mmlu_pro  |  vllm   |               0.7865               |          0.7965           |

```
# key dependency versions
torch         2.8.0
transformers  4.56.2
lm_eval       0.4.9.1
vllm          0.10.2rc3.dev291+g535d80056.precompiled

# eval cmd
CUDA_VISIBLE_DEVICES=0,1,2,3 VLLM_WORKER_MULTIPROC_METHOD=spawn \
lm_eval --model vllm \
  --model_args pretrained=Intel/DeepSeek-V3.1-int4-AutoRound,dtype=bfloat16,trust_remote_code=False,tensor_parallel_size=4,gpu_memory_utilization=0.95 \
  --tasks mmlu_pro \
  --batch_size 64
```

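The table above implies a small quality gap between the INT4 model and the BF16 baseline; a quick back-of-the-envelope check of the absolute drop and relative accuracy retention (plain arithmetic on the reported scores, nothing model-specific):

```python
# mmlu_pro scores from the table above
int4_acc = 0.7865  # Intel/DeepSeek-V3.1-int4-AutoRound
bf16_acc = 0.7965  # deepseek-ai/DeepSeek-V3.1 (baseline)

absolute_drop = bf16_acc - int4_acc       # accuracy points lost to quantization
relative_retention = int4_acc / bf16_acc  # fraction of baseline accuracy kept

print(f"absolute drop:      {absolute_drop:.4f}")
print(f"relative retention: {relative_retention:.2%}")
```

That is about one accuracy point, i.e. roughly 98.7% of the baseline mmlu_pro score retained.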
## Ethical Considerations and Limitations