Update README.md
README.md
@@ -13,4 +13,4 @@ AWQ of the DeepSeek V3 chat model.
 
 This quant modified some of the model code to fix the overflow issue when using float16.
 
-Tested on vLLM with 8x H100, inference speed 5 tokens
+Tested on vLLM with 8x H100, inference speed 5 tokens per second with batch size 1 and short prompt, 12 tokens per second when using `moe_wna16` kernel.