---
license: mit
language:
- en
- zh
base_model:
- deepseek-ai/DeepSeek-V3
pipeline_tag: text-generation
library_name: transformers
---
# DeepSeek V3 AWQ
AWQ of the DeepSeek V3 chat model.
This quant modifies some of the model code to fix an overflow issue that occurs when running the model in float16.
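The modified code itself isn't shown here. As an illustration only, a common fix for this class of float16 overflow is to clamp intermediate activations into float16's finite range; the helper below is a hypothetical sketch of that pattern, not the actual change shipped in this quant:

```python
import torch

# Largest finite float16 value (65504.0); values beyond this become inf.
FP16_MAX = torch.finfo(torch.float16).max


def clamp_fp16(hidden_states: torch.Tensor) -> torch.Tensor:
    """Clamp activations into float16's finite range to avoid inf/NaN.

    Hypothetical helper illustrating one common overflow fix; the actual
    modification in this quant's model code may differ.
    """
    if hidden_states.dtype == torch.float16:
        hidden_states = torch.clamp(hidden_states, min=-FP16_MAX, max=FP16_MAX)
    return hidden_states
```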
Tested on vLLM with 8x H100: inference speed is 5 tokens per second with batch size 1 and a short prompt, rising to 12 tokens per second with the `moe_wna16` kernel.
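For reference, below is a minimal sketch of loading the quant with vLLM's offline Python API, assuming a vLLM release that supports the `moe_wna16` quantization method and 8 GPUs; the model path is a placeholder for this repo's id or a local download:

```python
from vllm import LLM, SamplingParams

# Placeholder path: substitute this repo's id or a local checkout.
llm = LLM(
    model="path/to/DeepSeek-V3-AWQ",
    tensor_parallel_size=8,    # 8x H100 as in the benchmark above
    quantization="moe_wna16",  # fused MoE kernel for the AWQ weights
    dtype="float16",
    trust_remote_code=True,    # load the modified model code
)

outputs = llm.generate(
    ["Who are you?"],
    SamplingParams(temperature=0.6, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

Omitting `quantization="moe_wna16"` should still load the AWQ weights via the default AWQ path, at the lower throughput noted above.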