This is the AWQ (4-bit) quantized version of Qwen3-1.7B, created with AutoAWQ using the following config. It offers very low GPU memory usage, high throughput, and fast response times.

{
  "zero_point": True,
  "q_group_size": 16,
  "w_bit": 4,
  "version": "GEMM"
}

P.S. q_group_size is set to 16 for higher accuracy.
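
For reference, here is a minimal sketch of how such a quantization is produced with AutoAWQ, assuming its standard AutoAWQForCausalLM API (the output path is illustrative):

    # pip install autoawq
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = "Qwen/Qwen3-1.7B"   # base model
    quant_path = "Qwen3-1.7B-AWQ"    # output directory (illustrative)
    quant_config = {"zero_point": True, "q_group_size": 16, "w_bit": 4, "version": "GEMM"}

    # Load the full-precision model and tokenizer
    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Run AWQ calibration/quantization with the config above, then save
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)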

LLM serving

  • This model does not work with vLLM, because the minimum group size vLLM supports is 32. Please see the model 'Qwen3-1.7B-AWQ-Group32' for vLLM.

  • It is recommended to serve the model with lmdeploy, which works very well; see the serving sketch after the install command below.

    It is recommended to install lmdeploy from git rather than from the official pip repository, because version 0.8.0 on pip has problems working with the latest transformers library.

    pip install git+https://github.com/InternLM/lmdeploy
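
Once lmdeploy is installed, serving is a single command. A sketch, assuming lmdeploy's api_server subcommand and its awq model-format option:

    # Start an OpenAI-compatible API server for this AWQ model
    lmdeploy serve api_server flin775/Qwen3-1.7B-AWQ --model-format awq

Clients can then send requests to the server's OpenAI-compatible chat completions endpoint.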
    