This is the AWQ (4-bit) quantized version of Qwen3-1.7B, created with AutoAWQ using the following config. It offers very low GPU memory usage, high throughput, and fast response times.

{
  "zero_point": True,
  "q_group_size": 16,
  "w_bit": 4,
  "version": "GEMM"
}

P.S. q_group_size is set to 16 for higher accuracy.
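
For reference, here is a minimal sketch of how such a quantization is produced with AutoAWQ, assuming its standard AutoAWQForCausalLM API (the output path is illustrative):

    # pip install autoawq
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    model_path = "Qwen/Qwen3-1.7B"   # base model
    quant_path = "Qwen3-1.7B-AWQ"    # output directory (illustrative)
    quant_config = {"zero_point": True, "q_group_size": 16, "w_bit": 4, "version": "GEMM"}

    # Load the full-precision model and tokenizer
    model = AutoAWQForCausalLM.from_pretrained(model_path)
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Run AWQ calibration/quantization with the config above, then save
    model.quantize(tokenizer, quant_config=quant_config)
    model.save_quantized(quant_path)
    tokenizer.save_pretrained(quant_path)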

LLM serving

  • This model does not work with vLLM, because the minimum group size vLLM supports is 32. Please see the model 'Qwen3-1.7B-AWQ-Group32' for vLLM.

  • It is recommended to serve the model with lmdeploy, which works very well; see the serving sketch after the install command below.

    It is recommended to install lmdeploy from git rather than from the official pip repository, because version 0.8.0 on pip has problems working with the latest transformers library.

    pip install git+https://github.com/InternLM/lmdeploy
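
Once lmdeploy is installed, serving is a single command. A sketch, assuming lmdeploy's api_server subcommand and its awq model-format option:

    # Start an OpenAI-compatible API server for this AWQ model
    lmdeploy serve api_server flin775/Qwen3-1.7B-AWQ --model-format awq

Clients can then send requests to the server's OpenAI-compatible chat completions endpoint.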
    