GPTQ or AWQ Quants
#12 by guialfaro
Are there any GPTQ or AWQ quants available?
You can use llama.cpp to run the GGUF quantizations of the model.
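In case it helps, here is a minimal sketch of that route using the llama-cpp-python bindings (the bindings and the quant filename are my assumptions; the post above only mentions llama.cpp itself):

```python
# Minimal sketch: run a GGUF quant via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4-32B-0414-Q4_K_M.gguf",  # hypothetical filename; point this at your downloaded GGUF
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if llama.cpp was built with GPU support
)

out = llm("Explain the difference between AWQ and GPTQ in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```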
@xldistance AWQ delivers better accuracy at equivalent quantization levels, even compared to higher bit-width GGUF quants, and vLLM/SGLang (which support AWQ) are significantly faster than llama.cpp or Ollama (which serve GGUF).
@guialfaro I've uploaded one here https://huggingface.co/mratsim/GLM-4-32B-0414.w4a16-gptq
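For anyone who wants to try that upload, a minimal sketch of loading it with vLLM (vLLM normally auto-detects the quantization scheme from the repo's config, so no extra arguments should be needed):

```python
# Minimal sketch: run the linked w4a16 GPTQ quant with vLLM.
from vllm import LLM, SamplingParams

# Quantization is picked up from the repo's quantization_config automatically.
llm = LLM(model="mratsim/GLM-4-32B-0414.w4a16-gptq")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is GPTQ quantization?"], params)
print(outputs[0].outputs[0].text)
```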