GPTQ or AWQ Quants
#12 by guialfaro
Are there any GPTQ or AWQ quants available?
You can use llama.cpp to run the GGUF quantizations of the model.
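In case it helps, here is a minimal sketch of that route using the llama-cpp-python bindings (the bindings and the quant filename are my assumptions; the post above only mentions llama.cpp itself):

```python
# Minimal sketch: run a GGUF quant via llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="GLM-4-32B-0414-Q4_K_M.gguf",  # hypothetical filename; point this at your downloaded GGUF
    n_ctx=4096,        # context window size
    n_gpu_layers=-1,   # offload all layers to GPU if llama.cpp was built with GPU support
)

out = llm("Explain the difference between AWQ and GPTQ in one sentence.", max_tokens=128)
print(out["choices"][0]["text"])
```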
@xldistance AWQ delivers better accuracy at equivalent quantization levels, even compared to higher bit-width GGUF quants, and vLLM/SGLang (which support AWQ) are significantly faster than llama.cpp or Ollama (which serve GGUF).
@guialfaro I've uploaded one here https://huggingface.co/mratsim/GLM-4-32B-0414.w4a16-gptq
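For anyone who wants to try that upload, a minimal sketch of loading it with vLLM (vLLM normally auto-detects the quantization scheme from the repo's config, so no extra arguments should be needed):

```python
# Minimal sketch: run the linked w4a16 GPTQ quant with vLLM.
from vllm import LLM, SamplingParams

# Quantization is picked up from the repo's quantization_config automatically.
llm = LLM(model="mratsim/GLM-4-32B-0414.w4a16-gptq")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["What is GPTQ quantization?"], params)
print(outputs[0].outputs[0].text)
```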