How do you convert from GGUF to AWQ
Can you explain the process, or share a notebook/link so it can be reproduced with other models? Do you need a lot of GPU for the conversion?
Best regards!
It was an ad-hoc script, so I didn't keep it around. No GPU is needed; it just re-packs the weights in a different format. Everything runs on the CPU.
I wrote my own library to read GGUF files: https://github.com/gau-nernst/gguf-pytorch. After that, it's just a matter of mapping the weight names and the weight layout.
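To make the read-and-remap step concrete, here is a minimal sketch using the `gguf` package from llama.cpp (`pip install gguf`) rather than the author's gguf-pytorch reader, whose API I won't assume. The name map is an illustrative subset for a Llama-style model, not a complete or verified mapping, and the filename is hypothetical:

```python
# Read GGUF tensors on CPU and remap their names toward a HF-style layout.
from gguf import GGUFReader

# Illustrative GGUF -> Hugging Face name fragments (assumption; verify per model).
NAME_MAP = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}
SUFFIX_MAP = {
    "attn_q.weight": "self_attn.q_proj.weight",
    "attn_k.weight": "self_attn.k_proj.weight",
    "attn_v.weight": "self_attn.v_proj.weight",
    "attn_output.weight": "self_attn.o_proj.weight",
    "ffn_gate.weight": "mlp.gate_proj.weight",
    "ffn_up.weight": "mlp.up_proj.weight",
    "ffn_down.weight": "mlp.down_proj.weight",
    "attn_norm.weight": "input_layernorm.weight",
    "ffn_norm.weight": "post_attention_layernorm.weight",
}

def map_name(gguf_name: str) -> str:
    if gguf_name in NAME_MAP:
        return NAME_MAP[gguf_name]
    # e.g. "blk.0.attn_q.weight" -> "model.layers.0.self_attn.q_proj.weight"
    _, layer, suffix = gguf_name.split(".", 2)
    return f"model.layers.{layer}.{SUFFIX_MAP[suffix]}"

reader = GGUFReader("gemma3-qat-q4_0.gguf")  # hypothetical filename
for t in reader.tensors:
    # t.data is the raw, still-packed buffer; t.tensor_type is the GGML quant type.
    print(map_name(t.name), t.tensor_type, t.shape)
```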
This would be a great tool. A lot of models are available in GGUF format but not in AWQ, and AWQ in vLLM is much faster thanks to the Marlin kernels.
I can try to find that script again and share it with you. But the problem is that most GGUFs nowadays are not directly convertible to AWQ. Only Gemma3-QAT can be converted, because it uses Q4_0, which is falling out of favour in the GGUF community; Q4_0's simple per-block scale with a fixed zero point maps directly onto AWQ's group quantization, whereas formats like Q4_K_M, with their two-level super-block scales and mins, are not compatible.
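For context on why Q4_0 works: each Q4_0 block is 18 bytes, a float16 scale `d` followed by 16 bytes holding 32 packed 4-bit quants, dequantized as `w = d * (q - 8)`. That is exactly AWQ-style group quantization with `group_size=32` and a constant zero point of 8, so the repack is lossless; the remaining work is packing into AWQ's `qweight`/`qzeros`/`scales` layout. A minimal NumPy sketch of the block decode, following the llama.cpp reference layout (a sketch, not the author's script):

```python
import numpy as np

def dequant_q4_0(raw: np.ndarray, n_elements: int) -> np.ndarray:
    """raw: flat uint8 buffer of Q4_0 blocks (18 bytes per 32 weights)."""
    blocks = raw.reshape(-1, 18)
    # In AWQ terms: `d` is the per-group scale, the zero point is always 8.
    d = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # (n_blocks, 1)
    qs = blocks[:, 2:]                                            # (n_blocks, 16)
    # Low nibbles are elements 0..15 of the block, high nibbles are 16..31.
    lo = (qs & 0x0F).astype(np.int8) - 8
    hi = (qs >> 4).astype(np.int8) - 8
    q = np.concatenate([lo, hi], axis=1)                          # (n_blocks, 32)
    return (d * q).reshape(-1)[:n_elements]

# Usage with a GGUFReader tensor `t` (assuming t.data is the packed uint8 buffer):
# w = dequant_q4_0(np.asarray(t.data, dtype=np.uint8).reshape(-1), int(t.n_elements))
```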