How do you convert from GGUF to AWQ
Can you explain the process, or share a notebook/link so it can be reproduced with other models? Do you need a lot of GPU for the conversion?
Best regards!
It was an ad-hoc script, so I didn't keep it around. No GPU is needed; it just re-packs the weights in a different format. Everything runs on the CPU.
I wrote my own library to read GGUF files: https://github.com/gau-nernst/gguf-pytorch. After that, it's just a matter of mapping the weight names and the weight layout.
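To make the read-and-remap step concrete, here is a minimal sketch using the `gguf` package from llama.cpp (`pip install gguf`) rather than the author's gguf-pytorch reader, whose API I won't assume. The name map is an illustrative subset for a Llama-style model, not a complete or verified mapping, and the filename is hypothetical:

```python
# Read GGUF tensors on CPU and remap their names toward a HF-style layout.
from gguf import GGUFReader

# Illustrative GGUF -> Hugging Face name fragments (assumption; verify per model).
NAME_MAP = {
    "token_embd.weight": "model.embed_tokens.weight",
    "output_norm.weight": "model.norm.weight",
    "output.weight": "lm_head.weight",
}
SUFFIX_MAP = {
    "attn_q.weight": "self_attn.q_proj.weight",
    "attn_k.weight": "self_attn.k_proj.weight",
    "attn_v.weight": "self_attn.v_proj.weight",
    "attn_output.weight": "self_attn.o_proj.weight",
    "ffn_gate.weight": "mlp.gate_proj.weight",
    "ffn_up.weight": "mlp.up_proj.weight",
    "ffn_down.weight": "mlp.down_proj.weight",
    "attn_norm.weight": "input_layernorm.weight",
    "ffn_norm.weight": "post_attention_layernorm.weight",
}

def map_name(gguf_name: str) -> str:
    if gguf_name in NAME_MAP:
        return NAME_MAP[gguf_name]
    # e.g. "blk.0.attn_q.weight" -> "model.layers.0.self_attn.q_proj.weight"
    _, layer, suffix = gguf_name.split(".", 2)
    return f"model.layers.{layer}.{SUFFIX_MAP[suffix]}"

reader = GGUFReader("gemma3-qat-q4_0.gguf")  # hypothetical filename
for t in reader.tensors:
    # t.data is the raw, still-packed buffer; t.tensor_type is the GGML quant type.
    print(map_name(t.name), t.tensor_type, t.shape)
```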
This would be a great tool. A lot of models are available in GGUF format but not in AWQ, and AWQ in vLLM is much faster thanks to the Marlin kernels.
I can try to find that script again and share it with you. But the problem is that most GGUFs nowadays are not directly convertible to AWQ. Only Gemma3-QAT can be converted, because it uses Q4_0, which is falling out of favour in the GGUF community; Q4_0's simple per-block scale with a fixed zero point maps directly onto AWQ's group quantization, whereas formats like Q4_K_M, with their two-level super-block scales and mins, are not compatible.
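For context on why Q4_0 works: each Q4_0 block is 18 bytes, a float16 scale `d` followed by 16 bytes holding 32 packed 4-bit quants, dequantized as `w = d * (q - 8)`. That is exactly AWQ-style group quantization with `group_size=32` and a constant zero point of 8, so the repack is lossless; the remaining work is packing into AWQ's `qweight`/`qzeros`/`scales` layout. A minimal NumPy sketch of the block decode, following the llama.cpp reference layout (a sketch, not the author's script):

```python
import numpy as np

def dequant_q4_0(raw: np.ndarray, n_elements: int) -> np.ndarray:
    """raw: flat uint8 buffer of Q4_0 blocks (18 bytes per 32 weights)."""
    blocks = raw.reshape(-1, 18)
    # In AWQ terms: `d` is the per-group scale, the zero point is always 8.
    d = blocks[:, :2].copy().view(np.float16).astype(np.float32)  # (n_blocks, 1)
    qs = blocks[:, 2:]                                            # (n_blocks, 16)
    # Low nibbles are elements 0..15 of the block, high nibbles are 16..31.
    lo = (qs & 0x0F).astype(np.int8) - 8
    hi = (qs >> 4).astype(np.int8) - 8
    q = np.concatenate([lo, hi], axis=1)                          # (n_blocks, 32)
    return (d * q).reshape(-1)[:n_elements]

# Usage with a GGUFReader tensor `t` (assuming t.data is the packed uint8 buffer):
# w = dequant_q4_0(np.asarray(t.data, dtype=np.uint8).reshape(-1), int(t.n_elements))
```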