So, would you consider building a GPU version of this model?

#1
by kq - opened

Your models test at better precision than other quantized models.
Would you consider converting this model to a GPU version, borrowing the idea from this project: https://huggingface.co/ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g?
QwQ-32B and Gemma-3 test as the best local models in the world, so it's important that this leading model be distributed widely.

Open Platform for Enterprise AI org

Thank you for your kind words and interest in our models! The main issue is not with packing but rather with the kernel or framework: the model is already packed in GPTQ format, so if the framework supports running the unquantized layers in BF16, it should work as expected. You can update quant_method in config.json to gptq and remove the backend key. As long as Transformers or vLLM can run a GPTQ model with BF16 precision for the unquantized layers, it should be fine.
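
For reference, here is a minimal sketch of that config.json edit. It assumes the quantization settings live under the standard quantization_config key and that the model path is a local copy of the repo; the exact key names in this particular checkpoint may differ, so check your config.json first.

```python
import json

# Hypothetical path to your local copy of the model snapshot; adjust as needed.
config_path = "path/to/model/config.json"

with open(config_path) as f:
    config = json.load(f)

# Assumes the settings sit under the standard "quantization_config" key.
quant_cfg = config.get("quantization_config", {})
quant_cfg["quant_method"] = "gptq"   # declare the packing as GPTQ, as suggested above
quant_cfg.pop("backend", None)       # drop the backend key if present

config["quantization_config"] = quant_cfg

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

Whether this then runs end to end depends on the kernel and framework, as noted above: Transformers or vLLM must be able to load the GPTQ-packed layers while keeping the unquantized layers in BF16.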

Given these considerations and our limited team size, we currently do not plan to support this.
