So, would you consider building a GPU version of this model?

#1
by kq - opened

Your models test at better precision than other quantized models.
Would you consider converting this model to a GPU version, borrowing the idea from this project: https://huggingface.co/ISTA-DASLab/gemma-3-27b-it-GPTQ-4b-128g?
QwQ-32B and Gemma-3 test as the best local models in the world, so it's important that this leading model be distributed widely.

Open Platform for Enterprise AI org

Thank you for your kind words and interest in our models! The main issue is not with packing but rather with the kernel or framework: the model is already packed in GPTQ format, so if the framework supports running the unquantized layers in BF16, it should work as expected. You can update quant_method in config.json to gptq and remove the backend key. As long as Transformers or vLLM can run a GPTQ model with BF16 precision for the unquantized layers, it should be fine.
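
For reference, here is a minimal sketch of that config.json edit. It assumes the quantization settings live under the standard quantization_config key and that the model path is a local copy of the repo; the exact key names in this particular checkpoint may differ, so check your config.json first.

```python
import json

# Hypothetical path to your local copy of the model snapshot; adjust as needed.
config_path = "path/to/model/config.json"

with open(config_path) as f:
    config = json.load(f)

# Assumes the settings sit under the standard "quantization_config" key.
quant_cfg = config.get("quantization_config", {})
quant_cfg["quant_method"] = "gptq"   # declare the packing as GPTQ, as suggested above
quant_cfg.pop("backend", None)       # drop the backend key if present

config["quantization_config"] = quant_cfg

with open(config_path, "w") as f:
    json.dump(config, f, indent=2)
```

Whether this then runs end to end depends on the kernel and framework, as noted above: Transformers or vLLM must be able to load the GPTQ-packed layers while keeping the unquantized layers in BF16.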

Given these considerations and our limited team size, we currently do not plan to support this.
