Fine-tuned Gemma-3-1b model produces gibberish/empty output after quantization (GPTQ/AWQ/BitsAndBytes all fail)
Environment:
Model: google/gemma-3-1b-pt fine-tuned with Unsloth LoRA (r=8), trained with the ChatML format (since this is a pretrained, not instruction-tuned, model)
Full-precision model: works perfectly, produces the expected responses
Hardware: NVIDIA L40S, 48 GB VRAM
Issue:
After fine-tuning with Unsloth LoRA and merging the weights, every quantization method fails, while the full-precision merged model works perfectly.
Quantization Results:
AWQ (W4A16, W8A16): produces repetitive gibberish and loops endlessly
GPTQ (W4A16, W8A8): outputs all zeros immediately with no actual computation (returns in 20-30 s vs ~1 min for the full-precision model)
BitsAndBytes (4-bit, 8-bit): gibberish output with repetition loops for 8-bit, blank output for 4-bit
All methods were tried with and without ignore=["lm_head"] (a minimal BitsAndBytes loading sketch is shown below)
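For reference, the BitsAndBytes attempt was roughly along these lines (a minimal sketch, not the exact script; the checkpoint path and prompt are placeholders):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MERGED_PATH = "path/to/gemma-3-1b-lora-merged"  # placeholder for the local merged checkpoint

# 4-bit NF4 config; the 8-bit run used load_in_8bit=True instead
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MERGED_PATH)
model = AutoModelForCausalLM.from_pretrained(
    MERGED_PATH,
    quantization_config=bnb_config,
    device_map="auto",
)

# ChatML-style prompt, matching the fine-tuning format
prompt = "<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```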
Debugging Done:
Tested different generation parameters (temperature, repetition_penalty, sampling)
Tried various prompt formats (ChatML, simple text)
Verified the model dtype still shows torch.float16 even after "quantization", suggesting the quantization silently failed (see the dtype-check sketch after this list)
Full precision model generates proper responses in ~1 minute
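For the dtype check, this is roughly what I ran (a small sketch assuming model is the BitsAndBytes-loaded model from the sketch above; a correctly quantized model should show Linear4bit/Linear8bitLt modules instead of plain Linear):

```python
from collections import Counter

# Count module types: after BitsAndBytes quantization you should see
# Linear4bit (4-bit) or Linear8bitLt (8-bit) replacing most nn.Linear layers.
module_types = Counter(type(m).__name__ for m in model.modules())
print(module_types)

# Spot-check parameter dtypes: lm_head / embeddings may legitimately stay in
# fp16/bf16, but attention and MLP projections should not if quantization applied.
for name, param in model.named_parameters():
    if any(key in name for key in ("q_proj", "down_proj", "lm_head")):
        print(f"{name}: {param.dtype}")
```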
Are there quantization parameters specifically recommended for LoRA-merged models, or should quantization-aware training be used instead of post-training quantization for fine-tuned models?
Any guidance on successful quantization of fine-tuned Gemma models would be appreciated.
Thanks!
Hi,
Thanks for sharing the detailed description of your issue. Quantizing LoRA-merged models, especially language models like google/gemma-3-1b-pt, can indeed be challenging for several reasons.
I recommend trying quantization-aware training (QAT) instead, which helps the model adapt to lower precision during fine-tuning. Also, make sure you are on recent versions of quantization tools that explicitly support LoRA-merged models (such as the latest BitsAndBytes or GPTQ releases). A hybrid approach that keeps some sensitive layers, for example lm_head and the embeddings, in higher precision can also help; a sketch is shown below.
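As a concrete starting point for the hybrid approach, here is a minimal sketch using BitsAndBytes' llm_int8_skip_modules to keep the output head and embeddings unquantized. The module names ("lm_head", "embed_tokens") and the checkpoint path are assumptions; please verify them against your merged model with model.named_modules():

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize most layers to 8-bit but keep the output head and embeddings in
# higher precision. Module names are assumed from the usual Gemma layout.
bnb_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_skip_modules=["lm_head", "embed_tokens"],
)

model = AutoModelForCausalLM.from_pretrained(
    "path/to/gemma-3-1b-lora-merged",  # placeholder path to your merged checkpoint
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```

If the 8-bit hybrid setup generates sensible text, you can then tighten it further (for example 4-bit with the same skip list) or move to a QAT/QLoRA-style run where the LoRA adapters are trained on top of an already-quantized base.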
Kindly give these a try and let me know if you have any further concerns. Thank you.