This model breaks apart after a few hundred tokens when the context gets long

#1
by stduhpf - opened

Tested with llama.cpp commit 0c74b04376b0b9efc096480fe10f866afc8d7c1c.

After some time this model consistently starts generating gibberish, sometimes repeating the same token over and over. Typically this starts happening around 600 tokens in, sometimes earlier.

I couldn't replicate this problem with the original model converted to fp16, nor with any other quantized version of it (even q4_0).
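For reference, this is roughly the workflow for producing those comparison files with the standard llama.cpp tooling (a sketch; the checkpoint path and output filenames are placeholders):

```sh
# Convert the original HF checkpoint to an fp16 GGUF
# (convert_hf_to_gguf.py ships with the llama.cpp repo)
python convert_hf_to_gguf.py path/to/gemma-3-1b-it \
  --outtype f16 --outfile models/gemma-3-1b-it-f16.gguf

# Quantize that fp16 file to q4_0 for an apples-to-apples comparison
llama-quantize models/gemma-3-1b-it-f16.gguf \
  models/gemma-3-1b-it-Q4_0-local.gguf q4_0
```

Both of those files generate normally with the same prompt and settings as the repro command below.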

Something is wrong specifically with this QAT version.

To reproduce:
llama-cli.exe -m models\gemma-3-1b-it-Q4_0.gguf -ngl 99 -t 6 -tb 12 -c 16384 -sm none -p "Hello! " --ignore-eos -n 768

The pretrained version doesn't seem to have the same problem; it's really just this one.

Yes, I'm facing the same problem.

I tried the recommended sampling hyper-parameters as well:

temp = 1.0
top-p = 0.95
top-k = 64

but I hit the same problem.
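For reference, passing those settings to llama-cli looks like this (a sketch reusing the model path from the repro command above):

```sh
llama-cli -m models/gemma-3-1b-it-Q4_0.gguf -ngl 99 -c 16384 \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  -p "Hello! " --ignore-eos -n 768
```

The output still degenerates into repeated tokens at roughly the same point.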

I have exactly the same problem with the 1B QAT model.
