Problems with FP32 model

#25
by YardWeasel - opened

I'm using Ooba's Web UI (https://github.com/oobabooga/text-generation-webui).

I downloaded both the FP16 and FP32 models. The FP16 version works perfectly, but the FP32 version has problems:

  • It has not yet been updated with the chat-template fix already applied to the FP16 version, so starting a chat will crash unless you apply the fix manually.
  • Once the template is fixed and you start chatting, the output is nonsense.

For example, if I ask FP32, "Who is George Washington?" the output will be multiple lines of periods:

...?……………………………………………………………………………………………………………………………………………………………………………………………………………......…………………………...…………………………………………………………………………………………………………………………………………………………………………………………...……………………...……………………………………………………………………………………………

I'm just using the default parameter settings of Ooba which work fine for the FP16 model.

Here are the links to the two models I'm talking about:
https://huggingface.co/unsloth/gpt-oss-20b-GGUF/blob/main/gpt-oss-20b-F16.gguf
https://huggingface.co/unsloth/gpt-oss-20b-GGUF/blob/main/gpt-oss-20b-F32.gguf

I verified both models with SHA256 sums.
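For anyone wanting to rule out a corrupted download the same way, here is a minimal sketch of the verification workflow. The filename and the stand-in file are placeholders (the real GGUFs are many GB, and the published hashes are listed on each file's Hugging Face page); the point is the two-column format that `sha256sum -c` expects.

```shell
# Create a stand-in file in place of the multi-GB GGUF download.
printf 'demo' > model.gguf
# Record its digest in the "HASH  FILENAME" format sha256sum expects.
# In practice you would paste the published hash here instead.
sha256sum model.gguf > SHA256SUMS
# Re-verify; "model.gguf: OK" means the file matches the recorded hash,
# and a non-zero exit status means it does not.
sha256sum -c SHA256SUMS
```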

Also note that I can run the 6-bit and 8-bit quants of your GPT OSS 120B GGUF model, and they work perfectly despite being larger than the FP32 model here. So I don't think it's a memory issue.

For reference, I'm talking about these two (which work fine):
https://huggingface.co/unsloth/gpt-oss-120b-GGUF/tree/main/UD-Q6_K_XL
https://huggingface.co/unsloth/gpt-oss-120b-GGUF/tree/main/UD-Q8_K_XL

System: Intel 13900K, 128GB RAM, 5060 Ti 16GB + 4060 Ti 16GB.

Unsloth AI org

Thanks, will investigate. For now we'll delete it, as it's the only version we did not update.

OK, I hope to see it again. I wasn't trying to get it deleted.
