Problems with FP32 model
I'm using Ooba's Web UI (https://github.com/oobabooga/text-generation-webui).
I downloaded both the FP16 and FP32 models. The FP16 version works perfectly, but the FP32 version has problems:
- It has not yet been updated with the chat-template fix that the FP16 version already has, so starting a chat crashes unless you apply the fix manually.
- Even after fixing the template, the chat output is nonsense.
For example, if I ask the FP32 model, "Who is George Washington?", the output is multiple lines of periods and ellipses:
...?………………………………………………………………………………………………………………………………......……………………...………………………………………………………………………………...………………...…………………………………………………………
I'm just using Ooba's default parameter settings, which work fine with the FP16 model.
Here are the links to the two models I'm talking about:
https://huggingface.co/unsloth/gpt-oss-20b-GGUF/blob/main/gpt-oss-20b-F16.gguf
https://huggingface.co/unsloth/gpt-oss-20b-GGUF/blob/main/gpt-oss-20b-F32.gguf
I verified both models with SHA256 sums.
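For anyone wanting to reproduce the verification step: this is a minimal sketch of how a downloaded GGUF can be checked against the SHA256 shown on its Hugging Face file page. The demo below runs on an empty placeholder file (whose SHA-256 is the well-known empty-input digest) so it is self-contained; for the real check, substitute the actual `.gguf` filename and paste the expected hash from the model page.

```shell
# Demo: verify a file's SHA-256 against an expected value.
# Uses sha256sum (GNU coreutils); on macOS use `shasum -a 256` instead.

# Placeholder: an empty file, whose SHA-256 is the standard empty-input digest.
# For the real check, point this at e.g. gpt-oss-20b-F32.gguf and set
# `expected` to the SHA256 shown on the Hugging Face file page.
: > demo.gguf
expected="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

# Compute the local digest (first field of sha256sum's output).
actual=$(sha256sum demo.gguf | awk '{print $1}')

if [ "$actual" = "$expected" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH" >&2
fi
```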
Also note that I can run the 6-bit and 8-bit quants of your GPT-OSS-120B GGUF model, and they work perfectly despite being larger than this FP32 model, so I don't think it's a memory issue.
For reference, I'm talking about these two (which work fine):
https://huggingface.co/unsloth/gpt-oss-120b-GGUF/tree/main/UD-Q6_K_XL
https://huggingface.co/unsloth/gpt-oss-120b-GGUF/tree/main/UD-Q8_K_XL
System: Intel 13900K, 128GB RAM, 5060 Ti 16GB + 4060 Ti 16GB.
Thanks, will investigate. For now we'll delete it, as it's the only version we did not update.
OK, I hope to see it again. I wasn't trying to get it deleted.