Is the BF16 gguf any different from the F16 one? (speed/accuracy)

#10
by CHNtentes - opened

Thanks for your work!

Unsloth AI org

Thanks for your work!

The only difference is that all the weights are in BF16 instead of F16, except for the MoE layers, which are still in MXFP4. The F32 one has ALL layers in BF16 instead of MXFP4.

Unsure on speeds
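For context on the accuracy side of the question: F16 and BF16 are both 16-bit, but BF16 keeps FP32's 8 exponent bits (so FP32's full range) at the cost of only 7 mantissa bits, while F16 has 5 exponent bits and 10 mantissa bits. A minimal sketch of that trade-off, using the standard round-to-nearest-even truncation of FP32 to BF16 (this is an illustration, not the code used to produce these GGUFs):

```python
import struct

def to_bf16(x: float) -> float:
    """Round an FP32 value to BF16 (round-to-nearest-even), return as float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # add the rounding bias, then drop the low 16 bits of the FP32 pattern
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# F16's largest finite value is 65504, so 1e5 would overflow to inf in F16;
# BF16 inherits FP32's range and represents it, just more coarsely:
print(to_bf16(100000.0))     # 99840.0 (nearest BF16 value)

# The flip side: BF16 has fewer mantissa bits than F16, so small
# fractional differences that F16 can hold get rounded away:
print(to_bf16(1.0009765625)) # 1.0 (1 + 2**-10 is exact in F16, not in BF16)
```

In practice that means BF16 weights round-trip from FP32 training checkpoints without range issues, while F16 can be slightly more precise per value but risks overflow; which is faster depends on whether the hardware/backend has native BF16 kernels.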


Thanks for your reply. I'll download both and do some tests.

I tested the F16 in this repo and the MXFP4 one from the ggml org. Although this one is bigger and a bit slower, it performs better on my math and logic questions.

As for the BF16 one, the speed is pretty similar to F16, but there are some weird cut-off issues. Basically, every time it replies "calculate with python", it stops soon after that.


After some more tests, it seems F16 also has this problem.


@shimmyshimmer Could you guys reproduce this?
