Is the BF16 gguf any different from the F16 one? (speed/accuracy)

#10
by CHNtentes - opened

Thanks for your work!

Unsloth AI org

Thanks for your work!

The only difference is that all the weights are in BF16 instead of F16, except for the MoE layers, which are still in MXFP4. The F32 one has ALL layers in BF16 instead of MXFP4.

Unsure on speeds
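For context on the accuracy side of the question: F16 and BF16 are both 16-bit, but BF16 keeps FP32's 8 exponent bits (so FP32's full range) at the cost of only 7 mantissa bits, while F16 has 5 exponent bits and 10 mantissa bits. A minimal sketch of that trade-off, using the standard round-to-nearest-even truncation of FP32 to BF16 (this is an illustration, not the code used to produce these GGUFs):

```python
import struct

def to_bf16(x: float) -> float:
    """Round an FP32 value to BF16 (round-to-nearest-even), return as float."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # add the rounding bias, then drop the low 16 bits of the FP32 pattern
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# F16's largest finite value is 65504, so 1e5 would overflow to inf in F16;
# BF16 inherits FP32's range and represents it, just more coarsely:
print(to_bf16(100000.0))     # 99840.0 (nearest BF16 value)

# The flip side: BF16 has fewer mantissa bits than F16, so small
# fractional differences that F16 can hold get rounded away:
print(to_bf16(1.0009765625)) # 1.0 (1 + 2**-10 is exact in F16, not in BF16)
```

In practice that means BF16 weights round-trip from FP32 training checkpoints without range issues, while F16 can be slightly more precise per value but risks overflow; which is faster depends on whether the hardware/backend has native BF16 kernels.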


Thanks for your reply. I'll download both and do some tests.

I tested the F16 in this repo and the MXFP4 one from the ggml org. Although this one is bigger and a bit slower, it performs better on my math and logic questions.

As for the BF16 one, the speed is pretty similar to F16, but there are some weird cut-off issues. Basically, every time it replies "calculate with python", it stops soon after that.


After some more tests, it seems F16 also has this problem.


@shimmyshimmer Could you guys reproduce this?
