Is the BF16 gguf any different from the F16 one? (speed/accuracy)
Thanks for your work!
The only difference is that all the weights are in BF16 instead of F16, except for the MoE layers, which are still in MXFP4. The FP32 has ALL layers in BF16 instead of FP4.
Unsure about speeds.
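For anyone wondering what BF16 vs F16 means numerically, here is a rough sketch using only the Python standard library. It is not how llama.cpp or the GGUF converter does the conversion; `to_f16` and `to_bf16` are hypothetical helper names made up for this illustration. The idea is that F16 keeps more mantissa bits (finer precision) while BF16 keeps the float32 exponent range (fewer overflows).

```python
import math
import struct

def to_f16(x: float) -> float:
    """Round x to IEEE half precision (F16) and back (illustrative helper)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the F16 range (max 65504); treat as infinity
        return math.copysign(math.inf, x)

def to_bf16(x: float) -> float:
    """Round x to bfloat16 and back (illustrative helper, finite inputs only).

    bfloat16 is the top 16 bits of a float32, so we round the float32
    bit pattern to nearest-even and zero the low 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

for value in (1.001, 70000.0, 1e-8):
    print(f"{value!r}: F16 -> {to_f16(value)!r}, BF16 -> {to_bf16(value)!r}")
```

In this sketch, F16 resolves 1.001 more finely than BF16 does, while BF16 still represents 70000.0 and 1e-8 approximately, where F16 overflows or flushes to zero. Whether either effect matters for a given model's weights is a separate question that only testing can answer.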
Thanks for your reply. I'll download both and do some tests.
I tested the F16 in this repo and the MXFP4 one from ggml-org. Although the F16 here is larger and a bit slower, it performs better on my math and logic questions.
After some more tests, it seems F16 also has this problem.
@shimmyshimmer Could you guys reproduce this?