Thank you so much for making my day!

#1
by Maria99934 - opened

I would like to express my deep gratitude for the work done!
There were two files: MXFP4 at 12 GB and MXFP4 at 15 GB.
I downloaded both, and both support reasoning in Russian.

They respond very well. However, why do they differ in size if both are MXFP4?

I want to point out that the 15 GB version seems a little more accurate, although both performed well.

I do not know what you are doing with it, but it is funny to me that the original OpenAI model cannot think in Russian, while the quantized version from specialists like you can...

Qwen and DeepSeek are Chinese; they have multilingual reasoning out of the box.

Thank you so much for making my day!

In these quantized models, most layers are compressed with MXFP4 to make the file smaller and faster. The main difference between the two files is in the expert gate tensors (ffn_gate_exps.weight), which control which experts the model routes each token to. In the ≈12 GB file those tensors are stored as Q4_1, so they are smaller and quicker but slightly less precise; in the ≈15 GB file they are stored as Q8_0, which keeps more detail for more accurate routing at the cost of a larger file and slightly slower inference. All other layers are identical between the two files.
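To see where a gap of a few GB can come from, here is a rough size calculation in Python. The bits-per-weight figures follow from the GGUF block layouts (per 32-weight block: MXFP4 stores 17 bytes, Q4_1 stores 20 bytes, Q8_0 stores 34 bytes); the gate-tensor weight count used below is a made-up round number for illustration, not the real figure for this model.

```python
# Approximate GGUF storage cost in bits per weight, derived from block
# layouts: each block covers 32 weights, so bpw = block_bytes * 8 / 32.
BPW = {
    "mxfp4": 17 * 8 / 32,  # 4.25 bpw (16 data bytes + 1-byte E8M0 scale)
    "q4_1": 20 * 8 / 32,   # 5.0 bpw (16 data bytes + fp16 scale + fp16 min)
    "q8_0": 34 * 8 / 32,   # 8.5 bpw (32 data bytes + fp16 scale)
}

def tensor_gib(n_weights: int, quant: str) -> float:
    """Size in GiB of a tensor with n_weights elements at the given quant."""
    return n_weights * BPW[quant] / 8 / 2**30

# Hypothetical total for the ffn_gate_exps.weight tensors, for illustration:
n_gate = 5_000_000_000

print(f"gates at Q4_1: {tensor_gib(n_gate, 'q4_1'):.2f} GiB")
print(f"gates at Q8_0: {tensor_gib(n_gate, 'q8_0'):.2f} GiB")
```

Since Q8_0 costs 8.5 bits per weight versus 5.0 for Q4_1, keeping the gate tensors at Q8_0 makes them about 1.7x larger, which is where the extra few GB in the 15 GB file comes from while everything else stays identical.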
