What distinguishes this model from the other MLX-Community/Qwen3-30B-A3B-4bit-DWQ?

This one was distilled from the unquantized (BF16) model, while MLX-Community/Qwen3-30B-A3B-4bit-DWQ was distilled from the 6-bit quantized model.
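For context, here is a minimal conceptual sketch of what "distilled" means in the DWQ setting, assuming the public `mlx` / `mlx_lm` APIs (`load`, `nn.quantize`); it is not the actual mlx-lm DWQ implementation, and the repo name and hyperparameters are illustrative assumptions. The only difference between the two uploads is which model plays the teacher role.

```python
# Conceptual sketch of DWQ-style distillation, not the mlx-lm implementation.
# The model path and hyperparameters below are illustrative assumptions.
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

# Teacher: here, the unquantized (BF16) model. For the other repo, a 6-bit
# quantized model served as the teacher instead.
teacher, tokenizer = load("Qwen/Qwen3-30B-A3B")

# Student: the same weights quantized to 4 bits. DWQ then tunes the
# quantization parameters so the student's outputs match the teacher's.
student, _ = load("Qwen/Qwen3-30B-A3B")
nn.quantize(student, group_size=64, bits=4)

def distillation_loss(tokens: mx.array) -> mx.array:
    """KL divergence from the teacher's next-token distribution to the student's."""
    t_logprobs = nn.log_softmax(teacher(tokens), axis=-1)
    s_logprobs = nn.log_softmax(student(tokens), axis=-1)
    return (mx.exp(t_logprobs) * (t_logprobs - s_logprobs)).sum(-1).mean()
```

Minimizing this loss over a calibration set (updating only the quantization scales and biases) is the distillation step; the two 4-bit uploads differ only in the teacher used.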

Thank you for clarifying.

What about MLX-Community/Qwen3-30B-A3B-4bit-DWQ-05082025?

Edit: I just realized that one was uploaded by Awni, and I believe he's currently experimenting with different data sets.

I'm wondering whether there's any sense yet of which one has better accuracy/quality?

Hard to say at this point. The data set initially used for DWQ didn't contain any thinking-mode examples, and, somewhat surprisingly, distillation with it actually boosted the DWQ model's quality/benchmarks above the original BF16 model (perhaps because those benchmarks don't exercise thinking mode). I'm not sure whether thinking mode actually degraded; either way, more testing is needed and the data set should be adjusted accordingly.
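As a hedged illustration of what "adjusting the data set" could look like, here is a sketch that mixes thinking-mode and non-thinking prompts via Qwen3's `enable_thinking` chat-template switch; the prompts and the mixing scheme are my assumptions, not the actual DWQ data pipeline.

```python
# Sketch of building a calibration set covering both modes; the prompts and
# the enable_thinking usage are assumptions based on Qwen3's chat template.
from mlx_lm import load

_, tokenizer = load("Qwen/Qwen3-30B-A3B")

def render(prompt: str, thinking: bool) -> str:
    """Render a chat prompt with thinking mode toggled on or off."""
    return tokenizer.apply_chat_template(
        [{"role": "user", "content": prompt}],
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=thinking,  # Qwen3-specific template flag
    )

prompts = ["Explain KV caching.", "Summarize what MLX is."]
# Mix both modes so distillation sees thinking and non-thinking behavior.
dataset = [render(p, t) for p in prompts for t in (True, False)]
```

Whether a 50/50 mix (or teacher-generated completions rather than bare prompts) is the right choice is exactly the kind of thing the ongoing experiments would need to settle.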
