What distinguishes this model from another MLX-Community/Qwen3-30B-A3B-4bit-DWQ?
This one was distilled from the unquantized model, whereas MLX-Community/Qwen3-30B-A3B-4bit-DWQ was distilled from the 6-bit quantized model.
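To make that concrete: DWQ learns the quantization scales/biases by minimizing a distillation loss between the teacher's output distribution and the quantized student's, so the practical difference between the two repos is which teacher produced the targets. Here's a minimal sketch of that objective in MLX (illustrative only, not the actual mlx-lm DWQ code; `teacher_logits` and `student_logits` are assumed inputs of shape `[tokens, vocab]`):

```python
import mlx.core as mx

def dwq_distill_loss(teacher_logits: mx.array, student_logits: mx.array) -> mx.array:
    """Token-averaged KL(teacher || student) over the vocabulary.

    Sketch of a DWQ-style objective: the student is the 4-bit model whose
    quantization scales/biases are being tuned; the teacher is either the
    unquantized BF16 model (this repo) or a 6-bit quant (the other one).
    """
    # Log-softmax via logsumexp for numerical stability.
    t_logp = teacher_logits - mx.logsumexp(teacher_logits, axis=-1, keepdims=True)
    s_logp = student_logits - mx.logsumexp(student_logits, axis=-1, keepdims=True)
    # KL per token: sum_v p_t(v) * (log p_t(v) - log p_s(v))
    kl = (mx.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)
    return kl.mean()
```

The upshot is that distilling from the BF16 teacher pulls the 4-bit student toward the original model's distribution, rather than toward an already-lossy 6-bit one.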
Thank you for clarifying.
What about MLX-Community/Qwen3-30B-A3B-4bit-DWQ-05082025?
Edit: I just realized that that one was uploaded by Awni, and I believe he's currently experimenting with different data sets.
I'm wondering if there's any sense yet of which one has better accuracy/quality?
Hard to say at this point. Basically, the data set initially used for DWQ didn't include any thinking-mode examples, and, somewhat surprisingly, distilling with it actually boosted the DWQ model's quality/benchmarks above the original BF16 model (perhaps because those benchmarks don't exercise thinking mode). I'm not sure whether thinking mode actually degraded; either way, more testing is needed and the data set should be adjusted accordingly.
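Re: adjusting the data set, one low-effort way to add thinking-mode coverage would be to render each calibration prompt both with and without Qwen3's `enable_thinking` chat-template flag. A rough sketch of the prompt side only (the example prompt is made up, and pairing these with teacher completions for distillation is left out; this isn't the actual DWQ pipeline):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

prompts = [
    [{"role": "user", "content": "Explain KV caching in one paragraph."}],
]

calibration_texts = []
for msgs in prompts:
    for thinking in (True, False):
        # Render the same prompt in both modes so the distillation data
        # covers thinking and non-thinking behavior.
        calibration_texts.append(
            tok.apply_chat_template(
                msgs,
                tokenize=False,
                add_generation_prompt=True,
                enable_thinking=thinking,
            )
        )
```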