What distinguishes this model from another MLX-Community/Qwen3-30B-A3B-4bit-DWQ?
This one was distilled from the unquantized model, whereas MLX-Community/Qwen3-30B-A3B-4bit-DWQ was distilled from the 6-bit quantized model.
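To make that concrete: DWQ learns the quantization scales/biases by minimizing a distillation loss between the teacher's output distribution and the quantized student's, so the practical difference between the two repos is which teacher produced the targets. Here's a minimal sketch of that objective in MLX (illustrative only, not the actual mlx-lm DWQ code; `teacher_logits` and `student_logits` are assumed inputs of shape `[tokens, vocab]`):

```python
import mlx.core as mx

def dwq_distill_loss(teacher_logits: mx.array, student_logits: mx.array) -> mx.array:
    """Token-averaged KL(teacher || student) over the vocabulary.

    Sketch of a DWQ-style objective: the student is the 4-bit model whose
    quantization scales/biases are being tuned; the teacher is either the
    unquantized BF16 model (this repo) or a 6-bit quant (the other one).
    """
    # Log-softmax via logsumexp for numerical stability.
    t_logp = teacher_logits - mx.logsumexp(teacher_logits, axis=-1, keepdims=True)
    s_logp = student_logits - mx.logsumexp(student_logits, axis=-1, keepdims=True)
    # KL per token: sum_v p_t(v) * (log p_t(v) - log p_s(v))
    kl = (mx.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1)
    return kl.mean()
```

The upshot is that distilling from the BF16 teacher pulls the 4-bit student toward the original model's distribution, rather than toward an already-lossy 6-bit one.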
Thank you for clarifying.
What about MLX-Community/Qwen3-30B-A3B-4bit-DWQ-05082025?
Edit: I just realized that that one was uploaded by Awni, and I believe he's currently experimenting with different data sets.
I'm wondering if there's any sense yet of which one has better accuracy/quality?
Hard to say at this point. Basically, the data set initially used for DWQ didn't include any thinking-mode examples, and, somewhat surprisingly, distilling with it actually boosted the DWQ model's quality/benchmarks above the original BF16 model (perhaps because those benchmarks don't exercise thinking mode). I'm not sure whether thinking mode actually degraded; either way, more testing is needed and the data set should be adjusted accordingly.
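Re: adjusting the data set, one low-effort way to add thinking-mode coverage would be to render each calibration prompt both with and without Qwen3's `enable_thinking` chat-template flag. A rough sketch of the prompt side only (the example prompt is made up, and pairing these with teacher completions for distillation is left out; this isn't the actual DWQ pipeline):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

prompts = [
    [{"role": "user", "content": "Explain KV caching in one paragraph."}],
]

calibration_texts = []
for msgs in prompts:
    for thinking in (True, False):
        # Render the same prompt in both modes so the distillation data
        # covers thinking and non-thinking behavior.
        calibration_texts.append(
            tok.apply_chat_template(
                msgs,
                tokenize=False,
                add_generation_prompt=True,
                enable_thinking=thinking,
            )
        )
```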