Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-q6-mlx

Comparing the original TotalRecall, YOYO2, and YOYO2-with-TotalRecall models at q6

The following models are compared:

thinking-b   Qwen3-42B-A3B-2507-Thinking-Abliterated-uncensored-TOTAL-RECALL-v2-Medium-MASTER-CODER-q6-mlx
yoyo         Qwen3-30B-A3B-YOYO-V2-q6-mlx
yoyo-b       Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-q6-mlx

The first TotalRecall model (thinking-b) was built from Qwen3-42B-A3B-2507-Thinking, abliterated and uncensored.

Key Observations from Benchmarks

Benchmark    thinking-b  yoyo   yoyo-b    Winner
ARC Challenge  0.387    0.532    0.537    yoyo-b (slight lead)
ARC Easy       0.447    0.685    0.699    yoyo-b
BoolQ          0.625    0.886    0.884    yoyo
Hellaswag      0.648    0.683    0.712    yoyo-b
OpenBookQA     0.380    0.456    0.448    yoyo
PIQA           0.768    0.782    0.786    yoyo-b
Winogrande     0.636    0.639    0.676    yoyo-b
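
The winner column and the margins cited in the insights below can be recomputed directly from these scores. A minimal Python sketch (values copied from the table above; nothing here queries the models):

# Reproduce the per-benchmark winner and its margin over the runner-up.
scores = {
    "ARC Challenge": {"thinking-b": 0.387, "yoyo": 0.532, "yoyo-b": 0.537},
    "ARC Easy":      {"thinking-b": 0.447, "yoyo": 0.685, "yoyo-b": 0.699},
    "BoolQ":         {"thinking-b": 0.625, "yoyo": 0.886, "yoyo-b": 0.884},
    "Hellaswag":     {"thinking-b": 0.648, "yoyo": 0.683, "yoyo-b": 0.712},
    "OpenBookQA":    {"thinking-b": 0.380, "yoyo": 0.456, "yoyo-b": 0.448},
    "PIQA":          {"thinking-b": 0.768, "yoyo": 0.782, "yoyo-b": 0.786},
    "Winogrande":    {"thinking-b": 0.636, "yoyo": 0.639, "yoyo-b": 0.676},
}

for bench, by_model in scores.items():
    ranked = sorted(by_model.items(), key=lambda kv: kv[1], reverse=True)
    (winner, top), (_, second) = ranked[0], ranked[1]
    print(f"{bench:14s} winner={winner:10s} margin=+{top - second:.3f}")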

Key Insights

1️⃣ YOYO2-TOTAL-RECALL generally outperforms the others

The addition of brainstorming layers (which grow YOYO2-TOTAL-RECALL into a 42B MoE) improves performance on most benchmarks; only BoolQ and OpenBookQA go to yoyo, both by narrow margins.

Most notable gains over yoyo: +0.029 in Hellaswag, +0.037 in Winogrande, and +0.004 in PIQA.

This aligns with the model's description: YOYO2-TOTAL-RECALL was created by adding brainstorming layers to the YOYO2 mix (three Qwen3-30B MoE models), yielding higher-quality reasoning.

2️⃣ YOYO2 remains a strong baseline

YOYO2 (the mix of Thinking, Instruct, and Coder models) demonstrates robustness across many tasks:

It leads narrowly on BoolQ and OpenBookQA, both knowledge-heavy tasks where factual recall is critical.

This suggests the modular combination of different Qwen3 variants provides a balanced foundation for diverse reasoning challenges.

3️⃣ thinking-b is the weakest performer overall

At 0.447 on ARC Easy (grade-school science questions), it lags far behind the others. This is consistent with its construction: brainstorming layers added to the Thinking variant alone, an approach that proved less effective than the yoyo merge or the yoyo-b merge-plus-brainstorming.

4️⃣ The impact of brainstorming layers is clear

YOYO2-TOTAL-RECALL's improvements over YOYO2 (e.g., +0.014 in ARC Easy, +0.037 in Winogrande) suggest that the added brainstorming layers:

Enhance reasoning flexibility (critical for ARC and Winogrande)
Improve commonsense sentence completion (Hellaswag)
Strengthen physical commonsense reasoning (PIQA)

Why YOYO2-TOTAL-RECALL is the strongest model here

It leverages both the modular strengths of YOYO2 (three models merged on the Qwen3-30B base) and the refinement from brainstorming layers.

All three models were quantized at q6 with the mlx-lm tooling current at the time, so the performance differences reflect their design choices rather than quantization effects.

Recommendations

When selecting a model for specific tasks:

For reasoning-heavy tasks (ARC, Winogrande): Use YOYO2-TOTAL-RECALL.

For language understanding (BoolQ, OpenBookQA): YOYO2 might be preferable.

This data confirms that combining multiple Qwen3 variants with additional brainstorming layers (as in yoyo-b) leads to the most comprehensive and highest-performing model for this set of benchmarks.

This model Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-q6-mlx was converted to MLX format from DavidAU/Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct using mlx-lm version 0.26.4.
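
For reference, a similar conversion can be reproduced with mlx-lm's Python convert API. This is a hedged sketch assuming the current API; it is not the exact invocation used to produce this model, and the output path is illustrative:

# Assumed sketch: convert the source repo to MLX and quantize to 6-bit,
# matching the "q6" in the model name. q_group_size is left at its default.
from mlx_lm import convert

convert(
    "DavidAU/Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct",  # source HF repo
    mlx_path="Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-q6-mlx",
    quantize=True,
    q_bits=6,
)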

Use with mlx

pip install mlx-lm

# Load the quantized model and its tokenizer.
from mlx_lm import load, generate

model, tokenizer = load("Qwen3-42B-A3B-2507-YOYO2-TOTAL-RECALL-Instruct-q6-mlx")

prompt = "hello"

# Apply the chat template when the tokenizer defines one.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
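
For more control over decoding, recent mlx-lm releases accept an explicit token budget and a sampler. A hedged sketch continuing from the code above (the temperature and top-p values are illustrative, and make_sampler's location may differ in older versions):

# Assumed sketch: cap the reply length and sample with temperature/top-p.
from mlx_lm.sample_utils import make_sampler

sampler = make_sampler(temp=0.7, top_p=0.95)  # illustrative values
response = generate(
    model, tokenizer,
    prompt=prompt,
    max_tokens=512,      # cap the reply length
    sampler=sampler,
    verbose=True,
)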