Arcee-Blitz (24B) is a new Mistral-based 24B model distilled from DeepSeek, designed to be both fast and efficient. We view it as a practical "workhorse" model that can tackle a range of tasks without the overhead of larger architectures.

Quantizations

GGUF quants are available here

AWQ quants are available here
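
For local inference with the GGUF quants, here is a minimal llama-cpp-python sketch. The quant filename is a placeholder assumption; substitute whichever quant level you downloaded:

```python
# Minimal local-inference sketch for a GGUF quant of Arcee-Blitz.
# The filename below is a placeholder -- use the quant you downloaded.
from llama_cpp import Llama

llm = Llama(
    model_path="Arcee-Blitz-Q4_K_M.gguf",  # hypothetical filename
    n_ctx=32768,  # the card lists a 32k-token context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize what model distillation is."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```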

Model Details

  • Architecture Base: Mistral-Small-24B-Instruct-2501
  • Parameter Count: 24B
  • Distillation Data:
    • Merged the Virtuoso pipeline with the Mistral architecture, hot-starting training with over 3B tokens of pretraining distillation from DeepSeek-V3 logits (illustrated in the sketch after this list)
  • Fine-Tuning and Post-Training:
    • After capturing core logits, we performed additional fine-tuning and distillation steps to enhance overall performance.
  • License: Apache-2.0
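
The exact distillation recipe isn't published here, but as an illustration of the logit-distillation step referenced above, the following is a minimal PyTorch sketch of a Hinton-style knowledge-distillation loss. The temperature, shapes, and vocabulary size are all illustrative assumptions, not details of Arcee's actual pipeline:

```python
# Minimal sketch of logit distillation (Hinton et al.), for illustration
# only -- the temperature, shapes, and weighting below are assumptions,
# not Arcee's published training configuration.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over temperature-softened next-token distributions."""
    # Flatten (batch, seq, vocab) -> (batch * seq, vocab) so 'batchmean'
    # averages the KL term per token.
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    t = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

# Toy tensors standing in for teacher (DeepSeek-V3-style) and student logits.
student = torch.randn(2, 8, 32000)
teacher = torch.randn(2, 8, 32000)
print(distillation_loss(student, teacher))
```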

Improving World Knowledge

Arcee-Blitz shows a large improvement on MMLU-Pro versus the original Mistral-Small-3, reflecting a dramatic increase in world knowledge.

Data Contamination Checking

We carefully examined our training data and pipeline to avoid contamination. While we're confident in the validity of these gains, we remain open to further community validation and testing (one of the key reasons we release these models as open-source).
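
As one illustration of this kind of check, here is a minimal n-gram-overlap sketch; the 13-gram window and helper names are conventional illustrations, not the exact procedure used for Arcee-Blitz:

```python
# Rough n-gram overlap decontamination sketch (illustrative; not the
# confirmed method or parameters used for Arcee-Blitz).

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in a text."""
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def is_contaminated(train_doc: str, benchmark_items: list[str], n: int = 13) -> bool:
    """Flag a training document that shares any long n-gram with a benchmark item."""
    doc_grams = ngrams(train_doc, n)
    return any(doc_grams & ngrams(item, n) for item in benchmark_items)

# Toy usage with made-up strings:
bench = ["example benchmark question text goes here"]
print(is_contaminated("some training paragraph", bench))
```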

Benchmark Comparison

Benchmark                  | mistral-small-3 | arcee-blitz
---------------------------|-----------------|------------
MixEval                    | 81.6%           | 85.1%
GPQA-Diamond               | 42.4%           | 43.1%
BigCodeBench Complete      | 44.4%           | 45.5%
BigCodeBench Instruct      | 34.7%           | 35.9%
BigCodeBench Complete-hard | 16.2%           | 19.6%
BigCodeBench Instruct-hard | 15.5%           | 15.5%
IFEval                     | 77.44           | 80.60
BBH                        | 64.46           | 65.00
GPQA                       | 33.90           | 36.70
MMLU-Pro                   | 44.70           | 60.20
MuSR                       | 40.90           | 50.00
Math Level 5               | 12.00           | 38.60
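
For anyone who wants to reproduce numbers like these, one option is EleutherAI's lm-evaluation-harness; the sketch below assumes its Python API and common task names (e.g. mmlu_pro, ifeval), which vary by harness version, so exact scores may differ from the table above:

```python
# Hedged reproduction sketch using lm-evaluation-harness (pip install lm-eval).
# Task names and scoring details depend on the harness version; this is
# not the exact evaluation setup used for the table above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=arcee-ai/Arcee-Blitz,dtype=bfloat16",
    tasks=["mmlu_pro", "ifeval"],
    batch_size="auto",
)
for task, metrics in results["results"].items():
    print(task, metrics)
```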

Limitations

  • Context Length: 32k tokens (may vary depending on final tokenizer settings and system resources).
  • Knowledge Cut-off: Training data may not reflect events or developments after June 2024.
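
As a quick usage reference within these limits, here is a minimal transformers chat sketch in BF16 (the tensor type this repo ships); it assumes the tokenizer bundles a Mistral-style chat template:

```python
# Minimal transformers chat sketch; assumes the repo's tokenizer bundles
# a chat template (standard for Mistral-Small-based models).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Arcee-Blitz"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BF16 weights in this repo
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me three uses for a fast 24B model."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# The card lists a 32k-token context window; keep prompts within it.
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```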

Ethical Considerations

  • Content Generation Risks: Like any language model, Arcee-Blitz can generate potentially harmful or biased content if prompted in certain ways.

License

Arcee-Blitz (24B) is released under the Apache-2.0 License. You are free to use, modify, and distribute this model in both commercial and non-commercial applications, subject to the terms and conditions of the license.

If you have questions or would like to share your experiences using Arcee-Blitz (24B), please connect with us on social media. We're excited to see what you build, and how this model helps you innovate!
