Flash

#1
by Mushoz - opened

What does "Flash" mean in the title? It's not explained anywhere in the readme.

It seems to be a merge with the new Sky-T1-32B-Flash model: https://huggingface.co/NovaSky-AI/Sky-T1-32B-Flash. I don't think the performance will be any different, but it should be faster, as it shouldn't overthink as much as the original merge without Flash. I made a quant here: https://huggingface.co/sm54/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview-Q4_K_M-GGUF

FuseAI org

We just replaced Sky-T1-32B-Preview with Sky-T1-32B-Flash. The results are being tested and will be posted soon. Stay tuned.

FuseAI org

Here are the results:

| Models | AIME24 Pass@1 | AIME24 Cons@32 | MATH500 | OlympiadBench |
|---|---|---|---|---|
| OpenAI o1 | 79.2 | - | 96.4 | - |
| OpenAI o1-preview | 44.6 | - | 85.5 | - |
| OpenAI o1-mini | 63.6 | - | 90.0 | - |
| DeepSeek R1 | 79.8 | - | 97.3 | - |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 69.2 | 83.3 | 93.6 | 64.3 |
| Qwen/QwQ-32B-Preview | 43.8 | 56.7 | 88.4 | 60.3 |
| NovaSky-AI/Sky-T1-32B-Preview | 37.7 | 50.0 | 88.0 | 55.1 |
| Qwen/Qwen2.5-32B-Instruct | 17.0 | 20.0 | 81.8 | 48.1 |
| FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview | 68.6 | 83.3 | 94.6 | 64.9 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview | 69.7 | 83.3 | 94.6 | 64.0 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview | 72.9 | 86.7 | - | - |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 74.0 | 86.7 | 94.8 | 65.0 |

| Models | GPQA-Diamond | MMLU-Pro | MMLU |
|---|---|---|---|
| OpenAI o1 | 75.7 | - | 91.8 |
| OpenAI o1-preview | 73.3 | - | 90.8 |
| OpenAI o1-mini | 60.0 | 80.3 | 85.2 |
| DeepSeek R1 | 71.5 | 84.0 | 90.8 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 57.6 | 68.7 | 82.2 |
| Qwen/QwQ-32B-Preview | 49.5 | 63.5 | 85.2 |
| NovaSky-AI/Sky-T1-32B-Preview | 50.5 | 65.8 | 82.7 |
| Qwen/Qwen2.5-32B-Instruct | 46.5 | 56.3 | 79.6 |
| FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview | 55.1 | 68.6 | 82.0 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview | 62.1 | 68.9 | 82.7 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview | 54.6 | 70.6 | 84.0 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 62.1 | 70.8 | 83.6 |

| Models | LiveCodeBench | LiveCodeBench-Easy | LiveCodeBench-Medium | LiveCodeBench-Hard |
|---|---|---|---|---|
| OpenAI o1 | 63.4 | 98.5 | 80.9 | 31.7 |
| OpenAI o1-preview | 42.7 | 97.0 | 47.2 | 9.8 |
| OpenAI o1-mini | 52.00 | 91.0 | 67.4 | 19.5 |
| DeepSeek R1 | 62.8 | 98.4 | 78.3 | 32.2 |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 56.1 | 93.6 | 73.1 | 23.4 |
| Qwen/QwQ-32B-Preview | 44.4 | 94.9 | 53.8 | 10.0 |
| NovaSky-AI/Sky-T1-32B-Preview | 37.3 | 89.7 | 40.4 | 6.6 |
| FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview | 56.4 | 92.9 | 73.5 | 24.2 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview | 54.8 | 93.9 | 71.7 | 21.3 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview | 58.2 | 94.3 | 77.1 | 25.0 |
| FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 57.9 | 93.6 | 76.0 | 25.5 |
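
To see at a glance how the Flash merge compares with the original SkyT1 merge, the two rows can be subtracted benchmark by benchmark. This is only a rough sketch: the scores are copied by hand from the tables above, the names `scores` and `flash_deltas` are made up for illustration, and MATH500 and OlympiadBench are left out because the Flash row has no values for them.

```python
# Scores copied from the results tables above, as (Flash merge, original SkyT1 merge).
# MATH500 and OlympiadBench are omitted: the Flash row reports "-" for both.
scores = {
    "AIME24 Pass@1": (72.9, 74.0),
    "AIME24 Cons@32": (86.7, 86.7),
    "GPQA-Diamond": (54.6, 62.1),
    "MMLU-Pro": (70.6, 70.8),
    "MMLU": (84.0, 83.6),
    "LiveCodeBench": (58.2, 57.9),
    "LiveCodeBench-Easy": (94.3, 93.6),
    "LiveCodeBench-Medium": (77.1, 76.0),
    "LiveCodeBench-Hard": (25.0, 25.5),
}


def flash_deltas(table):
    """Return Flash minus original for each benchmark, rounded to one decimal."""
    return {name: round(flash - base, 1) for name, (flash, base) in table.items()}


deltas = flash_deltas(scores)
for name, delta in deltas.items():
    # Positive numbers mean the Flash merge scored higher.
    print(f"{name:<22}{delta:+.1f}")
```

On these numbers the largest gap is GPQA-Diamond, where the Flash merge is 7.5 points lower; every other benchmark differs by about a point or less in either direction.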