Flash
What does "Flash" in the title mean? It isn't explained anywhere in the README.
It seems to be a merge using the new Sky-T1-32B-Flash model here: https://huggingface.co/NovaSky-AI/Sky-T1-32B-Flash. I don't think the benchmark performance will be much different, but it should be faster, as it shouldn't overthink as much as the original merge without Flash. I made a quant here: https://huggingface.co/sm54/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview-Q4_K_M-GGUF
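For anyone wanting to try the quant, a typical llama.cpp invocation would look something like this. The GGUF filename, context size, and sampling settings below are assumptions for illustration; check the quant repo's file list for the exact filename:

```shell
# Download the Q4_K_M GGUF (filename is an assumption; see the repo's file list)
huggingface-cli download \
  sm54/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview-Q4_K_M-GGUF \
  fuseo1-deepseekr1-qwq-skyt1-flash-32b-preview-q4_k_m.gguf \
  --local-dir .

# Run with llama.cpp; -c sets the context window, -ngl offloads layers to GPU
./llama-cli \
  -m fuseo1-deepseekr1-qwq-skyt1-flash-32b-preview-q4_k_m.gguf \
  -c 8192 -ngl 99 --temp 0.6 \
  -p "Solve: what is 17 * 24?"
```

A Q4_K_M quant of a 32B model needs roughly 20 GB of memory, so adjust `-ngl` down if it doesn't fit on your GPU.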
We just replaced Sky-T1-32B-Preview with Sky-T1-32B-Flash. The results are being tested and will be updated soon. Stay tuned.
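If the merge was produced with mergekit (an assumption on my part — FuseAI's own fusion pipeline may differ), swapping the source model would be a one-line change in the merge config, along these lines (model list and merge method below are illustrative, not the repo's actual config):

```yaml
# Hypothetical mergekit-style sketch; not the actual FuseO1 config
models:
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
  - model: Qwen/QwQ-32B-Preview
  - model: NovaSky-AI/Sky-T1-32B-Flash   # was: NovaSky-AI/Sky-T1-32B-Preview
merge_method: sce
base_model: Qwen/Qwen2.5-32B-Instruct
dtype: bfloat16
```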
Here are the results:
Models | AIME24 Pass@1 | AIME24 Cons@32 | MATH500 | OlympiadBench |
---|---|---|---|---|
OpenAI o1 | 79.2 | - | 96.4 | - |
OpenAI o1-preview | 44.6 | - | 85.5 | - |
OpenAI o1-mini | 63.6 | - | 90.0 | - |
DeepSeek R1 | 79.8 | - | 97.3 | - |
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 69.2 | 83.3 | 93.6 | 64.3 |
Qwen/QwQ-32B-Preview | 43.8 | 56.7 | 88.4 | 60.3 |
NovaSky-AI/Sky-T1-32B-Preview | 37.7 | 50.0 | 88.0 | 55.1 |
Qwen/Qwen2.5-32B-Instruct | 17.0 | 20.0 | 81.8 | 48.1 |
FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview | 68.6 | 83.3 | 94.6 | 64.9 |
FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview | 69.7 | 83.3 | 94.6 | 64.0 |
FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview | 72.9 | 86.7 | - | - |
FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 74.0 | 86.7 | 94.8 | 65.0 |
Models | GPQA-Diamond | MMLU-Pro | MMLU |
---|---|---|---|
OpenAI o1 | 75.7 | - | 91.8 |
OpenAI o1-preview | 73.3 | - | 90.8 |
OpenAI o1-mini | 60.0 | 80.3 | 85.2 |
DeepSeek R1 | 71.5 | 84.0 | 90.8 |
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 57.6 | 68.7 | 82.2 |
Qwen/QwQ-32B-Preview | 49.5 | 63.5 | 85.2 |
NovaSky-AI/Sky-T1-32B-Preview | 50.5 | 65.8 | 82.7 |
Qwen/Qwen2.5-32B-Instruct | 46.5 | 56.3 | 79.6 |
FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Instruct-32B-Preview | 55.1 | 68.6 | 82.0 |
FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview | 62.1 | 68.9 | 82.7 |
FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview | 54.6 | 70.6 | 84.0 |
FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 62.1 | 70.8 | 83.6 |
Models | LiveCodeBench | LiveCodeBench-Easy | LiveCodeBench-Medium | LiveCodeBench-Hard |
---|---|---|---|---|
OpenAI o1 | 63.4 | 98.5 | 80.9 | 31.7 |
OpenAI o1-preview | 42.7 | 97.0 | 47.2 | 9.8 |
OpenAI o1-mini | 52.0 | 91.0 | 67.4 | 19.5 |
DeepSeek R1 | 62.8 | 98.4 | 78.3 | 32.2 |
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | 56.1 | 93.6 | 73.1 | 23.4 |
Qwen/QwQ-32B-Preview | 44.4 | 94.9 | 53.8 | 10.0 |
NovaSky-AI/Sky-T1-32B-Preview | 37.3 | 89.7 | 40.4 | 6.6 |
FuseAI/FuseO1-DeepSeekR1-Qwen2.5-Coder-32B-Preview | 56.4 | 92.9 | 73.5 | 24.2 |
FuseAI/FuseO1-DeepSeekR1-QwQ-32B-Preview | 54.8 | 93.9 | 71.7 | 21.3 |
FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-Flash-32B-Preview | 58.2 | 94.3 | 77.1 | 25.0 |
FuseAI/FuseO1-DeepSeekR1-QwQ-SkyT1-32B-Preview | 57.9 | 93.6 | 76.0 | 25.5 |