Junteng committed · commit ae5bbf0 (verified) · 1 parent: 663afc7

Upload README.md with huggingface_hub

Files changed (1): README.md (+60, −3)
---
license: mit
language:
- en
tags:
- LLM
library_name: transformers
base_model:
- Qwen/Qwen2.5-32B
datasets:
- MiniMaxAI/SynLogic
---
# SynLogic-32B: Advanced Logical Reasoning Model

* 🐙 **GitHub Repo:** [https://github.com/MiniMax-AI/SynLogic](https://github.com/MiniMax-AI/SynLogic)
* 📜 **Paper (arXiv):** [https://arxiv.org/abs/2505.19641](https://arxiv.org/abs/2505.19641)
* 🤗 **Dataset:** [SynLogic on Hugging Face](https://huggingface.co/datasets/MiniMaxAI/SynLogic)
## Model Overview

**SynLogic-32B** is a state-of-the-art reasoning model built on Qwen2.5-32B-Base and trained with reinforcement learning on our comprehensive SynLogic dataset. The model excels at logical reasoning tasks and demonstrates strong generalization to mathematical domains.

## Key Features

* **Comprehensive Logical Reasoning:** Trained on 35 diverse logical reasoning tasks, including Sudoku, Game of 24, Cipher, and Arrow Maze
* **Verifiable Training:** All training data can be automatically verified, enabling effective reinforcement learning
* **Strong Generalization:** Transfers logical reasoning skills to mathematical problem-solving without explicit math training
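"Verifiable" here means each task instance ships with a rule-based checker that scores a model answer automatically, with no human or LLM judge. As a hedged illustration only (this function is hypothetical; the actual SynLogic verifiers live in the GitHub repo), a Game of 24 answer could be checked like this:

```python
import ast

def verify_game_of_24(numbers, expression):
    """Hypothetical verifier sketch: accept `expression` only if it uses
    exactly the given numbers (each once) and evaluates to 24."""
    tree = ast.parse(expression, mode="eval")
    # Collect every numeric literal in the candidate expression.
    literals = [n.value for n in ast.walk(tree) if isinstance(n, ast.Constant)]
    if sorted(literals) != sorted(numbers):
        return False  # must use exactly the provided numbers
    # Allow only arithmetic nodes, so evaluating model output is safe.
    allowed = (ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
               ast.Add, ast.Sub, ast.Mult, ast.Div, ast.USub)
    if not all(isinstance(n, allowed) for n in ast.walk(tree)):
        return False
    try:
        return abs(eval(compile(tree, "<expr>", "eval")) - 24) < 1e-6
    except ZeroDivisionError:
        return False
```

For the classic instance `[3, 3, 8, 8]`, the answer `8 / (3 - 8 / 3)` passes, while an expression that is wrong or uses different numbers is rejected; a checker of this shape is what makes a binary RL reward computable at scale.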
## Performance Highlights

| Model | BBEH | KOR-Bench | BBH |
|-------|------|-----------|-----|
| Qwen2.5-32B-Instruct | 17.5 | 54.7 | 84.5 |
| DeepSeek-R1-Distill-Qwen-32B | 19.2 | **66.6** | **88.3** |
| **SynLogic-32B** | **25.5** | 62.2 | 85.8 |

**Key Achievement:** +6.3 points over DeepSeek-R1-Distill-Qwen-32B on the challenging BBEH benchmark (25.5 vs. 19.2), establishing state-of-the-art performance among open-source logical reasoning models.
## Training Details

* **Base Model:** Qwen2.5-32B-Base
* **Training Algorithm:** GRPO (Group Relative Policy Optimization)
* **Dataset:** 33k SynLogic-Hard samples with controlled difficulty
* **Reward Design:** Binary rewards based on format adherence and correctness verification
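To make the GRPO plus binary-reward combination concrete: each prompt gets a group of sampled responses, each response earns reward 1 only if it is well-formatted *and* verified correct, and the advantage of each response is its reward normalized against its own group. A minimal sketch, with function names and the exact normalization as illustrative assumptions rather than the actual training code:

```python
import numpy as np

def binary_reward(format_ok: bool, correct: bool) -> float:
    """Binary reward per the README: 1 only when the response both follows
    the required output format and passes the task verifier, else 0."""
    return 1.0 if (format_ok and correct) else 0.0

def grpo_advantages(group_rewards, eps: float = 1e-8):
    """Group-relative advantage: normalize each rollout's reward by the
    mean and std of its group (all rollouts for one prompt). Illustrative
    sketch of GRPO's advantage term, not the repository's implementation."""
    r = np.asarray(group_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# One prompt, four sampled responses: two verified correct, two not.
rewards = [binary_reward(True, c) for c in (True, False, False, True)]
advantages = grpo_advantages(rewards)  # correct rollouts get positive advantage
```

Because rewards are group-normalized, GRPO needs no learned value model, which is part of what makes verifier-based binary rewards practical at 32B scale.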
## Citation

```bibtex
@misc{liu2025synlogic,
      title={SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond},
      author={Junteng Liu and Yuanxiang Fan and Zhuo Jiang and Han Ding and Yongyi Hu and Chi Zhang and Yiqi Shi and Shitong Weng and Aili Chen and Shiqi Chen and Yunan Huang and Mozhi Zhang and Pengyu Zhao and Junjie Yan and Junxian He},
      year={2025},
      eprint={2505.19641},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.19641},
}
```