zhs12 committed b36c265 (verified · 1 parent: 2c55ade)

Update README.md (README.md: +47 −3)
---
license: apache-2.0
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
---
# Light-R1-32B-DS: near-SOTA 32B Math Model with Only 3K Data

|Model|Trained From|Release Date|AIME24|AIME25|GPQA|
| ---- | ---- | ---- | ---- | ---- | ---- |
|DeepSeek-R1-Distill-Qwen-32B|Qwen2.5-32B|25.1.20|72.6|54.9|62.1|
|TinyR1-32B-Preview|DeepSeek-R1-Distill-Qwen-32B|25.2.25|77.1|65.9|65.0|
| [**Light-R1-32B-DS (ours)** 🤗](https://huggingface.co/qihoo360/Light-R1-32B-DS) |DeepSeek-R1-Distill-Qwen-32B|25.3.12|**78.1**|**65.9**|**68.0**|
| [Light-R1-32B (ours) 🤗](https://huggingface.co/qihoo360/Light-R1-32B) |Qwen2.5-32B-Instruct|25.3.4|76.6|64.6|61.8|
| QwQ-32B |N/A|25.3.6|78.5|69.3|67.7|

[GitHub page](https://github.com/Qihoo360/Light-R1)

Light-R1-32B-DS is a near-SOTA 32B math model, with AIME24 & 25 scores of 78.1 & 65.9.

Starting from DeepSeek-R1-Distill-Qwen-32B, Light-R1-32B-DS was further trained with only the [3K SFT data](https://huggingface.co/datasets/qihoo360/Light-R1-SFTData) we have open-sourced, demonstrating the strong applicability of the released data.

We are excited to release this model along with the [technical report](https://github.com/Qihoo360/Light-R1/blob/main/Light-R1.pdf).

## Usage
Inference is the same as for DeepSeek-R1-Distill-Qwen-32B.

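Since Light-R1-32B-DS is an R1-style reasoner, each completion is a chain of thought followed by the final answer after a closing `</think>` tag. A minimal post-processing sketch (the helper name and the single-`</think>` assumption are ours, not part of the release):

```python
def split_reasoning(output: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning, final_answer).

    Assumes the model closes its chain of thought with a single
    "</think>" tag, as DeepSeek-R1-Distill-Qwen-32B does.
    """
    reasoning, _, answer = output.partition("</think>")
    return reasoning.strip(), answer.strip()


completion = "First, factor the quadratic ... so x = 3.</think>The answer is 3."
thoughts, answer = split_reasoning(completion)
print(answer)  # → "The answer is 3."
```

If the tag is absent, the whole completion is returned as reasoning with an empty answer, which is a convenient failure mode for filtering truncated generations.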
## Data Decontamination

We carefully evaluated data contamination in several open-sourced datasets.
While some contamination may be [inevitable during pre-training](https://x.com/DimitrisPapail/status/1888325914603516214),
it is unacceptable for post-training data to contain benchmark questions when models are compared on those benchmarks.
MATH-500 is somewhat compromised, with tens of questions that are identical to benchmark questions or differ only in the numbers. AIME 24 and 25 remain intact, but special attention is needed when incorporating AIME data up to 2023.

Light-R1 performed thorough decontamination with exact matching (excluding digits) and N-gram (N=32) matching.
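The two checks can be sketched as follows; this is an illustrative reimplementation under our own tokenization and normalization assumptions, not the exact pipeline used:

```python
import re


def normalize(text: str) -> str:
    # Lowercase and mask digits, so questions that differ only in
    # their numbers (as observed in MATH-500) still match exactly.
    return re.sub(r"\d+", "#", text.lower()).strip()


def ngrams(tokens: list, n: int) -> set:
    # Set of all contiguous n-token windows.
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(train_q: str, benchmark_qs: list, n: int = 32) -> bool:
    """Flag a training question that exactly matches a benchmark question
    (digits excluded) or shares any n-gram (default N=32) with one."""
    norm = normalize(train_q)
    grams = ngrams(norm.split(), n)
    for bench_q in benchmark_qs:
        bench_norm = normalize(bench_q)
        if norm == bench_norm or grams & ngrams(bench_norm.split(), n):
            return True
    return False
```

Any training question flagged this way against AIME, MATH-500, or GPQA would be dropped before SFT.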

## Citation
```latex
@misc{lightr1proj,
      title={Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond},
      author={Liang Wen and Yunke Cai and Fenrui Xiao and Xin He and Qi An and Zhenyu Duan and Yimin Du and Junchen Liu and Lifu Tang and Xiaowei Lv and Haosheng Zou and Yongchao Deng and Shousheng Jia and Xiangzheng Zhang},
      year={2025},
      eprint={},
      archivePrefix={},
      url={https://github.com/Qihoo360/Light-R1},
}
```