Jiahao004 commited on
Commit
a35ec9d
Β·
verified Β·
1 Parent(s): 39cc106

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +48 -3
README.md CHANGED
@@ -1,3 +1,48 @@
1
- ---
2
- license: mit
3
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ ---
4
+
5
+ # DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning πŸš€
6
+
7
+ Welcome to the GitHub repository for **DeepTheorem** πŸŽ‰, a comprehensive framework for enhancing large language model (LLM) mathematical reasoning through informal, natural language-based theorem proving. This project introduces a novel approach to automated theorem proving (ATP) by leveraging the informal reasoning strengths of LLMs, moving beyond traditional formal proof systems 🌟.
8
+
9
+ ## Overview πŸ“–
10
+
11
+ <p align="center">
12
+ <img src="frontpage.png" width="800" />
13
+ </p>
14
+
15
+ Theorem proving is a critical benchmark for evaluating complex reasoning in LLMs 🧠. However, formal proof systems often misalign with the informal, natural language knowledge LLMs acquire during pre-training. DeepTheorem addresses this gap by introducing:
16
+
17
+ 1. **A Large-Scale Theorem Proving Dataset** πŸ“Š:
18
+ - Contains **high-quality, IMO-level informal theorems and proofs** across diverse mathematical domains πŸ“š.
19
+ - Rigorously annotated for correctness, difficulty, and topic categories βœ….
20
+ - Includes systematically constructed **verifiable theorem variants** for robust evaluation πŸ”.
21
+
22
+ 2. **RL-Zero** πŸ€–:
23
+ - A novel **reinforcement learning strategy** tailored for informal theorem proving βš™οΈ.
24
+ - Utilizes verified theorem variants to incentivize robust mathematical inference πŸ’‘.
25
+
26
+ Deeptheorem demonstrates its superior performance over both opensource and commercial models (i.e Gemini and o1)
27
+
28
+ ## Performance πŸš€
29
+ Deeptheorem achieves the #Rank 5 position along all the commerical models and open source models.
30
+
31
+ | **Model** | **FIMO** | | **HMMT** | | **Putnam** | | **Avg.(\#Rank)** | |
32
+ | :--------------------- | :------: | :-----: | :------: | :-----: | :--------: | :-----: | :------: | :-----: |
33
+ | | *out.* | *proc.* | *out.* | *proc.* | *out.* | *proc.* | *out.* | *proc.* |
34
+ | Gemini2\.5-Pro | 57\.14 | 54\.06 | 57\.63 | 49\.82 | 64\.58 | 58\.75 | 59\.78(\#2) | 54\.21(\#3) |
35
+ | o1-mini | 60\.32 | 55\.23 | 35\.59 | 30\.90 | 61\.46 | 52\.88 | 52\.46(\#4) | 46\.34(\#4) |
36
+ | o1 | 66\.67 | 61\.00 | 47\.46 | 47\.30 | 62\.50 | 57\.55 | 58\.88(\#3) | 55\.28(\#2) |
37
+ | o3-mini | 80\.95 | 77\.61 | 45\.76 | 43\.47 | 78\.12 | 75\.12 | 68\.28(\#1) | 65\.40(\#1) |
38
+ | *DeepTheorem-RL-7B | 55\.56 | 39\.07 | 28\.81 | 20\.85 | 57\.29 | 42\.20 | 47\.22(\#5) | 34\.04(\#5) |
39
+
40
+
41
+ ## DeepTheorem Dataset πŸ“Š
42
+
43
+ The DeepTheorem dataset comprises **121K IMO-level informal theorems and proofs** spanning diverse mathematical domains 🌐. Each theorem-proof pair is rigorously annotated for:
44
+ - **o3-mini Proofs** πŸ–‹οΈ: Ensuring mathematical accuracy through proofs generated or verified by the o3-mini model βœ….
45
+ - **Truth Value** πŸ”: The truth value of the theorem extracted from the o3-mini proofs, indicating whether the theorem is true or false βœ”οΈ.
46
+ - **Difficulty** 🎚️: Categorized by complexity to suit various LLM capabilities 🧩.
47
+ - **Topic Categories** πŸ—‚οΈ: Covering algebra, geometry, number theory, and more πŸ“˜.
48
+ - **Variants** πŸ”„: Positive or negative variants of the theorem that share the same or inverse truth value of the original theorem πŸ”€.