Jiahao004
/

DeepTheorem-qwen-1.5b-rl

Safetensors

qwen2

Model card Files Files and versions Community

Jiahao004 commited on 13 days ago

Commit

a35ec9d

verified ·

1 Parent(s): 39cc106

Update README.md

Browse files

Files changed (1) hide show

README.md +48 -3

README.md CHANGED Viewed

@@ -1,3 +1,48 @@
----
-license: mit
----

+---
+license: mit
+---
+# DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning 🚀
+Welcome to the GitHub repository for **DeepTheorem** 🎉, a comprehensive framework for enhancing large language model (LLM) mathematical reasoning through informal, natural language-based theorem proving. This project introduces a novel approach to automated theorem proving (ATP) by leveraging the informal reasoning strengths of LLMs, moving beyond traditional formal proof systems 🌟.
+## Overview 📖
+<p align="center">
+<img src="frontpage.png" width="800" />
+</p>
+Theorem proving is a critical benchmark for evaluating complex reasoning in LLMs 🧠. However, formal proof systems often misalign with the informal, natural language knowledge LLMs acquire during pre-training. DeepTheorem addresses this gap by introducing:
+1. **A Large-Scale Theorem Proving Dataset** 📊:
+   - Contains **high-quality, IMO-level informal theorems and proofs** across diverse mathematical domains 📚.
+   - Rigorously annotated for correctness, difficulty, and topic categories ✅.
+   - Includes systematically constructed **verifiable theorem variants** for robust evaluation 🔍.
+2. **RL-Zero** 🤖:
+   - A novel **reinforcement learning strategy** tailored for informal theorem proving ⚙️.
+   - Utilizes verified theorem variants to incentivize robust mathematical inference 💡.
+Deeptheorem demonstrates its superior performance over both opensource and commercial models (i.e Gemini and o1)
+## Performance 🚀
+Deeptheorem achieves the #Rank 5 position along all the commerical models and open source models.
+| **Model**             | **FIMO** |         | **HMMT** |         | **Putnam** |         | **Avg.(\#Rank)** |         |
+| :--------------------- | :------: | :-----: | :------: | :-----: | :--------: | :-----: | :------: | :-----: |
+|                        | *out.*   | *proc.* | *out.*   | *proc.* | *out.*     | *proc.* | *out.*   | *proc.* |
+| Gemini2\.5-Pro         | 57\.14   | 54\.06  | 57\.63   | 49\.82  | 64\.58     | 58\.75  | 59\.78(\#2)   | 54\.21(\#3)  |
+| o1-mini                | 60\.32   | 55\.23  | 35\.59   | 30\.90  | 61\.46     | 52\.88  | 52\.46(\#4)   | 46\.34(\#4)  |
+| o1                     | 66\.67   | 61\.00  | 47\.46   | 47\.30  | 62\.50     | 57\.55  | 58\.88(\#3)   | 55\.28(\#2)  |
+| o3-mini                | 80\.95   | 77\.61  | 45\.76   | 43\.47  | 78\.12     | 75\.12  | 68\.28(\#1)   | 65\.40(\#1)  |
+| *DeepTheorem-RL-7B     | 55\.56   | 39\.07  | 28\.81   | 20\.85  | 57\.29     | 42\.20  | 47\.22(\#5)   | 34\.04(\#5)  |
+## DeepTheorem Dataset 📊
+The DeepTheorem dataset comprises **121K IMO-level informal theorems and proofs** spanning diverse mathematical domains 🌐. Each theorem-proof pair is rigorously annotated for:
+- **o3-mini Proofs** 🖋️: Ensuring mathematical accuracy through proofs generated or verified by the o3-mini model ✅.
+- **Truth Value** 🔍: The truth value of the theorem extracted from the o3-mini proofs, indicating whether the theorem is true or false ✔️.
+- **Difficulty** 🎚️: Categorized by complexity to suit various LLM capabilities 🧩.
+- **Topic Categories** 🗂️: Covering algebra, geometry, number theory, and more 📘.
+- **Variants** 🔄: Positive or negative variants of the theorem that share the same or inverse truth value of the original theorem 🔀.