Mathesis: Towards Formal Theorem Proving from Natural Languages
Abstract
Recent advances in large language models show strong promise for formal reasoning. However, most LLM-based theorem provers have long been constrained by the need for expert-written formal statements as input, limiting their applicability to real-world problems expressed in natural language. We tackle this gap with Mathesis, the first end-to-end theorem-proving pipeline that processes informal problem statements. It contributes Mathesis-Autoformalizer, the first autoformalizer to use reinforcement learning to enhance the formalization of natural-language problems, aided by our novel LeanScorer framework for nuanced assessment of formalization quality. It also proposes Mathesis-Prover, which generates formal proofs from the formalized statements. To evaluate the real-world applicability of end-to-end formal theorem proving, we introduce Gaokao-Formal, a benchmark of 488 complex problems from China's national college entrance exam. Our approach is carefully designed, with a thorough study of each component. Experiments demonstrate Mathesis's effectiveness: the autoformalizer outperforms the best baseline by 22% in pass-rate on Gaokao-Formal, and the full system surpasses other model combinations, achieving 64% accuracy on MiniF2F with pass@32 and a state-of-the-art 18% on Gaokao-Formal.
Community
This paper introduces Mathesis, the first end-to-end theorem-proving pipeline designed to solve complex mathematical problems directly from informal, natural language statements. The system features the Mathesis-Autoformalizer, which uses reinforcement learning to translate natural language problems into the Lean 4 formal language. The pipeline's performance is tested on the newly introduced Gaokao-Formal benchmark, a challenging set of 488 problems from China's national college entrance exam. The paper also presents LeanScorer, a novel framework for evaluating the semantic accuracy of these formalizations. When combined, the Mathesis components achieve state-of-the-art results, demonstrating a significant advancement in automated formal reasoning.
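To make the two-stage flow concrete, the sketch below shows only the pipeline shape implied by this description: autoformalize the informal statement into Lean 4, then hand the formal statement to a prover. The functions are stubs standing in for the actual model calls and the Lean compilation check; none of the names, prompts, or outputs here come from the released Mathesis artifacts.

```python
# Minimal end-to-end sketch of the informal-statement -> Lean 4 proof flow.
# Stubs only: a real pipeline would call the autoformalizer and prover models
# and compile the result with Lean 4 to verify the proof.

def autoformalize(nl_problem: str) -> str:
    """Stage 1 (the Mathesis-Autoformalizer's role): natural language -> Lean 4 statement."""
    return "theorem toy : 1 + 1 = 2 := by sorry"  # placeholder output

def prove(lean_statement: str) -> str:
    """Stage 2 (the Mathesis-Prover's role): Lean 4 statement -> candidate proof."""
    return lean_statement.replace("sorry", "norm_num")  # placeholder tactic

def solve(nl_problem: str) -> str:
    """Informal problem in, candidate Lean 4 proof out (verification step omitted)."""
    return prove(autoformalize(nl_problem))

print(solve("Prove that 1 + 1 = 2."))
```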
Here are the key highlights:
- Pioneering End-to-End System: Mathesis is the first theorem-proving pipeline that processes informal problem statements from start to finish.
- Advanced Autoformalizer: The Mathesis-Autoformalizer is the first of its kind to use reinforcement learning and Hierarchical Preference Optimization (HPO) to improve its ability to formalize natural language problems. It outperforms the best baseline by 22% in pass-rate on the Gaokao-Formal benchmark.
- New Challenging Benchmark: The paper introduces Gaokao-Formal, a benchmark with 488 complex problems from China's Gaokao exam, aimed at advancing research on formalizing diverse and intricate mathematical problems.
- Nuanced Evaluation Framework: The authors developed LeanScorer, a novel evaluation method that combines LLM-based analysis with a Sugeno Fuzzy Integral to assess formalization quality. It achieves a 0.92 F1 score, outperforming previous methods.
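For readers unfamiliar with the aggregation step, here is a minimal sketch of a discrete Sugeno fuzzy integral of the kind LeanScorer uses to combine per-criterion judgments (e.g., from an LLM judge) into a single score. The criteria names, score values, and fuzzy-measure values below are illustrative assumptions, not the actual LeanScorer configuration.

```python
from typing import Dict, FrozenSet

def sugeno_integral(scores: Dict[str, float],
                    measure: Dict[FrozenSet[str], float]) -> float:
    """Discrete Sugeno integral of per-criterion scores w.r.t. a fuzzy measure.

    scores  : criterion -> satisfaction degree in [0, 1]
    measure : subset of criteria -> importance in [0, 1]; must be monotone,
              with measure[frozenset()] == 0 and measure of the full set == 1.
    """
    # Visit criteria in order of decreasing score: h(x_(1)) >= ... >= h(x_(n)).
    ordered = sorted(scores, key=scores.get, reverse=True)
    result, prefix = 0.0, set()
    for name in ordered:
        prefix.add(name)
        # Sugeno integral = max_i min( h(x_(i)), mu({x_(1), ..., x_(i)}) )
        result = max(result, min(scores[name], measure[frozenset(prefix)]))
    return result

# Illustrative only: hypothetical criteria an LLM judge might score for one
# candidate formalization, and a hand-picked fuzzy measure over their subsets.
scores = {"hypotheses": 0.9, "goal": 0.6, "types": 1.0}
measure = {
    frozenset(): 0.0,
    frozenset({"hypotheses"}): 0.5, frozenset({"goal"}): 0.5, frozenset({"types"}): 0.3,
    frozenset({"hypotheses", "goal"}): 0.9,
    frozenset({"hypotheses", "types"}): 0.7, frozenset({"goal", "types"}): 0.7,
    frozenset({"hypotheses", "goal", "types"}): 1.0,
}
print(sugeno_integral(scores, measure))  # 0.7
```

Compared with a plain weighted average, the min/max form of the Sugeno integral can express interactions between criteria, e.g., withholding a high overall score unless several criteria are jointly satisfied.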
This is an automated message from Librarian Bot: the following similar papers were recommended by the Semantic Scholar API.
- DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning (2025)
- Enumerate-Conjecture-Prove: Formally Solving Answer-Construction Problems in Math Competitions (2025)
- Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning (2025)
- Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification (2025)
- FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models (2025)
- DeepSeek-Prover-V2: Advancing Formal Mathematical Reasoning via Reinforcement Learning for Subgoal Decomposition (2025)
- Let's Reason Formally: Natural-Formal Hybrid Reasoning Enhances LLM's Math Capability (2025)