--- license: apache-2.0 base_model: - Qwen/Qwen2.5-Math-7B pipeline_tag: text-generation tags: - lean4 - step-prover ---

BFS-Prover-V2: Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers

## Introduction We introduce **BFS-Prover-V2**, the state-of-the-art open-source step-level theorem proving system for Lean4, designed to address the dual challenges of scaling both training and inference in neural theorem proving. BFS-Prover-V2 introduces novel solutions to overcome these limitations through: 1. **Training-time scaling**: A novel multi-stage expert iteration framework with adaptive tactic-level data filtering and periodic retraining to surmount the performance plateaus that typically curtail long-term post training 2. **Inference-time scaling**: A planner-enhanced multi-agent tree search system for hierarchical reasoning that scales performance at inference time **BFS-Prover-V2** achieves 95.08\% and 41.4\% on the miniF2F and ProofNet test sets respectively, setting a new state-of-the-art for step-level provers. This repo contains the **BFS-Prover-V2-7B** model, with the following features: - Base Model: Qwen2.5-Math-7B - Training Approach: Multi-stage expert iteration with best-first tree search - Training Data Sources: - Mathlib (via LeanDojo) - Lean-Github repositories - Autoformalized NuminaMath datasets - Goedel-Pset ## Benchmark Performance
| Model | miniF2F-test | miniF2F-valid | ProofNet-test | |:------|:------------:|:-------------:|:-------------:| | 👉 **BFS-Prover-V2-7B** | 82.4% | - | - | | BFS-Prover-V2-32B | 86.1% | 85.5% | 41.4% | | BFS-Prover-V2-32B w/ Planner | 95.08% | 95.5% | - |
## Usage - The model expects input in the format `"{state}:::"` where {state} is a Lean4 tactic state. - `:::` serves as a special indicator to signal the model to generate a tactic for the given state. - The model will echo back the input state followed by the generated tactic. ```python # Example code for loading and using the tactic generator model from transformers import AutoModelForCausalLM, AutoTokenizer model = AutoModelForCausalLM.from_pretrained("ByteDance-Seed/BFS-Prover-V2-7B") tokenizer = AutoTokenizer.from_pretrained("ByteDance-Seed/BFS-Prover-V2-7B") # imo_1964_p2 from miniF2F state = """a b c : ℝ h₀ : 0 < a ∧ 0 < b ∧ 0 < c h₁ : c < a + b h₂ : b < a + c h₃ : a < b + c ⊢ a ^ 2 * (b + c - a) + b ^ 2 * (c + a - b) + c ^ 2 * (a + b - c) ≤ 3 * a * b * c""" # Tactic generation sep = ":::" prompt = state + sep inputs = tokenizer(prompt, return_tensors="pt") outputs = model.generate(**inputs) tactic = tokenizer.decode(outputs[0], skip_special_tokens=True).split(sep)[1] print(tactic) # Generated tactic: "nlinarith [sq_nonneg (a - b), sq_nonneg (c - a), sq_nonneg (b - c)]" ``` ## Citation ```bibtex @article{xin2025scaling, title={Scaling up Multi-Turn Off-Policy RL and Multi-Agent Tree Search for LLM Step-Provers}, author={Xin, Ran and Zheng, Zeyu and Nie, Yanchen and Yuan, Kun and Xiao, Xia}, journal={arXiv preprint arXiv:2509.06493}, year={2025} } ``` ## License This project is licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.txt). ## Contact For questions and feedback about the tactic generator model, please contact: - Ran Xin (ran.xin@bytedance.com) - Zeyu Zheng (zeyuzhen@andrew.cmu.edu)