Large Language Models Can Self-Improve in Long-context Reasoning Paper • 2411.08147 • Published Nov 12, 2024 • 65
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering Paper • 2411.11504 • Published Nov 18, 2024 • 21
Auto-Evolve: Enhancing Large Language Model's Performance via Self-Reasoning Framework Paper • 2410.06328 • Published Oct 8, 2024 • 2
Critical Tokens Matter: Token-Level Contrastive Estimation Enhence LLM's Reasoning Capability Paper • 2411.19943 • Published Nov 29, 2024 • 62
Test-time Computing: from System-1 Thinking to System-2 Thinking Paper • 2501.02497 • Published Jan 5 • 45
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 104
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published Feb 26 • 22
Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning Paper • 2503.07572 • Published 27 days ago • 41