Towards Solving More Challenging IMO Problems via Decoupled Reasoning and Proving Paper • 2507.06804 • Published 6 days ago • 14
Scaling Speculative Decoding with Lookahead Reasoning Paper • 2506.19830 • Published 19 days ago • 12
Self-Calibration Collection • Efficient Test-Time Scaling via Self-Calibration https://arxiv.org/abs/2503.00031 • 7 items • Updated Jun 8 • 2
Router-R1: Teaching LLMs Multi-Round Routing and Aggregation via Reinforcement Learning Paper • 2506.09033 • Published Jun 10 • 7
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 127
Soft Thinking: Unlocking the Reasoning Potential of LLMs in Continuous Concept Space Paper • 2505.15778 • Published May 21 • 17
POSS: Position Specialist Generates Better Draft for Speculative Decoding Paper • 2506.03566 • Published Jun 4 • 6
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published May 27 • 105
WebAgent-R1: Training Web Agents via End-to-End Multi-Turn Reinforcement Learning Paper • 2505.16421 • Published May 22 • 19
Generative AI Act II: Test Time Scaling Drives Cognition Engineering Paper • 2504.13828 • Published Apr 18 • 17
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published Apr 14 • 276
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 295
OLMoTrace: Tracing Language Model Outputs Back to Trillions of Training Tokens Paper • 2504.07096 • Published Apr 9 • 74
Optimizing Language Model's Reasoning Abilities with Weak Supervision Paper • 2405.04086 • Published May 7, 2024 • 2
Taming Overconfidence in LLMs: Reward Calibration in RLHF Paper • 2410.09724 • Published Oct 13, 2024 • 3
On Grounded Planning for Embodied Tasks with Language Models Paper • 2209.00465 • Published Aug 29, 2022 • 1
Divide, Reweight, and Conquer: A Logit Arithmetic Approach for In-Context Learning Paper • 2410.10074 • Published Oct 14, 2024 • 1
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation Paper • 2504.00043 • Published Mar 30 • 9