Goedel-Prover-V2: Scaling Formal Theorem Proving with Scaffolded Data Synthesis and Self-Correction Paper • 2508.03613 • Published 18 days ago • 11 • 3
The Promise of RL for Autoregressive Image Editing Paper • 2508.01119 • Published 21 days ago • 11 • 3
Tool-integrated Reinforcement Learning for Repo Deep Search Paper • 2508.03012 • Published 18 days ago • 18 • 3
Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities Paper • 2507.13158 • Published Jul 17 • 24 • 2
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning Paper • 2507.14137 • Published Jul 18 • 33 • 5
A Data-Centric Framework for Addressing Phonetic and Prosodic Challenges in Russian Speech Generative Models Paper • 2507.13563 • Published Jul 17 • 51 • 3
A Survey of Context Engineering for Large Language Models Paper • 2507.13334 • Published Jul 17 • 245 • 12
RuleReasoner: Reinforced Rule-based Reasoning via Domain-aware Dynamic Sampling Paper • 2506.08672 • Published Jun 10 • 31 • 3
Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models Paper • 2506.06395 • Published Jun 5 • 129 • 21
X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains Paper • 2505.03981 • Published May 6 • 15 • 3
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities Paper • 2505.02567 • Published May 5 • 79 • 5
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond Paper • 2503.10460 • Published Mar 13 • 29 • 4
R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization Paper • 2503.10615 • Published Mar 13 • 17 • 3
Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries Paper • 2502.20475 • Published Feb 27 • 3 • 4
An Empirical Study on Eliciting and Improving R1-like Reasoning Models Paper • 2503.04548 • Published Mar 6 • 8 • 3
Union of Experts: Adapting Hierarchical Routing to Equivalently Decomposed Transformer Paper • 2503.02495 • Published Mar 4 • 8 • 4
FuseChat-3.0: Preference Optimization Meets Heterogeneous Model Fusion Paper • 2503.04222 • Published Mar 6 • 15 • 3
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM? Paper • 2502.14502 • Published Feb 20 • 91 • 9