Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math Paper • 2504.21233 • Published 6 days ago • 35
WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published 5 days ago • 39
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities Paper • 2504.20734 • Published 6 days ago • 57
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published 7 days ago • 79
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 11 days ago • 86
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published 14 days ago • 73
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities Paper • 2504.16078 • Published 13 days ago • 20
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published 17 days ago • 118
LettuceDetect: A Hallucination Detection Framework for RAG Applications Paper • 2502.17125 • Published Feb 24 • 11
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution Paper • 2502.18449 • Published Feb 25 • 74
The Relationship Between Reasoning and Performance in Large Language Models -- o3 (mini) Thinks Harder, Not Longer Paper • 2502.15631 • Published Feb 21 • 9
The Ultimate Collection of Code Classifiers Collection 🔥 15 classifiers, 124M parameters, one per programming language— for assessing the educational value of GitHub code • 15 items • Updated about 5 hours ago • 11
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16 • 156
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17 • 45