WMPO: World Model-based Policy Optimization for Vision-Language-Action Models Paper • 2511.09515 • Published Nov 12 • 18
Reasoning with Confidence: Efficient Verification of LLM Reasoning Steps via Uncertainty Heads Paper • 2511.06209 • Published Nov 9 • 18
Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs Paper • 2507.09477 • Published Jul 13 • 86
LongCodeZip: Compress Long Context for Code Language Models Paper • 2510.00446 • Published Oct 1 • 106
SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? Paper • 2502.12115 • Published Feb 17 • 46
Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR Paper • 2508.14029 • Published Aug 19 • 118
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published Aug 6 • 129
Introducing smolagents: simple agents that write actions in code. Article • Dec 31, 2024 • 1.16k
Agent Lightning: Train ANY AI Agents with Reinforcement Learning Paper • 2508.03680 • Published Aug 5 • 121
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published Jul 24 • 86
SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? Paper • 2507.12415 • Published Jul 16 • 42
DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge Article • Feb 7 • 263
BENCHAGENTS: Automated Benchmark Creation with Agent Interaction Paper • 2410.22584 • Published Oct 29, 2024 • 1
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models Paper • 2506.05176 • Published Jun 5 • 76
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 187