LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code • Paper • 2403.07974 • Published Mar 12, 2024
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks • Paper • 2502.08235 • Published 10 days ago