LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code Paper • 2403.07974 • Published Mar 12, 2024 • 2
The Danger of Overthinking: Examining the Reasoning-Action Dilemma in Agentic Tasks Paper • 2502.08235 • Published 10 days ago • 52
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models Paper • 2406.04271 • Published Jun 6, 2024 • 29