TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning • Paper 2505.14625 • Published May 20
CrossWordBench: Evaluating the Reasoning Capabilities of LLMs and LVLMs with Controllable Puzzle Generation • Paper 2504.00043 • Published Mar 30