SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published 6 days ago • 78
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published 12 days ago • 35
Is Extending Modality The Right Path Towards Omni-Modality? Paper • 2506.01872 • Published Jun 2 • 22
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3 • 33
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published May 31 • 30
JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community Paper • 2503.21679 • Published Mar 27 • 1
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16 • 26
Implicit Reasoning in Transformers is Reasoning through Shortcuts Paper • 2503.07604 • Published Mar 10 • 23
BenchMAX: A Comprehensive Multilingual Evaluation Suite for Large Language Models Paper • 2502.07346 • Published Feb 11 • 55
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles Paper • 2502.09082 • Published Feb 13 • 30
Agent-R: Training Language Model Agents to Reflect via Iterative Self-Training Paper • 2501.11425 • Published Jan 20 • 107
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published Oct 15, 2024 • 50
Revealing the Barriers of Language Agents in Planning Paper • 2410.12409 • Published Oct 16, 2024 • 28