OAgents: An Empirical Study of Building Effective Agents Paper • 2506.15741 • Published 10 days ago • 31
Seed1.5-Thinking: Advancing Superb Reasoning Models with Reinforcement Learning Paper • 2504.13914 • Published Apr 10 • 1
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation Paper • 2505.14640 • Published May 20 • 14
ScaleLong: A Multi-Timescale Benchmark for Long Video Understanding Paper • 2505.23922 • Published 29 days ago
P2P: Automated Paper-to-Poster Generation and Fine-Grained Benchmark Paper • 2505.17104 • Published May 21
TaskCraft: Automated Generation of Agentic Tasks Paper • 2506.10055 • Published 16 days ago • 31
Lingshu: A Generalist Foundation Model for Unified Multimodal Medical Understanding and Reasoning Paper • 2506.07044 • Published 20 days ago • 105
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix Paper • 2505.13032 • Published May 19 • 1
VisCoder: Fine-Tuning LLMs for Executable Python Visualization Code Generation Paper • 2506.03930 • Published 23 days ago • 24
Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem Paper • 2506.03295 • Published 24 days ago • 17
HardTests: Synthesizing High-Quality Test Cases for LLM Coding Paper • 2505.24098 • Published 29 days ago • 44
Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution Paper • 2505.20286 • Published May 26 • 7