A Rigorous Benchmark with Multidimensional Evaluation for Deep Research Agents: From Answers to Reports Paper • 2510.02190 • Published 14 days ago • 18
A Mousetrap: Fooling Large Reasoning Models for Jailbreak with Chain of Iterative Chaos Paper • 2502.15806 • Published Feb 19 • 2
Argus Inspection: Do Multimodal Large Language Models Possess the Eye of Panoptes? Paper • 2506.14805 • Published Jun 3 • 2