MORSE-500: A Programmatically Controllable Video Benchmark to Stress-Test Multimodal Reasoning Paper • 2506.05523 • Published 6 days ago • 31
Establishing Trustworthy LLM Evaluation via Shortcut Neuron Analysis Paper • 2506.04142 • Published 7 days ago • 26
MMR-V: What's Left Unsaid? A Benchmark for Multimodal Deep Reasoning in Videos Paper • 2506.04141 • Published 7 days ago • 29
OmniReward Collection Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences • 3 items • Updated 23 days ago
Reinforced Internal-External Knowledge Synergistic Reasoning for Efficient Adaptive Search Agent Paper • 2505.07596 • Published about 1 month ago • 10