Submitted by akhaliq 42 DAPO: An Open-Source LLM Reinforcement Learning System at Scale · 35 authors 1
Submitted by nebulae09 37 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM · 12 authors 1
Submitted by carboncoo 24 DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding · 8 authors 1
Submitted by cckevinn 19 CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era · 10 authors 1
Submitted by ZhaoyangLyu 18 Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation · 12 authors 1
Submitted by akhaliq 12 Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control · 39 authors 1
Submitted by kumarkrishna 9 Atlas: Multi-Scale Attention Improves Long Context Image Modeling · 9 authors 1
Submitted by kpzhang996 8 MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification · 9 authors 1
Submitted by zhangysk 7 FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis · 9 authors 1
Submitted by Lingaaaaaaa 6 Temporal Consistency for LLM Reasoning Process Error Identification · 7 authors 1
Submitted by edaxberger 6 MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs · 11 authors 1
Submitted by jacklishufan 6 Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection · 7 authors 1
Submitted by BestWishYsh 5 Concat-ID: Towards Universal Identity-Preserving Video Synthesis · 5 authors 1
Submitted by kpzhang996 4 PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models · 11 authors 1
Submitted by PengDa02 4 Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs · 9 authors 2
Submitted by Spravil 4 Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models · 3 authors 1
Submitted by zhuoyanxu 2 Learning to Inference Adaptively for Multimodal Large Language Models · 7 authors 1
Submitted by Mingtongz 2 KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation · 3 authors 1
Submitted by yuwendu 2 RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation · 9 authors 1
Submitted by ZhiyuanZeng 2 EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees · 4 authors 1
Submitted by DamianBoborzi 1 MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling · 6 authors 1
Submitted by cxliu0314 1 CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving · 5 authors 1