Submitted by akhaliq 69 DAPO: An Open-Source LLM Reinforcement Learning System at Scale · 35 authors 2
Submitted by nebulae09 39 Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM · 12 authors 2
Submitted by carboncoo 24 DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding · 8 authors 2
Submitted by ZhaoyangLyu 20 Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation · 12 authors 2
Submitted by cckevinn 20 CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era · 10 authors 2
Submitted by akhaliq 14 Cosmos-Transfer1: Conditional World Generation with Adaptive Multimodal Control · 39 authors 2
Submitted by zhangysk 11 FlexWorld: Progressively Expanding 3D Scenes for Flexiable-View Synthesis · 9 authors 2
Submitted by kumarkrishna 10 Atlas: Multi-Scale Attention Improves Long Context Image Modeling · 9 authors 2
Submitted by BestWishYsh 9 Concat-ID: Towards Universal Identity-Preserving Video Synthesis · 5 authors 2
Submitted by kpzhang996 9 MPBench: A Comprehensive Multimodal Reasoning Benchmark for Process Errors Identification · 9 authors 2
Submitted by Lingaaaaaaa 7 Temporal Consistency for LLM Reasoning Process Error Identification · 7 authors 2
Submitted by edaxberger 7 MM-Spatial: Exploring 3D Spatial Understanding in Multimodal LLMs · 11 authors 2
Submitted by jacklishufan 7 Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection · 7 authors 2
Submitted by kpzhang996 5 PEBench: A Fictitious Dataset to Benchmark Machine Unlearning for Multimodal Large Language Models · 11 authors 2
Submitted by Spravil 5 Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models · 3 authors 2
Submitted by PengDa02 4 Towards Self-Improving Systematic Cognition for Next-Generation Foundation MLLMs · 9 authors 3
Submitted by zhuoyanxu 2 Learning to Inference Adaptively for Multimodal Large Language Models · 7 authors 2
Submitted by Mingtongz 2 KUDA: Keypoints to Unify Dynamics Learning and Visual Prompting for Open-Vocabulary Robotic Manipulation · 3 authors 2
Submitted by yuwendu 2 RoCo-Sim: Enhancing Roadside Collaborative Perception through Foreground Simulation · 9 authors 2
Submitted by ZhiyuanZeng 2 EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees · 4 authors 2
Submitted by DamianBoborzi 1 MeshFleet: Filtered and Annotated 3D Vehicle Dataset for Domain Specific Generative Modeling · 6 authors 2
Submitted by cxliu0314 1 CoLMDriver: LLM-based Negotiation Benefits Cooperative Autonomous Driving · 5 authors 2