Submitted by melisa 165 Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning · 8 authors 4
Submitted by BestWishYsh 55 UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation · 12 authors 2
Submitted by zelaix 55 VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments · 8 authors 3
Submitted by xyliu6 49 SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis · 6 authors 2
Submitted by OrlandoHugBot 47 CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs · 9 authors 4
Submitted by qizekun 33 OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models · 8 authors 2
Submitted by Cynthia-1628 33 OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation · 9 authors 2
Submitted by luojunyu 33 FinMME: Benchmark Dataset for Financial Multi-Modal Reasoning Evaluation · 13 authors 3
Submitted by ganlinyang 31 Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces · 18 authors 5
Submitted by wchengad 27 Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers · 8 authors 2
Submitted by AnonMegumi 24 MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs · 9 authors 2
Submitted by vangard703 24 Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics · 6 authors 2
Submitted by Lingaaaaaaa 22 Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning · 5 authors 2
Submitted by chaehun 21 Negative-Guided Subject Fidelity Optimization for Zero-Shot Subject-Driven Generation · 5 authors 2
Submitted by liyz 21 AnimeShooter: A Multi-Shot Animation Dataset for Reference-Guided Video Generation · 6 authors 2
Submitted by yiren98 15 RelationAdapter: Learning and Transferring Visual Relation with Diffusion Transformers · 5 authors 2
Submitted by ChenyangSi 14 DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation · 7 authors 2
Submitted by Hila 14 FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation · 4 authors 2
Submitted by gentaiscool 11 Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability · 20 authors 2
Submitted by erjui 10 PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models · 5 authors 3
Submitted by AnthonyGosselin 9 Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes · 8 authors 3
Submitted by fengyao1909 9 Training Language Models to Generate Quality Code with Program Analysis Feedback · 10 authors 4
Submitted by danielmisrael 6 Accelerating Diffusion LLMs via Adaptive Parallel Decoding · 3 authors 2
Submitted by WeiChow 3 MERIT: Multilingual Semantic Retrieval with Interleaved Multi-Condition Query · 18 authors 2
Submitted by chs20 3 FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens · 4 authors 2
Submitted by zhaoruiyang 3 Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework · 8 authors 2
Submitted by hyungjoochae 3 One Missing Piece for Open-Source Reasoning Models: A Dataset to Mitigate Cold-Starting Short CoT LLMs in RL · 9 authors 2
Submitted by Qinsi1 3 Angles Don't Lie: Unlocking Training-Efficient RL Through the Model's Own Signals · 9 authors 2
Submitted by lyan62 3 Hanfu-Bench: A Multimodal Benchmark on Cross-Temporal Cultural Understanding and Transcreation · 6 authors 2
Submitted by arkimjh 3 ReFoCUS: Reinforcement-guided Frame Optimization for Contextual Understanding · 4 authors 2
Submitted by gq2138 3 SHARE: An SLM-based Hierarchical Action CorREction Assistant for Text-to-SQL · 7 authors 2
Submitted by jamescai20 3 How Much Backtracking is Enough? Exploring the Interplay of SFT and RL in Enhancing LLM Reasoning · 4 authors 2
Submitted by xyzhang626 3 Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding · 7 authors 2
Submitted by GSean 2 Controllable Human-centric Keyframe Interpolation with Generative Prior · 5 authors 2
Submitted by lx865712528 2 TL;DR: Too Long, Do Re-weighting for Efficient LLM Reasoning Compression · 15 authors 2
Submitted by amazingj 2 M^3FinMeeting: A Multilingual, Multi-Sector, and Multi-Task Financial Meeting Understanding Evaluation Dataset · 6 authors 2
Submitted by Omartificial-Intelligence-Space 2 QARI-OCR: High-Fidelity Arabic Text Recognition through Multimodal Large Language Model Adaptation · 7 authors 2
Submitted by Boese0601 1 ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions · 10 authors 2
Submitted by ItamarZ 1 Revisiting LRP: Positional Attribution as the Missing Ingredient for Transformer Explainability · 4 authors 3
Submitted by dxlong2000 1 Beyond In-Context Learning: Aligning Long-form Generation of Large Language Models via Task-Inherent Attribute Guidelines · 8 authors 2
Submitted by anumafzal94 1 Knowing Before Saying: LLM Representations Encode Information About Chain-of-Thought Success Before Completion · 4 authors 2