Submitted by Liuff23 · 67 · Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence · 4 authors · 3
Submitted by AngLv · 65 · The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason · 5 authors · 2
Submitted by songtingyu · 56 · VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos · 4 authors · 2
Submitted by shizhediao · 41 · Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding · 9 authors · 2
Submitted by lyx97 · 39 · VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? · 10 authors · 6
Submitted by lhjiang · 31 · AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views · 12 authors · 2
Submitted by maksimko123 · 28 · cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning · 9 authors · 3
Submitted by ydalva · 23 · LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers · 3 authors · 3
Submitted by AliBehrouz · 23 · ATLAS: Learning to Optimally Memorize the Context at Test Time · 8 authors · 2
Submitted by chaoscodes · 23 · Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering · 11 authors · 2
Submitted by benzweijia · 23 · UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning · 3 authors · 2
Submitted by dlaptev · 21 · Train Sparse Autoencoders Efficiently by Utilizing Features Correlation · 5 authors · 2
Submitted by sy1998 · 20 · VidText: Towards Comprehensive Evaluation for Video Text Understanding · 10 authors · 2
Submitted by spapi · 20 · FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian · 9 authors · 2
Submitted by TharinduSK · 17 · Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation · 9 authors · 2
Submitted by Jiahao004 · 15 · DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning · 13 authors · 2
Submitted by BryanW · 14 · Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model · 11 authors · 3
Submitted by KunlunZhu · 12 · SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents · 9 authors · 2
Submitted by Bang-UdeM-Mila · 12 · System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts · 4 authors · 2
Submitted by antonio-c · 11 · GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control · 8 authors · 3
Submitted by dek924 · 11 · PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions · 8 authors · 2
Submitted by jefflai · 10 · Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding? · 7 authors · 2
Submitted by Jang-Hyun · 9 · KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction · 6 authors · 2
Submitted by m-serious · 8 · ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind · 3 authors · 2
Submitted by smallAI · 8 · Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction · 6 authors · 2
Submitted by Elfsong · 7 · Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization · 9 authors · 2
Submitted by angtian · 7 · ATI: Any Trajectory Instruction for Controllable Video Generation · 5 authors · 2
Submitted by crc5577 · 7 · Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape · 5 authors · 2
Submitted by JRQi · 6 · When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy · 6 authors · 2
Submitted by ttumyche · 6 · CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays · 6 authors · 2
Submitted by davidchan · 5 · Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint · 6 authors · 2
Submitted by hdong51 · 5 · To Trust Or Not To Trust Your Vision-Language Model's Prediction · 5 authors · 2
Submitted by lyxun · 5 · UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes · 8 authors · 2
Submitted by kornelhowil · 5 · CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting · 6 authors · 2
Submitted by JingzeShi · 5 · Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting · 7 authors · 2
Submitted by lhmd · 4 · ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS · 6 authors · 5
Submitted by ahnpersie · 4 · Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates · 4 authors · 4
Submitted by kpzhang996 · 4 · SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model · 7 authors · 2
Submitted by SuperSupermoon · 4 · Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation · 13 authors · 2
Submitted by Franck-Dernoncourt · 4 · A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models · 9 authors · 2
Submitted by yunjae-won · 3 · Differential Information: An Information-Theoretic Perspective on Preference Optimization · 4 authors · 2
Submitted by StringChaos · 3 · GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents · 6 authors · 2
Submitted by Aman · 3 · Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator · 6 authors · 2
Submitted by Junfeng5 · 3 · TokBench: Evaluating Your Visual Tokenizer before Visual Generation · 9 authors · 2
Submitted by TeddyXGZ · 3 · Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models · 8 authors · 2
Submitted by gsarti · 2 · Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement · 4 authors · 2
Submitted by pengxiang · 2 · Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking · 7 authors · 2
Submitted by ctma · 2 · Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities · 5 authors · 2