Submitted by cccjc · BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing · 5 authors
Submitted by StarJiaxing · LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs · 4 authors
Submitted by Mengyi · XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation · 7 authors
Submitted by zhitinghu · Do Vision-Language Models Have Internal World Models? Towards an Atomic Evaluation · 24 authors
Submitted by THUdyh · ShotBench: Expert-Level Cinematic Understanding in Vision-Language Models · 14 authors
Submitted by ChengyouJia · From Ideal to Real: Unified and Data-Efficient Dense Prediction for Real-World Scenarios · 4 authors
Submitted by AdinaY · Pangu Pro MoE: Mixture of Grouped Experts for Efficient Sparsity · 22 authors
Submitted by hba123 · Ark: An Open-source Python-based Framework for Robot Learning · 13 authors
Submitted by LeoLau · Shape-for-Motion: Precise and Consistent Video Editing with 3D Proxy · 5 authors
Submitted by tennant · The Automated LLM Speedrunning Benchmark: Reproducing NanoGPT Improvements · 23 authors
Submitted by SivanSX · Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs · 9 authors
Submitted by mdmoor · SMMILE: An Expert-Driven Benchmark for Multimodal Medical In-Context Learning · 12 authors
Submitted by AhmedMostafa · Gazal-R1: Achieving State-of-the-Art Medical Reasoning with Parameter-Efficient Two-Stage Training · 3 authors
Submitted by Luo-Yihong · Noise Consistency Training: A Native Approach for One-Step Generator in Learning Additional Controls · 4 authors
Submitted by nomadlx · Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning · 5 authors
Submitted by Srizzle · Performance Prediction for Large Systems via Text-to-Text Regression · 10 authors
Submitted by pengxiang · GPAS: Accelerating Convergence of LLM Pretraining via Gradient-Preserving Activation Scaling · 15 authors
Submitted by shengliu66 · Fractional Reasoning via Latent Steering Vectors Improves Inference Time Compute · 7 authors
Submitted by j-morano · RetFiner: A Vision-Language Refinement Scheme for Retinal Foundation Models · 4 authors
Submitted by Srikumar26 · Global and Local Entailment Learning for Natural World Imagery · 5 authors
Submitted by harisankar95 · Adaptive Domain Modeling with Language Models: A Multi-Agent Approach to Task Planning · 3 authors