Submitted by ai-alanov 84 T-LoRA: Single Image Diffusion Model Customization Without Overfitting · 4 authors 44 1
Submitted by HaochenWang 37 Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology · 12 authors 25 2
Submitted by ChaimZhu 29 OST-Bench: Evaluating the Capabilities of MLLMs in Online Spatio-temporal Scene Understanding · 7 authors 40 1
Submitted by js-hyun 24 Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs · 9 authors 9 3
Submitted by Diankun 23 Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling · 7 authors 2
Submitted by EthanTaylor 19 LangSplatV2: High-dimensional 3D Language Gaussian Splatting with 450+ FPS · 7 authors 1
Submitted by Franck-Dernoncourt 17 A Survey on Long-Video Storytelling Generation: Architectures, Consistency, and Cinematic Quality · 29 authors 1
Submitted by zhoutianyi 15 Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs · 3 authors 5
Submitted by Xuandong 5 Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models · 6 authors 2
Submitted by dbralios 2 Re-Bottleneck: Latent Re-Structuring for Neural Audio Autoencoders · 3 authors 1
Submitted by Bochkov 2 Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate · 1 author 2
Submitted by xianbao 2 SciMaster: Towards General-Purpose Scientific AI Agents, Part I. X-Master as Foundation: Can We Lead on Humanity's Last Exam? · 11 authors 1
Submitted by Bochkov 1 Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations · 1 author 1 1