Learnings from Scaling Visual Tokenizers for Reconstruction and Generation • Paper • 2501.09755 • Published 1 day ago • 22
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents • Paper • 2501.08828 • Published 3 days ago • 26
Towards Best Practices for Open Datasets for LLM Training • Paper • 2501.08365 • Published 4 days ago • 40
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models • Paper • 2501.06751 • Published 6 days ago • 31
MiniMax-01: Scaling Foundation Models with Lightning Attention • Paper • 2501.08313 • Published 4 days ago • 258
Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction • Paper • 2501.03218 • Published 12 days ago • 33
VideoRAG: Retrieval-Augmented Generation over Video Corpus • Paper • 2501.05874 • Published 8 days ago • 61
Centurio: On Drivers of Multilingual Ability of Large Vision-Language Model • Paper • 2501.05122 • Published 9 days ago • 18
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives • Paper • 2501.04003 • Published 11 days ago • 23
Agent Laboratory: Using LLM Agents as Research Assistants • Paper • 2501.04227 • Published 10 days ago • 77
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization • Paper • 2412.21037 • Published 19 days ago • 23
An Empirical Study of Autoregressive Pre-training from Videos • Paper • 2501.05453 • Published 9 days ago • 36
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection • Paper • 2501.04575 • Published 10 days ago • 22
Exploring Length Generalization in Large Language Models • Paper • 2207.04901 • Published Jul 11, 2022 • 1
Negative Preference Optimization: From Catastrophic Collapse to Effective Unlearning • Paper • 2404.05868 • Published Apr 8, 2024 • 1
Simplicity Prevails: Rethinking Negative Preference Optimization for LLM Unlearning • Paper • 2410.07163 • Published Oct 9, 2024 • 1
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token • Paper • 2501.03895 • Published 11 days ago • 48
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking • Paper • 2501.04519 • Published 10 days ago • 230