Learnings from Scaling Visual Tokenizers for Reconstruction and Generation Paper • 2501.09755 • Published 1 day ago • 20
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published 3 days ago • 24
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 4 days ago • 257
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs Paper • 2501.06186 • Published 8 days ago • 55
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published 8 days ago • 61
The FACTS Grounding Leaderboard: Benchmarking LLMs' Ability to Ground Responses to Long-Form Input Paper • 2501.03200 • Published 12 days ago • 1
InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection Paper • 2501.04575 • Published 10 days ago • 22
Agent Laboratory: Using LLM Agents as Research Assistants Paper • 2501.04227 • Published 10 days ago • 77
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 11 days ago • 63
STAR: Spatial-Temporal Augmentation with Text-to-Video Models for Real-World Video Super-Resolution Paper • 2501.02976 • Published 12 days ago • 51
VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction Paper • 2501.01957 • Published 15 days ago • 40
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published 18 days ago • 24
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published 16 days ago • 47
Training Software Engineering Agents and Verifiers with SWE-Gym Paper • 2412.21139 • Published 19 days ago • 21
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization Paper • 2412.21037 • Published 19 days ago • 23
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 25 days ago • 70
Orient Anything: Learning Robust Object Orientation Estimation from Rendering 3D Models Paper • 2412.18605 • Published 25 days ago • 20
Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models Paper • 2412.18609 • Published 25 days ago • 16