Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps Paper • 2501.09732 • Published 1 day ago • 38
MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents Paper • 2501.08828 • Published 3 days ago • 24
MangaNinja: Line Art Colorization with Precise Reference Following Paper • 2501.08332 • Published 4 days ago • 48
MiniMax-01: Scaling Foundation Models with Lightning Attention Paper • 2501.08313 • Published 4 days ago • 256
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation Paper • 2406.07422 • Published Jun 11, 2024 • 1
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound Paper • 2405.00233 • Published Apr 30, 2024 • 16
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published 5 days ago • 72
VideoRAG: Retrieval-Augmented Generation over Video Corpus Paper • 2501.05874 • Published 8 days ago • 61
Efficiently Serving LLM Reasoning Programs with Certaindex Paper • 2412.20993 • Published 19 days ago • 35
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization Paper • 2412.18525 • Published 25 days ago • 70
Perceiver: General Perception with Iterative Attention Paper • 2103.03206 • Published Mar 4, 2021 • 1
rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking Paper • 2501.04519 • Published 10 days ago • 230
Cosmos World Foundation Model Platform for Physical AI Paper • 2501.03575 • Published 11 days ago • 63
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models Paper • 2501.03262 • Published 14 days ago • 82
High-Fidelity Audio Compression with Improved RVQGAN Paper • 2306.06546 • Published Jun 11, 2023 • 10
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning Paper • 1907.04448 • Published Jul 9, 2019 • 1