Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment Paper • 2505.18600 • Published 18 days ago • 45
CPGD: Toward Stable Rule-based Reinforcement Learning for Language Models Paper • 2505.12504 • Published 23 days ago • 23
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 93
CoMP: Continual Multimodal Pre-training for Vision Foundation Models Paper • 2503.18931 • Published Mar 24 • 30
World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning Paper • 2503.10480 • Published Mar 13 • 53
Unified Reward Model for Multimodal Understanding and Generation Paper • 2503.05236 • Published Mar 7 • 124
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding Paper • 2312.00081 • Published Nov 30, 2023 • 2
Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning Paper • 2412.03565 • Published Dec 4, 2024 • 11
RelBench: A Benchmark for Deep Learning on Relational Databases Paper • 2407.20060 • Published Jul 29, 2024 • 10
Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model Paper • 2407.16982 • Published Jul 24, 2024 • 43
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published Jul 18, 2024 • 17
Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Model Paper • 2407.07053 • Published Jul 9, 2024 • 47
view article Article Introducing Idefics2: A Powerful 8B Vision-Language Model for the community By Leyo and 2 others • Apr 15, 2024 • 181
We-Math: Does Your Large Multimodal Model Achieve Human-like Mathematical Reasoning? Paper • 2407.01284 • Published Jul 1, 2024 • 81
UltraEdit: Instruction-based Fine-Grained Image Editing at Scale Paper • 2407.05282 • Published Jul 7, 2024 • 15
Benchmarking Multi-Image Understanding in Vision and Language Models: Perception, Knowledge, Reasoning, and Multi-Hop Reasoning Paper • 2406.12742 • Published Jun 18, 2024 • 15