Even Small Reasoners Should Quote Their Sources: Introducing the Pleias-RAG Model Family Paper • 2504.18225 • Published Apr 2025 • 12
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse Paper • 2503.16365 • Published Mar 20 • 40
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13 • 50
Gemini Embedding: Generalizable Embeddings from Gemini Paper • 2503.07891 • Published Mar 10 • 38
OmniAlign-V: Towards Enhanced Alignment of MLLMs with Human Preference Paper • 2502.18411 • Published Feb 25 • 73
EgoLife Collection CVPR 2025 - EgoLife: Towards Egocentric Life Assistant. Homepage: https://egolife-ai.github.io/ • 10 items • Updated Mar 7 • 17
Multimodal-SAE Collection Sparse autoencoders (SAEs) hooked on LLaVA • 5 items • Updated Mar 4 • 8
LLaVA-Video Collection Models focused on video understanding (previously known as LLaVA-NeXT-Video) • 8 items • Updated Feb 21 • 61
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published Dec 10, 2024 • 37
Large Multi-modal Models Can Interpret Features in Large Multi-modal Models Paper • 2411.14982 • Published Nov 22, 2024 • 17
MS MARCO Web Search: a Large-scale Information-rich Web Dataset with Millions of Real Click Labels Paper • 2405.07526 • Published May 13, 2024 • 22
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation Paper • 2410.13861 • Published Oct 17, 2024 • 57
LLaVA-Critic Collection A general evaluator for assessing model performance • 6 items • Updated Oct 6, 2024 • 10