SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 28 days ago • 179 • 7
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 146 • 13
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 53 • 4
Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision Paper • 2407.06189 • Published Jul 8, 2024 • 27 • 3