V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric Capabilities in Multimodal Large Language Models Paper • 2504.06148 • Published 7 days ago • 12
Beyond Words: Advancing Long-Text Image Generation via Multimodal Autoregressive Models Paper • 2503.20198 • Published 21 days ago • 4
VideoLLM-online: Online Video Large Language Model for Streaming Video Paper • 2406.11816 • Published Jun 17, 2024 • 25
UniVTG: Towards Unified Video-Language Temporal Grounding Paper • 2307.16715 • Published Jul 31, 2023 • 11