L&V Models - a shawon Collection

updated Oct 2, 2024

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27, 2024

Mora: Enabling Generalist Video Generation via A Multi-Agent Framework

Paper • 2403.13248 • Published Mar 20, 2024

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023

UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models

Paper • 2409.20551 • Published Sep 30, 2024

Visual Question Decomposition on Multimodal Large Language Models

Paper • 2409.19339 • Published Sep 28, 2024

Image Copy Detection for Diffusion Models

Paper • 2409.19952 • Published Sep 30, 2024

FreeInit: Bridging Initialization Gap in Video Diffusion Models

Paper • 2312.07537 • Published Dec 12, 2023