CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts Paper • 2405.05949 • Published May 9, 2024
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation Paper • 2412.09585 • Published Dec 12, 2024
VCoder: Versatile Vision Encoders for Multimodal Large Language Models Paper • 2312.14233 • Published Dec 21, 2023
Keys to Better Image Inpainting: Structure and Texture Go Hand in Hand Paper • 2208.03382 • Published Aug 5, 2022
OneFormer: One Transformer to Rule Universal Image Segmentation Paper • 2211.06220 • Published Nov 10, 2022