VLMS - a Deping Collection


Updated Sep 22, 2024

PsiPi/liuhaotian_llava-v1.5-13b-GGUF

Image-Text-to-Text • 13B • Updated Mar 11, 2024 • 578 • 36
TRI-ML/prismatic-vlms

Image-to-Text • Updated May 6, 2024 • 28
bczhou/tiny-llava-v1-hf

Image-Text-to-Text • 1B • Updated Aug 17, 2024 • 922 • 57
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

Paper • 2402.06118 • Published Feb 9, 2024 • 16
LEGO: Language Enhanced Multi-modal Grounding Model

Paper • 2401.06071 • Published Jan 11, 2024 • 13
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

Paper • 2403.18814 • Published Mar 27, 2024 • 49
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models

Paper • 2403.16999 • Published Mar 25, 2024 • 6
Salesforce/instructblip-vicuna-7b

Image-Text-to-Text • 8B • Updated Feb 3, 2025 • 10.3k • 101
Pegasus-v1 Technical Report

Paper • 2404.14687 • Published Apr 23, 2024 • 33
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs

Paper • 2404.16375 • Published Apr 25, 2024 • 18
Needle In A Multimodal Haystack

Paper • 2406.07230 • Published Jun 11, 2024 • 55