VLMS
PsiPi/liuhaotian_llava-v1.5-13b-GGUF
Image-Text-to-Text
• 13B • Updated • 578
• 36
Image-to-Text
• Updated • 28
Image-Text-to-Text
• 1B • Updated • 922
• 57
ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling
Paper
• 2402.06118
• Published • 16
LEGO: Language Enhanced Multi-modal Grounding Model
Paper
• 2401.06071
• Published • 13
Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models
Paper
• 2403.18814
• Published • 49
Visual CoT: Unleashing Chain-of-Thought Reasoning in Multi-Modal Language Models
Paper
• 2403.16999
• Published • 6
Salesforce/instructblip-vicuna-7b
Image-Text-to-Text
• 8B • Updated • 10.3k
• 101
Pegasus-v1 Technical Report
Paper
• 2404.14687
• Published • 33
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
Paper
• 2404.16375
• Published • 18
Needle In A Multimodal Haystack
Paper
• 2406.07230
• Published • 55