Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models Paper • 2406.09403 • Published Jun 13, 2024
BLINK: Multimodal Large Language Models Can See but Not Perceive Paper • 2404.12390 • Published Apr 18, 2024
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models Paper • 2312.03052 • Published Dec 5, 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering Paper • 2303.11897 • Published Mar 21, 2023