Visual instruction datasets for visual language models

Collections of multimodal (image+text) instruction finetuning datasets tailored for visual language models like LLaVA, Fuyu, or IDEFICS.

q-future/Q-Instruct-DB • Updated Jan 9, 2024 • 111 • 18
X2FD/LVIS-Instruct4V • Updated Nov 13, 2023 • 223k • 87 • 83
BAAI/SVIT • Updated Jan 2, 2024 • 49 • 31
MMInstruction/M3IT • Updated Nov 24, 2023 • 2.47k • 128