MangaVQA and MangaLMM: A Benchmark and Specialized Model for Multimodal Manga Understanding Paper • 2505.20298 • Published May 26 • 6
microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • 6B • Updated May 1 • 450k • 1.48k
JMMMU: A Japanese Massive Multi-discipline Multimodal Understanding Benchmark for Culture-aware Evaluation Paper • 2410.17250 • Published Oct 22, 2024 • 15
stabilityai/japanese-stable-clip-vit-l-16 Feature Extraction • 0.4B • Updated Jul 10, 2024 • 93 • 26