MiniCPM-V 4.5 — a new MLLM for image, multi-image & video understanding that runs even on your phone, released by OpenBMB: openbmb/MiniCPM-V-4_5
✨ SOTA vision-language capability
✨ 96× video token compression → high-FPS & long-video reasoning
✨ Switchable fast vs. deep thinking modes
✨ Strong OCR & document parsing, supporting 30+ languages
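To see why the 96× token compression enables long-video reasoning, here is a back-of-the-envelope sketch. Only the 96× ratio comes from the release notes; the per-frame token count and context budget are hypothetical round numbers for illustration, not MiniCPM-V 4.5 internals.

```python
TOKENS_PER_FRAME = 64    # hypothetical visual tokens per encoded frame
CONTEXT_BUDGET = 32_768  # hypothetical token budget for visual input
COMPRESSION = 96         # compression ratio quoted in the release


def max_frames(budget: int, tokens_per_frame: int, compression: int = 1) -> int:
    """Number of frames that fit in the token budget at a given compression ratio."""
    # Integer math: each frame costs tokens_per_frame / compression tokens on average.
    return budget * compression // tokens_per_frame


uncompressed = max_frames(CONTEXT_BUDGET, TOKENS_PER_FRAME)
compressed = max_frames(CONTEXT_BUDGET, TOKENS_PER_FRAME, COMPRESSION)
print(uncompressed, compressed)  # 512 vs 49152 frames
```

Under these assumed numbers, the same budget goes from 512 frames (about 17 seconds at 30 FPS) to roughly 49k frames (about 27 minutes), which is what makes high-FPS and long-video inputs tractable.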
OpenGVLab's InternVL3_5-2B-MPO [Mixed Preference Optimization (MPO)] is a compact vision-language model in the InternVL3.5 series. You can now try it in the Tiny VLMs Lab, an app featuring 15+ multimodal VLMs ranging from 250M to 4B parameters. These models support tasks such as OCR, reasoning, single-shot question answering, and captioning (including ablated variants) across a broad range of visual categories. They can also handle images with complex, sensitive, or nuanced content while adapting to varying aspect ratios and resolutions.