microsoft/Phi-4-multimodal-instruct Automatic Speech Recognition • Updated May 1 • 496k • 1.42k
Article SmolVLM - small yet mighty Vision Language Model By andito and 4 others • Nov 26, 2024 • 309
Apollo: An Exploration of Video Understanding in Large Multimodal Models Paper • 2412.10360 • Published Dec 13, 2024 • 146