Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated 1 day ago • 58
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published 6 days ago • 155
State of open code models (March 2025) Collection The best open code models on Hugging Face as of March 2025 • 7 items • Updated 19 days ago • 2
MoshiVis v0.1 Collection MoshiVis is a Vision Speech Model built as a perceptually-augmented version of Moshi v0.1 for conversing about image inputs • 8 items • Updated 23 days ago • 22
Training and Inference Efficiency of Encoder-Decoder Speech Models Paper • 2503.05931 • Published Mar 7 • 3
Cosmos Transfer1 Collection Multimodal Conditional World Generation for World2World Transfer • 5 items • Updated 3 days ago • 14
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 • 384
Article LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! Mar 7 • 51