MoshiVis v0.1 (Collection): a vision-speech model built as a perceptually augmented version of Moshi v0.1 for conversing about image inputs. 8 items.
Training and Inference Efficiency of Encoder-Decoder Speech Models (Paper, arXiv:2503.05931).
Cosmos Transfer1 (Collection): a world foundation model for domain transfer. 5 items.
Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM (Article).
LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! (Article).
Jamba 1.6 (Collection): the AI21 Jamba family of hybrid SSM-Transformer foundation models, outperforming open-model competitors on quality and speed. 2 items.
C4AI Aya Vision (Collection): a state-of-the-art family of vision models that brings multimodal capabilities to 23 languages. 5 items.
C4AI Aya Expanse (Collection): an open-weight research release of a model with highly advanced multilingual capabilities. 4 items.
A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality (Article).
DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion (Paper, arXiv:2503.01183).