view article Article Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC 6 days ago β’ 17
view article Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM Mar 12 β’ 385
view article Article A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality Mar 4 β’ 73
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi β’ 13 items β’ Updated Sep 18, 2024 β’ 227
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper β’ 2402.13753 β’ Published Feb 21, 2024 β’ 116
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper β’ 2311.06772 β’ Published Nov 12, 2023 β’ 35
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper β’ 2311.07069 β’ Published Nov 13, 2023 β’ 45
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper β’ 2311.06783 β’ Published Nov 12, 2023 β’ 28
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper β’ 2311.04145 β’ Published Nov 7, 2023 β’ 35