Article • Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM • 12 days ago • 338
Article • A Deepdive into Aya Vision: Advancing the Frontier of Multilingual Multimodality • 20 days ago • 69
Moshi v0.1 Release Collection • MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18, 2024 • 226
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21, 2024 • 115
ChatAnything: Facetime Chat with LLM-Enhanced Personas Paper • 2311.06772 • Published Nov 12, 2023 • 35
Music ControlNet: Multiple Time-varying Controls for Music Generation Paper • 2311.07069 • Published Nov 13, 2023 • 45
Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models Paper • 2311.06783 • Published Nov 12, 2023 • 28
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models Paper • 2311.04145 • Published Nov 7, 2023 • 35
Learning From Mistakes Makes LLM Better Reasoner Paper • 2310.20689 • Published Oct 31, 2023 • 29
CapsFusion: Rethinking Image-Text Data at Scale Paper • 2310.20550 • Published Oct 31, 2023 • 26