Article Kimi-VL-A3B-Thinking-2506: A Quick Navigation By moonshotai and 1 other • 7 days ago • 51
Article SmolVLM2: Bringing Video Understanding to Every Device By orrzohar and 6 others • Feb 20 • 276
LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences Paper • 2412.01292 • Published Dec 2, 2024 • 13
Article Releasing the largest multilingual open pretraining dataset By Pclanglais and 2 others • Nov 13, 2024 • 101
SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation Paper • 2409.06633 • Published Sep 10, 2024 • 15
Meta Flow Matching: Integrating Vector Fields on the Wasserstein Manifold Paper • 2408.14608 • Published Aug 26, 2024 • 8
SAM2Point: Segment Any 3D as Videos in Zero-shot and Promptable Manners Paper • 2408.16768 • Published Aug 29, 2024 • 29
ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model Paper • 2408.16767 • Published Aug 29, 2024 • 33
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29, 2024 • 51
Scalable High-Resolution Pixel-Space Image Synthesis with Hourglass Diffusion Transformers Paper • 2401.11605 • Published Jan 21, 2024 • 23
Article Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA By ybelkada and 4 others • May 24, 2023 • 156
Article Google releases Gemma 2 2B, ShieldGemma and Gemma Scope By Xenova and 3 others • Jul 31, 2024 • 59