Time Blindness: Why Video-Language Models Can't See What Humans Can? Paper • 2505.24867 • Published 6 days ago • 72
view article Article Vision Language Models (Better, Faster, Stronger) By merve and 4 others • 25 days ago • 414
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves similar quality as half precision while using 3x less memory • 15 items • Updated 7 days ago • 195
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Paper • 2503.21144 • Published Mar 27 • 25
view article Article SmolVLM Grows Smaller – Introducing the 250M & 500M Models! By andito and 2 others • Jan 23 • 180
view article Article Open-source DeepResearch – Freeing our search agents By m-ric and 4 others • Feb 4 • 1.25k
MatAnyone: Stable Video Matting with Consistent Memory Propagation Paper • 2501.14677 • Published Jan 24 • 36
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 By eliebak and 2 others • Jan 28 • 862
view article Article Run ComfyUI workflows for free on Spaces By multimodalart and 1 other • Jan 14, 2024 • 81
Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Paper • 2412.15322 • Published Dec 19, 2024 • 18
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models Paper • 2411.07126 • Published Nov 11, 2024 • 31
OpenCoder Collection OpenCoder is an open and reproducible code LLM family which matches the performance of top-tier code LLMs. • 8 items • Updated Nov 23, 2024 • 83