Tinker: Diffusion's Gift to 3D--Multi-View Consistent Editing From Sparse Inputs without Per-Scene Optimization Paper • 2508.14811 • Published 5 days ago • 38
Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities Paper • 2507.06261 • Published Jul 7 • 59
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 263
SRPO: Enhancing Multimodal LLM Reasoning via Reflection-Aware Reinforcement Learning Paper • 2506.01713 • Published Jun 2 • 47
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL Paper • 2505.17952 • Published May 23 • 20
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30 • 138
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
FLUX.1 Collection A collection of our FLUX.1 models and LoRAs. • 10 items • Updated 19 days ago • 191
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning Paper • 2502.19634 • Published Feb 26 • 64
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 146
Chirpy3D: Continuous Part Latents for Creative 3D Bird Generation Paper • 2501.04144 • Published Jan 7 • 19
Qwen2-VL Collection Vision-language model series based on Qwen2 • 16 items • Updated Jul 21 • 224