Efficient-Large-Model/Sana_Sprint_1.6B_1024px_teacher Text-to-Image • Updated 21 days ago • 9 • 1
VL-Rethinker: Incentivizing Self-Reflection of Vision-Language Models with Reinforcement Learning Paper • 2504.08837 • Published 11 days ago • 41
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 10 days ago • 46
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published 20 days ago • 34
ChatAnyone: Stylized Real-time Portrait Video Generation with Hierarchical Motion Diffusion Model Paper • 2503.21144 • Published 25 days ago • 25
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity Paper • 2503.16418 • Published Mar 20 • 35