Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 3 days ago • 81
FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis Paper • 2504.04842 • Published 8 days ago • 28
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 6 days ago • 139
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving Paper • 2504.02605 • Published 11 days ago • 43
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published 14 days ago • 238
SANA-Sprint Collection • SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation • 6 items • Updated 9 days ago • 33
Gemma 3 QAT Collection • Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves similar quality as half precision while using 3x less memory • 8 items • Updated 12 days ago • 116
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 15 days ago • 92
Transformers Use Causal World Models in Maze-Solving Tasks Paper • 2412.11867 • Published Dec 16, 2024 • 1
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published 20 days ago • 49
MAPS: A Multi-Agent Framework Based on Big Seven Personality and Socratic Guidance for Multimodal Scientific Problem Solving Paper • 2503.16905 • Published 25 days ago • 53
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published 25 days ago • 71
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders Paper • 2503.18878 • Published 21 days ago • 113