OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published 18 days ago • 63
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Paper • 2505.16707 • Published 20 days ago • 42
NovelSeek: When Agent Becomes the Scientist -- Building Closed-Loop System from Hypothesis to Verification Paper • 2505.16938 • Published 19 days ago • 116
Teller: Real-Time Streaming Audio-Driven Portrait Animation with Autoregressive Motion Generation Paper • 2503.18429 • Published Mar 24 • 2
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets Paper • 2505.07747 • Published 29 days ago • 60
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 88
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 85
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated Apr 12 • 65
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published Apr 3 • 46
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models Paper • 2405.04233 • Published May 7, 2024 • 2
Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis Paper • 2411.01156 • Published Nov 2, 2024 • 7
Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published Mar 26 • 52