Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Paper • 2505.23606 • Published 12 days ago • 14
G1: Bootstrapping Perception and Reasoning Abilities of Vision-Language Model via Reinforcement Learning Paper • 2505.13426 • Published 22 days ago • 12
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression Paper • 2505.19602 • Published 16 days ago • 13
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action Paper • 2505.01583 • Published May 2 • 9
Science-T2I: Addressing Scientific Illusions in Image Synthesis Paper • 2504.13129 • Published Apr 17 • 3
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 88
It's All Connected: A Journey Through Test-Time Memorization, Attentional Bias, Retention, and Online Optimization Paper • 2504.13173 • Published Apr 17 • 19
WORLDMEM: Long-term Consistent World Simulation with Memory Paper • 2504.12369 • Published Apr 16 • 34
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published Apr 11 • 47
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 85
Kimi-VL-A3B Collection Moonshot's efficient MoE VLMs, exceptional on agent, long-context, and thinking • 6 items • Updated Apr 12 • 65
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published Apr 8 • 168
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published Apr 2 • 37