- OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
  Paper • 2506.07977 • Published • 39
- Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
  Paper • 2506.07986 • Published • 18
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
  Paper • 2506.06276 • Published • 19
- Aligning Latent Spaces with Flow Priors
  Paper • 2506.05240 • Published • 25
Collections including paper arxiv:2506.14603

- SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
  Paper • 2503.09641 • Published • 40
- SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformers
  Paper • 2410.10629 • Published • 12
- Efficient Distillation of Classifier-Free Guidance using Adapters
  Paper • 2503.07274 • Published • 4
- Align Your Flow: Scaling Continuous-Time Flow Map Distillation
  Paper • 2506.14603 • Published • 17

- ReZero: Enhancing LLM search ability by trying one-more-time
  Paper • 2504.11001 • Published • 15
- FonTS: Text Rendering with Typography and Style Controls
  Paper • 2412.00136 • Published
- GenEx: Generating an Explorable World
  Paper • 2412.09624 • Published • 97
- Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
  Paper • 2412.13663 • Published • 150

- Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models
  Paper • 2411.07126 • Published • 31
- Modifying Large Language Model Post-Training for Diverse Creative Writing
  Paper • 2503.17126 • Published • 36
- Align Your Flow: Scaling Continuous-Time Flow Map Distillation
  Paper • 2506.14603 • Published • 17
- Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
  Paper • 2506.18898 • Published • 23

- MambaVision: A Hybrid Mamba-Transformer Vision Backbone
  Paper • 2407.08083 • Published • 33
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 62
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models
  Paper • 2408.15237 • Published • 42
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
  Paper • 2409.11355 • Published • 31

- TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing
  Paper • 2312.05605 • Published • 3
- VMamba: Visual State Space Model
  Paper • 2401.10166 • Published • 40
- Rethinking Patch Dependence for Masked Autoencoders
  Paper • 2401.14391 • Published • 27
- Deconstructing Denoising Diffusion Models for Self-Supervised Learning
  Paper • 2401.14404 • Published • 18