DiTPipeline

Scalable Diffusion Models with Transformers (DiT)

Abstract

We train latent diffusion models, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches. We analyze the scalability of our Diffusion Transformers (DiTs) through the lens of forward pass complexity as measured by Gflops. We find that DiTs with higher Gflops---through increased transformer depth/width or increased number of input tokens---consistently have lower FID. In addition to good scalability properties, our DiT-XL/2 models outperform all prior diffusion models on the class-conditional ImageNet 512Γ—512 and 256Γ—256 benchmarks, achieving a state-of-the-art FID of 2.27 on the latter.
