Latent Diffusion Model without Variational Autoencoder
Paper: arXiv:2510.15301
SVG is a latent diffusion model framework that replaces the traditional VAE latent space with semantically structured features from self-supervised vision models (e.g., DINOv3). This design improves generative capability and downstream transferability while maintaining efficiency comparable to standard VAE-based latent diffusion models.
Key features:
- Replaces the VAE latent space with semantically structured features from a frozen self-supervised vision encoder (e.g., DINOv3).
- Improved generative capability over standard VAE-based latent diffusion models.
- A latent space that transfers better to downstream tasks.
- Training and inference efficiency comparable to VAE-based latent diffusion models.
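
To make the core idea concrete, here is a minimal, hypothetical sketch of training a diffusion model directly in the feature space of a frozen self-supervised encoder instead of a VAE latent space. It is not the official SVG code: the `FrozenSSLEncoder` and `LatentDenoiser` classes, the linear noise schedule, and all dimensions are placeholder assumptions; see the GitHub repository below for the actual implementation.

```python
# Illustrative sketch only: a frozen self-supervised encoder (standing in for
# DINOv3) produces patch features that play the role of the VAE latent, and a
# small denoiser is trained in that feature space with a noise-prediction loss.
import torch
import torch.nn as nn


class FrozenSSLEncoder(nn.Module):
    """Placeholder for a frozen self-supervised ViT backbone (e.g., DINOv3)."""

    def __init__(self, feature_dim: int = 768):
        super().__init__()
        # Stand-in for a real pretrained backbone; its weights stay frozen.
        self.backbone = nn.Conv2d(3, feature_dim, kernel_size=16, stride=16)
        for p in self.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def forward(self, images: torch.Tensor) -> torch.Tensor:
        # (B, 3, H, W) -> (B, N, D) patch features used as the "latent".
        feats = self.backbone(images)
        return feats.flatten(2).transpose(1, 2)


class LatentDenoiser(nn.Module):
    """Toy denoiser operating on encoder features instead of VAE latents."""

    def __init__(self, feature_dim: int = 768):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feature_dim + 1, 1024), nn.SiLU(), nn.Linear(1024, feature_dim)
        )

    def forward(self, z_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Broadcast the timestep onto every patch token and predict the noise.
        t_emb = t.view(-1, 1, 1).expand(-1, z_t.shape[1], 1)
        return self.net(torch.cat([z_t, t_emb], dim=-1))


def training_step(encoder, denoiser, images):
    """One DDPM-style noise-prediction step in the SSL feature space."""
    z0 = encoder(images)                           # clean "latents"
    t = torch.rand(z0.shape[0], device=z0.device)  # continuous time in [0, 1]
    noise = torch.randn_like(z0)
    alpha = (1.0 - t).view(-1, 1, 1)               # assumed linear schedule
    z_t = alpha.sqrt() * z0 + (1.0 - alpha).sqrt() * noise
    pred = denoiser(z_t, t)
    return nn.functional.mse_loss(pred, noise)


if __name__ == "__main__":
    enc, den = FrozenSSLEncoder(), LatentDenoiser()
    loss = training_step(enc, den, torch.randn(2, 3, 256, 256))
    loss.backward()  # only the denoiser receives gradients
    print(float(loss))
```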
For code and instructions, see the GitHub repository:
https://github.com/shiml20/SVG
Official project page:
https://howlin-wang.github.io/svg/
arXiv paper:
https://arxiv.org/abs/2510.15301