---
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---
REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
Xingjian Leng1* ·
Jaskirat Singh1* ·
Yunzhong Hou1 ·
Zhenchang Xing2 ·
Saining Xie3 ·
Liang Zheng1
1 Australian National University 2Data61-CSIRO 3New York University
*Project Leads
🌐 Project Page
🤗 Models
📃 Paper
---
We address a fundamental question: ***Can latent diffusion models and their VAE tokenizer be trained end-to-end?*** While training both components jointly with standard diffusion loss is observed to be ineffective — often degrading final performance — we show that this limitation can be overcome using a simple representation-alignment (REPA) loss. Our proposed method, **REPA-E**, enables stable and effective joint training of both the VAE and the diffusion model.
**REPA-E** significantly accelerates training — achieving over **17×** speedup compared to REPA and **45×** over the vanilla training recipe. Interestingly, end-to-end tuning also improves the VAE itself: the resulting **E2E-VAE** provides better latent structure and serves as a **drop-in replacement** for existing VAEs (e.g., SD-VAE), improving convergence and generation quality across diverse LDM architectures. Our method achieves state-of-the-art FID scores on ImageNet 256×256: **1.26** with CFG and **1.83** without CFG.
## Usage and Training
Please refer our [Github Repo](https://github.com/End2End-Diffusion/REPA-E) for detailed notes on end-to-end training and inference using REPA-E.
## 📚 Citation
```bibtex
@article{leng2025repae,
title={REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers},
author={Xingjian Leng and Jaskirat Singh and Yunzhong Hou and Zhenchang Xing and Saining Xie and Liang Zheng},
year={2025},
journal={arXiv preprint arXiv:2504.10483},
}
```