xingjianleng committed
Commit 4231e1d · verified · 1 Parent(s): 572e5e0

Update README.md

Files changed (1): README.md (+102 -4)
README.md CHANGED
@@ -1,4 +1,102 @@
- ---
- license: mit
- library_name: diffusers
- ---
---
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---

<h1 align="center">
REPA-E for T2I: End-to-End Tuned VAEs for Supercharging Text-to-Image Diffusion Transformers
</h1>

<p align="center">
<a href="https://scholar.google.com.au/citations?user=GQzvqS4AAAAJ" target="_blank">Xingjian&nbsp;Leng</a><sup>1,2*</sup> &ensp; <b>&middot;</b> &ensp;
<a href="https://1jsingh.github.io/" target="_blank">Jaskirat&nbsp;Singh</a><sup>1</sup> &ensp; <b>&middot;</b> &ensp;
<a href="https://rynmurdock.github.io/" target="_blank">Ryan&nbsp;Murdock</a><sup>2</sup> &ensp; <b>&middot;</b> &ensp;
<a href="https://www.ethansmith2000.com/" target="_blank">Ethan&nbsp;Smith</a><sup>2</sup> &ensp; <b>&middot;</b> &ensp;
<a href="https://xiaoyang-rebecca.github.io/cv/" target="_blank">Rebecca&nbsp;Li</a><sup>2</sup> &ensp; <b>&middot;</b> &ensp;
<a href="https://www.sainingxie.com/" target="_blank">Saining&nbsp;Xie</a><sup>3</sup> &ensp; <b>&middot;</b> &ensp;
<a href="https://zheng-lab-anu.github.io/" target="_blank">Liang&nbsp;Zheng</a><sup>1</sup>
</p>

<p align="center">
<sup>1</sup>Australian National University &emsp; <sup>2</sup>Canva &emsp; <sup>3</sup>New York University <br>
<sub><sup>*</sup>Work done during an internship at Canva</sub>
</p>

<p align="center">
<a href="https://arxiv.org/abs/2504.10483" target="_blank">📄 REPA-E Paper</a> &ensp; | &ensp;
<a href="https://end2end-diffusion.github.io/repa-e-t2i/" target="_blank">🌐 Blog Post</a> &ensp; | &ensp;
<a href="https://huggingface.co/REPA-E" target="_blank">🤗 Models</a>
</p>

---

## 🚀 Overview

<p>
We present REPA-E for T2I, a family of end-to-end tuned VAEs designed to supercharge text-to-image generation training. These models consistently outperform the original Qwen-Image-VAE across all evaluated benchmarks (COCO-30K, DPG-Bench, GenAI-Bench, GenEval, and MJHQ-30K) without requiring any additional representation alignment losses.
</p>

<p>
For training, we adopt the <a href="https://github.com/End2End-Diffusion/REPA-E" target="_blank"><strong>official REPA-E training code</strong></a> to optimize the
<a href="https://huggingface.co/Qwen/Qwen-Image" target="_blank">Qwen-Image-VAE</a> for <strong>80 epochs</strong> with a batch size of <strong>256</strong> on the <strong>ImageNet-256</strong> dataset.
End-to-end REPA-E training refines the VAE's latent-space structure and enables faster convergence when the VAE is subsequently used for text-to-image latent diffusion model training.
</p>
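
For intuition, end-to-end tuning simply lets the diffusion training loss backpropagate through the latents into the VAE encoder, rather than keeping the VAE frozen. The sketch below is illustrative only: it uses a flow-matching-style objective with placeholder `vae` and `velocity_model` modules, and omits REPA-E's representation-alignment term and other details of the official recipe (see the training code linked above for the real implementation).

```python
import torch
import torch.nn.functional as F

def end_to_end_step(vae, velocity_model, optimizer, images):
    """One illustrative joint update (not the official implementation)."""
    # Encode WITHOUT torch.no_grad(): gradients must reach the VAE encoder.
    latents = vae.encode(images).latent_dist.sample()

    # Flow-matching-style corruption: interpolate latents toward noise.
    noise = torch.randn_like(latents)
    t = torch.rand(latents.shape[0], device=latents.device).view(-1, 1, 1, 1)
    x_t = (1.0 - t) * latents + t * noise

    # Regress the velocity; the loss depends on the VAE through `latents`.
    pred = velocity_model(x_t, t.flatten())
    loss = F.mse_loss(pred, noise - latents)

    optimizer.zero_grad()
    loss.backward()  # one optimizer over both modules: both are updated
    optimizer.step()
    return loss.item()
```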

<p>
This repository provides <code>diffusers</code>-compatible weights for the <strong>end-to-end trained Qwen-Image-VAE</strong>. In addition, we release <strong>end-to-end trained variants</strong> of several other widely used VAEs to facilitate research and integration within text-to-image diffusion frameworks.
</p>

### 🧩 End-to-End Trained VAE Releases

| Model | Link |
|---|---|
| FLUX-VAE (E2E-trained) | 🤗 [HF Model Page](https://huggingface.co/REPA-E/e2e-flux-vae) |
| SD-3.5-VAE (E2E-trained) | 🤗 [HF Model Page](https://huggingface.co/REPA-E/e2e-sd3.5-vae) |
| Qwen-Image-VAE (E2E-trained) | 🤗 [HF Model Page](https://huggingface.co/REPA-E/e2e-qwenimage-vae) |

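All three checkpoints load through standard `diffusers` autoencoder classes. As a quick sketch (assuming the FLUX and SD-3.5 repositories ship standard `AutoencoderKL` configs, as their base models do):

```python
from diffusers import AutoencoderKL, AutoencoderKLQwenImage

# FLUX and SD-3.5 VAEs are standard KL autoencoders in diffusers.
flux_vae = AutoencoderKL.from_pretrained("REPA-E/e2e-flux-vae")
sd35_vae = AutoencoderKL.from_pretrained("REPA-E/e2e-sd3.5-vae")

# The Qwen-Image VAE has its own class (full usage example below).
qwen_vae = AutoencoderKLQwenImage.from_pretrained("REPA-E/e2e-qwenimage-vae")
```
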
## 📦 Requirements
The following packages are required to load and run the REPA-E VAEs with the `diffusers` library:

```bash
# Quote the specifiers so the shell does not treat ">=" as redirection.
pip install "diffusers>=0.33.0"
pip install "torch>=2.3.1"
```
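
To confirm the installed versions meet these minimums, an optional quick check:

```python
import diffusers
import torch

print("diffusers:", diffusers.__version__)  # expect >= 0.33.0
print("torch:", torch.__version__)          # expect >= 2.3.1
```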

## 🚀 Example Usage
Below is a minimal example showing how to load and use the REPA-E end-to-end trained Qwen-Image-VAE with `diffusers`:

```python
from io import BytesIO

import numpy as np
import requests
import torch
from diffusers import AutoencoderKLQwenImage
from PIL import Image

# NOTE: this is a time-limited signed URL; substitute any RGB image if it has expired.
response = requests.get("https://s3.amazonaws.com/masters.galleries.prod.dpreview.com/2935392.jpg?X-Amz-Expires=3600&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUIXIAMA3N436PSEA/20251019/us-east-1/s3/aws4_request&X-Amz-Date=20251019T103721Z&X-Amz-SignedHeaders=host&X-Amz-Signature=219dc5f98e5c2e5f3b72587716f75889b8f45b0a01f1bd08dbbc44106e484144")
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the image, resize to 512x512, and normalize to [-1, 1] in NCHW layout.
image = torch.from_numpy(
    np.array(
        Image.open(BytesIO(response.content)).convert("RGB").resize((512, 512))
    )
).permute(2, 0, 1).unsqueeze(0).to(torch.float32) / 127.5 - 1
image = image.to(device)

vae = AutoencoderKLQwenImage.from_pretrained("REPA-E/e2e-qwenimage-vae").to(device)

# The Qwen-Image VAE expects an additional `num_frames` dimension (NCFHW).
image_ = image.unsqueeze(2)

with torch.no_grad():
    latents = vae.encode(image_).latent_dist.sample()
    reconstructed = vae.decode(latents).sample

# Squeeze the extra frame dimension back out.
latents = latents.squeeze(2)
reconstructed = reconstructed.squeeze(2)
```
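
To inspect the result, the reconstruction can be mapped back from [-1, 1] to an 8-bit image (a small follow-on to the example above; `reconstruction.png` is just an illustrative filename):

```python
# Convert the reconstruction back to a PIL image and save it.
recon = (reconstructed[0].permute(1, 2, 0).clamp(-1.0, 1.0) + 1.0) * 127.5
Image.fromarray(recon.to(torch.uint8).cpu().numpy()).save("reconstruction.png")
```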