AIGCer-OPPO committed
Commit b817743 · verified · 1 Parent(s): 669baca

Update README.md

Files changed (1): README.md +322 -1
README.md CHANGED
@@ -1,4 +1,325 @@
  ---
  license: apache-2.0
  library_name: diffusers
- ---
+ base_model:
+ - stabilityai/stable-diffusion-xl-base-1.0
+ - black-forest-labs/FLUX.1-dev
+ pipeline_tag: text-to-image
+ ---
+ # TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps
+
+ <p align="center">
+ 📃 <a href="https://arxiv.org/html/2406.05768v5" target="_blank">Paper</a> •
+ 🤗 <a href="https://huggingface.co/OPPOer/TLCM" target="_blank">Checkpoints</a>
+ </p>
+
+ <!-- **TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps** -->
+
+ <!-- Our method accelerates LDMs via data-free multistep latent consistency distillation (MLCD), and data-free latent consistency distillation is proposed to efficiently guarantee the inter-segment consistency in MLCD.
+
+ Furthermore, we introduce bags of techniques, e.g., distribution matching, adversarial learning, and preference learning, to enhance TLCM’s performance at few-step inference without any real data.
+
+ TLCM demonstrates a high level of flexibility by enabling adjustment of sampling steps within the range of 2 to 8 while still producing competitive outputs compared
+ to full-step approaches. -->
+ We propose a two-stage data-free consistency distillation (TDCD) approach to accelerate latent consistency models. The first stage strengthens the consistency constraint through data-free sub-segment consistency distillation (DSCD). The second stage enforces
+ global consistency across segments through data-free consistency distillation (DCD). We also explore several
+ techniques that improve TLCM’s performance in a data-free manner, yielding the Training-efficient Latent Consistency
+ Model (TLCM) with 2-8 step inference.
+
+ TLCM is flexible: the number of sampling steps can be set anywhere from 2 to 8 while still producing outputs that are competitive
+ with full-step approaches.
+
+ - [Install Dependency](#install-dependency)
+ - [Example Use](#example-use)
+ - [Art Gallery](#art-gallery)
+ - [Addition](#addition)
+ - [Citation](#citation)
+
+ ## Install Dependency
+
+ ```
+ pip install diffusers
+ pip install transformers accelerate
+ ```
+ or, to also install the extra dependencies with pinned versions of transformers and accelerate:
+ ```
+ pip install prefetch_generator zhconv peft loguru transformers==4.39.1 accelerate==0.31.0
+ ```
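If the pinned install is used, a quick check that the environment actually matches the pins can save debugging time. The following is a minimal sketch using only the standard library; the `check_pins` helper and the `PINNED` table are illustrative names, not part of this repo, and the pins are copied from the pip command above.

```python
from importlib import metadata

# Version pins copied from the pip command above.
PINNED = {"transformers": "4.39.1", "accelerate": "0.31.0"}


def check_pins(pins=PINNED):
    """Return {package: (installed_version_or_None, pinned_version, matches)}."""
    report = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (installed, wanted, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (installed, wanted, ok) in check_pins().items():
        print(f"{name}: installed={installed} pinned={wanted} match={ok}")
```

A mismatch does not necessarily mean the examples will fail; it only shows that the environment differs from the pinned versions.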
+ ## Example Use
+
+ We provide an example inference script in this repo.
+ Download the LoRA weights from [here](https://huggingface.co/OPPOer/TLCM) and pair them with a base model; [SDXL1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) is the recommended option.
+ After that, you can run generation with the following command:
+ ```
+ python inference.py --prompt {Your prompt} --output_dir {Your output directory} --lora_path {Lora_directory} --base_model_path {Base_model_directory} --infer-steps 4
+ ```
+ More parameters are defined in paras.py. You can modify them according to your requirements.
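When the script is driven from Python, for example to batch several prompts, the command can be assembled as an argument list so that prompts containing spaces are passed safely. This is a sketch under stated assumptions: `build_command` is a hypothetical helper, the paths are placeholders, and the 2-8 range check is a guard added here rather than something `inference.py` is known to enforce. Only the flags shown in the command above are used.

```python
import shlex


def build_command(prompt, output_dir, lora_path, base_model_path, infer_steps=4):
    """Assemble the inference.py argument list shown above."""
    if not 2 <= infer_steps <= 8:
        # Guard added for this sketch; TLCM is described for 2-8 sampling steps.
        raise ValueError("infer_steps should be between 2 and 8")
    return [
        "python", "inference.py",
        "--prompt", prompt,
        "--output_dir", output_dir,
        "--lora_path", lora_path,
        "--base_model_path", base_model_path,
        "--infer-steps", str(infer_steps),
    ]


cmd = build_command(
    prompt="An astronaut riding a horse in the jungle",
    output_dir="outputs",            # hypothetical output directory
    lora_path="path/to/the/lora",    # placeholder
    base_model_path="path/to/sdxl",  # placeholder
)
print(shlex.join(cmd))  # shell-quoted form of the command
```

Passing `cmd` to `subprocess.run(cmd, check=True)` from the repo directory would launch the script without going through a shell.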
+
+
+ <p style="font-size: 24px; font-weight: bold; color: #FF5733; text-align: center;">
+ <span style=" padding: 10px; border-radius: 5px;">
+ 🚀 Update 🚀
+ </span>
+ </p>
+
+
+ We have integrated LCMScheduler from diffusers into our workflow, so you can now use the simpler version below with the SDXL 1.0 base model. We **highly recommend** it:
+ ```
+ import torch
+ from diffusers import LCMScheduler, AutoPipelineForText2Image
+ from peft import LoraConfig, get_peft_model
+
+ model_id = "stabilityai/stable-diffusion-xl-base-1.0"
+ lora_path = "path/to/the/lora"
+ # LoRA configuration: rank 64 on the UNet modules listed below
+ lora_config = LoraConfig(
+     r=64,
+     target_modules=[
+         "to_q",
+         "to_k",
+         "to_v",
+         "to_out.0",
+         "proj_in",
+         "proj_out",
+         "ff.net.0.proj",
+         "ff.net.2",
+         "conv1",
+         "conv2",
+         "conv_shortcut",
+         "downsamplers.0.conv",
+         "upsamplers.0.conv",
+         "time_emb_proj",
+     ],
+ )
+
+ # Load the base pipeline in fp16 and switch to the LCM scheduler
+ pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16")
+ pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
+ # Wrap the UNet with PEFT and load the TLCM LoRA weights
+ unet = pipe.unet
+ unet = get_peft_model(unet, lora_config)
+ unet.load_adapter(lora_path, adapter_name="default")
+ pipe.unet = unet
+ pipe.to("cuda")
+
+ eval_step = 4  # can be set anywhere from 2 to 8
+
+ prompt = "An astronaut riding a horse in the jungle"
+ # disable guidance by passing guidance_scale=0
+ image = pipe(prompt=prompt, num_inference_steps=eval_step, guidance_scale=0).images[0]
+ ```
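Because the step count can be varied from 2 to 8, it can be useful to render the same prompt at each setting and compare the results. The sketch below assumes a `pipe` object built as in the example above; `sweep_steps` and the `steps_<n>.png` naming are illustrative choices, not part of this repo.

```python
from pathlib import Path


def sweep_steps(pipe, prompt, out_dir, steps=range(2, 9)):
    """Generate one image per step count and save it as <out_dir>/steps_<n>.png."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for n in steps:
        # Same call as in the example above; guidance_scale=0 disables guidance.
        image = pipe(prompt=prompt, num_inference_steps=n, guidance_scale=0).images[0]
        path = out_dir / f"steps_{n}.png"
        image.save(path)
        saved.append(path)
    return saved
```

For example, `sweep_steps(pipe, "An astronaut riding a horse in the jungle", "outputs")` would write seven images, one for each step count from 2 through 8.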
+
+
+ We have also adapted our method to the [**FLUX**](https://huggingface.co/black-forest-labs/FLUX.1-dev) model.
+ You can download the corresponding LoRA model [here]() and load it with the base model for faster sampling.
+ The script for faster FLUX sampling is shown below:
+ ```
+ import torch
+ from diffusers import FluxPipeline
+ # custom scheduler; scheduling_flow_match_tlcm.py must be importable
+ from scheduling_flow_match_tlcm import FlowMatchEulerTLCMScheduler
+ from peft import LoraConfig, get_peft_model
+
+ model_id = "black-forest-labs/FLUX.1-dev"
+ lora_path = "path/to/the/lora/folder"
+ lora_config = LoraConfig(
+     r=64,
+     target_modules=[
+         "to_k", "to_q", "to_v", "to_out.0",
+         "proj_in",
+         "proj_out",
+         "ff.net.0.proj",
+         "ff.net.2",
+         # new
+         "context_embedder", "x_embedder",
+         "linear", "linear_1", "linear_2",
+         "proj_mlp",
+         "add_k_proj", "add_q_proj", "add_v_proj", "to_add_out",
+         "ff_context.net.0.proj", "ff_context.net.2",
+     ],
+ )
+
+ # Load the base pipeline in bf16 and switch to the TLCM flow-matching scheduler
+ pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
+ pipe.scheduler = FlowMatchEulerTLCMScheduler.from_config(pipe.scheduler.config)
+ pipe.to("cuda:0")
+ # Wrap the transformer with PEFT and load the LoRA weights for inference only
+ transformer = pipe.transformer
+ transformer = get_peft_model(transformer, lora_config)
+ transformer.load_adapter(lora_path, adapter_name="default", is_trainable=False)
+ pipe.transformer = transformer
+
+ eval_step = 4  # can be set anywhere from 2 to 8
+
+ prompt = "An astronaut riding a horse in the jungle"
+ image = pipe(prompt=prompt, num_inference_steps=eval_step, guidance_scale=7).images[0]
+ ```
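The two LoRA configurations differ only in their `target_modules` lists. As a small illustration, the sketch below copies both lists from the examples above and computes which module names are shared and which are specific to each base model; the variable names are illustrative.

```python
# Target module names copied from the SDXL example above.
SDXL_TARGETS = {
    "to_q", "to_k", "to_v", "to_out.0", "proj_in", "proj_out",
    "ff.net.0.proj", "ff.net.2", "conv1", "conv2", "conv_shortcut",
    "downsamplers.0.conv", "upsamplers.0.conv", "time_emb_proj",
}

# Target module names copied from the FLUX example above.
FLUX_TARGETS = {
    "to_k", "to_q", "to_v", "to_out.0", "proj_in", "proj_out",
    "ff.net.0.proj", "ff.net.2",
    "context_embedder", "x_embedder", "linear", "linear_1", "linear_2",
    "proj_mlp", "add_k_proj", "add_q_proj", "add_v_proj", "to_add_out",
    "ff_context.net.0.proj", "ff_context.net.2",
}

shared = sorted(SDXL_TARGETS & FLUX_TARGETS)     # names present in both lists
flux_only = sorted(FLUX_TARGETS - SDXL_TARGETS)  # names after the "# new" comment
sdxl_only = sorted(SDXL_TARGETS - FLUX_TARGETS)  # conv and time-embedding names

print("shared:", shared)
print("FLUX only:", flux_only)
print("SDXL only:", sdxl_only)
```

The attention and feed-forward projection names are common to both lists, while the convolution and time-embedding names appear only in the SDXL configuration.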
+ ## Art Gallery
+ Here we present some examples based on **SDXL** with different sampling steps.
+
+ <div align="center">
+ <p>2-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <img src="assets/SDXL/2steps/dog.jpg" alt="Image 1" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/2steps/girl1.jpg" alt="Image 2" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/2steps/girl2.jpg" alt="Image 3" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/2steps/rose.jpg" alt="Image 4" width="180" style="margin: 10px;" />
+ </div>
+
+ <div align="center">
+ <p>3-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <img src="assets/SDXL/3steps/batman.jpg" alt="Image 1" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/3steps/horse.jpg" alt="Image 2" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/3steps/living room.jpg" alt="Image 3" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/3steps/woman.jpg" alt="Image 4" width="180" style="margin: 10px;" />
+ </div>
+
+ <div align="center">
+ <p>4-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <img src="assets/SDXL/4steps/boat.jpg" alt="Image 1" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/4steps/building.jpg" alt="Image 2" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/4steps/mountain.jpg" alt="Image 3" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/4steps/wedding.jpg" alt="Image 4" width="180" style="margin: 10px;" />
+ </div>
+
+ <div align="center">
+ <p>8-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <img src="assets/SDXL/8steps/car.jpg" alt="Image 1" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/8steps/cat.jpg" alt="Image 2" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/8steps/robot.jpg" alt="Image 3" width="180" style="margin: 10px;" />
+ <img src="assets/SDXL/8steps/woman.jpg" alt="Image 4" width="180" style="margin: 10px;" />
+ </div>
+
+ We also present some examples based on **FLUX**.
+ <div align="center">
+ <p>3-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/3steps/portrait.jpg" alt="Image 1" width="180" />
+ <br />
+ <span>Seasoned female journalist...</span><br>
+ <span>eyes behind glasses...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/3steps/hallway.jpg" alt="Image 2" width="180" />
+ <br/>
+ <span>A grand hallway</span><br>
+ <span>inside an opulent palace...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/3steps/starnight.jpg" alt="Image 3" width="180" />
+ <br />
+ <span>Van Gogh’s Starry Night...</span><br>
+ <span>replace... with cityscape</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/3steps/sailor.jpg" alt="Image 4" width="180" />
+ <br />
+ <span>A weathered sailor...</span><br>
+ <span>blue eyes...</span>
+ </div>
+ </div>
+ <div align="center">
+ <p>4-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/4steps/guitar.jpg" alt="Image 1" width="180" />
+ <br />
+ <span>A guitar,</span><br>
+ <span>2d minimalistic icon...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/4steps/cat.jpg" alt="Image 2" width="180" />
+ <br/>
+ <span>A cat</span><br>
+ <span>near the window...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/4steps/rabbit.jpg" alt="Image 3" width="180" />
+ <br />
+ <span>close up photo of a rabbit...</span><br>
+ <span>forest in spring...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/4steps/blossom.jpg" alt="Image 4" width="180" />
+ <br />
+ <span>...urban decay...</span><br>
+ <span>...a vibrant cherry blossom...</span>
+ </div>
+ </div>
+ <div align="center">
+ <p>6-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/6steps/dog.jpg" alt="Image 1" width="180" />
+ <br />
+ <span>A cute dog</span><br>
+ <span>on the grass...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/6steps/tea.jpg" alt="Image 2" width="180" />
+ <br/>
+ <span>...hot floral tea</span><br>
+ <span>in glass kettle...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/6steps/bag.jpg" alt="Image 3" width="180" />
+ <br />
+ <span>...a bag...</span><br>
+ <span>luxury product style...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/6steps/cat.jpg" alt="Image 4" width="180" />
+ <br />
+ <span>a master jedi cat...</span><br>
+ <span>wearing a jedi cloak hood</span>
+ </div>
+ </div>
+ <div align="center">
+ <p>8-Step Sampling</p>
+ </div>
+ <div style="display: flex; justify-content: center; flex-wrap: wrap;">
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/8steps/lion.jpg" alt="Image 1" width="180" />
+ <br />
+ <span>A lion...</span><br>
+ <span>low-poly game art...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/8steps/street.jpg" alt="Image 2" width="180" />
+ <br/>
+ <span>Tokyo street...</span><br>
+ <span>blurred motion...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/8steps/dragon.jpg" alt="Image 3" width="180" />
+ <br />
+ <span>A tiny red dragon sleeps</span><br>
+ <span>curled up in a nest...</span>
+ </div>
+ <div style="text-align: center; margin: 10px;">
+ <img src="assets/FLUX/8steps/female.jpg" alt="Image 4" width="180" />
+ <br />
+ <span>A female...a postcard</span><br>
+ <span>with "WanderlustDreamer"</span>
+ </div>
+ </div>
+
+
+ ## Addition
+
+ We also provide the latent LPIPS model [here](https://huggingface.co/OPPOer/TLCM).
+ More details are presented in the paper.
+
+ ## Citation
+
+ ```
+ @article{xietlcm,
+   title={TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps},
+   author={Xie, Qingsong and Liao, Zhenyi and Chen, Chen and Deng, Zhijie and Tang, Shixiang and Lu, Haonan}
+ }
+ ```