|
--- |
|
license: apache-2.0 |
|
library_name: diffusers |
|
base_model: |
|
- stabilityai/stable-diffusion-xl-base-1.0 |
|
- black-forest-labs/FLUX.1-dev |
|
pipeline_tag: text-to-image |
|
--- |
|
# TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps |
|
|
|
<p align="center"> |
|
๐ <a href="https://arxiv.org/html/2406.05768v5" target="_blank">Paper</a> โข |
|
๐ค <a href="https://huggingface.co/OPPOer/TLCM" target="_blank">Checkpoints</a> |
|
</p> |
|
|
|
<!-- **TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps** --> |
|
|
|
<!-- Our method accelerates LDMs via data-free multistep latent consistency distillation (MLCD), and data-free latent consistency distillation is proposed to efficiently guarantee the inter-segment consistency in MLCD. |
|
|
|
Furthermore, we introduce bags of techniques, e.g., distribution matching, adversarial learning, and preference learning, to enhance TLCMโs performance at few-step inference without any real data. |
|
|
|
TLCM demonstrates a high level of flexibility by enabling adjustment of sampling steps within the range of 2 to 8 while still producing competitive outputs compared |
|
to full-step approaches. --> |
|
we propose an innovative two-stage data-free consistency distillation (TDCD) approach to accelerate latent consistency model. The first stage improves consistency constraint by data-free sub-segment consistency distillation (DSCD). The second stage enforces the |
|
global consistency across inter-segments through data-free consistency distillation (DCD). Besides, we explore various |
|
techniques to promote TLCMโs performance in data-free manner, forming Training-efficient Latent Consistency |
|
Model (TLCM) with 2-8 step inference. |
|
|
|
TLCM demonstrates a high level of flexibility by enabling adjustment of sampling steps within the range of 2 to 8 while still producing competitive outputs compared |
|
to full-step approaches. |
|
|
|
- [Install Dependency](#install-dependency) |
|
- [Example Use](#example-use) |
|
- [Art Gallery](#art-gallery) |
|
- [Addition](#addition) |
|
- [Citation](#citation) |
|
|
|
## Install Dependency |
|
|
|
``` |
|
pip install diffusers |
|
pip install transformers accelerate |
|
``` |
|
or try |
|
``` |
|
pip install prefetch_generator zhconv peft loguru transformers==4.39.1 accelerate==0.31.0 |
|
``` |
|
## Example Use |
|
|
|
We provide an example inference script in the directory of this repo. |
|
You should download the LoRA path from [Flux-LoRA](https://huggingface.co/OPPOer/TLCMFlux) or [SDXL-LoRA](https://huggingface.co/OPPOer/TLCMSDXL) and use a base model, such as [SDXL1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) , as the recommended option. |
|
After that, you can activate the generation with the following code: |
|
``` |
|
python inference.py --prompt {Your prompt} --output_dir {Your output directory} --lora_path {Lora_directory} --base_model_path {Base_model_directory} --infer-steps 4 |
|
``` |
|
More parameters are presented in paras.py. You can modify them according to your requirements. |
|
|
|
|
|
<p style="font-size: 24px; font-weight: bold; color: #FF5733; text-align: center;"> |
|
<span style=" padding: 10px; border-radius: 5px;"> |
|
๐ Update ๐ |
|
</span> |
|
</p> |
|
|
|
|
|
We integrate LCMScheduler in the diffuser pipeline for our workflow, so now you can now use a simpler version below with the base model SDXL 1.0, and we **highly recommend** it : |
|
``` |
|
import torch,diffusers |
|
from diffusers import LCMScheduler,AutoPipelineForText2Image |
|
from peft import LoraConfig, get_peft_model |
|
|
|
model_id = "stabilityai/stable-diffusion-xl-base-1.0" |
|
lora_path = 'path/to/the/lora' |
|
lora_config = LoraConfig( |
|
r=64, |
|
target_modules=[ |
|
"to_q", |
|
"to_k", |
|
"to_v", |
|
"to_out.0", |
|
"proj_in", |
|
"proj_out", |
|
"ff.net.0.proj", |
|
"ff.net.2", |
|
"conv1", |
|
"conv2", |
|
"conv_shortcut", |
|
"downsamplers.0.conv", |
|
"upsamplers.0.conv", |
|
"time_emb_proj", |
|
], |
|
) |
|
|
|
pipe = AutoPipelineForText2Image.from_pretrained(model_id,torch_dtype=torch.float16, variant="fp16") |
|
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config) |
|
unet=pipe.unet |
|
unet = get_peft_model(unet, lora_config) |
|
unet.load_adapter(lora_path, adapter_name="default") |
|
pipe.unet=unet |
|
pipe.to('cuda') |
|
|
|
eval_step=4 # the step can be changed within 2-8 steps |
|
|
|
prompt = "An astronaut riding a horse in the jungle" |
|
# disable guidance_scale by passing 0 |
|
image = pipe(prompt=prompt, num_inference_steps=eval_step, guidance_scale=0).images[0] |
|
``` |
|
|
|
|
|
We also adapt our methods based on [**FLUX**](https://huggingface.co/black-forest-labs/FLUX.1-dev) model. |
|
You can down load the corresponding LoRA model [here]() and load it with the base model for faster sampling. |
|
The sampling script for faster FLUX sampling as below: |
|
``` |
|
import os,torch |
|
from diffusers import FluxPipeline |
|
from scheduling_flow_match_tlcm import FlowMatchEulerTLCMScheduler |
|
from peft import LoraConfig, get_peft_model |
|
|
|
model_id = "black-forest-labs/FLUX.1-dev" |
|
lora_path = "path/to/the/lora/folder" |
|
lora_config = LoraConfig( |
|
r=64, |
|
target_modules=[ |
|
"to_k", "to_q", "to_v", "to_out.0", |
|
"proj_in", |
|
"proj_out", |
|
"ff.net.0.proj", |
|
"ff.net.2", |
|
"context_embedder", "x_embedder", |
|
"linear", "linear_1", "linear_2", |
|
"proj_mlp", |
|
"add_k_proj", "add_q_proj", "add_v_proj", "to_add_out", |
|
"ff_context.net.0.proj", "ff_context.net.2" |
|
], |
|
) |
|
|
|
pipe = FluxPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16) |
|
pipe.scheduler = FlowMatchEulerTLCMScheduler.from_config(pipe.scheduler.config) |
|
pipe.to('cuda:0') |
|
transformer = pipe.transformer |
|
transformer = get_peft_model(transformer, lora_config) |
|
transformer.load_adapter(lora_path, adapter_name="default", is_trainable=False) |
|
pipe.transformer=transformer |
|
|
|
eval_step=4 # the step can be changed within 2-8 steps |
|
|
|
prompt = "An astronaut riding a horse in the jungle" |
|
image = pipe(prompt=prompt, num_inference_steps=eval_step, guidance_scale=7).images[0] |
|
``` |
|
## Art Gallery |
|
Here we present some examples based on **SDXL** with different samping steps. |
|
|
|
<div align="center"> |
|
<p>2-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<img src="assets/SDXL/2steps/dog.jpg" alt="ๅพ็1" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/2steps/girl1.jpg" alt="ๅพ็2" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/2steps/girl2.jpg" alt="ๅพ็3" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/2steps/rose.jpg" alt="ๅพ็4" width="170" style="margin: 0px;" /> |
|
</div> |
|
|
|
<div align="center"> |
|
<p>3-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<img src="assets/SDXL/3steps/batman.jpg" alt="ๅพ็1" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/3steps/horse.jpg" alt="ๅพ็2" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/3steps/living room.jpg" alt="ๅพ็3" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/3steps/woman.jpg" alt="ๅพ็4" width="170" style="margin: 0px;" /> |
|
</div> |
|
|
|
<div align="center"> |
|
<p>4-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<img src="assets/SDXL/4steps/boat.jpg" alt="ๅพ็1" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/4steps/building.jpg" alt="ๅพ็2" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/4steps/mountain.jpg" alt="ๅพ็3" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/4steps/wedding.jpg" alt="ๅพ็4" width="170" style="margin: 0px;" /> |
|
</div> |
|
|
|
<div align="center"> |
|
<p>8-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<img src="assets/SDXL/8steps/car.jpg" alt="ๅพ็1" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/8steps/cat.jpg" alt="ๅพ็2" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/8steps/robot.jpg" alt="ๅพ็3" width="170" style="margin: 0px;" /> |
|
<img src="assets/SDXL/8steps/woman.jpg" alt="ๅพ็4" width="170" style="margin: 0px;" /> |
|
</div> |
|
|
|
We also present some examples based on **FLUX**. |
|
<div align="center"> |
|
<p>3-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/3steps/portrait.jpg" alt="ๅพ็1" width="170" /> |
|
<span style="font-size: 12px;">Seasoned female journalist...</span><br> |
|
<span style="font-size: 12px;">eyes behind glasses...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/3steps/hallway.jpg" alt="ๅพ็2" width="170" /> |
|
<span style="font-size: 12px;">A grand hallway</span><br> |
|
<span style="font-size: 12px;">inside an opulent palace...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/3steps/starnight.jpg" alt="ๅพ็3" width="170" /> |
|
<span style="font-size: 12px;">Van Goghโs Starry Night...</span><br> |
|
<span style="font-size: 12px;">replace... with cityscape</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/3steps/sailor.jpg" alt="ๅพ็4" width="170" /> |
|
<span style="font-size: 12px;">A weathered sailor...</span><br> |
|
<span style="font-size: 12px;">blue eyes...</span> |
|
</div> |
|
</div> |
|
<div align="center"> |
|
<p>4-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/4steps/guitar.jpg" alt="ๅพ็1" width="170" /> |
|
<span style="font-size: 12px;">A guitar,</span><br> |
|
<span style="font-size: 12px;">2d minimalistic icon...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/4steps/cat.jpg" alt="ๅพ็2" width="170" /> |
|
<span style="font-size: 12px;">A cat</span><br> |
|
<span style="font-size: 12px;">near the window...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/4steps/rabbit.jpg" alt="ๅพ็3" width="170" /> |
|
<span style="font-size: 12px;">close up photo of a rabbit...</span><br> |
|
<span style="font-size: 12px;">forest in spring...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/4steps/blossom.jpg" alt="ๅพ็4" width="170" /> |
|
<span style="font-size: 12px;">...urban decay...</span><br> |
|
<span style="font-size: 12px;">...a vibrant cherry blossom...</span> |
|
</div> |
|
</div> |
|
<div align="center"> |
|
<p>6-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/6steps/dog.jpg" alt="ๅพ็1" width="170" /> |
|
<span style="font-size: 12px;">A cute dog</span><br> |
|
<span style="font-size: 12px;">on the grass...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/6steps/tea.jpg" alt="ๅพ็2" width="170" /> |
|
<span style="font-size: 12px;">...hot floral tea</span><br> |
|
<span style="font-size: 12px;">in glass kettle...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/6steps/bag.jpg" alt="ๅพ็3" width="170" /> |
|
<span style="font-size: 12px;">...a bag...</span><br> |
|
<span style="font-size: 12px;">luxury product style...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/6steps/cat.jpg" alt="ๅพ็4" width="170" /> |
|
<span style="font-size: 12px;">a master jedi cat...</span><br> |
|
<span style="font-size: 12px;">wearing a jedi cloak hood</span> |
|
</div> |
|
</div> |
|
<div align="center"> |
|
<p>8-Steps Sampling</p> |
|
</div> |
|
<div style="display: flex; justify-content: center; flex-wrap: wrap;"> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/8steps/lion.jpg" alt="ๅพ็1" width="170" /> |
|
<span style="font-size: 12px;">A lion...</span><br> |
|
<span style="font-size: 12px;">low-poly game art...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/8steps/street.jpg" alt="ๅพ็2" width="170" /> |
|
<span style="font-size: 12px;">Tokyo street...</span><br> |
|
<span style="font-size: 12px;">blurred motion...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/8steps/dragon.jpg" alt="ๅพ็3" width="170" /> |
|
<span style="font-size: 12px;">A tiny red dragon sleeps</span><br> |
|
<span style="font-size: 12px;">curled up in a nest...</span> |
|
</div> |
|
<div style="text-align: center; margin: 0px;"> |
|
<img src="assets/FLUX/8steps/female.jpg" alt="ๅพ็4" width="170" /> |
|
<span style="font-size: 12px;">A female...a postcard</span><br> |
|
<span style="font-size: 12px;">with "WanderlustDreamer"</span> |
|
</div> |
|
</div> |
|
|
|
|
|
## Addition |
|
|
|
We also provide the latent lpips model [here](https://huggingface.co/OPPOer/TLCM). |
|
More details are presented in the paper. |
|
|
|
## Citation |
|
|
|
``` |
|
@article{xie2024tlcm, |
|
title={TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps}, |
|
author={Xie, Qingsong and Liao, Zhenyi and Deng, Zhijie and Lu, Haonan}, |
|
journal={arXiv preprint arXiv:2406.05768}, |
|
year={2024} |
|
} |
|
``` |