---
library_name: transformers
tags: []
---

# Scaling Down Text Encoders of Text-to-Image Diffusion Models

Official repository for the paper *[Scaling Down Text Encoders of Text-to-Image Diffusion Models](https://github.com/LifuWang-66/DistillT5)*.

Project Page: https://github.com/LifuWang-66/DistillT5.git

## Model Descriptions:
T5-Base distilled from [T5-XXL](https://huggingface.co/google/flan-t5-xxl) using [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev).
It is 50 times smaller than T5-XXL and retains most of its capability.

## Generation Results:

<p align="center">
<img src="showcase.png">
</p>

## Usage:
1. Set up the environment:
```
git clone https://github.com/LifuWang-66/DistillT5.git
cd DistillT5
conda create -n distillt5 python=3.12
conda activate distillt5
pip install -r requirements.txt
pip install ./diffusers
```

2. Inference:
```py
import os
import sys

import torch
from diffusers import FluxPipeline

# Make the repository root importable. The '..' assumes this script is saved
# one directory below the root of the cloned DistillT5 repository.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from models.T5_encoder import T5EncoderWithProjection

# Load FLUX.1-dev, then swap its second text encoder for the distilled T5-Base.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.float16)
text_encoder = T5EncoderWithProjection.from_pretrained('LifuWang/DistillT5', torch_dtype=torch.float16)
pipe.text_encoder_2 = text_encoder
pipe = pipe.to('cuda')

prompt = "Photorealistic portrait of a stylish young woman wearing a futuristic golden sequined bodysuit that catches the light, creating a metallic, mirror-like effect. She is wearing large, reflective blue-tinted aviator sunglasses. Over her head, she wears headphones with metallic accents, giving a modern, cyber aesthetic."

# Generate a single image and save it to disk.
image = pipe(prompt=prompt, num_images_per_prompt=1, guidance_scale=3.5, num_inference_steps=20).images[0]
image.save("t5_base.png")
```
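
The size claim in the model description can be checked once the pipeline from the inference snippet is loaded. The helper below is a minimal sketch and not part of the repository: `count_parameters` and `fp16_size_gib` are hypothetical names, and the only assumption is that the encoder behaves like a standard `torch.nn.Module`, whose `parameters()` yields tensors with a `numel()` method.

```python
def count_parameters(module):
    """Total number of parameters in a torch.nn.Module-like object."""
    return sum(p.numel() for p in module.parameters())


def fp16_size_gib(num_params, bytes_per_param=2):
    """Approximate weight memory in GiB; float16 stores 2 bytes per parameter."""
    return num_params * bytes_per_param / 1024**3


# Usage after running the inference snippet (assumes `pipe` already exists):
# n = count_parameters(pipe.text_encoder_2)
# print(f"{n / 1e6:.0f}M parameters, about {fp16_size_gib(n):.2f} GiB in float16")
```

Calling the same helper on the original `text_encoder_2` before it is replaced gives the baseline for the comparison.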