---
library_name: transformers
tags: []
---
# Scaling Down Text Encoders of Text-to-Image Diffusion Models
Official repository for the paper *[Scaling Down Text Encoders of Text-to-Image Diffusion Models](https://github.com/LifuWang-66/DistillT5)*.
Project Page: https://github.com/LifuWang-66/DistillT5.git
## Model Descriptions:
T5-Base distilled from [T5-XXL](https://huggingface.co/google/flan-t5-xxl) using [Flux](https://huggingface.co/black-forest-labs/FLUX.1-dev).
It is 50 times smaller than T5-XXL and retains most of its capability.
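
A size claim like this can be checked by comparing parameter counts. The sketch below is a minimal illustration, not part of the official repository: `count_parameters` and `size_ratio` are hypothetical helper names, and they assume only that each model exposes `.parameters()` with items exposing `.numel()`, as `torch.nn.Module` does.

```python
def count_parameters(module) -> int:
    """Return the total number of elements across all parameters of a module.

    Works with any object exposing .parameters() whose items expose .numel(),
    which is the interface torch.nn.Module provides.
    """
    return sum(p.numel() for p in module.parameters())


def size_ratio(teacher, student) -> float:
    """How many times larger the teacher is than the student, by parameter count."""
    return count_parameters(teacher) / count_parameters(student)


# Hypothetical usage, assuming both encoders are already loaded as torch modules
# (the variable names below are placeholders, not names from the repository):
#   ratio = size_ratio(t5_xxl_encoder, distilled_encoder)
#   print(f"teacher is {ratio:.1f}x larger")
```

The ratio depends on what is counted, for example whether any projection layer in the distilled encoder is included, so the result may not land on exactly 50.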
## Generation Results:
<p align="center">
<img src="showcase.png">
</p>
## Usage:
1. Set up the environment:
```bash
git clone https://github.com/LifuWang-66/DistillT5.git
cd DistillT5
conda create -n distillt5 python=3.12
conda activate distillt5
pip install -r requirements.txt
pip install ./diffusers
```
2. Run inference:
```py
import sys
import os

# Add the parent of this script's directory to the import path so that
# `models` resolves; this assumes the script sits one level below the repo root.
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..')))
from models.T5_encoder import T5EncoderWithProjection
import torch
from diffusers import FluxPipeline

# Load the Flux pipeline, then replace its T5 text encoder (text_encoder_2)
# with the distilled T5-Base encoder.
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.float16)
text_encoder = T5EncoderWithProjection.from_pretrained('LifuWang/DistillT5', torch_dtype=torch.float16)
pipe.text_encoder_2 = text_encoder
pipe = pipe.to('cuda')

prompt = "Photorealistic portrait of a stylish young woman wearing a futuristic golden sequined bodysuit that catches the light, creating a metallic, mirror-like effect. She is wearing large, reflective blue-tinted aviator sunglasses. Over her head, she wears headphones with metallic accents, giving a modern, cyber aesthetic."

# Generate a single image and save it to disk.
image = pipe(prompt=prompt, num_images_per_prompt=1, guidance_scale=3.5, num_inference_steps=20).images[0]
image.save("t5_base.png")
```
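
Before running the inference script, it can help to confirm that the environment from step 1 is complete. The following is a small sketch of such a check using only the standard library. The `missing_packages` helper and the package list are assumptions: the list is inferred from the imports in the inference snippet and from this card's `library_name`, and the repository's `requirements.txt` may include more.

```python
import importlib.util

# Assumed package list, inferred from the inference snippet's imports and the
# card's `library_name`; the repository's requirements.txt may list others.
REQUIRED = ("torch", "diffusers", "transformers")


def missing_packages(names=REQUIRED):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

This only confirms that the packages can be located; it does not verify versions or that a CUDA device is available.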