PixArt-α
Overview
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis is by Junsong Chen, Jincheng Yu, Chongjian Ge, Lewei Yao, Enze Xie, Yue Wu, Zhongdao Wang, James Kwok, Ping Luo, Huchuan Lu, and Zhenguo Li.
Some notes about this pipeline:
- It uses a Transformer backbone (instead of a UNet) for denoising, so its architecture is similar to that of DiT.
- It was trained using text conditions computed from T5. This aspect makes the pipeline better at following complex text prompts with intricate details.
- It is good at producing high-resolution images at different aspect ratios. To get the best results, the authors recommend some size brackets which can be found here.
- It rivals the quality of state-of-the-art text-to-image generation systems (as of this writing) such as Stable Diffusion XL, Imagen, and DALL-E 2, while being more efficient than them.
You can find the original codebase at PixArt-alpha/PixArt-alpha and all the available checkpoints at PixArt-alpha.
🤗 Optimum extends Diffusers to support inference on the second generation of Neuron devices (powering Trainium and Inferentia 2). It aims to inherit the ease of use of Diffusers on Neuron.
Export to Neuron
To deploy models in the PixArt-α pipeline, you will need to compile them to TorchScript optimized for AWS Neuron. There are four components which need to be exported to the .neuron format to boost performance:
- Text encoder
- Transformer
- VAE encoder
- VAE decoder
You can compile and export a PixArt-α checkpoint either via the CLI or the NeuronPixArtAlphaPipeline class.
Option 1: CLI
optimum-cli export neuron --model PixArt-alpha/PixArt-XL-2-512x512 --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 --torch_dtype bfloat16 --sequence_length 120 pixart_alpha_neuron_512/
We recommend using an inf2.8xlarge or a larger instance for model compilation. You can also compile the model with the Optimum CLI on a CPU-only instance (it needs ~35 GB of memory) and then run the pre-compiled model on an inf2.xlarge to reduce costs. In this case, don't forget to disable inference validation by adding the --disable-validation argument.
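For example, a CPU-only compilation with validation disabled could look like the following (same arguments as above; the output directory is just illustrative):
optimum-cli export neuron --model PixArt-alpha/PixArt-XL-2-512x512 --batch_size 1 --height 512 --width 512 --num_images_per_prompt 1 --torch_dtype bfloat16 --sequence_length 120 --disable-validation pixart_alpha_neuron_512/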
Option 2: Python API
import torch
from optimum.neuron import NeuronPixArtAlphaPipeline
# Compile
compiler_args = {"auto_cast": "none"}
input_shapes = {"batch_size": 1, "height": 512, "width": 512, "sequence_length": 120}
neuron_model = NeuronPixArtAlphaPipeline.from_pretrained("PixArt-alpha/PixArt-XL-2-512x512", torch_dtype=torch.bfloat16, export=True, disable_neuron_cache=True, **compiler_args, **input_shapes)
# Save locally
neuron_model.save_pretrained("pixart_alpha_neuron_512/")
# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
    "pixart_alpha_neuron_512/",
    repository_id="Jingya/PixArt-XL-2-512x512-neuronx",  # Replace with your HF Hub repo id
)
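Once pushed, the compiled artifacts can be reloaded straight from the Hub without recompiling; a minimal sketch using the repo id from the snippet above:
from optimum.neuron import NeuronPixArtAlphaPipeline
# Load pre-compiled Neuron artifacts from the Hub (no export step needed)
neuron_model = NeuronPixArtAlphaPipeline.from_pretrained("Jingya/PixArt-XL-2-512x512-neuronx")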
Text-to-Image
The NeuronPixArtAlphaPipeline class allows you to generate images from a text prompt on Neuron devices, similar to the experience with Diffusers.
With pre-compiled PixArt-α models, you can now generate an image from a text prompt on Neuron:
from optimum.neuron import NeuronPixArtAlphaPipeline
neuron_model = NeuronPixArtAlphaPipeline.from_pretrained("pixart_alpha_neuron_512/")
prompt = "Oppenheimer sits on the beach on a chair, watching a nuclear explosion with a huge mushroom cloud, 120mm."
image = neuron_model(prompt=prompt).images[0]
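The pipeline returns PIL images, so you can persist the result directly (the filename is illustrative):
# Save the generated image to disk
image.save("pixart_oppenheimer.png")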

NeuronPixArtAlphaPipeline
Pipeline for text-to-image generation using PixArt-α.
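At call time the Neuron pipeline is meant to mirror Diffusers' PixArtAlphaPipeline, so the usual generation arguments should apply; a minimal sketch, assuming num_inference_steps and negative_prompt are forwarded as in Diffusers:
# Hypothetical call mirroring Diffusers' PixArtAlphaPipeline arguments
image = neuron_model(
    prompt=prompt,
    negative_prompt="blurry, low quality",  # assumed to be supported as in Diffusers
    num_inference_steps=25,
).images[0]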
Are there any other diffusion features that you want us to support in 🤗 Optimum Neuron? Please file an issue in the Optimum Neuron GitHub repo or discuss with us on HuggingFace's community forum. Cheers 🤗!