Model card

We introduce Sana, a text-to-image framework that efficiently generates images at resolutions up to 4096 × 4096. Sana synthesizes high-resolution, high-quality images with strong text-image alignment at remarkably fast speed, and it can be deployed on a laptop GPU.

Source code is available at https://github.com/NVlabs/Sana.

🧨 Diffusers

1. How to use SanaControlNetPipeline with 🧨diffusers

# Run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers.
import torch
from diffusers import SanaControlNetModel, SanaControlNetPipeline
from diffusers.utils import load_image

controlnet = SanaControlNetModel.from_pretrained(
    "ishan24/Sana_600M_1024px_ControlNet_diffusers",
    torch_dtype=torch.float16
)

pipe = SanaControlNetPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_600M_1024px_diffusers",
    variant="fp16",
    controlnet=controlnet,
    torch_dtype=torch.float16,
)

# Move the pipeline to the GPU, then run the VAE and text encoder in bfloat16.
pipe.to("cuda")
pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

cond_image = load_image(
    "https://huggingface.co/ishan24/Sana_600M_1024px_ControlNet_diffusers/resolve/main/hed_example.png"
)
prompt='a cat with a neon sign that says "Sana"'
image = pipe(
    prompt,
    control_image=cond_image,
).images[0]
image.save("sana.png")
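
The example above uses the pipeline's default output size. If you want the output to follow the aspect ratio of your own control image, a small helper like the one below can pick the size. This is a minimal sketch, not part of Sana or diffusers: `snap_to_multiple` and `fit_size` are hypothetical names, and the divisibility by 32 is an assumption about what the pipeline accepts, so verify it against your diffusers version.

```python
def snap_to_multiple(value: int, multiple: int = 32) -> int:
    """Round `value` to the nearest positive multiple of `multiple`.

    The default of 32 is an assumption about the pipeline's size constraint.
    """
    if value <= 0:
        raise ValueError("value must be positive")
    snapped = (value + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


def fit_size(width: int, height: int, target: int = 1024) -> tuple[int, int]:
    """Scale (width, height) so the longer side equals `target`,
    then snap both sides to a multiple of 32."""
    longer = max(width, height)
    new_width = snap_to_multiple(width * target // longer)
    new_height = snap_to_multiple(height * target // longer)
    return new_width, new_height


# Hypothetical usage with the pipeline from the example above:
#   width, height = fit_size(*cond_image.size)
#   image = pipe(prompt, control_image=cond_image,
#                width=width, height=height).images[0]
```

The helper keeps the longer side at 1024 to match the resolution this checkpoint is named for; other targets are untested here.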