metadata
library_name: sana
tags:
  - text-to-image
  - Sana
  - 1024px_based_image_size
  - Multi-language
language:
  - en
  - zh
base_model:
  - Efficient-Large-Model/Sana_600M_1024px_diffusers
pipeline_tag: text-to-image

Note

  • Weakness in Complex Scene Creation: Due to data limitations, our model has limited ability to generate complex scenes, text, and human hands.
  • Enhancing Capabilities: The model’s performance can be improved by increasing the complexity and length of prompts. Below are some examples of prompts and samples.
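As a rough illustration of the point about longer, more detailed prompts, the sketch below assembles a short subject and a few optional descriptive parts into one prompt string. The helper name `build_prompt` and its parameters are hypothetical and not part of Sana or diffusers; the example wording is invented for illustration only.

```python
def build_prompt(subject, details=(), style=None, lighting=None):
    """Join a short subject with optional descriptive parts into one prompt.

    Hypothetical helper for illustration; not part of Sana or diffusers.
    """
    parts = [subject.strip()]
    # Skip empty detail strings so the prompt has no stray commas.
    parts.extend(d.strip() for d in details if d.strip())
    if style:
        parts.append(f"{style.strip()} style")
    if lighting:
        parts.append(f"{lighting.strip()} lighting")
    return ", ".join(parts)


# A bare subject versus the same subject with added detail.
short_prompt = build_prompt("a panda eating bamboo")
long_prompt = build_prompt(
    "a panda eating bamboo",
    details=[
        "sitting in a misty bamboo forest",
        "soft fur texture",
        "shallow depth of field",
    ],
    style="ink drawing",
    lighting="early morning",
)

print(short_prompt)
print(long_prompt)
```

Either string can be passed as the `prompt` argument of the pipeline call; whether the longer one actually yields a better image depends on the model and the subject.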

Model Description

Model Sources

For research purposes, we recommend the Sana GitHub repository (https://github.com/NVlabs/Sana), which is better suited to both training and inference and integrates advanced diffusion samplers such as Flow-DPM-Solver. MIT Han-Lab provides free Sana inference.

# pip install git+https://github.com/huggingface/diffusers
# pip install transformers
import torch
from diffusers import SanaPAGPipeline

pipe = SanaPAGPipeline.from_pretrained(
    "kpsss34/SANA600.fp8_Realistic_SFW_V1",
    torch_dtype=torch.float16,
)
pipe.to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'A cute 🐼 eating 🎋, ink drawing style'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save('sana.png')