AWS Trainium & Inferentia documentation
ControlNet
ControlNet
ControlNet conditions the stable diffusion model with an additional input image. In Optimum Neuron, we support the compilation of one or multiple ControlNet(s) along with the stable diffusion checkpoint. Then you can use the compiled artifacts to generate styled images.
Export to Neuron
We can either compile one or multiple ControlNet via the Optimum CLI or programatically via the NeuronStableDiffusionControlNetPipeline
class by passing the controlnet_ids
.
Option 1: CLI
optimum-cli export neuron -m runwayml/stable-diffusion-v1-5 --batch_size 1 --height 512 --width 512 --controlnet_ids lllyasviel/sd-controlnet-canny --num_images_per_prompt 1 sd_neuron_controlnet/
Option 2: Python API
from optimum.neuron import NeuronStableDiffusionControlNetPipeline
model_id = "runwayml/stable-diffusion-v1-5"
controlnet_id = "lllyasviel/sd-controlnet-canny"
# [Neuron] pipeline
input_shapes = {"batch_size": 1, "height": 512, "width": 512, "num_images_per_prompt": 1}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
pipe = NeuronStableDiffusionControlNetPipeline.from_pretrained(
model_id,
controlnet_ids=controlnet_id,
export=True,
**input_shapes,
**compiler_args,
)
pipe.save_pretrained("sd_neuron_controlnet")
Text-to-Image
For text-to-image, we can specify an additional conditioning input.
Here is an example with a canny image, a white outline of an image on a black background. The ControlNet will use the canny image as a control to guide the model to generate an image with the same outline.
import cv2
import numpy as np
from diffusers import UniPCMultistepScheduler
from diffusers.utils import load_image, make_image_grid
from PIL import Image
from optimum.neuron import NeuronStableDiffusionControlNetPipeline
# prepare canny image
original_image = load_image(
"https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
image = np.array(original_image)
low_threshold = 100
high_threshold = 200
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
# load pre-compiled neuron model
pipe = NeuronStableDiffusionControlNetPipeline.from_pretrained("sd_neuron_controlnet")
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# inference
output = pipe("the mona lisa", image=canny_image).images[0]
compare = make_image_grid([original_image, canny_image, output], rows=1, cols=3)
compare.save("compare.png")

MultiControlNet
With Optimum Neuron, you can also compose multiple ControlNet conditionings from different image inputs:
- Compile multiple ControlNet for SD1.5
optimum-cli export neuron --inline-weights-neff --model jyoung105/stable-diffusion-v1-5 --task stable-diffusion --auto_cast matmul --auto_cast_type bf16 --batch_size 1 --num_images_per_prompt 1 --controlnet_ids lllyasviel/control_v11p_sd15_openpose lllyasviel/control_v11f1p_sd15_depth --height 512 --width 512 sd15-512x512-bf16-openpose-depth
- Run SD1.5 with OpenPose and Depth conditionings:
import numpy as np
import torch
from PIL import Image
from controlnet_aux import OpenposeDetector
from transformers import pipeline
from diffusers import UniPCMultistepScheduler
from diffusers.utils import load_image
from optimum.neuron import NeuronStableDiffusionControlNetPipeline
# OpenPose+Depth ControlNet
model_id = "sd15-512x512-bf16-openpose-depth"
# Load ControlNet images
# 1. openpose
image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_openpose/resolve/main/images/input.png")
processor = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
openpose_image = processor(image)
# 2. depth
image = load_image("https://huggingface.co/lllyasviel/control_v11p_sd15_depth/resolve/main/images/input.png")
depth_estimator = pipeline('depth-estimation')
image = depth_estimator(image)['depth']
image = np.array(image)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
depth_image = Image.fromarray(image)
images = [openpose_image.resize((512, 512)), depth_image.resize((512, 512))]
# 3. inference
pipe = NeuronStableDiffusionControlNetPipeline.from_pretrained(model_id)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
prompt = "a giant in a fantasy landscape, best quality"
negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"
image = pipe(prompt=prompt, image=images).images[0]
image.save('out.png')

ControlNet with Stable Diffusion XL
Export to Neuron
optimum-cli export neuron -m stabilityai/stable-diffusion-xl-base-1.0 --task stable-diffusion-xl --batch_size 1 --height 1024 --width 1024 --controlnet_ids diffusers/controlnet-canny-sdxl-1.0-small --num_images_per_prompt 1 sdxl_neuron_controlnet/
Text-to-Image
import cv2
import numpy as np
from diffusers.utils import load_image
from PIL import Image
from optimum.neuron import NeuronStableDiffusionXLControlNetPipeline
# Inputs
prompt = "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting"
negative_prompt = "low quality, bad quality, sketches"
image = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
image = np.array(image)
image = cv2.Canny(image, 100, 200)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
image = Image.fromarray(image)
controlnet_conditioning_scale = 0.5 # recommended for good generalization
pipe = NeuronStableDiffusionXLControlNetPipeline.from_pretrained("sdxl_neuron_controlnet")
images = pipe(
prompt,
negative_prompt=negative_prompt,
image=image,
controlnet_conditioning_scale=controlnet_conditioning_scale,
).images
images[0].save("hug_lab.png")

NeuronStableDiffusionControlNetPipeline
class optimum.neuron.NeuronStableDiffusionControlNetPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.utils.dummy_sentencepiece_objects.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
__call__
< source >( prompt: typing.Union[str, typing.List[str]] = None image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None num_inference_steps: int = 50 timesteps: typing.Optional[typing.List[int]] = None sigmas: typing.Optional[typing.List[float]] = None guidance_scale: float = 7.5 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: typing.Optional[int] = 1 eta: float = 0.0 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None output_type: str = 'pil' return_dict: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None controlnet_conditioning_scale: typing.Union[float, typing.List[float]] = 1.0 guess_mode: bool = False control_guidance_start: typing.Union[float, typing.List[float]] = 0.0 control_guidance_end: typing.Union[float, typing.List[float]] = 1.0 clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Union[typing.Callable[[int, int, typing.Dict], NoneType], diffusers.callbacks.PipelineCallback, diffusers.callbacks.MultiPipelineCallbacks, NoneType] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide image generation. If not defined, you need to passprompt_embeds
. - image (
Optional["PipelineImageInput"]
, defaults toNone
) — The ControlNet input condition to provide guidance to theunet
for generation. If the type is specified astorch.Tensor
, it is passed to ControlNet as is.PIL.Image.Image
can also be accepted as an image. The dimensions of the output image defaults toimage
’s dimensions. If height and/or width are passed,image
is resized accordingly. If multiple ControlNets are specified ininit
, images must be passed as a list such that each element of the list can be correctly batched for input to a single ControlNet. Whenprompt
is a list, and if a list of images is passed for a single ControlNet, each will be paired with each prompt in theprompt
list. This also applies to multiple ControlNets, where a list of image lists can be passed to batch for each prompt and each ControlNet. - num_inference_steps (
int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. - timesteps (
Optional[List[int]]
, defaults toNone
) — Custom timesteps to use for the denoising process with schedulers which support atimesteps
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. Must be in descending order. - sigmas (
Optional[List[int]]
, defaults toNone
) — Custom sigmas to use for the denoising process with schedulers which support asigmas
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. - guidance_scale (
float
, defaults to 7.5) — A higher guidance scale value encourages the model to generate images closely linked to the textprompt
at the expense of lower image quality. Guidance scale is enabled whenguidance_scale > 1
. - negative_prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to passnegative_prompt_embeds
instead. Ignored when not using guidance (guidance_scale < 1
). - num_images_per_prompt (
int
, defaults to 1) — The number of images to generate per prompt. If it is different from the batch size used for the compiltaion, it will be overriden by the static batch size of neuron (except for dynamic batching). - eta (
float
, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to thediffusers.schedulers.DDIMScheduler
, and is ignored in other schedulers. - generator (
Optional[Union[torch.Generator, List[torch.Generator]]]
, defaults toNone
) — Atorch.Generator
to make generation deterministic. - latents (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied randomgenerator
. - prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from theprompt
input argument. - negative_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided,negative_prompt_embeds
are generated from thenegative_prompt
input argument. - ip_adapter_image — (
Optional[PipelineImageInput]
, defaults toNone
): Optional image input to work with IP Adapters. - ip_adapter_image_embeds (
Optional[List[torch.Tensor]]
, defaults toNone
) — Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of IP-adapters. Each element should be a tensor of shape(batch_size, num_images, emb_dim)
. It should contain the negative image embedding ifdo_classifier_free_guidance
is set toTrue
. If not provided, embeddings are computed from theip_adapter_image
input argument. - output_type (
str
, defaults to"pil"
) — The output format of the generated image. Choose betweenPIL.Image
ornp.array
. - return_dict (
bool
, defaults toTrue
) — Whether or not to return adiffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
instead of a plain tuple. - cross_attention_kwargs (
Optional[Dict[str, Any]]
, defaults toNone
) — A kwargs dictionary that if specified is passed along to theAttentionProcessor
as defined inself.processor
. - controlnet_conditioning_scale (
Union[float, List[float]]
, defaults to 1.0) — The outputs of the ControlNet are multiplied bycontrolnet_conditioning_scale
before they are added to the residual in the originalunet
. If multiple ControlNets are specified ininit
, you can set the corresponding scale as a list. - guess_mode (
bool
, defaults toFalse
) — The ControlNet encoder tries to recognize the content of the input image even if you remove all prompts. Aguidance_scale
value between 3.0 and 5.0 is recommended. - control_guidance_start (
Union[float, List[float]]
, defaults to 0.0) — The percentage of total steps at which the ControlNet starts applying. - control_guidance_end (
Union[float, List[float]]
, optional, defaults to 1.0) — The percentage of total steps at which the ControlNet stops applying. - clip_skip (
Optional[int]
, defaults toNone
) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings. - callback_on_step_end (
Optional[Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]]
, defaults toNone
) — A function or a subclass ofPipelineCallback
orMultiPipelineCallbacks
that is called at the end of each denoising step during the inference. with the following arguments:callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)
.callback_kwargs
will include a list of all tensors as specified bycallback_on_step_end_tensor_inputs
. - callback_on_step_end_tensor_inputs (
List[str]
, defaults to["latents"]
) — The list of tensor inputs for thecallback_on_step_end
function. The tensors specified in the list will be passed ascallback_kwargs
argument. You will only be able to include variables listed in the._callback_tensor_inputs
attribute of your pipeline class.
Returns
diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
If return_dict
is True
, diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
is returned,
otherwise a tuple
is returned where the first element is a list with the generated images and the
second element is a list of bool
s indicating whether the corresponding generated image contains
“not-safe-for-work” (nsfw) content.
The call function to the pipeline for generation.
NeuronStableDiffusionXLControlNetPipeline
class optimum.neuron.NeuronStableDiffusionXLControlNetPipeline
< source >( config: typing.Dict[str, typing.Any] configs: typing.Dict[str, ForwardRef('PretrainedConfig')] neuron_configs: typing.Dict[str, ForwardRef('NeuronDefaultConfig')] data_parallel_mode: typing.Literal['none', 'unet', 'transformer', 'all'] scheduler: typing.Optional[diffusers.schedulers.scheduling_utils.SchedulerMixin] vae_decoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeDecoder')] text_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None text_encoder_2: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTextEncoder'), NoneType] = None unet: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelUnet'), NoneType] = None transformer: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelTransformer'), NoneType] = None vae_encoder: typing.Union[torch.jit._script.ScriptModule, ForwardRef('NeuronModelVaeEncoder'), NoneType] = None image_encoder: typing.Optional[torch.jit._script.ScriptModule] = None safety_checker: typing.Optional[torch.jit._script.ScriptModule] = None tokenizer: typing.Union[transformers.models.clip.tokenization_clip.CLIPTokenizer, transformers.utils.dummy_sentencepiece_objects.T5Tokenizer, NoneType] = None tokenizer_2: typing.Optional[transformers.models.clip.tokenization_clip.CLIPTokenizer] = None feature_extractor: typing.Optional[transformers.models.clip.feature_extraction_clip.CLIPFeatureExtractor] = None controlnet: typing.Union[torch.jit._script.ScriptModule, typing.List[torch.jit._script.ScriptModule], ForwardRef('NeuronControlNetModel'), ForwardRef('NeuronMultiControlNetModel'), NoneType] = None requires_aesthetics_score: bool = False force_zeros_for_empty_prompt: bool = True add_watermarker: typing.Optional[bool] = None model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_and_config_save_paths: typing.Optional[typing.Dict[str, typing.Tuple[str, pathlib.Path]]] = None )
__call__
< source >( prompt: typing.Union[str, typing.List[str], NoneType] = None prompt_2: typing.Union[str, typing.List[str], NoneType] = None image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor]] = None num_inference_steps: int = 50 timesteps: typing.List[int] = None sigmas: typing.List[float] = None denoising_end: typing.Optional[float] = None guidance_scale: float = 5.0 negative_prompt: typing.Union[str, typing.List[str], NoneType] = None negative_prompt_2: typing.Union[str, typing.List[str], NoneType] = None num_images_per_prompt: typing.Optional[int] = 1 eta: float = 0.0 generator: typing.Union[torch._C.Generator, typing.List[torch._C.Generator], NoneType] = None latents: typing.Optional[torch.Tensor] = None prompt_embeds: typing.Optional[torch.Tensor] = None negative_prompt_embeds: typing.Optional[torch.Tensor] = None pooled_prompt_embeds: typing.Optional[torch.Tensor] = None negative_pooled_prompt_embeds: typing.Optional[torch.Tensor] = None ip_adapter_image: typing.Union[PIL.Image.Image, numpy.ndarray, torch.Tensor, typing.List[PIL.Image.Image], typing.List[numpy.ndarray], typing.List[torch.Tensor], NoneType] = None ip_adapter_image_embeds: typing.Optional[typing.List[torch.Tensor]] = None output_type: typing.Optional[str] = 'pil' return_dict: bool = True cross_attention_kwargs: typing.Optional[typing.Dict[str, typing.Any]] = None controlnet_conditioning_scale: typing.Union[float, typing.List[float]] = 1.0 guess_mode: bool = False control_guidance_start: typing.Union[float, typing.List[float]] = 0.0 control_guidance_end: typing.Union[float, typing.List[float]] = 1.0 original_size: typing.Optional[typing.Tuple[int, int]] = None crops_coords_top_left: typing.Tuple[int, int] = (0, 0) target_size: typing.Optional[typing.Tuple[int, int]] = None negative_original_size: typing.Optional[typing.Tuple[int, int]] = None negative_crops_coords_top_left: typing.Tuple[int, int] = (0, 0) negative_target_size: typing.Optional[typing.Tuple[int, int]] = None clip_skip: typing.Optional[int] = None callback_on_step_end: typing.Union[typing.Callable[[int, int, typing.Dict], NoneType], diffusers.callbacks.PipelineCallback, diffusers.callbacks.MultiPipelineCallbacks, NoneType] = None callback_on_step_end_tensor_inputs: typing.List[str] = ['latents'] **kwargs ) → diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
Parameters
- prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide image generation. If not defined, you need to passprompt_embeds
. - prompt_2 (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to be sent totokenizer_2
andtext_encoder_2
. If not defined,prompt
is used in both text-encoders. - image (
Optional["PipelineImageInput"]
, defaults toNone
) — The ControlNet input condition to provide guidance to theunet
for generation. If the type is specified astorch.Tensor
, it is passed to ControlNet as is.PIL.Image.Image
can also be accepted as an image. The dimensions of the output image defaults toimage
’s dimensions. If height and/or width are passed,image
is resized accordingly. If multiple ControlNets are specified ininit
, images must be passed as a list such that each element of the list can be correctly batched for input to a single ControlNet. - num_inference_steps (
int
, defaults to 50) — The number of denoising steps. More denoising steps usually lead to a higher quality image at the expense of slower inference. - timesteps (
Optional[List[int]]
, defaults toNone
) — Custom timesteps to use for the denoising process with schedulers which support atimesteps
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. Must be in descending order. - sigmas (
Optional[List[int]]
, defaults toNone
) — Custom sigmas to use for the denoising process with schedulers which support asigmas
argument in theirset_timesteps
method. If not defined, the default behavior whennum_inference_steps
is passed will be used. - denoising_end (
Optional[float]
, defaults toNone
) — When specified, determines the fraction (between 0.0 and 1.0) of the total denoising process to be completed before it is intentionally prematurely terminated. As a result, the returned sample will still retain a substantial amount of noise as determined by the discrete timesteps selected by the scheduler. The denoising_end parameter should ideally be utilized when this pipeline forms a part of a “Mixture of Denoisers” multi-pipeline setup, as elaborated in Refining the Image Output - guidance_scale (
float
, defaults to 5.0) — A higher guidance scale value encourages the model to generate images closely linked to the textprompt
at the expense of lower image quality. Guidance scale is enabled whenguidance_scale > 1
. - negative_prompt (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide what to not include in image generation. If not defined, you need to passnegative_prompt_embeds
instead. Ignored when not using guidance (guidance_scale < 1
). - negative_prompt_2 (
Optional[Union[str, List[str]]]
, defaults toNone
) — The prompt or prompts to guide what to not include in image generation. This is sent totokenizer_2
andtext_encoder_2
. If not defined,negative_prompt
is used in both text-encoders. - num_images_per_prompt (
int
, defaults to 1) — The number of images to generate per prompt. - eta (
float
, defaults to 0.0) — Corresponds to parameter eta (η) from the DDIM paper. Only applies to thediffusers.schedulers.DDIMScheduler
, and is ignored in other schedulers. - generator (
Optional[Union[torch.Generator, List[torch.Generator]]]
, defaults toNone
) — Atorch.Generator
to make generation deterministic. - latents (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated noisy latents sampled from a Gaussian distribution, to be used as inputs for image generation. Can be used to tweak the same generation with different prompts. If not provided, a latents tensor is generated by sampling using the supplied randomgenerator
. - prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, text embeddings are generated from theprompt
input argument. - negative_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated negative text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided,negative_prompt_embeds
are generated from thenegative_prompt
input argument. - pooled_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated pooled text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, pooled text embeddings are generated fromprompt
input argument. - negative_pooled_prompt_embeds (
Optional[torch.Tensor]
, defaults toNone
) — Pre-generated negative pooled text embeddings. Can be used to easily tweak text inputs (prompt weighting). If not provided, poolednegative_prompt_embeds
are generated fromnegative_prompt
input argument. - ip_adapter_image — (
Optional[PipelineImageInput]
, defaults toNone
): Optional image input to work with IP Adapters. - ip_adapter_image_embeds (
Optional[List[torch.Tensor]]
, defaults toNone
) — Pre-generated image embeddings for IP-Adapter. It should be a list of length same as number of IP-adapters. Each element should be a tensor of shape(batch_size, num_images, emb_dim)
. It should contain the negative image embedding ifdo_classifier_free_guidance
is set toTrue
. If not provided, embeddings are computed from theip_adapter_image
input argument. - output_type (
Optional[str]
, defaults to"pil"
) — The output format of the generated image. Choose betweenPIL.Image
ornp.array
. - return_dict (
bool
, defaults toTrue
) — Whether or not to return a~pipelines.stable_diffusion.StableDiffusionPipelineOutput
instead of a plain tuple. - cross_attention_kwargs (
Optional[Dict[str, Any]]
, defaults toNone
) — A kwargs dictionary that if specified is passed along to theAttentionProcessor
as defined inself.processor
. - controlnet_conditioning_scale (
Union[float, List[float]]
, defaults to 1.0) — The outputs of the ControlNet are multiplied bycontrolnet_conditioning_scale
before they are added to the residual in the originalunet
. If multiple ControlNets are specified ininit
, you can set the corresponding scale as a list. - guess_mode (
bool
, defaults toFalse
) — The ControlNet encoder tries to recognize the content of the input image even if you remove all prompts. Aguidance_scale
value between 3.0 and 5.0 is recommended. - control_guidance_start (
Union[float, List[float]]
, defaults to 0.0) — The percentage of total steps at which the ControlNet starts applying. - control_guidance_end (
Union[float, List[float]]
, defaults to 1.0) — The percentage of total steps at which the ControlNet stops applying. - original_size (
Optional[Tuple[int, int]]
, defaults to (1024, 1024)) — Iforiginal_size
is not the same astarget_size
the image will appear to be down- or upsampled.original_size
defaults to(height, width)
if not specified. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. - crops_coords_top_left (
Tuple[int, int]
, defaults to (0, 0)) —crops_coords_top_left
can be used to generate an image that appears to be “cropped” from the positioncrops_coords_top_left
downwards. Favorable, well-centered images are usually achieved by settingcrops_coords_top_left
to (0, 0). Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. - target_size (
Optional[Tuple[int, int]]
, defaults toNone
) — For most cases,target_size
should be set to the desired height and width of the generated image. If not specified it will default to(height, width)
. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. - negative_original_size (
Optional[Tuple[int, int]]
, defaults toNone
) — To negatively condition the generation process based on a specific image resolution. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. - negative_crops_coords_top_left (
Tuple[int, int]
, defaults to (0, 0)) — To negatively condition the generation process based on a specific crop coordinates. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. - negative_target_size (
Optional[Tuple[int, int]]
, defaults toNone
) — To negatively condition the generation process based on a target image resolution. It should be as same as thetarget_size
for most cases. Part of SDXL’s micro-conditioning as explained in section 2.2 of https://huggingface.co/papers/2307.01952. For more information, refer to this issue thread: https://github.com/huggingface/diffusers/issues/4208. - clip_skip (
Optional[int]
, defaults toNone
) — Number of layers to be skipped from CLIP while computing the prompt embeddings. A value of 1 means that the output of the pre-final layer will be used for computing the prompt embeddings. - callback_on_step_end (
Optional[Union[Callable[[int, int, Dict], None], PipelineCallback, MultiPipelineCallbacks]]
, defaults toNone
) — A function or a subclass ofPipelineCallback
orMultiPipelineCallbacks
that is called at the end of each denoising step during the inference. with the following arguments:callback_on_step_end(self: DiffusionPipeline, step: int, timestep: int, callback_kwargs: Dict)
.callback_kwargs
will include a list of all tensors as specified bycallback_on_step_end_tensor_inputs
. - callback_on_step_end_tensor_inputs (
List[str]
, defaults to["latents"]
) — The list of tensor inputs for thecallback_on_step_end
function. The tensors specified in the list will be passed ascallback_kwargs
argument. You will only be able to include variables listed in the._callback_tensor_inputs
attribute of your pipeline class.
Returns
diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
or tuple
If return_dict
is True
, diffusers.pipelines.stable_diffusion.StableDiffusionPipelineOutput
is returned,
otherwise a tuple
is returned containing the output images.
The call function to the pipeline for generation.
Examples:
Are there any other diffusion features that you want us to support in 🤗Optimum-neuron
? Please file an issue to Optimum-neuron
Github repo or discuss with us on HuggingFace’s community forum, cheers 🤗 !