# Text-guided depth-to-image generation
The StableDiffusionDepth2ImgPipeline lets you pass a text prompt and an initial image to condition the generation of new images. You can also pass a depth_map to preserve the image structure. If no depth_map is provided, the pipeline automatically predicts the depth with an integrated depth-estimation model.
Start by creating an instance of the StableDiffusionDepth2ImgPipeline:
```py
import torch
from diffusers import StableDiffusionDepth2ImgPipeline
from diffusers.utils import load_image, make_image_grid

pipeline = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")
```
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
init_image = load_image(url)
prompt = "two tigers"
negative_prompt = "bad, deformed, ugly, bad anatomy"
image = pipeline(prompt=prompt, image=init_image, negative_prompt=negative_prompt, strength=0.7).images[0]
make_image_grid([init_image, image], rows=1, cols=2)| Input | Output |
*Input image (left) and generated output (right).*
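If you want more control over the structure conditioning, you can also estimate the depth yourself and pass it through the depth_map argument. Below is a minimal sketch, assuming the Intel/dpt-large depth-estimation checkpoint from Transformers; the pipeline resizes and normalizes the raw prediction internally, so it can be passed as-is:

```py
import torch
from transformers import DPTForDepthEstimation, DPTImageProcessor

# Estimate a depth map for the initial image with an off-the-shelf DPT model.
depth_estimator = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")
processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")

inputs = processor(images=init_image, return_tensors="pt")
with torch.no_grad():
    # predicted_depth has shape (batch, height, width)
    depth_map = depth_estimator(**inputs).predicted_depth

# Pass the raw depth prediction; the pipeline rescales and normalizes it.
image = pipeline(
    prompt=prompt,
    image=init_image,
    negative_prompt=negative_prompt,
    depth_map=depth_map.to("cuda", dtype=torch.float16),
    strength=0.7,
).images[0]
```

Any single-channel depth estimate with shape (batch, height, width) should work here, since the pipeline only uses it as relative structure guidance.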