Phenomenal post!
Maybe it could help future readers to have some clarity on the CUDA programming model itself so that the hierarchy of where each of the components (SM, thread blocks, registers, etc.) sits is clear.
torch.compile
2 ** search_round
) and repeat 1 - 3.
diffusers 🧨
bitsandbytes as the official backend but using others like torchao is already very simple.
enable_model_cpu_offload()
torch.compile()
them.
If you use from_single_file loading and are affected by the Runway SD 1.5 issue: if you have runwayml/stable-diffusion-v1-5 saved locally in your HF cache, then loading single-file checkpoints in the following way should still work.

    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>")
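If the repo is not in your cache, the default lookup fails because runwayml/stable-diffusion-v1-5 doesn't exist anymore. In that case, pass a config from another repo explicitly:

    from diffusers import StableDiffusionPipeline
    pipe = StableDiffusionPipeline.from_single_file("<url or path to single file checkpoint>", config="Lykon/DreamShaper")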
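
The two cases above (config repo cached vs. not cached) could be folded into one helper. This is only a rough sketch, not an official diffusers utility: the name load_single_file and the loader parameter are hypothetical, and catching a broad Exception is an assumption, since the exact error raised when the reference repo is missing is not stated here. The fallback repo id is the one used in the snippet above.

```python
from typing import Any, Callable, Optional

# Repo id taken from the example above; any repo with a compatible config should do.
FALLBACK_CONFIG = "Lykon/DreamShaper"


def load_single_file(
    checkpoint: str,
    loader: Optional[Callable[..., Any]] = None,
    fallback_config: str = FALLBACK_CONFIG,
) -> Any:
    """Try the default single-file load, then retry with an explicit config.

    `loader` is a hypothetical hook so the logic can be exercised without
    downloading anything; by default it is StableDiffusionPipeline.from_single_file.
    """
    if loader is None:
        # Imported lazily so the helper can be defined without diffusers installed.
        from diffusers import StableDiffusionPipeline

        loader = StableDiffusionPipeline.from_single_file

    try:
        # Works if the default config repo is still available in the local HF cache.
        return loader(checkpoint)
    except Exception:
        # Assumption: the failure came from the missing reference repo.
        # Retry with a config pulled from a repo that still exists.
        return loader(checkpoint, config=fallback_config)
```

Calling load_single_file("<url or path to single file checkpoint>") then behaves like the first snippet when the cache is intact and like the second one otherwise. A narrower except clause would be safer once you know which error your installed version raises.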