Issue: SD 3.5L Randomly Stalls on 'Loading VAE Model...' Step

#90
by Shakoretka - opened

I'm using Stable Diffusion 3.5 Large in a venv on a Windows 11 machine with an RTX 5090 GPU.

The model works fine overall, but it randomly hangs at the "Loading VAE model..." step without printing any error. This can happen several runs in a row. The behavior is inconsistent and doesn't seem to be tied to a specific prompt or configuration.
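Since the hang produces no error, one way to find out where it is stuck would be a watchdog built on Python's standard `faulthandler` module, which can dump every thread's stack after a timeout. This is only a sketch: the helper names, the 120-second timeout, and the `stall_trace.log` path are my own choices, and I'm assuming it would be called near the top of `sd3_infer.py`, before the model loading starts.

```python
import faulthandler


def arm_stall_watchdog(timeout_s=120, log_path="stall_trace.log"):
    """Write all thread stacks to log_path if the process is still alive after timeout_s.

    With repeat=True the dump is written again every timeout_s seconds,
    so a long hang leaves several snapshots to compare.
    """
    log_file = open(log_path, "w")
    # faulthandler needs a real file object that stays open while the watchdog is armed.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    return log_file


def disarm_stall_watchdog(log_file):
    """Cancel the pending dump and close the log once loading has finished."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

If the run stalls, the last frames in `stall_trace.log` should show whether it is waiting on file I/O, a CUDA call, or something else.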

I've tried the following commands — all result in the same behavior:

python sd3_infer.py --prompt ""
python sd3_infer.py --prompt prompts.txt
python sd3_infer.py --prompt prompts.txt --model models/sd3.5_large.safetensors
python sd3_infer.py --prompt prompts.txt --model models/sd3.5_large.safetensors --width 1024 --height 1024

I also tried manually downloading sd3_vae.safetensors from this source and adding it to the models/ folder:
https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main/vae
Unfortunately, this didn’t help.

Update: When using the sd3_vae.safetensors file from Hugging Face, I get dozens of "Skipping key ... does not exist in python model" messages. It appears the key names in that VAE file don't match the layout the script expects. It still ran, though, and gave funky results :D

VAE3.png
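To see how the key names in a VAE file differ from what the script expects, the tensor names can be listed without loading any weights. A .safetensors file starts with an 8-byte little-endian header length followed by a JSON header that maps tensor names to their metadata. The sketch below reads only that header with the standard library; the function names and the path in the usage note are hypothetical.

```python
import json
import struct


def safetensors_keys(path):
    """Return the sorted tensor names stored in a .safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return sorted(header)


def key_prefixes(keys, depth=2):
    """Collapse tensor names to their first `depth` dotted components."""
    return sorted({".".join(k.split(".")[:depth]) for k in keys})
```

For example, `print(key_prefixes(safetensors_keys("models/sd3_vae.safetensors")))` gives a short overview of the top-level module names, which can be compared against the names in the "Skipping key" messages.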

I had to update PyTorch to the nightly CUDA 12.8 build using this command, since the model wouldn't run at all on the RTX 5090 without it:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
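To confirm that the venv is really using the CUDA 12.8 nightly and that the build includes kernels for this GPU, a quick check along these lines might help. It is a sketch that uses only standard `torch` attributes; the function name and the dictionary keys are my own.

```python
def describe_torch_cuda():
    """Collect the PyTorch build and GPU details relevant to a CUDA version mismatch."""
    try:
        import torch
    except ImportError:
        return {"torch_installed": False}

    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA toolkit version the wheel was built against (None for CPU-only builds).
        "built_for_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        major, minor = torch.cuda.get_device_capability(0)
        info["device_name"] = torch.cuda.get_device_name(0)
        info["device_arch"] = f"sm_{major}{minor}"
        # Architectures this build ships kernels for; the device arch should be covered.
        info["compiled_archs"] = torch.cuda.get_arch_list()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_cuda().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` does not show 12.8, or `device_arch` is missing from `compiled_archs`, the venv is probably still picking up an older PyTorch build.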

Really appreciate any help or suggestions on how to fix this issue.

VAE problrm.png
