How can an RTX 4090 with 24 GB of memory run this model?
When I try this model on an RTX 4090 with 24 GB of memory, it reports torch.OutOfMemoryError:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 72.00 MiB. GPU 0 has a total capacity of 23.53 GiB of which 44.44 MiB is free. Including non-PyTorch memory, this process has 22.92 GiB memory in use. Of the allocated memory 22.36 GiB is allocated by PyTorch, and 196.63 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
Then I referred to a similar question posted in Qwen-Image and changed my code as below, adding a device_map config in from_pretrained:
```python
import torch
from diffusers import QwenImageEditPipeline

pipeline = QwenImageEditPipeline.from_pretrained(
    "/home/tmp/Qwen-Image-Edit",
    device_map="balanced",
)
print("pipeline loaded")
pipeline.to(torch.bfloat16)
pipeline.to("cuda")
pipeline.set_progress_bar_config(disable=None)
```
but it reports:
ValueError: It seems like you have activated a device mapping strategy on the pipeline which doesn't allow explicit device placement using `to()`. You can call `reset_device_map()` to remove the existing device map from the pipeline.
How do I modify my code so that I can run this model on my 24 GB RTX 4090? Are there any code examples or documents I can refer to?
With the 4-step LoRA my RTX 4090 takes about 14 seconds... If you need help, I suggest you consider doing what I did: go to https://chat.qwen.ai/ and ask Qwen yourself, LOL. Surprisingly, it was very aware of its own image functionality and corrected my script without any hesitation.
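For reference, a minimal sketch of what a 4-step LoRA setup can look like; the repo id, weight filename, and the commented inference call are placeholders I'm assuming, not confirmed names, so substitute the actual LoRA release:

```python
import torch
from diffusers import QwenImageEditPipeline

pipeline = QwenImageEditPipeline.from_pretrained(
    "Qwen/Qwen-Image-Edit", torch_dtype=torch.bfloat16
)
pipeline.enable_model_cpu_offload()  # offloading helps the model fit on a 24 GB card

# Placeholder repo id and filename for the 4-step distilled LoRA.
pipeline.load_lora_weights(
    "path-or-repo-of-4step-lora",
    weight_name="qwen-image-edit-4steps-lora.safetensors",
)

# With a 4-step distilled LoRA you only run a handful of denoising steps, e.g.:
# edited = pipeline(image=input_image, prompt="...", num_inference_steps=4).images[0]
```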
You can quantize it; see this conversation: https://huggingface.co/Qwen/Qwen-Image-Edit/discussions/6
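For example, a rough sketch of 4-bit quantization of the transformer with bitsandbytes through diffusers; the transformer class name (QwenImageTransformer2DModel) and the exact settings are my assumptions, so check the linked thread for the tested recipe:

```python
import torch
from diffusers import BitsAndBytesConfig, QwenImageEditPipeline, QwenImageTransformer2DModel

model_id = "/home/tmp/Qwen-Image-Edit"

# Quantize the transformer (the largest component) to 4-bit NF4.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = QwenImageTransformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipeline = QwenImageEditPipeline.from_pretrained(
    model_id,
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipeline.enable_model_cpu_offload()  # keep components on CPU until they are needed
```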
Try commenting out the line `pipeline.to("cuda")`; that should work.
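With the device map in place, the load code would look roughly like this (a sketch; the dtype is passed at load time instead of via `.to()`):

```python
import torch
from diffusers import QwenImageEditPipeline

pipeline = QwenImageEditPipeline.from_pretrained(
    "/home/tmp/Qwen-Image-Edit",
    torch_dtype=torch.bfloat16,
    device_map="balanced",
)
print("pipeline loaded")
pipeline.set_progress_bar_config(disable=None)
# No pipeline.to("cuda") here: the device map already decides placement.
```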