Seeking Expert Advice on Optimizing GPU Memory Usage for Large AI Models!


I’m currently working on deploying the CogVideoX-5B model on an AWS EC2 p3.2xlarge instance with a Tesla V100-SXM2 GPU (16 GB), but I keep hitting CUDA out-of-memory (OOM) errors despite numerous optimizations. Here’s a quick overview of the situation:

1️⃣ What I’ve Tried (pulled together in a sketch after this list):

Reduced precision using torch.float16.
CPU-GPU layer offloading via device_map="balanced".
Disk-based offloading with offload_folder.
Lowered parameters: num_frames=10, num_inference_steps=10.
Enabled memory-efficient features like vae.enable_slicing() and vae.enable_tiling().
Set the environment variable PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True.
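
For reference, here is roughly how everything above fits together in one place. This is a minimal sketch, assuming the THUDM/CogVideoX-5b-I2V image-to-video checkpoint, a diffusers release recent enough to include the CogVideoX pipelines, and placeholder values for the input image and prompt:

```python
import os
# Must be set before CUDA is initialized (i.e., before the first CUDA call).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    "THUDM/CogVideoX-5b-I2V",      # assumed checkpoint
    torch_dtype=torch.float16,     # reduced precision
    device_map="balanced",         # CPU-GPU layer offloading
    offload_folder="offload",      # disk-based offloading for layers that don't fit
)

# Memory-efficient VAE decoding: process the batch in slices and the latents in tiles.
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

image = load_image("input.jpg")    # placeholder input frame

video = pipe(
    image=image,
    prompt="a short description of the motion",  # placeholder prompt
    num_frames=10,                 # lowered from the default
    num_inference_steps=10,        # lowered from the default
).frames[0]
```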
2️⃣ Environment:

GPU: Tesla V100-SXM2 (16GB)
Instance: AWS EC2 p3.2xlarge
CUDA Version: 12.2
Libraries: PyTorch 2.0.1, diffusers 0.21.0, accelerate 0.21.0
3️⃣ Issue: Despite all of this, the GPU still runs out of memory while trying to allocate roughly 72 MB more, even after freeing as much memory as possible. At this point, I’m wondering whether further optimizations are viable or whether upgrading to a larger GPU (e.g., 32 GB or 64 GB) is the only practical solution.
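
In case it helps with diagnosis, here is a minimal sketch of the standard torch.cuda calls for checking allocator state just before the failing allocation; a large gap between reserved and allocated memory points at fragmentation rather than true exhaustion:

```python
import torch

# Snapshot of allocator state just before the failing allocation.
print(f"allocated: {torch.cuda.memory_allocated() / 2**20:.0f} MiB")
print(f"reserved:  {torch.cuda.memory_reserved() / 2**20:.0f} MiB")

# Detailed per-pool breakdown, including fragmentation statistics.
print(torch.cuda.memory_summary())

# Release cached blocks that the allocator holds but no tensor is using.
torch.cuda.empty_cache()
```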

🤔 The Ask:

Have you worked on similar memory-intensive AI models?
Do you know advanced optimization techniques that might help resolve this?
Any guidance or suggestions would be greatly appreciated!
Let’s connect and collaborate! I’d love to hear your thoughts or learn from your experience deploying large-scale AI models. Feel free to comment or DM me if you have insights. 🙌
