Model does not load evenly on all available GPUs with device_map="auto", leading to OOM

#34
by sugunav14 - opened

Hi! I was trying to load the Llama 4 model with device_map="auto" or device_map="balanced". I noticed that the model doesn't load evenly across the available GPUs. This causes OOM errors while running model.generate(), because the GPU holding a higher percentage of the model weights runs out of memory. A workaround I found is to use device_map="sequential" and specify the max_memory I want to use on each GPU (sketched below). Out of curiosity, why does the model load unevenly onto the GPUs?
@ssuchyta
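
For reference, here is a minimal sketch of the workaround described above, using the standard transformers loading API. The model id and the per-GPU memory budgets are assumptions; adjust them for your checkpoint and hardware.

```python
# Sketch of the workaround: load sequentially with an explicit per-GPU cap
# so no single device is overcommitted. Values below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint

# Reserve roughly the same budget on every visible GPU, leaving headroom
# for activations and the KV cache during model.generate().
max_memory = {i: "70GiB" for i in range(torch.cuda.device_count())}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="sequential",   # fill GPU 0, then GPU 1, ... up to each cap
    max_memory=max_memory,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# hf_device_map records which device each module landed on; printing it is
# a quick way to verify whether the placement is actually balanced.
print(model.hf_device_map)
```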

Meta Llama org

Can you please confirm if you are using accelerate?
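
(If it helps to answer this, one quick way to check whether accelerate is installed and which version is in use:)

```python
# Print the installed accelerate version, which handles device_map dispatch.
import accelerate
print(accelerate.__version__)
```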
