Model does not load evenly on all available GPUs with device_map="auto", leading to OOM

#34
by sugunav14 - opened

Hi! I was trying to load the Llama 4 model with device_map="auto" or device_map="balanced". I noticed that the model doesn't load evenly across the available GPUs. This causes OOM errors while running model.generate(), because the GPU holding a higher percentage of the model weights runs out of memory. A workaround I found is to use device_map="sequential" and specify the max_memory I want to use on each GPU (sketched below). Out of curiosity, why does the model load unevenly onto the GPUs?
@ssuchyta
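
For reference, here is a minimal sketch of the workaround described above, using the standard transformers loading API. The model id and the per-GPU memory budgets are assumptions; adjust them for your checkpoint and hardware.

```python
# Sketch of the workaround: load sequentially with an explicit per-GPU cap
# so no single device is overcommitted. Values below are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-4-Scout-17B-16E-Instruct"  # assumed checkpoint

# Reserve roughly the same budget on every visible GPU, leaving headroom
# for activations and the KV cache during model.generate().
max_memory = {i: "70GiB" for i in range(torch.cuda.device_count())}

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="sequential",   # fill GPU 0, then GPU 1, ... up to each cap
    max_memory=max_memory,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# hf_device_map records which device each module landed on; printing it is
# a quick way to verify whether the placement is actually balanced.
print(model.hf_device_map)
```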

Meta Llama org

Can you please confirm if you are using accelerate?
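
(If it helps to answer this, one quick way to check whether accelerate is installed and which version is in use:)

```python
# Print the installed accelerate version, which handles device_map dispatch.
import accelerate
print(accelerate.__version__)
```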
