Model does not load evenly on all available GPUs with device_map="auto", leading to OOM
#34
by sugunav14 - opened
Hi! I was trying to load the Llama 4 model with device_map="auto" (and "balanced"). I noticed that the model doesn't load evenly across the available GPUs, which causes OOM errors during model.generate() because there is insufficient memory left on the GPU that holds a higher percentage of the model weights. A workaround I found is to use device_map="sequential" and specify the max_memory I want to use on each GPU. Out of curiosity, why does the model load unevenly onto the GPUs?
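A minimal sketch of the workaround described above, assuming the `max_memory` mapping that `from_pretrained` accepts (integer GPU indices plus a literal "cpu" key); the model name and memory budgets below are illustrative assumptions, not values from the thread:

```python
def build_max_memory(num_gpus: int, per_gpu: str = "70GiB", cpu: str = "120GiB") -> dict:
    """Build a max_memory mapping that caps every GPU at the same budget.

    Keys are GPU indices (ints) plus the literal "cpu" key, the format
    transformers/accelerate expect when dispatching weights across devices.
    Capping below the physical VRAM leaves headroom for activations and the
    KV cache during model.generate().
    """
    max_memory = {gpu: per_gpu for gpu in range(num_gpus)}
    max_memory["cpu"] = cpu
    return max_memory


# Usage (illustrative; model name and limits are assumptions):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Llama-4-Scout-17B-16E-Instruct",
#     device_map="sequential",         # fill GPU 0, then GPU 1, ...
#     max_memory=build_max_memory(4),  # cap each GPU's share of the weights
# )
```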
@ssuchyta can you please confirm whether you are using accelerate?