How to utilize all GPUs when device_map="balanced_low_0" is set

#185
by kmukeshreddy - opened

While loading the model with device_map="balanced_low_0", the model is placed on all GPUs apart from GPU 0, which is left free for text inference (i.e., performing the computations inside the LLM that generate the response).

So, with this device_map setting, my model is loaded onto GPUs 1, 2, and 3, and GPU 0 is left for inference:

| GPU ID | Utilization | Memory used |
|--------|-------------|-------------|
| 0 | 0% | 3% |
| 1 | 0% | 83% |
| 2 | 0% | 82% |
| 3 | 0% | 76% |
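
For reference, a minimal sketch of how such a loading call might look (the checkpoint name is a placeholder, not the actual model from this post):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint -- substitute the model you are actually loading.
model_id = "bigscience/bloom-7b1"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="balanced_low_0",  # shard weights across GPUs 1..N, keep GPU 0 mostly free
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inspect where each module ended up; GPU 0 should hold little or nothing,
# matching the memory column in the table above.
print(model.hf_device_map)
```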

Question: How can I also utilize the remaining GPUs 1, 2, and 3 to perform text inference, and not only GPU 0?

Context (from the Accelerate docs): "balanced_low_0" evenly splits the model on all GPUs except the first one, and only puts on GPU 0 what does not fit on the others. This option is great when you need to use GPU 0 for some processing of the outputs, like when using the generate function for Transformers models.

Reference: https://huggingface.co/docs/accelerate/en/concept_guides/big_model_inference#designing-a-device-map
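
To illustrate the docs' point with a minimal sketch (assuming the model and tokenizer from the snippet above): during generate, the layer computations run on GPUs 1-3, where the weights live, while GPU 0 stays free for processing the outputs.

```python
prompt = "Hello, my name is"
# model.device points at the device holding the first layers
# (GPU 1 in this layout, an assumption); Accelerate's dispatch hooks
# then move activations between GPUs as the forward pass proceeds.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=50)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```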

Hello @ybelkada, hope you are doing well!
Looking forward to your comment on the issue above.
Thanks in advance!
