How much CUDA memory is needed?

#1
by robindon - opened

How much CUDA memory is used on the corresponding NVIDIA card?

OpenGVLab org

Hello, if you load the 78B model in bf16, the model weights alone will consume approximately 78B parameters × 2 bytes ≈ 156 GB.

On the other hand, if you use lmdeploy to load the AWQ-quantized (4-bit) version of the model, the weights will consume approximately 78B parameters × 0.5 bytes ≈ 39 GB.
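
As a back-of-the-envelope check, here is a minimal sketch of that weights-only estimate (parameter count × bytes per parameter). The helper name `weight_memory_gb` is illustrative, not from lmdeploy, and the result ignores activations, the KV cache, and framework overhead:

```python
def weight_memory_gb(num_params_billions: float, bytes_per_param: float) -> float:
    """Estimate GPU memory for model weights alone, in GB (1 GB = 1e9 bytes).

    num_params_billions: parameter count in billions (e.g. 78 for the 78B model).
    bytes_per_param: 2 for bf16/fp16, 0.5 for 4-bit AWQ.
    Note: excludes activations, KV cache, and framework overhead.
    """
    return num_params_billions * bytes_per_param

print(weight_memory_gb(78, 2))    # bf16      -> 156.0 GB
print(weight_memory_gb(78, 0.5))  # AWQ 4-bit -> 39.0 GB
```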

czczup changed discussion status to closed