Ahmadzei's picture
update 1
57bdca5
raw
history blame
406 Bytes
You should also consider the tradeoff between cost and speed because it'll be cheaper to rent or buy a smaller GPU but it'll take longer to train your model.
If you have enough GPU memory make sure you disable CPU/NVMe offload to make everything faster.
Select a ZeRO stage
After you've installed DeepSpeed and have a better idea of your memory requirements, the next step is selecting a ZeRO stage to use.