Model's Resource Utilization
#230
by kalashshah19 - opened
Let's discuss resource utilization, a major concern for individual devs
How much VRAM does it use ?
The model in this repo, without any quantization or customization, will need around 1.2 TB of VRAM (roughly 16 NVIDIA H100 80 GB GPUs). On top of that, the KV cache grows with the context length you configure, which may require additional VRAM to accommodate it. Consider around 1.5 TB of VRAM for a comfortable scenario.
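For a quick back-of-the-envelope check, here is a minimal sketch of how such an estimate can be derived. It assumes a hypothetical ~600B-parameter model stored in FP16/BF16 (2 bytes per parameter) and a flat ~20% headroom factor for KV cache and activations; the exact parameter count and cache footprint of this repo's model may differ.

```python
def estimate_vram_gb(n_params: float, bytes_per_param: int = 2, overhead: float = 1.2) -> float:
    """Rough VRAM estimate in GB: raw weights plus headroom for KV cache/activations.

    n_params:        total parameter count (assumed, not taken from the repo)
    bytes_per_param: 2 for FP16/BF16, 1 for INT8, 4 for FP32
    overhead:        multiplier for cache/activation headroom (assumption)
    """
    weights_gb = n_params * bytes_per_param / 1e9
    return weights_gb * overhead

# ~600B params at 2 bytes each -> ~1200 GB of weights alone,
# and roughly 1440 GB (~1.5 TB) once cache headroom is included.
print(estimate_vram_gb(600e9))
```

With 80 GB per H100, ~1.2 TB of weights alone already needs on the order of 16 GPUs, which is where the numbers above come from.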
Oh, thanks for the info mate :)