hardware specs for running RTX Pro 6000

#6
by anuragphadke - opened

Trying to get this running on an RTX Pro 6000 with 96GB VRAM, but getting OOM.

vllm serve NousResearch/Hermes-4-70B

Tried a few other parameter combinations:
--dtype auto --kv-cache-dtype fp8 --gpu-memory-utilization 0.95 --max-model-len 32768 etc.

error returned:
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 0 has a total capacity of 94.97 GiB of which 206.88 MiB is free. Including non-PyTorch memory, this process has 94.76 GiB memory in use. Of the allocated memory 94.12 GiB is allocated by PyTorch, and 1.80 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

Is it possible to run Hermes 4 70B on an RTX Pro 6000?

anuragphadke changed discussion title from hardware specs for running on consumer GPU to hardware specs for running RTX Pro 6000
NousResearch org

You should run the FP8 version with vLLM. In bf16, it requires 140GB of memory.

teknium changed discussion status to closed
NousResearch org

That's available here, and no special flags are required to run it (but you can still use all the ones you already are):
https://huggingface.co/NousResearch/Hermes-4-70B-FP8
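For reference, a single-GPU launch along these lines should fit within 96GB (a sketch reusing the flags from the original post; the exact values are illustrative, not prescribed by this thread):

vllm serve NousResearch/Hermes-4-70B-FP8 --max-model-len 32768 --gpu-memory-utilization 0.95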

Thank you so much; how much GPU memory is needed for the FP8 version? 48/64/96GB?

NousResearch org

You would need around 35GB to load it + at least 5 more GB for the context window

NousResearch org

Err, sorry: 70GB to load + ~5GB for context*
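(Rough arithmetic for context, my own estimate rather than a figure from the replies: 70B parameters at 1 byte each in FP8 is about 70GB of weights, leaving roughly 20-25GB of a 96GB card for the KV cache and activations.)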

NousResearch org

If you use the 4-bit GGUF, only ~35GB, but it reduces quality a little.
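If you go that route, a minimal llama.cpp launch might look like this (a sketch assuming llama-server as the runtime, which the thread does not specify; the GGUF filename is hypothetical):

llama-server -m Hermes-4-70B-Q4_K_M.gguf --n-gpu-layers 99 -c 32768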
