Kimi K2 Question

#4
by SFPLM - opened

I 'only' have 512 GB of RAM, but I also have an RTX Pro 6000 with 96 GB of VRAM and an RTX 3090 (24 GB), so the combined memory is around 632 GB. The page says the recommendation is 600 GB DRAM + 24 GB GPU. Can I use the GPU memory to "add RAM" to meet the recommendation and get a generation time similar to the example setup shown on the KTransformers page?

If you use the RTX Pro 6000 for the active parameters and your 512 GB of RAM is fast enough, then in theory your speed will be higher than in the example. (Use the 3090 as extra RAM or for offloading the cache; otherwise the speed will drop.)
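
As a rough check of the numbers in the question, here is a minimal bookkeeping sketch. It only adds up the capacities quoted above and compares them with the recommended 600 GB DRAM + 24 GB GPU. It assumes the runtime can actually place the overflow weights in VRAM, and it ignores KV cache, OS overhead, and bandwidth, so treat it as arithmetic rather than a prediction of whether the model will load or how fast it will run.

```python
# All figures in GB, taken from the question above.
system_ram = 512
gpu_vram = {"RTX Pro 6000": 96, "RTX 3090": 24}

# Recommendation quoted in the question.
recommended_dram = 600
recommended_vram = 24

total_vram = sum(gpu_vram.values())
total_available = system_ram + total_vram
total_recommended = recommended_dram + recommended_vram

# How much of the recommended DRAM is missing, and how much VRAM
# is left once the recommended GPU share is set aside.
dram_shortfall = max(0, recommended_dram - system_ram)
spare_vram = total_vram - recommended_vram

print(f"available:   {total_available} GB")
print(f"recommended: {total_recommended} GB")
print(f"DRAM shortfall: {dram_shortfall} GB, spare VRAM: {spare_vram} GB")
print("fits in aggregate:", spare_vram >= dram_shortfall)
```

On paper the spare VRAM covers the DRAM shortfall, but only with a few GB to spare, which is why the split between the two cards matters.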

I also recommend waiting for the IQ4_XS to IQ3_XXS quants to avoid OOM (out of memory) issues. With a model this large, you won't lose much quality.
