TQ1_0 DeepSeek-R1-0528 could not run with Ollama

#23
by yoummiegao - opened

I ran `docker exec ollama ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0` with 5×4090 (48G) GPUs, and I can see the model actually load onto the GPUs,


but it always fails with the following error:
Error: llama runner process has terminated: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA4 buffer of size 35888578560
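For scale, the buffer that `ggml_gallocr_reserve_n` is trying to reserve on the CUDA4 device works out to roughly 33 GiB, which has to fit on that single GPU alongside whatever weights are already resident there:

```python
# Convert the failed allocation size from the error log into GiB.
failed_alloc_bytes = 35_888_578_560  # from the ggml_gallocr_reserve_n line above
gib = failed_alloc_bytes / 2**30
print(f"{gib:.2f} GiB")
```

This suggests the out-of-memory condition is per-GPU VRAM, not system RAM, which is why the 512 GB of host memory and swap don't help.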

My server has 512 GB of RAM and I set 1 TB of virtual memory, so why does it still fail? Is disk size or the Docker environment a factor? (36 GB of disk space remained after downloading the quantized model; I'm running Ollama 0.90 in Docker.)
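One workaround worth trying (a sketch, not verified on this exact setup) is to reduce the number of layers Ollama offloads to the GPUs via a Modelfile, so the per-GPU compute buffers shrink and the rest of the model runs from system RAM. Note that `num_gpu` counts offloaded layers, not GPUs; the model name `r1-tq1-lowvram` and the layer count `30` below are arbitrary placeholders:

```shell
# Modelfile capping the layers offloaded to VRAM (config sketch).
cat > Modelfile <<'EOF'
FROM hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0
PARAMETER num_gpu 30
EOF
docker cp Modelfile ollama:/Modelfile
docker exec ollama ollama create r1-tq1-lowvram -f /Modelfile
docker exec ollama ollama run r1-tq1-lowvram
```

Lowering `num_gpu` trades speed for VRAM headroom; you can raise it again incrementally until the cudaMalloc failure reappears.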

Thanks so much!

