TQ1_0 deepseek-r1-0528 could not run with ollama
#23
by yoummiegao
I ran `docker exec ollama ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0` with 5×4090 (48G) GPUs, and I can see the model actually load onto the GPUs,
but I always get the following error:
Error: llama runner process has terminated: cudaMalloc failed: out of memory
ggml_gallocr_reserve_n: failed to allocate CUDA4 buffer of size 35888578560
My server has 512G of RAM and I set 1T of virtual memory, so why does it still fail? Is this related to disk size or the Docker environment? (36G of disk space is left after downloading the quantized model; running in Docker with Ollama 0.90.)
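For reference, the failing call is a single ~33 GiB cudaMalloc on one GPU (the `CUDA4` device in the log), so system RAM and swap don't help; that one buffer has to fit in a single card's VRAM. One thing I could try (a sketch only; the `num_ctx` value is a guess, not a known-good setting) is lowering the context window via a Modelfile to shrink the KV cache, and setting `OLLAMA_SCHED_SPREAD=1` on the container so Ollama spreads the model across all GPUs:

```
# Hypothetical Modelfile: reduce per-GPU memory by lowering the context window
FROM hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0
PARAMETER num_ctx 4096
```

Then rebuild and run it, e.g. `docker exec ollama ollama create deepseek-small -f Modelfile` followed by `ollama run deepseek-small`.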
thanks so much!