Oobabooga GPU load error - Solved!
Hi,
When I try to use the GPU while loading the TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF model, I get this error message:
CUDA error 2 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:8955: out of memory
current device: 0
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:8955: !"CUDA error"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
Is this only happening to me (some local install problem) or for everyone (a llama.cpp issue)?
(With n-gpu-layers = 0, it loads the model and works perfectly, just "a bit" slow. :)
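For anyone hitting the same out-of-memory error, the usual workaround is to lower n-gpu-layers instead of disabling offload entirely. A minimal sketch of what that looks like with llama-cpp-python directly (the backend Oobabooga uses for GGUF models); the file name, layer count, and context size below are illustrative assumptions, not values from my setup:

```python
# Sketch: partial GPU offload with llama-cpp-python to stay within VRAM.
# model_path is a hypothetical local filename for the quantized Mixtral GGUF.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
    n_gpu_layers=10,  # offload only some layers; 0 = CPU only, -1 = all layers
    n_ctx=4096,       # a smaller context window also reduces VRAM usage
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

If 10 layers still runs out of memory, reducing the count further trades speed for VRAM until it fits.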
I have not encountered such a problem with llama.cpp built from source when running Mixtral; you might want to search the llama.cpp issues. I found this one: https://github.com/ggerganov/llama.cpp/issues/4452
Many thanks.
You are using Oobabooga, right?
I tried to compile llama.cpp myself when the previous Oobabooga release was not working out of the box with Mixtral, but the compiled lib was not picked up by Oobabooga. Maybe I misused the Conda env?
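One quick way to check whether Oobabooga is actually loading your build rather than the stock wheel (just a diagnostic sketch, assuming you run it inside the webui's Conda env):

```python
# Run inside Oobabooga's conda env to see which llama_cpp install Python loads.
# If __file__ points at the pip-installed wheel rather than your own build,
# your compiled lib is being shadowed by the packaged one.
import llama_cpp

print(llama_cpp.__file__)     # location of the package actually imported
print(llama_cpp.__version__)  # version string of that package
```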