Oobabooga GPU load error - Solved!
Hi,
When I try to use the GPU while loading the TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF model, I get this error message:
CUDA error 2 at /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:8955: out of memory
current device: 0
GGML_ASSERT: /home/runner/work/llama-cpp-python-cuBLAS-wheels/llama-cpp-python-cuBLAS-wheels/vendor/llama.cpp/ggml-cuda.cu:8955: !"CUDA error"
Could not attach to process. If your uid matches the uid of the target
process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try
again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: Operation not permitted.
No stack.
The program is not being run.
Aborted (core dumped)
Is this only happening to me (some local install problem) or for everyone (a llama.cpp issue)?
(With n-gpu-layers = 0, it loads the model and works perfectly, just "a bit" slow. :)
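For anyone hitting the same out-of-memory error, the usual workaround is to lower n-gpu-layers instead of disabling offload entirely. A minimal sketch of what that looks like with llama-cpp-python directly (the backend Oobabooga uses for GGUF models); the file name, layer count, and context size below are illustrative assumptions, not values from my setup:

```python
# Sketch: partial GPU offload with llama-cpp-python to stay within VRAM.
# model_path is a hypothetical local filename for the quantized Mixtral GGUF.
from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf",
    n_gpu_layers=10,  # offload only some layers; 0 = CPU only, -1 = all layers
    n_ctx=4096,       # a smaller context window also reduces VRAM usage
)

out = llm("Q: What is the capital of France? A:", max_tokens=16)
print(out["choices"][0]["text"])
```

If 10 layers still runs out of memory, reducing the count further trades speed for VRAM until it fits.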
I have not encountered such a problem with llama.cpp built from source when running Mixtral; you might want to search the llama.cpp issues. I found this one: https://github.com/ggerganov/llama.cpp/issues/4452
Many thanks.
You are using Oobabooga, right?
I tried to compile llama.cpp myself when the previous Oobabooga release was not working out of the box with Mixtral, but the compiled lib was not picked up by Oobabooga. Maybe I misused the Conda env?
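One quick way to check whether Oobabooga is actually loading your build rather than the stock wheel (just a diagnostic sketch, assuming you run it inside the webui's Conda env):

```python
# Run inside Oobabooga's conda env to see which llama_cpp install Python loads.
# If __file__ points at the pip-installed wheel rather than your own build,
# your compiled lib is being shadowed by the packaged one.
import llama_cpp

print(llama_cpp.__file__)     # location of the package actually imported
print(llama_cpp.__version__)  # version string of that package
```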