My Space's status says "Running", but then I get an error!
I think it just takes a while during busy times to get a GPU from the cluster. You just need to wait a bit.
Can we not use bitsandbytes with ZeroGPU?
In addition, I would like to know how to use Flash Attention with ZeroGPU!
- You can use bitsandbytes, I've used it myself.
- I'm not sure, haven't needed it.
Well, you still exceeded your quota. The quota applies to any kind of usage, because you are still using a costly GPU while debugging.
For the first problem:
Add accelerate to requirements.txt, use @spaces.GPU(queue=False), and use the default theme and UI (yes, the UI causes this issue; I found that out today).
For the second one:
Use @spaces.GPU(queue=False, duration=30) (the duration is given in seconds).
Choose a duration that meets your needs. However, if a query exceeds that duration, the task is terminated.
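For reference, here is a minimal sketch of how the decorator is typically wired into a ZeroGPU app. The model ID and function body are just placeholders, and only the documented duration argument is shown; whether queue=False is accepted may depend on your spaces package version:

import gradio as gr
import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
# On ZeroGPU the .to("cuda") below is intercepted; a real GPU is only
# attached while the decorated function is running.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

@spaces.GPU(duration=30)  # the task is terminated if it runs longer than 30 seconds
def generate(prompt: str) -> str:
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=64)
    return tok.decode(out[0], skip_special_tokens=True)

# Default theme and UI, as suggested above
gr.Interface(fn=generate, inputs="text", outputs="text").launch()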
Yeah, that's normal for the ZeroGPU runtime. It should still work; at least it worked for me.
I solved the issue of installing flash attention:
# Install flash-attn at runtime, skipping the local CUDA build so that a
# prebuilt wheel from the flash-attn authors is used instead (see the note further down)
import subprocess
subprocess.run('pip install flash-attn --no-build-isolation', env={'FLASH_ATTENTION_SKIP_CUDA_BUILD': "TRUE"}, shell=True)
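Once it is installed, enabling it is a separate step; here is a minimal sketch for a transformers model, assuming the checkpoint's architecture supports FlashAttention 2 (the model ID is just an example):

import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "microsoft/Phi-3-mini-4k-instruct",       # example model ID
    torch_dtype=torch.bfloat16,               # flash-attn requires fp16/bf16 weights
    attn_implementation="flash_attention_2",  # only takes effect if flash-attn imported cleanly
).to("cuda")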
I hope to install the causal-conv1d and mamba-ssm libraries too :)
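The same subprocess trick should carry over to those two packages, although I haven't tested it on ZeroGPU; the *_SKIP_CUDA_BUILD variable names below are assumptions modelled on the flash-attn one, so check each project's setup.py before relying on them:

import os
import subprocess

# Keep the existing environment and add the skip flag on top (assumed variable names)
subprocess.run('pip install causal-conv1d --no-build-isolation',
               env={**os.environ, 'CAUSAL_CONV1D_SKIP_CUDA_BUILD': "TRUE"}, shell=True)
subprocess.run('pip install mamba-ssm --no-build-isolation',
               env={**os.environ, 'MAMBA_SKIP_CUDA_BUILD': "TRUE"}, shell=True)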
For anyone who comes to this in the future: I found a better solution than installing flash-attn through a subprocess.
Go to flash-attention-prebuild-wheels and get the wheel that matches the torch/CUDA build you have in the Space, then just add it to your requirements.txt file. It should look something like this:
pyyaml
https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.8/flash_attn-2.7.4.post1+cu126torch2.7-cp310-cp310-linux_x86_64.whl
spconv-cu121
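If you are not sure which torch/CUDA/Python combination your Space has, a quick way to check is to print it once (for example at the top of app.py) and read the logs; the three values map onto the wheel filename:

import sys
import torch

print(torch.__version__)        # e.g. 2.7.0  -> the "torch2.7" part of the wheel name
print(torch.version.cuda)       # e.g. 12.6   -> "cu126"
print(sys.version_info[:2])     # e.g. (3, 10) -> "cp310"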
Why is this better?
- The subprocess command doesn't build flash-attn from scratch for your torch/CUDA build; instead it looks through the prebuilt wheels released by the flash-attn authors and, if it finds one that matches what you have, it pulls and installs it.
- The problem with this is that if there is no prebuilt wheel for your build (for me there was no flash-attn wheel for torch2.7+cu126), it will pull something else, and when you use it, it will throw an error and you won't know what caused it.
- ZeroGPU Spaces don't really provide a GPU for your Space; instead, your decorated function is moved to a GPU instance when it executes and moved back once it finishes. That's why any attempt to build flash-attn or any other GPU-based package from source while the Space runs will give you an error: there is no CUDA for it to build against.
- This leaves you with only one choice: find the build versions for your Space, fetch the matching prebuilt wheel, and install it. This won't throw an error, because CUDA is only checked at execution time, which happens on the GPU, so no problem should occur.
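To make the same point concrete, any real CUDA call belongs inside the decorated function, since that is the only code that actually runs with a GPU attached; a minimal sketch:

import spaces
import torch

@spaces.GPU
def gpu_info() -> str:
    # Only inside the decorated call is a real device attached, so this is
    # the safe place to query or use CUDA.
    return torch.cuda.get_device_name(0)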