Error!

#63 · opened by BK-Lee
ZeroGPU Explorers org

[screenshot of the error]

How can I solve this error?

I did add @spaces.GPU.
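
(For reference, a minimal sketch of how the decorator is usually placed in a Gradio ZeroGPU Space; the function body here is just a placeholder.)

import spaces
import gradio as gr

@spaces.GPU  # ZeroGPU attaches a GPU only while this function runs
def generate(prompt):
    # placeholder body; real inference would run the model here
    return prompt

gr.Interface(fn=generate, inputs="text", outputs="text").launch()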

ZeroGPU Explorers org

I think it just takes a while during busy times to get a GPU from the cluster. You just need to wait a bit.

ZeroGPU Explorers org

Can we not use bitsandbytes with ZeroGPU?

In addition, I would like to know how to use Flash Attention with ZeroGPU!

ZeroGPU Explorers org
  1. You can use bitsandbytes; I've used it myself.
  2. I'm not sure; I haven't needed it.
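
As a rough sketch of what loading a 4-bit bitsandbytes model on ZeroGPU could look like (the model id is a placeholder, this assumes a transformers-based model, and I haven't verified that every ZeroGPU runtime version handles module-level loading the same way):

import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"  # placeholder model id
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

@spaces.GPU  # inference runs on the GPU worker that ZeroGPU attaches for this call
def generate(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)
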
ZeroGPU Explorers org

[screenshot of the ZeroGPU quota message]

One more question! Is there any way to overcome this issue?

Is the only way just to wait?

I can't understand it; I only debugged the code with one text input: "hi"...

ZeroGPU Explorers org

Well, you still exceeded your quota. The quota is the same for any kind of usage, because even debugging uses a costly GPU.

ZeroGPU Explorers org

For the first problem:
Add accelerate to requirements.txt, use @spaces.GPU(queue=False), and use the default theme and UI (yes, the UI can cause this issue; I found that out today).

For the second one, use this:
@spaces.GPU(queue=False, duration=30)
Choose a duration that meets your needs. However, if a call exceeds this duration, the task will be terminated.
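
Put together, a minimal sketch might look like this (the function body is a placeholder; duration is in seconds):

# requirements.txt should include:
# accelerate

import spaces

@spaces.GPU(duration=30)  # queue=False, as suggested above, can be added if your spaces version supports it
def infer(prompt):
    # placeholder body; run the actual model here
    return prompt.upper()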

ZeroGPU Explorers org

[screenshot of the bitsandbytes warning]

Actually, I am worried about this warning from bitsandbytes! @Artples, did you happen to see it?

ZeroGPU Explorers org

Yeah, that's normal for the ZeroGPU runtime. It should still work; at least it worked for me.

ZeroGPU Explorers org

I solved the issue of installing Flash Attention:

# Flash Attention: install at runtime, skipping the CUDA kernel build
# (no GPU is visible while the Space is starting up on ZeroGPU)
import subprocess
subprocess.run('pip install flash-attn --no-build-isolation', env={'FLASH_ATTENTION_SKIP_CUDA_BUILD': "TRUE"}, shell=True)

I hope to install the causal-conv1d and mamba-ssm libraries too :)
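
If the same trick carries over to those packages (an untested assumption on my part; both build their own CUDA extensions, so the install may still fail if no prebuilt wheel matches the Space's torch/CUDA versions), the analogous call would be:

import subprocess
subprocess.run('pip install causal-conv1d mamba-ssm --no-build-isolation', shell=True)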

For anyone who comes to this in the future, I found a better solution than installing flash-attn through a subprocess.

Go to flash-attention-prebuild-wheels and get the wheel that matches the torch/CUDA build you have in the Space, then just add it to your requirements.txt file. It should look something like this:

pyyaml
https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.8/flash_attn-2.7.4.post1+cu126torch2.7-cp310-cp310-linux_x86_64.whl
spconv-cu121

Why is this better?

  • The subprocess command doesn't build flash-attn from scratch for your torch/CUDA build; instead it looks through the prebuilt wheels released by the flash-attn authors, and if it finds one that matches what you have, it pulls it and installs it.
  • The problem with this is that if there is no prebuilt wheel for your build (for me there was no flash-attn wheel for torch 2.7 + cu126), it will grab something else, and when you use it, it will throw an error and you won't know what caused it.
  • ZeroGPU Spaces don't really provide a GPU for your Space; instead, your decorated function is moved to a GPU instance while it executes and back to where it was after it finishes. That's why any attempt to build flash-attn or any other GPU-based package while the Space is running will give you an error: there is no CUDA available to build against.
  • This leaves you with only one choice: find your Space's build versions (see the snippet below), fetch the matching prebuilt wheel, and install it. This will not throw an error, because CUDA is only checked at execution time, which happens on the GPU, so no problem should occur.
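
A quick way to check which wheel to pick is to print the versions from inside the Space (a small sketch; read the values from the logs and match them against the wheel filename):

import sys
import torch

# e.g. prints "2.7.0 12.6 3.10" -> pick a wheel named ...cu126torch2.7-cp310...
print(torch.__version__, torch.version.cuda, f"{sys.version_info.major}.{sys.version_info.minor}")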
