Error!

#63 · opened by BK-Lee
ZeroGPU Explorers org

[screenshot of the error]

How can I solve this error?

I did add @spaces.GPU.
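
(For reference, a minimal sketch of how the decorator is usually placed in a Gradio ZeroGPU Space; the function body here is just a placeholder.)

import spaces
import gradio as gr

@spaces.GPU  # ZeroGPU attaches a GPU only while this function runs
def generate(prompt):
    # placeholder body; real inference would run the model here
    return prompt

gr.Interface(fn=generate, inputs="text", outputs="text").launch()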

ZeroGPU Explorers org

I think it just takes a while during busy times to get a GPU from the cluster. You just need to wait a bit.

ZeroGPU Explorers org

Can we not use bitsandbytes with ZeroGPU?

In addition, I would like to know how to use Flash Attention with ZeroGPU!

ZeroGPU Explorers org
  1. You can use bitsandbytes; I've used it myself.
  2. I'm not sure; I haven't needed it.
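
As a rough sketch of what loading a 4-bit bitsandbytes model on ZeroGPU could look like (the model id is a placeholder, this assumes a transformers-based model, and I haven't verified that every ZeroGPU runtime version handles module-level loading the same way):

import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "your-org/your-model"  # placeholder model id
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

@spaces.GPU  # inference runs on the GPU worker that ZeroGPU attaches for this call
def generate(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)
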
ZeroGPU Explorers org

[screenshot of the ZeroGPU quota message]

One more question! Is there any way to overcome this issue?

Is the only way just to wait?

I can't understand it; I only debugged the code with one text input: "hi"...

ZeroGPU Explorers org

Well, you still exceeded your quota. The quota is the same for any kind of usage, because even debugging uses a costly GPU.

ZeroGPU Explorers org

For the first problem:
Add accelerate to requirements.txt, use @spaces.GPU(queue=False), and use the default theme and UI (yes, the UI can cause this issue; I found that out today).

For the second one, use this:
@spaces.GPU(queue=False, duration=30)
Choose a duration that meets your needs. However, if a call exceeds this duration, the task will be terminated.
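
Put together, a minimal sketch might look like this (the function body is a placeholder; duration is in seconds):

# requirements.txt should include:
# accelerate

import spaces

@spaces.GPU(duration=30)  # queue=False, as suggested above, can be added if your spaces version supports it
def infer(prompt):
    # placeholder body; run the actual model here
    return prompt.upper()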

ZeroGPU Explorers org

[screenshot of the bitsandbytes warning]

Actually, I am worried about this warning from bitsandbytes! @Artples, did you happen to see it?

ZeroGPU Explorers org

Yeah, that's normal for the ZeroGPU runtime. It should still work; at least it worked for me.

ZeroGPU Explorers org

I solved the issue of installing Flash Attention:

# Flash Attention: install at runtime, skipping the CUDA kernel build
# (no GPU is visible while the Space is starting up on ZeroGPU)
import subprocess
subprocess.run('pip install flash-attn --no-build-isolation', env={'FLASH_ATTENTION_SKIP_CUDA_BUILD': "TRUE"}, shell=True)

I hope to install the causal-conv1d and mamba-ssm libraries too :)
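
If the same trick carries over to those packages (an untested assumption on my part; both build their own CUDA extensions, so the install may still fail if no prebuilt wheel matches the Space's torch/CUDA versions), the analogous call would be:

import subprocess
subprocess.run('pip install causal-conv1d mamba-ssm --no-build-isolation', shell=True)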

For anyone who comes to this in the future, I found a better solution than installing flash-attn through a subprocess.

Go to flash-attention-prebuild-wheels and get the wheel that matches the torch/CUDA build you have in the Space, then just add it to your requirements.txt file. It should look something like this:

pyyaml
https://github.com/mjun0812/flash-attention-prebuild-wheels/releases/download/v0.0.8/flash_attn-2.7.4.post1+cu126torch2.7-cp310-cp310-linux_x86_64.whl
spconv-cu121

Why is this better?

  • The subprocess command doesn't build flash-attn from scratch for your torch/CUDA build; instead it looks through the prebuilt wheels released by the flash-attn authors, and if it finds one that matches what you have, it pulls it and installs it.
  • The problem with this is that if there is no prebuilt wheel for your build (for me there was no flash-attn wheel for torch 2.7 + cu126), it will grab something else, and when you use it, it will throw an error and you won't know what caused it.
  • ZeroGPU Spaces don't really provide a GPU for your Space; instead, your decorated function is moved to a GPU instance while it executes and back to where it was after it finishes. That's why any attempt to build flash-attn or any other GPU-based package while the Space is running will give you an error: there is no CUDA available to build against.
  • This leaves you with only one choice: find your Space's build versions (see the snippet below), fetch the matching prebuilt wheel, and install it. This will not throw an error, because CUDA is only checked at execution time, which happens on the GPU, so no problem should occur.
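
A quick way to check which wheel to pick is to print the versions from inside the Space (a small sketch; read the values from the logs and match them against the wheel filename):

import sys
import torch

# e.g. prints "2.7.0 12.6 3.10" -> pick a wheel named ...cu126torch2.7-cp310...
print(torch.__version__, torch.version.cuda, f"{sys.version_info.major}.{sys.version_info.minor}")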
