RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1695392020201/work/c10/cuda/CUDACachingAllocator.cpp":1154, please report a bug to PyTorch.

#1
by stzhao - opened

This is a ZeroGPU Space for a research project of ours that is about to be released. In this Space, I first run a 14B prompt enhancer and then a 2B text-to-image model. But when the denoised latent tensor is sent to the VAE decoder, I get this error:

RuntimeError: NVML_SUCCESS == r INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1695392020201/work/c10/cuda/CUDACachingAllocator.cpp":1154, please report a bug to PyTorch.

Now I have run out of my quota and cannot debug, so I'd like to ask the community for help. @hysts

@hysts I really need your help ;)

Hi @stzhao, this error is usually raised when a CUDA OOM occurs.
But your Space seems to work fine for me.

It might be unrelated, but it seems you are calling .to("cuda") inside functions decorated with @spaces.GPU (https://huggingface.co/spaces/stzhao/LeX-Lumina/blob/f25d2fbc1f356718c4e9ed12c23a61395d28b9d3/app.py#L46), whereas it's recommended to call it in the global context. For example, https://huggingface.co/spaces/black-forest-labs/FLUX.1-dev/blob/2f733451dcd2c6690953bf03ced2b9d89e6546f3/app.py#L10-L15.
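Roughly like this (a minimal sketch assuming a diffusers-style pipeline; the model id and function name are just placeholders):

```python
import spaces
import torch
from diffusers import DiffusionPipeline

# Load the model and move it to CUDA in the global context,
# not inside the @spaces.GPU-decorated function.
pipe = DiffusionPipeline.from_pretrained(
    "your-org/your-t2i-model",  # placeholder model id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")


@spaces.GPU
def generate(prompt: str):
    # Only run inference here; no .to("cuda") calls inside.
    return pipe(prompt).images[0]
```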

Thanks for the advice, I'll check it and modify the code. Yes, the Space works fine when the prompt enhancer is not enabled, but if you enable the prompt enhancer in the advanced settings, you'll get the error I mentioned above. Could you give it a try and see how to fix it? Thank you so much!

Ah, I see. Yeah, now I'm getting the error.
BTW, unrelated to the CUDA OOM issue, gr.Textbox.update in this line needs to be replaced with gr.Textbox.
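That is, something like this (a sketch; the function and variable names are just for illustration):

```python
import gradio as gr

def show_enhanced_prompt(enhanced_prompt: str):
    # Gradio 3.x style, no longer supported in Gradio 4+:
    # return gr.Textbox.update(value=enhanced_prompt)

    # Gradio 4+: return a new component instance instead.
    return gr.Textbox(value=enhanced_prompt)
```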

OK, I fixed the gr.Textbox.update bug. But I met another error: sometimes, when running the denoising process, I get this error.

[screenshot: GPU task aborted error]

The GPU task aborted error is raised when your function decorated with @spaces.GPU takes longer than the specified duration, so you might want to adjust it.
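For example (120 is just an illustrative value; pick whatever matches your actual runtime):

```python
import spaces

# Allow up to 120 seconds of GPU time for this call instead of the default.
@spaces.GPU(duration=120)
def generate(prompt: str):
    ...
```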

BTW, regarding the CUDA OOM issue, maybe you can create a separate Space for the enhancer and call it from the main Space via the Gradio API (gradio-client).
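Something along these lines from the main Space (a sketch; the Space id and api_name are assumptions about how the enhancer Space would be set up):

```python
from gradio_client import Client

# Placeholder Space id and endpoint name; adjust to the actual enhancer Space.
enhancer = Client("stzhao/LeX-Enhancer")

def enhance_prompt(prompt: str) -> str:
    # This runs on the enhancer Space's GPU, so the 14B model
    # no longer competes with the t2i model for memory here.
    return enhancer.predict(prompt, api_name="/enhance")
```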

Thank you for the advice, I'll give it a try.

[screenshot: gradio-client error]
I tried your advice and ran into this error. Could you help me with it? :)

Ah, I've forgotten that it's a bit tricky to call ZeroGPU Spaces using gradio-client. Can you try this?

Thank you, let me check this document.

@stzhao I think I've finally figured out the weird gradio-client error. Looks like it's caused by SSR, which is enabled by default on HF Spaces. Could you try setting the GRADIO_SSR_MODE environment variable to False and see if it fixes the issue?

You can set environment variables from the Space Settings.
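Alternatively, if you'd rather do it in code than in the Settings UI, I believe you can also disable SSR at launch time (assuming a recent Gradio 5.x, where `demo` is the usual Blocks instance in app.py):

```python
# Equivalent to setting GRADIO_SSR_MODE=False in the Space settings.
demo.launch(ssr_mode=False)
```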
