gemma-3-27b-it-FP8 sometimes crashes

#90
by mondaylord - opened

When I run this model with vLLM v0.10.1 and CUDA 12.8, it works at first.
But after about 30 minutes it crashes with the error below. The crash does not happen every time, so I can't reproduce it reliably, and I don't know what could be wrong. Please take a look.
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Error in chat completion stream generator.

My error is similar to this issue: https://github.com/vllm-project/vllm/issues/21708. However, in my case the model works fine at first and only crashes after some time (around 30 minutes).
