gemma-3-27b-it-FP8 sometimes crashes
#90 by mondaylord - opened
When I run this model with vLLM v0.10.1 and CUDA 12.8, it works at first.
But after about 30 minutes it crashes with the error below. I can't reproduce it every time, and I don't know what is going wrong. Please take a look.
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
Error in chat completion stream generator.
My error looks similar to https://github.com/vllm-project/vllm/issues/21708. However, in my case the model works fine at first and only crashes after some time (around 30 minutes).
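Following the hint in the CUDA error message, one way to get a more trustworthy stack trace on the next crash might be to relaunch the server with CUDA_LAUNCH_BLOCKING=1. Below is a minimal sketch, not a confirmed fix. It assumes the server is started with the `vllm serve` command; the helper names `debug_env` and `launch` are hypothetical, and the model argument is a placeholder for whatever path or repo id is actually used.

```python
import os
import subprocess


def debug_env(base=None):
    """Return a copy of the environment with synchronous CUDA launches enabled."""
    env = dict(os.environ if base is None else base)
    # Taken from the error message: with this set, kernel launches block until
    # completion, so the failing call shows up in the stack trace.
    # This usually slows inference down, so it is meant for debugging only.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def launch(model, extra_args=()):
    """Start the server as a child process with the debug environment.

    Assumption: the server is started via `vllm serve <model>`; replace the
    command with the one you actually use.
    """
    cmd = ["vllm", "serve", model, *extra_args]
    return subprocess.Popen(cmd, env=debug_env())
```

Calling `launch("<model path or repo id>").wait()` would then run the server until it exits. Since the crash only shows up after roughly 30 minutes, the run would need to be left going at least that long for the trace to be captured.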