The example code does not run.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion probability tensor contains either inf, nan or element < 0 failed.
I get this error. What could be causing it?
Please install the latest transformers package and a torch version that matches your machine, then try running it again.
pip install -U transformers
https://pytorch.org/get-started/previous-versions/
Could you share the version requirements?
There are no separate requirements; please try the latest versions.
Below is the pip list from our environment, where the code runs correctly.
Package Version
------------------------ ------------
accelerate 1.6.0
certifi 2025.1.31
charset-normalizer 3.4.1
filelock 3.13.1
fsspec 2024.6.1
huggingface-hub 0.30.2
idna 3.10
Jinja2 3.1.4
MarkupSafe 2.1.5
mpmath 1.3.0
networkx 3.3
numpy 2.1.2
nvidia-cublas-cu12 12.4.5.8
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127
nvidia-cudnn-cu12 9.1.0.70
nvidia-cufft-cu12 11.2.1.3
nvidia-curand-cu12 10.3.5.147
nvidia-cusolver-cu12 11.6.1.9
nvidia-cusparse-cu12 12.3.1.170
nvidia-cusparselt-cu12 0.6.2
nvidia-nccl-cu12 2.21.5
nvidia-nvjitlink-cu12 12.4.127
nvidia-nvtx-cu12 12.4.127
packaging 24.2
pillow 11.0.0
pip 22.0.2
psutil 7.0.0
PyYAML 6.0.2
regex 2024.11.6
requests 2.32.3
safetensors 0.5.3
setuptools 59.6.0
sympy 1.13.1
tokenizers 0.21.1
torch 2.6.0+cu124
torchaudio 2.6.0+cu124
torchvision 0.21.0+cu124
tqdm 4.67.1
transformers 4.51.3
triton 3.2.0
typing_extensions 4.12.2
urllib3 2.4.0
Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
accelerate 1.6.0 pypi_0 pypi
bzip2 1.0.8 h4bc722e_7 conda-forge
ca-certificates 2025.1.31 hbcca054_0 conda-forge
certifi 2025.1.31 pypi_0 pypi
charset-normalizer 3.4.1 pypi_0 pypi
filelock 3.13.1 pypi_0 pypi
fsspec 2024.6.1 pypi_0 pypi
huggingface-hub 0.30.2 pypi_0 pypi
idna 3.10 pypi_0 pypi
jinja2 3.1.4 pypi_0 pypi
ld_impl_linux-64 2.43 h712a8e2_4 conda-forge
libexpat 2.7.0 h5888daf_0 conda-forge
libffi 3.4.6 h2dba641_1 conda-forge
libgcc 14.2.0 h767d61c_2 conda-forge
libgcc-ng 14.2.0 h69a702a_2 conda-forge
libgomp 14.2.0 h767d61c_2 conda-forge
liblzma 5.8.1 hb9d3cd8_0 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libsqlite 3.49.1 hee588c1_2 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libzlib 1.3.1 hb9d3cd8_2 conda-forge
markupsafe 2.1.5 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
ncurses 6.5 h2d0b736_3 conda-forge
networkx 3.3 pypi_0 pypi
numpy 2.1.2 pypi_0 pypi
nvidia-cublas-cu12 12.4.5.8 pypi_0 pypi
nvidia-cuda-cupti-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-nvrtc-cu12 12.4.127 pypi_0 pypi
nvidia-cuda-runtime-cu12 12.4.127 pypi_0 pypi
nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
nvidia-cufft-cu12 11.2.1.3 pypi_0 pypi
nvidia-curand-cu12 10.3.5.147 pypi_0 pypi
nvidia-cusolver-cu12 11.6.1.9 pypi_0 pypi
nvidia-cusparse-cu12 12.3.1.170 pypi_0 pypi
nvidia-cusparselt-cu12 0.6.2 pypi_0 pypi
nvidia-nccl-cu12 2.21.5 pypi_0 pypi
nvidia-nvjitlink-cu12 12.4.127 pypi_0 pypi
nvidia-nvtx-cu12 12.4.127 pypi_0 pypi
openssl 3.5.0 h7b32b05_0 conda-forge
packaging 24.2 pypi_0 pypi
pillow 11.0.0 pypi_0 pypi
pip 22.0.2 pypi_0 pypi
psutil 7.0.0 pypi_0 pypi
python 3.12.10 h9e4cc4f_0_cpython conda-forge
pyyaml 6.0.2 pypi_0 pypi
readline 8.2 h8c095d6_2 conda-forge
regex 2024.11.6 pypi_0 pypi
requests 2.32.3 pypi_0 pypi
safetensors 0.5.3 pypi_0 pypi
setuptools 59.6.0 pypi_0 pypi
sympy 1.13.1 pypi_0 pypi
tk 8.6.13 noxft_h4845f30_101 conda-forge
tokenizers 0.21.1 pypi_0 pypi
torch 2.6.0+cu124 pypi_0 pypi
torchaudio 2.6.0+cu124 pypi_0 pypi
torchvision 0.21.0+cu124 pypi_0 pypi
tqdm 4.67.1 pypi_0 pypi
transformers 4.51.3 pypi_0 pypi
triton 3.2.0 pypi_0 pypi
typing-extensions 4.12.2 pypi_0 pypi
tzdata 2025b h78e105d_0 conda-forge
urllib3 2.4.0 pypi_0 pypi
wheel 0.45.1 pyhd8ed1ab_1 conda-forge
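To compare a local environment against the listing above, something like the following sketch may help. It uses only the standard library's importlib.metadata. The selection of five packages and the helper names (KNOWN_GOOD, installed_version, compare_environment) are assumptions made for illustration; the version strings are copied from the listing.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the working environment listed above.
# Which packages to check is an illustrative choice, not an official requirement.
KNOWN_GOOD = {
    "torch": "2.6.0+cu124",
    "transformers": "4.51.3",
    "accelerate": "1.6.0",
    "tokenizers": "0.21.1",
    "safetensors": "0.5.3",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def compare_environment(expected=KNOWN_GOOD):
    """Map each package to (expected, installed) wherever the two differ."""
    mismatches = {}
    for package, wanted in expected.items():
        found = installed_version(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for package, (wanted, found) in compare_environment().items():
        print(f"{package}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means the checked packages match the listing exactly. A mismatch is not necessarily the cause of the error, only a difference worth ruling out.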
I installed everything as listed above, but unfortunately I still get the error below.
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:110: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion probability tensor contains either inf, nan or element < 0 failed.
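The thread does not identify the cause, but the assertion text says what is being checked: the probabilities passed to the sampling step contain inf, nan, or a negative value. The sketch below is a standard-library illustration of that check and of the CUDA_LAUNCH_BLOCKING=1 hint from the error message. It is not code from the example script; the helper names describe_invalid_probs and softmax are hypothetical, and the input is assumed to be a flat list of floats (for a tensor, tensor.flatten().tolist() would produce one).

```python
import math
import os

# Suggested by the error message itself: make CUDA kernel launches synchronous
# so the Python stack trace points at the call that actually failed.
# To be safe, set this before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def describe_invalid_probs(probs):
    """Count the three conditions named in the assertion message.

    `probs` is any iterable of floats. Each value is counted once:
    nan first, then +/-inf, then finite negative values.
    """
    counts = {"nan": 0, "inf": 0, "negative": 0}
    for p in probs:
        if math.isnan(p):
            counts["nan"] += 1
        elif math.isinf(p):
            counts["inf"] += 1
        elif p < 0:
            counts["negative"] += 1
    return counts


def softmax(logits):
    """Plain softmax, used here only to show how one bad logit spreads."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    healthy = softmax([1.0, 0.5, 2.0])
    poisoned = softmax([1.0, float("nan"), 2.0])
    print("healthy :", describe_invalid_probs(healthy))
    print("poisoned:", describe_invalid_probs(poisoned))
```

A single nan logit makes every probability nan after normalization, so a check like this is more useful when applied to the model's logits just before sampling than to the final probabilities alone.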