ClearML Task: overwriting (reusing) task id=c1393538e4394832aeb1fbefd1138499
2025-05-15 22:51:31,007 - clearml.Task - INFO - No repository found, storing script code instead
/home/alexandre/spft/axolotl/venv/lib/python3.10/site-packages/google/auth/_default.py:76: UserWarning: Your application has authenticated using end user credentials from Google Cloud SDK without a quota project. You might receive a "quota exceeded" or "API not enabled" error. See the following page for troubleshooting: https://cloud.google.com/docs/authentication/adc-troubleshooting/user-creds.
  warnings.warn(_CLOUD_SDK_CREDENTIALS_WARNING)
[2025-05-15 22:51:38,558] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:38,625] [INFO] [root.spawn:77] [PID:2204328] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpyoys4b8u/test.c -o /tmp/tmpyoys4b8u/test.o
[2025-05-15 22:51:38,641] [INFO] [root.spawn:77] [PID:2204328] x86_64-linux-gnu-gcc /tmp/tmpyoys4b8u/test.o -laio -o /tmp/tmpyoys4b8u/a.out
[2025-05-15 22:51:38,657] [INFO] [root.spawn:77] [PID:2204328] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpy50ah3qc/test.c -o /tmp/tmpy50ah3qc/test.o
[2025-05-15 22:51:38,672] [INFO] [root.spawn:77] [PID:2204328] x86_64-linux-gnu-gcc /tmp/tmpy50ah3qc/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmpy50ah3qc/a.out
[2025-05-15 22:51:38,708] [INFO] [root.spawn:77] [PID:2204328] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpyicz15p2/test.c -o /tmp/tmpyicz15p2/test.o
[2025-05-15 22:51:38,722] [INFO] [root.spawn:77] [PID:2204328] x86_64-linux-gnu-gcc /tmp/tmpyicz15p2/test.o -laio -o /tmp/tmpyicz15p2/a.out
WARNING:accelerate.commands.launch:The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `8`
        More than one GPU was found, enabling multi-GPU training. If this was unintended please pass in `--num_processes=1`.
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
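The launcher warning above goes away if the values are passed explicitly instead of being left to defaults. A minimal sketch, assuming the usual `accelerate launch -m axolotl.cli.train <config>` style of invocation; the config path below is hypothetical and not taken from this log:

```python
import subprocess

# Hypothetical explicit launch mirroring the defaults the warning reported;
# adjust the flags and the config path to your own setup.
subprocess.run(
    [
        "accelerate", "launch",
        "--num_processes", "8",
        "--num_machines", "1",
        "--mixed_precision", "no",
        "--dynamo_backend", "no",
        "-m", "axolotl.cli.train", "config.yaml",
    ],
    check=True,
)
```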
[2025-05-15 22:51:50,065] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,096] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,125] [INFO] [root.spawn:77] [PID:2204919] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp73lpcrwx/test.c -o /tmp/tmp73lpcrwx/test.o
[2025-05-15 22:51:50,125] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,140] [INFO] [root.spawn:77] [PID:2204919] x86_64-linux-gnu-gcc /tmp/tmp73lpcrwx/test.o -laio -o /tmp/tmp73lpcrwx/a.out
[2025-05-15 22:51:50,143] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,151] [INFO] [root.spawn:77] [PID:2204919] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp5yxnhm2u/test.c -o /tmp/tmp5yxnhm2u/test.o
[2025-05-15 22:51:50,157] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,159] [INFO] [root.spawn:77] [PID:2204920] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpsdmi593h/test.c -o /tmp/tmpsdmi593h/test.o
[2025-05-15 22:51:50,166] [INFO] [root.spawn:77] [PID:2204919] x86_64-linux-gnu-gcc /tmp/tmp5yxnhm2u/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp5yxnhm2u/a.out
[2025-05-15 22:51:50,176] [INFO] [root.spawn:77] [PID:2204920] x86_64-linux-gnu-gcc /tmp/tmpsdmi593h/test.o -laio -o /tmp/tmpsdmi593h/a.out
[2025-05-15 22:51:50,184] [INFO] [root.spawn:77] [PID:2204915] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmphcv_cs1x/test.c -o /tmp/tmphcv_cs1x/test.o
[2025-05-15 22:51:50,190] [INFO] [root.spawn:77] [PID:2204920] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpp6f5v3z3/test.c -o /tmp/tmpp6f5v3z3/test.o
[2025-05-15 22:51:50,199] [INFO] [root.spawn:77] [PID:2204915] x86_64-linux-gnu-gcc /tmp/tmphcv_cs1x/test.o -laio -o /tmp/tmphcv_cs1x/a.out
[2025-05-15 22:51:50,200] [INFO] [root.spawn:77] [PID:2204919] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp_xn797lf/test.c -o /tmp/tmp_xn797lf/test.o
[2025-05-15 22:51:50,203] [INFO] [root.spawn:77] [PID:2204917] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpxd7k_vil/test.c -o /tmp/tmpxd7k_vil/test.o
[2025-05-15 22:51:50,205] [INFO] [root.spawn:77] [PID:2204920] x86_64-linux-gnu-gcc /tmp/tmpp6f5v3z3/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmpp6f5v3z3/a.out
[2025-05-15 22:51:50,206] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,210] [INFO] [root.spawn:77] [PID:2204915] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpmxr0y1zz/test.c -o /tmp/tmpmxr0y1zz/test.o
[2025-05-15 22:51:50,214] [INFO] [root.spawn:77] [PID:2204919] x86_64-linux-gnu-gcc /tmp/tmp_xn797lf/test.o -laio -o /tmp/tmp_xn797lf/a.out
[2025-05-15 22:51:50,216] [INFO] [root.spawn:77] [PID:2204916] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpwntzrrzu/test.c -o /tmp/tmpwntzrrzu/test.o
[2025-05-15 22:51:50,218] [INFO] [root.spawn:77] [PID:2204917] x86_64-linux-gnu-gcc /tmp/tmpxd7k_vil/test.o -laio -o /tmp/tmpxd7k_vil/a.out
[2025-05-15 22:51:50,223] [INFO] [root.spawn:77] [PID:2204915] x86_64-linux-gnu-gcc /tmp/tmpmxr0y1zz/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmpmxr0y1zz/a.out
[2025-05-15 22:51:50,229] [INFO] [root.spawn:77] [PID:2204917] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp2sx66j8j/test.c -o /tmp/tmp2sx66j8j/test.o
[2025-05-15 22:51:50,231] [INFO] [root.spawn:77] [PID:2204916] x86_64-linux-gnu-gcc /tmp/tmpwntzrrzu/test.o -laio -o /tmp/tmpwntzrrzu/a.out
[2025-05-15 22:51:50,240] [INFO] [root.spawn:77] [PID:2204920] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp0lj7aic7/test.c -o /tmp/tmp0lj7aic7/test.o
[2025-05-15 22:51:50,242] [INFO] [root.spawn:77] [PID:2204916] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp12nb7ydo/test.c -o /tmp/tmp12nb7ydo/test.o
[2025-05-15 22:51:50,246] [INFO] [root.spawn:77] [PID:2204917] x86_64-linux-gnu-gcc /tmp/tmp2sx66j8j/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp2sx66j8j/a.out
[2025-05-15 22:51:50,255] [INFO] [root.spawn:77] [PID:2204916] x86_64-linux-gnu-gcc /tmp/tmp12nb7ydo/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp12nb7ydo/a.out
[2025-05-15 22:51:50,257] [INFO] [root.spawn:77] [PID:2204915] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpwgk7ni43/test.c -o /tmp/tmpwgk7ni43/test.o
[2025-05-15 22:51:50,258] [INFO] [root.spawn:77] [PID:2204920] x86_64-linux-gnu-gcc /tmp/tmp0lj7aic7/test.o -laio -o /tmp/tmp0lj7aic7/a.out
[2025-05-15 22:51:50,267] [INFO] [root.spawn:77] [PID:2204918] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp799bphcy/test.c -o /tmp/tmp799bphcy/test.o
[2025-05-15 22:51:50,271] [INFO] [root.spawn:77] [PID:2204915] x86_64-linux-gnu-gcc /tmp/tmpwgk7ni43/test.o -laio -o /tmp/tmpwgk7ni43/a.out
[2025-05-15 22:51:50,281] [INFO] [root.spawn:77] [PID:2204917] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpqvf_g8ly/test.c -o /tmp/tmpqvf_g8ly/test.o
[2025-05-15 22:51:50,283] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,286] [INFO] [root.spawn:77] [PID:2204918] x86_64-linux-gnu-gcc /tmp/tmp799bphcy/test.o -laio -o /tmp/tmp799bphcy/a.out
[2025-05-15 22:51:50,290] [INFO] [root.spawn:77] [PID:2204916] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpft_h_ct7/test.c -o /tmp/tmpft_h_ct7/test.o
[2025-05-15 22:51:50,294] [INFO] [root.spawn:77] [PID:2204917] x86_64-linux-gnu-gcc /tmp/tmpqvf_g8ly/test.o -laio -o /tmp/tmpqvf_g8ly/a.out
[2025-05-15 22:51:50,300] [INFO] [root.spawn:77] [PID:2204918] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmplgh8faiu/test.c -o /tmp/tmplgh8faiu/test.o
[2025-05-15 22:51:50,304] [INFO] [root.spawn:77] [PID:2204916] x86_64-linux-gnu-gcc /tmp/tmpft_h_ct7/test.o -laio -o /tmp/tmpft_h_ct7/a.out
[2025-05-15 22:51:50,315] [INFO] [root.spawn:77] [PID:2204918] x86_64-linux-gnu-gcc /tmp/tmplgh8faiu/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmplgh8faiu/a.out
[2025-05-15 22:51:50,330] [INFO] [real_accelerator.py:219:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-05-15 22:51:50,344] [INFO] [root.spawn:77] [PID:2204921] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpl_q76i03/test.c -o /tmp/tmpl_q76i03/test.o
[2025-05-15 22:51:50,350] [INFO] [root.spawn:77] [PID:2204918] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpd863qkgz/test.c -o /tmp/tmpd863qkgz/test.o
[2025-05-15 22:51:50,358] [INFO] [root.spawn:77] [PID:2204921] x86_64-linux-gnu-gcc /tmp/tmpl_q76i03/test.o -laio -o /tmp/tmpl_q76i03/a.out
[2025-05-15 22:51:50,364] [INFO] [root.spawn:77] [PID:2204918] x86_64-linux-gnu-gcc /tmp/tmpd863qkgz/test.o -laio -o /tmp/tmpd863qkgz/a.out
[2025-05-15 22:51:50,373] [INFO] [root.spawn:77] [PID:2204921] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmp011kwcbo/test.c -o /tmp/tmp011kwcbo/test.o
[2025-05-15 22:51:50,389] [INFO] [root.spawn:77] [PID:2204921] x86_64-linux-gnu-gcc /tmp/tmp011kwcbo/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmp011kwcbo/a.out
[2025-05-15 22:51:50,393] [INFO] [root.spawn:77] [PID:2204922] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpju0cfv10/test.c -o /tmp/tmpju0cfv10/test.o
[2025-05-15 22:51:50,408] [INFO] [root.spawn:77] [PID:2204922] x86_64-linux-gnu-gcc /tmp/tmpju0cfv10/test.o -laio -o /tmp/tmpju0cfv10/a.out
[2025-05-15 22:51:50,420] [INFO] [root.spawn:77] [PID:2204922] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpebubdx4f/test.c -o /tmp/tmpebubdx4f/test.o
[2025-05-15 22:51:50,425] [INFO] [root.spawn:77] [PID:2204921] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmplkjb4xt5/test.c -o /tmp/tmplkjb4xt5/test.o
[2025-05-15 22:51:50,435] [INFO] [root.spawn:77] [PID:2204922] x86_64-linux-gnu-gcc /tmp/tmpebubdx4f/test.o -L/usr/local/cuda -L/usr/local/cuda/lib64 -lcufile -o /tmp/tmpebubdx4f/a.out
[2025-05-15 22:51:50,440] [INFO] [root.spawn:77] [PID:2204921] x86_64-linux-gnu-gcc /tmp/tmplkjb4xt5/test.o -laio -o /tmp/tmplkjb4xt5/a.out
[2025-05-15 22:51:50,470] [INFO] [root.spawn:77] [PID:2204922] x86_64-linux-gnu-gcc -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -c /tmp/tmpr_6_rwf8/test.c -o /tmp/tmpr_6_rwf8/test.o
[2025-05-15 22:51:50,484] [INFO] [root.spawn:77] [PID:2204922] x86_64-linux-gnu-gcc /tmp/tmpr_6_rwf8/test.o -laio -o /tmp/tmpr_6_rwf8/a.out
[2025-05-15 22:51:50,650] [INFO] [datasets.:54] [PID:2204919] PyTorch version 2.7.0 available.
[2025-05-15 22:51:50,673] [INFO] [datasets.:54] [PID:2204920] PyTorch version 2.7.0 available.
[2025-05-15 22:51:50,696] [INFO] [datasets.:54] [PID:2204917] PyTorch version 2.7.0 available.
[2025-05-15 22:51:50,697] [INFO] [datasets.:54] [PID:2204915] PyTorch version 2.7.0 available.
[2025-05-15 22:51:50,714] [INFO] [datasets.:54] [PID:2204916] PyTorch version 2.7.0 available.
[2025-05-15 22:51:50,776] [INFO] [datasets.:54] [PID:2204918] PyTorch version 2.7.0 available.
[2025-05-15 22:51:50,880] [INFO] [datasets.:54] [PID:2204921] PyTorch version 2.7.0 available.
[2025-05-15 22:51:50,890] [INFO] [datasets.:54] [PID:2204922] PyTorch version 2.7.0 available.
[2025-05-15 22:51:52,153] [INFO] [root.register:348] [PID:2204915] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,154] [INFO] [root.register:351] [PID:2204915] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,154] [INFO] [root.register:348] [PID:2204915] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,184] [INFO] [root.register:348] [PID:2204919] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,186] [INFO] [root.register:351] [PID:2204919] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,186] [INFO] [root.register:348] [PID:2204919] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,188] [INFO] [root.register:348] [PID:2204920] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,191] [INFO] [root.register:351] [PID:2204920] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,191] [INFO] [root.register:348] [PID:2204920] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,227] [INFO] [root.register:348] [PID:2204917] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,229] [INFO] [root.register:351] [PID:2204917] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,229] [INFO] [root.register:348] [PID:2204917] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,237] [INFO] [root.register:351] [PID:2204915] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,277] [INFO] [root.register:351] [PID:2204919] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,282] [INFO] [root.register:351] [PID:2204920] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,296] [INFO] [root.register:348] [PID:2204916] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,298] [INFO] [root.register:351] [PID:2204916] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,298] [INFO] [root.register:348] [PID:2204916] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,318] [INFO] [root.register:351] [PID:2204917] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,348] [INFO] [root.register:348] [PID:2204922] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,349] [INFO] [root.register:351] [PID:2204922] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,349] [INFO] [root.register:348] [PID:2204922] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,353] [INFO] [root.register:348] [PID:2204918] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,355] [INFO] [root.register:351] [PID:2204918] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,355] [INFO] [root.register:348] [PID:2204918] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,390] [INFO] [root.register:351] [PID:2204916] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,433] [INFO] [root.register:351] [PID:2204922] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,441] [INFO] [root.register:351] [PID:2204918] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,443] [INFO] [root.register:348] [PID:2204921] Attempting to load plugin: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,445] [INFO] [root.register:351] [PID:2204921] Plugin loaded successfully: axolotl.integrations.liger.LigerPlugin
[2025-05-15 22:51:52,445] [INFO] [root.register:348] [PID:2204921] Attempting to load plugin: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:52,534] [INFO] [root.register:351] [PID:2204921] Plugin loaded successfully: axolotl.integrations.llm_compressor.LLMCompressorPlugin
[2025-05-15 22:51:54,507] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204917] [RANK:2] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,514] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204919] [RANK:4] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,516] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204921] [RANK:6] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,518] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204920] [RANK:5] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,522] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204915] [RANK:0] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,551] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204918] [RANK:3] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,557] [INFO] [axolotl.normalize_config:237] [PID:2204920] [RANK:5] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[2025-05-15 22:51:54,565] [INFO] [axolotl.normalize_config:237] [PID:2204919] [RANK:4] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[2025-05-15 22:51:54,565] [INFO] [axolotl.normalize_config:237] [PID:2204921] [RANK:6] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[2025-05-15 22:51:54,571] [INFO] [axolotl.normalize_config:237] [PID:2204915] [RANK:0] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[2025-05-15 22:51:54,574] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204916] [RANK:1] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,606] [INFO] [axolotl.normalize_config:237] [PID:2204918] [RANK:3] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[2025-05-15 22:51:54,608] [INFO] [axolotl.normalize_config:237] [PID:2204917] [RANK:2] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[2025-05-15 22:51:54,612] [DEBUG] [axolotl.resolve_dtype:65] [PID:2204922] [RANK:7] bf16 support detected, enabling for this configuration.
[2025-05-15 22:51:54,630] [INFO] [axolotl.normalize_config:237] [PID:2204916] [RANK:1] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[2025-05-15 22:51:54,680] [INFO] [axolotl.normalize_config:237] [PID:2204922] [RANK:7] cuda memory usage baseline: 0.000GB (+0.461GB misc)
[axolotl ASCII-art banner omitted]
[2025-05-15 22:51:56,573] [DEBUG] [axolotl.utils.models.load_tokenizer:461] [PID:2204915] [RANK:0] EOS: 128001 / <|end_of_text|>
[2025-05-15 22:51:56,574] [DEBUG] [axolotl.utils.models.load_tokenizer:462] [PID:2204915] [RANK:0] BOS: 128000 / <|begin_of_text|>
[2025-05-15 22:51:56,574] [DEBUG] [axolotl.utils.models.load_tokenizer:463] [PID:2204915] [RANK:0] PAD: 128001 / <|end_of_text|>
[2025-05-15 22:51:56,574] [DEBUG] [axolotl.utils.models.load_tokenizer:464] [PID:2204915] [RANK:0] UNK: None / None
[2025-05-15 22:51:56,574] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204915] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:51:56,574] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204915] [RANK:0] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:56,582] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204915] [RANK:0] Prepared dataset loaded from disk...
[rank0]:[W515 22:51:56.181082343 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[2025-05-15 22:51:56,865] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204917] [RANK:2] No Chat template selected. Consider adding a chat template for easier inference.
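The repeated "No Chat template selected" hints refer to the tokenizer shipping without a `chat_template`. A minimal sketch of attaching one with `transformers` (the Jinja template below is a hypothetical placeholder, not the template used in this run):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("RedHatAI/Sparse-Llama-3.1-8B-2of4")

# Base checkpoints like this one typically leave tok.chat_template unset;
# assigning a template enables apply_chat_template() at inference time.
tok.chat_template = (
    "{% for message in messages %}"
    "{{ message['role'] }}: {{ message['content'] }}\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}assistant: {% endif %}"
)

print(tok.apply_chat_template(
    [{"role": "user", "content": "Hello"}],
    tokenize=False,
    add_generation_prompt=True,
))
```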
[rank2]:[W515 22:51:56.245238440 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 2] using GPU 2 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[2025-05-15 22:51:56,872] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204918] [RANK:3] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:51:56,872] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204919] [RANK:4] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:51:56,879] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204916] [RANK:1] No Chat template selected. Consider adding a chat template for easier inference.
[rank4]:[W515 22:51:56.255395313 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 4] using GPU 4 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[rank3]:[W515 22:51:56.256440028 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 3] using GPU 3 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[rank1]:[W515 22:51:56.261842757 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 1] using GPU 1 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[2025-05-15 22:51:56,886] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204922] [RANK:7] No Chat template selected. Consider adding a chat template for easier inference.
[rank7]:[W515 22:51:56.271270856 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 7] using GPU 7 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[2025-05-15 22:51:56,903] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204920] [RANK:5] No Chat template selected. Consider adding a chat template for easier inference.
[rank5]:[W515 22:51:56.286513346 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 5] using GPU 5 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[2025-05-15 22:51:56,933] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204921] [RANK:6] No Chat template selected. Consider adding a chat template for easier inference.
[rank6]:[W515 22:51:56.313234716 ProcessGroupNCCL.cpp:4715] [PG ID 0 PG GUID 0 Rank 6] using GPU 6 as device used by this process is currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. You can pecify device_id in init_process_group() to force use of a particular device.
[2025-05-15 22:51:59,229] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204919] [RANK:4] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:59,229] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204918] [RANK:3] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:59,229] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204922] [RANK:7] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:59,229] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204921] [RANK:6] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:59,230] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204916] [RANK:1] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:59,230] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204917] [RANK:2] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:59,230] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:260] [PID:2204920] [RANK:5] Loading prepared dataset from disk at last_run_prepared/34fd097a6fdc6f75f0090b4a1d5dd3a0...
[2025-05-15 22:51:59,242] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204919] [RANK:4] Prepared dataset loaded from disk...
[2025-05-15 22:51:59,242] [DEBUG] [axolotl.calculate_total_num_steps:405] [PID:2204915] [RANK:0] total_num_tokens: 2_253_747
[2025-05-15 22:51:59,245] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204922] [RANK:7] Prepared dataset loaded from disk...
[2025-05-15 22:51:59,246] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204921] [RANK:6] Prepared dataset loaded from disk...
[2025-05-15 22:51:59,247] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204918] [RANK:3] Prepared dataset loaded from disk...
[2025-05-15 22:51:59,247] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204917] [RANK:2] Prepared dataset loaded from disk...
[2025-05-15 22:51:59,247] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204920] [RANK:5] Prepared dataset loaded from disk...
[2025-05-15 22:51:59,250] [INFO] [axolotl.utils.data.sft.load_tokenized_prepared_datasets:262] [PID:2204916] [RANK:1] Prepared dataset loaded from disk...
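The ProcessGroupNCCL warnings above ask for an explicit rank-to-GPU mapping. A minimal sketch of what that looks like when you control the `init_process_group()` call yourself (here assuming a torchrun-style launch that sets LOCAL_RANK; in this run the call is made by accelerate/axolotl rather than user code):

```python
import os

import torch
import torch.distributed as dist

# LOCAL_RANK is set by torchrun/accelerate-style launchers.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
torch.cuda.set_device(local_rank)

# Passing device_id pins the process group to a single GPU, which is what the
# warning suggests to avoid an ambiguous rank-to-GPU mapping.
dist.init_process_group(
    backend="nccl",
    device_id=torch.device(f"cuda:{local_rank}"),
)
```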
[2025-05-15 22:51:59,304] [DEBUG] [axolotl.calculate_total_num_steps:423] [PID:2204915] [RANK:0] `total_supervised_tokens: 195_019`
[2025-05-15 22:52:00,630] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:434] [PID:2204915] [RANK:0] gather_len_batches: [559, 559, 559, 560, 559, 559, 559, 560]
[2025-05-15 22:52:00,631] [DEBUG] [axolotl.calculate_total_num_steps:481] [PID:2204915] [RANK:0] data_loader_len: 69
[2025-05-15 22:52:00,650] [INFO] [axolotl.calc_sample_packing_eff_est:491] [PID:2204915] [RANK:0] sample_packing_eff_est across ranks: [0.984313428401947, 0.984313428401947, 0.984313428401947, 0.9825556874275208, 0.984313428401947, 0.984313428401947, 0.984313428401947, 0.9825556874275208]
[2025-05-15 22:52:00,651] [DEBUG] [axolotl.calculate_total_num_steps:503] [PID:2204915] [RANK:0] sample_packing_eff_est: None
[2025-05-15 22:52:00,651] [DEBUG] [axolotl.calculate_total_num_steps:516] [PID:2204915] [RANK:0] total_num_steps: 69
[2025-05-15 22:52:00,810] [DEBUG] [axolotl.calculate_total_num_steps:405] [PID:2204915] [RANK:0] total_num_tokens: 42_816_897
[2025-05-15 22:52:01,591] [DEBUG] [axolotl.calculate_total_num_steps:423] [PID:2204915] [RANK:0] `total_supervised_tokens: 3_699_357`
[2025-05-15 22:52:02,200] [INFO] [axolotl.utils.samplers.multipack.calc_min_len:434] [PID:2204915] [RANK:0] gather_len_batches: [10617, 10617, 10621, 10617, 10621, 10622, 10617, 10620]
[2025-05-15 22:52:02,201] [DEBUG] [axolotl.calculate_total_num_steps:481] [PID:2204915] [RANK:0] data_loader_len: 1327
[2025-05-15 22:52:02,202] [INFO] [axolotl.calc_sample_packing_eff_est:491] [PID:2204915] [RANK:0] sample_packing_eff_est across ranks: [0.9845854640007019, 0.9845854640007019, 0.9842146635055542, 0.9845854640007019, 0.9842146635055542, 0.9841220378875732, 0.9845854640007019, 0.9843073487281799]
[2025-05-15 22:52:02,202] [DEBUG] [axolotl.calculate_total_num_steps:503] [PID:2204915] [RANK:0] sample_packing_eff_est: 0.99
[2025-05-15 22:52:02,202] [DEBUG] [axolotl.calculate_total_num_steps:516] [PID:2204915] [RANK:0] total_num_steps: 1327
[2025-05-15 22:52:02,219] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204921] [RANK:6] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,219] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204917] [RANK:2] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,222] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204919] [RANK:4] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,224] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204922] [RANK:7] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,225] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204916] [RANK:1] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,226] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204920] [RANK:5] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,229] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204918] [RANK:3] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,252] [DEBUG] [axolotl.train.setup_model_and_tokenizer:63] [PID:2204915] [RANK:0] loading tokenizer... RedHatAI/Sparse-Llama-3.1-8B-2of4
[2025-05-15 22:52:02,551] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204918] [RANK:3] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,552] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204918] [RANK:3] loading model
[2025-05-15 22:52:02,560] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204922] [RANK:7] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,560] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204922] [RANK:7] loading model
[2025-05-15 22:52:02,561] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204917] [RANK:2] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,561] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204917] [RANK:2] loading model
[2025-05-15 22:52:02,572] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204916] [RANK:1] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,572] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204920] [RANK:5] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,572] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204916] [RANK:1] loading model
[2025-05-15 22:52:02,572] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204920] [RANK:5] loading model
[2025-05-15 22:52:02,586] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204921] [RANK:6] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,586] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204921] [RANK:6] loading model
[2025-05-15 22:52:02,614] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204919] [RANK:4] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,614] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204919] [RANK:4] loading model
[2025-05-15 22:52:02,646] [DEBUG] [axolotl.utils.models.load_tokenizer:461] [PID:2204915] [RANK:0] EOS: 128001 / <|end_of_text|>
[2025-05-15 22:52:02,646] [DEBUG] [axolotl.utils.models.load_tokenizer:462] [PID:2204915] [RANK:0] BOS: 128000 / <|begin_of_text|>
[2025-05-15 22:52:02,646] [DEBUG] [axolotl.utils.models.load_tokenizer:463] [PID:2204915] [RANK:0] PAD: 128001 / <|end_of_text|>
[2025-05-15 22:52:02,646] [DEBUG] [axolotl.utils.models.load_tokenizer:464] [PID:2204915] [RANK:0] UNK: None / None
[2025-05-15 22:52:02,646] [INFO] [axolotl.utils.models.load_tokenizer:478] [PID:2204915] [RANK:0] No Chat template selected. Consider adding a chat template for easier inference.
[2025-05-15 22:52:02,646] [DEBUG] [axolotl.train.setup_model_and_tokenizer:77] [PID:2204915] [RANK:0] loading model
Loading checkpoint shards: 0%| | 0/4 [00:00