+ deepspeed --master_port 63196 --module safe_rlhf.finetune \
    --train_datasets inverse-json::/home/hansirui_1st/jiayi/resist/setting3/safety_data/training/safe/safe_10k.json \
    --model_name_or_path /aifs4su/hansirui_1st/models/Qwen1.5-4B \
    --max_length 2048 \
    --trust_remote_code True \
    --epochs 1 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 8 \
    --gradient_checkpointing \
    --learning_rate 1e-5 \
    --lr_warmup_ratio 0 \
    --lr_scheduler_type constant \
    --weight_decay 0.0 \
    --seed 42 \
    --output_dir /aifs4su/hansirui_1st/boyuan/resist/setting3-safety/Qwen1.5-4B/Qwen1.5-4B-s3-Q1-10k \
    --log_type wandb \
    --log_run_name qwen-4b-s3-Q1-10k \
    --log_project Inverse_Alignment \
    --zero_stage 3 \
    --offload none \
    --bf16 True \
    --tf32 True \
    --save_16bit

[rank0]:[W528 20:28:57.387286905 ProcessGroupNCCL.cpp:4561] [PG ID 0 PG GUID 0 Rank 0] using GPU 0 to perform barrier as devices used by this process are currently unknown. This can potentially cause a hang if this rank to GPU mapping is incorrect. Specify device_ids in barrier() to force use of a particular device, or call init_process_group() with a device_id.
(Ranks 1-7 emit the same warning, each falling back to its own GPU.)
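With 8 ranks visible in the warnings above, the effective batch size of this run works out to 4 sequences per device x 8 gradient-accumulation steps x 8 GPUs = 256 sequences per optimizer step.

The barrier warning itself is benign as long as rank N really does own GPU N, but it can be silenced (and the potential mis-mapping hang avoided) by telling NCCL the rank-to-device binding up front, as the message suggests. A minimal sketch of both options, assuming a launcher that sets LOCAL_RANK (as deepspeed does); init_distributed is a hypothetical helper, not part of safe_rlhf:

    import os

    import torch
    import torch.distributed as dist

    def init_distributed() -> None:
        # LOCAL_RANK is set by the deepspeed/torchrun launcher.
        local_rank = int(os.environ["LOCAL_RANK"])
        device = torch.device(f"cuda:{local_rank}")
        torch.cuda.set_device(device)

        # Option 1: declare the device at init time so every later collective,
        # including barrier(), already knows the rank -> GPU mapping.
        dist.init_process_group(backend="nccl", device_id=device)

        # Option 2: keep a plain init_process_group() and instead pass the
        # device explicitly at each barrier call:
        # dist.barrier(device_ids=[local_rank])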
loading configuration file /aifs4su/hansirui_1st/models/Qwen1.5-4B/config.json
(Printed once per rank; all 8 ranks read the same file.)
Model config Qwen2Config {
  "_name_or_path": "/aifs4su/hansirui_1st/models/Qwen1.5-4B",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 2560,
  "initializer_range": 0.02,
  "intermediate_size": 6912,
  "max_position_embeddings": 32768,
  "max_window_layers": 21,
  "model_type": "qwen2",
  "num_attention_heads": 20,
  "num_hidden_layers": 40,
  "num_key_value_heads": 20,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 5000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.49.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}
(The identical config block is echoed by each of the 8 ranks.)
loading weights file /aifs4su/hansirui_1st/models/Qwen1.5-4B/model.safetensors.index.json
Will use torch_dtype=torch.bfloat16 as defined in model's config object
Instantiating Qwen2ForCausalLM model under default dtype torch.bfloat16.
Detected DeepSpeed ZeRO-3: activating zero.init() for this model
(These four lines likewise appear once per rank, interleaved across processes.)
Generate config GenerationConfig {
  "bos_token_id": 151643,
  "eos_token_id": 151643
}
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
(Both messages are repeated on all 8 ranks.)
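The last warning is emitted because the config above still carries "sliding_window": 32768 while the default SDPA attention backend has no sliding-window support; with "use_sliding_window": false it should have no effect on this run, but it can be avoided by choosing a backend that implements it or by dropping the unused setting. A minimal sketch, assuming the standard transformers from_pretrained interface and, for the first option, that flash-attn 2 is installed:

    import torch
    from transformers import AutoModelForCausalLM

    MODEL_PATH = "/aifs4su/hansirui_1st/models/Qwen1.5-4B"

    # Option 1: an attention backend that implements sliding windows.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",
    )

    # Option 2: stay on SDPA and clear the unused sliding-window field;
    # extra kwargs to from_pretrained override the stored config values.
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.bfloat16,
        attn_implementation="sdpa",
        sliding_window=None,
    )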
Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]
Exception ignored in: <… .wrapper of …>
Traceback (most recent call last):
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/utils.py", line 212, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/hansirui_1st/jiayi/resist/setting3/safe_rlhf/logger.py", line 183, in close
    self.wandb.finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 449, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 391, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2106, in finish
    return self._finish(exit_code)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2127, in _finish
    self._atexit_cleanup(exit_code=exit_code)
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2352, in _atexit_cleanup
    self._on_finish()
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/wandb_run.py", line 2609, in _on_finish
    wait_with_progress(
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 24, in wait_with_progress
    return wait_all_with_progress(
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/mailbox/wait_with_progress.py", line 87, in wait_all_with_progress
    return asyncio_compat.run(progress_loop_with_timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/site-packages/wandb/sdk/lib/asyncio_compat.py", line 27, in run
    future = executor.submit(runner.run, fn)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/aifs4su/hansirui_1st/miniconda3/envs/by-align/lib/python3.11/concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after '
RuntimeError: cannot schedule new futures after interpreter shutdown
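The closing RuntimeError is a shutdown-ordering problem rather than a training failure: the logger's close() runs from a hook that fires while the interpreter is already tearing down, and wandb.finish() then tries to submit work to a thread pool that concurrent.futures has already shut down. The usual workaround is to finish the run explicitly before the process starts exiting. A minimal sketch, assuming a plain wandb run; main() here is a stand-in, not the actual safe_rlhf entry point:

    import wandb

    def main() -> None:
        run = wandb.init(project="Inverse_Alignment", name="qwen-4b-s3-Q1-10k")
        try:
            ...  # training loop goes here
        finally:
            # Finish while the interpreter is still fully alive: wandb.finish()
            # schedules work on a thread pool, and pools reject new futures
            # once interpreter shutdown has begun.
            run.finish()

    if __name__ == "__main__":
        main()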