[2024-10-11 10:25:04,881] INFO: Will use torch.nn.parallel.DistributedDataParallel() and 8 gpus
[2024-10-11 10:25:04,885] INFO: AMD Radeon Graphics
[2024-10-11 10:25:04,885] INFO: AMD Radeon Graphics
[2024-10-11 10:25:04,885] INFO: AMD Radeon Graphics
[2024-10-11 10:25:04,885] INFO: AMD Radeon Graphics
[2024-10-11 10:25:04,886] INFO: AMD Radeon Graphics
[2024-10-11 10:25:04,886] INFO: AMD Radeon Graphics
[2024-10-11 10:25:04,886] INFO: AMD Radeon Graphics
[2024-10-11 10:25:04,886] INFO: AMD Radeon Graphics
[2024-10-11 10:25:08,545] INFO: configured dtype=torch.bfloat16 for autocast
[2024-10-11 10:25:10,626] INFO: using attention_type=math
[2024-10-11 10:25:10,662] INFO: using attention_type=math
[2024-10-11 10:25:10,697] INFO: using attention_type=math
[2024-10-11 10:25:10,735] INFO: using attention_type=math
[2024-10-11 10:25:10,770] INFO: using attention_type=math
[2024-10-11 10:25:10,807] INFO: using attention_type=math
[2024-10-11 10:25:10,843] INFO: using attention_type=math
[2024-10-11 10:25:10,879] INFO: using attention_type=math
[2024-10-11 10:25:10,914] INFO: using attention_type=math
[2024-10-11 10:25:10,950] INFO: using attention_type=math
[2024-10-11 10:25:10,987] INFO: using attention_type=math
[2024-10-11 10:25:11,024] INFO: using attention_type=math
[2024-10-11 10:25:16,061] INFO: DistributedDataParallel(
  (module): MLPF(
    (nn0_id): ModuleList(
      (0-1): 2 x Sequential(
        (0): Linear(in_features=17, out_features=1024, bias=True)
        (1): GELU(approximate='none')
        (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
        (4): Linear(in_features=1024, out_features=1024, bias=True)
      )
    )
    (nn0_reg): ModuleList(
      (0-1): 2 x Sequential(
        (0): Linear(in_features=17, out_features=1024, bias=True)
        (1): GELU(approximate='none')
        (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
        (4): Linear(in_features=1024, out_features=1024, bias=True)
      )
    )
    (conv_id): ModuleList(
      (0-5): 6 x PreLnSelfAttentionLayer(
        (mha): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True)
        )
        (norm0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (seq): Sequential(
          (0): Linear(in_features=1024, out_features=1024, bias=True)
          (1): GELU(approximate='none')
          (2): Linear(in_features=1024, out_features=1024, bias=True)
          (3): GELU(approximate='none')
        )
        (dropout): Dropout(p=0.0, inplace=False)
      )
    )
    (conv_reg): ModuleList(
      (0-5): 6 x PreLnSelfAttentionLayer(
        (mha): MultiheadAttention(
          (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True)
        )
        (norm0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (seq): Sequential(
          (0): Linear(in_features=1024, out_features=1024, bias=True)
          (1): GELU(approximate='none')
          (2): Linear(in_features=1024, out_features=1024, bias=True)
          (3): GELU(approximate='none')
        )
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (nn_binary_particle): Sequential(
      (0): Linear(in_features=1024, out_features=1024, bias=True)
      (1): GELU(approximate='none')
      (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.1, inplace=False)
      (4): Linear(in_features=1024, out_features=2, bias=True)
    )
    (nn_pid): Sequential(
      (0): Linear(in_features=1024, out_features=1024, bias=True)
      (1): GELU(approximate='none')
      (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
      (3): Dropout(p=0.1, inplace=False)
      (4): Linear(in_features=1024, out_features=6, bias=True)
    )
    (nn_pt): RegressionOutput(
      (nn): ModuleList(
        (0-1): 2 x Sequential(
          (0): Linear(in_features=1024, out_features=1024, bias=True)
          (1): GELU(approximate='none')
          (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (3): Dropout(p=0.1, inplace=False)
          (4): Linear(in_features=1024, out_features=1, bias=True)
        )
      )
    )
    (nn_eta): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=1024, out_features=1024, bias=True)
        (1): GELU(approximate='none')
        (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
        (4): Linear(in_features=1024, out_features=2, bias=True)
      )
    )
    (nn_sin_phi): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=1024, out_features=1024, bias=True)
        (1): GELU(approximate='none')
        (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
        (4): Linear(in_features=1024, out_features=2, bias=True)
      )
    )
    (nn_cos_phi): RegressionOutput(
      (nn): Sequential(
        (0): Linear(in_features=1024, out_features=1024, bias=True)
        (1): GELU(approximate='none')
        (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
        (3): Dropout(p=0.1, inplace=False)
        (4): Linear(in_features=1024, out_features=2, bias=True)
      )
    )
    (nn_energy): RegressionOutput(
      (nn): ModuleList(
        (0-1): 2 x Sequential(
          (0): Linear(in_features=1024, out_features=1024, bias=True)
          (1): GELU(approximate='none')
          (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
          (3): Dropout(p=0.1, inplace=False)
          (4): Linear(in_features=1024, out_features=1, bias=True)
        )
      )
    )
    (final_norm_id): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
    (final_norm_reg): LayerNorm((1024,), eps=1e-05, elementwise_affine=True)
  )
)
[2024-10-11 10:25:16,062] INFO: Trainable parameters: 89388050
[2024-10-11 10:25:16,062] INFO: Non-trainable parameters: 0
[2024-10-11 10:25:16,062] INFO: Total parameters: 89388050
[2024-10-11 10:25:16,067] INFO: Modules Trainable parameters Non-trainable parameters
module.nn0_id.0.0.weight 17408 0
module.nn0_id.0.0.bias 1024 0
module.nn0_id.0.2.weight 1024 0
module.nn0_id.0.2.bias 1024 0
module.nn0_id.0.4.weight 1048576 0
module.nn0_id.0.4.bias 1024 0
module.nn0_id.1.0.weight 17408 0
module.nn0_id.1.0.bias 1024 0
module.nn0_id.1.2.weight 1024 0
module.nn0_id.1.2.bias 1024 0
module.nn0_id.1.4.weight 1048576 0
module.nn0_id.1.4.bias 1024 0
module.nn0_reg.0.0.weight 17408 0
module.nn0_reg.0.0.bias 1024 0
module.nn0_reg.0.2.weight 1024 0
module.nn0_reg.0.2.bias 1024 0
module.nn0_reg.0.4.weight 1048576 0
module.nn0_reg.0.4.bias 1024 0
module.nn0_reg.1.0.weight 17408 0
module.nn0_reg.1.0.bias 1024 0
module.nn0_reg.1.2.weight 1024 0
module.nn0_reg.1.2.bias 1024 0
module.nn0_reg.1.4.weight 1048576 0
module.nn0_reg.1.4.bias 1024 0
module.conv_id.0.mha.in_proj_weight 3145728 0
module.conv_id.0.mha.in_proj_bias 3072 0
module.conv_id.0.mha.out_proj.weight 1048576 0
module.conv_id.0.mha.out_proj.bias 1024 0
module.conv_id.0.norm0.weight 1024 0
module.conv_id.0.norm0.bias 1024 0
module.conv_id.0.norm1.weight 1024 0
module.conv_id.0.norm1.bias 1024 0
module.conv_id.0.seq.0.weight 1048576 0
module.conv_id.0.seq.0.bias 1024 0
module.conv_id.0.seq.2.weight 1048576 0
module.conv_id.0.seq.2.bias 1024 0
module.conv_id.1.mha.in_proj_weight 3145728 0
module.conv_id.1.mha.in_proj_bias 3072 0
module.conv_id.1.mha.out_proj.weight 1048576 0
module.conv_id.1.mha.out_proj.bias 1024 0
module.conv_id.1.norm0.weight 1024 0
module.conv_id.1.norm0.bias 1024 0
module.conv_id.1.norm1.weight 1024 0
module.conv_id.1.norm1.bias 1024 0
module.conv_id.1.seq.0.weight 1048576 0
module.conv_id.1.seq.0.bias 1024 0
module.conv_id.1.seq.2.weight 1048576 0
module.conv_id.1.seq.2.bias 1024 0
module.conv_id.2.mha.in_proj_weight 3145728 0
module.conv_id.2.mha.in_proj_bias 3072 0
module.conv_id.2.mha.out_proj.weight 1048576 0
module.conv_id.2.mha.out_proj.bias 1024 0
module.conv_id.2.norm0.weight 1024 0
module.conv_id.2.norm0.bias 1024 0
module.conv_id.2.norm1.weight 1024 0
module.conv_id.2.norm1.bias 1024 0
module.conv_id.2.seq.0.weight 1048576 0
module.conv_id.2.seq.0.bias 1024 0
module.conv_id.2.seq.2.weight 1048576 0
module.conv_id.2.seq.2.bias 1024 0
module.conv_id.3.mha.in_proj_weight 3145728 0
module.conv_id.3.mha.in_proj_bias 3072 0
module.conv_id.3.mha.out_proj.weight 1048576 0
module.conv_id.3.mha.out_proj.bias 1024 0
module.conv_id.3.norm0.weight 1024 0
module.conv_id.3.norm0.bias 1024 0
module.conv_id.3.norm1.weight 1024 0
module.conv_id.3.norm1.bias 1024 0
module.conv_id.3.seq.0.weight 1048576 0
module.conv_id.3.seq.0.bias 1024 0
module.conv_id.3.seq.2.weight 1048576 0
module.conv_id.3.seq.2.bias 1024 0
module.conv_id.4.mha.in_proj_weight 3145728 0
module.conv_id.4.mha.in_proj_bias 3072 0
module.conv_id.4.mha.out_proj.weight 1048576 0
module.conv_id.4.mha.out_proj.bias 1024 0
module.conv_id.4.norm0.weight 1024 0
module.conv_id.4.norm0.bias 1024 0
module.conv_id.4.norm1.weight 1024 0
module.conv_id.4.norm1.bias 1024 0
module.conv_id.4.seq.0.weight 1048576 0
module.conv_id.4.seq.0.bias 1024 0
module.conv_id.4.seq.2.weight 1048576 0
module.conv_id.4.seq.2.bias 1024 0
module.conv_id.5.mha.in_proj_weight 3145728 0
module.conv_id.5.mha.in_proj_bias 3072 0
module.conv_id.5.mha.out_proj.weight 1048576 0
module.conv_id.5.mha.out_proj.bias 1024 0
module.conv_id.5.norm0.weight 1024 0
module.conv_id.5.norm0.bias 1024 0
module.conv_id.5.norm1.weight 1024 0
module.conv_id.5.norm1.bias 1024 0
module.conv_id.5.seq.0.weight 1048576 0
module.conv_id.5.seq.0.bias 1024 0
module.conv_id.5.seq.2.weight 1048576 0
module.conv_id.5.seq.2.bias 1024 0
module.conv_reg.0.mha.in_proj_weight 3145728 0
module.conv_reg.0.mha.in_proj_bias 3072 0
module.conv_reg.0.mha.out_proj.weight 1048576 0
module.conv_reg.0.mha.out_proj.bias 1024 0
module.conv_reg.0.norm0.weight 1024 0
module.conv_reg.0.norm0.bias 1024 0
module.conv_reg.0.norm1.weight 1024 0
module.conv_reg.0.norm1.bias 1024 0
module.conv_reg.0.seq.0.weight 1048576 0
module.conv_reg.0.seq.0.bias 1024 0
module.conv_reg.0.seq.2.weight 1048576 0
module.conv_reg.0.seq.2.bias 1024 0
module.conv_reg.1.mha.in_proj_weight 3145728 0
module.conv_reg.1.mha.in_proj_bias 3072 0
module.conv_reg.1.mha.out_proj.weight 1048576 0
module.conv_reg.1.mha.out_proj.bias 1024 0
module.conv_reg.1.norm0.weight 1024 0
module.conv_reg.1.norm0.bias 1024 0
module.conv_reg.1.norm1.weight 1024 0
module.conv_reg.1.norm1.bias 1024 0
module.conv_reg.1.seq.0.weight 1048576 0
module.conv_reg.1.seq.0.bias 1024 0
module.conv_reg.1.seq.2.weight 1048576 0
module.conv_reg.1.seq.2.bias 1024 0
module.conv_reg.2.mha.in_proj_weight 3145728 0
module.conv_reg.2.mha.in_proj_bias 3072 0
module.conv_reg.2.mha.out_proj.weight 1048576 0
module.conv_reg.2.mha.out_proj.bias 1024 0
module.conv_reg.2.norm0.weight 1024 0
module.conv_reg.2.norm0.bias 1024 0
module.conv_reg.2.norm1.weight 1024 0
module.conv_reg.2.norm1.bias 1024 0
module.conv_reg.2.seq.0.weight 1048576 0
module.conv_reg.2.seq.0.bias 1024 0
module.conv_reg.2.seq.2.weight 1048576 0
module.conv_reg.2.seq.2.bias 1024 0
module.conv_reg.3.mha.in_proj_weight 3145728 0
module.conv_reg.3.mha.in_proj_bias 3072 0
module.conv_reg.3.mha.out_proj.weight 1048576 0
module.conv_reg.3.mha.out_proj.bias 1024 0
module.conv_reg.3.norm0.weight 1024 0
module.conv_reg.3.norm0.bias 1024 0
module.conv_reg.3.norm1.weight 1024 0
module.conv_reg.3.norm1.bias 1024 0
module.conv_reg.3.seq.0.weight 1048576 0
module.conv_reg.3.seq.0.bias 1024 0
module.conv_reg.3.seq.2.weight 1048576 0
module.conv_reg.3.seq.2.bias 1024 0
module.conv_reg.4.mha.in_proj_weight 3145728 0
module.conv_reg.4.mha.in_proj_bias 3072 0
module.conv_reg.4.mha.out_proj.weight 1048576 0
module.conv_reg.4.mha.out_proj.bias 1024 0
module.conv_reg.4.norm0.weight 1024 0
module.conv_reg.4.norm0.bias 1024 0
module.conv_reg.4.norm1.weight 1024 0
module.conv_reg.4.norm1.bias 1024 0
module.conv_reg.4.seq.0.weight 1048576 0
module.conv_reg.4.seq.0.bias 1024 0
module.conv_reg.4.seq.2.weight 1048576 0
module.conv_reg.4.seq.2.bias 1024 0
module.conv_reg.5.mha.in_proj_weight 3145728 0
module.conv_reg.5.mha.in_proj_bias 3072 0
module.conv_reg.5.mha.out_proj.weight 1048576 0
module.conv_reg.5.mha.out_proj.bias 1024 0
module.conv_reg.5.norm0.weight 1024 0
module.conv_reg.5.norm0.bias 1024 0
module.conv_reg.5.norm1.weight 1024 0
module.conv_reg.5.norm1.bias 1024 0
module.conv_reg.5.seq.0.weight 1048576 0
module.conv_reg.5.seq.0.bias 1024 0
module.conv_reg.5.seq.2.weight 1048576 0
module.conv_reg.5.seq.2.bias 1024 0
module.nn_binary_particle.0.weight 1048576 0
module.nn_binary_particle.0.bias 1024 0
module.nn_binary_particle.2.weight 1024 0
module.nn_binary_particle.2.bias 1024 0
module.nn_binary_particle.4.weight 2048 0
module.nn_binary_particle.4.bias 2 0
module.nn_pid.0.weight 1048576 0
module.nn_pid.0.bias 1024 0
module.nn_pid.2.weight 1024 0
module.nn_pid.2.bias 1024 0
module.nn_pid.4.weight 6144 0
module.nn_pid.4.bias 6 0
module.nn_pt.nn.0.0.weight 1048576 0
module.nn_pt.nn.0.0.bias 1024 0
module.nn_pt.nn.0.2.weight 1024 0
module.nn_pt.nn.0.2.bias 1024 0
module.nn_pt.nn.0.4.weight 1024 0
module.nn_pt.nn.0.4.bias 1 0
module.nn_pt.nn.1.0.weight 1048576 0
module.nn_pt.nn.1.0.bias 1024 0
module.nn_pt.nn.1.2.weight 1024 0
module.nn_pt.nn.1.2.bias 1024 0
module.nn_pt.nn.1.4.weight 1024 0
module.nn_pt.nn.1.4.bias 1 0
module.nn_eta.nn.0.weight 1048576 0
module.nn_eta.nn.0.bias 1024 0
module.nn_eta.nn.2.weight 1024 0
module.nn_eta.nn.2.bias 1024 0
module.nn_eta.nn.4.weight 2048 0
module.nn_eta.nn.4.bias 2 0
module.nn_sin_phi.nn.0.weight 1048576 0
module.nn_sin_phi.nn.0.bias 1024 0
module.nn_sin_phi.nn.2.weight 1024 0
module.nn_sin_phi.nn.2.bias 1024 0
module.nn_sin_phi.nn.4.weight 2048 0
module.nn_sin_phi.nn.4.bias 2 0
module.nn_cos_phi.nn.0.weight 1048576 0
module.nn_cos_phi.nn.0.bias 1024 0
module.nn_cos_phi.nn.2.weight 1024 0
module.nn_cos_phi.nn.2.bias 1024 0
module.nn_cos_phi.nn.4.weight 2048 0
module.nn_cos_phi.nn.4.bias 2 0
module.nn_energy.nn.0.0.weight 1048576 0
module.nn_energy.nn.0.0.bias 1024 0
module.nn_energy.nn.0.2.weight 1024 0
module.nn_energy.nn.0.2.bias 1024 0
module.nn_energy.nn.0.4.weight 1024 0
module.nn_energy.nn.0.4.bias 1 0
module.nn_energy.nn.1.0.weight 1048576 0
module.nn_energy.nn.1.0.bias 1024 0
module.nn_energy.nn.1.2.weight 1024 0
module.nn_energy.nn.1.2.bias 1024 0
module.nn_energy.nn.1.4.weight 1024 0
module.nn_energy.nn.1.4.bias 1 0
module.final_norm_id.weight 1024 0
module.final_norm_id.bias 1024 0
module.final_norm_reg.weight 1024 0
module.final_norm_reg.bias 1024 0
[2024-10-11 10:25:16,070] INFO: Creating experiment dir experiments/pyg-clic_20241011_102451_167094
[2024-10-11 10:25:16,071] INFO: Model directory experiments/pyg-clic_20241011_102451_167094
[2024-10-11 10:25:16,279] INFO: train_dataset: clic_edm_qq_pf, 3598296
[2024-10-11 10:25:16,379] INFO: train_dataset: clic_edm_ttbar_pf, 7139800
[2024-10-11 10:25:16,464] INFO: train_dataset: clic_edm_ww_fullhad_pf, 3600900
[2024-10-11 10:25:49,796] INFO: valid_dataset: clic_edm_qq_pf, 399822
[2024-10-11 10:25:49,821] INFO: valid_dataset: clic_edm_ttbar_pf, 793400
[2024-10-11 10:25:49,834] INFO: valid_dataset: clic_edm_ww_fullhad_pf, 400100
[2024-10-11 10:25:50,665] INFO: Initiating epoch #1 train run on device rank=0
[2024-10-11 13:35:01,251] INFO: Initiating epoch #1 valid run on device rank=0
[2024-10-11 13:42:08,318] INFO: Rank 0: epoch=1 / 30 train_loss=3.4490 valid_loss=2.8678 stale=0 epoch_train_time=189.18m epoch_valid_time=7.01m epoch_total_time=196.29m eta=5692.5m
[2024-10-11 13:42:08,328] INFO: Initiating epoch #2 train run on device rank=0
[2024-10-11 16:49:04,054] INFO: Initiating epoch #2 valid run on device rank=0
[2024-10-11 16:56:07,094] INFO: Rank 0: epoch=2 / 30 train_loss=2.6138 valid_loss=2.5000 stale=0 epoch_train_time=186.93m epoch_valid_time=6.94m epoch_total_time=193.98m eta=5463.8m
[2024-10-11 16:56:07,110] INFO: Initiating epoch #3 train run on device rank=0
[2024-10-11 20:03:17,596] INFO: Initiating epoch #3 valid run on device rank=0
[2024-10-11 20:10:21,873] INFO: Rank 0: epoch=3 / 30 train_loss=2.3946 valid_loss=2.3688 stale=0 epoch_train_time=187.17m epoch_valid_time=6.96m epoch_total_time=194.25m eta=5260.7m
[2024-10-11 20:10:21,896] INFO: Initiating epoch #4 train run on device rank=0
[2024-10-11 23:17:12,730] INFO: Initiating epoch #4 valid run on device rank=0
[2024-10-11 23:24:13,970] INFO: Rank 0: epoch=4 / 30 train_loss=2.2850 valid_loss=2.2818 stale=0 epoch_train_time=186.85m epoch_valid_time=6.91m epoch_total_time=193.87m eta=5059.5m
[2024-10-11 23:24:13,979] INFO: Initiating epoch #5 train run on device rank=0
[2024-10-12 02:31:25,465] INFO: Initiating epoch #5 valid run on device rank=0
[2024-10-12 02:38:28,338] INFO: Rank 0: epoch=5 / 30 train_loss=2.2046 valid_loss=2.2072 stale=0 epoch_train_time=187.19m epoch_valid_time=6.94m epoch_total_time=194.24m eta=4863.1m
[2024-10-12 02:38:28,362] INFO: Initiating epoch #6 train run on device rank=0
[2024-10-12 05:45:31,198] INFO: Initiating epoch #6 valid run on device rank=0
[2024-10-12 05:52:32,395] INFO: Rank 0: epoch=6 / 30 train_loss=2.1378 valid_loss=2.1469 stale=0 epoch_train_time=187.05m epoch_valid_time=6.91m epoch_total_time=194.07m eta=4666.8m
[2024-10-12 05:52:32,420] INFO: Initiating epoch #7 train run on device rank=0
[2024-10-12 08:59:54,801] INFO: Initiating epoch #7 valid run on device rank=0
[2024-10-12 09:06:58,357] INFO: Rank 0: epoch=7 / 30 train_loss=2.0795 valid_loss=2.0962 stale=0 epoch_train_time=187.37m epoch_valid_time=6.95m epoch_total_time=194.43m eta=4472.3m
[2024-10-12 09:06:58,381] INFO: Initiating epoch #8 train run on device rank=0
[2024-10-12 12:13:59,370] INFO: Initiating epoch #8 valid run on device rank=0
[2024-10-12 12:21:02,407] INFO: Rank 0: epoch=8 / 30 train_loss=2.0358 valid_loss=2.0624 stale=0 epoch_train_time=187.02m epoch_valid_time=6.94m epoch_total_time=194.07m eta=4276.8m
[2024-10-12 12:21:02,425] INFO: Initiating epoch #9 train run on device rank=0
[2024-10-12 15:28:05,805] INFO: Initiating epoch #9 valid run on device rank=0
[2024-10-12 15:35:08,750] INFO: Rank 0: epoch=9 / 30 train_loss=2.0020 valid_loss=2.0356 stale=0 epoch_train_time=187.06m epoch_valid_time=6.94m epoch_total_time=194.11m eta=4081.7m
[2024-10-12 15:35:08,766] INFO: Initiating epoch #10 train run on device rank=0
[2024-10-12 18:41:58,556] INFO: Initiating epoch #10 valid run on device rank=0
[2024-10-12 18:49:01,652] INFO: Rank 0: epoch=10 / 30 train_loss=1.9755 valid_loss=2.0143 stale=0 epoch_train_time=186.83m epoch_valid_time=6.94m epoch_total_time=193.88m eta=3886.4m
[2024-10-12 18:49:01,676] INFO: Initiating epoch #11 train run on device rank=0
[2024-10-12 21:56:02,732] INFO: Initiating epoch #11 valid run on device rank=0
[2024-10-12 22:03:06,330] INFO: Rank 0: epoch=11 / 30 train_loss=1.9537 valid_loss=1.9981 stale=0 epoch_train_time=187.02m epoch_valid_time=6.95m epoch_total_time=194.08m eta=3691.6m
[2024-10-12 22:03:06,351] INFO: Initiating epoch #12 train run on device rank=0
[2024-10-13 01:09:56,849] INFO: Initiating epoch #12 valid run on device rank=0
[2024-10-13 01:16:58,608] INFO: Rank 0: epoch=12 / 30 train_loss=1.9351 valid_loss=1.9845 stale=0 epoch_train_time=186.84m epoch_valid_time=6.92m epoch_total_time=193.87m eta=3496.7m
[2024-10-13 01:16:58,631] INFO: Initiating epoch #13 train run on device rank=0
[2024-10-13 04:23:56,005] INFO: Initiating epoch #13 valid run on device rank=0
[2024-10-13 04:30:59,653] INFO: Rank 0: epoch=13 / 30 train_loss=1.9185 valid_loss=1.9727 stale=0 epoch_train_time=186.96m epoch_valid_time=6.95m epoch_total_time=194.02m eta=3302.1m
[2024-10-13 04:30:59,663] INFO: Initiating epoch #14 train run on device rank=0
[2024-10-13 07:37:50,487] INFO: Initiating epoch #14 valid run on device rank=0
[2024-10-13 07:44:53,667] INFO: Rank 0: epoch=14 / 30 train_loss=1.9034 valid_loss=1.9607 stale=0 epoch_train_time=186.85m epoch_valid_time=6.95m epoch_total_time=193.9m eta=3107.5m
[2024-10-13 07:44:53,696] INFO: Initiating epoch #15 train run on device rank=0
[2024-10-13 10:51:52,917] INFO: Initiating epoch #15 valid run on device rank=0
[2024-10-13 10:58:55,298] INFO: Rank 0: epoch=15 / 30 train_loss=1.8897 valid_loss=1.9506 stale=0 epoch_train_time=186.99m epoch_valid_time=6.93m epoch_total_time=194.03m eta=2913.1m
[2024-10-13 10:58:55,316] INFO: Initiating epoch #16 train run on device rank=0
[2024-10-13 14:05:54,702] INFO: Initiating epoch #16 valid run on device rank=0
[2024-10-13 14:12:57,300] INFO: Rank 0: epoch=16 / 30 train_loss=1.8772 valid_loss=1.9428 stale=0 epoch_train_time=186.99m epoch_valid_time=6.94m epoch_total_time=194.03m eta=2718.7m
[2024-10-13 14:12:57,326] INFO: Initiating epoch #17 train run on device rank=0
[2024-10-13 17:20:12,568] INFO: Initiating epoch #17 valid run on device rank=0
[2024-10-13 17:27:16,171] INFO: Rank 0: epoch=17 / 30 train_loss=1.8658 valid_loss=1.9375 stale=0 epoch_train_time=187.25m epoch_valid_time=6.95m epoch_total_time=194.31m eta=2524.6m
[2024-10-13 17:27:16,206] INFO: Initiating epoch #18 train run on device rank=0
[2024-10-13 20:34:27,667] INFO: Initiating epoch #18 valid run on device rank=0
[2024-10-13 20:41:31,076] INFO: Rank 0: epoch=18 / 30 train_loss=1.8551 valid_loss=1.9299 stale=0 epoch_train_time=187.19m epoch_valid_time=6.95m epoch_total_time=194.25m eta=2330.4m
[2024-10-13 20:41:31,089] INFO: Initiating epoch #19 train run on device rank=0
[2024-10-13 23:48:44,511] INFO: Initiating epoch #19 valid run on device rank=0
[2024-10-13 23:55:47,205] INFO: Rank 0: epoch=19 / 30 train_loss=1.8452 valid_loss=1.9244 stale=0 epoch_train_time=187.22m epoch_valid_time=6.94m epoch_total_time=194.27m eta=2136.3m
[2024-10-13 23:55:47,222] INFO: Initiating epoch #20 train run on device rank=0
[2024-10-14 03:02:39,426] INFO: Initiating epoch #20 valid run on device rank=0
[2024-10-14 03:09:42,527] INFO: Rank 0: epoch=20 / 30 train_loss=1.8362 valid_loss=1.9202 stale=0 epoch_train_time=186.87m epoch_valid_time=6.94m epoch_total_time=193.92m eta=1941.9m
[2024-10-14 03:09:42,550] INFO: Initiating epoch #21 train run on device rank=0
[2024-10-14 06:16:45,945] INFO: Initiating epoch #21 valid run on device rank=0
[2024-10-14 06:23:49,200] INFO: Rank 0: epoch=21 / 30 train_loss=1.8278 valid_loss=1.9160 stale=0 epoch_train_time=187.06m epoch_valid_time=6.95m epoch_total_time=194.11m eta=1747.7m
[2024-10-14 06:23:49,218] INFO: Initiating epoch #22 train run on device rank=0
[2024-10-14 09:30:41,408] INFO: Initiating epoch #22 valid run on device rank=0
[2024-10-14 09:37:46,117] INFO: Rank 0: epoch=22 / 30 train_loss=1.8199 valid_loss=1.9131 stale=0 epoch_train_time=186.87m epoch_valid_time=6.97m epoch_total_time=193.95m eta=1553.4m
[2024-10-14 09:37:46,143] INFO: Initiating epoch #23 train run on device rank=0
[2024-10-14 09:47:43,648] INFO: Will use torch.nn.parallel.DistributedDataParallel() and 8 gpus
[2024-10-14 09:47:43,652] INFO: AMD Radeon Graphics
[2024-10-14 09:47:43,652] INFO: AMD Radeon Graphics
[2024-10-14 09:47:43,652] INFO: AMD Radeon Graphics
[2024-10-14 09:47:43,652] INFO: AMD Radeon Graphics
[2024-10-14 09:47:43,652] INFO: AMD Radeon Graphics
[2024-10-14 09:47:43,652] INFO: AMD Radeon Graphics
[2024-10-14 09:47:43,653] INFO: AMD Radeon Graphics
[2024-10-14 09:47:43,653] INFO: AMD Radeon Graphics
[2024-10-14 09:47:47,850] INFO: configured dtype=torch.bfloat16 for autocast
[2024-10-14 09:47:50,639] INFO: model_kwargs: {'input_dim': 17, 'num_classes': 6, 'input_encoding': 'split', 'pt_mode': 'direct-elemtype-split', 'eta_mode': 'linear', 'sin_phi_mode': 'linear', 'cos_phi_mode': 'linear', 'energy_mode': 'direct-elemtype-split', 'elemtypes_nonzero': [1, 2], 'learned_representation_mode': 'last', 'conv_type': 'attention', 'num_convs': 6, 'dropout_ff': 0.1, 'dropout_conv_id_mha': 0.0, 'dropout_conv_id_ff': 0.0, 'dropout_conv_reg_mha': 0.1, 'dropout_conv_reg_ff': 0.1, 'activation': 'gelu', 'head_dim': 32, 'num_heads': 32, 'attention_type': 'math', 'use_pre_layernorm': True}
[2024-10-14 09:47:50,738] INFO: using attention_type=math
[2024-10-14 09:47:50,778] INFO: using attention_type=math
[2024-10-14 09:47:50,814] INFO: using attention_type=math
[2024-10-14 09:47:50,849] INFO: using attention_type=math
[2024-10-14 09:47:50,884] INFO: using attention_type=math
[2024-10-14 09:47:50,921] INFO: using attention_type=math
[2024-10-14 09:47:50,957] INFO: using attention_type=math
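The logged total of 89,388,050 trainable parameters is consistent with the layer shapes in the model printout and parameter table above. As a cross-check, a minimal sketch that re-derives the total with pure arithmetic (no torch; the grouping names `embed_block`, `attn_layer`, `head` are ours, chosen to mirror the printout):

```python
# Re-derive the logged parameter total from the printed layer shapes.
H, IN = 1024, 17  # hidden width and input feature dim from the printout

ln = 2 * H                                  # LayerNorm weight + bias
embed_block = (IN * H + H) + ln + (H * H + H)  # Linear(17,1024) + LN + Linear(1024,1024)
embeds = 4 * embed_block                    # 2 x nn0_id + 2 x nn0_reg

attn_layer = (3 * H * H + 3 * H             # mha in_proj weight + bias
              + H * H + H                   # mha out_proj weight + bias
              + 2 * ln                      # norm0 + norm1
              + 2 * (H * H + H))            # two Linear(1024,1024) in seq
convs = 12 * attn_layer                     # 6 conv_id + 6 conv_reg layers

def head(out_dim, copies=1):
    # Sequential: Linear(H,H) -> LayerNorm -> Linear(H,out_dim), per copy
    return copies * (H * H + H + ln + H * out_dim + out_dim)

heads = (head(2)                # nn_binary_particle
         + head(6)              # nn_pid
         + head(1, copies=2)    # nn_pt (one net per element type)
         + head(2)              # nn_eta
         + head(2)              # nn_sin_phi
         + head(2)              # nn_cos_phi
         + head(1, copies=2))   # nn_energy

total = embeds + convs + heads + 2 * ln     # + final_norm_id / final_norm_reg
print(total)  # 89388050, matching the logged "Trainable parameters"
```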
[2024-10-14 09:47:50,993] INFO: using attention_type=math [2024-10-14 09:47:51,029] INFO: using attention_type=math [2024-10-14 09:47:51,065] INFO: using attention_type=math [2024-10-14 09:47:51,102] INFO: using attention_type=math [2024-10-14 09:47:51,137] INFO: using attention_type=math [2024-10-14 09:47:58,146] INFO: Loaded model weights from experiments/pyg-clic_20241011_102451_167094/checkpoints/checkpoint-22-1.913142.pth [2024-10-14 09:47:59,962] INFO: DistributedDataParallel( (module): MLPF( (nn0_id): ModuleList( (0-1): 2 x Sequential( (0): Linear(in_features=17, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=1024, bias=True) ) ) (nn0_reg): ModuleList( (0-1): 2 x Sequential( (0): Linear(in_features=17, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=1024, bias=True) ) ) (conv_id): ModuleList( (0-5): 6 x PreLnSelfAttentionLayer( (mha): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True) ) (norm0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (seq): Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): Linear(in_features=1024, out_features=1024, bias=True) (3): GELU(approximate='none') ) (dropout): Dropout(p=0.0, inplace=False) ) ) (conv_reg): ModuleList( (0-5): 6 x PreLnSelfAttentionLayer( (mha): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=1024, out_features=1024, bias=True) ) (norm0): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (norm1): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (seq): Sequential( (0): 
Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): Linear(in_features=1024, out_features=1024, bias=True) (3): GELU(approximate='none') ) (dropout): Dropout(p=0.1, inplace=False) ) ) (nn_binary_particle): Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=2, bias=True) ) (nn_pid): Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=6, bias=True) ) (nn_pt): RegressionOutput( (nn): ModuleList( (0-1): 2 x Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=1, bias=True) ) ) ) (nn_eta): RegressionOutput( (nn): Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=2, bias=True) ) ) (nn_sin_phi): RegressionOutput( (nn): Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=2, bias=True) ) ) (nn_cos_phi): RegressionOutput( (nn): Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=2, bias=True) ) ) (nn_energy): RegressionOutput( 
(nn): ModuleList( (0-1): 2 x Sequential( (0): Linear(in_features=1024, out_features=1024, bias=True) (1): GELU(approximate='none') (2): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (3): Dropout(p=0.1, inplace=False) (4): Linear(in_features=1024, out_features=1, bias=True) ) ) ) (final_norm_id): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) (final_norm_reg): LayerNorm((1024,), eps=1e-05, elementwise_affine=True) ) ) [2024-10-14 09:47:59,965] INFO: Trainable parameters: 89388050 [2024-10-14 09:47:59,965] INFO: Non-trainable parameters: 0 [2024-10-14 09:47:59,965] INFO: Total parameters: 89388050 [2024-10-14 09:47:59,972] INFO: Modules Trainable parameters Non-trainable parameters module.nn0_id.0.0.weight 17408 0 module.nn0_id.0.0.bias 1024 0 module.nn0_id.0.2.weight 1024 0 module.nn0_id.0.2.bias 1024 0 module.nn0_id.0.4.weight 1048576 0 module.nn0_id.0.4.bias 1024 0 module.nn0_id.1.0.weight 17408 0 module.nn0_id.1.0.bias 1024 0 module.nn0_id.1.2.weight 1024 0 module.nn0_id.1.2.bias 1024 0 module.nn0_id.1.4.weight 1048576 0 module.nn0_id.1.4.bias 1024 0 module.nn0_reg.0.0.weight 17408 0 module.nn0_reg.0.0.bias 1024 0 module.nn0_reg.0.2.weight 1024 0 module.nn0_reg.0.2.bias 1024 0 module.nn0_reg.0.4.weight 1048576 0 module.nn0_reg.0.4.bias 1024 0 module.nn0_reg.1.0.weight 17408 0 module.nn0_reg.1.0.bias 1024 0 module.nn0_reg.1.2.weight 1024 0 module.nn0_reg.1.2.bias 1024 0 module.nn0_reg.1.4.weight 1048576 0 module.nn0_reg.1.4.bias 1024 0 module.conv_id.0.mha.in_proj_weight 3145728 0 module.conv_id.0.mha.in_proj_bias 3072 0 module.conv_id.0.mha.out_proj.weight 1048576 0 module.conv_id.0.mha.out_proj.bias 1024 0 module.conv_id.0.norm0.weight 1024 0 module.conv_id.0.norm0.bias 1024 0 module.conv_id.0.norm1.weight 1024 0 module.conv_id.0.norm1.bias 1024 0 module.conv_id.0.seq.0.weight 1048576 0 module.conv_id.0.seq.0.bias 1024 0 module.conv_id.0.seq.2.weight 1048576 0 module.conv_id.0.seq.2.bias 1024 0 module.conv_id.1.mha.in_proj_weight 3145728 0 
module.conv_id.1.mha.in_proj_bias 3072 0 module.conv_id.1.mha.out_proj.weight 1048576 0 module.conv_id.1.mha.out_proj.bias 1024 0 module.conv_id.1.norm0.weight 1024 0 module.conv_id.1.norm0.bias 1024 0 module.conv_id.1.norm1.weight 1024 0 module.conv_id.1.norm1.bias 1024 0 module.conv_id.1.seq.0.weight 1048576 0 module.conv_id.1.seq.0.bias 1024 0 module.conv_id.1.seq.2.weight 1048576 0 module.conv_id.1.seq.2.bias 1024 0 module.conv_id.2.mha.in_proj_weight 3145728 0 module.conv_id.2.mha.in_proj_bias 3072 0 module.conv_id.2.mha.out_proj.weight 1048576 0 module.conv_id.2.mha.out_proj.bias 1024 0 module.conv_id.2.norm0.weight 1024 0 module.conv_id.2.norm0.bias 1024 0 module.conv_id.2.norm1.weight 1024 0 module.conv_id.2.norm1.bias 1024 0 module.conv_id.2.seq.0.weight 1048576 0 module.conv_id.2.seq.0.bias 1024 0 module.conv_id.2.seq.2.weight 1048576 0 module.conv_id.2.seq.2.bias 1024 0 module.conv_id.3.mha.in_proj_weight 3145728 0 module.conv_id.3.mha.in_proj_bias 3072 0 module.conv_id.3.mha.out_proj.weight 1048576 0 module.conv_id.3.mha.out_proj.bias 1024 0 module.conv_id.3.norm0.weight 1024 0 module.conv_id.3.norm0.bias 1024 0 module.conv_id.3.norm1.weight 1024 0 module.conv_id.3.norm1.bias 1024 0 module.conv_id.3.seq.0.weight 1048576 0 module.conv_id.3.seq.0.bias 1024 0 module.conv_id.3.seq.2.weight 1048576 0 module.conv_id.3.seq.2.bias 1024 0 module.conv_id.4.mha.in_proj_weight 3145728 0 module.conv_id.4.mha.in_proj_bias 3072 0 module.conv_id.4.mha.out_proj.weight 1048576 0 module.conv_id.4.mha.out_proj.bias 1024 0 module.conv_id.4.norm0.weight 1024 0 module.conv_id.4.norm0.bias 1024 0 module.conv_id.4.norm1.weight 1024 0 module.conv_id.4.norm1.bias 1024 0 module.conv_id.4.seq.0.weight 1048576 0 module.conv_id.4.seq.0.bias 1024 0 module.conv_id.4.seq.2.weight 1048576 0 module.conv_id.4.seq.2.bias 1024 0 module.conv_id.5.mha.in_proj_weight 3145728 0 module.conv_id.5.mha.in_proj_bias 3072 0 module.conv_id.5.mha.out_proj.weight 1048576 0 
module.conv_id.5.mha.out_proj.bias 1024 0
module.conv_id.5.norm0.weight 1024 0
module.conv_id.5.norm0.bias 1024 0
module.conv_id.5.norm1.weight 1024 0
module.conv_id.5.norm1.bias 1024 0
module.conv_id.5.seq.0.weight 1048576 0
module.conv_id.5.seq.0.bias 1024 0
module.conv_id.5.seq.2.weight 1048576 0
module.conv_id.5.seq.2.bias 1024 0
module.conv_reg.0.mha.in_proj_weight 3145728 0
module.conv_reg.0.mha.in_proj_bias 3072 0
module.conv_reg.0.mha.out_proj.weight 1048576 0
module.conv_reg.0.mha.out_proj.bias 1024 0
module.conv_reg.0.norm0.weight 1024 0
module.conv_reg.0.norm0.bias 1024 0
module.conv_reg.0.norm1.weight 1024 0
module.conv_reg.0.norm1.bias 1024 0
module.conv_reg.0.seq.0.weight 1048576 0
module.conv_reg.0.seq.0.bias 1024 0
module.conv_reg.0.seq.2.weight 1048576 0
module.conv_reg.0.seq.2.bias 1024 0
module.conv_reg.1.mha.in_proj_weight 3145728 0
module.conv_reg.1.mha.in_proj_bias 3072 0
module.conv_reg.1.mha.out_proj.weight 1048576 0
module.conv_reg.1.mha.out_proj.bias 1024 0
module.conv_reg.1.norm0.weight 1024 0
module.conv_reg.1.norm0.bias 1024 0
module.conv_reg.1.norm1.weight 1024 0
module.conv_reg.1.norm1.bias 1024 0
module.conv_reg.1.seq.0.weight 1048576 0
module.conv_reg.1.seq.0.bias 1024 0
module.conv_reg.1.seq.2.weight 1048576 0
module.conv_reg.1.seq.2.bias 1024 0
module.conv_reg.2.mha.in_proj_weight 3145728 0
module.conv_reg.2.mha.in_proj_bias 3072 0
module.conv_reg.2.mha.out_proj.weight 1048576 0
module.conv_reg.2.mha.out_proj.bias 1024 0
module.conv_reg.2.norm0.weight 1024 0
module.conv_reg.2.norm0.bias 1024 0
module.conv_reg.2.norm1.weight 1024 0
module.conv_reg.2.norm1.bias 1024 0
module.conv_reg.2.seq.0.weight 1048576 0
module.conv_reg.2.seq.0.bias 1024 0
module.conv_reg.2.seq.2.weight 1048576 0
module.conv_reg.2.seq.2.bias 1024 0
module.conv_reg.3.mha.in_proj_weight 3145728 0
module.conv_reg.3.mha.in_proj_bias 3072 0
module.conv_reg.3.mha.out_proj.weight 1048576 0
module.conv_reg.3.mha.out_proj.bias 1024 0
module.conv_reg.3.norm0.weight 1024 0
module.conv_reg.3.norm0.bias 1024 0
module.conv_reg.3.norm1.weight 1024 0
module.conv_reg.3.norm1.bias 1024 0
module.conv_reg.3.seq.0.weight 1048576 0
module.conv_reg.3.seq.0.bias 1024 0
module.conv_reg.3.seq.2.weight 1048576 0
module.conv_reg.3.seq.2.bias 1024 0
module.conv_reg.4.mha.in_proj_weight 3145728 0
module.conv_reg.4.mha.in_proj_bias 3072 0
module.conv_reg.4.mha.out_proj.weight 1048576 0
module.conv_reg.4.mha.out_proj.bias 1024 0
module.conv_reg.4.norm0.weight 1024 0
module.conv_reg.4.norm0.bias 1024 0
module.conv_reg.4.norm1.weight 1024 0
module.conv_reg.4.norm1.bias 1024 0
module.conv_reg.4.seq.0.weight 1048576 0
module.conv_reg.4.seq.0.bias 1024 0
module.conv_reg.4.seq.2.weight 1048576 0
module.conv_reg.4.seq.2.bias 1024 0
module.conv_reg.5.mha.in_proj_weight 3145728 0
module.conv_reg.5.mha.in_proj_bias 3072 0
module.conv_reg.5.mha.out_proj.weight 1048576 0
module.conv_reg.5.mha.out_proj.bias 1024 0
module.conv_reg.5.norm0.weight 1024 0
module.conv_reg.5.norm0.bias 1024 0
module.conv_reg.5.norm1.weight 1024 0
module.conv_reg.5.norm1.bias 1024 0
module.conv_reg.5.seq.0.weight 1048576 0
module.conv_reg.5.seq.0.bias 1024 0
module.conv_reg.5.seq.2.weight 1048576 0
module.conv_reg.5.seq.2.bias 1024 0
module.nn_binary_particle.0.weight 1048576 0
module.nn_binary_particle.0.bias 1024 0
module.nn_binary_particle.2.weight 1024 0
module.nn_binary_particle.2.bias 1024 0
module.nn_binary_particle.4.weight 2048 0
module.nn_binary_particle.4.bias 2 0
module.nn_pid.0.weight 1048576 0
module.nn_pid.0.bias 1024 0
module.nn_pid.2.weight 1024 0
module.nn_pid.2.bias 1024 0
module.nn_pid.4.weight 6144 0
module.nn_pid.4.bias 6 0
module.nn_pt.nn.0.0.weight 1048576 0
module.nn_pt.nn.0.0.bias 1024 0
module.nn_pt.nn.0.2.weight 1024 0
module.nn_pt.nn.0.2.bias 1024 0
module.nn_pt.nn.0.4.weight 1024 0
module.nn_pt.nn.0.4.bias 1 0
module.nn_pt.nn.1.0.weight 1048576 0
module.nn_pt.nn.1.0.bias 1024 0
module.nn_pt.nn.1.2.weight 1024 0
module.nn_pt.nn.1.2.bias 1024 0
module.nn_pt.nn.1.4.weight 1024 0
module.nn_pt.nn.1.4.bias 1 0
module.nn_eta.nn.0.weight 1048576 0
module.nn_eta.nn.0.bias 1024 0
module.nn_eta.nn.2.weight 1024 0
module.nn_eta.nn.2.bias 1024 0
module.nn_eta.nn.4.weight 2048 0
module.nn_eta.nn.4.bias 2 0
module.nn_sin_phi.nn.0.weight 1048576 0
module.nn_sin_phi.nn.0.bias 1024 0
module.nn_sin_phi.nn.2.weight 1024 0
module.nn_sin_phi.nn.2.bias 1024 0
module.nn_sin_phi.nn.4.weight 2048 0
module.nn_sin_phi.nn.4.bias 2 0
module.nn_cos_phi.nn.0.weight 1048576 0
module.nn_cos_phi.nn.0.bias 1024 0
module.nn_cos_phi.nn.2.weight 1024 0
module.nn_cos_phi.nn.2.bias 1024 0
module.nn_cos_phi.nn.4.weight 2048 0
module.nn_cos_phi.nn.4.bias 2 0
module.nn_energy.nn.0.0.weight 1048576 0
module.nn_energy.nn.0.0.bias 1024 0
module.nn_energy.nn.0.2.weight 1024 0
module.nn_energy.nn.0.2.bias 1024 0
module.nn_energy.nn.0.4.weight 1024 0
module.nn_energy.nn.0.4.bias 1 0
module.nn_energy.nn.1.0.weight 1048576 0
module.nn_energy.nn.1.0.bias 1024 0
module.nn_energy.nn.1.2.weight 1024 0
module.nn_energy.nn.1.2.bias 1024 0
module.nn_energy.nn.1.4.weight 1024 0
module.nn_energy.nn.1.4.bias 1 0
module.final_norm_id.weight 1024 0
module.final_norm_id.bias 1024 0
module.final_norm_reg.weight 1024 0
module.final_norm_reg.bias 1024 0
[2024-10-14 09:47:59,977] INFO: Creating experiment dir experiments/pyg-clic_20241011_102451_167094
[2024-10-14 09:47:59,978] INFO: Model directory experiments/pyg-clic_20241011_102451_167094
[2024-10-14 09:48:00,140] INFO: train_dataset: clic_edm_qq_pf, 3598296
[2024-10-14 09:48:00,234] INFO: train_dataset: clic_edm_ttbar_pf, 7139800
[2024-10-14 09:48:00,332] INFO: train_dataset: clic_edm_ww_fullhad_pf, 3600900
[2024-10-14 09:48:35,829] INFO: valid_dataset: clic_edm_qq_pf, 399822
[2024-10-14 09:48:35,938] INFO: valid_dataset: clic_edm_ttbar_pf, 793400
[2024-10-14 09:48:35,965] INFO: valid_dataset: clic_edm_ww_fullhad_pf, 400100
[2024-10-14 09:48:46,758] INFO: Initiating epoch #23 train run on device rank=0
[2024-10-14 13:00:19,246] INFO: Initiating epoch #23 valid run on device rank=0
[2024-10-14 13:07:20,574] INFO: Rank 0: epoch=23 / 30 train_loss=1.8128 valid_loss=1.9104 stale=0 epoch_train_time=191.54m epoch_valid_time=6.91m epoch_total_time=198.56m eta=60.4m
[2024-10-14 13:07:20,598] INFO: Initiating epoch #24 train run on device rank=0
[2024-10-14 16:18:48,042] INFO: Initiating epoch #24 valid run on device rank=0
[2024-10-14 16:25:50,924] INFO: Rank 0: epoch=24 / 30 train_loss=1.8063 valid_loss=1.9085 stale=0 epoch_train_time=191.46m epoch_valid_time=6.93m epoch_total_time=198.51m eta=99.3m
[2024-10-14 16:25:50,944] INFO: Initiating epoch #25 train run on device rank=0
[2024-10-14 19:37:17,620] INFO: Initiating epoch #25 valid run on device rank=0
[2024-10-14 19:44:18,373] INFO: Rank 0: epoch=25 / 30 train_loss=1.8005 valid_loss=1.9052 stale=0 epoch_train_time=191.44m epoch_valid_time=6.91m epoch_total_time=198.46m eta=119.1m
[2024-10-14 19:44:18,398] INFO: Initiating epoch #26 train run on device rank=0
[2024-10-14 22:55:38,092] INFO: Initiating epoch #26 valid run on device rank=0
[2024-10-14 23:02:41,805] INFO: Rank 0: epoch=26 / 30 train_loss=1.7954 valid_loss=1.9043 stale=0 epoch_train_time=191.33m epoch_valid_time=6.95m epoch_total_time=198.39m eta=122.1m
[2024-10-14 23:02:41,835] INFO: Initiating epoch #27 train run on device rank=0
[2024-10-15 02:14:30,888] INFO: Initiating epoch #27 valid run on device rank=0
[2024-10-15 02:21:32,802] INFO: Rank 0: epoch=27 / 30 train_loss=1.7909 valid_loss=1.9021 stale=0 epoch_train_time=191.82m epoch_valid_time=6.92m epoch_total_time=198.85m eta=110.3m
[2024-10-15 02:21:32,828] INFO: Initiating epoch #28 train run on device rank=0
[2024-10-15 05:32:29,453] INFO: Initiating epoch #28 valid run on device rank=0
[2024-10-15 05:39:31,630] INFO: Rank 0: epoch=28 / 30 train_loss=1.7872 valid_loss=1.9020 stale=0 epoch_train_time=190.94m epoch_valid_time=6.92m epoch_total_time=197.98m eta=85.1m
[2024-10-15 05:39:31,652] INFO: Initiating epoch #29 train run on device rank=0
[2024-10-15 08:51:11,754] INFO: Initiating epoch #29 valid run on device rank=0
[2024-10-15 08:58:11,804] INFO: Rank 0: epoch=29 / 30 train_loss=1.7842 valid_loss=1.9017 stale=0 epoch_train_time=191.67m epoch_valid_time=6.89m epoch_total_time=198.67m eta=47.9m
[2024-10-15 08:58:11,830] INFO: Initiating epoch #30 train run on device rank=0
[2024-10-15 12:09:34,952] INFO: Initiating epoch #30 valid run on device rank=0
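As an aside, the per-epoch summary lines above follow a regular `key=value` format, so the loss curve can be recovered from the raw log with a few lines of standard-library Python. This is a minimal sketch, not part of the training code; the regex is an assumption based on the line format shown in this log:

```python
import re

# Matches the "Rank 0: epoch=N / M train_loss=X valid_loss=Y ..." summary
# lines; assumes the format stays as shown in the log above.
EPOCH_RE = re.compile(
    r"epoch=(?P<epoch>\d+) / \d+ "
    r"train_loss=(?P<train>[\d.]+) valid_loss=(?P<valid>[\d.]+)"
)

def parse_epoch_summaries(lines):
    """Return (epoch, train_loss, valid_loss) tuples from log lines."""
    out = []
    for line in lines:
        m = EPOCH_RE.search(line)
        if m:
            out.append((int(m.group("epoch")),
                        float(m.group("train")),
                        float(m.group("valid"))))
    return out

sample = [
    "[2024-10-14 13:07:20,574] INFO: Rank 0: epoch=23 / 30 "
    "train_loss=1.8128 valid_loss=1.9104 stale=0 epoch_train_time=191.54m",
]
print(parse_epoch_summaries(sample))  # → [(23, 1.8128, 1.9104)]
```

Lines that do not match the pattern (the "Initiating epoch" and dataset lines) are simply skipped.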