```
pyg-clic_20250204_081614_352844 - legacy GNNLSH model, full dataset, 2 epochs / ~90 hours
pyg-clic_20250211_145811_219129 - transformer, full dataset, 2 epochs / ~45 hours
pyg-clic_20250208_095515_010468 - transformer + flash attention, 1M events from each dataset, 10 epochs / ~12 hours
pyg-clic_20250209_100514_187330 - transformer + flash attention, 4M events from each dataset, 10 epochs / ~40 hours

#main training
pyg-clic_20250130_214007_333962 - transformer + flash attention, full dataset, 10 epochs / ~80 hours, 1st run
pyg-clic_20250306_105311_290722 - transformer + flash attention, full dataset, 10 epochs / ~80 hours, 2nd run
pyg-clic_20250309_173756_957486 - transformer + flash attention, full dataset, 10 epochs / ~80 hours, 3rd run

#multi-GPU tests
largebatch_study_gpus4_notscaledLR0.0001_epochs30_bsm256_adamw_a100_cu124_fulldataset_pyg-clic-v230_20250219_055135_172489 - run on 4x GPUs, learning rate not scaled
largebatch_study_gpus4_linearscaledLR0.0004_epochs30_bsm256_adamw_a100_cu124_fulldataset_pyg-clic-v230_20250217_082738_406721 - run on 4x GPUs, learning rate scaled by 4x
largebatch_clic_wd3eneg2_gpus4_lr4eneg4_epochs10_pyg-clic-v230_adamw_tunedweightdecay_20250314_085408_738888 - run on 4x GPUs, learning rate scaled by 4x, weight decay scaled by 3x
```
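
For reference, the learning-rate handling behind the multi-GPU runs follows the linear scaling rule. The snippet below is an illustrative sketch only, not the project's actual training script: the names `BASE_LR` and `num_gpus` are assumed here, and the values are taken from the run names above (`notscaledLR0.0001`, `linearscaledLR0.0004`, `wd3eneg2`).

```
# Illustrative sketch of the linear LR scaling rule used in the multi-GPU
# runs above; BASE_LR and num_gpus are assumed names, not flags of the
# actual training script.
import torch

BASE_LR = 1e-4  # single-GPU learning rate (the notscaledLR0.0001 baseline)

# One process per GPU under data-parallel training; fall back to 1 on CPU.
num_gpus = max(torch.cuda.device_count(), 1)

# Linear scaling rule: N GPUs grow the effective batch size by N, so the
# learning rate is grown by N as well (0.0001 -> 0.0004 on 4 GPUs).
lr = BASE_LR * num_gpus

model = torch.nn.Linear(16, 16)  # stand-in for the actual transformer model
# weight_decay=3e-2 corresponds to the tuned wd3eneg2 run (3x a 1e-2 baseline).
optimizer = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=3e-2)
```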