[2025-07-03 20:03:12,030][17717] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-03 20:03:12,030][17717] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-07-03 20:03:12,191][17717] Num visible devices: 1
[2025-07-03 20:03:12,217][17717] Starting seed is not provided
[2025-07-03 20:03:12,221][17717] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-03 20:03:12,222][17717] Initializing actor-critic model on device cuda:0
[2025-07-03 20:03:12,223][17717] RunningMeanStd input shape: (3, 72, 128)
[2025-07-03 20:03:12,239][17717] RunningMeanStd input shape: (1,)
[2025-07-03 20:03:12,558][17717] ConvEncoder: input_channels=3
[2025-07-03 20:03:12,933][17752] Worker 14 uses CPU cores [0]
[2025-07-03 20:03:13,177][17743] Worker 2 uses CPU cores [0]
[2025-07-03 20:03:13,301][17742] Worker 6 uses CPU cores [0]
[2025-07-03 20:03:13,482][17737] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-03 20:03:13,482][17737] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-07-03 20:03:13,595][17747] Worker 7 uses CPU cores [1]
[2025-07-03 20:03:13,590][17737] Num visible devices: 1
[2025-07-03 20:03:13,600][17739] Worker 1 uses CPU cores [1]
[2025-07-03 20:03:13,609][17749] Worker 9 uses CPU cores [1]
[2025-07-03 20:03:13,638][17746] Worker 11 uses CPU cores [1]
[2025-07-03 20:03:13,656][17740] Worker 3 uses CPU cores [1]
[2025-07-03 20:03:13,741][17738] Worker 0 uses CPU cores [0]
[2025-07-03 20:03:13,758][17748] Worker 12 uses CPU cores [0]
[2025-07-03 20:03:13,789][17762] Worker 18 uses CPU cores [0]
[2025-07-03 20:03:13,792][17750] Worker 8 uses CPU cores [0]
[2025-07-03 20:03:13,825][17745] Worker 10 uses CPU cores [0]
[2025-07-03 20:03:13,842][17744] Worker 5 uses CPU cores [1]
[2025-07-03 20:03:13,843][17759] Worker 15 uses CPU cores [1]
[2025-07-03 20:03:13,933][17761] Worker 17 uses CPU cores [1]
[2025-07-03 20:03:13,936][17763] Worker 19 uses CPU cores [1]
[2025-07-03 20:03:13,942][17741] Worker 4 uses CPU cores [0]
[2025-07-03 20:03:13,948][17760] Worker 16 uses CPU cores [0]
[2025-07-03 20:03:13,978][17751] Worker 13 uses CPU cores [1]
[2025-07-03 20:03:13,992][17717] Conv encoder output size: 512
[2025-07-03 20:03:13,992][17717] Policy head output size: 512
[2025-07-03 20:03:14,008][17717] Created Actor Critic model with architecture:
[2025-07-03 20:03:14,008][17717] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-07-03 20:03:14,258][17717] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-07-03 20:03:15,767][17717] No checkpoints found
[2025-07-03 20:03:15,767][17717] Did not load from checkpoint, starting from scratch!
[2025-07-03 20:03:15,768][17717] Initialized policy 0 weights for model version 0
[2025-07-03 20:03:15,771][17717] LearnerWorker_p0 finished initialization!
[2025-07-03 20:03:15,773][17717] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-03 20:03:16,012][17737] RunningMeanStd input shape: (3, 72, 128)
[2025-07-03 20:03:16,014][17737] RunningMeanStd input shape: (1,)
[2025-07-03 20:03:16,027][17737] ConvEncoder: input_channels=3
[2025-07-03 20:03:16,138][17737] Conv encoder output size: 512
[2025-07-03 20:03:16,139][17737] Policy head output size: 512
[2025-07-03 20:03:16,866][17762] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,878][17743] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,893][17749] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,863][17747] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,890][17745] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,905][17740] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,916][17759] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,933][17741] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,968][17761] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,980][17739] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,972][17748] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,990][17744] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:17,001][17746] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:17,002][17742] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:16,995][17752] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:17,009][17738] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:17,028][17763] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:17,040][17751] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:17,036][17760] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:17,123][17750] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-03 20:03:20,181][17745] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,189][17743] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,187][17748] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,190][17762] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,402][17759] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,414][17747] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,439][17749] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,653][17751] Decorrelating experience for 0 frames...
[2025-07-03 20:03:20,673][17744] Decorrelating experience for 0 frames...
[2025-07-03 20:03:22,828][17763] Decorrelating experience for 0 frames...
[2025-07-03 20:03:22,902][17743] Decorrelating experience for 32 frames...
[2025-07-03 20:03:22,900][17752] Decorrelating experience for 0 frames...
[2025-07-03 20:03:22,921][17745] Decorrelating experience for 32 frames...
[2025-07-03 20:03:22,885][17741] Decorrelating experience for 0 frames...
[2025-07-03 20:03:22,970][17747] Decorrelating experience for 32 frames...
[2025-07-03 20:03:22,972][17740] Decorrelating experience for 0 frames...
[2025-07-03 20:03:23,375][17760] Decorrelating experience for 0 frames...
[2025-07-03 20:03:23,613][17750] Decorrelating experience for 0 frames...
[2025-07-03 20:03:25,085][17746] Decorrelating experience for 0 frames...
[2025-07-03 20:03:25,173][17763] Decorrelating experience for 32 frames...
[2025-07-03 20:03:25,368][17744] Decorrelating experience for 32 frames...
[2025-07-03 20:03:25,801][17744] Decorrelating experience for 64 frames...
[2025-07-03 20:03:25,881][17762] Decorrelating experience for 32 frames...
[2025-07-03 20:03:25,897][17742] Decorrelating experience for 0 frames...
[2025-07-03 20:03:25,894][17738] Decorrelating experience for 0 frames...
[2025-07-03 20:03:25,916][17752] Decorrelating experience for 32 frames...
[2025-07-03 20:03:26,181][17760] Decorrelating experience for 32 frames...
[2025-07-03 20:03:27,197][17743] Decorrelating experience for 64 frames...
[2025-07-03 20:03:27,346][17741] Decorrelating experience for 32 frames...
[2025-07-03 20:03:27,472][17744] Decorrelating experience for 96 frames...
[2025-07-03 20:03:27,647][17763] Decorrelating experience for 64 frames...
[2025-07-03 20:03:27,689][17747] Decorrelating experience for 64 frames...
[2025-07-03 20:03:27,783][17759] Decorrelating experience for 32 frames...
[2025-07-03 20:03:27,879][17738] Decorrelating experience for 32 frames...
[2025-07-03 20:03:28,172][17750] Decorrelating experience for 32 frames...
[2025-07-03 20:03:29,076][17740] Decorrelating experience for 32 frames...
[2025-07-03 20:03:29,252][17739] Decorrelating experience for 0 frames...
[2025-07-03 20:03:29,419][17746] Decorrelating experience for 32 frames...
[2025-07-03 20:03:29,760][17760] Decorrelating experience for 64 frames...
[2025-07-03 20:03:29,979][17743] Decorrelating experience for 96 frames...
[2025-07-03 20:03:30,110][17752] Decorrelating experience for 64 frames...
[2025-07-03 20:03:30,335][17748] Decorrelating experience for 32 frames...
[2025-07-03 20:03:30,396][17745] Decorrelating experience for 64 frames...
[2025-07-03 20:03:30,455][17749] Decorrelating experience for 32 frames...
[2025-07-03 20:03:30,839][17750] Decorrelating experience for 64 frames...
[2025-07-03 20:03:30,952][17747] Decorrelating experience for 96 frames...
[2025-07-03 20:03:31,619][17739] Decorrelating experience for 32 frames...
[2025-07-03 20:03:31,634][17738] Decorrelating experience for 64 frames...
[2025-07-03 20:03:32,293][17744] Decorrelating experience for 128 frames...
[2025-07-03 20:03:32,639][17763] Decorrelating experience for 96 frames...
[2025-07-03 20:03:32,743][17742] Decorrelating experience for 32 frames...
[2025-07-03 20:03:32,759][17740] Decorrelating experience for 64 frames...
[2025-07-03 20:03:33,404][17752] Decorrelating experience for 96 frames...
[2025-07-03 20:03:33,511][17741] Decorrelating experience for 64 frames...
[2025-07-03 20:03:33,611][17750] Decorrelating experience for 96 frames...
[2025-07-03 20:03:33,935][17743] Decorrelating experience for 128 frames...
[2025-07-03 20:03:33,974][17746] Decorrelating experience for 64 frames...
[2025-07-03 20:03:33,977][17749] Decorrelating experience for 64 frames...
[2025-07-03 20:03:34,917][17751] Decorrelating experience for 32 frames...
[2025-07-03 20:03:34,935][17738] Decorrelating experience for 96 frames...
[2025-07-03 20:03:35,273][17762] Decorrelating experience for 64 frames...
[2025-07-03 20:03:35,299][17748] Decorrelating experience for 64 frames...
[2025-07-03 20:03:35,933][17747] Decorrelating experience for 128 frames...
[2025-07-03 20:03:36,143][17740] Decorrelating experience for 96 frames...
[2025-07-03 20:03:36,828][17739] Decorrelating experience for 64 frames...
[2025-07-03 20:03:37,067][17746] Decorrelating experience for 96 frames...
[2025-07-03 20:03:38,368][17761] Decorrelating experience for 0 frames...
[2025-07-03 20:03:38,481][17751] Decorrelating experience for 64 frames...
[2025-07-03 20:03:38,693][17762] Decorrelating experience for 96 frames...
[2025-07-03 20:03:38,705][17759] Decorrelating experience for 64 frames...
[2025-07-03 20:03:38,974][17760] Decorrelating experience for 96 frames...
[2025-07-03 20:03:39,930][17747] Decorrelating experience for 160 frames...
[2025-07-03 20:03:40,181][17749] Decorrelating experience for 96 frames...
[2025-07-03 20:03:40,283][17738] Decorrelating experience for 128 frames...
[2025-07-03 20:03:41,380][17748] Decorrelating experience for 96 frames...
[2025-07-03 20:03:41,487][17763] Decorrelating experience for 128 frames...
[2025-07-03 20:03:41,862][17744] Decorrelating experience for 160 frames...
[2025-07-03 20:03:41,912][17745] Decorrelating experience for 96 frames...
[2025-07-03 20:03:42,163][17746] Decorrelating experience for 128 frames...
[2025-07-03 20:03:43,134][17739] Decorrelating experience for 96 frames...
[2025-07-03 20:03:43,299][17741] Decorrelating experience for 96 frames...
[2025-07-03 20:03:43,515][17760] Decorrelating experience for 128 frames...
[2025-07-03 20:03:44,153][17747] Decorrelating experience for 192 frames...
[2025-07-03 20:03:44,424][17759] Decorrelating experience for 96 frames...
[2025-07-03 20:03:44,842][17749] Decorrelating experience for 128 frames...
[2025-07-03 20:03:45,225][17743] Decorrelating experience for 160 frames...
[2025-07-03 20:03:45,619][17746] Decorrelating experience for 160 frames...
[2025-07-03 20:03:46,342][17744] Decorrelating experience for 192 frames...
[2025-07-03 20:03:46,379][17760] Decorrelating experience for 160 frames...
[2025-07-03 20:03:47,029][17738] Decorrelating experience for 160 frames...
[2025-07-03 20:03:47,946][17751] Decorrelating experience for 96 frames...
[2025-07-03 20:03:48,678][17745] Decorrelating experience for 128 frames...
[2025-07-03 20:03:48,835][17743] Decorrelating experience for 192 frames...
[2025-07-03 20:03:49,036][17759] Decorrelating experience for 128 frames...
[2025-07-03 20:03:49,395][17744] Decorrelating experience for 224 frames...
[2025-07-03 20:03:49,398][17763] Decorrelating experience for 160 frames...
[2025-07-03 20:03:49,410][17741] Decorrelating experience for 128 frames...
[2025-07-03 20:03:50,010][17760] Decorrelating experience for 192 frames...
[2025-07-03 20:03:50,917][17762] Decorrelating experience for 128 frames...
[2025-07-03 20:03:51,017][17747] Decorrelating experience for 224 frames...
[2025-07-03 20:03:51,019][17739] Decorrelating experience for 128 frames...
[2025-07-03 20:03:52,110][17748] Decorrelating experience for 128 frames...
[2025-07-03 20:03:52,174][17745] Decorrelating experience for 160 frames...
[2025-07-03 20:03:52,941][17742] Another process currently holds the lock /tmp/sf2_root/doom_004.lockfile, attempt: 1
[2025-07-03 20:03:53,224][17741] Decorrelating experience for 160 frames...
[2025-07-03 20:03:54,066][17760] Decorrelating experience for 224 frames...
[2025-07-03 20:03:54,157][17752] Decorrelating experience for 128 frames...
[2025-07-03 20:03:54,261][17750] Decorrelating experience for 128 frames...
[2025-07-03 20:03:54,310][17743] Decorrelating experience for 224 frames...
[2025-07-03 20:03:56,001][17746] Decorrelating experience for 192 frames...
[2025-07-03 20:03:58,383][17761] Another process currently holds the lock /tmp/sf2_root/doom_004.lockfile, attempt: 1
[2025-07-03 20:04:00,165][17740] Decorrelating experience for 128 frames...
[2025-07-03 20:04:01,326][17717] Signal inference workers to stop experience collection...
[2025-07-03 20:04:01,368][17737] InferenceWorker_p0-w0: stopping experience collection
[2025-07-03 20:04:01,435][17745] Decorrelating experience for 192 frames...
[2025-07-03 20:04:01,571][17742] Decorrelating experience for 64 frames...
[2025-07-03 20:04:01,660][17748] Decorrelating experience for 160 frames...
[2025-07-03 20:04:01,887][17717] Signal inference workers to resume experience collection...
[2025-07-03 20:04:01,891][17737] InferenceWorker_p0-w0: resuming experience collection
[2025-07-03 20:04:02,426][17752] Decorrelating experience for 160 frames...
[2025-07-03 20:04:03,224][17750] Decorrelating experience for 160 frames...
[2025-07-03 20:04:03,348][17751] Decorrelating experience for 128 frames...
[2025-07-03 20:04:03,974][17740] Decorrelating experience for 160 frames...
[2025-07-03 20:04:05,082][17739] Decorrelating experience for 160 frames...
[2025-07-03 20:04:05,277][17761] Decorrelating experience for 32 frames...
[2025-07-03 20:04:05,470][17749] Another process currently holds the lock /tmp/sf2_root/doom_002.lockfile, attempt: 1
[2025-07-03 20:04:05,495][17746] Decorrelating experience for 224 frames...
[2025-07-03 20:04:06,403][17742] Decorrelating experience for 96 frames...
[2025-07-03 20:04:06,966][17745] Decorrelating experience for 224 frames...
[2025-07-03 20:04:06,970][17748] Decorrelating experience for 192 frames...
[2025-07-03 20:04:07,400][17741] Decorrelating experience for 192 frames...
[2025-07-03 20:04:07,717][17759] Decorrelating experience for 160 frames...
[2025-07-03 20:04:07,860][17738] Another process currently holds the lock /tmp/sf2_root/doom_002.lockfile, attempt: 1
[2025-07-03 20:04:08,496][17751] Decorrelating experience for 160 frames...
[2025-07-03 20:04:09,424][17752] Decorrelating experience for 192 frames...
[2025-07-03 20:04:09,477][17740] Decorrelating experience for 192 frames...
[2025-07-03 20:04:09,998][17763] Decorrelating experience for 192 frames...
[2025-07-03 20:04:11,103][17761] Decorrelating experience for 64 frames...
[2025-07-03 20:04:12,678][17749] Decorrelating experience for 160 frames...
[2025-07-03 20:04:12,765][17762] Decorrelating experience for 160 frames...
[2025-07-03 20:04:13,744][17759] Decorrelating experience for 192 frames...
[2025-07-03 20:04:15,679][17741] Decorrelating experience for 224 frames...
[2025-07-03 20:04:16,549][17748] Decorrelating experience for 224 frames...
[2025-07-03 20:04:17,352][17751] Decorrelating experience for 192 frames...
[2025-07-03 20:04:17,681][17752] Decorrelating experience for 224 frames...
[2025-07-03 20:04:19,025][17761] Decorrelating experience for 96 frames...
[2025-07-03 20:04:20,578][17749] Decorrelating experience for 192 frames...
[2025-07-03 20:04:22,057][17737] Updated weights for policy 0, policy_version 10 (0.0276)
[2025-07-03 20:04:22,537][17740] Decorrelating experience for 224 frames...
[2025-07-03 20:04:23,239][17739] Decorrelating experience for 192 frames...
[2025-07-03 20:04:23,688][17762] Decorrelating experience for 192 frames...
[2025-07-03 20:04:24,257][17750] Another process currently holds the lock /tmp/sf2_root/doom_002.lockfile, attempt: 1
[2025-07-03 20:04:24,406][17763] Decorrelating experience for 224 frames...
[2025-07-03 20:04:25,475][17751] Decorrelating experience for 224 frames...
[2025-07-03 20:04:27,115][17738] Decorrelating experience for 192 frames...
[2025-07-03 20:04:28,555][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000013_53248.pth...
[2025-07-03 20:04:28,875][17761] Decorrelating experience for 128 frames...
[2025-07-03 20:04:29,597][17759] Decorrelating experience for 224 frames...
[2025-07-03 20:04:30,545][17739] Decorrelating experience for 224 frames...
[2025-07-03 20:04:31,146][17762] Decorrelating experience for 224 frames...
[2025-07-03 20:04:32,897][17742] Decorrelating experience for 128 frames...
[2025-07-03 20:04:37,729][17738] Decorrelating experience for 224 frames...
[2025-07-03 20:04:38,600][17737] Updated weights for policy 0, policy_version 20 (0.0045)
[2025-07-03 20:04:40,298][17761] Decorrelating experience for 160 frames...
[2025-07-03 20:04:42,432][17750] Decorrelating experience for 192 frames...
[2025-07-03 20:04:43,999][17742] Decorrelating experience for 160 frames...
[2025-07-03 20:04:48,549][17717] Saving new best policy, reward=3.755!
[2025-07-03 20:04:49,455][17749] Decorrelating experience for 224 frames...
[2025-07-03 20:04:49,768][17761] Decorrelating experience for 192 frames...
[2025-07-03 20:04:50,601][17737] Updated weights for policy 0, policy_version 30 (0.0014)
[2025-07-03 20:04:52,254][17742] Decorrelating experience for 192 frames...
[2025-07-03 20:04:53,548][17717] Saving new best policy, reward=4.266!
[2025-07-03 20:04:57,796][17750] Decorrelating experience for 224 frames...
[2025-07-03 20:04:58,552][17717] Saving new best policy, reward=4.387!
[2025-07-03 20:05:02,071][17761] Decorrelating experience for 224 frames...
[2025-07-03 20:05:03,556][17717] Saving new best policy, reward=4.464!
[2025-07-03 20:05:05,739][17737] Updated weights for policy 0, policy_version 40 (0.0027)
[2025-07-03 20:05:09,085][17742] Decorrelating experience for 224 frames...
[2025-07-03 20:05:20,019][17737] Updated weights for policy 0, policy_version 50 (0.0016)
[2025-07-03 20:05:31,514][17737] Updated weights for policy 0, policy_version 60 (0.0029)
[2025-07-03 20:05:43,319][17737] Updated weights for policy 0, policy_version 70 (0.0014)
[2025-07-03 20:05:53,130][17737] Updated weights for policy 0, policy_version 80 (0.0013)
[2025-07-03 20:05:53,547][17717] Saving new best policy, reward=4.487!
[2025-07-03 20:05:58,556][17717] Saving new best policy, reward=4.542!
[2025-07-03 20:06:03,548][17717] Saving new best policy, reward=4.547!
[2025-07-03 20:06:04,808][17737] Updated weights for policy 0, policy_version 90 (0.0014)
[2025-07-03 20:06:14,436][17737] Updated weights for policy 0, policy_version 100 (0.0014)
[2025-07-03 20:06:18,547][17717] Saving new best policy, reward=4.705!
[2025-07-03 20:06:26,335][17737] Updated weights for policy 0, policy_version 110 (0.0014)
[2025-07-03 20:06:28,695][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000111_454656.pth...
[2025-07-03 20:06:29,005][17717] Saving new best policy, reward=4.720!
[2025-07-03 20:06:36,386][17737] Updated weights for policy 0, policy_version 120 (0.0016)
[2025-07-03 20:06:48,824][17737] Updated weights for policy 0, policy_version 130 (0.0019)
[2025-07-03 20:06:53,544][17717] Saving new best policy, reward=4.881!
[2025-07-03 20:06:57,661][17737] Updated weights for policy 0, policy_version 140 (0.0020)
[2025-07-03 20:07:10,843][17737] Updated weights for policy 0, policy_version 150 (0.0014)
[2025-07-03 20:07:19,403][17737] Updated weights for policy 0, policy_version 160 (0.0014)
[2025-07-03 20:07:32,239][17737] Updated weights for policy 0, policy_version 170 (0.0013)
[2025-07-03 20:07:41,231][17737] Updated weights for policy 0, policy_version 180 (0.0020)
[2025-07-03 20:07:53,562][17737] Updated weights for policy 0, policy_version 190 (0.0016)
[2025-07-03 20:08:05,668][17737] Updated weights for policy 0, policy_version 200 (0.0014)
[2025-07-03 20:08:18,971][17737] Updated weights for policy 0, policy_version 210 (0.0018)
[2025-07-03 20:08:28,550][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000219_897024.pth...
[2025-07-03 20:08:28,891][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000013_53248.pth
[2025-07-03 20:08:29,089][17737] Updated weights for policy 0, policy_version 220 (0.0016)
[2025-07-03 20:08:33,544][17717] Saving new best policy, reward=4.953!
[2025-07-03 20:08:40,910][17737] Updated weights for policy 0, policy_version 230 (0.0016)
[2025-07-03 20:08:43,543][17717] Saving new best policy, reward=5.088!
[2025-07-03 20:08:48,548][17717] Saving new best policy, reward=5.237!
[2025-07-03 20:08:51,087][17737] Updated weights for policy 0, policy_version 240 (0.0013)
[2025-07-03 20:08:53,626][17717] Saving new best policy, reward=5.487!
[2025-07-03 20:08:58,559][17717] Saving new best policy, reward=5.559!
[2025-07-03 20:09:02,349][17737] Updated weights for policy 0, policy_version 250 (0.0018)
[2025-07-03 20:09:13,333][17737] Updated weights for policy 0, policy_version 260 (0.0027)
[2025-07-03 20:09:13,545][17717] Saving new best policy, reward=5.691!
[2025-07-03 20:09:18,547][17717] Saving new best policy, reward=5.997!
[2025-07-03 20:09:23,545][17717] Saving new best policy, reward=6.126!
[2025-07-03 20:09:23,936][17737] Updated weights for policy 0, policy_version 270 (0.0029)
[2025-07-03 20:09:35,457][17737] Updated weights for policy 0, policy_version 280 (0.0032)
[2025-07-03 20:09:43,547][17717] Saving new best policy, reward=6.132!
[2025-07-03 20:09:45,625][17737] Updated weights for policy 0, policy_version 290 (0.0014)
[2025-07-03 20:09:57,708][17737] Updated weights for policy 0, policy_version 300 (0.0015)
[2025-07-03 20:10:03,600][17717] Saving new best policy, reward=6.219!
[2025-07-03 20:10:07,212][17737] Updated weights for policy 0, policy_version 310 (0.0019)
[2025-07-03 20:10:13,542][17717] Saving new best policy, reward=6.266!
[2025-07-03 20:10:18,557][17717] Saving new best policy, reward=6.512!
[2025-07-03 20:10:19,800][17737] Updated weights for policy 0, policy_version 320 (0.0022)
[2025-07-03 20:10:23,552][17717] Saving new best policy, reward=6.690!
[2025-07-03 20:10:28,547][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000329_1347584.pth...
[2025-07-03 20:10:28,805][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000111_454656.pth
[2025-07-03 20:10:28,830][17717] Saving new best policy, reward=6.952!
[2025-07-03 20:10:29,221][17737] Updated weights for policy 0, policy_version 330 (0.0014)
[2025-07-03 20:10:41,763][17737] Updated weights for policy 0, policy_version 340 (0.0019)
[2025-07-03 20:10:48,548][17717] Saving new best policy, reward=7.405!
[2025-07-03 20:10:50,571][17737] Updated weights for policy 0, policy_version 350 (0.0016)
[2025-07-03 20:10:53,544][17717] Saving new best policy, reward=7.703!
[2025-07-03 20:10:58,556][17717] Saving new best policy, reward=8.108!
[2025-07-03 20:11:03,548][17717] Saving new best policy, reward=8.418!
[2025-07-03 20:11:06,671][17737] Updated weights for policy 0, policy_version 360 (0.0031)
[2025-07-03 20:11:15,897][17737] Updated weights for policy 0, policy_version 370 (0.0013)
[2025-07-03 20:11:18,550][17717] Saving new best policy, reward=8.445!
[2025-07-03 20:11:23,548][17717] Saving new best policy, reward=8.767!
[2025-07-03 20:11:28,424][17737] Updated weights for policy 0, policy_version 380 (0.0014)
[2025-07-03 20:11:33,580][17717] Saving new best policy, reward=8.823!
[2025-07-03 20:11:37,310][17737] Updated weights for policy 0, policy_version 390 (0.0034)
[2025-07-03 20:11:43,631][17717] Saving new best policy, reward=9.302!
[2025-07-03 20:11:50,165][17737] Updated weights for policy 0, policy_version 400 (0.0026)
[2025-07-03 20:11:53,544][17717] Saving new best policy, reward=9.769!
[2025-07-03 20:11:59,544][17737] Updated weights for policy 0, policy_version 410 (0.0016)
[2025-07-03 20:12:03,544][17717] Saving new best policy, reward=9.877!
[2025-07-03 20:12:12,066][17737] Updated weights for policy 0, policy_version 420 (0.0022)
[2025-07-03 20:12:22,057][17737] Updated weights for policy 0, policy_version 430 (0.0013)
[2025-07-03 20:12:28,551][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000434_1777664.pth...
[2025-07-03 20:12:28,866][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000219_897024.pth
[2025-07-03 20:12:28,885][17717] Saving new best policy, reward=9.912!
[2025-07-03 20:12:33,499][17737] Updated weights for policy 0, policy_version 440 (0.0015)
[2025-07-03 20:12:33,550][17717] Saving new best policy, reward=10.541!
[2025-07-03 20:12:44,048][17737] Updated weights for policy 0, policy_version 450 (0.0015)
[2025-07-03 20:12:55,119][17737] Updated weights for policy 0, policy_version 460 (0.0018)
[2025-07-03 20:13:06,020][17737] Updated weights for policy 0, policy_version 470 (0.0014)
[2025-07-03 20:13:08,551][17717] Saving new best policy, reward=10.883!
[2025-07-03 20:13:13,543][17717] Saving new best policy, reward=11.324!
[2025-07-03 20:13:16,845][17737] Updated weights for policy 0, policy_version 480 (0.0022)
[2025-07-03 20:13:27,652][17737] Updated weights for policy 0, policy_version 490 (0.0017)
[2025-07-03 20:13:28,550][17717] Saving new best policy, reward=11.523!
[2025-07-03 20:13:38,040][17737] Updated weights for policy 0, policy_version 500 (0.0013)
[2025-07-03 20:13:38,550][17717] Saving new best policy, reward=12.760!
[2025-07-03 20:13:43,543][17717] Saving new best policy, reward=13.081!
[2025-07-03 20:13:51,829][17737] Updated weights for policy 0, policy_version 510 (0.0018)
[2025-07-03 20:13:53,553][17717] Saving new best policy, reward=13.405!
[2025-07-03 20:14:03,546][17717] Saving new best policy, reward=14.013!
[2025-07-03 20:14:03,965][17737] Updated weights for policy 0, policy_version 520 (0.0013)
[2025-07-03 20:14:15,435][17737] Updated weights for policy 0, policy_version 530 (0.0016)
[2025-07-03 20:14:25,502][17737] Updated weights for policy 0, policy_version 540 (0.0013)
[2025-07-03 20:14:28,611][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000544_2228224.pth...
[2025-07-03 20:14:28,892][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000329_1347584.pth
[2025-07-03 20:14:37,463][17737] Updated weights for policy 0, policy_version 550 (0.0017)
[2025-07-03 20:14:43,604][17717] Saving new best policy, reward=14.051!
[2025-07-03 20:14:47,145][17737] Updated weights for policy 0, policy_version 560 (0.0014)
[2025-07-03 20:14:59,751][17737] Updated weights for policy 0, policy_version 570 (0.0029)
[2025-07-03 20:15:08,423][17737] Updated weights for policy 0, policy_version 580 (0.0013)
[2025-07-03 20:15:21,422][17737] Updated weights for policy 0, policy_version 590 (0.0027)
[2025-07-03 20:15:23,544][17717] Saving new best policy, reward=14.086!
[2025-07-03 20:15:28,551][17717] Saving new best policy, reward=15.840!
[2025-07-03 20:15:30,280][17737] Updated weights for policy 0, policy_version 600 (0.0013)
[2025-07-03 20:15:33,623][17717] Saving new best policy, reward=16.522!
[2025-07-03 20:15:42,797][17737] Updated weights for policy 0, policy_version 610 (0.0016)
[2025-07-03 20:15:51,776][17737] Updated weights for policy 0, policy_version 620 (0.0013)
[2025-07-03 20:16:04,244][17737] Updated weights for policy 0, policy_version 630 (0.0029)
[2025-07-03 20:16:13,034][17737] Updated weights for policy 0, policy_version 640 (0.0015)
[2025-07-03 20:16:25,847][17737] Updated weights for policy 0, policy_version 650 (0.0018)
[2025-07-03 20:16:28,555][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000652_2670592.pth...
[2025-07-03 20:16:28,788][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000434_1777664.pth
[2025-07-03 20:16:36,968][17737] Updated weights for policy 0, policy_version 660 (0.0013)
[2025-07-03 20:16:43,542][17717] Saving new best policy, reward=16.827!
[2025-07-03 20:16:48,551][17717] Saving new best policy, reward=17.611!
[2025-07-03 20:16:51,468][17737] Updated weights for policy 0, policy_version 670 (0.0032)
[2025-07-03 20:16:53,547][17717] Saving new best policy, reward=18.078!
[2025-07-03 20:17:01,504][17737] Updated weights for policy 0, policy_version 680 (0.0015)
[2025-07-03 20:17:12,912][17737] Updated weights for policy 0, policy_version 690 (0.0013)
[2025-07-03 20:17:23,477][17737] Updated weights for policy 0, policy_version 700 (0.0015)
[2025-07-03 20:17:33,544][17717] Saving new best policy, reward=18.236!
[2025-07-03 20:17:34,678][17737] Updated weights for policy 0, policy_version 710 (0.0016)
[2025-07-03 20:17:38,551][17717] Saving new best policy, reward=19.271!
[2025-07-03 20:17:43,587][17717] Saving new best policy, reward=20.302!
[2025-07-03 20:17:45,453][17737] Updated weights for policy 0, policy_version 720 (0.0014)
[2025-07-03 20:17:56,357][17737] Updated weights for policy 0, policy_version 730 (0.0021)
[2025-07-03 20:18:07,484][17737] Updated weights for policy 0, policy_version 740 (0.0018)
[2025-07-03 20:18:17,421][17737] Updated weights for policy 0, policy_version 750 (0.0014)
[2025-07-03 20:18:28,551][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000759_3108864.pth...
[2025-07-03 20:18:28,870][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000544_2228224.pth
[2025-07-03 20:18:29,025][17737] Updated weights for policy 0, policy_version 760 (0.0018)
[2025-07-03 20:18:33,544][17717] Saving new best policy, reward=20.501!
[2025-07-03 20:18:39,466][17737] Updated weights for policy 0, policy_version 770 (0.0016)
[2025-07-03 20:18:48,552][17717] Saving new best policy, reward=21.445!
[2025-07-03 20:18:52,335][17737] Updated weights for policy 0, policy_version 780 (0.0015)
[2025-07-03 20:19:01,337][17737] Updated weights for policy 0, policy_version 790 (0.0016)
[2025-07-03 20:19:14,158][17737] Updated weights for policy 0, policy_version 800 (0.0016)
[2025-07-03 20:19:23,805][17737] Updated weights for policy 0, policy_version 810 (0.0017)
[2025-07-03 20:19:39,446][17737] Updated weights for policy 0, policy_version 820 (0.0017)
[2025-07-03 20:19:48,318][17737] Updated weights for policy 0, policy_version 830 (0.0014)
[2025-07-03 20:19:58,558][17717] Saving new best policy, reward=21.624!
[2025-07-03 20:20:01,381][17737] Updated weights for policy 0, policy_version 840 (0.0016)
[2025-07-03 20:20:03,554][17717] Saving new best policy, reward=22.264!
[2025-07-03 20:20:10,528][17737] Updated weights for policy 0, policy_version 850 (0.0015)
[2025-07-03 20:20:22,900][17737] Updated weights for policy 0, policy_version 860 (0.0024)
[2025-07-03 20:20:28,553][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000866_3547136.pth...
[2025-07-03 20:20:28,820][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000652_2670592.pth
[2025-07-03 20:20:32,688][17737] Updated weights for policy 0, policy_version 870 (0.0014)
[2025-07-03 20:20:38,552][17717] Saving new best policy, reward=23.386!
[2025-07-03 20:20:44,448][17737] Updated weights for policy 0, policy_version 880 (0.0020)
[2025-07-03 20:20:53,983][17737] Updated weights for policy 0, policy_version 890 (0.0016)
[2025-07-03 20:21:05,352][17737] Updated weights for policy 0, policy_version 900 (0.0017)
[2025-07-03 20:21:15,246][17737] Updated weights for policy 0, policy_version 910 (0.0014)
[2025-07-03 20:21:26,535][17737] Updated weights for policy 0, policy_version 920 (0.0013)
[2025-07-03 20:21:37,499][17737] Updated weights for policy 0, policy_version 930 (0.0017)
[2025-07-03 20:21:48,343][17737] Updated weights for policy 0, policy_version 940 (0.0021)
[2025-07-03 20:21:59,918][17737] Updated weights for policy 0, policy_version 950 (0.0016)
[2025-07-03 20:22:11,598][17737] Updated weights for policy 0, policy_version 960 (0.0017)
[2025-07-03 20:22:26,756][17737] Updated weights for policy 0, policy_version 970 (0.0017)
[2025-07-03 20:22:28,550][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000971_3977216.pth...
[2025-07-03 20:22:28,866][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000759_3108864.pth
[2025-07-03 20:22:28,883][17717] Saving new best policy, reward=23.591!
[2025-07-03 20:22:33,556][17717] Saving new best policy, reward=24.270!
[2025-07-03 20:22:36,088][17737] Updated weights for policy 0, policy_version 980 (0.0013)
[2025-07-03 20:22:43,545][17717] Saving new best policy, reward=24.298!
[2025-07-03 20:22:48,664][17737] Updated weights for policy 0, policy_version 990 (0.0013)
[2025-07-03 20:22:57,551][17737] Updated weights for policy 0, policy_version 1000 (0.0013)
[2025-07-03 20:23:10,605][17737] Updated weights for policy 0, policy_version 1010 (0.0014)
[2025-07-03 20:23:20,165][17737] Updated weights for policy 0, policy_version 1020 (0.0019)
[2025-07-03 20:23:31,758][17737] Updated weights for policy 0, policy_version 1030 (0.0023)
[2025-07-03 20:23:42,078][17737] Updated weights for policy 0, policy_version 1040 (0.0020)
[2025-07-03 20:23:53,161][17737] Updated weights for policy 0, policy_version 1050 (0.0014)
[2025-07-03 20:24:02,516][17737] Updated weights for policy 0, policy_version 1060 (0.0016)
[2025-07-03 20:24:14,370][17737] Updated weights for policy 0, policy_version 1070 (0.0013)
[2025-07-03 20:24:24,714][17737] Updated weights for policy 0, policy_version 1080 (0.0013)
[2025-07-03 20:24:28,552][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001082_4431872.pth...
[2025-07-03 20:24:28,899][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000866_3547136.pth
[2025-07-03 20:24:36,138][17737] Updated weights for policy 0, policy_version 1090 (0.0035)
[2025-07-03 20:24:47,472][17737] Updated weights for policy 0, policy_version 1100 (0.0024)
[2025-07-03 20:24:59,180][17737] Updated weights for policy 0, policy_version 1110 (0.0017)
[2025-07-03 20:25:14,437][17737] Updated weights for policy 0, policy_version 1120 (0.0018)
[2025-07-03 20:25:23,332][17737] Updated weights for policy 0, policy_version 1130 (0.0013)
[2025-07-03 20:25:35,884][17737] Updated weights for policy 0, policy_version 1140 (0.0014)
[2025-07-03 20:25:44,882][17737] Updated weights for policy 0, policy_version 1150 (0.0013)
[2025-07-03 20:25:57,453][17737] Updated weights for policy 0, policy_version 1160 (0.0032)
[2025-07-03 20:26:06,862][17737] Updated weights for policy 0, policy_version 1170 (0.0017)
[2025-07-03 20:26:19,492][17737] Updated weights for policy 0, policy_version 1180 (0.0015)
[2025-07-03 20:26:28,555][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001189_4870144.pth...
[2025-07-03 20:26:28,937][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000971_3977216.pth
[2025-07-03 20:26:29,135][17737] Updated weights for policy 0, policy_version 1190 (0.0014)
[2025-07-03 20:26:41,650][17737] Updated weights for policy 0, policy_version 1200 (0.0022)
[2025-07-03 20:26:52,302][17737] Updated weights for policy 0, policy_version 1210 (0.0014)
[2025-07-03 20:27:03,076][17737] Updated weights for policy 0, policy_version 1220 (0.0015)
[2025-07-03 20:27:13,550][17717] Saving new best policy, reward=24.535!
[2025-07-03 20:27:14,018][17737] Updated weights for policy 0, policy_version 1230 (0.0015)
[2025-07-03 20:27:18,554][17717] Saving new best policy, reward=24.540!
[2025-07-03 20:27:25,017][17737] Updated weights for policy 0, policy_version 1240 (0.0029)
[2025-07-03 20:27:36,578][17737] Updated weights for policy 0, policy_version 1250 (0.0019)
[2025-07-03 20:27:51,060][17737] Updated weights for policy 0, policy_version 1260 (0.0017)
[2025-07-03 20:28:04,407][17737] Updated weights for policy 0, policy_version 1270 (0.0022)
[2025-07-03 20:28:13,980][17737] Updated weights for policy 0, policy_version 1280 (0.0019)
[2025-07-03 20:28:26,062][17737] Updated weights for policy 0, policy_version 1290 (0.0023)
[2025-07-03 20:28:28,561][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001293_5296128.pth...
[2025-07-03 20:28:28,865][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001082_4431872.pth
[2025-07-03 20:28:28,886][17717] Saving new best policy, reward=25.266!
[2025-07-03 20:28:33,544][17717] Saving new best policy, reward=25.317!
[2025-07-03 20:28:36,395][17737] Updated weights for policy 0, policy_version 1300 (0.0019)
[2025-07-03 20:28:48,255][17737] Updated weights for policy 0, policy_version 1310 (0.0021)
[2025-07-03 20:28:58,583][17737] Updated weights for policy 0, policy_version 1320 (0.0021)
[2025-07-03 20:29:09,662][17737] Updated weights for policy 0, policy_version 1330 (0.0014)
[2025-07-03 20:29:20,958][17737] Updated weights for policy 0, policy_version 1340 (0.0022)
[2025-07-03 20:29:23,545][17717] Saving new best policy, reward=26.904!
[2025-07-03 20:29:31,453][17737] Updated weights for policy 0, policy_version 1350 (0.0020)
[2025-07-03 20:29:43,248][17737] Updated weights for policy 0, policy_version 1360 (0.0039)
[2025-07-03 20:29:53,320][17737] Updated weights for policy 0, policy_version 1370 (0.0016)
[2025-07-03 20:30:05,502][17737] Updated weights for policy 0, policy_version 1380 (0.0016)
[2025-07-03 20:30:14,823][17737] Updated weights for policy 0, policy_version 1390 (0.0014)
[2025-07-03 20:30:28,428][17737] Updated weights for policy 0, policy_version 1400 (0.0013)
[2025-07-03 20:30:28,547][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001400_5734400.pth...
[2025-07-03 20:30:28,870][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001189_4870144.pth
[2025-07-03 20:30:41,970][17737] Updated weights for policy 0, policy_version 1410 (0.0026)
[2025-07-03 20:30:53,480][17737] Updated weights for policy 0, policy_version 1420 (0.0014)
[2025-07-03 20:31:04,084][17737] Updated weights for policy 0, policy_version 1430 (0.0013)
[2025-07-03 20:31:14,968][17737] Updated weights for policy 0, policy_version 1440 (0.0018)
[2025-07-03 20:31:26,458][17737] Updated weights for policy 0, policy_version 1450 (0.0013)
[2025-07-03 20:31:36,488][17737] Updated weights for policy 0, policy_version 1460 (0.0016)
[2025-07-03 20:31:47,893][17737] Updated weights for policy 0, policy_version 1470 (0.0016)
[2025-07-03 20:31:58,095][17737] Updated weights for policy 0, policy_version 1480 (0.0013)
[2025-07-03 20:32:09,492][17737] Updated weights for policy 0, policy_version 1490 (0.0015)
[2025-07-03 20:32:19,586][17737] Updated weights for policy 0, policy_version 1500 (0.0013)
[2025-07-03 20:32:28,551][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001507_6172672.pth...
[2025-07-03 20:32:28,846][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001293_5296128.pth
[2025-07-03 20:32:32,118][17737] Updated weights for policy 0, policy_version 1510 (0.0021)
[2025-07-03 20:32:41,157][17737] Updated weights for policy 0, policy_version 1520 (0.0025)
[2025-07-03 20:32:53,654][17737] Updated weights for policy 0, policy_version 1530 (0.0023)
[2025-07-03 20:33:02,694][17737] Updated weights for policy 0, policy_version 1540 (0.0016)
[2025-07-03 20:33:13,550][17717] Saving new best policy, reward=27.291!
[2025-07-03 20:33:17,199][17737] Updated weights for policy 0, policy_version 1550 (0.0020)
[2025-07-03 20:33:18,557][17717] Saving new best policy, reward=27.428!
[2025-07-03 20:33:29,476][17737] Updated weights for policy 0, policy_version 1560 (0.0020)
[2025-07-03 20:33:41,095][17737] Updated weights for policy 0, policy_version 1570 (0.0019)
[2025-07-03 20:33:51,495][17737] Updated weights for policy 0, policy_version 1580 (0.0019)
[2025-07-03 20:34:02,989][17737] Updated weights for policy 0, policy_version 1590 (0.0013)
[2025-07-03 20:34:14,006][17737] Updated weights for policy 0, policy_version 1600 (0.0017)
[2025-07-03 20:34:24,932][17737] Updated weights for policy 0, policy_version 1610 (0.0013)
[2025-07-03 20:34:28,559][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001614_6610944.pth...
[2025-07-03 20:34:28,925][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001400_5734400.pth
[2025-07-03 20:34:36,651][17737] Updated weights for policy 0, policy_version 1620 (0.0014)
[2025-07-03 20:34:46,525][17737] Updated weights for policy 0, policy_version 1630 (0.0021)
[2025-07-03 20:34:59,126][17737] Updated weights for policy 0, policy_version 1640 (0.0014)
[2025-07-03 20:35:08,783][17737] Updated weights for policy 0, policy_version 1650 (0.0020)
[2025-07-03 20:35:21,049][17737] Updated weights for policy 0, policy_version 1660 (0.0023)
[2025-07-03 20:35:30,366][17737] Updated weights for policy 0, policy_version 1670 (0.0013)
[2025-07-03 20:35:43,280][17737] Updated weights for policy 0, policy_version 1680 (0.0014)
[2025-07-03 20:35:52,389][17737] Updated weights for policy 0, policy_version 1690 (0.0014)
[2025-07-03 20:36:08,553][17737] Updated weights for policy 0, policy_version 1700 (0.0027)
[2025-07-03 20:36:17,743][17737] Updated weights for policy 0, policy_version 1710 (0.0014)
[2025-07-03 20:36:28,550][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001718_7036928.pth...
[2025-07-03 20:36:28,926][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001507_6172672.pth
[2025-07-03 20:36:30,444][17737] Updated weights for policy 0, policy_version 1720 (0.0016)
[2025-07-03 20:36:40,463][17737] Updated weights for policy 0, policy_version 1730 (0.0013)
[2025-07-03 20:36:52,221][17737] Updated weights for policy 0, policy_version 1740 (0.0022)
[2025-07-03 20:37:02,798][17737] Updated weights for policy 0, policy_version 1750 (0.0022)
[2025-07-03 20:37:13,925][17737] Updated weights for policy 0, policy_version 1760 (0.0017)
[2025-07-03 20:37:24,628][17737] Updated weights for policy 0, policy_version 1770 (0.0013)
[2025-07-03 20:37:35,482][17737] Updated weights for policy 0, policy_version 1780 (0.0034)
[2025-07-03 20:37:46,643][17737] Updated weights for policy 0, policy_version 1790 (0.0019)
[2025-07-03 20:37:56,373][17737] Updated weights for policy 0, policy_version 1800 (0.0016)
[2025-07-03 20:38:07,550][17737] Updated weights for policy 0, policy_version 1810 (0.0042)
[2025-07-03 20:38:17,696][17737] Updated weights for policy 0, policy_version 1820 (0.0028)
[2025-07-03 20:38:28,552][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001829_7491584.pth...
[2025-07-03 20:38:28,881][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001614_6610944.pth
[2025-07-03 20:38:29,379][17737] Updated weights for policy 0, policy_version 1830 (0.0014)
[2025-07-03 20:38:33,577][17717] Saving new best policy, reward=27.787!
[2025-07-03 20:38:39,878][17737] Updated weights for policy 0, policy_version 1840 (0.0014)
[2025-07-03 20:38:55,594][17737] Updated weights for policy 0, policy_version 1850 (0.0025)
[2025-07-03 20:39:05,024][17737] Updated weights for policy 0, policy_version 1860 (0.0013)
[2025-07-03 20:39:17,638][17737] Updated weights for policy 0, policy_version 1870 (0.0014)
[2025-07-03 20:39:26,600][17737] Updated weights for policy 0, policy_version 1880 (0.0014)
[2025-07-03 20:39:39,553][17737] Updated weights for policy 0, policy_version 1890 (0.0021)
[2025-07-03 20:39:48,539][17737] Updated weights for policy 0, policy_version 1900 (0.0013)
[2025-07-03 20:40:00,996][17737] Updated weights for policy 0, policy_version 1910 (0.0019)
[2025-07-03 20:40:11,172][17737] Updated weights for policy 0, policy_version 1920 (0.0014)
[2025-07-03 20:40:22,952][17737] Updated weights for policy 0, policy_version 1930 (0.0013)
[2025-07-03 20:40:28,554][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001936_7929856.pth...
[2025-07-03 20:40:28,793][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001718_7036928.pth
[2025-07-03 20:40:32,925][17737] Updated weights for policy 0, policy_version 1940 (0.0014)
[2025-07-03 20:40:44,784][17737] Updated weights for policy 0, policy_version 1950 (0.0022)
[2025-07-03 20:40:49,266][17717] Stopping Batcher_0...
[2025-07-03 20:40:49,267][17717] Loop batcher_evt_loop terminating...
[2025-07-03 20:40:49,280][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-07-03 20:40:49,441][17737] Weights refcount: 2 0
[2025-07-03 20:40:49,454][17737] Stopping InferenceWorker_p0-w0...
[2025-07-03 20:40:49,462][17737] Loop inference_proc0-0_evt_loop terminating...
[2025-07-03 20:40:49,507][17717] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001829_7491584.pth
[2025-07-03 20:40:49,523][17717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-07-03 20:40:49,809][17717] Stopping LearnerWorker_p0...
[2025-07-03 20:40:49,813][17717] Loop learner_proc0_evt_loop terminating...
[2025-07-03 20:40:51,088][17746] Stopping RolloutWorker_w11...
[2025-07-03 20:40:51,098][17763] Stopping RolloutWorker_w19...
[2025-07-03 20:40:51,098][17763] Loop rollout_proc19_evt_loop terminating...
[2025-07-03 20:40:51,110][17746] Loop rollout_proc11_evt_loop terminating...
[2025-07-03 20:40:51,127][17747] Stopping RolloutWorker_w7...
[2025-07-03 20:40:51,137][17747] Loop rollout_proc7_evt_loop terminating...
[2025-07-03 20:40:51,176][17751] Stopping RolloutWorker_w13...
[2025-07-03 20:40:51,176][17751] Loop rollout_proc13_evt_loop terminating...
[2025-07-03 20:40:51,189][17759] Stopping RolloutWorker_w15...
[2025-07-03 20:40:51,190][17759] Loop rollout_proc15_evt_loop terminating...
[2025-07-03 20:40:51,200][17749] Stopping RolloutWorker_w9...
[2025-07-03 20:40:51,201][17749] Loop rollout_proc9_evt_loop terminating...
[2025-07-03 20:40:51,270][17740] Stopping RolloutWorker_w3...
[2025-07-03 20:40:51,276][17740] Loop rollout_proc3_evt_loop terminating...
[2025-07-03 20:40:51,292][17739] Stopping RolloutWorker_w1...
[2025-07-03 20:40:51,292][17739] Loop rollout_proc1_evt_loop terminating...
[2025-07-03 20:40:51,360][17744] Stopping RolloutWorker_w5...
[2025-07-03 20:40:51,368][17744] Loop rollout_proc5_evt_loop terminating...
[2025-07-03 20:40:51,457][17761] Stopping RolloutWorker_w17...
[2025-07-03 20:40:51,458][17761] Loop rollout_proc17_evt_loop terminating...
[2025-07-03 20:40:51,806][17741] Stopping RolloutWorker_w4...
[2025-07-03 20:40:51,807][17741] Loop rollout_proc4_evt_loop terminating...
[2025-07-03 20:40:51,853][17743] Stopping RolloutWorker_w2...
[2025-07-03 20:40:51,865][17743] Loop rollout_proc2_evt_loop terminating...
[2025-07-03 20:40:51,951][17760] Stopping RolloutWorker_w16...
[2025-07-03 20:40:51,952][17760] Loop rollout_proc16_evt_loop terminating...
[2025-07-03 20:40:51,991][17748] Stopping RolloutWorker_w12...
[2025-07-03 20:40:51,998][17748] Loop rollout_proc12_evt_loop terminating...
[2025-07-03 20:40:52,030][17752] Stopping RolloutWorker_w14...
[2025-07-03 20:40:52,038][17742] Stopping RolloutWorker_w6...
[2025-07-03 20:40:52,039][17742] Loop rollout_proc6_evt_loop terminating...
[2025-07-03 20:40:52,031][17752] Loop rollout_proc14_evt_loop terminating...
[2025-07-03 20:40:52,097][17738] Stopping RolloutWorker_w0...
[2025-07-03 20:40:52,103][17738] Loop rollout_proc0_evt_loop terminating...
[2025-07-03 20:40:52,145][17762] Stopping RolloutWorker_w18...
[2025-07-03 20:40:52,145][17762] Loop rollout_proc18_evt_loop terminating...
[2025-07-03 20:40:52,156][17745] Stopping RolloutWorker_w10...
[2025-07-03 20:40:52,157][17745] Loop rollout_proc10_evt_loop terminating...
[2025-07-03 20:40:52,254][17750] Stopping RolloutWorker_w8...
[2025-07-03 20:40:52,254][17750] Loop rollout_proc8_evt_loop terminating...