diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,2414 @@ +[2025-07-06 15:23:21,859][06149] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-07-06 15:23:21,860][06149] Rollout worker 0 uses device cpu +[2025-07-06 15:23:21,862][06149] Rollout worker 1 uses device cpu +[2025-07-06 15:23:21,863][06149] Rollout worker 2 uses device cpu +[2025-07-06 15:23:21,863][06149] Rollout worker 3 uses device cpu +[2025-07-06 15:23:21,864][06149] Rollout worker 4 uses device cpu +[2025-07-06 15:23:21,865][06149] Rollout worker 5 uses device cpu +[2025-07-06 15:23:21,866][06149] Rollout worker 6 uses device cpu +[2025-07-06 15:23:21,867][06149] Rollout worker 7 uses device cpu +[2025-07-06 15:23:22,020][06149] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:22,020][06149] InferenceWorker_p0-w0: min num requests: 2 +[2025-07-06 15:23:22,051][06149] Starting all processes... +[2025-07-06 15:23:22,052][06149] Starting process learner_proc0 +[2025-07-06 15:23:22,106][06149] Starting all processes... +[2025-07-06 15:23:22,117][06149] Starting process inference_proc0-0 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc0 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc1 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc2 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc3 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc4 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc5 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc6 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc7 +[2025-07-06 15:23:43,764][06624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:43,766][06624] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-07-06 15:23:43,790][06640] Worker 3 uses CPU cores [1] +[2025-07-06 15:23:43,826][06149] Heartbeat connected on RolloutWorker_w3 +[2025-07-06 15:23:43,855][06624] Num visible devices: 1 +[2025-07-06 15:23:43,867][06624] Starting seed is not provided +[2025-07-06 15:23:43,868][06624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:43,869][06624] Initializing actor-critic model on device cuda:0 +[2025-07-06 15:23:43,870][06624] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:23:43,872][06149] Heartbeat connected on Batcher_0 +[2025-07-06 15:23:43,876][06624] RunningMeanStd input shape: (1,) +[2025-07-06 15:23:43,921][06637] Worker 0 uses CPU cores [0] +[2025-07-06 15:23:43,943][06149] Heartbeat connected on RolloutWorker_w0 +[2025-07-06 15:23:43,939][06624] ConvEncoder: input_channels=3 +[2025-07-06 15:23:44,248][06641] Worker 4 uses CPU cores [0] +[2025-07-06 15:23:44,242][06642] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:44,250][06639] Worker 2 uses CPU cores [0] +[2025-07-06 15:23:44,251][06642] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-07-06 15:23:44,274][06149] Heartbeat connected on RolloutWorker_w4 +[2025-07-06 15:23:44,276][06149] Heartbeat connected on RolloutWorker_w2 +[2025-07-06 15:23:44,310][06638] Worker 1 uses CPU cores [1] +[2025-07-06 15:23:44,311][06149] Heartbeat connected on RolloutWorker_w1 +[2025-07-06 15:23:44,327][06642] Num visible devices: 1 +[2025-07-06 15:23:44,332][06149] Heartbeat connected on InferenceWorker_p0-w0 +[2025-07-06 15:23:44,411][06644] 
Worker 6 uses CPU cores [0] +[2025-07-06 15:23:44,427][06149] Heartbeat connected on RolloutWorker_w6 +[2025-07-06 15:23:44,677][06645] Worker 7 uses CPU cores [1] +[2025-07-06 15:23:44,680][06149] Heartbeat connected on RolloutWorker_w7 +[2025-07-06 15:23:44,695][06624] Conv encoder output size: 512 +[2025-07-06 15:23:44,697][06624] Policy head output size: 512 +[2025-07-06 15:23:44,729][06643] Worker 5 uses CPU cores [1] +[2025-07-06 15:23:44,731][06149] Heartbeat connected on RolloutWorker_w5 +[2025-07-06 15:23:44,793][06624] Created Actor Critic model with architecture: +[2025-07-06 15:23:44,795][06624] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-07-06 15:23:45,250][06624] Using optimizer +[2025-07-06 15:23:53,057][06624] No checkpoints found +[2025-07-06 15:23:53,058][06624] Did not load from checkpoint, starting from scratch! +[2025-07-06 15:23:53,059][06624] Initialized policy 0 weights for model version 0 +[2025-07-06 15:23:53,069][06624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:53,076][06624] LearnerWorker_p0 finished initialization! +[2025-07-06 15:23:53,079][06149] Heartbeat connected on LearnerWorker_p0 +[2025-07-06 15:23:53,368][06642] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:23:53,371][06642] RunningMeanStd input shape: (1,) +[2025-07-06 15:23:53,384][06642] ConvEncoder: input_channels=3 +[2025-07-06 15:23:53,510][06642] Conv encoder output size: 512 +[2025-07-06 15:23:53,511][06642] Policy head output size: 512 +[2025-07-06 15:23:53,548][06149] Inference worker 0-0 is ready! +[2025-07-06 15:23:53,548][06149] All inference workers are ready! Signal rollout workers to start! 
+[2025-07-06 15:23:53,828][06640] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,831][06643] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,825][06639] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,832][06645] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,830][06644] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,842][06638] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,844][06637] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,847][06641] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:55,191][06645] Decorrelating experience for 0 frames... +[2025-07-06 15:23:55,191][06641] Decorrelating experience for 0 frames... +[2025-07-06 15:23:55,192][06643] Decorrelating experience for 0 frames... +[2025-07-06 15:23:55,610][06645] Decorrelating experience for 32 frames... +[2025-07-06 15:23:55,972][06641] Decorrelating experience for 32 frames... +[2025-07-06 15:23:56,045][06639] Decorrelating experience for 0 frames... +[2025-07-06 15:23:56,397][06149] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 15:23:56,483][06643] Decorrelating experience for 32 frames... +[2025-07-06 15:23:56,709][06645] Decorrelating experience for 64 frames... +[2025-07-06 15:23:56,974][06639] Decorrelating experience for 32 frames... +[2025-07-06 15:23:57,464][06641] Decorrelating experience for 64 frames... +[2025-07-06 15:23:57,661][06643] Decorrelating experience for 64 frames... +[2025-07-06 15:23:57,666][06645] Decorrelating experience for 96 frames... +[2025-07-06 15:23:57,917][06639] Decorrelating experience for 64 frames... +[2025-07-06 15:23:58,560][06641] Decorrelating experience for 96 frames... +[2025-07-06 15:23:58,621][06643] Decorrelating experience for 96 frames... +[2025-07-06 15:23:58,846][06639] Decorrelating experience for 96 frames... +[2025-07-06 15:24:01,398][06149] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 90.0. Samples: 450. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 15:24:01,399][06149] Avg episode reward: [(0, '2.949')] +[2025-07-06 15:24:02,073][06624] Signal inference workers to stop experience collection... +[2025-07-06 15:24:02,084][06642] InferenceWorker_p0-w0: stopping experience collection +[2025-07-06 15:24:03,506][06624] Signal inference workers to resume experience collection... +[2025-07-06 15:24:03,510][06642] InferenceWorker_p0-w0: resuming experience collection +[2025-07-06 15:24:06,397][06149] Fps is (10 sec: 1228.8, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 12288. Throughput: 0: 324.4. Samples: 3244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 15:24:06,398][06149] Avg episode reward: [(0, '3.662')] +[2025-07-06 15:24:11,399][06149] Fps is (10 sec: 3276.2, 60 sec: 2184.2, 300 sec: 2184.2). Total num frames: 32768. Throughput: 0: 363.4. Samples: 5452. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:24:11,405][06149] Avg episode reward: [(0, '4.220')] +[2025-07-06 15:24:13,607][06642] Updated weights for policy 0, policy_version 10 (0.0143) +[2025-07-06 15:24:16,397][06149] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 569.4. Samples: 11388. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:16,402][06149] Avg episode reward: [(0, '4.448')] +[2025-07-06 15:24:21,397][06149] Fps is (10 sec: 2867.9, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 638.7. Samples: 15968. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:24:21,401][06149] Avg episode reward: [(0, '4.407')] +[2025-07-06 15:24:25,679][06642] Updated weights for policy 0, policy_version 20 (0.0026) +[2025-07-06 15:24:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 620.0. Samples: 18600. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:24:26,403][06149] Avg episode reward: [(0, '4.499')] +[2025-07-06 15:24:31,397][06149] Fps is (10 sec: 4095.9, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 692.9. Samples: 24252. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:31,399][06149] Avg episode reward: [(0, '4.475')] +[2025-07-06 15:24:31,401][06624] Saving new best policy, reward=4.475! +[2025-07-06 15:24:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 114688. Throughput: 0: 711.3. Samples: 28450. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:36,403][06149] Avg episode reward: [(0, '4.457')] +[2025-07-06 15:24:38,530][06642] Updated weights for policy 0, policy_version 30 (0.0018) +[2025-07-06 15:24:41,399][06149] Fps is (10 sec: 2866.8, 60 sec: 2912.6, 300 sec: 2912.6). Total num frames: 131072. Throughput: 0: 689.4. Samples: 31024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:24:41,404][06149] Avg episode reward: [(0, '4.449')] +[2025-07-06 15:24:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 801.7. Samples: 36524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:24:46,398][06149] Avg episode reward: [(0, '4.441')] +[2025-07-06 15:24:50,761][06642] Updated weights for policy 0, policy_version 40 (0.0013) +[2025-07-06 15:24:51,397][06149] Fps is (10 sec: 3277.4, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 163840. Throughput: 0: 838.7. Samples: 40986. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:51,398][06149] Avg episode reward: [(0, '4.293')] +[2025-07-06 15:24:56,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 860.5. Samples: 44174. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:56,401][06149] Avg episode reward: [(0, '4.168')] +[2025-07-06 15:25:01,014][06642] Updated weights for policy 0, policy_version 50 (0.0013) +[2025-07-06 15:25:01,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 869.4. Samples: 50512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:01,401][06149] Avg episode reward: [(0, '4.414')] +[2025-07-06 15:25:06,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3159.8). Total num frames: 221184. Throughput: 0: 876.1. Samples: 55392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:25:06,398][06149] Avg episode reward: [(0, '4.447')] +[2025-07-06 15:25:11,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 889.2. Samples: 58614. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:11,402][06149] Avg episode reward: [(0, '4.377')] +[2025-07-06 15:25:11,544][06642] Updated weights for policy 0, policy_version 60 (0.0019) +[2025-07-06 15:25:16,398][06149] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3225.5). Total num frames: 258048. Throughput: 0: 891.6. Samples: 64376. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:16,409][06149] Avg episode reward: [(0, '4.589')] +[2025-07-06 15:25:16,422][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth... +[2025-07-06 15:25:16,545][06624] Saving new best policy, reward=4.589! +[2025-07-06 15:25:21,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 919.0. Samples: 69804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:21,398][06149] Avg episode reward: [(0, '4.518')] +[2025-07-06 15:25:22,716][06642] Updated weights for policy 0, policy_version 70 (0.0014) +[2025-07-06 15:25:26,397][06149] Fps is (10 sec: 4096.7, 60 sec: 3618.1, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 931.8. Samples: 72954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:25:26,401][06149] Avg episode reward: [(0, '4.294')] +[2025-07-06 15:25:31,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 920.4. Samples: 77940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:31,398][06149] Avg episode reward: [(0, '4.277')] +[2025-07-06 15:25:34,416][06642] Updated weights for policy 0, policy_version 80 (0.0018) +[2025-07-06 15:25:36,399][06149] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3317.7). Total num frames: 331776. Throughput: 0: 941.1. Samples: 83338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:36,400][06149] Avg episode reward: [(0, '4.339')] +[2025-07-06 15:25:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3354.8). Total num frames: 352256. Throughput: 0: 934.9. Samples: 86246. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:41,398][06149] Avg episode reward: [(0, '4.476')] +[2025-07-06 15:25:46,397][06149] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 891.6. Samples: 90632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:46,398][06149] Avg episode reward: [(0, '4.492')] +[2025-07-06 15:25:46,509][06642] Updated weights for policy 0, policy_version 90 (0.0028) +[2025-07-06 15:25:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3348.0). Total num frames: 385024. Throughput: 0: 914.5. Samples: 96544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:25:51,398][06149] Avg episode reward: [(0, '4.620')] +[2025-07-06 15:25:51,403][06624] Saving new best policy, reward=4.620! +[2025-07-06 15:25:56,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 907.6. Samples: 99456. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:56,398][06149] Avg episode reward: [(0, '4.686')] +[2025-07-06 15:25:56,417][06624] Saving new best policy, reward=4.686! +[2025-07-06 15:25:58,467][06642] Updated weights for policy 0, policy_version 100 (0.0013) +[2025-07-06 15:26:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 872.5. Samples: 103638. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:01,398][06149] Avg episode reward: [(0, '4.796')] +[2025-07-06 15:26:01,403][06624] Saving new best policy, reward=4.796! +[2025-07-06 15:26:06,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 879.6. Samples: 109384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:26:06,398][06149] Avg episode reward: [(0, '4.730')] +[2025-07-06 15:26:09,763][06642] Updated weights for policy 0, policy_version 110 (0.0015) +[2025-07-06 15:26:11,401][06149] Fps is (10 sec: 3684.7, 60 sec: 3549.6, 300 sec: 3367.7). Total num frames: 454656. Throughput: 0: 873.4. Samples: 112262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:26:11,402][06149] Avg episode reward: [(0, '4.557')] +[2025-07-06 15:26:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3364.6). Total num frames: 471040. Throughput: 0: 861.3. Samples: 116698. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:16,398][06149] Avg episode reward: [(0, '4.460')] +[2025-07-06 15:26:20,973][06642] Updated weights for policy 0, policy_version 120 (0.0014) +[2025-07-06 15:26:21,397][06149] Fps is (10 sec: 3688.0, 60 sec: 3549.9, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 876.4. Samples: 122774. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:21,398][06149] Avg episode reward: [(0, '4.396')] +[2025-07-06 15:26:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3386.0). Total num frames: 507904. Throughput: 0: 870.4. Samples: 125416. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:26:26,400][06149] Avg episode reward: [(0, '4.407')] +[2025-07-06 15:26:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 891.2. Samples: 130738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:31,400][06149] Avg episode reward: [(0, '4.378')] +[2025-07-06 15:26:32,034][06642] Updated weights for policy 0, policy_version 130 (0.0012) +[2025-07-06 15:26:36,397][06149] Fps is (10 sec: 4095.7, 60 sec: 3618.2, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 893.5. Samples: 136752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:36,399][06149] Avg episode reward: [(0, '4.376')] +[2025-07-06 15:26:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3400.9). Total num frames: 561152. Throughput: 0: 871.4. Samples: 138670. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:41,398][06149] Avg episode reward: [(0, '4.402')] +[2025-07-06 15:26:44,016][06642] Updated weights for policy 0, policy_version 140 (0.0029) +[2025-07-06 15:26:46,397][06149] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3421.4). Total num frames: 581632. Throughput: 0: 900.7. Samples: 144168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:26:46,398][06149] Avg episode reward: [(0, '4.496')] +[2025-07-06 15:26:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3417.2). Total num frames: 598016. Throughput: 0: 896.5. Samples: 149728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:26:51,401][06149] Avg episode reward: [(0, '4.485')] +[2025-07-06 15:26:55,847][06642] Updated weights for policy 0, policy_version 150 (0.0016) +[2025-07-06 15:26:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3413.3). Total num frames: 614400. Throughput: 0: 873.3. Samples: 151556. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:56,398][06149] Avg episode reward: [(0, '4.518')] +[2025-07-06 15:27:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3431.8). Total num frames: 634880. Throughput: 0: 905.6. Samples: 157448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:01,401][06149] Avg episode reward: [(0, '4.547')] +[2025-07-06 15:27:06,398][06149] Fps is (10 sec: 3276.4, 60 sec: 3481.5, 300 sec: 3406.1). Total num frames: 647168. Throughput: 0: 880.2. Samples: 162386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:27:06,400][06149] Avg episode reward: [(0, '4.713')] +[2025-07-06 15:27:08,057][06642] Updated weights for policy 0, policy_version 160 (0.0022) +[2025-07-06 15:27:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3423.8). Total num frames: 667648. Throughput: 0: 871.8. Samples: 164646. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:11,402][06149] Avg episode reward: [(0, '4.510')] +[2025-07-06 15:27:16,397][06149] Fps is (10 sec: 4096.5, 60 sec: 3618.1, 300 sec: 3440.6). Total num frames: 688128. Throughput: 0: 886.0. Samples: 170610. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:16,400][06149] Avg episode reward: [(0, '4.637')] +[2025-07-06 15:27:16,408][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000168_688128.pth... +[2025-07-06 15:27:18,949][06642] Updated weights for policy 0, policy_version 170 (0.0016) +[2025-07-06 15:27:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3416.7). Total num frames: 700416. Throughput: 0: 854.7. Samples: 175212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:21,400][06149] Avg episode reward: [(0, '4.676')] +[2025-07-06 15:27:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3432.8). Total num frames: 720896. Throughput: 0: 873.2. Samples: 177964. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:27:26,406][06149] Avg episode reward: [(0, '4.622')] +[2025-07-06 15:27:30,327][06642] Updated weights for policy 0, policy_version 180 (0.0013) +[2025-07-06 15:27:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.2). Total num frames: 737280. Throughput: 0: 883.5. Samples: 183926. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:31,401][06149] Avg episode reward: [(0, '4.641')] +[2025-07-06 15:27:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3425.7). Total num frames: 753664. Throughput: 0: 855.2. Samples: 188214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:36,398][06149] Avg episode reward: [(0, '4.579')] +[2025-07-06 15:27:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3440.6). Total num frames: 774144. Throughput: 0: 880.3. Samples: 191168. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:41,402][06149] Avg episode reward: [(0, '4.411')] +[2025-07-06 15:27:42,316][06642] Updated weights for policy 0, policy_version 190 (0.0016) +[2025-07-06 15:27:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3437.1). Total num frames: 790528. Throughput: 0: 879.8. Samples: 197040. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:27:46,402][06149] Avg episode reward: [(0, '4.460')] +[2025-07-06 15:27:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3433.7). Total num frames: 806912. Throughput: 0: 870.6. Samples: 201562. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:51,398][06149] Avg episode reward: [(0, '4.477')] +[2025-07-06 15:27:54,057][06642] Updated weights for policy 0, policy_version 200 (0.0020) +[2025-07-06 15:27:56,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3447.5). Total num frames: 827392. Throughput: 0: 887.2. Samples: 204570. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:56,402][06149] Avg episode reward: [(0, '4.472')] +[2025-07-06 15:28:01,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3444.0). Total num frames: 843776. Throughput: 0: 878.1. Samples: 210126. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:01,400][06149] Avg episode reward: [(0, '4.464')] +[2025-07-06 15:28:05,963][06642] Updated weights for policy 0, policy_version 210 (0.0018) +[2025-07-06 15:28:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3440.6). Total num frames: 860160. Throughput: 0: 881.6. Samples: 214886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:06,398][06149] Avg episode reward: [(0, '4.591')] +[2025-07-06 15:28:11,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3453.5). Total num frames: 880640. Throughput: 0: 886.7. Samples: 217864. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:11,398][06149] Avg episode reward: [(0, '4.453')] +[2025-07-06 15:28:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3434.3). Total num frames: 892928. Throughput: 0: 867.3. Samples: 222954. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:16,398][06149] Avg episode reward: [(0, '4.477')] +[2025-07-06 15:28:17,726][06642] Updated weights for policy 0, policy_version 220 (0.0014) +[2025-07-06 15:28:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3446.8). Total num frames: 913408. Throughput: 0: 891.5. Samples: 228332. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:21,402][06149] Avg episode reward: [(0, '4.455')] +[2025-07-06 15:28:26,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3458.8). Total num frames: 933888. Throughput: 0: 892.0. Samples: 231310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:26,401][06149] Avg episode reward: [(0, '4.617')] +[2025-07-06 15:28:29,326][06642] Updated weights for policy 0, policy_version 230 (0.0013) +[2025-07-06 15:28:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3440.6). Total num frames: 946176. Throughput: 0: 864.0. Samples: 235918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:31,401][06149] Avg episode reward: [(0, '4.810')] +[2025-07-06 15:28:31,405][06624] Saving new best policy, reward=4.810! +[2025-07-06 15:28:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3452.3). Total num frames: 966656. Throughput: 0: 891.1. Samples: 241660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:36,402][06149] Avg episode reward: [(0, '4.869')] +[2025-07-06 15:28:36,410][06624] Saving new best policy, reward=4.869! +[2025-07-06 15:28:39,912][06642] Updated weights for policy 0, policy_version 240 (0.0018) +[2025-07-06 15:28:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3449.3). Total num frames: 983040. Throughput: 0: 888.5. Samples: 244554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:41,401][06149] Avg episode reward: [(0, '4.911')] +[2025-07-06 15:28:41,402][06624] Saving new best policy, reward=4.911! 
+[2025-07-06 15:28:46,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3446.3). Total num frames: 999424. Throughput: 0: 861.8. Samples: 248906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:46,399][06149] Avg episode reward: [(0, '5.020')] +[2025-07-06 15:28:46,404][06624] Saving new best policy, reward=5.020! +[2025-07-06 15:28:51,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 1019904. Throughput: 0: 889.6. Samples: 254920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:51,398][06149] Avg episode reward: [(0, '5.008')] +[2025-07-06 15:28:51,985][06642] Updated weights for policy 0, policy_version 250 (0.0014) +[2025-07-06 15:28:56,399][06149] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 889.1. Samples: 257874. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:56,403][06149] Avg episode reward: [(0, '4.857')] +[2025-07-06 15:29:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1052672. Throughput: 0: 875.5. Samples: 262350. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:29:01,399][06149] Avg episode reward: [(0, '5.042')] +[2025-07-06 15:29:01,403][06624] Saving new best policy, reward=5.042! +[2025-07-06 15:29:03,898][06642] Updated weights for policy 0, policy_version 260 (0.0013) +[2025-07-06 15:29:06,397][06149] Fps is (10 sec: 3687.1, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 1073152. Throughput: 0: 884.5. Samples: 268134. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:29:06,400][06149] Avg episode reward: [(0, '5.197')] +[2025-07-06 15:29:06,407][06624] Saving new best policy, reward=5.197! +[2025-07-06 15:29:11,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1085440. Throughput: 0: 874.6. Samples: 270666. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:11,404][06149] Avg episode reward: [(0, '4.924')] +[2025-07-06 15:29:15,812][06642] Updated weights for policy 0, policy_version 270 (0.0013) +[2025-07-06 15:29:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1105920. Throughput: 0: 878.4. Samples: 275448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:16,401][06149] Avg episode reward: [(0, '4.751')] +[2025-07-06 15:29:16,407][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000270_1105920.pth... +[2025-07-06 15:29:16,496][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth +[2025-07-06 15:29:21,397][06149] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1126400. Throughput: 0: 880.0. Samples: 281262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:21,401][06149] Avg episode reward: [(0, '4.775')] +[2025-07-06 15:29:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1138688. Throughput: 0: 864.2. Samples: 283442. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:29:26,398][06149] Avg episode reward: [(0, '4.756')] +[2025-07-06 15:29:27,873][06642] Updated weights for policy 0, policy_version 280 (0.0015) +[2025-07-06 15:29:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1159168. Throughput: 0: 885.1. Samples: 288736. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:31,398][06149] Avg episode reward: [(0, '4.904')] +[2025-07-06 15:29:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1175552. Throughput: 0: 878.3. Samples: 294444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:36,402][06149] Avg episode reward: [(0, '5.267')] +[2025-07-06 15:29:36,415][06624] Saving new best policy, reward=5.267! +[2025-07-06 15:29:39,853][06642] Updated weights for policy 0, policy_version 290 (0.0016) +[2025-07-06 15:29:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1191936. Throughput: 0: 850.9. Samples: 296162. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:41,399][06149] Avg episode reward: [(0, '5.294')] +[2025-07-06 15:29:41,403][06624] Saving new best policy, reward=5.294! +[2025-07-06 15:29:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1212416. Throughput: 0: 875.4. Samples: 301744. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:29:46,402][06149] Avg episode reward: [(0, '5.273')] +[2025-07-06 15:29:51,170][06642] Updated weights for policy 0, policy_version 300 (0.0014) +[2025-07-06 15:29:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1228800. Throughput: 0: 865.3. Samples: 307074. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:51,403][06149] Avg episode reward: [(0, '5.302')] +[2025-07-06 15:29:51,405][06624] Saving new best policy, reward=5.302! +[2025-07-06 15:29:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 1245184. Throughput: 0: 853.1. Samples: 309054. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:29:56,404][06149] Avg episode reward: [(0, '5.274')] +[2025-07-06 15:30:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1261568. Throughput: 0: 877.0. Samples: 314914. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:01,398][06149] Avg episode reward: [(0, '5.208')] +[2025-07-06 15:30:02,488][06642] Updated weights for policy 0, policy_version 310 (0.0013) +[2025-07-06 15:30:06,402][06149] Fps is (10 sec: 3275.0, 60 sec: 3413.0, 300 sec: 3512.8). Total num frames: 1277952. Throughput: 0: 853.4. Samples: 319668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:06,404][06149] Avg episode reward: [(0, '5.409')] +[2025-07-06 15:30:06,411][06624] Saving new best policy, reward=5.409! +[2025-07-06 15:30:11,401][06149] Fps is (10 sec: 3684.8, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 1298432. Throughput: 0: 857.2. Samples: 322018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:11,403][06149] Avg episode reward: [(0, '5.931')] +[2025-07-06 15:30:11,415][06624] Saving new best policy, reward=5.931! +[2025-07-06 15:30:14,653][06642] Updated weights for policy 0, policy_version 320 (0.0013) +[2025-07-06 15:30:16,397][06149] Fps is (10 sec: 3688.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1314816. Throughput: 0: 869.9. Samples: 327882. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:16,398][06149] Avg episode reward: [(0, '6.280')] +[2025-07-06 15:30:16,406][06624] Saving new best policy, reward=6.280! +[2025-07-06 15:30:21,399][06149] Fps is (10 sec: 2867.8, 60 sec: 3344.9, 300 sec: 3485.0). Total num frames: 1327104. Throughput: 0: 840.2. Samples: 332254. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:21,401][06149] Avg episode reward: [(0, '6.350')] +[2025-07-06 15:30:21,405][06624] Saving new best policy, reward=6.350! +[2025-07-06 15:30:26,399][06149] Fps is (10 sec: 3276.2, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 1347584. Throughput: 0: 865.6. Samples: 335116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:26,400][06149] Avg episode reward: [(0, '6.030')] +[2025-07-06 15:30:26,771][06642] Updated weights for policy 0, policy_version 330 (0.0015) +[2025-07-06 15:30:31,397][06149] Fps is (10 sec: 4097.0, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 1368064. Throughput: 0: 874.0. Samples: 341074. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:31,402][06149] Avg episode reward: [(0, '6.281')] +[2025-07-06 15:30:36,397][06149] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 1380352. Throughput: 0: 851.1. Samples: 345372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:36,398][06149] Avg episode reward: [(0, '6.289')] +[2025-07-06 15:30:38,726][06642] Updated weights for policy 0, policy_version 340 (0.0013) +[2025-07-06 15:30:41,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1400832. Throughput: 0: 873.6. Samples: 348366. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:41,399][06149] Avg episode reward: [(0, '6.518')] +[2025-07-06 15:30:41,400][06624] Saving new best policy, reward=6.518! +[2025-07-06 15:30:46,398][06149] Fps is (10 sec: 3685.9, 60 sec: 3413.3, 300 sec: 3498.9). Total num frames: 1417216. Throughput: 0: 874.5. Samples: 354266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:46,400][06149] Avg episode reward: [(0, '6.584')] +[2025-07-06 15:30:46,409][06624] Saving new best policy, reward=6.584! +[2025-07-06 15:30:50,669][06642] Updated weights for policy 0, policy_version 350 (0.0019) +[2025-07-06 15:30:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1433600. Throughput: 0: 865.9. Samples: 358630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:51,402][06149] Avg episode reward: [(0, '7.047')] +[2025-07-06 15:30:51,406][06624] Saving new best policy, reward=7.047! +[2025-07-06 15:30:56,397][06149] Fps is (10 sec: 3686.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1454080. Throughput: 0: 878.3. Samples: 361540. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:56,399][06149] Avg episode reward: [(0, '7.382')] +[2025-07-06 15:30:56,408][06624] Saving new best policy, reward=7.382! +[2025-07-06 15:31:01,398][06149] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 1470464. Throughput: 0: 868.5. Samples: 366964. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:01,400][06149] Avg episode reward: [(0, '7.177')] +[2025-07-06 15:31:02,778][06642] Updated weights for policy 0, policy_version 360 (0.0013) +[2025-07-06 15:31:06,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.9, 300 sec: 3499.0). Total num frames: 1486848. Throughput: 0: 879.4. Samples: 371826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:06,402][06149] Avg episode reward: [(0, '7.175')] +[2025-07-06 15:31:11,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3481.8, 300 sec: 3512.8). Total num frames: 1507328. Throughput: 0: 887.5. Samples: 375054. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:31:11,402][06149] Avg episode reward: [(0, '7.724')] +[2025-07-06 15:31:11,404][06624] Saving new best policy, reward=7.724! +[2025-07-06 15:31:12,467][06642] Updated weights for policy 0, policy_version 370 (0.0014) +[2025-07-06 15:31:16,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1523712. Throughput: 0: 876.0. Samples: 380492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:16,400][06149] Avg episode reward: [(0, '7.615')] +[2025-07-06 15:31:16,416][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000372_1523712.pth... +[2025-07-06 15:31:16,506][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000168_688128.pth +[2025-07-06 15:31:21,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 1544192. Throughput: 0: 907.9. Samples: 386228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:21,399][06149] Avg episode reward: [(0, '7.588')] +[2025-07-06 15:31:23,614][06642] Updated weights for policy 0, policy_version 380 (0.0013) +[2025-07-06 15:31:26,403][06149] Fps is (10 sec: 4093.4, 60 sec: 3617.9, 300 sec: 3512.8). Total num frames: 1564672. Throughput: 0: 911.1. Samples: 389372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:26,404][06149] Avg episode reward: [(0, '7.407')] +[2025-07-06 15:31:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1581056. Throughput: 0: 891.1. Samples: 394366. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:31,398][06149] Avg episode reward: [(0, '8.078')] +[2025-07-06 15:31:31,403][06624] Saving new best policy, reward=8.078! +[2025-07-06 15:31:34,781][06642] Updated weights for policy 0, policy_version 390 (0.0020) +[2025-07-06 15:31:36,397][06149] Fps is (10 sec: 3688.8, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 1601536. Throughput: 0: 930.5. Samples: 400504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:36,404][06149] Avg episode reward: [(0, '7.921')] +[2025-07-06 15:31:41,400][06149] Fps is (10 sec: 4094.6, 60 sec: 3686.2, 300 sec: 3526.7). Total num frames: 1622016. Throughput: 0: 935.7. Samples: 403650. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:31:41,404][06149] Avg episode reward: [(0, '7.681')] +[2025-07-06 15:31:45,708][06642] Updated weights for policy 0, policy_version 400 (0.0018) +[2025-07-06 15:31:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3526.7). Total num frames: 1638400. Throughput: 0: 924.2. Samples: 408550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:31:46,405][06149] Avg episode reward: [(0, '7.818')] +[2025-07-06 15:31:51,397][06149] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3540.6). Total num frames: 1658880. Throughput: 0: 954.8. Samples: 414794. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:31:51,398][06149] Avg episode reward: [(0, '8.791')] +[2025-07-06 15:31:51,400][06624] Saving new best policy, reward=8.791! +[2025-07-06 15:31:56,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 1675264. Throughput: 0: 947.2. Samples: 417680. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:31:56,399][06149] Avg episode reward: [(0, '8.891')] +[2025-07-06 15:31:56,418][06624] Saving new best policy, reward=8.891! 
+[2025-07-06 15:31:57,718][06642] Updated weights for policy 0, policy_version 410 (0.0015) +[2025-07-06 15:32:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3540.6). Total num frames: 1691648. Throughput: 0: 923.2. Samples: 422038. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:01,398][06149] Avg episode reward: [(0, '9.019')] +[2025-07-06 15:32:01,402][06624] Saving new best policy, reward=9.019! +[2025-07-06 15:32:06,397][06149] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3540.6). Total num frames: 1712128. Throughput: 0: 922.5. Samples: 427742. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:06,398][06149] Avg episode reward: [(0, '8.566')] +[2025-07-06 15:32:08,735][06642] Updated weights for policy 0, policy_version 420 (0.0015) +[2025-07-06 15:32:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1724416. Throughput: 0: 909.2. Samples: 430282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:11,398][06149] Avg episode reward: [(0, '9.273')] +[2025-07-06 15:32:11,404][06624] Saving new best policy, reward=9.273! +[2025-07-06 15:32:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1744896. Throughput: 0: 905.8. Samples: 435126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:16,401][06149] Avg episode reward: [(0, '9.650')] +[2025-07-06 15:32:16,414][06624] Saving new best policy, reward=9.650! +[2025-07-06 15:32:20,113][06642] Updated weights for policy 0, policy_version 430 (0.0016) +[2025-07-06 15:32:21,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1765376. Throughput: 0: 899.2. Samples: 440966. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:32:21,402][06149] Avg episode reward: [(0, '9.893')] +[2025-07-06 15:32:21,403][06624] Saving new best policy, reward=9.893! +[2025-07-06 15:32:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3526.7). Total num frames: 1777664. Throughput: 0: 873.7. Samples: 442964. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:32:26,398][06149] Avg episode reward: [(0, '9.809')] +[2025-07-06 15:32:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1798144. Throughput: 0: 882.2. Samples: 448248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:31,399][06149] Avg episode reward: [(0, '10.684')] +[2025-07-06 15:32:31,405][06624] Saving new best policy, reward=10.684! +[2025-07-06 15:32:32,288][06642] Updated weights for policy 0, policy_version 440 (0.0022) +[2025-07-06 15:32:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1814528. Throughput: 0: 868.5. Samples: 453876. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:32:36,400][06149] Avg episode reward: [(0, '11.405')] +[2025-07-06 15:32:36,412][06624] Saving new best policy, reward=11.405! +[2025-07-06 15:32:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3526.7). Total num frames: 1830912. Throughput: 0: 841.7. Samples: 455554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:32:41,398][06149] Avg episode reward: [(0, '12.262')] +[2025-07-06 15:32:41,409][06624] Saving new best policy, reward=12.262! +[2025-07-06 15:32:44,002][06642] Updated weights for policy 0, policy_version 450 (0.0013) +[2025-07-06 15:32:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1851392. 
Throughput: 0: 881.2. Samples: 461690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:46,398][06149] Avg episode reward: [(0, '12.095')] +[2025-07-06 15:32:51,397][06149] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1867776. Throughput: 0: 870.9. Samples: 466932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:51,401][06149] Avg episode reward: [(0, '12.639')] +[2025-07-06 15:32:51,406][06624] Saving new best policy, reward=12.639! +[2025-07-06 15:32:55,883][06642] Updated weights for policy 0, policy_version 460 (0.0012) +[2025-07-06 15:32:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1884160. Throughput: 0: 862.6. Samples: 469098. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:32:56,408][06149] Avg episode reward: [(0, '11.940')] +[2025-07-06 15:33:01,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1904640. Throughput: 0: 886.3. Samples: 475008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:01,404][06149] Avg episode reward: [(0, '12.313')] +[2025-07-06 15:33:06,400][06149] Fps is (10 sec: 3275.7, 60 sec: 3413.1, 300 sec: 3512.8). Total num frames: 1916928. Throughput: 0: 857.9. Samples: 479574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:06,402][06149] Avg episode reward: [(0, '13.217')] +[2025-07-06 15:33:06,411][06624] Saving new best policy, reward=13.217! +[2025-07-06 15:33:08,153][06642] Updated weights for policy 0, policy_version 470 (0.0014) +[2025-07-06 15:33:11,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1937408. Throughput: 0: 869.4. Samples: 482088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:33:11,401][06149] Avg episode reward: [(0, '13.710')] +[2025-07-06 15:33:11,405][06624] Saving new best policy, reward=13.710! +[2025-07-06 15:33:16,397][06149] Fps is (10 sec: 3687.6, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1953792. Throughput: 0: 882.7. Samples: 487970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:16,401][06149] Avg episode reward: [(0, '13.624')] +[2025-07-06 15:33:16,410][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth... +[2025-07-06 15:33:16,499][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000270_1105920.pth +[2025-07-06 15:33:19,676][06642] Updated weights for policy 0, policy_version 480 (0.0013) +[2025-07-06 15:33:21,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1970176. Throughput: 0: 856.5. Samples: 492418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:33:21,399][06149] Avg episode reward: [(0, '13.734')] +[2025-07-06 15:33:21,400][06624] Saving new best policy, reward=13.734! +[2025-07-06 15:33:26,405][06149] Fps is (10 sec: 3683.4, 60 sec: 3549.4, 300 sec: 3540.5). Total num frames: 1990656. Throughput: 0: 881.8. Samples: 495242. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:26,411][06149] Avg episode reward: [(0, '12.334')] +[2025-07-06 15:33:30,391][06642] Updated weights for policy 0, policy_version 490 (0.0013) +[2025-07-06 15:33:31,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2007040. Throughput: 0: 877.1. Samples: 501160. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:31,401][06149] Avg episode reward: [(0, '11.395')] +[2025-07-06 15:33:36,397][06149] Fps is (10 sec: 3279.5, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2023424. Throughput: 0: 857.0. Samples: 505496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:36,401][06149] Avg episode reward: [(0, '10.857')] +[2025-07-06 15:33:41,397][06149] Fps is (10 sec: 3686.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2043904. Throughput: 0: 875.3. Samples: 508486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:41,398][06149] Avg episode reward: [(0, '11.545')] +[2025-07-06 15:33:42,408][06642] Updated weights for policy 0, policy_version 500 (0.0017) +[2025-07-06 15:33:46,400][06149] Fps is (10 sec: 3685.2, 60 sec: 3481.4, 300 sec: 3526.7). Total num frames: 2060288. Throughput: 0: 872.5. Samples: 514272. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:46,404][06149] Avg episode reward: [(0, '12.983')] +[2025-07-06 15:33:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2076672. Throughput: 0: 873.0. Samples: 518856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:51,402][06149] Avg episode reward: [(0, '14.091')] +[2025-07-06 15:33:51,406][06624] Saving new best policy, reward=14.091! +[2025-07-06 15:33:54,378][06642] Updated weights for policy 0, policy_version 510 (0.0013) +[2025-07-06 15:33:56,397][06149] Fps is (10 sec: 3687.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2097152. Throughput: 0: 880.7. Samples: 521720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:56,398][06149] Avg episode reward: [(0, '15.927')] +[2025-07-06 15:33:56,413][06624] Saving new best policy, reward=15.927! +[2025-07-06 15:34:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2109440. Throughput: 0: 866.2. Samples: 526950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:01,399][06149] Avg episode reward: [(0, '16.477')] +[2025-07-06 15:34:01,403][06624] Saving new best policy, reward=16.477! +[2025-07-06 15:34:06,397][06149] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3526.7). Total num frames: 2125824. Throughput: 0: 872.0. Samples: 531658. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:06,398][06149] Avg episode reward: [(0, '16.075')] +[2025-07-06 15:34:06,653][06642] Updated weights for policy 0, policy_version 520 (0.0014) +[2025-07-06 15:34:11,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2146304. Throughput: 0: 876.0. Samples: 534656. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:11,398][06149] Avg episode reward: [(0, '17.033')] +[2025-07-06 15:34:11,405][06624] Saving new best policy, reward=17.033! +[2025-07-06 15:34:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 2158592. Throughput: 0: 853.1. Samples: 539548. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:16,398][06149] Avg episode reward: [(0, '17.473')] +[2025-07-06 15:34:16,406][06624] Saving new best policy, reward=17.473! +[2025-07-06 15:34:18,581][06642] Updated weights for policy 0, policy_version 530 (0.0012) +[2025-07-06 15:34:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2179072. Throughput: 0: 877.4. Samples: 544978. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:34:21,400][06149] Avg episode reward: [(0, '16.710')] +[2025-07-06 15:34:26,398][06149] Fps is (10 sec: 4095.4, 60 sec: 3482.0, 300 sec: 3526.7). Total num frames: 2199552. Throughput: 0: 876.6. Samples: 547934. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:34:26,401][06149] Avg episode reward: [(0, '15.698')] +[2025-07-06 15:34:30,343][06642] Updated weights for policy 0, policy_version 540 (0.0013) +[2025-07-06 15:34:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2215936. Throughput: 0: 846.5. Samples: 552362. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:31,398][06149] Avg episode reward: [(0, '15.669')] +[2025-07-06 15:34:36,397][06149] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2232320. Throughput: 0: 874.8. Samples: 558224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:34:36,398][06149] Avg episode reward: [(0, '14.107')] +[2025-07-06 15:34:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2248704. Throughput: 0: 877.6. Samples: 561212. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:34:41,402][06149] Avg episode reward: [(0, '13.492')] +[2025-07-06 15:34:41,660][06642] Updated weights for policy 0, policy_version 550 (0.0013) +[2025-07-06 15:34:46,399][06149] Fps is (10 sec: 3276.1, 60 sec: 3413.4, 300 sec: 3512.8). Total num frames: 2265088. Throughput: 0: 856.5. Samples: 565494. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:34:46,404][06149] Avg episode reward: [(0, '15.704')] +[2025-07-06 15:34:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2285568. Throughput: 0: 883.0. Samples: 571392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:51,404][06149] Avg episode reward: [(0, '16.580')] +[2025-07-06 15:34:52,915][06642] Updated weights for policy 0, policy_version 560 (0.0013) +[2025-07-06 15:34:56,398][06149] Fps is (10 sec: 3686.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 2301952. Throughput: 0: 880.7. Samples: 574288. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:34:56,402][06149] Avg episode reward: [(0, '17.341')] +[2025-07-06 15:35:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.8). Total num frames: 2318336. Throughput: 0: 867.7. Samples: 578594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:01,402][06149] Avg episode reward: [(0, '19.349')] +[2025-07-06 15:35:01,407][06624] Saving new best policy, reward=19.349! +[2025-07-06 15:35:05,242][06642] Updated weights for policy 0, policy_version 570 (0.0021) +[2025-07-06 15:35:06,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 2338816. Throughput: 0: 873.2. Samples: 584274. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:06,405][06149] Avg episode reward: [(0, '20.379')] +[2025-07-06 15:35:06,414][06624] Saving new best policy, reward=20.379! +[2025-07-06 15:35:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2351104. Throughput: 0: 862.4. Samples: 586740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:11,402][06149] Avg episode reward: [(0, '20.166')] +[2025-07-06 15:35:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2371584. Throughput: 0: 869.6. Samples: 591496. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:16,398][06149] Avg episode reward: [(0, '19.702')] +[2025-07-06 15:35:16,405][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000579_2371584.pth... +[2025-07-06 15:35:16,503][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000372_1523712.pth +[2025-07-06 15:35:17,369][06642] Updated weights for policy 0, policy_version 580 (0.0013) +[2025-07-06 15:35:21,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2387968. Throughput: 0: 869.4. Samples: 597348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:21,398][06149] Avg episode reward: [(0, '21.154')] +[2025-07-06 15:35:21,458][06624] Saving new best policy, reward=21.154! +[2025-07-06 15:35:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3512.8). Total num frames: 2404352. Throughput: 0: 849.0. Samples: 599418. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:26,398][06149] Avg episode reward: [(0, '21.333')] +[2025-07-06 15:35:26,406][06624] Saving new best policy, reward=21.333! +[2025-07-06 15:35:29,418][06642] Updated weights for policy 0, policy_version 590 (0.0026) +[2025-07-06 15:35:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 2420736. Throughput: 0: 866.9. Samples: 604502. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:31,398][06149] Avg episode reward: [(0, '21.182')] +[2025-07-06 15:35:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2441216. Throughput: 0: 867.1. Samples: 610412. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:36,398][06149] Avg episode reward: [(0, '20.737')] +[2025-07-06 15:35:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.9). Total num frames: 2453504. Throughput: 0: 841.4. Samples: 612150. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:41,398][06149] Avg episode reward: [(0, '19.511')] +[2025-07-06 15:35:41,454][06642] Updated weights for policy 0, policy_version 600 (0.0015) +[2025-07-06 15:35:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 2473984. Throughput: 0: 869.5. Samples: 617720. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:46,402][06149] Avg episode reward: [(0, '20.273')] +[2025-07-06 15:35:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2490368. Throughput: 0: 860.7. Samples: 623004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:51,400][06149] Avg episode reward: [(0, '18.654')] +[2025-07-06 15:35:53,503][06642] Updated weights for policy 0, policy_version 610 (0.0013) +[2025-07-06 15:35:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3512.9). Total num frames: 2506752. Throughput: 0: 846.9. Samples: 624852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:56,399][06149] Avg episode reward: [(0, '18.254')] +[2025-07-06 15:36:01,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2527232. Throughput: 0: 871.7. Samples: 630724. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:36:01,400][06149] Avg episode reward: [(0, '17.656')] +[2025-07-06 15:36:04,669][06642] Updated weights for policy 0, policy_version 620 (0.0013) +[2025-07-06 15:36:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3499.0). 
Total num frames: 2539520. Throughput: 0: 848.8. Samples: 635546. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:06,398][06149] Avg episode reward: [(0, '17.461')] +[2025-07-06 15:36:11,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 2560000. Throughput: 0: 851.1. Samples: 637718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:11,401][06149] Avg episode reward: [(0, '17.178')] +[2025-07-06 15:36:16,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 2576384. Throughput: 0: 868.3. Samples: 643576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:16,398][06149] Avg episode reward: [(0, '18.791')] +[2025-07-06 15:36:16,424][06642] Updated weights for policy 0, policy_version 630 (0.0013) +[2025-07-06 15:36:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 2592768. Throughput: 0: 839.3. Samples: 648180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:21,401][06149] Avg episode reward: [(0, '18.437')] +[2025-07-06 15:36:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2613248. Throughput: 0: 858.9. Samples: 650802. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:26,402][06149] Avg episode reward: [(0, '19.433')] +[2025-07-06 15:36:28,409][06642] Updated weights for policy 0, policy_version 640 (0.0020) +[2025-07-06 15:36:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2629632. Throughput: 0: 863.9. Samples: 656596. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:36:31,398][06149] Avg episode reward: [(0, '20.972')] +[2025-07-06 15:36:36,397][06149] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3457.3). Total num frames: 2641920. Throughput: 0: 842.5. Samples: 660916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:36,401][06149] Avg episode reward: [(0, '21.912')] +[2025-07-06 15:36:36,473][06624] Saving new best policy, reward=21.912! +[2025-07-06 15:36:40,623][06642] Updated weights for policy 0, policy_version 650 (0.0013) +[2025-07-06 15:36:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2662400. Throughput: 0: 864.4. Samples: 663748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:41,400][06149] Avg episode reward: [(0, '21.892')] +[2025-07-06 15:36:46,398][06149] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3457.3). Total num frames: 2678784. Throughput: 0: 862.9. Samples: 669554. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:36:46,400][06149] Avg episode reward: [(0, '22.742')] +[2025-07-06 15:36:46,415][06624] Saving new best policy, reward=22.742! +[2025-07-06 15:36:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2695168. Throughput: 0: 851.7. Samples: 673872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:51,398][06149] Avg episode reward: [(0, '23.426')] +[2025-07-06 15:36:51,406][06624] Saving new best policy, reward=23.426! +[2025-07-06 15:36:52,842][06642] Updated weights for policy 0, policy_version 660 (0.0013) +[2025-07-06 15:36:56,397][06149] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2715648. Throughput: 0: 866.3. Samples: 676702. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:56,406][06149] Avg episode reward: [(0, '21.773')] +[2025-07-06 15:37:01,398][06149] Fps is (10 sec: 3685.9, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2732032. Throughput: 0: 859.3. Samples: 682246. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:01,402][06149] Avg episode reward: [(0, '18.514')] +[2025-07-06 15:37:05,171][06642] Updated weights for policy 0, policy_version 670 (0.0014) +[2025-07-06 15:37:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2748416. Throughput: 0: 859.4. Samples: 686854. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:06,398][06149] Avg episode reward: [(0, '16.956')] +[2025-07-06 15:37:11,397][06149] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2768896. Throughput: 0: 865.2. Samples: 689736. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:11,398][06149] Avg episode reward: [(0, '15.704')] +[2025-07-06 15:37:16,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2781184. Throughput: 0: 858.8. Samples: 695242. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:16,402][06149] Avg episode reward: [(0, '14.781')] +[2025-07-06 15:37:16,410][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000679_2781184.pth... +[2025-07-06 15:37:16,502][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth +[2025-07-06 15:37:16,670][06642] Updated weights for policy 0, policy_version 680 (0.0016) +[2025-07-06 15:37:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2801664. Throughput: 0: 884.6. Samples: 700722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:37:21,407][06149] Avg episode reward: [(0, '15.211')] +[2025-07-06 15:37:26,397][06149] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2822144. Throughput: 0: 889.3. Samples: 703768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:26,400][06149] Avg episode reward: [(0, '16.993')] +[2025-07-06 15:37:26,863][06642] Updated weights for policy 0, policy_version 690 (0.0015) +[2025-07-06 15:37:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2838528. Throughput: 0: 868.4. Samples: 708632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:31,400][06149] Avg episode reward: [(0, '18.106')] +[2025-07-06 15:37:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2859008. Throughput: 0: 904.1. Samples: 714556. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:36,398][06149] Avg episode reward: [(0, '19.567')] +[2025-07-06 15:37:38,219][06642] Updated weights for policy 0, policy_version 700 (0.0012) +[2025-07-06 15:37:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2875392. Throughput: 0: 905.1. Samples: 717432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:41,404][06149] Avg episode reward: [(0, '20.779')] +[2025-07-06 15:37:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3471.2). Total num frames: 2891776. Throughput: 0: 876.3. Samples: 721680. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:37:46,398][06149] Avg episode reward: [(0, '22.450')] +[2025-07-06 15:37:50,407][06642] Updated weights for policy 0, policy_version 710 (0.0013) +[2025-07-06 15:37:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2908160. Throughput: 0: 904.5. Samples: 727558. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:51,398][06149] Avg episode reward: [(0, '23.796')] +[2025-07-06 15:37:51,405][06624] Saving new best policy, reward=23.796! +[2025-07-06 15:37:56,398][06149] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 2924544. Throughput: 0: 904.8. Samples: 730452. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:56,402][06149] Avg episode reward: [(0, '23.480')] +[2025-07-06 15:38:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 2940928. Throughput: 0: 879.4. Samples: 734816. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:01,401][06149] Avg episode reward: [(0, '23.883')] +[2025-07-06 15:38:01,458][06624] Saving new best policy, reward=23.883! +[2025-07-06 15:38:02,495][06642] Updated weights for policy 0, policy_version 720 (0.0017) +[2025-07-06 15:38:06,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2961408. Throughput: 0: 884.3. Samples: 740516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:06,398][06149] Avg episode reward: [(0, '23.087')] +[2025-07-06 15:38:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2973696. Throughput: 0: 871.9. Samples: 743002. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:11,400][06149] Avg episode reward: [(0, '22.461')] +[2025-07-06 15:38:14,608][06642] Updated weights for policy 0, policy_version 730 (0.0014) +[2025-07-06 15:38:16,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2994176. Throughput: 0: 867.8. Samples: 747684. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:16,398][06149] Avg episode reward: [(0, '21.936')] +[2025-07-06 15:38:21,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3471.3). Total num frames: 3014656. Throughput: 0: 867.1. Samples: 753574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:21,401][06149] Avg episode reward: [(0, '22.305')] +[2025-07-06 15:38:26,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3026944. Throughput: 0: 850.0. Samples: 755682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:26,401][06149] Avg episode reward: [(0, '21.173')] +[2025-07-06 15:38:26,600][06642] Updated weights for policy 0, policy_version 740 (0.0013) +[2025-07-06 15:38:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3047424. Throughput: 0: 871.8. Samples: 760910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:31,402][06149] Avg episode reward: [(0, '20.386')] +[2025-07-06 15:38:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3063808. Throughput: 0: 868.4. Samples: 766636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:36,402][06149] Avg episode reward: [(0, '19.737')] +[2025-07-06 15:38:38,133][06642] Updated weights for policy 0, policy_version 750 (0.0020) +[2025-07-06 15:38:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). 
Total num frames: 3080192. Throughput: 0: 842.5. Samples: 768362. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:38:41,405][06149] Avg episode reward: [(0, '20.756')] +[2025-07-06 15:38:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3100672. Throughput: 0: 869.8. Samples: 773958. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:46,402][06149] Avg episode reward: [(0, '20.566')] +[2025-07-06 15:38:49,326][06642] Updated weights for policy 0, policy_version 760 (0.0018) +[2025-07-06 15:38:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3117056. Throughput: 0: 861.0. Samples: 779262. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:38:51,398][06149] Avg episode reward: [(0, '20.534')] +[2025-07-06 15:38:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 3133440. Throughput: 0: 849.8. Samples: 781242. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:38:56,403][06149] Avg episode reward: [(0, '20.876')] +[2025-07-06 15:39:01,268][06642] Updated weights for policy 0, policy_version 770 (0.0018) +[2025-07-06 15:39:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3153920. Throughput: 0: 876.1. Samples: 787108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:01,398][06149] Avg episode reward: [(0, '22.123')] +[2025-07-06 15:39:06,399][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3166208. Throughput: 0: 850.4. Samples: 791840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:06,401][06149] Avg episode reward: [(0, '22.937')] +[2025-07-06 15:39:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3186688. Throughput: 0: 857.8. Samples: 794284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:11,406][06149] Avg episode reward: [(0, '21.314')] +[2025-07-06 15:39:13,328][06642] Updated weights for policy 0, policy_version 780 (0.0017) +[2025-07-06 15:39:16,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3203072. Throughput: 0: 869.6. Samples: 800044. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:39:16,402][06149] Avg episode reward: [(0, '21.475')] +[2025-07-06 15:39:16,412][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth... +[2025-07-06 15:39:16,516][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000579_2371584.pth +[2025-07-06 15:39:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3219456. Throughput: 0: 839.0. Samples: 804390. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:39:21,401][06149] Avg episode reward: [(0, '20.084')] +[2025-07-06 15:39:25,528][06642] Updated weights for policy 0, policy_version 790 (0.0014) +[2025-07-06 15:39:26,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3235840. Throughput: 0: 863.8. Samples: 807232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:26,402][06149] Avg episode reward: [(0, '18.628')] +[2025-07-06 15:39:31,397][06149] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3256320. Throughput: 0: 870.3. Samples: 813124. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:39:31,398][06149] Avg episode reward: [(0, '17.965')] +[2025-07-06 15:39:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3272704. Throughput: 0: 850.3. Samples: 817526. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:39:36,402][06149] Avg episode reward: [(0, '18.332')] +[2025-07-06 15:39:37,387][06642] Updated weights for policy 0, policy_version 800 (0.0016) +[2025-07-06 15:39:41,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3289088. Throughput: 0: 871.4. Samples: 820456. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:41,398][06149] Avg episode reward: [(0, '19.101')] +[2025-07-06 15:39:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3305472. Throughput: 0: 869.3. Samples: 826226. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:39:46,398][06149] Avg episode reward: [(0, '20.238')] +[2025-07-06 15:39:49,654][06642] Updated weights for policy 0, policy_version 810 (0.0017) +[2025-07-06 15:39:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3321856. Throughput: 0: 862.7. Samples: 830660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:51,399][06149] Avg episode reward: [(0, '22.122')] +[2025-07-06 15:39:56,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3342336. Throughput: 0: 874.2. Samples: 833622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:39:56,399][06149] Avg episode reward: [(0, '23.447')] +[2025-07-06 15:40:01,185][06642] Updated weights for policy 0, policy_version 820 (0.0013) +[2025-07-06 15:40:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3358720. Throughput: 0: 863.5. Samples: 838902. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:40:01,398][06149] Avg episode reward: [(0, '25.121')] +[2025-07-06 15:40:01,403][06624] Saving new best policy, reward=25.121! +[2025-07-06 15:40:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3375104. Throughput: 0: 872.9. Samples: 843672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:06,404][06149] Avg episode reward: [(0, '25.532')] +[2025-07-06 15:40:06,412][06624] Saving new best policy, reward=25.532! +[2025-07-06 15:40:11,402][06149] Fps is (10 sec: 3684.4, 60 sec: 3481.3, 300 sec: 3471.1). Total num frames: 3395584. Throughput: 0: 874.3. Samples: 846578. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:40:11,408][06149] Avg episode reward: [(0, '25.620')] +[2025-07-06 15:40:11,414][06624] Saving new best policy, reward=25.620! +[2025-07-06 15:40:12,654][06642] Updated weights for policy 0, policy_version 830 (0.0014) +[2025-07-06 15:40:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3407872. Throughput: 0: 849.6. Samples: 851354. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:16,402][06149] Avg episode reward: [(0, '24.069')] +[2025-07-06 15:40:21,397][06149] Fps is (10 sec: 3278.6, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3428352. Throughput: 0: 871.9. Samples: 856762. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:21,402][06149] Avg episode reward: [(0, '22.612')] +[2025-07-06 15:40:24,366][06642] Updated weights for policy 0, policy_version 840 (0.0018) +[2025-07-06 15:40:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3444736. Throughput: 0: 871.7. Samples: 859684. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:26,401][06149] Avg episode reward: [(0, '20.331')] +[2025-07-06 15:40:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 3461120. Throughput: 0: 842.2. Samples: 864126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:31,402][06149] Avg episode reward: [(0, '19.632')] +[2025-07-06 15:40:36,307][06642] Updated weights for policy 0, policy_version 850 (0.0013) +[2025-07-06 15:40:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 3481600. Throughput: 0: 871.7. Samples: 869888. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:40:36,402][06149] Avg episode reward: [(0, '20.379')] +[2025-07-06 15:40:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3497984. Throughput: 0: 871.9. Samples: 872858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:41,404][06149] Avg episode reward: [(0, '18.751')] +[2025-07-06 15:40:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3514368. Throughput: 0: 849.4. Samples: 877124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:46,400][06149] Avg episode reward: [(0, '20.638')] +[2025-07-06 15:40:48,460][06642] Updated weights for policy 0, policy_version 860 (0.0013) +[2025-07-06 15:40:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3530752. Throughput: 0: 873.9. Samples: 882996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:51,400][06149] Avg episode reward: [(0, '20.907')] +[2025-07-06 15:40:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3547136. Throughput: 0: 875.6. Samples: 885974. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:56,401][06149] Avg episode reward: [(0, '21.966')] +[2025-07-06 15:41:00,401][06642] Updated weights for policy 0, policy_version 870 (0.0013) +[2025-07-06 15:41:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3563520. Throughput: 0: 864.9. Samples: 890276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:01,403][06149] Avg episode reward: [(0, '20.715')] +[2025-07-06 15:41:06,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3584000. Throughput: 0: 873.6. Samples: 896076. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:06,398][06149] Avg episode reward: [(0, '20.278')] +[2025-07-06 15:41:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3345.4, 300 sec: 3457.3). Total num frames: 3596288. Throughput: 0: 865.2. Samples: 898616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:11,398][06149] Avg episode reward: [(0, '20.415')] +[2025-07-06 15:41:12,580][06642] Updated weights for policy 0, policy_version 880 (0.0013) +[2025-07-06 15:41:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3616768. Throughput: 0: 871.8. Samples: 903358. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:16,405][06149] Avg episode reward: [(0, '21.287')] +[2025-07-06 15:41:16,414][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000883_3616768.pth... +[2025-07-06 15:41:16,511][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000679_2781184.pth +[2025-07-06 15:41:21,401][06149] Fps is (10 sec: 4094.2, 60 sec: 3481.3, 300 sec: 3471.1). Total num frames: 3637248. Throughput: 0: 872.2. Samples: 909140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:21,402][06149] Avg episode reward: [(0, '22.518')] +[2025-07-06 15:41:24,215][06642] Updated weights for policy 0, policy_version 890 (0.0013) +[2025-07-06 15:41:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3649536. Throughput: 0: 851.8. Samples: 911190. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:26,398][06149] Avg episode reward: [(0, '23.306')] +[2025-07-06 15:41:31,397][06149] Fps is (10 sec: 3278.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 3670016. Throughput: 0: 873.1. Samples: 916416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:41:31,399][06149] Avg episode reward: [(0, '25.661')] +[2025-07-06 15:41:31,400][06624] Saving new best policy, reward=25.661! +[2025-07-06 15:41:35,094][06642] Updated weights for policy 0, policy_version 900 (0.0020) +[2025-07-06 15:41:36,399][06149] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3471.2). Total num frames: 3686400. Throughput: 0: 867.5. Samples: 922036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:36,411][06149] Avg episode reward: [(0, '25.785')] +[2025-07-06 15:41:36,421][06624] Saving new best policy, reward=25.785! +[2025-07-06 15:41:41,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3702784. Throughput: 0: 839.7. Samples: 923762. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:41,403][06149] Avg episode reward: [(0, '26.583')] +[2025-07-06 15:41:41,406][06624] Saving new best policy, reward=26.583! +[2025-07-06 15:41:46,397][06149] Fps is (10 sec: 3687.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 3723264. Throughput: 0: 868.0. Samples: 929338. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:41:46,402][06149] Avg episode reward: [(0, '28.137')] +[2025-07-06 15:41:46,410][06624] Saving new best policy, reward=28.137! +[2025-07-06 15:41:47,377][06642] Updated weights for policy 0, policy_version 910 (0.0015) +[2025-07-06 15:41:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3739648. Throughput: 0: 856.3. Samples: 934610. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:41:51,398][06149] Avg episode reward: [(0, '27.853')] +[2025-07-06 15:41:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3756032. Throughput: 0: 845.4. Samples: 936658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:56,398][06149] Avg episode reward: [(0, '28.341')] +[2025-07-06 15:41:56,412][06624] Saving new best policy, reward=28.341! +[2025-07-06 15:41:59,394][06642] Updated weights for policy 0, policy_version 920 (0.0014) +[2025-07-06 15:42:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3772416. Throughput: 0: 870.4. Samples: 942524. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:01,407][06149] Avg episode reward: [(0, '26.926')] +[2025-07-06 15:42:06,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3788800. Throughput: 0: 849.1. Samples: 947348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:06,401][06149] Avg episode reward: [(0, '26.735')] +[2025-07-06 15:42:11,188][06642] Updated weights for policy 0, policy_version 930 (0.0019) +[2025-07-06 15:42:11,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3809280. Throughput: 0: 858.6. Samples: 949826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:11,398][06149] Avg episode reward: [(0, '27.739')] +[2025-07-06 15:42:16,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3825664. Throughput: 0: 874.9. Samples: 955788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:16,401][06149] Avg episode reward: [(0, '26.860')] +[2025-07-06 15:42:21,398][06149] Fps is (10 sec: 3276.4, 60 sec: 3413.5, 300 sec: 3457.3). Total num frames: 3842048. Throughput: 0: 848.6. Samples: 960224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:42:21,404][06149] Avg episode reward: [(0, '26.362')] +[2025-07-06 15:42:23,156][06642] Updated weights for policy 0, policy_version 940 (0.0015) +[2025-07-06 15:42:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3862528. Throughput: 0: 876.0. Samples: 963180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:26,398][06149] Avg episode reward: [(0, '24.394')] +[2025-07-06 15:42:31,399][06149] Fps is (10 sec: 3686.1, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 3878912. Throughput: 0: 884.6. Samples: 969146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:31,400][06149] Avg episode reward: [(0, '23.817')] +[2025-07-06 15:42:34,778][06642] Updated weights for policy 0, policy_version 950 (0.0013) +[2025-07-06 15:42:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3457.3). Total num frames: 3895296. Throughput: 0: 869.5. Samples: 973738. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:42:36,398][06149] Avg episode reward: [(0, '23.139')] +[2025-07-06 15:42:41,398][06149] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3915776. Throughput: 0: 890.0. Samples: 976708. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:41,402][06149] Avg episode reward: [(0, '22.658')] +[2025-07-06 15:42:46,027][06642] Updated weights for policy 0, policy_version 960 (0.0013) +[2025-07-06 15:42:46,398][06149] Fps is (10 sec: 3685.9, 60 sec: 3481.5, 300 sec: 3471.2). Total num frames: 3932160. Throughput: 0: 888.0. Samples: 982486. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:46,402][06149] Avg episode reward: [(0, '23.051')] +[2025-07-06 15:42:51,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3948544. Throughput: 0: 883.8. Samples: 987120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:51,399][06149] Avg episode reward: [(0, '23.135')] +[2025-07-06 15:42:56,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3969024. Throughput: 0: 895.1. Samples: 990104. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:56,398][06149] Avg episode reward: [(0, '23.407')] +[2025-07-06 15:42:56,842][06642] Updated weights for policy 0, policy_version 970 (0.0013) +[2025-07-06 15:43:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3985408. Throughput: 0: 887.0. Samples: 995704. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:43:01,400][06149] Avg episode reward: [(0, '23.154')] +[2025-07-06 15:43:06,265][06624] Stopping Batcher_0... +[2025-07-06 15:43:06,266][06624] Loop batcher_evt_loop terminating... +[2025-07-06 15:43:06,266][06149] Component Batcher_0 stopped! +[2025-07-06 15:43:06,271][06149] Component RolloutWorker_w0 process died already! Don't wait for it. +[2025-07-06 15:43:06,274][06149] Component RolloutWorker_w1 process died already! Don't wait for it. +[2025-07-06 15:43:06,278][06149] Component RolloutWorker_w3 process died already! Don't wait for it. +[2025-07-06 15:43:06,280][06149] Component RolloutWorker_w6 process died already! Don't wait for it. +[2025-07-06 15:43:06,284][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:06,314][06642] Weights refcount: 2 0 +[2025-07-06 15:43:06,318][06642] Stopping InferenceWorker_p0-w0... +[2025-07-06 15:43:06,318][06149] Component InferenceWorker_p0-w0 stopped! +[2025-07-06 15:43:06,319][06642] Loop inference_proc0-0_evt_loop terminating... +[2025-07-06 15:43:06,378][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth +[2025-07-06 15:43:06,391][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:06,514][06149] Component LearnerWorker_p0 stopped! +[2025-07-06 15:43:06,513][06624] Stopping LearnerWorker_p0... +[2025-07-06 15:43:06,521][06624] Loop learner_proc0_evt_loop terminating... +[2025-07-06 15:43:06,605][06149] Component RolloutWorker_w5 stopped! +[2025-07-06 15:43:06,606][06643] Stopping RolloutWorker_w5... +[2025-07-06 15:43:06,609][06643] Loop rollout_proc5_evt_loop terminating... +[2025-07-06 15:43:06,645][06149] Component RolloutWorker_w7 stopped! +[2025-07-06 15:43:06,646][06645] Stopping RolloutWorker_w7... +[2025-07-06 15:43:06,647][06645] Loop rollout_proc7_evt_loop terminating... +[2025-07-06 15:43:06,688][06641] Stopping RolloutWorker_w4... +[2025-07-06 15:43:06,688][06149] Component RolloutWorker_w4 stopped! +[2025-07-06 15:43:06,695][06149] Component RolloutWorker_w2 stopped! +[2025-07-06 15:43:06,697][06149] Waiting for process learner_proc0 to stop... +[2025-07-06 15:43:06,695][06639] Stopping RolloutWorker_w2... +[2025-07-06 15:43:06,702][06639] Loop rollout_proc2_evt_loop terminating... +[2025-07-06 15:43:06,704][06641] Loop rollout_proc4_evt_loop terminating... +[2025-07-06 15:43:08,008][06149] Waiting for process inference_proc0-0 to join... +[2025-07-06 15:43:08,013][06149] Waiting for process rollout_proc0 to join... +[2025-07-06 15:43:08,017][06149] Waiting for process rollout_proc1 to join... +[2025-07-06 15:43:08,018][06149] Waiting for process rollout_proc2 to join... +[2025-07-06 15:43:08,887][06149] Waiting for process rollout_proc3 to join... +[2025-07-06 15:43:08,888][06149] Waiting for process rollout_proc4 to join... +[2025-07-06 15:43:08,891][06149] Waiting for process rollout_proc5 to join... +[2025-07-06 15:43:08,892][06149] Waiting for process rollout_proc6 to join... 
+[2025-07-06 15:43:08,893][06149] Waiting for process rollout_proc7 to join... +[2025-07-06 15:43:08,894][06149] Batcher 0 profile tree view: +batching: 22.7740, releasing_batches: 0.0319 +[2025-07-06 15:43:08,895][06149] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0015 + wait_policy_total: 460.4266 +update_model: 9.8839 + weight_update: 0.0014 +one_step: 0.0030 + handle_policy_step: 641.1391 + deserialize: 15.3284, stack: 3.9142, obs_to_device_normalize: 141.4910, forward: 340.6104, send_messages: 24.8876 + prepare_outputs: 87.0798 + to_cpu: 52.9134 +[2025-07-06 15:43:08,900][06149] Learner 0 profile tree view: +misc: 0.0043, prepare_batch: 11.9594 +train: 68.0338 + epoch_init: 0.0113, minibatch_init: 0.0109, losses_postprocess: 0.5875, kl_divergence: 0.5524, after_optimizer: 32.5001 + calculate_losses: 22.9152 + losses_init: 0.0091, forward_head: 1.3074, bptt_initial: 15.6625, tail: 0.9521, advantages_returns: 0.2239, losses: 2.8113 + bptt: 1.7258 + bptt_forward_core: 1.6308 + update: 10.9270 + clip: 1.0328 +[2025-07-06 15:43:08,902][06149] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.4567, enqueue_policy_requests: 172.7543, env_step: 797.9871, overhead: 20.5533, complete_rollouts: 6.9287 +save_policy_outputs: 27.4127 + split_output_tensors: 10.4911 +[2025-07-06 15:43:08,903][06149] Loop Runner_EvtLoop terminating... +[2025-07-06 15:43:08,905][06149] Runner profile tree view: +main_loop: 1186.8546 +[2025-07-06 15:43:08,907][06149] Collected {0: 4005888}, FPS: 3375.2 +[2025-07-06 15:43:55,037][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:43:55,038][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:43:55,040][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:43:55,042][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:43:55,043][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:43:55,045][06149] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 15:43:55,046][06149] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:43:55,047][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:43:55,048][06149] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-07-06 15:43:55,049][06149] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 15:43:55,050][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 15:43:55,051][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:43:55,052][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:43:55,053][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
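For reference, the throughput figure in the Runner summary above is simply the collected frame count divided by the main-loop wall time, and the checkpoint filenames appear to encode the policy version together with the environment-frame count (every checkpoint saved in this run satisfies frames = version x 4096). A quick sanity check, using only numbers printed in the log above:

    # Sanity check of the reported throughput and checkpoint naming
    # (all values copied from the log above).
    total_frames = 4_005_888            # "Collected {0: 4005888}"
    main_loop_seconds = 1186.8546       # "main_loop: 1186.8546"
    print(total_frames / main_loop_seconds)   # ~3375.2, matching "FPS: 3375.2"

    # checkpoint_000000978_4005888.pth encodes <policy_version>_<env_frames>;
    # every checkpoint in this run satisfies env_frames == policy_version * 4096.
    assert 978 * 4096 == total_frames
    assert 579 * 4096 == 2_371_584 and 883 * 4096 == 3_616_768
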
+[2025-07-06 15:43:55,054][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:43:55,097][06149] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:43:55,100][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:43:55,102][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:43:55,115][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:43:55,228][06149] Conv encoder output size: 512 +[2025-07-06 15:43:55,229][06149] Policy head output size: 512 +[2025-07-06 15:43:55,496][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:55,499][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:43:55,503][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:55,505][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. 
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:43:55,506][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:55,508][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:44:34,428][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:44:34,431][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:44:34,431][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:44:34,433][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:44:34,434][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:44:34,435][06149] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 15:44:34,436][06149] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-07-06 15:44:34,440][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:44:34,441][06149] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-07-06 15:44:34,442][06149] Adding new argument 'hf_repository'='zhngq/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-07-06 15:44:34,442][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! 
+[2025-07-06 15:44:34,443][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:44:34,444][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:44:34,444][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 15:44:34,448][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:44:34,500][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:44:34,503][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:44:34,517][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:44:34,578][06149] Conv encoder output size: 512 +[2025-07-06 15:44:34,580][06149] Policy head output size: 512 +[2025-07-06 15:44:34,610][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:44:34,613][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:44:34,615][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:44:34,617][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. 
Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:44:34,619][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:44:34,624][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:34,865][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:55:34,866][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:55:34,867][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:55:34,868][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:55:34,869][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:34,869][06149] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 15:55:34,870][06149] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:34,871][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:55:34,872][06149] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
+[2025-07-06 15:55:34,873][06149] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 15:55:34,874][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 15:55:34,875][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:55:34,875][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:55:34,876][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 15:55:34,877][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:55:34,905][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:55:34,907][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:55:34,919][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:55:34,956][06149] Conv encoder output size: 512 +[2025-07-06 15:55:34,957][06149] Policy head output size: 512 +[2025-07-06 15:55:34,977][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:34,979][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:34,980][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:34,982][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. 
This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:34,984][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:34,985][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:41,045][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:55:41,046][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:55:41,047][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:55:41,048][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:55:41,048][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:41,049][06149] Adding new argument 'video_name'=None that is not in the saved config file! 
+[2025-07-06 15:55:41,050][06149] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:41,051][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:55:41,052][06149] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-07-06 15:55:41,053][06149] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 15:55:41,054][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 15:55:41,055][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:55:41,055][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:55:41,056][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 15:55:41,057][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:55:41,104][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:55:41,106][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:55:41,121][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:55:41,177][06149] Conv encoder output size: 512 +[2025-07-06 15:55:41,178][06149] Policy head output size: 512 +[2025-07-06 15:55:41,210][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:41,212][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:41,214][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-07-06 15:55:41,215][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:41,216][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:41,218][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:46,205][06149] Environment doom_basic already registered, overwriting...
+[2025-07-06 15:55:46,206][06149] Environment doom_two_colors_easy already registered, overwriting... +[2025-07-06 15:55:46,207][06149] Environment doom_two_colors_hard already registered, overwriting... +[2025-07-06 15:55:46,208][06149] Environment doom_dm already registered, overwriting... +[2025-07-06 15:55:46,208][06149] Environment doom_dwango5 already registered, overwriting... +[2025-07-06 15:55:46,209][06149] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2025-07-06 15:55:46,210][06149] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2025-07-06 15:55:46,211][06149] Environment doom_my_way_home already registered, overwriting... +[2025-07-06 15:55:46,211][06149] Environment doom_deadly_corridor already registered, overwriting... +[2025-07-06 15:55:46,212][06149] Environment doom_defend_the_center already registered, overwriting... +[2025-07-06 15:55:46,213][06149] Environment doom_defend_the_line already registered, overwriting... +[2025-07-06 15:55:46,215][06149] Environment doom_health_gathering already registered, overwriting... +[2025-07-06 15:55:46,216][06149] Environment doom_health_gathering_supreme already registered, overwriting... +[2025-07-06 15:55:46,216][06149] Environment doom_battle already registered, overwriting... +[2025-07-06 15:55:46,217][06149] Environment doom_battle2 already registered, overwriting... +[2025-07-06 15:55:46,219][06149] Environment doom_duel_bots already registered, overwriting... +[2025-07-06 15:55:46,219][06149] Environment doom_deathmatch_bots already registered, overwriting... +[2025-07-06 15:55:46,220][06149] Environment doom_duel already registered, overwriting... +[2025-07-06 15:55:46,221][06149] Environment doom_deathmatch_full already registered, overwriting... +[2025-07-06 15:55:46,222][06149] Environment doom_benchmark already registered, overwriting... +[2025-07-06 15:55:46,222][06149] register_encoder_factory: +[2025-07-06 15:55:46,240][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:55:46,245][06149] Experiment dir /content/train_dir/default_experiment already exists! +[2025-07-06 15:55:46,245][06149] Resuming existing experiment from /content/train_dir/default_experiment... 
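+The "already registered, overwriting..." lines and the "register_encoder_factory:" line above come from Sample Factory re-running its VizDoom registration step inside a process where those environments were registered once before. A minimal sketch of what that registration step typically looks like with the Sample Factory 2.x API follows; the two factory functions are placeholders, and the exact import paths are assumptions rather than something recorded in this log.
+
+from sample_factory.envs.env_utils import register_env
+from sample_factory.algo.utils.context import global_model_factory
+
+def make_doom_env(full_env_name, cfg=None, env_config=None, render_mode=None):
+    # placeholder: a real factory would construct and wrap the VizDoom env here
+    raise NotImplementedError
+
+def make_doom_encoder(cfg, obs_space):
+    # placeholder: a real factory would return the conv encoder described later in this log
+    raise NotImplementedError
+
+# Re-registering a name that already exists is what produces the
+# "Environment ... already registered, overwriting..." warnings above.
+register_env("doom_health_gathering_supreme", make_doom_env)
+global_model_factory().register_encoder_factory(make_doom_encoder)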
+[2025-07-06 15:55:46,246][06149] Weights and Biases integration disabled +[2025-07-06 15:55:46,249][06149] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2025-07-06 15:55:48,666][06149] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/content/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2025-07-06 15:55:48,667][06149] Saving configuration to /content/train_dir/default_experiment/config.json... 
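+The block above is the fully resolved APPO configuration that was just written back to config.json. The sampling geometry is consistent with the training batch: num_workers (8) x num_envs_per_worker (4) x rollout (32) = 1024 environment transitions per rollout cycle, which matches batch_size=1024. A small sketch for re-reading the saved file, assuming config.json is the flat JSON dump of exactly these keys:
+
+import json
+
+# path taken from the "Saving configuration to ..." line above
+with open("/content/train_dir/default_experiment/config.json") as f:
+    cfg = json.load(f)
+
+print(cfg["env"], cfg["algo"], cfg["learning_rate"])
+samples_per_rollout = cfg["num_workers"] * cfg["num_envs_per_worker"] * cfg["rollout"]
+print(samples_per_rollout, cfg["batch_size"])  # expected: 1024 1024 for the values above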
+[2025-07-06 15:55:48,669][06149] Rollout worker 0 uses device cpu +[2025-07-06 15:55:48,670][06149] Rollout worker 1 uses device cpu +[2025-07-06 15:55:48,671][06149] Rollout worker 2 uses device cpu +[2025-07-06 15:55:48,672][06149] Rollout worker 3 uses device cpu +[2025-07-06 15:55:48,673][06149] Rollout worker 4 uses device cpu +[2025-07-06 15:55:48,675][06149] Rollout worker 5 uses device cpu +[2025-07-06 15:55:48,676][06149] Rollout worker 6 uses device cpu +[2025-07-06 15:55:48,676][06149] Rollout worker 7 uses device cpu +[2025-07-06 15:55:48,771][06149] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:55:48,772][06149] InferenceWorker_p0-w0: min num requests: 2 +[2025-07-06 15:55:48,803][06149] Starting all processes... +[2025-07-06 15:55:48,804][06149] Starting process learner_proc0 +[2025-07-06 15:55:48,857][06149] Starting all processes... +[2025-07-06 15:55:48,863][06149] Starting process inference_proc0-0 +[2025-07-06 15:55:48,863][06149] Starting process rollout_proc0 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc1 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc2 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc3 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc4 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc5 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc6 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc7 +[2025-07-06 15:56:07,015][16021] Worker 0 uses CPU cores [0] +[2025-07-06 15:56:07,034][16023] Worker 2 uses CPU cores [0] +[2025-07-06 15:56:07,227][16028] Worker 7 uses CPU cores [1] +[2025-07-06 15:56:07,562][16025] Worker 4 uses CPU cores [0] +[2025-07-06 15:56:07,582][16027] Worker 6 uses CPU cores [0] +[2025-07-06 15:56:07,671][16026] Worker 5 uses CPU cores [1] +[2025-07-06 15:56:07,815][16024] Worker 3 uses CPU cores [1] +[2025-07-06 15:56:07,889][16022] Worker 1 uses CPU cores [1] +[2025-07-06 15:56:07,911][16020] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:56:07,912][16020] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-07-06 15:56:07,933][16020] Num visible devices: 1 +[2025-07-06 15:56:08,772][06149] Heartbeat connected on InferenceWorker_p0-w0 +[2025-07-06 15:56:08,779][06149] Heartbeat connected on RolloutWorker_w0 +[2025-07-06 15:56:08,782][06149] Heartbeat connected on RolloutWorker_w1 +[2025-07-06 15:56:08,785][06149] Heartbeat connected on RolloutWorker_w2 +[2025-07-06 15:56:08,789][06149] Heartbeat connected on RolloutWorker_w3 +[2025-07-06 15:56:08,792][06149] Heartbeat connected on RolloutWorker_w4 +[2025-07-06 15:56:08,796][06149] Heartbeat connected on RolloutWorker_w5 +[2025-07-06 15:56:08,799][06149] Heartbeat connected on RolloutWorker_w6 +[2025-07-06 15:56:08,803][06149] Heartbeat connected on RolloutWorker_w7 +[2025-07-06 16:05:46,250][06149] Components not started: Batcher_0, LearnerWorker_p0, wait_time=600.0 seconds +[2025-07-06 16:15:46,249][06149] Components not started: Batcher_0, LearnerWorker_p0, wait_time=1200.0 seconds +[2025-07-06 16:16:41,202][06149] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 6149], exiting... +[2025-07-06 16:16:41,204][16026] Stopping RolloutWorker_w5... +[2025-07-06 16:16:41,204][16026] Loop rollout_proc5_evt_loop terminating... +[2025-07-06 16:16:41,206][16024] Stopping RolloutWorker_w3... +[2025-07-06 16:16:41,206][16021] Stopping RolloutWorker_w0... 
+[2025-07-06 16:16:41,206][16024] Loop rollout_proc3_evt_loop terminating... +[2025-07-06 16:16:41,206][16021] Loop rollout_proc0_evt_loop terminating... +[2025-07-06 16:16:41,208][16023] Stopping RolloutWorker_w2... +[2025-07-06 16:16:41,208][16020] Stopping InferenceWorker_p0-w0... +[2025-07-06 16:16:41,208][16023] Loop rollout_proc2_evt_loop terminating... +[2025-07-06 16:16:41,209][16020] Loop inference_proc0-0_evt_loop terminating... +[2025-07-06 16:16:41,205][16027] Stopping RolloutWorker_w6... +[2025-07-06 16:16:41,211][16027] Loop rollout_proc6_evt_loop terminating... +[2025-07-06 16:16:41,205][06149] Runner profile tree view: +main_loop: 1252.4020 +[2025-07-06 16:16:41,214][16022] Stopping RolloutWorker_w1... +[2025-07-06 16:16:41,214][06149] Collected {}, FPS: 0.0 +[2025-07-06 16:16:41,215][16025] Stopping RolloutWorker_w4... +[2025-07-06 16:16:41,221][16025] Loop rollout_proc4_evt_loop terminating... +[2025-07-06 16:16:41,221][16022] Loop rollout_proc1_evt_loop terminating... +[2025-07-06 16:16:41,220][16028] Stopping RolloutWorker_w7... +[2025-07-06 16:16:41,235][16028] Loop rollout_proc7_evt_loop terminating... +[2025-07-06 16:21:33,573][21969] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-07-06 16:21:33,577][21969] Rollout worker 0 uses device cpu +[2025-07-06 16:21:33,578][21969] Rollout worker 1 uses device cpu +[2025-07-06 16:21:33,579][21969] Rollout worker 2 uses device cpu +[2025-07-06 16:21:33,580][21969] Rollout worker 3 uses device cpu +[2025-07-06 16:21:33,581][21969] Rollout worker 4 uses device cpu +[2025-07-06 16:21:33,582][21969] Rollout worker 5 uses device cpu +[2025-07-06 16:21:33,583][21969] Rollout worker 6 uses device cpu +[2025-07-06 16:21:33,583][21969] Rollout worker 7 uses device cpu +[2025-07-06 16:21:33,694][21969] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:33,695][21969] InferenceWorker_p0-w0: min num requests: 2 +[2025-07-06 16:21:33,723][21969] Starting all processes... +[2025-07-06 16:21:33,724][21969] Starting process learner_proc0 +[2025-07-06 16:21:33,775][21969] Starting all processes... 
+[2025-07-06 16:21:33,781][21969] Starting process inference_proc0-0 +[2025-07-06 16:21:33,781][21969] Starting process rollout_proc0 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc1 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc2 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc3 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc4 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc5 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc6 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc7 +[2025-07-06 16:21:50,796][22699] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:50,801][22699] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-07-06 16:21:50,853][22699] Num visible devices: 1 +[2025-07-06 16:21:50,871][22699] Starting seed is not provided +[2025-07-06 16:21:50,872][22699] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:50,872][22699] Initializing actor-critic model on device cuda:0 +[2025-07-06 16:21:50,873][22699] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 16:21:50,875][22699] RunningMeanStd input shape: (1,) +[2025-07-06 16:21:50,967][22699] ConvEncoder: input_channels=3 +[2025-07-06 16:21:51,051][22717] Worker 5 uses CPU cores [1] +[2025-07-06 16:21:51,205][22716] Worker 3 uses CPU cores [1] +[2025-07-06 16:21:51,207][22720] Worker 6 uses CPU cores [0] +[2025-07-06 16:21:51,271][22712] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:51,271][22712] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-07-06 16:21:51,338][22713] Worker 0 uses CPU cores [0] +[2025-07-06 16:21:51,345][22719] Worker 4 uses CPU cores [0] +[2025-07-06 16:21:51,350][22712] Num visible devices: 1 +[2025-07-06 16:21:51,366][22714] Worker 1 uses CPU cores [1] +[2025-07-06 16:21:51,420][22715] Worker 2 uses CPU cores [0] +[2025-07-06 16:21:51,442][22718] Worker 7 uses CPU cores [1] +[2025-07-06 16:21:51,459][22699] Conv encoder output size: 512 +[2025-07-06 16:21:51,459][22699] Policy head output size: 512 +[2025-07-06 16:21:51,474][22699] Created Actor Critic model with architecture: +[2025-07-06 16:21:51,474][22699] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, 
out_features=5, bias=True) + ) +) +[2025-07-06 16:21:51,739][22699] Using optimizer +[2025-07-06 16:21:52,650][22699] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:21:52,652][22699] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 16:21:52,653][22699] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:21:52,654][22699] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. 
+ +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 16:21:52,655][22699] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:21:52,656][22699] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 16:21:52,657][22699] Did not load from checkpoint, starting from scratch! +[2025-07-06 16:21:52,657][22699] Initialized policy 0 weights for model version 0 +[2025-07-06 16:21:52,663][22699] LearnerWorker_p0 finished initialization! +[2025-07-06 16:21:52,663][22699] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:52,891][22712] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 16:21:52,892][22712] RunningMeanStd input shape: (1,) +[2025-07-06 16:21:52,905][22712] ConvEncoder: input_channels=3 +[2025-07-06 16:21:53,008][22712] Conv encoder output size: 512 +[2025-07-06 16:21:53,009][22712] Policy head output size: 512 +[2025-07-06 16:21:53,045][21969] Inference worker 0-0 is ready! +[2025-07-06 16:21:53,046][21969] All inference workers are ready! Signal rollout workers to start! 
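+The three failed attempts above (and the identical ones in the 15:55 run) are PyTorch 2.6 rejecting the checkpoint under its new weights_only=True default because the pickle references numpy.core.multiarray.scalar, so the learner gives up and re-initializes the policy from scratch. The error text itself names the two workarounds; a minimal sketch of both, assuming the checkpoint file is trusted and using the path from the log:
+
+import numpy as np
+import torch
+
+ckpt = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
+
+# Option 1: keep weights_only=True but allowlist the offending numpy global
+# (on NumPy 2.x the np.core alias may emit a deprecation warning but still resolves).
+torch.serialization.add_safe_globals([np.core.multiarray.scalar])
+state = torch.load(ckpt, map_location="cpu")
+
+# Option 2: trust the file and disable the weights-only unpickler for this call.
+state = torch.load(ckpt, map_location="cpu", weights_only=False)
+
+Because add_safe_globals registers the global process-wide, calling it once near the top of the training or evaluation script should be enough for the unmodified torch.load(latest_checkpoint, map_location=device) call inside learner.py to succeed on the next restart.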
+[2025-07-06 16:21:53,289][22720] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,307][22719] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,309][22715] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,305][22713] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,325][22716] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,328][22714] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,324][22717] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,326][22718] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,687][21969] Heartbeat connected on Batcher_0 +[2025-07-06 16:21:53,690][21969] Heartbeat connected on LearnerWorker_p0 +[2025-07-06 16:21:53,734][21969] Heartbeat connected on InferenceWorker_p0-w0 +[2025-07-06 16:21:54,230][21969] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 16:21:55,455][22720] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,466][22715] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,470][22713] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,478][22719] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,563][22717] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,545][22718] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,556][22716] Decorrelating experience for 0 frames... +[2025-07-06 16:21:56,783][22715] Decorrelating experience for 32 frames... +[2025-07-06 16:21:56,791][22713] Decorrelating experience for 32 frames... +[2025-07-06 16:21:56,927][22714] Decorrelating experience for 0 frames... +[2025-07-06 16:21:56,967][22717] Decorrelating experience for 32 frames... +[2025-07-06 16:21:57,025][22720] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,821][22714] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,825][22718] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,949][22719] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,988][22716] Decorrelating experience for 32 frames... +[2025-07-06 16:21:59,233][21969] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 16:21:59,355][22713] Decorrelating experience for 64 frames... +[2025-07-06 16:21:59,392][22715] Decorrelating experience for 64 frames... +[2025-07-06 16:21:59,591][22717] Decorrelating experience for 64 frames... +[2025-07-06 16:21:59,646][22720] Decorrelating experience for 64 frames... +[2025-07-06 16:22:00,464][22714] Decorrelating experience for 64 frames... +[2025-07-06 16:22:00,551][22717] Decorrelating experience for 96 frames... +[2025-07-06 16:22:00,749][21969] Heartbeat connected on RolloutWorker_w5 +[2025-07-06 16:22:00,748][22713] Decorrelating experience for 96 frames... +[2025-07-06 16:22:00,789][22715] Decorrelating experience for 96 frames... +[2025-07-06 16:22:00,889][22719] Decorrelating experience for 64 frames... +[2025-07-06 16:22:01,063][21969] Heartbeat connected on RolloutWorker_w0 +[2025-07-06 16:22:01,157][21969] Heartbeat connected on RolloutWorker_w2 +[2025-07-06 16:22:01,847][22720] Decorrelating experience for 96 frames... 
+[2025-07-06 16:22:02,089][21969] Heartbeat connected on RolloutWorker_w6 +[2025-07-06 16:22:02,614][22716] Decorrelating experience for 64 frames... +[2025-07-06 16:22:02,616][22714] Decorrelating experience for 96 frames... +[2025-07-06 16:22:02,906][21969] Heartbeat connected on RolloutWorker_w1 +[2025-07-06 16:22:02,934][22718] Decorrelating experience for 64 frames... +[2025-07-06 16:22:04,230][21969] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 31.6. Samples: 316. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 16:22:04,232][21969] Avg episode reward: [(0, '1.913')] +[2025-07-06 16:22:05,545][22699] Signal inference workers to stop experience collection... +[2025-07-06 16:22:05,568][22712] InferenceWorker_p0-w0: stopping experience collection +[2025-07-06 16:22:05,643][22716] Decorrelating experience for 96 frames... +[2025-07-06 16:22:05,983][21969] Heartbeat connected on RolloutWorker_w3 +[2025-07-06 16:22:06,005][22718] Decorrelating experience for 96 frames... +[2025-07-06 16:22:06,102][21969] Heartbeat connected on RolloutWorker_w7 +[2025-07-06 16:22:06,126][22719] Decorrelating experience for 96 frames... +[2025-07-06 16:22:06,208][21969] Heartbeat connected on RolloutWorker_w4 +[2025-07-06 16:22:07,356][22699] Signal inference workers to resume experience collection... +[2025-07-06 16:22:07,361][22712] InferenceWorker_p0-w0: resuming experience collection +[2025-07-06 16:22:09,234][21969] Fps is (10 sec: 1228.7, 60 sec: 819.0, 300 sec: 819.0). Total num frames: 12288. Throughput: 0: 198.5. Samples: 2978. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-07-06 16:22:09,235][21969] Avg episode reward: [(0, '3.024')] +[2025-07-06 16:22:14,230][21969] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 335.0. Samples: 6700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0) +[2025-07-06 16:22:14,232][21969] Avg episode reward: [(0, '3.465')] +[2025-07-06 16:22:18,313][22712] Updated weights for policy 0, policy_version 10 (0.0117) +[2025-07-06 16:22:19,232][21969] Fps is (10 sec: 3277.4, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 45056. Throughput: 0: 378.8. Samples: 9470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:22:19,235][21969] Avg episode reward: [(0, '4.111')] +[2025-07-06 16:22:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 535.7. Samples: 16072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:22:24,234][21969] Avg episode reward: [(0, '4.343')] +[2025-07-06 16:22:29,230][21969] Fps is (10 sec: 3277.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 597.8. Samples: 20924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:29,233][21969] Avg episode reward: [(0, '4.274')] +[2025-07-06 16:22:29,569][22712] Updated weights for policy 0, policy_version 20 (0.0014) +[2025-07-06 16:22:34,231][21969] Fps is (10 sec: 3686.3, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 597.8. Samples: 23912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:22:34,234][21969] Avg episode reward: [(0, '4.435')] +[2025-07-06 16:22:34,237][22699] Saving new best policy, reward=4.435! +[2025-07-06 16:22:38,598][22712] Updated weights for policy 0, policy_version 30 (0.0026) +[2025-07-06 16:22:39,230][21969] Fps is (10 sec: 4505.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 687.6. 
Samples: 30944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2025-07-06 16:22:39,232][21969] Avg episode reward: [(0, '4.584')] +[2025-07-06 16:22:39,237][22699] Saving new best policy, reward=4.584! +[2025-07-06 16:22:44,230][21969] Fps is (10 sec: 3686.5, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 794.5. Samples: 35748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:44,235][21969] Avg episode reward: [(0, '4.604')] +[2025-07-06 16:22:44,243][22699] Saving new best policy, reward=4.604! +[2025-07-06 16:22:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 852.4. Samples: 38676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:22:49,232][21969] Avg episode reward: [(0, '4.618')] +[2025-07-06 16:22:49,239][22699] Saving new best policy, reward=4.618! +[2025-07-06 16:22:49,837][22712] Updated weights for policy 0, policy_version 40 (0.0020) +[2025-07-06 16:22:54,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 943.4. Samples: 45426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:54,234][21969] Avg episode reward: [(0, '4.411')] +[2025-07-06 16:22:59,231][21969] Fps is (10 sec: 3686.3, 60 sec: 3277.0, 300 sec: 3024.7). Total num frames: 196608. Throughput: 0: 978.1. Samples: 50714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:59,232][21969] Avg episode reward: [(0, '4.403')] +[2025-07-06 16:23:00,467][22712] Updated weights for policy 0, policy_version 50 (0.0022) +[2025-07-06 16:23:04,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3101.3). Total num frames: 217088. Throughput: 0: 978.0. Samples: 53476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:23:04,232][21969] Avg episode reward: [(0, '4.341')] +[2025-07-06 16:23:09,230][21969] Fps is (10 sec: 4505.7, 60 sec: 3823.2, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 986.2. Samples: 60452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:09,232][21969] Avg episode reward: [(0, '4.468')] +[2025-07-06 16:23:09,427][22712] Updated weights for policy 0, policy_version 60 (0.0016) +[2025-07-06 16:23:14,231][21969] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3225.6). Total num frames: 258048. Throughput: 0: 990.5. Samples: 65498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:23:14,234][21969] Avg episode reward: [(0, '4.520')] +[2025-07-06 16:23:19,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 990.3. Samples: 68476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:23:19,232][21969] Avg episode reward: [(0, '4.381')] +[2025-07-06 16:23:20,233][22712] Updated weights for policy 0, policy_version 70 (0.0032) +[2025-07-06 16:23:24,230][21969] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 983.7. Samples: 75212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:23:24,234][21969] Avg episode reward: [(0, '4.365')] +[2025-07-06 16:23:29,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 997.1. Samples: 80618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:23:29,232][21969] Avg episode reward: [(0, '4.453')] +[2025-07-06 16:23:29,239][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth... 
+[2025-07-06 16:23:29,388][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth +[2025-07-06 16:23:31,223][22712] Updated weights for policy 0, policy_version 80 (0.0043) +[2025-07-06 16:23:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 993.7. Samples: 83394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:34,234][21969] Avg episode reward: [(0, '4.544')] +[2025-07-06 16:23:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3432.8). Total num frames: 360448. Throughput: 0: 1000.9. Samples: 90466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:23:39,232][21969] Avg episode reward: [(0, '4.453')] +[2025-07-06 16:23:40,268][22712] Updated weights for policy 0, policy_version 90 (0.0028) +[2025-07-06 16:23:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3425.7). Total num frames: 376832. Throughput: 0: 998.0. Samples: 95622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:23:44,239][21969] Avg episode reward: [(0, '4.570')] +[2025-07-06 16:23:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 1001.3. Samples: 98534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:23:49,234][21969] Avg episode reward: [(0, '4.640')] +[2025-07-06 16:23:49,259][22699] Saving new best policy, reward=4.640! +[2025-07-06 16:23:50,974][22712] Updated weights for policy 0, policy_version 100 (0.0026) +[2025-07-06 16:23:54,231][21969] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3515.7). Total num frames: 421888. Throughput: 0: 1003.7. Samples: 105620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:54,232][21969] Avg episode reward: [(0, '4.655')] +[2025-07-06 16:23:54,236][22699] Saving new best policy, reward=4.655! +[2025-07-06 16:23:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3506.2). Total num frames: 438272. Throughput: 0: 1009.6. Samples: 110928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:59,234][21969] Avg episode reward: [(0, '4.625')] +[2025-07-06 16:24:01,367][22712] Updated weights for policy 0, policy_version 110 (0.0022) +[2025-07-06 16:24:04,230][21969] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3560.4). Total num frames: 462848. Throughput: 0: 1011.5. Samples: 113994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:04,235][21969] Avg episode reward: [(0, '4.480')] +[2025-07-06 16:24:09,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 1014.0. Samples: 120842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:24:09,235][21969] Avg episode reward: [(0, '4.422')] +[2025-07-06 16:24:10,534][22712] Updated weights for policy 0, policy_version 120 (0.0013) +[2025-07-06 16:24:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3569.4). Total num frames: 499712. Throughput: 0: 1010.8. Samples: 126106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:24:14,235][21969] Avg episode reward: [(0, '4.435')] +[2025-07-06 16:24:19,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3615.8). Total num frames: 524288. Throughput: 0: 1018.8. Samples: 129238. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:24:19,235][21969] Avg episode reward: [(0, '4.612')] +[2025-07-06 16:24:20,818][22712] Updated weights for policy 0, policy_version 130 (0.0017) +[2025-07-06 16:24:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3631.8). Total num frames: 544768. Throughput: 0: 1018.7. Samples: 136308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:24,232][21969] Avg episode reward: [(0, '4.709')] +[2025-07-06 16:24:24,233][22699] Saving new best policy, reward=4.709! +[2025-07-06 16:24:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3620.3). Total num frames: 561152. Throughput: 0: 1014.5. Samples: 141276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:29,239][21969] Avg episode reward: [(0, '4.543')] +[2025-07-06 16:24:31,723][22712] Updated weights for policy 0, policy_version 140 (0.0023) +[2025-07-06 16:24:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3635.2). Total num frames: 581632. Throughput: 0: 1020.1. Samples: 144440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:24:34,232][21969] Avg episode reward: [(0, '4.369')] +[2025-07-06 16:24:39,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3674.0). Total num frames: 606208. Throughput: 0: 1020.7. Samples: 151550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:24:39,233][21969] Avg episode reward: [(0, '4.506')] +[2025-07-06 16:24:41,080][22712] Updated weights for policy 0, policy_version 150 (0.0041) +[2025-07-06 16:24:44,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3662.3). Total num frames: 622592. Throughput: 0: 1006.9. Samples: 156240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:24:44,234][21969] Avg episode reward: [(0, '4.526')] +[2025-07-06 16:24:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3674.7). Total num frames: 643072. Throughput: 0: 1015.1. Samples: 159672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:49,232][21969] Avg episode reward: [(0, '4.634')] +[2025-07-06 16:24:51,141][22712] Updated weights for policy 0, policy_version 160 (0.0014) +[2025-07-06 16:24:54,231][21969] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 3709.1). Total num frames: 667648. Throughput: 0: 1013.6. Samples: 166454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:24:54,234][21969] Avg episode reward: [(0, '4.468')] +[2025-07-06 16:24:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3697.5). Total num frames: 684032. Throughput: 0: 1009.7. Samples: 171544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:24:59,232][21969] Avg episode reward: [(0, '4.582')] +[2025-07-06 16:25:02,016][22712] Updated weights for policy 0, policy_version 170 (0.0019) +[2025-07-06 16:25:04,230][21969] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 3708.0). Total num frames: 704512. Throughput: 0: 1009.8. Samples: 174678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:04,235][21969] Avg episode reward: [(0, '4.613')] +[2025-07-06 16:25:09,231][21969] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3717.9). Total num frames: 724992. Throughput: 0: 1011.0. Samples: 181806. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:25:09,234][21969] Avg episode reward: [(0, '4.572')] +[2025-07-06 16:25:12,213][22712] Updated weights for policy 0, policy_version 180 (0.0012) +[2025-07-06 16:25:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3706.9). Total num frames: 741376. Throughput: 0: 1002.7. Samples: 186398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:14,231][21969] Avg episode reward: [(0, '4.752')] +[2025-07-06 16:25:14,235][22699] Saving new best policy, reward=4.752! +[2025-07-06 16:25:19,230][21969] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 3736.4). Total num frames: 765952. Throughput: 0: 1009.0. Samples: 189846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:19,232][21969] Avg episode reward: [(0, '4.875')] +[2025-07-06 16:25:19,239][22699] Saving new best policy, reward=4.875! +[2025-07-06 16:25:21,602][22712] Updated weights for policy 0, policy_version 190 (0.0016) +[2025-07-06 16:25:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 1000.9. Samples: 196592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:24,232][21969] Avg episode reward: [(0, '5.008')] +[2025-07-06 16:25:24,235][22699] Saving new best policy, reward=5.008! +[2025-07-06 16:25:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3734.0). Total num frames: 802816. Throughput: 0: 1004.2. Samples: 201428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:25:29,232][21969] Avg episode reward: [(0, '5.237')] +[2025-07-06 16:25:29,242][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth... +[2025-07-06 16:25:29,359][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth +[2025-07-06 16:25:29,368][22699] Saving new best policy, reward=5.237! +[2025-07-06 16:25:32,779][22712] Updated weights for policy 0, policy_version 200 (0.0028) +[2025-07-06 16:25:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3742.3). Total num frames: 823296. Throughput: 0: 1000.7. Samples: 204702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:34,232][21969] Avg episode reward: [(0, '5.120')] +[2025-07-06 16:25:39,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3768.3). Total num frames: 847872. Throughput: 0: 1005.6. Samples: 211704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:39,235][21969] Avg episode reward: [(0, '5.135')] +[2025-07-06 16:25:43,366][22712] Updated weights for policy 0, policy_version 210 (0.0031) +[2025-07-06 16:25:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3739.8). Total num frames: 860160. Throughput: 0: 997.7. Samples: 216442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:44,232][21969] Avg episode reward: [(0, '5.029')] +[2025-07-06 16:25:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3764.8). Total num frames: 884736. Throughput: 0: 1001.8. Samples: 219758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:25:49,232][21969] Avg episode reward: [(0, '4.779')] +[2025-07-06 16:25:52,388][22712] Updated weights for policy 0, policy_version 220 (0.0025) +[2025-07-06 16:25:54,231][21969] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3771.7). Total num frames: 905216. Throughput: 0: 992.9. Samples: 226484. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:25:54,232][21969] Avg episode reward: [(0, '4.928')] +[2025-07-06 16:25:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3761.6). Total num frames: 921600. Throughput: 0: 1000.8. Samples: 231434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:59,234][21969] Avg episode reward: [(0, '4.723')] +[2025-07-06 16:26:03,443][22712] Updated weights for policy 0, policy_version 230 (0.0023) +[2025-07-06 16:26:04,230][21969] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3768.3). Total num frames: 942080. Throughput: 0: 1001.4. Samples: 234908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:26:04,232][21969] Avg episode reward: [(0, '4.758')] +[2025-07-06 16:26:09,230][21969] Fps is (10 sec: 4505.5, 60 sec: 4027.8, 300 sec: 3790.8). Total num frames: 966656. Throughput: 0: 996.4. Samples: 241430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:09,234][21969] Avg episode reward: [(0, '5.185')] +[2025-07-06 16:26:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3765.2). Total num frames: 978944. Throughput: 0: 991.0. Samples: 246022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:14,232][21969] Avg episode reward: [(0, '5.207')] +[2025-07-06 16:26:14,498][22712] Updated weights for policy 0, policy_version 240 (0.0035) +[2025-07-06 16:26:19,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3786.9). Total num frames: 1003520. Throughput: 0: 996.1. Samples: 249526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-07-06 16:26:19,235][21969] Avg episode reward: [(0, '5.271')] +[2025-07-06 16:26:19,243][22699] Saving new best policy, reward=5.271! +[2025-07-06 16:26:23,374][22712] Updated weights for policy 0, policy_version 250 (0.0020) +[2025-07-06 16:26:24,232][21969] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3792.6). Total num frames: 1024000. Throughput: 0: 994.3. Samples: 256448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:26:24,234][21969] Avg episode reward: [(0, '5.585')] +[2025-07-06 16:26:24,243][22699] Saving new best policy, reward=5.585! +[2025-07-06 16:26:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3783.2). Total num frames: 1040384. Throughput: 0: 990.6. Samples: 261018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:26:29,235][21969] Avg episode reward: [(0, '5.720')] +[2025-07-06 16:26:29,242][22699] Saving new best policy, reward=5.720! +[2025-07-06 16:26:34,230][21969] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3788.8). Total num frames: 1060864. Throughput: 0: 995.1. Samples: 264536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-07-06 16:26:34,234][21969] Avg episode reward: [(0, '5.829')] +[2025-07-06 16:26:34,237][22699] Saving new best policy, reward=5.829! +[2025-07-06 16:26:34,501][22712] Updated weights for policy 0, policy_version 260 (0.0024) +[2025-07-06 16:26:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3794.2). Total num frames: 1081344. Throughput: 0: 991.0. Samples: 271078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:26:39,232][21969] Avg episode reward: [(0, '5.731')] +[2025-07-06 16:26:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3785.3). Total num frames: 1097728. Throughput: 0: 982.5. Samples: 275648. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:26:44,232][21969] Avg episode reward: [(0, '6.122')] +[2025-07-06 16:26:44,235][22699] Saving new best policy, reward=6.122! +[2025-07-06 16:26:45,275][22712] Updated weights for policy 0, policy_version 270 (0.0017) +[2025-07-06 16:26:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 981.6. Samples: 279078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:26:49,235][21969] Avg episode reward: [(0, '6.485')] +[2025-07-06 16:26:49,241][22699] Saving new best policy, reward=6.485! +[2025-07-06 16:26:54,232][21969] Fps is (10 sec: 4504.8, 60 sec: 3959.4, 300 sec: 3873.9). Total num frames: 1142784. Throughput: 0: 988.8. Samples: 285926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:54,233][21969] Avg episode reward: [(0, '6.685')] +[2025-07-06 16:26:54,236][22699] Saving new best policy, reward=6.685! +[2025-07-06 16:26:55,374][22712] Updated weights for policy 0, policy_version 280 (0.0025) +[2025-07-06 16:26:59,231][21969] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 1159168. Throughput: 0: 991.4. Samples: 290636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:59,236][21969] Avg episode reward: [(0, '6.580')] +[2025-07-06 16:27:04,231][21969] Fps is (10 sec: 3686.7, 60 sec: 3959.4, 300 sec: 3957.2). Total num frames: 1179648. Throughput: 0: 991.1. Samples: 294126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:27:04,233][21969] Avg episode reward: [(0, '6.445')] +[2025-07-06 16:27:05,458][22712] Updated weights for policy 0, policy_version 290 (0.0012) +[2025-07-06 16:27:09,234][21969] Fps is (10 sec: 4094.5, 60 sec: 3891.0, 300 sec: 3998.8). Total num frames: 1200128. Throughput: 0: 981.0. Samples: 300596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:27:09,240][21969] Avg episode reward: [(0, '7.028')] +[2025-07-06 16:27:09,249][22699] Saving new best policy, reward=7.028! +[2025-07-06 16:27:14,230][21969] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 1216512. Throughput: 0: 989.1. Samples: 305528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:27:14,234][21969] Avg episode reward: [(0, '7.067')] +[2025-07-06 16:27:14,290][22699] Saving new best policy, reward=7.067! +[2025-07-06 16:27:16,284][22712] Updated weights for policy 0, policy_version 300 (0.0037) +[2025-07-06 16:27:19,230][21969] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1241088. Throughput: 0: 983.1. Samples: 308776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:19,232][21969] Avg episode reward: [(0, '8.048')] +[2025-07-06 16:27:19,238][22699] Saving new best policy, reward=8.048! +[2025-07-06 16:27:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 4012.7). Total num frames: 1261568. Throughput: 0: 984.2. Samples: 315368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:24,232][21969] Avg episode reward: [(0, '8.172')] +[2025-07-06 16:27:24,235][22699] Saving new best policy, reward=8.172! +[2025-07-06 16:27:27,001][22712] Updated weights for policy 0, policy_version 310 (0.0020) +[2025-07-06 16:27:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1277952. Throughput: 0: 988.9. Samples: 320150. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:29,232][21969] Avg episode reward: [(0, '8.521')] +[2025-07-06 16:27:29,244][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth... +[2025-07-06 16:27:29,358][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth +[2025-07-06 16:27:29,377][22699] Saving new best policy, reward=8.521! +[2025-07-06 16:27:34,234][21969] Fps is (10 sec: 4094.7, 60 sec: 4027.5, 300 sec: 3998.8). Total num frames: 1302528. Throughput: 0: 990.1. Samples: 323634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:34,235][21969] Avg episode reward: [(0, '8.098')] +[2025-07-06 16:27:36,208][22712] Updated weights for policy 0, policy_version 320 (0.0012) +[2025-07-06 16:27:39,234][21969] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3998.8). Total num frames: 1318912. Throughput: 0: 983.5. Samples: 330186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:39,236][21969] Avg episode reward: [(0, '7.891')] +[2025-07-06 16:27:44,230][21969] Fps is (10 sec: 3277.9, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1335296. Throughput: 0: 993.8. Samples: 335358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:44,234][21969] Avg episode reward: [(0, '8.292')] +[2025-07-06 16:27:47,142][22712] Updated weights for policy 0, policy_version 330 (0.0023) +[2025-07-06 16:27:49,230][21969] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1359872. Throughput: 0: 987.0. Samples: 338540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:49,234][21969] Avg episode reward: [(0, '8.530')] +[2025-07-06 16:27:49,241][22699] Saving new best policy, reward=8.530! +[2025-07-06 16:27:54,230][21969] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 4012.7). Total num frames: 1380352. Throughput: 0: 990.3. Samples: 345156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:54,232][21969] Avg episode reward: [(0, '8.611')] +[2025-07-06 16:27:54,234][22699] Saving new best policy, reward=8.611! +[2025-07-06 16:27:57,926][22712] Updated weights for policy 0, policy_version 340 (0.0030) +[2025-07-06 16:27:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1396736. Throughput: 0: 988.6. Samples: 350014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:27:59,232][21969] Avg episode reward: [(0, '9.139')] +[2025-07-06 16:27:59,237][22699] Saving new best policy, reward=9.139! +[2025-07-06 16:28:04,230][21969] Fps is (10 sec: 4095.9, 60 sec: 4027.8, 300 sec: 3998.8). Total num frames: 1421312. Throughput: 0: 996.3. Samples: 353608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:28:04,232][21969] Avg episode reward: [(0, '9.720')] +[2025-07-06 16:28:04,242][22699] Saving new best policy, reward=9.720! +[2025-07-06 16:28:07,019][22712] Updated weights for policy 0, policy_version 350 (0.0019) +[2025-07-06 16:28:09,234][21969] Fps is (10 sec: 4094.5, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1437696. Throughput: 0: 991.2. Samples: 359974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:28:09,235][21969] Avg episode reward: [(0, '10.337')] +[2025-07-06 16:28:09,240][22699] Saving new best policy, reward=10.337! +[2025-07-06 16:28:14,230][21969] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1454080. Throughput: 0: 999.1. Samples: 365110. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:28:14,231][21969] Avg episode reward: [(0, '10.160')] +[2025-07-06 16:28:17,502][22712] Updated weights for policy 0, policy_version 360 (0.0029) +[2025-07-06 16:28:19,230][21969] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1482752. Throughput: 0: 1001.5. Samples: 368696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:28:19,232][21969] Avg episode reward: [(0, '9.732')] +[2025-07-06 16:28:24,230][21969] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1503232. Throughput: 0: 1004.7. Samples: 375394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:28:24,236][21969] Avg episode reward: [(0, '10.091')] +[2025-07-06 16:28:27,902][22712] Updated weights for policy 0, policy_version 370 (0.0021) +[2025-07-06 16:28:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1519616. Throughput: 0: 1009.8. Samples: 380800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:28:29,232][21969] Avg episode reward: [(0, '10.192')] +[2025-07-06 16:28:34,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4012.7). Total num frames: 1544192. Throughput: 0: 1018.4. Samples: 384370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:28:34,232][21969] Avg episode reward: [(0, '11.170')] +[2025-07-06 16:28:34,234][22699] Saving new best policy, reward=11.170! +[2025-07-06 16:28:36,568][22712] Updated weights for policy 0, policy_version 380 (0.0028) +[2025-07-06 16:28:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 4012.7). Total num frames: 1560576. Throughput: 0: 1016.4. Samples: 390892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:28:39,235][21969] Avg episode reward: [(0, '11.379')] +[2025-07-06 16:28:39,243][22699] Saving new best policy, reward=11.379! +[2025-07-06 16:28:44,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1581056. Throughput: 0: 1032.2. Samples: 396462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:28:44,232][21969] Avg episode reward: [(0, '11.825')] +[2025-07-06 16:28:44,233][22699] Saving new best policy, reward=11.825! +[2025-07-06 16:28:47,108][22712] Updated weights for policy 0, policy_version 390 (0.0033) +[2025-07-06 16:28:49,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1605632. Throughput: 0: 1029.8. Samples: 399950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:28:49,232][21969] Avg episode reward: [(0, '11.361')] +[2025-07-06 16:28:54,238][21969] Fps is (10 sec: 4502.1, 60 sec: 4095.5, 300 sec: 4026.5). Total num frames: 1626112. Throughput: 0: 1028.4. Samples: 406254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:28:54,242][21969] Avg episode reward: [(0, '12.089')] +[2025-07-06 16:28:54,244][22699] Saving new best policy, reward=12.089! +[2025-07-06 16:28:57,499][22712] Updated weights for policy 0, policy_version 400 (0.0016) +[2025-07-06 16:28:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1642496. Throughput: 0: 1044.8. Samples: 412126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:28:59,235][21969] Avg episode reward: [(0, '13.213')] +[2025-07-06 16:28:59,242][22699] Saving new best policy, reward=13.213! +[2025-07-06 16:29:04,230][21969] Fps is (10 sec: 4099.2, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1667072. 
Throughput: 0: 1043.4. Samples: 415650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:04,235][21969] Avg episode reward: [(0, '12.562')] +[2025-07-06 16:29:06,488][22712] Updated weights for policy 0, policy_version 410 (0.0029) +[2025-07-06 16:29:09,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.3, 300 sec: 4012.7). Total num frames: 1683456. Throughput: 0: 1026.5. Samples: 421588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:09,237][21969] Avg episode reward: [(0, '12.581')] +[2025-07-06 16:29:14,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4012.7). Total num frames: 1708032. Throughput: 0: 1036.4. Samples: 427438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:14,235][21969] Avg episode reward: [(0, '12.854')] +[2025-07-06 16:29:16,776][22712] Updated weights for policy 0, policy_version 420 (0.0019) +[2025-07-06 16:29:19,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1728512. Throughput: 0: 1035.8. Samples: 430980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:19,234][21969] Avg episode reward: [(0, '11.992')] +[2025-07-06 16:29:24,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1744896. Throughput: 0: 1017.7. Samples: 436690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:24,232][21969] Avg episode reward: [(0, '13.642')] +[2025-07-06 16:29:24,237][22699] Saving new best policy, reward=13.642! +[2025-07-06 16:29:27,616][22712] Updated weights for policy 0, policy_version 430 (0.0020) +[2025-07-06 16:29:29,232][21969] Fps is (10 sec: 3685.7, 60 sec: 4095.9, 300 sec: 4012.7). Total num frames: 1765376. Throughput: 0: 1027.9. Samples: 442718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:29,236][21969] Avg episode reward: [(0, '14.008')] +[2025-07-06 16:29:29,249][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth... +[2025-07-06 16:29:29,380][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth +[2025-07-06 16:29:29,396][22699] Saving new best policy, reward=14.008! +[2025-07-06 16:29:34,230][21969] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1789952. Throughput: 0: 1018.3. Samples: 445772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:34,232][21969] Avg episode reward: [(0, '15.457')] +[2025-07-06 16:29:34,236][22699] Saving new best policy, reward=15.457! +[2025-07-06 16:29:37,270][22712] Updated weights for policy 0, policy_version 440 (0.0029) +[2025-07-06 16:29:39,231][21969] Fps is (10 sec: 4096.5, 60 sec: 4095.9, 300 sec: 4012.7). Total num frames: 1806336. Throughput: 0: 1008.7. Samples: 451640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:29:39,232][21969] Avg episode reward: [(0, '16.413')] +[2025-07-06 16:29:39,239][22699] Saving new best policy, reward=16.413! +[2025-07-06 16:29:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1826816. Throughput: 0: 1006.6. Samples: 457422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:29:44,235][21969] Avg episode reward: [(0, '15.265')] +[2025-07-06 16:29:47,351][22712] Updated weights for policy 0, policy_version 450 (0.0036) +[2025-07-06 16:29:49,230][21969] Fps is (10 sec: 4505.9, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1851392. Throughput: 0: 1008.4. Samples: 461030. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:49,235][21969] Avg episode reward: [(0, '15.870')] +[2025-07-06 16:29:54,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4028.3, 300 sec: 4012.7). Total num frames: 1867776. Throughput: 0: 1002.9. Samples: 466718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:29:54,233][21969] Avg episode reward: [(0, '15.342')] +[2025-07-06 16:29:57,803][22712] Updated weights for policy 0, policy_version 460 (0.0019) +[2025-07-06 16:29:59,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1888256. Throughput: 0: 1013.5. Samples: 473046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:59,234][21969] Avg episode reward: [(0, '15.721')] +[2025-07-06 16:30:04,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1912832. Throughput: 0: 1014.7. Samples: 476640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:04,232][21969] Avg episode reward: [(0, '17.046')] +[2025-07-06 16:30:04,240][22699] Saving new best policy, reward=17.046! +[2025-07-06 16:30:07,759][22712] Updated weights for policy 0, policy_version 470 (0.0016) +[2025-07-06 16:30:09,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1925120. Throughput: 0: 1008.6. Samples: 482078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:30:09,234][21969] Avg episode reward: [(0, '15.981')] +[2025-07-06 16:30:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1949696. Throughput: 0: 1020.6. Samples: 488642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:14,235][21969] Avg episode reward: [(0, '15.943')] +[2025-07-06 16:30:16,943][22712] Updated weights for policy 0, policy_version 480 (0.0025) +[2025-07-06 16:30:19,230][21969] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1974272. Throughput: 0: 1032.6. Samples: 492238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:30:19,234][21969] Avg episode reward: [(0, '14.844')] +[2025-07-06 16:30:24,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1990656. Throughput: 0: 1024.4. Samples: 497738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:30:24,232][21969] Avg episode reward: [(0, '14.609')] +[2025-07-06 16:30:27,250][22712] Updated weights for policy 0, policy_version 490 (0.0012) +[2025-07-06 16:30:29,231][21969] Fps is (10 sec: 4095.9, 60 sec: 4164.4, 300 sec: 4040.5). Total num frames: 2015232. Throughput: 0: 1043.6. Samples: 504386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:29,232][21969] Avg episode reward: [(0, '15.237')] +[2025-07-06 16:30:34,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2035712. Throughput: 0: 1044.6. Samples: 508038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:30:34,232][21969] Avg episode reward: [(0, '13.871')] +[2025-07-06 16:30:37,449][22712] Updated weights for policy 0, policy_version 500 (0.0015) +[2025-07-06 16:30:39,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.1, 300 sec: 4040.5). Total num frames: 2052096. Throughput: 0: 1029.0. Samples: 513022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:39,232][21969] Avg episode reward: [(0, '15.281')] +[2025-07-06 16:30:44,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2076672. Throughput: 0: 1033.3. 
Samples: 519544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:30:44,231][21969] Avg episode reward: [(0, '16.847')] +[2025-07-06 16:30:46,880][22712] Updated weights for policy 0, policy_version 510 (0.0021) +[2025-07-06 16:30:49,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2097152. Throughput: 0: 1032.5. Samples: 523104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:49,232][21969] Avg episode reward: [(0, '16.736')] +[2025-07-06 16:30:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2113536. Throughput: 0: 1024.7. Samples: 528188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:30:54,234][21969] Avg episode reward: [(0, '17.369')] +[2025-07-06 16:30:54,238][22699] Saving new best policy, reward=17.369! +[2025-07-06 16:30:57,705][22712] Updated weights for policy 0, policy_version 520 (0.0012) +[2025-07-06 16:30:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2134016. Throughput: 0: 1025.5. Samples: 534790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:30:59,233][21969] Avg episode reward: [(0, '19.156')] +[2025-07-06 16:30:59,242][22699] Saving new best policy, reward=19.156! +[2025-07-06 16:31:04,231][21969] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 4040.5). Total num frames: 2158592. Throughput: 0: 1022.6. Samples: 538258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:31:04,233][21969] Avg episode reward: [(0, '18.132')] +[2025-07-06 16:31:08,469][22712] Updated weights for policy 0, policy_version 530 (0.0025) +[2025-07-06 16:31:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2170880. Throughput: 0: 1006.5. Samples: 543032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:31:09,232][21969] Avg episode reward: [(0, '19.412')] +[2025-07-06 16:31:09,237][22699] Saving new best policy, reward=19.412! +[2025-07-06 16:31:14,230][21969] Fps is (10 sec: 3686.7, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2195456. Throughput: 0: 1009.7. Samples: 549822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:31:14,232][21969] Avg episode reward: [(0, '19.267')] +[2025-07-06 16:31:17,552][22712] Updated weights for policy 0, policy_version 540 (0.0019) +[2025-07-06 16:31:19,231][21969] Fps is (10 sec: 4505.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2215936. Throughput: 0: 1001.9. Samples: 553126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:19,233][21969] Avg episode reward: [(0, '19.484')] +[2025-07-06 16:31:19,241][22699] Saving new best policy, reward=19.484! +[2025-07-06 16:31:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2232320. Throughput: 0: 1000.8. Samples: 558056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:24,232][21969] Avg episode reward: [(0, '19.790')] +[2025-07-06 16:31:24,234][22699] Saving new best policy, reward=19.790! +[2025-07-06 16:31:28,487][22712] Updated weights for policy 0, policy_version 550 (0.0012) +[2025-07-06 16:31:29,230][21969] Fps is (10 sec: 3686.7, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2252800. Throughput: 0: 1002.3. Samples: 564648. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:29,234][21969] Avg episode reward: [(0, '20.623')] +[2025-07-06 16:31:29,242][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth... +[2025-07-06 16:31:29,368][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth +[2025-07-06 16:31:29,380][22699] Saving new best policy, reward=20.623! +[2025-07-06 16:31:34,230][21969] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2277376. Throughput: 0: 1000.4. Samples: 568120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:31:34,232][21969] Avg episode reward: [(0, '20.635')] +[2025-07-06 16:31:34,233][22699] Saving new best policy, reward=20.635! +[2025-07-06 16:31:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2289664. Throughput: 0: 994.6. Samples: 572944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:31:39,232][21969] Avg episode reward: [(0, '21.470')] +[2025-07-06 16:31:39,300][22712] Updated weights for policy 0, policy_version 560 (0.0022) +[2025-07-06 16:31:39,301][22699] Saving new best policy, reward=21.470! +[2025-07-06 16:31:44,230][21969] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2314240. Throughput: 0: 1000.7. Samples: 579820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:31:44,233][21969] Avg episode reward: [(0, '20.811')] +[2025-07-06 16:31:48,361][22712] Updated weights for policy 0, policy_version 570 (0.0016) +[2025-07-06 16:31:49,230][21969] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2334720. Throughput: 0: 994.1. Samples: 582990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:31:49,235][21969] Avg episode reward: [(0, '19.904')] +[2025-07-06 16:31:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2351104. Throughput: 0: 997.1. Samples: 587900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:54,235][21969] Avg episode reward: [(0, '19.269')] +[2025-07-06 16:31:59,081][22712] Updated weights for policy 0, policy_version 580 (0.0030) +[2025-07-06 16:31:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 2375680. Throughput: 0: 993.7. Samples: 594538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:59,235][21969] Avg episode reward: [(0, '18.939')] +[2025-07-06 16:32:04,231][21969] Fps is (10 sec: 4505.3, 60 sec: 3959.5, 300 sec: 4054.4). Total num frames: 2396160. Throughput: 0: 1000.1. Samples: 598130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:04,235][21969] Avg episode reward: [(0, '17.539')] +[2025-07-06 16:32:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2412544. Throughput: 0: 992.6. Samples: 602722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:32:09,235][21969] Avg episode reward: [(0, '17.229')] +[2025-07-06 16:32:09,737][22712] Updated weights for policy 0, policy_version 590 (0.0026) +[2025-07-06 16:32:14,230][21969] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2437120. Throughput: 0: 1003.4. Samples: 609800. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:32:14,235][21969] Avg episode reward: [(0, '18.405')] +[2025-07-06 16:32:18,750][22712] Updated weights for policy 0, policy_version 600 (0.0029) +[2025-07-06 16:32:19,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 2457600. Throughput: 0: 999.6. Samples: 613100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:32:19,232][21969] Avg episode reward: [(0, '20.316')] +[2025-07-06 16:32:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2473984. Throughput: 0: 1001.2. Samples: 617996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:32:24,235][21969] Avg episode reward: [(0, '20.700')] +[2025-07-06 16:32:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2494464. Throughput: 0: 997.8. Samples: 624722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:32:29,236][21969] Avg episode reward: [(0, '19.712')] +[2025-07-06 16:32:29,515][22712] Updated weights for policy 0, policy_version 610 (0.0015) +[2025-07-06 16:32:34,231][21969] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 4054.4). Total num frames: 2514944. Throughput: 0: 1006.9. Samples: 628300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:34,232][21969] Avg episode reward: [(0, '19.395')] +[2025-07-06 16:32:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2531328. Throughput: 0: 1005.4. Samples: 633142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:32:39,231][21969] Avg episode reward: [(0, '16.524')] +[2025-07-06 16:32:40,274][22712] Updated weights for policy 0, policy_version 620 (0.0021) +[2025-07-06 16:32:44,230][21969] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2555904. Throughput: 0: 1008.5. Samples: 639922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:44,232][21969] Avg episode reward: [(0, '15.975')] +[2025-07-06 16:32:49,233][21969] Fps is (10 sec: 4504.3, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 2576384. Throughput: 0: 1002.5. Samples: 643244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:49,235][21969] Avg episode reward: [(0, '15.506')] +[2025-07-06 16:32:50,069][22712] Updated weights for policy 0, policy_version 630 (0.0020) +[2025-07-06 16:32:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2592768. Throughput: 0: 1008.1. Samples: 648086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:32:54,236][21969] Avg episode reward: [(0, '15.563')] +[2025-07-06 16:32:59,230][21969] Fps is (10 sec: 4097.2, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2617344. Throughput: 0: 1005.8. Samples: 655060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:32:59,235][21969] Avg episode reward: [(0, '18.030')] +[2025-07-06 16:32:59,796][22712] Updated weights for policy 0, policy_version 640 (0.0014) +[2025-07-06 16:33:04,233][21969] Fps is (10 sec: 4504.3, 60 sec: 4027.6, 300 sec: 4068.2). Total num frames: 2637824. Throughput: 0: 1013.0. Samples: 658690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:33:04,234][21969] Avg episode reward: [(0, '18.676')] +[2025-07-06 16:33:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2654208. Throughput: 0: 1011.0. Samples: 663490. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:33:09,232][21969] Avg episode reward: [(0, '18.854')] +[2025-07-06 16:33:10,318][22712] Updated weights for policy 0, policy_version 650 (0.0041) +[2025-07-06 16:33:14,230][21969] Fps is (10 sec: 4097.2, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2678784. Throughput: 0: 1021.1. Samples: 670672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:33:14,235][21969] Avg episode reward: [(0, '18.909')] +[2025-07-06 16:33:19,234][21969] Fps is (10 sec: 4503.8, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 2699264. Throughput: 0: 1021.6. Samples: 674276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:33:19,239][21969] Avg episode reward: [(0, '18.821')] +[2025-07-06 16:33:19,873][22712] Updated weights for policy 0, policy_version 660 (0.0014) +[2025-07-06 16:33:24,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2719744. Throughput: 0: 1026.0. Samples: 679314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:33:24,235][21969] Avg episode reward: [(0, '18.265')] +[2025-07-06 16:33:29,230][21969] Fps is (10 sec: 4097.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2740224. Throughput: 0: 1030.6. Samples: 686298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:33:29,234][21969] Avg episode reward: [(0, '18.085')] +[2025-07-06 16:33:29,243][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000669_2740224.pth... +[2025-07-06 16:33:29,375][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000669_2740224.pth +[2025-07-06 16:33:29,480][22712] Updated weights for policy 0, policy_version 670 (0.0014) +[2025-07-06 16:33:34,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4068.2). Total num frames: 2760704. Throughput: 0: 1033.1. Samples: 689730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:33:34,235][21969] Avg episode reward: [(0, '17.959')] +[2025-07-06 16:33:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 2781184. Throughput: 0: 1033.6. Samples: 694600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-07-06 16:33:39,235][21969] Avg episode reward: [(0, '17.831')] +[2025-07-06 16:33:40,025][22712] Updated weights for policy 0, policy_version 680 (0.0030) +[2025-07-06 16:33:44,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2801664. Throughput: 0: 1037.1. Samples: 701728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:33:44,232][21969] Avg episode reward: [(0, '19.802')] +[2025-07-06 16:33:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4054.5). Total num frames: 2822144. Throughput: 0: 1024.1. Samples: 704772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:33:49,235][21969] Avg episode reward: [(0, '20.877')] +[2025-07-06 16:33:50,468][22712] Updated weights for policy 0, policy_version 690 (0.0028) +[2025-07-06 16:33:54,232][21969] Fps is (10 sec: 4095.3, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 2842624. Throughput: 0: 1033.0. Samples: 709978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:33:54,237][21969] Avg episode reward: [(0, '21.567')] +[2025-07-06 16:33:54,241][22699] Saving new best policy, reward=21.567! +[2025-07-06 16:33:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2863104. Throughput: 0: 1018.2. Samples: 716492. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:33:59,233][21969] Avg episode reward: [(0, '22.477')] +[2025-07-06 16:33:59,241][22699] Saving new best policy, reward=22.477! +[2025-07-06 16:33:59,812][22712] Updated weights for policy 0, policy_version 700 (0.0019) +[2025-07-06 16:34:04,230][21969] Fps is (10 sec: 3687.1, 60 sec: 4027.9, 300 sec: 4054.3). Total num frames: 2879488. Throughput: 0: 1006.9. Samples: 719584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:34:04,235][21969] Avg episode reward: [(0, '20.480')] +[2025-07-06 16:34:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2899968. Throughput: 0: 1004.3. Samples: 724506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:34:09,237][21969] Avg episode reward: [(0, '19.672')] +[2025-07-06 16:34:10,722][22712] Updated weights for policy 0, policy_version 710 (0.0016) +[2025-07-06 16:34:14,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2924544. Throughput: 0: 1008.2. Samples: 731668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:34:14,234][21969] Avg episode reward: [(0, '17.999')] +[2025-07-06 16:34:19,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 4054.3). Total num frames: 2940928. Throughput: 0: 997.9. Samples: 734636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:34:19,231][21969] Avg episode reward: [(0, '16.135')] +[2025-07-06 16:34:21,443][22712] Updated weights for policy 0, policy_version 720 (0.0023) +[2025-07-06 16:34:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 2961408. Throughput: 0: 1007.2. Samples: 739924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:34:24,234][21969] Avg episode reward: [(0, '16.718')] +[2025-07-06 16:34:29,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2981888. Throughput: 0: 995.2. Samples: 746510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:34:29,235][21969] Avg episode reward: [(0, '16.916')] +[2025-07-06 16:34:30,288][22712] Updated weights for policy 0, policy_version 730 (0.0020) +[2025-07-06 16:34:34,230][21969] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 3002368. Throughput: 0: 995.0. Samples: 749546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:34:34,237][21969] Avg episode reward: [(0, '16.721')] +[2025-07-06 16:34:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3022848. Throughput: 0: 995.0. Samples: 754752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:34:39,232][21969] Avg episode reward: [(0, '17.553')] +[2025-07-06 16:34:41,123][22712] Updated weights for policy 0, policy_version 740 (0.0016) +[2025-07-06 16:34:44,230][21969] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3043328. Throughput: 0: 1010.2. Samples: 761952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:34:44,232][21969] Avg episode reward: [(0, '18.976')] +[2025-07-06 16:34:49,234][21969] Fps is (10 sec: 3685.0, 60 sec: 3959.2, 300 sec: 4040.4). Total num frames: 3059712. Throughput: 0: 1002.3. Samples: 764692. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:34:49,235][21969] Avg episode reward: [(0, '19.691')] +[2025-07-06 16:34:51,719][22712] Updated weights for policy 0, policy_version 750 (0.0021) +[2025-07-06 16:34:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4040.5). Total num frames: 3080192. Throughput: 0: 1014.6. Samples: 770162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:34:54,235][21969] Avg episode reward: [(0, '19.954')] +[2025-07-06 16:34:59,230][21969] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3104768. Throughput: 0: 1003.8. Samples: 776838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:34:59,236][21969] Avg episode reward: [(0, '20.775')] +[2025-07-06 16:35:00,642][22712] Updated weights for policy 0, policy_version 760 (0.0024) +[2025-07-06 16:35:04,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 3121152. Throughput: 0: 1002.6. Samples: 779754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:04,234][21969] Avg episode reward: [(0, '21.616')] +[2025-07-06 16:35:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3141632. Throughput: 0: 1005.8. Samples: 785186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:09,235][21969] Avg episode reward: [(0, '19.543')] +[2025-07-06 16:35:11,478][22712] Updated weights for policy 0, policy_version 770 (0.0022) +[2025-07-06 16:35:14,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3166208. Throughput: 0: 1012.3. Samples: 792062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:35:14,239][21969] Avg episode reward: [(0, '18.923')] +[2025-07-06 16:35:19,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3182592. Throughput: 0: 1007.4. Samples: 794878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:19,233][21969] Avg episode reward: [(0, '19.116')] +[2025-07-06 16:35:22,252][22712] Updated weights for policy 0, policy_version 780 (0.0033) +[2025-07-06 16:35:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3203072. Throughput: 0: 1011.7. Samples: 800278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:24,232][21969] Avg episode reward: [(0, '18.559')] +[2025-07-06 16:35:29,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3227648. Throughput: 0: 1008.3. Samples: 807324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:29,232][21969] Avg episode reward: [(0, '18.931')] +[2025-07-06 16:35:29,242][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000788_3227648.pth... +[2025-07-06 16:35:29,367][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000788_3227648.pth +[2025-07-06 16:35:31,588][22712] Updated weights for policy 0, policy_version 790 (0.0025) +[2025-07-06 16:35:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3239936. Throughput: 0: 1002.5. Samples: 809802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:35:34,231][21969] Avg episode reward: [(0, '19.799')] +[2025-07-06 16:35:39,230][21969] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3260416. Throughput: 0: 1004.9. Samples: 815384. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:35:39,234][21969] Avg episode reward: [(0, '19.997')] +[2025-07-06 16:35:42,062][22712] Updated weights for policy 0, policy_version 800 (0.0019) +[2025-07-06 16:35:44,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3284992. Throughput: 0: 1007.6. Samples: 822182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:35:44,235][21969] Avg episode reward: [(0, '19.411')] +[2025-07-06 16:35:49,232][21969] Fps is (10 sec: 4095.3, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 3301376. Throughput: 0: 1002.6. Samples: 824874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:35:49,236][21969] Avg episode reward: [(0, '18.441')] +[2025-07-06 16:35:52,773][22712] Updated weights for policy 0, policy_version 810 (0.0025) +[2025-07-06 16:35:54,230][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3321856. Throughput: 0: 1003.6. Samples: 830348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:54,235][21969] Avg episode reward: [(0, '18.987')] +[2025-07-06 16:35:59,234][21969] Fps is (10 sec: 4095.1, 60 sec: 3959.2, 300 sec: 4012.6). Total num frames: 3342336. Throughput: 0: 1002.4. Samples: 837172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:35:59,235][21969] Avg episode reward: [(0, '17.801')] +[2025-07-06 16:36:02,785][22712] Updated weights for policy 0, policy_version 820 (0.0026) +[2025-07-06 16:36:04,231][21969] Fps is (10 sec: 3686.0, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 3358720. Throughput: 0: 995.3. Samples: 839668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:36:04,233][21969] Avg episode reward: [(0, '18.655')] +[2025-07-06 16:36:09,230][21969] Fps is (10 sec: 4097.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3383296. Throughput: 0: 1003.6. Samples: 845440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:36:09,232][21969] Avg episode reward: [(0, '19.107')] +[2025-07-06 16:36:12,552][22712] Updated weights for policy 0, policy_version 830 (0.0020) +[2025-07-06 16:36:14,230][21969] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3403776. Throughput: 0: 998.8. Samples: 852272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:36:14,234][21969] Avg episode reward: [(0, '18.204')] +[2025-07-06 16:36:19,233][21969] Fps is (10 sec: 3685.3, 60 sec: 3959.3, 300 sec: 4026.5). Total num frames: 3420160. Throughput: 0: 998.5. Samples: 854736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:19,234][21969] Avg episode reward: [(0, '19.187')] +[2025-07-06 16:36:23,345][22712] Updated weights for policy 0, policy_version 840 (0.0018) +[2025-07-06 16:36:24,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3444736. Throughput: 0: 1001.4. Samples: 860446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:36:24,233][21969] Avg episode reward: [(0, '19.664')] +[2025-07-06 16:36:29,230][21969] Fps is (10 sec: 4506.9, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3465216. Throughput: 0: 1006.2. Samples: 867462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:29,234][21969] Avg episode reward: [(0, '20.176')] +[2025-07-06 16:36:33,811][22712] Updated weights for policy 0, policy_version 850 (0.0032) +[2025-07-06 16:36:34,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3481600. 
Throughput: 0: 997.4. Samples: 869756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:36:34,233][21969] Avg episode reward: [(0, '19.645')] +[2025-07-06 16:36:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3502080. Throughput: 0: 1007.0. Samples: 875662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:39,235][21969] Avg episode reward: [(0, '20.067')] +[2025-07-06 16:36:43,136][22712] Updated weights for policy 0, policy_version 860 (0.0019) +[2025-07-06 16:36:44,230][21969] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3526656. Throughput: 0: 1005.6. Samples: 882420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:44,235][21969] Avg episode reward: [(0, '20.313')] +[2025-07-06 16:36:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4040.5). Total num frames: 3543040. Throughput: 0: 1004.6. Samples: 884872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:36:49,234][21969] Avg episode reward: [(0, '20.194')] +[2025-07-06 16:36:53,956][22712] Updated weights for policy 0, policy_version 870 (0.0022) +[2025-07-06 16:36:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3563520. Throughput: 0: 1000.8. Samples: 890476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:36:54,237][21969] Avg episode reward: [(0, '20.635')] +[2025-07-06 16:36:59,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.3, 300 sec: 4040.5). Total num frames: 3588096. Throughput: 0: 1005.3. Samples: 897510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:59,231][21969] Avg episode reward: [(0, '21.798')] +[2025-07-06 16:37:04,232][21969] Fps is (10 sec: 3685.6, 60 sec: 4027.7, 300 sec: 4026.5). Total num frames: 3600384. Throughput: 0: 999.8. Samples: 899724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:37:04,236][21969] Avg episode reward: [(0, '21.433')] +[2025-07-06 16:37:04,916][22712] Updated weights for policy 0, policy_version 880 (0.0022) +[2025-07-06 16:37:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3624960. Throughput: 0: 1003.0. Samples: 905582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:09,232][21969] Avg episode reward: [(0, '21.534')] +[2025-07-06 16:37:13,705][22712] Updated weights for policy 0, policy_version 890 (0.0021) +[2025-07-06 16:37:14,230][21969] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3645440. Throughput: 0: 996.8. Samples: 912320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:14,235][21969] Avg episode reward: [(0, '22.940')] +[2025-07-06 16:37:14,238][22699] Saving new best policy, reward=22.940! +[2025-07-06 16:37:19,231][21969] Fps is (10 sec: 3686.1, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 3661824. Throughput: 0: 999.5. Samples: 914732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:19,236][21969] Avg episode reward: [(0, '22.712')] +[2025-07-06 16:37:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3682304. Throughput: 0: 994.7. Samples: 920424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:24,232][21969] Avg episode reward: [(0, '22.144')] +[2025-07-06 16:37:24,478][22712] Updated weights for policy 0, policy_version 900 (0.0020) +[2025-07-06 16:37:29,230][21969] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3706880. 
Throughput: 0: 998.5. Samples: 927352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:37:29,232][21969] Avg episode reward: [(0, '21.860')] +[2025-07-06 16:37:29,241][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth... +[2025-07-06 16:37:29,354][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000883_3616768.pth +[2025-07-06 16:37:34,231][21969] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 3719168. Throughput: 0: 995.7. Samples: 929680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:34,233][21969] Avg episode reward: [(0, '20.968')] +[2025-07-06 16:37:35,166][22712] Updated weights for policy 0, policy_version 910 (0.0023) +[2025-07-06 16:37:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3743744. Throughput: 0: 1008.6. Samples: 935864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:37:39,232][21969] Avg episode reward: [(0, '19.603')] +[2025-07-06 16:37:43,762][22712] Updated weights for policy 0, policy_version 920 (0.0022) +[2025-07-06 16:37:44,231][21969] Fps is (10 sec: 4915.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3768320. Throughput: 0: 1012.3. Samples: 943064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:44,235][21969] Avg episode reward: [(0, '19.605')] +[2025-07-06 16:37:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3784704. Throughput: 0: 1013.5. Samples: 945330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:37:49,234][21969] Avg episode reward: [(0, '19.392')] +[2025-07-06 16:37:54,094][22712] Updated weights for policy 0, policy_version 930 (0.0018) +[2025-07-06 16:37:54,230][21969] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3809280. Throughput: 0: 1024.8. Samples: 951700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:37:54,235][21969] Avg episode reward: [(0, '20.422')] +[2025-07-06 16:37:59,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3829760. Throughput: 0: 1029.9. Samples: 958664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:37:59,235][21969] Avg episode reward: [(0, '21.771')] +[2025-07-06 16:38:04,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4096.1, 300 sec: 4040.5). Total num frames: 3846144. Throughput: 0: 1021.9. Samples: 960718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:04,235][21969] Avg episode reward: [(0, '23.000')] +[2025-07-06 16:38:04,240][22699] Saving new best policy, reward=23.000! +[2025-07-06 16:38:04,744][22712] Updated weights for policy 0, policy_version 940 (0.0020) +[2025-07-06 16:38:09,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3870720. Throughput: 0: 1037.8. Samples: 967126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:38:09,237][21969] Avg episode reward: [(0, '22.990')] +[2025-07-06 16:38:13,725][22712] Updated weights for policy 0, policy_version 950 (0.0025) +[2025-07-06 16:38:14,230][21969] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3891200. Throughput: 0: 1028.3. Samples: 973624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:14,232][21969] Avg episode reward: [(0, '23.600')] +[2025-07-06 16:38:14,233][22699] Saving new best policy, reward=23.600! 
+[2025-07-06 16:38:19,231][21969] Fps is (10 sec: 3686.1, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3907584. Throughput: 0: 1023.2. Samples: 975724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:38:19,235][21969] Avg episode reward: [(0, '22.631')] +[2025-07-06 16:38:24,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3928064. Throughput: 0: 1028.1. Samples: 982128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:24,235][21969] Avg episode reward: [(0, '21.482')] +[2025-07-06 16:38:24,327][22712] Updated weights for policy 0, policy_version 960 (0.0016) +[2025-07-06 16:38:29,230][21969] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3952640. Throughput: 0: 1013.2. Samples: 988660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:29,232][21969] Avg episode reward: [(0, '20.784')] +[2025-07-06 16:38:34,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 3969024. Throughput: 0: 1008.2. Samples: 990698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:38:34,235][21969] Avg episode reward: [(0, '20.967')] +[2025-07-06 16:38:35,124][22712] Updated weights for policy 0, policy_version 970 (0.0024) +[2025-07-06 16:38:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3989504. Throughput: 0: 1016.2. Samples: 997428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:39,235][21969] Avg episode reward: [(0, '21.323')] +[2025-07-06 16:38:42,191][22699] Stopping Batcher_0... +[2025-07-06 16:38:42,193][22699] Loop batcher_evt_loop terminating... +[2025-07-06 16:38:42,193][21969] Component Batcher_0 stopped! +[2025-07-06 16:38:42,194][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:38:42,255][22712] Weights refcount: 2 0 +[2025-07-06 16:38:42,261][21969] Component InferenceWorker_p0-w0 stopped! +[2025-07-06 16:38:42,263][22712] Stopping InferenceWorker_p0-w0... +[2025-07-06 16:38:42,264][22712] Loop inference_proc0-0_evt_loop terminating... +[2025-07-06 16:38:42,360][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:38:42,528][21969] Component RolloutWorker_w5 stopped! +[2025-07-06 16:38:42,530][22717] Stopping RolloutWorker_w5... +[2025-07-06 16:38:42,533][21969] Component LearnerWorker_p0 stopped! +[2025-07-06 16:38:42,531][22717] Loop rollout_proc5_evt_loop terminating... +[2025-07-06 16:38:42,533][22699] Stopping LearnerWorker_p0... +[2025-07-06 16:38:42,538][22699] Loop learner_proc0_evt_loop terminating... +[2025-07-06 16:38:42,550][21969] Component RolloutWorker_w7 stopped! +[2025-07-06 16:38:42,551][22718] Stopping RolloutWorker_w7... +[2025-07-06 16:38:42,553][22718] Loop rollout_proc7_evt_loop terminating... +[2025-07-06 16:38:42,557][21969] Component RolloutWorker_w1 stopped! +[2025-07-06 16:38:42,558][22714] Stopping RolloutWorker_w1... +[2025-07-06 16:38:42,559][22714] Loop rollout_proc1_evt_loop terminating... +[2025-07-06 16:38:42,573][21969] Component RolloutWorker_w3 stopped! +[2025-07-06 16:38:42,572][22716] Stopping RolloutWorker_w3... +[2025-07-06 16:38:42,574][22716] Loop rollout_proc3_evt_loop terminating... +[2025-07-06 16:38:42,728][22720] Stopping RolloutWorker_w6... +[2025-07-06 16:38:42,729][22720] Loop rollout_proc6_evt_loop terminating... +[2025-07-06 16:38:42,728][21969] Component RolloutWorker_w6 stopped! 
+[2025-07-06 16:38:42,739][22715] Stopping RolloutWorker_w2... +[2025-07-06 16:38:42,739][21969] Component RolloutWorker_w2 stopped! +[2025-07-06 16:38:42,743][22715] Loop rollout_proc2_evt_loop terminating... +[2025-07-06 16:38:42,754][21969] Component RolloutWorker_w0 stopped! +[2025-07-06 16:38:42,758][22713] Stopping RolloutWorker_w0... +[2025-07-06 16:38:42,759][22713] Loop rollout_proc0_evt_loop terminating... +[2025-07-06 16:38:42,765][21969] Component RolloutWorker_w4 stopped! +[2025-07-06 16:38:42,769][21969] Waiting for process learner_proc0 to stop... +[2025-07-06 16:38:42,772][22719] Stopping RolloutWorker_w4... +[2025-07-06 16:38:42,774][22719] Loop rollout_proc4_evt_loop terminating... +[2025-07-06 16:38:44,856][21969] Waiting for process inference_proc0-0 to join... +[2025-07-06 16:38:44,875][21969] Waiting for process rollout_proc0 to join... +[2025-07-06 16:38:47,572][21969] Waiting for process rollout_proc1 to join... +[2025-07-06 16:38:47,593][21969] Waiting for process rollout_proc2 to join... +[2025-07-06 16:38:47,594][21969] Waiting for process rollout_proc3 to join... +[2025-07-06 16:38:47,596][21969] Waiting for process rollout_proc4 to join... +[2025-07-06 16:38:47,597][21969] Waiting for process rollout_proc5 to join... +[2025-07-06 16:38:47,599][21969] Waiting for process rollout_proc6 to join... +[2025-07-06 16:38:47,600][21969] Waiting for process rollout_proc7 to join... +[2025-07-06 16:38:47,601][21969] Batcher 0 profile tree view: +batching: 26.3992, releasing_batches: 0.0226 +[2025-07-06 16:38:47,602][21969] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 402.5665 +update_model: 8.3738 + weight_update: 0.0015 +one_step: 0.0097 + handle_policy_step: 560.5832 + deserialize: 13.3991, stack: 2.9791, obs_to_device_normalize: 117.9687, forward: 287.5495, send_messages: 27.8054 + prepare_outputs: 86.7100 + to_cpu: 53.6134 +[2025-07-06 16:38:47,603][21969] Learner 0 profile tree view: +misc: 0.0049, prepare_batch: 12.4655 +train: 73.3782 + epoch_init: 0.0045, minibatch_init: 0.0077, losses_postprocess: 0.7122, kl_divergence: 0.7295, after_optimizer: 33.2928 + calculate_losses: 25.8418 + losses_init: 0.0152, forward_head: 1.2984, bptt_initial: 17.2304, tail: 1.0916, advantages_returns: 0.3388, losses: 3.7090 + bptt: 1.8873 + bptt_forward_core: 1.8232 + update: 12.2045 + clip: 0.9781 +[2025-07-06 16:38:47,604][21969] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.2348, enqueue_policy_requests: 94.7937, env_step: 801.9069, overhead: 11.5350, complete_rollouts: 7.4212 +save_policy_outputs: 17.4529 + split_output_tensors: 6.8550 +[2025-07-06 16:38:47,605][21969] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.2380, enqueue_policy_requests: 102.3670, env_step: 787.5982, overhead: 11.1607, complete_rollouts: 6.1115 +save_policy_outputs: 16.6501 + split_output_tensors: 6.4323 +[2025-07-06 16:38:47,606][21969] Loop Runner_EvtLoop terminating... +[2025-07-06 16:38:47,607][21969] Runner profile tree view: +main_loop: 1033.8836 +[2025-07-06 16:38:47,607][21969] Collected {0: 4005888}, FPS: 3874.6 +[2025-07-06 16:58:14,453][21969] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 16:58:14,454][21969] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 16:58:14,455][21969] Adding new argument 'no_render'=True that is not in the saved config file! 
+[2025-07-06 16:58:14,456][21969] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 16:58:14,457][21969] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 16:58:14,458][21969] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 16:58:14,459][21969] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 16:58:14,460][21969] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 16:58:14,461][21969] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-07-06 16:58:14,462][21969] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 16:58:14,463][21969] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 16:58:14,464][21969] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 16:58:14,465][21969] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 16:58:14,466][21969] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 16:58:14,467][21969] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 16:58:14,496][21969] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:58:14,498][21969] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 16:58:14,500][21969] RunningMeanStd input shape: (1,) +[2025-07-06 16:58:14,513][21969] ConvEncoder: input_channels=3 +[2025-07-06 16:58:14,616][21969] Conv encoder output size: 512 +[2025-07-06 16:58:14,617][21969] Policy head output size: 512 +[2025-07-06 16:58:14,874][21969] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:58:15,647][21969] Num frames 100... +[2025-07-06 16:58:15,775][21969] Num frames 200... +[2025-07-06 16:58:15,904][21969] Num frames 300... +[2025-07-06 16:58:16,039][21969] Num frames 400... +[2025-07-06 16:58:16,167][21969] Num frames 500... +[2025-07-06 16:58:16,310][21969] Num frames 600... +[2025-07-06 16:58:16,437][21969] Num frames 700... +[2025-07-06 16:58:16,567][21969] Num frames 800... +[2025-07-06 16:58:16,694][21969] Num frames 900... +[2025-07-06 16:58:16,834][21969] Avg episode rewards: #0: 19.640, true rewards: #0: 9.640 +[2025-07-06 16:58:16,835][21969] Avg episode reward: 19.640, avg true_objective: 9.640 +[2025-07-06 16:58:16,883][21969] Num frames 1000... +[2025-07-06 16:58:17,012][21969] Num frames 1100... +[2025-07-06 16:58:17,145][21969] Num frames 1200... +[2025-07-06 16:58:17,282][21969] Num frames 1300... +[2025-07-06 16:58:17,414][21969] Num frames 1400... +[2025-07-06 16:58:17,543][21969] Num frames 1500... +[2025-07-06 16:58:17,673][21969] Num frames 1600... +[2025-07-06 16:58:17,815][21969] Avg episode rewards: #0: 16.840, true rewards: #0: 8.340 +[2025-07-06 16:58:17,816][21969] Avg episode reward: 16.840, avg true_objective: 8.340 +[2025-07-06 16:58:17,860][21969] Num frames 1700... +[2025-07-06 16:58:17,989][21969] Num frames 1800... +[2025-07-06 16:58:18,125][21969] Num frames 1900... +[2025-07-06 16:58:18,256][21969] Num frames 2000... +[2025-07-06 16:58:18,435][21969] Num frames 2100... +[2025-07-06 16:58:18,618][21969] Num frames 2200... +[2025-07-06 16:58:18,792][21969] Num frames 2300... +[2025-07-06 16:58:18,963][21969] Num frames 2400... 
+[2025-07-06 16:58:19,143][21969] Num frames 2500... +[2025-07-06 16:58:19,323][21969] Num frames 2600... +[2025-07-06 16:58:19,501][21969] Num frames 2700... +[2025-07-06 16:58:19,671][21969] Num frames 2800... +[2025-07-06 16:58:19,847][21969] Num frames 2900... +[2025-07-06 16:58:20,037][21969] Num frames 3000... +[2025-07-06 16:58:20,217][21969] Num frames 3100... +[2025-07-06 16:58:20,409][21969] Num frames 3200... +[2025-07-06 16:58:20,598][21969] Num frames 3300... +[2025-07-06 16:58:20,734][21969] Num frames 3400... +[2025-07-06 16:58:20,867][21969] Num frames 3500... +[2025-07-06 16:58:21,002][21969] Num frames 3600... +[2025-07-06 16:58:21,134][21969] Num frames 3700... +[2025-07-06 16:58:21,282][21969] Avg episode rewards: #0: 30.893, true rewards: #0: 12.560 +[2025-07-06 16:58:21,283][21969] Avg episode reward: 30.893, avg true_objective: 12.560 +[2025-07-06 16:58:21,326][21969] Num frames 3800... +[2025-07-06 16:58:21,456][21969] Num frames 3900... +[2025-07-06 16:58:21,596][21969] Num frames 4000... +[2025-07-06 16:58:21,730][21969] Num frames 4100... +[2025-07-06 16:58:21,862][21969] Num frames 4200... +[2025-07-06 16:58:21,994][21969] Num frames 4300... +[2025-07-06 16:58:22,066][21969] Avg episode rewards: #0: 25.030, true rewards: #0: 10.780 +[2025-07-06 16:58:22,067][21969] Avg episode reward: 25.030, avg true_objective: 10.780 +[2025-07-06 16:58:22,182][21969] Num frames 4400... +[2025-07-06 16:58:22,314][21969] Num frames 4500... +[2025-07-06 16:58:22,448][21969] Num frames 4600... +[2025-07-06 16:58:22,588][21969] Num frames 4700... +[2025-07-06 16:58:22,718][21969] Num frames 4800... +[2025-07-06 16:58:22,850][21969] Num frames 4900... +[2025-07-06 16:58:22,982][21969] Num frames 5000... +[2025-07-06 16:58:23,114][21969] Num frames 5100... +[2025-07-06 16:58:23,247][21969] Num frames 5200... +[2025-07-06 16:58:23,376][21969] Num frames 5300... +[2025-07-06 16:58:23,508][21969] Num frames 5400... +[2025-07-06 16:58:23,647][21969] Num frames 5500... +[2025-07-06 16:58:23,780][21969] Num frames 5600... +[2025-07-06 16:58:23,910][21969] Num frames 5700... +[2025-07-06 16:58:24,068][21969] Avg episode rewards: #0: 27.160, true rewards: #0: 11.560 +[2025-07-06 16:58:24,069][21969] Avg episode reward: 27.160, avg true_objective: 11.560 +[2025-07-06 16:58:24,096][21969] Num frames 5800... +[2025-07-06 16:58:24,236][21969] Num frames 5900... +[2025-07-06 16:58:24,364][21969] Num frames 6000... +[2025-07-06 16:58:24,493][21969] Num frames 6100... +[2025-07-06 16:58:24,635][21969] Num frames 6200... +[2025-07-06 16:58:24,765][21969] Num frames 6300... +[2025-07-06 16:58:24,896][21969] Num frames 6400... +[2025-07-06 16:58:25,029][21969] Num frames 6500... +[2025-07-06 16:58:25,159][21969] Num frames 6600... +[2025-07-06 16:58:25,288][21969] Num frames 6700... +[2025-07-06 16:58:25,417][21969] Num frames 6800... +[2025-07-06 16:58:25,551][21969] Num frames 6900... +[2025-07-06 16:58:25,695][21969] Num frames 7000... +[2025-07-06 16:58:25,828][21969] Num frames 7100... +[2025-07-06 16:58:25,967][21969] Num frames 7200... +[2025-07-06 16:58:26,103][21969] Num frames 7300... +[2025-07-06 16:58:26,239][21969] Num frames 7400... +[2025-07-06 16:58:26,376][21969] Num frames 7500... +[2025-07-06 16:58:26,508][21969] Num frames 7600... +[2025-07-06 16:58:26,650][21969] Num frames 7700... +[2025-07-06 16:58:26,786][21969] Num frames 7800... 
+[2025-07-06 16:58:26,945][21969] Avg episode rewards: #0: 32.300, true rewards: #0: 13.133 +[2025-07-06 16:58:26,946][21969] Avg episode reward: 32.300, avg true_objective: 13.133 +[2025-07-06 16:58:26,975][21969] Num frames 7900... +[2025-07-06 16:58:27,107][21969] Num frames 8000... +[2025-07-06 16:58:27,236][21969] Num frames 8100... +[2025-07-06 16:58:27,371][21969] Num frames 8200... +[2025-07-06 16:58:27,502][21969] Num frames 8300... +[2025-07-06 16:58:27,635][21969] Num frames 8400... +[2025-07-06 16:58:27,780][21969] Num frames 8500... +[2025-07-06 16:58:27,912][21969] Num frames 8600... +[2025-07-06 16:58:28,046][21969] Num frames 8700... +[2025-07-06 16:58:28,185][21969] Num frames 8800... +[2025-07-06 16:58:28,294][21969] Avg episode rewards: #0: 30.628, true rewards: #0: 12.629 +[2025-07-06 16:58:28,295][21969] Avg episode reward: 30.628, avg true_objective: 12.629 +[2025-07-06 16:58:28,374][21969] Num frames 8900... +[2025-07-06 16:58:28,504][21969] Num frames 9000... +[2025-07-06 16:58:28,633][21969] Num frames 9100... +[2025-07-06 16:58:28,776][21969] Num frames 9200... +[2025-07-06 16:58:28,907][21969] Num frames 9300... +[2025-07-06 16:58:29,040][21969] Num frames 9400... +[2025-07-06 16:58:29,158][21969] Avg episode rewards: #0: 28.435, true rewards: #0: 11.810 +[2025-07-06 16:58:29,159][21969] Avg episode reward: 28.435, avg true_objective: 11.810 +[2025-07-06 16:58:29,232][21969] Num frames 9500... +[2025-07-06 16:58:29,366][21969] Num frames 9600... +[2025-07-06 16:58:29,496][21969] Num frames 9700... +[2025-07-06 16:58:29,625][21969] Num frames 9800... +[2025-07-06 16:58:29,815][21969] Avg episode rewards: #0: 25.884, true rewards: #0: 10.996 +[2025-07-06 16:58:29,816][21969] Avg episode reward: 25.884, avg true_objective: 10.996 +[2025-07-06 16:58:29,824][21969] Num frames 9900... +[2025-07-06 16:58:29,954][21969] Num frames 10000... +[2025-07-06 16:58:30,092][21969] Num frames 10100... +[2025-07-06 16:58:30,223][21969] Num frames 10200... +[2025-07-06 16:58:30,354][21969] Num frames 10300... +[2025-07-06 16:58:30,484][21969] Num frames 10400... +[2025-07-06 16:58:30,625][21969] Num frames 10500... +[2025-07-06 16:58:30,812][21969] Num frames 10600... +[2025-07-06 16:58:30,925][21969] Avg episode rewards: #0: 24.628, true rewards: #0: 10.628 +[2025-07-06 16:58:30,927][21969] Avg episode reward: 24.628, avg true_objective: 10.628 +[2025-07-06 16:59:38,353][21969] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-07-06 17:02:10,224][21969] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 17:02:10,225][21969] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 17:02:10,226][21969] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 17:02:10,227][21969] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 17:02:10,228][21969] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 17:02:10,229][21969] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 17:02:10,229][21969] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-07-06 17:02:10,231][21969] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 17:02:10,231][21969] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
+[2025-07-06 17:02:10,232][21969] Adding new argument 'hf_repository'='zhngq/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-07-06 17:02:10,233][21969] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 17:02:10,235][21969] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 17:02:10,236][21969] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 17:02:10,237][21969] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 17:02:10,238][21969] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 17:02:10,263][21969] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 17:02:10,265][21969] RunningMeanStd input shape: (1,) +[2025-07-06 17:02:10,276][21969] ConvEncoder: input_channels=3 +[2025-07-06 17:02:10,308][21969] Conv encoder output size: 512 +[2025-07-06 17:02:10,309][21969] Policy head output size: 512 +[2025-07-06 17:02:10,327][21969] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 17:02:10,769][21969] Num frames 100... +[2025-07-06 17:02:10,901][21969] Num frames 200... +[2025-07-06 17:02:11,032][21969] Num frames 300... +[2025-07-06 17:02:11,166][21969] Num frames 400... +[2025-07-06 17:02:11,306][21969] Num frames 500... +[2025-07-06 17:02:11,436][21969] Num frames 600... +[2025-07-06 17:02:11,566][21969] Num frames 700... +[2025-07-06 17:02:11,707][21969] Num frames 800... +[2025-07-06 17:02:11,804][21969] Avg episode rewards: #0: 16.320, true rewards: #0: 8.320 +[2025-07-06 17:02:11,805][21969] Avg episode reward: 16.320, avg true_objective: 8.320 +[2025-07-06 17:02:11,897][21969] Num frames 900... +[2025-07-06 17:02:12,027][21969] Num frames 1000... +[2025-07-06 17:02:12,164][21969] Num frames 1100... +[2025-07-06 17:02:12,307][21969] Num frames 1200... +[2025-07-06 17:02:12,434][21969] Num frames 1300... +[2025-07-06 17:02:12,506][21969] Avg episode rewards: #0: 11.560, true rewards: #0: 6.560 +[2025-07-06 17:02:12,507][21969] Avg episode reward: 11.560, avg true_objective: 6.560 +[2025-07-06 17:02:12,622][21969] Num frames 1400... +[2025-07-06 17:02:12,750][21969] Num frames 1500... +[2025-07-06 17:02:12,887][21969] Num frames 1600... +[2025-07-06 17:02:13,018][21969] Num frames 1700... +[2025-07-06 17:02:13,150][21969] Num frames 1800... +[2025-07-06 17:02:13,280][21969] Num frames 1900... +[2025-07-06 17:02:13,418][21969] Num frames 2000... +[2025-07-06 17:02:13,534][21969] Avg episode rewards: #0: 12.160, true rewards: #0: 6.827 +[2025-07-06 17:02:13,535][21969] Avg episode reward: 12.160, avg true_objective: 6.827 +[2025-07-06 17:02:13,602][21969] Num frames 2100... +[2025-07-06 17:02:13,731][21969] Num frames 2200... +[2025-07-06 17:02:13,897][21969] Num frames 2300... +[2025-07-06 17:02:14,080][21969] Num frames 2400... +[2025-07-06 17:02:14,272][21969] Num frames 2500... +[2025-07-06 17:02:14,457][21969] Num frames 2600... +[2025-07-06 17:02:14,664][21969] Avg episode rewards: #0: 12.970, true rewards: #0: 6.720 +[2025-07-06 17:02:14,665][21969] Avg episode reward: 12.970, avg true_objective: 6.720 +[2025-07-06 17:02:14,688][21969] Num frames 2700... +[2025-07-06 17:02:14,857][21969] Num frames 2800... +[2025-07-06 17:02:15,025][21969] Num frames 2900... +[2025-07-06 17:02:15,193][21969] Num frames 3000... +[2025-07-06 17:02:15,371][21969] Num frames 3100... 
+[2025-07-06 17:02:15,438][21969] Avg episode rewards: #0: 12.008, true rewards: #0: 6.208 +[2025-07-06 17:02:15,439][21969] Avg episode reward: 12.008, avg true_objective: 6.208 +[2025-07-06 17:02:15,617][21969] Num frames 3200... +[2025-07-06 17:02:15,800][21969] Num frames 3300... +[2025-07-06 17:02:15,952][21969] Num frames 3400... +[2025-07-06 17:02:16,120][21969] Avg episode rewards: #0: 10.647, true rewards: #0: 5.813 +[2025-07-06 17:02:16,121][21969] Avg episode reward: 10.647, avg true_objective: 5.813 +[2025-07-06 17:02:16,138][21969] Num frames 3500... +[2025-07-06 17:02:16,267][21969] Num frames 3600... +[2025-07-06 17:02:16,398][21969] Num frames 3700... +[2025-07-06 17:02:16,540][21969] Num frames 3800... +[2025-07-06 17:02:16,659][21969] Avg episode rewards: #0: 9.927, true rewards: #0: 5.499 +[2025-07-06 17:02:16,660][21969] Avg episode reward: 9.927, avg true_objective: 5.499 +[2025-07-06 17:02:16,728][21969] Num frames 3900... +[2025-07-06 17:02:16,856][21969] Num frames 4000... +[2025-07-06 17:02:16,985][21969] Num frames 4100... +[2025-07-06 17:02:17,114][21969] Num frames 4200... +[2025-07-06 17:02:17,243][21969] Num frames 4300... +[2025-07-06 17:02:17,412][21969] Num frames 4400... +[2025-07-06 17:02:17,560][21969] Num frames 4500... +[2025-07-06 17:02:17,693][21969] Num frames 4600... +[2025-07-06 17:02:17,821][21969] Num frames 4700... +[2025-07-06 17:02:17,950][21969] Num frames 4800... +[2025-07-06 17:02:18,091][21969] Num frames 4900... +[2025-07-06 17:02:18,223][21969] Num frames 5000... +[2025-07-06 17:02:18,352][21969] Num frames 5100... +[2025-07-06 17:02:18,528][21969] Avg episode rewards: #0: 12.116, true rewards: #0: 6.491 +[2025-07-06 17:02:18,530][21969] Avg episode reward: 12.116, avg true_objective: 6.491 +[2025-07-06 17:02:18,540][21969] Num frames 5200... +[2025-07-06 17:02:18,668][21969] Num frames 5300... +[2025-07-06 17:02:18,797][21969] Num frames 5400... +[2025-07-06 17:02:18,931][21969] Num frames 5500... +[2025-07-06 17:02:19,062][21969] Num frames 5600... +[2025-07-06 17:02:19,188][21969] Num frames 5700... +[2025-07-06 17:02:19,319][21969] Num frames 5800... +[2025-07-06 17:02:19,446][21969] Num frames 5900... +[2025-07-06 17:02:19,588][21969] Num frames 6000... +[2025-07-06 17:02:19,717][21969] Num frames 6100... +[2025-07-06 17:02:19,884][21969] Avg episode rewards: #0: 12.983, true rewards: #0: 6.872 +[2025-07-06 17:02:19,885][21969] Avg episode reward: 12.983, avg true_objective: 6.872 +[2025-07-06 17:02:19,905][21969] Num frames 6200... +[2025-07-06 17:02:20,034][21969] Num frames 6300... +[2025-07-06 17:02:20,164][21969] Num frames 6400... +[2025-07-06 17:02:20,290][21969] Num frames 6500... +[2025-07-06 17:02:20,421][21969] Num frames 6600... +[2025-07-06 17:02:20,555][21969] Num frames 6700... +[2025-07-06 17:02:20,694][21969] Num frames 6800... +[2025-07-06 17:02:20,863][21969] Avg episode rewards: #0: 13.289, true rewards: #0: 6.889 +[2025-07-06 17:02:20,864][21969] Avg episode reward: 13.289, avg true_objective: 6.889 +[2025-07-06 17:03:00,125][21969] Replay video saved to /content/train_dir/default_experiment/replay.mp4!