diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,2414 @@ +[2025-07-06 15:23:21,859][06149] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-07-06 15:23:21,860][06149] Rollout worker 0 uses device cpu +[2025-07-06 15:23:21,862][06149] Rollout worker 1 uses device cpu +[2025-07-06 15:23:21,863][06149] Rollout worker 2 uses device cpu +[2025-07-06 15:23:21,863][06149] Rollout worker 3 uses device cpu +[2025-07-06 15:23:21,864][06149] Rollout worker 4 uses device cpu +[2025-07-06 15:23:21,865][06149] Rollout worker 5 uses device cpu +[2025-07-06 15:23:21,866][06149] Rollout worker 6 uses device cpu +[2025-07-06 15:23:21,867][06149] Rollout worker 7 uses device cpu +[2025-07-06 15:23:22,020][06149] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:22,020][06149] InferenceWorker_p0-w0: min num requests: 2 +[2025-07-06 15:23:22,051][06149] Starting all processes... +[2025-07-06 15:23:22,052][06149] Starting process learner_proc0 +[2025-07-06 15:23:22,106][06149] Starting all processes... +[2025-07-06 15:23:22,117][06149] Starting process inference_proc0-0 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc0 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc1 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc2 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc3 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc4 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc5 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc6 +[2025-07-06 15:23:22,121][06149] Starting process rollout_proc7 +[2025-07-06 15:23:43,764][06624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:43,766][06624] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-07-06 15:23:43,790][06640] Worker 3 uses CPU cores [1] +[2025-07-06 15:23:43,826][06149] Heartbeat connected on RolloutWorker_w3 +[2025-07-06 15:23:43,855][06624] Num visible devices: 1 +[2025-07-06 15:23:43,867][06624] Starting seed is not provided +[2025-07-06 15:23:43,868][06624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:43,869][06624] Initializing actor-critic model on device cuda:0 +[2025-07-06 15:23:43,870][06624] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:23:43,872][06149] Heartbeat connected on Batcher_0 +[2025-07-06 15:23:43,876][06624] RunningMeanStd input shape: (1,) +[2025-07-06 15:23:43,921][06637] Worker 0 uses CPU cores [0] +[2025-07-06 15:23:43,943][06149] Heartbeat connected on RolloutWorker_w0 +[2025-07-06 15:23:43,939][06624] ConvEncoder: input_channels=3 +[2025-07-06 15:23:44,248][06641] Worker 4 uses CPU cores [0] +[2025-07-06 15:23:44,242][06642] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:44,250][06639] Worker 2 uses CPU cores [0] +[2025-07-06 15:23:44,251][06642] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-07-06 15:23:44,274][06149] Heartbeat connected on RolloutWorker_w4 +[2025-07-06 15:23:44,276][06149] Heartbeat connected on RolloutWorker_w2 +[2025-07-06 15:23:44,310][06638] Worker 1 uses CPU cores [1] +[2025-07-06 15:23:44,311][06149] Heartbeat connected on RolloutWorker_w1 +[2025-07-06 15:23:44,327][06642] Num visible devices: 1 +[2025-07-06 15:23:44,332][06149] Heartbeat connected on InferenceWorker_p0-w0 +[2025-07-06 15:23:44,411][06644] 
Worker 6 uses CPU cores [0] +[2025-07-06 15:23:44,427][06149] Heartbeat connected on RolloutWorker_w6 +[2025-07-06 15:23:44,677][06645] Worker 7 uses CPU cores [1] +[2025-07-06 15:23:44,680][06149] Heartbeat connected on RolloutWorker_w7 +[2025-07-06 15:23:44,695][06624] Conv encoder output size: 512 +[2025-07-06 15:23:44,697][06624] Policy head output size: 512 +[2025-07-06 15:23:44,729][06643] Worker 5 uses CPU cores [1] +[2025-07-06 15:23:44,731][06149] Heartbeat connected on RolloutWorker_w5 +[2025-07-06 15:23:44,793][06624] Created Actor Critic model with architecture: +[2025-07-06 15:23:44,795][06624] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-07-06 15:23:45,250][06624] Using optimizer +[2025-07-06 15:23:53,057][06624] No checkpoints found +[2025-07-06 15:23:53,058][06624] Did not load from checkpoint, starting from scratch! +[2025-07-06 15:23:53,059][06624] Initialized policy 0 weights for model version 0 +[2025-07-06 15:23:53,069][06624] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:23:53,076][06624] LearnerWorker_p0 finished initialization! +[2025-07-06 15:23:53,079][06149] Heartbeat connected on LearnerWorker_p0 +[2025-07-06 15:23:53,368][06642] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:23:53,371][06642] RunningMeanStd input shape: (1,) +[2025-07-06 15:23:53,384][06642] ConvEncoder: input_channels=3 +[2025-07-06 15:23:53,510][06642] Conv encoder output size: 512 +[2025-07-06 15:23:53,511][06642] Policy head output size: 512 +[2025-07-06 15:23:53,548][06149] Inference worker 0-0 is ready! +[2025-07-06 15:23:53,548][06149] All inference workers are ready! Signal rollout workers to start! 
+[2025-07-06 15:23:53,828][06640] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,831][06643] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,825][06639] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,832][06645] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,830][06644] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,842][06638] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,844][06637] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:53,847][06641] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:23:55,191][06645] Decorrelating experience for 0 frames... +[2025-07-06 15:23:55,191][06641] Decorrelating experience for 0 frames... +[2025-07-06 15:23:55,192][06643] Decorrelating experience for 0 frames... +[2025-07-06 15:23:55,610][06645] Decorrelating experience for 32 frames... +[2025-07-06 15:23:55,972][06641] Decorrelating experience for 32 frames... +[2025-07-06 15:23:56,045][06639] Decorrelating experience for 0 frames... +[2025-07-06 15:23:56,397][06149] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 15:23:56,483][06643] Decorrelating experience for 32 frames... +[2025-07-06 15:23:56,709][06645] Decorrelating experience for 64 frames... +[2025-07-06 15:23:56,974][06639] Decorrelating experience for 32 frames... +[2025-07-06 15:23:57,464][06641] Decorrelating experience for 64 frames... +[2025-07-06 15:23:57,661][06643] Decorrelating experience for 64 frames... +[2025-07-06 15:23:57,666][06645] Decorrelating experience for 96 frames... +[2025-07-06 15:23:57,917][06639] Decorrelating experience for 64 frames... +[2025-07-06 15:23:58,560][06641] Decorrelating experience for 96 frames... +[2025-07-06 15:23:58,621][06643] Decorrelating experience for 96 frames... +[2025-07-06 15:23:58,846][06639] Decorrelating experience for 96 frames... +[2025-07-06 15:24:01,398][06149] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 90.0. Samples: 450. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 15:24:01,399][06149] Avg episode reward: [(0, '2.949')] +[2025-07-06 15:24:02,073][06624] Signal inference workers to stop experience collection... +[2025-07-06 15:24:02,084][06642] InferenceWorker_p0-w0: stopping experience collection +[2025-07-06 15:24:03,506][06624] Signal inference workers to resume experience collection... +[2025-07-06 15:24:03,510][06642] InferenceWorker_p0-w0: resuming experience collection +[2025-07-06 15:24:06,397][06149] Fps is (10 sec: 1228.8, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 12288. Throughput: 0: 324.4. Samples: 3244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 15:24:06,398][06149] Avg episode reward: [(0, '3.662')] +[2025-07-06 15:24:11,399][06149] Fps is (10 sec: 3276.2, 60 sec: 2184.2, 300 sec: 2184.2). Total num frames: 32768. Throughput: 0: 363.4. Samples: 5452. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:24:11,405][06149] Avg episode reward: [(0, '4.220')] +[2025-07-06 15:24:13,607][06642] Updated weights for policy 0, policy_version 10 (0.0143) +[2025-07-06 15:24:16,397][06149] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 569.4. Samples: 11388. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:16,402][06149] Avg episode reward: [(0, '4.448')] +[2025-07-06 15:24:21,397][06149] Fps is (10 sec: 2867.9, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 638.7. Samples: 15968. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:24:21,401][06149] Avg episode reward: [(0, '4.407')] +[2025-07-06 15:24:25,679][06642] Updated weights for policy 0, policy_version 20 (0.0026) +[2025-07-06 15:24:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 620.0. Samples: 18600. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:24:26,403][06149] Avg episode reward: [(0, '4.499')] +[2025-07-06 15:24:31,397][06149] Fps is (10 sec: 4095.9, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 692.9. Samples: 24252. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:31,399][06149] Avg episode reward: [(0, '4.475')] +[2025-07-06 15:24:31,401][06624] Saving new best policy, reward=4.475! +[2025-07-06 15:24:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 114688. Throughput: 0: 711.3. Samples: 28450. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:36,403][06149] Avg episode reward: [(0, '4.457')] +[2025-07-06 15:24:38,530][06642] Updated weights for policy 0, policy_version 30 (0.0018) +[2025-07-06 15:24:41,399][06149] Fps is (10 sec: 2866.8, 60 sec: 2912.6, 300 sec: 2912.6). Total num frames: 131072. Throughput: 0: 689.4. Samples: 31024. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:24:41,404][06149] Avg episode reward: [(0, '4.449')] +[2025-07-06 15:24:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 801.7. Samples: 36524. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:24:46,398][06149] Avg episode reward: [(0, '4.441')] +[2025-07-06 15:24:50,761][06642] Updated weights for policy 0, policy_version 40 (0.0013) +[2025-07-06 15:24:51,397][06149] Fps is (10 sec: 3277.4, 60 sec: 2978.9, 300 sec: 2978.9). Total num frames: 163840. Throughput: 0: 838.7. Samples: 40986. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:51,398][06149] Avg episode reward: [(0, '4.293')] +[2025-07-06 15:24:56,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 860.5. Samples: 44174. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:24:56,401][06149] Avg episode reward: [(0, '4.168')] +[2025-07-06 15:25:01,014][06642] Updated weights for policy 0, policy_version 50 (0.0013) +[2025-07-06 15:25:01,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3413.4, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 869.4. Samples: 50512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:01,401][06149] Avg episode reward: [(0, '4.414')] +[2025-07-06 15:25:06,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3159.8). Total num frames: 221184. Throughput: 0: 876.1. Samples: 55392. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:25:06,398][06149] Avg episode reward: [(0, '4.447')] +[2025-07-06 15:25:11,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.7, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 889.2. Samples: 58614. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:11,402][06149] Avg episode reward: [(0, '4.377')] +[2025-07-06 15:25:11,544][06642] Updated weights for policy 0, policy_version 60 (0.0019) +[2025-07-06 15:25:16,398][06149] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3225.5). Total num frames: 258048. Throughput: 0: 891.6. Samples: 64376. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:16,409][06149] Avg episode reward: [(0, '4.589')] +[2025-07-06 15:25:16,422][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth... +[2025-07-06 15:25:16,545][06624] Saving new best policy, reward=4.589! +[2025-07-06 15:25:21,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 919.0. Samples: 69804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:21,398][06149] Avg episode reward: [(0, '4.518')] +[2025-07-06 15:25:22,716][06642] Updated weights for policy 0, policy_version 70 (0.0014) +[2025-07-06 15:25:26,397][06149] Fps is (10 sec: 4096.7, 60 sec: 3618.1, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 931.8. Samples: 72954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:25:26,401][06149] Avg episode reward: [(0, '4.294')] +[2025-07-06 15:25:31,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 920.4. Samples: 77940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:31,398][06149] Avg episode reward: [(0, '4.277')] +[2025-07-06 15:25:34,416][06642] Updated weights for policy 0, policy_version 80 (0.0018) +[2025-07-06 15:25:36,399][06149] Fps is (10 sec: 3276.0, 60 sec: 3618.0, 300 sec: 3317.7). Total num frames: 331776. Throughput: 0: 941.1. Samples: 83338. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:36,400][06149] Avg episode reward: [(0, '4.339')] +[2025-07-06 15:25:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3354.8). Total num frames: 352256. Throughput: 0: 934.9. Samples: 86246. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:25:41,398][06149] Avg episode reward: [(0, '4.476')] +[2025-07-06 15:25:46,397][06149] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 891.6. Samples: 90632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:46,398][06149] Avg episode reward: [(0, '4.492')] +[2025-07-06 15:25:46,509][06642] Updated weights for policy 0, policy_version 90 (0.0028) +[2025-07-06 15:25:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3348.0). Total num frames: 385024. Throughput: 0: 914.5. Samples: 96544. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:25:51,398][06149] Avg episode reward: [(0, '4.620')] +[2025-07-06 15:25:51,403][06624] Saving new best policy, reward=4.620! +[2025-07-06 15:25:56,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3345.1). Total num frames: 401408. Throughput: 0: 907.6. Samples: 99456. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:25:56,398][06149] Avg episode reward: [(0, '4.686')] +[2025-07-06 15:25:56,417][06624] Saving new best policy, reward=4.686! +[2025-07-06 15:25:58,467][06642] Updated weights for policy 0, policy_version 100 (0.0013) +[2025-07-06 15:26:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3342.3). Total num frames: 417792. Throughput: 0: 872.5. Samples: 103638. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:01,398][06149] Avg episode reward: [(0, '4.796')] +[2025-07-06 15:26:01,403][06624] Saving new best policy, reward=4.796! +[2025-07-06 15:26:06,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 879.6. Samples: 109384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:26:06,398][06149] Avg episode reward: [(0, '4.730')] +[2025-07-06 15:26:09,763][06642] Updated weights for policy 0, policy_version 110 (0.0015) +[2025-07-06 15:26:11,401][06149] Fps is (10 sec: 3684.7, 60 sec: 3549.6, 300 sec: 3367.7). Total num frames: 454656. Throughput: 0: 873.4. Samples: 112262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:26:11,402][06149] Avg episode reward: [(0, '4.557')] +[2025-07-06 15:26:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3364.6). Total num frames: 471040. Throughput: 0: 861.3. Samples: 116698. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:16,398][06149] Avg episode reward: [(0, '4.460')] +[2025-07-06 15:26:20,973][06642] Updated weights for policy 0, policy_version 120 (0.0014) +[2025-07-06 15:26:21,397][06149] Fps is (10 sec: 3688.0, 60 sec: 3549.9, 300 sec: 3389.8). Total num frames: 491520. Throughput: 0: 876.4. Samples: 122774. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:21,398][06149] Avg episode reward: [(0, '4.396')] +[2025-07-06 15:26:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3386.0). Total num frames: 507904. Throughput: 0: 870.4. Samples: 125416. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:26:26,400][06149] Avg episode reward: [(0, '4.407')] +[2025-07-06 15:26:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 891.2. Samples: 130738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:31,400][06149] Avg episode reward: [(0, '4.378')] +[2025-07-06 15:26:32,034][06642] Updated weights for policy 0, policy_version 130 (0.0012) +[2025-07-06 15:26:36,397][06149] Fps is (10 sec: 4095.7, 60 sec: 3618.2, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 893.5. Samples: 136752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:36,399][06149] Avg episode reward: [(0, '4.376')] +[2025-07-06 15:26:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3400.9). Total num frames: 561152. Throughput: 0: 871.4. Samples: 138670. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:41,398][06149] Avg episode reward: [(0, '4.402')] +[2025-07-06 15:26:44,016][06642] Updated weights for policy 0, policy_version 140 (0.0029) +[2025-07-06 15:26:46,397][06149] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3421.4). Total num frames: 581632. Throughput: 0: 900.7. Samples: 144168. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:26:46,398][06149] Avg episode reward: [(0, '4.496')] +[2025-07-06 15:26:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3417.2). Total num frames: 598016. Throughput: 0: 896.5. Samples: 149728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:26:51,401][06149] Avg episode reward: [(0, '4.485')] +[2025-07-06 15:26:55,847][06642] Updated weights for policy 0, policy_version 150 (0.0016) +[2025-07-06 15:26:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3413.3). Total num frames: 614400. Throughput: 0: 873.3. Samples: 151556. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:26:56,398][06149] Avg episode reward: [(0, '4.518')] +[2025-07-06 15:27:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3431.8). Total num frames: 634880. Throughput: 0: 905.6. Samples: 157448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:01,401][06149] Avg episode reward: [(0, '4.547')] +[2025-07-06 15:27:06,398][06149] Fps is (10 sec: 3276.4, 60 sec: 3481.5, 300 sec: 3406.1). Total num frames: 647168. Throughput: 0: 880.2. Samples: 162386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:27:06,400][06149] Avg episode reward: [(0, '4.713')] +[2025-07-06 15:27:08,057][06642] Updated weights for policy 0, policy_version 160 (0.0022) +[2025-07-06 15:27:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.1, 300 sec: 3423.8). Total num frames: 667648. Throughput: 0: 871.8. Samples: 164646. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:11,402][06149] Avg episode reward: [(0, '4.510')] +[2025-07-06 15:27:16,397][06149] Fps is (10 sec: 4096.5, 60 sec: 3618.1, 300 sec: 3440.6). Total num frames: 688128. Throughput: 0: 886.0. Samples: 170610. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:16,400][06149] Avg episode reward: [(0, '4.637')] +[2025-07-06 15:27:16,408][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000168_688128.pth... +[2025-07-06 15:27:18,949][06642] Updated weights for policy 0, policy_version 170 (0.0016) +[2025-07-06 15:27:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3416.7). Total num frames: 700416. Throughput: 0: 854.7. Samples: 175212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:21,400][06149] Avg episode reward: [(0, '4.676')] +[2025-07-06 15:27:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3432.8). Total num frames: 720896. Throughput: 0: 873.2. Samples: 177964. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:27:26,406][06149] Avg episode reward: [(0, '4.622')] +[2025-07-06 15:27:30,327][06642] Updated weights for policy 0, policy_version 180 (0.0013) +[2025-07-06 15:27:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.2). Total num frames: 737280. Throughput: 0: 883.5. Samples: 183926. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:31,401][06149] Avg episode reward: [(0, '4.641')] +[2025-07-06 15:27:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3425.7). Total num frames: 753664. Throughput: 0: 855.2. Samples: 188214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:36,398][06149] Avg episode reward: [(0, '4.579')] +[2025-07-06 15:27:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3440.6). Total num frames: 774144. Throughput: 0: 880.3. Samples: 191168. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:41,402][06149] Avg episode reward: [(0, '4.411')] +[2025-07-06 15:27:42,316][06642] Updated weights for policy 0, policy_version 190 (0.0016) +[2025-07-06 15:27:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3437.1). Total num frames: 790528. Throughput: 0: 879.8. Samples: 197040. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:27:46,402][06149] Avg episode reward: [(0, '4.460')] +[2025-07-06 15:27:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3433.7). Total num frames: 806912. Throughput: 0: 870.6. Samples: 201562. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:51,398][06149] Avg episode reward: [(0, '4.477')] +[2025-07-06 15:27:54,057][06642] Updated weights for policy 0, policy_version 200 (0.0020) +[2025-07-06 15:27:56,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3447.5). Total num frames: 827392. Throughput: 0: 887.2. Samples: 204570. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:27:56,402][06149] Avg episode reward: [(0, '4.472')] +[2025-07-06 15:28:01,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3444.0). Total num frames: 843776. Throughput: 0: 878.1. Samples: 210126. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:01,400][06149] Avg episode reward: [(0, '4.464')] +[2025-07-06 15:28:05,963][06642] Updated weights for policy 0, policy_version 210 (0.0018) +[2025-07-06 15:28:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3440.6). Total num frames: 860160. Throughput: 0: 881.6. Samples: 214886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:06,398][06149] Avg episode reward: [(0, '4.591')] +[2025-07-06 15:28:11,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3453.5). Total num frames: 880640. Throughput: 0: 886.7. Samples: 217864. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:11,398][06149] Avg episode reward: [(0, '4.453')] +[2025-07-06 15:28:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3434.3). Total num frames: 892928. Throughput: 0: 867.3. Samples: 222954. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:16,398][06149] Avg episode reward: [(0, '4.477')] +[2025-07-06 15:28:17,726][06642] Updated weights for policy 0, policy_version 220 (0.0014) +[2025-07-06 15:28:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3446.8). Total num frames: 913408. Throughput: 0: 891.5. Samples: 228332. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:21,402][06149] Avg episode reward: [(0, '4.455')] +[2025-07-06 15:28:26,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3458.8). Total num frames: 933888. Throughput: 0: 892.0. Samples: 231310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:26,401][06149] Avg episode reward: [(0, '4.617')] +[2025-07-06 15:28:29,326][06642] Updated weights for policy 0, policy_version 230 (0.0013) +[2025-07-06 15:28:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3440.6). Total num frames: 946176. Throughput: 0: 864.0. Samples: 235918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:31,401][06149] Avg episode reward: [(0, '4.810')] +[2025-07-06 15:28:31,405][06624] Saving new best policy, reward=4.810! +[2025-07-06 15:28:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3452.3). Total num frames: 966656. Throughput: 0: 891.1. Samples: 241660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:28:36,402][06149] Avg episode reward: [(0, '4.869')] +[2025-07-06 15:28:36,410][06624] Saving new best policy, reward=4.869! +[2025-07-06 15:28:39,912][06642] Updated weights for policy 0, policy_version 240 (0.0018) +[2025-07-06 15:28:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3449.3). Total num frames: 983040. Throughput: 0: 888.5. Samples: 244554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:41,401][06149] Avg episode reward: [(0, '4.911')] +[2025-07-06 15:28:41,402][06624] Saving new best policy, reward=4.911! 
+[2025-07-06 15:28:46,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3446.3). Total num frames: 999424. Throughput: 0: 861.8. Samples: 248906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:46,399][06149] Avg episode reward: [(0, '5.020')] +[2025-07-06 15:28:46,404][06624] Saving new best policy, reward=5.020! +[2025-07-06 15:28:51,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3457.3). Total num frames: 1019904. Throughput: 0: 889.6. Samples: 254920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:51,398][06149] Avg episode reward: [(0, '5.008')] +[2025-07-06 15:28:51,985][06642] Updated weights for policy 0, policy_version 250 (0.0014) +[2025-07-06 15:28:56,399][06149] Fps is (10 sec: 3685.8, 60 sec: 3481.5, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 889.1. Samples: 257874. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:28:56,403][06149] Avg episode reward: [(0, '4.857')] +[2025-07-06 15:29:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1052672. Throughput: 0: 875.5. Samples: 262350. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:29:01,399][06149] Avg episode reward: [(0, '5.042')] +[2025-07-06 15:29:01,403][06624] Saving new best policy, reward=5.042! +[2025-07-06 15:29:03,898][06642] Updated weights for policy 0, policy_version 260 (0.0013) +[2025-07-06 15:29:06,397][06149] Fps is (10 sec: 3687.1, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 1073152. Throughput: 0: 884.5. Samples: 268134. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:29:06,400][06149] Avg episode reward: [(0, '5.197')] +[2025-07-06 15:29:06,407][06624] Saving new best policy, reward=5.197! +[2025-07-06 15:29:11,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1085440. Throughput: 0: 874.6. Samples: 270666. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:11,404][06149] Avg episode reward: [(0, '4.924')] +[2025-07-06 15:29:15,812][06642] Updated weights for policy 0, policy_version 270 (0.0013) +[2025-07-06 15:29:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1105920. Throughput: 0: 878.4. Samples: 275448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:16,401][06149] Avg episode reward: [(0, '4.751')] +[2025-07-06 15:29:16,407][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000270_1105920.pth... +[2025-07-06 15:29:16,496][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000063_258048.pth +[2025-07-06 15:29:21,397][06149] Fps is (10 sec: 4096.2, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1126400. Throughput: 0: 880.0. Samples: 281262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:21,401][06149] Avg episode reward: [(0, '4.775')] +[2025-07-06 15:29:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1138688. Throughput: 0: 864.2. Samples: 283442. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:29:26,398][06149] Avg episode reward: [(0, '4.756')] +[2025-07-06 15:29:27,873][06642] Updated weights for policy 0, policy_version 280 (0.0015) +[2025-07-06 15:29:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1159168. Throughput: 0: 885.1. Samples: 288736. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:31,398][06149] Avg episode reward: [(0, '4.904')] +[2025-07-06 15:29:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1175552. Throughput: 0: 878.3. Samples: 294444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:36,402][06149] Avg episode reward: [(0, '5.267')] +[2025-07-06 15:29:36,415][06624] Saving new best policy, reward=5.267! +[2025-07-06 15:29:39,853][06642] Updated weights for policy 0, policy_version 290 (0.0016) +[2025-07-06 15:29:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1191936. Throughput: 0: 850.9. Samples: 296162. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:41,399][06149] Avg episode reward: [(0, '5.294')] +[2025-07-06 15:29:41,403][06624] Saving new best policy, reward=5.294! +[2025-07-06 15:29:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3554.5). Total num frames: 1212416. Throughput: 0: 875.4. Samples: 301744. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:29:46,402][06149] Avg episode reward: [(0, '5.273')] +[2025-07-06 15:29:51,170][06642] Updated weights for policy 0, policy_version 300 (0.0014) +[2025-07-06 15:29:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3540.6). Total num frames: 1228800. Throughput: 0: 865.3. Samples: 307074. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:29:51,403][06149] Avg episode reward: [(0, '5.302')] +[2025-07-06 15:29:51,405][06624] Saving new best policy, reward=5.302! +[2025-07-06 15:29:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 1245184. Throughput: 0: 853.1. Samples: 309054. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:29:56,404][06149] Avg episode reward: [(0, '5.274')] +[2025-07-06 15:30:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1261568. Throughput: 0: 877.0. Samples: 314914. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:01,398][06149] Avg episode reward: [(0, '5.208')] +[2025-07-06 15:30:02,488][06642] Updated weights for policy 0, policy_version 310 (0.0013) +[2025-07-06 15:30:06,402][06149] Fps is (10 sec: 3275.0, 60 sec: 3413.0, 300 sec: 3512.8). Total num frames: 1277952. Throughput: 0: 853.4. Samples: 319668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:06,404][06149] Avg episode reward: [(0, '5.409')] +[2025-07-06 15:30:06,411][06624] Saving new best policy, reward=5.409! +[2025-07-06 15:30:11,401][06149] Fps is (10 sec: 3684.8, 60 sec: 3549.6, 300 sec: 3526.7). Total num frames: 1298432. Throughput: 0: 857.2. Samples: 322018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:11,403][06149] Avg episode reward: [(0, '5.931')] +[2025-07-06 15:30:11,415][06624] Saving new best policy, reward=5.931! +[2025-07-06 15:30:14,653][06642] Updated weights for policy 0, policy_version 320 (0.0013) +[2025-07-06 15:30:16,397][06149] Fps is (10 sec: 3688.4, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1314816. Throughput: 0: 869.9. Samples: 327882. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:16,398][06149] Avg episode reward: [(0, '6.280')] +[2025-07-06 15:30:16,406][06624] Saving new best policy, reward=6.280! +[2025-07-06 15:30:21,399][06149] Fps is (10 sec: 2867.8, 60 sec: 3344.9, 300 sec: 3485.0). Total num frames: 1327104. Throughput: 0: 840.2. Samples: 332254. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:21,401][06149] Avg episode reward: [(0, '6.350')] +[2025-07-06 15:30:21,405][06624] Saving new best policy, reward=6.350! +[2025-07-06 15:30:26,399][06149] Fps is (10 sec: 3276.2, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 1347584. Throughput: 0: 865.6. Samples: 335116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:26,400][06149] Avg episode reward: [(0, '6.030')] +[2025-07-06 15:30:26,771][06642] Updated weights for policy 0, policy_version 330 (0.0015) +[2025-07-06 15:30:31,397][06149] Fps is (10 sec: 4097.0, 60 sec: 3481.6, 300 sec: 3512.9). Total num frames: 1368064. Throughput: 0: 874.0. Samples: 341074. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:31,402][06149] Avg episode reward: [(0, '6.281')] +[2025-07-06 15:30:36,397][06149] Fps is (10 sec: 3277.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 1380352. Throughput: 0: 851.1. Samples: 345372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:36,398][06149] Avg episode reward: [(0, '6.289')] +[2025-07-06 15:30:38,726][06642] Updated weights for policy 0, policy_version 340 (0.0013) +[2025-07-06 15:30:41,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1400832. Throughput: 0: 873.6. Samples: 348366. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:41,399][06149] Avg episode reward: [(0, '6.518')] +[2025-07-06 15:30:41,400][06624] Saving new best policy, reward=6.518! +[2025-07-06 15:30:46,398][06149] Fps is (10 sec: 3685.9, 60 sec: 3413.3, 300 sec: 3498.9). Total num frames: 1417216. Throughput: 0: 874.5. Samples: 354266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:30:46,400][06149] Avg episode reward: [(0, '6.584')] +[2025-07-06 15:30:46,409][06624] Saving new best policy, reward=6.584! +[2025-07-06 15:30:50,669][06642] Updated weights for policy 0, policy_version 350 (0.0019) +[2025-07-06 15:30:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 1433600. Throughput: 0: 865.9. Samples: 358630. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:51,402][06149] Avg episode reward: [(0, '7.047')] +[2025-07-06 15:30:51,406][06624] Saving new best policy, reward=7.047! +[2025-07-06 15:30:56,397][06149] Fps is (10 sec: 3686.8, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 1454080. Throughput: 0: 878.3. Samples: 361540. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:30:56,399][06149] Avg episode reward: [(0, '7.382')] +[2025-07-06 15:30:56,408][06624] Saving new best policy, reward=7.382! +[2025-07-06 15:31:01,398][06149] Fps is (10 sec: 3686.0, 60 sec: 3481.5, 300 sec: 3498.9). Total num frames: 1470464. Throughput: 0: 868.5. Samples: 366964. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:01,400][06149] Avg episode reward: [(0, '7.177')] +[2025-07-06 15:31:02,778][06642] Updated weights for policy 0, policy_version 360 (0.0013) +[2025-07-06 15:31:06,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.9, 300 sec: 3499.0). Total num frames: 1486848. Throughput: 0: 879.4. Samples: 371826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:06,402][06149] Avg episode reward: [(0, '7.175')] +[2025-07-06 15:31:11,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3481.8, 300 sec: 3512.8). Total num frames: 1507328. Throughput: 0: 887.5. Samples: 375054. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:31:11,402][06149] Avg episode reward: [(0, '7.724')] +[2025-07-06 15:31:11,404][06624] Saving new best policy, reward=7.724! +[2025-07-06 15:31:12,467][06642] Updated weights for policy 0, policy_version 370 (0.0014) +[2025-07-06 15:31:16,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 1523712. Throughput: 0: 876.0. Samples: 380492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:16,400][06149] Avg episode reward: [(0, '7.615')] +[2025-07-06 15:31:16,416][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000372_1523712.pth... +[2025-07-06 15:31:16,506][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000168_688128.pth +[2025-07-06 15:31:21,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3512.8). Total num frames: 1544192. Throughput: 0: 907.9. Samples: 386228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:21,399][06149] Avg episode reward: [(0, '7.588')] +[2025-07-06 15:31:23,614][06642] Updated weights for policy 0, policy_version 380 (0.0013) +[2025-07-06 15:31:26,403][06149] Fps is (10 sec: 4093.4, 60 sec: 3617.9, 300 sec: 3512.8). Total num frames: 1564672. Throughput: 0: 911.1. Samples: 389372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:26,404][06149] Avg episode reward: [(0, '7.407')] +[2025-07-06 15:31:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1581056. Throughput: 0: 891.1. Samples: 394366. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:31,398][06149] Avg episode reward: [(0, '8.078')] +[2025-07-06 15:31:31,403][06624] Saving new best policy, reward=8.078! +[2025-07-06 15:31:34,781][06642] Updated weights for policy 0, policy_version 390 (0.0020) +[2025-07-06 15:31:36,397][06149] Fps is (10 sec: 3688.8, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 1601536. Throughput: 0: 930.5. Samples: 400504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:31:36,404][06149] Avg episode reward: [(0, '7.921')] +[2025-07-06 15:31:41,400][06149] Fps is (10 sec: 4094.6, 60 sec: 3686.2, 300 sec: 3526.7). Total num frames: 1622016. Throughput: 0: 935.7. Samples: 403650. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:31:41,404][06149] Avg episode reward: [(0, '7.681')] +[2025-07-06 15:31:45,708][06642] Updated weights for policy 0, policy_version 400 (0.0018) +[2025-07-06 15:31:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3526.7). Total num frames: 1638400. Throughput: 0: 924.2. Samples: 408550. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:31:46,405][06149] Avg episode reward: [(0, '7.818')] +[2025-07-06 15:31:51,397][06149] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3540.6). Total num frames: 1658880. Throughput: 0: 954.8. Samples: 414794. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:31:51,398][06149] Avg episode reward: [(0, '8.791')] +[2025-07-06 15:31:51,400][06624] Saving new best policy, reward=8.791! +[2025-07-06 15:31:56,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3526.7). Total num frames: 1675264. Throughput: 0: 947.2. Samples: 417680. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:31:56,399][06149] Avg episode reward: [(0, '8.891')] +[2025-07-06 15:31:56,418][06624] Saving new best policy, reward=8.891! 
+[2025-07-06 15:31:57,718][06642] Updated weights for policy 0, policy_version 410 (0.0015) +[2025-07-06 15:32:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3540.6). Total num frames: 1691648. Throughput: 0: 923.2. Samples: 422038. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:01,398][06149] Avg episode reward: [(0, '9.019')] +[2025-07-06 15:32:01,402][06624] Saving new best policy, reward=9.019! +[2025-07-06 15:32:06,397][06149] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3540.6). Total num frames: 1712128. Throughput: 0: 922.5. Samples: 427742. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:06,398][06149] Avg episode reward: [(0, '8.566')] +[2025-07-06 15:32:08,735][06642] Updated weights for policy 0, policy_version 420 (0.0015) +[2025-07-06 15:32:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3512.8). Total num frames: 1724416. Throughput: 0: 909.2. Samples: 430282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:11,398][06149] Avg episode reward: [(0, '9.273')] +[2025-07-06 15:32:11,404][06624] Saving new best policy, reward=9.273! +[2025-07-06 15:32:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1744896. Throughput: 0: 905.8. Samples: 435126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:16,401][06149] Avg episode reward: [(0, '9.650')] +[2025-07-06 15:32:16,414][06624] Saving new best policy, reward=9.650! +[2025-07-06 15:32:20,113][06642] Updated weights for policy 0, policy_version 430 (0.0016) +[2025-07-06 15:32:21,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3540.6). Total num frames: 1765376. Throughput: 0: 899.2. Samples: 440966. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:32:21,402][06149] Avg episode reward: [(0, '9.893')] +[2025-07-06 15:32:21,403][06624] Saving new best policy, reward=9.893! +[2025-07-06 15:32:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3526.7). Total num frames: 1777664. Throughput: 0: 873.7. Samples: 442964. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:32:26,398][06149] Avg episode reward: [(0, '9.809')] +[2025-07-06 15:32:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3540.6). Total num frames: 1798144. Throughput: 0: 882.2. Samples: 448248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:31,399][06149] Avg episode reward: [(0, '10.684')] +[2025-07-06 15:32:31,405][06624] Saving new best policy, reward=10.684! +[2025-07-06 15:32:32,288][06642] Updated weights for policy 0, policy_version 440 (0.0022) +[2025-07-06 15:32:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3526.7). Total num frames: 1814528. Throughput: 0: 868.5. Samples: 453876. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:32:36,400][06149] Avg episode reward: [(0, '11.405')] +[2025-07-06 15:32:36,412][06624] Saving new best policy, reward=11.405! +[2025-07-06 15:32:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.8, 300 sec: 3526.7). Total num frames: 1830912. Throughput: 0: 841.7. Samples: 455554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:32:41,398][06149] Avg episode reward: [(0, '12.262')] +[2025-07-06 15:32:41,409][06624] Saving new best policy, reward=12.262! +[2025-07-06 15:32:44,002][06642] Updated weights for policy 0, policy_version 450 (0.0013) +[2025-07-06 15:32:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1851392. 
Throughput: 0: 881.2. Samples: 461690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:46,398][06149] Avg episode reward: [(0, '12.095')] +[2025-07-06 15:32:51,397][06149] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1867776. Throughput: 0: 870.9. Samples: 466932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:32:51,401][06149] Avg episode reward: [(0, '12.639')] +[2025-07-06 15:32:51,406][06624] Saving new best policy, reward=12.639! +[2025-07-06 15:32:55,883][06642] Updated weights for policy 0, policy_version 460 (0.0012) +[2025-07-06 15:32:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1884160. Throughput: 0: 862.6. Samples: 469098. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:32:56,408][06149] Avg episode reward: [(0, '11.940')] +[2025-07-06 15:33:01,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1904640. Throughput: 0: 886.3. Samples: 475008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:01,404][06149] Avg episode reward: [(0, '12.313')] +[2025-07-06 15:33:06,400][06149] Fps is (10 sec: 3275.7, 60 sec: 3413.1, 300 sec: 3512.8). Total num frames: 1916928. Throughput: 0: 857.9. Samples: 479574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:06,402][06149] Avg episode reward: [(0, '13.217')] +[2025-07-06 15:33:06,411][06624] Saving new best policy, reward=13.217! +[2025-07-06 15:33:08,153][06642] Updated weights for policy 0, policy_version 470 (0.0014) +[2025-07-06 15:33:11,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 1937408. Throughput: 0: 869.4. Samples: 482088. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:33:11,401][06149] Avg episode reward: [(0, '13.710')] +[2025-07-06 15:33:11,405][06624] Saving new best policy, reward=13.710! +[2025-07-06 15:33:16,397][06149] Fps is (10 sec: 3687.6, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 1953792. Throughput: 0: 882.7. Samples: 487970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:16,401][06149] Avg episode reward: [(0, '13.624')] +[2025-07-06 15:33:16,410][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth... +[2025-07-06 15:33:16,499][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000270_1105920.pth +[2025-07-06 15:33:19,676][06642] Updated weights for policy 0, policy_version 480 (0.0013) +[2025-07-06 15:33:21,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 1970176. Throughput: 0: 856.5. Samples: 492418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:33:21,399][06149] Avg episode reward: [(0, '13.734')] +[2025-07-06 15:33:21,400][06624] Saving new best policy, reward=13.734! +[2025-07-06 15:33:26,405][06149] Fps is (10 sec: 3683.4, 60 sec: 3549.4, 300 sec: 3540.5). Total num frames: 1990656. Throughput: 0: 881.8. Samples: 495242. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:26,411][06149] Avg episode reward: [(0, '12.334')] +[2025-07-06 15:33:30,391][06642] Updated weights for policy 0, policy_version 490 (0.0013) +[2025-07-06 15:33:31,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2007040. Throughput: 0: 877.1. Samples: 501160. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:31,401][06149] Avg episode reward: [(0, '11.395')] +[2025-07-06 15:33:36,397][06149] Fps is (10 sec: 3279.5, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2023424. Throughput: 0: 857.0. Samples: 505496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:36,401][06149] Avg episode reward: [(0, '10.857')] +[2025-07-06 15:33:41,397][06149] Fps is (10 sec: 3686.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2043904. Throughput: 0: 875.3. Samples: 508486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:41,398][06149] Avg episode reward: [(0, '11.545')] +[2025-07-06 15:33:42,408][06642] Updated weights for policy 0, policy_version 500 (0.0017) +[2025-07-06 15:33:46,400][06149] Fps is (10 sec: 3685.2, 60 sec: 3481.4, 300 sec: 3526.7). Total num frames: 2060288. Throughput: 0: 872.5. Samples: 514272. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:46,404][06149] Avg episode reward: [(0, '12.983')] +[2025-07-06 15:33:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2076672. Throughput: 0: 873.0. Samples: 518856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:33:51,402][06149] Avg episode reward: [(0, '14.091')] +[2025-07-06 15:33:51,406][06624] Saving new best policy, reward=14.091! +[2025-07-06 15:33:54,378][06642] Updated weights for policy 0, policy_version 510 (0.0013) +[2025-07-06 15:33:56,397][06149] Fps is (10 sec: 3687.6, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2097152. Throughput: 0: 880.7. Samples: 521720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:33:56,398][06149] Avg episode reward: [(0, '15.927')] +[2025-07-06 15:33:56,413][06624] Saving new best policy, reward=15.927! +[2025-07-06 15:34:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2109440. Throughput: 0: 866.2. Samples: 526950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:01,399][06149] Avg episode reward: [(0, '16.477')] +[2025-07-06 15:34:01,403][06624] Saving new best policy, reward=16.477! +[2025-07-06 15:34:06,397][06149] Fps is (10 sec: 2867.2, 60 sec: 3481.8, 300 sec: 3526.7). Total num frames: 2125824. Throughput: 0: 872.0. Samples: 531658. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:06,398][06149] Avg episode reward: [(0, '16.075')] +[2025-07-06 15:34:06,653][06642] Updated weights for policy 0, policy_version 520 (0.0014) +[2025-07-06 15:34:11,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2146304. Throughput: 0: 876.0. Samples: 534656. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:11,398][06149] Avg episode reward: [(0, '17.033')] +[2025-07-06 15:34:11,405][06624] Saving new best policy, reward=17.033! +[2025-07-06 15:34:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 2158592. Throughput: 0: 853.1. Samples: 539548. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:16,398][06149] Avg episode reward: [(0, '17.473')] +[2025-07-06 15:34:16,406][06624] Saving new best policy, reward=17.473! +[2025-07-06 15:34:18,581][06642] Updated weights for policy 0, policy_version 530 (0.0012) +[2025-07-06 15:34:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2179072. Throughput: 0: 877.4. Samples: 544978. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:34:21,400][06149] Avg episode reward: [(0, '16.710')] +[2025-07-06 15:34:26,398][06149] Fps is (10 sec: 4095.4, 60 sec: 3482.0, 300 sec: 3526.7). Total num frames: 2199552. Throughput: 0: 876.6. Samples: 547934. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:34:26,401][06149] Avg episode reward: [(0, '15.698')] +[2025-07-06 15:34:30,343][06642] Updated weights for policy 0, policy_version 540 (0.0013) +[2025-07-06 15:34:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2215936. Throughput: 0: 846.5. Samples: 552362. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:31,398][06149] Avg episode reward: [(0, '15.669')] +[2025-07-06 15:34:36,397][06149] Fps is (10 sec: 3277.2, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2232320. Throughput: 0: 874.8. Samples: 558224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:34:36,398][06149] Avg episode reward: [(0, '14.107')] +[2025-07-06 15:34:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2248704. Throughput: 0: 877.6. Samples: 561212. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:34:41,402][06149] Avg episode reward: [(0, '13.492')] +[2025-07-06 15:34:41,660][06642] Updated weights for policy 0, policy_version 550 (0.0013) +[2025-07-06 15:34:46,399][06149] Fps is (10 sec: 3276.1, 60 sec: 3413.4, 300 sec: 3512.8). Total num frames: 2265088. Throughput: 0: 856.5. Samples: 565494. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:34:46,404][06149] Avg episode reward: [(0, '15.704')] +[2025-07-06 15:34:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2285568. Throughput: 0: 883.0. Samples: 571392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:34:51,404][06149] Avg episode reward: [(0, '16.580')] +[2025-07-06 15:34:52,915][06642] Updated weights for policy 0, policy_version 560 (0.0013) +[2025-07-06 15:34:56,398][06149] Fps is (10 sec: 3686.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 2301952. Throughput: 0: 880.7. Samples: 574288. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:34:56,402][06149] Avg episode reward: [(0, '17.341')] +[2025-07-06 15:35:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3526.8). Total num frames: 2318336. Throughput: 0: 867.7. Samples: 578594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:01,402][06149] Avg episode reward: [(0, '19.349')] +[2025-07-06 15:35:01,407][06624] Saving new best policy, reward=19.349! +[2025-07-06 15:35:05,242][06642] Updated weights for policy 0, policy_version 570 (0.0021) +[2025-07-06 15:35:06,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3526.8). Total num frames: 2338816. Throughput: 0: 873.2. Samples: 584274. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:06,405][06149] Avg episode reward: [(0, '20.379')] +[2025-07-06 15:35:06,414][06624] Saving new best policy, reward=20.379! +[2025-07-06 15:35:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2351104. Throughput: 0: 862.4. Samples: 586740. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:11,402][06149] Avg episode reward: [(0, '20.166')] +[2025-07-06 15:35:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3540.6). Total num frames: 2371584. Throughput: 0: 869.6. Samples: 591496. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:16,398][06149] Avg episode reward: [(0, '19.702')] +[2025-07-06 15:35:16,405][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000579_2371584.pth... +[2025-07-06 15:35:16,503][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000372_1523712.pth +[2025-07-06 15:35:17,369][06642] Updated weights for policy 0, policy_version 580 (0.0013) +[2025-07-06 15:35:21,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2387968. Throughput: 0: 869.4. Samples: 597348. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:21,398][06149] Avg episode reward: [(0, '21.154')] +[2025-07-06 15:35:21,458][06624] Saving new best policy, reward=21.154! +[2025-07-06 15:35:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3512.8). Total num frames: 2404352. Throughput: 0: 849.0. Samples: 599418. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:26,398][06149] Avg episode reward: [(0, '21.333')] +[2025-07-06 15:35:26,406][06624] Saving new best policy, reward=21.333! +[2025-07-06 15:35:29,418][06642] Updated weights for policy 0, policy_version 590 (0.0026) +[2025-07-06 15:35:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3526.7). Total num frames: 2420736. Throughput: 0: 866.9. Samples: 604502. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:31,398][06149] Avg episode reward: [(0, '21.182')] +[2025-07-06 15:35:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2441216. Throughput: 0: 867.1. Samples: 610412. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:36,398][06149] Avg episode reward: [(0, '20.737')] +[2025-07-06 15:35:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3512.9). Total num frames: 2453504. Throughput: 0: 841.4. Samples: 612150. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:41,398][06149] Avg episode reward: [(0, '19.511')] +[2025-07-06 15:35:41,454][06642] Updated weights for policy 0, policy_version 600 (0.0015) +[2025-07-06 15:35:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3526.7). Total num frames: 2473984. Throughput: 0: 869.5. Samples: 617720. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:35:46,402][06149] Avg episode reward: [(0, '20.273')] +[2025-07-06 15:35:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3512.8). Total num frames: 2490368. Throughput: 0: 860.7. Samples: 623004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:51,400][06149] Avg episode reward: [(0, '18.654')] +[2025-07-06 15:35:53,503][06642] Updated weights for policy 0, policy_version 610 (0.0013) +[2025-07-06 15:35:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3512.9). Total num frames: 2506752. Throughput: 0: 846.9. Samples: 624852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:35:56,399][06149] Avg episode reward: [(0, '18.254')] +[2025-07-06 15:36:01,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3526.7). Total num frames: 2527232. Throughput: 0: 871.7. Samples: 630724. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:36:01,400][06149] Avg episode reward: [(0, '17.656')] +[2025-07-06 15:36:04,669][06642] Updated weights for policy 0, policy_version 620 (0.0013) +[2025-07-06 15:36:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3499.0). 
Total num frames: 2539520. Throughput: 0: 848.8. Samples: 635546. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:06,398][06149] Avg episode reward: [(0, '17.461')] +[2025-07-06 15:36:11,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3512.8). Total num frames: 2560000. Throughput: 0: 851.1. Samples: 637718. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:11,401][06149] Avg episode reward: [(0, '17.178')] +[2025-07-06 15:36:16,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 2576384. Throughput: 0: 868.3. Samples: 643576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:16,398][06149] Avg episode reward: [(0, '18.791')] +[2025-07-06 15:36:16,424][06642] Updated weights for policy 0, policy_version 630 (0.0013) +[2025-07-06 15:36:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 2592768. Throughput: 0: 839.3. Samples: 648180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:21,401][06149] Avg episode reward: [(0, '18.437')] +[2025-07-06 15:36:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2613248. Throughput: 0: 858.9. Samples: 650802. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:26,402][06149] Avg episode reward: [(0, '19.433')] +[2025-07-06 15:36:28,409][06642] Updated weights for policy 0, policy_version 640 (0.0020) +[2025-07-06 15:36:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2629632. Throughput: 0: 863.9. Samples: 656596. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:36:31,398][06149] Avg episode reward: [(0, '20.972')] +[2025-07-06 15:36:36,397][06149] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3457.3). Total num frames: 2641920. Throughput: 0: 842.5. Samples: 660916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:36,401][06149] Avg episode reward: [(0, '21.912')] +[2025-07-06 15:36:36,473][06624] Saving new best policy, reward=21.912! +[2025-07-06 15:36:40,623][06642] Updated weights for policy 0, policy_version 650 (0.0013) +[2025-07-06 15:36:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2662400. Throughput: 0: 864.4. Samples: 663748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:41,400][06149] Avg episode reward: [(0, '21.892')] +[2025-07-06 15:36:46,398][06149] Fps is (10 sec: 3685.8, 60 sec: 3413.2, 300 sec: 3457.3). Total num frames: 2678784. Throughput: 0: 862.9. Samples: 669554. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:36:46,400][06149] Avg episode reward: [(0, '22.742')] +[2025-07-06 15:36:46,415][06624] Saving new best policy, reward=22.742! +[2025-07-06 15:36:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2695168. Throughput: 0: 851.7. Samples: 673872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:36:51,398][06149] Avg episode reward: [(0, '23.426')] +[2025-07-06 15:36:51,406][06624] Saving new best policy, reward=23.426! +[2025-07-06 15:36:52,842][06642] Updated weights for policy 0, policy_version 660 (0.0013) +[2025-07-06 15:36:56,397][06149] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2715648. Throughput: 0: 866.3. Samples: 676702. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:36:56,406][06149] Avg episode reward: [(0, '21.773')] +[2025-07-06 15:37:01,398][06149] Fps is (10 sec: 3685.9, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2732032. Throughput: 0: 859.3. Samples: 682246. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:01,402][06149] Avg episode reward: [(0, '18.514')] +[2025-07-06 15:37:05,171][06642] Updated weights for policy 0, policy_version 670 (0.0014) +[2025-07-06 15:37:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2748416. Throughput: 0: 859.4. Samples: 686854. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:06,398][06149] Avg episode reward: [(0, '16.956')] +[2025-07-06 15:37:11,397][06149] Fps is (10 sec: 3687.0, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2768896. Throughput: 0: 865.2. Samples: 689736. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:11,398][06149] Avg episode reward: [(0, '15.704')] +[2025-07-06 15:37:16,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 2781184. Throughput: 0: 858.8. Samples: 695242. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:16,402][06149] Avg episode reward: [(0, '14.781')] +[2025-07-06 15:37:16,410][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000679_2781184.pth... +[2025-07-06 15:37:16,502][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000478_1957888.pth +[2025-07-06 15:37:16,670][06642] Updated weights for policy 0, policy_version 680 (0.0016) +[2025-07-06 15:37:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2801664. Throughput: 0: 884.6. Samples: 700722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:37:21,407][06149] Avg episode reward: [(0, '15.211')] +[2025-07-06 15:37:26,397][06149] Fps is (10 sec: 4096.1, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2822144. Throughput: 0: 889.3. Samples: 703768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:26,400][06149] Avg episode reward: [(0, '16.993')] +[2025-07-06 15:37:26,863][06642] Updated weights for policy 0, policy_version 690 (0.0015) +[2025-07-06 15:37:31,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2838528. Throughput: 0: 868.4. Samples: 708632. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:31,400][06149] Avg episode reward: [(0, '18.106')] +[2025-07-06 15:37:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2859008. Throughput: 0: 904.1. Samples: 714556. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:36,398][06149] Avg episode reward: [(0, '19.567')] +[2025-07-06 15:37:38,219][06642] Updated weights for policy 0, policy_version 700 (0.0012) +[2025-07-06 15:37:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2875392. Throughput: 0: 905.1. Samples: 717432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:41,404][06149] Avg episode reward: [(0, '20.779')] +[2025-07-06 15:37:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3471.2). Total num frames: 2891776. Throughput: 0: 876.3. Samples: 721680. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 15:37:46,398][06149] Avg episode reward: [(0, '22.450')] +[2025-07-06 15:37:50,407][06642] Updated weights for policy 0, policy_version 710 (0.0013) +[2025-07-06 15:37:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2908160. Throughput: 0: 904.5. Samples: 727558. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:37:51,398][06149] Avg episode reward: [(0, '23.796')] +[2025-07-06 15:37:51,405][06624] Saving new best policy, reward=23.796! +[2025-07-06 15:37:56,398][06149] Fps is (10 sec: 3276.3, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 2924544. Throughput: 0: 904.8. Samples: 730452. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:37:56,402][06149] Avg episode reward: [(0, '23.480')] +[2025-07-06 15:38:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 2940928. Throughput: 0: 879.4. Samples: 734816. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:01,401][06149] Avg episode reward: [(0, '23.883')] +[2025-07-06 15:38:01,458][06624] Saving new best policy, reward=23.883! +[2025-07-06 15:38:02,495][06642] Updated weights for policy 0, policy_version 720 (0.0017) +[2025-07-06 15:38:06,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2961408. Throughput: 0: 884.3. Samples: 740516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:06,398][06149] Avg episode reward: [(0, '23.087')] +[2025-07-06 15:38:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 2973696. Throughput: 0: 871.9. Samples: 743002. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:11,400][06149] Avg episode reward: [(0, '22.461')] +[2025-07-06 15:38:14,608][06642] Updated weights for policy 0, policy_version 730 (0.0014) +[2025-07-06 15:38:16,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2994176. Throughput: 0: 867.8. Samples: 747684. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:16,398][06149] Avg episode reward: [(0, '21.936')] +[2025-07-06 15:38:21,397][06149] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3471.3). Total num frames: 3014656. Throughput: 0: 867.1. Samples: 753574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:21,401][06149] Avg episode reward: [(0, '22.305')] +[2025-07-06 15:38:26,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3026944. Throughput: 0: 850.0. Samples: 755682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:26,401][06149] Avg episode reward: [(0, '21.173')] +[2025-07-06 15:38:26,600][06642] Updated weights for policy 0, policy_version 740 (0.0013) +[2025-07-06 15:38:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3047424. Throughput: 0: 871.8. Samples: 760910. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:31,402][06149] Avg episode reward: [(0, '20.386')] +[2025-07-06 15:38:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3063808. Throughput: 0: 868.4. Samples: 766636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:36,402][06149] Avg episode reward: [(0, '19.737')] +[2025-07-06 15:38:38,133][06642] Updated weights for policy 0, policy_version 750 (0.0020) +[2025-07-06 15:38:41,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). 
Total num frames: 3080192. Throughput: 0: 842.5. Samples: 768362. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:38:41,405][06149] Avg episode reward: [(0, '20.756')] +[2025-07-06 15:38:46,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3100672. Throughput: 0: 869.8. Samples: 773958. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:38:46,402][06149] Avg episode reward: [(0, '20.566')] +[2025-07-06 15:38:49,326][06642] Updated weights for policy 0, policy_version 760 (0.0018) +[2025-07-06 15:38:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3117056. Throughput: 0: 861.0. Samples: 779262. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:38:51,398][06149] Avg episode reward: [(0, '20.534')] +[2025-07-06 15:38:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3471.2). Total num frames: 3133440. Throughput: 0: 849.8. Samples: 781242. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:38:56,403][06149] Avg episode reward: [(0, '20.876')] +[2025-07-06 15:39:01,268][06642] Updated weights for policy 0, policy_version 770 (0.0018) +[2025-07-06 15:39:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3153920. Throughput: 0: 876.1. Samples: 787108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:01,398][06149] Avg episode reward: [(0, '22.123')] +[2025-07-06 15:39:06,399][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3166208. Throughput: 0: 850.4. Samples: 791840. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:06,401][06149] Avg episode reward: [(0, '22.937')] +[2025-07-06 15:39:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3186688. Throughput: 0: 857.8. Samples: 794284. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:11,406][06149] Avg episode reward: [(0, '21.314')] +[2025-07-06 15:39:13,328][06642] Updated weights for policy 0, policy_version 780 (0.0017) +[2025-07-06 15:39:16,397][06149] Fps is (10 sec: 3686.3, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3203072. Throughput: 0: 869.6. Samples: 800044. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:39:16,402][06149] Avg episode reward: [(0, '21.475')] +[2025-07-06 15:39:16,412][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth... +[2025-07-06 15:39:16,516][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000579_2371584.pth +[2025-07-06 15:39:21,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3219456. Throughput: 0: 839.0. Samples: 804390. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:39:21,401][06149] Avg episode reward: [(0, '20.084')] +[2025-07-06 15:39:25,528][06642] Updated weights for policy 0, policy_version 790 (0.0014) +[2025-07-06 15:39:26,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 3235840. Throughput: 0: 863.8. Samples: 807232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:26,402][06149] Avg episode reward: [(0, '18.628')] +[2025-07-06 15:39:31,397][06149] Fps is (10 sec: 3686.2, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3256320. Throughput: 0: 870.3. Samples: 813124. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:39:31,398][06149] Avg episode reward: [(0, '17.965')] +[2025-07-06 15:39:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3272704. Throughput: 0: 850.3. Samples: 817526. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:39:36,402][06149] Avg episode reward: [(0, '18.332')] +[2025-07-06 15:39:37,387][06642] Updated weights for policy 0, policy_version 800 (0.0016) +[2025-07-06 15:39:41,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3289088. Throughput: 0: 871.4. Samples: 820456. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:41,398][06149] Avg episode reward: [(0, '19.101')] +[2025-07-06 15:39:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3305472. Throughput: 0: 869.3. Samples: 826226. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:39:46,398][06149] Avg episode reward: [(0, '20.238')] +[2025-07-06 15:39:49,654][06642] Updated weights for policy 0, policy_version 810 (0.0017) +[2025-07-06 15:39:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3321856. Throughput: 0: 862.7. Samples: 830660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:39:51,399][06149] Avg episode reward: [(0, '22.122')] +[2025-07-06 15:39:56,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3342336. Throughput: 0: 874.2. Samples: 833622. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:39:56,399][06149] Avg episode reward: [(0, '23.447')] +[2025-07-06 15:40:01,185][06642] Updated weights for policy 0, policy_version 820 (0.0013) +[2025-07-06 15:40:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3358720. Throughput: 0: 863.5. Samples: 838902. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:40:01,398][06149] Avg episode reward: [(0, '25.121')] +[2025-07-06 15:40:01,403][06624] Saving new best policy, reward=25.121! +[2025-07-06 15:40:06,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3375104. Throughput: 0: 872.9. Samples: 843672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:06,404][06149] Avg episode reward: [(0, '25.532')] +[2025-07-06 15:40:06,412][06624] Saving new best policy, reward=25.532! +[2025-07-06 15:40:11,402][06149] Fps is (10 sec: 3684.4, 60 sec: 3481.3, 300 sec: 3471.1). Total num frames: 3395584. Throughput: 0: 874.3. Samples: 846578. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:40:11,408][06149] Avg episode reward: [(0, '25.620')] +[2025-07-06 15:40:11,414][06624] Saving new best policy, reward=25.620! +[2025-07-06 15:40:12,654][06642] Updated weights for policy 0, policy_version 830 (0.0014) +[2025-07-06 15:40:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3407872. Throughput: 0: 849.6. Samples: 851354. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:16,402][06149] Avg episode reward: [(0, '24.069')] +[2025-07-06 15:40:21,397][06149] Fps is (10 sec: 3278.6, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3428352. Throughput: 0: 871.9. Samples: 856762. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:21,402][06149] Avg episode reward: [(0, '22.612')] +[2025-07-06 15:40:24,366][06642] Updated weights for policy 0, policy_version 840 (0.0018) +[2025-07-06 15:40:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3444736. Throughput: 0: 871.7. Samples: 859684. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:26,401][06149] Avg episode reward: [(0, '20.331')] +[2025-07-06 15:40:31,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.4, 300 sec: 3457.3). Total num frames: 3461120. Throughput: 0: 842.2. Samples: 864126. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:31,402][06149] Avg episode reward: [(0, '19.632')] +[2025-07-06 15:40:36,307][06642] Updated weights for policy 0, policy_version 850 (0.0013) +[2025-07-06 15:40:36,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 3481600. Throughput: 0: 871.7. Samples: 869888. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:40:36,402][06149] Avg episode reward: [(0, '20.379')] +[2025-07-06 15:40:41,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3497984. Throughput: 0: 871.9. Samples: 872858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:41,404][06149] Avg episode reward: [(0, '18.751')] +[2025-07-06 15:40:46,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3514368. Throughput: 0: 849.4. Samples: 877124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:46,400][06149] Avg episode reward: [(0, '20.638')] +[2025-07-06 15:40:48,460][06642] Updated weights for policy 0, policy_version 860 (0.0013) +[2025-07-06 15:40:51,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3530752. Throughput: 0: 873.9. Samples: 882996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:51,400][06149] Avg episode reward: [(0, '20.907')] +[2025-07-06 15:40:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3547136. Throughput: 0: 875.6. Samples: 885974. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:40:56,401][06149] Avg episode reward: [(0, '21.966')] +[2025-07-06 15:41:00,401][06642] Updated weights for policy 0, policy_version 870 (0.0013) +[2025-07-06 15:41:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3563520. Throughput: 0: 864.9. Samples: 890276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:01,403][06149] Avg episode reward: [(0, '20.715')] +[2025-07-06 15:41:06,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3584000. Throughput: 0: 873.6. Samples: 896076. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:06,398][06149] Avg episode reward: [(0, '20.278')] +[2025-07-06 15:41:11,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3345.4, 300 sec: 3457.3). Total num frames: 3596288. Throughput: 0: 865.2. Samples: 898616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:11,398][06149] Avg episode reward: [(0, '20.415')] +[2025-07-06 15:41:12,580][06642] Updated weights for policy 0, policy_version 880 (0.0013) +[2025-07-06 15:41:16,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3616768. Throughput: 0: 871.8. Samples: 903358. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:16,405][06149] Avg episode reward: [(0, '21.287')] +[2025-07-06 15:41:16,414][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000883_3616768.pth... +[2025-07-06 15:41:16,511][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000679_2781184.pth +[2025-07-06 15:41:21,401][06149] Fps is (10 sec: 4094.2, 60 sec: 3481.3, 300 sec: 3471.1). Total num frames: 3637248. Throughput: 0: 872.2. Samples: 909140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:21,402][06149] Avg episode reward: [(0, '22.518')] +[2025-07-06 15:41:24,215][06642] Updated weights for policy 0, policy_version 890 (0.0013) +[2025-07-06 15:41:26,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3649536. Throughput: 0: 851.8. Samples: 911190. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:26,398][06149] Avg episode reward: [(0, '23.306')] +[2025-07-06 15:41:31,397][06149] Fps is (10 sec: 3278.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 3670016. Throughput: 0: 873.1. Samples: 916416. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:41:31,399][06149] Avg episode reward: [(0, '25.661')] +[2025-07-06 15:41:31,400][06624] Saving new best policy, reward=25.661! +[2025-07-06 15:41:35,094][06642] Updated weights for policy 0, policy_version 900 (0.0020) +[2025-07-06 15:41:36,399][06149] Fps is (10 sec: 3685.7, 60 sec: 3413.2, 300 sec: 3471.2). Total num frames: 3686400. Throughput: 0: 867.5. Samples: 922036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:36,411][06149] Avg episode reward: [(0, '25.785')] +[2025-07-06 15:41:36,421][06624] Saving new best policy, reward=25.785! +[2025-07-06 15:41:41,397][06149] Fps is (10 sec: 3276.9, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3702784. Throughput: 0: 839.7. Samples: 923762. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:41,403][06149] Avg episode reward: [(0, '26.583')] +[2025-07-06 15:41:41,406][06624] Saving new best policy, reward=26.583! +[2025-07-06 15:41:46,397][06149] Fps is (10 sec: 3687.1, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 3723264. Throughput: 0: 868.0. Samples: 929338. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:41:46,402][06149] Avg episode reward: [(0, '28.137')] +[2025-07-06 15:41:46,410][06624] Saving new best policy, reward=28.137! +[2025-07-06 15:41:47,377][06642] Updated weights for policy 0, policy_version 910 (0.0015) +[2025-07-06 15:41:51,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3739648. Throughput: 0: 856.3. Samples: 934610. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:41:51,398][06149] Avg episode reward: [(0, '27.853')] +[2025-07-06 15:41:56,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3756032. Throughput: 0: 845.4. Samples: 936658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:41:56,398][06149] Avg episode reward: [(0, '28.341')] +[2025-07-06 15:41:56,412][06624] Saving new best policy, reward=28.341! +[2025-07-06 15:41:59,394][06642] Updated weights for policy 0, policy_version 920 (0.0014) +[2025-07-06 15:42:01,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3772416. Throughput: 0: 870.4. Samples: 942524. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:01,407][06149] Avg episode reward: [(0, '26.926')] +[2025-07-06 15:42:06,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3788800. Throughput: 0: 849.1. Samples: 947348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:06,401][06149] Avg episode reward: [(0, '26.735')] +[2025-07-06 15:42:11,188][06642] Updated weights for policy 0, policy_version 930 (0.0019) +[2025-07-06 15:42:11,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3809280. Throughput: 0: 858.6. Samples: 949826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:11,398][06149] Avg episode reward: [(0, '27.739')] +[2025-07-06 15:42:16,397][06149] Fps is (10 sec: 3686.5, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3825664. Throughput: 0: 874.9. Samples: 955788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:16,401][06149] Avg episode reward: [(0, '26.860')] +[2025-07-06 15:42:21,398][06149] Fps is (10 sec: 3276.4, 60 sec: 3413.5, 300 sec: 3457.3). Total num frames: 3842048. Throughput: 0: 848.6. Samples: 960224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:42:21,404][06149] Avg episode reward: [(0, '26.362')] +[2025-07-06 15:42:23,156][06642] Updated weights for policy 0, policy_version 940 (0.0015) +[2025-07-06 15:42:26,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3862528. Throughput: 0: 876.0. Samples: 963180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:26,398][06149] Avg episode reward: [(0, '24.394')] +[2025-07-06 15:42:31,399][06149] Fps is (10 sec: 3686.1, 60 sec: 3481.5, 300 sec: 3457.3). Total num frames: 3878912. Throughput: 0: 884.6. Samples: 969146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:31,400][06149] Avg episode reward: [(0, '23.817')] +[2025-07-06 15:42:34,778][06642] Updated weights for policy 0, policy_version 950 (0.0013) +[2025-07-06 15:42:36,397][06149] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3457.3). Total num frames: 3895296. Throughput: 0: 869.5. Samples: 973738. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-07-06 15:42:36,398][06149] Avg episode reward: [(0, '23.139')] +[2025-07-06 15:42:41,398][06149] Fps is (10 sec: 3687.2, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3915776. Throughput: 0: 890.0. Samples: 976708. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:41,402][06149] Avg episode reward: [(0, '22.658')] +[2025-07-06 15:42:46,027][06642] Updated weights for policy 0, policy_version 960 (0.0013) +[2025-07-06 15:42:46,398][06149] Fps is (10 sec: 3685.9, 60 sec: 3481.5, 300 sec: 3471.2). Total num frames: 3932160. Throughput: 0: 888.0. Samples: 982486. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:42:46,402][06149] Avg episode reward: [(0, '23.051')] +[2025-07-06 15:42:51,397][06149] Fps is (10 sec: 3276.7, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3948544. Throughput: 0: 883.8. Samples: 987120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:51,399][06149] Avg episode reward: [(0, '23.135')] +[2025-07-06 15:42:56,397][06149] Fps is (10 sec: 3686.9, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 3969024. Throughput: 0: 895.1. Samples: 990104. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-07-06 15:42:56,398][06149] Avg episode reward: [(0, '23.407')] +[2025-07-06 15:42:56,842][06642] Updated weights for policy 0, policy_version 970 (0.0013) +[2025-07-06 15:43:01,397][06149] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 3985408. Throughput: 0: 887.0. Samples: 995704. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-07-06 15:43:01,400][06149] Avg episode reward: [(0, '23.154')] +[2025-07-06 15:43:06,265][06624] Stopping Batcher_0... +[2025-07-06 15:43:06,266][06624] Loop batcher_evt_loop terminating... +[2025-07-06 15:43:06,266][06149] Component Batcher_0 stopped! +[2025-07-06 15:43:06,271][06149] Component RolloutWorker_w0 process died already! Don't wait for it. +[2025-07-06 15:43:06,274][06149] Component RolloutWorker_w1 process died already! Don't wait for it. +[2025-07-06 15:43:06,278][06149] Component RolloutWorker_w3 process died already! Don't wait for it. +[2025-07-06 15:43:06,280][06149] Component RolloutWorker_w6 process died already! Don't wait for it. +[2025-07-06 15:43:06,284][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:06,314][06642] Weights refcount: 2 0 +[2025-07-06 15:43:06,318][06642] Stopping InferenceWorker_p0-w0... +[2025-07-06 15:43:06,318][06149] Component InferenceWorker_p0-w0 stopped! +[2025-07-06 15:43:06,319][06642] Loop inference_proc0-0_evt_loop terminating... +[2025-07-06 15:43:06,378][06624] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000782_3203072.pth +[2025-07-06 15:43:06,391][06624] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:06,514][06149] Component LearnerWorker_p0 stopped! +[2025-07-06 15:43:06,513][06624] Stopping LearnerWorker_p0... +[2025-07-06 15:43:06,521][06624] Loop learner_proc0_evt_loop terminating... +[2025-07-06 15:43:06,605][06149] Component RolloutWorker_w5 stopped! +[2025-07-06 15:43:06,606][06643] Stopping RolloutWorker_w5... +[2025-07-06 15:43:06,609][06643] Loop rollout_proc5_evt_loop terminating... +[2025-07-06 15:43:06,645][06149] Component RolloutWorker_w7 stopped! +[2025-07-06 15:43:06,646][06645] Stopping RolloutWorker_w7... +[2025-07-06 15:43:06,647][06645] Loop rollout_proc7_evt_loop terminating... +[2025-07-06 15:43:06,688][06641] Stopping RolloutWorker_w4... +[2025-07-06 15:43:06,688][06149] Component RolloutWorker_w4 stopped! +[2025-07-06 15:43:06,695][06149] Component RolloutWorker_w2 stopped! +[2025-07-06 15:43:06,697][06149] Waiting for process learner_proc0 to stop... +[2025-07-06 15:43:06,695][06639] Stopping RolloutWorker_w2... +[2025-07-06 15:43:06,702][06639] Loop rollout_proc2_evt_loop terminating... +[2025-07-06 15:43:06,704][06641] Loop rollout_proc4_evt_loop terminating... +[2025-07-06 15:43:08,008][06149] Waiting for process inference_proc0-0 to join... +[2025-07-06 15:43:08,013][06149] Waiting for process rollout_proc0 to join... +[2025-07-06 15:43:08,017][06149] Waiting for process rollout_proc1 to join... +[2025-07-06 15:43:08,018][06149] Waiting for process rollout_proc2 to join... +[2025-07-06 15:43:08,887][06149] Waiting for process rollout_proc3 to join... +[2025-07-06 15:43:08,888][06149] Waiting for process rollout_proc4 to join... +[2025-07-06 15:43:08,891][06149] Waiting for process rollout_proc5 to join... +[2025-07-06 15:43:08,892][06149] Waiting for process rollout_proc6 to join... 
+[2025-07-06 15:43:08,893][06149] Waiting for process rollout_proc7 to join... +[2025-07-06 15:43:08,894][06149] Batcher 0 profile tree view: +batching: 22.7740, releasing_batches: 0.0319 +[2025-07-06 15:43:08,895][06149] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0015 + wait_policy_total: 460.4266 +update_model: 9.8839 + weight_update: 0.0014 +one_step: 0.0030 + handle_policy_step: 641.1391 + deserialize: 15.3284, stack: 3.9142, obs_to_device_normalize: 141.4910, forward: 340.6104, send_messages: 24.8876 + prepare_outputs: 87.0798 + to_cpu: 52.9134 +[2025-07-06 15:43:08,900][06149] Learner 0 profile tree view: +misc: 0.0043, prepare_batch: 11.9594 +train: 68.0338 + epoch_init: 0.0113, minibatch_init: 0.0109, losses_postprocess: 0.5875, kl_divergence: 0.5524, after_optimizer: 32.5001 + calculate_losses: 22.9152 + losses_init: 0.0091, forward_head: 1.3074, bptt_initial: 15.6625, tail: 0.9521, advantages_returns: 0.2239, losses: 2.8113 + bptt: 1.7258 + bptt_forward_core: 1.6308 + update: 10.9270 + clip: 1.0328 +[2025-07-06 15:43:08,902][06149] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.4567, enqueue_policy_requests: 172.7543, env_step: 797.9871, overhead: 20.5533, complete_rollouts: 6.9287 +save_policy_outputs: 27.4127 + split_output_tensors: 10.4911 +[2025-07-06 15:43:08,903][06149] Loop Runner_EvtLoop terminating... +[2025-07-06 15:43:08,905][06149] Runner profile tree view: +main_loop: 1186.8546 +[2025-07-06 15:43:08,907][06149] Collected {0: 4005888}, FPS: 3375.2 +[2025-07-06 15:43:55,037][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:43:55,038][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:43:55,040][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:43:55,042][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:43:55,043][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:43:55,045][06149] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 15:43:55,046][06149] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:43:55,047][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:43:55,048][06149] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-07-06 15:43:55,049][06149] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 15:43:55,050][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 15:43:55,051][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:43:55,052][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:43:55,053][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
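For reference, the throughput figure in the Runner summary above is simply the collected frame count divided by the main-loop wall time, and the checkpoint filenames appear to encode the policy version together with the environment-frame count (every checkpoint saved in this run satisfies frames = version x 4096). A quick sanity check, using only numbers printed in the log above:

    # Sanity check of the reported throughput and checkpoint naming
    # (all values copied from the log above).
    total_frames = 4_005_888            # "Collected {0: 4005888}"
    main_loop_seconds = 1186.8546       # "main_loop: 1186.8546"
    print(total_frames / main_loop_seconds)   # ~3375.2, matching "FPS: 3375.2"

    # checkpoint_000000978_4005888.pth encodes <policy_version>_<env_frames>;
    # every checkpoint in this run satisfies env_frames == policy_version * 4096.
    assert 978 * 4096 == total_frames
    assert 579 * 4096 == 2_371_584 and 883 * 4096 == 3_616_768
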
+[2025-07-06 15:43:55,054][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:43:55,097][06149] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 15:43:55,100][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:43:55,102][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:43:55,115][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:43:55,228][06149] Conv encoder output size: 512 +[2025-07-06 15:43:55,229][06149] Policy head output size: 512 +[2025-07-06 15:43:55,496][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:55,499][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:43:55,503][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:55,505][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. 
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:43:55,506][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:43:55,508][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:44:34,428][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:44:34,431][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:44:34,431][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:44:34,433][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:44:34,434][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:44:34,435][06149] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 15:44:34,436][06149] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-07-06 15:44:34,440][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:44:34,441][06149] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-07-06 15:44:34,442][06149] Adding new argument 'hf_repository'='zhngq/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-07-06 15:44:34,442][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! 
+[2025-07-06 15:44:34,443][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:44:34,444][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:44:34,444][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 15:44:34,448][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:44:34,500][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:44:34,503][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:44:34,517][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:44:34,578][06149] Conv encoder output size: 512 +[2025-07-06 15:44:34,580][06149] Policy head output size: 512 +[2025-07-06 15:44:34,610][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:44:34,613][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:44:34,615][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:44:34,617][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. 
Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:44:34,619][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:44:34,624][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:34,865][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:55:34,866][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:55:34,867][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:55:34,868][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:55:34,869][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:34,869][06149] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 15:55:34,870][06149] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:34,871][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:55:34,872][06149] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
+[2025-07-06 15:55:34,873][06149] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 15:55:34,874][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 15:55:34,875][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:55:34,875][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:55:34,876][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 15:55:34,877][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:55:34,905][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:55:34,907][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:55:34,919][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:55:34,956][06149] Conv encoder output size: 512 +[2025-07-06 15:55:34,957][06149] Policy head output size: 512 +[2025-07-06 15:55:34,977][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:34,979][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:34,980][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:34,982][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. 
This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:34,984][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:34,985][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:41,045][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:55:41,046][06149] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 15:55:41,047][06149] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 15:55:41,048][06149] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 15:55:41,048][06149] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:41,049][06149] Adding new argument 'video_name'=None that is not in the saved config file! 
+[2025-07-06 15:55:41,050][06149] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 15:55:41,051][06149] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 15:55:41,052][06149] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-07-06 15:55:41,053][06149] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 15:55:41,054][06149] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 15:55:41,055][06149] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 15:55:41,055][06149] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 15:55:41,056][06149] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 15:55:41,057][06149] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 15:55:41,104][06149] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 15:55:41,106][06149] RunningMeanStd input shape: (1,) +[2025-07-06 15:55:41,121][06149] ConvEncoder: input_channels=3 +[2025-07-06 15:55:41,177][06149] Conv encoder output size: 512 +[2025-07-06 15:55:41,178][06149] Policy head output size: 512 +[2025-07-06 15:55:41,210][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:41,212][06149] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device, , weights_only=False) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:41,214][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-07-06 15:55:41,215][06149] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:41,216][06149] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 15:55:41,218][06149] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 15:55:46,205][06149] Environment doom_basic already registered, overwriting...
+[2025-07-06 15:55:46,206][06149] Environment doom_two_colors_easy already registered, overwriting... +[2025-07-06 15:55:46,207][06149] Environment doom_two_colors_hard already registered, overwriting... +[2025-07-06 15:55:46,208][06149] Environment doom_dm already registered, overwriting... +[2025-07-06 15:55:46,208][06149] Environment doom_dwango5 already registered, overwriting... +[2025-07-06 15:55:46,209][06149] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2025-07-06 15:55:46,210][06149] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2025-07-06 15:55:46,211][06149] Environment doom_my_way_home already registered, overwriting... +[2025-07-06 15:55:46,211][06149] Environment doom_deadly_corridor already registered, overwriting... +[2025-07-06 15:55:46,212][06149] Environment doom_defend_the_center already registered, overwriting... +[2025-07-06 15:55:46,213][06149] Environment doom_defend_the_line already registered, overwriting... +[2025-07-06 15:55:46,215][06149] Environment doom_health_gathering already registered, overwriting... +[2025-07-06 15:55:46,216][06149] Environment doom_health_gathering_supreme already registered, overwriting... +[2025-07-06 15:55:46,216][06149] Environment doom_battle already registered, overwriting... +[2025-07-06 15:55:46,217][06149] Environment doom_battle2 already registered, overwriting... +[2025-07-06 15:55:46,219][06149] Environment doom_duel_bots already registered, overwriting... +[2025-07-06 15:55:46,219][06149] Environment doom_deathmatch_bots already registered, overwriting... +[2025-07-06 15:55:46,220][06149] Environment doom_duel already registered, overwriting... +[2025-07-06 15:55:46,221][06149] Environment doom_deathmatch_full already registered, overwriting... +[2025-07-06 15:55:46,222][06149] Environment doom_benchmark already registered, overwriting... +[2025-07-06 15:55:46,222][06149] register_encoder_factory: +[2025-07-06 15:55:46,240][06149] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 15:55:46,245][06149] Experiment dir /content/train_dir/default_experiment already exists! +[2025-07-06 15:55:46,245][06149] Resuming existing experiment from /content/train_dir/default_experiment... 
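+The "already registered, overwriting..." lines and the "register_encoder_factory:" line above come from Sample Factory re-running its VizDoom registration step inside a process where those environments were registered once before. A minimal sketch of what that registration step typically looks like with the Sample Factory 2.x API follows; the two factory functions are placeholders, and the exact import paths are assumptions rather than something recorded in this log.
+
+from sample_factory.envs.env_utils import register_env
+from sample_factory.algo.utils.context import global_model_factory
+
+def make_doom_env(full_env_name, cfg=None, env_config=None, render_mode=None):
+    # placeholder: a real factory would construct and wrap the VizDoom env here
+    raise NotImplementedError
+
+def make_doom_encoder(cfg, obs_space):
+    # placeholder: a real factory would return the conv encoder described later in this log
+    raise NotImplementedError
+
+# Re-registering a name that already exists is what produces the
+# "Environment ... already registered, overwriting..." warnings above.
+register_env("doom_health_gathering_supreme", make_doom_env)
+global_model_factory().register_encoder_factory(make_doom_encoder)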
+[2025-07-06 15:55:46,246][06149] Weights and Biases integration disabled +[2025-07-06 15:55:46,249][06149] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2025-07-06 15:55:48,666][06149] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/content/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=8 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=32 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=4000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=unknown +git_repo_name=not a git repository +[2025-07-06 15:55:48,667][06149] Saving configuration to /content/train_dir/default_experiment/config.json... 
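+The block above is the fully resolved APPO configuration that was just written back to config.json. The sampling geometry is consistent with the training batch: num_workers (8) x num_envs_per_worker (4) x rollout (32) = 1024 environment transitions per rollout cycle, which matches batch_size=1024. A small sketch for re-reading the saved file, assuming config.json is the flat JSON dump of exactly these keys:
+
+import json
+
+# path taken from the "Saving configuration to ..." line above
+with open("/content/train_dir/default_experiment/config.json") as f:
+    cfg = json.load(f)
+
+print(cfg["env"], cfg["algo"], cfg["learning_rate"])
+samples_per_rollout = cfg["num_workers"] * cfg["num_envs_per_worker"] * cfg["rollout"]
+print(samples_per_rollout, cfg["batch_size"])  # expected: 1024 1024 for the values above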
+[2025-07-06 15:55:48,669][06149] Rollout worker 0 uses device cpu +[2025-07-06 15:55:48,670][06149] Rollout worker 1 uses device cpu +[2025-07-06 15:55:48,671][06149] Rollout worker 2 uses device cpu +[2025-07-06 15:55:48,672][06149] Rollout worker 3 uses device cpu +[2025-07-06 15:55:48,673][06149] Rollout worker 4 uses device cpu +[2025-07-06 15:55:48,675][06149] Rollout worker 5 uses device cpu +[2025-07-06 15:55:48,676][06149] Rollout worker 6 uses device cpu +[2025-07-06 15:55:48,676][06149] Rollout worker 7 uses device cpu +[2025-07-06 15:55:48,771][06149] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:55:48,772][06149] InferenceWorker_p0-w0: min num requests: 2 +[2025-07-06 15:55:48,803][06149] Starting all processes... +[2025-07-06 15:55:48,804][06149] Starting process learner_proc0 +[2025-07-06 15:55:48,857][06149] Starting all processes... +[2025-07-06 15:55:48,863][06149] Starting process inference_proc0-0 +[2025-07-06 15:55:48,863][06149] Starting process rollout_proc0 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc1 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc2 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc3 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc4 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc5 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc6 +[2025-07-06 15:55:48,866][06149] Starting process rollout_proc7 +[2025-07-06 15:56:07,015][16021] Worker 0 uses CPU cores [0] +[2025-07-06 15:56:07,034][16023] Worker 2 uses CPU cores [0] +[2025-07-06 15:56:07,227][16028] Worker 7 uses CPU cores [1] +[2025-07-06 15:56:07,562][16025] Worker 4 uses CPU cores [0] +[2025-07-06 15:56:07,582][16027] Worker 6 uses CPU cores [0] +[2025-07-06 15:56:07,671][16026] Worker 5 uses CPU cores [1] +[2025-07-06 15:56:07,815][16024] Worker 3 uses CPU cores [1] +[2025-07-06 15:56:07,889][16022] Worker 1 uses CPU cores [1] +[2025-07-06 15:56:07,911][16020] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 15:56:07,912][16020] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-07-06 15:56:07,933][16020] Num visible devices: 1 +[2025-07-06 15:56:08,772][06149] Heartbeat connected on InferenceWorker_p0-w0 +[2025-07-06 15:56:08,779][06149] Heartbeat connected on RolloutWorker_w0 +[2025-07-06 15:56:08,782][06149] Heartbeat connected on RolloutWorker_w1 +[2025-07-06 15:56:08,785][06149] Heartbeat connected on RolloutWorker_w2 +[2025-07-06 15:56:08,789][06149] Heartbeat connected on RolloutWorker_w3 +[2025-07-06 15:56:08,792][06149] Heartbeat connected on RolloutWorker_w4 +[2025-07-06 15:56:08,796][06149] Heartbeat connected on RolloutWorker_w5 +[2025-07-06 15:56:08,799][06149] Heartbeat connected on RolloutWorker_w6 +[2025-07-06 15:56:08,803][06149] Heartbeat connected on RolloutWorker_w7 +[2025-07-06 16:05:46,250][06149] Components not started: Batcher_0, LearnerWorker_p0, wait_time=600.0 seconds +[2025-07-06 16:15:46,249][06149] Components not started: Batcher_0, LearnerWorker_p0, wait_time=1200.0 seconds +[2025-07-06 16:16:41,202][06149] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 6149], exiting... +[2025-07-06 16:16:41,204][16026] Stopping RolloutWorker_w5... +[2025-07-06 16:16:41,204][16026] Loop rollout_proc5_evt_loop terminating... +[2025-07-06 16:16:41,206][16024] Stopping RolloutWorker_w3... +[2025-07-06 16:16:41,206][16021] Stopping RolloutWorker_w0... 
+[2025-07-06 16:16:41,206][16024] Loop rollout_proc3_evt_loop terminating... +[2025-07-06 16:16:41,206][16021] Loop rollout_proc0_evt_loop terminating... +[2025-07-06 16:16:41,208][16023] Stopping RolloutWorker_w2... +[2025-07-06 16:16:41,208][16020] Stopping InferenceWorker_p0-w0... +[2025-07-06 16:16:41,208][16023] Loop rollout_proc2_evt_loop terminating... +[2025-07-06 16:16:41,209][16020] Loop inference_proc0-0_evt_loop terminating... +[2025-07-06 16:16:41,205][16027] Stopping RolloutWorker_w6... +[2025-07-06 16:16:41,211][16027] Loop rollout_proc6_evt_loop terminating... +[2025-07-06 16:16:41,205][06149] Runner profile tree view: +main_loop: 1252.4020 +[2025-07-06 16:16:41,214][16022] Stopping RolloutWorker_w1... +[2025-07-06 16:16:41,214][06149] Collected {}, FPS: 0.0 +[2025-07-06 16:16:41,215][16025] Stopping RolloutWorker_w4... +[2025-07-06 16:16:41,221][16025] Loop rollout_proc4_evt_loop terminating... +[2025-07-06 16:16:41,221][16022] Loop rollout_proc1_evt_loop terminating... +[2025-07-06 16:16:41,220][16028] Stopping RolloutWorker_w7... +[2025-07-06 16:16:41,235][16028] Loop rollout_proc7_evt_loop terminating... +[2025-07-06 16:21:33,573][21969] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-07-06 16:21:33,577][21969] Rollout worker 0 uses device cpu +[2025-07-06 16:21:33,578][21969] Rollout worker 1 uses device cpu +[2025-07-06 16:21:33,579][21969] Rollout worker 2 uses device cpu +[2025-07-06 16:21:33,580][21969] Rollout worker 3 uses device cpu +[2025-07-06 16:21:33,581][21969] Rollout worker 4 uses device cpu +[2025-07-06 16:21:33,582][21969] Rollout worker 5 uses device cpu +[2025-07-06 16:21:33,583][21969] Rollout worker 6 uses device cpu +[2025-07-06 16:21:33,583][21969] Rollout worker 7 uses device cpu +[2025-07-06 16:21:33,694][21969] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:33,695][21969] InferenceWorker_p0-w0: min num requests: 2 +[2025-07-06 16:21:33,723][21969] Starting all processes... +[2025-07-06 16:21:33,724][21969] Starting process learner_proc0 +[2025-07-06 16:21:33,775][21969] Starting all processes... 
+[2025-07-06 16:21:33,781][21969] Starting process inference_proc0-0 +[2025-07-06 16:21:33,781][21969] Starting process rollout_proc0 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc1 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc2 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc3 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc4 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc5 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc6 +[2025-07-06 16:21:33,782][21969] Starting process rollout_proc7 +[2025-07-06 16:21:50,796][22699] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:50,801][22699] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-07-06 16:21:50,853][22699] Num visible devices: 1 +[2025-07-06 16:21:50,871][22699] Starting seed is not provided +[2025-07-06 16:21:50,872][22699] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:50,872][22699] Initializing actor-critic model on device cuda:0 +[2025-07-06 16:21:50,873][22699] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 16:21:50,875][22699] RunningMeanStd input shape: (1,) +[2025-07-06 16:21:50,967][22699] ConvEncoder: input_channels=3 +[2025-07-06 16:21:51,051][22717] Worker 5 uses CPU cores [1] +[2025-07-06 16:21:51,205][22716] Worker 3 uses CPU cores [1] +[2025-07-06 16:21:51,207][22720] Worker 6 uses CPU cores [0] +[2025-07-06 16:21:51,271][22712] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:51,271][22712] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-07-06 16:21:51,338][22713] Worker 0 uses CPU cores [0] +[2025-07-06 16:21:51,345][22719] Worker 4 uses CPU cores [0] +[2025-07-06 16:21:51,350][22712] Num visible devices: 1 +[2025-07-06 16:21:51,366][22714] Worker 1 uses CPU cores [1] +[2025-07-06 16:21:51,420][22715] Worker 2 uses CPU cores [0] +[2025-07-06 16:21:51,442][22718] Worker 7 uses CPU cores [1] +[2025-07-06 16:21:51,459][22699] Conv encoder output size: 512 +[2025-07-06 16:21:51,459][22699] Policy head output size: 512 +[2025-07-06 16:21:51,474][22699] Created Actor Critic model with architecture: +[2025-07-06 16:21:51,474][22699] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, 
out_features=5, bias=True) + ) +) +[2025-07-06 16:21:51,739][22699] Using optimizer +[2025-07-06 16:21:52,650][22699] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:21:52,652][22699] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 16:21:52,653][22699] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:21:52,654][22699] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. 
+ +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 16:21:52,655][22699] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:21:52,656][22699] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-07-06 16:21:52,657][22699] Did not load from checkpoint, starting from scratch! +[2025-07-06 16:21:52,657][22699] Initialized policy 0 weights for model version 0 +[2025-07-06 16:21:52,663][22699] LearnerWorker_p0 finished initialization! +[2025-07-06 16:21:52,663][22699] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-07-06 16:21:52,891][22712] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 16:21:52,892][22712] RunningMeanStd input shape: (1,) +[2025-07-06 16:21:52,905][22712] ConvEncoder: input_channels=3 +[2025-07-06 16:21:53,008][22712] Conv encoder output size: 512 +[2025-07-06 16:21:53,009][22712] Policy head output size: 512 +[2025-07-06 16:21:53,045][21969] Inference worker 0-0 is ready! +[2025-07-06 16:21:53,046][21969] All inference workers are ready! Signal rollout workers to start! 
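+The three failed attempts above (and the identical ones in the 15:55 run) are PyTorch 2.6 rejecting the checkpoint under its new weights_only=True default because the pickle references numpy.core.multiarray.scalar, so the learner gives up and re-initializes the policy from scratch. The error text itself names the two workarounds; a minimal sketch of both, assuming the checkpoint file is trusted and using the path from the log:
+
+import numpy as np
+import torch
+
+ckpt = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
+
+# Option 1: keep weights_only=True but allowlist the offending numpy global
+# (on NumPy 2.x the np.core alias may emit a deprecation warning but still resolves).
+torch.serialization.add_safe_globals([np.core.multiarray.scalar])
+state = torch.load(ckpt, map_location="cpu")
+
+# Option 2: trust the file and disable the weights-only unpickler for this call.
+state = torch.load(ckpt, map_location="cpu", weights_only=False)
+
+Because add_safe_globals registers the global process-wide, calling it once near the top of the training or evaluation script should be enough for the unmodified torch.load(latest_checkpoint, map_location=device) call inside learner.py to succeed on the next restart.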
+[2025-07-06 16:21:53,289][22720] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,307][22719] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,309][22715] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,305][22713] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,325][22716] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,328][22714] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,324][22717] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,326][22718] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:21:53,687][21969] Heartbeat connected on Batcher_0 +[2025-07-06 16:21:53,690][21969] Heartbeat connected on LearnerWorker_p0 +[2025-07-06 16:21:53,734][21969] Heartbeat connected on InferenceWorker_p0-w0 +[2025-07-06 16:21:54,230][21969] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 16:21:55,455][22720] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,466][22715] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,470][22713] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,478][22719] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,563][22717] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,545][22718] Decorrelating experience for 0 frames... +[2025-07-06 16:21:55,556][22716] Decorrelating experience for 0 frames... +[2025-07-06 16:21:56,783][22715] Decorrelating experience for 32 frames... +[2025-07-06 16:21:56,791][22713] Decorrelating experience for 32 frames... +[2025-07-06 16:21:56,927][22714] Decorrelating experience for 0 frames... +[2025-07-06 16:21:56,967][22717] Decorrelating experience for 32 frames... +[2025-07-06 16:21:57,025][22720] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,821][22714] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,825][22718] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,949][22719] Decorrelating experience for 32 frames... +[2025-07-06 16:21:58,988][22716] Decorrelating experience for 32 frames... +[2025-07-06 16:21:59,233][21969] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 16:21:59,355][22713] Decorrelating experience for 64 frames... +[2025-07-06 16:21:59,392][22715] Decorrelating experience for 64 frames... +[2025-07-06 16:21:59,591][22717] Decorrelating experience for 64 frames... +[2025-07-06 16:21:59,646][22720] Decorrelating experience for 64 frames... +[2025-07-06 16:22:00,464][22714] Decorrelating experience for 64 frames... +[2025-07-06 16:22:00,551][22717] Decorrelating experience for 96 frames... +[2025-07-06 16:22:00,749][21969] Heartbeat connected on RolloutWorker_w5 +[2025-07-06 16:22:00,748][22713] Decorrelating experience for 96 frames... +[2025-07-06 16:22:00,789][22715] Decorrelating experience for 96 frames... +[2025-07-06 16:22:00,889][22719] Decorrelating experience for 64 frames... +[2025-07-06 16:22:01,063][21969] Heartbeat connected on RolloutWorker_w0 +[2025-07-06 16:22:01,157][21969] Heartbeat connected on RolloutWorker_w2 +[2025-07-06 16:22:01,847][22720] Decorrelating experience for 96 frames... 
+[2025-07-06 16:22:02,089][21969] Heartbeat connected on RolloutWorker_w6 +[2025-07-06 16:22:02,614][22716] Decorrelating experience for 64 frames... +[2025-07-06 16:22:02,616][22714] Decorrelating experience for 96 frames... +[2025-07-06 16:22:02,906][21969] Heartbeat connected on RolloutWorker_w1 +[2025-07-06 16:22:02,934][22718] Decorrelating experience for 64 frames... +[2025-07-06 16:22:04,230][21969] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 31.6. Samples: 316. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-07-06 16:22:04,232][21969] Avg episode reward: [(0, '1.913')] +[2025-07-06 16:22:05,545][22699] Signal inference workers to stop experience collection... +[2025-07-06 16:22:05,568][22712] InferenceWorker_p0-w0: stopping experience collection +[2025-07-06 16:22:05,643][22716] Decorrelating experience for 96 frames... +[2025-07-06 16:22:05,983][21969] Heartbeat connected on RolloutWorker_w3 +[2025-07-06 16:22:06,005][22718] Decorrelating experience for 96 frames... +[2025-07-06 16:22:06,102][21969] Heartbeat connected on RolloutWorker_w7 +[2025-07-06 16:22:06,126][22719] Decorrelating experience for 96 frames... +[2025-07-06 16:22:06,208][21969] Heartbeat connected on RolloutWorker_w4 +[2025-07-06 16:22:07,356][22699] Signal inference workers to resume experience collection... +[2025-07-06 16:22:07,361][22712] InferenceWorker_p0-w0: resuming experience collection +[2025-07-06 16:22:09,234][21969] Fps is (10 sec: 1228.7, 60 sec: 819.0, 300 sec: 819.0). Total num frames: 12288. Throughput: 0: 198.5. Samples: 2978. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-07-06 16:22:09,235][21969] Avg episode reward: [(0, '3.024')] +[2025-07-06 16:22:14,230][21969] Fps is (10 sec: 2048.0, 60 sec: 1024.0, 300 sec: 1024.0). Total num frames: 20480. Throughput: 0: 335.0. Samples: 6700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0) +[2025-07-06 16:22:14,232][21969] Avg episode reward: [(0, '3.465')] +[2025-07-06 16:22:18,313][22712] Updated weights for policy 0, policy_version 10 (0.0117) +[2025-07-06 16:22:19,232][21969] Fps is (10 sec: 3277.4, 60 sec: 1802.1, 300 sec: 1802.1). Total num frames: 45056. Throughput: 0: 378.8. Samples: 9470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:22:19,235][21969] Avg episode reward: [(0, '4.111')] +[2025-07-06 16:22:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 65536. Throughput: 0: 535.7. Samples: 16072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:22:24,234][21969] Avg episode reward: [(0, '4.343')] +[2025-07-06 16:22:29,230][21969] Fps is (10 sec: 3277.4, 60 sec: 2223.5, 300 sec: 2223.5). Total num frames: 77824. Throughput: 0: 597.8. Samples: 20924. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:29,233][21969] Avg episode reward: [(0, '4.274')] +[2025-07-06 16:22:29,569][22712] Updated weights for policy 0, policy_version 20 (0.0014) +[2025-07-06 16:22:34,231][21969] Fps is (10 sec: 3686.3, 60 sec: 2560.0, 300 sec: 2560.0). Total num frames: 102400. Throughput: 0: 597.8. Samples: 23912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:22:34,234][21969] Avg episode reward: [(0, '4.435')] +[2025-07-06 16:22:34,237][22699] Saving new best policy, reward=4.435! +[2025-07-06 16:22:38,598][22712] Updated weights for policy 0, policy_version 30 (0.0026) +[2025-07-06 16:22:39,230][21969] Fps is (10 sec: 4505.6, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 687.6. 
Samples: 30944. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0) +[2025-07-06 16:22:39,232][21969] Avg episode reward: [(0, '4.584')] +[2025-07-06 16:22:39,237][22699] Saving new best policy, reward=4.584! +[2025-07-06 16:22:44,230][21969] Fps is (10 sec: 3686.5, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 794.5. Samples: 35748. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:44,235][21969] Avg episode reward: [(0, '4.604')] +[2025-07-06 16:22:44,243][22699] Saving new best policy, reward=4.604! +[2025-07-06 16:22:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 852.4. Samples: 38676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:22:49,232][21969] Avg episode reward: [(0, '4.618')] +[2025-07-06 16:22:49,239][22699] Saving new best policy, reward=4.618! +[2025-07-06 16:22:49,837][22712] Updated weights for policy 0, policy_version 40 (0.0020) +[2025-07-06 16:22:54,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 180224. Throughput: 0: 943.4. Samples: 45426. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:54,234][21969] Avg episode reward: [(0, '4.411')] +[2025-07-06 16:22:59,231][21969] Fps is (10 sec: 3686.3, 60 sec: 3277.0, 300 sec: 3024.7). Total num frames: 196608. Throughput: 0: 978.1. Samples: 50714. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:22:59,232][21969] Avg episode reward: [(0, '4.403')] +[2025-07-06 16:23:00,467][22712] Updated weights for policy 0, policy_version 50 (0.0022) +[2025-07-06 16:23:04,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3101.3). Total num frames: 217088. Throughput: 0: 978.0. Samples: 53476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:23:04,232][21969] Avg episode reward: [(0, '4.341')] +[2025-07-06 16:23:09,230][21969] Fps is (10 sec: 4505.7, 60 sec: 3823.2, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 986.2. Samples: 60452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:09,232][21969] Avg episode reward: [(0, '4.468')] +[2025-07-06 16:23:09,427][22712] Updated weights for policy 0, policy_version 60 (0.0016) +[2025-07-06 16:23:14,231][21969] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3225.6). Total num frames: 258048. Throughput: 0: 990.5. Samples: 65498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:23:14,234][21969] Avg episode reward: [(0, '4.520')] +[2025-07-06 16:23:19,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 990.3. Samples: 68476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:23:19,232][21969] Avg episode reward: [(0, '4.381')] +[2025-07-06 16:23:20,233][22712] Updated weights for policy 0, policy_version 70 (0.0032) +[2025-07-06 16:23:24,230][21969] Fps is (10 sec: 4505.7, 60 sec: 3959.5, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 983.7. Samples: 75212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:23:24,234][21969] Avg episode reward: [(0, '4.365')] +[2025-07-06 16:23:29,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 997.1. Samples: 80618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:23:29,232][21969] Avg episode reward: [(0, '4.453')] +[2025-07-06 16:23:29,239][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth... 
+[2025-07-06 16:23:29,388][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth +[2025-07-06 16:23:31,223][22712] Updated weights for policy 0, policy_version 80 (0.0043) +[2025-07-06 16:23:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 993.7. Samples: 83394. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:34,234][21969] Avg episode reward: [(0, '4.544')] +[2025-07-06 16:23:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3432.8). Total num frames: 360448. Throughput: 0: 1000.9. Samples: 90466. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:23:39,232][21969] Avg episode reward: [(0, '4.453')] +[2025-07-06 16:23:40,268][22712] Updated weights for policy 0, policy_version 90 (0.0028) +[2025-07-06 16:23:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3425.7). Total num frames: 376832. Throughput: 0: 998.0. Samples: 95622. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-07-06 16:23:44,239][21969] Avg episode reward: [(0, '4.570')] +[2025-07-06 16:23:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 1001.3. Samples: 98534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:23:49,234][21969] Avg episode reward: [(0, '4.640')] +[2025-07-06 16:23:49,259][22699] Saving new best policy, reward=4.640! +[2025-07-06 16:23:50,974][22712] Updated weights for policy 0, policy_version 100 (0.0026) +[2025-07-06 16:23:54,231][21969] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3515.7). Total num frames: 421888. Throughput: 0: 1003.7. Samples: 105620. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:54,232][21969] Avg episode reward: [(0, '4.655')] +[2025-07-06 16:23:54,236][22699] Saving new best policy, reward=4.655! +[2025-07-06 16:23:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3506.2). Total num frames: 438272. Throughput: 0: 1009.6. Samples: 110928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:23:59,234][21969] Avg episode reward: [(0, '4.625')] +[2025-07-06 16:24:01,367][22712] Updated weights for policy 0, policy_version 110 (0.0022) +[2025-07-06 16:24:04,230][21969] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3560.4). Total num frames: 462848. Throughput: 0: 1011.5. Samples: 113994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:04,235][21969] Avg episode reward: [(0, '4.480')] +[2025-07-06 16:24:09,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 1014.0. Samples: 120842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:24:09,235][21969] Avg episode reward: [(0, '4.422')] +[2025-07-06 16:24:10,534][22712] Updated weights for policy 0, policy_version 120 (0.0013) +[2025-07-06 16:24:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3569.4). Total num frames: 499712. Throughput: 0: 1010.8. Samples: 126106. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:24:14,235][21969] Avg episode reward: [(0, '4.435')] +[2025-07-06 16:24:19,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3615.8). Total num frames: 524288. Throughput: 0: 1018.8. Samples: 129238. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:24:19,235][21969] Avg episode reward: [(0, '4.612')] +[2025-07-06 16:24:20,818][22712] Updated weights for policy 0, policy_version 130 (0.0017) +[2025-07-06 16:24:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3631.8). Total num frames: 544768. Throughput: 0: 1018.7. Samples: 136308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:24,232][21969] Avg episode reward: [(0, '4.709')] +[2025-07-06 16:24:24,233][22699] Saving new best policy, reward=4.709! +[2025-07-06 16:24:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3620.3). Total num frames: 561152. Throughput: 0: 1014.5. Samples: 141276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:29,239][21969] Avg episode reward: [(0, '4.543')] +[2025-07-06 16:24:31,723][22712] Updated weights for policy 0, policy_version 140 (0.0023) +[2025-07-06 16:24:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3635.2). Total num frames: 581632. Throughput: 0: 1020.1. Samples: 144440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:24:34,232][21969] Avg episode reward: [(0, '4.369')] +[2025-07-06 16:24:39,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3674.0). Total num frames: 606208. Throughput: 0: 1020.7. Samples: 151550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:24:39,233][21969] Avg episode reward: [(0, '4.506')] +[2025-07-06 16:24:41,080][22712] Updated weights for policy 0, policy_version 150 (0.0041) +[2025-07-06 16:24:44,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3662.3). Total num frames: 622592. Throughput: 0: 1006.9. Samples: 156240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:24:44,234][21969] Avg episode reward: [(0, '4.526')] +[2025-07-06 16:24:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3674.7). Total num frames: 643072. Throughput: 0: 1015.1. Samples: 159672. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:24:49,232][21969] Avg episode reward: [(0, '4.634')] +[2025-07-06 16:24:51,141][22712] Updated weights for policy 0, policy_version 160 (0.0014) +[2025-07-06 16:24:54,231][21969] Fps is (10 sec: 4505.3, 60 sec: 4096.0, 300 sec: 3709.1). Total num frames: 667648. Throughput: 0: 1013.6. Samples: 166454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:24:54,234][21969] Avg episode reward: [(0, '4.468')] +[2025-07-06 16:24:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3697.5). Total num frames: 684032. Throughput: 0: 1009.7. Samples: 171544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:24:59,232][21969] Avg episode reward: [(0, '4.582')] +[2025-07-06 16:25:02,016][22712] Updated weights for policy 0, policy_version 170 (0.0019) +[2025-07-06 16:25:04,230][21969] Fps is (10 sec: 3686.6, 60 sec: 4027.7, 300 sec: 3708.0). Total num frames: 704512. Throughput: 0: 1009.8. Samples: 174678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:04,235][21969] Avg episode reward: [(0, '4.613')] +[2025-07-06 16:25:09,231][21969] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3717.9). Total num frames: 724992. Throughput: 0: 1011.0. Samples: 181806. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:25:09,234][21969] Avg episode reward: [(0, '4.572')] +[2025-07-06 16:25:12,213][22712] Updated weights for policy 0, policy_version 180 (0.0012) +[2025-07-06 16:25:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3706.9). Total num frames: 741376. Throughput: 0: 1002.7. Samples: 186398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:14,231][21969] Avg episode reward: [(0, '4.752')] +[2025-07-06 16:25:14,235][22699] Saving new best policy, reward=4.752! +[2025-07-06 16:25:19,230][21969] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 3736.4). Total num frames: 765952. Throughput: 0: 1009.0. Samples: 189846. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:19,232][21969] Avg episode reward: [(0, '4.875')] +[2025-07-06 16:25:19,239][22699] Saving new best policy, reward=4.875! +[2025-07-06 16:25:21,602][22712] Updated weights for policy 0, policy_version 190 (0.0016) +[2025-07-06 16:25:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 1000.9. Samples: 196592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:24,232][21969] Avg episode reward: [(0, '5.008')] +[2025-07-06 16:25:24,235][22699] Saving new best policy, reward=5.008! +[2025-07-06 16:25:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3734.0). Total num frames: 802816. Throughput: 0: 1004.2. Samples: 201428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:25:29,232][21969] Avg episode reward: [(0, '5.237')] +[2025-07-06 16:25:29,242][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth... +[2025-07-06 16:25:29,359][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_802816.pth +[2025-07-06 16:25:29,368][22699] Saving new best policy, reward=5.237! +[2025-07-06 16:25:32,779][22712] Updated weights for policy 0, policy_version 200 (0.0028) +[2025-07-06 16:25:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3742.3). Total num frames: 823296. Throughput: 0: 1000.7. Samples: 204702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:34,232][21969] Avg episode reward: [(0, '5.120')] +[2025-07-06 16:25:39,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3768.3). Total num frames: 847872. Throughput: 0: 1005.6. Samples: 211704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:39,235][21969] Avg episode reward: [(0, '5.135')] +[2025-07-06 16:25:43,366][22712] Updated weights for policy 0, policy_version 210 (0.0031) +[2025-07-06 16:25:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3739.8). Total num frames: 860160. Throughput: 0: 997.7. Samples: 216442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:44,232][21969] Avg episode reward: [(0, '5.029')] +[2025-07-06 16:25:49,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3764.8). Total num frames: 884736. Throughput: 0: 1001.8. Samples: 219758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:25:49,232][21969] Avg episode reward: [(0, '4.779')] +[2025-07-06 16:25:52,388][22712] Updated weights for policy 0, policy_version 220 (0.0025) +[2025-07-06 16:25:54,231][21969] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3771.7). Total num frames: 905216. Throughput: 0: 992.9. Samples: 226484. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:25:54,232][21969] Avg episode reward: [(0, '4.928')] +[2025-07-06 16:25:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3761.6). Total num frames: 921600. Throughput: 0: 1000.8. Samples: 231434. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:25:59,234][21969] Avg episode reward: [(0, '4.723')] +[2025-07-06 16:26:03,443][22712] Updated weights for policy 0, policy_version 230 (0.0023) +[2025-07-06 16:26:04,230][21969] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3768.3). Total num frames: 942080. Throughput: 0: 1001.4. Samples: 234908. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:26:04,232][21969] Avg episode reward: [(0, '4.758')] +[2025-07-06 16:26:09,230][21969] Fps is (10 sec: 4505.5, 60 sec: 4027.8, 300 sec: 3790.8). Total num frames: 966656. Throughput: 0: 996.4. Samples: 241430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:09,234][21969] Avg episode reward: [(0, '5.185')] +[2025-07-06 16:26:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3765.2). Total num frames: 978944. Throughput: 0: 991.0. Samples: 246022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:14,232][21969] Avg episode reward: [(0, '5.207')] +[2025-07-06 16:26:14,498][22712] Updated weights for policy 0, policy_version 240 (0.0035) +[2025-07-06 16:26:19,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3786.9). Total num frames: 1003520. Throughput: 0: 996.1. Samples: 249526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-07-06 16:26:19,235][21969] Avg episode reward: [(0, '5.271')] +[2025-07-06 16:26:19,243][22699] Saving new best policy, reward=5.271! +[2025-07-06 16:26:23,374][22712] Updated weights for policy 0, policy_version 250 (0.0020) +[2025-07-06 16:26:24,232][21969] Fps is (10 sec: 4504.7, 60 sec: 3959.3, 300 sec: 3792.6). Total num frames: 1024000. Throughput: 0: 994.3. Samples: 256448. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:26:24,234][21969] Avg episode reward: [(0, '5.585')] +[2025-07-06 16:26:24,243][22699] Saving new best policy, reward=5.585! +[2025-07-06 16:26:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3783.2). Total num frames: 1040384. Throughput: 0: 990.6. Samples: 261018. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:26:29,235][21969] Avg episode reward: [(0, '5.720')] +[2025-07-06 16:26:29,242][22699] Saving new best policy, reward=5.720! +[2025-07-06 16:26:34,230][21969] Fps is (10 sec: 3687.1, 60 sec: 3959.5, 300 sec: 3788.8). Total num frames: 1060864. Throughput: 0: 995.1. Samples: 264536. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-07-06 16:26:34,234][21969] Avg episode reward: [(0, '5.829')] +[2025-07-06 16:26:34,237][22699] Saving new best policy, reward=5.829! +[2025-07-06 16:26:34,501][22712] Updated weights for policy 0, policy_version 260 (0.0024) +[2025-07-06 16:26:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3794.2). Total num frames: 1081344. Throughput: 0: 991.0. Samples: 271078. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:26:39,232][21969] Avg episode reward: [(0, '5.731')] +[2025-07-06 16:26:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3785.3). Total num frames: 1097728. Throughput: 0: 982.5. Samples: 275648. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:26:44,232][21969] Avg episode reward: [(0, '6.122')] +[2025-07-06 16:26:44,235][22699] Saving new best policy, reward=6.122! +[2025-07-06 16:26:45,275][22712] Updated weights for policy 0, policy_version 270 (0.0017) +[2025-07-06 16:26:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 981.6. Samples: 279078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:26:49,235][21969] Avg episode reward: [(0, '6.485')] +[2025-07-06 16:26:49,241][22699] Saving new best policy, reward=6.485! +[2025-07-06 16:26:54,232][21969] Fps is (10 sec: 4504.8, 60 sec: 3959.4, 300 sec: 3873.9). Total num frames: 1142784. Throughput: 0: 988.8. Samples: 285926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:54,233][21969] Avg episode reward: [(0, '6.685')] +[2025-07-06 16:26:54,236][22699] Saving new best policy, reward=6.685! +[2025-07-06 16:26:55,374][22712] Updated weights for policy 0, policy_version 280 (0.0025) +[2025-07-06 16:26:59,231][21969] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 1159168. Throughput: 0: 991.4. Samples: 290636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:26:59,236][21969] Avg episode reward: [(0, '6.580')] +[2025-07-06 16:27:04,231][21969] Fps is (10 sec: 3686.7, 60 sec: 3959.4, 300 sec: 3957.2). Total num frames: 1179648. Throughput: 0: 991.1. Samples: 294126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:27:04,233][21969] Avg episode reward: [(0, '6.445')] +[2025-07-06 16:27:05,458][22712] Updated weights for policy 0, policy_version 290 (0.0012) +[2025-07-06 16:27:09,234][21969] Fps is (10 sec: 4094.5, 60 sec: 3891.0, 300 sec: 3998.8). Total num frames: 1200128. Throughput: 0: 981.0. Samples: 300596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:27:09,240][21969] Avg episode reward: [(0, '7.028')] +[2025-07-06 16:27:09,249][22699] Saving new best policy, reward=7.028! +[2025-07-06 16:27:14,230][21969] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 3971.1). Total num frames: 1216512. Throughput: 0: 989.1. Samples: 305528. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:27:14,234][21969] Avg episode reward: [(0, '7.067')] +[2025-07-06 16:27:14,290][22699] Saving new best policy, reward=7.067! +[2025-07-06 16:27:16,284][22712] Updated weights for policy 0, policy_version 300 (0.0037) +[2025-07-06 16:27:19,230][21969] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1241088. Throughput: 0: 983.1. Samples: 308776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:19,232][21969] Avg episode reward: [(0, '8.048')] +[2025-07-06 16:27:19,238][22699] Saving new best policy, reward=8.048! +[2025-07-06 16:27:24,230][21969] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 4012.7). Total num frames: 1261568. Throughput: 0: 984.2. Samples: 315368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:24,232][21969] Avg episode reward: [(0, '8.172')] +[2025-07-06 16:27:24,235][22699] Saving new best policy, reward=8.172! +[2025-07-06 16:27:27,001][22712] Updated weights for policy 0, policy_version 310 (0.0020) +[2025-07-06 16:27:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1277952. Throughput: 0: 988.9. Samples: 320150. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:29,232][21969] Avg episode reward: [(0, '8.521')] +[2025-07-06 16:27:29,244][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth... +[2025-07-06 16:27:29,358][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000312_1277952.pth +[2025-07-06 16:27:29,377][22699] Saving new best policy, reward=8.521! +[2025-07-06 16:27:34,234][21969] Fps is (10 sec: 4094.7, 60 sec: 4027.5, 300 sec: 3998.8). Total num frames: 1302528. Throughput: 0: 990.1. Samples: 323634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:34,235][21969] Avg episode reward: [(0, '8.098')] +[2025-07-06 16:27:36,208][22712] Updated weights for policy 0, policy_version 320 (0.0012) +[2025-07-06 16:27:39,234][21969] Fps is (10 sec: 4094.4, 60 sec: 3959.2, 300 sec: 3998.8). Total num frames: 1318912. Throughput: 0: 983.5. Samples: 330186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:39,236][21969] Avg episode reward: [(0, '7.891')] +[2025-07-06 16:27:44,230][21969] Fps is (10 sec: 3277.9, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1335296. Throughput: 0: 993.8. Samples: 335358. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:44,234][21969] Avg episode reward: [(0, '8.292')] +[2025-07-06 16:27:47,142][22712] Updated weights for policy 0, policy_version 330 (0.0023) +[2025-07-06 16:27:49,230][21969] Fps is (10 sec: 4097.6, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1359872. Throughput: 0: 987.0. Samples: 338540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:27:49,234][21969] Avg episode reward: [(0, '8.530')] +[2025-07-06 16:27:49,241][22699] Saving new best policy, reward=8.530! +[2025-07-06 16:27:54,230][21969] Fps is (10 sec: 4505.6, 60 sec: 3959.6, 300 sec: 4012.7). Total num frames: 1380352. Throughput: 0: 990.3. Samples: 345156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:27:54,232][21969] Avg episode reward: [(0, '8.611')] +[2025-07-06 16:27:54,234][22699] Saving new best policy, reward=8.611! +[2025-07-06 16:27:57,926][22712] Updated weights for policy 0, policy_version 340 (0.0030) +[2025-07-06 16:27:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1396736. Throughput: 0: 988.6. Samples: 350014. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:27:59,232][21969] Avg episode reward: [(0, '9.139')] +[2025-07-06 16:27:59,237][22699] Saving new best policy, reward=9.139! +[2025-07-06 16:28:04,230][21969] Fps is (10 sec: 4095.9, 60 sec: 4027.8, 300 sec: 3998.8). Total num frames: 1421312. Throughput: 0: 996.3. Samples: 353608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:28:04,232][21969] Avg episode reward: [(0, '9.720')] +[2025-07-06 16:28:04,242][22699] Saving new best policy, reward=9.720! +[2025-07-06 16:28:07,019][22712] Updated weights for policy 0, policy_version 350 (0.0019) +[2025-07-06 16:28:09,234][21969] Fps is (10 sec: 4094.5, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1437696. Throughput: 0: 991.2. Samples: 359974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:28:09,235][21969] Avg episode reward: [(0, '10.337')] +[2025-07-06 16:28:09,240][22699] Saving new best policy, reward=10.337! +[2025-07-06 16:28:14,230][21969] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 1454080. Throughput: 0: 999.1. Samples: 365110. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:28:14,231][21969] Avg episode reward: [(0, '10.160')] +[2025-07-06 16:28:17,502][22712] Updated weights for policy 0, policy_version 360 (0.0029) +[2025-07-06 16:28:19,230][21969] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1482752. Throughput: 0: 1001.5. Samples: 368696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:28:19,232][21969] Avg episode reward: [(0, '9.732')] +[2025-07-06 16:28:24,230][21969] Fps is (10 sec: 4915.1, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1503232. Throughput: 0: 1004.7. Samples: 375394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:28:24,236][21969] Avg episode reward: [(0, '10.091')] +[2025-07-06 16:28:27,902][22712] Updated weights for policy 0, policy_version 370 (0.0021) +[2025-07-06 16:28:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 1519616. Throughput: 0: 1009.8. Samples: 380800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:28:29,232][21969] Avg episode reward: [(0, '10.192')] +[2025-07-06 16:28:34,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4012.7). Total num frames: 1544192. Throughput: 0: 1018.4. Samples: 384370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:28:34,232][21969] Avg episode reward: [(0, '11.170')] +[2025-07-06 16:28:34,234][22699] Saving new best policy, reward=11.170! +[2025-07-06 16:28:36,568][22712] Updated weights for policy 0, policy_version 380 (0.0028) +[2025-07-06 16:28:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 4012.7). Total num frames: 1560576. Throughput: 0: 1016.4. Samples: 390892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:28:39,235][21969] Avg episode reward: [(0, '11.379')] +[2025-07-06 16:28:39,243][22699] Saving new best policy, reward=11.379! +[2025-07-06 16:28:44,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1581056. Throughput: 0: 1032.2. Samples: 396462. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:28:44,232][21969] Avg episode reward: [(0, '11.825')] +[2025-07-06 16:28:44,233][22699] Saving new best policy, reward=11.825! +[2025-07-06 16:28:47,108][22712] Updated weights for policy 0, policy_version 390 (0.0033) +[2025-07-06 16:28:49,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1605632. Throughput: 0: 1029.8. Samples: 399950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:28:49,232][21969] Avg episode reward: [(0, '11.361')] +[2025-07-06 16:28:54,238][21969] Fps is (10 sec: 4502.1, 60 sec: 4095.5, 300 sec: 4026.5). Total num frames: 1626112. Throughput: 0: 1028.4. Samples: 406254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:28:54,242][21969] Avg episode reward: [(0, '12.089')] +[2025-07-06 16:28:54,244][22699] Saving new best policy, reward=12.089! +[2025-07-06 16:28:57,499][22712] Updated weights for policy 0, policy_version 400 (0.0016) +[2025-07-06 16:28:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1642496. Throughput: 0: 1044.8. Samples: 412126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:28:59,235][21969] Avg episode reward: [(0, '13.213')] +[2025-07-06 16:28:59,242][22699] Saving new best policy, reward=13.213! +[2025-07-06 16:29:04,230][21969] Fps is (10 sec: 4099.2, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1667072. 
Throughput: 0: 1043.4. Samples: 415650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:04,235][21969] Avg episode reward: [(0, '12.562')] +[2025-07-06 16:29:06,488][22712] Updated weights for policy 0, policy_version 410 (0.0029) +[2025-07-06 16:29:09,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.3, 300 sec: 4012.7). Total num frames: 1683456. Throughput: 0: 1026.5. Samples: 421588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:09,237][21969] Avg episode reward: [(0, '12.581')] +[2025-07-06 16:29:14,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4012.7). Total num frames: 1708032. Throughput: 0: 1036.4. Samples: 427438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:14,235][21969] Avg episode reward: [(0, '12.854')] +[2025-07-06 16:29:16,776][22712] Updated weights for policy 0, policy_version 420 (0.0019) +[2025-07-06 16:29:19,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1728512. Throughput: 0: 1035.8. Samples: 430980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:19,234][21969] Avg episode reward: [(0, '11.992')] +[2025-07-06 16:29:24,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1744896. Throughput: 0: 1017.7. Samples: 436690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:24,232][21969] Avg episode reward: [(0, '13.642')] +[2025-07-06 16:29:24,237][22699] Saving new best policy, reward=13.642! +[2025-07-06 16:29:27,616][22712] Updated weights for policy 0, policy_version 430 (0.0020) +[2025-07-06 16:29:29,232][21969] Fps is (10 sec: 3685.7, 60 sec: 4095.9, 300 sec: 4012.7). Total num frames: 1765376. Throughput: 0: 1027.9. Samples: 442718. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:29,236][21969] Avg episode reward: [(0, '14.008')] +[2025-07-06 16:29:29,249][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth... +[2025-07-06 16:29:29,380][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000432_1769472.pth +[2025-07-06 16:29:29,396][22699] Saving new best policy, reward=14.008! +[2025-07-06 16:29:34,230][21969] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1789952. Throughput: 0: 1018.3. Samples: 445772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:29:34,232][21969] Avg episode reward: [(0, '15.457')] +[2025-07-06 16:29:34,236][22699] Saving new best policy, reward=15.457! +[2025-07-06 16:29:37,270][22712] Updated weights for policy 0, policy_version 440 (0.0029) +[2025-07-06 16:29:39,231][21969] Fps is (10 sec: 4096.5, 60 sec: 4095.9, 300 sec: 4012.7). Total num frames: 1806336. Throughput: 0: 1008.7. Samples: 451640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:29:39,232][21969] Avg episode reward: [(0, '16.413')] +[2025-07-06 16:29:39,239][22699] Saving new best policy, reward=16.413! +[2025-07-06 16:29:44,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1826816. Throughput: 0: 1006.6. Samples: 457422. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:29:44,235][21969] Avg episode reward: [(0, '15.265')] +[2025-07-06 16:29:47,351][22712] Updated weights for policy 0, policy_version 450 (0.0036) +[2025-07-06 16:29:49,230][21969] Fps is (10 sec: 4505.9, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1851392. Throughput: 0: 1008.4. Samples: 461030. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:49,235][21969] Avg episode reward: [(0, '15.870')] +[2025-07-06 16:29:54,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4028.3, 300 sec: 4012.7). Total num frames: 1867776. Throughput: 0: 1002.9. Samples: 466718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:29:54,233][21969] Avg episode reward: [(0, '15.342')] +[2025-07-06 16:29:57,803][22712] Updated weights for policy 0, policy_version 460 (0.0019) +[2025-07-06 16:29:59,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1888256. Throughput: 0: 1013.5. Samples: 473046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:29:59,234][21969] Avg episode reward: [(0, '15.721')] +[2025-07-06 16:30:04,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1912832. Throughput: 0: 1014.7. Samples: 476640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:04,232][21969] Avg episode reward: [(0, '17.046')] +[2025-07-06 16:30:04,240][22699] Saving new best policy, reward=17.046! +[2025-07-06 16:30:07,759][22712] Updated weights for policy 0, policy_version 470 (0.0016) +[2025-07-06 16:30:09,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1925120. Throughput: 0: 1008.6. Samples: 482078. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:30:09,234][21969] Avg episode reward: [(0, '15.981')] +[2025-07-06 16:30:14,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1949696. Throughput: 0: 1020.6. Samples: 488642. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:14,235][21969] Avg episode reward: [(0, '15.943')] +[2025-07-06 16:30:16,943][22712] Updated weights for policy 0, policy_version 480 (0.0025) +[2025-07-06 16:30:19,230][21969] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1974272. Throughput: 0: 1032.6. Samples: 492238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:30:19,234][21969] Avg episode reward: [(0, '14.844')] +[2025-07-06 16:30:24,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1990656. Throughput: 0: 1024.4. Samples: 497738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:30:24,232][21969] Avg episode reward: [(0, '14.609')] +[2025-07-06 16:30:27,250][22712] Updated weights for policy 0, policy_version 490 (0.0012) +[2025-07-06 16:30:29,231][21969] Fps is (10 sec: 4095.9, 60 sec: 4164.4, 300 sec: 4040.5). Total num frames: 2015232. Throughput: 0: 1043.6. Samples: 504386. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:29,232][21969] Avg episode reward: [(0, '15.237')] +[2025-07-06 16:30:34,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2035712. Throughput: 0: 1044.6. Samples: 508038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:30:34,232][21969] Avg episode reward: [(0, '13.871')] +[2025-07-06 16:30:37,449][22712] Updated weights for policy 0, policy_version 500 (0.0015) +[2025-07-06 16:30:39,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.1, 300 sec: 4040.5). Total num frames: 2052096. Throughput: 0: 1029.0. Samples: 513022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:39,232][21969] Avg episode reward: [(0, '15.281')] +[2025-07-06 16:30:44,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2076672. Throughput: 0: 1033.3. 
Samples: 519544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:30:44,231][21969] Avg episode reward: [(0, '16.847')] +[2025-07-06 16:30:46,880][22712] Updated weights for policy 0, policy_version 510 (0.0021) +[2025-07-06 16:30:49,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2097152. Throughput: 0: 1032.5. Samples: 523104. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:30:49,232][21969] Avg episode reward: [(0, '16.736')] +[2025-07-06 16:30:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2113536. Throughput: 0: 1024.7. Samples: 528188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:30:54,234][21969] Avg episode reward: [(0, '17.369')] +[2025-07-06 16:30:54,238][22699] Saving new best policy, reward=17.369! +[2025-07-06 16:30:57,705][22712] Updated weights for policy 0, policy_version 520 (0.0012) +[2025-07-06 16:30:59,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2134016. Throughput: 0: 1025.5. Samples: 534790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:30:59,233][21969] Avg episode reward: [(0, '19.156')] +[2025-07-06 16:30:59,242][22699] Saving new best policy, reward=19.156! +[2025-07-06 16:31:04,231][21969] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 4040.5). Total num frames: 2158592. Throughput: 0: 1022.6. Samples: 538258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:31:04,233][21969] Avg episode reward: [(0, '18.132')] +[2025-07-06 16:31:08,469][22712] Updated weights for policy 0, policy_version 530 (0.0025) +[2025-07-06 16:31:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2170880. Throughput: 0: 1006.5. Samples: 543032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:31:09,232][21969] Avg episode reward: [(0, '19.412')] +[2025-07-06 16:31:09,237][22699] Saving new best policy, reward=19.412! +[2025-07-06 16:31:14,230][21969] Fps is (10 sec: 3686.7, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2195456. Throughput: 0: 1009.7. Samples: 549822. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:31:14,232][21969] Avg episode reward: [(0, '19.267')] +[2025-07-06 16:31:17,552][22712] Updated weights for policy 0, policy_version 540 (0.0019) +[2025-07-06 16:31:19,231][21969] Fps is (10 sec: 4505.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2215936. Throughput: 0: 1001.9. Samples: 553126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:19,233][21969] Avg episode reward: [(0, '19.484')] +[2025-07-06 16:31:19,241][22699] Saving new best policy, reward=19.484! +[2025-07-06 16:31:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2232320. Throughput: 0: 1000.8. Samples: 558056. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:24,232][21969] Avg episode reward: [(0, '19.790')] +[2025-07-06 16:31:24,234][22699] Saving new best policy, reward=19.790! +[2025-07-06 16:31:28,487][22712] Updated weights for policy 0, policy_version 550 (0.0012) +[2025-07-06 16:31:29,230][21969] Fps is (10 sec: 3686.7, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2252800. Throughput: 0: 1002.3. Samples: 564648. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:29,234][21969] Avg episode reward: [(0, '20.623')] +[2025-07-06 16:31:29,242][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth... +[2025-07-06 16:31:29,368][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000550_2252800.pth +[2025-07-06 16:31:29,380][22699] Saving new best policy, reward=20.623! +[2025-07-06 16:31:34,230][21969] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2277376. Throughput: 0: 1000.4. Samples: 568120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:31:34,232][21969] Avg episode reward: [(0, '20.635')] +[2025-07-06 16:31:34,233][22699] Saving new best policy, reward=20.635! +[2025-07-06 16:31:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2289664. Throughput: 0: 994.6. Samples: 572944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:31:39,232][21969] Avg episode reward: [(0, '21.470')] +[2025-07-06 16:31:39,300][22712] Updated weights for policy 0, policy_version 560 (0.0022) +[2025-07-06 16:31:39,301][22699] Saving new best policy, reward=21.470! +[2025-07-06 16:31:44,230][21969] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2314240. Throughput: 0: 1000.7. Samples: 579820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:31:44,233][21969] Avg episode reward: [(0, '20.811')] +[2025-07-06 16:31:48,361][22712] Updated weights for policy 0, policy_version 570 (0.0016) +[2025-07-06 16:31:49,230][21969] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2334720. Throughput: 0: 994.1. Samples: 582990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:31:49,235][21969] Avg episode reward: [(0, '19.904')] +[2025-07-06 16:31:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2351104. Throughput: 0: 997.1. Samples: 587900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:54,235][21969] Avg episode reward: [(0, '19.269')] +[2025-07-06 16:31:59,081][22712] Updated weights for policy 0, policy_version 580 (0.0030) +[2025-07-06 16:31:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 2375680. Throughput: 0: 993.7. Samples: 594538. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:31:59,235][21969] Avg episode reward: [(0, '18.939')] +[2025-07-06 16:32:04,231][21969] Fps is (10 sec: 4505.3, 60 sec: 3959.5, 300 sec: 4054.4). Total num frames: 2396160. Throughput: 0: 1000.1. Samples: 598130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:04,235][21969] Avg episode reward: [(0, '17.539')] +[2025-07-06 16:32:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2412544. Throughput: 0: 992.6. Samples: 602722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:32:09,235][21969] Avg episode reward: [(0, '17.229')] +[2025-07-06 16:32:09,737][22712] Updated weights for policy 0, policy_version 590 (0.0026) +[2025-07-06 16:32:14,230][21969] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2437120. Throughput: 0: 1003.4. Samples: 609800. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:32:14,235][21969] Avg episode reward: [(0, '18.405')] +[2025-07-06 16:32:18,750][22712] Updated weights for policy 0, policy_version 600 (0.0029) +[2025-07-06 16:32:19,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 2457600. Throughput: 0: 999.6. Samples: 613100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:32:19,232][21969] Avg episode reward: [(0, '20.316')] +[2025-07-06 16:32:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2473984. Throughput: 0: 1001.2. Samples: 617996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:32:24,235][21969] Avg episode reward: [(0, '20.700')] +[2025-07-06 16:32:29,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2494464. Throughput: 0: 997.8. Samples: 624722. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:32:29,236][21969] Avg episode reward: [(0, '19.712')] +[2025-07-06 16:32:29,515][22712] Updated weights for policy 0, policy_version 610 (0.0015) +[2025-07-06 16:32:34,231][21969] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 4054.4). Total num frames: 2514944. Throughput: 0: 1006.9. Samples: 628300. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:34,232][21969] Avg episode reward: [(0, '19.395')] +[2025-07-06 16:32:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2531328. Throughput: 0: 1005.4. Samples: 633142. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:32:39,231][21969] Avg episode reward: [(0, '16.524')] +[2025-07-06 16:32:40,274][22712] Updated weights for policy 0, policy_version 620 (0.0021) +[2025-07-06 16:32:44,230][21969] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2555904. Throughput: 0: 1008.5. Samples: 639922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:44,232][21969] Avg episode reward: [(0, '15.975')] +[2025-07-06 16:32:49,233][21969] Fps is (10 sec: 4504.3, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 2576384. Throughput: 0: 1002.5. Samples: 643244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:32:49,235][21969] Avg episode reward: [(0, '15.506')] +[2025-07-06 16:32:50,069][22712] Updated weights for policy 0, policy_version 630 (0.0020) +[2025-07-06 16:32:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2592768. Throughput: 0: 1008.1. Samples: 648086. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:32:54,236][21969] Avg episode reward: [(0, '15.563')] +[2025-07-06 16:32:59,230][21969] Fps is (10 sec: 4097.2, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2617344. Throughput: 0: 1005.8. Samples: 655060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:32:59,235][21969] Avg episode reward: [(0, '18.030')] +[2025-07-06 16:32:59,796][22712] Updated weights for policy 0, policy_version 640 (0.0014) +[2025-07-06 16:33:04,233][21969] Fps is (10 sec: 4504.3, 60 sec: 4027.6, 300 sec: 4068.2). Total num frames: 2637824. Throughput: 0: 1013.0. Samples: 658690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:33:04,234][21969] Avg episode reward: [(0, '18.676')] +[2025-07-06 16:33:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2654208. Throughput: 0: 1011.0. Samples: 663490. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:33:09,232][21969] Avg episode reward: [(0, '18.854')] +[2025-07-06 16:33:10,318][22712] Updated weights for policy 0, policy_version 650 (0.0041) +[2025-07-06 16:33:14,230][21969] Fps is (10 sec: 4097.2, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2678784. Throughput: 0: 1021.1. Samples: 670672. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:33:14,235][21969] Avg episode reward: [(0, '18.909')] +[2025-07-06 16:33:19,234][21969] Fps is (10 sec: 4503.8, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 2699264. Throughput: 0: 1021.6. Samples: 674276. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:33:19,239][21969] Avg episode reward: [(0, '18.821')] +[2025-07-06 16:33:19,873][22712] Updated weights for policy 0, policy_version 660 (0.0014) +[2025-07-06 16:33:24,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2719744. Throughput: 0: 1026.0. Samples: 679314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:33:24,235][21969] Avg episode reward: [(0, '18.265')] +[2025-07-06 16:33:29,230][21969] Fps is (10 sec: 4097.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2740224. Throughput: 0: 1030.6. Samples: 686298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:33:29,234][21969] Avg episode reward: [(0, '18.085')] +[2025-07-06 16:33:29,243][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000669_2740224.pth... +[2025-07-06 16:33:29,375][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000669_2740224.pth +[2025-07-06 16:33:29,480][22712] Updated weights for policy 0, policy_version 670 (0.0014) +[2025-07-06 16:33:34,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4068.2). Total num frames: 2760704. Throughput: 0: 1033.1. Samples: 689730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:33:34,235][21969] Avg episode reward: [(0, '17.959')] +[2025-07-06 16:33:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 2781184. Throughput: 0: 1033.6. Samples: 694600. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-07-06 16:33:39,235][21969] Avg episode reward: [(0, '17.831')] +[2025-07-06 16:33:40,025][22712] Updated weights for policy 0, policy_version 680 (0.0030) +[2025-07-06 16:33:44,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2801664. Throughput: 0: 1037.1. Samples: 701728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:33:44,232][21969] Avg episode reward: [(0, '19.802')] +[2025-07-06 16:33:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4054.5). Total num frames: 2822144. Throughput: 0: 1024.1. Samples: 704772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:33:49,235][21969] Avg episode reward: [(0, '20.877')] +[2025-07-06 16:33:50,468][22712] Updated weights for policy 0, policy_version 690 (0.0028) +[2025-07-06 16:33:54,232][21969] Fps is (10 sec: 4095.3, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 2842624. Throughput: 0: 1033.0. Samples: 709978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:33:54,237][21969] Avg episode reward: [(0, '21.567')] +[2025-07-06 16:33:54,241][22699] Saving new best policy, reward=21.567! +[2025-07-06 16:33:59,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2863104. Throughput: 0: 1018.2. Samples: 716492. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:33:59,233][21969] Avg episode reward: [(0, '22.477')] +[2025-07-06 16:33:59,241][22699] Saving new best policy, reward=22.477! +[2025-07-06 16:33:59,812][22712] Updated weights for policy 0, policy_version 700 (0.0019) +[2025-07-06 16:34:04,230][21969] Fps is (10 sec: 3687.1, 60 sec: 4027.9, 300 sec: 4054.3). Total num frames: 2879488. Throughput: 0: 1006.9. Samples: 719584. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:34:04,235][21969] Avg episode reward: [(0, '20.480')] +[2025-07-06 16:34:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2899968. Throughput: 0: 1004.3. Samples: 724506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:34:09,237][21969] Avg episode reward: [(0, '19.672')] +[2025-07-06 16:34:10,722][22712] Updated weights for policy 0, policy_version 710 (0.0016) +[2025-07-06 16:34:14,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2924544. Throughput: 0: 1008.2. Samples: 731668. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-07-06 16:34:14,234][21969] Avg episode reward: [(0, '17.999')] +[2025-07-06 16:34:19,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4028.0, 300 sec: 4054.3). Total num frames: 2940928. Throughput: 0: 997.9. Samples: 734636. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:34:19,231][21969] Avg episode reward: [(0, '16.135')] +[2025-07-06 16:34:21,443][22712] Updated weights for policy 0, policy_version 720 (0.0023) +[2025-07-06 16:34:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 2961408. Throughput: 0: 1007.2. Samples: 739924. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:34:24,234][21969] Avg episode reward: [(0, '16.718')] +[2025-07-06 16:34:29,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2981888. Throughput: 0: 995.2. Samples: 746510. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:34:29,235][21969] Avg episode reward: [(0, '16.916')] +[2025-07-06 16:34:30,288][22712] Updated weights for policy 0, policy_version 730 (0.0020) +[2025-07-06 16:34:34,230][21969] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 3002368. Throughput: 0: 995.0. Samples: 749546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:34:34,237][21969] Avg episode reward: [(0, '16.721')] +[2025-07-06 16:34:39,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3022848. Throughput: 0: 995.0. Samples: 754752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:34:39,232][21969] Avg episode reward: [(0, '17.553')] +[2025-07-06 16:34:41,123][22712] Updated weights for policy 0, policy_version 740 (0.0016) +[2025-07-06 16:34:44,230][21969] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3043328. Throughput: 0: 1010.2. Samples: 761952. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:34:44,232][21969] Avg episode reward: [(0, '18.976')] +[2025-07-06 16:34:49,234][21969] Fps is (10 sec: 3685.0, 60 sec: 3959.2, 300 sec: 4040.4). Total num frames: 3059712. Throughput: 0: 1002.3. Samples: 764692. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-07-06 16:34:49,235][21969] Avg episode reward: [(0, '19.691')] +[2025-07-06 16:34:51,719][22712] Updated weights for policy 0, policy_version 750 (0.0021) +[2025-07-06 16:34:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4040.5). Total num frames: 3080192. Throughput: 0: 1014.6. Samples: 770162. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:34:54,235][21969] Avg episode reward: [(0, '19.954')] +[2025-07-06 16:34:59,230][21969] Fps is (10 sec: 4507.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3104768. Throughput: 0: 1003.8. Samples: 776838. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:34:59,236][21969] Avg episode reward: [(0, '20.775')] +[2025-07-06 16:35:00,642][22712] Updated weights for policy 0, policy_version 760 (0.0024) +[2025-07-06 16:35:04,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 3121152. Throughput: 0: 1002.6. Samples: 779754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:04,234][21969] Avg episode reward: [(0, '21.616')] +[2025-07-06 16:35:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3141632. Throughput: 0: 1005.8. Samples: 785186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:09,235][21969] Avg episode reward: [(0, '19.543')] +[2025-07-06 16:35:11,478][22712] Updated weights for policy 0, policy_version 770 (0.0022) +[2025-07-06 16:35:14,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3166208. Throughput: 0: 1012.3. Samples: 792062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-07-06 16:35:14,239][21969] Avg episode reward: [(0, '18.923')] +[2025-07-06 16:35:19,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3182592. Throughput: 0: 1007.4. Samples: 794878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:19,233][21969] Avg episode reward: [(0, '19.116')] +[2025-07-06 16:35:22,252][22712] Updated weights for policy 0, policy_version 780 (0.0033) +[2025-07-06 16:35:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3203072. Throughput: 0: 1011.7. Samples: 800278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:24,232][21969] Avg episode reward: [(0, '18.559')] +[2025-07-06 16:35:29,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3227648. Throughput: 0: 1008.3. Samples: 807324. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:29,232][21969] Avg episode reward: [(0, '18.931')] +[2025-07-06 16:35:29,242][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000788_3227648.pth... +[2025-07-06 16:35:29,367][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000788_3227648.pth +[2025-07-06 16:35:31,588][22712] Updated weights for policy 0, policy_version 790 (0.0025) +[2025-07-06 16:35:34,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3239936. Throughput: 0: 1002.5. Samples: 809802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:35:34,231][21969] Avg episode reward: [(0, '19.799')] +[2025-07-06 16:35:39,230][21969] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3260416. Throughput: 0: 1004.9. Samples: 815384. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:35:39,234][21969] Avg episode reward: [(0, '19.997')] +[2025-07-06 16:35:42,062][22712] Updated weights for policy 0, policy_version 800 (0.0019) +[2025-07-06 16:35:44,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3284992. Throughput: 0: 1007.6. Samples: 822182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:35:44,235][21969] Avg episode reward: [(0, '19.411')] +[2025-07-06 16:35:49,232][21969] Fps is (10 sec: 4095.3, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 3301376. Throughput: 0: 1002.6. Samples: 824874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:35:49,236][21969] Avg episode reward: [(0, '18.441')] +[2025-07-06 16:35:52,773][22712] Updated weights for policy 0, policy_version 810 (0.0025) +[2025-07-06 16:35:54,230][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3321856. Throughput: 0: 1003.6. Samples: 830348. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:35:54,235][21969] Avg episode reward: [(0, '18.987')] +[2025-07-06 16:35:59,234][21969] Fps is (10 sec: 4095.1, 60 sec: 3959.2, 300 sec: 4012.6). Total num frames: 3342336. Throughput: 0: 1002.4. Samples: 837172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:35:59,235][21969] Avg episode reward: [(0, '17.801')] +[2025-07-06 16:36:02,785][22712] Updated weights for policy 0, policy_version 820 (0.0026) +[2025-07-06 16:36:04,231][21969] Fps is (10 sec: 3686.0, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 3358720. Throughput: 0: 995.3. Samples: 839668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:36:04,233][21969] Avg episode reward: [(0, '18.655')] +[2025-07-06 16:36:09,230][21969] Fps is (10 sec: 4097.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3383296. Throughput: 0: 1003.6. Samples: 845440. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:36:09,232][21969] Avg episode reward: [(0, '19.107')] +[2025-07-06 16:36:12,552][22712] Updated weights for policy 0, policy_version 830 (0.0020) +[2025-07-06 16:36:14,230][21969] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3403776. Throughput: 0: 998.8. Samples: 852272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:36:14,234][21969] Avg episode reward: [(0, '18.204')] +[2025-07-06 16:36:19,233][21969] Fps is (10 sec: 3685.3, 60 sec: 3959.3, 300 sec: 4026.5). Total num frames: 3420160. Throughput: 0: 998.5. Samples: 854736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:19,234][21969] Avg episode reward: [(0, '19.187')] +[2025-07-06 16:36:23,345][22712] Updated weights for policy 0, policy_version 840 (0.0018) +[2025-07-06 16:36:24,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3444736. Throughput: 0: 1001.4. Samples: 860446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:36:24,233][21969] Avg episode reward: [(0, '19.664')] +[2025-07-06 16:36:29,230][21969] Fps is (10 sec: 4506.9, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3465216. Throughput: 0: 1006.2. Samples: 867462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:29,234][21969] Avg episode reward: [(0, '20.176')] +[2025-07-06 16:36:33,811][22712] Updated weights for policy 0, policy_version 850 (0.0032) +[2025-07-06 16:36:34,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3481600. 
Throughput: 0: 997.4. Samples: 869756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:36:34,233][21969] Avg episode reward: [(0, '19.645')] +[2025-07-06 16:36:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3502080. Throughput: 0: 1007.0. Samples: 875662. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:39,235][21969] Avg episode reward: [(0, '20.067')] +[2025-07-06 16:36:43,136][22712] Updated weights for policy 0, policy_version 860 (0.0019) +[2025-07-06 16:36:44,230][21969] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3526656. Throughput: 0: 1005.6. Samples: 882420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:44,235][21969] Avg episode reward: [(0, '20.313')] +[2025-07-06 16:36:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4040.5). Total num frames: 3543040. Throughput: 0: 1004.6. Samples: 884872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:36:49,234][21969] Avg episode reward: [(0, '20.194')] +[2025-07-06 16:36:53,956][22712] Updated weights for policy 0, policy_version 870 (0.0022) +[2025-07-06 16:36:54,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3563520. Throughput: 0: 1000.8. Samples: 890476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:36:54,237][21969] Avg episode reward: [(0, '20.635')] +[2025-07-06 16:36:59,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4096.3, 300 sec: 4040.5). Total num frames: 3588096. Throughput: 0: 1005.3. Samples: 897510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:36:59,231][21969] Avg episode reward: [(0, '21.798')] +[2025-07-06 16:37:04,232][21969] Fps is (10 sec: 3685.6, 60 sec: 4027.7, 300 sec: 4026.5). Total num frames: 3600384. Throughput: 0: 999.8. Samples: 899724. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:37:04,236][21969] Avg episode reward: [(0, '21.433')] +[2025-07-06 16:37:04,916][22712] Updated weights for policy 0, policy_version 880 (0.0022) +[2025-07-06 16:37:09,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3624960. Throughput: 0: 1003.0. Samples: 905582. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:09,232][21969] Avg episode reward: [(0, '21.534')] +[2025-07-06 16:37:13,705][22712] Updated weights for policy 0, policy_version 890 (0.0021) +[2025-07-06 16:37:14,230][21969] Fps is (10 sec: 4506.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3645440. Throughput: 0: 996.8. Samples: 912320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:14,235][21969] Avg episode reward: [(0, '22.940')] +[2025-07-06 16:37:14,238][22699] Saving new best policy, reward=22.940! +[2025-07-06 16:37:19,231][21969] Fps is (10 sec: 3686.1, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 3661824. Throughput: 0: 999.5. Samples: 914732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:19,236][21969] Avg episode reward: [(0, '22.712')] +[2025-07-06 16:37:24,230][21969] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3682304. Throughput: 0: 994.7. Samples: 920424. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:24,232][21969] Avg episode reward: [(0, '22.144')] +[2025-07-06 16:37:24,478][22712] Updated weights for policy 0, policy_version 900 (0.0020) +[2025-07-06 16:37:29,230][21969] Fps is (10 sec: 4505.9, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3706880. 
Throughput: 0: 998.5. Samples: 927352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-07-06 16:37:29,232][21969] Avg episode reward: [(0, '21.860')] +[2025-07-06 16:37:29,241][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000905_3706880.pth... +[2025-07-06 16:37:29,354][22699] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000883_3616768.pth +[2025-07-06 16:37:34,231][21969] Fps is (10 sec: 3686.2, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 3719168. Throughput: 0: 995.7. Samples: 929680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:34,233][21969] Avg episode reward: [(0, '20.968')] +[2025-07-06 16:37:35,166][22712] Updated weights for policy 0, policy_version 910 (0.0023) +[2025-07-06 16:37:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3743744. Throughput: 0: 1008.6. Samples: 935864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:37:39,232][21969] Avg episode reward: [(0, '19.603')] +[2025-07-06 16:37:43,762][22712] Updated weights for policy 0, policy_version 920 (0.0022) +[2025-07-06 16:37:44,231][21969] Fps is (10 sec: 4915.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3768320. Throughput: 0: 1012.3. Samples: 943064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:37:44,235][21969] Avg episode reward: [(0, '19.605')] +[2025-07-06 16:37:49,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3784704. Throughput: 0: 1013.5. Samples: 945330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:37:49,234][21969] Avg episode reward: [(0, '19.392')] +[2025-07-06 16:37:54,094][22712] Updated weights for policy 0, policy_version 930 (0.0018) +[2025-07-06 16:37:54,230][21969] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3809280. Throughput: 0: 1024.8. Samples: 951700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:37:54,235][21969] Avg episode reward: [(0, '20.422')] +[2025-07-06 16:37:59,230][21969] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3829760. Throughput: 0: 1029.9. Samples: 958664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-07-06 16:37:59,235][21969] Avg episode reward: [(0, '21.771')] +[2025-07-06 16:38:04,231][21969] Fps is (10 sec: 3686.3, 60 sec: 4096.1, 300 sec: 4040.5). Total num frames: 3846144. Throughput: 0: 1021.9. Samples: 960718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:04,235][21969] Avg episode reward: [(0, '23.000')] +[2025-07-06 16:38:04,240][22699] Saving new best policy, reward=23.000! +[2025-07-06 16:38:04,744][22712] Updated weights for policy 0, policy_version 940 (0.0020) +[2025-07-06 16:38:09,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3870720. Throughput: 0: 1037.8. Samples: 967126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:38:09,237][21969] Avg episode reward: [(0, '22.990')] +[2025-07-06 16:38:13,725][22712] Updated weights for policy 0, policy_version 950 (0.0025) +[2025-07-06 16:38:14,230][21969] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3891200. Throughput: 0: 1028.3. Samples: 973624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:14,232][21969] Avg episode reward: [(0, '23.600')] +[2025-07-06 16:38:14,233][22699] Saving new best policy, reward=23.600! 
+[2025-07-06 16:38:19,231][21969] Fps is (10 sec: 3686.1, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3907584. Throughput: 0: 1023.2. Samples: 975724. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:38:19,235][21969] Avg episode reward: [(0, '22.631')] +[2025-07-06 16:38:24,230][21969] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3928064. Throughput: 0: 1028.1. Samples: 982128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:24,235][21969] Avg episode reward: [(0, '21.482')] +[2025-07-06 16:38:24,327][22712] Updated weights for policy 0, policy_version 960 (0.0016) +[2025-07-06 16:38:29,230][21969] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3952640. Throughput: 0: 1013.2. Samples: 988660. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:29,232][21969] Avg episode reward: [(0, '20.784')] +[2025-07-06 16:38:34,230][21969] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 3969024. Throughput: 0: 1008.2. Samples: 990698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-07-06 16:38:34,235][21969] Avg episode reward: [(0, '20.967')] +[2025-07-06 16:38:35,124][22712] Updated weights for policy 0, policy_version 970 (0.0024) +[2025-07-06 16:38:39,230][21969] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 3989504. Throughput: 0: 1016.2. Samples: 997428. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-07-06 16:38:39,235][21969] Avg episode reward: [(0, '21.323')] +[2025-07-06 16:38:42,191][22699] Stopping Batcher_0... +[2025-07-06 16:38:42,193][22699] Loop batcher_evt_loop terminating... +[2025-07-06 16:38:42,193][21969] Component Batcher_0 stopped! +[2025-07-06 16:38:42,194][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:38:42,255][22712] Weights refcount: 2 0 +[2025-07-06 16:38:42,261][21969] Component InferenceWorker_p0-w0 stopped! +[2025-07-06 16:38:42,263][22712] Stopping InferenceWorker_p0-w0... +[2025-07-06 16:38:42,264][22712] Loop inference_proc0-0_evt_loop terminating... +[2025-07-06 16:38:42,360][22699] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:38:42,528][21969] Component RolloutWorker_w5 stopped! +[2025-07-06 16:38:42,530][22717] Stopping RolloutWorker_w5... +[2025-07-06 16:38:42,533][21969] Component LearnerWorker_p0 stopped! +[2025-07-06 16:38:42,531][22717] Loop rollout_proc5_evt_loop terminating... +[2025-07-06 16:38:42,533][22699] Stopping LearnerWorker_p0... +[2025-07-06 16:38:42,538][22699] Loop learner_proc0_evt_loop terminating... +[2025-07-06 16:38:42,550][21969] Component RolloutWorker_w7 stopped! +[2025-07-06 16:38:42,551][22718] Stopping RolloutWorker_w7... +[2025-07-06 16:38:42,553][22718] Loop rollout_proc7_evt_loop terminating... +[2025-07-06 16:38:42,557][21969] Component RolloutWorker_w1 stopped! +[2025-07-06 16:38:42,558][22714] Stopping RolloutWorker_w1... +[2025-07-06 16:38:42,559][22714] Loop rollout_proc1_evt_loop terminating... +[2025-07-06 16:38:42,573][21969] Component RolloutWorker_w3 stopped! +[2025-07-06 16:38:42,572][22716] Stopping RolloutWorker_w3... +[2025-07-06 16:38:42,574][22716] Loop rollout_proc3_evt_loop terminating... +[2025-07-06 16:38:42,728][22720] Stopping RolloutWorker_w6... +[2025-07-06 16:38:42,729][22720] Loop rollout_proc6_evt_loop terminating... +[2025-07-06 16:38:42,728][21969] Component RolloutWorker_w6 stopped! 
+[2025-07-06 16:38:42,739][22715] Stopping RolloutWorker_w2... +[2025-07-06 16:38:42,739][21969] Component RolloutWorker_w2 stopped! +[2025-07-06 16:38:42,743][22715] Loop rollout_proc2_evt_loop terminating... +[2025-07-06 16:38:42,754][21969] Component RolloutWorker_w0 stopped! +[2025-07-06 16:38:42,758][22713] Stopping RolloutWorker_w0... +[2025-07-06 16:38:42,759][22713] Loop rollout_proc0_evt_loop terminating... +[2025-07-06 16:38:42,765][21969] Component RolloutWorker_w4 stopped! +[2025-07-06 16:38:42,769][21969] Waiting for process learner_proc0 to stop... +[2025-07-06 16:38:42,772][22719] Stopping RolloutWorker_w4... +[2025-07-06 16:38:42,774][22719] Loop rollout_proc4_evt_loop terminating... +[2025-07-06 16:38:44,856][21969] Waiting for process inference_proc0-0 to join... +[2025-07-06 16:38:44,875][21969] Waiting for process rollout_proc0 to join... +[2025-07-06 16:38:47,572][21969] Waiting for process rollout_proc1 to join... +[2025-07-06 16:38:47,593][21969] Waiting for process rollout_proc2 to join... +[2025-07-06 16:38:47,594][21969] Waiting for process rollout_proc3 to join... +[2025-07-06 16:38:47,596][21969] Waiting for process rollout_proc4 to join... +[2025-07-06 16:38:47,597][21969] Waiting for process rollout_proc5 to join... +[2025-07-06 16:38:47,599][21969] Waiting for process rollout_proc6 to join... +[2025-07-06 16:38:47,600][21969] Waiting for process rollout_proc7 to join... +[2025-07-06 16:38:47,601][21969] Batcher 0 profile tree view: +batching: 26.3992, releasing_batches: 0.0226 +[2025-07-06 16:38:47,602][21969] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 402.5665 +update_model: 8.3738 + weight_update: 0.0015 +one_step: 0.0097 + handle_policy_step: 560.5832 + deserialize: 13.3991, stack: 2.9791, obs_to_device_normalize: 117.9687, forward: 287.5495, send_messages: 27.8054 + prepare_outputs: 86.7100 + to_cpu: 53.6134 +[2025-07-06 16:38:47,603][21969] Learner 0 profile tree view: +misc: 0.0049, prepare_batch: 12.4655 +train: 73.3782 + epoch_init: 0.0045, minibatch_init: 0.0077, losses_postprocess: 0.7122, kl_divergence: 0.7295, after_optimizer: 33.2928 + calculate_losses: 25.8418 + losses_init: 0.0152, forward_head: 1.2984, bptt_initial: 17.2304, tail: 1.0916, advantages_returns: 0.3388, losses: 3.7090 + bptt: 1.8873 + bptt_forward_core: 1.8232 + update: 12.2045 + clip: 0.9781 +[2025-07-06 16:38:47,604][21969] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.2348, enqueue_policy_requests: 94.7937, env_step: 801.9069, overhead: 11.5350, complete_rollouts: 7.4212 +save_policy_outputs: 17.4529 + split_output_tensors: 6.8550 +[2025-07-06 16:38:47,605][21969] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.2380, enqueue_policy_requests: 102.3670, env_step: 787.5982, overhead: 11.1607, complete_rollouts: 6.1115 +save_policy_outputs: 16.6501 + split_output_tensors: 6.4323 +[2025-07-06 16:38:47,606][21969] Loop Runner_EvtLoop terminating... +[2025-07-06 16:38:47,607][21969] Runner profile tree view: +main_loop: 1033.8836 +[2025-07-06 16:38:47,607][21969] Collected {0: 4005888}, FPS: 3874.6 +[2025-07-06 16:58:14,453][21969] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 16:58:14,454][21969] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 16:58:14,455][21969] Adding new argument 'no_render'=True that is not in the saved config file! 
+[2025-07-06 16:58:14,456][21969] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 16:58:14,457][21969] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 16:58:14,458][21969] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 16:58:14,459][21969] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 16:58:14,460][21969] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 16:58:14,461][21969] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-07-06 16:58:14,462][21969] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-07-06 16:58:14,463][21969] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 16:58:14,464][21969] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 16:58:14,465][21969] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 16:58:14,466][21969] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 16:58:14,467][21969] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 16:58:14,496][21969] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-07-06 16:58:14,498][21969] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 16:58:14,500][21969] RunningMeanStd input shape: (1,) +[2025-07-06 16:58:14,513][21969] ConvEncoder: input_channels=3 +[2025-07-06 16:58:14,616][21969] Conv encoder output size: 512 +[2025-07-06 16:58:14,617][21969] Policy head output size: 512 +[2025-07-06 16:58:14,874][21969] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 16:58:15,647][21969] Num frames 100... +[2025-07-06 16:58:15,775][21969] Num frames 200... +[2025-07-06 16:58:15,904][21969] Num frames 300... +[2025-07-06 16:58:16,039][21969] Num frames 400... +[2025-07-06 16:58:16,167][21969] Num frames 500... +[2025-07-06 16:58:16,310][21969] Num frames 600... +[2025-07-06 16:58:16,437][21969] Num frames 700... +[2025-07-06 16:58:16,567][21969] Num frames 800... +[2025-07-06 16:58:16,694][21969] Num frames 900... +[2025-07-06 16:58:16,834][21969] Avg episode rewards: #0: 19.640, true rewards: #0: 9.640 +[2025-07-06 16:58:16,835][21969] Avg episode reward: 19.640, avg true_objective: 9.640 +[2025-07-06 16:58:16,883][21969] Num frames 1000... +[2025-07-06 16:58:17,012][21969] Num frames 1100... +[2025-07-06 16:58:17,145][21969] Num frames 1200... +[2025-07-06 16:58:17,282][21969] Num frames 1300... +[2025-07-06 16:58:17,414][21969] Num frames 1400... +[2025-07-06 16:58:17,543][21969] Num frames 1500... +[2025-07-06 16:58:17,673][21969] Num frames 1600... +[2025-07-06 16:58:17,815][21969] Avg episode rewards: #0: 16.840, true rewards: #0: 8.340 +[2025-07-06 16:58:17,816][21969] Avg episode reward: 16.840, avg true_objective: 8.340 +[2025-07-06 16:58:17,860][21969] Num frames 1700... +[2025-07-06 16:58:17,989][21969] Num frames 1800... +[2025-07-06 16:58:18,125][21969] Num frames 1900... +[2025-07-06 16:58:18,256][21969] Num frames 2000... +[2025-07-06 16:58:18,435][21969] Num frames 2100... +[2025-07-06 16:58:18,618][21969] Num frames 2200... +[2025-07-06 16:58:18,792][21969] Num frames 2300... +[2025-07-06 16:58:18,963][21969] Num frames 2400... 
+[2025-07-06 16:58:19,143][21969] Num frames 2500... +[2025-07-06 16:58:19,323][21969] Num frames 2600... +[2025-07-06 16:58:19,501][21969] Num frames 2700... +[2025-07-06 16:58:19,671][21969] Num frames 2800... +[2025-07-06 16:58:19,847][21969] Num frames 2900... +[2025-07-06 16:58:20,037][21969] Num frames 3000... +[2025-07-06 16:58:20,217][21969] Num frames 3100... +[2025-07-06 16:58:20,409][21969] Num frames 3200... +[2025-07-06 16:58:20,598][21969] Num frames 3300... +[2025-07-06 16:58:20,734][21969] Num frames 3400... +[2025-07-06 16:58:20,867][21969] Num frames 3500... +[2025-07-06 16:58:21,002][21969] Num frames 3600... +[2025-07-06 16:58:21,134][21969] Num frames 3700... +[2025-07-06 16:58:21,282][21969] Avg episode rewards: #0: 30.893, true rewards: #0: 12.560 +[2025-07-06 16:58:21,283][21969] Avg episode reward: 30.893, avg true_objective: 12.560 +[2025-07-06 16:58:21,326][21969] Num frames 3800... +[2025-07-06 16:58:21,456][21969] Num frames 3900... +[2025-07-06 16:58:21,596][21969] Num frames 4000... +[2025-07-06 16:58:21,730][21969] Num frames 4100... +[2025-07-06 16:58:21,862][21969] Num frames 4200... +[2025-07-06 16:58:21,994][21969] Num frames 4300... +[2025-07-06 16:58:22,066][21969] Avg episode rewards: #0: 25.030, true rewards: #0: 10.780 +[2025-07-06 16:58:22,067][21969] Avg episode reward: 25.030, avg true_objective: 10.780 +[2025-07-06 16:58:22,182][21969] Num frames 4400... +[2025-07-06 16:58:22,314][21969] Num frames 4500... +[2025-07-06 16:58:22,448][21969] Num frames 4600... +[2025-07-06 16:58:22,588][21969] Num frames 4700... +[2025-07-06 16:58:22,718][21969] Num frames 4800... +[2025-07-06 16:58:22,850][21969] Num frames 4900... +[2025-07-06 16:58:22,982][21969] Num frames 5000... +[2025-07-06 16:58:23,114][21969] Num frames 5100... +[2025-07-06 16:58:23,247][21969] Num frames 5200... +[2025-07-06 16:58:23,376][21969] Num frames 5300... +[2025-07-06 16:58:23,508][21969] Num frames 5400... +[2025-07-06 16:58:23,647][21969] Num frames 5500... +[2025-07-06 16:58:23,780][21969] Num frames 5600... +[2025-07-06 16:58:23,910][21969] Num frames 5700... +[2025-07-06 16:58:24,068][21969] Avg episode rewards: #0: 27.160, true rewards: #0: 11.560 +[2025-07-06 16:58:24,069][21969] Avg episode reward: 27.160, avg true_objective: 11.560 +[2025-07-06 16:58:24,096][21969] Num frames 5800... +[2025-07-06 16:58:24,236][21969] Num frames 5900... +[2025-07-06 16:58:24,364][21969] Num frames 6000... +[2025-07-06 16:58:24,493][21969] Num frames 6100... +[2025-07-06 16:58:24,635][21969] Num frames 6200... +[2025-07-06 16:58:24,765][21969] Num frames 6300... +[2025-07-06 16:58:24,896][21969] Num frames 6400... +[2025-07-06 16:58:25,029][21969] Num frames 6500... +[2025-07-06 16:58:25,159][21969] Num frames 6600... +[2025-07-06 16:58:25,288][21969] Num frames 6700... +[2025-07-06 16:58:25,417][21969] Num frames 6800... +[2025-07-06 16:58:25,551][21969] Num frames 6900... +[2025-07-06 16:58:25,695][21969] Num frames 7000... +[2025-07-06 16:58:25,828][21969] Num frames 7100... +[2025-07-06 16:58:25,967][21969] Num frames 7200... +[2025-07-06 16:58:26,103][21969] Num frames 7300... +[2025-07-06 16:58:26,239][21969] Num frames 7400... +[2025-07-06 16:58:26,376][21969] Num frames 7500... +[2025-07-06 16:58:26,508][21969] Num frames 7600... +[2025-07-06 16:58:26,650][21969] Num frames 7700... +[2025-07-06 16:58:26,786][21969] Num frames 7800... 
+[2025-07-06 16:58:26,945][21969] Avg episode rewards: #0: 32.300, true rewards: #0: 13.133 +[2025-07-06 16:58:26,946][21969] Avg episode reward: 32.300, avg true_objective: 13.133 +[2025-07-06 16:58:26,975][21969] Num frames 7900... +[2025-07-06 16:58:27,107][21969] Num frames 8000... +[2025-07-06 16:58:27,236][21969] Num frames 8100... +[2025-07-06 16:58:27,371][21969] Num frames 8200... +[2025-07-06 16:58:27,502][21969] Num frames 8300... +[2025-07-06 16:58:27,635][21969] Num frames 8400... +[2025-07-06 16:58:27,780][21969] Num frames 8500... +[2025-07-06 16:58:27,912][21969] Num frames 8600... +[2025-07-06 16:58:28,046][21969] Num frames 8700... +[2025-07-06 16:58:28,185][21969] Num frames 8800... +[2025-07-06 16:58:28,294][21969] Avg episode rewards: #0: 30.628, true rewards: #0: 12.629 +[2025-07-06 16:58:28,295][21969] Avg episode reward: 30.628, avg true_objective: 12.629 +[2025-07-06 16:58:28,374][21969] Num frames 8900... +[2025-07-06 16:58:28,504][21969] Num frames 9000... +[2025-07-06 16:58:28,633][21969] Num frames 9100... +[2025-07-06 16:58:28,776][21969] Num frames 9200... +[2025-07-06 16:58:28,907][21969] Num frames 9300... +[2025-07-06 16:58:29,040][21969] Num frames 9400... +[2025-07-06 16:58:29,158][21969] Avg episode rewards: #0: 28.435, true rewards: #0: 11.810 +[2025-07-06 16:58:29,159][21969] Avg episode reward: 28.435, avg true_objective: 11.810 +[2025-07-06 16:58:29,232][21969] Num frames 9500... +[2025-07-06 16:58:29,366][21969] Num frames 9600... +[2025-07-06 16:58:29,496][21969] Num frames 9700... +[2025-07-06 16:58:29,625][21969] Num frames 9800... +[2025-07-06 16:58:29,815][21969] Avg episode rewards: #0: 25.884, true rewards: #0: 10.996 +[2025-07-06 16:58:29,816][21969] Avg episode reward: 25.884, avg true_objective: 10.996 +[2025-07-06 16:58:29,824][21969] Num frames 9900... +[2025-07-06 16:58:29,954][21969] Num frames 10000... +[2025-07-06 16:58:30,092][21969] Num frames 10100... +[2025-07-06 16:58:30,223][21969] Num frames 10200... +[2025-07-06 16:58:30,354][21969] Num frames 10300... +[2025-07-06 16:58:30,484][21969] Num frames 10400... +[2025-07-06 16:58:30,625][21969] Num frames 10500... +[2025-07-06 16:58:30,812][21969] Num frames 10600... +[2025-07-06 16:58:30,925][21969] Avg episode rewards: #0: 24.628, true rewards: #0: 10.628 +[2025-07-06 16:58:30,927][21969] Avg episode reward: 24.628, avg true_objective: 10.628 +[2025-07-06 16:59:38,353][21969] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-07-06 17:02:10,224][21969] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-07-06 17:02:10,225][21969] Overriding arg 'num_workers' with value 1 passed from command line +[2025-07-06 17:02:10,226][21969] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-07-06 17:02:10,227][21969] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-07-06 17:02:10,228][21969] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-07-06 17:02:10,229][21969] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-07-06 17:02:10,229][21969] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-07-06 17:02:10,231][21969] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-07-06 17:02:10,231][21969] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
+[2025-07-06 17:02:10,232][21969] Adding new argument 'hf_repository'='zhngq/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-07-06 17:02:10,233][21969] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-07-06 17:02:10,235][21969] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-07-06 17:02:10,236][21969] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-07-06 17:02:10,237][21969] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-07-06 17:02:10,238][21969] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-07-06 17:02:10,263][21969] RunningMeanStd input shape: (3, 72, 128) +[2025-07-06 17:02:10,265][21969] RunningMeanStd input shape: (1,) +[2025-07-06 17:02:10,276][21969] ConvEncoder: input_channels=3 +[2025-07-06 17:02:10,308][21969] Conv encoder output size: 512 +[2025-07-06 17:02:10,309][21969] Policy head output size: 512 +[2025-07-06 17:02:10,327][21969] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-07-06 17:02:10,769][21969] Num frames 100... +[2025-07-06 17:02:10,901][21969] Num frames 200... +[2025-07-06 17:02:11,032][21969] Num frames 300... +[2025-07-06 17:02:11,166][21969] Num frames 400... +[2025-07-06 17:02:11,306][21969] Num frames 500... +[2025-07-06 17:02:11,436][21969] Num frames 600... +[2025-07-06 17:02:11,566][21969] Num frames 700... +[2025-07-06 17:02:11,707][21969] Num frames 800... +[2025-07-06 17:02:11,804][21969] Avg episode rewards: #0: 16.320, true rewards: #0: 8.320 +[2025-07-06 17:02:11,805][21969] Avg episode reward: 16.320, avg true_objective: 8.320 +[2025-07-06 17:02:11,897][21969] Num frames 900... +[2025-07-06 17:02:12,027][21969] Num frames 1000... +[2025-07-06 17:02:12,164][21969] Num frames 1100... +[2025-07-06 17:02:12,307][21969] Num frames 1200... +[2025-07-06 17:02:12,434][21969] Num frames 1300... +[2025-07-06 17:02:12,506][21969] Avg episode rewards: #0: 11.560, true rewards: #0: 6.560 +[2025-07-06 17:02:12,507][21969] Avg episode reward: 11.560, avg true_objective: 6.560 +[2025-07-06 17:02:12,622][21969] Num frames 1400... +[2025-07-06 17:02:12,750][21969] Num frames 1500... +[2025-07-06 17:02:12,887][21969] Num frames 1600... +[2025-07-06 17:02:13,018][21969] Num frames 1700... +[2025-07-06 17:02:13,150][21969] Num frames 1800... +[2025-07-06 17:02:13,280][21969] Num frames 1900... +[2025-07-06 17:02:13,418][21969] Num frames 2000... +[2025-07-06 17:02:13,534][21969] Avg episode rewards: #0: 12.160, true rewards: #0: 6.827 +[2025-07-06 17:02:13,535][21969] Avg episode reward: 12.160, avg true_objective: 6.827 +[2025-07-06 17:02:13,602][21969] Num frames 2100... +[2025-07-06 17:02:13,731][21969] Num frames 2200... +[2025-07-06 17:02:13,897][21969] Num frames 2300... +[2025-07-06 17:02:14,080][21969] Num frames 2400... +[2025-07-06 17:02:14,272][21969] Num frames 2500... +[2025-07-06 17:02:14,457][21969] Num frames 2600... +[2025-07-06 17:02:14,664][21969] Avg episode rewards: #0: 12.970, true rewards: #0: 6.720 +[2025-07-06 17:02:14,665][21969] Avg episode reward: 12.970, avg true_objective: 6.720 +[2025-07-06 17:02:14,688][21969] Num frames 2700... +[2025-07-06 17:02:14,857][21969] Num frames 2800... +[2025-07-06 17:02:15,025][21969] Num frames 2900... +[2025-07-06 17:02:15,193][21969] Num frames 3000... +[2025-07-06 17:02:15,371][21969] Num frames 3100... 
+[2025-07-06 17:02:15,438][21969] Avg episode rewards: #0: 12.008, true rewards: #0: 6.208 +[2025-07-06 17:02:15,439][21969] Avg episode reward: 12.008, avg true_objective: 6.208 +[2025-07-06 17:02:15,617][21969] Num frames 3200... +[2025-07-06 17:02:15,800][21969] Num frames 3300... +[2025-07-06 17:02:15,952][21969] Num frames 3400... +[2025-07-06 17:02:16,120][21969] Avg episode rewards: #0: 10.647, true rewards: #0: 5.813 +[2025-07-06 17:02:16,121][21969] Avg episode reward: 10.647, avg true_objective: 5.813 +[2025-07-06 17:02:16,138][21969] Num frames 3500... +[2025-07-06 17:02:16,267][21969] Num frames 3600... +[2025-07-06 17:02:16,398][21969] Num frames 3700... +[2025-07-06 17:02:16,540][21969] Num frames 3800... +[2025-07-06 17:02:16,659][21969] Avg episode rewards: #0: 9.927, true rewards: #0: 5.499 +[2025-07-06 17:02:16,660][21969] Avg episode reward: 9.927, avg true_objective: 5.499 +[2025-07-06 17:02:16,728][21969] Num frames 3900... +[2025-07-06 17:02:16,856][21969] Num frames 4000... +[2025-07-06 17:02:16,985][21969] Num frames 4100... +[2025-07-06 17:02:17,114][21969] Num frames 4200... +[2025-07-06 17:02:17,243][21969] Num frames 4300... +[2025-07-06 17:02:17,412][21969] Num frames 4400... +[2025-07-06 17:02:17,560][21969] Num frames 4500... +[2025-07-06 17:02:17,693][21969] Num frames 4600... +[2025-07-06 17:02:17,821][21969] Num frames 4700... +[2025-07-06 17:02:17,950][21969] Num frames 4800... +[2025-07-06 17:02:18,091][21969] Num frames 4900... +[2025-07-06 17:02:18,223][21969] Num frames 5000... +[2025-07-06 17:02:18,352][21969] Num frames 5100... +[2025-07-06 17:02:18,528][21969] Avg episode rewards: #0: 12.116, true rewards: #0: 6.491 +[2025-07-06 17:02:18,530][21969] Avg episode reward: 12.116, avg true_objective: 6.491 +[2025-07-06 17:02:18,540][21969] Num frames 5200... +[2025-07-06 17:02:18,668][21969] Num frames 5300... +[2025-07-06 17:02:18,797][21969] Num frames 5400... +[2025-07-06 17:02:18,931][21969] Num frames 5500... +[2025-07-06 17:02:19,062][21969] Num frames 5600... +[2025-07-06 17:02:19,188][21969] Num frames 5700... +[2025-07-06 17:02:19,319][21969] Num frames 5800... +[2025-07-06 17:02:19,446][21969] Num frames 5900... +[2025-07-06 17:02:19,588][21969] Num frames 6000... +[2025-07-06 17:02:19,717][21969] Num frames 6100... +[2025-07-06 17:02:19,884][21969] Avg episode rewards: #0: 12.983, true rewards: #0: 6.872 +[2025-07-06 17:02:19,885][21969] Avg episode reward: 12.983, avg true_objective: 6.872 +[2025-07-06 17:02:19,905][21969] Num frames 6200... +[2025-07-06 17:02:20,034][21969] Num frames 6300... +[2025-07-06 17:02:20,164][21969] Num frames 6400... +[2025-07-06 17:02:20,290][21969] Num frames 6500... +[2025-07-06 17:02:20,421][21969] Num frames 6600... +[2025-07-06 17:02:20,555][21969] Num frames 6700... +[2025-07-06 17:02:20,694][21969] Num frames 6800... +[2025-07-06 17:02:20,863][21969] Avg episode rewards: #0: 13.289, true rewards: #0: 6.889 +[2025-07-06 17:02:20,864][21969] Avg episode reward: 13.289, avg true_objective: 6.889 +[2025-07-06 17:03:00,125][21969] Replay video saved to /content/train_dir/default_experiment/replay.mp4!