diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1236 @@ +[2025-08-18 16:45:17,470][02710] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-08-18 16:45:17,472][02710] Rollout worker 0 uses device cpu +[2025-08-18 16:45:17,473][02710] Rollout worker 1 uses device cpu +[2025-08-18 16:45:17,475][02710] Rollout worker 2 uses device cpu +[2025-08-18 16:45:17,475][02710] Rollout worker 3 uses device cpu +[2025-08-18 16:45:17,476][02710] Rollout worker 4 uses device cpu +[2025-08-18 16:45:17,477][02710] Rollout worker 5 uses device cpu +[2025-08-18 16:45:17,478][02710] Rollout worker 6 uses device cpu +[2025-08-18 16:45:17,479][02710] Rollout worker 7 uses device cpu +[2025-08-18 16:45:17,646][02710] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:17,647][02710] InferenceWorker_p0-w0: min num requests: 2 +[2025-08-18 16:45:17,677][02710] Starting all processes... +[2025-08-18 16:45:17,678][02710] Starting process learner_proc0 +[2025-08-18 16:45:17,729][02710] Starting all processes... +[2025-08-18 16:45:17,738][02710] Starting process inference_proc0-0 +[2025-08-18 16:45:17,741][02710] Starting process rollout_proc0 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc1 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc2 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc3 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc4 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc5 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc6 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc7 +[2025-08-18 16:45:34,168][02847] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:34,179][02847] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-18 16:45:34,276][02847] Num visible devices: 1 +[2025-08-18 16:45:34,417][02834] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:34,424][02834] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-18 16:45:34,514][02834] Num visible devices: 1 +[2025-08-18 16:45:34,530][02834] Starting seed is not provided +[2025-08-18 16:45:34,531][02834] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:34,532][02834] Initializing actor-critic model on device cuda:0 +[2025-08-18 16:45:34,533][02834] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 16:45:34,536][02834] RunningMeanStd input shape: (1,) +[2025-08-18 16:45:34,604][02834] ConvEncoder: input_channels=3 +[2025-08-18 16:45:35,470][02849] Worker 1 uses CPU cores [1] +[2025-08-18 16:45:35,692][02852] Worker 5 uses CPU cores [1] +[2025-08-18 16:45:35,692][02854] Worker 7 uses CPU cores [1] +[2025-08-18 16:45:35,700][02851] Worker 3 uses CPU cores [1] +[2025-08-18 16:45:35,724][02850] Worker 2 uses CPU cores [0] +[2025-08-18 16:45:35,789][02848] Worker 0 uses CPU cores [0] +[2025-08-18 16:45:35,851][02855] Worker 6 uses CPU cores [0] +[2025-08-18 16:45:35,856][02834] Conv encoder output size: 512 +[2025-08-18 16:45:35,857][02834] Policy head output size: 512 +[2025-08-18 16:45:35,937][02834] Created Actor Critic model with architecture: +[2025-08-18 16:45:35,938][02834] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + 
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2025-08-18 16:45:35,945][02853] Worker 4 uses CPU cores [0]
+[2025-08-18 16:45:36,224][02834] Using optimizer
+[2025-08-18 16:45:37,641][02710] Heartbeat connected on Batcher_0
+[2025-08-18 16:45:37,647][02710] Heartbeat connected on InferenceWorker_p0-w0
+[2025-08-18 16:45:37,657][02710] Heartbeat connected on RolloutWorker_w0
+[2025-08-18 16:45:37,658][02710] Heartbeat connected on RolloutWorker_w1
+[2025-08-18 16:45:37,660][02710] Heartbeat connected on RolloutWorker_w2
+[2025-08-18 16:45:37,666][02710] Heartbeat connected on RolloutWorker_w4
+[2025-08-18 16:45:37,668][02710] Heartbeat connected on RolloutWorker_w3
+[2025-08-18 16:45:37,669][02710] Heartbeat connected on RolloutWorker_w5
+[2025-08-18 16:45:37,673][02710] Heartbeat connected on RolloutWorker_w6
+[2025-08-18 16:45:37,677][02710] Heartbeat connected on RolloutWorker_w7
+[2025-08-18 16:45:40,966][02834] No checkpoints found
+[2025-08-18 16:45:40,967][02834] Did not load from checkpoint, starting from scratch!
+[2025-08-18 16:45:40,967][02834] Initialized policy 0 weights for model version 0
+[2025-08-18 16:45:40,970][02834] LearnerWorker_p0 finished initialization!
+[2025-08-18 16:45:40,970][02834] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-08-18 16:45:40,982][02710] Heartbeat connected on LearnerWorker_p0
+[2025-08-18 16:45:41,117][02847] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-18 16:45:41,118][02847] RunningMeanStd input shape: (1,)
+[2025-08-18 16:45:41,130][02847] ConvEncoder: input_channels=3
+[2025-08-18 16:45:41,232][02847] Conv encoder output size: 512
+[2025-08-18 16:45:41,232][02847] Policy head output size: 512
+[2025-08-18 16:45:41,270][02710] Inference worker 0-0 is ready!
+[2025-08-18 16:45:41,271][02710] All inference workers are ready! Signal rollout workers to start!
+[2025-08-18 16:45:41,521][02849] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,524][02855] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,534][02848] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,535][02850] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,550][02851] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,557][02852] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,569][02854] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,565][02853] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:42,421][02710] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-18 16:45:42,761][02850] Decorrelating experience for 0 frames... +[2025-08-18 16:45:42,761][02851] Decorrelating experience for 0 frames... +[2025-08-18 16:45:42,763][02853] Decorrelating experience for 0 frames... +[2025-08-18 16:45:43,139][02853] Decorrelating experience for 32 frames... +[2025-08-18 16:45:43,145][02851] Decorrelating experience for 32 frames... +[2025-08-18 16:45:43,701][02851] Decorrelating experience for 64 frames... +[2025-08-18 16:45:43,974][02850] Decorrelating experience for 32 frames... +[2025-08-18 16:45:44,137][02851] Decorrelating experience for 96 frames... +[2025-08-18 16:45:44,186][02853] Decorrelating experience for 64 frames... +[2025-08-18 16:45:44,858][02850] Decorrelating experience for 64 frames... +[2025-08-18 16:45:44,918][02853] Decorrelating experience for 96 frames... +[2025-08-18 16:45:45,337][02850] Decorrelating experience for 96 frames... +[2025-08-18 16:45:47,423][02710] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 336.7. Samples: 1684. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-18 16:45:47,433][02710] Avg episode reward: [(0, '2.807')] +[2025-08-18 16:45:48,920][02834] Signal inference workers to stop experience collection... +[2025-08-18 16:45:48,980][02847] InferenceWorker_p0-w0: stopping experience collection +[2025-08-18 16:45:49,912][02834] Signal inference workers to resume experience collection... +[2025-08-18 16:45:49,913][02847] InferenceWorker_p0-w0: resuming experience collection +[2025-08-18 16:45:52,421][02710] Fps is (10 sec: 1638.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 16384. Throughput: 0: 234.8. Samples: 2348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:45:52,422][02710] Avg episode reward: [(0, '3.675')] +[2025-08-18 16:45:57,421][02710] Fps is (10 sec: 3277.2, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 32768. Throughput: 0: 541.5. Samples: 8122. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:45:57,424][02710] Avg episode reward: [(0, '3.983')] +[2025-08-18 16:45:58,451][02847] Updated weights for policy 0, policy_version 10 (0.0014) +[2025-08-18 16:46:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 640.4. Samples: 12808. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:02,424][02710] Avg episode reward: [(0, '4.358')] +[2025-08-18 16:46:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 2621.5, 300 sec: 2621.5). Total num frames: 65536. Throughput: 0: 611.1. Samples: 15278. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:07,426][02710] Avg episode reward: [(0, '4.387')] +[2025-08-18 16:46:10,606][02847] Updated weights for policy 0, policy_version 20 (0.0012) +[2025-08-18 16:46:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 86016. Throughput: 0: 713.8. Samples: 21414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:12,426][02710] Avg episode reward: [(0, '4.366')] +[2025-08-18 16:46:17,421][02710] Fps is (10 sec: 3686.4, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 749.0. Samples: 26214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:17,607][02710] Avg episode reward: [(0, '4.474')] +[2025-08-18 16:46:17,611][02834] Saving new best policy, reward=4.474! +[2025-08-18 16:46:22,010][02847] Updated weights for policy 0, policy_version 30 (0.0012) +[2025-08-18 16:46:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 122880. Throughput: 0: 728.3. Samples: 29130. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:22,428][02710] Avg episode reward: [(0, '4.356')] +[2025-08-18 16:46:27,423][02710] Fps is (10 sec: 4095.4, 60 sec: 3185.7, 300 sec: 3185.7). Total num frames: 143360. Throughput: 0: 780.8. Samples: 35138. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:27,424][02710] Avg episode reward: [(0, '4.364')] +[2025-08-18 16:46:32,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3113.0, 300 sec: 3113.0). Total num frames: 155648. Throughput: 0: 852.0. Samples: 40024. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:32,425][02710] Avg episode reward: [(0, '4.474')] +[2025-08-18 16:46:33,446][02847] Updated weights for policy 0, policy_version 40 (0.0013) +[2025-08-18 16:46:37,421][02710] Fps is (10 sec: 3277.3, 60 sec: 3202.3, 300 sec: 3202.3). Total num frames: 176128. Throughput: 0: 904.5. Samples: 43052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:37,423][02710] Avg episode reward: [(0, '4.520')] +[2025-08-18 16:46:37,428][02834] Saving new best policy, reward=4.520! +[2025-08-18 16:46:42,422][02710] Fps is (10 sec: 4095.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 196608. Throughput: 0: 902.2. Samples: 48720. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:46:42,423][02710] Avg episode reward: [(0, '4.406')] +[2025-08-18 16:46:44,910][02847] Updated weights for policy 0, policy_version 50 (0.0012) +[2025-08-18 16:46:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 212992. Throughput: 0: 910.2. Samples: 53768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:47,423][02710] Avg episode reward: [(0, '4.282')] +[2025-08-18 16:46:52,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3335.3). Total num frames: 233472. Throughput: 0: 922.8. Samples: 56804. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:52,423][02710] Avg episode reward: [(0, '4.279')] +[2025-08-18 16:46:55,698][02847] Updated weights for policy 0, policy_version 60 (0.0015) +[2025-08-18 16:46:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 901.1. Samples: 61962. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:57,423][02710] Avg episode reward: [(0, '4.481')] +[2025-08-18 16:47:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3328.0). Total num frames: 266240. 
Throughput: 0: 919.1. Samples: 67574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:47:02,426][02710] Avg episode reward: [(0, '4.473')] +[2025-08-18 16:47:06,567][02847] Updated weights for policy 0, policy_version 70 (0.0015) +[2025-08-18 16:47:07,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3373.1). Total num frames: 286720. Throughput: 0: 921.9. Samples: 70616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:07,424][02710] Avg episode reward: [(0, '4.403')] +[2025-08-18 16:47:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 894.4. Samples: 75384. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:47:12,425][02710] Avg episode reward: [(0, '4.476')] +[2025-08-18 16:47:12,431][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth... +[2025-08-18 16:47:17,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3406.1). Total num frames: 323584. Throughput: 0: 918.7. Samples: 81364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:17,429][02710] Avg episode reward: [(0, '4.339')] +[2025-08-18 16:47:18,067][02847] Updated weights for policy 0, policy_version 80 (0.0016) +[2025-08-18 16:47:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 918.2. Samples: 84372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:22,423][02710] Avg episode reward: [(0, '4.223')] +[2025-08-18 16:47:27,422][02710] Fps is (10 sec: 2867.1, 60 sec: 3481.7, 300 sec: 3354.8). Total num frames: 352256. Throughput: 0: 887.9. Samples: 88676. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:47:27,425][02710] Avg episode reward: [(0, '4.369')] +[2025-08-18 16:47:31,789][02847] Updated weights for policy 0, policy_version 90 (0.0014) +[2025-08-18 16:47:32,421][02710] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3351.3). Total num frames: 368640. Throughput: 0: 870.5. Samples: 92942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:47:32,423][02710] Avg episode reward: [(0, '4.571')] +[2025-08-18 16:47:32,433][02834] Saving new best policy, reward=4.571! +[2025-08-18 16:47:37,421][02710] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3348.0). Total num frames: 385024. Throughput: 0: 857.3. Samples: 95384. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:47:37,423][02710] Avg episode reward: [(0, '4.595')] +[2025-08-18 16:47:37,425][02834] Saving new best policy, reward=4.595! +[2025-08-18 16:47:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 860.6. Samples: 100690. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:42,423][02710] Avg episode reward: [(0, '4.709')] +[2025-08-18 16:47:42,431][02834] Saving new best policy, reward=4.709! +[2025-08-18 16:47:43,242][02847] Updated weights for policy 0, policy_version 100 (0.0013) +[2025-08-18 16:47:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3375.1). Total num frames: 421888. Throughput: 0: 868.0. Samples: 106636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:47,423][02710] Avg episode reward: [(0, '4.724')] +[2025-08-18 16:47:47,452][02834] Saving new best policy, reward=4.724! +[2025-08-18 16:47:52,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 842.2. Samples: 108514. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:52,425][02710] Avg episode reward: [(0, '4.591')] +[2025-08-18 16:47:54,745][02847] Updated weights for policy 0, policy_version 110 (0.0012) +[2025-08-18 16:47:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3398.2). Total num frames: 458752. Throughput: 0: 866.7. Samples: 114386. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:57,430][02710] Avg episode reward: [(0, '4.709')] +[2025-08-18 16:48:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 854.9. Samples: 119834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:02,423][02710] Avg episode reward: [(0, '4.757')] +[2025-08-18 16:48:02,429][02834] Saving new best policy, reward=4.757! +[2025-08-18 16:48:06,314][02847] Updated weights for policy 0, policy_version 120 (0.0012) +[2025-08-18 16:48:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 837.6. Samples: 122064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:07,423][02710] Avg episode reward: [(0, '4.793')] +[2025-08-18 16:48:07,426][02834] Saving new best policy, reward=4.793! +[2025-08-18 16:48:12,429][02710] Fps is (10 sec: 3683.4, 60 sec: 3481.1, 300 sec: 3413.2). Total num frames: 512000. Throughput: 0: 874.1. Samples: 128016. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:12,431][02710] Avg episode reward: [(0, '4.621')] +[2025-08-18 16:48:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 888.4. Samples: 132922. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:17,423][02710] Avg episode reward: [(0, '4.410')] +[2025-08-18 16:48:17,796][02847] Updated weights for policy 0, policy_version 130 (0.0013) +[2025-08-18 16:48:22,422][02710] Fps is (10 sec: 3689.3, 60 sec: 3481.6, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 896.6. Samples: 135730. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:22,426][02710] Avg episode reward: [(0, '4.345')] +[2025-08-18 16:48:27,424][02710] Fps is (10 sec: 4094.8, 60 sec: 3618.0, 300 sec: 3450.5). Total num frames: 569344. Throughput: 0: 913.0. Samples: 141776. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:27,426][02710] Avg episode reward: [(0, '4.459')] +[2025-08-18 16:48:28,106][02847] Updated weights for policy 0, policy_version 140 (0.0013) +[2025-08-18 16:48:32,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3445.5). Total num frames: 585728. Throughput: 0: 886.6. Samples: 146534. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:32,423][02710] Avg episode reward: [(0, '4.510')] +[2025-08-18 16:48:37,421][02710] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3440.6). Total num frames: 602112. Throughput: 0: 905.5. Samples: 149262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:37,426][02710] Avg episode reward: [(0, '4.573')] +[2025-08-18 16:48:39,959][02847] Updated weights for policy 0, policy_version 150 (0.0013) +[2025-08-18 16:48:42,423][02710] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3458.8). Total num frames: 622592. Throughput: 0: 906.0. Samples: 155156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:42,424][02710] Avg episode reward: [(0, '4.590')] +[2025-08-18 16:48:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3453.9). 
Total num frames: 638976. Throughput: 0: 889.0. Samples: 159838. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:47,426][02710] Avg episode reward: [(0, '4.541')] +[2025-08-18 16:48:51,438][02847] Updated weights for policy 0, policy_version 160 (0.0013) +[2025-08-18 16:48:52,421][02710] Fps is (10 sec: 3277.4, 60 sec: 3618.1, 300 sec: 3449.3). Total num frames: 655360. Throughput: 0: 906.7. Samples: 162866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:52,425][02710] Avg episode reward: [(0, '4.667')] +[2025-08-18 16:48:57,422][02710] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3444.8). Total num frames: 671744. Throughput: 0: 893.8. Samples: 168232. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:57,427][02710] Avg episode reward: [(0, '4.904')] +[2025-08-18 16:48:57,429][02834] Saving new best policy, reward=4.904! +[2025-08-18 16:49:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3461.1). Total num frames: 692224. Throughput: 0: 900.0. Samples: 173422. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:02,423][02710] Avg episode reward: [(0, '4.826')] +[2025-08-18 16:49:03,084][02847] Updated weights for policy 0, policy_version 170 (0.0014) +[2025-08-18 16:49:07,421][02710] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3476.6). Total num frames: 712704. Throughput: 0: 904.6. Samples: 176438. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:07,423][02710] Avg episode reward: [(0, '4.750')] +[2025-08-18 16:49:12,422][02710] Fps is (10 sec: 3276.5, 60 sec: 3550.3, 300 sec: 3452.3). Total num frames: 724992. Throughput: 0: 881.7. Samples: 181450. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:12,427][02710] Avg episode reward: [(0, '4.574')] +[2025-08-18 16:49:12,435][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth... +[2025-08-18 16:49:14,619][02847] Updated weights for policy 0, policy_version 180 (0.0013) +[2025-08-18 16:49:17,422][02710] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3467.3). Total num frames: 745472. Throughput: 0: 905.3. Samples: 187274. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:17,426][02710] Avg episode reward: [(0, '4.477')] +[2025-08-18 16:49:22,421][02710] Fps is (10 sec: 4096.3, 60 sec: 3618.1, 300 sec: 3481.6). Total num frames: 765952. Throughput: 0: 911.3. Samples: 190272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:22,425][02710] Avg episode reward: [(0, '4.415')] +[2025-08-18 16:49:26,080][02847] Updated weights for policy 0, policy_version 190 (0.0013) +[2025-08-18 16:49:27,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3549.9, 300 sec: 3477.0). Total num frames: 782336. Throughput: 0: 885.2. Samples: 194990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:27,426][02710] Avg episode reward: [(0, '4.695')] +[2025-08-18 16:49:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3490.5). Total num frames: 802816. Throughput: 0: 914.8. Samples: 201004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:32,423][02710] Avg episode reward: [(0, '4.864')] +[2025-08-18 16:49:36,760][02847] Updated weights for policy 0, policy_version 200 (0.0012) +[2025-08-18 16:49:37,422][02710] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3485.9). Total num frames: 819200. Throughput: 0: 913.5. Samples: 203972. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:37,424][02710] Avg episode reward: [(0, '4.826')] +[2025-08-18 16:49:42,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3481.6). Total num frames: 835584. Throughput: 0: 902.1. Samples: 208826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:42,424][02710] Avg episode reward: [(0, '4.796')] +[2025-08-18 16:49:47,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3494.1). Total num frames: 856064. Throughput: 0: 918.3. Samples: 214744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:47,424][02710] Avg episode reward: [(0, '4.792')] +[2025-08-18 16:49:47,779][02847] Updated weights for policy 0, policy_version 210 (0.0017) +[2025-08-18 16:49:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3489.8). Total num frames: 872448. Throughput: 0: 904.8. Samples: 217154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:52,424][02710] Avg episode reward: [(0, '4.772')] +[2025-08-18 16:49:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3501.7). Total num frames: 892928. Throughput: 0: 911.9. Samples: 222484. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:57,423][02710] Avg episode reward: [(0, '4.853')] +[2025-08-18 16:49:59,274][02847] Updated weights for policy 0, policy_version 220 (0.0012) +[2025-08-18 16:50:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3497.4). Total num frames: 909312. Throughput: 0: 912.0. Samples: 228314. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:02,426][02710] Avg episode reward: [(0, '4.972')] +[2025-08-18 16:50:02,445][02834] Saving new best policy, reward=4.972! +[2025-08-18 16:50:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3493.2). Total num frames: 925696. Throughput: 0: 887.0. Samples: 230186. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:07,423][02710] Avg episode reward: [(0, '4.851')] +[2025-08-18 16:50:10,812][02847] Updated weights for policy 0, policy_version 230 (0.0016) +[2025-08-18 16:50:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3504.4). Total num frames: 946176. Throughput: 0: 915.1. Samples: 236166. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:12,423][02710] Avg episode reward: [(0, '4.885')] +[2025-08-18 16:50:17,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3500.2). Total num frames: 962560. Throughput: 0: 902.0. Samples: 241594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:17,425][02710] Avg episode reward: [(0, '4.998')] +[2025-08-18 16:50:17,428][02834] Saving new best policy, reward=4.998! +[2025-08-18 16:50:22,385][02847] Updated weights for policy 0, policy_version 240 (0.0013) +[2025-08-18 16:50:22,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3618.0, 300 sec: 3510.8). Total num frames: 983040. Throughput: 0: 887.1. Samples: 243892. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:50:22,428][02710] Avg episode reward: [(0, '4.881')] +[2025-08-18 16:50:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3506.8). Total num frames: 999424. Throughput: 0: 914.7. Samples: 249988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:27,423][02710] Avg episode reward: [(0, '4.719')] +[2025-08-18 16:50:32,421][02710] Fps is (10 sec: 3277.7, 60 sec: 3549.9, 300 sec: 3502.8). Total num frames: 1015808. Throughput: 0: 887.1. Samples: 254662. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:32,423][02710] Avg episode reward: [(0, '4.856')] +[2025-08-18 16:50:33,787][02847] Updated weights for policy 0, policy_version 250 (0.0014) +[2025-08-18 16:50:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 901.9. Samples: 257738. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:37,425][02710] Avg episode reward: [(0, '5.034')] +[2025-08-18 16:50:37,428][02834] Saving new best policy, reward=5.034! +[2025-08-18 16:50:42,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1056768. Throughput: 0: 917.1. Samples: 263754. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:42,423][02710] Avg episode reward: [(0, '5.200')] +[2025-08-18 16:50:42,438][02834] Saving new best policy, reward=5.200! +[2025-08-18 16:50:44,994][02847] Updated weights for policy 0, policy_version 260 (0.0013) +[2025-08-18 16:50:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1073152. Throughput: 0: 892.9. Samples: 268496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:47,425][02710] Avg episode reward: [(0, '5.178')] +[2025-08-18 16:50:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1093632. Throughput: 0: 918.3. Samples: 271508. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:52,423][02710] Avg episode reward: [(0, '5.199')] +[2025-08-18 16:50:55,422][02847] Updated weights for policy 0, policy_version 270 (0.0012) +[2025-08-18 16:50:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1110016. Throughput: 0: 915.2. Samples: 277348. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:57,429][02710] Avg episode reward: [(0, '5.143')] +[2025-08-18 16:51:02,422][02710] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1126400. Throughput: 0: 899.3. Samples: 282064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:02,423][02710] Avg episode reward: [(0, '5.262')] +[2025-08-18 16:51:02,428][02834] Saving new best policy, reward=5.262! +[2025-08-18 16:51:07,244][02847] Updated weights for policy 0, policy_version 280 (0.0012) +[2025-08-18 16:51:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1146880. Throughput: 0: 912.7. Samples: 284962. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:07,423][02710] Avg episode reward: [(0, '5.387')] +[2025-08-18 16:51:07,425][02834] Saving new best policy, reward=5.387! +[2025-08-18 16:51:12,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1163264. Throughput: 0: 897.2. Samples: 290362. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:12,433][02710] Avg episode reward: [(0, '5.530')] +[2025-08-18 16:51:12,442][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth... +[2025-08-18 16:51:12,539][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth +[2025-08-18 16:51:12,552][02834] Saving new best policy, reward=5.530! +[2025-08-18 16:51:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1179648. Throughput: 0: 908.3. Samples: 295536. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:17,426][02710] Avg episode reward: [(0, '5.238')] +[2025-08-18 16:51:18,911][02847] Updated weights for policy 0, policy_version 290 (0.0012) +[2025-08-18 16:51:22,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3618.1, 300 sec: 3582.2). Total num frames: 1200128. Throughput: 0: 905.5. Samples: 298488. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:22,426][02710] Avg episode reward: [(0, '5.397')] +[2025-08-18 16:51:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1216512. Throughput: 0: 883.5. Samples: 303512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:27,423][02710] Avg episode reward: [(0, '5.492')] +[2025-08-18 16:51:30,295][02847] Updated weights for policy 0, policy_version 300 (0.0014) +[2025-08-18 16:51:32,421][02710] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1232896. Throughput: 0: 907.9. Samples: 309350. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:32,425][02710] Avg episode reward: [(0, '5.533')] +[2025-08-18 16:51:32,483][02834] Saving new best policy, reward=5.533! +[2025-08-18 16:51:37,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1253376. Throughput: 0: 907.1. Samples: 312328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:37,426][02710] Avg episode reward: [(0, '5.478')] +[2025-08-18 16:51:41,804][02847] Updated weights for policy 0, policy_version 310 (0.0013) +[2025-08-18 16:51:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1269760. Throughput: 0: 882.4. Samples: 317054. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:42,423][02710] Avg episode reward: [(0, '5.213')] +[2025-08-18 16:51:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1290240. Throughput: 0: 912.2. Samples: 323114. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:47,426][02710] Avg episode reward: [(0, '5.232')] +[2025-08-18 16:51:52,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1306624. Throughput: 0: 913.0. Samples: 326048. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:52,423][02710] Avg episode reward: [(0, '5.474')] +[2025-08-18 16:51:53,123][02847] Updated weights for policy 0, policy_version 320 (0.0012) +[2025-08-18 16:51:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1327104. Throughput: 0: 901.1. Samples: 330912. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:57,426][02710] Avg episode reward: [(0, '5.611')] +[2025-08-18 16:51:57,431][02834] Saving new best policy, reward=5.611! +[2025-08-18 16:52:02,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 1343488. Throughput: 0: 917.7. Samples: 336834. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:02,426][02710] Avg episode reward: [(0, '5.585')] +[2025-08-18 16:52:03,506][02847] Updated weights for policy 0, policy_version 330 (0.0013) +[2025-08-18 16:52:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1359872. Throughput: 0: 905.7. Samples: 339242. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:07,425][02710] Avg episode reward: [(0, '5.505')] +[2025-08-18 16:52:12,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3582.3). 
Total num frames: 1380352. Throughput: 0: 914.2. Samples: 344650. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:12,425][02710] Avg episode reward: [(0, '6.016')] +[2025-08-18 16:52:12,432][02834] Saving new best policy, reward=6.016! +[2025-08-18 16:52:14,949][02847] Updated weights for policy 0, policy_version 340 (0.0012) +[2025-08-18 16:52:17,426][02710] Fps is (10 sec: 4094.1, 60 sec: 3686.1, 300 sec: 3596.1). Total num frames: 1400832. Throughput: 0: 916.5. Samples: 350598. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:17,433][02710] Avg episode reward: [(0, '6.524')] +[2025-08-18 16:52:17,442][02834] Saving new best policy, reward=6.524! +[2025-08-18 16:52:22,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.3, 300 sec: 3610.0). Total num frames: 1417216. Throughput: 0: 891.7. Samples: 352454. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:22,425][02710] Avg episode reward: [(0, '6.589')] +[2025-08-18 16:52:22,435][02834] Saving new best policy, reward=6.589! +[2025-08-18 16:52:26,479][02847] Updated weights for policy 0, policy_version 350 (0.0013) +[2025-08-18 16:52:27,424][02710] Fps is (10 sec: 3277.5, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 1433600. Throughput: 0: 919.4. Samples: 358428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:27,425][02710] Avg episode reward: [(0, '6.567')] +[2025-08-18 16:52:32,423][02710] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 1454080. Throughput: 0: 902.6. Samples: 363734. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:32,425][02710] Avg episode reward: [(0, '6.339')] +[2025-08-18 16:52:37,421][02710] Fps is (10 sec: 3687.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1470464. Throughput: 0: 890.7. Samples: 366128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:37,424][02710] Avg episode reward: [(0, '6.348')] +[2025-08-18 16:52:38,000][02847] Updated weights for policy 0, policy_version 360 (0.0013) +[2025-08-18 16:52:42,422][02710] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1490944. Throughput: 0: 917.4. Samples: 372196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:42,423][02710] Avg episode reward: [(0, '7.073')] +[2025-08-18 16:52:42,430][02834] Saving new best policy, reward=7.073! +[2025-08-18 16:52:47,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1503232. Throughput: 0: 890.5. Samples: 376906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:47,425][02710] Avg episode reward: [(0, '7.096')] +[2025-08-18 16:52:47,429][02834] Saving new best policy, reward=7.096! +[2025-08-18 16:52:49,568][02847] Updated weights for policy 0, policy_version 370 (0.0013) +[2025-08-18 16:52:52,426][02710] Fps is (10 sec: 3275.4, 60 sec: 3617.9, 300 sec: 3610.0). Total num frames: 1523712. Throughput: 0: 904.3. Samples: 379942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:52:52,428][02710] Avg episode reward: [(0, '6.977')] +[2025-08-18 16:52:57,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1544192. Throughput: 0: 919.6. Samples: 386030. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:57,424][02710] Avg episode reward: [(0, '6.948')] +[2025-08-18 16:53:00,842][02847] Updated weights for policy 0, policy_version 380 (0.0012) +[2025-08-18 16:53:02,421][02710] Fps is (10 sec: 3688.2, 60 sec: 3618.1, 300 sec: 3610.0). 
Total num frames: 1560576. Throughput: 0: 891.7. Samples: 390718. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:02,424][02710] Avg episode reward: [(0, '7.516')] +[2025-08-18 16:53:02,430][02834] Saving new best policy, reward=7.516! +[2025-08-18 16:53:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3624.0). Total num frames: 1581056. Throughput: 0: 917.5. Samples: 393740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:07,424][02710] Avg episode reward: [(0, '8.303')] +[2025-08-18 16:53:07,429][02834] Saving new best policy, reward=8.303! +[2025-08-18 16:53:11,278][02847] Updated weights for policy 0, policy_version 390 (0.0013) +[2025-08-18 16:53:12,429][02710] Fps is (10 sec: 3683.5, 60 sec: 3617.7, 300 sec: 3623.8). Total num frames: 1597440. Throughput: 0: 914.5. Samples: 399584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:12,443][02710] Avg episode reward: [(0, '8.422')] +[2025-08-18 16:53:12,456][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000390_1597440.pth... +[2025-08-18 16:53:12,573][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth +[2025-08-18 16:53:12,586][02834] Saving new best policy, reward=8.422! +[2025-08-18 16:53:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3610.0). Total num frames: 1613824. Throughput: 0: 902.5. Samples: 404344. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:17,423][02710] Avg episode reward: [(0, '8.704')] +[2025-08-18 16:53:17,424][02834] Saving new best policy, reward=8.704! +[2025-08-18 16:53:22,422][02710] Fps is (10 sec: 3689.2, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 1634304. Throughput: 0: 912.7. Samples: 407198. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:22,426][02710] Avg episode reward: [(0, '9.189')] +[2025-08-18 16:53:22,433][02834] Saving new best policy, reward=9.189! +[2025-08-18 16:53:22,897][02847] Updated weights for policy 0, policy_version 400 (0.0014) +[2025-08-18 16:53:27,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1650688. Throughput: 0: 896.1. Samples: 412522. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:27,425][02710] Avg episode reward: [(0, '10.076')] +[2025-08-18 16:53:27,433][02834] Saving new best policy, reward=10.076! +[2025-08-18 16:53:32,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 1671168. Throughput: 0: 910.4. Samples: 417876. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:32,423][02710] Avg episode reward: [(0, '10.495')] +[2025-08-18 16:53:32,434][02834] Saving new best policy, reward=10.495! +[2025-08-18 16:53:34,473][02847] Updated weights for policy 0, policy_version 410 (0.0012) +[2025-08-18 16:53:37,421][02710] Fps is (10 sec: 3687.4, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 1687552. Throughput: 0: 909.2. Samples: 420850. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:37,425][02710] Avg episode reward: [(0, '10.426')] +[2025-08-18 16:53:42,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1703936. Throughput: 0: 879.6. Samples: 425614. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:42,425][02710] Avg episode reward: [(0, '10.032')] +[2025-08-18 16:53:45,935][02847] Updated weights for policy 0, policy_version 420 (0.0016) +[2025-08-18 16:53:47,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3686.2, 300 sec: 3623.9). Total num frames: 1724416. Throughput: 0: 908.7. Samples: 431614. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:47,428][02710] Avg episode reward: [(0, '10.363')] +[2025-08-18 16:53:52,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3686.7, 300 sec: 3637.8). Total num frames: 1744896. Throughput: 0: 909.4. Samples: 434664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:52,429][02710] Avg episode reward: [(0, '11.015')] +[2025-08-18 16:53:52,437][02834] Saving new best policy, reward=11.015! +[2025-08-18 16:53:57,410][02847] Updated weights for policy 0, policy_version 430 (0.0016) +[2025-08-18 16:53:57,422][02710] Fps is (10 sec: 3687.3, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1761280. Throughput: 0: 884.7. Samples: 439388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:57,423][02710] Avg episode reward: [(0, '12.120')] +[2025-08-18 16:53:57,426][02834] Saving new best policy, reward=12.120! +[2025-08-18 16:54:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1777664. Throughput: 0: 911.9. Samples: 445378. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:02,423][02710] Avg episode reward: [(0, '12.294')] +[2025-08-18 16:54:02,432][02834] Saving new best policy, reward=12.294! +[2025-08-18 16:54:07,423][02710] Fps is (10 sec: 3276.3, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 1794048. Throughput: 0: 911.3. Samples: 448206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:07,424][02710] Avg episode reward: [(0, '12.704')] +[2025-08-18 16:54:07,426][02834] Saving new best policy, reward=12.704! +[2025-08-18 16:54:08,925][02847] Updated weights for policy 0, policy_version 440 (0.0012) +[2025-08-18 16:54:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.6, 300 sec: 3623.9). Total num frames: 1814528. Throughput: 0: 903.1. Samples: 453160. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:12,424][02710] Avg episode reward: [(0, '11.606')] +[2025-08-18 16:54:17,421][02710] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1835008. Throughput: 0: 917.5. Samples: 459162. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:17,423][02710] Avg episode reward: [(0, '10.698')] +[2025-08-18 16:54:19,590][02847] Updated weights for policy 0, policy_version 450 (0.0015) +[2025-08-18 16:54:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3624.0). Total num frames: 1851392. Throughput: 0: 903.3. Samples: 461500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:22,425][02710] Avg episode reward: [(0, '9.914')] +[2025-08-18 16:54:27,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3610.0). Total num frames: 1867776. Throughput: 0: 921.2. Samples: 467068. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:27,428][02710] Avg episode reward: [(0, '10.513')] +[2025-08-18 16:54:30,520][02847] Updated weights for policy 0, policy_version 460 (0.0013) +[2025-08-18 16:54:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1888256. Throughput: 0: 916.9. Samples: 472874. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:32,425][02710] Avg episode reward: [(0, '11.437')] +[2025-08-18 16:54:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1904640. Throughput: 0: 891.9. Samples: 474798. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:37,426][02710] Avg episode reward: [(0, '11.324')] +[2025-08-18 16:54:41,869][02847] Updated weights for policy 0, policy_version 470 (0.0013) +[2025-08-18 16:54:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1925120. Throughput: 0: 922.9. Samples: 480920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:42,423][02710] Avg episode reward: [(0, '11.479')] +[2025-08-18 16:54:47,425][02710] Fps is (10 sec: 3685.1, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1941504. Throughput: 0: 905.7. Samples: 486138. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:47,426][02710] Avg episode reward: [(0, '10.827')] +[2025-08-18 16:54:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1961984. Throughput: 0: 899.0. Samples: 488658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:52,426][02710] Avg episode reward: [(0, '11.795')] +[2025-08-18 16:54:53,332][02847] Updated weights for policy 0, policy_version 480 (0.0012) +[2025-08-18 16:54:57,421][02710] Fps is (10 sec: 4097.5, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1982464. Throughput: 0: 924.0. Samples: 494740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:57,426][02710] Avg episode reward: [(0, '12.692')] +[2025-08-18 16:55:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1994752. Throughput: 0: 897.2. Samples: 499534. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:02,423][02710] Avg episode reward: [(0, '13.673')] +[2025-08-18 16:55:02,434][02834] Saving new best policy, reward=13.673! +[2025-08-18 16:55:04,633][02847] Updated weights for policy 0, policy_version 490 (0.0013) +[2025-08-18 16:55:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3623.9). Total num frames: 2015232. Throughput: 0: 911.7. Samples: 502526. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:07,425][02710] Avg episode reward: [(0, '14.285')] +[2025-08-18 16:55:07,426][02834] Saving new best policy, reward=14.285! +[2025-08-18 16:55:12,422][02710] Fps is (10 sec: 4095.7, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 2035712. Throughput: 0: 921.1. Samples: 508520. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:12,424][02710] Avg episode reward: [(0, '14.011')] +[2025-08-18 16:55:12,432][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth... +[2025-08-18 16:55:12,521][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth +[2025-08-18 16:55:16,225][02847] Updated weights for policy 0, policy_version 500 (0.0012) +[2025-08-18 16:55:17,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2052096. Throughput: 0: 896.8. Samples: 513230. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:17,423][02710] Avg episode reward: [(0, '13.713')] +[2025-08-18 16:55:22,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2072576. Throughput: 0: 920.4. Samples: 516218. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:22,423][02710] Avg episode reward: [(0, '13.395')] +[2025-08-18 16:55:26,931][02847] Updated weights for policy 0, policy_version 510 (0.0013) +[2025-08-18 16:55:27,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2088960. Throughput: 0: 912.8. Samples: 521998. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:27,423][02710] Avg episode reward: [(0, '12.197')] +[2025-08-18 16:55:32,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2105344. Throughput: 0: 908.6. Samples: 527020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:32,423][02710] Avg episode reward: [(0, '11.480')] +[2025-08-18 16:55:37,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2125824. Throughput: 0: 920.9. Samples: 530098. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:37,423][02710] Avg episode reward: [(0, '12.422')] +[2025-08-18 16:55:37,851][02847] Updated weights for policy 0, policy_version 520 (0.0015) +[2025-08-18 16:55:42,422][02710] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2142208. Throughput: 0: 899.8. Samples: 535232. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:42,423][02710] Avg episode reward: [(0, '12.747')] +[2025-08-18 16:55:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3623.9). Total num frames: 2162688. Throughput: 0: 918.9. Samples: 540884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:55:47,426][02710] Avg episode reward: [(0, '13.417')] +[2025-08-18 16:55:49,365][02847] Updated weights for policy 0, policy_version 530 (0.0021) +[2025-08-18 16:55:52,421][02710] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2183168. Throughput: 0: 919.2. Samples: 543890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:52,423][02710] Avg episode reward: [(0, '15.201')] +[2025-08-18 16:55:52,429][02834] Saving new best policy, reward=15.201! +[2025-08-18 16:55:57,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2195456. Throughput: 0: 889.5. Samples: 548548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:57,423][02710] Avg episode reward: [(0, '15.048')] +[2025-08-18 16:56:00,873][02847] Updated weights for policy 0, policy_version 540 (0.0014) +[2025-08-18 16:56:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2215936. Throughput: 0: 919.2. Samples: 554596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:56:02,426][02710] Avg episode reward: [(0, '15.674')] +[2025-08-18 16:56:02,433][02834] Saving new best policy, reward=15.674! +[2025-08-18 16:56:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2232320. Throughput: 0: 918.7. Samples: 557558. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:07,423][02710] Avg episode reward: [(0, '17.067')] +[2025-08-18 16:56:07,424][02834] Saving new best policy, reward=17.067! +[2025-08-18 16:56:12,265][02847] Updated weights for policy 0, policy_version 550 (0.0013) +[2025-08-18 16:56:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2252800. Throughput: 0: 896.3. Samples: 562332. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:12,423][02710] Avg episode reward: [(0, '17.988')] +[2025-08-18 16:56:12,431][02834] Saving new best policy, reward=17.988! +[2025-08-18 16:56:17,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2273280. Throughput: 0: 919.4. Samples: 568394. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:17,425][02710] Avg episode reward: [(0, '18.517')] +[2025-08-18 16:56:17,427][02834] Saving new best policy, reward=18.517! +[2025-08-18 16:56:22,424][02710] Fps is (10 sec: 3275.8, 60 sec: 3549.7, 300 sec: 3623.9). Total num frames: 2285568. Throughput: 0: 911.5. Samples: 571116. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:22,429][02710] Avg episode reward: [(0, '18.694')] +[2025-08-18 16:56:22,443][02834] Saving new best policy, reward=18.694! +[2025-08-18 16:56:23,875][02847] Updated weights for policy 0, policy_version 560 (0.0013) +[2025-08-18 16:56:27,423][02710] Fps is (10 sec: 3276.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2306048. Throughput: 0: 909.0. Samples: 576140. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:56:27,428][02710] Avg episode reward: [(0, '18.572')] +[2025-08-18 16:56:32,424][02710] Fps is (10 sec: 4096.2, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 2326528. Throughput: 0: 917.7. Samples: 582184. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:56:32,425][02710] Avg episode reward: [(0, '19.030')] +[2025-08-18 16:56:32,434][02834] Saving new best policy, reward=19.030! +[2025-08-18 16:56:34,623][02847] Updated weights for policy 0, policy_version 570 (0.0013) +[2025-08-18 16:56:37,421][02710] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2342912. Throughput: 0: 896.3. Samples: 584224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:37,423][02710] Avg episode reward: [(0, '17.226')] +[2025-08-18 16:56:42,422][02710] Fps is (10 sec: 3687.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2363392. Throughput: 0: 919.7. Samples: 589936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:56:42,423][02710] Avg episode reward: [(0, '16.982')] +[2025-08-18 16:56:45,418][02847] Updated weights for policy 0, policy_version 580 (0.0017) +[2025-08-18 16:56:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2379776. Throughput: 0: 910.2. Samples: 595554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:56:47,426][02710] Avg episode reward: [(0, '18.134')] +[2025-08-18 16:56:52,421][02710] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2396160. Throughput: 0: 892.1. Samples: 597702. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:52,423][02710] Avg episode reward: [(0, '19.013')] +[2025-08-18 16:56:56,928][02847] Updated weights for policy 0, policy_version 590 (0.0013) +[2025-08-18 16:56:57,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2416640. Throughput: 0: 921.6. Samples: 603806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:56:57,426][02710] Avg episode reward: [(0, '18.117')] +[2025-08-18 16:57:02,427][02710] Fps is (10 sec: 3684.3, 60 sec: 3617.8, 300 sec: 3637.7). Total num frames: 2433024. Throughput: 0: 898.7. Samples: 608842. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:02,430][02710] Avg episode reward: [(0, '19.705')] +[2025-08-18 16:57:02,448][02834] Saving new best policy, reward=19.705! +[2025-08-18 16:57:07,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2453504. Throughput: 0: 898.4. Samples: 611540. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:07,426][02710] Avg episode reward: [(0, '20.072')] +[2025-08-18 16:57:07,430][02834] Saving new best policy, reward=20.072! +[2025-08-18 16:57:08,203][02847] Updated weights for policy 0, policy_version 600 (0.0012) +[2025-08-18 16:57:12,423][02710] Fps is (10 sec: 4097.7, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 2473984. Throughput: 0: 920.0. Samples: 617542. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:12,425][02710] Avg episode reward: [(0, '19.663')] +[2025-08-18 16:57:12,433][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000604_2473984.pth... +[2025-08-18 16:57:12,517][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000390_1597440.pth +[2025-08-18 16:57:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2486272. Throughput: 0: 892.5. Samples: 622344. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:17,426][02710] Avg episode reward: [(0, '19.701')] +[2025-08-18 16:57:19,750][02847] Updated weights for policy 0, policy_version 610 (0.0012) +[2025-08-18 16:57:22,425][02710] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 2506752. Throughput: 0: 915.3. Samples: 625414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:22,427][02710] Avg episode reward: [(0, '19.369')] +[2025-08-18 16:57:27,423][02710] Fps is (10 sec: 4095.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2527232. Throughput: 0: 922.4. Samples: 631444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:27,425][02710] Avg episode reward: [(0, '19.081')] +[2025-08-18 16:57:31,291][02847] Updated weights for policy 0, policy_version 620 (0.0021) +[2025-08-18 16:57:32,421][02710] Fps is (10 sec: 3687.8, 60 sec: 3618.3, 300 sec: 3637.8). Total num frames: 2543616. Throughput: 0: 900.2. Samples: 636064. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:32,427][02710] Avg episode reward: [(0, '18.691')] +[2025-08-18 16:57:37,422][02710] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2564096. Throughput: 0: 920.4. Samples: 639122. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:37,425][02710] Avg episode reward: [(0, '18.801')] +[2025-08-18 16:57:42,124][02847] Updated weights for policy 0, policy_version 630 (0.0013) +[2025-08-18 16:57:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2580480. Throughput: 0: 910.7. Samples: 644788. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:42,424][02710] Avg episode reward: [(0, '18.915')] +[2025-08-18 16:57:47,422][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.9). Total num frames: 2596864. Throughput: 0: 914.1. Samples: 649970. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:47,424][02710] Avg episode reward: [(0, '19.465')] +[2025-08-18 16:57:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2617344. Throughput: 0: 921.4. Samples: 653004. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:52,423][02710] Avg episode reward: [(0, '20.587')] +[2025-08-18 16:57:52,435][02834] Saving new best policy, reward=20.587! +[2025-08-18 16:57:52,849][02847] Updated weights for policy 0, policy_version 640 (0.0013) +[2025-08-18 16:57:57,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2633728. Throughput: 0: 899.7. Samples: 658028. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-18 16:57:57,426][02710] Avg episode reward: [(0, '20.577')] +[2025-08-18 16:58:02,424][02710] Fps is (10 sec: 3685.3, 60 sec: 3686.6, 300 sec: 3637.8). Total num frames: 2654208. Throughput: 0: 920.6. Samples: 663772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:02,426][02710] Avg episode reward: [(0, '20.364')] +[2025-08-18 16:58:04,387][02847] Updated weights for policy 0, policy_version 650 (0.0014) +[2025-08-18 16:58:07,425][02710] Fps is (10 sec: 3685.0, 60 sec: 3617.9, 300 sec: 3637.9). Total num frames: 2670592. Throughput: 0: 918.3. Samples: 666738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:07,428][02710] Avg episode reward: [(0, '19.567')] +[2025-08-18 16:58:12,421][02710] Fps is (10 sec: 3277.8, 60 sec: 3550.0, 300 sec: 3637.8). Total num frames: 2686976. Throughput: 0: 891.1. Samples: 671540. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:12,423][02710] Avg episode reward: [(0, '19.281')] +[2025-08-18 16:58:15,681][02847] Updated weights for policy 0, policy_version 660 (0.0017) +[2025-08-18 16:58:17,421][02710] Fps is (10 sec: 3687.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2707456. Throughput: 0: 923.1. Samples: 677602. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:17,423][02710] Avg episode reward: [(0, '18.751')] +[2025-08-18 16:58:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3637.8). Total num frames: 2723840. Throughput: 0: 920.8. Samples: 680558. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:22,426][02710] Avg episode reward: [(0, '19.640')] +[2025-08-18 16:58:27,186][02847] Updated weights for policy 0, policy_version 670 (0.0013) +[2025-08-18 16:58:27,422][02710] Fps is (10 sec: 3686.2, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2744320. Throughput: 0: 901.0. Samples: 685332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:27,427][02710] Avg episode reward: [(0, '20.619')] +[2025-08-18 16:58:27,431][02834] Saving new best policy, reward=20.619! +[2025-08-18 16:58:32,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2760704. Throughput: 0: 918.5. Samples: 691302. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:32,424][02710] Avg episode reward: [(0, '20.258')] +[2025-08-18 16:58:37,421][02710] Fps is (10 sec: 3277.0, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 2777088. Throughput: 0: 903.6. Samples: 693668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:37,425][02710] Avg episode reward: [(0, '21.294')] +[2025-08-18 16:58:37,431][02834] Saving new best policy, reward=21.294! +[2025-08-18 16:58:39,057][02847] Updated weights for policy 0, policy_version 680 (0.0013) +[2025-08-18 16:58:42,422][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2797568. Throughput: 0: 906.8. Samples: 698836. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:58:42,426][02710] Avg episode reward: [(0, '20.909')] +[2025-08-18 16:58:47,422][02710] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2818048. Throughput: 0: 913.6. Samples: 704884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:58:47,427][02710] Avg episode reward: [(0, '19.464')] +[2025-08-18 16:58:50,173][02847] Updated weights for policy 0, policy_version 690 (0.0012) +[2025-08-18 16:58:52,421][02710] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2830336. Throughput: 0: 889.5. Samples: 706762. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:52,428][02710] Avg episode reward: [(0, '18.312')] +[2025-08-18 16:58:57,421][02710] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2850816. Throughput: 0: 914.5. Samples: 712692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:57,426][02710] Avg episode reward: [(0, '18.115')] +[2025-08-18 16:59:00,504][02847] Updated weights for policy 0, policy_version 700 (0.0013) +[2025-08-18 16:59:02,424][02710] Fps is (10 sec: 4094.9, 60 sec: 3618.2, 300 sec: 3651.7). Total num frames: 2871296. Throughput: 0: 902.8. Samples: 718230. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:02,429][02710] Avg episode reward: [(0, '17.390')] +[2025-08-18 16:59:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3637.8). Total num frames: 2887680. Throughput: 0: 886.1. Samples: 720432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:59:07,427][02710] Avg episode reward: [(0, '17.663')] +[2025-08-18 16:59:12,085][02847] Updated weights for policy 0, policy_version 710 (0.0013) +[2025-08-18 16:59:12,421][02710] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2908160. Throughput: 0: 914.3. Samples: 726476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:59:12,426][02710] Avg episode reward: [(0, '18.940')] +[2025-08-18 16:59:12,432][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth... +[2025-08-18 16:59:12,508][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth +[2025-08-18 16:59:17,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2924544. Throughput: 0: 890.2. Samples: 731362. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:17,425][02710] Avg episode reward: [(0, '19.107')] +[2025-08-18 16:59:22,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2940928. Throughput: 0: 898.6. Samples: 734104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:22,427][02710] Avg episode reward: [(0, '18.679')] +[2025-08-18 16:59:23,622][02847] Updated weights for policy 0, policy_version 720 (0.0013) +[2025-08-18 16:59:27,422][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2961408. Throughput: 0: 919.3. Samples: 740206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:27,429][02710] Avg episode reward: [(0, '19.761')] +[2025-08-18 16:59:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2977792. Throughput: 0: 891.0. Samples: 744980. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:32,426][02710] Avg episode reward: [(0, '19.078')] +[2025-08-18 16:59:35,036][02847] Updated weights for policy 0, policy_version 730 (0.0013) +[2025-08-18 16:59:37,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2998272. Throughput: 0: 915.8. Samples: 747974. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:37,423][02710] Avg episode reward: [(0, '17.645')] +[2025-08-18 16:59:42,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3014656. Throughput: 0: 919.0. Samples: 754048. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:59:42,426][02710] Avg episode reward: [(0, '17.379')] +[2025-08-18 16:59:46,477][02847] Updated weights for policy 0, policy_version 740 (0.0015) +[2025-08-18 16:59:47,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3031040. Throughput: 0: 900.7. Samples: 758758. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:59:47,426][02710] Avg episode reward: [(0, '17.897')] +[2025-08-18 16:59:52,422][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3051520. Throughput: 0: 919.3. Samples: 761800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:52,424][02710] Avg episode reward: [(0, '17.513')] +[2025-08-18 16:59:57,423][02710] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3637.8). Total num frames: 3067904. Throughput: 0: 905.3. Samples: 767218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:57,427][02710] Avg episode reward: [(0, '18.457')] +[2025-08-18 16:59:57,777][02847] Updated weights for policy 0, policy_version 750 (0.0012) +[2025-08-18 17:00:02,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.3, 300 sec: 3637.8). Total num frames: 3088384. Throughput: 0: 916.5. Samples: 772606. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:02,423][02710] Avg episode reward: [(0, '19.050')] +[2025-08-18 17:00:07,424][02710] Fps is (10 sec: 4095.8, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 3108864. Throughput: 0: 923.2. Samples: 775652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:07,425][02710] Avg episode reward: [(0, '20.727')] +[2025-08-18 17:00:08,039][02847] Updated weights for policy 0, policy_version 760 (0.0015) +[2025-08-18 17:00:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3125248. Throughput: 0: 896.5. Samples: 780546. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:00:12,423][02710] Avg episode reward: [(0, '20.139')] +[2025-08-18 17:00:17,421][02710] Fps is (10 sec: 3277.5, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 3141632. Throughput: 0: 921.8. Samples: 786460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:00:17,426][02710] Avg episode reward: [(0, '19.983')] +[2025-08-18 17:00:19,536][02847] Updated weights for policy 0, policy_version 770 (0.0012) +[2025-08-18 17:00:22,424][02710] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 3162112. Throughput: 0: 922.7. Samples: 789496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:22,425][02710] Avg episode reward: [(0, '19.690')] +[2025-08-18 17:00:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 3178496. Throughput: 0: 894.4. Samples: 794296. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:27,423][02710] Avg episode reward: [(0, '20.093')] +[2025-08-18 17:00:30,936][02847] Updated weights for policy 0, policy_version 780 (0.0012) +[2025-08-18 17:00:32,421][02710] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3198976. Throughput: 0: 921.4. Samples: 800222. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:00:32,427][02710] Avg episode reward: [(0, '19.491')] +[2025-08-18 17:00:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3215360. Throughput: 0: 917.4. Samples: 803082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:37,426][02710] Avg episode reward: [(0, '19.146')] +[2025-08-18 17:00:42,422][02710] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3231744. Throughput: 0: 905.6. Samples: 807970. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:42,425][02710] Avg episode reward: [(0, '19.702')] +[2025-08-18 17:00:42,583][02847] Updated weights for policy 0, policy_version 790 (0.0013) +[2025-08-18 17:00:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3252224. Throughput: 0: 916.9. Samples: 813866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:47,424][02710] Avg episode reward: [(0, '21.069')] +[2025-08-18 17:00:52,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3268608. Throughput: 0: 900.0. Samples: 816150. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:00:52,426][02710] Avg episode reward: [(0, '20.612')] +[2025-08-18 17:00:54,119][02847] Updated weights for policy 0, policy_version 800 (0.0014) +[2025-08-18 17:00:57,423][02710] Fps is (10 sec: 3685.7, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3289088. Throughput: 0: 911.3. Samples: 821554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:57,425][02710] Avg episode reward: [(0, '21.625')] +[2025-08-18 17:00:57,429][02834] Saving new best policy, reward=21.625! +[2025-08-18 17:01:02,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3305472. Throughput: 0: 908.8. Samples: 827356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:02,426][02710] Avg episode reward: [(0, '21.996')] +[2025-08-18 17:01:02,434][02834] Saving new best policy, reward=21.996! +[2025-08-18 17:01:05,763][02847] Updated weights for policy 0, policy_version 810 (0.0013) +[2025-08-18 17:01:07,421][02710] Fps is (10 sec: 3277.4, 60 sec: 3550.0, 300 sec: 3623.9). Total num frames: 3321856. Throughput: 0: 882.3. Samples: 829198. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:07,428][02710] Avg episode reward: [(0, '22.750')] +[2025-08-18 17:01:07,433][02834] Saving new best policy, reward=22.750! +[2025-08-18 17:01:12,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3342336. Throughput: 0: 906.9. Samples: 835108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:12,427][02710] Avg episode reward: [(0, '21.894')] +[2025-08-18 17:01:12,439][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000816_3342336.pth... 
+[2025-08-18 17:01:12,535][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000604_2473984.pth +[2025-08-18 17:01:16,993][02847] Updated weights for policy 0, policy_version 820 (0.0013) +[2025-08-18 17:01:17,423][02710] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3637.8). Total num frames: 3358720. Throughput: 0: 882.9. Samples: 839954. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:17,424][02710] Avg episode reward: [(0, '21.076')] +[2025-08-18 17:01:22,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3623.9). Total num frames: 3375104. Throughput: 0: 869.6. Samples: 842216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:22,423][02710] Avg episode reward: [(0, '19.336')] +[2025-08-18 17:01:27,421][02710] Fps is (10 sec: 3687.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3395584. Throughput: 0: 894.8. Samples: 848236. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:27,426][02710] Avg episode reward: [(0, '17.501')] +[2025-08-18 17:01:28,168][02847] Updated weights for policy 0, policy_version 830 (0.0013) +[2025-08-18 17:01:32,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3411968. Throughput: 0: 870.2. Samples: 853024. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:32,427][02710] Avg episode reward: [(0, '16.579')] +[2025-08-18 17:01:37,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3428352. Throughput: 0: 886.2. Samples: 856028. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:37,431][02710] Avg episode reward: [(0, '17.944')] +[2025-08-18 17:01:39,536][02847] Updated weights for policy 0, policy_version 840 (0.0012) +[2025-08-18 17:01:42,422][02710] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3448832. Throughput: 0: 901.6. Samples: 862124. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:42,427][02710] Avg episode reward: [(0, '19.347')] +[2025-08-18 17:01:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3465216. Throughput: 0: 876.7. Samples: 866808. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:47,426][02710] Avg episode reward: [(0, '19.877')] +[2025-08-18 17:01:51,106][02847] Updated weights for policy 0, policy_version 850 (0.0012) +[2025-08-18 17:01:52,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3485696. Throughput: 0: 901.2. Samples: 869750. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:52,425][02710] Avg episode reward: [(0, '22.127')] +[2025-08-18 17:01:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3624.0). Total num frames: 3502080. Throughput: 0: 902.4. Samples: 875716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:57,426][02710] Avg episode reward: [(0, '23.165')] +[2025-08-18 17:01:57,427][02834] Saving new best policy, reward=23.165! +[2025-08-18 17:02:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3518464. Throughput: 0: 901.3. Samples: 880510. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:02:02,429][02710] Avg episode reward: [(0, '25.032')] +[2025-08-18 17:02:02,438][02834] Saving new best policy, reward=25.032! 
+[2025-08-18 17:02:02,651][02847] Updated weights for policy 0, policy_version 860 (0.0012) +[2025-08-18 17:02:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3538944. Throughput: 0: 915.7. Samples: 883422. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:02:07,427][02710] Avg episode reward: [(0, '24.863')] +[2025-08-18 17:02:12,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 3555328. Throughput: 0: 898.7. Samples: 888678. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-18 17:02:12,423][02710] Avg episode reward: [(0, '24.386')] +[2025-08-18 17:02:14,537][02847] Updated weights for policy 0, policy_version 870 (0.0012) +[2025-08-18 17:02:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3610.1). Total num frames: 3571712. Throughput: 0: 908.9. Samples: 893924. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:02:17,425][02710] Avg episode reward: [(0, '23.826')] +[2025-08-18 17:02:22,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3592192. Throughput: 0: 907.0. Samples: 896844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:22,426][02710] Avg episode reward: [(0, '23.466')] +[2025-08-18 17:02:26,062][02847] Updated weights for policy 0, policy_version 880 (0.0016) +[2025-08-18 17:02:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3608576. Throughput: 0: 874.5. Samples: 901474. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:27,423][02710] Avg episode reward: [(0, '23.152')] +[2025-08-18 17:02:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3629056. Throughput: 0: 903.0. Samples: 907444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:32,425][02710] Avg episode reward: [(0, '21.321')] +[2025-08-18 17:02:36,526][02847] Updated weights for policy 0, policy_version 890 (0.0013) +[2025-08-18 17:02:37,423][02710] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 3645440. Throughput: 0: 904.5. Samples: 910456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:37,430][02710] Avg episode reward: [(0, '20.257')] +[2025-08-18 17:02:42,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3661824. Throughput: 0: 875.3. Samples: 915104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:42,425][02710] Avg episode reward: [(0, '19.857')] +[2025-08-18 17:02:47,421][02710] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3682304. Throughput: 0: 898.4. Samples: 920936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:47,426][02710] Avg episode reward: [(0, '20.608')] +[2025-08-18 17:02:48,120][02847] Updated weights for policy 0, policy_version 900 (0.0017) +[2025-08-18 17:02:52,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 3694592. Throughput: 0: 893.2. Samples: 923618. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:02:52,427][02710] Avg episode reward: [(0, '20.518')] +[2025-08-18 17:02:57,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 3715072. Throughput: 0: 885.2. Samples: 928510. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:57,426][02710] Avg episode reward: [(0, '21.497')] +[2025-08-18 17:02:59,961][02847] Updated weights for policy 0, policy_version 910 (0.0016) +[2025-08-18 17:03:02,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3735552. Throughput: 0: 899.1. Samples: 934384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:03:02,427][02710] Avg episode reward: [(0, '22.744')] +[2025-08-18 17:03:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3751936. Throughput: 0: 883.9. Samples: 936620. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:03:07,428][02710] Avg episode reward: [(0, '23.443')] +[2025-08-18 17:03:11,522][02847] Updated weights for policy 0, policy_version 920 (0.0017) +[2025-08-18 17:03:12,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3768320. Throughput: 0: 903.1. Samples: 942114. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:03:12,423][02710] Avg episode reward: [(0, '23.664')] +[2025-08-18 17:03:12,433][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000920_3768320.pth... +[2025-08-18 17:03:12,514][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth +[2025-08-18 17:03:17,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3788800. Throughput: 0: 894.5. Samples: 947698. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:17,425][02710] Avg episode reward: [(0, '23.514')] +[2025-08-18 17:03:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 3805184. Throughput: 0: 867.5. Samples: 949494. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:22,427][02710] Avg episode reward: [(0, '22.147')] +[2025-08-18 17:03:23,322][02847] Updated weights for policy 0, policy_version 930 (0.0013) +[2025-08-18 17:03:27,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 3821568. Throughput: 0: 897.6. Samples: 955498. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:27,424][02710] Avg episode reward: [(0, '21.641')] +[2025-08-18 17:03:32,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3610.0). Total num frames: 3842048. Throughput: 0: 885.3. Samples: 960776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:32,425][02710] Avg episode reward: [(0, '20.297')] +[2025-08-18 17:03:34,873][02847] Updated weights for policy 0, policy_version 940 (0.0014) +[2025-08-18 17:03:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3596.2). Total num frames: 3858432. Throughput: 0: 877.1. Samples: 963088. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:03:37,424][02710] Avg episode reward: [(0, '19.880')] +[2025-08-18 17:03:42,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3878912. Throughput: 0: 903.4. Samples: 969162. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:03:42,423][02710] Avg episode reward: [(0, '20.029')] +[2025-08-18 17:03:45,821][02847] Updated weights for policy 0, policy_version 950 (0.0012) +[2025-08-18 17:03:47,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 3891200. Throughput: 0: 875.5. Samples: 973782. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:47,423][02710] Avg episode reward: [(0, '20.250')] +[2025-08-18 17:03:52,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3911680. Throughput: 0: 890.1. Samples: 976676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:52,423][02710] Avg episode reward: [(0, '20.663')] +[2025-08-18 17:03:56,757][02847] Updated weights for policy 0, policy_version 960 (0.0013) +[2025-08-18 17:03:57,422][02710] Fps is (10 sec: 4095.7, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3932160. Throughput: 0: 902.7. Samples: 982738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:57,424][02710] Avg episode reward: [(0, '21.067')] +[2025-08-18 17:04:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3948544. Throughput: 0: 883.2. Samples: 987440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:02,424][02710] Avg episode reward: [(0, '20.413')] +[2025-08-18 17:04:07,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3969024. Throughput: 0: 910.9. Samples: 990486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:07,424][02710] Avg episode reward: [(0, '20.813')] +[2025-08-18 17:04:08,237][02847] Updated weights for policy 0, policy_version 970 (0.0012) +[2025-08-18 17:04:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3985408. Throughput: 0: 905.3. Samples: 996238. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:12,424][02710] Avg episode reward: [(0, '20.233')] +[2025-08-18 17:04:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 4001792. Throughput: 0: 896.9. Samples: 1001134. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:17,423][02710] Avg episode reward: [(0, '20.226')] +[2025-08-18 17:04:17,814][02834] Stopping Batcher_0... +[2025-08-18 17:04:17,816][02710] Component Batcher_0 stopped! +[2025-08-18 17:04:17,816][02834] Loop batcher_evt_loop terminating... +[2025-08-18 17:04:17,821][02710] Component RolloutWorker_w0 process died already! Don't wait for it. +[2025-08-18 17:04:17,823][02710] Component RolloutWorker_w1 process died already! Don't wait for it. +[2025-08-18 17:04:17,815][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:17,825][02710] Component RolloutWorker_w5 process died already! Don't wait for it. +[2025-08-18 17:04:17,828][02710] Component RolloutWorker_w6 process died already! Don't wait for it. +[2025-08-18 17:04:17,834][02710] Component RolloutWorker_w7 process died already! Don't wait for it. +[2025-08-18 17:04:17,886][02847] Weights refcount: 2 0 +[2025-08-18 17:04:17,888][02710] Component InferenceWorker_p0-w0 stopped! +[2025-08-18 17:04:17,890][02847] Stopping InferenceWorker_p0-w0... +[2025-08-18 17:04:17,890][02847] Loop inference_proc0-0_evt_loop terminating... +[2025-08-18 17:04:17,935][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000816_3342336.pth +[2025-08-18 17:04:17,944][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:18,085][02710] Component RolloutWorker_w3 stopped! +[2025-08-18 17:04:18,086][02851] Stopping RolloutWorker_w3... +[2025-08-18 17:04:18,092][02834] Stopping LearnerWorker_p0... 
+[2025-08-18 17:04:18,092][02834] Loop learner_proc0_evt_loop terminating... +[2025-08-18 17:04:18,092][02710] Component LearnerWorker_p0 stopped! +[2025-08-18 17:04:18,090][02851] Loop rollout_proc3_evt_loop terminating... +[2025-08-18 17:04:18,236][02710] Component RolloutWorker_w4 stopped! +[2025-08-18 17:04:18,236][02853] Stopping RolloutWorker_w4... +[2025-08-18 17:04:18,251][02850] Stopping RolloutWorker_w2... +[2025-08-18 17:04:18,251][02710] Component RolloutWorker_w2 stopped! +[2025-08-18 17:04:18,253][02710] Waiting for process learner_proc0 to stop... +[2025-08-18 17:04:18,239][02853] Loop rollout_proc4_evt_loop terminating... +[2025-08-18 17:04:18,252][02850] Loop rollout_proc2_evt_loop terminating... +[2025-08-18 17:04:19,591][02710] Waiting for process inference_proc0-0 to join... +[2025-08-18 17:04:19,596][02710] Waiting for process rollout_proc0 to join... +[2025-08-18 17:04:19,597][02710] Waiting for process rollout_proc1 to join... +[2025-08-18 17:04:19,598][02710] Waiting for process rollout_proc2 to join... +[2025-08-18 17:04:20,329][02710] Waiting for process rollout_proc3 to join... +[2025-08-18 17:04:20,331][02710] Waiting for process rollout_proc4 to join... +[2025-08-18 17:04:20,333][02710] Waiting for process rollout_proc5 to join... +[2025-08-18 17:04:20,334][02710] Waiting for process rollout_proc6 to join... +[2025-08-18 17:04:20,334][02710] Waiting for process rollout_proc7 to join... +[2025-08-18 17:04:20,336][02710] Batcher 0 profile tree view: +batching: 21.0831, releasing_batches: 0.0261 +[2025-08-18 17:04:20,338][02710] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0035 + wait_policy_total: 426.4858 +update_model: 10.0812 + weight_update: 0.0013 +one_step: 0.0026 + handle_policy_step: 640.2106 + deserialize: 15.5210, stack: 4.1158, obs_to_device_normalize: 146.3139, forward: 339.0730, send_messages: 21.3403 + prepare_outputs: 85.7238 + to_cpu: 52.9879 +[2025-08-18 17:04:20,339][02710] Learner 0 profile tree view: +misc: 0.0048, prepare_batch: 11.8849 +train: 65.5446 + epoch_init: 0.0069, minibatch_init: 0.0068, losses_postprocess: 0.5636, kl_divergence: 0.5490, after_optimizer: 31.7256 + calculate_losses: 21.8246 + losses_init: 0.0044, forward_head: 1.2480, bptt_initial: 15.0406, tail: 0.8599, advantages_returns: 0.2057, losses: 2.6615 + bptt: 1.5943 + bptt_forward_core: 1.5325 + update: 10.4315 + clip: 0.9332 +[2025-08-18 17:04:20,340][02710] Loop Runner_EvtLoop terminating... +[2025-08-18 17:04:20,341][02710] Runner profile tree view: +main_loop: 1142.6643 +[2025-08-18 17:04:20,342][02710] Collected {0: 4005888}, FPS: 3505.7 +[2025-08-18 17:04:20,657][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:04:20,658][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:04:20,659][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:04:20,661][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:04:20,662][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:04:20,663][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:04:20,664][02710] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:04:20,666][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! 
+[2025-08-18 17:04:20,667][02710] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-18 17:04:20,669][02710] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-18 17:04:20,670][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:04:20,670][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:04:20,671][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:04:20,672][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:04:20,673][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:04:20,701][02710] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 17:04:20,704][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:04:20,706][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:04:20,721][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:04:20,817][02710] Conv encoder output size: 512 +[2025-08-18 17:04:20,819][02710] Policy head output size: 512 +[2025-08-18 17:04:20,976][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:20,979][02710] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:04:20,982][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-08-18 17:04:20,985][02710] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:04:20,986][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:20,988][02710] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
+[2025-08-18 17:07:55,828][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:07:55,829][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:07:55,830][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:07:55,831][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:07:55,832][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:07:55,833][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:07:55,833][02710] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:07:55,834][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-18 17:07:55,835][02710] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-18 17:07:55,836][02710] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-18 17:07:55,837][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:07:55,838][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:07:55,838][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:07:55,839][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:07:55,840][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:07:55,870][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:07:55,871][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:07:55,881][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:07:55,919][02710] Conv encoder output size: 512 +[2025-08-18 17:07:55,922][02710] Policy head output size: 512 +[2025-08-18 17:07:55,950][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:07:55,953][02710] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:07:55,955][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:07:55,957][02710] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:07:55,958][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:07:55,960][02710] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:11:24,695][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:11:24,696][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:11:24,697][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:11:24,698][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:11:24,699][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:11:24,699][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:11:24,700][02710] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:11:24,701][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-18 17:11:24,702][02710] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-18 17:11:24,703][02710] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-18 17:11:24,704][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:11:24,704][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:11:24,705][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:11:24,706][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:11:24,707][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:11:24,734][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:11:24,735][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:11:24,746][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:11:24,778][02710] Conv encoder output size: 512 +[2025-08-18 17:11:24,779][02710] Policy head output size: 512 +[2025-08-18 17:11:24,798][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:11:25,526][02710] Num frames 100... +[2025-08-18 17:11:25,656][02710] Num frames 200... +[2025-08-18 17:11:25,781][02710] Num frames 300... +[2025-08-18 17:11:25,914][02710] Num frames 400... +[2025-08-18 17:11:26,043][02710] Num frames 500... +[2025-08-18 17:11:26,180][02710] Num frames 600... +[2025-08-18 17:11:26,318][02710] Num frames 700... +[2025-08-18 17:11:26,449][02710] Num frames 800... +[2025-08-18 17:11:26,577][02710] Num frames 900... +[2025-08-18 17:11:26,705][02710] Num frames 1000... +[2025-08-18 17:11:26,831][02710] Num frames 1100... +[2025-08-18 17:11:26,957][02710] Num frames 1200... +[2025-08-18 17:11:27,086][02710] Num frames 1300... +[2025-08-18 17:11:27,216][02710] Num frames 1400... +[2025-08-18 17:11:27,353][02710] Num frames 1500... +[2025-08-18 17:11:27,482][02710] Num frames 1600... +[2025-08-18 17:11:27,614][02710] Num frames 1700... +[2025-08-18 17:11:27,744][02710] Num frames 1800... +[2025-08-18 17:11:27,873][02710] Num frames 1900... 
+[2025-08-18 17:11:28,005][02710] Num frames 2000... +[2025-08-18 17:11:28,140][02710] Num frames 2100... +[2025-08-18 17:11:28,192][02710] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000 +[2025-08-18 17:11:28,192][02710] Avg episode reward: 58.999, avg true_objective: 21.000 +[2025-08-18 17:11:28,326][02710] Num frames 2200... +[2025-08-18 17:11:28,451][02710] Num frames 2300... +[2025-08-18 17:11:28,579][02710] Num frames 2400... +[2025-08-18 17:11:28,708][02710] Num frames 2500... +[2025-08-18 17:11:28,837][02710] Num frames 2600... +[2025-08-18 17:11:28,963][02710] Num frames 2700... +[2025-08-18 17:11:29,091][02710] Num frames 2800... +[2025-08-18 17:11:29,219][02710] Num frames 2900... +[2025-08-18 17:11:29,365][02710] Num frames 3000... +[2025-08-18 17:11:29,496][02710] Num frames 3100... +[2025-08-18 17:11:29,629][02710] Num frames 3200... +[2025-08-18 17:11:29,757][02710] Num frames 3300... +[2025-08-18 17:11:29,892][02710] Num frames 3400... +[2025-08-18 17:11:30,027][02710] Num frames 3500... +[2025-08-18 17:11:30,163][02710] Num frames 3600... +[2025-08-18 17:11:30,297][02710] Num frames 3700... +[2025-08-18 17:11:30,446][02710] Num frames 3800... +[2025-08-18 17:11:30,573][02710] Num frames 3900... +[2025-08-18 17:11:30,700][02710] Num frames 4000... +[2025-08-18 17:11:30,824][02710] Num frames 4100... +[2025-08-18 17:11:30,955][02710] Num frames 4200... +[2025-08-18 17:11:31,007][02710] Avg episode rewards: #0: 53.999, true rewards: #0: 21.000 +[2025-08-18 17:11:31,008][02710] Avg episode reward: 53.999, avg true_objective: 21.000 +[2025-08-18 17:11:31,138][02710] Num frames 4300... +[2025-08-18 17:11:31,268][02710] Num frames 4400... +[2025-08-18 17:11:31,409][02710] Num frames 4500... +[2025-08-18 17:11:31,542][02710] Num frames 4600... +[2025-08-18 17:11:31,675][02710] Num frames 4700... +[2025-08-18 17:11:31,802][02710] Num frames 4800... +[2025-08-18 17:11:31,929][02710] Num frames 4900... +[2025-08-18 17:11:32,055][02710] Num frames 5000... +[2025-08-18 17:11:32,183][02710] Num frames 5100... +[2025-08-18 17:11:32,313][02710] Num frames 5200... +[2025-08-18 17:11:32,454][02710] Num frames 5300... +[2025-08-18 17:11:32,579][02710] Num frames 5400... +[2025-08-18 17:11:32,706][02710] Num frames 5500... +[2025-08-18 17:11:32,836][02710] Num frames 5600... +[2025-08-18 17:11:32,969][02710] Num frames 5700... +[2025-08-18 17:11:33,077][02710] Avg episode rewards: #0: 48.133, true rewards: #0: 19.133 +[2025-08-18 17:11:33,078][02710] Avg episode reward: 48.133, avg true_objective: 19.133 +[2025-08-18 17:11:33,155][02710] Num frames 5800... +[2025-08-18 17:11:33,281][02710] Num frames 5900... +[2025-08-18 17:11:33,411][02710] Num frames 6000... +[2025-08-18 17:11:33,551][02710] Num frames 6100... +[2025-08-18 17:11:33,680][02710] Num frames 6200... +[2025-08-18 17:11:33,808][02710] Num frames 6300... +[2025-08-18 17:11:33,938][02710] Num frames 6400... +[2025-08-18 17:11:34,090][02710] Avg episode rewards: #0: 40.189, true rewards: #0: 16.190 +[2025-08-18 17:11:34,091][02710] Avg episode reward: 40.189, avg true_objective: 16.190 +[2025-08-18 17:11:34,127][02710] Num frames 6500... +[2025-08-18 17:11:34,251][02710] Num frames 6600... +[2025-08-18 17:11:34,381][02710] Num frames 6700... +[2025-08-18 17:11:34,522][02710] Num frames 6800... +[2025-08-18 17:11:34,651][02710] Num frames 6900... +[2025-08-18 17:11:34,823][02710] Num frames 7000... +[2025-08-18 17:11:34,995][02710] Num frames 7100... +[2025-08-18 17:11:35,161][02710] Num frames 7200... 
+[2025-08-18 17:11:35,326][02710] Num frames 7300... +[2025-08-18 17:11:35,493][02710] Num frames 7400... +[2025-08-18 17:11:35,677][02710] Num frames 7500... +[2025-08-18 17:11:35,844][02710] Num frames 7600... +[2025-08-18 17:11:35,948][02710] Avg episode rewards: #0: 37.456, true rewards: #0: 15.256 +[2025-08-18 17:11:35,951][02710] Avg episode reward: 37.456, avg true_objective: 15.256 +[2025-08-18 17:11:36,075][02710] Num frames 7700... +[2025-08-18 17:11:36,245][02710] Num frames 7800... +[2025-08-18 17:11:36,418][02710] Num frames 7900... +[2025-08-18 17:11:36,595][02710] Num frames 8000... +[2025-08-18 17:11:36,781][02710] Num frames 8100... +[2025-08-18 17:11:36,925][02710] Num frames 8200... +[2025-08-18 17:11:37,055][02710] Num frames 8300... +[2025-08-18 17:11:37,106][02710] Avg episode rewards: #0: 33.333, true rewards: #0: 13.833 +[2025-08-18 17:11:37,107][02710] Avg episode reward: 33.333, avg true_objective: 13.833 +[2025-08-18 17:11:37,233][02710] Num frames 8400... +[2025-08-18 17:11:37,364][02710] Num frames 8500... +[2025-08-18 17:11:37,491][02710] Num frames 8600... +[2025-08-18 17:11:37,621][02710] Num frames 8700... +[2025-08-18 17:11:37,761][02710] Num frames 8800... +[2025-08-18 17:11:37,893][02710] Num frames 8900... +[2025-08-18 17:11:38,029][02710] Num frames 9000... +[2025-08-18 17:11:38,161][02710] Avg episode rewards: #0: 30.653, true rewards: #0: 12.939 +[2025-08-18 17:11:38,162][02710] Avg episode reward: 30.653, avg true_objective: 12.939 +[2025-08-18 17:11:38,218][02710] Num frames 9100... +[2025-08-18 17:11:38,346][02710] Num frames 9200... +[2025-08-18 17:11:38,474][02710] Num frames 9300... +[2025-08-18 17:11:38,601][02710] Num frames 9400... +[2025-08-18 17:11:38,737][02710] Num frames 9500... +[2025-08-18 17:11:38,800][02710] Avg episode rewards: #0: 27.506, true rewards: #0: 11.881 +[2025-08-18 17:11:38,801][02710] Avg episode reward: 27.506, avg true_objective: 11.881 +[2025-08-18 17:11:38,924][02710] Num frames 9600... +[2025-08-18 17:11:39,051][02710] Num frames 9700... +[2025-08-18 17:11:39,174][02710] Num frames 9800... +[2025-08-18 17:11:39,321][02710] Num frames 9900... +[2025-08-18 17:11:39,448][02710] Num frames 10000... +[2025-08-18 17:11:39,559][02710] Avg episode rewards: #0: 25.602, true rewards: #0: 11.158 +[2025-08-18 17:11:39,560][02710] Avg episode reward: 25.602, avg true_objective: 11.158 +[2025-08-18 17:11:39,635][02710] Num frames 10100... +[2025-08-18 17:11:39,771][02710] Num frames 10200... +[2025-08-18 17:11:39,899][02710] Num frames 10300... +[2025-08-18 17:11:40,025][02710] Num frames 10400... +[2025-08-18 17:11:40,153][02710] Num frames 10500... +[2025-08-18 17:11:40,276][02710] Num frames 10600... +[2025-08-18 17:11:40,402][02710] Num frames 10700... +[2025-08-18 17:11:40,528][02710] Num frames 10800... +[2025-08-18 17:11:40,653][02710] Num frames 10900... +[2025-08-18 17:11:40,791][02710] Num frames 11000... +[2025-08-18 17:11:40,919][02710] Num frames 11100... +[2025-08-18 17:11:41,045][02710] Num frames 11200... +[2025-08-18 17:11:41,172][02710] Num frames 11300... +[2025-08-18 17:11:41,297][02710] Num frames 11400... +[2025-08-18 17:11:41,426][02710] Num frames 11500... +[2025-08-18 17:11:41,554][02710] Num frames 11600... +[2025-08-18 17:11:41,688][02710] Num frames 11700... +[2025-08-18 17:11:41,831][02710] Num frames 11800... 
+[2025-08-18 17:11:41,932][02710] Avg episode rewards: #0: 28.334, true rewards: #0: 11.834 +[2025-08-18 17:11:41,933][02710] Avg episode reward: 28.334, avg true_objective: 11.834 +[2025-08-18 17:12:56,203][02710] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-08-18 17:19:37,465][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:19:37,466][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:19:37,467][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:19:37,467][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:19:37,468][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:19:37,469][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:19:37,470][02710] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-18 17:19:37,471][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-18 17:19:37,472][02710] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-18 17:19:37,472][02710] Adding new argument 'hf_repository'='Nikhil058/vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-18 17:19:37,473][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:19:37,474][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:19:37,475][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:19:37,476][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:19:37,477][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:19:37,505][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:19:37,507][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:19:37,519][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:19:37,554][02710] Conv encoder output size: 512 +[2025-08-18 17:19:37,555][02710] Policy head output size: 512 +[2025-08-18 17:19:37,574][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:19:38,037][02710] Num frames 100... +[2025-08-18 17:19:38,170][02710] Num frames 200... +[2025-08-18 17:19:38,300][02710] Num frames 300... +[2025-08-18 17:19:38,437][02710] Num frames 400... +[2025-08-18 17:19:38,569][02710] Num frames 500... +[2025-08-18 17:19:38,702][02710] Num frames 600... +[2025-08-18 17:19:38,845][02710] Num frames 700... +[2025-08-18 17:19:38,975][02710] Num frames 800... +[2025-08-18 17:19:39,102][02710] Num frames 900... +[2025-08-18 17:19:39,233][02710] Num frames 1000... +[2025-08-18 17:19:39,358][02710] Num frames 1100... +[2025-08-18 17:19:39,490][02710] Num frames 1200... +[2025-08-18 17:19:39,621][02710] Num frames 1300... +[2025-08-18 17:19:39,750][02710] Num frames 1400... +[2025-08-18 17:19:39,893][02710] Num frames 1500... +[2025-08-18 17:19:40,020][02710] Num frames 1600... 
+[2025-08-18 17:19:40,156][02710] Avg episode rewards: #0: 43.650, true rewards: #0: 16.650
+[2025-08-18 17:19:40,157][02710] Avg episode reward: 43.650, avg true_objective: 16.650
+[2025-08-18 17:19:40,204][02710] Num frames 1700...
+[2025-08-18 17:19:40,328][02710] Num frames 1800...
+[2025-08-18 17:19:40,452][02710] Num frames 1900...
+[2025-08-18 17:19:40,580][02710] Num frames 2000...
+[2025-08-18 17:19:40,709][02710] Num frames 2100...
+[2025-08-18 17:19:40,835][02710] Num frames 2200...
+[2025-08-18 17:19:40,969][02710] Num frames 2300...
+[2025-08-18 17:19:41,106][02710] Num frames 2400...
+[2025-08-18 17:19:41,283][02710] Num frames 2500...
+[2025-08-18 17:19:41,455][02710] Num frames 2600...
+[2025-08-18 17:19:41,622][02710] Num frames 2700...
+[2025-08-18 17:19:41,790][02710] Num frames 2800...
+[2025-08-18 17:19:41,877][02710] Avg episode rewards: #0: 35.585, true rewards: #0: 14.085
+[2025-08-18 17:19:41,878][02710] Avg episode reward: 35.585, avg true_objective: 14.085
+[2025-08-18 17:19:42,024][02710] Num frames 2900...
+[2025-08-18 17:19:42,190][02710] Num frames 3000...
+[2025-08-18 17:19:42,352][02710] Num frames 3100...
+[2025-08-18 17:19:42,524][02710] Num frames 3200...
+[2025-08-18 17:19:42,704][02710] Num frames 3300...
+[2025-08-18 17:19:42,873][02710] Num frames 3400...
+[2025-08-18 17:19:43,031][02710] Avg episode rewards: #0: 27.857, true rewards: #0: 11.523
+[2025-08-18 17:19:43,033][02710] Avg episode reward: 27.857, avg true_objective: 11.523
+[2025-08-18 17:19:43,115][02710] Num frames 3500...
+[2025-08-18 17:19:43,303][02710] Avg episode rewards: #0: 21.213, true rewards: #0: 8.962
+[2025-08-18 17:19:43,304][02710] Avg episode reward: 21.213, avg true_objective: 8.962
+[2025-08-18 17:19:43,324][02710] Num frames 3600...
+[2025-08-18 17:19:43,450][02710] Num frames 3700...
+[2025-08-18 17:19:43,581][02710] Num frames 3800...
+[2025-08-18 17:19:43,710][02710] Num frames 3900...
+[2025-08-18 17:19:43,840][02710] Num frames 4000...
+[2025-08-18 17:19:43,970][02710] Num frames 4100...
+[2025-08-18 17:19:44,115][02710] Num frames 4200...
+[2025-08-18 17:19:44,245][02710] Num frames 4300...
+[2025-08-18 17:19:44,375][02710] Num frames 4400...
+[2025-08-18 17:19:44,501][02710] Num frames 4500...
+[2025-08-18 17:19:44,629][02710] Num frames 4600...
+[2025-08-18 17:19:44,719][02710] Avg episode rewards: #0: 21.050, true rewards: #0: 9.250
+[2025-08-18 17:19:44,720][02710] Avg episode reward: 21.050, avg true_objective: 9.250
+[2025-08-18 17:19:44,824][02710] Num frames 4700...
+[2025-08-18 17:19:44,948][02710] Num frames 4800...
+[2025-08-18 17:19:45,081][02710] Num frames 4900...
+[2025-08-18 17:19:45,209][02710] Num frames 5000...
+[2025-08-18 17:19:45,361][02710] Avg episode rewards: #0: 18.955, true rewards: #0: 8.455
+[2025-08-18 17:19:45,362][02710] Avg episode reward: 18.955, avg true_objective: 8.455
+[2025-08-18 17:19:45,397][02710] Num frames 5100...
+[2025-08-18 17:19:45,522][02710] Num frames 5200...
+[2025-08-18 17:19:45,653][02710] Num frames 5300...
+[2025-08-18 17:19:45,779][02710] Num frames 5400...
+[2025-08-18 17:19:45,910][02710] Num frames 5500...
+[2025-08-18 17:19:46,039][02710] Num frames 5600...
+[2025-08-18 17:19:46,185][02710] Num frames 5700...
+[2025-08-18 17:19:46,315][02710] Num frames 5800...
+[2025-08-18 17:19:46,446][02710] Num frames 5900...
+[2025-08-18 17:19:46,572][02710] Num frames 6000...
+[2025-08-18 17:19:46,702][02710] Num frames 6100...
+[2025-08-18 17:19:46,829][02710] Num frames 6200...
+[2025-08-18 17:19:46,955][02710] Num frames 6300...
+[2025-08-18 17:19:47,084][02710] Num frames 6400...
+[2025-08-18 17:19:47,240][02710] Num frames 6500...
+[2025-08-18 17:19:47,366][02710] Num frames 6600...
+[2025-08-18 17:19:47,502][02710] Num frames 6700...
+[2025-08-18 17:19:47,652][02710] Avg episode rewards: #0: 22.533, true rewards: #0: 9.676
+[2025-08-18 17:19:47,653][02710] Avg episode reward: 22.533, avg true_objective: 9.676
+[2025-08-18 17:19:47,690][02710] Num frames 6800...
+[2025-08-18 17:19:47,815][02710] Num frames 6900...
+[2025-08-18 17:19:47,941][02710] Num frames 7000...
+[2025-08-18 17:19:48,068][02710] Num frames 7100...
+[2025-08-18 17:19:48,207][02710] Num frames 7200...
+[2025-08-18 17:19:48,338][02710] Num frames 7300...
+[2025-08-18 17:19:48,466][02710] Num frames 7400...
+[2025-08-18 17:19:48,593][02710] Num frames 7500...
+[2025-08-18 17:19:48,723][02710] Num frames 7600...
+[2025-08-18 17:19:48,853][02710] Num frames 7700...
+[2025-08-18 17:19:48,981][02710] Num frames 7800...
+[2025-08-18 17:19:49,112][02710] Num frames 7900...
+[2025-08-18 17:19:49,251][02710] Num frames 8000...
+[2025-08-18 17:19:49,382][02710] Num frames 8100...
+[2025-08-18 17:19:49,510][02710] Num frames 8200...
+[2025-08-18 17:19:49,691][02710] Avg episode rewards: #0: 23.871, true rewards: #0: 10.371
+[2025-08-18 17:19:49,692][02710] Avg episode reward: 23.871, avg true_objective: 10.371
+[2025-08-18 17:19:49,697][02710] Num frames 8300...
+[2025-08-18 17:19:49,827][02710] Num frames 8400...
+[2025-08-18 17:19:49,954][02710] Num frames 8500...
+[2025-08-18 17:19:50,079][02710] Num frames 8600...
+[2025-08-18 17:19:50,209][02710] Num frames 8700...
+[2025-08-18 17:19:50,347][02710] Num frames 8800...
+[2025-08-18 17:19:50,480][02710] Num frames 8900...
+[2025-08-18 17:19:50,607][02710] Num frames 9000...
+[2025-08-18 17:19:50,666][02710] Avg episode rewards: #0: 22.668, true rewards: #0: 10.001
+[2025-08-18 17:19:50,667][02710] Avg episode reward: 22.668, avg true_objective: 10.001
+[2025-08-18 17:19:50,792][02710] Num frames 9100...
+[2025-08-18 17:19:50,921][02710] Num frames 9200...
+[2025-08-18 17:19:51,046][02710] Num frames 9300...
+[2025-08-18 17:19:51,171][02710] Num frames 9400...
+[2025-08-18 17:19:51,311][02710] Num frames 9500...
+[2025-08-18 17:19:51,437][02710] Num frames 9600...
+[2025-08-18 17:19:51,562][02710] Num frames 9700...
+[2025-08-18 17:19:51,689][02710] Num frames 9800...
+[2025-08-18 17:19:51,826][02710] Avg episode rewards: #0: 22.065, true rewards: #0: 9.865
+[2025-08-18 17:19:51,827][02710] Avg episode reward: 22.065, avg true_objective: 9.865
+[2025-08-18 17:20:53,325][02710] Replay video saved to /content/train_dir/default_experiment/replay.mp4!