diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1262 @@ +[2025-08-28 13:05:32,791][02490] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-08-28 13:05:32,793][02490] Rollout worker 0 uses device cpu +[2025-08-28 13:05:32,794][02490] Rollout worker 1 uses device cpu +[2025-08-28 13:05:32,795][02490] Rollout worker 2 uses device cpu +[2025-08-28 13:05:32,796][02490] Rollout worker 3 uses device cpu +[2025-08-28 13:05:32,797][02490] Rollout worker 4 uses device cpu +[2025-08-28 13:05:32,798][02490] Rollout worker 5 uses device cpu +[2025-08-28 13:05:32,799][02490] Rollout worker 6 uses device cpu +[2025-08-28 13:05:32,800][02490] Rollout worker 7 uses device cpu +[2025-08-28 13:05:32,941][02490] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-28 13:05:32,942][02490] InferenceWorker_p0-w0: min num requests: 2 +[2025-08-28 13:05:32,969][02490] Starting all processes... +[2025-08-28 13:05:32,970][02490] Starting process learner_proc0 +[2025-08-28 13:05:33,031][02490] Starting all processes... +[2025-08-28 13:05:33,036][02490] Starting process inference_proc0-0 +[2025-08-28 13:05:33,037][02490] Starting process rollout_proc0 +[2025-08-28 13:05:33,037][02490] Starting process rollout_proc1 +[2025-08-28 13:05:33,037][02490] Starting process rollout_proc2 +[2025-08-28 13:05:33,037][02490] Starting process rollout_proc3 +[2025-08-28 13:05:33,037][02490] Starting process rollout_proc4 +[2025-08-28 13:05:33,038][02490] Starting process rollout_proc5 +[2025-08-28 13:05:33,038][02490] Starting process rollout_proc6 +[2025-08-28 13:05:33,038][02490] Starting process rollout_proc7 +[2025-08-28 13:05:48,455][02637] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-28 13:05:48,456][02637] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-28 13:05:48,530][02637] Num visible devices: 1 +[2025-08-28 13:05:48,545][02658] Worker 7 uses CPU cores [1] +[2025-08-28 13:05:48,545][02637] Starting seed is not provided +[2025-08-28 13:05:48,546][02637] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-28 13:05:48,547][02637] Initializing actor-critic model on device cuda:0 +[2025-08-28 13:05:48,548][02637] RunningMeanStd input shape: (3, 72, 128) +[2025-08-28 13:05:48,551][02637] RunningMeanStd input shape: (1,) +[2025-08-28 13:05:48,556][02654] Worker 4 uses CPU cores [0] +[2025-08-28 13:05:48,570][02652] Worker 1 uses CPU cores [1] +[2025-08-28 13:05:48,595][02637] ConvEncoder: input_channels=3 +[2025-08-28 13:05:48,601][02653] Worker 2 uses CPU cores [0] +[2025-08-28 13:05:48,663][02655] Worker 3 uses CPU cores [1] +[2025-08-28 13:05:48,668][02656] Worker 5 uses CPU cores [1] +[2025-08-28 13:05:48,725][02650] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-28 13:05:48,726][02650] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-28 13:05:48,748][02650] Num visible devices: 1 +[2025-08-28 13:05:48,776][02651] Worker 0 uses CPU cores [0] +[2025-08-28 13:05:48,784][02657] Worker 6 uses CPU cores [0] +[2025-08-28 13:05:48,873][02637] Conv encoder output size: 512 +[2025-08-28 13:05:48,873][02637] Policy head output size: 512 +[2025-08-28 13:05:48,919][02637] Created Actor Critic model with architecture: +[2025-08-28 13:05:48,919][02637] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-08-28 13:05:49,206][02637] Using optimizer +[2025-08-28 13:05:52,934][02490] Heartbeat connected on Batcher_0 +[2025-08-28 13:05:52,941][02490] Heartbeat connected on InferenceWorker_p0-w0 +[2025-08-28 13:05:52,948][02490] Heartbeat connected on RolloutWorker_w0 +[2025-08-28 13:05:52,951][02490] Heartbeat connected on RolloutWorker_w1 +[2025-08-28 13:05:52,957][02490] Heartbeat connected on RolloutWorker_w3 +[2025-08-28 13:05:52,959][02490] Heartbeat connected on RolloutWorker_w2 +[2025-08-28 13:05:52,961][02490] Heartbeat connected on RolloutWorker_w4 +[2025-08-28 13:05:52,964][02490] Heartbeat connected on RolloutWorker_w5 +[2025-08-28 13:05:52,969][02490] Heartbeat connected on RolloutWorker_w6 +[2025-08-28 13:05:52,974][02490] Heartbeat connected on RolloutWorker_w7 +[2025-08-28 13:05:54,200][02637] No checkpoints found +[2025-08-28 13:05:54,201][02637] Did not load from checkpoint, starting from scratch! +[2025-08-28 13:05:54,201][02637] Initialized policy 0 weights for model version 0 +[2025-08-28 13:05:54,210][02637] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-28 13:05:54,220][02637] LearnerWorker_p0 finished initialization! +[2025-08-28 13:05:54,221][02490] Heartbeat connected on LearnerWorker_p0 +[2025-08-28 13:05:54,398][02650] RunningMeanStd input shape: (3, 72, 128) +[2025-08-28 13:05:54,399][02650] RunningMeanStd input shape: (1,) +[2025-08-28 13:05:54,418][02650] ConvEncoder: input_channels=3 +[2025-08-28 13:05:54,567][02650] Conv encoder output size: 512 +[2025-08-28 13:05:54,568][02650] Policy head output size: 512 +[2025-08-28 13:05:54,624][02490] Inference worker 0-0 is ready! +[2025-08-28 13:05:54,624][02490] All inference workers are ready! Signal rollout workers to start! 
+[2025-08-28 13:05:54,844][02656] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:54,845][02652] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:54,860][02655] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:54,857][02658] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:54,921][02657] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:54,923][02654] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:54,933][02653] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:54,934][02651] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:05:56,086][02657] Decorrelating experience for 0 frames... +[2025-08-28 13:05:56,089][02653] Decorrelating experience for 0 frames... +[2025-08-28 13:05:56,463][02653] Decorrelating experience for 32 frames... +[2025-08-28 13:05:56,725][02658] Decorrelating experience for 0 frames... +[2025-08-28 13:05:56,727][02652] Decorrelating experience for 0 frames... +[2025-08-28 13:05:56,725][02655] Decorrelating experience for 0 frames... +[2025-08-28 13:05:56,734][02656] Decorrelating experience for 0 frames... +[2025-08-28 13:05:57,090][02652] Decorrelating experience for 32 frames... +[2025-08-28 13:05:57,537][02653] Decorrelating experience for 64 frames... +[2025-08-28 13:05:57,573][02652] Decorrelating experience for 64 frames... +[2025-08-28 13:05:57,809][02651] Decorrelating experience for 0 frames... +[2025-08-28 13:05:57,814][02654] Decorrelating experience for 0 frames... +[2025-08-28 13:05:58,302][02490] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-28 13:05:58,341][02658] Decorrelating experience for 32 frames... +[2025-08-28 13:05:58,499][02652] Decorrelating experience for 96 frames... +[2025-08-28 13:05:59,074][02654] Decorrelating experience for 32 frames... +[2025-08-28 13:05:59,097][02653] Decorrelating experience for 96 frames... +[2025-08-28 13:05:59,108][02657] Decorrelating experience for 32 frames... +[2025-08-28 13:05:59,200][02655] Decorrelating experience for 32 frames... +[2025-08-28 13:05:59,817][02651] Decorrelating experience for 32 frames... +[2025-08-28 13:06:00,162][02656] Decorrelating experience for 32 frames... +[2025-08-28 13:06:00,568][02654] Decorrelating experience for 64 frames... +[2025-08-28 13:06:01,320][02657] Decorrelating experience for 64 frames... +[2025-08-28 13:06:01,819][02651] Decorrelating experience for 64 frames... +[2025-08-28 13:06:02,156][02658] Decorrelating experience for 64 frames... +[2025-08-28 13:06:02,577][02656] Decorrelating experience for 64 frames... +[2025-08-28 13:06:02,607][02654] Decorrelating experience for 96 frames... +[2025-08-28 13:06:02,611][02655] Decorrelating experience for 64 frames... +[2025-08-28 13:06:03,302][02490] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 20.8. Samples: 104. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-28 13:06:03,308][02490] Avg episode reward: [(0, '2.735')] +[2025-08-28 13:06:04,842][02658] Decorrelating experience for 96 frames... +[2025-08-28 13:06:04,977][02651] Decorrelating experience for 96 frames... +[2025-08-28 13:06:05,677][02656] Decorrelating experience for 96 frames... +[2025-08-28 13:06:05,700][02655] Decorrelating experience for 96 frames... 
+[2025-08-28 13:06:06,617][02637] Signal inference workers to stop experience collection... +[2025-08-28 13:06:06,626][02650] InferenceWorker_p0-w0: stopping experience collection +[2025-08-28 13:06:06,696][02657] Decorrelating experience for 96 frames... +[2025-08-28 13:06:06,962][02637] Signal inference workers to resume experience collection... +[2025-08-28 13:06:06,962][02650] InferenceWorker_p0-w0: resuming experience collection +[2025-08-28 13:06:08,303][02490] Fps is (10 sec: 819.1, 60 sec: 819.1, 300 sec: 819.1). Total num frames: 8192. Throughput: 0: 233.2. Samples: 2332. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-08-28 13:06:08,305][02490] Avg episode reward: [(0, '3.000')] +[2025-08-28 13:06:13,302][02490] Fps is (10 sec: 2867.2, 60 sec: 1911.5, 300 sec: 1911.5). Total num frames: 28672. Throughput: 0: 482.8. Samples: 7242. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:06:13,305][02490] Avg episode reward: [(0, '3.523')] +[2025-08-28 13:06:15,681][02650] Updated weights for policy 0, policy_version 10 (0.0127) +[2025-08-28 13:06:18,302][02490] Fps is (10 sec: 4096.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 535.4. Samples: 10708. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:06:18,305][02490] Avg episode reward: [(0, '4.347')] +[2025-08-28 13:06:23,302][02490] Fps is (10 sec: 4096.0, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 69632. Throughput: 0: 680.1. Samples: 17002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:06:23,305][02490] Avg episode reward: [(0, '4.598')] +[2025-08-28 13:06:27,114][02650] Updated weights for policy 0, policy_version 20 (0.0016) +[2025-08-28 13:06:28,302][02490] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 86016. Throughput: 0: 724.0. Samples: 21720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:06:28,303][02490] Avg episode reward: [(0, '4.490')] +[2025-08-28 13:06:33,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3042.7, 300 sec: 3042.7). Total num frames: 106496. Throughput: 0: 717.8. Samples: 25124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:06:33,303][02490] Avg episode reward: [(0, '4.534')] +[2025-08-28 13:06:33,354][02637] Saving new best policy, reward=4.534! +[2025-08-28 13:06:36,105][02650] Updated weights for policy 0, policy_version 30 (0.0018) +[2025-08-28 13:06:38,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3174.4, 300 sec: 3174.4). Total num frames: 126976. Throughput: 0: 787.7. Samples: 31508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:06:38,303][02490] Avg episode reward: [(0, '4.599')] +[2025-08-28 13:06:38,307][02637] Saving new best policy, reward=4.599! +[2025-08-28 13:06:43,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3185.8, 300 sec: 3185.8). Total num frames: 143360. Throughput: 0: 791.3. Samples: 35610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:06:43,303][02490] Avg episode reward: [(0, '4.502')] +[2025-08-28 13:06:47,900][02650] Updated weights for policy 0, policy_version 40 (0.0021) +[2025-08-28 13:06:48,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 163840. Throughput: 0: 865.2. Samples: 39036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:06:48,304][02490] Avg episode reward: [(0, '4.590')] +[2025-08-28 13:06:53,303][02490] Fps is (10 sec: 3686.0, 60 sec: 3276.7, 300 sec: 3276.7). Total num frames: 180224. Throughput: 0: 952.5. Samples: 45196. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:06:53,306][02490] Avg episode reward: [(0, '4.640')] +[2025-08-28 13:06:53,308][02637] Saving new best policy, reward=4.640! +[2025-08-28 13:06:58,303][02490] Fps is (10 sec: 3685.9, 60 sec: 3345.0, 300 sec: 3345.0). Total num frames: 200704. Throughput: 0: 949.9. Samples: 49988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:06:58,314][02490] Avg episode reward: [(0, '4.503')] +[2025-08-28 13:06:59,212][02650] Updated weights for policy 0, policy_version 50 (0.0030) +[2025-08-28 13:07:03,302][02490] Fps is (10 sec: 4096.5, 60 sec: 3686.4, 300 sec: 3402.8). Total num frames: 221184. Throughput: 0: 945.7. Samples: 53264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:07:03,303][02490] Avg episode reward: [(0, '4.481')] +[2025-08-28 13:07:08,302][02490] Fps is (10 sec: 3686.9, 60 sec: 3823.0, 300 sec: 3393.8). Total num frames: 237568. Throughput: 0: 933.7. Samples: 59020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:07:08,308][02490] Avg episode reward: [(0, '4.413')] +[2025-08-28 13:07:10,620][02650] Updated weights for policy 0, policy_version 60 (0.0016) +[2025-08-28 13:07:13,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3386.0). Total num frames: 253952. Throughput: 0: 942.5. Samples: 64132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:07:13,306][02490] Avg episode reward: [(0, '4.387')] +[2025-08-28 13:07:18,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3481.6). Total num frames: 278528. Throughput: 0: 939.1. Samples: 67384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:07:18,307][02490] Avg episode reward: [(0, '4.427')] +[2025-08-28 13:07:20,205][02650] Updated weights for policy 0, policy_version 70 (0.0015) +[2025-08-28 13:07:23,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3469.6). Total num frames: 294912. Throughput: 0: 928.3. Samples: 73280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:07:23,307][02490] Avg episode reward: [(0, '4.354')] +[2025-08-28 13:07:28,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3458.8). Total num frames: 311296. Throughput: 0: 951.2. Samples: 78414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:07:28,307][02490] Avg episode reward: [(0, '4.595')] +[2025-08-28 13:07:28,313][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth... +[2025-08-28 13:07:31,368][02650] Updated weights for policy 0, policy_version 80 (0.0024) +[2025-08-28 13:07:33,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 947.7. Samples: 81684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:07:33,307][02490] Avg episode reward: [(0, '4.600')] +[2025-08-28 13:07:38,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3481.6). Total num frames: 348160. Throughput: 0: 939.9. Samples: 87492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:07:38,306][02490] Avg episode reward: [(0, '4.172')] +[2025-08-28 13:07:42,621][02650] Updated weights for policy 0, policy_version 90 (0.0033) +[2025-08-28 13:07:43,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3510.9). Total num frames: 368640. Throughput: 0: 949.4. Samples: 92708. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:07:43,306][02490] Avg episode reward: [(0, '4.339')] +[2025-08-28 13:07:48,302][02490] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3574.7). Total num frames: 393216. Throughput: 0: 950.4. Samples: 96030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:07:48,306][02490] Avg episode reward: [(0, '4.530')] +[2025-08-28 13:07:53,305][02490] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3526.0). Total num frames: 405504. Throughput: 0: 948.3. Samples: 101696. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:07:53,307][02490] Avg episode reward: [(0, '4.412')] +[2025-08-28 13:07:53,573][02650] Updated weights for policy 0, policy_version 100 (0.0022) +[2025-08-28 13:07:58,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 425984. Throughput: 0: 954.1. Samples: 107066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:07:58,304][02490] Avg episode reward: [(0, '4.358')] +[2025-08-28 13:08:03,304][02490] Fps is (10 sec: 4096.7, 60 sec: 3754.6, 300 sec: 3571.7). Total num frames: 446464. Throughput: 0: 955.9. Samples: 110402. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:08:03,307][02490] Avg episode reward: [(0, '4.448')] +[2025-08-28 13:08:03,381][02650] Updated weights for policy 0, policy_version 110 (0.0016) +[2025-08-28 13:08:08,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3560.4). Total num frames: 462848. Throughput: 0: 946.3. Samples: 115862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:08:08,304][02490] Avg episode reward: [(0, '4.515')] +[2025-08-28 13:08:13,302][02490] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3580.2). Total num frames: 483328. Throughput: 0: 957.3. Samples: 121492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:08:13,303][02490] Avg episode reward: [(0, '4.573')] +[2025-08-28 13:08:14,422][02650] Updated weights for policy 0, policy_version 120 (0.0017) +[2025-08-28 13:08:18,302][02490] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 507904. Throughput: 0: 958.2. Samples: 124802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:08:18,305][02490] Avg episode reward: [(0, '4.592')] +[2025-08-28 13:08:23,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3587.5). Total num frames: 520192. Throughput: 0: 946.8. Samples: 130100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:08:23,308][02490] Avg episode reward: [(0, '4.636')] +[2025-08-28 13:08:25,548][02650] Updated weights for policy 0, policy_version 130 (0.0014) +[2025-08-28 13:08:28,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3604.5). Total num frames: 540672. Throughput: 0: 957.6. Samples: 135802. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:08:28,304][02490] Avg episode reward: [(0, '4.769')] +[2025-08-28 13:08:28,310][02637] Saving new best policy, reward=4.769! +[2025-08-28 13:08:33,302][02490] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3646.8). Total num frames: 565248. Throughput: 0: 953.4. Samples: 138934. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:08:33,306][02490] Avg episode reward: [(0, '4.538')] +[2025-08-28 13:08:36,302][02650] Updated weights for policy 0, policy_version 140 (0.0016) +[2025-08-28 13:08:38,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3609.6). Total num frames: 577536. Throughput: 0: 942.2. Samples: 144092. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:08:38,303][02490] Avg episode reward: [(0, '4.624')] +[2025-08-28 13:08:43,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3624.3). Total num frames: 598016. Throughput: 0: 956.6. Samples: 150112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:08:43,304][02490] Avg episode reward: [(0, '4.726')] +[2025-08-28 13:08:46,338][02650] Updated weights for policy 0, policy_version 150 (0.0017) +[2025-08-28 13:08:48,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3638.2). Total num frames: 618496. Throughput: 0: 956.3. Samples: 153432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:08:48,306][02490] Avg episode reward: [(0, '4.467')] +[2025-08-28 13:08:53,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3627.9). Total num frames: 634880. Throughput: 0: 945.0. Samples: 158388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:08:53,303][02490] Avg episode reward: [(0, '4.484')] +[2025-08-28 13:08:57,494][02650] Updated weights for policy 0, policy_version 160 (0.0016) +[2025-08-28 13:08:58,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3640.9). Total num frames: 655360. Throughput: 0: 955.8. Samples: 164502. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:08:58,307][02490] Avg episode reward: [(0, '4.607')] +[2025-08-28 13:09:03,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3653.2). Total num frames: 675840. Throughput: 0: 955.0. Samples: 167778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:09:03,305][02490] Avg episode reward: [(0, '4.581')] +[2025-08-28 13:09:08,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3643.3). Total num frames: 692224. Throughput: 0: 944.0. Samples: 172582. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:09:08,307][02490] Avg episode reward: [(0, '4.504')] +[2025-08-28 13:09:08,938][02650] Updated weights for policy 0, policy_version 170 (0.0013) +[2025-08-28 13:09:13,302][02490] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3654.9). Total num frames: 712704. Throughput: 0: 958.0. Samples: 178912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:09:13,307][02490] Avg episode reward: [(0, '4.512')] +[2025-08-28 13:09:18,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3665.9). Total num frames: 733184. Throughput: 0: 960.2. Samples: 182142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:09:18,306][02490] Avg episode reward: [(0, '4.570')] +[2025-08-28 13:09:18,530][02650] Updated weights for policy 0, policy_version 180 (0.0012) +[2025-08-28 13:09:23,302][02490] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3656.4). Total num frames: 749568. Throughput: 0: 949.0. Samples: 186796. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:09:23,306][02490] Avg episode reward: [(0, '4.680')] +[2025-08-28 13:09:28,303][02490] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3666.9). Total num frames: 770048. Throughput: 0: 955.5. Samples: 193110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:09:28,304][02490] Avg episode reward: [(0, '4.703')] +[2025-08-28 13:09:28,311][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth... +[2025-08-28 13:09:29,511][02650] Updated weights for policy 0, policy_version 190 (0.0024) +[2025-08-28 13:09:33,307][02490] Fps is (10 sec: 4093.9, 60 sec: 3754.3, 300 sec: 3676.8). Total num frames: 790528. 
Throughput: 0: 952.7. Samples: 196310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:09:33,309][02490] Avg episode reward: [(0, '4.481')] +[2025-08-28 13:09:38,302][02490] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3667.8). Total num frames: 806912. Throughput: 0: 943.6. Samples: 200850. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:09:38,307][02490] Avg episode reward: [(0, '4.558')] +[2025-08-28 13:09:40,802][02650] Updated weights for policy 0, policy_version 200 (0.0017) +[2025-08-28 13:09:43,302][02490] Fps is (10 sec: 3688.3, 60 sec: 3822.9, 300 sec: 3677.3). Total num frames: 827392. Throughput: 0: 952.8. Samples: 207376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:09:43,307][02490] Avg episode reward: [(0, '4.465')] +[2025-08-28 13:09:48,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 847872. Throughput: 0: 952.4. Samples: 210638. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:09:48,308][02490] Avg episode reward: [(0, '4.377')] +[2025-08-28 13:09:51,917][02650] Updated weights for policy 0, policy_version 210 (0.0015) +[2025-08-28 13:09:53,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3677.7). Total num frames: 864256. Throughput: 0: 946.6. Samples: 215178. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:09:53,303][02490] Avg episode reward: [(0, '4.397')] +[2025-08-28 13:09:58,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 884736. Throughput: 0: 951.1. Samples: 221710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:09:58,304][02490] Avg episode reward: [(0, '4.681')] +[2025-08-28 13:10:01,807][02650] Updated weights for policy 0, policy_version 220 (0.0016) +[2025-08-28 13:10:03,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3678.0). Total num frames: 901120. Throughput: 0: 950.0. Samples: 224894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:10:03,307][02490] Avg episode reward: [(0, '4.482')] +[2025-08-28 13:10:08,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3686.4). Total num frames: 921600. Throughput: 0: 947.2. Samples: 229418. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:10:08,303][02490] Avg episode reward: [(0, '4.468')] +[2025-08-28 13:10:12,674][02650] Updated weights for policy 0, policy_version 230 (0.0021) +[2025-08-28 13:10:13,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3694.4). Total num frames: 942080. Throughput: 0: 953.2. Samples: 236002. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:10:13,304][02490] Avg episode reward: [(0, '4.722')] +[2025-08-28 13:10:18,303][02490] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3686.4). Total num frames: 958464. Throughput: 0: 955.0. Samples: 239280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:10:18,304][02490] Avg episode reward: [(0, '4.670')] +[2025-08-28 13:10:23,306][02490] Fps is (10 sec: 3685.0, 60 sec: 3822.7, 300 sec: 3694.1). Total num frames: 978944. Throughput: 0: 956.0. Samples: 243874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:10:23,309][02490] Avg episode reward: [(0, '4.884')] +[2025-08-28 13:10:23,311][02637] Saving new best policy, reward=4.884! +[2025-08-28 13:10:24,034][02650] Updated weights for policy 0, policy_version 240 (0.0022) +[2025-08-28 13:10:28,302][02490] Fps is (10 sec: 4096.1, 60 sec: 3823.0, 300 sec: 3701.6). Total num frames: 999424. Throughput: 0: 953.7. 
Samples: 250294. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:10:28,306][02490] Avg episode reward: [(0, '5.025')] +[2025-08-28 13:10:28,319][02637] Saving new best policy, reward=5.025! +[2025-08-28 13:10:33,303][02490] Fps is (10 sec: 3687.3, 60 sec: 3754.9, 300 sec: 3693.8). Total num frames: 1015808. Throughput: 0: 951.7. Samples: 253464. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:10:33,308][02490] Avg episode reward: [(0, '4.787')] +[2025-08-28 13:10:35,382][02650] Updated weights for policy 0, policy_version 250 (0.0021) +[2025-08-28 13:10:38,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3701.0). Total num frames: 1036288. Throughput: 0: 950.4. Samples: 257944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:10:38,304][02490] Avg episode reward: [(0, '4.636')] +[2025-08-28 13:10:43,302][02490] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3708.0). Total num frames: 1056768. Throughput: 0: 951.1. Samples: 264508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:10:43,303][02490] Avg episode reward: [(0, '4.549')] +[2025-08-28 13:10:44,682][02650] Updated weights for policy 0, policy_version 260 (0.0023) +[2025-08-28 13:10:48,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3700.5). Total num frames: 1073152. Throughput: 0: 952.1. Samples: 267740. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:10:48,304][02490] Avg episode reward: [(0, '4.524')] +[2025-08-28 13:10:53,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3707.2). Total num frames: 1093632. Throughput: 0: 953.2. Samples: 272310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:10:53,306][02490] Avg episode reward: [(0, '4.545')] +[2025-08-28 13:10:55,936][02650] Updated weights for policy 0, policy_version 270 (0.0021) +[2025-08-28 13:10:58,302][02490] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 1114112. Throughput: 0: 952.1. Samples: 278848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:10:58,307][02490] Avg episode reward: [(0, '4.769')] +[2025-08-28 13:11:03,303][02490] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1130496. Throughput: 0: 947.6. Samples: 281924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:11:03,304][02490] Avg episode reward: [(0, '4.705')] +[2025-08-28 13:11:07,314][02650] Updated weights for policy 0, policy_version 280 (0.0016) +[2025-08-28 13:11:08,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1146880. Throughput: 0: 950.6. Samples: 286646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:11:08,306][02490] Avg episode reward: [(0, '4.661')] +[2025-08-28 13:11:13,302][02490] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1171456. Throughput: 0: 955.9. Samples: 293310. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:11:13,307][02490] Avg episode reward: [(0, '4.430')] +[2025-08-28 13:11:17,763][02650] Updated weights for policy 0, policy_version 290 (0.0016) +[2025-08-28 13:11:18,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 1187840. Throughput: 0: 950.8. Samples: 296248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:11:18,303][02490] Avg episode reward: [(0, '4.453')] +[2025-08-28 13:11:23,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3804.4). Total num frames: 1208320. Throughput: 0: 958.8. Samples: 301090. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:11:23,307][02490] Avg episode reward: [(0, '4.608')] +[2025-08-28 13:11:27,884][02650] Updated weights for policy 0, policy_version 300 (0.0017) +[2025-08-28 13:11:28,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1228800. Throughput: 0: 958.1. Samples: 307624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:11:28,307][02490] Avg episode reward: [(0, '4.926')] +[2025-08-28 13:11:28,315][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth... +[2025-08-28 13:11:28,427][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000076_311296.pth +[2025-08-28 13:11:33,309][02490] Fps is (10 sec: 3683.8, 60 sec: 3822.6, 300 sec: 3790.4). Total num frames: 1245184. Throughput: 0: 945.6. Samples: 310300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:11:33,311][02490] Avg episode reward: [(0, '4.840')] +[2025-08-28 13:11:38,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1265664. Throughput: 0: 954.1. Samples: 315246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:11:38,307][02490] Avg episode reward: [(0, '4.604')] +[2025-08-28 13:11:39,125][02650] Updated weights for policy 0, policy_version 310 (0.0013) +[2025-08-28 13:11:43,302][02490] Fps is (10 sec: 4098.8, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1286144. Throughput: 0: 954.4. Samples: 321794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:11:43,307][02490] Avg episode reward: [(0, '4.541')] +[2025-08-28 13:11:48,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1302528. Throughput: 0: 944.1. Samples: 324408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:11:48,310][02490] Avg episode reward: [(0, '4.633')] +[2025-08-28 13:11:50,430][02650] Updated weights for policy 0, policy_version 320 (0.0017) +[2025-08-28 13:11:53,306][02490] Fps is (10 sec: 3275.5, 60 sec: 3754.4, 300 sec: 3790.5). Total num frames: 1318912. Throughput: 0: 954.9. Samples: 329620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:11:53,308][02490] Avg episode reward: [(0, '4.786')] +[2025-08-28 13:11:58,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1343488. Throughput: 0: 951.7. Samples: 336136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:11:58,306][02490] Avg episode reward: [(0, '4.655')] +[2025-08-28 13:12:00,887][02650] Updated weights for policy 0, policy_version 330 (0.0031) +[2025-08-28 13:12:03,302][02490] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1355776. Throughput: 0: 938.9. Samples: 338500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:12:03,310][02490] Avg episode reward: [(0, '4.767')] +[2025-08-28 13:12:08,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1380352. Throughput: 0: 949.5. Samples: 343818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:12:08,304][02490] Avg episode reward: [(0, '5.000')] +[2025-08-28 13:12:11,147][02650] Updated weights for policy 0, policy_version 340 (0.0029) +[2025-08-28 13:12:13,302][02490] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1400832. Throughput: 0: 948.8. Samples: 350320. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:12:13,304][02490] Avg episode reward: [(0, '4.729')] +[2025-08-28 13:12:18,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1413120. Throughput: 0: 940.9. Samples: 352636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:12:18,307][02490] Avg episode reward: [(0, '4.643')] +[2025-08-28 13:12:22,295][02650] Updated weights for policy 0, policy_version 350 (0.0014) +[2025-08-28 13:12:23,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1433600. Throughput: 0: 952.4. Samples: 358102. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:12:23,307][02490] Avg episode reward: [(0, '4.500')] +[2025-08-28 13:12:28,303][02490] Fps is (10 sec: 4505.2, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1458176. Throughput: 0: 951.9. Samples: 364632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:12:28,304][02490] Avg episode reward: [(0, '4.405')] +[2025-08-28 13:12:33,302][02490] Fps is (10 sec: 3686.3, 60 sec: 3755.1, 300 sec: 3804.4). Total num frames: 1470464. Throughput: 0: 941.0. Samples: 366754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:12:33,304][02490] Avg episode reward: [(0, '4.370')] +[2025-08-28 13:12:33,691][02650] Updated weights for policy 0, policy_version 360 (0.0030) +[2025-08-28 13:12:38,302][02490] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1495040. Throughput: 0: 950.1. Samples: 372372. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2025-08-28 13:12:38,303][02490] Avg episode reward: [(0, '4.608')] +[2025-08-28 13:12:43,130][02650] Updated weights for policy 0, policy_version 370 (0.0013) +[2025-08-28 13:12:43,302][02490] Fps is (10 sec: 4505.7, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1515520. Throughput: 0: 949.7. Samples: 378872. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:12:43,306][02490] Avg episode reward: [(0, '4.480')] +[2025-08-28 13:12:48,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.5). Total num frames: 1527808. Throughput: 0: 942.7. Samples: 380920. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2025-08-28 13:12:48,304][02490] Avg episode reward: [(0, '4.532')] +[2025-08-28 13:12:53,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3823.2, 300 sec: 3804.4). Total num frames: 1548288. Throughput: 0: 951.7. Samples: 386646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:12:53,304][02490] Avg episode reward: [(0, '4.551')] +[2025-08-28 13:12:54,526][02650] Updated weights for policy 0, policy_version 380 (0.0013) +[2025-08-28 13:12:58,307][02490] Fps is (10 sec: 4094.2, 60 sec: 3754.4, 300 sec: 3804.4). Total num frames: 1568768. Throughput: 0: 950.0. Samples: 393072. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:12:58,311][02490] Avg episode reward: [(0, '4.612')] +[2025-08-28 13:13:03,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1585152. Throughput: 0: 942.2. Samples: 395036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:13:03,306][02490] Avg episode reward: [(0, '4.515')] +[2025-08-28 13:13:05,535][02650] Updated weights for policy 0, policy_version 390 (0.0036) +[2025-08-28 13:13:08,302][02490] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1605632. Throughput: 0: 949.1. Samples: 400812. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:13:08,307][02490] Avg episode reward: [(0, '4.624')] +[2025-08-28 13:13:13,306][02490] Fps is (10 sec: 4094.3, 60 sec: 3754.4, 300 sec: 3790.5). Total num frames: 1626112. Throughput: 0: 945.9. Samples: 407202. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2025-08-28 13:13:13,311][02490] Avg episode reward: [(0, '4.650')] +[2025-08-28 13:13:17,194][02650] Updated weights for policy 0, policy_version 400 (0.0015) +[2025-08-28 13:13:18,302][02490] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1642496. Throughput: 0: 943.6. Samples: 409218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:13:18,306][02490] Avg episode reward: [(0, '4.546')] +[2025-08-28 13:13:23,302][02490] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1662976. Throughput: 0: 948.7. Samples: 415064. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0) +[2025-08-28 13:13:23,304][02490] Avg episode reward: [(0, '4.590')] +[2025-08-28 13:13:26,441][02650] Updated weights for policy 0, policy_version 410 (0.0014) +[2025-08-28 13:13:28,306][02490] Fps is (10 sec: 4094.3, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 1683456. Throughput: 0: 943.7. Samples: 421342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:13:28,308][02490] Avg episode reward: [(0, '4.614')] +[2025-08-28 13:13:28,320][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth... +[2025-08-28 13:13:28,450][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth +[2025-08-28 13:13:33,302][02490] Fps is (10 sec: 3276.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1695744. Throughput: 0: 939.7. Samples: 423206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:13:33,303][02490] Avg episode reward: [(0, '4.629')] +[2025-08-28 13:13:38,081][02650] Updated weights for policy 0, policy_version 420 (0.0025) +[2025-08-28 13:13:38,302][02490] Fps is (10 sec: 3687.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1720320. Throughput: 0: 946.2. Samples: 429224. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:13:38,304][02490] Avg episode reward: [(0, '4.701')] +[2025-08-28 13:13:43,303][02490] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 1736704. Throughput: 0: 936.8. Samples: 435224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:13:43,304][02490] Avg episode reward: [(0, '4.903')] +[2025-08-28 13:13:48,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1753088. Throughput: 0: 936.4. Samples: 437176. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:13:48,303][02490] Avg episode reward: [(0, '5.078')] +[2025-08-28 13:13:48,314][02637] Saving new best policy, reward=5.078! +[2025-08-28 13:13:49,516][02650] Updated weights for policy 0, policy_version 430 (0.0022) +[2025-08-28 13:13:53,302][02490] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1773568. Throughput: 0: 942.8. Samples: 443238. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:13:53,307][02490] Avg episode reward: [(0, '4.874')] +[2025-08-28 13:13:58,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.9, 300 sec: 3790.5). Total num frames: 1794048. Throughput: 0: 932.0. Samples: 449136. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:13:58,304][02490] Avg episode reward: [(0, '4.544')] +[2025-08-28 13:14:00,513][02650] Updated weights for policy 0, policy_version 440 (0.0019) +[2025-08-28 13:14:03,302][02490] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1810432. Throughput: 0: 929.8. Samples: 451060. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:14:03,304][02490] Avg episode reward: [(0, '4.387')] +[2025-08-28 13:14:08,305][02490] Fps is (10 sec: 3685.2, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 1830912. Throughput: 0: 935.2. Samples: 457150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:14:08,307][02490] Avg episode reward: [(0, '4.418')] +[2025-08-28 13:14:10,461][02650] Updated weights for policy 0, policy_version 450 (0.0013) +[2025-08-28 13:14:13,302][02490] Fps is (10 sec: 4095.9, 60 sec: 3754.9, 300 sec: 3790.5). Total num frames: 1851392. Throughput: 0: 929.1. Samples: 463146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:14:13,307][02490] Avg episode reward: [(0, '4.764')] +[2025-08-28 13:14:18,302][02490] Fps is (10 sec: 3687.6, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1867776. Throughput: 0: 930.6. Samples: 465084. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:14:18,306][02490] Avg episode reward: [(0, '4.817')] +[2025-08-28 13:14:21,830][02650] Updated weights for policy 0, policy_version 460 (0.0017) +[2025-08-28 13:14:23,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1888256. Throughput: 0: 938.6. Samples: 471460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:14:23,304][02490] Avg episode reward: [(0, '4.656')] +[2025-08-28 13:14:28,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3776.7). Total num frames: 1904640. Throughput: 0: 931.3. Samples: 477132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:14:28,303][02490] Avg episode reward: [(0, '4.558')] +[2025-08-28 13:14:33,216][02650] Updated weights for policy 0, policy_version 470 (0.0025) +[2025-08-28 13:14:33,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1925120. Throughput: 0: 931.1. Samples: 479074. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:14:33,304][02490] Avg episode reward: [(0, '4.815')] +[2025-08-28 13:14:38,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1945600. Throughput: 0: 940.4. Samples: 485556. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:14:38,304][02490] Avg episode reward: [(0, '4.966')] +[2025-08-28 13:14:43,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1961984. Throughput: 0: 931.0. Samples: 491030. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:14:43,303][02490] Avg episode reward: [(0, '4.697')] +[2025-08-28 13:14:44,122][02650] Updated weights for policy 0, policy_version 480 (0.0016) +[2025-08-28 13:14:48,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1978368. Throughput: 0: 935.8. Samples: 493170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:14:48,308][02490] Avg episode reward: [(0, '4.447')] +[2025-08-28 13:14:53,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2002944. Throughput: 0: 945.9. Samples: 499712. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:14:53,307][02490] Avg episode reward: [(0, '4.629')] +[2025-08-28 13:14:54,076][02650] Updated weights for policy 0, policy_version 490 (0.0017) +[2025-08-28 13:14:58,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2019328. Throughput: 0: 934.2. Samples: 505184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:14:58,306][02490] Avg episode reward: [(0, '4.830')] +[2025-08-28 13:15:03,302][02490] Fps is (10 sec: 3276.7, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 2035712. Throughput: 0: 940.1. Samples: 507390. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:15:03,304][02490] Avg episode reward: [(0, '4.649')] +[2025-08-28 13:15:05,427][02650] Updated weights for policy 0, policy_version 500 (0.0018) +[2025-08-28 13:15:08,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 2060288. Throughput: 0: 942.7. Samples: 513882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:15:08,303][02490] Avg episode reward: [(0, '4.773')] +[2025-08-28 13:15:13,302][02490] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2072576. Throughput: 0: 933.8. Samples: 519154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:15:13,309][02490] Avg episode reward: [(0, '4.849')] +[2025-08-28 13:15:16,798][02650] Updated weights for policy 0, policy_version 510 (0.0013) +[2025-08-28 13:15:18,302][02490] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3776.7). Total num frames: 2093056. Throughput: 0: 945.1. Samples: 521604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:15:18,304][02490] Avg episode reward: [(0, '4.584')] +[2025-08-28 13:15:23,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2113536. Throughput: 0: 944.9. Samples: 528076. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:15:23,303][02490] Avg episode reward: [(0, '4.449')] +[2025-08-28 13:15:27,291][02650] Updated weights for policy 0, policy_version 520 (0.0015) +[2025-08-28 13:15:28,302][02490] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2129920. Throughput: 0: 938.4. Samples: 533258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:15:28,304][02490] Avg episode reward: [(0, '4.608')] +[2025-08-28 13:15:28,313][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth... +[2025-08-28 13:15:28,465][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_1228800.pth +[2025-08-28 13:15:33,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2150400. Throughput: 0: 945.2. Samples: 535702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:15:33,313][02490] Avg episode reward: [(0, '4.776')] +[2025-08-28 13:15:37,536][02650] Updated weights for policy 0, policy_version 530 (0.0012) +[2025-08-28 13:15:38,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 2170880. Throughput: 0: 944.0. Samples: 542194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:15:38,311][02490] Avg episode reward: [(0, '4.976')] +[2025-08-28 13:15:43,303][02490] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 2187264. Throughput: 0: 934.1. Samples: 547218. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:15:43,304][02490] Avg episode reward: [(0, '4.666')] +[2025-08-28 13:15:48,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2207744. Throughput: 0: 946.2. Samples: 549968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:15:48,303][02490] Avg episode reward: [(0, '4.594')] +[2025-08-28 13:15:48,881][02650] Updated weights for policy 0, policy_version 540 (0.0015) +[2025-08-28 13:15:53,302][02490] Fps is (10 sec: 4096.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2228224. Throughput: 0: 948.0. Samples: 556542. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:15:53,303][02490] Avg episode reward: [(0, '4.696')] +[2025-08-28 13:15:58,302][02490] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2244608. Throughput: 0: 940.7. Samples: 561486. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:15:58,306][02490] Avg episode reward: [(0, '4.712')] +[2025-08-28 13:16:00,184][02650] Updated weights for policy 0, policy_version 550 (0.0013) +[2025-08-28 13:16:03,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2265088. Throughput: 0: 947.6. Samples: 564246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:16:03,304][02490] Avg episode reward: [(0, '4.710')] +[2025-08-28 13:16:08,302][02490] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2285568. Throughput: 0: 948.2. Samples: 570744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:16:08,307][02490] Avg episode reward: [(0, '4.845')] +[2025-08-28 13:16:10,448][02650] Updated weights for policy 0, policy_version 560 (0.0012) +[2025-08-28 13:16:13,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2297856. Throughput: 0: 936.4. Samples: 575396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:16:13,303][02490] Avg episode reward: [(0, '4.809')] +[2025-08-28 13:16:18,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2322432. Throughput: 0: 948.8. Samples: 578398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:16:18,308][02490] Avg episode reward: [(0, '4.771')] +[2025-08-28 13:16:21,070][02650] Updated weights for policy 0, policy_version 570 (0.0018) +[2025-08-28 13:16:23,302][02490] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2342912. Throughput: 0: 947.1. Samples: 584814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:16:23,304][02490] Avg episode reward: [(0, '4.548')] +[2025-08-28 13:16:28,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.9). Total num frames: 2355200. Throughput: 0: 936.8. Samples: 589372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:16:28,307][02490] Avg episode reward: [(0, '4.603')] +[2025-08-28 13:16:32,506][02650] Updated weights for policy 0, policy_version 580 (0.0019) +[2025-08-28 13:16:33,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2375680. Throughput: 0: 947.3. Samples: 592598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:16:33,304][02490] Avg episode reward: [(0, '4.616')] +[2025-08-28 13:16:38,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2396160. Throughput: 0: 941.2. Samples: 598896. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:16:38,306][02490] Avg episode reward: [(0, '4.539')] +[2025-08-28 13:16:43,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2412544. Throughput: 0: 932.4. Samples: 603442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:16:43,307][02490] Avg episode reward: [(0, '4.483')] +[2025-08-28 13:16:43,880][02650] Updated weights for policy 0, policy_version 590 (0.0016) +[2025-08-28 13:16:48,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2433024. Throughput: 0: 943.3. Samples: 606694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:16:48,307][02490] Avg episode reward: [(0, '4.480')] +[2025-08-28 13:16:53,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2453504. Throughput: 0: 942.7. Samples: 613166. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:16:53,309][02490] Avg episode reward: [(0, '4.496')] +[2025-08-28 13:16:53,804][02650] Updated weights for policy 0, policy_version 600 (0.0011) +[2025-08-28 13:16:58,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2469888. Throughput: 0: 940.2. Samples: 617704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:16:58,308][02490] Avg episode reward: [(0, '4.606')] +[2025-08-28 13:17:03,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2490368. Throughput: 0: 946.3. Samples: 620982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:17:03,307][02490] Avg episode reward: [(0, '4.796')] +[2025-08-28 13:17:04,835][02650] Updated weights for policy 0, policy_version 610 (0.0023) +[2025-08-28 13:17:08,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2510848. Throughput: 0: 945.6. Samples: 627366. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-08-28 13:17:08,304][02490] Avg episode reward: [(0, '4.715')] +[2025-08-28 13:17:13,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2527232. Throughput: 0: 944.8. Samples: 631886. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:17:13,303][02490] Avg episode reward: [(0, '4.748')] +[2025-08-28 13:17:15,919][02650] Updated weights for policy 0, policy_version 620 (0.0022) +[2025-08-28 13:17:18,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2547712. Throughput: 0: 944.8. Samples: 635114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:17:18,303][02490] Avg episode reward: [(0, '4.840')] +[2025-08-28 13:17:23,304][02490] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3762.8). Total num frames: 2568192. Throughput: 0: 949.6. Samples: 641632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:17:23,306][02490] Avg episode reward: [(0, '4.882')] +[2025-08-28 13:17:27,395][02650] Updated weights for policy 0, policy_version 630 (0.0023) +[2025-08-28 13:17:28,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2580480. Throughput: 0: 948.2. Samples: 646112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:17:28,306][02490] Avg episode reward: [(0, '4.952')] +[2025-08-28 13:17:28,402][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000631_2584576.pth... 
+[2025-08-28 13:17:28,513][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000411_1683456.pth +[2025-08-28 13:17:33,302][02490] Fps is (10 sec: 3277.5, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2600960. Throughput: 0: 945.6. Samples: 649246. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:17:33,307][02490] Avg episode reward: [(0, '4.915')] +[2025-08-28 13:17:37,459][02650] Updated weights for policy 0, policy_version 640 (0.0016) +[2025-08-28 13:17:38,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2621440. Throughput: 0: 941.7. Samples: 655544. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:17:38,303][02490] Avg episode reward: [(0, '4.937')] +[2025-08-28 13:17:43,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2637824. Throughput: 0: 944.0. Samples: 660182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:17:43,308][02490] Avg episode reward: [(0, '5.027')] +[2025-08-28 13:17:48,265][02650] Updated weights for policy 0, policy_version 650 (0.0015) +[2025-08-28 13:17:48,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2662400. Throughput: 0: 943.1. Samples: 663422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:17:48,307][02490] Avg episode reward: [(0, '4.808')] +[2025-08-28 13:17:53,303][02490] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 2678784. Throughput: 0: 941.0. Samples: 669712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:17:53,305][02490] Avg episode reward: [(0, '4.740')] +[2025-08-28 13:17:58,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2695168. Throughput: 0: 945.7. Samples: 674442. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:17:58,307][02490] Avg episode reward: [(0, '4.757')] +[2025-08-28 13:17:59,599][02650] Updated weights for policy 0, policy_version 660 (0.0023) +[2025-08-28 13:18:03,302][02490] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2715648. Throughput: 0: 946.4. Samples: 677702. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:18:03,307][02490] Avg episode reward: [(0, '4.831')] +[2025-08-28 13:18:08,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2736128. Throughput: 0: 932.5. Samples: 683592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:18:08,303][02490] Avg episode reward: [(0, '4.930')] +[2025-08-28 13:18:10,824][02650] Updated weights for policy 0, policy_version 670 (0.0041) +[2025-08-28 13:18:13,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2752512. Throughput: 0: 942.7. Samples: 688532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:18:13,306][02490] Avg episode reward: [(0, '4.743')] +[2025-08-28 13:18:18,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2772992. Throughput: 0: 943.6. Samples: 691706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:18:18,303][02490] Avg episode reward: [(0, '4.681')] +[2025-08-28 13:18:20,554][02650] Updated weights for policy 0, policy_version 680 (0.0023) +[2025-08-28 13:18:23,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3748.9). Total num frames: 2789376. Throughput: 0: 937.2. Samples: 697716. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:18:23,308][02490] Avg episode reward: [(0, '4.809')] +[2025-08-28 13:18:28,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2809856. Throughput: 0: 947.3. Samples: 702812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:18:28,303][02490] Avg episode reward: [(0, '4.815')] +[2025-08-28 13:18:31,545][02650] Updated weights for policy 0, policy_version 690 (0.0013) +[2025-08-28 13:18:33,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 2830336. Throughput: 0: 948.7. Samples: 706112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:18:33,306][02490] Avg episode reward: [(0, '4.893')] +[2025-08-28 13:18:38,303][02490] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 2846720. Throughput: 0: 933.5. Samples: 711718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:18:38,305][02490] Avg episode reward: [(0, '4.825')] +[2025-08-28 13:18:42,985][02650] Updated weights for policy 0, policy_version 700 (0.0022) +[2025-08-28 13:18:43,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2867200. Throughput: 0: 944.9. Samples: 716964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:18:43,304][02490] Avg episode reward: [(0, '4.630')] +[2025-08-28 13:18:48,302][02490] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2887680. Throughput: 0: 945.3. Samples: 720240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:18:48,307][02490] Avg episode reward: [(0, '4.601')] +[2025-08-28 13:18:53,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2904064. Throughput: 0: 939.2. Samples: 725858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:18:53,306][02490] Avg episode reward: [(0, '4.685')] +[2025-08-28 13:18:54,411][02650] Updated weights for policy 0, policy_version 710 (0.0015) +[2025-08-28 13:18:58,305][02490] Fps is (10 sec: 3685.4, 60 sec: 3822.8, 300 sec: 3776.6). Total num frames: 2924544. Throughput: 0: 949.1. Samples: 731246. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:18:58,306][02490] Avg episode reward: [(0, '4.697')] +[2025-08-28 13:19:03,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2940928. Throughput: 0: 949.9. Samples: 734450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:19:03,303][02490] Avg episode reward: [(0, '4.544')] +[2025-08-28 13:19:07,234][02650] Updated weights for policy 0, policy_version 720 (0.0018) +[2025-08-28 13:19:08,302][02490] Fps is (10 sec: 2458.2, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2949120. Throughput: 0: 883.0. Samples: 737452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:19:08,306][02490] Avg episode reward: [(0, '4.423')] +[2025-08-28 13:19:13,302][02490] Fps is (10 sec: 2457.6, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2965504. Throughput: 0: 868.7. Samples: 741904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:19:13,304][02490] Avg episode reward: [(0, '4.292')] +[2025-08-28 13:19:18,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 2985984. Throughput: 0: 864.4. Samples: 745008. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:19:18,303][02490] Avg episode reward: [(0, '4.656')] +[2025-08-28 13:19:18,436][02650] Updated weights for policy 0, policy_version 730 (0.0017) +[2025-08-28 13:19:23,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3002368. Throughput: 0: 857.7. Samples: 750312. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-28 13:19:23,305][02490] Avg episode reward: [(0, '4.805')] +[2025-08-28 13:19:28,302][02490] Fps is (10 sec: 3686.3, 60 sec: 3549.8, 300 sec: 3721.1). Total num frames: 3022848. Throughput: 0: 867.3. Samples: 755994. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-28 13:19:28,306][02490] Avg episode reward: [(0, '4.855')] +[2025-08-28 13:19:28,313][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth... +[2025-08-28 13:19:28,422][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth +[2025-08-28 13:19:29,966][02650] Updated weights for policy 0, policy_version 740 (0.0019) +[2025-08-28 13:19:33,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3043328. Throughput: 0: 863.9. Samples: 759116. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:19:33,304][02490] Avg episode reward: [(0, '4.653')] +[2025-08-28 13:19:38,302][02490] Fps is (10 sec: 3276.9, 60 sec: 3481.7, 300 sec: 3707.2). Total num frames: 3055616. Throughput: 0: 850.6. Samples: 764136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:19:38,308][02490] Avg episode reward: [(0, '4.818')] +[2025-08-28 13:19:41,446][02650] Updated weights for policy 0, policy_version 750 (0.0027) +[2025-08-28 13:19:43,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3735.0). Total num frames: 3080192. Throughput: 0: 860.6. Samples: 769972. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:19:43,303][02490] Avg episode reward: [(0, '4.887')] +[2025-08-28 13:19:48,302][02490] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3721.1). Total num frames: 3100672. Throughput: 0: 859.6. Samples: 773132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:19:48,303][02490] Avg episode reward: [(0, '4.957')] +[2025-08-28 13:19:52,632][02650] Updated weights for policy 0, policy_version 760 (0.0012) +[2025-08-28 13:19:53,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3707.2). Total num frames: 3112960. Throughput: 0: 905.6. Samples: 778202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:19:53,303][02490] Avg episode reward: [(0, '4.993')] +[2025-08-28 13:19:58,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3481.7, 300 sec: 3721.1). Total num frames: 3133440. Throughput: 0: 938.8. Samples: 784150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:19:58,307][02490] Avg episode reward: [(0, '4.757')] +[2025-08-28 13:20:01,933][02650] Updated weights for policy 0, policy_version 770 (0.0018) +[2025-08-28 13:20:03,304][02490] Fps is (10 sec: 4504.7, 60 sec: 3618.0, 300 sec: 3721.1). Total num frames: 3158016. Throughput: 0: 943.2. Samples: 787452. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:20:03,305][02490] Avg episode reward: [(0, '4.579')] +[2025-08-28 13:20:08,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3721.1). Total num frames: 3170304. Throughput: 0: 931.8. Samples: 792244. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:20:08,307][02490] Avg episode reward: [(0, '4.578')] +[2025-08-28 13:20:13,302][02490] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3190784. Throughput: 0: 925.2. Samples: 797628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:20:13,307][02490] Avg episode reward: [(0, '4.750')] +[2025-08-28 13:20:14,157][02650] Updated weights for policy 0, policy_version 780 (0.0027) +[2025-08-28 13:20:18,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3211264. Throughput: 0: 928.8. Samples: 800912. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-28 13:20:18,303][02490] Avg episode reward: [(0, '4.661')] +[2025-08-28 13:20:23,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3227648. Throughput: 0: 923.9. Samples: 805710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-28 13:20:23,307][02490] Avg episode reward: [(0, '4.566')] +[2025-08-28 13:20:25,303][02650] Updated weights for policy 0, policy_version 790 (0.0020) +[2025-08-28 13:20:28,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3244032. Throughput: 0: 931.0. Samples: 811866. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:20:28,307][02490] Avg episode reward: [(0, '4.675')] +[2025-08-28 13:20:33,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3268608. Throughput: 0: 933.4. Samples: 815134. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-28 13:20:33,304][02490] Avg episode reward: [(0, '4.848')] +[2025-08-28 13:20:36,358][02650] Updated weights for policy 0, policy_version 800 (0.0020) +[2025-08-28 13:20:38,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3280896. Throughput: 0: 923.1. Samples: 819742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:20:38,305][02490] Avg episode reward: [(0, '4.984')] +[2025-08-28 13:20:43,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3301376. Throughput: 0: 929.9. Samples: 825996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:20:43,304][02490] Avg episode reward: [(0, '4.804')] +[2025-08-28 13:20:46,220][02650] Updated weights for policy 0, policy_version 810 (0.0019) +[2025-08-28 13:20:48,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3321856. Throughput: 0: 930.7. Samples: 829332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:20:48,303][02490] Avg episode reward: [(0, '4.628')] +[2025-08-28 13:20:53,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3338240. Throughput: 0: 926.6. Samples: 833942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:20:53,304][02490] Avg episode reward: [(0, '4.727')] +[2025-08-28 13:20:57,601][02650] Updated weights for policy 0, policy_version 820 (0.0034) +[2025-08-28 13:20:58,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3358720. Throughput: 0: 949.1. Samples: 840336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:20:58,304][02490] Avg episode reward: [(0, '4.667')] +[2025-08-28 13:21:03,303][02490] Fps is (10 sec: 4095.5, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3379200. Throughput: 0: 949.5. Samples: 843642. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:21:03,307][02490] Avg episode reward: [(0, '4.625')] +[2025-08-28 13:21:08,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3395584. Throughput: 0: 941.3. Samples: 848070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:21:08,306][02490] Avg episode reward: [(0, '4.637')] +[2025-08-28 13:21:08,788][02650] Updated weights for policy 0, policy_version 830 (0.0016) +[2025-08-28 13:21:13,302][02490] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3416064. Throughput: 0: 948.9. Samples: 854566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:21:13,309][02490] Avg episode reward: [(0, '4.697')] +[2025-08-28 13:21:18,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3436544. Throughput: 0: 947.7. Samples: 857782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:21:18,303][02490] Avg episode reward: [(0, '4.688')] +[2025-08-28 13:21:19,443][02650] Updated weights for policy 0, policy_version 840 (0.0020) +[2025-08-28 13:21:23,303][02490] Fps is (10 sec: 3685.9, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 3452928. Throughput: 0: 946.6. Samples: 862340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:21:23,305][02490] Avg episode reward: [(0, '4.915')] +[2025-08-28 13:21:28,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3473408. Throughput: 0: 953.6. Samples: 868908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-28 13:21:28,304][02490] Avg episode reward: [(0, '4.929')] +[2025-08-28 13:21:28,315][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth... +[2025-08-28 13:21:28,416][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000631_2584576.pth +[2025-08-28 13:21:29,655][02650] Updated weights for policy 0, policy_version 850 (0.0017) +[2025-08-28 13:21:33,302][02490] Fps is (10 sec: 3686.9, 60 sec: 3686.4, 300 sec: 3707.2). Total num frames: 3489792. Throughput: 0: 950.1. Samples: 872088. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:21:33,303][02490] Avg episode reward: [(0, '4.668')] +[2025-08-28 13:21:38,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3506176. Throughput: 0: 944.8. Samples: 876460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:21:38,304][02490] Avg episode reward: [(0, '4.655')] +[2025-08-28 13:21:41,101][02650] Updated weights for policy 0, policy_version 860 (0.0015) +[2025-08-28 13:21:43,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3530752. Throughput: 0: 947.2. Samples: 882960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-28 13:21:43,303][02490] Avg episode reward: [(0, '4.584')] +[2025-08-28 13:21:48,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3547136. Throughput: 0: 945.8. Samples: 886200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:21:48,306][02490] Avg episode reward: [(0, '4.533')] +[2025-08-28 13:21:52,276][02650] Updated weights for policy 0, policy_version 870 (0.0012) +[2025-08-28 13:21:53,302][02490] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3567616. Throughput: 0: 946.6. Samples: 890668. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:21:53,304][02490] Avg episode reward: [(0, '4.561')] +[2025-08-28 13:21:58,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3588096. Throughput: 0: 947.5. Samples: 897202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:21:58,304][02490] Avg episode reward: [(0, '4.475')] +[2025-08-28 13:22:02,441][02650] Updated weights for policy 0, policy_version 880 (0.0024) +[2025-08-28 13:22:03,302][02490] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3604480. Throughput: 0: 947.2. Samples: 900408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:22:03,303][02490] Avg episode reward: [(0, '4.520')] +[2025-08-28 13:22:08,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3620864. Throughput: 0: 944.9. Samples: 904860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:22:08,304][02490] Avg episode reward: [(0, '4.516')] +[2025-08-28 13:22:13,093][02650] Updated weights for policy 0, policy_version 890 (0.0021) +[2025-08-28 13:22:13,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3645440. Throughput: 0: 943.8. Samples: 911378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:22:13,303][02490] Avg episode reward: [(0, '4.591')] +[2025-08-28 13:22:18,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3707.3). Total num frames: 3661824. Throughput: 0: 943.1. Samples: 914528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:22:18,304][02490] Avg episode reward: [(0, '4.593')] +[2025-08-28 13:22:23,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3678208. Throughput: 0: 949.6. Samples: 919192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:22:23,305][02490] Avg episode reward: [(0, '4.498')] +[2025-08-28 13:22:24,424][02650] Updated weights for policy 0, policy_version 900 (0.0018) +[2025-08-28 13:22:28,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 3702784. Throughput: 0: 948.4. Samples: 925640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:22:28,307][02490] Avg episode reward: [(0, '4.482')] +[2025-08-28 13:22:33,302][02490] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3719168. Throughput: 0: 944.4. Samples: 928698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:22:33,304][02490] Avg episode reward: [(0, '4.350')] +[2025-08-28 13:22:35,901][02650] Updated weights for policy 0, policy_version 910 (0.0024) +[2025-08-28 13:22:38,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3735552. Throughput: 0: 948.5. Samples: 933352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:22:38,308][02490] Avg episode reward: [(0, '4.726')] +[2025-08-28 13:22:43,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3756032. Throughput: 0: 948.3. Samples: 939874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:22:43,305][02490] Avg episode reward: [(0, '4.826')] +[2025-08-28 13:22:45,464][02650] Updated weights for policy 0, policy_version 920 (0.0025) +[2025-08-28 13:22:48,303][02490] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3707.2). Total num frames: 3772416. Throughput: 0: 943.0. Samples: 942842. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:22:48,305][02490] Avg episode reward: [(0, '4.646')] +[2025-08-28 13:22:53,302][02490] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3792896. Throughput: 0: 951.5. Samples: 947676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:22:53,304][02490] Avg episode reward: [(0, '4.406')] +[2025-08-28 13:22:56,531][02650] Updated weights for policy 0, policy_version 930 (0.0014) +[2025-08-28 13:22:58,302][02490] Fps is (10 sec: 4506.0, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 3817472. Throughput: 0: 952.1. Samples: 954222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:22:58,306][02490] Avg episode reward: [(0, '4.300')] +[2025-08-28 13:23:03,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3707.2). Total num frames: 3829760. Throughput: 0: 945.3. Samples: 957068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:23:03,304][02490] Avg episode reward: [(0, '4.559')] +[2025-08-28 13:23:07,632][02650] Updated weights for policy 0, policy_version 940 (0.0012) +[2025-08-28 13:23:08,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3850240. Throughput: 0: 950.3. Samples: 961954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:23:08,306][02490] Avg episode reward: [(0, '4.717')] +[2025-08-28 13:23:13,302][02490] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3735.0). Total num frames: 3874816. Throughput: 0: 951.8. Samples: 968472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:23:13,310][02490] Avg episode reward: [(0, '4.537')] +[2025-08-28 13:23:18,303][02490] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 3887104. Throughput: 0: 944.0. Samples: 971180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-28 13:23:18,305][02490] Avg episode reward: [(0, '4.424')] +[2025-08-28 13:23:18,722][02650] Updated weights for policy 0, policy_version 950 (0.0018) +[2025-08-28 13:23:23,302][02490] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3907584. Throughput: 0: 954.7. Samples: 976312. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:23:23,304][02490] Avg episode reward: [(0, '4.502')] +[2025-08-28 13:23:28,303][02490] Fps is (10 sec: 4096.0, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 3928064. Throughput: 0: 953.5. Samples: 982784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:23:28,304][02490] Avg episode reward: [(0, '4.788')] +[2025-08-28 13:23:28,371][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000960_3932160.pth... +[2025-08-28 13:23:28,368][02650] Updated weights for policy 0, policy_version 960 (0.0021) +[2025-08-28 13:23:28,495][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth +[2025-08-28 13:23:33,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.1). Total num frames: 3944448. Throughput: 0: 942.6. Samples: 985260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:23:33,310][02490] Avg episode reward: [(0, '4.837')] +[2025-08-28 13:23:38,302][02490] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3964928. Throughput: 0: 949.9. Samples: 990420. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-28 13:23:38,304][02490] Avg episode reward: [(0, '4.907')] +[2025-08-28 13:23:39,704][02650] Updated weights for policy 0, policy_version 970 (0.0023) +[2025-08-28 13:23:43,302][02490] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 3985408. Throughput: 0: 948.0. Samples: 996882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-28 13:23:43,304][02490] Avg episode reward: [(0, '4.776')] +[2025-08-28 13:23:48,302][02490] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3721.1). Total num frames: 4001792. Throughput: 0: 939.1. Samples: 999328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-28 13:23:48,304][02490] Avg episode reward: [(0, '4.752')] +[2025-08-28 13:23:49,061][02637] Stopping Batcher_0... +[2025-08-28 13:23:49,062][02637] Loop batcher_evt_loop terminating... +[2025-08-28 13:23:49,063][02490] Component Batcher_0 stopped! +[2025-08-28 13:23:49,068][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:23:49,118][02650] Weights refcount: 2 0 +[2025-08-28 13:23:49,122][02650] Stopping InferenceWorker_p0-w0... +[2025-08-28 13:23:49,122][02650] Loop inference_proc0-0_evt_loop terminating... +[2025-08-28 13:23:49,123][02490] Component InferenceWorker_p0-w0 stopped! +[2025-08-28 13:23:49,180][02637] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000848_3473408.pth +[2025-08-28 13:23:49,188][02637] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:23:49,337][02490] Component LearnerWorker_p0 stopped! +[2025-08-28 13:23:49,344][02637] Stopping LearnerWorker_p0... +[2025-08-28 13:23:49,346][02637] Loop learner_proc0_evt_loop terminating... +[2025-08-28 13:23:49,411][02653] Stopping RolloutWorker_w2... +[2025-08-28 13:23:49,413][02657] Stopping RolloutWorker_w6... +[2025-08-28 13:23:49,414][02657] Loop rollout_proc6_evt_loop terminating... +[2025-08-28 13:23:49,411][02490] Component RolloutWorker_w2 stopped! +[2025-08-28 13:23:49,412][02653] Loop rollout_proc2_evt_loop terminating... +[2025-08-28 13:23:49,418][02654] Stopping RolloutWorker_w4... +[2025-08-28 13:23:49,416][02490] Component RolloutWorker_w6 stopped! +[2025-08-28 13:23:49,420][02490] Component RolloutWorker_w4 stopped! +[2025-08-28 13:23:49,419][02654] Loop rollout_proc4_evt_loop terminating... +[2025-08-28 13:23:49,423][02490] Component RolloutWorker_w0 stopped! +[2025-08-28 13:23:49,425][02651] Stopping RolloutWorker_w0... +[2025-08-28 13:23:49,426][02651] Loop rollout_proc0_evt_loop terminating... +[2025-08-28 13:23:49,515][02652] Stopping RolloutWorker_w1... +[2025-08-28 13:23:49,515][02490] Component RolloutWorker_w1 stopped! +[2025-08-28 13:23:49,516][02652] Loop rollout_proc1_evt_loop terminating... +[2025-08-28 13:23:49,554][02656] Stopping RolloutWorker_w5... +[2025-08-28 13:23:49,555][02490] Component RolloutWorker_w5 stopped! +[2025-08-28 13:23:49,556][02656] Loop rollout_proc5_evt_loop terminating... +[2025-08-28 13:23:49,580][02490] Component RolloutWorker_w7 stopped! +[2025-08-28 13:23:49,586][02655] Stopping RolloutWorker_w3... +[2025-08-28 13:23:49,586][02490] Component RolloutWorker_w3 stopped! +[2025-08-28 13:23:49,588][02490] Waiting for process learner_proc0 to stop... +[2025-08-28 13:23:49,580][02658] Stopping RolloutWorker_w7... +[2025-08-28 13:23:49,587][02655] Loop rollout_proc3_evt_loop terminating... 
+[2025-08-28 13:23:49,592][02658] Loop rollout_proc7_evt_loop terminating... +[2025-08-28 13:23:51,189][02490] Waiting for process inference_proc0-0 to join... +[2025-08-28 13:23:51,192][02490] Waiting for process rollout_proc0 to join... +[2025-08-28 13:23:53,085][02490] Waiting for process rollout_proc1 to join... +[2025-08-28 13:23:53,171][02490] Waiting for process rollout_proc2 to join... +[2025-08-28 13:23:53,172][02490] Waiting for process rollout_proc3 to join... +[2025-08-28 13:23:53,174][02490] Waiting for process rollout_proc4 to join... +[2025-08-28 13:23:53,176][02490] Waiting for process rollout_proc5 to join... +[2025-08-28 13:23:53,178][02490] Waiting for process rollout_proc6 to join... +[2025-08-28 13:23:53,180][02490] Waiting for process rollout_proc7 to join... +[2025-08-28 13:23:53,181][02490] Batcher 0 profile tree view: +batching: 25.5912, releasing_batches: 0.0295 +[2025-08-28 13:23:53,182][02490] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0000 + wait_policy_total: 471.2061 +update_model: 8.2811 + weight_update: 0.0028 +one_step: 0.0059 + handle_policy_step: 557.6784 + deserialize: 14.6219, stack: 3.1029, obs_to_device_normalize: 116.9911, forward: 287.9010, send_messages: 28.6439 + prepare_outputs: 81.9475 + to_cpu: 49.6363 +[2025-08-28 13:23:53,183][02490] Learner 0 profile tree view: +misc: 0.0050, prepare_batch: 12.6129 +train: 73.8619 + epoch_init: 0.0044, minibatch_init: 0.0060, losses_postprocess: 0.6863, kl_divergence: 0.7297, after_optimizer: 33.8509 + calculate_losses: 25.9364 + losses_init: 0.0031, forward_head: 1.3726, bptt_initial: 17.1762, tail: 1.0231, advantages_returns: 0.2714, losses: 3.7842 + bptt: 2.0116 + bptt_forward_core: 1.9187 + update: 12.0604 + clip: 1.0122 +[2025-08-28 13:23:53,184][02490] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.3202, enqueue_policy_requests: 123.6374, env_step: 835.9133, overhead: 14.5724, complete_rollouts: 7.4735 +save_policy_outputs: 19.3197 + split_output_tensors: 8.0285 +[2025-08-28 13:23:53,185][02490] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3022, enqueue_policy_requests: 132.9944, env_step: 825.9444, overhead: 14.5965, complete_rollouts: 6.7176 +save_policy_outputs: 19.0650 + split_output_tensors: 7.5477 +[2025-08-28 13:23:53,186][02490] Loop Runner_EvtLoop terminating... +[2025-08-28 13:23:53,187][02490] Runner profile tree view: +main_loop: 1100.2178 +[2025-08-28 13:23:53,188][02490] Collected {0: 4005888}, FPS: 3641.0 +[2025-08-28 13:23:53,498][02490] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-28 13:23:53,499][02490] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-28 13:23:53,500][02490] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-28 13:23:53,501][02490] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-28 13:23:53,502][02490] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:23:53,503][02490] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-28 13:23:53,504][02490] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:23:53,505][02490] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-28 13:23:53,506][02490] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
+[2025-08-28 13:23:53,507][02490] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-28 13:23:53,508][02490] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-28 13:23:53,509][02490] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-28 13:23:53,510][02490] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-28 13:23:53,511][02490] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-28 13:23:53,511][02490] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-28 13:23:53,539][02490] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-28 13:23:53,541][02490] RunningMeanStd input shape: (3, 72, 128) +[2025-08-28 13:23:53,543][02490] RunningMeanStd input shape: (1,) +[2025-08-28 13:23:53,553][02490] ConvEncoder: input_channels=3 +[2025-08-28 13:23:53,646][02490] Conv encoder output size: 512 +[2025-08-28 13:23:53,647][02490] Policy head output size: 512 +[2025-08-28 13:23:53,822][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:23:53,826][02490] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:23:53,831][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-08-28 13:23:53,833][02490] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:23:53,835][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:23:53,837][02490] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
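The three failed load attempts above all hit the same PyTorch 2.6 behavior described in the traceback: torch.load now defaults to weights_only=True, and this checkpoint pickles numpy.core.multiarray.scalar, which is not on the default allowlist. A minimal sketch of the allowlisting route the error message itself recommends is shown below; it assumes the checkpoint under /content/train_dir/default_experiment/checkpoint_p0/ is trusted and that the snippet runs in the same process before the evaluation script reaches sample_factory's load_checkpoint call.

    import numpy.core.multiarray
    import torch.serialization

    # Allowlist the exact global named in the WeightsUnpickler error so that
    # torch.load(..., weights_only=True) can unpickle this checkpoint.
    torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])

The other option the traceback lists, re-running torch.load with weights_only=False, would also load a trusted checkpoint, but it permits arbitrary code execution during unpickling, so allowlisting the single numpy global is the narrower fix.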
+[2025-08-28 13:24:52,691][02490] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-28 13:24:52,692][02490] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-28 13:24:52,693][02490] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-28 13:24:52,694][02490] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-28 13:24:52,695][02490] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:24:52,696][02490] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-28 13:24:52,697][02490] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:24:52,698][02490] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-28 13:24:52,699][02490] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-28 13:24:52,699][02490] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-28 13:24:52,700][02490] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-28 13:24:52,701][02490] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-28 13:24:52,702][02490] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-28 13:24:52,703][02490] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-28 13:24:52,704][02490] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-28 13:24:52,730][02490] RunningMeanStd input shape: (3, 72, 128) +[2025-08-28 13:24:52,732][02490] RunningMeanStd input shape: (1,) +[2025-08-28 13:24:52,740][02490] ConvEncoder: input_channels=3 +[2025-08-28 13:24:52,772][02490] Conv encoder output size: 512 +[2025-08-28 13:24:52,773][02490] Policy head output size: 512 +[2025-08-28 13:24:52,790][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:24:52,791][02490] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:24:52,793][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:24:52,794][02490] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:24:52,795][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:24:52,797][02490] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. 
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:25:30,517][02490] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-28 13:25:30,518][02490] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-28 13:25:30,519][02490] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-28 13:25:30,520][02490] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-28 13:25:30,521][02490] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:25:30,522][02490] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-28 13:25:30,523][02490] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-28 13:25:30,524][02490] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-28 13:25:30,524][02490] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-28 13:25:30,525][02490] Adding new argument 'hf_repository'='sanjaykushwah/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-28 13:25:30,527][02490] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-28 13:25:30,528][02490] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-28 13:25:30,528][02490] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-28 13:25:30,529][02490] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-28 13:25:30,530][02490] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-28 13:25:30,552][02490] RunningMeanStd input shape: (3, 72, 128) +[2025-08-28 13:25:30,553][02490] RunningMeanStd input shape: (1,) +[2025-08-28 13:25:30,562][02490] ConvEncoder: input_channels=3 +[2025-08-28 13:25:30,591][02490] Conv encoder output size: 512 +[2025-08-28 13:25:30,592][02490] Policy head output size: 512 +[2025-08-28 13:25:30,608][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:25:30,610][02490] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. 
Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:25:30,611][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:25:30,613][02490] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:25:30,614][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:25:30,615][02490] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. 
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:26:17,572][02490] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-28 13:26:17,573][02490] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-28 13:26:17,574][02490] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-28 13:26:17,575][02490] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-28 13:26:17,575][02490] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:26:17,576][02490] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-28 13:26:17,577][02490] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:26:17,578][02490] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-28 13:26:17,579][02490] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-28 13:26:17,580][02490] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-28 13:26:17,581][02490] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-28 13:26:17,581][02490] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-28 13:26:17,582][02490] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-28 13:26:17,583][02490] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-28 13:26:17,584][02490] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-28 13:26:17,609][02490] RunningMeanStd input shape: (3, 72, 128) +[2025-08-28 13:26:17,610][02490] RunningMeanStd input shape: (1,) +[2025-08-28 13:26:17,619][02490] ConvEncoder: input_channels=3 +[2025-08-28 13:26:17,649][02490] Conv encoder output size: 512 +[2025-08-28 13:26:17,650][02490] Policy head output size: 512 +[2025-08-28 13:26:17,667][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-08-28 13:26:17,669][02490] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:26:17,671][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:26:17,672][02490] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
+[2025-08-28 13:26:17,673][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-28 13:26:17,675][02490] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-28 13:36:34,470][02490] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-28 13:36:34,470][02490] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-28 13:36:34,472][02490] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-28 13:36:34,473][02490] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-28 13:36:34,474][02490] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:36:34,475][02490] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-28 13:36:34,476][02490] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-28 13:36:34,477][02490] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-28 13:36:34,478][02490] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-28 13:36:34,479][02490] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-28 13:36:34,480][02490] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-28 13:36:34,481][02490] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-28 13:36:34,482][02490] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-28 13:36:34,482][02490] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2025-08-28 13:36:34,483][02490] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-28 13:36:34,518][02490] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-28 13:36:34,520][02490] RunningMeanStd input shape: (1,)
+[2025-08-28 13:36:34,528][02490] ConvEncoder: input_channels=3
+[2025-08-28 13:36:34,560][02490] Conv encoder output size: 512
+[2025-08-28 13:36:34,560][02490] Policy head output size: 512
+[2025-08-28 13:36:34,579][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-28 13:36:34,581][02490] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+ File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+ checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-28 13:36:34,582][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-28 13:36:34,583][02490] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+ File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+ checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-28 13:36:34,585][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-28 13:36:34,586][02490] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+ File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+ checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+ File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-28 13:38:50,955][02490] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-28 13:38:50,956][02490] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-28 13:38:50,957][02490] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-28 13:38:50,958][02490] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-28 13:38:50,959][02490] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-28 13:38:50,959][02490] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-28 13:38:50,960][02490] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-28 13:38:50,961][02490] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-28 13:38:50,962][02490] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-28 13:38:50,963][02490] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-28 13:38:50,965][02490] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-28 13:38:50,966][02490] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-28 13:38:50,967][02490] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-28 13:38:50,968][02490] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-28 13:38:50,969][02490] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-28 13:38:50,997][02490] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-28 13:38:50,998][02490] RunningMeanStd input shape: (1,)
+[2025-08-28 13:38:51,007][02490] ConvEncoder: input_channels=3
+[2025-08-28 13:38:51,036][02490] Conv encoder output size: 512
+[2025-08-28 13:38:51,037][02490] Policy head output size: 512
+[2025-08-28 13:38:51,054][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-28 13:38:51,805][02490] Num frames 100...
+[2025-08-28 13:38:51,928][02490] Num frames 200...
+[2025-08-28 13:38:52,051][02490] Num frames 300...
+[2025-08-28 13:38:52,207][02490] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2025-08-28 13:38:52,209][02490] Avg episode reward: 3.840, avg true_objective: 3.840
+[2025-08-28 13:38:52,230][02490] Num frames 400...
+[2025-08-28 13:38:52,351][02490] Num frames 500...
+[2025-08-28 13:38:52,481][02490] Num frames 600...
+[2025-08-28 13:38:52,605][02490] Num frames 700...
+[2025-08-28 13:38:52,729][02490] Num frames 800...
+[2025-08-28 13:38:52,862][02490] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
+[2025-08-28 13:38:52,863][02490] Avg episode reward: 5.320, avg true_objective: 4.320
+[2025-08-28 13:38:52,911][02490] Num frames 900...
+[2025-08-28 13:38:53,034][02490] Num frames 1000...
+[2025-08-28 13:38:53,154][02490] Num frames 1100...
+[2025-08-28 13:38:53,282][02490] Num frames 1200...
+[2025-08-28 13:38:53,407][02490] Num frames 1300...
+[2025-08-28 13:38:53,542][02490] Num frames 1400...
+[2025-08-28 13:38:53,607][02490] Avg episode rewards: #0: 6.027, true rewards: #0: 4.693
+[2025-08-28 13:38:53,608][02490] Avg episode reward: 6.027, avg true_objective: 4.693
+[2025-08-28 13:38:53,724][02490] Num frames 1500...
+[2025-08-28 13:38:53,847][02490] Num frames 1600...
+[2025-08-28 13:38:53,971][02490] Num frames 1700...
+[2025-08-28 13:38:54,137][02490] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
+[2025-08-28 13:38:54,138][02490] Avg episode reward: 5.480, avg true_objective: 4.480
+[2025-08-28 13:38:54,154][02490] Num frames 1800...
+[2025-08-28 13:38:54,278][02490] Num frames 1900...
+[2025-08-28 13:38:54,465][02490] Num frames 2000...
+[2025-08-28 13:38:54,644][02490] Num frames 2100...
+[2025-08-28 13:38:54,829][02490] Avg episode rewards: #0: 5.152, true rewards: #0: 4.352
+[2025-08-28 13:38:54,831][02490] Avg episode reward: 5.152, avg true_objective: 4.352
+[2025-08-28 13:38:54,880][02490] Num frames 2200...
+[2025-08-28 13:38:55,055][02490] Num frames 2300...
+[2025-08-28 13:38:55,226][02490] Num frames 2400...
+[2025-08-28 13:38:55,395][02490] Num frames 2500...
+[2025-08-28 13:38:55,573][02490] Num frames 2600...
+[2025-08-28 13:38:55,674][02490] Avg episode rewards: #0: 5.207, true rewards: #0: 4.373
+[2025-08-28 13:38:55,675][02490] Avg episode reward: 5.207, avg true_objective: 4.373
+[2025-08-28 13:38:55,809][02490] Num frames 2700...
+[2025-08-28 13:38:55,987][02490] Num frames 2800...
+[2025-08-28 13:38:56,174][02490] Num frames 2900...
+[2025-08-28 13:38:56,359][02490] Num frames 3000...
+[2025-08-28 13:38:56,508][02490] Num frames 3100...
+[2025-08-28 13:38:56,642][02490] Num frames 3200...
+[2025-08-28 13:38:56,736][02490] Avg episode rewards: #0: 5.760, true rewards: #0: 4.617
+[2025-08-28 13:38:56,737][02490] Avg episode reward: 5.760, avg true_objective: 4.617
+[2025-08-28 13:38:56,822][02490] Num frames 3300...
+[2025-08-28 13:38:56,946][02490] Num frames 3400...
+[2025-08-28 13:38:57,071][02490] Num frames 3500...
+[2025-08-28 13:38:57,194][02490] Num frames 3600...
+[2025-08-28 13:38:57,269][02490] Avg episode rewards: #0: 5.520, true rewards: #0: 4.520
+[2025-08-28 13:38:57,270][02490] Avg episode reward: 5.520, avg true_objective: 4.520
+[2025-08-28 13:38:57,374][02490] Num frames 3700...
+[2025-08-28 13:38:57,496][02490] Num frames 3800...
+[2025-08-28 13:38:57,746][02490] Num frames 3900...
+[2025-08-28 13:38:57,895][02490] Num frames 4000...
+[2025-08-28 13:38:57,947][02490] Avg episode rewards: #0: 5.333, true rewards: #0: 4.444
+[2025-08-28 13:38:57,948][02490] Avg episode reward: 5.333, avg true_objective: 4.444
+[2025-08-28 13:38:58,074][02490] Num frames 4100...
+[2025-08-28 13:38:58,197][02490] Num frames 4200...
+[2025-08-28 13:38:58,324][02490] Avg episode rewards: #0: 5.056, true rewards: #0: 4.256
+[2025-08-28 13:38:58,325][02490] Avg episode reward: 5.056, avg true_objective: 4.256
+[2025-08-28 13:39:19,574][02490] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+[2025-08-28 13:39:19,787][02490] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-28 13:39:19,788][02490] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-28 13:39:19,790][02490] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-28 13:39:19,791][02490] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-28 13:39:19,792][02490] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-28 13:39:19,793][02490] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-28 13:39:19,794][02490] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-28 13:39:19,796][02490] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-28 13:39:19,797][02490] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-28 13:39:19,798][02490] Adding new argument 'hf_repository'='sanjaykushwah/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-28 13:39:19,798][02490] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-28 13:39:19,799][02490] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-28 13:39:19,800][02490] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-28 13:39:19,801][02490] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-28 13:39:19,802][02490] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-28 13:39:19,843][02490] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-28 13:39:19,845][02490] RunningMeanStd input shape: (1,)
+[2025-08-28 13:39:19,857][02490] ConvEncoder: input_channels=3
+[2025-08-28 13:39:19,914][02490] Conv encoder output size: 512
+[2025-08-28 13:39:19,918][02490] Policy head output size: 512
+[2025-08-28 13:39:19,942][02490] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-28 13:39:20,562][02490] Num frames 100...
+[2025-08-28 13:39:20,776][02490] Num frames 200...
+[2025-08-28 13:39:20,948][02490] Num frames 300...
+[2025-08-28 13:39:21,146][02490] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2025-08-28 13:39:21,147][02490] Avg episode reward: 3.840, avg true_objective: 3.840
+[2025-08-28 13:39:21,191][02490] Num frames 400...
+[2025-08-28 13:39:21,371][02490] Num frames 500...
+[2025-08-28 13:39:21,541][02490] Num frames 600...
+[2025-08-28 13:39:21,738][02490] Num frames 700...
+[2025-08-28 13:39:21,919][02490] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+[2025-08-28 13:39:21,920][02490] Avg episode reward: 3.840, avg true_objective: 3.840
+[2025-08-28 13:39:21,983][02490] Num frames 800...
+[2025-08-28 13:39:22,168][02490] Num frames 900...
+[2025-08-28 13:39:22,333][02490] Num frames 1000...
+[2025-08-28 13:39:22,458][02490] Num frames 1100...
+[2025-08-28 13:39:22,619][02490] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947
+[2025-08-28 13:39:22,620][02490] Avg episode reward: 4.280, avg true_objective: 3.947
+[2025-08-28 13:39:22,641][02490] Num frames 1200...
+[2025-08-28 13:39:22,766][02490] Num frames 1300...
+[2025-08-28 13:39:22,897][02490] Num frames 1400...
+[2025-08-28 13:39:23,020][02490] Num frames 1500...
+[2025-08-28 13:39:23,145][02490] Num frames 1600...
+[2025-08-28 13:39:23,240][02490] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080
+[2025-08-28 13:39:23,241][02490] Avg episode reward: 4.580, avg true_objective: 4.080
+[2025-08-28 13:39:23,327][02490] Num frames 1700...
+[2025-08-28 13:39:23,449][02490] Num frames 1800...
+[2025-08-28 13:39:23,572][02490] Num frames 1900...
+[2025-08-28 13:39:23,695][02490] Num frames 2000...
+[2025-08-28 13:39:23,769][02490] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032
+[2025-08-28 13:39:23,771][02490] Avg episode reward: 4.432, avg true_objective: 4.032
+[2025-08-28 13:39:23,890][02490] Num frames 2100...
+[2025-08-28 13:39:24,014][02490] Num frames 2200...
+[2025-08-28 13:39:24,138][02490] Num frames 2300...
+[2025-08-28 13:39:24,266][02490] Num frames 2400...
+[2025-08-28 13:39:24,318][02490] Avg episode rewards: #0: 4.333, true rewards: #0: 4.000
+[2025-08-28 13:39:24,319][02490] Avg episode reward: 4.333, avg true_objective: 4.000
+[2025-08-28 13:39:24,441][02490] Num frames 2500...
+[2025-08-28 13:39:24,563][02490] Num frames 2600...
+[2025-08-28 13:39:24,685][02490] Num frames 2700...
+[2025-08-28 13:39:24,822][02490] Avg episode rewards: #0: 4.239, true rewards: #0: 3.953
+[2025-08-28 13:39:24,823][02490] Avg episode reward: 4.239, avg true_objective: 3.953
+[2025-08-28 13:39:24,873][02490] Num frames 2800...
+[2025-08-28 13:39:24,994][02490] Num frames 2900...
+[2025-08-28 13:39:25,115][02490] Num frames 3000...
+[2025-08-28 13:39:25,240][02490] Num frames 3100...
+[2025-08-28 13:39:25,356][02490] Avg episode rewards: #0: 4.189, true rewards: #0: 3.939
+[2025-08-28 13:39:25,357][02490] Avg episode reward: 4.189, avg true_objective: 3.939
+[2025-08-28 13:39:25,416][02490] Num frames 3200...
+[2025-08-28 13:39:25,536][02490] Num frames 3300...
+[2025-08-28 13:39:25,656][02490] Num frames 3400...
+[2025-08-28 13:39:25,780][02490] Num frames 3500...
+[2025-08-28 13:39:25,965][02490] Avg episode rewards: #0: 4.332, true rewards: #0: 3.999
+[2025-08-28 13:39:25,966][02490] Avg episode reward: 4.332, avg true_objective: 3.999
+[2025-08-28 13:39:25,970][02490] Num frames 3600...
+[2025-08-28 13:39:26,095][02490] Num frames 3700...
+[2025-08-28 13:39:26,218][02490] Num frames 3800...
+[2025-08-28 13:39:26,337][02490] Num frames 3900...
+[2025-08-28 13:39:26,457][02490] Num frames 4000...
+[2025-08-28 13:39:26,530][02490] Avg episode rewards: #0: 4.515, true rewards: #0: 4.015
+[2025-08-28 13:39:26,531][02490] Avg episode reward: 4.515, avg true_objective: 4.015
+[2025-08-28 13:39:45,613][02490] Replay video saved to /content/train_dir/default_experiment/replay.mp4!