diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1204 @@ +[2025-08-01 17:28:11,657][02698] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-08-01 17:28:11,659][02698] Rollout worker 0 uses device cpu +[2025-08-01 17:28:11,660][02698] Rollout worker 1 uses device cpu +[2025-08-01 17:28:11,662][02698] Rollout worker 2 uses device cpu +[2025-08-01 17:28:11,663][02698] Rollout worker 3 uses device cpu +[2025-08-01 17:28:11,664][02698] Rollout worker 4 uses device cpu +[2025-08-01 17:28:11,665][02698] Rollout worker 5 uses device cpu +[2025-08-01 17:28:11,666][02698] Rollout worker 6 uses device cpu +[2025-08-01 17:28:11,667][02698] Rollout worker 7 uses device cpu +[2025-08-01 17:28:11,804][02698] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:11,804][02698] InferenceWorker_p0-w0: min num requests: 2 +[2025-08-01 17:28:11,833][02698] Starting all processes... +[2025-08-01 17:28:11,834][02698] Starting process learner_proc0 +[2025-08-01 17:28:11,887][02698] Starting all processes... +[2025-08-01 17:28:11,894][02698] Starting process inference_proc0-0 +[2025-08-01 17:28:11,895][02698] Starting process rollout_proc0 +[2025-08-01 17:28:11,895][02698] Starting process rollout_proc1 +[2025-08-01 17:28:11,895][02698] Starting process rollout_proc2 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc3 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc4 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc5 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc6 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc7 +[2025-08-01 17:28:27,931][02854] Worker 1 uses CPU cores [1] +[2025-08-01 17:28:28,115][02856] Worker 3 uses CPU cores [1] +[2025-08-01 17:28:28,182][02858] Worker 5 uses CPU cores [1] +[2025-08-01 17:28:28,321][02860] Worker 7 uses CPU cores [1] +[2025-08-01 17:28:28,520][02855] Worker 2 uses CPU cores [0] +[2025-08-01 17:28:28,519][02857] Worker 4 uses CPU cores [0] +[2025-08-01 17:28:28,572][02853] Worker 0 uses CPU cores [0] +[2025-08-01 17:28:28,739][02835] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:28,739][02835] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-01 17:28:28,741][02859] Worker 6 uses CPU cores [0] +[2025-08-01 17:28:28,755][02852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:28,756][02852] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-01 17:28:28,773][02835] Num visible devices: 1 +[2025-08-01 17:28:28,775][02835] Starting seed is not provided +[2025-08-01 17:28:28,775][02835] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:28,775][02835] Initializing actor-critic model on device cuda:0 +[2025-08-01 17:28:28,776][02835] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:28:28,780][02835] RunningMeanStd input shape: (1,) +[2025-08-01 17:28:28,791][02852] Num visible devices: 1 +[2025-08-01 17:28:28,799][02835] ConvEncoder: input_channels=3 +[2025-08-01 17:28:29,140][02835] Conv encoder output size: 512 +[2025-08-01 17:28:29,141][02835] Policy head output size: 512 +[2025-08-01 17:28:29,206][02835] Created Actor Critic model with architecture: +[2025-08-01 17:28:29,207][02835] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2025-08-01 17:28:29,535][02835] Using optimizer
+[2025-08-01 17:28:31,797][02698] Heartbeat connected on Batcher_0
+[2025-08-01 17:28:31,804][02698] Heartbeat connected on InferenceWorker_p0-w0
+[2025-08-01 17:28:31,810][02698] Heartbeat connected on RolloutWorker_w0
+[2025-08-01 17:28:31,817][02698] Heartbeat connected on RolloutWorker_w2
+[2025-08-01 17:28:31,819][02698] Heartbeat connected on RolloutWorker_w1
+[2025-08-01 17:28:31,821][02698] Heartbeat connected on RolloutWorker_w3
+[2025-08-01 17:28:31,824][02698] Heartbeat connected on RolloutWorker_w4
+[2025-08-01 17:28:31,827][02698] Heartbeat connected on RolloutWorker_w5
+[2025-08-01 17:28:31,834][02698] Heartbeat connected on RolloutWorker_w7
+[2025-08-01 17:28:31,835][02698] Heartbeat connected on RolloutWorker_w6
+[2025-08-01 17:28:34,347][02835] No checkpoints found
+[2025-08-01 17:28:34,347][02835] Did not load from checkpoint, starting from scratch!
+[2025-08-01 17:28:34,347][02835] Initialized policy 0 weights for model version 0
+[2025-08-01 17:28:34,350][02835] LearnerWorker_p0 finished initialization!
+[2025-08-01 17:28:34,351][02835] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-08-01 17:28:34,351][02698] Heartbeat connected on LearnerWorker_p0
+[2025-08-01 17:28:34,493][02852] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:28:34,494][02852] RunningMeanStd input shape: (1,)
+[2025-08-01 17:28:34,505][02852] ConvEncoder: input_channels=3
+[2025-08-01 17:28:34,606][02852] Conv encoder output size: 512
+[2025-08-01 17:28:34,606][02852] Policy head output size: 512
+[2025-08-01 17:28:34,644][02698] Inference worker 0-0 is ready!
+[2025-08-01 17:28:34,645][02698] All inference workers are ready! Signal rollout workers to start!
+[2025-08-01 17:28:34,932][02855] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,934][02859] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,944][02856] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,992][02857] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,999][02853] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:35,023][02860] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:35,024][02854] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:35,043][02858] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:36,088][02698] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-01 17:28:36,305][02856] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,307][02860] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,307][02855] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,305][02859] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,309][02854] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,704][02855] Decorrelating experience for 32 frames... +[2025-08-01 17:28:37,381][02858] Decorrelating experience for 0 frames... +[2025-08-01 17:28:37,408][02860] Decorrelating experience for 32 frames... +[2025-08-01 17:28:37,410][02854] Decorrelating experience for 32 frames... +[2025-08-01 17:28:37,414][02856] Decorrelating experience for 32 frames... +[2025-08-01 17:28:38,241][02859] Decorrelating experience for 32 frames... +[2025-08-01 17:28:38,243][02857] Decorrelating experience for 0 frames... +[2025-08-01 17:28:38,896][02858] Decorrelating experience for 32 frames... +[2025-08-01 17:28:39,397][02860] Decorrelating experience for 64 frames... +[2025-08-01 17:28:39,399][02854] Decorrelating experience for 64 frames... +[2025-08-01 17:28:40,157][02857] Decorrelating experience for 32 frames... +[2025-08-01 17:28:40,166][02855] Decorrelating experience for 64 frames... +[2025-08-01 17:28:40,177][02853] Decorrelating experience for 0 frames... +[2025-08-01 17:28:40,800][02858] Decorrelating experience for 64 frames... +[2025-08-01 17:28:40,955][02860] Decorrelating experience for 96 frames... +[2025-08-01 17:28:40,965][02854] Decorrelating experience for 96 frames... +[2025-08-01 17:28:41,088][02698] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-01 17:28:42,345][02853] Decorrelating experience for 32 frames... +[2025-08-01 17:28:42,351][02859] Decorrelating experience for 64 frames... +[2025-08-01 17:28:43,011][02857] Decorrelating experience for 64 frames... +[2025-08-01 17:28:43,361][02855] Decorrelating experience for 96 frames... +[2025-08-01 17:28:44,657][02859] Decorrelating experience for 96 frames... +[2025-08-01 17:28:45,485][02857] Decorrelating experience for 96 frames... +[2025-08-01 17:28:45,674][02858] Decorrelating experience for 96 frames... +[2025-08-01 17:28:45,695][02856] Decorrelating experience for 64 frames... +[2025-08-01 17:28:46,088][02698] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 19.8. Samples: 198. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-01 17:28:46,089][02698] Avg episode reward: [(0, '2.655')] +[2025-08-01 17:28:47,822][02835] Signal inference workers to stop experience collection... +[2025-08-01 17:28:47,842][02852] InferenceWorker_p0-w0: stopping experience collection +[2025-08-01 17:28:47,894][02853] Decorrelating experience for 64 frames... +[2025-08-01 17:28:48,118][02856] Decorrelating experience for 96 frames... +[2025-08-01 17:28:48,508][02853] Decorrelating experience for 96 frames... +[2025-08-01 17:28:49,321][02835] Signal inference workers to resume experience collection... +[2025-08-01 17:28:49,322][02852] InferenceWorker_p0-w0: resuming experience collection +[2025-08-01 17:28:51,088][02698] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 168.3. Samples: 2524. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-08-01 17:28:51,090][02698] Avg episode reward: [(0, '3.087')] +[2025-08-01 17:28:56,090][02698] Fps is (10 sec: 3276.0, 60 sec: 1638.2, 300 sec: 1638.2). Total num frames: 32768. Throughput: 0: 435.8. Samples: 8718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:28:56,095][02698] Avg episode reward: [(0, '3.684')] +[2025-08-01 17:28:58,924][02852] Updated weights for policy 0, policy_version 10 (0.0100) +[2025-08-01 17:29:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 425.1. Samples: 10628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:29:01,092][02698] Avg episode reward: [(0, '4.146')] +[2025-08-01 17:29:06,088][02698] Fps is (10 sec: 3687.3, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 566.3. Samples: 16990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:29:06,092][02698] Avg episode reward: [(0, '4.383')] +[2025-08-01 17:29:08,241][02852] Updated weights for policy 0, policy_version 20 (0.0021) +[2025-08-01 17:29:11,093][02698] Fps is (10 sec: 4093.7, 60 sec: 2574.2, 300 sec: 2574.2). Total num frames: 90112. Throughput: 0: 659.3. Samples: 23078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:11,095][02698] Avg episode reward: [(0, '4.390')] +[2025-08-01 17:29:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 623.4. Samples: 24934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:29:16,089][02698] Avg episode reward: [(0, '4.256')] +[2025-08-01 17:29:16,097][02835] Saving new best policy, reward=4.256! +[2025-08-01 17:29:19,712][02852] Updated weights for policy 0, policy_version 30 (0.0015) +[2025-08-01 17:29:21,088][02698] Fps is (10 sec: 3688.5, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 694.6. Samples: 31256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:29:21,089][02698] Avg episode reward: [(0, '4.311')] +[2025-08-01 17:29:21,093][02835] Saving new best policy, reward=4.311! +[2025-08-01 17:29:26,088][02698] Fps is (10 sec: 4096.0, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 829.0. Samples: 37304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:29:26,089][02698] Avg episode reward: [(0, '4.300')] +[2025-08-01 17:29:26,104][02835] Saving new best policy, reward=4.352! +[2025-08-01 17:29:30,815][02852] Updated weights for policy 0, policy_version 40 (0.0033) +[2025-08-01 17:29:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 2978.9, 300 sec: 2978.9). 
Total num frames: 163840. Throughput: 0: 867.2. Samples: 39220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:31,089][02698] Avg episode reward: [(0, '4.388')] +[2025-08-01 17:29:31,093][02835] Saving new best policy, reward=4.388! +[2025-08-01 17:29:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 958.7. Samples: 45664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:36,089][02698] Avg episode reward: [(0, '4.505')] +[2025-08-01 17:29:36,096][02835] Saving new best policy, reward=4.492! +[2025-08-01 17:29:41,089][02698] Fps is (10 sec: 2866.8, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 901.7. Samples: 49294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:29:41,102][02698] Avg episode reward: [(0, '4.531')] +[2025-08-01 17:29:41,107][02835] Saving new best policy, reward=4.531! +[2025-08-01 17:29:44,705][02852] Updated weights for policy 0, policy_version 50 (0.0021) +[2025-08-01 17:29:46,088][02698] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 891.6. Samples: 50750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:46,092][02698] Avg episode reward: [(0, '4.485')] +[2025-08-01 17:29:51,088][02698] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 894.0. Samples: 57218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:51,091][02698] Avg episode reward: [(0, '4.324')] +[2025-08-01 17:29:53,952][02852] Updated weights for policy 0, policy_version 60 (0.0015) +[2025-08-01 17:29:56,088][02698] Fps is (10 sec: 4095.9, 60 sec: 3618.3, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 892.4. Samples: 63232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:56,092][02698] Avg episode reward: [(0, '4.333')] +[2025-08-01 17:30:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 894.5. Samples: 65186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:30:01,093][02698] Avg episode reward: [(0, '4.505')] +[2025-08-01 17:30:05,046][02852] Updated weights for policy 0, policy_version 70 (0.0014) +[2025-08-01 17:30:06,088][02698] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3231.3). Total num frames: 290816. Throughput: 0: 901.3. Samples: 71814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:06,091][02698] Avg episode reward: [(0, '4.588')] +[2025-08-01 17:30:06,098][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth... +[2025-08-01 17:30:06,205][02835] Saving new best policy, reward=4.588! +[2025-08-01 17:30:11,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3618.5, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 889.2. Samples: 77318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:30:11,103][02698] Avg episode reward: [(0, '4.663')] +[2025-08-01 17:30:11,110][02835] Saving new best policy, reward=4.663! +[2025-08-01 17:30:16,089][02698] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3235.8). Total num frames: 323584. Throughput: 0: 889.4. Samples: 79244. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:30:16,090][02698] Avg episode reward: [(0, '4.595')] +[2025-08-01 17:30:16,390][02852] Updated weights for policy 0, policy_version 80 (0.0029) +[2025-08-01 17:30:21,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 898.6. Samples: 86100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:30:21,089][02698] Avg episode reward: [(0, '4.642')] +[2025-08-01 17:30:26,088][02698] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 943.2. Samples: 91736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:26,093][02698] Avg episode reward: [(0, '4.642')] +[2025-08-01 17:30:27,388][02852] Updated weights for policy 0, policy_version 90 (0.0017) +[2025-08-01 17:30:31,095][02698] Fps is (10 sec: 3684.0, 60 sec: 3686.0, 300 sec: 3347.8). Total num frames: 385024. Throughput: 0: 963.6. Samples: 94120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:31,096][02698] Avg episode reward: [(0, '4.506')] +[2025-08-01 17:30:36,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 972.2. Samples: 100968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:30:36,092][02698] Avg episode reward: [(0, '4.570')] +[2025-08-01 17:30:36,476][02852] Updated weights for policy 0, policy_version 100 (0.0025) +[2025-08-01 17:30:41,095][02698] Fps is (10 sec: 3686.3, 60 sec: 3822.6, 300 sec: 3374.9). Total num frames: 421888. Throughput: 0: 954.8. Samples: 106206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:30:41,098][02698] Avg episode reward: [(0, '4.631')] +[2025-08-01 17:30:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 970.1. Samples: 108840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:46,089][02698] Avg episode reward: [(0, '4.628')] +[2025-08-01 17:30:47,609][02852] Updated weights for policy 0, policy_version 110 (0.0015) +[2025-08-01 17:30:51,088][02698] Fps is (10 sec: 4098.8, 60 sec: 3891.2, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 971.9. Samples: 115550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:51,089][02698] Avg episode reward: [(0, '4.722')] +[2025-08-01 17:30:51,092][02835] Saving new best policy, reward=4.722! +[2025-08-01 17:30:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3423.1). Total num frames: 479232. Throughput: 0: 959.9. Samples: 120514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:56,092][02698] Avg episode reward: [(0, '4.742')] +[2025-08-01 17:30:56,099][02835] Saving new best policy, reward=4.742! +[2025-08-01 17:30:58,672][02852] Updated weights for policy 0, policy_version 120 (0.0020) +[2025-08-01 17:31:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 982.4. Samples: 123452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:31:01,093][02698] Avg episode reward: [(0, '4.643')] +[2025-08-01 17:31:06,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 983.6. Samples: 130362. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:31:06,092][02698] Avg episode reward: [(0, '4.681')] +[2025-08-01 17:31:08,142][02852] Updated weights for policy 0, policy_version 130 (0.0012) +[2025-08-01 17:31:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3461.8). Total num frames: 536576. Throughput: 0: 969.5. Samples: 135364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:31:11,093][02698] Avg episode reward: [(0, '4.797')] +[2025-08-01 17:31:11,157][02835] Saving new best policy, reward=4.797! +[2025-08-01 17:31:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3507.2). Total num frames: 561152. Throughput: 0: 985.3. Samples: 138452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:16,089][02698] Avg episode reward: [(0, '5.125')] +[2025-08-01 17:31:16,096][02835] Saving new best policy, reward=5.125! +[2025-08-01 17:31:18,500][02852] Updated weights for policy 0, policy_version 140 (0.0018) +[2025-08-01 17:31:21,088][02698] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 985.6. Samples: 145322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:21,089][02698] Avg episode reward: [(0, '4.893')] +[2025-08-01 17:31:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 973.9. Samples: 150026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:26,089][02698] Avg episode reward: [(0, '4.502')] +[2025-08-01 17:31:29,417][02852] Updated weights for policy 0, policy_version 150 (0.0018) +[2025-08-01 17:31:31,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3891.6, 300 sec: 3534.3). Total num frames: 618496. Throughput: 0: 989.5. Samples: 153368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:31,089][02698] Avg episode reward: [(0, '4.596')] +[2025-08-01 17:31:36,088][02698] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 989.4. Samples: 160072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:36,092][02698] Avg episode reward: [(0, '5.116')] +[2025-08-01 17:31:40,453][02852] Updated weights for policy 0, policy_version 160 (0.0019) +[2025-08-01 17:31:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.6, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 979.4. Samples: 164586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:31:41,091][02698] Avg episode reward: [(0, '5.249')] +[2025-08-01 17:31:41,094][02835] Saving new best policy, reward=5.249! +[2025-08-01 17:31:46,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3578.6). Total num frames: 679936. Throughput: 0: 984.4. Samples: 167748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:31:46,092][02698] Avg episode reward: [(0, '5.159')] +[2025-08-01 17:31:49,815][02852] Updated weights for policy 0, policy_version 170 (0.0016) +[2025-08-01 17:31:51,090][02698] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3570.8). Total num frames: 696320. Throughput: 0: 978.6. Samples: 174400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:31:51,094][02698] Avg episode reward: [(0, '4.998')] +[2025-08-01 17:31:56,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 971.9. Samples: 179098. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:31:56,093][02698] Avg episode reward: [(0, '5.019')] +[2025-08-01 17:32:00,583][02852] Updated weights for policy 0, policy_version 180 (0.0015) +[2025-08-01 17:32:01,088][02698] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3596.5). Total num frames: 737280. Throughput: 0: 982.8. Samples: 182680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:32:01,092][02698] Avg episode reward: [(0, '5.457')] +[2025-08-01 17:32:01,098][02835] Saving new best policy, reward=5.457! +[2025-08-01 17:32:06,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3608.4). Total num frames: 757760. Throughput: 0: 979.5. Samples: 189400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:32:06,091][02698] Avg episode reward: [(0, '5.595')] +[2025-08-01 17:32:06,103][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000185_757760.pth... +[2025-08-01 17:32:06,251][02835] Saving new best policy, reward=5.595! +[2025-08-01 17:32:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 982.4. Samples: 194234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:32:11,091][02698] Avg episode reward: [(0, '5.753')] +[2025-08-01 17:32:11,093][02835] Saving new best policy, reward=5.753! +[2025-08-01 17:32:11,552][02852] Updated weights for policy 0, policy_version 190 (0.0021) +[2025-08-01 17:32:16,088][02698] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 979.5. Samples: 197446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:32:16,097][02698] Avg episode reward: [(0, '5.987')] +[2025-08-01 17:32:16,104][02835] Saving new best policy, reward=5.987! +[2025-08-01 17:32:21,088][02698] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3622.7). Total num frames: 815104. Throughput: 0: 976.4. Samples: 204012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:32:21,090][02698] Avg episode reward: [(0, '5.989')] +[2025-08-01 17:32:21,092][02835] Saving new best policy, reward=5.989! +[2025-08-01 17:32:21,791][02852] Updated weights for policy 0, policy_version 200 (0.0047) +[2025-08-01 17:32:26,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3615.2). Total num frames: 831488. Throughput: 0: 986.6. Samples: 208982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:32:26,092][02698] Avg episode reward: [(0, '6.570')] +[2025-08-01 17:32:26,108][02835] Saving new best policy, reward=6.570! +[2025-08-01 17:32:31,088][02698] Fps is (10 sec: 4096.1, 60 sec: 3959.4, 300 sec: 3642.8). Total num frames: 856064. Throughput: 0: 991.6. Samples: 212370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:32:31,092][02698] Avg episode reward: [(0, '6.729')] +[2025-08-01 17:32:31,096][02835] Saving new best policy, reward=6.729! +[2025-08-01 17:32:31,495][02852] Updated weights for policy 0, policy_version 210 (0.0012) +[2025-08-01 17:32:36,090][02698] Fps is (10 sec: 4095.2, 60 sec: 3822.8, 300 sec: 3635.2). Total num frames: 872448. Throughput: 0: 983.4. Samples: 218654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:32:36,095][02698] Avg episode reward: [(0, '6.733')] +[2025-08-01 17:32:36,106][02835] Saving new best policy, reward=6.733! +[2025-08-01 17:32:41,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3644.6). Total num frames: 892928. Throughput: 0: 995.7. Samples: 223906. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:32:41,095][02698] Avg episode reward: [(0, '6.304')] +[2025-08-01 17:32:42,339][02852] Updated weights for policy 0, policy_version 220 (0.0023) +[2025-08-01 17:32:46,088][02698] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 989.6. Samples: 227214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:32:46,094][02698] Avg episode reward: [(0, '6.269')] +[2025-08-01 17:32:51,088][02698] Fps is (10 sec: 4095.9, 60 sec: 3959.6, 300 sec: 3662.3). Total num frames: 933888. Throughput: 0: 973.6. Samples: 233212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:32:51,094][02698] Avg episode reward: [(0, '6.060')] +[2025-08-01 17:32:53,305][02852] Updated weights for policy 0, policy_version 230 (0.0017) +[2025-08-01 17:32:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3670.6). Total num frames: 954368. Throughput: 0: 991.4. Samples: 238846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:32:56,089][02698] Avg episode reward: [(0, '6.089')] +[2025-08-01 17:33:01,088][02698] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3678.7). Total num frames: 974848. Throughput: 0: 995.7. Samples: 242254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:33:01,089][02698] Avg episode reward: [(0, '6.167')] +[2025-08-01 17:33:02,270][02852] Updated weights for policy 0, policy_version 240 (0.0027) +[2025-08-01 17:33:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3671.2). Total num frames: 991232. Throughput: 0: 978.6. Samples: 248048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:33:06,091][02698] Avg episode reward: [(0, '6.106')] +[2025-08-01 17:33:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3679.0). Total num frames: 1011712. Throughput: 0: 987.6. Samples: 253426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:11,092][02698] Avg episode reward: [(0, '6.346')] +[2025-08-01 17:33:15,097][02852] Updated weights for policy 0, policy_version 250 (0.0015) +[2025-08-01 17:33:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3671.8). Total num frames: 1028096. Throughput: 0: 954.9. Samples: 255342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:16,092][02698] Avg episode reward: [(0, '6.526')] +[2025-08-01 17:33:21,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3650.5). Total num frames: 1040384. Throughput: 0: 931.5. Samples: 260568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:33:21,093][02698] Avg episode reward: [(0, '6.934')] +[2025-08-01 17:33:21,097][02835] Saving new best policy, reward=6.934! +[2025-08-01 17:33:25,907][02852] Updated weights for policy 0, policy_version 260 (0.0025) +[2025-08-01 17:33:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3672.3). Total num frames: 1064960. Throughput: 0: 944.3. Samples: 266398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:26,089][02698] Avg episode reward: [(0, '7.142')] +[2025-08-01 17:33:26,095][02835] Saving new best policy, reward=7.142! +[2025-08-01 17:33:31,088][02698] Fps is (10 sec: 4505.7, 60 sec: 3823.0, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 946.8. Samples: 269822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:33:31,089][02698] Avg episode reward: [(0, '7.686')] +[2025-08-01 17:33:31,091][02835] Saving new best policy, reward=7.686! 
+[2025-08-01 17:33:36,088][02698] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 928.5. Samples: 274994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:36,090][02698] Avg episode reward: [(0, '8.522')] +[2025-08-01 17:33:36,097][02835] Saving new best policy, reward=8.522! +[2025-08-01 17:33:37,009][02852] Updated weights for policy 0, policy_version 270 (0.0025) +[2025-08-01 17:33:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 941.0. Samples: 281190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:33:41,092][02698] Avg episode reward: [(0, '9.137')] +[2025-08-01 17:33:41,094][02835] Saving new best policy, reward=9.137! +[2025-08-01 17:33:46,092][02698] Fps is (10 sec: 4094.3, 60 sec: 3754.4, 300 sec: 3832.1). Total num frames: 1142784. Throughput: 0: 941.1. Samples: 284606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:33:46,094][02698] Avg episode reward: [(0, '8.802')] +[2025-08-01 17:33:46,367][02852] Updated weights for policy 0, policy_version 280 (0.0033) +[2025-08-01 17:33:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1159168. Throughput: 0: 921.2. Samples: 289502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:51,090][02698] Avg episode reward: [(0, '8.504')] +[2025-08-01 17:33:56,088][02698] Fps is (10 sec: 3688.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1179648. Throughput: 0: 946.9. Samples: 296036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:56,091][02698] Avg episode reward: [(0, '9.128')] +[2025-08-01 17:33:56,995][02852] Updated weights for policy 0, policy_version 290 (0.0022) +[2025-08-01 17:34:01,088][02698] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1204224. Throughput: 0: 980.4. Samples: 299460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:34:01,091][02698] Avg episode reward: [(0, '9.695')] +[2025-08-01 17:34:01,095][02835] Saving new best policy, reward=9.695! +[2025-08-01 17:34:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.4). Total num frames: 1216512. Throughput: 0: 967.7. Samples: 304114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:34:06,092][02698] Avg episode reward: [(0, '9.875')] +[2025-08-01 17:34:06,100][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth... +[2025-08-01 17:34:06,225][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth +[2025-08-01 17:34:06,239][02835] Saving new best policy, reward=9.875! +[2025-08-01 17:34:08,152][02852] Updated weights for policy 0, policy_version 300 (0.0019) +[2025-08-01 17:34:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1241088. Throughput: 0: 985.2. Samples: 310730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:34:11,090][02698] Avg episode reward: [(0, '9.852')] +[2025-08-01 17:34:16,093][02698] Fps is (10 sec: 4503.1, 60 sec: 3890.8, 300 sec: 3846.0). Total num frames: 1261568. Throughput: 0: 985.7. Samples: 314186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:34:16,097][02698] Avg episode reward: [(0, '10.712')] +[2025-08-01 17:34:16,111][02835] Saving new best policy, reward=10.712! 
+[2025-08-01 17:34:18,571][02852] Updated weights for policy 0, policy_version 310 (0.0023) +[2025-08-01 17:34:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1277952. Throughput: 0: 974.5. Samples: 318844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:34:21,093][02698] Avg episode reward: [(0, '10.933')] +[2025-08-01 17:34:21,097][02835] Saving new best policy, reward=10.933! +[2025-08-01 17:34:26,088][02698] Fps is (10 sec: 4098.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1302528. Throughput: 0: 988.5. Samples: 325674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:26,092][02698] Avg episode reward: [(0, '12.438')] +[2025-08-01 17:34:26,099][02835] Saving new best policy, reward=12.438! +[2025-08-01 17:34:27,932][02852] Updated weights for policy 0, policy_version 320 (0.0017) +[2025-08-01 17:34:31,095][02698] Fps is (10 sec: 4092.9, 60 sec: 3890.7, 300 sec: 3846.0). Total num frames: 1318912. Throughput: 0: 989.4. Samples: 329130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:31,099][02698] Avg episode reward: [(0, '12.477')] +[2025-08-01 17:34:31,111][02835] Saving new best policy, reward=12.477! +[2025-08-01 17:34:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1339392. Throughput: 0: 984.1. Samples: 333788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:34:36,092][02698] Avg episode reward: [(0, '13.024')] +[2025-08-01 17:34:36,099][02835] Saving new best policy, reward=13.024! +[2025-08-01 17:34:38,660][02852] Updated weights for policy 0, policy_version 330 (0.0014) +[2025-08-01 17:34:41,088][02698] Fps is (10 sec: 4099.1, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1359872. Throughput: 0: 993.2. Samples: 340732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:41,092][02698] Avg episode reward: [(0, '12.956')] +[2025-08-01 17:34:46,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3901.6). Total num frames: 1380352. Throughput: 0: 994.4. Samples: 344206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:46,093][02698] Avg episode reward: [(0, '13.146')] +[2025-08-01 17:34:46,103][02835] Saving new best policy, reward=13.146! +[2025-08-01 17:34:49,550][02852] Updated weights for policy 0, policy_version 340 (0.0018) +[2025-08-01 17:34:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1396736. Throughput: 0: 992.1. Samples: 348758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:34:51,094][02698] Avg episode reward: [(0, '13.243')] +[2025-08-01 17:34:51,096][02835] Saving new best policy, reward=13.243! +[2025-08-01 17:34:56,088][02698] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1421312. Throughput: 0: 997.3. Samples: 355610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:34:56,090][02698] Avg episode reward: [(0, '13.926')] +[2025-08-01 17:34:56,096][02835] Saving new best policy, reward=13.926! +[2025-08-01 17:34:58,716][02852] Updated weights for policy 0, policy_version 350 (0.0037) +[2025-08-01 17:35:01,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 1437696. Throughput: 0: 993.8. Samples: 358902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:35:01,099][02698] Avg episode reward: [(0, '14.527')] +[2025-08-01 17:35:01,103][02835] Saving new best policy, reward=14.527! 
+[2025-08-01 17:35:06,088][02698] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1454080. Throughput: 0: 990.5. Samples: 363418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:35:06,089][02698] Avg episode reward: [(0, '13.978')] +[2025-08-01 17:35:09,641][02852] Updated weights for policy 0, policy_version 360 (0.0027) +[2025-08-01 17:35:11,088][02698] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1478656. Throughput: 0: 992.0. Samples: 370312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:35:11,094][02698] Avg episode reward: [(0, '14.135')] +[2025-08-01 17:35:16,094][02698] Fps is (10 sec: 4093.5, 60 sec: 3891.2, 300 sec: 3887.6). Total num frames: 1495040. Throughput: 0: 983.9. Samples: 373406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:35:16,096][02698] Avg episode reward: [(0, '15.391')] +[2025-08-01 17:35:16,111][02835] Saving new best policy, reward=15.391! +[2025-08-01 17:35:20,772][02852] Updated weights for policy 0, policy_version 370 (0.0015) +[2025-08-01 17:35:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1515520. Throughput: 0: 989.3. Samples: 378308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:35:21,092][02698] Avg episode reward: [(0, '15.315')] +[2025-08-01 17:35:26,088][02698] Fps is (10 sec: 4508.4, 60 sec: 3959.5, 300 sec: 3915.6). Total num frames: 1540096. Throughput: 0: 988.4. Samples: 385212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:35:26,089][02698] Avg episode reward: [(0, '15.540')] +[2025-08-01 17:35:26,096][02835] Saving new best policy, reward=15.540! +[2025-08-01 17:35:30,683][02852] Updated weights for policy 0, policy_version 380 (0.0014) +[2025-08-01 17:35:31,090][02698] Fps is (10 sec: 4095.3, 60 sec: 3959.8, 300 sec: 3901.6). Total num frames: 1556480. Throughput: 0: 976.8. Samples: 388164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:35:31,097][02698] Avg episode reward: [(0, '16.398')] +[2025-08-01 17:35:31,103][02835] Saving new best policy, reward=16.398! +[2025-08-01 17:35:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.6). Total num frames: 1576960. Throughput: 0: 991.4. Samples: 393372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:35:36,089][02698] Avg episode reward: [(0, '16.327')] +[2025-08-01 17:35:40,243][02852] Updated weights for policy 0, policy_version 390 (0.0014) +[2025-08-01 17:35:41,088][02698] Fps is (10 sec: 4096.6, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 1597440. Throughput: 0: 993.2. Samples: 400306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:35:41,092][02698] Avg episode reward: [(0, '16.604')] +[2025-08-01 17:35:41,156][02835] Saving new best policy, reward=16.604! +[2025-08-01 17:35:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1613824. Throughput: 0: 979.8. Samples: 402992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:35:46,095][02698] Avg episode reward: [(0, '17.030')] +[2025-08-01 17:35:46,107][02835] Saving new best policy, reward=17.030! +[2025-08-01 17:35:51,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1634304. Throughput: 0: 996.2. Samples: 408248. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:35:51,089][02698] Avg episode reward: [(0, '15.661')] +[2025-08-01 17:35:51,278][02852] Updated weights for policy 0, policy_version 400 (0.0014) +[2025-08-01 17:35:56,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1658880. Throughput: 0: 998.2. Samples: 415230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:35:56,090][02698] Avg episode reward: [(0, '14.373')] +[2025-08-01 17:36:01,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 1675264. Throughput: 0: 987.4. Samples: 417834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:01,089][02698] Avg episode reward: [(0, '13.823')] +[2025-08-01 17:36:02,036][02852] Updated weights for policy 0, policy_version 410 (0.0021) +[2025-08-01 17:36:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1695744. Throughput: 0: 1006.0. Samples: 423576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:36:06,090][02698] Avg episode reward: [(0, '12.577')] +[2025-08-01 17:36:06,097][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth... +[2025-08-01 17:36:06,215][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000185_757760.pth +[2025-08-01 17:36:10,606][02852] Updated weights for policy 0, policy_version 420 (0.0024) +[2025-08-01 17:36:11,088][02698] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1720320. Throughput: 0: 1009.1. Samples: 430622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:36:11,089][02698] Avg episode reward: [(0, '12.525')] +[2025-08-01 17:36:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3901.6). Total num frames: 1732608. Throughput: 0: 995.1. Samples: 432942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:16,089][02698] Avg episode reward: [(0, '12.542')] +[2025-08-01 17:36:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1757184. Throughput: 0: 1010.2. Samples: 438832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:21,089][02698] Avg episode reward: [(0, '13.692')] +[2025-08-01 17:36:21,420][02852] Updated weights for policy 0, policy_version 430 (0.0017) +[2025-08-01 17:36:26,090][02698] Fps is (10 sec: 4914.2, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 1781760. Throughput: 0: 1011.9. Samples: 445842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:36:26,095][02698] Avg episode reward: [(0, '15.525')] +[2025-08-01 17:36:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 1794048. Throughput: 0: 999.4. Samples: 447966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:36:31,093][02698] Avg episode reward: [(0, '16.342')] +[2025-08-01 17:36:32,026][02852] Updated weights for policy 0, policy_version 440 (0.0027) +[2025-08-01 17:36:36,088][02698] Fps is (10 sec: 3687.1, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1818624. Throughput: 0: 1024.1. Samples: 454332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:36:36,090][02698] Avg episode reward: [(0, '18.705')] +[2025-08-01 17:36:36,098][02835] Saving new best policy, reward=18.705! +[2025-08-01 17:36:41,088][02698] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1839104. Throughput: 0: 1008.9. Samples: 460630. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:36:41,098][02698] Avg episode reward: [(0, '19.674')] +[2025-08-01 17:36:41,100][02835] Saving new best policy, reward=19.674! +[2025-08-01 17:36:42,346][02852] Updated weights for policy 0, policy_version 450 (0.0037) +[2025-08-01 17:36:46,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1847296. Throughput: 0: 976.4. Samples: 461774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:36:46,092][02698] Avg episode reward: [(0, '19.751')] +[2025-08-01 17:36:46,106][02835] Saving new best policy, reward=19.751! +[2025-08-01 17:36:51,088][02698] Fps is (10 sec: 3277.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1871872. Throughput: 0: 975.3. Samples: 467466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:36:51,095][02698] Avg episode reward: [(0, '20.146')] +[2025-08-01 17:36:51,099][02835] Saving new best policy, reward=20.146! +[2025-08-01 17:36:53,319][02852] Updated weights for policy 0, policy_version 460 (0.0020) +[2025-08-01 17:36:56,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1892352. Throughput: 0: 973.9. Samples: 474448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:56,092][02698] Avg episode reward: [(0, '20.184')] +[2025-08-01 17:36:56,101][02835] Saving new best policy, reward=20.184! +[2025-08-01 17:37:01,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1912832. Throughput: 0: 968.1. Samples: 476508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:01,094][02698] Avg episode reward: [(0, '18.801')] +[2025-08-01 17:37:03,649][02852] Updated weights for policy 0, policy_version 470 (0.0014) +[2025-08-01 17:37:06,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1933312. Throughput: 0: 980.3. Samples: 482944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:37:06,089][02698] Avg episode reward: [(0, '18.533')] +[2025-08-01 17:37:11,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1953792. Throughput: 0: 975.5. Samples: 489736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:11,091][02698] Avg episode reward: [(0, '18.537')] +[2025-08-01 17:37:14,172][02852] Updated weights for policy 0, policy_version 480 (0.0019) +[2025-08-01 17:37:16,089][02698] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1974272. Throughput: 0: 972.4. Samples: 491726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:16,092][02698] Avg episode reward: [(0, '18.844')] +[2025-08-01 17:37:21,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1994752. Throughput: 0: 976.3. Samples: 498264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:21,089][02698] Avg episode reward: [(0, '19.873')] +[2025-08-01 17:37:23,107][02852] Updated weights for policy 0, policy_version 490 (0.0026) +[2025-08-01 17:37:26,089][02698] Fps is (10 sec: 4095.8, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 2015232. Throughput: 0: 977.1. Samples: 504600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:26,091][02698] Avg episode reward: [(0, '19.959')] +[2025-08-01 17:37:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2031616. Throughput: 0: 993.2. Samples: 506466. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:31,089][02698] Avg episode reward: [(0, '19.443')] +[2025-08-01 17:37:34,234][02852] Updated weights for policy 0, policy_version 500 (0.0021) +[2025-08-01 17:37:36,088][02698] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2052096. Throughput: 0: 1014.3. Samples: 513110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:36,093][02698] Avg episode reward: [(0, '19.624')] +[2025-08-01 17:37:41,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2072576. Throughput: 0: 985.5. Samples: 518794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:41,092][02698] Avg episode reward: [(0, '20.270')] +[2025-08-01 17:37:41,095][02835] Saving new best policy, reward=20.270! +[2025-08-01 17:37:45,460][02852] Updated weights for policy 0, policy_version 510 (0.0033) +[2025-08-01 17:37:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2088960. Throughput: 0: 985.1. Samples: 520838. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:37:46,091][02698] Avg episode reward: [(0, '20.035')] +[2025-08-01 17:37:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2109440. Throughput: 0: 983.2. Samples: 527188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:51,091][02698] Avg episode reward: [(0, '20.689')] +[2025-08-01 17:37:51,095][02835] Saving new best policy, reward=20.689! +[2025-08-01 17:37:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2125824. Throughput: 0: 954.3. Samples: 532678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:56,092][02698] Avg episode reward: [(0, '21.080')] +[2025-08-01 17:37:56,105][02835] Saving new best policy, reward=21.080! +[2025-08-01 17:37:56,376][02852] Updated weights for policy 0, policy_version 520 (0.0014) +[2025-08-01 17:38:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2146304. Throughput: 0: 958.9. Samples: 534874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:38:01,092][02698] Avg episode reward: [(0, '21.029')] +[2025-08-01 17:38:06,061][02852] Updated weights for policy 0, policy_version 530 (0.0019) +[2025-08-01 17:38:06,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2170880. Throughput: 0: 959.6. Samples: 541446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-01 17:38:06,089][02698] Avg episode reward: [(0, '22.070')] +[2025-08-01 17:38:06,097][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth... +[2025-08-01 17:38:06,234][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth +[2025-08-01 17:38:06,244][02835] Saving new best policy, reward=22.070! +[2025-08-01 17:38:11,089][02698] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2183168. Throughput: 0: 936.0. Samples: 546720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:38:11,090][02698] Avg episode reward: [(0, '21.287')] +[2025-08-01 17:38:16,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3943.3). Total num frames: 2203648. Throughput: 0: 952.8. Samples: 549342. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:38:16,091][02698] Avg episode reward: [(0, '20.897')] +[2025-08-01 17:38:17,579][02852] Updated weights for policy 0, policy_version 540 (0.0036) +[2025-08-01 17:38:21,088][02698] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2224128. Throughput: 0: 950.7. Samples: 555892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:38:21,090][02698] Avg episode reward: [(0, '20.909')] +[2025-08-01 17:38:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2240512. Throughput: 0: 934.8. Samples: 560862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:38:26,093][02698] Avg episode reward: [(0, '20.215')] +[2025-08-01 17:38:28,658][02852] Updated weights for policy 0, policy_version 550 (0.0029) +[2025-08-01 17:38:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2260992. Throughput: 0: 953.0. Samples: 563722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:38:31,092][02698] Avg episode reward: [(0, '20.111')] +[2025-08-01 17:38:36,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2281472. Throughput: 0: 958.4. Samples: 570314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:38:36,092][02698] Avg episode reward: [(0, '19.849')] +[2025-08-01 17:38:38,769][02852] Updated weights for policy 0, policy_version 560 (0.0024) +[2025-08-01 17:38:41,090][02698] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3915.5). Total num frames: 2297856. Throughput: 0: 942.9. Samples: 575110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:38:41,094][02698] Avg episode reward: [(0, '21.402')] +[2025-08-01 17:38:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2318336. Throughput: 0: 963.5. Samples: 578232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:38:46,089][02698] Avg episode reward: [(0, '20.967')] +[2025-08-01 17:38:49,113][02852] Updated weights for policy 0, policy_version 570 (0.0026) +[2025-08-01 17:38:51,088][02698] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2342912. Throughput: 0: 962.8. Samples: 584772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:38:51,089][02698] Avg episode reward: [(0, '21.115')] +[2025-08-01 17:38:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2355200. Throughput: 0: 946.3. Samples: 589304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:38:56,089][02698] Avg episode reward: [(0, '21.189')] +[2025-08-01 17:39:00,495][02852] Updated weights for policy 0, policy_version 580 (0.0014) +[2025-08-01 17:39:01,098][02698] Fps is (10 sec: 3273.4, 60 sec: 3822.3, 300 sec: 3929.2). Total num frames: 2375680. Throughput: 0: 960.0. Samples: 592550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:39:01,102][02698] Avg episode reward: [(0, '22.028')] +[2025-08-01 17:39:06,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2396160. Throughput: 0: 961.9. Samples: 599178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:06,096][02698] Avg episode reward: [(0, '21.148')] +[2025-08-01 17:39:11,088][02698] Fps is (10 sec: 3690.3, 60 sec: 3823.0, 300 sec: 3901.7). Total num frames: 2412544. Throughput: 0: 951.8. Samples: 603694. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:11,090][02698] Avg episode reward: [(0, '21.181')] +[2025-08-01 17:39:11,680][02852] Updated weights for policy 0, policy_version 590 (0.0029) +[2025-08-01 17:39:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2433024. Throughput: 0: 960.6. Samples: 606950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:39:16,090][02698] Avg episode reward: [(0, '21.999')] +[2025-08-01 17:39:21,089][02698] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2453504. Throughput: 0: 962.0. Samples: 613604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:21,095][02698] Avg episode reward: [(0, '20.971')] +[2025-08-01 17:39:21,286][02852] Updated weights for policy 0, policy_version 600 (0.0017) +[2025-08-01 17:39:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.7). Total num frames: 2469888. Throughput: 0: 955.8. Samples: 618118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:26,093][02698] Avg episode reward: [(0, '21.228')] +[2025-08-01 17:39:31,088][02698] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2494464. Throughput: 0: 960.1. Samples: 621436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:39:31,102][02698] Avg episode reward: [(0, '20.927')] +[2025-08-01 17:39:31,970][02852] Updated weights for policy 0, policy_version 610 (0.0012) +[2025-08-01 17:39:36,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2510848. Throughput: 0: 964.1. Samples: 628156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:39:36,093][02698] Avg episode reward: [(0, '22.250')] +[2025-08-01 17:39:36,100][02835] Saving new best policy, reward=22.250! +[2025-08-01 17:39:41,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3887.7). Total num frames: 2527232. Throughput: 0: 964.3. Samples: 632696. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:39:41,093][02698] Avg episode reward: [(0, '22.538')] +[2025-08-01 17:39:41,096][02835] Saving new best policy, reward=22.538! +[2025-08-01 17:39:43,225][02852] Updated weights for policy 0, policy_version 620 (0.0029) +[2025-08-01 17:39:46,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2551808. Throughput: 0: 964.4. Samples: 635940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:39:46,094][02698] Avg episode reward: [(0, '22.108')] +[2025-08-01 17:39:51,089][02698] Fps is (10 sec: 4095.7, 60 sec: 3754.6, 300 sec: 3887.7). Total num frames: 2568192. Throughput: 0: 957.9. Samples: 642282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:39:51,098][02698] Avg episode reward: [(0, '23.506')] +[2025-08-01 17:39:51,100][02835] Saving new best policy, reward=23.506! +[2025-08-01 17:39:54,295][02852] Updated weights for policy 0, policy_version 630 (0.0014) +[2025-08-01 17:39:56,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 2584576. Throughput: 0: 966.8. Samples: 647200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:56,093][02698] Avg episode reward: [(0, '23.901')] +[2025-08-01 17:39:56,100][02835] Saving new best policy, reward=23.901! +[2025-08-01 17:40:01,088][02698] Fps is (10 sec: 4096.3, 60 sec: 3891.9, 300 sec: 3915.5). Total num frames: 2609152. Throughput: 0: 969.4. Samples: 650572. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:40:01,090][02698] Avg episode reward: [(0, '23.833')] +[2025-08-01 17:40:03,446][02852] Updated weights for policy 0, policy_version 640 (0.0022) +[2025-08-01 17:40:06,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3887.7). Total num frames: 2625536. Throughput: 0: 958.1. Samples: 656718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:40:06,092][02698] Avg episode reward: [(0, '23.579')] +[2025-08-01 17:40:06,104][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth... +[2025-08-01 17:40:06,257][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth +[2025-08-01 17:40:11,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 2637824. Throughput: 0: 940.4. Samples: 660434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:40:11,091][02698] Avg episode reward: [(0, '24.078')] +[2025-08-01 17:40:11,097][02835] Saving new best policy, reward=24.078! +[2025-08-01 17:40:16,088][02698] Fps is (10 sec: 3277.6, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 2658304. Throughput: 0: 925.2. Samples: 663068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:16,092][02698] Avg episode reward: [(0, '23.849')] +[2025-08-01 17:40:16,484][02852] Updated weights for policy 0, policy_version 650 (0.0028) +[2025-08-01 17:40:21,091][02698] Fps is (10 sec: 3685.3, 60 sec: 3686.3, 300 sec: 3846.0). Total num frames: 2674688. Throughput: 0: 903.1. Samples: 668800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:40:21,093][02698] Avg episode reward: [(0, '23.409')] +[2025-08-01 17:40:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2695168. Throughput: 0: 925.3. Samples: 674336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:40:26,089][02698] Avg episode reward: [(0, '21.979')] +[2025-08-01 17:40:27,599][02852] Updated weights for policy 0, policy_version 660 (0.0031) +[2025-08-01 17:40:31,088][02698] Fps is (10 sec: 4097.2, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 2715648. Throughput: 0: 927.8. Samples: 677692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:31,092][02698] Avg episode reward: [(0, '24.166')] +[2025-08-01 17:40:31,096][02835] Saving new best policy, reward=24.166! +[2025-08-01 17:40:36,091][02698] Fps is (10 sec: 3685.5, 60 sec: 3686.2, 300 sec: 3846.0). Total num frames: 2732032. Throughput: 0: 912.3. Samples: 683338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:36,097][02698] Avg episode reward: [(0, '25.130')] +[2025-08-01 17:40:36,112][02835] Saving new best policy, reward=25.130! +[2025-08-01 17:40:38,704][02852] Updated weights for policy 0, policy_version 670 (0.0026) +[2025-08-01 17:40:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2752512. Throughput: 0: 928.8. Samples: 688998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:40:41,091][02698] Avg episode reward: [(0, '24.134')] +[2025-08-01 17:40:46,088][02698] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 2777088. Throughput: 0: 930.2. Samples: 692430. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:46,091][02698] Avg episode reward: [(0, '24.382')] +[2025-08-01 17:40:48,100][02852] Updated weights for policy 0, policy_version 680 (0.0017) +[2025-08-01 17:40:51,089][02698] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3832.2). Total num frames: 2789376. Throughput: 0: 914.1. Samples: 697850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:51,091][02698] Avg episode reward: [(0, '24.034')] +[2025-08-01 17:40:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2813952. Throughput: 0: 962.8. Samples: 703760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:56,093][02698] Avg episode reward: [(0, '24.029')] +[2025-08-01 17:40:58,633][02852] Updated weights for policy 0, policy_version 690 (0.0031) +[2025-08-01 17:41:01,088][02698] Fps is (10 sec: 4506.2, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2834432. Throughput: 0: 979.4. Samples: 707142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:01,094][02698] Avg episode reward: [(0, '23.236')] +[2025-08-01 17:41:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 2850816. Throughput: 0: 970.7. Samples: 712480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:06,093][02698] Avg episode reward: [(0, '22.939')] +[2025-08-01 17:41:09,630][02852] Updated weights for policy 0, policy_version 700 (0.0024) +[2025-08-01 17:41:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2871296. Throughput: 0: 983.3. Samples: 718586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:11,092][02698] Avg episode reward: [(0, '23.654')] +[2025-08-01 17:41:16,091][02698] Fps is (10 sec: 4504.2, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 2895872. Throughput: 0: 985.5. Samples: 722044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:16,095][02698] Avg episode reward: [(0, '23.847')] +[2025-08-01 17:41:20,379][02852] Updated weights for policy 0, policy_version 710 (0.0017) +[2025-08-01 17:41:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 2908160. Throughput: 0: 972.9. Samples: 727116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:21,094][02698] Avg episode reward: [(0, '23.896')] +[2025-08-01 17:41:26,088][02698] Fps is (10 sec: 3687.5, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2932736. Throughput: 0: 987.7. Samples: 733444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:26,093][02698] Avg episode reward: [(0, '24.909')] +[2025-08-01 17:41:29,692][02852] Updated weights for policy 0, policy_version 720 (0.0014) +[2025-08-01 17:41:31,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2953216. Throughput: 0: 985.9. Samples: 736796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:31,092][02698] Avg episode reward: [(0, '23.671')] +[2025-08-01 17:41:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 2969600. Throughput: 0: 972.7. Samples: 741618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:41:36,089][02698] Avg episode reward: [(0, '23.621')] +[2025-08-01 17:41:40,770][02852] Updated weights for policy 0, policy_version 730 (0.0035) +[2025-08-01 17:41:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2990080. Throughput: 0: 987.3. 
Samples: 748188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-08-01 17:41:41,089][02698] Avg episode reward: [(0, '22.847')] +[2025-08-01 17:41:46,092][02698] Fps is (10 sec: 4094.2, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 3010560. Throughput: 0: 984.3. Samples: 751442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:46,094][02698] Avg episode reward: [(0, '22.065')] +[2025-08-01 17:41:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 3026944. Throughput: 0: 964.9. Samples: 755902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:41:51,093][02698] Avg episode reward: [(0, '21.646')] +[2025-08-01 17:41:51,942][02852] Updated weights for policy 0, policy_version 740 (0.0014) +[2025-08-01 17:41:56,088][02698] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3047424. Throughput: 0: 981.2. Samples: 762740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:56,089][02698] Avg episode reward: [(0, '22.730')] +[2025-08-01 17:42:01,093][02698] Fps is (10 sec: 4093.7, 60 sec: 3890.8, 300 sec: 3846.0). Total num frames: 3067904. Throughput: 0: 978.5. Samples: 766080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:01,098][02698] Avg episode reward: [(0, '23.303')] +[2025-08-01 17:42:01,795][02852] Updated weights for policy 0, policy_version 750 (0.0025) +[2025-08-01 17:42:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3084288. Throughput: 0: 966.0. Samples: 770588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:06,089][02698] Avg episode reward: [(0, '23.703')] +[2025-08-01 17:42:06,099][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth... +[2025-08-01 17:42:06,211][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth +[2025-08-01 17:42:11,088][02698] Fps is (10 sec: 4098.1, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 3108864. Throughput: 0: 972.6. Samples: 777210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:42:11,096][02698] Avg episode reward: [(0, '24.151')] +[2025-08-01 17:42:12,111][02852] Updated weights for policy 0, policy_version 760 (0.0014) +[2025-08-01 17:42:16,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 3125248. Throughput: 0: 969.2. Samples: 780408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:16,093][02698] Avg episode reward: [(0, '25.022')] +[2025-08-01 17:42:21,088][02698] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3141632. Throughput: 0: 962.0. Samples: 784906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:21,093][02698] Avg episode reward: [(0, '24.697')] +[2025-08-01 17:42:23,276][02852] Updated weights for policy 0, policy_version 770 (0.0019) +[2025-08-01 17:42:26,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3166208. Throughput: 0: 964.8. Samples: 791604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:26,089][02698] Avg episode reward: [(0, '24.642')] +[2025-08-01 17:42:31,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 3182592. Throughput: 0: 968.3. Samples: 795014. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:31,098][02698] Avg episode reward: [(0, '23.410')] +[2025-08-01 17:42:34,486][02852] Updated weights for policy 0, policy_version 780 (0.0033) +[2025-08-01 17:42:36,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3198976. Throughput: 0: 971.4. Samples: 799616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:36,093][02698] Avg episode reward: [(0, '23.647')] +[2025-08-01 17:42:41,088][02698] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3223552. Throughput: 0: 968.8. Samples: 806336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:41,093][02698] Avg episode reward: [(0, '23.133')] +[2025-08-01 17:42:43,737][02852] Updated weights for policy 0, policy_version 790 (0.0025) +[2025-08-01 17:42:46,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 3239936. Throughput: 0: 962.7. Samples: 809394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:42:46,090][02698] Avg episode reward: [(0, '22.424')] +[2025-08-01 17:42:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3260416. Throughput: 0: 967.8. Samples: 814138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:51,090][02698] Avg episode reward: [(0, '22.679')] +[2025-08-01 17:42:54,697][02852] Updated weights for policy 0, policy_version 800 (0.0012) +[2025-08-01 17:42:56,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3280896. Throughput: 0: 970.2. Samples: 820870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:56,094][02698] Avg episode reward: [(0, '23.330')] +[2025-08-01 17:43:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3818.3). Total num frames: 3297280. Throughput: 0: 961.5. Samples: 823676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:43:01,093][02698] Avg episode reward: [(0, '23.242')] +[2025-08-01 17:43:05,642][02852] Updated weights for policy 0, policy_version 810 (0.0036) +[2025-08-01 17:43:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3317760. Throughput: 0: 978.1. Samples: 828922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:43:06,089][02698] Avg episode reward: [(0, '22.768')] +[2025-08-01 17:43:11,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3846.0). Total num frames: 3338240. Throughput: 0: 976.3. Samples: 835542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:11,096][02698] Avg episode reward: [(0, '23.608')] +[2025-08-01 17:43:16,095][02698] Fps is (10 sec: 3684.0, 60 sec: 3822.5, 300 sec: 3832.1). Total num frames: 3354624. Throughput: 0: 956.0. Samples: 838038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:43:16,099][02698] Avg episode reward: [(0, '23.432')] +[2025-08-01 17:43:16,960][02852] Updated weights for policy 0, policy_version 820 (0.0018) +[2025-08-01 17:43:21,088][02698] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3375104. Throughput: 0: 970.8. Samples: 843302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:21,093][02698] Avg episode reward: [(0, '23.359')] +[2025-08-01 17:43:26,079][02852] Updated weights for policy 0, policy_version 830 (0.0014) +[2025-08-01 17:43:26,088][02698] Fps is (10 sec: 4508.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3399680. Throughput: 0: 971.8. 
Samples: 850066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:26,090][02698] Avg episode reward: [(0, '22.985')] +[2025-08-01 17:43:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 3411968. Throughput: 0: 952.8. Samples: 852270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:43:31,094][02698] Avg episode reward: [(0, '23.403')] +[2025-08-01 17:43:36,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3432448. Throughput: 0: 970.0. Samples: 857786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:43:36,108][02698] Avg episode reward: [(0, '24.934')] +[2025-08-01 17:43:38,419][02852] Updated weights for policy 0, policy_version 840 (0.0020) +[2025-08-01 17:43:41,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3444736. Throughput: 0: 919.0. Samples: 862226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:43:41,089][02698] Avg episode reward: [(0, '24.175')] +[2025-08-01 17:43:46,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3461120. Throughput: 0: 901.1. Samples: 864224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:43:46,093][02698] Avg episode reward: [(0, '24.371')] +[2025-08-01 17:43:50,617][02852] Updated weights for policy 0, policy_version 850 (0.0026) +[2025-08-01 17:43:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3481600. Throughput: 0: 914.0. Samples: 870050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:43:51,093][02698] Avg episode reward: [(0, '24.411')] +[2025-08-01 17:43:56,092][02698] Fps is (10 sec: 4094.2, 60 sec: 3686.1, 300 sec: 3818.4). Total num frames: 3502080. Throughput: 0: 912.0. Samples: 876584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:56,094][02698] Avg episode reward: [(0, '25.077')] +[2025-08-01 17:44:01,088][02698] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3518464. Throughput: 0: 900.7. Samples: 878564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:01,092][02698] Avg episode reward: [(0, '24.845')] +[2025-08-01 17:44:01,685][02852] Updated weights for policy 0, policy_version 860 (0.0023) +[2025-08-01 17:44:06,088][02698] Fps is (10 sec: 3688.1, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3538944. Throughput: 0: 915.7. Samples: 884510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:44:06,090][02698] Avg episode reward: [(0, '25.047')] +[2025-08-01 17:44:06,101][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000864_3538944.pth... +[2025-08-01 17:44:06,236][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth +[2025-08-01 17:44:11,092][02698] Fps is (10 sec: 4094.6, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 3559424. Throughput: 0: 903.1. Samples: 890710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:44:11,096][02698] Avg episode reward: [(0, '26.152')] +[2025-08-01 17:44:11,099][02835] Saving new best policy, reward=26.152! +[2025-08-01 17:44:11,927][02852] Updated weights for policy 0, policy_version 870 (0.0024) +[2025-08-01 17:44:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3686.8, 300 sec: 3804.4). Total num frames: 3575808. Throughput: 0: 896.2. Samples: 892598. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:16,092][02698] Avg episode reward: [(0, '25.547')] +[2025-08-01 17:44:21,088][02698] Fps is (10 sec: 3687.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3596288. Throughput: 0: 909.5. Samples: 898712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:44:21,092][02698] Avg episode reward: [(0, '26.016')] +[2025-08-01 17:44:22,450][02852] Updated weights for policy 0, policy_version 880 (0.0016) +[2025-08-01 17:44:26,093][02698] Fps is (10 sec: 4093.9, 60 sec: 3617.8, 300 sec: 3804.4). Total num frames: 3616768. Throughput: 0: 946.0. Samples: 904802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:26,097][02698] Avg episode reward: [(0, '26.250')] +[2025-08-01 17:44:26,104][02835] Saving new best policy, reward=26.250! +[2025-08-01 17:44:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3633152. Throughput: 0: 944.6. Samples: 906732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:31,090][02698] Avg episode reward: [(0, '26.601')] +[2025-08-01 17:44:31,092][02835] Saving new best policy, reward=26.601! +[2025-08-01 17:44:33,699][02852] Updated weights for policy 0, policy_version 890 (0.0017) +[2025-08-01 17:44:36,088][02698] Fps is (10 sec: 3688.3, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3653632. Throughput: 0: 954.8. Samples: 913014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:36,090][02698] Avg episode reward: [(0, '26.049')] +[2025-08-01 17:44:41,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3674112. Throughput: 0: 943.7. Samples: 919048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:44:41,089][02698] Avg episode reward: [(0, '24.543')] +[2025-08-01 17:44:44,778][02852] Updated weights for policy 0, policy_version 900 (0.0022) +[2025-08-01 17:44:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3690496. Throughput: 0: 943.2. Samples: 921008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:46,089][02698] Avg episode reward: [(0, '25.079')] +[2025-08-01 17:44:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3710976. Throughput: 0: 956.9. Samples: 927570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:51,092][02698] Avg episode reward: [(0, '25.591')] +[2025-08-01 17:44:54,208][02852] Updated weights for policy 0, policy_version 910 (0.0027) +[2025-08-01 17:44:56,090][02698] Fps is (10 sec: 4095.1, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 3731456. Throughput: 0: 944.9. Samples: 933228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:56,094][02698] Avg episode reward: [(0, '24.867')] +[2025-08-01 17:45:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3804.5). Total num frames: 3747840. Throughput: 0: 948.0. Samples: 935258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:45:01,092][02698] Avg episode reward: [(0, '26.617')] +[2025-08-01 17:45:01,095][02835] Saving new best policy, reward=26.617! +[2025-08-01 17:45:05,458][02852] Updated weights for policy 0, policy_version 920 (0.0026) +[2025-08-01 17:45:06,088][02698] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3768320. Throughput: 0: 956.0. Samples: 941732. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:06,092][02698] Avg episode reward: [(0, '26.247')] +[2025-08-01 17:45:11,093][02698] Fps is (10 sec: 3684.5, 60 sec: 3754.6, 300 sec: 3818.2). Total num frames: 3784704. Throughput: 0: 945.4. Samples: 947346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:45:11,099][02698] Avg episode reward: [(0, '26.608')] +[2025-08-01 17:45:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3805184. Throughput: 0: 952.1. Samples: 949576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:45:16,092][02698] Avg episode reward: [(0, '26.507')] +[2025-08-01 17:45:16,636][02852] Updated weights for policy 0, policy_version 930 (0.0027) +[2025-08-01 17:45:21,088][02698] Fps is (10 sec: 4098.1, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3825664. Throughput: 0: 961.6. Samples: 956288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:21,089][02698] Avg episode reward: [(0, '26.349')] +[2025-08-01 17:45:26,089][02698] Fps is (10 sec: 4095.4, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 3846144. Throughput: 0: 949.2. Samples: 961764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:45:26,091][02698] Avg episode reward: [(0, '25.475')] +[2025-08-01 17:45:27,659][02852] Updated weights for policy 0, policy_version 940 (0.0017) +[2025-08-01 17:45:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3862528. Throughput: 0: 962.8. Samples: 964332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:45:31,092][02698] Avg episode reward: [(0, '24.982')] +[2025-08-01 17:45:36,088][02698] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3887104. Throughput: 0: 969.5. Samples: 971198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:36,092][02698] Avg episode reward: [(0, '26.190')] +[2025-08-01 17:45:36,653][02852] Updated weights for policy 0, policy_version 950 (0.0019) +[2025-08-01 17:45:41,089][02698] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3903488. Throughput: 0: 956.7. Samples: 976276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:41,098][02698] Avg episode reward: [(0, '26.063')] +[2025-08-01 17:45:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3923968. Throughput: 0: 975.9. Samples: 979172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:45:46,093][02698] Avg episode reward: [(0, '25.614')] +[2025-08-01 17:45:47,635][02852] Updated weights for policy 0, policy_version 960 (0.0025) +[2025-08-01 17:45:51,088][02698] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3944448. Throughput: 0: 984.2. Samples: 986020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:45:51,089][02698] Avg episode reward: [(0, '25.870')] +[2025-08-01 17:45:56,089][02698] Fps is (10 sec: 3685.9, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 3960832. Throughput: 0: 966.3. Samples: 990826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:56,091][02698] Avg episode reward: [(0, '25.931')] +[2025-08-01 17:45:58,586][02852] Updated weights for policy 0, policy_version 970 (0.0029) +[2025-08-01 17:46:01,088][02698] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3981312. Throughput: 0: 986.9. Samples: 993988. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:46:01,090][02698] Avg episode reward: [(0, '25.352')] +[2025-08-01 17:46:06,091][02698] Fps is (10 sec: 4095.4, 60 sec: 3891.0, 300 sec: 3832.2). Total num frames: 4001792. Throughput: 0: 984.3. Samples: 1000586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:46:06,102][02698] Avg episode reward: [(0, '23.697')] +[2025-08-01 17:46:06,119][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000977_4001792.pth... +[2025-08-01 17:46:06,448][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth +[2025-08-01 17:46:08,589][02835] Stopping Batcher_0... +[2025-08-01 17:46:08,590][02835] Loop batcher_evt_loop terminating... +[2025-08-01 17:46:08,590][02698] Component Batcher_0 stopped! +[2025-08-01 17:46:08,597][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:08,717][02852] Weights refcount: 2 0 +[2025-08-01 17:46:08,733][02852] Stopping InferenceWorker_p0-w0... +[2025-08-01 17:46:08,734][02852] Loop inference_proc0-0_evt_loop terminating... +[2025-08-01 17:46:08,739][02698] Component InferenceWorker_p0-w0 stopped! +[2025-08-01 17:46:08,811][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000864_3538944.pth +[2025-08-01 17:46:08,845][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:09,055][02854] Stopping RolloutWorker_w1... +[2025-08-01 17:46:09,055][02698] Component RolloutWorker_w1 stopped! +[2025-08-01 17:46:09,060][02854] Loop rollout_proc1_evt_loop terminating... +[2025-08-01 17:46:09,067][02698] Component RolloutWorker_w7 stopped! +[2025-08-01 17:46:09,072][02860] Stopping RolloutWorker_w7... +[2025-08-01 17:46:09,073][02860] Loop rollout_proc7_evt_loop terminating... +[2025-08-01 17:46:09,082][02698] Component RolloutWorker_w5 stopped! +[2025-08-01 17:46:09,085][02858] Stopping RolloutWorker_w5... +[2025-08-01 17:46:09,096][02698] Component RolloutWorker_w3 stopped! +[2025-08-01 17:46:09,100][02856] Stopping RolloutWorker_w3... +[2025-08-01 17:46:09,092][02858] Loop rollout_proc5_evt_loop terminating... +[2025-08-01 17:46:09,101][02856] Loop rollout_proc3_evt_loop terminating... +[2025-08-01 17:46:09,149][02835] Stopping LearnerWorker_p0... +[2025-08-01 17:46:09,149][02835] Loop learner_proc0_evt_loop terminating... +[2025-08-01 17:46:09,149][02698] Component LearnerWorker_p0 stopped! +[2025-08-01 17:46:09,451][02857] Stopping RolloutWorker_w4... +[2025-08-01 17:46:09,452][02857] Loop rollout_proc4_evt_loop terminating... +[2025-08-01 17:46:09,452][02698] Component RolloutWorker_w4 stopped! +[2025-08-01 17:46:09,479][02698] Component RolloutWorker_w0 stopped! +[2025-08-01 17:46:09,473][02853] Stopping RolloutWorker_w0... +[2025-08-01 17:46:09,485][02859] Stopping RolloutWorker_w6... +[2025-08-01 17:46:09,485][02859] Loop rollout_proc6_evt_loop terminating... +[2025-08-01 17:46:09,488][02853] Loop rollout_proc0_evt_loop terminating... +[2025-08-01 17:46:09,487][02698] Component RolloutWorker_w6 stopped! +[2025-08-01 17:46:09,499][02855] Stopping RolloutWorker_w2... +[2025-08-01 17:46:09,500][02855] Loop rollout_proc2_evt_loop terminating... +[2025-08-01 17:46:09,503][02698] Component RolloutWorker_w2 stopped! +[2025-08-01 17:46:09,515][02698] Waiting for process learner_proc0 to stop... 
+[2025-08-01 17:46:11,423][02698] Waiting for process inference_proc0-0 to join... +[2025-08-01 17:46:11,425][02698] Waiting for process rollout_proc0 to join... +[2025-08-01 17:46:13,790][02698] Waiting for process rollout_proc1 to join... +[2025-08-01 17:46:13,791][02698] Waiting for process rollout_proc2 to join... +[2025-08-01 17:46:13,793][02698] Waiting for process rollout_proc3 to join... +[2025-08-01 17:46:13,795][02698] Waiting for process rollout_proc4 to join... +[2025-08-01 17:46:13,798][02698] Waiting for process rollout_proc5 to join... +[2025-08-01 17:46:13,799][02698] Waiting for process rollout_proc6 to join... +[2025-08-01 17:46:13,801][02698] Waiting for process rollout_proc7 to join... +[2025-08-01 17:46:13,803][02698] Batcher 0 profile tree view: +batching: 26.8052, releasing_batches: 0.0308 +[2025-08-01 17:46:13,804][02698] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0190 + wait_policy_total: 426.3653 +update_model: 8.8217 + weight_update: 0.0017 +one_step: 0.0034 + handle_policy_step: 580.7459 + deserialize: 14.2973, stack: 3.1013, obs_to_device_normalize: 121.9128, forward: 302.5888, send_messages: 27.5275 + prepare_outputs: 86.9119 + to_cpu: 52.2295 +[2025-08-01 17:46:13,806][02698] Learner 0 profile tree view: +misc: 0.0038, prepare_batch: 12.3597 +train: 73.3198 + epoch_init: 0.0065, minibatch_init: 0.0073, losses_postprocess: 0.6659, kl_divergence: 0.7058, after_optimizer: 33.3045 + calculate_losses: 26.0606 + losses_init: 0.0129, forward_head: 1.4762, bptt_initial: 17.0544, tail: 1.2080, advantages_returns: 0.2881, losses: 3.5637 + bptt: 2.1475 + bptt_forward_core: 2.0454 + update: 11.9187 + clip: 0.9734 +[2025-08-01 17:46:13,808][02698] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.2858, enqueue_policy_requests: 105.0386, env_step: 825.2810, overhead: 13.8404, complete_rollouts: 7.4670 +save_policy_outputs: 19.5860 + split_output_tensors: 7.4430 +[2025-08-01 17:46:13,809][02698] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3324, enqueue_policy_requests: 112.0578, env_step: 818.5043, overhead: 13.9942, complete_rollouts: 6.5757 +save_policy_outputs: 19.7271 + split_output_tensors: 7.4854 +[2025-08-01 17:46:13,811][02698] Loop Runner_EvtLoop terminating... +[2025-08-01 17:46:13,813][02698] Runner profile tree view: +main_loop: 1081.9796 +[2025-08-01 17:46:13,814][02698] Collected {0: 4005888}, FPS: 3702.4 +[2025-08-01 17:46:14,115][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-01 17:46:14,116][02698] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-01 17:46:14,118][02698] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-01 17:46:14,119][02698] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-01 17:46:14,120][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:46:14,122][02698] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-01 17:46:14,123][02698] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:46:14,125][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-01 17:46:14,126][02698] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
+[2025-08-01 17:46:14,127][02698] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-01 17:46:14,128][02698] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-01 17:46:14,129][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-01 17:46:14,130][02698] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-01 17:46:14,131][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-01 17:46:14,133][02698] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-01 17:46:14,163][02698] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:46:14,166][02698] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:46:14,168][02698] RunningMeanStd input shape: (1,) +[2025-08-01 17:46:14,182][02698] ConvEncoder: input_channels=3 +[2025-08-01 17:46:14,278][02698] Conv encoder output size: 512 +[2025-08-01 17:46:14,279][02698] Policy head output size: 512 +[2025-08-01 17:46:14,447][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:14,449][02698] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:14,453][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:14,454][02698] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. 
This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:14,456][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:14,457][02698] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:46,985][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-01 17:46:46,986][02698] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-01 17:46:46,988][02698] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-01 17:46:46,988][02698] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-01 17:46:46,989][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:46:46,990][02698] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-01 17:46:46,991][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
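(Editorial aside, not part of the original log.) Every "Could not load from checkpoint" failure in this log has the same cause, spelled out in the traceback: PyTorch 2.6 changed the default of `torch.load` to `weights_only=True`, and this checkpoint pickles `numpy.core.multiarray.scalar`, which is not on the default allowlist. The error text itself offers two remedies. Below is a minimal, hypothetical helper illustrating both; it is not part of Sample Factory or of this log, and it is only appropriate because the checkpoint was produced by this same training run (i.e. a trusted source).

import numpy as np
import torch
import torch.serialization

CKPT = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"

def load_trusted_checkpoint(path, device="cpu"):
    # Option (2) from the error message: allowlist the offending numpy global
    # and keep the safer weights_only=True behaviour.
    try:
        torch.serialization.add_safe_globals([np.core.multiarray.scalar])
        return torch.load(path, map_location=device, weights_only=True)
    except Exception:
        # Option (1): fall back to the pre-2.6 behaviour. This can execute
        # arbitrary pickled code, so use it only for checkpoints you created.
        return torch.load(path, map_location=device, weights_only=False)

# checkpoint_dict = load_trusted_checkpoint(CKPT, device="cuda:0")

The same two-line workaround could equally be applied at the call site named in the traceback (sample_factory/algo/learning/learner.py, line 281), since that is where `torch.load` is invoked on the checkpoint.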
+[2025-08-01 17:46:46,992][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-01 17:46:46,993][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-01 17:46:46,994][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-01 17:46:46,995][02698] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-01 17:46:46,995][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-01 17:46:46,996][02698] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-01 17:46:46,997][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-01 17:46:46,998][02698] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-01 17:46:47,029][02698] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:46:47,030][02698] RunningMeanStd input shape: (1,) +[2025-08-01 17:46:47,041][02698] ConvEncoder: input_channels=3 +[2025-08-01 17:46:47,078][02698] Conv encoder output size: 512 +[2025-08-01 17:46:47,079][02698] Policy head output size: 512 +[2025-08-01 17:46:47,099][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:47,100][02698] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:47,102][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-08-01 17:46:47,103][02698] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:47,104][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:47,106][02698] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
+[2025-08-01 17:47:45,094][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-01 17:47:45,095][02698] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-01 17:47:45,096][02698] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-01 17:47:45,097][02698] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-01 17:47:45,098][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:47:45,099][02698] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-01 17:47:45,100][02698] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:47:45,101][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-01 17:47:45,102][02698] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-01 17:47:45,103][02698] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-01 17:47:45,104][02698] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-01 17:47:45,105][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-01 17:47:45,106][02698] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-01 17:47:45,107][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-01 17:47:45,108][02698] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-01 17:47:45,153][02698] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:47:45,157][02698] RunningMeanStd input shape: (1,) +[2025-08-01 17:47:45,175][02698] ConvEncoder: input_channels=3 +[2025-08-01 17:47:45,227][02698] Conv encoder output size: 512 +[2025-08-01 17:47:45,228][02698] Policy head output size: 512 +[2025-08-01 17:47:45,253][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:47:45,255][02698] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:47:45,256][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:47:45,257][02698] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:47:45,259][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:47:45,260][02698] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:47:48,908][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-01 17:47:48,910][02698] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-01 17:47:48,911][02698] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-01 17:47:48,913][02698] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-01 17:47:48,914][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-01 17:47:48,914][02698] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-01 17:47:48,915][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-01 17:47:48,917][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-01 17:47:48,919][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-01 17:47:48,921][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-01 17:47:48,923][02698] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-01 17:47:48,925][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-01 17:47:48,925][02698] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-01 17:47:48,926][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-01 17:47:48,927][02698] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-01 17:47:48,952][02698] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:47:48,954][02698] RunningMeanStd input shape: (1,)
+[2025-08-01 17:47:48,965][02698] ConvEncoder: input_channels=3
+[2025-08-01 17:47:48,999][02698] Conv encoder output size: 512
+[2025-08-01 17:47:49,000][02698] Policy head output size: 512
+[2025-08-01 17:47:49,020][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:47:49,021][02698] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:47:49,022][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:47:49,024][02698] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:47:49,025][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:47:49,027][02698] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:31,235][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-01 17:48:31,236][02698] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-01 17:48:31,237][02698] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-01 17:48:31,238][02698] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-01 17:48:31,239][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-01 17:48:31,241][02698] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-01 17:48:31,242][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-01 17:48:31,242][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-01 17:48:31,245][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-01 17:48:31,246][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-01 17:48:31,248][02698] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-01 17:48:31,249][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-01 17:48:31,251][02698] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-01 17:48:31,253][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-01 17:48:31,253][02698] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-01 17:48:31,307][02698] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:48:31,310][02698] RunningMeanStd input shape: (1,)
+[2025-08-01 17:48:31,326][02698] ConvEncoder: input_channels=3
+[2025-08-01 17:48:31,381][02698] Conv encoder output size: 512
+[2025-08-01 17:48:31,383][02698] Policy head output size: 512
+[2025-08-01 17:48:31,411][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:31,413][02698] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:31,415][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:31,417][02698] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:31,419][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:31,421][02698] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:32,927][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-01 17:48:32,928][02698] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-01 17:48:32,930][02698] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-01 17:48:32,931][02698] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-01 17:48:32,932][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-01 17:48:32,933][02698] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-01 17:48:32,934][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-01 17:48:32,935][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-01 17:48:32,936][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-01 17:48:32,937][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-01 17:48:32,938][02698] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-01 17:48:32,939][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-01 17:48:32,940][02698] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-01 17:48:32,941][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-01 17:48:32,942][02698] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-01 17:48:32,968][02698] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:48:32,970][02698] RunningMeanStd input shape: (1,)
+[2025-08-01 17:48:32,980][02698] ConvEncoder: input_channels=3
+[2025-08-01 17:48:33,012][02698] Conv encoder output size: 512
+[2025-08-01 17:48:33,013][02698] Policy head output size: 512
+[2025-08-01 17:48:33,032][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:33,033][02698] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:33,035][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:33,036][02698] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:33,037][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:33,039][02698] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
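Note on the repeated failures above (not part of the original log output): every attempt fails for the same reason. PyTorch 2.6 changed the default of `weights_only` in `torch.load` to `True`, and this Sample Factory checkpoint pickles `numpy.core.multiarray.scalar`, which is not on the default allowlist. A minimal sketch of the workaround the error message itself suggests, assuming the checkpoint is trusted and the snippet runs in the same Python process before the enjoy/evaluation script calls `load_checkpoint`; the variable name `checkpoint_path` is illustrative, while the path and the `scalar` global are taken from the log:

    # Sketch only. Allowlist the exact global named in the WeightsUnpickler error,
    # then let Sample Factory retry the load. Do this only for checkpoints you
    # trust, e.g. ones you trained yourself.
    import torch
    import torch.serialization
    from numpy.core.multiarray import scalar  # import path mirrors the error; may warn on NumPy 2.x

    torch.serialization.add_safe_globals([scalar])

    # Or scope the allowlist to a single call with the context manager:
    checkpoint_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
    with torch.serialization.safe_globals([scalar]):
        checkpoint = torch.load(checkpoint_path, map_location="cpu")

For fully trusted files, passing `weights_only=False` to `torch.load` restores the pre-2.6 behaviour instead, at the cost of allowing arbitrary code execution during unpickling, which is the trade-off the traceback warns about.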