diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1564 @@ +[2025-08-22 18:32:41,754][19241] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... +[2025-08-22 18:32:41,871][19241] Rollout worker 0 uses device cpu +[2025-08-22 18:32:41,873][19241] Rollout worker 1 uses device cpu +[2025-08-22 18:32:41,873][19241] Rollout worker 2 uses device cpu +[2025-08-22 18:32:41,874][19241] Rollout worker 3 uses device cpu +[2025-08-22 18:32:41,875][19241] Rollout worker 4 uses device cpu +[2025-08-22 18:32:41,876][19241] Rollout worker 5 uses device cpu +[2025-08-22 18:32:41,877][19241] Rollout worker 6 uses device cpu +[2025-08-22 18:32:41,878][19241] Rollout worker 7 uses device cpu +[2025-08-22 18:32:41,945][19241] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-22 18:32:41,947][19241] InferenceWorker_p0-w0: min num requests: 2 +[2025-08-22 18:32:41,969][19241] Starting all processes... +[2025-08-22 18:32:41,971][19241] Starting process learner_proc0 +[2025-08-22 18:32:42,068][19241] Starting all processes... +[2025-08-22 18:32:42,076][19241] Starting process inference_proc0-0 +[2025-08-22 18:32:42,078][19241] Starting process rollout_proc0 +[2025-08-22 18:32:42,080][19241] Starting process rollout_proc1 +[2025-08-22 18:32:42,081][19241] Starting process rollout_proc2 +[2025-08-22 18:32:42,081][19241] Starting process rollout_proc3 +[2025-08-22 18:32:42,082][19241] Starting process rollout_proc4 +[2025-08-22 18:32:42,083][19241] Starting process rollout_proc5 +[2025-08-22 18:32:42,083][19241] Starting process rollout_proc6 +[2025-08-22 18:32:42,083][19241] Starting process rollout_proc7 +[2025-08-22 18:32:44,745][19431] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-22 18:32:44,745][19431] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-22 18:32:44,745][19433] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:44,761][19434] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:44,791][19435] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:44,796][19418] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-22 18:32:44,797][19418] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-22 18:32:44,808][19432] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:44,836][19439] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:44,853][19418] Num visible devices: 1 +[2025-08-22 18:32:44,854][19418] Starting seed is not provided +[2025-08-22 18:32:44,855][19418] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-22 18:32:44,854][19431] Num visible devices: 1 +[2025-08-22 18:32:44,855][19418] Initializing actor-critic model on device cuda:0 +[2025-08-22 18:32:44,855][19418] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 18:32:44,864][19418] RunningMeanStd input shape: (1,) +[2025-08-22 18:32:44,873][19438] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:44,882][19418] ConvEncoder: input_channels=3 +[2025-08-22 18:32:44,933][19437] Worker 4 
uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:44,954][19436] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] +[2025-08-22 18:32:45,117][19418] Conv encoder output size: 512 +[2025-08-22 18:32:45,118][19418] Policy head output size: 512 +[2025-08-22 18:32:45,175][19418] Created Actor Critic model with architecture: +[2025-08-22 18:32:45,175][19418] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-08-22 18:32:46,232][19418] Using optimizer +[2025-08-22 18:32:51,539][19418] No checkpoints found +[2025-08-22 18:32:51,539][19418] Did not load from checkpoint, starting from scratch! +[2025-08-22 18:32:51,540][19418] Initialized policy 0 weights for model version 0 +[2025-08-22 18:32:51,548][19418] LearnerWorker_p0 finished initialization! +[2025-08-22 18:32:51,548][19418] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-22 18:32:51,854][19431] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 18:32:51,858][19431] RunningMeanStd input shape: (1,) +[2025-08-22 18:32:51,879][19431] ConvEncoder: input_channels=3 +[2025-08-22 18:32:51,930][19241] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-22 18:32:51,991][19431] Conv encoder output size: 512 +[2025-08-22 18:32:51,992][19431] Policy head output size: 512 +[2025-08-22 18:32:52,042][19241] Inference worker 0-0 is ready! +[2025-08-22 18:32:52,043][19241] All inference workers are ready! Signal rollout workers to start! 
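The module tree above pins down the shape of the policy: a three-block Conv2d+ELU head, a Linear+ELU projection to 512 features, a GRU(512, 512) core, and 512->1 value and 512->5 action heads. A minimal PyTorch sketch of that structure follows; the conv kernel sizes and strides are assumptions (the log confirms only the layer types and the 512/1/5 dimensions), so treat it as orientation rather than Sample Factory's exact code.

```python
# Hedged re-creation of the printed ActorCriticSharedWeights tree.
# Kernel sizes/strides are assumptions; the log confirms only the layer
# types, the 512-d features, and the 1-d value / 5-d action heads.
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: three Conv2d+ELU blocks, as in ConvEncoderImpl above
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # mlp_layers: Linear+ELU projecting flattened conv features to 512
        self.mlp_layers = nn.Sequential(nn.Flatten(), nn.LazyLinear(512), nn.ELU())
        self.core = nn.GRU(512, 512)                             # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)                   # value head
        self.distribution_linear = nn.Linear(512, num_actions)   # action logits

    def forward(self, obs, rnn_state=None):
        # obs: (batch, 3, 72, 128), matching the RunningMeanStd input shape
        x = self.mlp_layers(self.conv_head(obs))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # sequence length 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

# quick shape check: logits (4, 5), values (4, 1)
model = ActorCriticSketch()
logits, values, h = model(torch.randn(4, 3, 72, 128))
```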
+[2025-08-22 18:32:52,119][19439] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,124][19433] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,127][19438] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,129][19437] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,143][19432] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,145][19436] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,148][19434] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,149][19435] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-22 18:32:52,605][19435] Decorrelating experience for 0 frames... +[2025-08-22 18:32:52,605][19437] Decorrelating experience for 0 frames... +[2025-08-22 18:32:52,848][19435] Decorrelating experience for 32 frames... +[2025-08-22 18:32:53,149][19437] Decorrelating experience for 32 frames... +[2025-08-22 18:32:53,283][19435] Decorrelating experience for 64 frames... +[2025-08-22 18:32:53,574][19435] Decorrelating experience for 96 frames... +[2025-08-22 18:32:56,931][19241] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-22 18:32:56,934][19241] Avg episode reward: [(0, '3.950')] +[2025-08-22 18:33:01,930][19241] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 4096. Throughput: 0: 184.4. Samples: 1844. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-08-22 18:33:01,932][19241] Avg episode reward: [(0, '4.487')] +[2025-08-22 18:33:01,938][19241] Heartbeat connected on Batcher_0 +[2025-08-22 18:33:01,960][19241] Heartbeat connected on InferenceWorker_p0-w0 +[2025-08-22 18:33:01,967][19241] Heartbeat connected on RolloutWorker_w6 +[2025-08-22 18:33:02,072][19241] Heartbeat connected on LearnerWorker_p0 +[2025-08-22 18:33:06,930][19241] Fps is (10 sec: 819.3, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8192. Throughput: 0: 147.1. Samples: 2206. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-08-22 18:33:06,932][19241] Avg episode reward: [(0, '4.709')] +[2025-08-22 18:33:11,930][19241] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 193.1. Samples: 3862. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:11,931][19241] Avg episode reward: [(0, '4.540')] +[2025-08-22 18:33:15,151][19438] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 1 +[2025-08-22 18:33:15,151][19433] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 1 +[2025-08-22 18:33:16,236][19437] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 1 +[2025-08-22 18:33:16,930][19241] Fps is (10 sec: 1638.3, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 248.9. Samples: 6222. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:16,931][19241] Avg episode reward: [(0, '4.527')] +[2025-08-22 18:33:21,930][19241] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 32768. Throughput: 0: 245.1. Samples: 7354. 
Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:21,932][19241] Avg episode reward: [(0, '4.569')] +[2025-08-22 18:33:26,644][19431] Updated weights for policy 0, policy_version 10 (0.0014) +[2025-08-22 18:33:26,930][19241] Fps is (10 sec: 1638.5, 60 sec: 1170.3, 300 sec: 1170.3). Total num frames: 40960. Throughput: 0: 276.2. Samples: 9666. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:26,931][19241] Avg episode reward: [(0, '4.551')] +[2025-08-22 18:33:31,930][19241] Fps is (10 sec: 1638.5, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 49152. Throughput: 0: 306.0. Samples: 12238. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:31,931][19241] Avg episode reward: [(0, '4.335')] +[2025-08-22 18:33:35,171][19433] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 2 +[2025-08-22 18:33:35,171][19438] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 2 +[2025-08-22 18:33:36,256][19437] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 2 +[2025-08-22 18:33:36,930][19241] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 61440. Throughput: 0: 308.0. Samples: 13858. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:36,931][19241] Avg episode reward: [(0, '4.294')] +[2025-08-22 18:33:41,930][19241] Fps is (10 sec: 1638.4, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 65536. Throughput: 0: 349.2. Samples: 15726. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:41,930][19241] Avg episode reward: [(0, '4.363')] +[2025-08-22 18:33:46,930][19241] Fps is (10 sec: 1228.8, 60 sec: 1340.5, 300 sec: 1340.5). Total num frames: 73728. Throughput: 0: 376.2. Samples: 18774. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:46,931][19241] Avg episode reward: [(0, '4.257')] +[2025-08-22 18:33:49,518][19431] Updated weights for policy 0, policy_version 20 (0.0009) +[2025-08-22 18:33:50,937][19433] Decorrelating experience for 0 frames... +[2025-08-22 18:33:51,138][19433] Decorrelating experience for 32 frames... +[2025-08-22 18:33:51,140][19437] Decorrelating experience for 64 frames... +[2025-08-22 18:33:51,339][19438] Decorrelating experience for 0 frames... +[2025-08-22 18:33:51,369][19437] Decorrelating experience for 96 frames... +[2025-08-22 18:33:51,441][19241] Heartbeat connected on RolloutWorker_w4 +[2025-08-22 18:33:51,521][19438] Decorrelating experience for 32 frames... +[2025-08-22 18:33:51,560][19433] Decorrelating experience for 64 frames... +[2025-08-22 18:33:51,788][19438] Decorrelating experience for 64 frames... +[2025-08-22 18:33:51,808][19433] Decorrelating experience for 96 frames... +[2025-08-22 18:33:51,898][19241] Heartbeat connected on RolloutWorker_w0 +[2025-08-22 18:33:51,930][19241] Fps is (10 sec: 2048.0, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 86016. Throughput: 0: 402.1. Samples: 20302. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-22 18:33:51,931][19241] Avg episode reward: [(0, '4.311')] +[2025-08-22 18:33:52,037][19438] Decorrelating experience for 96 frames... +[2025-08-22 18:33:52,117][19241] Heartbeat connected on RolloutWorker_w5 +[2025-08-22 18:33:56,930][19241] Fps is (10 sec: 4096.0, 60 sec: 1911.5, 300 sec: 1764.4). Total num frames: 114688. Throughput: 0: 506.1. Samples: 26636. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:33:56,931][19241] Avg episode reward: [(0, '4.146')] +[2025-08-22 18:33:56,937][19418] Saving new best policy, reward=4.146! +[2025-08-22 18:33:57,888][19431] Updated weights for policy 0, policy_version 30 (0.0016) +[2025-08-22 18:34:01,930][19241] Fps is (10 sec: 6144.0, 60 sec: 2389.3, 300 sec: 2106.5). Total num frames: 147456. Throughput: 0: 663.4. Samples: 36076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:01,931][19241] Avg episode reward: [(0, '4.181')] +[2025-08-22 18:34:01,934][19418] Saving new best policy, reward=4.181! +[2025-08-22 18:34:04,262][19431] Updated weights for policy 0, policy_version 40 (0.0014) +[2025-08-22 18:34:06,930][19241] Fps is (10 sec: 6553.6, 60 sec: 2867.2, 300 sec: 2403.0). Total num frames: 180224. Throughput: 0: 746.6. Samples: 40950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:06,931][19241] Avg episode reward: [(0, '4.381')] +[2025-08-22 18:34:06,939][19418] Saving new best policy, reward=4.381! +[2025-08-22 18:34:10,505][19431] Updated weights for policy 0, policy_version 50 (0.0015) +[2025-08-22 18:34:11,930][19241] Fps is (10 sec: 6553.7, 60 sec: 3276.8, 300 sec: 2662.4). Total num frames: 212992. Throughput: 0: 912.2. Samples: 50716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:11,931][19241] Avg episode reward: [(0, '4.326')] +[2025-08-22 18:34:16,930][19241] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 2650.4). Total num frames: 225280. Throughput: 0: 972.7. Samples: 56008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:16,931][19241] Avg episode reward: [(0, '4.364')] +[2025-08-22 18:34:19,343][19431] Updated weights for policy 0, policy_version 60 (0.0015) +[2025-08-22 18:34:21,930][19241] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 2912.7). Total num frames: 262144. Throughput: 0: 1053.5. Samples: 61266. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:21,931][19241] Avg episode reward: [(0, '4.287')] +[2025-08-22 18:34:25,043][19431] Updated weights for policy 0, policy_version 70 (0.0015) +[2025-08-22 18:34:26,930][19241] Fps is (10 sec: 7372.8, 60 sec: 4300.8, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 1250.9. Samples: 72018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:26,931][19241] Avg episode reward: [(0, '4.477')] +[2025-08-22 18:34:26,937][19418] Saving new best policy, reward=4.477! +[2025-08-22 18:34:30,745][19431] Updated weights for policy 0, policy_version 80 (0.0014) +[2025-08-22 18:34:31,930][19241] Fps is (10 sec: 7372.8, 60 sec: 4778.7, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 1422.5. Samples: 82788. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:31,932][19241] Avg episode reward: [(0, '4.496')] +[2025-08-22 18:34:31,933][19418] Saving new best policy, reward=4.496! +[2025-08-22 18:34:36,053][19431] Updated weights for policy 0, policy_version 90 (0.0013) +[2025-08-22 18:34:36,930][19241] Fps is (10 sec: 7372.8, 60 sec: 5188.3, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 1511.6. Samples: 88326. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:36,931][19241] Avg episode reward: [(0, '4.540')] +[2025-08-22 18:34:36,935][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000091_372736.pth... +[2025-08-22 18:34:36,987][19418] Saving new best policy, reward=4.540! 
+[2025-08-22 18:34:40,763][19431] Updated weights for policy 0, policy_version 100 (0.0011) +[2025-08-22 18:34:41,930][19241] Fps is (10 sec: 8192.0, 60 sec: 5870.9, 300 sec: 3798.1). Total num frames: 417792. Throughput: 0: 1655.8. Samples: 101146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:34:41,931][19241] Avg episode reward: [(0, '4.643')] +[2025-08-22 18:34:41,933][19418] Saving new best policy, reward=4.643! +[2025-08-22 18:34:46,112][19431] Updated weights for policy 0, policy_version 110 (0.0011) +[2025-08-22 18:34:46,930][19241] Fps is (10 sec: 8192.0, 60 sec: 6348.8, 300 sec: 3953.5). Total num frames: 454656. Throughput: 0: 1707.8. Samples: 112928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:34:46,931][19241] Avg episode reward: [(0, '4.436')] +[2025-08-22 18:34:51,930][19241] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 3925.3). Total num frames: 471040. Throughput: 0: 1670.1. Samples: 116106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:34:51,931][19241] Avg episode reward: [(0, '4.411')] +[2025-08-22 18:34:54,373][19431] Updated weights for policy 0, policy_version 120 (0.0012) +[2025-08-22 18:34:56,930][19241] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 4096.0). Total num frames: 512000. Throughput: 0: 1645.1. Samples: 124746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:34:56,931][19241] Avg episode reward: [(0, '4.502')] +[2025-08-22 18:34:59,737][19431] Updated weights for policy 0, policy_version 130 (0.0014) +[2025-08-22 18:35:01,930][19241] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 4190.5). Total num frames: 544768. Throughput: 0: 1772.1. Samples: 135754. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:35:01,931][19241] Avg episode reward: [(0, '4.493')] +[2025-08-22 18:35:05,022][19431] Updated weights for policy 0, policy_version 140 (0.0012) +[2025-08-22 18:35:06,930][19241] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 4338.7). Total num frames: 585728. Throughput: 0: 1788.3. Samples: 141738. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:35:06,931][19241] Avg episode reward: [(0, '4.574')] +[2025-08-22 18:35:10,545][19431] Updated weights for policy 0, policy_version 150 (0.0011) +[2025-08-22 18:35:11,930][19241] Fps is (10 sec: 7782.1, 60 sec: 6826.6, 300 sec: 4447.1). Total num frames: 622592. Throughput: 0: 1798.6. Samples: 152954. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:35:11,934][19241] Avg episode reward: [(0, '4.516')] +[2025-08-22 18:35:15,846][19431] Updated weights for policy 0, policy_version 160 (0.0014) +[2025-08-22 18:35:16,930][19241] Fps is (10 sec: 7782.2, 60 sec: 7304.5, 300 sec: 4576.2). Total num frames: 663552. Throughput: 0: 1817.8. Samples: 164588. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:35:16,931][19241] Avg episode reward: [(0, '4.382')] +[2025-08-22 18:35:21,215][19431] Updated weights for policy 0, policy_version 170 (0.0012) +[2025-08-22 18:35:21,930][19241] Fps is (10 sec: 7782.6, 60 sec: 7304.5, 300 sec: 4669.4). Total num frames: 700416. Throughput: 0: 1822.3. Samples: 170328. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:35:21,931][19241] Avg episode reward: [(0, '4.638')] +[2025-08-22 18:35:26,930][19241] Fps is (10 sec: 4915.3, 60 sec: 6894.9, 300 sec: 4598.1). Total num frames: 712704. Throughput: 0: 1690.9. Samples: 177236. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:35:26,931][19241] Avg episode reward: [(0, '4.473')] +[2025-08-22 18:35:29,790][19431] Updated weights for policy 0, policy_version 180 (0.0014) +[2025-08-22 18:35:31,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6894.9, 300 sec: 4684.8). Total num frames: 749568. Throughput: 0: 1643.8. Samples: 186898. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:35:31,931][19241] Avg episode reward: [(0, '4.548')] +[2025-08-22 18:35:35,325][19431] Updated weights for policy 0, policy_version 190 (0.0014) +[2025-08-22 18:35:36,930][19241] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 4766.3). Total num frames: 786432. Throughput: 0: 1699.8. Samples: 192598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:35:36,931][19241] Avg episode reward: [(0, '4.349')] +[2025-08-22 18:35:41,349][19431] Updated weights for policy 0, policy_version 200 (0.0016) +[2025-08-22 18:35:41,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 4818.8). Total num frames: 819200. Throughput: 0: 1732.5. Samples: 202710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:35:41,931][19241] Avg episode reward: [(0, '4.363')] +[2025-08-22 18:35:46,930][19241] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 4891.8). Total num frames: 856064. Throughput: 0: 1724.8. Samples: 213370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:35:46,932][19241] Avg episode reward: [(0, '4.319')] +[2025-08-22 18:35:47,162][19431] Updated weights for policy 0, policy_version 210 (0.0011) +[2025-08-22 18:35:51,930][19241] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 4960.7). Total num frames: 892928. Throughput: 0: 1708.4. Samples: 218618. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:35:51,931][19241] Avg episode reward: [(0, '4.262')] +[2025-08-22 18:35:52,430][19431] Updated weights for policy 0, policy_version 220 (0.0012) +[2025-08-22 18:35:56,930][19241] Fps is (10 sec: 8191.6, 60 sec: 7099.7, 300 sec: 5070.2). Total num frames: 937984. Throughput: 0: 1745.7. Samples: 231512. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:35:56,932][19241] Avg episode reward: [(0, '4.472')] +[2025-08-22 18:35:57,077][19431] Updated weights for policy 0, policy_version 230 (0.0013) +[2025-08-22 18:36:01,930][19241] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 5044.6). Total num frames: 958464. Throughput: 0: 1636.5. Samples: 238228. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:36:01,931][19241] Avg episode reward: [(0, '4.402')] +[2025-08-22 18:36:04,921][19431] Updated weights for policy 0, policy_version 240 (0.0010) +[2025-08-22 18:36:06,930][19241] Fps is (10 sec: 5734.7, 60 sec: 6826.7, 300 sec: 5104.2). Total num frames: 995328. Throughput: 0: 1648.4. Samples: 244508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:36:06,931][19241] Avg episode reward: [(0, '4.403')] +[2025-08-22 18:36:11,244][19431] Updated weights for policy 0, policy_version 250 (0.0011) +[2025-08-22 18:36:11,930][19241] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 5140.5). Total num frames: 1028096. Throughput: 0: 1711.6. Samples: 254260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:36:11,931][19241] Avg episode reward: [(0, '4.421')] +[2025-08-22 18:36:16,930][19241] Fps is (10 sec: 6553.4, 60 sec: 6621.8, 300 sec: 5174.9). Total num frames: 1060864. Throughput: 0: 1728.5. Samples: 264680. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:36:16,932][19241] Avg episode reward: [(0, '4.544')] +[2025-08-22 18:36:17,114][19431] Updated weights for policy 0, policy_version 260 (0.0016) +[2025-08-22 18:36:21,930][19241] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 5227.3). Total num frames: 1097728. Throughput: 0: 1716.9. Samples: 269858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:36:21,931][19241] Avg episode reward: [(0, '4.329')] +[2025-08-22 18:36:22,993][19431] Updated weights for policy 0, policy_version 270 (0.0014) +[2025-08-22 18:36:26,930][19241] Fps is (10 sec: 7782.7, 60 sec: 7099.7, 300 sec: 5296.2). Total num frames: 1138688. Throughput: 0: 1740.4. Samples: 281030. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:36:26,931][19241] Avg episode reward: [(0, '4.275')] +[2025-08-22 18:36:27,817][19431] Updated weights for policy 0, policy_version 280 (0.0013) +[2025-08-22 18:36:31,930][19241] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 5362.0). Total num frames: 1179648. Throughput: 0: 1798.5. Samples: 294300. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:36:31,931][19241] Avg episode reward: [(0, '4.298')] +[2025-08-22 18:36:32,453][19431] Updated weights for policy 0, policy_version 290 (0.0010) +[2025-08-22 18:36:36,930][19241] Fps is (10 sec: 6143.8, 60 sec: 6894.9, 300 sec: 5333.9). Total num frames: 1200128. Throughput: 0: 1788.1. Samples: 299082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:36:36,932][19241] Avg episode reward: [(0, '4.327')] +[2025-08-22 18:36:36,938][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000293_1200128.pth... +[2025-08-22 18:36:40,542][19431] Updated weights for policy 0, policy_version 300 (0.0013) +[2025-08-22 18:36:41,930][19241] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 5378.2). Total num frames: 1236992. Throughput: 0: 1671.7. Samples: 306738. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:36:41,932][19241] Avg episode reward: [(0, '4.422')] +[2025-08-22 18:36:46,509][19431] Updated weights for policy 0, policy_version 310 (0.0013) +[2025-08-22 18:36:46,930][19241] Fps is (10 sec: 6963.4, 60 sec: 6894.9, 300 sec: 5403.2). Total num frames: 1269760. Throughput: 0: 1748.7. Samples: 316922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:36:46,932][19241] Avg episode reward: [(0, '4.348')] +[2025-08-22 18:36:51,930][19241] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 5444.3). Total num frames: 1306624. Throughput: 0: 1726.5. Samples: 322202. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:36:51,931][19241] Avg episode reward: [(0, '4.340')] +[2025-08-22 18:36:52,559][19431] Updated weights for policy 0, policy_version 320 (0.0014) +[2025-08-22 18:36:56,930][19241] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 5450.2). Total num frames: 1335296. Throughput: 0: 1725.5. Samples: 331906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:36:56,933][19241] Avg episode reward: [(0, '4.194')] +[2025-08-22 18:36:58,998][19431] Updated weights for policy 0, policy_version 330 (0.0014) +[2025-08-22 18:37:01,930][19241] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 5488.6). Total num frames: 1372160. Throughput: 0: 1720.4. Samples: 342098. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:37:01,931][19241] Avg episode reward: [(0, '4.500')] +[2025-08-22 18:37:03,719][19431] Updated weights for policy 0, policy_version 340 (0.0011) +[2025-08-22 18:37:06,930][19241] Fps is (10 sec: 8191.9, 60 sec: 7031.4, 300 sec: 5557.7). Total num frames: 1417216. Throughput: 0: 1763.4. Samples: 349210. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:06,931][19241] Avg episode reward: [(0, '4.308')] +[2025-08-22 18:37:11,453][19431] Updated weights for policy 0, policy_version 350 (0.0012) +[2025-08-22 18:37:11,930][19241] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 5513.8). Total num frames: 1433600. Throughput: 0: 1725.7. Samples: 358686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:11,932][19241] Avg episode reward: [(0, '4.350')] +[2025-08-22 18:37:16,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 5533.5). Total num frames: 1466368. Throughput: 0: 1599.7. Samples: 366288. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:16,932][19241] Avg episode reward: [(0, '4.251')] +[2025-08-22 18:37:18,050][19431] Updated weights for policy 0, policy_version 360 (0.0016) +[2025-08-22 18:37:21,930][19241] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 5552.4). Total num frames: 1499136. Throughput: 0: 1583.4. Samples: 370336. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:37:21,932][19241] Avg episode reward: [(0, '4.518')] +[2025-08-22 18:37:23,805][19431] Updated weights for policy 0, policy_version 370 (0.0013) +[2025-08-22 18:37:26,930][19241] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 5585.5). Total num frames: 1536000. Throughput: 0: 1654.8. Samples: 381204. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:26,932][19241] Avg episode reward: [(0, '4.355')] +[2025-08-22 18:37:29,266][19431] Updated weights for policy 0, policy_version 380 (0.0013) +[2025-08-22 18:37:31,930][19241] Fps is (10 sec: 7782.3, 60 sec: 6621.8, 300 sec: 5632.0). Total num frames: 1576960. Throughput: 0: 1701.9. Samples: 393510. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:31,932][19241] Avg episode reward: [(0, '4.261')] +[2025-08-22 18:37:34,726][19431] Updated weights for policy 0, policy_version 390 (0.0013) +[2025-08-22 18:37:36,930][19241] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 5648.2). Total num frames: 1609728. Throughput: 0: 1701.4. Samples: 398764. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:36,932][19241] Avg episode reward: [(0, '4.196')] +[2025-08-22 18:37:40,534][19431] Updated weights for policy 0, policy_version 400 (0.0011) +[2025-08-22 18:37:41,930][19241] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 5677.9). Total num frames: 1646592. Throughput: 0: 1714.3. Samples: 409048. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:41,931][19241] Avg episode reward: [(0, '4.345')] +[2025-08-22 18:37:46,930][19241] Fps is (10 sec: 4915.1, 60 sec: 6485.3, 300 sec: 5623.3). Total num frames: 1658880. Throughput: 0: 1634.3. Samples: 415644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:46,933][19241] Avg episode reward: [(0, '4.418')] +[2025-08-22 18:37:49,506][19431] Updated weights for policy 0, policy_version 410 (0.0012) +[2025-08-22 18:37:51,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 5748.3). Total num frames: 1695744. Throughput: 0: 1561.8. Samples: 419490. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:51,931][19241] Avg episode reward: [(0, '4.696')] +[2025-08-22 18:37:51,934][19418] Saving new best policy, reward=4.696! +[2025-08-22 18:37:55,389][19431] Updated weights for policy 0, policy_version 420 (0.0016) +[2025-08-22 18:37:56,930][19241] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 5845.5). Total num frames: 1728512. Throughput: 0: 1582.9. Samples: 429916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:37:56,932][19241] Avg episode reward: [(0, '4.514')] +[2025-08-22 18:38:00,424][19431] Updated weights for policy 0, policy_version 430 (0.0014) +[2025-08-22 18:38:01,930][19241] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 5984.3). Total num frames: 1773568. Throughput: 0: 1682.7. Samples: 442010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:01,931][19241] Avg episode reward: [(0, '4.417')] +[2025-08-22 18:38:05,789][19431] Updated weights for policy 0, policy_version 440 (0.0014) +[2025-08-22 18:38:06,930][19241] Fps is (10 sec: 8192.1, 60 sec: 6553.6, 300 sec: 6081.5). Total num frames: 1810432. Throughput: 0: 1713.6. Samples: 447446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:06,931][19241] Avg episode reward: [(0, '4.464')] +[2025-08-22 18:38:11,276][19431] Updated weights for policy 0, policy_version 450 (0.0013) +[2025-08-22 18:38:11,930][19241] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6178.7). Total num frames: 1847296. Throughput: 0: 1725.7. Samples: 458862. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:11,932][19241] Avg episode reward: [(0, '4.562')] +[2025-08-22 18:38:16,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 6262.0). Total num frames: 1880064. Throughput: 0: 1686.1. Samples: 469384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:16,931][19241] Avg episode reward: [(0, '4.574')] +[2025-08-22 18:38:17,235][19431] Updated weights for policy 0, policy_version 460 (0.0013) +[2025-08-22 18:38:21,930][19241] Fps is (10 sec: 4915.4, 60 sec: 6621.9, 300 sec: 6289.8). Total num frames: 1896448. Throughput: 0: 1682.2. Samples: 474464. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:21,931][19241] Avg episode reward: [(0, '4.471')] +[2025-08-22 18:38:25,578][19431] Updated weights for policy 0, policy_version 470 (0.0012) +[2025-08-22 18:38:26,930][19241] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6387.0). Total num frames: 1933312. Throughput: 0: 1591.0. Samples: 480644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:26,931][19241] Avg episode reward: [(0, '4.313')] +[2025-08-22 18:38:30,771][19431] Updated weights for policy 0, policy_version 480 (0.0011) +[2025-08-22 18:38:31,930][19241] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 1974272. Throughput: 0: 1706.9. Samples: 492454. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:31,931][19241] Avg episode reward: [(0, '4.472')] +[2025-08-22 18:38:35,621][19431] Updated weights for policy 0, policy_version 490 (0.0012) +[2025-08-22 18:38:36,930][19241] Fps is (10 sec: 8191.8, 60 sec: 6758.4, 300 sec: 6609.1). Total num frames: 2015232. Throughput: 0: 1766.5. Samples: 498982. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:36,931][19241] Avg episode reward: [(0, '4.294')] +[2025-08-22 18:38:36,938][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth... +[2025-08-22 18:38:36,996][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000091_372736.pth +[2025-08-22 18:38:40,843][19431] Updated weights for policy 0, policy_version 500 (0.0013) +[2025-08-22 18:38:41,930][19241] Fps is (10 sec: 8192.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 2056192. Throughput: 0: 1796.0. Samples: 510734. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:38:41,931][19241] Avg episode reward: [(0, '4.569')] +[2025-08-22 18:38:46,012][19431] Updated weights for policy 0, policy_version 510 (0.0011) +[2025-08-22 18:38:46,930][19241] Fps is (10 sec: 7782.6, 60 sec: 7236.3, 300 sec: 6803.5). Total num frames: 2093056. Throughput: 0: 1792.3. Samples: 522664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:38:46,931][19241] Avg episode reward: [(0, '4.556')] +[2025-08-22 18:38:51,289][19431] Updated weights for policy 0, policy_version 520 (0.0011) +[2025-08-22 18:38:51,930][19241] Fps is (10 sec: 7782.3, 60 sec: 7304.6, 300 sec: 6845.2). Total num frames: 2134016. Throughput: 0: 1804.7. Samples: 528656. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:38:51,931][19241] Avg episode reward: [(0, '4.274')] +[2025-08-22 18:38:56,968][19241] Fps is (10 sec: 5712.5, 60 sec: 7027.0, 300 sec: 6788.8). Total num frames: 2150400. Throughput: 0: 1675.9. Samples: 534340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:38:56,969][19241] Avg episode reward: [(0, '4.471')] +[2025-08-22 18:38:59,850][19431] Updated weights for policy 0, policy_version 530 (0.0016) +[2025-08-22 18:39:01,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 2183168. Throughput: 0: 1684.6. Samples: 545190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:01,931][19241] Avg episode reward: [(0, '4.260')] +[2025-08-22 18:39:05,622][19431] Updated weights for policy 0, policy_version 540 (0.0014) +[2025-08-22 18:39:06,930][19241] Fps is (10 sec: 6990.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 2220032. Throughput: 0: 1690.9. Samples: 550554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:06,931][19241] Avg episode reward: [(0, '4.323')] +[2025-08-22 18:39:11,841][19431] Updated weights for policy 0, policy_version 550 (0.0013) +[2025-08-22 18:39:11,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 2252800. Throughput: 0: 1772.5. Samples: 560408. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:11,931][19241] Avg episode reward: [(0, '4.423')] +[2025-08-22 18:39:16,930][19241] Fps is (10 sec: 6553.4, 60 sec: 6758.3, 300 sec: 6859.1). Total num frames: 2285568. Throughput: 0: 1738.6. Samples: 570692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:16,931][19241] Avg episode reward: [(0, '4.422')] +[2025-08-22 18:39:18,200][19431] Updated weights for policy 0, policy_version 560 (0.0015) +[2025-08-22 18:39:21,930][19241] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 2314240. Throughput: 0: 1691.7. Samples: 575110. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:21,931][19241] Avg episode reward: [(0, '4.559')] +[2025-08-22 18:39:24,448][19431] Updated weights for policy 0, policy_version 570 (0.0012) +[2025-08-22 18:39:26,930][19241] Fps is (10 sec: 6553.8, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 2351104. Throughput: 0: 1651.9. Samples: 585070. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:26,931][19241] Avg episode reward: [(0, '4.338')] +[2025-08-22 18:39:32,156][19241] Fps is (10 sec: 5206.8, 60 sec: 6528.9, 300 sec: 6756.7). Total num frames: 2367488. Throughput: 0: 1502.4. Samples: 590614. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:32,157][19241] Avg episode reward: [(0, '4.384')] +[2025-08-22 18:39:32,694][19431] Updated weights for policy 0, policy_version 580 (0.0010) +[2025-08-22 18:39:36,930][19241] Fps is (10 sec: 6144.1, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 2412544. Throughput: 0: 1527.1. Samples: 597374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:39:36,930][19241] Avg episode reward: [(0, '4.455')] +[2025-08-22 18:39:37,240][19431] Updated weights for policy 0, policy_version 590 (0.0010) +[2025-08-22 18:39:41,912][19431] Updated weights for policy 0, policy_version 600 (0.0011) +[2025-08-22 18:39:41,930][19241] Fps is (10 sec: 9220.1, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 2457600. Throughput: 0: 1707.2. Samples: 611100. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:41,931][19241] Avg episode reward: [(0, '4.335')] +[2025-08-22 18:39:46,930][19241] Fps is (10 sec: 7782.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 2490368. Throughput: 0: 1706.5. Samples: 621984. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:46,931][19241] Avg episode reward: [(0, '4.447')] +[2025-08-22 18:39:47,691][19431] Updated weights for policy 0, policy_version 610 (0.0014) +[2025-08-22 18:39:51,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 2527232. Throughput: 0: 1709.8. Samples: 627496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:39:51,931][19241] Avg episode reward: [(0, '4.464')] +[2025-08-22 18:39:53,009][19431] Updated weights for policy 0, policy_version 620 (0.0010) +[2025-08-22 18:39:56,930][19241] Fps is (10 sec: 8192.1, 60 sec: 7036.0, 300 sec: 6872.9). Total num frames: 2572288. Throughput: 0: 1759.6. Samples: 639592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:39:56,931][19241] Avg episode reward: [(0, '4.204')] +[2025-08-22 18:39:57,708][19431] Updated weights for policy 0, policy_version 630 (0.0011) +[2025-08-22 18:40:01,930][19241] Fps is (10 sec: 9011.2, 60 sec: 7236.3, 300 sec: 6886.8). Total num frames: 2617344. Throughput: 0: 1824.7. Samples: 652804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:01,930][19241] Avg episode reward: [(0, '4.648')] +[2025-08-22 18:40:02,376][19431] Updated weights for policy 0, policy_version 640 (0.0012) +[2025-08-22 18:40:07,336][19241] Fps is (10 sec: 6297.6, 60 sec: 6916.4, 300 sec: 6821.9). Total num frames: 2637824. Throughput: 0: 1853.1. Samples: 659254. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:07,337][19241] Avg episode reward: [(0, '4.523')] +[2025-08-22 18:40:10,426][19431] Updated weights for policy 0, policy_version 650 (0.0013) +[2025-08-22 18:40:11,930][19241] Fps is (10 sec: 5734.4, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 2674688. Throughput: 0: 1777.3. 
Samples: 665048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:11,931][19241] Avg episode reward: [(0, '4.371')] +[2025-08-22 18:40:15,043][19431] Updated weights for policy 0, policy_version 660 (0.0013) +[2025-08-22 18:40:16,930][19241] Fps is (10 sec: 8539.1, 60 sec: 7236.3, 300 sec: 6845.2). Total num frames: 2719744. Throughput: 0: 1958.6. Samples: 678306. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:16,932][19241] Avg episode reward: [(0, '4.395')] +[2025-08-22 18:40:19,684][19431] Updated weights for policy 0, policy_version 670 (0.0010) +[2025-08-22 18:40:21,930][19241] Fps is (10 sec: 8601.6, 60 sec: 7441.1, 300 sec: 6942.4). Total num frames: 2760704. Throughput: 0: 1946.0. Samples: 684942. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:21,931][19241] Avg episode reward: [(0, '4.584')] +[2025-08-22 18:40:24,509][19431] Updated weights for policy 0, policy_version 680 (0.0011) +[2025-08-22 18:40:26,930][19241] Fps is (10 sec: 8601.6, 60 sec: 7577.6, 300 sec: 6970.1). Total num frames: 2805760. Throughput: 0: 1923.9. Samples: 697674. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:26,930][19241] Avg episode reward: [(0, '4.283')] +[2025-08-22 18:40:29,085][19431] Updated weights for policy 0, policy_version 690 (0.0010) +[2025-08-22 18:40:31,930][19241] Fps is (10 sec: 9011.2, 60 sec: 8086.0, 300 sec: 6997.9). Total num frames: 2850816. Throughput: 0: 1984.2. Samples: 711272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:31,931][19241] Avg episode reward: [(0, '4.515')] +[2025-08-22 18:40:33,663][19431] Updated weights for policy 0, policy_version 700 (0.0011) +[2025-08-22 18:40:36,930][19241] Fps is (10 sec: 9011.2, 60 sec: 8055.5, 300 sec: 7039.6). Total num frames: 2895872. Throughput: 0: 2009.0. Samples: 717900. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:36,930][19241] Avg episode reward: [(0, '4.274')] +[2025-08-22 18:40:36,937][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth... +[2025-08-22 18:40:37,037][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000293_1200128.pth +[2025-08-22 18:40:38,293][19431] Updated weights for policy 0, policy_version 710 (0.0008) +[2025-08-22 18:40:42,515][19241] Fps is (10 sec: 6191.0, 60 sec: 7572.0, 300 sec: 6970.2). Total num frames: 2916352. Throughput: 0: 1865.1. Samples: 724614. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-22 18:40:42,517][19241] Avg episode reward: [(0, '4.308')] +[2025-08-22 18:40:45,721][19431] Updated weights for policy 0, policy_version 720 (0.0012) +[2025-08-22 18:40:46,930][19241] Fps is (10 sec: 6143.8, 60 sec: 7782.4, 300 sec: 6997.9). Total num frames: 2957312. Throughput: 0: 1898.8. Samples: 738252. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-22 18:40:46,932][19241] Avg episode reward: [(0, '4.317')] +[2025-08-22 18:40:50,370][19431] Updated weights for policy 0, policy_version 730 (0.0012) +[2025-08-22 18:40:51,930][19241] Fps is (10 sec: 9136.7, 60 sec: 7918.9, 300 sec: 6997.9). Total num frames: 3002368. Throughput: 0: 1918.9. Samples: 744824. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:40:51,931][19241] Avg episode reward: [(0, '4.413')] +[2025-08-22 18:40:55,104][19431] Updated weights for policy 0, policy_version 740 (0.0014) +[2025-08-22 18:40:56,930][19241] Fps is (10 sec: 9011.5, 60 sec: 7918.9, 300 sec: 7081.2). Total num frames: 3047424. Throughput: 0: 2065.8. Samples: 758008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:40:56,931][19241] Avg episode reward: [(0, '4.207')] +[2025-08-22 18:40:59,746][19431] Updated weights for policy 0, policy_version 750 (0.0011) +[2025-08-22 18:41:01,930][19241] Fps is (10 sec: 8601.6, 60 sec: 7850.7, 300 sec: 7095.1). Total num frames: 3088384. Throughput: 0: 2066.9. Samples: 771316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:01,931][19241] Avg episode reward: [(0, '4.356')] +[2025-08-22 18:41:04,325][19431] Updated weights for policy 0, policy_version 760 (0.0010) +[2025-08-22 18:41:06,930][19241] Fps is (10 sec: 8601.5, 60 sec: 8316.6, 300 sec: 7136.8). Total num frames: 3133440. Throughput: 0: 2067.9. Samples: 777996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:06,931][19241] Avg episode reward: [(0, '4.369')] +[2025-08-22 18:41:09,322][19431] Updated weights for policy 0, policy_version 770 (0.0010) +[2025-08-22 18:41:11,930][19241] Fps is (10 sec: 8601.6, 60 sec: 8328.5, 300 sec: 7164.5). Total num frames: 3174400. Throughput: 0: 2062.7. Samples: 790494. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:11,931][19241] Avg episode reward: [(0, '4.640')] +[2025-08-22 18:41:14,024][19431] Updated weights for policy 0, policy_version 780 (0.0010) +[2025-08-22 18:41:17,703][19241] Fps is (10 sec: 6083.4, 60 sec: 7885.6, 300 sec: 7104.3). Total num frames: 3198976. Throughput: 0: 1873.7. Samples: 797038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:41:17,704][19241] Avg episode reward: [(0, '4.622')] +[2025-08-22 18:41:21,660][19431] Updated weights for policy 0, policy_version 790 (0.0013) +[2025-08-22 18:41:21,930][19241] Fps is (10 sec: 6144.0, 60 sec: 7918.9, 300 sec: 7109.0). Total num frames: 3235840. Throughput: 0: 1906.3. Samples: 803682. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:21,931][19241] Avg episode reward: [(0, '4.455')] +[2025-08-22 18:41:26,221][19431] Updated weights for policy 0, policy_version 800 (0.0011) +[2025-08-22 18:41:26,930][19241] Fps is (10 sec: 8878.1, 60 sec: 7918.9, 300 sec: 7122.9). Total num frames: 3280896. Throughput: 0: 2077.3. Samples: 816876. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:26,932][19241] Avg episode reward: [(0, '4.282')] +[2025-08-22 18:41:30,665][19431] Updated weights for policy 0, policy_version 810 (0.0010) +[2025-08-22 18:41:31,930][19241] Fps is (10 sec: 9011.2, 60 sec: 7918.9, 300 sec: 7206.2). Total num frames: 3325952. Throughput: 0: 2053.3. Samples: 830650. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:31,931][19241] Avg episode reward: [(0, '4.424')] +[2025-08-22 18:41:35,168][19431] Updated weights for policy 0, policy_version 820 (0.0011) +[2025-08-22 18:41:36,930][19241] Fps is (10 sec: 9011.3, 60 sec: 7918.9, 300 sec: 7234.0). Total num frames: 3371008. Throughput: 0: 2061.2. Samples: 837578. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:36,931][19241] Avg episode reward: [(0, '4.372')] +[2025-08-22 18:41:39,760][19431] Updated weights for policy 0, policy_version 830 (0.0009) +[2025-08-22 18:41:41,930][19241] Fps is (10 sec: 9011.1, 60 sec: 8410.6, 300 sec: 7275.6). Total num frames: 3416064. Throughput: 0: 2064.0. Samples: 850888. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:41,931][19241] Avg episode reward: [(0, '4.442')] +[2025-08-22 18:41:44,408][19431] Updated weights for policy 0, policy_version 840 (0.0013) +[2025-08-22 18:41:46,930][19241] Fps is (10 sec: 9010.9, 60 sec: 8396.8, 300 sec: 7303.4). Total num frames: 3461120. Throughput: 0: 2063.4. Samples: 864168. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:46,932][19241] Avg episode reward: [(0, '4.593')] +[2025-08-22 18:41:48,981][19431] Updated weights for policy 0, policy_version 850 (0.0012) +[2025-08-22 18:41:52,882][19241] Fps is (10 sec: 6732.0, 60 sec: 7996.9, 300 sec: 7279.9). Total num frames: 3489792. Throughput: 0: 2020.9. Samples: 870862. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:52,883][19241] Avg episode reward: [(0, '4.376')] +[2025-08-22 18:41:56,478][19431] Updated weights for policy 0, policy_version 860 (0.0009) +[2025-08-22 18:41:56,930][19241] Fps is (10 sec: 6553.9, 60 sec: 7987.2, 300 sec: 7303.4). Total num frames: 3526656. Throughput: 0: 1937.3. Samples: 877672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:41:56,931][19241] Avg episode reward: [(0, '4.438')] +[2025-08-22 18:42:01,052][19431] Updated weights for policy 0, policy_version 870 (0.0010) +[2025-08-22 18:42:01,930][19241] Fps is (10 sec: 9053.9, 60 sec: 8055.5, 300 sec: 7303.4). Total num frames: 3571712. Throughput: 0: 2129.7. Samples: 891228. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:01,931][19241] Avg episode reward: [(0, '4.632')] +[2025-08-22 18:42:05,726][19431] Updated weights for policy 0, policy_version 880 (0.0009) +[2025-08-22 18:42:06,930][19241] Fps is (10 sec: 8601.4, 60 sec: 7987.2, 300 sec: 7386.7). Total num frames: 3612672. Throughput: 0: 2093.3. Samples: 897882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:06,931][19241] Avg episode reward: [(0, '4.446')] +[2025-08-22 18:42:10,810][19431] Updated weights for policy 0, policy_version 890 (0.0011) +[2025-08-22 18:42:11,930][19241] Fps is (10 sec: 8191.9, 60 sec: 7987.2, 300 sec: 7414.5). Total num frames: 3653632. Throughput: 0: 2070.2. Samples: 910034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:11,931][19241] Avg episode reward: [(0, '4.543')] +[2025-08-22 18:42:15,380][19431] Updated weights for policy 0, policy_version 900 (0.0010) +[2025-08-22 18:42:16,930][19241] Fps is (10 sec: 8601.7, 60 sec: 8437.2, 300 sec: 7456.1). Total num frames: 3698688. Throughput: 0: 2062.8. Samples: 923476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:16,931][19241] Avg episode reward: [(0, '4.347')] +[2025-08-22 18:42:19,862][19431] Updated weights for policy 0, policy_version 910 (0.0008) +[2025-08-22 18:42:21,930][19241] Fps is (10 sec: 9011.3, 60 sec: 8465.1, 300 sec: 7483.9). Total num frames: 3743744. Throughput: 0: 2061.0. Samples: 930324. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:21,931][19241] Avg episode reward: [(0, '4.602')] +[2025-08-22 18:42:24,380][19431] Updated weights for policy 0, policy_version 920 (0.0010) +[2025-08-22 18:42:28,067][19241] Fps is (10 sec: 6619.9, 60 sec: 8039.6, 300 sec: 7413.7). Total num frames: 3772416. Throughput: 0: 2017.5. Samples: 943970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:28,068][19241] Avg episode reward: [(0, '4.474')] +[2025-08-22 18:42:31,695][19431] Updated weights for policy 0, policy_version 930 (0.0010) +[2025-08-22 18:42:31,930][19241] Fps is (10 sec: 6553.5, 60 sec: 8055.4, 300 sec: 7456.1). Total num frames: 3809280. Throughput: 0: 1931.8. Samples: 951100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:42:31,931][19241] Avg episode reward: [(0, '4.354')] +[2025-08-22 18:42:36,159][19431] Updated weights for policy 0, policy_version 940 (0.0009) +[2025-08-22 18:42:36,930][19241] Fps is (10 sec: 9243.1, 60 sec: 8055.5, 300 sec: 7483.9). Total num frames: 3854336. Throughput: 0: 1982.1. Samples: 958172. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:36,931][19241] Avg episode reward: [(0, '4.457')] +[2025-08-22 18:42:36,936][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth... +[2025-08-22 18:42:36,936][19241] Components not started: RolloutWorker_w1, RolloutWorker_w2, RolloutWorker_w3, RolloutWorker_w7, wait_time=600.0 seconds +[2025-08-22 18:42:37,014][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth +[2025-08-22 18:42:40,762][19431] Updated weights for policy 0, policy_version 950 (0.0009) +[2025-08-22 18:42:41,930][19241] Fps is (10 sec: 9011.4, 60 sec: 8055.5, 300 sec: 7595.0). Total num frames: 3899392. Throughput: 0: 2087.6. Samples: 971612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-22 18:42:41,931][19241] Avg episode reward: [(0, '4.512')] +[2025-08-22 18:42:45,221][19431] Updated weights for policy 0, policy_version 960 (0.0008) +[2025-08-22 18:42:46,930][19241] Fps is (10 sec: 9011.2, 60 sec: 8055.5, 300 sec: 7622.7). Total num frames: 3944448. Throughput: 0: 2090.0. Samples: 985278. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:46,932][19241] Avg episode reward: [(0, '4.577')] +[2025-08-22 18:42:49,574][19431] Updated weights for policy 0, policy_version 970 (0.0010) +[2025-08-22 18:42:51,930][19241] Fps is (10 sec: 9420.8, 60 sec: 8532.2, 300 sec: 7678.3). Total num frames: 3993600. Throughput: 0: 2100.2. Samples: 992390. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-22 18:42:51,931][19241] Avg episode reward: [(0, '4.402')] +[2025-08-22 18:42:53,098][19418] Stopping Batcher_0... +[2025-08-22 18:42:53,099][19418] Loop batcher_evt_loop terminating... +[2025-08-22 18:42:53,100][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 18:42:53,109][19241] Component Batcher_0 stopped! +[2025-08-22 18:42:53,113][19241] Component RolloutWorker_w1 process died already! Don't wait for it. +[2025-08-22 18:42:53,114][19241] Component RolloutWorker_w2 process died already! Don't wait for it. +[2025-08-22 18:42:53,116][19241] Component RolloutWorker_w3 process died already! Don't wait for it. 
+[2025-08-22 18:42:53,117][19241] Component RolloutWorker_w7 process died already! Don't wait for it.
+[2025-08-22 18:42:53,148][19241] Component RolloutWorker_w5 stopped!
+[2025-08-22 18:42:53,149][19438] Stopping RolloutWorker_w5...
+[2025-08-22 18:42:53,152][19438] Loop rollout_proc5_evt_loop terminating...
+[2025-08-22 18:42:53,151][19241] Component RolloutWorker_w0 stopped!
+[2025-08-22 18:42:53,151][19433] Stopping RolloutWorker_w0...
+[2025-08-22 18:42:53,153][19433] Loop rollout_proc0_evt_loop terminating...
+[2025-08-22 18:42:53,164][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth
+[2025-08-22 18:42:53,163][19241] Component RolloutWorker_w4 stopped!
+[2025-08-22 18:42:53,163][19437] Stopping RolloutWorker_w4...
+[2025-08-22 18:42:53,168][19437] Loop rollout_proc4_evt_loop terminating...
+[2025-08-22 18:42:53,168][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 18:42:53,185][19241] Component RolloutWorker_w6 stopped!
+[2025-08-22 18:42:53,186][19435] Stopping RolloutWorker_w6...
+[2025-08-22 18:42:53,190][19435] Loop rollout_proc6_evt_loop terminating...
+[2025-08-22 18:42:53,227][19418] Stopping LearnerWorker_p0...
+[2025-08-22 18:42:53,228][19418] Loop learner_proc0_evt_loop terminating...
+[2025-08-22 18:42:53,228][19241] Component LearnerWorker_p0 stopped!
+[2025-08-22 18:42:53,348][19431] Weights refcount: 2 0
+[2025-08-22 18:42:53,357][19431] Stopping InferenceWorker_p0-w0...
+[2025-08-22 18:42:53,357][19431] Loop inference_proc0-0_evt_loop terminating...
+[2025-08-22 18:42:53,357][19241] Component InferenceWorker_p0-w0 stopped!
+[2025-08-22 18:42:53,359][19241] Waiting for process learner_proc0 to stop...
+[2025-08-22 18:42:55,644][19241] Waiting for process inference_proc0-0 to join...
+[2025-08-22 18:42:55,646][19241] Waiting for process rollout_proc0 to join...
+[2025-08-22 18:42:55,647][19241] Waiting for process rollout_proc1 to join...
+[2025-08-22 18:42:55,647][19241] Waiting for process rollout_proc2 to join...
+[2025-08-22 18:42:55,648][19241] Waiting for process rollout_proc3 to join...
+[2025-08-22 18:42:55,649][19241] Waiting for process rollout_proc4 to join...
+[2025-08-22 18:42:55,650][19241] Waiting for process rollout_proc5 to join...
+[2025-08-22 18:42:55,651][19241] Waiting for process rollout_proc6 to join...
+[2025-08-22 18:42:55,651][19241] Waiting for process rollout_proc7 to join...
+[2025-08-22 18:42:55,652][19241] Batcher 0 profile tree view:
+batching: 12.5134, releasing_batches: 0.0353
+[2025-08-22 18:42:55,654][19241] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 28.2706
+update_model: 6.9900
+  weight_update: 0.0011
+one_step: 0.0029
+  handle_policy_step: 549.4462
+    deserialize: 13.7269, stack: 2.5084, obs_to_device_normalize: 133.6489, forward: 259.6892, send_messages: 28.1017
+    prepare_outputs: 97.0431
+      to_cpu: 74.6299
+[2025-08-22 18:42:55,656][19241] Learner 0 profile tree view:
+misc: 0.0062, prepare_batch: 13.1796
+train: 45.9289
+  epoch_init: 0.0050, minibatch_init: 0.0067, losses_postprocess: 0.5105, kl_divergence: 0.5937, after_optimizer: 18.0551
+  calculate_losses: 17.8366
+    losses_init: 0.0029, forward_head: 1.3506, bptt_initial: 12.2350, tail: 0.7778, advantages_returns: 0.2192, losses: 1.6308
+    bptt: 1.4168
+      bptt_forward_core: 1.3449
+  update: 8.4784
+    clip: 0.9633
+[2025-08-22 18:42:55,656][19241] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.2265, enqueue_policy_requests: 18.0249, env_step: 234.4725, overhead: 13.7474, complete_rollouts: 0.5150
+save_policy_outputs: 15.3030
+  split_output_tensors: 5.3707
+[2025-08-22 18:42:55,658][19241] Loop Runner_EvtLoop terminating...
+[2025-08-22 18:42:55,660][19241] Runner profile tree view:
+main_loop: 613.6903
+[2025-08-22 18:42:55,661][19241] Collected {0: 4005888}, FPS: 6527.5
+[2025-08-22 19:02:09,902][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:02:09,903][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:02:09,904][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:02:09,905][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:02:09,906][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:02:09,906][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:02:09,907][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:02:09,908][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:02:09,909][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:02:09,909][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:02:09,911][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:02:09,911][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:02:09,912][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:02:09,913][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:02:09,902][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:02:09,903][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:02:09,904][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:02:09,905][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:02:09,906][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:02:09,906][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:02:09,907][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:02:09,908][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:02:09,909][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:02:09,909][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:02:09,911][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:02:09,911][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:02:09,912][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:02:09,913][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:02:09,914][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:02:09,976][19241] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-08-22 19:02:09,986][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:02:09,995][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:02:10,089][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:02:10,332][19241] Conv encoder output size: 512
+[2025-08-22 19:02:10,333][19241] Policy head output size: 512
+[2025-08-22 19:02:11,755][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:11,766][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
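Every retry below fails with this same UnpicklingError, so the fix has to happen on the loading side, and the error text itself names the options. A minimal sketch of those options, assuming PyTorch >= 2.6, a NumPy that still exposes numpy.core.multiarray, and a checkpoint you trust (the path is the one from this log):

    # Sketch only: three ways to load a pre-2.6-style checkpoint under PyTorch 2.6+.
    # All three assume the checkpoint comes from a trusted source.
    import numpy as np
    import torch

    ckpt = "train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"

    # (1) Allowlist the global named in the error, process-wide.
    torch.serialization.add_safe_globals([np.core.multiarray.scalar])
    state = torch.load(ckpt, map_location="cpu")

    # (2) The same allowlist, scoped to a single load.
    with torch.serialization.safe_globals([np.core.multiarray.scalar]):
        state = torch.load(ckpt, map_location="cpu")

    # (3) Opt out of weights-only loading entirely (can execute arbitrary pickle code).
    state = torch.load(ckpt, map_location="cpu", weights_only=False)

Option (3) is what the edited learner.py call visible in the 19:10:33 tracebacks further below is attempting.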
+[2025-08-22 19:02:11,771][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:11,772][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:02:11,774][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:11,776][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:02:21,150][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:02:21,153][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:02:21,154][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:02:21,155][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:02:21,156][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:02:21,157][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:02:21,158][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:02:21,160][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:02:21,161][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:02:21,163][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:02:21,163][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:02:21,164][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:02:21,165][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:02:21,166][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:02:21,166][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:02:21,200][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:02:21,203][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:02:21,216][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:02:21,251][19241] Conv encoder output size: 512
+[2025-08-22 19:02:21,252][19241] Policy head output size: 512
+[2025-08-22 19:02:21,271][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:21,274][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:02:21,275][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:21,277][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:02:21,279][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:21,281][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:02:52,880][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:02:52,881][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:02:52,882][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:02:52,883][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:02:52,884][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:02:52,885][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:02:52,886][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-22 19:02:52,887][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:02:52,888][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-22 19:02:52,889][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-22 19:02:52,890][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:02:52,892][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:02:52,893][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:02:52,894][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:02:52,895][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:02:52,924][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:02:52,926][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:02:52,939][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:02:52,977][19241] Conv encoder output size: 512
+[2025-08-22 19:02:52,978][19241] Policy head output size: 512
+[2025-08-22 19:02:53,006][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:53,009][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:02:53,011][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:53,012][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:02:53,014][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:02:53,017][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:07:28,136][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:07:28,138][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:07:28,140][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:07:28,142][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:07:28,143][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:07:28,144][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:07:28,144][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:07:28,145][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:07:28,147][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:07:28,148][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:07:28,148][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:07:28,149][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:07:28,150][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:07:28,151][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:07:28,152][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:07:28,182][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:07:28,184][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:07:28,198][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:07:28,246][19241] Conv encoder output size: 512
+[2025-08-22 19:07:28,247][19241] Policy head output size: 512
+[2025-08-22 19:07:28,292][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:07:28,296][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.dtype])` or the `torch.serialization.safe_globals([numpy.dtype])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:07:28,298][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:07:28,299][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.dtype])` or the `torch.serialization.safe_globals([numpy.dtype])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:07:28,300][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:07:28,302][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.dtype])` or the `torch.serialization.safe_globals([numpy.dtype])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
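The failure has moved one step: after numpy.core.multiarray.scalar stopped being the blocker (presumably it was allowlisted in the session), the unpickler now rejects numpy.dtype. Allowlisting is iterative, with each attempt naming the next blocked global, so the list grows until the load succeeds. A hypothetical continuation of the earlier sketch:

    # Hypothetical: extend the allowlist as each newly reported global appears.
    import numpy as np
    import torch

    torch.serialization.add_safe_globals([np.core.multiarray.scalar, np.dtype])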
+[2025-08-22 19:08:04,895][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:08:04,897][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:08:04,898][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:08:04,898][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:08:04,899][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:08:04,900][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:08:04,900][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:08:04,901][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:08:04,902][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:08:04,902][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:08:04,903][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:08:04,904][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:08:04,904][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:08:04,905][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:08:04,907][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:08:04,928][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:08:04,930][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:08:04,939][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:08:04,976][19241] Conv encoder output size: 512
+[2025-08-22 19:08:04,979][19241] Policy head output size: 512
+[2025-08-22 19:08:05,023][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:08:05,024][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:08:05,026][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:08:05,027][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:08:05,028][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:08:05,029][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
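This third error variant is less actionable: the unpickler reports "Can only build Tensor, Parameter, OrderedDict or types allowlisted via add_safe_globals, but got" with the offending type name truncated out of the log, so it is unclear what else would need allowlisting. For a trusted checkpoint the blunt fallback is weights_only=False, and when the failing torch.load call sits inside a library (here sample_factory's learner.py), one way to apply it without editing site-packages is to rebind torch.load before the library runs. Hypothetical sketch:

    # Hypothetical workaround when the torch.load call is inside library code:
    # wrap the original loader so weights_only defaults to False.
    # Only do this for trusted checkpoints; it re-enables full pickle execution.
    import functools
    import torch

    torch.load = functools.partial(torch.load, weights_only=False)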
+[2025-08-22 19:10:32,993][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:10:32,995][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:10:32,997][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:10:32,998][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:10:32,999][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:10:33,001][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:10:33,001][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:10:33,003][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:10:33,004][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:10:33,005][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:10:33,006][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:10:33,007][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:10:33,008][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:10:33,009][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:10:33,011][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:10:33,039][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:10:33,041][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:10:33,051][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:10:33,086][19241] Conv encoder output size: 512
+[2025-08-22 19:10:33,088][19241] Policy head output size: 512
+[2025-08-22 19:10:33,116][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:10:33,120][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:10:33,123][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:10:33,125][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:10:33,126][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:10:33,128][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
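There is a puzzle in the run above: the traceback source line now reads torch.load(latest_checkpoint, map_location=device, weights_only=False), yet the weights-only error persists. One plausible explanation: the PID is 19241 throughout this log, so everything runs in a single long-lived process (a notebook kernel, by the look of the paths). Python renders tracebacks from the file currently on disk, while the process keeps executing the learner module it imported before the edit, so a change to site-packages shows up in the traceback text without taking effect. Assuming that reading, the module has to be reloaded, or the process restarted:

    # Hypothetical: pick up an on-disk edit to an already-imported module.
    # Note that reload() does not rebind names other modules imported earlier,
    # so a full process or kernel restart is the more reliable fix.
    import importlib
    import sample_factory.algo.learning.learner as learner

    importlib.reload(learner)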
+[2025-08-22 19:11:15,711][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:11:15,712][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:11:15,714][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:11:15,716][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:11:15,717][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:11:15,718][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:11:15,719][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:11:15,720][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:11:15,721][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:11:15,722][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:11:15,723][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:11:15,723][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:11:15,724][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:11:15,725][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:11:15,726][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:11:15,754][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:11:15,757][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:11:15,766][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:11:15,809][19241] Conv encoder output size: 512
+[2025-08-22 19:11:15,811][19241] Policy head output size: 512
+[2025-08-22 19:11:15,869][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:11:15,871][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:11:15,874][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:11:15,875][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:11:15,877][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:11:15,879][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:11:30,952][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-22 19:11:30,953][19241] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-22 19:11:30,954][19241] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-22 19:11:30,956][19241] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-22 19:11:30,957][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:11:30,958][19241] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-22 19:11:30,960][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-22 19:11:30,962][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-22 19:11:30,963][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-08-22 19:11:30,964][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-08-22 19:11:30,966][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-22 19:11:30,967][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-22 19:11:30,969][19241] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-22 19:11:30,970][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-22 19:11:30,972][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-22 19:11:30,998][19241] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-22 19:11:31,000][19241] RunningMeanStd input shape: (1,)
+[2025-08-22 19:11:31,008][19241] ConvEncoder: input_channels=3
+[2025-08-22 19:11:31,049][19241] Conv encoder output size: 512
+[2025-08-22 19:11:31,050][19241] Policy head output size: 512
+[2025-08-22 19:11:31,092][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:11:31,094][19241] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:11:31,095][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:11:31,096][19241] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:11:31,097][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-22 19:11:31,098][19241] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-22 19:11:54,731][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-22 19:11:54,732][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:11:54,758][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:11:54,761][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:11:54,770][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:11:54,813][19241] Conv encoder output size: 512 +[2025-08-22 19:11:54,815][19241] Policy head output size: 512 +[2025-08-22 19:11:54,857][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 19:11:56,713][19241] Num frames 100... +[2025-08-22 19:11:56,918][19241] Num frames 200... +[2025-08-22 19:11:57,168][19241] Num frames 300... +[2025-08-22 19:11:57,394][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:11:57,395][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:11:57,429][19241] Num frames 400... +[2025-08-22 19:11:57,621][19241] Num frames 500... +[2025-08-22 19:11:57,811][19241] Num frames 600... +[2025-08-22 19:11:58,008][19241] Num frames 700... +[2025-08-22 19:11:58,192][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:11:58,193][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:11:58,252][19241] Num frames 800... +[2025-08-22 19:11:58,444][19241] Num frames 900... +[2025-08-22 19:11:58,680][19241] Num frames 1000... +[2025-08-22 19:11:58,889][19241] Num frames 1100... +[2025-08-22 19:11:59,056][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:11:59,058][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:11:59,169][19241] Num frames 1200... +[2025-08-22 19:12:03,384][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-22 19:12:03,386][19241] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-22 19:12:03,387][19241] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-22 19:12:03,388][19241] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-22 19:12:03,389][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:12:03,390][19241] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-22 19:12:03,391][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:12:03,392][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-22 19:12:03,393][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-22 19:12:03,394][19241] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-22 19:12:03,395][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-22 19:12:03,395][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-22 19:12:03,396][19241] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-22 19:12:03,398][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
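+The three UnpicklingError attempts above are the PyTorch 2.6 behaviour change the message describes: torch.load now defaults to weights_only=True and rejects pickled objects outside a small allowlist, and the same checkpoint loads cleanly from 19:11:54 onward. The error text itself names both remedies; below is a minimal sketch of them, assuming the checkpoint is trusted (the path is the one from this log, and the allowlisted class is a placeholder, since the log truncates the rejected type after "but got").
+
+import torch
+
+ckpt = (
+    "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
+    "default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
+)
+
+# Remedy 1: the file was produced locally and is trusted, so opting out
+# of the safe unpickler (as the error message suggests) is reasonable.
+state = torch.load(ckpt, map_location="cpu", weights_only=False)
+
+# Remedy 2: keep weights_only=True and allowlist the rejected classes
+# instead; the concrete class is truncated in the log, so this stays a
+# commented placeholder.
+# torch.serialization.add_safe_globals([TheRejectedClass])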
+[2025-08-22 19:12:03,398][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:12:03,427][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:12:03,429][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:12:03,438][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:12:03,468][19241] Conv encoder output size: 512 +[2025-08-22 19:12:03,470][19241] Policy head output size: 512 +[2025-08-22 19:12:03,490][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 19:12:03,864][19241] Num frames 100... +[2025-08-22 19:12:04,041][19241] Num frames 200... +[2025-08-22 19:12:04,233][19241] Num frames 300... +[2025-08-22 19:12:04,413][19241] Num frames 400... +[2025-08-22 19:12:04,555][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2025-08-22 19:12:04,556][19241] Avg episode reward: 5.480, avg true_objective: 4.480 +[2025-08-22 19:12:04,658][19241] Num frames 500... +[2025-08-22 19:12:04,869][19241] Num frames 600... +[2025-08-22 19:12:05,070][19241] Num frames 700... +[2025-08-22 19:12:05,247][19241] Num frames 800... +[2025-08-22 19:12:05,369][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2025-08-22 19:12:05,370][19241] Avg episode reward: 4.660, avg true_objective: 4.160 +[2025-08-22 19:12:05,512][19241] Num frames 900... +[2025-08-22 19:12:05,686][19241] Num frames 1000... +[2025-08-22 19:12:05,866][19241] Num frames 1100... +[2025-08-22 19:12:06,064][19241] Num frames 1200... +[2025-08-22 19:12:06,154][19241] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2025-08-22 19:12:06,156][19241] Avg episode reward: 4.387, avg true_objective: 4.053 +[2025-08-22 19:12:06,324][19241] Num frames 1300... +[2025-08-22 19:12:06,483][19241] Num frames 1400... +[2025-08-22 19:12:06,674][19241] Num frames 1500... +[2025-08-22 19:12:06,899][19241] Num frames 1600... +[2025-08-22 19:12:07,068][19241] Num frames 1700... +[2025-08-22 19:12:07,253][19241] Num frames 1800... +[2025-08-22 19:12:07,354][19241] Avg episode rewards: #0: 5.560, true rewards: #0: 4.560 +[2025-08-22 19:12:07,355][19241] Avg episode reward: 5.560, avg true_objective: 4.560 +[2025-08-22 19:12:07,531][19241] Num frames 1900... +[2025-08-22 19:12:07,902][19241] Num frames 2000... +[2025-08-22 19:12:08,175][19241] Num frames 2100... +[2025-08-22 19:12:08,416][19241] Num frames 2200... +[2025-08-22 19:12:08,489][19241] Avg episode rewards: #0: 5.216, true rewards: #0: 4.416 +[2025-08-22 19:12:08,489][19241] Avg episode reward: 5.216, avg true_objective: 4.416 +[2025-08-22 19:12:08,640][19241] Num frames 2300... +[2025-08-22 19:12:08,828][19241] Num frames 2400... +[2025-08-22 19:12:09,019][19241] Num frames 2500... +[2025-08-22 19:12:09,232][19241] Avg episode rewards: #0: 4.987, true rewards: #0: 4.320 +[2025-08-22 19:12:09,234][19241] Avg episode reward: 4.987, avg true_objective: 4.320 +[2025-08-22 19:12:09,252][19241] Num frames 2600... +[2025-08-22 19:12:09,463][19241] Num frames 2700... +[2025-08-22 19:12:09,653][19241] Num frames 2800... +[2025-08-22 19:12:09,856][19241] Num frames 2900... +[2025-08-22 19:12:10,063][19241] Avg episode rewards: #0: 4.823, true rewards: #0: 4.251 +[2025-08-22 19:12:10,065][19241] Avg episode reward: 4.823, avg true_objective: 4.251 +[2025-08-22 19:12:10,115][19241] Num frames 3000... +[2025-08-22 19:12:10,313][19241] Num frames 3100... +[2025-08-22 19:12:10,519][19241] Num frames 3200... 
+[2025-08-22 19:12:10,722][19241] Num frames 3300... +[2025-08-22 19:12:10,890][19241] Avg episode rewards: #0: 4.700, true rewards: #0: 4.200 +[2025-08-22 19:12:10,891][19241] Avg episode reward: 4.700, avg true_objective: 4.200 +[2025-08-22 19:12:11,017][19241] Num frames 3400... +[2025-08-22 19:12:11,282][19241] Num frames 3500... +[2025-08-22 19:12:11,455][19241] Num frames 3600... +[2025-08-22 19:12:11,631][19241] Num frames 3700... +[2025-08-22 19:12:11,819][19241] Avg episode rewards: #0: 4.751, true rewards: #0: 4.196 +[2025-08-22 19:12:11,820][19241] Avg episode reward: 4.751, avg true_objective: 4.196 +[2025-08-22 19:12:11,874][19241] Num frames 3800... +[2025-08-22 19:12:12,061][19241] Num frames 3900... +[2025-08-22 19:12:12,251][19241] Num frames 4000... +[2025-08-22 19:12:12,445][19241] Num frames 4100... +[2025-08-22 19:12:12,644][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2025-08-22 19:12:12,646][19241] Avg episode reward: 4.660, avg true_objective: 4.160 +[2025-08-22 19:12:18,808][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-22 19:12:25,954][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-22 19:12:25,955][19241] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-22 19:12:25,956][19241] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-22 19:12:25,957][19241] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-22 19:12:25,959][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:12:25,960][19241] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-22 19:12:25,961][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-22 19:12:25,962][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-22 19:12:25,963][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-22 19:12:25,964][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-22 19:12:25,965][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-22 19:12:25,965][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-22 19:12:25,966][19241] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-22 19:12:25,967][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-22 19:12:25,968][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:12:25,984][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:12:25,986][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:12:25,997][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:12:26,036][19241] Conv encoder output size: 512 +[2025-08-22 19:12:26,038][19241] Policy head output size: 512 +[2025-08-22 19:12:26,058][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 19:12:26,470][19241] Num frames 100... 
+[2025-08-22 19:12:26,654][19241] Num frames 200... +[2025-08-22 19:12:26,837][19241] Num frames 300... +[2025-08-22 19:12:27,042][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:12:27,043][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:12:27,081][19241] Num frames 400... +[2025-08-22 19:12:27,286][19241] Num frames 500... +[2025-08-22 19:12:27,455][19241] Num frames 600... +[2025-08-22 19:12:27,634][19241] Num frames 700... +[2025-08-22 19:12:27,849][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:12:27,850][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:12:27,929][19241] Num frames 800... +[2025-08-22 19:12:28,094][19241] Num frames 900... +[2025-08-22 19:12:28,264][19241] Num frames 1000... +[2025-08-22 19:12:28,439][19241] Num frames 1100... +[2025-08-22 19:12:28,589][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:12:28,591][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:12:28,686][19241] Num frames 1200... +[2025-08-22 19:12:28,856][19241] Num frames 1300... +[2025-08-22 19:12:29,029][19241] Num frames 1400... +[2025-08-22 19:12:29,193][19241] Num frames 1500... +[2025-08-22 19:12:29,365][19241] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920 +[2025-08-22 19:12:29,366][19241] Avg episode reward: 4.170, avg true_objective: 3.920 +[2025-08-22 19:12:29,428][19241] Num frames 1600... +[2025-08-22 19:12:29,606][19241] Num frames 1700... +[2025-08-22 19:12:29,769][19241] Num frames 1800... +[2025-08-22 19:12:29,944][19241] Num frames 1900... +[2025-08-22 19:12:30,164][19241] Avg episode rewards: #0: 4.368, true rewards: #0: 3.968 +[2025-08-22 19:12:30,166][19241] Avg episode reward: 4.368, avg true_objective: 3.968 +[2025-08-22 19:12:30,207][19241] Num frames 2000... +[2025-08-22 19:12:30,532][19241] Num frames 2100... +[2025-08-22 19:12:30,774][19241] Num frames 2200... +[2025-08-22 19:12:30,991][19241] Num frames 2300... +[2025-08-22 19:12:31,184][19241] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 +[2025-08-22 19:12:31,186][19241] Avg episode reward: 4.280, avg true_objective: 3.947 +[2025-08-22 19:12:31,269][19241] Num frames 2400... +[2025-08-22 19:12:31,492][19241] Num frames 2500... +[2025-08-22 19:12:31,697][19241] Num frames 2600... +[2025-08-22 19:12:31,903][19241] Num frames 2700... +[2025-08-22 19:12:32,111][19241] Num frames 2800... +[2025-08-22 19:12:32,204][19241] Avg episode rewards: #0: 4.451, true rewards: #0: 4.023 +[2025-08-22 19:12:32,206][19241] Avg episode reward: 4.451, avg true_objective: 4.023 +[2025-08-22 19:12:32,396][19241] Num frames 2900... +[2025-08-22 19:12:32,580][19241] Num frames 3000... +[2025-08-22 19:12:32,772][19241] Num frames 3100... +[2025-08-22 19:12:32,952][19241] Num frames 3200... +[2025-08-22 19:12:33,067][19241] Avg episode rewards: #0: 4.540, true rewards: #0: 4.040 +[2025-08-22 19:12:33,069][19241] Avg episode reward: 4.540, avg true_objective: 4.040 +[2025-08-22 19:12:33,183][19241] Num frames 3300... +[2025-08-22 19:12:33,445][19241] Num frames 3400... +[2025-08-22 19:12:33,618][19241] Num frames 3500... +[2025-08-22 19:12:33,815][19241] Num frames 3600... +[2025-08-22 19:12:33,992][19241] Num frames 3700... +[2025-08-22 19:12:34,188][19241] Avg episode rewards: #0: 4.862, true rewards: #0: 4.196 +[2025-08-22 19:12:34,190][19241] Avg episode reward: 4.862, avg true_objective: 4.196 +[2025-08-22 19:12:34,258][19241] Num frames 3800... 
+[2025-08-22 19:12:34,467][19241] Num frames 3900... +[2025-08-22 19:12:34,670][19241] Num frames 4000... +[2025-08-22 19:12:34,850][19241] Num frames 4100... +[2025-08-22 19:12:35,004][19241] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160 +[2025-08-22 19:12:35,006][19241] Avg episode reward: 4.760, avg true_objective: 4.160 +[2025-08-22 19:12:40,632][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-22 19:14:00,178][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-22 19:14:00,179][19241] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-22 19:14:00,180][19241] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-22 19:14:00,181][19241] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-22 19:14:00,182][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:14:00,183][19241] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-22 19:14:00,184][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-22 19:14:00,185][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-22 19:14:00,186][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-22 19:14:00,187][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-22 19:14:00,188][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-22 19:14:00,190][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-22 19:14:00,191][19241] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-22 19:14:00,192][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-22 19:14:00,192][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:14:00,222][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:14:00,224][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:14:00,236][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:14:00,274][19241] Conv encoder output size: 512 +[2025-08-22 19:14:00,276][19241] Policy head output size: 512 +[2025-08-22 19:14:00,297][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 19:14:00,715][19241] Num frames 100... +[2025-08-22 19:14:00,890][19241] Num frames 200... +[2025-08-22 19:14:01,079][19241] Num frames 300... +[2025-08-22 19:14:01,282][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:14:01,283][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:14:01,310][19241] Num frames 400... +[2025-08-22 19:14:01,505][19241] Num frames 500... +[2025-08-22 19:14:01,744][19241] Num frames 600... +[2025-08-22 19:14:01,916][19241] Num frames 700... +[2025-08-22 19:14:02,100][19241] Num frames 800... 
+[2025-08-22 19:14:02,200][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2025-08-22 19:14:02,201][19241] Avg episode reward: 4.660, avg true_objective: 4.160 +[2025-08-22 19:14:02,318][19241] Num frames 900... +[2025-08-22 19:14:02,499][19241] Num frames 1000... +[2025-08-22 19:14:02,710][19241] Num frames 1100... +[2025-08-22 19:14:02,935][19241] Num frames 1200... +[2025-08-22 19:14:03,134][19241] Num frames 1300... +[2025-08-22 19:14:03,335][19241] Avg episode rewards: #0: 5.587, true rewards: #0: 4.587 +[2025-08-22 19:14:03,336][19241] Avg episode reward: 5.587, avg true_objective: 4.587 +[2025-08-22 19:14:03,385][19241] Num frames 1400... +[2025-08-22 19:14:03,563][19241] Num frames 1500... +[2025-08-22 19:14:03,732][19241] Num frames 1600... +[2025-08-22 19:14:03,918][19241] Num frames 1700... +[2025-08-22 19:14:04,116][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2025-08-22 19:14:04,117][19241] Avg episode reward: 5.480, avg true_objective: 4.480 +[2025-08-22 19:14:04,133][19241] Num frames 1800... +[2025-08-22 19:14:04,319][19241] Num frames 1900... +[2025-08-22 19:14:04,497][19241] Num frames 2000... +[2025-08-22 19:14:04,696][19241] Num frames 2100... +[2025-08-22 19:14:04,891][19241] Avg episode rewards: #0: 5.152, true rewards: #0: 4.352 +[2025-08-22 19:14:04,892][19241] Avg episode reward: 5.152, avg true_objective: 4.352 +[2025-08-22 19:14:04,947][19241] Num frames 2200... +[2025-08-22 19:14:08,057][19241] Num frames 2300... +[2025-08-22 19:14:08,240][19241] Num frames 2400... +[2025-08-22 19:14:08,497][19241] Num frames 2500... +[2025-08-22 19:14:08,661][19241] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267 +[2025-08-22 19:14:08,662][19241] Avg episode reward: 4.933, avg true_objective: 4.267 +[2025-08-22 19:14:08,735][19241] Num frames 2600... +[2025-08-22 19:14:08,914][19241] Num frames 2700... +[2025-08-22 19:14:09,087][19241] Num frames 2800... +[2025-08-22 19:14:09,287][19241] Num frames 2900... +[2025-08-22 19:14:09,434][19241] Avg episode rewards: #0: 4.777, true rewards: #0: 4.206 +[2025-08-22 19:14:09,435][19241] Avg episode reward: 4.777, avg true_objective: 4.206 +[2025-08-22 19:14:09,566][19241] Num frames 3000... +[2025-08-22 19:14:09,790][19241] Num frames 3100... +[2025-08-22 19:14:10,003][19241] Num frames 3200... +[2025-08-22 19:14:10,211][19241] Num frames 3300... +[2025-08-22 19:14:10,362][19241] Avg episode rewards: #0: 4.825, true rewards: #0: 4.200 +[2025-08-22 19:14:10,364][19241] Avg episode reward: 4.825, avg true_objective: 4.200 +[2025-08-22 19:14:10,434][19241] Num frames 3400... +[2025-08-22 19:14:10,623][19241] Num frames 3500... +[2025-08-22 19:14:10,854][19241] Num frames 3600... +[2025-08-22 19:14:11,040][19241] Num frames 3700... +[2025-08-22 19:14:11,169][19241] Avg episode rewards: #0: 4.716, true rewards: #0: 4.160 +[2025-08-22 19:14:11,170][19241] Avg episode reward: 4.716, avg true_objective: 4.160 +[2025-08-22 19:14:11,272][19241] Num frames 3800... +[2025-08-22 19:14:11,460][19241] Num frames 3900... +[2025-08-22 19:14:11,633][19241] Num frames 4000... +[2025-08-22 19:14:11,804][19241] Num frames 4100... +[2025-08-22 19:14:11,915][19241] Avg episode rewards: #0: 4.628, true rewards: #0: 4.128 +[2025-08-22 19:14:11,916][19241] Avg episode reward: 4.628, avg true_objective: 4.128 +[2025-08-22 19:14:17,610][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! 
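+Each "Avg episode rewards" entry is a running mean over the episodes finished so far, and "true rewards"/"avg true_objective" track the unshaped objective the same way. A small self-contained sketch of that bookkeeping (names are illustrative, not Sample Factory internals), reproducing the first two entries of the 19:14 run above:
+
+episode_rewards: list[float] = []
+true_objectives: list[float] = []
+
+def on_episode_end(reward: float, true_objective: float) -> None:
+    # Record one finished episode and print the running means in the
+    # same format as the log lines above.
+    episode_rewards.append(reward)
+    true_objectives.append(true_objective)
+    avg_r = sum(episode_rewards) / len(episode_rewards)
+    avg_t = sum(true_objectives) / len(true_objectives)
+    print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")
+
+on_episode_end(3.840, 3.840)  # first episode of the 19:14 run
+on_episode_end(5.480, 4.480)  # second episode -> (3.840 + 5.480) / 2 = 4.660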
+[2025-08-22 19:19:03,331][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-22 19:19:03,333][19241] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-22 19:19:03,335][19241] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-22 19:19:03,336][19241] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-22 19:19:03,337][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:19:03,338][19241] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-22 19:19:03,339][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-22 19:19:03,340][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-22 19:19:03,341][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-22 19:19:03,343][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-22 19:19:03,344][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-22 19:19:03,344][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-22 19:19:03,345][19241] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-22 19:19:03,346][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-22 19:19:03,347][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:19:03,386][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:19:03,389][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:19:03,404][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:19:03,466][19241] Conv encoder output size: 512 +[2025-08-22 19:19:03,472][19241] Policy head output size: 512 +[2025-08-22 19:19:03,511][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 19:19:03,994][19241] Num frames 100... +[2025-08-22 19:19:04,180][19241] Num frames 200... +[2025-08-22 19:19:04,358][19241] Num frames 300... +[2025-08-22 19:19:04,532][19241] Num frames 400... +[2025-08-22 19:19:04,675][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2025-08-22 19:19:04,676][19241] Avg episode reward: 5.480, avg true_objective: 4.480 +[2025-08-22 19:19:04,778][19241] Num frames 500... +[2025-08-22 19:19:04,948][19241] Num frames 600... +[2025-08-22 19:19:05,126][19241] Num frames 700... +[2025-08-22 19:19:05,229][19241] Avg episode rewards: #0: 4.110, true rewards: #0: 3.610 +[2025-08-22 19:19:05,230][19241] Avg episode reward: 4.110, avg true_objective: 3.610 +[2025-08-22 19:19:05,368][19241] Num frames 800... +[2025-08-22 19:19:05,555][19241] Num frames 900... +[2025-08-22 19:19:05,749][19241] Avg episode rewards: #0: 3.593, true rewards: #0: 3.260 +[2025-08-22 19:19:05,751][19241] Avg episode reward: 3.593, avg true_objective: 3.260 +[2025-08-22 19:19:05,798][19241] Num frames 1000... +[2025-08-22 19:19:05,987][19241] Num frames 1100... +[2025-08-22 19:19:06,186][19241] Num frames 1200... +[2025-08-22 19:19:06,368][19241] Num frames 1300... 
+[2025-08-22 19:19:06,540][19241] Avg episode rewards: #0: 3.655, true rewards: #0: 3.405 +[2025-08-22 19:19:06,542][19241] Avg episode reward: 3.655, avg true_objective: 3.405 +[2025-08-22 19:19:06,622][19241] Num frames 1400... +[2025-08-22 19:19:06,884][19241] Num frames 1500... +[2025-08-22 19:19:07,055][19241] Num frames 1600... +[2025-08-22 19:19:07,263][19241] Num frames 1700... +[2025-08-22 19:19:07,402][19241] Avg episode rewards: #0: 3.692, true rewards: #0: 3.492 +[2025-08-22 19:19:07,404][19241] Avg episode reward: 3.692, avg true_objective: 3.492 +[2025-08-22 19:19:07,504][19241] Num frames 1800... +[2025-08-22 19:19:07,721][19241] Num frames 1900... +[2025-08-22 19:19:07,954][19241] Num frames 2000... +[2025-08-22 19:19:08,185][19241] Num frames 2100... +[2025-08-22 19:19:08,308][19241] Avg episode rewards: #0: 3.717, true rewards: #0: 3.550 +[2025-08-22 19:19:08,309][19241] Avg episode reward: 3.717, avg true_objective: 3.550 +[2025-08-22 19:19:08,473][19241] Num frames 2200... +[2025-08-22 19:19:08,724][19241] Num frames 2300... +[2025-08-22 19:19:08,927][19241] Num frames 2400... +[2025-08-22 19:19:09,102][19241] Num frames 2500... +[2025-08-22 19:19:09,187][19241] Avg episode rewards: #0: 3.734, true rewards: #0: 3.591 +[2025-08-22 19:19:09,189][19241] Avg episode reward: 3.734, avg true_objective: 3.591 +[2025-08-22 19:19:09,358][19241] Num frames 2600... +[2025-08-22 19:19:09,557][19241] Num frames 2700... +[2025-08-22 19:19:09,752][19241] Num frames 2800... +[2025-08-22 19:19:09,994][19241] Num frames 2900... +[2025-08-22 19:19:10,157][19241] Avg episode rewards: #0: 3.953, true rewards: #0: 3.702 +[2025-08-22 19:19:10,159][19241] Avg episode reward: 3.953, avg true_objective: 3.702 +[2025-08-22 19:19:10,238][19241] Num frames 3000... +[2025-08-22 19:19:10,407][19241] Num frames 3100... +[2025-08-22 19:19:10,587][19241] Num frames 3200... +[2025-08-22 19:19:10,763][19241] Num frames 3300... +[2025-08-22 19:19:10,899][19241] Avg episode rewards: #0: 3.940, true rewards: #0: 3.718 +[2025-08-22 19:19:10,900][19241] Avg episode reward: 3.940, avg true_objective: 3.718 +[2025-08-22 19:19:10,996][19241] Num frames 3400... +[2025-08-22 19:19:11,168][19241] Num frames 3500... +[2025-08-22 19:19:11,356][19241] Num frames 3600... +[2025-08-22 19:19:11,532][19241] Num frames 3700... +[2025-08-22 19:19:11,639][19241] Avg episode rewards: #0: 3.930, true rewards: #0: 3.730 +[2025-08-22 19:19:11,640][19241] Avg episode reward: 3.930, avg true_objective: 3.730 +[2025-08-22 19:19:16,692][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-22 19:21:49,310][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-22 19:21:49,312][19241] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-22 19:21:49,313][19241] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-22 19:21:49,314][19241] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-22 19:21:49,315][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:21:49,316][19241] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-22 19:21:49,317][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
+[2025-08-22 19:21:49,318][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-22 19:21:49,319][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-22 19:21:49,320][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-22 19:21:49,321][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-22 19:21:49,322][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-22 19:21:49,323][19241] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-22 19:21:49,323][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-22 19:21:49,324][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:21:49,351][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:21:49,352][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:21:49,365][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:21:49,403][19241] Conv encoder output size: 512 +[2025-08-22 19:21:49,405][19241] Policy head output size: 512 +[2025-08-22 19:21:49,448][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 19:21:50,038][19241] Num frames 100... +[2025-08-22 19:21:50,278][19241] Num frames 200... +[2025-08-22 19:21:50,493][19241] Num frames 300... +[2025-08-22 19:21:50,738][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:21:50,740][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:21:50,773][19241] Num frames 400... +[2025-08-22 19:21:50,956][19241] Num frames 500... +[2025-08-22 19:21:51,161][19241] Num frames 600... +[2025-08-22 19:21:51,353][19241] Num frames 700... +[2025-08-22 19:21:51,533][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-22 19:21:51,534][19241] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-22 19:21:51,635][19241] Num frames 800... +[2025-08-22 19:21:51,846][19241] Num frames 900... +[2025-08-22 19:21:52,030][19241] Num frames 1000... +[2025-08-22 19:21:52,259][19241] Num frames 1100... +[2025-08-22 19:21:52,458][19241] Num frames 1200... +[2025-08-22 19:21:52,644][19241] Num frames 1300... +[2025-08-22 19:21:52,818][19241] Num frames 1400... +[2025-08-22 19:21:52,884][19241] Avg episode rewards: #0: 6.027, true rewards: #0: 4.693 +[2025-08-22 19:21:52,885][19241] Avg episode reward: 6.027, avg true_objective: 4.693 +[2025-08-22 19:21:53,050][19241] Num frames 1500... +[2025-08-22 19:21:53,243][19241] Num frames 1600... +[2025-08-22 19:21:53,425][19241] Num frames 1700... +[2025-08-22 19:21:53,633][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480 +[2025-08-22 19:21:53,634][19241] Avg episode reward: 5.480, avg true_objective: 4.480 +[2025-08-22 19:21:53,647][19241] Num frames 1800... +[2025-08-22 19:21:53,811][19241] Num frames 1900... +[2025-08-22 19:21:53,992][19241] Num frames 2000... +[2025-08-22 19:21:54,174][19241] Num frames 2100... +[2025-08-22 19:21:54,361][19241] Avg episode rewards: #0: 5.152, true rewards: #0: 4.352 +[2025-08-22 19:21:54,363][19241] Avg episode reward: 5.152, avg true_objective: 4.352 +[2025-08-22 19:21:54,413][19241] Num frames 2200... 
+[2025-08-22 19:21:54,617][19241] Num frames 2300... +[2025-08-22 19:21:54,816][19241] Num frames 2400... +[2025-08-22 19:21:55,001][19241] Num frames 2500... +[2025-08-22 19:21:55,165][19241] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267 +[2025-08-22 19:21:55,167][19241] Avg episode reward: 4.933, avg true_objective: 4.267 +[2025-08-22 19:21:55,235][19241] Num frames 2600... +[2025-08-22 19:21:55,411][19241] Num frames 2700... +[2025-08-22 19:21:55,587][19241] Num frames 2800... +[2025-08-22 19:21:55,744][19241] Num frames 2900... +[2025-08-22 19:21:55,874][19241] Avg episode rewards: #0: 4.777, true rewards: #0: 4.206 +[2025-08-22 19:21:55,876][19241] Avg episode reward: 4.777, avg true_objective: 4.206 +[2025-08-22 19:21:55,983][19241] Num frames 3000... +[2025-08-22 19:21:56,151][19241] Num frames 3100... +[2025-08-22 19:21:56,310][19241] Num frames 3200... +[2025-08-22 19:21:56,483][19241] Num frames 3300... +[2025-08-22 19:21:56,591][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2025-08-22 19:21:56,592][19241] Avg episode reward: 4.660, avg true_objective: 4.160 +[2025-08-22 19:21:56,727][19241] Num frames 3400... +[2025-08-22 19:21:56,922][19241] Num frames 3500... +[2025-08-22 19:21:57,130][19241] Num frames 3600... +[2025-08-22 19:21:57,313][19241] Num frames 3700... +[2025-08-22 19:21:57,406][19241] Avg episode rewards: #0: 4.569, true rewards: #0: 4.124 +[2025-08-22 19:21:57,408][19241] Avg episode reward: 4.569, avg true_objective: 4.124 +[2025-08-22 19:21:57,560][19241] Num frames 3800... +[2025-08-22 19:21:57,753][19241] Num frames 3900... +[2025-08-22 19:21:57,914][19241] Num frames 4000... +[2025-08-22 19:21:58,136][19241] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096 +[2025-08-22 19:21:58,137][19241] Avg episode reward: 4.496, avg true_objective: 4.096 +[2025-08-22 19:22:03,544][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-22 19:24:35,459][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-22 19:24:35,461][19241] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-22 19:24:35,462][19241] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-22 19:24:35,463][19241] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-22 19:24:35,464][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:24:35,465][19241] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-22 19:24:35,465][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-22 19:24:35,466][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-22 19:24:35,467][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-22 19:24:35,468][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-22 19:24:35,469][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-22 19:24:35,470][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! 
+[2025-08-22 19:24:35,470][19241] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-22 19:24:35,471][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-22 19:24:35,472][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:24:35,498][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:24:35,500][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:24:35,510][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:24:35,549][19241] Conv encoder output size: 512 +[2025-08-22 19:24:35,551][19241] Policy head output size: 512 +[2025-08-22 19:24:35,572][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-22 19:24:36,013][19241] Num frames 100... +[2025-08-22 19:24:36,223][19241] Num frames 200... +[2025-08-22 19:24:36,427][19241] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 +[2025-08-22 19:24:36,429][19241] Avg episode reward: 2.560, avg true_objective: 2.560 +[2025-08-22 19:24:36,537][19241] Num frames 300... +[2025-08-22 19:24:36,765][19241] Num frames 400... +[2025-08-22 19:24:36,974][19241] Num frames 500... +[2025-08-22 19:24:37,139][19241] Num frames 600... +[2025-08-22 19:24:37,316][19241] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360 +[2025-08-22 19:24:37,317][19241] Avg episode reward: 3.860, avg true_objective: 3.360 +[2025-08-22 19:24:37,377][19241] Num frames 700... +[2025-08-22 19:24:37,557][19241] Num frames 800... +[2025-08-22 19:24:37,729][19241] Num frames 900... +[2025-08-22 19:24:37,901][19241] Num frames 1000... +[2025-08-22 19:24:37,994][19241] Avg episode rewards: #0: 4.080, true rewards: #0: 3.413 +[2025-08-22 19:24:37,996][19241] Avg episode reward: 4.080, avg true_objective: 3.413 +[2025-08-22 19:24:38,132][19241] Num frames 1100... +[2025-08-22 19:24:41,253][19241] Num frames 1200... +[2025-08-22 19:24:41,432][19241] Num frames 1300... +[2025-08-22 19:24:41,648][19241] Num frames 1400... +[2025-08-22 19:24:41,725][19241] Avg episode rewards: #0: 4.020, true rewards: #0: 3.520 +[2025-08-22 19:24:41,727][19241] Avg episode reward: 4.020, avg true_objective: 3.520 +[2025-08-22 19:24:41,900][19241] Num frames 1500... +[2025-08-22 19:24:42,074][19241] Num frames 1600... +[2025-08-22 19:24:42,252][19241] Num frames 1700... +[2025-08-22 19:24:42,456][19241] Avg episode rewards: #0: 3.984, true rewards: #0: 3.584 +[2025-08-22 19:24:42,457][19241] Avg episode reward: 3.984, avg true_objective: 3.584 +[2025-08-22 19:24:42,474][19241] Num frames 1800... +[2025-08-22 19:24:42,638][19241] Num frames 1900... +[2025-08-22 19:24:42,804][19241] Num frames 2000... +[2025-08-22 19:24:42,970][19241] Num frames 2100... +[2025-08-22 19:24:43,150][19241] Num frames 2200... +[2025-08-22 19:24:43,275][19241] Avg episode rewards: #0: 4.233, true rewards: #0: 3.733 +[2025-08-22 19:24:43,277][19241] Avg episode reward: 4.233, avg true_objective: 3.733 +[2025-08-22 19:24:43,391][19241] Num frames 2300... +[2025-08-22 19:24:43,544][19241] Num frames 2400... +[2025-08-22 19:24:43,700][19241] Num frames 2500... +[2025-08-22 19:24:43,878][19241] Num frames 2600... +[2025-08-22 19:24:44,040][19241] Avg episode rewards: #0: 4.366, true rewards: #0: 3.794 +[2025-08-22 19:24:44,041][19241] Avg episode reward: 4.366, avg true_objective: 3.794 +[2025-08-22 19:24:44,124][19241] Num frames 2700... 
+[2025-08-22 19:24:44,301][19241] Num frames 2800... +[2025-08-22 19:24:44,479][19241] Num frames 2900... +[2025-08-22 19:24:44,660][19241] Num frames 3000... +[2025-08-22 19:24:44,835][19241] Num frames 3100... +[2025-08-22 19:24:45,016][19241] Avg episode rewards: #0: 4.710, true rewards: #0: 3.960 +[2025-08-22 19:24:45,018][19241] Avg episode reward: 4.710, avg true_objective: 3.960 +[2025-08-22 19:24:45,079][19241] Num frames 3200... +[2025-08-22 19:24:45,255][19241] Num frames 3300... +[2025-08-22 19:24:45,454][19241] Num frames 3400... +[2025-08-22 19:24:45,629][19241] Num frames 3500... +[2025-08-22 19:24:45,778][19241] Avg episode rewards: #0: 4.613, true rewards: #0: 3.947 +[2025-08-22 19:24:45,779][19241] Avg episode reward: 4.613, avg true_objective: 3.947 +[2025-08-22 19:24:45,871][19241] Num frames 3600... +[2025-08-22 19:24:46,116][19241] Num frames 3700... +[2025-08-22 19:24:46,281][19241] Num frames 3800... +[2025-08-22 19:24:46,451][19241] Num frames 3900... +[2025-08-22 19:24:46,564][19241] Avg episode rewards: #0: 4.536, true rewards: #0: 3.936 +[2025-08-22 19:24:46,565][19241] Avg episode reward: 4.536, avg true_objective: 3.936 +[2025-08-22 19:24:51,963][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
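+The 19:12:25, 19:14:00, 19:19:03, 19:21:49 and 19:24:35 blocks are repeated runs of the same evaluate-record-push flow against checkpoint_000000978_4005888.pth. A sketch of how such a run can be launched from the unit 8 notebook: the flags are the ones recorded as overrides and new arguments in this log, while parse_vizdoom_cfg stands in for the notebook's config helper and the env name is inferred from the repository id, so both of those are assumptions.
+
+from sample_factory.enjoy import enjoy  # Sample Factory's evaluation entry point
+
+# parse_vizdoom_cfg: hypothetical stand-in for the course notebook's helper
+# that registers the VizDoom envs and parses a Sample Factory config.
+cfg = parse_vizdoom_cfg(
+    argv=[
+        "--env=doom_health_gathering_supreme",  # assumed from the repo id
+        "--num_workers=1",                      # the command-line override above
+        "--no_render",
+        "--save_video",
+        "--max_num_frames=100000",
+        "--max_num_episodes=10",
+        "--push_to_hub",
+        "--hf_repository=turbo-maikol/rl_course_vizdoom_health_gathering_supreme",
+    ],
+    evaluation=True,
+)
+enjoy(cfg)  # plays 10 episodes, saves replay.mp4, then uploads the experiment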