diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1156 @@ +[2025-08-03 20:03:28,074][02632] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-08-03 20:03:28,076][02632] Rollout worker 0 uses device cpu +[2025-08-03 20:03:28,078][02632] Rollout worker 1 uses device cpu +[2025-08-03 20:03:28,079][02632] Rollout worker 2 uses device cpu +[2025-08-03 20:03:28,080][02632] Rollout worker 3 uses device cpu +[2025-08-03 20:03:28,081][02632] Rollout worker 4 uses device cpu +[2025-08-03 20:03:28,082][02632] Rollout worker 5 uses device cpu +[2025-08-03 20:03:28,083][02632] Rollout worker 6 uses device cpu +[2025-08-03 20:03:28,084][02632] Rollout worker 7 uses device cpu +[2025-08-03 20:03:28,232][02632] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-03 20:03:28,233][02632] InferenceWorker_p0-w0: min num requests: 2 +[2025-08-03 20:03:28,263][02632] Starting all processes... +[2025-08-03 20:03:28,264][02632] Starting process learner_proc0 +[2025-08-03 20:03:28,316][02632] Starting all processes... +[2025-08-03 20:03:28,327][02632] Starting process inference_proc0-0 +[2025-08-03 20:03:28,327][02632] Starting process rollout_proc0 +[2025-08-03 20:03:28,328][02632] Starting process rollout_proc1 +[2025-08-03 20:03:28,329][02632] Starting process rollout_proc2 +[2025-08-03 20:03:28,330][02632] Starting process rollout_proc3 +[2025-08-03 20:03:28,330][02632] Starting process rollout_proc4 +[2025-08-03 20:03:28,330][02632] Starting process rollout_proc5 +[2025-08-03 20:03:28,330][02632] Starting process rollout_proc6 +[2025-08-03 20:03:28,330][02632] Starting process rollout_proc7 +[2025-08-03 20:03:45,992][02886] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-03 20:03:45,993][02886] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-03 20:03:46,089][02886] Num visible devices: 1 +[2025-08-03 20:03:46,128][02886] Starting seed is not provided +[2025-08-03 20:03:46,129][02886] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-03 20:03:46,129][02886] Initializing actor-critic model on device cuda:0 +[2025-08-03 20:03:46,130][02886] RunningMeanStd input shape: (3, 72, 128) +[2025-08-03 20:03:46,138][02886] RunningMeanStd input shape: (1,) +[2025-08-03 20:03:46,181][02904] Worker 4 uses CPU cores [0] +[2025-08-03 20:03:46,259][02886] ConvEncoder: input_channels=3 +[2025-08-03 20:03:46,376][02907] Worker 7 uses CPU cores [1] +[2025-08-03 20:03:46,385][02903] Worker 3 uses CPU cores [1] +[2025-08-03 20:03:46,463][02899] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-03 20:03:46,463][02899] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-03 20:03:46,520][02900] Worker 1 uses CPU cores [1] +[2025-08-03 20:03:46,563][02899] Num visible devices: 1 +[2025-08-03 20:03:46,577][02901] Worker 0 uses CPU cores [0] +[2025-08-03 20:03:46,621][02902] Worker 2 uses CPU cores [0] +[2025-08-03 20:03:46,661][02905] Worker 5 uses CPU cores [1] +[2025-08-03 20:03:46,762][02906] Worker 6 uses CPU cores [0] +[2025-08-03 20:03:46,808][02886] Conv encoder output size: 512 +[2025-08-03 20:03:46,809][02886] Policy head output size: 512 +[2025-08-03 20:03:46,868][02886] Created Actor Critic model with architecture: +[2025-08-03 20:03:46,868][02886] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-08-03 20:03:47,122][02886] Using optimizer +[2025-08-03 20:03:48,227][02632] Heartbeat connected on Batcher_0 +[2025-08-03 20:03:48,233][02632] Heartbeat connected on InferenceWorker_p0-w0 +[2025-08-03 20:03:48,243][02632] Heartbeat connected on RolloutWorker_w0 +[2025-08-03 20:03:48,244][02632] Heartbeat connected on RolloutWorker_w1 +[2025-08-03 20:03:48,247][02632] Heartbeat connected on RolloutWorker_w2 +[2025-08-03 20:03:48,253][02632] Heartbeat connected on RolloutWorker_w4 +[2025-08-03 20:03:48,255][02632] Heartbeat connected on RolloutWorker_w3 +[2025-08-03 20:03:48,257][02632] Heartbeat connected on RolloutWorker_w5 +[2025-08-03 20:03:48,260][02632] Heartbeat connected on RolloutWorker_w6 +[2025-08-03 20:03:48,267][02632] Heartbeat connected on RolloutWorker_w7 +[2025-08-03 20:03:51,604][02886] No checkpoints found +[2025-08-03 20:03:51,604][02886] Did not load from checkpoint, starting from scratch! +[2025-08-03 20:03:51,604][02886] Initialized policy 0 weights for model version 0 +[2025-08-03 20:03:51,607][02886] LearnerWorker_p0 finished initialization! +[2025-08-03 20:03:51,608][02632] Heartbeat connected on LearnerWorker_p0 +[2025-08-03 20:03:51,608][02886] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-03 20:03:51,731][02899] RunningMeanStd input shape: (3, 72, 128) +[2025-08-03 20:03:51,732][02899] RunningMeanStd input shape: (1,) +[2025-08-03 20:03:51,745][02899] ConvEncoder: input_channels=3 +[2025-08-03 20:03:51,849][02899] Conv encoder output size: 512 +[2025-08-03 20:03:51,849][02899] Policy head output size: 512 +[2025-08-03 20:03:51,884][02632] Inference worker 0-0 is ready! +[2025-08-03 20:03:51,885][02632] All inference workers are ready! Signal rollout workers to start! 
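The module tree printed above maps onto a compact PyTorch model. The following is a minimal sketch under stated assumptions, not Sample Factory's actual implementation: the conv kernel sizes and strides are assumptions (the log prints only the layer types), while the (3, 72, 128) observation shape, the 512-dim encoder/GRU width, the scalar value head, and the 5-way action head come directly from the printout above.

    import torch
    import torch.nn as nn

    class RunningMeanStd(nn.Module):
        """Running mean/std normalizer, in the spirit of the obs/returns normalizers printed above."""
        def __init__(self, shape):
            super().__init__()
            self.register_buffer('mean', torch.zeros(shape))
            self.register_buffer('var', torch.ones(shape))
            self.register_buffer('count', torch.tensor(1e-4))

        def forward(self, x):
            if self.training:
                # parallel (Welford-style) update of running statistics from the batch
                batch_mean, batch_var = x.mean(0), x.var(0, unbiased=False)
                batch_count = x.shape[0]
                delta = batch_mean - self.mean
                total = self.count + batch_count
                self.mean += delta * batch_count / total
                self.var = (self.var * self.count + batch_var * batch_count
                            + delta ** 2 * self.count * batch_count / total) / total
                self.count = total
            return (x - self.mean) / torch.sqrt(self.var + 1e-8)

    class ActorCriticSketch(nn.Module):
        def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
            super().__init__()
            c, h, w = obs_shape
            self.obs_normalizer = RunningMeanStd(obs_shape)
            # conv_head: three Conv2d+ELU blocks as in the printed ConvEncoderImpl;
            # kernel sizes/strides are assumed defaults, not taken from the log
            self.conv_head = nn.Sequential(
                nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            with torch.no_grad():
                conv_out = self.conv_head(torch.zeros(1, c, h, w)).flatten(1).shape[1]
            self.mlp_layers = nn.Sequential(nn.Linear(conv_out, hidden), nn.ELU())  # "Conv encoder output size: 512"
            self.core = nn.GRU(hidden, hidden)                                      # GRU(512, 512)
            self.critic_linear = nn.Linear(hidden, 1)                               # value head
            self.distribution_linear = nn.Linear(hidden, num_actions)               # 5 action logits

        def forward(self, obs, rnn_state=None):
            x = self.obs_normalizer(obs)
            x = self.mlp_layers(self.conv_head(x).flatten(1))    # (B, 512), "Policy head output size: 512"
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # single-step recurrent core
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state

A dummy batch of shape (8, 3, 72, 128) flows through to 5 action logits, a scalar value, and an updated GRU state, consistent with the encoder and policy head sizes reported above.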
+[2025-08-03 20:03:52,124][02905] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,125][02907] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,128][02904] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,128][02903] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,133][02901] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,136][02902] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,132][02900] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,139][02906] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:03:52,912][02632] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-03 20:03:53,621][02905] Decorrelating experience for 0 frames... +[2025-08-03 20:03:53,622][02907] Decorrelating experience for 0 frames... +[2025-08-03 20:03:53,621][02904] Decorrelating experience for 0 frames... +[2025-08-03 20:03:54,359][02907] Decorrelating experience for 32 frames... +[2025-08-03 20:03:54,361][02905] Decorrelating experience for 32 frames... +[2025-08-03 20:03:54,370][02906] Decorrelating experience for 0 frames... +[2025-08-03 20:03:54,375][02904] Decorrelating experience for 32 frames... +[2025-08-03 20:03:55,025][02905] Decorrelating experience for 64 frames... +[2025-08-03 20:03:55,392][02906] Decorrelating experience for 32 frames... +[2025-08-03 20:03:55,538][02904] Decorrelating experience for 64 frames... +[2025-08-03 20:03:55,811][02905] Decorrelating experience for 96 frames... +[2025-08-03 20:03:56,186][02906] Decorrelating experience for 64 frames... +[2025-08-03 20:03:56,421][02907] Decorrelating experience for 64 frames... +[2025-08-03 20:03:56,596][02904] Decorrelating experience for 96 frames... +[2025-08-03 20:03:57,094][02906] Decorrelating experience for 96 frames... +[2025-08-03 20:03:57,482][02907] Decorrelating experience for 96 frames... +[2025-08-03 20:03:57,912][02632] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 72.0. Samples: 360. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-03 20:03:57,914][02632] Avg episode reward: [(0, '1.280')] +[2025-08-03 20:04:01,426][02886] Signal inference workers to stop experience collection... +[2025-08-03 20:04:01,442][02899] InferenceWorker_p0-w0: stopping experience collection +[2025-08-03 20:04:02,731][02886] Signal inference workers to resume experience collection... +[2025-08-03 20:04:02,732][02899] InferenceWorker_p0-w0: resuming experience collection +[2025-08-03 20:04:02,912][02632] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 4096. Throughput: 0: 186.4. Samples: 1864. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-08-03 20:04:02,916][02632] Avg episode reward: [(0, '3.037')] +[2025-08-03 20:04:07,912][02632] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 20480. Throughput: 0: 329.5. Samples: 4942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:04:07,916][02632] Avg episode reward: [(0, '3.876')] +[2025-08-03 20:04:12,912][02632] Fps is (10 sec: 3276.8, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 513.5. Samples: 10270. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:04:12,914][02632] Avg episode reward: [(0, '4.101')] +[2025-08-03 20:04:13,308][02899] Updated weights for policy 0, policy_version 10 (0.0021) +[2025-08-03 20:04:17,912][02632] Fps is (10 sec: 3276.7, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 496.8. Samples: 12420. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:04:17,917][02632] Avg episode reward: [(0, '4.360')] +[2025-08-03 20:04:22,912][02632] Fps is (10 sec: 4096.0, 60 sec: 2594.1, 300 sec: 2594.1). Total num frames: 77824. Throughput: 0: 622.9. Samples: 18688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:04:22,914][02632] Avg episode reward: [(0, '4.378')] +[2025-08-03 20:04:23,814][02899] Updated weights for policy 0, policy_version 20 (0.0014) +[2025-08-03 20:04:27,912][02632] Fps is (10 sec: 4096.0, 60 sec: 2691.7, 300 sec: 2691.7). Total num frames: 94208. Throughput: 0: 685.5. Samples: 23994. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:04:27,914][02632] Avg episode reward: [(0, '4.486')] +[2025-08-03 20:04:32,914][02632] Fps is (10 sec: 3276.2, 60 sec: 2764.7, 300 sec: 2764.7). Total num frames: 110592. Throughput: 0: 664.8. Samples: 26592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:04:32,918][02632] Avg episode reward: [(0, '4.414')] +[2025-08-03 20:04:32,927][02886] Saving new best policy, reward=4.414! +[2025-08-03 20:04:35,076][02899] Updated weights for policy 0, policy_version 30 (0.0015) +[2025-08-03 20:04:37,912][02632] Fps is (10 sec: 3686.4, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 731.2. Samples: 32902. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:04:37,915][02632] Avg episode reward: [(0, '4.447')] +[2025-08-03 20:04:37,917][02886] Saving new best policy, reward=4.447! +[2025-08-03 20:04:42,912][02632] Fps is (10 sec: 3687.0, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 830.4. Samples: 37726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:04:42,916][02632] Avg episode reward: [(0, '4.467')] +[2025-08-03 20:04:42,923][02886] Saving new best policy, reward=4.467! +[2025-08-03 20:04:46,515][02899] Updated weights for policy 0, policy_version 40 (0.0013) +[2025-08-03 20:04:47,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 860.6. Samples: 40592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:04:47,914][02632] Avg episode reward: [(0, '4.378')] +[2025-08-03 20:04:52,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 188416. Throughput: 0: 935.2. Samples: 47028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:04:52,914][02632] Avg episode reward: [(0, '4.535')] +[2025-08-03 20:04:52,918][02886] Saving new best policy, reward=4.535! +[2025-08-03 20:04:57,699][02899] Updated weights for policy 0, policy_version 50 (0.0019) +[2025-08-03 20:04:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 921.6. Samples: 51742. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:04:57,916][02632] Avg episode reward: [(0, '4.659')] +[2025-08-03 20:04:57,917][02886] Saving new best policy, reward=4.659! +[2025-08-03 20:05:02,917][02632] Fps is (10 sec: 3684.7, 60 sec: 3686.1, 300 sec: 3218.1). Total num frames: 225280. Throughput: 0: 944.3. Samples: 54916. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:05:02,918][02632] Avg episode reward: [(0, '4.599')] +[2025-08-03 20:05:07,268][02899] Updated weights for policy 0, policy_version 60 (0.0012) +[2025-08-03 20:05:07,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 245760. Throughput: 0: 946.9. Samples: 61298. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:05:07,914][02632] Avg episode reward: [(0, '4.509')] +[2025-08-03 20:05:12,912][02632] Fps is (10 sec: 3688.1, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 937.7. Samples: 66192. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:05:12,917][02632] Avg episode reward: [(0, '4.508')] +[2025-08-03 20:05:17,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3325.0). Total num frames: 282624. Throughput: 0: 950.5. Samples: 69362. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:05:17,916][02632] Avg episode reward: [(0, '4.619')] +[2025-08-03 20:05:18,258][02899] Updated weights for policy 0, policy_version 70 (0.0013) +[2025-08-03 20:05:22,913][02632] Fps is (10 sec: 4095.8, 60 sec: 3754.6, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 945.6. Samples: 75454. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:05:22,916][02632] Avg episode reward: [(0, '4.507')] +[2025-08-03 20:05:22,924][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth... +[2025-08-03 20:05:27,912][02632] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 955.3. Samples: 80716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:05:27,916][02632] Avg episode reward: [(0, '4.373')] +[2025-08-03 20:05:29,356][02899] Updated weights for policy 0, policy_version 80 (0.0019) +[2025-08-03 20:05:32,912][02632] Fps is (10 sec: 3686.6, 60 sec: 3823.0, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 963.9. Samples: 83966. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:05:32,914][02632] Avg episode reward: [(0, '4.425')] +[2025-08-03 20:05:37,914][02632] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 944.3. Samples: 89524. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:05:37,916][02632] Avg episode reward: [(0, '4.398')] +[2025-08-03 20:05:40,226][02899] Updated weights for policy 0, policy_version 90 (0.0020) +[2025-08-03 20:05:42,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3388.5). Total num frames: 372736. Throughput: 0: 948.1. Samples: 94406. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:05:42,917][02632] Avg episode reward: [(0, '4.341')] +[2025-08-03 20:05:47,912][02632] Fps is (10 sec: 3277.5, 60 sec: 3686.4, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 913.4. Samples: 96016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:05:47,916][02632] Avg episode reward: [(0, '4.411')] +[2025-08-03 20:05:52,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 880.3. Samples: 100912. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:05:52,914][02632] Avg episode reward: [(0, '4.436')] +[2025-08-03 20:05:53,847][02899] Updated weights for policy 0, policy_version 100 (0.0014) +[2025-08-03 20:05:57,917][02632] Fps is (10 sec: 3684.7, 60 sec: 3686.1, 300 sec: 3407.7). Total num frames: 425984. Throughput: 0: 910.3. 
Samples: 107158. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:05:57,918][02632] Avg episode reward: [(0, '4.528')] +[2025-08-03 20:06:02,915][02632] Fps is (10 sec: 4094.9, 60 sec: 3686.5, 300 sec: 3434.3). Total num frames: 446464. Throughput: 0: 907.7. Samples: 110212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:06:02,916][02632] Avg episode reward: [(0, '4.505')] +[2025-08-03 20:06:04,169][02899] Updated weights for policy 0, policy_version 110 (0.0018) +[2025-08-03 20:06:07,912][02632] Fps is (10 sec: 3688.1, 60 sec: 3618.1, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 879.2. Samples: 115018. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:06:07,917][02632] Avg episode reward: [(0, '4.485')] +[2025-08-03 20:06:12,912][02632] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3452.3). Total num frames: 483328. Throughput: 0: 904.4. Samples: 121414. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:06:12,917][02632] Avg episode reward: [(0, '4.520')] +[2025-08-03 20:06:14,419][02899] Updated weights for policy 0, policy_version 120 (0.0013) +[2025-08-03 20:06:17,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 902.9. Samples: 124596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:06:17,915][02632] Avg episode reward: [(0, '4.532')] +[2025-08-03 20:06:22,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 887.5. Samples: 129462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:06:22,914][02632] Avg episode reward: [(0, '4.704')] +[2025-08-03 20:06:22,919][02886] Saving new best policy, reward=4.704! +[2025-08-03 20:06:25,709][02899] Updated weights for policy 0, policy_version 130 (0.0015) +[2025-08-03 20:06:27,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3488.2). Total num frames: 540672. Throughput: 0: 920.2. Samples: 135816. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:06:27,917][02632] Avg episode reward: [(0, '4.624')] +[2025-08-03 20:06:32,913][02632] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3481.6). Total num frames: 557056. Throughput: 0: 948.6. Samples: 138702. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:06:32,916][02632] Avg episode reward: [(0, '4.613')] +[2025-08-03 20:06:36,966][02899] Updated weights for policy 0, policy_version 140 (0.0021) +[2025-08-03 20:06:37,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3475.4). Total num frames: 573440. Throughput: 0: 945.9. Samples: 143476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:06:37,917][02632] Avg episode reward: [(0, '4.680')] +[2025-08-03 20:06:42,912][02632] Fps is (10 sec: 4096.3, 60 sec: 3754.7, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 948.9. Samples: 149852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:06:42,917][02632] Avg episode reward: [(0, '4.595')] +[2025-08-03 20:06:47,914][02632] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3487.4). Total num frames: 610304. Throughput: 0: 934.3. Samples: 152256. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:06:47,918][02632] Avg episode reward: [(0, '4.683')] +[2025-08-03 20:06:48,234][02899] Updated weights for policy 0, policy_version 150 (0.0029) +[2025-08-03 20:06:52,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3504.4). Total num frames: 630784. Throughput: 0: 950.5. Samples: 157790. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:06:52,916][02632] Avg episode reward: [(0, '4.767')] +[2025-08-03 20:06:52,922][02886] Saving new best policy, reward=4.767! +[2025-08-03 20:06:57,855][02899] Updated weights for policy 0, policy_version 160 (0.0013) +[2025-08-03 20:06:57,912][02632] Fps is (10 sec: 4506.5, 60 sec: 3823.2, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 948.5. Samples: 164096. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:06:57,914][02632] Avg episode reward: [(0, '4.582')] +[2025-08-03 20:07:02,913][02632] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3513.9). Total num frames: 667648. Throughput: 0: 922.5. Samples: 166110. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:07:02,921][02632] Avg episode reward: [(0, '4.791')] +[2025-08-03 20:07:02,926][02886] Saving new best policy, reward=4.791! +[2025-08-03 20:07:07,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3528.9). Total num frames: 688128. Throughput: 0: 946.3. Samples: 172046. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:07:07,914][02632] Avg episode reward: [(0, '4.894')] +[2025-08-03 20:07:07,915][02886] Saving new best policy, reward=4.894! +[2025-08-03 20:07:09,166][02899] Updated weights for policy 0, policy_version 170 (0.0023) +[2025-08-03 20:07:12,912][02632] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3543.0). Total num frames: 708608. Throughput: 0: 936.2. Samples: 177944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:07:12,915][02632] Avg episode reward: [(0, '4.516')] +[2025-08-03 20:07:17,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3536.5). Total num frames: 724992. Throughput: 0: 915.8. Samples: 179914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:07:17,914][02632] Avg episode reward: [(0, '4.444')] +[2025-08-03 20:07:20,257][02899] Updated weights for policy 0, policy_version 180 (0.0015) +[2025-08-03 20:07:22,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3549.9). Total num frames: 745472. Throughput: 0: 952.9. Samples: 186356. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:07:22,915][02632] Avg episode reward: [(0, '4.363')] +[2025-08-03 20:07:22,921][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth... +[2025-08-03 20:07:27,914][02632] Fps is (10 sec: 4095.2, 60 sec: 3754.6, 300 sec: 3562.5). Total num frames: 765952. Throughput: 0: 933.7. Samples: 191870. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:07:27,917][02632] Avg episode reward: [(0, '4.530')] +[2025-08-03 20:07:31,239][02899] Updated weights for policy 0, policy_version 190 (0.0012) +[2025-08-03 20:07:32,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3556.1). Total num frames: 782336. Throughput: 0: 935.8. Samples: 194366. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:07:32,914][02632] Avg episode reward: [(0, '4.738')] +[2025-08-03 20:07:37,912][02632] Fps is (10 sec: 3687.1, 60 sec: 3822.9, 300 sec: 3568.1). Total num frames: 802816. Throughput: 0: 955.1. Samples: 200768. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:07:37,918][02632] Avg episode reward: [(0, '4.660')] +[2025-08-03 20:07:41,822][02899] Updated weights for policy 0, policy_version 200 (0.0017) +[2025-08-03 20:07:42,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3561.7). Total num frames: 819200. Throughput: 0: 927.3. Samples: 205824. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:07:42,917][02632] Avg episode reward: [(0, '4.654')] +[2025-08-03 20:07:47,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3573.1). Total num frames: 839680. Throughput: 0: 949.0. Samples: 208814. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:07:47,915][02632] Avg episode reward: [(0, '4.529')] +[2025-08-03 20:07:52,028][02899] Updated weights for policy 0, policy_version 210 (0.0014) +[2025-08-03 20:07:52,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3584.0). Total num frames: 860160. Throughput: 0: 956.7. Samples: 215098. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:07:52,919][02632] Avg episode reward: [(0, '4.569')] +[2025-08-03 20:07:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3577.7). Total num frames: 876544. Throughput: 0: 931.8. Samples: 219874. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:07:57,917][02632] Avg episode reward: [(0, '5.039')] +[2025-08-03 20:07:57,921][02886] Saving new best policy, reward=5.039! +[2025-08-03 20:08:02,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3588.1). Total num frames: 897024. Throughput: 0: 957.0. Samples: 222980. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:08:02,917][02632] Avg episode reward: [(0, '5.055')] +[2025-08-03 20:08:02,928][02886] Saving new best policy, reward=5.055! +[2025-08-03 20:08:03,345][02899] Updated weights for policy 0, policy_version 220 (0.0017) +[2025-08-03 20:08:07,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3598.1). Total num frames: 917504. Throughput: 0: 953.5. Samples: 229262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:08:07,914][02632] Avg episode reward: [(0, '4.897')] +[2025-08-03 20:08:12,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3591.9). Total num frames: 933888. Throughput: 0: 937.2. Samples: 234044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:08:12,915][02632] Avg episode reward: [(0, '4.827')] +[2025-08-03 20:08:14,628][02899] Updated weights for policy 0, policy_version 230 (0.0013) +[2025-08-03 20:08:17,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3601.4). Total num frames: 954368. Throughput: 0: 952.5. Samples: 237228. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-03 20:08:17,915][02632] Avg episode reward: [(0, '4.785')] +[2025-08-03 20:08:22,920][02632] Fps is (10 sec: 3683.6, 60 sec: 3754.2, 300 sec: 3595.3). Total num frames: 970752. Throughput: 0: 944.2. Samples: 243266. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:08:22,922][02632] Avg episode reward: [(0, '4.764')] +[2025-08-03 20:08:25,864][02899] Updated weights for policy 0, policy_version 240 (0.0013) +[2025-08-03 20:08:27,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3604.5). Total num frames: 991232. Throughput: 0: 943.3. Samples: 248274. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:08:27,914][02632] Avg episode reward: [(0, '4.717')] +[2025-08-03 20:08:32,912][02632] Fps is (10 sec: 4099.2, 60 sec: 3822.9, 300 sec: 3613.3). Total num frames: 1011712. Throughput: 0: 947.2. Samples: 251436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:08:32,914][02632] Avg episode reward: [(0, '4.647')] +[2025-08-03 20:08:35,639][02899] Updated weights for policy 0, policy_version 250 (0.0013) +[2025-08-03 20:08:37,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3607.4). 
Total num frames: 1028096. Throughput: 0: 933.8. Samples: 257118. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:08:37,914][02632] Avg episode reward: [(0, '4.464')] +[2025-08-03 20:08:42,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3615.8). Total num frames: 1048576. Throughput: 0: 946.4. Samples: 262462. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:08:42,914][02632] Avg episode reward: [(0, '4.735')] +[2025-08-03 20:08:46,715][02899] Updated weights for policy 0, policy_version 260 (0.0013) +[2025-08-03 20:08:47,912][02632] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3623.9). Total num frames: 1069056. Throughput: 0: 947.3. Samples: 265608. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:08:47,916][02632] Avg episode reward: [(0, '5.054')] +[2025-08-03 20:08:52,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1081344. Throughput: 0: 919.8. Samples: 270652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:08:52,915][02632] Avg episode reward: [(0, '4.864')] +[2025-08-03 20:08:57,913][02632] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3721.1). Total num frames: 1101824. Throughput: 0: 945.2. Samples: 276578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-03 20:08:57,917][02632] Avg episode reward: [(0, '4.582')] +[2025-08-03 20:08:58,130][02899] Updated weights for policy 0, policy_version 270 (0.0019) +[2025-08-03 20:09:02,912][02632] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1126400. Throughput: 0: 943.6. Samples: 279692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:09:02,923][02632] Avg episode reward: [(0, '4.614')] +[2025-08-03 20:09:07,912][02632] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1138688. Throughput: 0: 915.2. Samples: 284444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:09:07,917][02632] Avg episode reward: [(0, '4.689')] +[2025-08-03 20:09:09,363][02899] Updated weights for policy 0, policy_version 280 (0.0018) +[2025-08-03 20:09:12,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1159168. Throughput: 0: 945.9. Samples: 290838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:09:12,917][02632] Avg episode reward: [(0, '4.526')] +[2025-08-03 20:09:17,913][02632] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3735.0). Total num frames: 1179648. Throughput: 0: 946.4. Samples: 294024. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:09:17,915][02632] Avg episode reward: [(0, '4.360')] +[2025-08-03 20:09:20,208][02899] Updated weights for policy 0, policy_version 290 (0.0014) +[2025-08-03 20:09:22,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3755.1, 300 sec: 3735.0). Total num frames: 1196032. Throughput: 0: 925.5. Samples: 298764. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:09:22,917][02632] Avg episode reward: [(0, '4.566')] +[2025-08-03 20:09:22,924][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth... +[2025-08-03 20:09:23,015][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth +[2025-08-03 20:09:27,912][02632] Fps is (10 sec: 3686.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1216512. Throughput: 0: 946.8. Samples: 305068. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:09:27,921][02632] Avg episode reward: [(0, '4.516')] +[2025-08-03 20:09:30,208][02899] Updated weights for policy 0, policy_version 300 (0.0013) +[2025-08-03 20:09:32,917][02632] Fps is (10 sec: 3684.7, 60 sec: 3686.1, 300 sec: 3734.9). Total num frames: 1232896. Throughput: 0: 946.1. Samples: 308186. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:09:32,918][02632] Avg episode reward: [(0, '4.389')] +[2025-08-03 20:09:37,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1253376. Throughput: 0: 942.2. Samples: 313052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:09:37,919][02632] Avg episode reward: [(0, '4.552')] +[2025-08-03 20:09:41,327][02899] Updated weights for policy 0, policy_version 310 (0.0022) +[2025-08-03 20:09:42,912][02632] Fps is (10 sec: 4097.9, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 1273856. Throughput: 0: 953.2. Samples: 319472. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:09:42,917][02632] Avg episode reward: [(0, '4.636')] +[2025-08-03 20:09:47,913][02632] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1290240. Throughput: 0: 943.5. Samples: 322150. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:09:47,916][02632] Avg episode reward: [(0, '4.868')] +[2025-08-03 20:09:52,426][02899] Updated weights for policy 0, policy_version 320 (0.0015) +[2025-08-03 20:09:52,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1310720. Throughput: 0: 954.4. Samples: 327390. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:09:52,916][02632] Avg episode reward: [(0, '5.081')] +[2025-08-03 20:09:52,927][02886] Saving new best policy, reward=5.081! +[2025-08-03 20:09:57,912][02632] Fps is (10 sec: 4096.3, 60 sec: 3823.0, 300 sec: 3748.9). Total num frames: 1331200. Throughput: 0: 953.5. Samples: 333744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:09:57,914][02632] Avg episode reward: [(0, '5.012')] +[2025-08-03 20:10:02,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 1347584. Throughput: 0: 932.0. Samples: 335962. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:10:02,918][02632] Avg episode reward: [(0, '4.878')] +[2025-08-03 20:10:03,462][02899] Updated weights for policy 0, policy_version 330 (0.0017) +[2025-08-03 20:10:07,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1368064. Throughput: 0: 955.1. Samples: 341744. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:10:07,917][02632] Avg episode reward: [(0, '4.868')] +[2025-08-03 20:10:12,914][02632] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3748.9). Total num frames: 1388544. Throughput: 0: 950.8. Samples: 347856. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:10:12,915][02632] Avg episode reward: [(0, '4.894')] +[2025-08-03 20:10:13,602][02899] Updated weights for policy 0, policy_version 340 (0.0013) +[2025-08-03 20:10:17,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3735.0). Total num frames: 1404928. Throughput: 0: 925.0. Samples: 349808. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:10:17,917][02632] Avg episode reward: [(0, '4.642')] +[2025-08-03 20:10:22,912][02632] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 1425408. Throughput: 0: 956.0. Samples: 356070. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:10:22,917][02632] Avg episode reward: [(0, '4.699')] +[2025-08-03 20:10:24,402][02899] Updated weights for policy 0, policy_version 350 (0.0015) +[2025-08-03 20:10:27,915][02632] Fps is (10 sec: 4095.1, 60 sec: 3822.8, 300 sec: 3748.9). Total num frames: 1445888. Throughput: 0: 938.0. Samples: 361682. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:10:27,916][02632] Avg episode reward: [(0, '4.693')] +[2025-08-03 20:10:32,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3748.9). Total num frames: 1462272. Throughput: 0: 932.5. Samples: 364112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:10:32,917][02632] Avg episode reward: [(0, '4.539')] +[2025-08-03 20:10:35,366][02899] Updated weights for policy 0, policy_version 360 (0.0016) +[2025-08-03 20:10:37,912][02632] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 1482752. Throughput: 0: 958.5. Samples: 370522. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:10:37,915][02632] Avg episode reward: [(0, '4.624')] +[2025-08-03 20:10:42,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1499136. Throughput: 0: 930.6. Samples: 375620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:10:42,918][02632] Avg episode reward: [(0, '4.744')] +[2025-08-03 20:10:46,558][02899] Updated weights for policy 0, policy_version 370 (0.0014) +[2025-08-03 20:10:47,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 1519616. Throughput: 0: 945.2. Samples: 378498. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:10:47,914][02632] Avg episode reward: [(0, '4.952')] +[2025-08-03 20:10:52,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1540096. Throughput: 0: 956.9. Samples: 384806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:10:52,914][02632] Avg episode reward: [(0, '5.012')] +[2025-08-03 20:10:57,681][02899] Updated weights for policy 0, policy_version 380 (0.0012) +[2025-08-03 20:10:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1556480. Throughput: 0: 925.3. Samples: 389492. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:10:57,915][02632] Avg episode reward: [(0, '4.829')] +[2025-08-03 20:11:02,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1576960. Throughput: 0: 952.8. Samples: 392682. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:02,916][02632] Avg episode reward: [(0, '4.867')] +[2025-08-03 20:11:07,355][02899] Updated weights for policy 0, policy_version 390 (0.0015) +[2025-08-03 20:11:07,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1597440. Throughput: 0: 954.6. Samples: 399028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:07,914][02632] Avg episode reward: [(0, '4.880')] +[2025-08-03 20:11:12,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 1613824. Throughput: 0: 938.0. Samples: 403892. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:11:12,915][02632] Avg episode reward: [(0, '4.820')] +[2025-08-03 20:11:17,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1634304. Throughput: 0: 956.0. Samples: 407130. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:11:17,918][02632] Avg episode reward: [(0, '5.039')] +[2025-08-03 20:11:18,438][02899] Updated weights for policy 0, policy_version 400 (0.0014) +[2025-08-03 20:11:22,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1650688. Throughput: 0: 946.4. Samples: 413108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:22,916][02632] Avg episode reward: [(0, '4.772')] +[2025-08-03 20:11:22,924][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth... +[2025-08-03 20:11:23,027][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000182_745472.pth +[2025-08-03 20:11:27,912][02632] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 1671168. Throughput: 0: 945.8. Samples: 418182. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:27,917][02632] Avg episode reward: [(0, '4.583')] +[2025-08-03 20:11:29,715][02899] Updated weights for policy 0, policy_version 410 (0.0024) +[2025-08-03 20:11:32,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1691648. Throughput: 0: 953.0. Samples: 421382. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:11:32,918][02632] Avg episode reward: [(0, '4.688')] +[2025-08-03 20:11:37,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 1708032. Throughput: 0: 937.8. Samples: 427006. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:37,914][02632] Avg episode reward: [(0, '4.769')] +[2025-08-03 20:11:40,773][02899] Updated weights for policy 0, policy_version 420 (0.0016) +[2025-08-03 20:11:42,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 1728512. Throughput: 0: 956.9. Samples: 432554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:42,917][02632] Avg episode reward: [(0, '4.733')] +[2025-08-03 20:11:47,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1748992. Throughput: 0: 956.8. Samples: 435736. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:11:47,917][02632] Avg episode reward: [(0, '4.806')] +[2025-08-03 20:11:51,593][02899] Updated weights for policy 0, policy_version 430 (0.0012) +[2025-08-03 20:11:52,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 1761280. Throughput: 0: 929.5. Samples: 440854. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:52,916][02632] Avg episode reward: [(0, '4.672')] +[2025-08-03 20:11:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1785856. Throughput: 0: 956.4. Samples: 446932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:11:57,916][02632] Avg episode reward: [(0, '4.655')] +[2025-08-03 20:12:01,557][02899] Updated weights for policy 0, policy_version 440 (0.0014) +[2025-08-03 20:12:02,916][02632] Fps is (10 sec: 4504.1, 60 sec: 3822.7, 300 sec: 3790.5). Total num frames: 1806336. Throughput: 0: 955.1. Samples: 450114. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:12:02,917][02632] Avg episode reward: [(0, '4.736')] +[2025-08-03 20:12:07,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1822720. Throughput: 0: 928.5. Samples: 454892. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:12:07,914][02632] Avg episode reward: [(0, '4.629')] +[2025-08-03 20:12:12,685][02899] Updated weights for policy 0, policy_version 450 (0.0012) +[2025-08-03 20:12:12,912][02632] Fps is (10 sec: 3687.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1843200. Throughput: 0: 958.6. Samples: 461320. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:12:12,914][02632] Avg episode reward: [(0, '4.495')] +[2025-08-03 20:12:17,913][02632] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 1859584. Throughput: 0: 957.9. Samples: 464490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:12:17,915][02632] Avg episode reward: [(0, '4.503')] +[2025-08-03 20:12:22,915][02632] Fps is (10 sec: 3275.9, 60 sec: 3754.5, 300 sec: 3762.8). Total num frames: 1875968. Throughput: 0: 939.7. Samples: 469294. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:12:22,916][02632] Avg episode reward: [(0, '4.403')] +[2025-08-03 20:12:23,833][02899] Updated weights for policy 0, policy_version 460 (0.0021) +[2025-08-03 20:12:27,912][02632] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1900544. Throughput: 0: 956.0. Samples: 475576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:12:27,914][02632] Avg episode reward: [(0, '4.460')] +[2025-08-03 20:12:32,915][02632] Fps is (10 sec: 4096.1, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 1916928. Throughput: 0: 951.4. Samples: 478550. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:12:32,918][02632] Avg episode reward: [(0, '4.501')] +[2025-08-03 20:12:34,977][02899] Updated weights for policy 0, policy_version 470 (0.0016) +[2025-08-03 20:12:37,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1933312. Throughput: 0: 949.2. Samples: 483570. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:12:37,913][02632] Avg episode reward: [(0, '4.532')] +[2025-08-03 20:12:42,912][02632] Fps is (10 sec: 4097.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1957888. Throughput: 0: 953.6. Samples: 489844. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:12:42,914][02632] Avg episode reward: [(0, '4.804')] +[2025-08-03 20:12:44,992][02899] Updated weights for policy 0, policy_version 480 (0.0013) +[2025-08-03 20:12:47,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1970176. Throughput: 0: 938.1. Samples: 492326. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:12:47,914][02632] Avg episode reward: [(0, '4.807')] +[2025-08-03 20:12:52,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1990656. Throughput: 0: 953.2. Samples: 497786. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:12:52,914][02632] Avg episode reward: [(0, '4.818')] +[2025-08-03 20:12:55,865][02899] Updated weights for policy 0, policy_version 490 (0.0013) +[2025-08-03 20:12:57,912][02632] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2015232. Throughput: 0: 951.9. Samples: 504156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:12:57,914][02632] Avg episode reward: [(0, '4.762')] +[2025-08-03 20:13:02,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3762.8). Total num frames: 2027520. Throughput: 0: 926.2. Samples: 506170. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:13:02,914][02632] Avg episode reward: [(0, '4.931')] +[2025-08-03 20:13:06,917][02899] Updated weights for policy 0, policy_version 500 (0.0015) +[2025-08-03 20:13:07,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2052096. Throughput: 0: 954.4. Samples: 512240. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:13:07,914][02632] Avg episode reward: [(0, '4.900')] +[2025-08-03 20:13:12,916][02632] Fps is (10 sec: 4094.5, 60 sec: 3754.4, 300 sec: 3776.6). Total num frames: 2068480. Throughput: 0: 949.1. Samples: 518290. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:13:12,921][02632] Avg episode reward: [(0, '5.056')] +[2025-08-03 20:13:17,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3776.8). Total num frames: 2084864. Throughput: 0: 930.7. Samples: 520430. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:13:17,916][02632] Avg episode reward: [(0, '5.116')] +[2025-08-03 20:13:17,919][02886] Saving new best policy, reward=5.116! +[2025-08-03 20:13:17,922][02899] Updated weights for policy 0, policy_version 510 (0.0013) +[2025-08-03 20:13:22,912][02632] Fps is (10 sec: 4097.5, 60 sec: 3891.4, 300 sec: 3790.5). Total num frames: 2109440. Throughput: 0: 959.2. Samples: 526736. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:13:22,914][02632] Avg episode reward: [(0, '5.147')] +[2025-08-03 20:13:22,924][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000515_2109440.pth... +[2025-08-03 20:13:23,013][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth +[2025-08-03 20:13:23,023][02886] Saving new best policy, reward=5.147! +[2025-08-03 20:13:27,914][02632] Fps is (10 sec: 4095.1, 60 sec: 3754.5, 300 sec: 3776.6). Total num frames: 2125824. Throughput: 0: 937.6. Samples: 532038. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:13:27,919][02632] Avg episode reward: [(0, '5.229')] +[2025-08-03 20:13:27,920][02886] Saving new best policy, reward=5.229! +[2025-08-03 20:13:29,032][02899] Updated weights for policy 0, policy_version 520 (0.0017) +[2025-08-03 20:13:32,912][02632] Fps is (10 sec: 3276.8, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 2142208. Throughput: 0: 939.2. Samples: 534592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:13:32,917][02632] Avg episode reward: [(0, '5.324')] +[2025-08-03 20:13:32,990][02886] Saving new best policy, reward=5.324! +[2025-08-03 20:13:37,915][02632] Fps is (10 sec: 4095.7, 60 sec: 3891.0, 300 sec: 3790.5). Total num frames: 2166784. Throughput: 0: 959.2. Samples: 540954. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:13:37,916][02632] Avg episode reward: [(0, '5.906')] +[2025-08-03 20:13:37,922][02886] Saving new best policy, reward=5.906! +[2025-08-03 20:13:38,778][02899] Updated weights for policy 0, policy_version 530 (0.0014) +[2025-08-03 20:13:42,914][02632] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3762.7). Total num frames: 2179072. Throughput: 0: 928.7. Samples: 545948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:13:42,919][02632] Avg episode reward: [(0, '5.915')] +[2025-08-03 20:13:42,926][02886] Saving new best policy, reward=5.915! +[2025-08-03 20:13:47,912][02632] Fps is (10 sec: 3277.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2199552. Throughput: 0: 952.0. Samples: 549012. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:13:47,916][02632] Avg episode reward: [(0, '6.418')] +[2025-08-03 20:13:47,933][02886] Saving new best policy, reward=6.418! +[2025-08-03 20:13:49,902][02899] Updated weights for policy 0, policy_version 540 (0.0017) +[2025-08-03 20:13:52,912][02632] Fps is (10 sec: 4506.5, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2224128. Throughput: 0: 957.2. Samples: 555312. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:13:52,914][02632] Avg episode reward: [(0, '5.949')] +[2025-08-03 20:13:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2236416. Throughput: 0: 931.3. Samples: 560194. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:13:57,914][02632] Avg episode reward: [(0, '5.871')] +[2025-08-03 20:14:00,936][02899] Updated weights for policy 0, policy_version 550 (0.0024) +[2025-08-03 20:14:02,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2260992. Throughput: 0: 954.8. Samples: 563396. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:02,914][02632] Avg episode reward: [(0, '5.456')] +[2025-08-03 20:14:07,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2277376. Throughput: 0: 957.1. Samples: 569806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:07,914][02632] Avg episode reward: [(0, '4.972')] +[2025-08-03 20:14:11,971][02899] Updated weights for policy 0, policy_version 560 (0.0015) +[2025-08-03 20:14:12,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3790.6). Total num frames: 2297856. Throughput: 0: 947.6. Samples: 574678. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:12,915][02632] Avg episode reward: [(0, '4.986')] +[2025-08-03 20:14:17,913][02632] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2318336. Throughput: 0: 963.3. Samples: 577940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:14:17,916][02632] Avg episode reward: [(0, '5.467')] +[2025-08-03 20:14:21,766][02899] Updated weights for policy 0, policy_version 570 (0.0014) +[2025-08-03 20:14:22,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2334720. Throughput: 0: 956.4. Samples: 583988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:22,914][02632] Avg episode reward: [(0, '5.738')] +[2025-08-03 20:14:27,912][02632] Fps is (10 sec: 3686.5, 60 sec: 3823.1, 300 sec: 3804.5). Total num frames: 2355200. Throughput: 0: 964.7. Samples: 589358. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:14:27,914][02632] Avg episode reward: [(0, '5.944')] +[2025-08-03 20:14:32,277][02899] Updated weights for policy 0, policy_version 580 (0.0017) +[2025-08-03 20:14:32,912][02632] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2375680. Throughput: 0: 969.6. Samples: 592646. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:14:32,914][02632] Avg episode reward: [(0, '6.235')] +[2025-08-03 20:14:37,913][02632] Fps is (10 sec: 3686.2, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 2392064. Throughput: 0: 954.7. Samples: 598272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:37,914][02632] Avg episode reward: [(0, '6.239')] +[2025-08-03 20:14:42,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3804.4). Total num frames: 2412544. Throughput: 0: 975.8. Samples: 604104. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:42,917][02632] Avg episode reward: [(0, '6.416')] +[2025-08-03 20:14:43,191][02899] Updated weights for policy 0, policy_version 590 (0.0012) +[2025-08-03 20:14:47,912][02632] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2437120. Throughput: 0: 977.7. Samples: 607394. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:47,918][02632] Avg episode reward: [(0, '6.137')] +[2025-08-03 20:14:52,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2449408. Throughput: 0: 946.4. Samples: 612392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:52,917][02632] Avg episode reward: [(0, '6.201')] +[2025-08-03 20:14:54,086][02899] Updated weights for policy 0, policy_version 600 (0.0015) +[2025-08-03 20:14:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3818.3). Total num frames: 2473984. Throughput: 0: 980.2. Samples: 618788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:14:57,917][02632] Avg episode reward: [(0, '6.531')] +[2025-08-03 20:14:57,922][02886] Saving new best policy, reward=6.531! +[2025-08-03 20:15:02,914][02632] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 2494464. Throughput: 0: 979.3. Samples: 622012. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:15:02,919][02632] Avg episode reward: [(0, '6.781')] +[2025-08-03 20:15:02,932][02886] Saving new best policy, reward=6.781! +[2025-08-03 20:15:04,281][02899] Updated weights for policy 0, policy_version 610 (0.0023) +[2025-08-03 20:15:07,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2510848. Throughput: 0: 954.4. Samples: 626936. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:15:07,914][02632] Avg episode reward: [(0, '6.490')] +[2025-08-03 20:15:12,912][02632] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2531328. Throughput: 0: 978.9. Samples: 633410. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:15:12,916][02632] Avg episode reward: [(0, '6.709')] +[2025-08-03 20:15:14,385][02899] Updated weights for policy 0, policy_version 620 (0.0015) +[2025-08-03 20:15:17,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2551808. Throughput: 0: 979.3. Samples: 636716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:15:17,914][02632] Avg episode reward: [(0, '6.525')] +[2025-08-03 20:15:22,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2568192. Throughput: 0: 963.5. Samples: 641628. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:15:22,918][02632] Avg episode reward: [(0, '6.484')] +[2025-08-03 20:15:22,926][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000627_2568192.pth... +[2025-08-03 20:15:23,016][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000403_1650688.pth +[2025-08-03 20:15:25,280][02899] Updated weights for policy 0, policy_version 630 (0.0019) +[2025-08-03 20:15:27,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2588672. Throughput: 0: 978.5. Samples: 648136. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:15:27,917][02632] Avg episode reward: [(0, '6.323')] +[2025-08-03 20:15:32,913][02632] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2605056. 
Throughput: 0: 967.9. Samples: 650948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:15:32,920][02632] Avg episode reward: [(0, '6.440')] +[2025-08-03 20:15:36,109][02899] Updated weights for policy 0, policy_version 640 (0.0012) +[2025-08-03 20:15:37,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2625536. Throughput: 0: 979.2. Samples: 656454. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:15:37,914][02632] Avg episode reward: [(0, '5.839')] +[2025-08-03 20:15:42,912][02632] Fps is (10 sec: 4505.9, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2650112. Throughput: 0: 983.6. Samples: 663048. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:15:42,917][02632] Avg episode reward: [(0, '6.078')] +[2025-08-03 20:15:46,476][02899] Updated weights for policy 0, policy_version 650 (0.0018) +[2025-08-03 20:15:47,914][02632] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3818.3). Total num frames: 2666496. Throughput: 0: 963.6. Samples: 665372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:15:47,919][02632] Avg episode reward: [(0, '6.404')] +[2025-08-03 20:15:52,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 2686976. Throughput: 0: 986.0. Samples: 671304. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:15:52,914][02632] Avg episode reward: [(0, '6.227')] +[2025-08-03 20:15:56,198][02899] Updated weights for policy 0, policy_version 660 (0.0017) +[2025-08-03 20:15:57,912][02632] Fps is (10 sec: 4096.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2707456. Throughput: 0: 983.5. Samples: 677668. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:15:57,923][02632] Avg episode reward: [(0, '6.060')] +[2025-08-03 20:16:02,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2723840. Throughput: 0: 954.5. Samples: 679670. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:16:02,916][02632] Avg episode reward: [(0, '5.923')] +[2025-08-03 20:16:06,921][02899] Updated weights for policy 0, policy_version 670 (0.0013) +[2025-08-03 20:16:07,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2744320. Throughput: 0: 990.2. Samples: 686188. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:16:07,918][02632] Avg episode reward: [(0, '5.533')] +[2025-08-03 20:16:12,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2764800. Throughput: 0: 974.4. Samples: 691982. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:16:12,915][02632] Avg episode reward: [(0, '5.165')] +[2025-08-03 20:16:17,897][02899] Updated weights for policy 0, policy_version 680 (0.0013) +[2025-08-03 20:16:17,912][02632] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2785280. Throughput: 0: 967.6. Samples: 694490. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:16:17,917][02632] Avg episode reward: [(0, '4.876')] +[2025-08-03 20:16:22,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2805760. Throughput: 0: 989.0. Samples: 700960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:16:22,914][02632] Avg episode reward: [(0, '5.028')] +[2025-08-03 20:16:27,912][02632] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2822144. Throughput: 0: 960.5. Samples: 706270. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:16:27,914][02632] Avg episode reward: [(0, '5.181')] +[2025-08-03 20:16:28,777][02899] Updated weights for policy 0, policy_version 690 (0.0018) +[2025-08-03 20:16:32,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2842624. Throughput: 0: 975.6. Samples: 709274. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:16:32,917][02632] Avg episode reward: [(0, '5.542')] +[2025-08-03 20:16:37,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2863104. Throughput: 0: 990.8. Samples: 715888. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:16:37,914][02632] Avg episode reward: [(0, '5.086')] +[2025-08-03 20:16:38,007][02899] Updated weights for policy 0, policy_version 700 (0.0014) +[2025-08-03 20:16:42,912][02632] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2879488. Throughput: 0: 959.9. Samples: 720864. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:16:42,914][02632] Avg episode reward: [(0, '5.319')] +[2025-08-03 20:16:47,914][02632] Fps is (10 sec: 4095.2, 60 sec: 3959.4, 300 sec: 3873.8). Total num frames: 2904064. Throughput: 0: 989.1. Samples: 724182. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:16:47,916][02632] Avg episode reward: [(0, '5.603')] +[2025-08-03 20:16:48,776][02899] Updated weights for policy 0, policy_version 710 (0.0020) +[2025-08-03 20:16:52,913][02632] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3860.0). Total num frames: 2924544. Throughput: 0: 991.7. Samples: 730814. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:16:52,914][02632] Avg episode reward: [(0, '5.188')] +[2025-08-03 20:16:57,912][02632] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2940928. Throughput: 0: 972.3. Samples: 735734. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:16:57,917][02632] Avg episode reward: [(0, '5.161')] +[2025-08-03 20:16:59,637][02899] Updated weights for policy 0, policy_version 720 (0.0013) +[2025-08-03 20:17:02,912][02632] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2961408. Throughput: 0: 990.1. Samples: 739044. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:02,918][02632] Avg episode reward: [(0, '5.250')] +[2025-08-03 20:17:07,912][02632] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2981888. Throughput: 0: 986.9. Samples: 745370. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:17:07,915][02632] Avg episode reward: [(0, '5.343')] +[2025-08-03 20:17:10,305][02899] Updated weights for policy 0, policy_version 730 (0.0017) +[2025-08-03 20:17:12,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2998272. Throughput: 0: 987.5. Samples: 750706. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:12,916][02632] Avg episode reward: [(0, '5.373')] +[2025-08-03 20:17:17,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.8). Total num frames: 3022848. Throughput: 0: 993.3. Samples: 753970. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:17,916][02632] Avg episode reward: [(0, '5.458')] +[2025-08-03 20:17:19,443][02899] Updated weights for policy 0, policy_version 740 (0.0014) +[2025-08-03 20:17:22,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3039232. Throughput: 0: 976.7. Samples: 759840. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:22,918][02632] Avg episode reward: [(0, '6.023')] +[2025-08-03 20:17:22,924][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000742_3039232.pth... +[2025-08-03 20:17:23,053][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000515_2109440.pth +[2025-08-03 20:17:27,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 3059712. Throughput: 0: 991.0. Samples: 765458. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:17:27,914][02632] Avg episode reward: [(0, '6.727')] +[2025-08-03 20:17:30,419][02899] Updated weights for policy 0, policy_version 750 (0.0013) +[2025-08-03 20:17:32,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3080192. Throughput: 0: 990.2. Samples: 768738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:32,916][02632] Avg episode reward: [(0, '7.200')] +[2025-08-03 20:17:32,922][02886] Saving new best policy, reward=7.200! +[2025-08-03 20:17:37,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3096576. Throughput: 0: 959.9. Samples: 774008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:37,920][02632] Avg episode reward: [(0, '8.056')] +[2025-08-03 20:17:37,922][02886] Saving new best policy, reward=8.056! +[2025-08-03 20:17:41,272][02899] Updated weights for policy 0, policy_version 760 (0.0016) +[2025-08-03 20:17:42,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 3117056. Throughput: 0: 988.0. Samples: 780192. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:42,916][02632] Avg episode reward: [(0, '8.051')] +[2025-08-03 20:17:47,912][02632] Fps is (10 sec: 4096.1, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 3137536. Throughput: 0: 988.0. Samples: 783506. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:17:47,918][02632] Avg episode reward: [(0, '7.897')] +[2025-08-03 20:17:52,207][02899] Updated weights for policy 0, policy_version 770 (0.0018) +[2025-08-03 20:17:52,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 3153920. Throughput: 0: 956.2. Samples: 788398. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:17:52,914][02632] Avg episode reward: [(0, '7.331')] +[2025-08-03 20:17:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3174400. Throughput: 0: 981.3. Samples: 794864. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-03 20:17:57,917][02632] Avg episode reward: [(0, '6.769')] +[2025-08-03 20:18:01,836][02899] Updated weights for policy 0, policy_version 780 (0.0013) +[2025-08-03 20:18:02,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3194880. Throughput: 0: 981.6. Samples: 798142. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:18:02,919][02632] Avg episode reward: [(0, '6.990')] +[2025-08-03 20:18:07,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.9). Total num frames: 3211264. Throughput: 0: 958.6. Samples: 802978. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:18:07,917][02632] Avg episode reward: [(0, '7.583')] +[2025-08-03 20:18:12,696][02899] Updated weights for policy 0, policy_version 790 (0.0016) +[2025-08-03 20:18:12,915][02632] Fps is (10 sec: 4095.1, 60 sec: 3959.3, 300 sec: 3901.6). Total num frames: 3235840. 
Throughput: 0: 978.8. Samples: 809508. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:18:12,916][02632] Avg episode reward: [(0, '7.472')] +[2025-08-03 20:18:17,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3252224. Throughput: 0: 973.6. Samples: 812548. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:18:17,914][02632] Avg episode reward: [(0, '7.116')] +[2025-08-03 20:18:22,913][02632] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 3272704. Throughput: 0: 969.1. Samples: 817616. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:18:22,916][02632] Avg episode reward: [(0, '7.973')] +[2025-08-03 20:18:23,511][02899] Updated weights for policy 0, policy_version 800 (0.0014) +[2025-08-03 20:18:27,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3293184. Throughput: 0: 977.6. Samples: 824182. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:18:27,917][02632] Avg episode reward: [(0, '8.305')] +[2025-08-03 20:18:27,922][02886] Saving new best policy, reward=8.305! +[2025-08-03 20:18:32,915][02632] Fps is (10 sec: 3685.7, 60 sec: 3822.8, 300 sec: 3873.9). Total num frames: 3309568. Throughput: 0: 961.1. Samples: 826758. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:18:32,919][02632] Avg episode reward: [(0, '8.428')] +[2025-08-03 20:18:32,925][02886] Saving new best policy, reward=8.428! +[2025-08-03 20:18:34,439][02899] Updated weights for policy 0, policy_version 810 (0.0012) +[2025-08-03 20:18:37,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3330048. Throughput: 0: 977.7. Samples: 832394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:18:37,914][02632] Avg episode reward: [(0, '8.744')] +[2025-08-03 20:18:37,916][02886] Saving new best policy, reward=8.744! +[2025-08-03 20:18:42,912][02632] Fps is (10 sec: 4506.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3354624. Throughput: 0: 978.9. Samples: 838916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:18:42,914][02632] Avg episode reward: [(0, '8.556')] +[2025-08-03 20:18:44,114][02899] Updated weights for policy 0, policy_version 820 (0.0015) +[2025-08-03 20:18:47,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3366912. Throughput: 0: 952.8. Samples: 841018. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:18:47,914][02632] Avg episode reward: [(0, '8.924')] +[2025-08-03 20:18:47,916][02886] Saving new best policy, reward=8.924! +[2025-08-03 20:18:52,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3391488. Throughput: 0: 982.1. Samples: 847174. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:18:52,918][02632] Avg episode reward: [(0, '9.093')] +[2025-08-03 20:18:52,925][02886] Saving new best policy, reward=9.093! +[2025-08-03 20:18:54,844][02899] Updated weights for policy 0, policy_version 830 (0.0012) +[2025-08-03 20:18:57,913][02632] Fps is (10 sec: 4505.3, 60 sec: 3959.4, 300 sec: 3901.6). Total num frames: 3411968. Throughput: 0: 971.2. Samples: 853212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:18:57,915][02632] Avg episode reward: [(0, '9.790')] +[2025-08-03 20:18:57,923][02886] Saving new best policy, reward=9.790! +[2025-08-03 20:19:02,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3428352. Throughput: 0: 951.0. 
Samples: 855342. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:02,923][02632] Avg episode reward: [(0, '10.043')] +[2025-08-03 20:19:02,931][02886] Saving new best policy, reward=10.043! +[2025-08-03 20:19:05,636][02899] Updated weights for policy 0, policy_version 840 (0.0013) +[2025-08-03 20:19:07,912][02632] Fps is (10 sec: 3686.7, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3448832. Throughput: 0: 985.3. Samples: 861954. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-03 20:19:07,914][02632] Avg episode reward: [(0, '10.628')] +[2025-08-03 20:19:07,915][02886] Saving new best policy, reward=10.628! +[2025-08-03 20:19:12,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3887.7). Total num frames: 3465216. Throughput: 0: 962.2. Samples: 867480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:19:12,918][02632] Avg episode reward: [(0, '11.314')] +[2025-08-03 20:19:12,932][02886] Saving new best policy, reward=11.314! +[2025-08-03 20:19:16,295][02899] Updated weights for policy 0, policy_version 850 (0.0012) +[2025-08-03 20:19:17,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3485696. Throughput: 0: 964.6. Samples: 870164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:17,916][02632] Avg episode reward: [(0, '11.483')] +[2025-08-03 20:19:17,919][02886] Saving new best policy, reward=11.483! +[2025-08-03 20:19:22,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3506176. Throughput: 0: 985.4. Samples: 876736. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:22,917][02632] Avg episode reward: [(0, '11.372')] +[2025-08-03 20:19:22,963][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000857_3510272.pth... +[2025-08-03 20:19:23,050][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000627_2568192.pth +[2025-08-03 20:19:26,833][02899] Updated weights for policy 0, policy_version 860 (0.0012) +[2025-08-03 20:19:27,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3522560. Throughput: 0: 951.5. Samples: 881734. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:19:27,922][02632] Avg episode reward: [(0, '10.521')] +[2025-08-03 20:19:32,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3915.5). Total num frames: 3547136. Throughput: 0: 977.1. Samples: 884988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-03 20:19:32,914][02632] Avg episode reward: [(0, '10.504')] +[2025-08-03 20:19:36,457][02899] Updated weights for policy 0, policy_version 870 (0.0013) +[2025-08-03 20:19:37,912][02632] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3567616. Throughput: 0: 988.2. Samples: 891642. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:37,914][02632] Avg episode reward: [(0, '10.904')] +[2025-08-03 20:19:42,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3584000. Throughput: 0: 966.3. Samples: 896696. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:42,915][02632] Avg episode reward: [(0, '11.269')] +[2025-08-03 20:19:47,073][02899] Updated weights for policy 0, policy_version 880 (0.0013) +[2025-08-03 20:19:47,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3604480. Throughput: 0: 993.3. Samples: 900040. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:47,917][02632] Avg episode reward: [(0, '11.378')] +[2025-08-03 20:19:52,917][02632] Fps is (10 sec: 4093.9, 60 sec: 3890.9, 300 sec: 3901.5). Total num frames: 3624960. Throughput: 0: 993.5. Samples: 906668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:52,922][02632] Avg episode reward: [(0, '11.439')] +[2025-08-03 20:19:57,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3887.8). Total num frames: 3641344. Throughput: 0: 981.5. Samples: 911648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:19:57,915][02632] Avg episode reward: [(0, '10.924')] +[2025-08-03 20:19:57,924][02899] Updated weights for policy 0, policy_version 890 (0.0013) +[2025-08-03 20:20:02,912][02632] Fps is (10 sec: 4098.1, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3665920. Throughput: 0: 996.4. Samples: 915002. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:20:02,913][02632] Avg episode reward: [(0, '10.914')] +[2025-08-03 20:20:07,758][02899] Updated weights for policy 0, policy_version 900 (0.0012) +[2025-08-03 20:20:07,915][02632] Fps is (10 sec: 4504.5, 60 sec: 3959.3, 300 sec: 3915.5). Total num frames: 3686400. Throughput: 0: 987.7. Samples: 921184. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:20:07,916][02632] Avg episode reward: [(0, '12.074')] +[2025-08-03 20:20:07,918][02886] Saving new best policy, reward=12.074! +[2025-08-03 20:20:12,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 3702784. Throughput: 0: 1000.1. Samples: 926738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:20:12,919][02632] Avg episode reward: [(0, '11.615')] +[2025-08-03 20:20:17,754][02899] Updated weights for policy 0, policy_version 910 (0.0012) +[2025-08-03 20:20:17,913][02632] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3727360. Throughput: 0: 1001.3. Samples: 930046. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:20:17,919][02632] Avg episode reward: [(0, '11.942')] +[2025-08-03 20:20:22,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3743744. Throughput: 0: 975.5. Samples: 935538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:20:22,914][02632] Avg episode reward: [(0, '12.913')] +[2025-08-03 20:20:22,921][02886] Saving new best policy, reward=12.913! +[2025-08-03 20:20:27,915][02632] Fps is (10 sec: 3685.5, 60 sec: 4027.6, 300 sec: 3929.4). Total num frames: 3764224. Throughput: 0: 995.5. Samples: 941498. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:20:27,919][02632] Avg episode reward: [(0, '14.229')] +[2025-08-03 20:20:27,925][02886] Saving new best policy, reward=14.229! +[2025-08-03 20:20:28,723][02899] Updated weights for policy 0, policy_version 920 (0.0017) +[2025-08-03 20:20:32,913][02632] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 3929.4). Total num frames: 3784704. Throughput: 0: 994.1. Samples: 944776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:20:32,914][02632] Avg episode reward: [(0, '14.683')] +[2025-08-03 20:20:32,922][02886] Saving new best policy, reward=14.683! +[2025-08-03 20:20:37,912][02632] Fps is (10 sec: 3687.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3801088. Throughput: 0: 958.6. Samples: 949798. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:20:37,917][02632] Avg episode reward: [(0, '15.836')] +[2025-08-03 20:20:37,924][02886] Saving new best policy, reward=15.836! +[2025-08-03 20:20:39,496][02899] Updated weights for policy 0, policy_version 930 (0.0021) +[2025-08-03 20:20:42,912][02632] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3821568. Throughput: 0: 993.8. Samples: 956368. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:20:42,914][02632] Avg episode reward: [(0, '15.629')] +[2025-08-03 20:20:47,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3842048. Throughput: 0: 992.6. Samples: 959668. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:20:47,914][02632] Avg episode reward: [(0, '13.374')] +[2025-08-03 20:20:49,851][02899] Updated weights for policy 0, policy_version 940 (0.0012) +[2025-08-03 20:20:52,913][02632] Fps is (10 sec: 3686.3, 60 sec: 3891.5, 300 sec: 3901.6). Total num frames: 3858432. Throughput: 0: 967.7. Samples: 964728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:20:52,914][02632] Avg episode reward: [(0, '13.418')] +[2025-08-03 20:20:57,912][02632] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 3883008. Throughput: 0: 990.3. Samples: 971300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:20:57,914][02632] Avg episode reward: [(0, '13.807')] +[2025-08-03 20:20:59,430][02899] Updated weights for policy 0, policy_version 950 (0.0014) +[2025-08-03 20:21:02,912][02632] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3899392. Throughput: 0: 987.5. Samples: 974484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-03 20:21:02,914][02632] Avg episode reward: [(0, '13.711')] +[2025-08-03 20:21:07,915][02632] Fps is (10 sec: 3685.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3919872. Throughput: 0: 979.5. Samples: 979616. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-03 20:21:07,920][02632] Avg episode reward: [(0, '15.170')] +[2025-08-03 20:21:10,261][02899] Updated weights for policy 0, policy_version 960 (0.0017) +[2025-08-03 20:21:12,912][02632] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 3940352. Throughput: 0: 993.2. Samples: 986188. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:21:12,914][02632] Avg episode reward: [(0, '14.776')] +[2025-08-03 20:21:17,912][02632] Fps is (10 sec: 3687.4, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 3956736. Throughput: 0: 979.8. Samples: 988868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-03 20:21:17,916][02632] Avg episode reward: [(0, '14.668')] +[2025-08-03 20:21:21,095][02899] Updated weights for policy 0, policy_version 970 (0.0017) +[2025-08-03 20:21:22,912][02632] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 3977216. Throughput: 0: 992.7. Samples: 994470. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:21:22,913][02632] Avg episode reward: [(0, '14.649')] +[2025-08-03 20:21:22,921][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000972_3981312.pth... +[2025-08-03 20:21:23,014][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000742_3039232.pth +[2025-08-03 20:21:27,915][02632] Fps is (10 sec: 4504.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 4001792. Throughput: 0: 989.9. Samples: 1000916. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-03 20:21:27,916][02632] Avg episode reward: [(0, '14.402')] +[2025-08-03 20:21:28,841][02886] Stopping Batcher_0... +[2025-08-03 20:21:28,842][02886] Loop batcher_evt_loop terminating... +[2025-08-03 20:21:28,843][02632] Component Batcher_0 stopped! +[2025-08-03 20:21:28,849][02632] Component RolloutWorker_w0 process died already! Don't wait for it. +[2025-08-03 20:21:28,850][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:21:28,850][02632] Component RolloutWorker_w1 process died already! Don't wait for it. +[2025-08-03 20:21:28,855][02632] Component RolloutWorker_w2 process died already! Don't wait for it. +[2025-08-03 20:21:28,856][02632] Component RolloutWorker_w3 process died already! Don't wait for it. +[2025-08-03 20:21:28,959][02899] Weights refcount: 2 0 +[2025-08-03 20:21:28,967][02632] Component InferenceWorker_p0-w0 stopped! +[2025-08-03 20:21:28,966][02899] Stopping InferenceWorker_p0-w0... +[2025-08-03 20:21:28,979][02899] Loop inference_proc0-0_evt_loop terminating... +[2025-08-03 20:21:29,011][02886] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000857_3510272.pth +[2025-08-03 20:21:29,024][02886] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:21:29,191][02886] Stopping LearnerWorker_p0... +[2025-08-03 20:21:29,192][02886] Loop learner_proc0_evt_loop terminating... +[2025-08-03 20:21:29,192][02632] Component LearnerWorker_p0 stopped! +[2025-08-03 20:21:29,261][02632] Component RolloutWorker_w7 stopped! +[2025-08-03 20:21:29,262][02907] Stopping RolloutWorker_w7... +[2025-08-03 20:21:29,274][02907] Loop rollout_proc7_evt_loop terminating... +[2025-08-03 20:21:29,322][02905] Stopping RolloutWorker_w5... +[2025-08-03 20:21:29,324][02632] Component RolloutWorker_w5 stopped! +[2025-08-03 20:21:29,330][02905] Loop rollout_proc5_evt_loop terminating... +[2025-08-03 20:21:29,449][02632] Component RolloutWorker_w4 stopped! +[2025-08-03 20:21:29,455][02904] Stopping RolloutWorker_w4... +[2025-08-03 20:21:29,456][02904] Loop rollout_proc4_evt_loop terminating... +[2025-08-03 20:21:29,528][02632] Component RolloutWorker_w6 stopped! +[2025-08-03 20:21:29,529][02632] Waiting for process learner_proc0 to stop... +[2025-08-03 20:21:29,530][02906] Stopping RolloutWorker_w6... +[2025-08-03 20:21:29,531][02906] Loop rollout_proc6_evt_loop terminating... +[2025-08-03 20:21:31,171][02632] Waiting for process inference_proc0-0 to join... +[2025-08-03 20:21:31,177][02632] Waiting for process rollout_proc0 to join... +[2025-08-03 20:21:31,178][02632] Waiting for process rollout_proc1 to join... +[2025-08-03 20:21:31,184][02632] Waiting for process rollout_proc2 to join... +[2025-08-03 20:21:31,184][02632] Waiting for process rollout_proc3 to join... +[2025-08-03 20:21:31,185][02632] Waiting for process rollout_proc4 to join... +[2025-08-03 20:21:32,209][02632] Waiting for process rollout_proc5 to join... +[2025-08-03 20:21:32,210][02632] Waiting for process rollout_proc6 to join... +[2025-08-03 20:21:32,214][02632] Waiting for process rollout_proc7 to join... 
+[2025-08-03 20:21:32,215][02632] Batcher 0 profile tree view: +batching: 21.2339, releasing_batches: 0.0264 +[2025-08-03 20:21:32,217][02632] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0092 + wait_policy_total: 413.6809 +update_model: 9.0492 + weight_update: 0.0012 +one_step: 0.0022 + handle_policy_step: 596.4835 + deserialize: 14.8088, stack: 3.6541, obs_to_device_normalize: 135.1099, forward: 310.6817, send_messages: 22.2921 + prepare_outputs: 83.7538 + to_cpu: 52.3756 +[2025-08-03 20:21:32,218][02632] Learner 0 profile tree view: +misc: 0.0064, prepare_batch: 12.1557 +train: 66.9718 + epoch_init: 0.0054, minibatch_init: 0.0097, losses_postprocess: 0.6009, kl_divergence: 0.5726, after_optimizer: 32.1211 + calculate_losses: 22.7371 + losses_init: 0.0035, forward_head: 1.1507, bptt_initial: 15.7829, tail: 0.8248, advantages_returns: 0.2078, losses: 2.9167 + bptt: 1.6383 + bptt_forward_core: 1.5666 + update: 10.4995 + clip: 0.9266 +[2025-08-03 20:21:32,219][02632] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.4415, enqueue_policy_requests: 155.5528, env_step: 729.5225, overhead: 18.0487, complete_rollouts: 6.0526 +save_policy_outputs: 24.7048 + split_output_tensors: 9.4038 +[2025-08-03 20:21:32,220][02632] Loop Runner_EvtLoop terminating... +[2025-08-03 20:21:32,221][02632] Runner profile tree view: +main_loop: 1083.9575 +[2025-08-03 20:21:32,222][02632] Collected {0: 4005888}, FPS: 3695.6 +[2025-08-03 20:22:22,611][02632] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-03 20:22:22,611][02632] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-03 20:22:22,613][02632] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-03 20:22:22,614][02632] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-03 20:22:22,616][02632] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-03 20:22:22,617][02632] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-03 20:22:22,618][02632] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-03 20:22:22,619][02632] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-03 20:22:22,620][02632] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-03 20:22:22,621][02632] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-03 20:22:22,622][02632] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-03 20:22:22,623][02632] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-03 20:22:22,624][02632] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-03 20:22:22,625][02632] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
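
The shutdown summary above ("Collected {0: 4005888}, FPS: 3695.6" together with "main_loop: 1083.9575" in the Runner profile tree view) is consistent with the reported FPS simply being total collected environment frames divided by the runner's main-loop wall-clock time. A minimal sanity check of that reading, with both numbers copied from the log (the division itself is an assumption about how the figure is derived, not something the log states):

# Assumption: reported FPS = collected env frames / main_loop seconds.
total_frames = 4_005_888        # from "Collected {0: 4005888}"
main_loop_seconds = 1083.9575   # from "Runner profile tree view: main_loop"
print(f"{total_frames / main_loop_seconds:.1f}")  # prints 3695.6, matching the logged value
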
+[2025-08-03 20:22:22,626][02632] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-03 20:22:22,654][02632] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-03 20:22:22,657][02632] RunningMeanStd input shape: (3, 72, 128) +[2025-08-03 20:22:22,659][02632] RunningMeanStd input shape: (1,) +[2025-08-03 20:22:22,671][02632] ConvEncoder: input_channels=3 +[2025-08-03 20:22:22,772][02632] Conv encoder output size: 512 +[2025-08-03 20:22:22,773][02632] Policy head output size: 512 +[2025-08-03 20:22:23,033][02632] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:22:23,035][02632] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-03 20:22:23,038][02632] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:22:23,039][02632] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. 
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-03 20:22:23,041][02632] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:22:23,043][02632] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-03 20:22:46,264][02632] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-03 20:22:46,265][02632] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-03 20:22:46,266][02632] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-03 20:22:46,267][02632] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-03 20:22:46,268][02632] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-03 20:22:46,269][02632] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-03 20:22:46,270][02632] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-03 20:22:46,271][02632] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-03 20:22:46,272][02632] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-03 20:22:46,273][02632] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-03 20:22:46,274][02632] Adding new argument 'policy_index'=0 that is not in the saved config file! 
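
The three "Could not load from checkpoint" attempts above all fail with the same UnpicklingError: PyTorch 2.6 changed the default of torch.load to weights_only=True, and this checkpoint's pickle references numpy.core.multiarray.scalar, which is not on the default allowlist. The error message itself points at torch.serialization.add_safe_globals. A minimal sketch of that workaround for a checkpoint you trust; the checkpoint path and the offending global are taken from the log, while the CPU map_location and the key printout are illustrative only (the rerun at 20:22:46 further down does load this same checkpoint successfully):

import torch
from numpy.core.multiarray import scalar  # the global named in the UnpicklingError

# Allowlist the NumPy scalar global, then retry the safe (weights_only) load.
# Only do this for checkpoints you trust; option (1) from the error message,
# torch.load(..., weights_only=False), drops that safety net entirely.
torch.serialization.add_safe_globals([scalar])

checkpoint_path = ("/content/train_dir/default_experiment/checkpoint_p0/"
                   "checkpoint_000000978_4005888.pth")
state = torch.load(checkpoint_path, map_location="cpu", weights_only=True)
print(sorted(state.keys()))  # inspect what the checkpoint actually contains
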
+[2025-08-03 20:22:46,274][02632] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-03 20:22:46,275][02632] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-03 20:22:46,276][02632] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-03 20:22:46,277][02632] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-03 20:22:46,305][02632] RunningMeanStd input shape: (3, 72, 128) +[2025-08-03 20:22:46,306][02632] RunningMeanStd input shape: (1,) +[2025-08-03 20:22:46,316][02632] ConvEncoder: input_channels=3 +[2025-08-03 20:22:46,352][02632] Conv encoder output size: 512 +[2025-08-03 20:22:46,354][02632] Policy head output size: 512 +[2025-08-03 20:22:46,371][02632] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:22:46,400][02632] Loaded checkpoint successfully. +[2025-08-03 20:22:47,088][02632] Num frames 100... +[2025-08-03 20:22:47,213][02632] Num frames 200... +[2025-08-03 20:22:47,340][02632] Num frames 300... +[2025-08-03 20:22:47,471][02632] Num frames 400... +[2025-08-03 20:22:47,597][02632] Num frames 500... +[2025-08-03 20:22:47,725][02632] Num frames 600... +[2025-08-03 20:22:47,864][02632] Num frames 700... +[2025-08-03 20:22:47,991][02632] Num frames 800... +[2025-08-03 20:22:48,118][02632] Num frames 900... +[2025-08-03 20:22:48,247][02632] Num frames 1000... +[2025-08-03 20:22:48,414][02632] Avg episode rewards: #0: 23.880, true rewards: #0: 10.880 +[2025-08-03 20:22:48,415][02632] Avg episode reward: 23.880, avg true_objective: 10.880 +[2025-08-03 20:22:48,433][02632] Num frames 1100... +[2025-08-03 20:22:48,560][02632] Num frames 1200... +[2025-08-03 20:22:48,685][02632] Num frames 1300... +[2025-08-03 20:22:48,809][02632] Num frames 1400... +[2025-08-03 20:22:48,943][02632] Num frames 1500... +[2025-08-03 20:22:49,083][02632] Avg episode rewards: #0: 14.840, true rewards: #0: 7.840 +[2025-08-03 20:22:49,083][02632] Avg episode reward: 14.840, avg true_objective: 7.840 +[2025-08-03 20:22:49,125][02632] Num frames 1600... +[2025-08-03 20:22:49,292][02632] Num frames 1700... +[2025-08-03 20:22:49,466][02632] Num frames 1800... +[2025-08-03 20:22:49,632][02632] Num frames 1900... +[2025-08-03 20:22:49,801][02632] Num frames 2000... +[2025-08-03 20:22:49,984][02632] Num frames 2100... +[2025-08-03 20:22:50,157][02632] Num frames 2200... +[2025-08-03 20:22:50,329][02632] Num frames 2300... +[2025-08-03 20:22:50,496][02632] Num frames 2400... +[2025-08-03 20:22:50,663][02632] Avg episode rewards: #0: 16.547, true rewards: #0: 8.213 +[2025-08-03 20:22:50,664][02632] Avg episode reward: 16.547, avg true_objective: 8.213 +[2025-08-03 20:22:50,728][02632] Num frames 2500... +[2025-08-03 20:22:50,900][02632] Num frames 2600... +[2025-08-03 20:22:51,089][02632] Num frames 2700... +[2025-08-03 20:22:51,270][02632] Num frames 2800... +[2025-08-03 20:22:51,421][02632] Num frames 2900... +[2025-08-03 20:22:51,547][02632] Num frames 3000... +[2025-08-03 20:22:51,693][02632] Avg episode rewards: #0: 14.680, true rewards: #0: 7.680 +[2025-08-03 20:22:51,694][02632] Avg episode reward: 14.680, avg true_objective: 7.680 +[2025-08-03 20:22:51,732][02632] Num frames 3100... +[2025-08-03 20:22:51,857][02632] Num frames 3200... +[2025-08-03 20:22:51,986][02632] Num frames 3300... +[2025-08-03 20:22:52,125][02632] Num frames 3400... +[2025-08-03 20:22:52,251][02632] Num frames 3500... 
+[2025-08-03 20:22:52,384][02632] Num frames 3600... +[2025-08-03 20:22:52,513][02632] Num frames 3700... +[2025-08-03 20:22:52,641][02632] Num frames 3800... +[2025-08-03 20:22:52,706][02632] Avg episode rewards: #0: 14.816, true rewards: #0: 7.616 +[2025-08-03 20:22:52,707][02632] Avg episode reward: 14.816, avg true_objective: 7.616 +[2025-08-03 20:22:52,825][02632] Num frames 3900... +[2025-08-03 20:22:52,953][02632] Num frames 4000... +[2025-08-03 20:22:53,090][02632] Num frames 4100... +[2025-08-03 20:22:53,220][02632] Num frames 4200... +[2025-08-03 20:22:53,347][02632] Num frames 4300... +[2025-08-03 20:22:53,476][02632] Num frames 4400... +[2025-08-03 20:22:53,603][02632] Num frames 4500... +[2025-08-03 20:22:53,752][02632] Avg episode rewards: #0: 14.627, true rewards: #0: 7.627 +[2025-08-03 20:22:53,754][02632] Avg episode reward: 14.627, avg true_objective: 7.627 +[2025-08-03 20:22:53,787][02632] Num frames 4600... +[2025-08-03 20:22:53,920][02632] Num frames 4700... +[2025-08-03 20:22:54,048][02632] Num frames 4800... +[2025-08-03 20:22:54,184][02632] Num frames 4900... +[2025-08-03 20:22:54,314][02632] Num frames 5000... +[2025-08-03 20:22:54,440][02632] Num frames 5100... +[2025-08-03 20:22:54,566][02632] Num frames 5200... +[2025-08-03 20:22:54,643][02632] Avg episode rewards: #0: 14.309, true rewards: #0: 7.451 +[2025-08-03 20:22:54,643][02632] Avg episode reward: 14.309, avg true_objective: 7.451 +[2025-08-03 20:22:54,748][02632] Num frames 5300... +[2025-08-03 20:22:54,876][02632] Num frames 5400... +[2025-08-03 20:22:55,006][02632] Num frames 5500... +[2025-08-03 20:22:55,146][02632] Num frames 5600... +[2025-08-03 20:22:55,272][02632] Num frames 5700... +[2025-08-03 20:22:55,399][02632] Num frames 5800... +[2025-08-03 20:22:55,526][02632] Num frames 5900... +[2025-08-03 20:22:55,688][02632] Avg episode rewards: #0: 14.105, true rewards: #0: 7.480 +[2025-08-03 20:22:55,689][02632] Avg episode reward: 14.105, avg true_objective: 7.480 +[2025-08-03 20:22:55,711][02632] Num frames 6000... +[2025-08-03 20:22:55,835][02632] Num frames 6100... +[2025-08-03 20:22:55,959][02632] Num frames 6200... +[2025-08-03 20:22:56,085][02632] Num frames 6300... +[2025-08-03 20:22:56,224][02632] Num frames 6400... +[2025-08-03 20:22:56,356][02632] Num frames 6500... +[2025-08-03 20:22:56,484][02632] Num frames 6600... +[2025-08-03 20:22:56,616][02632] Num frames 6700... +[2025-08-03 20:22:56,745][02632] Num frames 6800... +[2025-08-03 20:22:56,906][02632] Num frames 6900... +[2025-08-03 20:22:57,034][02632] Num frames 7000... +[2025-08-03 20:22:57,190][02632] Num frames 7100... +[2025-08-03 20:22:57,321][02632] Num frames 7200... +[2025-08-03 20:22:57,448][02632] Num frames 7300... +[2025-08-03 20:22:57,584][02632] Num frames 7400... +[2025-08-03 20:22:57,713][02632] Num frames 7500... +[2025-08-03 20:22:57,844][02632] Num frames 7600... +[2025-08-03 20:22:57,974][02632] Num frames 7700... +[2025-08-03 20:22:58,085][02632] Avg episode rewards: #0: 16.938, true rewards: #0: 8.604 +[2025-08-03 20:22:58,086][02632] Avg episode reward: 16.938, avg true_objective: 8.604 +[2025-08-03 20:22:58,157][02632] Num frames 7800... +[2025-08-03 20:22:58,295][02632] Num frames 7900... +[2025-08-03 20:22:58,423][02632] Num frames 8000... +[2025-08-03 20:22:58,549][02632] Num frames 8100... +[2025-08-03 20:22:58,675][02632] Num frames 8200... +[2025-08-03 20:22:58,801][02632] Num frames 8300... +[2025-08-03 20:22:58,925][02632] Num frames 8400... +[2025-08-03 20:22:59,052][02632] Num frames 8500... 
+[2025-08-03 20:22:59,179][02632] Num frames 8600... +[2025-08-03 20:22:59,320][02632] Num frames 8700... +[2025-08-03 20:22:59,494][02632] Avg episode rewards: #0: 17.394, true rewards: #0: 8.794 +[2025-08-03 20:22:59,495][02632] Avg episode reward: 17.394, avg true_objective: 8.794 +[2025-08-03 20:23:53,235][02632] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-08-03 20:26:42,608][02632] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-03 20:26:42,609][02632] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-03 20:26:42,610][02632] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-03 20:26:42,611][02632] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-03 20:26:42,612][02632] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-03 20:26:42,613][02632] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-03 20:26:42,614][02632] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-03 20:26:42,615][02632] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-03 20:26:42,616][02632] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-03 20:26:42,617][02632] Adding new argument 'hf_repository'='ArkenB/doom_health_gathering_supreme' that is not in the saved config file! +[2025-08-03 20:26:42,619][02632] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-03 20:26:42,619][02632] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-03 20:26:42,620][02632] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-03 20:26:42,622][02632] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-03 20:26:42,622][02632] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-03 20:26:42,647][02632] RunningMeanStd input shape: (3, 72, 128) +[2025-08-03 20:26:42,649][02632] RunningMeanStd input shape: (1,) +[2025-08-03 20:26:42,660][02632] ConvEncoder: input_channels=3 +[2025-08-03 20:26:42,693][02632] Conv encoder output size: 512 +[2025-08-03 20:26:42,694][02632] Policy head output size: 512 +[2025-08-03 20:26:42,711][02632] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:26:42,713][02632] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. 
Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-03 20:26:42,714][02632] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:26:42,716][02632] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-03 20:26:42,717][02632] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-03 20:26:42,718][02632] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. 
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-03 20:27:14,542][02632] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-03 20:27:14,543][02632] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-03 20:27:14,544][02632] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-03 20:27:14,545][02632] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-03 20:27:14,546][02632] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-03 20:27:14,547][02632] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-03 20:27:14,548][02632] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-03 20:27:14,549][02632] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-03 20:27:14,550][02632] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-03 20:27:14,551][02632] Adding new argument 'hf_repository'='ArkenB/doom_health_gathering_supreme' that is not in the saved config file! +[2025-08-03 20:27:14,552][02632] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-03 20:27:14,552][02632] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-03 20:27:14,553][02632] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-03 20:27:14,554][02632] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-03 20:27:14,555][02632] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-03 20:27:14,581][02632] RunningMeanStd input shape: (3, 72, 128) +[2025-08-03 20:27:14,582][02632] RunningMeanStd input shape: (1,) +[2025-08-03 20:27:14,591][02632] ConvEncoder: input_channels=3 +[2025-08-03 20:27:14,623][02632] Conv encoder output size: 512 +[2025-08-03 20:27:14,624][02632] Policy head output size: 512 +[2025-08-03 20:27:15,055][02632] Num frames 100... +[2025-08-03 20:27:15,191][02632] Num frames 200... +[2025-08-03 20:27:15,322][02632] Num frames 300... +[2025-08-03 20:27:15,447][02632] Num frames 400... +[2025-08-03 20:27:15,590][02632] Num frames 500... +[2025-08-03 20:27:15,716][02632] Num frames 600... +[2025-08-03 20:27:15,782][02632] Avg episode rewards: #0: 12.080, true rewards: #0: 6.080 +[2025-08-03 20:27:15,783][02632] Avg episode reward: 12.080, avg true_objective: 6.080 +[2025-08-03 20:27:15,897][02632] Num frames 700... +[2025-08-03 20:27:16,024][02632] Num frames 800... +[2025-08-03 20:27:16,150][02632] Num frames 900... +[2025-08-03 20:27:16,285][02632] Num frames 1000... +[2025-08-03 20:27:16,412][02632] Num frames 1100... +[2025-08-03 20:27:16,537][02632] Num frames 1200... 
+[2025-08-03 20:27:16,694][02632] Avg episode rewards: #0: 12.900, true rewards: #0: 6.400 +[2025-08-03 20:27:16,695][02632] Avg episode reward: 12.900, avg true_objective: 6.400 +[2025-08-03 20:27:16,723][02632] Num frames 1300... +[2025-08-03 20:27:16,848][02632] Num frames 1400... +[2025-08-03 20:27:16,974][02632] Num frames 1500... +[2025-08-03 20:27:17,097][02632] Num frames 1600... +[2025-08-03 20:27:17,230][02632] Num frames 1700... +[2025-08-03 20:27:17,360][02632] Num frames 1800... +[2025-08-03 20:27:17,444][02632] Avg episode rewards: #0: 11.080, true rewards: #0: 6.080 +[2025-08-03 20:27:17,445][02632] Avg episode reward: 11.080, avg true_objective: 6.080 +[2025-08-03 20:27:17,542][02632] Num frames 1900... +[2025-08-03 20:27:17,667][02632] Num frames 2000... +[2025-08-03 20:27:17,790][02632] Num frames 2100... +[2025-08-03 20:27:17,915][02632] Num frames 2200... +[2025-08-03 20:27:18,083][02632] Num frames 2300... +[2025-08-03 20:27:18,264][02632] Num frames 2400... +[2025-08-03 20:27:18,485][02632] Avg episode rewards: #0: 10.990, true rewards: #0: 6.240 +[2025-08-03 20:27:18,488][02632] Avg episode reward: 10.990, avg true_objective: 6.240 +[2025-08-03 20:27:18,498][02632] Num frames 2500... +[2025-08-03 20:27:18,666][02632] Num frames 2600... +[2025-08-03 20:27:18,830][02632] Num frames 2700... +[2025-08-03 20:27:18,992][02632] Num frames 2800... +[2025-08-03 20:27:19,156][02632] Num frames 2900... +[2025-08-03 20:27:19,340][02632] Num frames 3000... +[2025-08-03 20:27:19,516][02632] Num frames 3100... +[2025-08-03 20:27:19,692][02632] Num frames 3200... +[2025-08-03 20:27:19,868][02632] Num frames 3300... +[2025-08-03 20:27:20,050][02632] Num frames 3400... +[2025-08-03 20:27:20,191][02632] Num frames 3500... +[2025-08-03 20:27:20,294][02632] Avg episode rewards: #0: 12.874, true rewards: #0: 7.074 +[2025-08-03 20:27:20,294][02632] Avg episode reward: 12.874, avg true_objective: 7.074 +[2025-08-03 20:27:20,384][02632] Num frames 3600... +[2025-08-03 20:27:20,507][02632] Num frames 3700... +[2025-08-03 20:27:20,632][02632] Num frames 3800... +[2025-08-03 20:27:20,765][02632] Num frames 3900... +[2025-08-03 20:27:20,891][02632] Num frames 4000... +[2025-08-03 20:27:21,016][02632] Num frames 4100... +[2025-08-03 20:27:21,142][02632] Num frames 4200... +[2025-08-03 20:27:21,269][02632] Num frames 4300... +[2025-08-03 20:27:21,402][02632] Num frames 4400... +[2025-08-03 20:27:21,530][02632] Num frames 4500... +[2025-08-03 20:27:21,657][02632] Num frames 4600... +[2025-08-03 20:27:21,744][02632] Avg episode rewards: #0: 14.542, true rewards: #0: 7.708 +[2025-08-03 20:27:21,745][02632] Avg episode reward: 14.542, avg true_objective: 7.708 +[2025-08-03 20:27:21,842][02632] Num frames 4700... +[2025-08-03 20:27:21,966][02632] Num frames 4800... +[2025-08-03 20:27:22,093][02632] Num frames 4900... +[2025-08-03 20:27:22,218][02632] Num frames 5000... +[2025-08-03 20:27:22,344][02632] Num frames 5100... +[2025-08-03 20:27:22,480][02632] Num frames 5200... +[2025-08-03 20:27:22,604][02632] Num frames 5300... +[2025-08-03 20:27:22,731][02632] Num frames 5400... +[2025-08-03 20:27:22,854][02632] Num frames 5500... +[2025-08-03 20:27:22,977][02632] Num frames 5600... +[2025-08-03 20:27:23,110][02632] Num frames 5700... +[2025-08-03 20:27:23,237][02632] Num frames 5800... +[2025-08-03 20:27:23,362][02632] Num frames 5900... 
+[2025-08-03 20:27:23,425][02632] Avg episode rewards: #0: 17.007, true rewards: #0: 8.436 +[2025-08-03 20:27:23,426][02632] Avg episode reward: 17.007, avg true_objective: 8.436 +[2025-08-03 20:27:23,565][02632] Num frames 6000... +[2025-08-03 20:27:23,700][02632] Num frames 6100... +[2025-08-03 20:27:23,830][02632] Num frames 6200... +[2025-08-03 20:27:23,956][02632] Num frames 6300... +[2025-08-03 20:27:24,087][02632] Num frames 6400... +[2025-08-03 20:27:24,218][02632] Num frames 6500... +[2025-08-03 20:27:24,349][02632] Num frames 6600... +[2025-08-03 20:27:24,474][02632] Num frames 6700... +[2025-08-03 20:27:24,610][02632] Num frames 6800... +[2025-08-03 20:27:24,748][02632] Avg episode rewards: #0: 17.206, true rewards: #0: 8.581 +[2025-08-03 20:27:24,748][02632] Avg episode reward: 17.206, avg true_objective: 8.581 +[2025-08-03 20:27:24,792][02632] Num frames 6900... +[2025-08-03 20:27:24,915][02632] Num frames 7000... +[2025-08-03 20:27:25,037][02632] Num frames 7100... +[2025-08-03 20:27:25,164][02632] Num frames 7200... +[2025-08-03 20:27:25,289][02632] Num frames 7300... +[2025-08-03 20:27:25,414][02632] Num frames 7400... +[2025-08-03 20:27:25,484][02632] Avg episode rewards: #0: 16.121, true rewards: #0: 8.232 +[2025-08-03 20:27:25,485][02632] Avg episode reward: 16.121, avg true_objective: 8.232 +[2025-08-03 20:27:25,615][02632] Num frames 7500... +[2025-08-03 20:27:25,745][02632] Num frames 7600... +[2025-08-03 20:27:25,873][02632] Num frames 7700... +[2025-08-03 20:27:26,000][02632] Num frames 7800... +[2025-08-03 20:27:26,088][02632] Avg episode rewards: #0: 15.125, true rewards: #0: 7.825 +[2025-08-03 20:27:26,089][02632] Avg episode reward: 15.125, avg true_objective: 7.825 +[2025-08-03 20:28:13,414][02632] Replay video saved to /content/train_dir/default_experiment/replay.mp4!