diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,50 +1,68 @@ -[2025-08-22 18:32:41,754][19241] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... -[2025-08-22 18:32:41,871][19241] Rollout worker 0 uses device cpu -[2025-08-22 18:32:41,873][19241] Rollout worker 1 uses device cpu -[2025-08-22 18:32:41,873][19241] Rollout worker 2 uses device cpu -[2025-08-22 18:32:41,874][19241] Rollout worker 3 uses device cpu -[2025-08-22 18:32:41,875][19241] Rollout worker 4 uses device cpu -[2025-08-22 18:32:41,876][19241] Rollout worker 5 uses device cpu -[2025-08-22 18:32:41,877][19241] Rollout worker 6 uses device cpu -[2025-08-22 18:32:41,878][19241] Rollout worker 7 uses device cpu -[2025-08-22 18:32:41,945][19241] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-22 18:32:41,947][19241] InferenceWorker_p0-w0: min num requests: 2 -[2025-08-22 18:32:41,969][19241] Starting all processes... -[2025-08-22 18:32:41,971][19241] Starting process learner_proc0 -[2025-08-22 18:32:42,068][19241] Starting all processes... -[2025-08-22 18:32:42,076][19241] Starting process inference_proc0-0 -[2025-08-22 18:32:42,078][19241] Starting process rollout_proc0 -[2025-08-22 18:32:42,080][19241] Starting process rollout_proc1 -[2025-08-22 18:32:42,081][19241] Starting process rollout_proc2 -[2025-08-22 18:32:42,081][19241] Starting process rollout_proc3 -[2025-08-22 18:32:42,082][19241] Starting process rollout_proc4 -[2025-08-22 18:32:42,083][19241] Starting process rollout_proc5 -[2025-08-22 18:32:42,083][19241] Starting process rollout_proc6 -[2025-08-22 18:32:42,083][19241] Starting process rollout_proc7 -[2025-08-22 18:32:44,745][19431] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-22 18:32:44,745][19431] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-08-22 18:32:44,745][19433] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:44,761][19434] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:44,791][19435] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:44,796][19418] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-22 18:32:44,797][19418] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-08-22 18:32:44,808][19432] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:44,836][19439] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:44,853][19418] Num visible devices: 1 -[2025-08-22 18:32:44,854][19418] Starting seed is not provided -[2025-08-22 18:32:44,855][19418] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-22 18:32:44,854][19431] Num visible devices: 1 -[2025-08-22 18:32:44,855][19418] Initializing actor-critic model on device cuda:0 -[2025-08-22 18:32:44,855][19418] RunningMeanStd input shape: (3, 72, 128) -[2025-08-22 18:32:44,864][19418] RunningMeanStd input shape: (1,) -[2025-08-22 18:32:44,873][19438] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:44,882][19418] ConvEncoder: input_channels=3 -[2025-08-22 18:32:44,933][19437] Worker 4 uses CPU cores [0, 
1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:44,954][19436] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] -[2025-08-22 18:32:45,117][19418] Conv encoder output size: 512 -[2025-08-22 18:32:45,118][19418] Policy head output size: 512 -[2025-08-22 18:32:45,175][19418] Created Actor Critic model with architecture: -[2025-08-22 18:32:45,175][19418] ActorCriticSharedWeights( +[2025-09-02 16:15:02,649][03057] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-09-02 16:15:02,651][03057] Rollout worker 0 uses device cpu +[2025-09-02 16:15:02,651][03057] Rollout worker 1 uses device cpu +[2025-09-02 16:15:02,652][03057] Rollout worker 2 uses device cpu +[2025-09-02 16:15:02,653][03057] Rollout worker 3 uses device cpu +[2025-09-02 16:15:02,654][03057] Rollout worker 4 uses device cpu +[2025-09-02 16:15:02,656][03057] Rollout worker 5 uses device cpu +[2025-09-02 16:15:02,657][03057] Rollout worker 6 uses device cpu +[2025-09-02 16:15:02,658][03057] Rollout worker 7 uses device cpu +[2025-09-02 16:15:02,660][03057] Rollout worker 8 uses device cpu +[2025-09-02 16:15:02,660][03057] Rollout worker 9 uses device cpu +[2025-09-02 16:15:03,884][03057] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-09-02 16:15:03,886][03057] InferenceWorker_p0-w0: min num requests: 3 +[2025-09-02 16:15:03,920][03057] Starting all processes... +[2025-09-02 16:15:03,921][03057] Starting process learner_proc0 +[2025-09-02 16:15:03,983][03057] Starting all processes... +[2025-09-02 16:15:03,990][03057] Starting process inference_proc0-0 +[2025-09-02 16:15:03,991][03057] Starting process rollout_proc0 +[2025-09-02 16:15:03,991][03057] Starting process rollout_proc1 +[2025-09-02 16:15:03,992][03057] Starting process rollout_proc2 +[2025-09-02 16:15:03,992][03057] Starting process rollout_proc3 +[2025-09-02 16:15:03,993][03057] Starting process rollout_proc4 +[2025-09-02 16:15:03,993][03057] Starting process rollout_proc5 +[2025-09-02 16:15:03,993][03057] Starting process rollout_proc6 +[2025-09-02 16:15:03,993][03057] Starting process rollout_proc7 +[2025-09-02 16:15:03,993][03057] Starting process rollout_proc8 +[2025-09-02 16:15:03,993][03057] Starting process rollout_proc9 +[2025-09-02 16:15:23,608][03391] Worker 0 uses CPU cores [0] +[2025-09-02 16:15:23,892][03057] Heartbeat connected on RolloutWorker_w0 +[2025-09-02 16:15:23,947][03394] Worker 2 uses CPU cores [0] +[2025-09-02 16:15:23,948][03057] Heartbeat connected on RolloutWorker_w2 +[2025-09-02 16:15:24,073][03397] Worker 6 uses CPU cores [0] +[2025-09-02 16:15:24,078][03057] Heartbeat connected on RolloutWorker_w6 +[2025-09-02 16:15:24,179][03396] Worker 5 uses CPU cores [1] +[2025-09-02 16:15:24,184][03057] Heartbeat connected on RolloutWorker_w5 +[2025-09-02 16:15:24,199][03398] Worker 7 uses CPU cores [1] +[2025-09-02 16:15:24,200][03057] Heartbeat connected on RolloutWorker_w7 +[2025-09-02 16:15:24,222][03400] Worker 9 uses CPU cores [1] +[2025-09-02 16:15:24,223][03393] Worker 3 uses CPU cores [1] +[2025-09-02 16:15:24,236][03057] Heartbeat connected on RolloutWorker_w3 +[2025-09-02 16:15:24,237][03057] Heartbeat connected on RolloutWorker_w9 +[2025-09-02 16:15:24,394][03375] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-09-02 16:15:24,395][03375] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-09-02 16:15:24,482][03375] Num visible devices: 1 +[2025-09-02 
16:15:24,489][03057] Heartbeat connected on Batcher_0 +[2025-09-02 16:15:24,490][03375] Starting seed is not provided +[2025-09-02 16:15:24,491][03375] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-09-02 16:15:24,492][03375] Initializing actor-critic model on device cuda:0 +[2025-09-02 16:15:24,495][03375] RunningMeanStd input shape: (3, 72, 128) +[2025-09-02 16:15:24,501][03375] RunningMeanStd input shape: (1,) +[2025-09-02 16:15:24,538][03399] Worker 8 uses CPU cores [0] +[2025-09-02 16:15:24,554][03392] Worker 1 uses CPU cores [1] +[2025-09-02 16:15:24,567][03057] Heartbeat connected on RolloutWorker_w8 +[2025-09-02 16:15:24,568][03057] Heartbeat connected on RolloutWorker_w1 +[2025-09-02 16:15:24,567][03375] ConvEncoder: input_channels=3 +[2025-09-02 16:15:24,593][03390] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-09-02 16:15:24,595][03390] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-09-02 16:15:24,643][03395] Worker 4 uses CPU cores [0] +[2025-09-02 16:15:24,645][03057] Heartbeat connected on RolloutWorker_w4 +[2025-09-02 16:15:24,649][03390] Num visible devices: 1 +[2025-09-02 16:15:24,651][03057] Heartbeat connected on InferenceWorker_p0-w0 +[2025-09-02 16:15:24,893][03375] Conv encoder output size: 512 +[2025-09-02 16:15:24,893][03375] Policy head output size: 512 +[2025-09-02 16:15:24,941][03375] Created Actor Critic model with architecture: +[2025-09-02 16:15:24,942][03375] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( @@ -85,8566 +103,3775 @@ (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) -[2025-08-22 18:32:46,232][19418] Using optimizer -[2025-08-22 18:32:51,539][19418] No checkpoints found -[2025-08-22 18:32:51,539][19418] Did not load from checkpoint, starting from scratch! -[2025-08-22 18:32:51,540][19418] Initialized policy 0 weights for model version 0 -[2025-08-22 18:32:51,548][19418] LearnerWorker_p0 finished initialization! -[2025-08-22 18:32:51,548][19418] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-22 18:32:51,854][19431] RunningMeanStd input shape: (3, 72, 128) -[2025-08-22 18:32:51,858][19431] RunningMeanStd input shape: (1,) -[2025-08-22 18:32:51,879][19431] ConvEncoder: input_channels=3 -[2025-08-22 18:32:51,930][19241] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-22 18:32:51,991][19431] Conv encoder output size: 512 -[2025-08-22 18:32:51,992][19431] Policy head output size: 512 -[2025-08-22 18:32:52,042][19241] Inference worker 0-0 is ready! -[2025-08-22 18:32:52,043][19241] All inference workers are ready! Signal rollout workers to start! 
-[2025-08-22 18:32:52,119][19439] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,124][19433] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,127][19438] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,129][19437] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,143][19432] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,145][19436] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,148][19434] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,149][19435] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 18:32:52,605][19435] Decorrelating experience for 0 frames...
-[2025-08-22 18:32:52,605][19437] Decorrelating experience for 0 frames...
-[2025-08-22 18:32:52,848][19435] Decorrelating experience for 32 frames...
-[2025-08-22 18:32:53,149][19437] Decorrelating experience for 32 frames...
-[2025-08-22 18:32:53,283][19435] Decorrelating experience for 64 frames...
-[2025-08-22 18:32:53,574][19435] Decorrelating experience for 96 frames...
-[2025-08-22 18:32:56,931][19241] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-22 18:32:56,934][19241] Avg episode reward: [(0, '3.950')]
-[2025-08-22 18:33:01,930][19241] Fps is (10 sec: 409.6, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 4096. Throughput: 0: 184.4. Samples: 1844. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-08-22 18:33:01,932][19241] Avg episode reward: [(0, '4.487')]
-[2025-08-22 18:33:01,938][19241] Heartbeat connected on Batcher_0
-[2025-08-22 18:33:01,960][19241] Heartbeat connected on InferenceWorker_p0-w0
-[2025-08-22 18:33:01,967][19241] Heartbeat connected on RolloutWorker_w6
-[2025-08-22 18:33:02,072][19241] Heartbeat connected on LearnerWorker_p0
-[2025-08-22 18:33:06,930][19241] Fps is (10 sec: 819.3, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 8192. Throughput: 0: 147.1. Samples: 2206. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-08-22 18:33:06,932][19241] Avg episode reward: [(0, '4.709')]
-[2025-08-22 18:33:11,930][19241] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 16384. Throughput: 0: 193.1. Samples: 3862. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:11,931][19241] Avg episode reward: [(0, '4.540')]
-[2025-08-22 18:33:15,151][19438] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 1
-[2025-08-22 18:33:15,151][19433] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 1
-[2025-08-22 18:33:16,236][19437] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 1
-[2025-08-22 18:33:16,930][19241] Fps is (10 sec: 1638.3, 60 sec: 983.0, 300 sec: 983.0). Total num frames: 24576. Throughput: 0: 248.9. Samples: 6222. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:16,931][19241] Avg episode reward: [(0, '4.527')]
-[2025-08-22 18:33:21,930][19241] Fps is (10 sec: 1638.4, 60 sec: 1092.3, 300 sec: 1092.3). Total num frames: 32768. Throughput: 0: 245.1. Samples: 7354. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:21,932][19241] Avg episode reward: [(0, '4.569')]
-[2025-08-22 18:33:26,644][19431] Updated weights for policy 0, policy_version 10 (0.0014)
-[2025-08-22 18:33:26,930][19241] Fps is (10 sec: 1638.5, 60 sec: 1170.3, 300 sec: 1170.3). Total num frames: 40960. Throughput: 0: 276.2. Samples: 9666. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:26,931][19241] Avg episode reward: [(0, '4.551')]
-[2025-08-22 18:33:31,930][19241] Fps is (10 sec: 1638.4, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 49152. Throughput: 0: 306.0. Samples: 12238. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:31,931][19241] Avg episode reward: [(0, '4.335')]
-[2025-08-22 18:33:35,171][19433] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 2
-[2025-08-22 18:33:35,171][19438] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 2
-[2025-08-22 18:33:36,256][19437] Another process currently holds the lock /tmp/sf2_mique/doom_002.lockfile, attempt: 2
-[2025-08-22 18:33:36,930][19241] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 61440. Throughput: 0: 308.0. Samples: 13858. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:36,931][19241] Avg episode reward: [(0, '4.294')]
-[2025-08-22 18:33:41,930][19241] Fps is (10 sec: 1638.4, 60 sec: 1310.7, 300 sec: 1310.7). Total num frames: 65536. Throughput: 0: 349.2. Samples: 15726. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:41,930][19241] Avg episode reward: [(0, '4.363')]
-[2025-08-22 18:33:46,930][19241] Fps is (10 sec: 1228.8, 60 sec: 1340.5, 300 sec: 1340.5). Total num frames: 73728. Throughput: 0: 376.2. Samples: 18774. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:46,931][19241] Avg episode reward: [(0, '4.257')]
-[2025-08-22 18:33:49,518][19431] Updated weights for policy 0, policy_version 20 (0.0009)
-[2025-08-22 18:33:50,937][19433] Decorrelating experience for 0 frames...
-[2025-08-22 18:33:51,138][19433] Decorrelating experience for 32 frames...
-[2025-08-22 18:33:51,140][19437] Decorrelating experience for 64 frames...
-[2025-08-22 18:33:51,339][19438] Decorrelating experience for 0 frames...
-[2025-08-22 18:33:51,369][19437] Decorrelating experience for 96 frames...
-[2025-08-22 18:33:51,441][19241] Heartbeat connected on RolloutWorker_w4
-[2025-08-22 18:33:51,521][19438] Decorrelating experience for 32 frames...
-[2025-08-22 18:33:51,560][19433] Decorrelating experience for 64 frames...
-[2025-08-22 18:33:51,788][19438] Decorrelating experience for 64 frames...
-[2025-08-22 18:33:51,808][19433] Decorrelating experience for 96 frames...
-[2025-08-22 18:33:51,898][19241] Heartbeat connected on RolloutWorker_w0
-[2025-08-22 18:33:51,930][19241] Fps is (10 sec: 2048.0, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 86016. Throughput: 0: 402.1. Samples: 20302. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0)
-[2025-08-22 18:33:51,931][19241] Avg episode reward: [(0, '4.311')]
-[2025-08-22 18:33:52,037][19438] Decorrelating experience for 96 frames...
-[2025-08-22 18:33:52,117][19241] Heartbeat connected on RolloutWorker_w5
-[2025-08-22 18:33:56,930][19241] Fps is (10 sec: 4096.0, 60 sec: 1911.5, 300 sec: 1764.4). Total num frames: 114688. Throughput: 0: 506.1. Samples: 26636. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:33:56,931][19241] Avg episode reward: [(0, '4.146')]
-[2025-08-22 18:33:56,937][19418] Saving new best policy, reward=4.146!
-[2025-08-22 18:33:57,888][19431] Updated weights for policy 0, policy_version 30 (0.0016)
-[2025-08-22 18:34:01,930][19241] Fps is (10 sec: 6144.0, 60 sec: 2389.3, 300 sec: 2106.5). Total num frames: 147456. Throughput: 0: 663.4. Samples: 36076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:01,931][19241] Avg episode reward: [(0, '4.181')]
-[2025-08-22 18:34:01,934][19418] Saving new best policy, reward=4.181!
-[2025-08-22 18:34:04,262][19431] Updated weights for policy 0, policy_version 40 (0.0014)
-[2025-08-22 18:34:06,930][19241] Fps is (10 sec: 6553.6, 60 sec: 2867.2, 300 sec: 2403.0). Total num frames: 180224. Throughput: 0: 746.6. Samples: 40950. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:06,931][19241] Avg episode reward: [(0, '4.381')]
-[2025-08-22 18:34:06,939][19418] Saving new best policy, reward=4.381!
-[2025-08-22 18:34:10,505][19431] Updated weights for policy 0, policy_version 50 (0.0015)
-[2025-08-22 18:34:11,930][19241] Fps is (10 sec: 6553.7, 60 sec: 3276.8, 300 sec: 2662.4). Total num frames: 212992. Throughput: 0: 912.2. Samples: 50716. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:11,931][19241] Avg episode reward: [(0, '4.326')]
-[2025-08-22 18:34:16,930][19241] Fps is (10 sec: 4505.6, 60 sec: 3345.1, 300 sec: 2650.4). Total num frames: 225280. Throughput: 0: 972.7. Samples: 56008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:16,931][19241] Avg episode reward: [(0, '4.364')]
-[2025-08-22 18:34:19,343][19431] Updated weights for policy 0, policy_version 60 (0.0015)
-[2025-08-22 18:34:21,930][19241] Fps is (10 sec: 4915.2, 60 sec: 3822.9, 300 sec: 2912.7). Total num frames: 262144. Throughput: 0: 1053.5. Samples: 61266. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:21,931][19241] Avg episode reward: [(0, '4.287')]
-[2025-08-22 18:34:25,043][19431] Updated weights for policy 0, policy_version 70 (0.0015)
-[2025-08-22 18:34:26,930][19241] Fps is (10 sec: 7372.8, 60 sec: 4300.8, 300 sec: 3147.5). Total num frames: 299008. Throughput: 0: 1250.9. Samples: 72018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:26,931][19241] Avg episode reward: [(0, '4.477')]
-[2025-08-22 18:34:26,937][19418] Saving new best policy, reward=4.477!
-[2025-08-22 18:34:30,745][19431] Updated weights for policy 0, policy_version 80 (0.0014)
-[2025-08-22 18:34:31,930][19241] Fps is (10 sec: 7372.8, 60 sec: 4778.7, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 1422.5. Samples: 82788. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:31,932][19241] Avg episode reward: [(0, '4.496')]
-[2025-08-22 18:34:31,933][19418] Saving new best policy, reward=4.496!
-[2025-08-22 18:34:36,053][19431] Updated weights for policy 0, policy_version 90 (0.0013)
-[2025-08-22 18:34:36,930][19241] Fps is (10 sec: 7372.8, 60 sec: 5188.3, 300 sec: 3549.9). Total num frames: 372736. Throughput: 0: 1511.6. Samples: 88326. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:36,931][19241] Avg episode reward: [(0, '4.540')]
-[2025-08-22 18:34:36,935][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000091_372736.pth...
-[2025-08-22 18:34:36,987][19418] Saving new best policy, reward=4.540!
-[2025-08-22 18:34:40,763][19431] Updated weights for policy 0, policy_version 100 (0.0011)
-[2025-08-22 18:34:41,930][19241] Fps is (10 sec: 8192.0, 60 sec: 5870.9, 300 sec: 3798.1). Total num frames: 417792. Throughput: 0: 1655.8. Samples: 101146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:34:41,931][19241] Avg episode reward: [(0, '4.643')]
-[2025-08-22 18:34:41,933][19418] Saving new best policy, reward=4.643!
-[2025-08-22 18:34:46,112][19431] Updated weights for policy 0, policy_version 110 (0.0011)
-[2025-08-22 18:34:46,930][19241] Fps is (10 sec: 8192.0, 60 sec: 6348.8, 300 sec: 3953.5). Total num frames: 454656. Throughput: 0: 1707.8. Samples: 112928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:34:46,931][19241] Avg episode reward: [(0, '4.436')]
-[2025-08-22 18:34:51,930][19241] Fps is (10 sec: 5324.8, 60 sec: 6417.1, 300 sec: 3925.3). Total num frames: 471040. Throughput: 0: 1670.1. Samples: 116106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:34:51,931][19241] Avg episode reward: [(0, '4.411')]
-[2025-08-22 18:34:54,373][19431] Updated weights for policy 0, policy_version 120 (0.0012)
-[2025-08-22 18:34:56,930][19241] Fps is (10 sec: 5734.4, 60 sec: 6621.9, 300 sec: 4096.0). Total num frames: 512000. Throughput: 0: 1645.1. Samples: 124746. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:34:56,931][19241] Avg episode reward: [(0, '4.502')]
-[2025-08-22 18:34:59,737][19431] Updated weights for policy 0, policy_version 130 (0.0014)
-[2025-08-22 18:35:01,930][19241] Fps is (10 sec: 7372.9, 60 sec: 6621.9, 300 sec: 4190.5). Total num frames: 544768. Throughput: 0: 1772.1. Samples: 135754. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:35:01,931][19241] Avg episode reward: [(0, '4.493')]
-[2025-08-22 18:35:05,022][19431] Updated weights for policy 0, policy_version 140 (0.0012)
-[2025-08-22 18:35:06,930][19241] Fps is (10 sec: 7372.9, 60 sec: 6758.4, 300 sec: 4338.7). Total num frames: 585728. Throughput: 0: 1788.3. Samples: 141738. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:35:06,931][19241] Avg episode reward: [(0, '4.574')]
-[2025-08-22 18:35:10,545][19431] Updated weights for policy 0, policy_version 150 (0.0011)
-[2025-08-22 18:35:11,930][19241] Fps is (10 sec: 7782.1, 60 sec: 6826.6, 300 sec: 4447.1). Total num frames: 622592. Throughput: 0: 1798.6. Samples: 152954. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:35:11,934][19241] Avg episode reward: [(0, '4.516')]
-[2025-08-22 18:35:15,846][19431] Updated weights for policy 0, policy_version 160 (0.0014)
-[2025-08-22 18:35:16,930][19241] Fps is (10 sec: 7782.2, 60 sec: 7304.5, 300 sec: 4576.2). Total num frames: 663552. Throughput: 0: 1817.8. Samples: 164588. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:35:16,931][19241] Avg episode reward: [(0, '4.382')]
-[2025-08-22 18:35:21,215][19431] Updated weights for policy 0, policy_version 170 (0.0012)
-[2025-08-22 18:35:21,930][19241] Fps is (10 sec: 7782.6, 60 sec: 7304.5, 300 sec: 4669.4). Total num frames: 700416. Throughput: 0: 1822.3. Samples: 170328. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:35:21,931][19241] Avg episode reward: [(0, '4.638')]
-[2025-08-22 18:35:26,930][19241] Fps is (10 sec: 4915.3, 60 sec: 6894.9, 300 sec: 4598.1). Total num frames: 712704. Throughput: 0: 1690.9. Samples: 177236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:35:26,931][19241] Avg episode reward: [(0, '4.473')]
-[2025-08-22 18:35:29,790][19431] Updated weights for policy 0, policy_version 180 (0.0014)
-[2025-08-22 18:35:31,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6894.9, 300 sec: 4684.8). Total num frames: 749568. Throughput: 0: 1643.8. Samples: 186898. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:35:31,931][19241] Avg episode reward: [(0, '4.548')]
-[2025-08-22 18:35:35,325][19431] Updated weights for policy 0, policy_version 190 (0.0014)
-[2025-08-22 18:35:36,930][19241] Fps is (10 sec: 7372.8, 60 sec: 6894.9, 300 sec: 4766.3). Total num frames: 786432. Throughput: 0: 1699.8. Samples: 192598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:35:36,931][19241] Avg episode reward: [(0, '4.349')]
-[2025-08-22 18:35:41,349][19431] Updated weights for policy 0, policy_version 200 (0.0016)
-[2025-08-22 18:35:41,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6690.2, 300 sec: 4818.8). Total num frames: 819200. Throughput: 0: 1732.5. Samples: 202710. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:35:41,931][19241] Avg episode reward: [(0, '4.363')]
-[2025-08-22 18:35:46,930][19241] Fps is (10 sec: 6963.1, 60 sec: 6690.1, 300 sec: 4891.8). Total num frames: 856064. Throughput: 0: 1724.8. Samples: 213370. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:35:46,932][19241] Avg episode reward: [(0, '4.319')]
-[2025-08-22 18:35:47,162][19431] Updated weights for policy 0, policy_version 210 (0.0011)
-[2025-08-22 18:35:51,930][19241] Fps is (10 sec: 7372.7, 60 sec: 7031.5, 300 sec: 4960.7). Total num frames: 892928. Throughput: 0: 1708.4. Samples: 218618. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:35:51,931][19241] Avg episode reward: [(0, '4.262')]
-[2025-08-22 18:35:52,430][19431] Updated weights for policy 0, policy_version 220 (0.0012)
-[2025-08-22 18:35:56,930][19241] Fps is (10 sec: 8191.6, 60 sec: 7099.7, 300 sec: 5070.2). Total num frames: 937984. Throughput: 0: 1745.7. Samples: 231512. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:35:56,932][19241] Avg episode reward: [(0, '4.472')]
-[2025-08-22 18:35:57,077][19431] Updated weights for policy 0, policy_version 230 (0.0013)
-[2025-08-22 18:36:01,930][19241] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 5044.6). Total num frames: 958464. Throughput: 0: 1636.5. Samples: 238228. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:36:01,931][19241] Avg episode reward: [(0, '4.402')]
-[2025-08-22 18:36:04,921][19431] Updated weights for policy 0, policy_version 240 (0.0010)
-[2025-08-22 18:36:06,930][19241] Fps is (10 sec: 5734.7, 60 sec: 6826.7, 300 sec: 5104.2). Total num frames: 995328. Throughput: 0: 1648.4. Samples: 244508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:36:06,931][19241] Avg episode reward: [(0, '4.403')]
-[2025-08-22 18:36:11,244][19431] Updated weights for policy 0, policy_version 250 (0.0011)
-[2025-08-22 18:36:11,930][19241] Fps is (10 sec: 6963.1, 60 sec: 6758.4, 300 sec: 5140.5). Total num frames: 1028096. Throughput: 0: 1711.6. Samples: 254260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:36:11,931][19241] Avg episode reward: [(0, '4.421')]
-[2025-08-22 18:36:16,930][19241] Fps is (10 sec: 6553.4, 60 sec: 6621.8, 300 sec: 5174.9). Total num frames: 1060864. Throughput: 0: 1728.5. Samples: 264680. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:36:16,932][19241] Avg episode reward: [(0, '4.544')]
-[2025-08-22 18:36:17,114][19431] Updated weights for policy 0, policy_version 260 (0.0016)
-[2025-08-22 18:36:21,930][19241] Fps is (10 sec: 6963.2, 60 sec: 6621.9, 300 sec: 5227.3). Total num frames: 1097728. Throughput: 0: 1716.9. Samples: 269858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:36:21,931][19241] Avg episode reward: [(0, '4.329')]
-[2025-08-22 18:36:22,993][19431] Updated weights for policy 0, policy_version 270 (0.0014)
-[2025-08-22 18:36:26,930][19241] Fps is (10 sec: 7782.7, 60 sec: 7099.7, 300 sec: 5296.2). Total num frames: 1138688. Throughput: 0: 1740.4. Samples: 281030. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:36:26,931][19241] Avg episode reward: [(0, '4.275')]
-[2025-08-22 18:36:27,817][19431] Updated weights for policy 0, policy_version 280 (0.0013)
-[2025-08-22 18:36:31,930][19241] Fps is (10 sec: 8192.0, 60 sec: 7168.0, 300 sec: 5362.0). Total num frames: 1179648. Throughput: 0: 1798.5. Samples: 294300. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:36:31,931][19241] Avg episode reward: [(0, '4.298')]
-[2025-08-22 18:36:32,453][19431] Updated weights for policy 0, policy_version 290 (0.0010)
-[2025-08-22 18:36:36,930][19241] Fps is (10 sec: 6143.8, 60 sec: 6894.9, 300 sec: 5333.9). Total num frames: 1200128. Throughput: 0: 1788.1. Samples: 299082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:36:36,932][19241] Avg episode reward: [(0, '4.327')]
-[2025-08-22 18:36:36,938][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000293_1200128.pth...
-[2025-08-22 18:36:40,542][19431] Updated weights for policy 0, policy_version 300 (0.0013)
-[2025-08-22 18:36:41,930][19241] Fps is (10 sec: 5734.4, 60 sec: 6963.2, 300 sec: 5378.2). Total num frames: 1236992. Throughput: 0: 1671.7. Samples: 306738. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:36:41,932][19241] Avg episode reward: [(0, '4.422')]
-[2025-08-22 18:36:46,509][19431] Updated weights for policy 0, policy_version 310 (0.0013)
-[2025-08-22 18:36:46,930][19241] Fps is (10 sec: 6963.4, 60 sec: 6894.9, 300 sec: 5403.2). Total num frames: 1269760. Throughput: 0: 1748.7. Samples: 316922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:36:46,932][19241] Avg episode reward: [(0, '4.348')]
-[2025-08-22 18:36:51,930][19241] Fps is (10 sec: 6963.2, 60 sec: 6894.9, 300 sec: 5444.3). Total num frames: 1306624. Throughput: 0: 1726.5. Samples: 322202. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:36:51,931][19241] Avg episode reward: [(0, '4.340')]
-[2025-08-22 18:36:52,559][19431] Updated weights for policy 0, policy_version 320 (0.0014)
-[2025-08-22 18:36:56,930][19241] Fps is (10 sec: 6553.5, 60 sec: 6621.9, 300 sec: 5450.2). Total num frames: 1335296. Throughput: 0: 1725.5. Samples: 331906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:36:56,933][19241] Avg episode reward: [(0, '4.194')]
-[2025-08-22 18:36:58,998][19431] Updated weights for policy 0, policy_version 330 (0.0014)
-[2025-08-22 18:37:01,930][19241] Fps is (10 sec: 6553.6, 60 sec: 6894.9, 300 sec: 5488.6). Total num frames: 1372160. Throughput: 0: 1720.4. Samples: 342098. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:37:01,931][19241] Avg episode reward: [(0, '4.500')]
-[2025-08-22 18:37:03,719][19431] Updated weights for policy 0, policy_version 340 (0.0011)
-[2025-08-22 18:37:06,930][19241] Fps is (10 sec: 8191.9, 60 sec: 7031.4, 300 sec: 5557.7). Total num frames: 1417216. Throughput: 0: 1763.4. Samples: 349210. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:06,931][19241] Avg episode reward: [(0, '4.308')]
-[2025-08-22 18:37:11,453][19431] Updated weights for policy 0, policy_version 350 (0.0012)
-[2025-08-22 18:37:11,930][19241] Fps is (10 sec: 6143.9, 60 sec: 6758.4, 300 sec: 5513.8). Total num frames: 1433600. Throughput: 0: 1725.7. Samples: 358686. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:11,932][19241] Avg episode reward: [(0, '4.350')]
-[2025-08-22 18:37:16,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6758.4, 300 sec: 5533.5). Total num frames: 1466368. Throughput: 0: 1599.7. Samples: 366288. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:16,932][19241] Avg episode reward: [(0, '4.251')]
-[2025-08-22 18:37:18,050][19431] Updated weights for policy 0, policy_version 360 (0.0016)
-[2025-08-22 18:37:21,930][19241] Fps is (10 sec: 6553.6, 60 sec: 6690.1, 300 sec: 5552.4). Total num frames: 1499136. Throughput: 0: 1583.4. Samples: 370336. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:37:21,932][19241] Avg episode reward: [(0, '4.518')]
-[2025-08-22 18:37:23,805][19431] Updated weights for policy 0, policy_version 370 (0.0013)
-[2025-08-22 18:37:26,930][19241] Fps is (10 sec: 6963.2, 60 sec: 6621.8, 300 sec: 5585.5). Total num frames: 1536000. Throughput: 0: 1654.8. Samples: 381204. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:26,932][19241] Avg episode reward: [(0, '4.355')]
-[2025-08-22 18:37:29,266][19431] Updated weights for policy 0, policy_version 380 (0.0013)
-[2025-08-22 18:37:31,930][19241] Fps is (10 sec: 7782.3, 60 sec: 6621.8, 300 sec: 5632.0). Total num frames: 1576960. Throughput: 0: 1701.9. Samples: 393510. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:31,932][19241] Avg episode reward: [(0, '4.261')]
-[2025-08-22 18:37:34,726][19431] Updated weights for policy 0, policy_version 390 (0.0013)
-[2025-08-22 18:37:36,930][19241] Fps is (10 sec: 7372.8, 60 sec: 6826.7, 300 sec: 5648.2). Total num frames: 1609728. Throughput: 0: 1701.4. Samples: 398764. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:36,932][19241] Avg episode reward: [(0, '4.196')]
-[2025-08-22 18:37:40,534][19431] Updated weights for policy 0, policy_version 400 (0.0011)
-[2025-08-22 18:37:41,930][19241] Fps is (10 sec: 6963.4, 60 sec: 6826.7, 300 sec: 5677.9). Total num frames: 1646592. Throughput: 0: 1714.3. Samples: 409048. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:41,931][19241] Avg episode reward: [(0, '4.345')]
-[2025-08-22 18:37:46,930][19241] Fps is (10 sec: 4915.1, 60 sec: 6485.3, 300 sec: 5623.3). Total num frames: 1658880. Throughput: 0: 1634.3. Samples: 415644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:46,933][19241] Avg episode reward: [(0, '4.418')]
-[2025-08-22 18:37:49,506][19431] Updated weights for policy 0, policy_version 410 (0.0012)
-[2025-08-22 18:37:51,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6485.3, 300 sec: 5748.3). Total num frames: 1695744. Throughput: 0: 1561.8. Samples: 419490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:51,931][19241] Avg episode reward: [(0, '4.696')]
-[2025-08-22 18:37:51,934][19418] Saving new best policy, reward=4.696!
-[2025-08-22 18:37:55,389][19431] Updated weights for policy 0, policy_version 420 (0.0016)
-[2025-08-22 18:37:56,930][19241] Fps is (10 sec: 6963.4, 60 sec: 6553.6, 300 sec: 5845.5). Total num frames: 1728512. Throughput: 0: 1582.9. Samples: 429916. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:37:56,932][19241] Avg episode reward: [(0, '4.514')]
-[2025-08-22 18:38:00,424][19431] Updated weights for policy 0, policy_version 430 (0.0014)
-[2025-08-22 18:38:01,930][19241] Fps is (10 sec: 7782.6, 60 sec: 6690.1, 300 sec: 5984.3). Total num frames: 1773568. Throughput: 0: 1682.7. Samples: 442010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:01,931][19241] Avg episode reward: [(0, '4.417')]
-[2025-08-22 18:38:05,789][19431] Updated weights for policy 0, policy_version 440 (0.0014)
-[2025-08-22 18:38:06,930][19241] Fps is (10 sec: 8192.1, 60 sec: 6553.6, 300 sec: 6081.5). Total num frames: 1810432. Throughput: 0: 1713.6. Samples: 447446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:06,931][19241] Avg episode reward: [(0, '4.464')]
-[2025-08-22 18:38:11,276][19431] Updated weights for policy 0, policy_version 450 (0.0013)
-[2025-08-22 18:38:11,930][19241] Fps is (10 sec: 7372.5, 60 sec: 6894.9, 300 sec: 6178.7). Total num frames: 1847296. Throughput: 0: 1725.7. Samples: 458862. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:11,932][19241] Avg episode reward: [(0, '4.562')]
-[2025-08-22 18:38:16,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6895.0, 300 sec: 6262.0). Total num frames: 1880064. Throughput: 0: 1686.1. Samples: 469384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:16,931][19241] Avg episode reward: [(0, '4.574')]
-[2025-08-22 18:38:17,235][19431] Updated weights for policy 0, policy_version 460 (0.0013)
-[2025-08-22 18:38:21,930][19241] Fps is (10 sec: 4915.4, 60 sec: 6621.9, 300 sec: 6289.8). Total num frames: 1896448. Throughput: 0: 1682.2. Samples: 474464. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:21,931][19241] Avg episode reward: [(0, '4.471')]
-[2025-08-22 18:38:25,578][19431] Updated weights for policy 0, policy_version 470 (0.0012)
-[2025-08-22 18:38:26,930][19241] Fps is (10 sec: 5324.7, 60 sec: 6621.9, 300 sec: 6387.0). Total num frames: 1933312. Throughput: 0: 1591.0. Samples: 480644. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:26,931][19241] Avg episode reward: [(0, '4.313')]
-[2025-08-22 18:38:30,771][19431] Updated weights for policy 0, policy_version 480 (0.0011)
-[2025-08-22 18:38:31,930][19241] Fps is (10 sec: 7782.4, 60 sec: 6621.9, 300 sec: 6484.2). Total num frames: 1974272. Throughput: 0: 1706.9. Samples: 492454. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:31,931][19241] Avg episode reward: [(0, '4.472')]
-[2025-08-22 18:38:35,621][19431] Updated weights for policy 0, policy_version 490 (0.0012)
-[2025-08-22 18:38:36,930][19241] Fps is (10 sec: 8191.8, 60 sec: 6758.4, 300 sec: 6609.1). Total num frames: 2015232. Throughput: 0: 1766.5. Samples: 498982. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:36,931][19241] Avg episode reward: [(0, '4.294')]
-[2025-08-22 18:38:36,938][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth...
-[2025-08-22 18:38:36,996][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000091_372736.pth
-[2025-08-22 18:38:40,843][19431] Updated weights for policy 0, policy_version 500 (0.0013)
-[2025-08-22 18:38:41,930][19241] Fps is (10 sec: 8192.0, 60 sec: 6826.7, 300 sec: 6720.2). Total num frames: 2056192. Throughput: 0: 1796.0. Samples: 510734. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:38:41,931][19241] Avg episode reward: [(0, '4.569')]
-[2025-08-22 18:38:46,012][19431] Updated weights for policy 0, policy_version 510 (0.0011)
-[2025-08-22 18:38:46,930][19241] Fps is (10 sec: 7782.6, 60 sec: 7236.3, 300 sec: 6803.5). Total num frames: 2093056. Throughput: 0: 1792.3. Samples: 522664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:38:46,931][19241] Avg episode reward: [(0, '4.556')]
-[2025-08-22 18:38:51,289][19431] Updated weights for policy 0, policy_version 520 (0.0011)
-[2025-08-22 18:38:51,930][19241] Fps is (10 sec: 7782.3, 60 sec: 7304.6, 300 sec: 6845.2). Total num frames: 2134016. Throughput: 0: 1804.7. Samples: 528656. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:38:51,931][19241] Avg episode reward: [(0, '4.274')]
-[2025-08-22 18:38:56,968][19241] Fps is (10 sec: 5712.5, 60 sec: 7027.0, 300 sec: 6788.8). Total num frames: 2150400. Throughput: 0: 1675.9. Samples: 534340. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:38:56,969][19241] Avg episode reward: [(0, '4.471')]
-[2025-08-22 18:38:59,850][19431] Updated weights for policy 0, policy_version 530 (0.0016)
-[2025-08-22 18:39:01,930][19241] Fps is (10 sec: 4915.2, 60 sec: 6826.6, 300 sec: 6789.6). Total num frames: 2183168. Throughput: 0: 1684.6. Samples: 545190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:01,931][19241] Avg episode reward: [(0, '4.260')]
-[2025-08-22 18:39:05,622][19431] Updated weights for policy 0, policy_version 540 (0.0014)
-[2025-08-22 18:39:06,930][19241] Fps is (10 sec: 6990.0, 60 sec: 6826.7, 300 sec: 6803.5). Total num frames: 2220032. Throughput: 0: 1690.9. Samples: 550554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:06,931][19241] Avg episode reward: [(0, '4.323')]
-[2025-08-22 18:39:11,841][19431] Updated weights for policy 0, policy_version 550 (0.0013)
-[2025-08-22 18:39:11,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6758.4, 300 sec: 6872.9). Total num frames: 2252800. Throughput: 0: 1772.5. Samples: 560408. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:11,931][19241] Avg episode reward: [(0, '4.423')]
-[2025-08-22 18:39:16,930][19241] Fps is (10 sec: 6553.4, 60 sec: 6758.3, 300 sec: 6859.1). Total num frames: 2285568. Throughput: 0: 1738.6. Samples: 570692. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:16,931][19241] Avg episode reward: [(0, '4.422')]
-[2025-08-22 18:39:18,200][19431] Updated weights for policy 0, policy_version 560 (0.0015)
-[2025-08-22 18:39:21,930][19241] Fps is (10 sec: 6144.0, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 2314240. Throughput: 0: 1691.7. Samples: 575110. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:21,931][19241] Avg episode reward: [(0, '4.559')]
-[2025-08-22 18:39:24,448][19431] Updated weights for policy 0, policy_version 570 (0.0012)
-[2025-08-22 18:39:26,930][19241] Fps is (10 sec: 6553.8, 60 sec: 6963.2, 300 sec: 6831.3). Total num frames: 2351104. Throughput: 0: 1651.9. Samples: 585070. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:26,931][19241] Avg episode reward: [(0, '4.338')]
-[2025-08-22 18:39:32,156][19241] Fps is (10 sec: 5206.8, 60 sec: 6528.9, 300 sec: 6756.7). Total num frames: 2367488. Throughput: 0: 1502.4. Samples: 590614. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:32,157][19241] Avg episode reward: [(0, '4.384')]
-[2025-08-22 18:39:32,694][19431] Updated weights for policy 0, policy_version 580 (0.0010)
-[2025-08-22 18:39:36,930][19241] Fps is (10 sec: 6144.1, 60 sec: 6621.9, 300 sec: 6761.9). Total num frames: 2412544. Throughput: 0: 1527.1. Samples: 597374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:39:36,930][19241] Avg episode reward: [(0, '4.455')]
-[2025-08-22 18:39:37,240][19431] Updated weights for policy 0, policy_version 590 (0.0010)
-[2025-08-22 18:39:41,912][19431] Updated weights for policy 0, policy_version 600 (0.0011)
-[2025-08-22 18:39:41,930][19241] Fps is (10 sec: 9220.1, 60 sec: 6690.1, 300 sec: 6789.6). Total num frames: 2457600. Throughput: 0: 1707.2. Samples: 611100. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:41,931][19241] Avg episode reward: [(0, '4.335')]
-[2025-08-22 18:39:46,930][19241] Fps is (10 sec: 7782.2, 60 sec: 6621.9, 300 sec: 6845.2). Total num frames: 2490368. Throughput: 0: 1706.5. Samples: 621984. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:46,931][19241] Avg episode reward: [(0, '4.447')]
-[2025-08-22 18:39:47,691][19431] Updated weights for policy 0, policy_version 610 (0.0014)
-[2025-08-22 18:39:51,930][19241] Fps is (10 sec: 6963.3, 60 sec: 6553.6, 300 sec: 6831.3). Total num frames: 2527232. Throughput: 0: 1709.8. Samples: 627496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:39:51,931][19241] Avg episode reward: [(0, '4.464')]
-[2025-08-22 18:39:53,009][19431] Updated weights for policy 0, policy_version 620 (0.0010)
-[2025-08-22 18:39:56,930][19241] Fps is (10 sec: 8192.1, 60 sec: 7036.0, 300 sec: 6872.9). Total num frames: 2572288. Throughput: 0: 1759.6. Samples: 639592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:39:56,931][19241] Avg episode reward: [(0, '4.204')]
-[2025-08-22 18:39:57,708][19431] Updated weights for policy 0, policy_version 630 (0.0011)
-[2025-08-22 18:40:01,930][19241] Fps is (10 sec: 9011.2, 60 sec: 7236.3, 300 sec: 6886.8). Total num frames: 2617344. Throughput: 0: 1824.7. Samples: 652804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:01,930][19241] Avg episode reward: [(0, '4.648')]
-[2025-08-22 18:40:02,376][19431] Updated weights for policy 0, policy_version 640 (0.0012)
-[2025-08-22 18:40:07,336][19241] Fps is (10 sec: 6297.6, 60 sec: 6916.4, 300 sec: 6821.9). Total num frames: 2637824. Throughput: 0: 1853.1. Samples: 659254. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:07,337][19241] Avg episode reward: [(0, '4.523')]
-[2025-08-22 18:40:10,426][19431] Updated weights for policy 0, policy_version 650 (0.0013)
-[2025-08-22 18:40:11,930][19241] Fps is (10 sec: 5734.4, 60 sec: 7031.5, 300 sec: 6817.4). Total num frames: 2674688. Throughput: 0: 1777.3. Samples: 665048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:11,931][19241] Avg episode reward: [(0, '4.371')]
-[2025-08-22 18:40:15,043][19431] Updated weights for policy 0, policy_version 660 (0.0013)
-[2025-08-22 18:40:16,930][19241] Fps is (10 sec: 8539.1, 60 sec: 7236.3, 300 sec: 6845.2). Total num frames: 2719744. Throughput: 0: 1958.6. Samples: 678306. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:16,932][19241] Avg episode reward: [(0, '4.395')]
-[2025-08-22 18:40:19,684][19431] Updated weights for policy 0, policy_version 670 (0.0010)
-[2025-08-22 18:40:21,930][19241] Fps is (10 sec: 8601.6, 60 sec: 7441.1, 300 sec: 6942.4). Total num frames: 2760704. Throughput: 0: 1946.0. Samples: 684942. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:21,931][19241] Avg episode reward: [(0, '4.584')]
-[2025-08-22 18:40:24,509][19431] Updated weights for policy 0, policy_version 680 (0.0011)
-[2025-08-22 18:40:26,930][19241] Fps is (10 sec: 8601.6, 60 sec: 7577.6, 300 sec: 6970.1). Total num frames: 2805760. Throughput: 0: 1923.9. Samples: 697674. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:26,930][19241] Avg episode reward: [(0, '4.283')]
-[2025-08-22 18:40:29,085][19431] Updated weights for policy 0, policy_version 690 (0.0010)
-[2025-08-22 18:40:31,930][19241] Fps is (10 sec: 9011.2, 60 sec: 8086.0, 300 sec: 6997.9). Total num frames: 2850816. Throughput: 0: 1984.2. Samples: 711272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:31,931][19241] Avg episode reward: [(0, '4.515')]
-[2025-08-22 18:40:33,663][19431] Updated weights for policy 0, policy_version 700 (0.0011)
-[2025-08-22 18:40:36,930][19241] Fps is (10 sec: 9011.2, 60 sec: 8055.5, 300 sec: 7039.6). Total num frames: 2895872. Throughput: 0: 2009.0. Samples: 717900. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:36,930][19241] Avg episode reward: [(0, '4.274')]
-[2025-08-22 18:40:36,937][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth...
-[2025-08-22 18:40:37,037][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000293_1200128.pth
-[2025-08-22 18:40:38,293][19431] Updated weights for policy 0, policy_version 710 (0.0008)
-[2025-08-22 18:40:42,515][19241] Fps is (10 sec: 6191.0, 60 sec: 7572.0, 300 sec: 6970.2). Total num frames: 2916352. Throughput: 0: 1865.1. Samples: 724614. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
-[2025-08-22 18:40:42,517][19241] Avg episode reward: [(0, '4.308')]
-[2025-08-22 18:40:45,721][19431] Updated weights for policy 0, policy_version 720 (0.0012)
-[2025-08-22 18:40:46,930][19241] Fps is (10 sec: 6143.8, 60 sec: 7782.4, 300 sec: 6997.9). Total num frames: 2957312. Throughput: 0: 1898.8. Samples: 738252. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
-[2025-08-22 18:40:46,932][19241] Avg episode reward: [(0, '4.317')]
-[2025-08-22 18:40:50,370][19431] Updated weights for policy 0, policy_version 730 (0.0012)
-[2025-08-22 18:40:51,930][19241] Fps is (10 sec: 9136.7, 60 sec: 7918.9, 300 sec: 6997.9). Total num frames: 3002368. Throughput: 0: 1918.9. Samples: 744824. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:40:51,931][19241] Avg episode reward: [(0, '4.413')]
-[2025-08-22 18:40:55,104][19431] Updated weights for policy 0, policy_version 740 (0.0014)
-[2025-08-22 18:40:56,930][19241] Fps is (10 sec: 9011.5, 60 sec: 7918.9, 300 sec: 7081.2). Total num frames: 3047424. Throughput: 0: 2065.8. Samples: 758008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:40:56,931][19241] Avg episode reward: [(0, '4.207')]
-[2025-08-22 18:40:59,746][19431] Updated weights for policy 0, policy_version 750 (0.0011)
-[2025-08-22 18:41:01,930][19241] Fps is (10 sec: 8601.6, 60 sec: 7850.7, 300 sec: 7095.1). Total num frames: 3088384. Throughput: 0: 2066.9. Samples: 771316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:01,931][19241] Avg episode reward: [(0, '4.356')]
-[2025-08-22 18:41:04,325][19431] Updated weights for policy 0, policy_version 760 (0.0010)
-[2025-08-22 18:41:06,930][19241] Fps is (10 sec: 8601.5, 60 sec: 8316.6, 300 sec: 7136.8). Total num frames: 3133440. Throughput: 0: 2067.9. Samples: 777996. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:06,931][19241] Avg episode reward: [(0, '4.369')]
-[2025-08-22 18:41:09,322][19431] Updated weights for policy 0, policy_version 770 (0.0010)
-[2025-08-22 18:41:11,930][19241] Fps is (10 sec: 8601.6, 60 sec: 8328.5, 300 sec: 7164.5). Total num frames: 3174400. Throughput: 0: 2062.7. Samples: 790494. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:11,931][19241] Avg episode reward: [(0, '4.640')]
-[2025-08-22 18:41:14,024][19431] Updated weights for policy 0, policy_version 780 (0.0010)
-[2025-08-22 18:41:17,703][19241] Fps is (10 sec: 6083.4, 60 sec: 7885.6, 300 sec: 7104.3). Total num frames: 3198976. Throughput: 0: 1873.7. Samples: 797038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:41:17,704][19241] Avg episode reward: [(0, '4.622')]
-[2025-08-22 18:41:21,660][19431] Updated weights for policy 0, policy_version 790 (0.0013)
-[2025-08-22 18:41:21,930][19241] Fps is (10 sec: 6144.0, 60 sec: 7918.9, 300 sec: 7109.0). Total num frames: 3235840. Throughput: 0: 1906.3. Samples: 803682. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:21,931][19241] Avg episode reward: [(0, '4.455')]
-[2025-08-22 18:41:26,221][19431] Updated weights for policy 0, policy_version 800 (0.0011)
-[2025-08-22 18:41:26,930][19241] Fps is (10 sec: 8878.1, 60 sec: 7918.9, 300 sec: 7122.9). Total num frames: 3280896. Throughput: 0: 2077.3. Samples: 816876. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:26,932][19241] Avg episode reward: [(0, '4.282')]
-[2025-08-22 18:41:30,665][19431] Updated weights for policy 0, policy_version 810 (0.0010)
-[2025-08-22 18:41:31,930][19241] Fps is (10 sec: 9011.2, 60 sec: 7918.9, 300 sec: 7206.2). Total num frames: 3325952. Throughput: 0: 2053.3. Samples: 830650. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:31,931][19241] Avg episode reward: [(0, '4.424')]
-[2025-08-22 18:41:35,168][19431] Updated weights for policy 0, policy_version 820 (0.0011)
-[2025-08-22 18:41:36,930][19241] Fps is (10 sec: 9011.3, 60 sec: 7918.9, 300 sec: 7234.0). Total num frames: 3371008. Throughput: 0: 2061.2. Samples: 837578. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:36,931][19241] Avg episode reward: [(0, '4.372')]
-[2025-08-22 18:41:39,760][19431] Updated weights for policy 0, policy_version 830 (0.0009)
-[2025-08-22 18:41:41,930][19241] Fps is (10 sec: 9011.1, 60 sec: 8410.6, 300 sec: 7275.6). Total num frames: 3416064. Throughput: 0: 2064.0. Samples: 850888. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:41,931][19241] Avg episode reward: [(0, '4.442')]
-[2025-08-22 18:41:44,408][19431] Updated weights for policy 0, policy_version 840 (0.0013)
-[2025-08-22 18:41:46,930][19241] Fps is (10 sec: 9010.9, 60 sec: 8396.8, 300 sec: 7303.4). Total num frames: 3461120. Throughput: 0: 2063.4. Samples: 864168. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:46,932][19241] Avg episode reward: [(0, '4.593')]
-[2025-08-22 18:41:48,981][19431] Updated weights for policy 0, policy_version 850 (0.0012)
-[2025-08-22 18:41:52,882][19241] Fps is (10 sec: 6732.0, 60 sec: 7996.9, 300 sec: 7279.9). Total num frames: 3489792. Throughput: 0: 2020.9. Samples: 870862. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:52,883][19241] Avg episode reward: [(0, '4.376')]
-[2025-08-22 18:41:56,478][19431] Updated weights for policy 0, policy_version 860 (0.0009)
-[2025-08-22 18:41:56,930][19241] Fps is (10 sec: 6553.9, 60 sec: 7987.2, 300 sec: 7303.4). Total num frames: 3526656. Throughput: 0: 1937.3. Samples: 877672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:41:56,931][19241] Avg episode reward: [(0, '4.438')]
-[2025-08-22 18:42:01,052][19431] Updated weights for policy 0, policy_version 870 (0.0010)
-[2025-08-22 18:42:01,930][19241] Fps is (10 sec: 9053.9, 60 sec: 8055.5, 300 sec: 7303.4). Total num frames: 3571712. Throughput: 0: 2129.7. Samples: 891228. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:01,931][19241] Avg episode reward: [(0, '4.632')]
-[2025-08-22 18:42:05,726][19431] Updated weights for policy 0, policy_version 880 (0.0009)
-[2025-08-22 18:42:06,930][19241] Fps is (10 sec: 8601.4, 60 sec: 7987.2, 300 sec: 7386.7). Total num frames: 3612672. Throughput: 0: 2093.3. Samples: 897882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:06,931][19241] Avg episode reward: [(0, '4.446')]
-[2025-08-22 18:42:10,810][19431] Updated weights for policy 0, policy_version 890 (0.0011)
-[2025-08-22 18:42:11,930][19241] Fps is (10 sec: 8191.9, 60 sec: 7987.2, 300 sec: 7414.5). Total num frames: 3653632. Throughput: 0: 2070.2. Samples: 910034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:11,931][19241] Avg episode reward: [(0, '4.543')]
-[2025-08-22 18:42:15,380][19431] Updated weights for policy 0, policy_version 900 (0.0010)
-[2025-08-22 18:42:16,930][19241] Fps is (10 sec: 8601.7, 60 sec: 8437.2, 300 sec: 7456.1). Total num frames: 3698688. Throughput: 0: 2062.8. Samples: 923476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:16,931][19241] Avg episode reward: [(0, '4.347')]
-[2025-08-22 18:42:19,862][19431] Updated weights for policy 0, policy_version 910 (0.0008)
-[2025-08-22 18:42:21,930][19241] Fps is (10 sec: 9011.3, 60 sec: 8465.1, 300 sec: 7483.9). Total num frames: 3743744. Throughput: 0: 2061.0. Samples: 930324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:21,931][19241] Avg episode reward: [(0, '4.602')]
-[2025-08-22 18:42:24,380][19431] Updated weights for policy 0, policy_version 920 (0.0010)
-[2025-08-22 18:42:28,067][19241] Fps is (10 sec: 6619.9, 60 sec: 8039.6, 300 sec: 7413.7). Total num frames: 3772416. Throughput: 0: 2017.5. Samples: 943970. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:28,068][19241] Avg episode reward: [(0, '4.474')]
-[2025-08-22 18:42:31,695][19431] Updated weights for policy 0, policy_version 930 (0.0010)
-[2025-08-22 18:42:31,930][19241] Fps is (10 sec: 6553.5, 60 sec: 8055.4, 300 sec: 7456.1). Total num frames: 3809280. Throughput: 0: 1931.8. Samples: 951100. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:42:31,931][19241] Avg episode reward: [(0, '4.354')]
-[2025-08-22 18:42:36,159][19431] Updated weights for policy 0, policy_version 940 (0.0009)
-[2025-08-22 18:42:36,930][19241] Fps is (10 sec: 9243.1, 60 sec: 8055.5, 300 sec: 7483.9). Total num frames: 3854336. Throughput: 0: 1982.1. Samples: 958172. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:36,931][19241] Avg episode reward: [(0, '4.457')]
-[2025-08-22 18:42:36,936][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth...
-[2025-08-22 18:42:36,936][19241] Components not started: RolloutWorker_w1, RolloutWorker_w2, RolloutWorker_w3, RolloutWorker_w7, wait_time=600.0 seconds
-[2025-08-22 18:42:37,014][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000492_2015232.pth
-[2025-08-22 18:42:40,762][19431] Updated weights for policy 0, policy_version 950 (0.0009)
-[2025-08-22 18:42:41,930][19241] Fps is (10 sec: 9011.4, 60 sec: 8055.5, 300 sec: 7595.0). Total num frames: 3899392. Throughput: 0: 2087.6. Samples: 971612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
-[2025-08-22 18:42:41,931][19241] Avg episode reward: [(0, '4.512')]
-[2025-08-22 18:42:45,221][19431] Updated weights for policy 0, policy_version 960 (0.0008)
-[2025-08-22 18:42:46,930][19241] Fps is (10 sec: 9011.2, 60 sec: 8055.5, 300 sec: 7622.7). Total num frames: 3944448. Throughput: 0: 2090.0. Samples: 985278. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:46,932][19241] Avg episode reward: [(0, '4.577')]
-[2025-08-22 18:42:49,574][19431] Updated weights for policy 0, policy_version 970 (0.0010)
-[2025-08-22 18:42:51,930][19241] Fps is (10 sec: 9420.8, 60 sec: 8532.2, 300 sec: 7678.3). Total num frames: 3993600. Throughput: 0: 2100.2. Samples: 992390. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
-[2025-08-22 18:42:51,931][19241] Avg episode reward: [(0, '4.402')]
-[2025-08-22 18:42:53,098][19418] Stopping Batcher_0...
-[2025-08-22 18:42:53,099][19418] Loop batcher_evt_loop terminating...
-[2025-08-22 18:42:53,100][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 18:42:53,109][19241] Component Batcher_0 stopped!
-[2025-08-22 18:42:53,113][19241] Component RolloutWorker_w1 process died already! Don't wait for it.
-[2025-08-22 18:42:53,114][19241] Component RolloutWorker_w2 process died already! Don't wait for it.
-[2025-08-22 18:42:53,116][19241] Component RolloutWorker_w3 process died already! Don't wait for it.
-[2025-08-22 18:42:53,117][19241] Component RolloutWorker_w7 process died already! Don't wait for it.
-[2025-08-22 18:42:53,148][19241] Component RolloutWorker_w5 stopped!
-[2025-08-22 18:42:53,149][19438] Stopping RolloutWorker_w5...
-[2025-08-22 18:42:53,152][19438] Loop rollout_proc5_evt_loop terminating...
-[2025-08-22 18:42:53,151][19241] Component RolloutWorker_w0 stopped!
-[2025-08-22 18:42:53,151][19433] Stopping RolloutWorker_w0...
-[2025-08-22 18:42:53,153][19433] Loop rollout_proc0_evt_loop terminating...
-[2025-08-22 18:42:53,164][19418] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000707_2895872.pth
-[2025-08-22 18:42:53,163][19241] Component RolloutWorker_w4 stopped!
-[2025-08-22 18:42:53,163][19437] Stopping RolloutWorker_w4...
-[2025-08-22 18:42:53,168][19437] Loop rollout_proc4_evt_loop terminating...
-[2025-08-22 18:42:53,168][19418] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 18:42:53,185][19241] Component RolloutWorker_w6 stopped!
-[2025-08-22 18:42:53,186][19435] Stopping RolloutWorker_w6...
-[2025-08-22 18:42:53,190][19435] Loop rollout_proc6_evt_loop terminating...
-[2025-08-22 18:42:53,227][19418] Stopping LearnerWorker_p0...
-[2025-08-22 18:42:53,228][19418] Loop learner_proc0_evt_loop terminating...
-[2025-08-22 18:42:53,228][19241] Component LearnerWorker_p0 stopped!
-[2025-08-22 18:42:53,348][19431] Weights refcount: 2 0
-[2025-08-22 18:42:53,357][19431] Stopping InferenceWorker_p0-w0...
-[2025-08-22 18:42:53,357][19431] Loop inference_proc0-0_evt_loop terminating...
-[2025-08-22 18:42:53,357][19241] Component InferenceWorker_p0-w0 stopped!
-[2025-08-22 18:42:53,359][19241] Waiting for process learner_proc0 to stop...
-[2025-08-22 18:42:55,644][19241] Waiting for process inference_proc0-0 to join...
-[2025-08-22 18:42:55,646][19241] Waiting for process rollout_proc0 to join...
-[2025-08-22 18:42:55,647][19241] Waiting for process rollout_proc1 to join...
-[2025-08-22 18:42:55,647][19241] Waiting for process rollout_proc2 to join...
-[2025-08-22 18:42:55,648][19241] Waiting for process rollout_proc3 to join...
-[2025-08-22 18:42:55,649][19241] Waiting for process rollout_proc4 to join...
-[2025-08-22 18:42:55,650][19241] Waiting for process rollout_proc5 to join...
-[2025-08-22 18:42:55,651][19241] Waiting for process rollout_proc6 to join...
-[2025-08-22 18:42:55,651][19241] Waiting for process rollout_proc7 to join...
-[2025-08-22 18:42:55,652][19241] Batcher 0 profile tree view:
-batching: 12.5134, releasing_batches: 0.0353
-[2025-08-22 18:42:55,654][19241] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0000
-  wait_policy_total: 28.2706
-update_model: 6.9900
-  weight_update: 0.0011
-one_step: 0.0029
-  handle_policy_step: 549.4462
-    deserialize: 13.7269, stack: 2.5084, obs_to_device_normalize: 133.6489, forward: 259.6892, send_messages: 28.1017
-    prepare_outputs: 97.0431
-      to_cpu: 74.6299
-[2025-08-22 18:42:55,656][19241] Learner 0 profile tree view:
-misc: 0.0062, prepare_batch: 13.1796
-train: 45.9289
-  epoch_init: 0.0050, minibatch_init: 0.0067, losses_postprocess: 0.5105, kl_divergence: 0.5937, after_optimizer: 18.0551
-  calculate_losses: 17.8366
-    losses_init: 0.0029, forward_head: 1.3506, bptt_initial: 12.2350, tail: 0.7778, advantages_returns: 0.2192, losses: 1.6308
-    bptt: 1.4168
-      bptt_forward_core: 1.3449
-  update: 8.4784
-    clip: 0.9633
-[2025-08-22 18:42:55,656][19241] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.2265, enqueue_policy_requests: 18.0249, env_step: 234.4725, overhead: 13.7474, complete_rollouts: 0.5150
-save_policy_outputs: 15.3030
-  split_output_tensors: 5.3707
-[2025-08-22 18:42:55,658][19241] Loop Runner_EvtLoop terminating...
-[2025-08-22 18:42:55,660][19241] Runner profile tree view:
-main_loop: 613.6903
-[2025-08-22 18:42:55,661][19241] Collected {0: 4005888}, FPS: 6527.5
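[Editor's note] A quick consistency check on the summary line above, assuming "FPS" here is total collected frames divided by main_loop wall-clock seconds:

    frames = 4_005_888      # from "Collected {0: 4005888}"
    main_loop_s = 613.6903  # from "main_loop: 613.6903"
    print(frames / main_loop_s)  # prints ~6527.5, matching "FPS: 6527.5"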
-[2025-08-22 19:02:09,902][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:02:09,903][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:02:09,904][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:02:09,905][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:02:09,906][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:02:09,906][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:02:09,907][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:02:09,908][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:02:09,909][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:02:09,909][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:02:09,911][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:02:09,911][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:02:09,912][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:02:09,913][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:02:09,914][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:02:09,976][19241] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-22 19:02:09,986][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:02:09,995][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:02:10,089][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:02:10,332][19241] Conv encoder output size: 512
-[2025-08-22 19:02:10,333][19241] Policy head output size: 512
-[2025-08-22 19:02:11,755][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:11,766][19241] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:02:11,771][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:11,772][19241] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:02:11,774][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:11,776][19241] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
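[Editor's note] The three retries above fail identically: PyTorch 2.6 changed the default of `torch.load` to `weights_only=True`, and this Sample Factory checkpoint pickles a numpy object (`numpy.core.multiarray.scalar`) that the weights-only unpickler blocks. A minimal sketch of the allowlist route the error message itself recommends, to be run in the same venv before retrying the enjoy script; it assumes the locally trained checkpoint is trusted:

    import numpy as np
    import torch
    import torch.serialization

    CKPT = ("/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
            "default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth")

    # Allowlist the exact global named in the error, then retry the safe load.
    torch.serialization.add_safe_globals([np.core.multiarray.scalar])
    checkpoint = torch.load(CKPT, map_location="cpu")
    print(list(checkpoint))

As the later attempts in this log show, allowlisting one global can simply expose the next blocked one (`numpy.dtype`, then dtype instances), so the list may need to grow, or a trusted file can be loaded with `weights_only=False` instead.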
-[2025-08-22 19:02:21,150][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:02:21,153][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:02:21,154][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:02:21,155][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:02:21,156][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:02:21,157][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:02:21,158][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:02:21,160][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:02:21,161][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:02:21,163][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:02:21,163][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:02:21,164][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:02:21,165][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:02:21,166][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:02:21,166][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:02:21,200][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:02:21,203][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:02:21,216][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:02:21,251][19241] Conv encoder output size: 512
-[2025-08-22 19:02:21,252][19241] Policy head output size: 512
-[2025-08-22 19:02:21,271][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:21,274][19241] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:02:21,275][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:21,277][19241] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:02:21,279][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:21,281][19241] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:02:52,880][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:02:52,881][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:02:52,882][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:02:52,883][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:02:52,884][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:02:52,885][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:02:52,886][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-22 19:02:52,887][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:02:52,888][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-22 19:02:52,889][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-22 19:02:52,890][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:02:52,892][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:02:52,893][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:02:52,894][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:02:52,895][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:02:52,924][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:02:52,926][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:02:52,939][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:02:52,977][19241] Conv encoder output size: 512
-[2025-08-22 19:02:52,978][19241] Policy head output size: 512
-[2025-08-22 19:02:53,006][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:53,009][19241] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:02:53,011][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:53,012][19241] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:02:53,014][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:02:53,017][19241] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:07:28,136][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:07:28,138][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:07:28,140][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:07:28,142][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:07:28,143][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:07:28,144][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:07:28,144][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:07:28,145][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:07:28,147][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:07:28,148][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:07:28,148][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:07:28,149][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:07:28,150][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:07:28,151][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:07:28,152][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:07:28,182][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:07:28,184][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:07:28,198][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:07:28,246][19241] Conv encoder output size: 512
-[2025-08-22 19:07:28,247][19241] Policy head output size: 512
-[2025-08-22 19:07:28,292][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:07:28,296][19241] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.dtype])` or the `torch.serialization.safe_globals([numpy.dtype])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:07:28,298][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:07:28,299][19241] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.dtype])` or the `torch.serialization.safe_globals([numpy.dtype])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:07:28,300][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:07:28,302][19241] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.dtype])` or the `torch.serialization.safe_globals([numpy.dtype])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
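[Editor's note] By this run the blocked global has moved on to `numpy.dtype`, which suggests the scalar global had been allowlisted in the meantime. The error text also offers a scoped alternative; a sketch using the `safe_globals` context manager with both globals seen so far (again assuming the local checkpoint is trusted):

    import numpy as np
    import torch
    import torch.serialization

    CKPT = ("/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
            "default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth")

    # The allowlist applies only inside the with-block, not process-wide.
    with torch.serialization.safe_globals([np.core.multiarray.scalar, np.dtype]):
        checkpoint = torch.load(CKPT, map_location="cpu")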
-[2025-08-22 19:08:04,895][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:08:04,897][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:08:04,898][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:08:04,898][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:08:04,899][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:08:04,900][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:08:04,900][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:08:04,901][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:08:04,902][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:08:04,902][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:08:04,903][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:08:04,904][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:08:04,904][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:08:04,905][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:08:04,907][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:08:04,928][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:08:04,930][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:08:04,939][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:08:04,976][19241] Conv encoder output size: 512
-[2025-08-22 19:08:04,979][19241] Policy head output size: 512
-[2025-08-22 19:08:05,023][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:08:05,024][19241] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:08:05,026][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:08:05,027][19241] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:08:05,028][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:08:05,029][19241] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:10:32,993][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:10:32,995][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:10:32,997][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:10:32,998][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:10:32,999][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:10:33,001][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:10:33,001][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:10:33,003][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:10:33,004][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:10:33,005][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:10:33,006][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:10:33,007][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:10:33,008][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:10:33,009][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:10:33,011][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:10:33,039][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:10:33,041][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:10:33,051][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:10:33,086][19241] Conv encoder output size: 512
-[2025-08-22 19:10:33,088][19241] Policy head output size: 512
-[2025-08-22 19:10:33,116][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:10:33,120][19241] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:10:33,123][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:10:33,125][19241] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:10:33,126][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:10:33,128][19241] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
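[Editor's note] These attempts are the odd ones: the traceback now shows `weights_only=False` being passed at the learner.py call site, yet the same weights-only error is still raised. That combination suggests the running process may not have been executing the edited code (stale bytecode, a different environment, or an unreloaded module) rather than the checkpoint being unreadable. A hypothetical standalone check that bypasses the training stack and loads the trusted file directly:

    import torch

    CKPT = ("/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
            "default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth")

    # If this succeeds, the checkpoint itself is fine and the failures above
    # lie in how the library call site is being patched, not in the data.
    checkpoint = torch.load(CKPT, map_location="cpu", weights_only=False)
    print(type(checkpoint))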
-[2025-08-22 19:11:15,711][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:11:15,712][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:11:15,714][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:11:15,716][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:11:15,717][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:11:15,718][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:11:15,719][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:11:15,720][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:11:15,721][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:11:15,722][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:11:15,723][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:11:15,723][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:11:15,724][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:11:15,725][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:11:15,726][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:11:15,754][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:11:15,757][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:11:15,766][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:11:15,809][19241] Conv encoder output size: 512
-[2025-08-22 19:11:15,811][19241] Policy head output size: 512
-[2025-08-22 19:11:15,869][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:11:15,871][19241] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:11:15,874][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:11:15,875][19241] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:11:15,877][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:11:15,879][19241] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-22 19:11:30,952][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:11:30,953][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:11:30,954][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:11:30,956][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:11:30,957][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:11:30,958][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:11:30,960][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:11:30,962][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:11:30,963][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:11:30,964][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:11:30,966][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:11:30,967][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:11:30,969][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:11:30,970][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-08-22 19:11:30,972][19241] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-22 19:11:30,998][19241] RunningMeanStd input shape: (3, 72, 128) -[2025-08-22 19:11:31,000][19241] RunningMeanStd input shape: (1,) -[2025-08-22 19:11:31,008][19241] ConvEncoder: input_channels=3 -[2025-08-22 19:11:31,049][19241] Conv encoder output size: 512 -[2025-08-22 19:11:31,050][19241] Policy head output size: 512 -[2025-08-22 19:11:31,092][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-22 19:11:31,094][19241] Could not load from checkpoint, attempt 0 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. -Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-22 19:11:31,095][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-22 19:11:31,096][19241] Could not load from checkpoint, attempt 1 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. 
-Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-22 19:11:31,097][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-22 19:11:31,098][19241] Could not load from checkpoint, attempt 2 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. -Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-22 19:11:54,717][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-22 19:11:54,718][19241] Overriding arg 'num_workers' with value 1 passed from command line -[2025-08-22 19:11:54,720][19241] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-08-22 19:11:54,720][19241] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-08-22 19:11:54,721][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-08-22 19:11:54,723][19241] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-08-22 19:11:54,724][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! -[2025-08-22 19:11:54,725][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-08-22 19:11:54,726][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2025-08-22 19:11:54,727][19241] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2025-08-22 19:11:54,728][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-08-22 19:11:54,729][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-22 19:11:54,730][19241] Adding new argument 'train_script'=None that is not in the saved config file! 
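Note: the six failed attempts above all trace to PyTorch 2.6 flipping the default of the `weights_only` argument in `torch.load` from `False` to `True`; the pinned learner.py keeps calling `torch.load` into the new default's unpickler, so every retry hits the same UnpicklingError. The run at 19:11:54 below loads the same checkpoint cleanly. A minimal sketch of the two workarounds the error message itself suggests, assuming the checkpoint file is trusted (`load_checkpoint_trusted` is an illustrative helper, not Sample Factory API, and the `argparse.Namespace` allowlist entry is an assumption about what the checkpoint pickles):

    import argparse
    import torch

    # Option 1: opt out of weights-only loading. Safe only for checkpoints
    # you produced yourself, since full unpickling can execute arbitrary code.
    def load_checkpoint_trusted(path, device):
        return torch.load(path, map_location=device, weights_only=False)

    # Option 2: keep weights_only=True and allowlist the extra types the
    # unpickler rejects (torch.serialization.add_safe_globals exists in
    # PyTorch >= 2.4; which types a given checkpoint actually needs is
    # checkpoint-specific, argparse.Namespace is only an example).
    torch.serialization.add_safe_globals([argparse.Namespace])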
-[2025-08-22 19:11:54,717][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:11:54,718][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:11:54,720][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:11:54,720][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:11:54,721][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:11:54,723][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:11:54,724][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:11:54,725][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:11:54,726][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:11:54,727][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:11:54,728][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:11:54,729][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:11:54,730][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:11:54,731][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:11:54,732][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:11:54,758][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:11:54,761][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:11:54,770][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:11:54,813][19241] Conv encoder output size: 512
-[2025-08-22 19:11:54,815][19241] Policy head output size: 512
-[2025-08-22 19:11:54,857][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:11:56,713][19241] Num frames 100...
-[2025-08-22 19:11:56,918][19241] Num frames 200...
-[2025-08-22 19:11:57,168][19241] Num frames 300...
-[2025-08-22 19:11:57,394][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:11:57,395][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:11:57,429][19241] Num frames 400...
-[2025-08-22 19:11:57,621][19241] Num frames 500...
-[2025-08-22 19:11:57,811][19241] Num frames 600...
-[2025-08-22 19:11:58,008][19241] Num frames 700...
-[2025-08-22 19:11:58,192][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:11:58,193][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:11:58,252][19241] Num frames 800...
-[2025-08-22 19:11:58,444][19241] Num frames 900...
-[2025-08-22 19:11:58,680][19241] Num frames 1000...
-[2025-08-22 19:11:58,889][19241] Num frames 1100...
-[2025-08-22 19:11:59,056][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:11:59,058][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:11:59,169][19241] Num frames 1200...
-[2025-08-22 19:12:03,384][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:12:03,386][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:12:03,387][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:12:03,388][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:12:03,389][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:12:03,390][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:12:03,391][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:12:03,392][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:12:03,393][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-22 19:12:03,394][19241] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-22 19:12:03,395][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:12:03,395][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:12:03,396][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:12:03,398][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:12:03,398][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:12:03,427][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:12:03,429][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:12:03,438][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:12:03,468][19241] Conv encoder output size: 512
-[2025-08-22 19:12:03,470][19241] Policy head output size: 512
-[2025-08-22 19:12:03,490][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:12:03,864][19241] Num frames 100...
-[2025-08-22 19:12:04,041][19241] Num frames 200...
-[2025-08-22 19:12:04,233][19241] Num frames 300...
-[2025-08-22 19:12:04,413][19241] Num frames 400...
-[2025-08-22 19:12:04,555][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
-[2025-08-22 19:12:04,556][19241] Avg episode reward: 5.480, avg true_objective: 4.480
-[2025-08-22 19:12:04,658][19241] Num frames 500...
-[2025-08-22 19:12:04,869][19241] Num frames 600...
-[2025-08-22 19:12:05,070][19241] Num frames 700...
-[2025-08-22 19:12:05,247][19241] Num frames 800...
-[2025-08-22 19:12:05,369][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
-[2025-08-22 19:12:05,370][19241] Avg episode reward: 4.660, avg true_objective: 4.160
-[2025-08-22 19:12:05,512][19241] Num frames 900...
-[2025-08-22 19:12:05,686][19241] Num frames 1000...
-[2025-08-22 19:12:05,866][19241] Num frames 1100...
-[2025-08-22 19:12:06,064][19241] Num frames 1200...
-[2025-08-22 19:12:06,154][19241] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053
-[2025-08-22 19:12:06,156][19241] Avg episode reward: 4.387, avg true_objective: 4.053
-[2025-08-22 19:12:06,324][19241] Num frames 1300...
-[2025-08-22 19:12:06,483][19241] Num frames 1400...
-[2025-08-22 19:12:06,674][19241] Num frames 1500...
-[2025-08-22 19:12:06,899][19241] Num frames 1600...
-[2025-08-22 19:12:07,068][19241] Num frames 1700...
-[2025-08-22 19:12:07,253][19241] Num frames 1800...
-[2025-08-22 19:12:07,354][19241] Avg episode rewards: #0: 5.560, true rewards: #0: 4.560
-[2025-08-22 19:12:07,355][19241] Avg episode reward: 5.560, avg true_objective: 4.560
-[2025-08-22 19:12:07,531][19241] Num frames 1900...
-[2025-08-22 19:12:07,902][19241] Num frames 2000...
-[2025-08-22 19:12:08,175][19241] Num frames 2100...
-[2025-08-22 19:12:08,416][19241] Num frames 2200...
-[2025-08-22 19:12:08,489][19241] Avg episode rewards: #0: 5.216, true rewards: #0: 4.416
-[2025-08-22 19:12:08,489][19241] Avg episode reward: 5.216, avg true_objective: 4.416
-[2025-08-22 19:12:08,640][19241] Num frames 2300...
-[2025-08-22 19:12:08,828][19241] Num frames 2400...
-[2025-08-22 19:12:09,019][19241] Num frames 2500...
-[2025-08-22 19:12:09,232][19241] Avg episode rewards: #0: 4.987, true rewards: #0: 4.320
-[2025-08-22 19:12:09,234][19241] Avg episode reward: 4.987, avg true_objective: 4.320
-[2025-08-22 19:12:09,252][19241] Num frames 2600...
-[2025-08-22 19:12:09,463][19241] Num frames 2700...
-[2025-08-22 19:12:09,653][19241] Num frames 2800...
-[2025-08-22 19:12:09,856][19241] Num frames 2900...
-[2025-08-22 19:12:10,063][19241] Avg episode rewards: #0: 4.823, true rewards: #0: 4.251
-[2025-08-22 19:12:10,065][19241] Avg episode reward: 4.823, avg true_objective: 4.251
-[2025-08-22 19:12:10,115][19241] Num frames 3000...
-[2025-08-22 19:12:10,313][19241] Num frames 3100...
-[2025-08-22 19:12:10,519][19241] Num frames 3200...
-[2025-08-22 19:12:10,722][19241] Num frames 3300...
-[2025-08-22 19:12:10,890][19241] Avg episode rewards: #0: 4.700, true rewards: #0: 4.200
-[2025-08-22 19:12:10,891][19241] Avg episode reward: 4.700, avg true_objective: 4.200
-[2025-08-22 19:12:11,017][19241] Num frames 3400...
-[2025-08-22 19:12:11,282][19241] Num frames 3500...
-[2025-08-22 19:12:11,455][19241] Num frames 3600...
-[2025-08-22 19:12:11,631][19241] Num frames 3700...
-[2025-08-22 19:12:11,819][19241] Avg episode rewards: #0: 4.751, true rewards: #0: 4.196
-[2025-08-22 19:12:11,820][19241] Avg episode reward: 4.751, avg true_objective: 4.196
-[2025-08-22 19:12:11,874][19241] Num frames 3800...
-[2025-08-22 19:12:12,061][19241] Num frames 3900...
-[2025-08-22 19:12:12,251][19241] Num frames 4000...
-[2025-08-22 19:12:12,445][19241] Num frames 4100...
-[2025-08-22 19:12:12,644][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
-[2025-08-22 19:12:12,646][19241] Avg episode reward: 4.660, avg true_objective: 4.160
-[2025-08-22 19:12:18,808][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
-[2025-08-22 19:12:25,954][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:12:25,955][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:12:25,956][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:12:25,957][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:12:25,959][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:12:25,960][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:12:25,961][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-22 19:12:25,962][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:12:25,963][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-22 19:12:25,964][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-22 19:12:25,965][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:12:25,965][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:12:25,966][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:12:25,967][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:12:25,968][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:12:25,984][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:12:25,986][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:12:25,997][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:12:26,036][19241] Conv encoder output size: 512
-[2025-08-22 19:12:26,038][19241] Policy head output size: 512
-[2025-08-22 19:12:26,058][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:12:26,470][19241] Num frames 100...
-[2025-08-22 19:12:26,654][19241] Num frames 200...
-[2025-08-22 19:12:26,837][19241] Num frames 300...
-[2025-08-22 19:12:27,042][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:12:27,043][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:12:27,081][19241] Num frames 400...
-[2025-08-22 19:12:27,286][19241] Num frames 500...
-[2025-08-22 19:12:27,455][19241] Num frames 600...
-[2025-08-22 19:12:27,634][19241] Num frames 700...
-[2025-08-22 19:12:27,849][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:12:27,850][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:12:27,929][19241] Num frames 800...
-[2025-08-22 19:12:28,094][19241] Num frames 900...
-[2025-08-22 19:12:28,264][19241] Num frames 1000...
-[2025-08-22 19:12:28,439][19241] Num frames 1100...
-[2025-08-22 19:12:28,589][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:12:28,591][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:12:28,686][19241] Num frames 1200...
-[2025-08-22 19:12:28,856][19241] Num frames 1300...
-[2025-08-22 19:12:29,029][19241] Num frames 1400...
-[2025-08-22 19:12:29,193][19241] Num frames 1500...
-[2025-08-22 19:12:29,365][19241] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920
-[2025-08-22 19:12:29,366][19241] Avg episode reward: 4.170, avg true_objective: 3.920
-[2025-08-22 19:12:29,428][19241] Num frames 1600...
-[2025-08-22 19:12:29,606][19241] Num frames 1700...
-[2025-08-22 19:12:29,769][19241] Num frames 1800...
-[2025-08-22 19:12:29,944][19241] Num frames 1900...
-[2025-08-22 19:12:30,164][19241] Avg episode rewards: #0: 4.368, true rewards: #0: 3.968
-[2025-08-22 19:12:30,166][19241] Avg episode reward: 4.368, avg true_objective: 3.968
-[2025-08-22 19:12:30,207][19241] Num frames 2000...
-[2025-08-22 19:12:30,532][19241] Num frames 2100...
-[2025-08-22 19:12:30,774][19241] Num frames 2200...
-[2025-08-22 19:12:30,991][19241] Num frames 2300...
-[2025-08-22 19:12:31,184][19241] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947
-[2025-08-22 19:12:31,186][19241] Avg episode reward: 4.280, avg true_objective: 3.947
-[2025-08-22 19:12:31,269][19241] Num frames 2400...
-[2025-08-22 19:12:31,492][19241] Num frames 2500...
-[2025-08-22 19:12:31,697][19241] Num frames 2600...
-[2025-08-22 19:12:31,903][19241] Num frames 2700...
-[2025-08-22 19:12:32,111][19241] Num frames 2800...
-[2025-08-22 19:12:32,204][19241] Avg episode rewards: #0: 4.451, true rewards: #0: 4.023
-[2025-08-22 19:12:32,206][19241] Avg episode reward: 4.451, avg true_objective: 4.023
-[2025-08-22 19:12:32,396][19241] Num frames 2900...
-[2025-08-22 19:12:32,580][19241] Num frames 3000...
-[2025-08-22 19:12:32,772][19241] Num frames 3100...
-[2025-08-22 19:12:32,952][19241] Num frames 3200...
-[2025-08-22 19:12:33,067][19241] Avg episode rewards: #0: 4.540, true rewards: #0: 4.040
-[2025-08-22 19:12:33,069][19241] Avg episode reward: 4.540, avg true_objective: 4.040
-[2025-08-22 19:12:33,183][19241] Num frames 3300...
-[2025-08-22 19:12:33,445][19241] Num frames 3400...
-[2025-08-22 19:12:33,618][19241] Num frames 3500...
-[2025-08-22 19:12:33,815][19241] Num frames 3600...
-[2025-08-22 19:12:33,992][19241] Num frames 3700...
-[2025-08-22 19:12:34,188][19241] Avg episode rewards: #0: 4.862, true rewards: #0: 4.196
-[2025-08-22 19:12:34,190][19241] Avg episode reward: 4.862, avg true_objective: 4.196
-[2025-08-22 19:12:34,258][19241] Num frames 3800...
-[2025-08-22 19:12:34,467][19241] Num frames 3900...
-[2025-08-22 19:12:34,670][19241] Num frames 4000...
-[2025-08-22 19:12:34,850][19241] Num frames 4100...
-[2025-08-22 19:12:35,004][19241] Avg episode rewards: #0: 4.760, true rewards: #0: 4.160
-[2025-08-22 19:12:35,006][19241] Avg episode reward: 4.760, avg true_objective: 4.160
-[2025-08-22 19:12:40,632][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
-[2025-08-22 19:14:00,178][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:14:00,179][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:14:00,180][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:14:00,181][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:14:00,182][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:14:00,183][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:14:00,184][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-22 19:14:00,185][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:14:00,186][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-22 19:14:00,187][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-22 19:14:00,188][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:14:00,190][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:14:00,191][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:14:00,192][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:14:00,192][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:14:00,222][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:14:00,224][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:14:00,236][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:14:00,274][19241] Conv encoder output size: 512
-[2025-08-22 19:14:00,276][19241] Policy head output size: 512
-[2025-08-22 19:14:00,297][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:14:00,715][19241] Num frames 100...
-[2025-08-22 19:14:00,890][19241] Num frames 200...
-[2025-08-22 19:14:01,079][19241] Num frames 300...
-[2025-08-22 19:14:01,282][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:14:01,283][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:14:01,310][19241] Num frames 400...
-[2025-08-22 19:14:01,505][19241] Num frames 500...
-[2025-08-22 19:14:01,744][19241] Num frames 600...
-[2025-08-22 19:14:01,916][19241] Num frames 700...
-[2025-08-22 19:14:02,100][19241] Num frames 800...
-[2025-08-22 19:14:02,200][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
-[2025-08-22 19:14:02,201][19241] Avg episode reward: 4.660, avg true_objective: 4.160
-[2025-08-22 19:14:02,318][19241] Num frames 900...
-[2025-08-22 19:14:02,499][19241] Num frames 1000...
-[2025-08-22 19:14:02,710][19241] Num frames 1100...
-[2025-08-22 19:14:02,935][19241] Num frames 1200...
-[2025-08-22 19:14:03,134][19241] Num frames 1300...
-[2025-08-22 19:14:03,335][19241] Avg episode rewards: #0: 5.587, true rewards: #0: 4.587
-[2025-08-22 19:14:03,336][19241] Avg episode reward: 5.587, avg true_objective: 4.587
-[2025-08-22 19:14:03,385][19241] Num frames 1400...
-[2025-08-22 19:14:03,563][19241] Num frames 1500...
-[2025-08-22 19:14:03,732][19241] Num frames 1600...
-[2025-08-22 19:14:03,918][19241] Num frames 1700...
-[2025-08-22 19:14:04,116][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
-[2025-08-22 19:14:04,117][19241] Avg episode reward: 5.480, avg true_objective: 4.480
-[2025-08-22 19:14:04,133][19241] Num frames 1800...
-[2025-08-22 19:14:04,319][19241] Num frames 1900...
-[2025-08-22 19:14:04,497][19241] Num frames 2000...
-[2025-08-22 19:14:04,696][19241] Num frames 2100...
-[2025-08-22 19:14:04,891][19241] Avg episode rewards: #0: 5.152, true rewards: #0: 4.352
-[2025-08-22 19:14:04,892][19241] Avg episode reward: 5.152, avg true_objective: 4.352
-[2025-08-22 19:14:04,947][19241] Num frames 2200...
-[2025-08-22 19:14:08,057][19241] Num frames 2300...
-[2025-08-22 19:14:08,240][19241] Num frames 2400...
-[2025-08-22 19:14:08,497][19241] Num frames 2500...
-[2025-08-22 19:14:08,661][19241] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267
-[2025-08-22 19:14:08,662][19241] Avg episode reward: 4.933, avg true_objective: 4.267
-[2025-08-22 19:14:08,735][19241] Num frames 2600...
-[2025-08-22 19:14:08,914][19241] Num frames 2700...
-[2025-08-22 19:14:09,087][19241] Num frames 2800...
-[2025-08-22 19:14:09,287][19241] Num frames 2900...
-[2025-08-22 19:14:09,434][19241] Avg episode rewards: #0: 4.777, true rewards: #0: 4.206
-[2025-08-22 19:14:09,435][19241] Avg episode reward: 4.777, avg true_objective: 4.206
-[2025-08-22 19:14:09,566][19241] Num frames 3000...
-[2025-08-22 19:14:09,790][19241] Num frames 3100...
-[2025-08-22 19:14:10,003][19241] Num frames 3200...
-[2025-08-22 19:14:10,211][19241] Num frames 3300...
-[2025-08-22 19:14:10,362][19241] Avg episode rewards: #0: 4.825, true rewards: #0: 4.200
-[2025-08-22 19:14:10,364][19241] Avg episode reward: 4.825, avg true_objective: 4.200
-[2025-08-22 19:14:10,434][19241] Num frames 3400...
-[2025-08-22 19:14:10,623][19241] Num frames 3500...
-[2025-08-22 19:14:10,854][19241] Num frames 3600...
-[2025-08-22 19:14:11,040][19241] Num frames 3700...
-[2025-08-22 19:14:11,169][19241] Avg episode rewards: #0: 4.716, true rewards: #0: 4.160
-[2025-08-22 19:14:11,170][19241] Avg episode reward: 4.716, avg true_objective: 4.160
-[2025-08-22 19:14:11,272][19241] Num frames 3800...
-[2025-08-22 19:14:11,460][19241] Num frames 3900...
-[2025-08-22 19:14:11,633][19241] Num frames 4000...
-[2025-08-22 19:14:11,804][19241] Num frames 4100...
-[2025-08-22 19:14:11,915][19241] Avg episode rewards: #0: 4.628, true rewards: #0: 4.128
-[2025-08-22 19:14:11,916][19241] Avg episode reward: 4.628, avg true_objective: 4.128
-[2025-08-22 19:14:17,610][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
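Note: in these evaluation blocks, "Avg episode rewards" is a running mean over the episodes finished so far and "true rewards" the running mean of the true objective. In the 19:14 run above, episode returns 3.840 and 5.480 average to (3.840 + 5.480) / 2 = 4.660, exactly the second line pair. A minimal sketch of that bookkeeping (illustrative, not Sample Factory's actual enjoy loop):

    # Running means printed after each finished episode.
    episode_rewards = []
    true_objectives = []

    def on_episode_end(reward, true_objective):
        episode_rewards.append(reward)
        true_objectives.append(true_objective)
        avg_r = sum(episode_rewards) / len(episode_rewards)
        avg_t = sum(true_objectives) / len(true_objectives)
        print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")

    on_episode_end(3.840, 3.840)  # -> 3.840, 3.840
    on_episode_end(5.480, 4.480)  # -> 4.660, 4.160, matching the log above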
-[2025-08-22 19:19:03,331][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:19:03,333][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:19:03,335][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:19:03,336][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:19:03,337][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:19:03,338][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:19:03,339][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-22 19:19:03,340][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:19:03,341][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-22 19:19:03,343][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-22 19:19:03,344][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:19:03,344][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:19:03,345][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:19:03,346][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:19:03,347][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:19:03,386][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:19:03,389][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:19:03,404][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:19:03,466][19241] Conv encoder output size: 512
-[2025-08-22 19:19:03,472][19241] Policy head output size: 512
-[2025-08-22 19:19:03,511][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:19:03,994][19241] Num frames 100...
-[2025-08-22 19:19:04,180][19241] Num frames 200...
-[2025-08-22 19:19:04,358][19241] Num frames 300...
-[2025-08-22 19:19:04,532][19241] Num frames 400...
-[2025-08-22 19:19:04,675][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
-[2025-08-22 19:19:04,676][19241] Avg episode reward: 5.480, avg true_objective: 4.480
-[2025-08-22 19:19:04,778][19241] Num frames 500...
-[2025-08-22 19:19:04,948][19241] Num frames 600...
-[2025-08-22 19:19:05,126][19241] Num frames 700...
-[2025-08-22 19:19:05,229][19241] Avg episode rewards: #0: 4.110, true rewards: #0: 3.610
-[2025-08-22 19:19:05,230][19241] Avg episode reward: 4.110, avg true_objective: 3.610
-[2025-08-22 19:19:05,368][19241] Num frames 800...
-[2025-08-22 19:19:05,555][19241] Num frames 900...
-[2025-08-22 19:19:05,749][19241] Avg episode rewards: #0: 3.593, true rewards: #0: 3.260
-[2025-08-22 19:19:05,751][19241] Avg episode reward: 3.593, avg true_objective: 3.260
-[2025-08-22 19:19:05,798][19241] Num frames 1000...
-[2025-08-22 19:19:05,987][19241] Num frames 1100...
-[2025-08-22 19:19:06,186][19241] Num frames 1200...
-[2025-08-22 19:19:06,368][19241] Num frames 1300...
-[2025-08-22 19:19:06,540][19241] Avg episode rewards: #0: 3.655, true rewards: #0: 3.405
-[2025-08-22 19:19:06,542][19241] Avg episode reward: 3.655, avg true_objective: 3.405
-[2025-08-22 19:19:06,622][19241] Num frames 1400...
-[2025-08-22 19:19:06,884][19241] Num frames 1500...
-[2025-08-22 19:19:07,055][19241] Num frames 1600...
-[2025-08-22 19:19:07,263][19241] Num frames 1700...
-[2025-08-22 19:19:07,402][19241] Avg episode rewards: #0: 3.692, true rewards: #0: 3.492
-[2025-08-22 19:19:07,404][19241] Avg episode reward: 3.692, avg true_objective: 3.492
-[2025-08-22 19:19:07,504][19241] Num frames 1800...
-[2025-08-22 19:19:07,721][19241] Num frames 1900...
-[2025-08-22 19:19:07,954][19241] Num frames 2000...
-[2025-08-22 19:19:08,185][19241] Num frames 2100...
-[2025-08-22 19:19:08,308][19241] Avg episode rewards: #0: 3.717, true rewards: #0: 3.550
-[2025-08-22 19:19:08,309][19241] Avg episode reward: 3.717, avg true_objective: 3.550
-[2025-08-22 19:19:08,473][19241] Num frames 2200...
-[2025-08-22 19:19:08,724][19241] Num frames 2300...
-[2025-08-22 19:19:08,927][19241] Num frames 2400...
-[2025-08-22 19:19:09,102][19241] Num frames 2500...
-[2025-08-22 19:19:09,187][19241] Avg episode rewards: #0: 3.734, true rewards: #0: 3.591
-[2025-08-22 19:19:09,189][19241] Avg episode reward: 3.734, avg true_objective: 3.591
-[2025-08-22 19:19:09,358][19241] Num frames 2600...
-[2025-08-22 19:19:09,557][19241] Num frames 2700...
-[2025-08-22 19:19:09,752][19241] Num frames 2800...
-[2025-08-22 19:19:09,994][19241] Num frames 2900...
-[2025-08-22 19:19:10,157][19241] Avg episode rewards: #0: 3.953, true rewards: #0: 3.702
-[2025-08-22 19:19:10,159][19241] Avg episode reward: 3.953, avg true_objective: 3.702
-[2025-08-22 19:19:10,238][19241] Num frames 3000...
-[2025-08-22 19:19:10,407][19241] Num frames 3100...
-[2025-08-22 19:19:10,587][19241] Num frames 3200...
-[2025-08-22 19:19:10,763][19241] Num frames 3300...
-[2025-08-22 19:19:10,899][19241] Avg episode rewards: #0: 3.940, true rewards: #0: 3.718
-[2025-08-22 19:19:10,900][19241] Avg episode reward: 3.940, avg true_objective: 3.718
-[2025-08-22 19:19:10,996][19241] Num frames 3400...
-[2025-08-22 19:19:11,168][19241] Num frames 3500...
-[2025-08-22 19:19:11,356][19241] Num frames 3600...
-[2025-08-22 19:19:11,532][19241] Num frames 3700...
-[2025-08-22 19:19:11,639][19241] Avg episode rewards: #0: 3.930, true rewards: #0: 3.730
-[2025-08-22 19:19:11,640][19241] Avg episode reward: 3.930, avg true_objective: 3.730
-[2025-08-22 19:19:16,692][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
-[2025-08-22 19:21:49,310][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:21:49,312][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:21:49,313][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:21:49,314][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:21:49,315][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:21:49,316][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:21:49,317][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-22 19:21:49,318][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:21:49,319][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-22 19:21:49,320][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-22 19:21:49,321][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:21:49,322][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:21:49,323][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:21:49,323][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:21:49,324][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:21:49,351][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:21:49,352][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:21:49,365][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:21:49,403][19241] Conv encoder output size: 512
-[2025-08-22 19:21:49,405][19241] Policy head output size: 512
-[2025-08-22 19:21:49,448][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:21:50,038][19241] Num frames 100...
-[2025-08-22 19:21:50,278][19241] Num frames 200...
-[2025-08-22 19:21:50,493][19241] Num frames 300...
-[2025-08-22 19:21:50,738][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:21:50,740][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:21:50,773][19241] Num frames 400...
-[2025-08-22 19:21:50,956][19241] Num frames 500...
-[2025-08-22 19:21:51,161][19241] Num frames 600...
-[2025-08-22 19:21:51,353][19241] Num frames 700...
-[2025-08-22 19:21:51,533][19241] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-22 19:21:51,534][19241] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-22 19:21:51,635][19241] Num frames 800...
-[2025-08-22 19:21:51,846][19241] Num frames 900...
-[2025-08-22 19:21:52,030][19241] Num frames 1000...
-[2025-08-22 19:21:52,259][19241] Num frames 1100...
-[2025-08-22 19:21:52,458][19241] Num frames 1200...
-[2025-08-22 19:21:52,644][19241] Num frames 1300...
-[2025-08-22 19:21:52,818][19241] Num frames 1400...
-[2025-08-22 19:21:52,884][19241] Avg episode rewards: #0: 6.027, true rewards: #0: 4.693
-[2025-08-22 19:21:52,885][19241] Avg episode reward: 6.027, avg true_objective: 4.693
-[2025-08-22 19:21:53,050][19241] Num frames 1500...
-[2025-08-22 19:21:53,243][19241] Num frames 1600...
-[2025-08-22 19:21:53,425][19241] Num frames 1700...
-[2025-08-22 19:21:53,633][19241] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
-[2025-08-22 19:21:53,634][19241] Avg episode reward: 5.480, avg true_objective: 4.480
-[2025-08-22 19:21:53,647][19241] Num frames 1800...
-[2025-08-22 19:21:53,811][19241] Num frames 1900...
-[2025-08-22 19:21:53,992][19241] Num frames 2000...
-[2025-08-22 19:21:54,174][19241] Num frames 2100...
-[2025-08-22 19:21:54,361][19241] Avg episode rewards: #0: 5.152, true rewards: #0: 4.352
-[2025-08-22 19:21:54,363][19241] Avg episode reward: 5.152, avg true_objective: 4.352
-[2025-08-22 19:21:54,413][19241] Num frames 2200...
-[2025-08-22 19:21:54,617][19241] Num frames 2300...
-[2025-08-22 19:21:54,816][19241] Num frames 2400...
-[2025-08-22 19:21:55,001][19241] Num frames 2500...
-[2025-08-22 19:21:55,165][19241] Avg episode rewards: #0: 4.933, true rewards: #0: 4.267
-[2025-08-22 19:21:55,167][19241] Avg episode reward: 4.933, avg true_objective: 4.267
-[2025-08-22 19:21:55,235][19241] Num frames 2600...
-[2025-08-22 19:21:55,411][19241] Num frames 2700...
-[2025-08-22 19:21:55,587][19241] Num frames 2800...
-[2025-08-22 19:21:55,744][19241] Num frames 2900...
-[2025-08-22 19:21:55,874][19241] Avg episode rewards: #0: 4.777, true rewards: #0: 4.206
-[2025-08-22 19:21:55,876][19241] Avg episode reward: 4.777, avg true_objective: 4.206
-[2025-08-22 19:21:55,983][19241] Num frames 3000...
-[2025-08-22 19:21:56,151][19241] Num frames 3100...
-[2025-08-22 19:21:56,310][19241] Num frames 3200...
-[2025-08-22 19:21:56,483][19241] Num frames 3300...
-[2025-08-22 19:21:56,591][19241] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
-[2025-08-22 19:21:56,592][19241] Avg episode reward: 4.660, avg true_objective: 4.160
-[2025-08-22 19:21:56,727][19241] Num frames 3400...
-[2025-08-22 19:21:56,922][19241] Num frames 3500...
-[2025-08-22 19:21:57,130][19241] Num frames 3600...
-[2025-08-22 19:21:57,313][19241] Num frames 3700...
-[2025-08-22 19:21:57,406][19241] Avg episode rewards: #0: 4.569, true rewards: #0: 4.124
-[2025-08-22 19:21:57,408][19241] Avg episode reward: 4.569, avg true_objective: 4.124
-[2025-08-22 19:21:57,560][19241] Num frames 3800...
-[2025-08-22 19:21:57,753][19241] Num frames 3900...
-[2025-08-22 19:21:57,914][19241] Num frames 4000...
-[2025-08-22 19:21:58,136][19241] Avg episode rewards: #0: 4.496, true rewards: #0: 4.096
-[2025-08-22 19:21:58,137][19241] Avg episode reward: 4.496, avg true_objective: 4.096
-[2025-08-22 19:22:03,544][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
-[2025-08-22 19:24:35,459][19241] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-22 19:24:35,461][19241] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-22 19:24:35,462][19241] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-22 19:24:35,463][19241] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-22 19:24:35,464][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-22 19:24:35,465][19241] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-22 19:24:35,465][19241] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-22 19:24:35,466][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-22 19:24:35,467][19241] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-22 19:24:35,468][19241] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-22 19:24:35,469][19241] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-22 19:24:35,470][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-22 19:24:35,470][19241] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-22 19:24:35,471][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-22 19:24:35,472][19241] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-22 19:24:35,498][19241] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-22 19:24:35,500][19241] RunningMeanStd input shape: (1,)
-[2025-08-22 19:24:35,510][19241] ConvEncoder: input_channels=3
-[2025-08-22 19:24:35,549][19241] Conv encoder output size: 512
-[2025-08-22 19:24:35,551][19241] Policy head output size: 512
-[2025-08-22 19:24:35,572][19241] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
-[2025-08-22 19:24:36,013][19241] Num frames 100...
-[2025-08-22 19:24:36,223][19241] Num frames 200...
-[2025-08-22 19:24:36,427][19241] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560
-[2025-08-22 19:24:36,429][19241] Avg episode reward: 2.560, avg true_objective: 2.560
-[2025-08-22 19:24:36,537][19241] Num frames 300...
-[2025-08-22 19:24:36,765][19241] Num frames 400...
-[2025-08-22 19:24:36,974][19241] Num frames 500...
-[2025-08-22 19:24:37,139][19241] Num frames 600...
-[2025-08-22 19:24:37,316][19241] Avg episode rewards: #0: 3.860, true rewards: #0: 3.360
-[2025-08-22 19:24:37,317][19241] Avg episode reward: 3.860, avg true_objective: 3.360
-[2025-08-22 19:24:37,377][19241] Num frames 700...
-[2025-08-22 19:24:37,557][19241] Num frames 800...
-[2025-08-22 19:24:37,729][19241] Num frames 900...
-[2025-08-22 19:24:37,901][19241] Num frames 1000...
-[2025-08-22 19:24:37,994][19241] Avg episode rewards: #0: 4.080, true rewards: #0: 3.413
-[2025-08-22 19:24:37,996][19241] Avg episode reward: 4.080, avg true_objective: 3.413
-[2025-08-22 19:24:38,132][19241] Num frames 1100...
-[2025-08-22 19:24:41,253][19241] Num frames 1200...
-[2025-08-22 19:24:41,432][19241] Num frames 1300...
-[2025-08-22 19:24:41,648][19241] Num frames 1400...
-[2025-08-22 19:24:41,725][19241] Avg episode rewards: #0: 4.020, true rewards: #0: 3.520
-[2025-08-22 19:24:41,727][19241] Avg episode reward: 4.020, avg true_objective: 3.520
-[2025-08-22 19:24:41,900][19241] Num frames 1500...
-[2025-08-22 19:24:42,074][19241] Num frames 1600...
-[2025-08-22 19:24:42,252][19241] Num frames 1700...
-[2025-08-22 19:24:42,456][19241] Avg episode rewards: #0: 3.984, true rewards: #0: 3.584
-[2025-08-22 19:24:42,457][19241] Avg episode reward: 3.984, avg true_objective: 3.584
-[2025-08-22 19:24:42,474][19241] Num frames 1800...
-[2025-08-22 19:24:42,638][19241] Num frames 1900...
-[2025-08-22 19:24:42,804][19241] Num frames 2000...
-[2025-08-22 19:24:42,970][19241] Num frames 2100...
-[2025-08-22 19:24:43,150][19241] Num frames 2200...
-[2025-08-22 19:24:43,275][19241] Avg episode rewards: #0: 4.233, true rewards: #0: 3.733
-[2025-08-22 19:24:43,277][19241] Avg episode reward: 4.233, avg true_objective: 3.733
-[2025-08-22 19:24:43,391][19241] Num frames 2300...
-[2025-08-22 19:24:43,544][19241] Num frames 2400...
-[2025-08-22 19:24:43,700][19241] Num frames 2500...
-[2025-08-22 19:24:43,878][19241] Num frames 2600...
-[2025-08-22 19:24:44,040][19241] Avg episode rewards: #0: 4.366, true rewards: #0: 3.794
-[2025-08-22 19:24:44,041][19241] Avg episode reward: 4.366, avg true_objective: 3.794
-[2025-08-22 19:24:44,124][19241] Num frames 2700...
-[2025-08-22 19:24:44,301][19241] Num frames 2800...
-[2025-08-22 19:24:44,479][19241] Num frames 2900...
-[2025-08-22 19:24:44,660][19241] Num frames 3000...
-[2025-08-22 19:24:44,835][19241] Num frames 3100...
-[2025-08-22 19:24:45,016][19241] Avg episode rewards: #0: 4.710, true rewards: #0: 3.960
-[2025-08-22 19:24:45,018][19241] Avg episode reward: 4.710, avg true_objective: 3.960
-[2025-08-22 19:24:45,079][19241] Num frames 3200...
-[2025-08-22 19:24:45,255][19241] Num frames 3300...
-[2025-08-22 19:24:45,454][19241] Num frames 3400...
-[2025-08-22 19:24:45,629][19241] Num frames 3500...
-[2025-08-22 19:24:45,778][19241] Avg episode rewards: #0: 4.613, true rewards: #0: 3.947
-[2025-08-22 19:24:45,779][19241] Avg episode reward: 4.613, avg true_objective: 3.947
-[2025-08-22 19:24:45,871][19241] Num frames 3600...
-[2025-08-22 19:24:46,116][19241] Num frames 3700...
-[2025-08-22 19:24:46,281][19241] Num frames 3800...
-[2025-08-22 19:24:46,451][19241] Num frames 3900...
-[2025-08-22 19:24:46,564][19241] Avg episode rewards: #0: 4.536, true rewards: #0: 3.936
-[2025-08-22 19:24:46,565][19241] Avg episode reward: 4.536, avg true_objective: 3.936
-[2025-08-22 19:24:51,963][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
-[2025-08-22 19:24:55,751][19241] The model has been pushed to https://huggingface.co/turbo-maikol/rl_course_vizdoom_health_gathering_supreme
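Note: the "Overriding arg" and "Adding new argument ... not in the saved config file!" lines that precede every evaluation describe the config merge the enjoy script performs: the saved training config is the base, values passed explicitly on the command line override it, and flags the saved file has never seen are appended. A minimal sketch of that policy (illustrative helper only; Sample Factory's real logic lives in its config-loading code):

    # Merge a saved experiment config with command-line arguments.
    def merge_config(saved: dict, cli: dict, cli_explicit: set) -> dict:
        cfg = dict(saved)
        for key, value in cli.items():
            if key not in saved:
                print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
                cfg[key] = value
            elif key in cli_explicit and saved[key] != value:
                print(f"Overriding arg '{key}' with value {value!r} passed from command line")
                cfg[key] = value
        return cfg

    # Reproduces the shape of the log lines above (values are examples).
    cfg = merge_config(
        saved={"num_workers": 8},
        cli={"num_workers": 1, "no_render": True},
        cli_explicit={"num_workers"},
    )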
-[2025-08-22 19:28:43,421][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-22 19:28:43,422][19241] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-08-22 19:28:43,423][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-08-22 19:28:43,423][19241] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-22 19:28:43,487][19241] RunningMeanStd input shape: (3, 72, 128) -[2025-08-22 19:28:43,492][19241] RunningMeanStd input shape: (1,) -[2025-08-22 19:28:43,512][19241] ConvEncoder: input_channels=3 -[2025-08-22 19:28:43,636][19241] Conv encoder output size: 512 -[2025-08-22 19:28:43,638][19241] Policy head output size: 512 -[2025-08-22 19:28:43,679][19241] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... -[2025-08-22 19:28:44,434][19241] Num frames 100... -[2025-08-22 19:28:47,541][19241] Num frames 200... -[2025-08-22 19:28:47,722][19241] Num frames 300... -[2025-08-22 19:28:47,905][19241] Num frames 400... -[2025-08-22 19:28:48,069][19241] Num frames 500... -[2025-08-22 19:28:48,335][19241] Num frames 600... -[2025-08-22 19:28:48,597][19241] Num frames 700... -[2025-08-22 19:28:48,814][19241] Num frames 800... -[2025-08-22 19:28:49,006][19241] Num frames 900... -[2025-08-22 19:28:49,161][19241] Num frames 1000... -[2025-08-22 19:28:49,344][19241] Num frames 1100... -[2025-08-22 19:28:49,498][19241] Num frames 1200... -[2025-08-22 19:28:49,686][19241] Num frames 1300... -[2025-08-22 19:28:49,856][19241] Num frames 1400... -[2025-08-22 19:28:50,021][19241] Num frames 1500... -[2025-08-22 19:28:50,213][19241] Num frames 1600... -[2025-08-22 19:28:50,387][19241] Num frames 1700... -[2025-08-22 19:28:50,560][19241] Num frames 1800... -[2025-08-22 19:28:50,740][19241] Num frames 1900... -[2025-08-22 19:28:50,909][19241] Num frames 2000... -[2025-08-22 19:28:51,091][19241] Num frames 2100... -[2025-08-22 19:28:51,143][19241] Avg episode rewards: #0: 68.999, true rewards: #0: 21.000 -[2025-08-22 19:28:51,145][19241] Avg episode reward: 68.999, avg true_objective: 21.000 -[2025-08-22 19:28:51,323][19241] Num frames 2200... -[2025-08-22 19:28:51,495][19241] Num frames 2300... -[2025-08-22 19:28:51,679][19241] Num frames 2400... -[2025-08-22 19:28:51,962][19241] Num frames 2500... -[2025-08-22 19:28:52,278][19241] Num frames 2600... -[2025-08-22 19:28:52,520][19241] Num frames 2700... -[2025-08-22 19:28:52,740][19241] Num frames 2800... -[2025-08-22 19:28:52,962][19241] Num frames 2900... -[2025-08-22 19:28:53,199][19241] Num frames 3000... -[2025-08-22 19:28:53,398][19241] Num frames 3100... -[2025-08-22 19:28:53,593][19241] Num frames 3200... -[2025-08-22 19:28:53,894][19241] Num frames 3300... -[2025-08-22 19:28:54,117][19241] Num frames 3400... -[2025-08-22 19:28:54,320][19241] Num frames 3500... -[2025-08-22 19:28:54,501][19241] Num frames 3600... -[2025-08-22 19:28:54,701][19241] Num frames 3700... -[2025-08-22 19:28:54,982][19241] Num frames 3800... -[2025-08-22 19:28:55,175][19241] Num frames 3900... -[2025-08-22 19:28:55,370][19241] Num frames 4000... -[2025-08-22 19:28:55,573][19241] Num frames 4100... -[2025-08-22 19:28:55,820][19241] Num frames 4200... 
-[2025-08-22 19:28:55,872][19241] Avg episode rewards: #0: 67.499, true rewards: #0: 21.000 -[2025-08-22 19:28:55,874][19241] Avg episode reward: 67.499, avg true_objective: 21.000 -[2025-08-22 19:28:56,101][19241] Num frames 4300... -[2025-08-22 19:28:56,285][19241] Num frames 4400... -[2025-08-22 19:28:56,468][19241] Num frames 4500... -[2025-08-22 19:28:56,685][19241] Num frames 4600... -[2025-08-22 19:28:56,966][19241] Num frames 4700... -[2025-08-22 19:28:57,192][19241] Num frames 4800... -[2025-08-22 19:28:57,385][19241] Num frames 4900... -[2025-08-22 19:28:57,592][19241] Num frames 5000... -[2025-08-22 19:28:57,791][19241] Num frames 5100... -[2025-08-22 19:28:58,010][19241] Num frames 5200... -[2025-08-22 19:28:58,204][19241] Num frames 5300... -[2025-08-22 19:28:58,397][19241] Num frames 5400... -[2025-08-22 19:28:58,646][19241] Num frames 5500... -[2025-08-22 19:28:58,863][19241] Num frames 5600... -[2025-08-22 19:28:59,137][19241] Num frames 5700... -[2025-08-22 19:28:59,349][19241] Num frames 5800... -[2025-08-22 19:28:59,541][19241] Num frames 5900... -[2025-08-22 19:28:59,726][19241] Num frames 6000... -[2025-08-22 19:28:59,948][19241] Num frames 6100... -[2025-08-22 19:29:00,187][19241] Num frames 6200... -[2025-08-22 19:29:00,461][19241] Num frames 6300... -[2025-08-22 19:29:00,514][19241] Avg episode rewards: #0: 66.666, true rewards: #0: 21.000 -[2025-08-22 19:29:00,517][19241] Avg episode reward: 66.666, avg true_objective: 21.000 -[2025-08-22 19:29:00,727][19241] Num frames 6400... -[2025-08-22 19:29:00,969][19241] Num frames 6500... -[2025-08-22 19:29:01,265][19241] Num frames 6600... -[2025-08-22 19:29:01,504][19241] Num frames 6700... -[2025-08-22 19:29:01,691][19241] Num frames 6800... -[2025-08-22 19:29:01,878][19241] Num frames 6900... -[2025-08-22 19:29:02,105][19241] Num frames 7000... -[2025-08-22 19:29:02,330][19241] Num frames 7100... -[2025-08-22 19:29:02,531][19241] Num frames 7200... -[2025-08-22 19:29:02,737][19241] Num frames 7300... -[2025-08-22 19:29:02,905][19241] Num frames 7400... -[2025-08-22 19:29:03,095][19241] Num frames 7500... -[2025-08-22 19:29:03,303][19241] Num frames 7600... -[2025-08-22 19:29:03,493][19241] Num frames 7700... -[2025-08-22 19:29:03,725][19241] Num frames 7800... -[2025-08-22 19:29:03,910][19241] Num frames 7900... -[2025-08-22 19:29:04,115][19241] Num frames 8000... -[2025-08-22 19:29:04,196][19241] Avg episode rewards: #0: 62.027, true rewards: #0: 20.028 -[2025-08-22 19:29:04,197][19241] Avg episode reward: 62.027, avg true_objective: 20.028 -[2025-08-22 19:29:04,376][19241] Num frames 8100... -[2025-08-22 19:29:04,534][19241] Num frames 8200... -[2025-08-22 19:29:04,710][19241] Num frames 8300... -[2025-08-22 19:29:04,878][19241] Num frames 8400... -[2025-08-22 19:29:05,072][19241] Num frames 8500... -[2025-08-22 19:29:05,242][19241] Num frames 8600... -[2025-08-22 19:29:05,416][19241] Num frames 8700... -[2025-08-22 19:29:05,589][19241] Num frames 8800... -[2025-08-22 19:29:05,761][19241] Num frames 8900... -[2025-08-22 19:29:05,972][19241] Num frames 9000... -[2025-08-22 19:29:06,189][19241] Num frames 9100... -[2025-08-22 19:29:06,393][19241] Num frames 9200... -[2025-08-22 19:29:06,561][19241] Num frames 9300... -[2025-08-22 19:29:06,752][19241] Num frames 9400... -[2025-08-22 19:29:06,955][19241] Num frames 9500... 
-[2025-08-22 19:29:07,180][19241] Avg episode rewards: #0: 58.357, true rewards: #0: 19.158 -[2025-08-22 19:29:07,182][19241] Avg episode reward: 58.357, avg true_objective: 19.158 -[2025-08-22 19:29:07,227][19241] Num frames 9600... -[2025-08-22 19:29:07,440][19241] Num frames 9700... -[2025-08-22 19:29:07,652][19241] Num frames 9800... -[2025-08-22 19:29:08,009][19241] Num frames 9900... -[2025-08-22 19:29:08,255][19241] Num frames 10000... -[2025-08-22 19:29:08,483][19241] Num frames 10100... -[2025-08-22 19:29:08,673][19241] Num frames 10200... -[2025-08-22 19:29:08,892][19241] Num frames 10300... -[2025-08-22 19:29:09,083][19241] Num frames 10400... -[2025-08-22 19:29:09,293][19241] Num frames 10500... -[2025-08-22 19:29:09,459][19241] Num frames 10600... -[2025-08-22 19:29:09,646][19241] Num frames 10700... -[2025-08-22 19:29:09,826][19241] Num frames 10800... -[2025-08-22 19:29:09,993][19241] Num frames 10900... -[2025-08-22 19:29:10,195][19241] Num frames 11000... -[2025-08-22 19:29:10,363][19241] Num frames 11100... -[2025-08-22 19:29:10,555][19241] Num frames 11200... -[2025-08-22 19:29:10,730][19241] Num frames 11300... -[2025-08-22 19:29:10,885][19241] Num frames 11400... -[2025-08-22 19:29:11,045][19241] Num frames 11500... -[2025-08-22 19:29:11,204][19241] Num frames 11600... -[2025-08-22 19:29:11,376][19241] Avg episode rewards: #0: 59.297, true rewards: #0: 19.465 -[2025-08-22 19:29:11,377][19241] Avg episode reward: 59.297, avg true_objective: 19.465 -[2025-08-22 19:29:11,413][19241] Num frames 11700... -[2025-08-22 19:29:11,634][19241] Num frames 11800... -[2025-08-22 19:29:11,780][19241] Num frames 11900... -[2025-08-22 19:29:11,927][19241] Num frames 12000... -[2025-08-22 19:29:12,074][19241] Num frames 12100... -[2025-08-22 19:29:12,226][19241] Num frames 12200... -[2025-08-22 19:29:12,405][19241] Num frames 12300... -[2025-08-22 19:29:12,557][19241] Num frames 12400... -[2025-08-22 19:29:12,702][19241] Num frames 12500... -[2025-08-22 19:29:12,840][19241] Num frames 12600... -[2025-08-22 19:29:12,983][19241] Num frames 12700... -[2025-08-22 19:29:13,125][19241] Num frames 12800... -[2025-08-22 19:29:13,278][19241] Num frames 12900... -[2025-08-22 19:29:13,496][19241] Num frames 13000... -[2025-08-22 19:29:13,701][19241] Num frames 13100... -[2025-08-22 19:29:13,853][19241] Num frames 13200... -[2025-08-22 19:29:13,928][19241] Avg episode rewards: #0: 56.449, true rewards: #0: 18.879 -[2025-08-22 19:29:13,931][19241] Avg episode reward: 56.449, avg true_objective: 18.879 -[2025-08-22 19:29:14,063][19241] Num frames 13300... -[2025-08-22 19:29:14,230][19241] Num frames 13400... -[2025-08-22 19:29:14,371][19241] Num frames 13500... -[2025-08-22 19:29:14,514][19241] Num frames 13600... -[2025-08-22 19:29:14,655][19241] Num frames 13700... -[2025-08-22 19:29:14,792][19241] Num frames 13800... -[2025-08-22 19:29:14,961][19241] Num frames 13900... -[2025-08-22 19:29:15,106][19241] Num frames 14000... -[2025-08-22 19:29:15,276][19241] Num frames 14100... -[2025-08-22 19:29:15,511][19241] Num frames 14200... -[2025-08-22 19:29:15,728][19241] Num frames 14300... -[2025-08-22 19:29:15,866][19241] Num frames 14400... -[2025-08-22 19:29:16,023][19241] Num frames 14500... -[2025-08-22 19:29:16,182][19241] Num frames 14600... -[2025-08-22 19:29:16,335][19241] Num frames 14700... -[2025-08-22 19:29:16,480][19241] Num frames 14800... -[2025-08-22 19:29:16,624][19241] Num frames 14900... -[2025-08-22 19:29:16,783][19241] Num frames 15000... 
-[2025-08-22 19:29:16,930][19241] Num frames 15100... -[2025-08-22 19:29:17,083][19241] Num frames 15200... -[2025-08-22 19:29:17,242][19241] Num frames 15300... -[2025-08-22 19:29:17,322][19241] Avg episode rewards: #0: 56.768, true rewards: #0: 19.144 -[2025-08-22 19:29:17,323][19241] Avg episode reward: 56.768, avg true_objective: 19.144 -[2025-08-22 19:29:17,498][19241] Num frames 15400... -[2025-08-22 19:29:17,721][19241] Num frames 15500... -[2025-08-22 19:29:17,885][19241] Num frames 15600... -[2025-08-22 19:29:18,052][19241] Num frames 15700... -[2025-08-22 19:29:18,212][19241] Num frames 15800... -[2025-08-22 19:29:18,408][19241] Num frames 15900... -[2025-08-22 19:29:18,570][19241] Num frames 16000... -[2025-08-22 19:29:18,849][19241] Num frames 16100... -[2025-08-22 19:29:19,101][19241] Num frames 16200... -[2025-08-22 19:29:19,363][19241] Num frames 16300... -[2025-08-22 19:29:19,622][19241] Num frames 16400... -[2025-08-22 19:29:22,775][19241] Num frames 16500... -[2025-08-22 19:29:22,977][19241] Num frames 16600... -[2025-08-22 19:29:23,156][19241] Num frames 16700... -[2025-08-22 19:29:23,364][19241] Num frames 16800... -[2025-08-22 19:29:23,561][19241] Num frames 16900... -[2025-08-22 19:29:23,789][19241] Num frames 17000... -[2025-08-22 19:29:23,977][19241] Num frames 17100... -[2025-08-22 19:29:24,169][19241] Num frames 17200... -[2025-08-22 19:29:24,391][19241] Num frames 17300... -[2025-08-22 19:29:24,590][19241] Num frames 17400... -[2025-08-22 19:29:24,680][19241] Avg episode rewards: #0: 57.238, true rewards: #0: 19.350 -[2025-08-22 19:29:24,682][19241] Avg episode reward: 57.238, avg true_objective: 19.350 -[2025-08-22 19:29:24,847][19241] Num frames 17500... -[2025-08-22 19:29:25,018][19241] Num frames 17600... -[2025-08-22 19:29:25,189][19241] Num frames 17700... -[2025-08-22 19:29:25,405][19241] Num frames 17800... -[2025-08-22 19:29:25,593][19241] Num frames 17900... -[2025-08-22 19:29:25,803][19241] Num frames 18000... -[2025-08-22 19:29:25,989][19241] Num frames 18100... -[2025-08-22 19:29:26,247][19241] Num frames 18200... -[2025-08-22 19:29:26,507][19241] Num frames 18300... -[2025-08-22 19:29:26,708][19241] Num frames 18400... -[2025-08-22 19:29:26,915][19241] Num frames 18500... -[2025-08-22 19:29:27,110][19241] Num frames 18600... -[2025-08-22 19:29:27,333][19241] Num frames 18700... -[2025-08-22 19:29:27,554][19241] Num frames 18800... -[2025-08-22 19:29:27,778][19241] Num frames 18900... -[2025-08-22 19:29:27,990][19241] Num frames 19000... -[2025-08-22 19:29:28,187][19241] Num frames 19100... -[2025-08-22 19:29:28,417][19241] Num frames 19200... -[2025-08-22 19:29:28,610][19241] Num frames 19300... -[2025-08-22 19:29:28,798][19241] Num frames 19400... -[2025-08-22 19:29:29,010][19241] Num frames 19500... -[2025-08-22 19:29:29,098][19241] Avg episode rewards: #0: 57.014, true rewards: #0: 19.515 -[2025-08-22 19:29:29,099][19241] Avg episode reward: 57.014, avg true_objective: 19.515 -[2025-08-22 19:30:01,772][19241] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! -[2025-08-29 18:22:49,095][15827] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... 
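The 22 Aug session above ends its evaluation at "Avg episode rewards: 57.014", writes the replay video, and the log then jumps to a fresh 29 Aug session. Each of those "Avg episode rewards" lines is a running mean over every episode evaluated so far (67.499 → 66.666 → … → 57.014), with "true rewards" tracking the unshaped objective the same way. A minimal sketch of that bookkeeping, with illustrative names rather than Sample Factory's actual enjoy-script internals:

```python
# Running-mean bookkeeping behind the "Avg episode rewards" lines above.
# Illustrative sketch only; not Sample Factory's actual evaluation code.
episode_rewards: list[float] = []
true_objectives: list[float] = []

def record_episode(reward: float, true_objective: float) -> None:
    episode_rewards.append(reward)
    true_objectives.append(true_objective)
    avg_r = sum(episode_rewards) / len(episode_rewards)
    avg_t = sum(true_objectives) / len(true_objectives)
    # Mirrors the log format: "Avg episode rewards: #0: 67.499, true rewards: #0: 21.000"
    print(f"Avg episode rewards: #0: {avg_r:.3f}, true rewards: #0: {avg_t:.3f}")
```

This explains why the printed average drifts downward as weaker episodes arrive: a single 20.028 "true reward" episode after a run of 21.000s pulls the mean below 21.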
-[2025-08-29 18:22:49,213][15827] Rollout worker 0 uses device cpu -[2025-08-29 18:22:49,215][15827] Rollout worker 1 uses device cpu -[2025-08-29 18:22:49,216][15827] Rollout worker 2 uses device cpu -[2025-08-29 18:22:49,216][15827] Rollout worker 3 uses device cpu -[2025-08-29 18:22:49,217][15827] Rollout worker 4 uses device cpu -[2025-08-29 18:22:49,218][15827] Rollout worker 5 uses device cpu -[2025-08-29 18:22:49,219][15827] Rollout worker 6 uses device cpu -[2025-08-29 18:22:49,220][15827] Rollout worker 7 uses device cpu -[2025-08-29 18:22:49,221][15827] Rollout worker 8 uses device cpu -[2025-08-29 18:22:49,222][15827] Rollout worker 9 uses device cpu -[2025-08-29 18:22:49,222][15827] Rollout worker 10 uses device cpu -[2025-08-29 18:22:49,224][15827] Rollout worker 11 uses device cpu -[2025-08-29 18:22:49,874][15827] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:22:49,876][15827] InferenceWorker_p0-w0: min num requests: 4 -[2025-08-29 18:22:49,950][15827] Starting all processes... -[2025-08-29 18:22:49,952][15827] Starting process learner_proc0 -[2025-08-29 18:22:49,998][15827] Starting all processes... -[2025-08-29 18:22:50,005][15827] Starting process inference_proc0-0 -[2025-08-29 18:22:50,005][15827] Starting process rollout_proc0 -[2025-08-29 18:22:50,006][15827] Starting process rollout_proc1 -[2025-08-29 18:22:50,006][15827] Starting process rollout_proc2 -[2025-08-29 18:22:50,008][15827] Starting process rollout_proc3 -[2025-08-29 18:22:50,008][15827] Starting process rollout_proc4 -[2025-08-29 18:22:50,008][15827] Starting process rollout_proc5 -[2025-08-29 18:22:50,009][15827] Starting process rollout_proc6 -[2025-08-29 18:22:50,017][15827] Starting process rollout_proc7 -[2025-08-29 18:22:50,018][15827] Starting process rollout_proc8 -[2025-08-29 18:22:50,020][15827] Starting process rollout_proc9 -[2025-08-29 18:22:50,021][15827] Starting process rollout_proc10 -[2025-08-29 18:22:50,025][15827] Starting process rollout_proc11 -[2025-08-29 18:23:00,344][17355] Worker 1 uses CPU cores [1] -[2025-08-29 18:23:00,345][17359] Worker 5 uses CPU cores [5] -[2025-08-29 18:23:00,347][17354] Worker 0 uses CPU cores [0] -[2025-08-29 18:23:00,347][17353] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:23:00,347][17353] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-08-29 18:23:00,347][17358] Worker 4 uses CPU cores [4] -[2025-08-29 18:23:00,351][17360] Worker 6 uses CPU cores [6] -[2025-08-29 18:23:00,352][17373] Worker 10 uses CPU cores [0, 1, 2, 3, 4] -[2025-08-29 18:23:00,352][17336] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:23:00,353][17336] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-08-29 18:23:00,353][17361] Worker 8 uses CPU cores [8] -[2025-08-29 18:23:00,354][17372] Worker 9 uses CPU cores [9] -[2025-08-29 18:23:00,354][17357] Worker 3 uses CPU cores [3] -[2025-08-29 18:23:00,355][17374] Worker 11 uses CPU cores [5, 6, 7, 8, 9] -[2025-08-29 18:23:00,358][17356] Worker 2 uses CPU cores [2] -[2025-08-29 18:23:00,363][17362] Worker 7 uses CPU cores [7] -[2025-08-29 18:23:00,571][17336] Num visible devices: 1 -[2025-08-29 18:23:00,572][17336] Starting seed is not provided -[2025-08-29 18:23:00,572][17336] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:23:00,572][17336] Initializing actor-critic model on device cuda:0 -[2025-08-29 18:23:00,573][17336] RunningMeanStd 
input shape: (3, 72, 128) -[2025-08-29 18:23:00,575][17353] Num visible devices: 1 -[2025-08-29 18:23:00,588][17336] RunningMeanStd input shape: (1,) -[2025-08-29 18:23:00,608][17336] ConvEncoder: input_channels=3 -[2025-08-29 18:23:01,046][17336] Conv encoder output size: 512 -[2025-08-29 18:23:01,047][17336] Policy head output size: 512 -[2025-08-29 18:23:01,127][17336] Created Actor Critic model with architecture: -[2025-08-29 18:23:01,127][17336] ActorCriticSharedWeights( - (obs_normalizer): ObservationNormalizer( - (running_mean_std): RunningMeanStdDictInPlace( - (running_mean_std): ModuleDict( - (obs): RunningMeanStdInPlace() - ) - ) - ) - (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) - (encoder): VizdoomEncoder( - (basic_encoder): ConvEncoder( - (enc): RecursiveScriptModule( - original_name=ConvEncoderImpl - (conv_head): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Conv2d) - (1): RecursiveScriptModule(original_name=ELU) - (2): RecursiveScriptModule(original_name=Conv2d) - (3): RecursiveScriptModule(original_name=ELU) - (4): RecursiveScriptModule(original_name=Conv2d) - (5): RecursiveScriptModule(original_name=ELU) - ) - (mlp_layers): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Linear) - (1): RecursiveScriptModule(original_name=ELU) - ) - ) - ) - ) - (core): ModelCoreRNN( - (core): GRU(512, 512) - ) - (decoder): MlpDecoder( - (mlp): Identity() - ) - (critic_linear): Linear(in_features=512, out_features=1, bias=True) - (action_parameterization): ActionParameterizationDefault( - (distribution_linear): Linear(in_features=512, out_features=5, bias=True) - ) -) -[2025-08-29 18:23:02,276][17336] Using optimizer -[2025-08-29 18:23:05,013][17336] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:23:05,024][17336] Could not load from checkpoint, attempt 0 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. - WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. 
- -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-29 18:23:05,028][17336] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:23:05,030][17336] Could not load from checkpoint, attempt 1 [traceback and UnpicklingError identical to attempt 0, omitted] -[2025-08-29 18:23:05,030][17336] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:23:05,031][17336] Could not load from checkpoint, attempt 2 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-29 18:23:05,032][17336] Did not load from checkpoint, starting from scratch! -[2025-08-29 18:23:05,032][17336] Initialized policy 0 weights for model version 0 -[2025-08-29 18:23:05,043][17336] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:23:05,043][17336] LearnerWorker_p0 finished initialization! -[2025-08-29 18:23:05,608][17353] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 18:23:05,613][17353] RunningMeanStd input shape: (1,) -[2025-08-29 18:23:05,642][17353] ConvEncoder: input_channels=3 -[2025-08-29 18:23:05,849][17353] Conv encoder output size: 512 -[2025-08-29 18:23:05,850][17353] Policy head output size: 512 -[2025-08-29 18:23:05,990][15827] Inference worker 0-0 is ready! -[2025-08-29 18:23:05,995][15827] All inference workers are ready! Signal rollout workers to start! -[2025-08-29 18:23:06,203][17356] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,208][17360] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,215][17355] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,223][17358] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,246][17359] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,267][17357] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,293][17372] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,319][17373] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,331][17361] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,357][17362] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,362][17354] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,386][17374] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:23:06,859][17359] Decorrelating experience for 0 frames... -[2025-08-29 18:23:06,859][17358] Decorrelating experience for 0 frames... -[2025-08-29 18:23:06,859][17357] Decorrelating experience for 0 frames... -[2025-08-29 18:23:06,859][17355] Decorrelating experience for 0 frames... -[2025-08-29 18:23:06,859][17354] Decorrelating experience for 0 frames... -[2025-08-29 18:23:06,859][17360] Decorrelating experience for 0 frames... -[2025-08-29 18:23:06,861][17373] Decorrelating experience for 0 frames... -[2025-08-29 18:23:06,864][17356] Decorrelating experience for 0 frames... -[2025-08-29 18:23:10,601][15827] Heartbeat connected on LearnerWorker_p0 -[2025-08-29 18:23:10,604][15827] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:10,606][15827] Heartbeat connected on Batcher_0 -[2025-08-29 18:23:10,629][15827] Heartbeat connected on InferenceWorker_p0-w0 -[2025-08-29 18:23:10,679][17361] Decorrelating experience for 0 frames... 
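Stepping back to the checkpoint failure above: attempts 0 through 2 all raise the same `_pickle.UnpicklingError`, so the learner gives up and reinitializes ("Did not load from checkpoint, starting from scratch!"). The error text itself names two remedies; here is a minimal sketch of both, assuming the checkpoint is trusted (it was produced locally by this same experiment). The path is the one from the log; nothing else here is Sample Factory API.

```python
import numpy as np
import torch

checkpoint_path = (
    "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
    "default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
)

# Option (2) from the message: allowlist the single global the safe unpickler
# rejected, keeping weights_only=True semantics for everything else.
torch.serialization.add_safe_globals([np.core.multiarray.scalar])
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# Option (1): fall back to full unpickling. Only for trusted files, since
# arbitrary code can execute during unpickling.
checkpoint = torch.load(checkpoint_path, map_location="cpu", weights_only=False)
```

Note that `add_safe_globals` mutates a registry in the calling process only; since Sample Factory spawns the learner as a separate process, the registration would have to happen in code the learner process itself executes (or `learner.py` would need `weights_only=False` in its `torch.load` call), otherwise the three-attempt retry loop fails exactly as logged.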
-[2025-08-29 18:23:10,694][17359] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,695][17357] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,695][17354] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,696][17362] Decorrelating experience for 0 frames... -[2025-08-29 18:23:10,730][17372] Decorrelating experience for 0 frames... -[2025-08-29 18:23:10,742][17355] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,935][17362] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,940][17358] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,971][17357] Decorrelating experience for 128 frames... -[2025-08-29 18:23:10,972][17354] Decorrelating experience for 128 frames... -[2025-08-29 18:23:10,973][17372] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,977][17356] Decorrelating experience for 64 frames... -[2025-08-29 18:23:10,983][17374] Decorrelating experience for 0 frames... -[2025-08-29 18:23:11,001][17373] Decorrelating experience for 64 frames... -[2025-08-29 18:23:11,215][17362] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,273][17374] Decorrelating experience for 64 frames... -[2025-08-29 18:23:11,290][17372] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,292][17356] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,307][17360] Decorrelating experience for 64 frames... -[2025-08-29 18:23:11,316][17358] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,373][17373] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,555][17355] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,558][17359] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,596][17362] Decorrelating experience for 192 frames... -[2025-08-29 18:23:11,626][17361] Decorrelating experience for 64 frames... -[2025-08-29 18:23:11,628][17374] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,653][17354] Decorrelating experience for 192 frames... -[2025-08-29 18:23:11,676][17358] Decorrelating experience for 192 frames... -[2025-08-29 18:23:11,855][17360] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,902][17355] Decorrelating experience for 192 frames... -[2025-08-29 18:23:11,911][17359] Decorrelating experience for 192 frames... -[2025-08-29 18:23:11,917][17361] Decorrelating experience for 128 frames... -[2025-08-29 18:23:11,918][17373] Decorrelating experience for 192 frames... -[2025-08-29 18:23:11,968][17356] Decorrelating experience for 192 frames... -[2025-08-29 18:23:12,030][17357] Decorrelating experience for 192 frames... -[2025-08-29 18:23:12,258][17374] Decorrelating experience for 192 frames... -[2025-08-29 18:23:12,266][17358] Decorrelating experience for 256 frames... -[2025-08-29 18:23:12,768][17354] Decorrelating experience for 256 frames... -[2025-08-29 18:23:12,770][17360] Decorrelating experience for 192 frames... -[2025-08-29 18:23:12,770][17372] Decorrelating experience for 192 frames... -[2025-08-29 18:23:12,928][17361] Decorrelating experience for 192 frames... -[2025-08-29 18:23:12,927][17357] Decorrelating experience for 256 frames... -[2025-08-29 18:23:12,933][17374] Decorrelating experience for 256 frames... -[2025-08-29 18:23:12,984][17358] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,000][17355] Decorrelating experience for 256 frames... -[2025-08-29 18:23:13,067][17362] Decorrelating experience for 256 frames... 
-[2025-08-29 18:23:13,255][17373] Decorrelating experience for 256 frames... -[2025-08-29 18:23:13,279][17354] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,308][17360] Decorrelating experience for 256 frames... -[2025-08-29 18:23:13,310][17356] Decorrelating experience for 256 frames... -[2025-08-29 18:23:13,345][17374] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,435][17358] Decorrelating experience for 384 frames... -[2025-08-29 18:23:13,462][17355] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,494][17362] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,608][17361] Decorrelating experience for 256 frames... -[2025-08-29 18:23:13,662][17372] Decorrelating experience for 256 frames... -[2025-08-29 18:23:13,758][17373] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,791][17356] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,804][17360] Decorrelating experience for 320 frames... -[2025-08-29 18:23:13,929][17374] Decorrelating experience for 384 frames... -[2025-08-29 18:23:14,027][17358] Decorrelating experience for 448 frames... -[2025-08-29 18:23:14,032][17357] Decorrelating experience for 320 frames... -[2025-08-29 18:23:14,087][17362] Decorrelating experience for 384 frames... -[2025-08-29 18:23:14,352][17355] Decorrelating experience for 384 frames... -[2025-08-29 18:23:14,394][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:14,709][17360] Decorrelating experience for 384 frames... -[2025-08-29 18:23:14,714][17359] Decorrelating experience for 256 frames... -[2025-08-29 18:23:14,760][15827] Heartbeat connected on RolloutWorker_w4 -[2025-08-29 18:23:14,790][17356] Decorrelating experience for 384 frames... -[2025-08-29 18:23:14,830][17354] Decorrelating experience for 384 frames... -[2025-08-29 18:23:15,080][17362] Decorrelating experience for 448 frames... -[2025-08-29 18:23:15,099][17372] Decorrelating experience for 320 frames... -[2025-08-29 18:23:15,125][17357] Decorrelating experience for 384 frames... -[2025-08-29 18:23:15,171][17373] Decorrelating experience for 384 frames... -[2025-08-29 18:23:15,223][17361] Decorrelating experience for 320 frames... -[2025-08-29 18:23:15,265][17355] Decorrelating experience for 448 frames... -[2025-08-29 18:23:15,337][17374] Decorrelating experience for 448 frames... -[2025-08-29 18:23:15,376][15827] Heartbeat connected on RolloutWorker_w7 -[2025-08-29 18:23:15,495][17354] Decorrelating experience for 448 frames... -[2025-08-29 18:23:15,523][17359] Decorrelating experience for 320 frames... -[2025-08-29 18:23:15,580][15827] Heartbeat connected on RolloutWorker_w1 -[2025-08-29 18:23:15,608][15827] Heartbeat connected on RolloutWorker_w11 -[2025-08-29 18:23:15,651][17372] Decorrelating experience for 384 frames... -[2025-08-29 18:23:15,663][17357] Decorrelating experience for 448 frames... -[2025-08-29 18:23:15,687][17356] Decorrelating experience for 448 frames... -[2025-08-29 18:23:15,754][15827] Heartbeat connected on RolloutWorker_w0 -[2025-08-29 18:23:15,799][17373] Decorrelating experience for 448 frames... -[2025-08-29 18:23:15,847][17361] Decorrelating experience for 384 frames... -[2025-08-29 18:23:15,908][15827] Heartbeat connected on RolloutWorker_w3 -[2025-08-29 18:23:15,966][15827] Heartbeat connected on RolloutWorker_w2 -[2025-08-29 18:23:16,039][17359] Decorrelating experience for 384 frames... 
-[2025-08-29 18:23:16,247][15827] Heartbeat connected on RolloutWorker_w10 -[2025-08-29 18:23:16,277][17372] Decorrelating experience for 448 frames... -[2025-08-29 18:23:16,277][17360] Decorrelating experience for 448 frames... -[2025-08-29 18:23:16,435][15827] Heartbeat connected on RolloutWorker_w9 -[2025-08-29 18:23:16,437][15827] Heartbeat connected on RolloutWorker_w6 -[2025-08-29 18:23:16,707][17359] Decorrelating experience for 448 frames... -[2025-08-29 18:23:16,884][15827] Heartbeat connected on RolloutWorker_w5 -[2025-08-29 18:23:17,328][17361] Decorrelating experience for 448 frames... -[2025-08-29 18:23:19,406][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:24,424][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:29,412][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:31,107][15827] Heartbeat connected on RolloutWorker_w8 -[2025-08-29 18:23:34,399][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:39,460][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:46,448][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:49,426][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:54,407][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:23:59,609][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:04,419][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:09,507][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:14,460][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:22,313][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:24,455][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:29,508][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:34,819][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:39,608][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:44,726][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:49,472][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:24:50,386][17336] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... -[2025-08-29 18:24:54,474][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:25:00,020][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:25:04,543][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:25:08,113][17336] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth -[2025-08-29 18:25:09,465][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:25:12,505][15827] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 15827], exiting... -[2025-08-29 18:25:12,533][17336] Stopping Batcher_0... -[2025-08-29 18:25:12,533][17336] Loop batcher_evt_loop terminating... -[2025-08-29 18:25:12,531][15827] Runner profile tree view: -main_loop: 142.5811 -[2025-08-29 18:25:12,539][15827] Collected {0: 0}, FPS: 0.0 -[2025-08-29 18:25:12,594][17336] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... -[2025-08-29 18:25:12,752][17360] Stopping RolloutWorker_w6... -[2025-08-29 18:25:12,758][17360] Loop rollout_proc6_evt_loop terminating... -[2025-08-29 18:25:12,756][17359] Stopping RolloutWorker_w5... -[2025-08-29 18:25:12,760][17359] Loop rollout_proc5_evt_loop terminating... -[2025-08-29 18:25:12,766][17374] Stopping RolloutWorker_w11... -[2025-08-29 18:25:12,768][17374] Loop rollout_proc11_evt_loop terminating... -[2025-08-29 18:25:12,790][17362] Stopping RolloutWorker_w7... -[2025-08-29 18:25:12,792][17362] Loop rollout_proc7_evt_loop terminating... -[2025-08-29 18:25:12,809][17336] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth -[2025-08-29 18:25:12,814][17336] Stopping LearnerWorker_p0... -[2025-08-29 18:25:12,816][17336] Loop learner_proc0_evt_loop terminating... -[2025-08-29 18:25:12,821][17356] Stopping RolloutWorker_w2... -[2025-08-29 18:25:12,820][17372] Stopping RolloutWorker_w9... -[2025-08-29 18:25:12,822][17356] Loop rollout_proc2_evt_loop terminating... -[2025-08-29 18:25:12,822][17372] Loop rollout_proc9_evt_loop terminating... -[2025-08-29 18:25:12,823][17357] Stopping RolloutWorker_w3... 
-[2025-08-29 18:25:12,827][17357] Loop rollout_proc3_evt_loop terminating... -[2025-08-29 18:25:12,832][17354] Stopping RolloutWorker_w0... -[2025-08-29 18:25:12,834][17354] Loop rollout_proc0_evt_loop terminating... -[2025-08-29 18:25:12,836][17361] Stopping RolloutWorker_w8... -[2025-08-29 18:25:12,829][17373] Stopping RolloutWorker_w10... -[2025-08-29 18:25:12,837][17373] Loop rollout_proc10_evt_loop terminating... -[2025-08-29 18:25:12,836][17361] Loop rollout_proc8_evt_loop terminating... -[2025-08-29 18:25:12,836][17358] Stopping RolloutWorker_w4... -[2025-08-29 18:25:12,840][17358] Loop rollout_proc4_evt_loop terminating... -[2025-08-29 18:25:13,013][17355] Stopping RolloutWorker_w1... -[2025-08-29 18:25:13,015][17355] Loop rollout_proc1_evt_loop terminating... -[2025-08-29 18:25:14,529][17353] Weights refcount: 2 0 -[2025-08-29 18:25:14,540][17353] Stopping InferenceWorker_p0-w0... -[2025-08-29 18:25:14,541][17353] Loop inference_proc0-0_evt_loop terminating... -[2025-08-29 18:27:07,198][15827] Environment doom_basic already registered, overwriting... -[2025-08-29 18:27:07,204][15827] Environment doom_two_colors_easy already registered, overwriting... -[2025-08-29 18:27:07,207][15827] Environment doom_two_colors_hard already registered, overwriting... -[2025-08-29 18:27:07,210][15827] Environment doom_dm already registered, overwriting... -[2025-08-29 18:27:07,213][15827] Environment doom_dwango5 already registered, overwriting... -[2025-08-29 18:27:07,215][15827] Environment doom_my_way_home_flat_actions already registered, overwriting... -[2025-08-29 18:27:07,220][15827] Environment doom_defend_the_center_flat_actions already registered, overwriting... -[2025-08-29 18:27:07,223][15827] Environment doom_my_way_home already registered, overwriting... -[2025-08-29 18:27:07,224][15827] Environment doom_deadly_corridor already registered, overwriting... -[2025-08-29 18:27:07,224][15827] Environment doom_defend_the_center already registered, overwriting... -[2025-08-29 18:27:07,225][15827] Environment doom_defend_the_line already registered, overwriting... -[2025-08-29 18:27:07,226][15827] Environment doom_health_gathering already registered, overwriting... -[2025-08-29 18:27:07,227][15827] Environment doom_health_gathering_supreme already registered, overwriting... -[2025-08-29 18:27:07,228][15827] Environment doom_battle already registered, overwriting... -[2025-08-29 18:27:07,230][15827] Environment doom_battle2 already registered, overwriting... -[2025-08-29 18:27:07,233][15827] Environment doom_duel_bots already registered, overwriting... -[2025-08-29 18:27:07,235][15827] Environment doom_deathmatch_bots already registered, overwriting... -[2025-08-29 18:27:07,237][15827] Environment doom_duel already registered, overwriting... -[2025-08-29 18:27:07,238][15827] Environment doom_deathmatch_full already registered, overwriting... -[2025-08-29 18:27:07,241][15827] Environment doom_benchmark already registered, overwriting... 
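The block of "already registered, overwriting" lines above is what an overwrite-tolerant environment registry prints when the notebook re-runs registration inside the same long-lived process. A toy sketch of that pattern, with illustrative names rather than Sample Factory's actual registry:

```python
from typing import Callable

# Hypothetical registry mapping env names to factory functions.
ENV_REGISTRY: dict[str, Callable] = {}

def register_env(name: str, make_env: Callable) -> None:
    # Re-registering in the same process is harmless: warn and overwrite,
    # which is exactly what the log records for doom_basic, doom_dm, etc.
    if name in ENV_REGISTRY:
        print(f"Environment {name} already registered, overwriting...")
    ENV_REGISTRY[name] = make_env
```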
-[2025-08-29 18:27:07,243][15827] register_encoder_factory: -[2025-08-29 18:27:07,298][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 18:27:07,299][15827] Overriding arg 'num_workers' with value 10 passed from command line -[2025-08-29 18:27:07,300][15827] Overriding arg 'num_envs_per_worker' with value 4 passed from command line -[2025-08-29 18:27:07,313][15827] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists! -[2025-08-29 18:27:07,314][15827] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment... -[2025-08-29 18:27:07,315][15827] Weights and Biases integration disabled -[2025-08-29 18:27:07,327][15827] Environment var CUDA_VISIBLE_DEVICES is 0 - -[2025-08-29 18:27:10,679][15827] Starting experiment with the following configuration: -help=False -algo=APPO -env=doom_health_gathering_supreme -experiment=default_experiment -train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir -restart_behavior=resume -device=gpu -seed=None -num_policies=1 -async_rl=True -serial_mode=False -batched_sampling=False -num_batches_to_accumulate=2 -worker_num_splits=2 -policy_workers_per_policy=1 -max_policy_lag=1000 -num_workers=10 -num_envs_per_worker=4 -batch_size=1024 -num_batches_per_epoch=1 -num_epochs=1 -rollout=64 -recurrence=32 -shuffle_minibatches=False -gamma=0.99 -reward_scale=1.0 -reward_clip=1000.0 -value_bootstrap=False -normalize_returns=True -exploration_loss_coeff=0.001 -value_loss_coeff=0.5 -kl_loss_coeff=0.0 -exploration_loss=symmetric_kl -gae_lambda=0.95 -ppo_clip_ratio=0.1 -ppo_clip_value=0.2 -with_vtrace=False -vtrace_rho=1.0 -vtrace_c=1.0 -optimizer=adam -adam_eps=1e-06 -adam_beta1=0.9 -adam_beta2=0.999 -max_grad_norm=4.0 -learning_rate=0.0001 -lr_schedule=constant -lr_schedule_kl_threshold=0.008 -lr_adaptive_min=1e-06 -lr_adaptive_max=0.01 -obs_subtract_mean=0.0 -obs_scale=255.0 -normalize_input=True -normalize_input_keys=None -decorrelate_experience_max_seconds=0 -decorrelate_envs_on_one_worker=True -actor_worker_gpus=[] -set_workers_cpu_affinity=True -force_envs_single_thread=False -default_niceness=0 -log_to_file=True -experiment_summaries_interval=10 -flush_summaries_interval=30 -stats_avg=100 -summaries_use_frameskip=True -heartbeat_interval=20 -heartbeat_reporting_interval=600 -train_for_env_steps=20000000 -train_for_seconds=10000000000 -save_every_sec=120 -keep_checkpoints=2 -load_checkpoint_kind=latest -save_milestones_sec=-1 -save_best_every_sec=5 -save_best_metric=reward -save_best_after=100000 -benchmark=False -encoder_mlp_layers=[512, 512] -encoder_conv_architecture=convnet_simple -encoder_conv_mlp_layers=[512] -use_rnn=True -rnn_size=512 -rnn_type=gru -rnn_num_layers=1 -decoder_mlp_layers=[] -nonlinearity=elu -policy_initialization=orthogonal -policy_init_gain=1.0 -actor_critic_share_weights=True -adaptive_stddev=True -continuous_tanh_scale=0.0 -initial_stddev=1.0 -use_env_info_cache=False -env_gpu_actions=False -env_gpu_observations=True -env_frameskip=4 -env_framestack=1 -pixel_format=CHW -use_record_episode_statistics=False -with_wandb=False -wandb_user=None -wandb_project=sample_factory -wandb_group=None -wandb_job_type=SF -wandb_tags=[] -with_pbt=False -pbt_mix_policies_in_one_env=True -pbt_period_env_steps=5000000 -pbt_start_mutation=20000000 -pbt_replace_fraction=0.3 -pbt_mutation_rate=0.15 
-pbt_replace_reward_gap=0.1 -pbt_replace_reward_gap_absolute=1e-06 -pbt_optimize_gamma=False -pbt_target_objective=true_objective -pbt_perturb_min=1.1 -pbt_perturb_max=1.5 -num_agents=-1 -num_humans=0 -num_bots=-1 -start_bot_difficulty=None -timelimit=None -res_w=128 -res_h=72 -wide_aspect_ratio=False -eval_env_frameskip=1 -fps=35 -command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 -cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} -git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042 -git_repo_name=git@github.com:huggingface/deep-rl-class.git -[2025-08-29 18:27:10,681][15827] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... -[2025-08-29 18:27:10,802][15827] Rollout worker 0 uses device cpu -[2025-08-29 18:27:10,804][15827] Rollout worker 1 uses device cpu -[2025-08-29 18:27:10,806][15827] Rollout worker 2 uses device cpu -[2025-08-29 18:27:10,807][15827] Rollout worker 3 uses device cpu -[2025-08-29 18:27:10,809][15827] Rollout worker 4 uses device cpu -[2025-08-29 18:27:10,811][15827] Rollout worker 5 uses device cpu -[2025-08-29 18:27:10,813][15827] Rollout worker 6 uses device cpu -[2025-08-29 18:27:10,816][15827] Rollout worker 7 uses device cpu -[2025-08-29 18:27:10,818][15827] Rollout worker 8 uses device cpu -[2025-08-29 18:27:10,820][15827] Rollout worker 9 uses device cpu -[2025-08-29 18:27:10,906][15827] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:27:10,907][15827] InferenceWorker_p0-w0: min num requests: 3 -[2025-08-29 18:27:10,943][15827] Starting all processes... -[2025-08-29 18:27:10,944][15827] Starting process learner_proc0 -[2025-08-29 18:27:10,993][15827] Starting all processes... 
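While the processes spin up, note that the "Starting experiment with the following configuration" dump above is also persisted to config.json (the "Saving configuration" lines), which is what makes "Resuming existing experiment" and the command-line overrides (num_workers 8 → 10) possible. A quick way to inspect what a resumed run will actually use, assuming the saved file is plain JSON as its name suggests:

```python
import json
from pathlib import Path

# Path taken from the log lines above.
cfg = json.loads(Path(
    "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/"
    "train_dir/default_experiment/config.json"
).read_text())

# Keys chosen from the configuration dump above.
for key in ("num_workers", "num_envs_per_worker", "batch_size", "rollout",
            "train_for_env_steps", "restart_behavior"):
    print(f"{key} = {cfg[key]}")
```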
-[2025-08-29 18:27:11,001][15827] Starting process inference_proc0-0 -[2025-08-29 18:27:11,003][15827] Starting process rollout_proc0 -[2025-08-29 18:27:11,003][15827] Starting process rollout_proc1 -[2025-08-29 18:27:11,004][15827] Starting process rollout_proc2 -[2025-08-29 18:27:11,004][15827] Starting process rollout_proc3 -[2025-08-29 18:27:11,005][15827] Starting process rollout_proc4 -[2025-08-29 18:27:11,005][15827] Starting process rollout_proc5 -[2025-08-29 18:27:11,005][15827] Starting process rollout_proc6 -[2025-08-29 18:27:11,005][15827] Starting process rollout_proc7 -[2025-08-29 18:27:11,006][15827] Starting process rollout_proc8 -[2025-08-29 18:27:11,006][15827] Starting process rollout_proc9 -[2025-08-29 18:27:15,683][18823] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:27:15,683][18823] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-08-29 18:27:15,978][18823] Num visible devices: 1 -[2025-08-29 18:27:15,999][18823] Starting seed is not provided -[2025-08-29 18:27:15,999][18823] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:27:15,999][18823] Initializing actor-critic model on device cuda:0 -[2025-08-29 18:27:16,000][18823] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 18:27:16,013][18823] RunningMeanStd input shape: (1,) -[2025-08-29 18:27:16,062][18823] ConvEncoder: input_channels=3 -[2025-08-29 18:27:16,128][18843] Worker 4 uses CPU cores [4] -[2025-08-29 18:27:16,129][18842] Worker 2 uses CPU cores [2] -[2025-08-29 18:27:16,142][18838] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:27:16,143][18838] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-08-29 18:27:16,152][18841] Worker 3 uses CPU cores [3] -[2025-08-29 18:27:16,183][18845] Worker 6 uses CPU cores [6] -[2025-08-29 18:27:16,196][18839] Worker 0 uses CPU cores [0] -[2025-08-29 18:27:16,197][18844] Worker 5 uses CPU cores [5] -[2025-08-29 18:27:16,226][18840] Worker 1 uses CPU cores [1] -[2025-08-29 18:27:16,244][18855] Worker 9 uses CPU cores [9] -[2025-08-29 18:27:16,271][18838] Num visible devices: 1 -[2025-08-29 18:27:16,408][18823] Conv encoder output size: 512 -[2025-08-29 18:27:16,408][18823] Policy head output size: 512 -[2025-08-29 18:27:16,455][18857] Worker 8 uses CPU cores [8] -[2025-08-29 18:27:16,472][18823] Created Actor Critic model with architecture: -[2025-08-29 18:27:16,472][18823] ActorCriticSharedWeights( - (obs_normalizer): ObservationNormalizer( - (running_mean_std): RunningMeanStdDictInPlace( - (running_mean_std): ModuleDict( - (obs): RunningMeanStdInPlace() - ) - ) - ) - (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) - (encoder): VizdoomEncoder( - (basic_encoder): ConvEncoder( - (enc): RecursiveScriptModule( - original_name=ConvEncoderImpl - (conv_head): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Conv2d) - (1): RecursiveScriptModule(original_name=ELU) - (2): RecursiveScriptModule(original_name=Conv2d) - (3): RecursiveScriptModule(original_name=ELU) - (4): RecursiveScriptModule(original_name=Conv2d) - (5): RecursiveScriptModule(original_name=ELU) - ) - (mlp_layers): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Linear) - (1): RecursiveScriptModule(original_name=ELU) - ) - ) - ) - ) - (core): ModelCoreRNN( - (core): GRU(512, 512) - ) - (decoder): MlpDecoder( - (mlp): Identity() 
- ) - (critic_linear): Linear(in_features=512, out_features=1, bias=True) - (action_parameterization): ActionParameterizationDefault( - (distribution_linear): Linear(in_features=512, out_features=5, bias=True) - ) -) -[2025-08-29 18:27:16,566][18856] Worker 7 uses CPU cores [7] -[2025-08-29 18:27:17,682][18823] Using optimizer -[2025-08-29 18:27:23,850][18823] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:27:23,856][18823] Could not load from checkpoint, attempt 0 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. - WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-29 18:27:23,858][18823] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:27:23,859][18823] Could not load from checkpoint, attempt 1 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. 
Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. - WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-29 18:27:23,859][18823] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:27:23,860][18823] Could not load from checkpoint, attempt 2 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. - WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-29 18:27:23,860][18823] Did not load from checkpoint, starting from scratch! -[2025-08-29 18:27:23,860][18823] Initialized policy 0 weights for model version 0 -[2025-08-29 18:27:23,866][18823] LearnerWorker_p0 finished initialization! -[2025-08-29 18:27:23,867][18823] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:27:24,260][18838] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 18:27:24,263][18838] RunningMeanStd input shape: (1,) -[2025-08-29 18:27:24,274][18838] ConvEncoder: input_channels=3 -[2025-08-29 18:27:24,389][18838] Conv encoder output size: 512 -[2025-08-29 18:27:24,390][18838] Policy head output size: 512 -[2025-08-29 18:27:24,492][15827] Inference worker 0-0 is ready! -[2025-08-29 18:27:24,495][15827] All inference workers are ready! Signal rollout workers to start! 
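As in the earlier session, the rollout workers now decorrelate before real collection begins: the "Decorrelating experience for N frames..." lines below count up in 64-frame increments (0, 64, …, 448) per worker, staggering the environments so their episodes do not stay in lock-step. A rough, illustrative sketch of such a warm-up, assuming a Gymnasium-style step API rather than Sample Factory's actual internals:

```python
# Illustrative only: step each env a worker-specific number of random-action
# frames so episode boundaries de-synchronize across workers.
def decorrelate_experience(env, num_frames: int, chunk: int = 64) -> None:
    env.reset()
    for frames_done in range(0, num_frames, chunk):
        # The log mirrors this cadence: "Decorrelating experience for 0 frames...",
        # then 64, 128, ... up to 448.
        print(f"Decorrelating experience for {frames_done} frames...")
        for _ in range(chunk):
            _, _, terminated, truncated, _ = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()
```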
-[2025-08-29 18:27:24,653][18840] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,654][18855] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,654][18841] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,654][18845] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,660][18839] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,662][18857] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,662][18842] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,661][18844] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,664][18843] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:24,666][18856] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:27:25,049][18844] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,049][18840] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,049][18843] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,049][18855] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,049][18842] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,049][18839] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,240][18840] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,241][18856] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,253][18855] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,257][18844] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,267][18842] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,271][18839] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,300][18845] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,431][18856] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,441][18841] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,480][18857] Decorrelating experience for 0 frames... -[2025-08-29 18:27:25,488][18845] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,542][18844] Decorrelating experience for 128 frames... -[2025-08-29 18:27:25,623][18843] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,641][18841] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,693][18840] Decorrelating experience for 128 frames... -[2025-08-29 18:27:25,729][18856] Decorrelating experience for 128 frames... -[2025-08-29 18:27:25,774][18857] Decorrelating experience for 64 frames... -[2025-08-29 18:27:25,831][18844] Decorrelating experience for 192 frames... -[2025-08-29 18:27:25,885][18845] Decorrelating experience for 128 frames... -[2025-08-29 18:27:25,949][18855] Decorrelating experience for 128 frames... -[2025-08-29 18:27:25,960][18843] Decorrelating experience for 128 frames... -[2025-08-29 18:27:26,073][18840] Decorrelating experience for 192 frames... -[2025-08-29 18:27:26,101][18841] Decorrelating experience for 128 frames... -[2025-08-29 18:27:26,173][18845] Decorrelating experience for 192 frames... -[2025-08-29 18:27:26,211][18857] Decorrelating experience for 128 frames... -[2025-08-29 18:27:26,225][18839] Decorrelating experience for 128 frames... -[2025-08-29 18:27:26,247][18843] Decorrelating experience for 192 frames... -[2025-08-29 18:27:26,388][18842] Decorrelating experience for 128 frames... -[2025-08-29 18:27:26,388][18841] Decorrelating experience for 192 frames... 
-[2025-08-29 18:27:26,421][18855] Decorrelating experience for 192 frames... -[2025-08-29 18:27:26,459][18856] Decorrelating experience for 192 frames... -[2025-08-29 18:27:26,482][18857] Decorrelating experience for 192 frames... -[2025-08-29 18:27:26,491][18839] Decorrelating experience for 192 frames... -[2025-08-29 18:27:26,628][18842] Decorrelating experience for 192 frames... -[2025-08-29 18:27:27,328][15827] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:27:29,461][18823] Signal inference workers to stop experience collection... -[2025-08-29 18:27:29,477][18838] InferenceWorker_p0-w0: stopping experience collection -[2025-08-29 18:27:30,906][15827] Heartbeat connected on Batcher_0 -[2025-08-29 18:27:30,952][15827] Heartbeat connected on InferenceWorker_p0-w0 -[2025-08-29 18:27:30,963][15827] Heartbeat connected on RolloutWorker_w0 -[2025-08-29 18:27:30,965][15827] Heartbeat connected on RolloutWorker_w1 -[2025-08-29 18:27:30,966][15827] Heartbeat connected on RolloutWorker_w2 -[2025-08-29 18:27:30,968][15827] Heartbeat connected on RolloutWorker_w3 -[2025-08-29 18:27:30,970][15827] Heartbeat connected on RolloutWorker_w4 -[2025-08-29 18:27:30,971][15827] Heartbeat connected on RolloutWorker_w5 -[2025-08-29 18:27:30,972][15827] Heartbeat connected on RolloutWorker_w6 -[2025-08-29 18:27:30,973][15827] Heartbeat connected on RolloutWorker_w8 -[2025-08-29 18:27:30,973][15827] Heartbeat connected on RolloutWorker_w7 -[2025-08-29 18:27:30,975][15827] Heartbeat connected on RolloutWorker_w9 -[2025-08-29 18:27:32,334][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 650.5. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:27:32,369][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:27:37,410][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 323.6. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:27:37,862][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:27:42,413][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 215.9. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:27:42,443][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:27:47,439][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 162.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:27:48,122][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:27:52,356][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 130.1. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:27:52,839][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:27:57,380][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 108.5. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:27:57,833][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:02,476][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 92.7. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:02,912][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:07,464][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 81.2. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:07,904][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:12,456][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 72.2. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:12,727][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:17,350][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:17,702][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:22,406][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:22,542][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:27,336][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:27,657][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:33,122][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:33,144][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:37,358][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:37,718][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:42,390][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:42,529][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:47,356][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:47,631][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:52,447][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:52,725][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:28:57,358][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:28:57,548][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:29:02,462][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:29:03,339][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:29:08,946][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:29:10,332][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:29:13,333][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:29:15,106][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:29:17,378][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:29:17,760][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:29:22,385][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:29:23,312][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:29:31,368][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:29:31,408][15827] Avg episode reward: [(0, '2.022')] -[2025-08-29 18:29:31,944][15827] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 15827], exiting... -[2025-08-29 18:29:31,948][18823] Stopping Batcher_0... -[2025-08-29 18:29:31,949][18823] Loop batcher_evt_loop terminating... -[2025-08-29 18:29:31,947][15827] Runner profile tree view: -main_loop: 141.0043 -[2025-08-29 18:29:31,953][15827] Collected {0: 0}, FPS: 0.0 -[2025-08-29 18:29:31,995][18838] Weights refcount: 2 0 -[2025-08-29 18:29:32,011][18843] Stopping RolloutWorker_w4... -[2025-08-29 18:29:32,011][18843] Loop rollout_proc4_evt_loop terminating... -[2025-08-29 18:29:32,017][18840] Stopping RolloutWorker_w1... -[2025-08-29 18:29:32,018][18840] Loop rollout_proc1_evt_loop terminating... -[2025-08-29 18:29:32,023][18838] Stopping InferenceWorker_p0-w0... -[2025-08-29 18:29:32,022][18855] Stopping RolloutWorker_w9... -[2025-08-29 18:29:32,023][18838] Loop inference_proc0-0_evt_loop terminating... -[2025-08-29 18:29:32,023][18855] Loop rollout_proc9_evt_loop terminating... -[2025-08-29 18:29:32,024][18845] Stopping RolloutWorker_w6... -[2025-08-29 18:29:32,024][18845] Loop rollout_proc6_evt_loop terminating... -[2025-08-29 18:29:32,026][18839] Stopping RolloutWorker_w0... -[2025-08-29 18:29:32,027][18839] Loop rollout_proc0_evt_loop terminating... -[2025-08-29 18:29:32,034][18856] Stopping RolloutWorker_w7... -[2025-08-29 18:29:32,036][18856] Loop rollout_proc7_evt_loop terminating... -[2025-08-29 18:29:32,036][18841] Stopping RolloutWorker_w3... -[2025-08-29 18:29:32,037][18841] Loop rollout_proc3_evt_loop terminating... -[2025-08-29 18:29:32,039][18842] Stopping RolloutWorker_w2... -[2025-08-29 18:29:32,040][18842] Loop rollout_proc2_evt_loop terminating... -[2025-08-29 18:29:32,043][18844] Stopping RolloutWorker_w5... -[2025-08-29 18:29:32,045][18844] Loop rollout_proc5_evt_loop terminating... -[2025-08-29 18:29:32,049][18857] Stopping RolloutWorker_w8... -[2025-08-29 18:29:32,050][18857] Loop rollout_proc8_evt_loop terminating... -[2025-08-29 18:29:35,446][18823] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth... -[2025-08-29 18:29:35,497][18823] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth -[2025-08-29 18:29:35,500][18823] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth... 
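The interleaved "Saving .../Removing ..." records around this point are the learner rotating its checkpoint files during shutdown; with keep_checkpoints=2 (see the configuration dump further below), older files are unlinked as new ones land. A hedged sketch of that rotation pattern, not Sample Factory's actual code (file naming and the helper are illustrative):

from pathlib import Path

def save_with_rotation(blob: bytes, ckpt_dir: Path, env_steps: int, keep: int = 2):
    """Write a checkpoint file and keep only the newest `keep` of them."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    path = ckpt_dir / f"checkpoint_{env_steps:012d}.pth"
    path.write_bytes(blob)                      # stand-in for torch.save(state, path)
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for old in ckpts[:-keep]:                   # prune everything but the newest `keep`
        old.unlink()

save_with_rotation(b"...", Path("train_dir/default_experiment/checkpoint_p0"), 8192)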
-[2025-08-29 18:29:35,543][18823] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth -[2025-08-29 18:29:35,547][18823] Stopping LearnerWorker_p0... -[2025-08-29 18:29:35,547][18823] Loop learner_proc0_evt_loop terminating... -[2025-08-29 18:29:38,831][15827] Environment doom_basic already registered, overwriting... -[2025-08-29 18:29:38,835][15827] Environment doom_two_colors_easy already registered, overwriting... -[2025-08-29 18:29:38,837][15827] Environment doom_two_colors_hard already registered, overwriting... -[2025-08-29 18:29:38,838][15827] Environment doom_dm already registered, overwriting... -[2025-08-29 18:29:38,839][15827] Environment doom_dwango5 already registered, overwriting... -[2025-08-29 18:29:38,840][15827] Environment doom_my_way_home_flat_actions already registered, overwriting... -[2025-08-29 18:29:38,841][15827] Environment doom_defend_the_center_flat_actions already registered, overwriting... -[2025-08-29 18:29:38,842][15827] Environment doom_my_way_home already registered, overwriting... -[2025-08-29 18:29:38,844][15827] Environment doom_deadly_corridor already registered, overwriting... -[2025-08-29 18:29:38,845][15827] Environment doom_defend_the_center already registered, overwriting... -[2025-08-29 18:29:38,846][15827] Environment doom_defend_the_line already registered, overwriting... -[2025-08-29 18:29:38,847][15827] Environment doom_health_gathering already registered, overwriting... -[2025-08-29 18:29:38,848][15827] Environment doom_health_gathering_supreme already registered, overwriting... -[2025-08-29 18:29:38,849][15827] Environment doom_battle already registered, overwriting... -[2025-08-29 18:29:38,851][15827] Environment doom_battle2 already registered, overwriting... -[2025-08-29 18:29:38,852][15827] Environment doom_duel_bots already registered, overwriting... -[2025-08-29 18:29:38,853][15827] Environment doom_deathmatch_bots already registered, overwriting... -[2025-08-29 18:29:38,854][15827] Environment doom_duel already registered, overwriting... -[2025-08-29 18:29:38,855][15827] Environment doom_deathmatch_full already registered, overwriting... -[2025-08-29 18:29:38,858][15827] Environment doom_benchmark already registered, overwriting... -[2025-08-29 18:29:38,859][15827] register_encoder_factory: -[2025-08-29 18:29:38,967][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 18:29:39,044][15827] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists! -[2025-08-29 18:29:39,046][15827] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment... 
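The run then resumes and re-dumps its full configuration below, including the original command_line (--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000). A hedged sketch of relaunching with those exact arguments through the VizDoom example entry point; the module path is assumed from sample-factory 2.x and the notebook may wire this up differently:

import sys
from sf_examples.vizdoom.train_vizdoom import main  # assumed sample-factory 2.x entry point

sys.argv = [
    "train_vizdoom",
    "--env=doom_health_gathering_supreme",
    "--num_workers=8",
    "--num_envs_per_worker=4",
    "--train_for_env_steps=4000000",
]
main()  # parses sys.argv; with restart_behavior=resume it picks up the existing train_dir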
-[2025-08-29 18:29:39,047][15827] Weights and Biases integration disabled -[2025-08-29 18:29:39,061][15827] Environment var CUDA_VISIBLE_DEVICES is 0 - -[2025-08-29 18:29:47,659][15827] Starting experiment with the following configuration: -help=False -algo=APPO -env=doom_health_gathering_supreme -experiment=default_experiment -train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir -restart_behavior=resume -device=gpu -seed=None -num_policies=1 -async_rl=True -serial_mode=False -batched_sampling=False -num_batches_to_accumulate=2 -worker_num_splits=2 -policy_workers_per_policy=1 -max_policy_lag=1000 -num_workers=10 -num_envs_per_worker=4 -batch_size=1024 -num_batches_per_epoch=1 -num_epochs=1 -rollout=64 -recurrence=32 -shuffle_minibatches=False -gamma=0.99 -reward_scale=1.0 -reward_clip=1000.0 -value_bootstrap=False -normalize_returns=True -exploration_loss_coeff=0.001 -value_loss_coeff=0.5 -kl_loss_coeff=0.0 -exploration_loss=symmetric_kl -gae_lambda=0.95 -ppo_clip_ratio=0.1 -ppo_clip_value=0.2 -with_vtrace=False -vtrace_rho=1.0 -vtrace_c=1.0 -optimizer=adam -adam_eps=1e-06 -adam_beta1=0.9 -adam_beta2=0.999 -max_grad_norm=4.0 -learning_rate=0.0001 -lr_schedule=constant -lr_schedule_kl_threshold=0.008 -lr_adaptive_min=1e-06 -lr_adaptive_max=0.01 -obs_subtract_mean=0.0 -obs_scale=255.0 -normalize_input=True -normalize_input_keys=None -decorrelate_experience_max_seconds=0 -decorrelate_envs_on_one_worker=True -actor_worker_gpus=[] -set_workers_cpu_affinity=True -force_envs_single_thread=False -default_niceness=0 -log_to_file=True -experiment_summaries_interval=10 -flush_summaries_interval=30 -stats_avg=100 -summaries_use_frameskip=True -heartbeat_interval=20 -heartbeat_reporting_interval=600 -train_for_env_steps=20000000 -train_for_seconds=10000000000 -save_every_sec=120 -keep_checkpoints=2 -load_checkpoint_kind=latest -save_milestones_sec=-1 -save_best_every_sec=5 -save_best_metric=reward -save_best_after=100000 -benchmark=False -encoder_mlp_layers=[512, 512] -encoder_conv_architecture=convnet_simple -encoder_conv_mlp_layers=[512] -use_rnn=True -rnn_size=512 -rnn_type=gru -rnn_num_layers=1 -decoder_mlp_layers=[] -nonlinearity=elu -policy_initialization=orthogonal -policy_init_gain=1.0 -actor_critic_share_weights=True -adaptive_stddev=True -continuous_tanh_scale=0.0 -initial_stddev=1.0 -use_env_info_cache=False -env_gpu_actions=False -env_gpu_observations=True -env_frameskip=4 -env_framestack=1 -pixel_format=CHW -use_record_episode_statistics=False -with_wandb=False -wandb_user=None -wandb_project=sample_factory -wandb_group=None -wandb_job_type=SF -wandb_tags=[] -with_pbt=False -pbt_mix_policies_in_one_env=True -pbt_period_env_steps=5000000 -pbt_start_mutation=20000000 -pbt_replace_fraction=0.3 -pbt_mutation_rate=0.15 -pbt_replace_reward_gap=0.1 -pbt_replace_reward_gap_absolute=1e-06 -pbt_optimize_gamma=False -pbt_target_objective=true_objective -pbt_perturb_min=1.1 -pbt_perturb_max=1.5 -num_agents=-1 -num_humans=0 -num_bots=-1 -start_bot_difficulty=None -timelimit=None -res_w=128 -res_h=72 -wide_aspect_ratio=False -eval_env_frameskip=1 -fps=35 -command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 -cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} -git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042 -git_repo_name=git@github.com:huggingface/deep-rl-class.git -[2025-08-29 18:29:47,663][15827] Saving configuration to 
/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... -[2025-08-29 18:29:47,785][15827] Rollout worker 0 uses device cpu -[2025-08-29 18:29:47,787][15827] Rollout worker 1 uses device cpu -[2025-08-29 18:29:47,788][15827] Rollout worker 2 uses device cpu -[2025-08-29 18:29:47,789][15827] Rollout worker 3 uses device cpu -[2025-08-29 18:29:47,790][15827] Rollout worker 4 uses device cpu -[2025-08-29 18:29:47,791][15827] Rollout worker 5 uses device cpu -[2025-08-29 18:29:47,793][15827] Rollout worker 6 uses device cpu -[2025-08-29 18:29:47,795][15827] Rollout worker 7 uses device cpu -[2025-08-29 18:29:47,796][15827] Rollout worker 8 uses device cpu -[2025-08-29 18:29:47,797][15827] Rollout worker 9 uses device cpu -[2025-08-29 18:29:47,898][15827] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:29:47,899][15827] InferenceWorker_p0-w0: min num requests: 3 -[2025-08-29 18:29:48,389][15827] Starting all processes... -[2025-08-29 18:29:48,390][15827] Starting process learner_proc0 -[2025-08-29 18:29:48,493][15827] Starting all processes... -[2025-08-29 18:29:48,508][15827] Starting process inference_proc0-0 -[2025-08-29 18:29:48,512][15827] Starting process rollout_proc0 -[2025-08-29 18:29:48,513][15827] Starting process rollout_proc1 -[2025-08-29 18:29:48,513][15827] Starting process rollout_proc2 -[2025-08-29 18:29:48,514][15827] Starting process rollout_proc3 -[2025-08-29 18:29:48,514][15827] Starting process rollout_proc4 -[2025-08-29 18:29:48,514][15827] Starting process rollout_proc5 -[2025-08-29 18:29:48,523][15827] Starting process rollout_proc6 -[2025-08-29 18:29:48,524][15827] Starting process rollout_proc7 -[2025-08-29 18:29:48,525][15827] Starting process rollout_proc8 -[2025-08-29 18:29:48,526][15827] Starting process rollout_proc9 -[2025-08-29 18:29:52,935][19395] Worker 3 uses CPU cores [3] -[2025-08-29 18:29:52,947][19394] Worker 0 uses CPU cores [0] -[2025-08-29 18:29:52,954][19378] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:29:52,955][19378] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-08-29 18:29:53,005][19398] Worker 4 uses CPU cores [4] -[2025-08-29 18:29:53,115][19396] Worker 2 uses CPU cores [2] -[2025-08-29 18:29:53,183][19378] Num visible devices: 1 -[2025-08-29 18:29:53,200][19378] Starting seed is not provided -[2025-08-29 18:29:53,200][19378] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:29:53,201][19378] Initializing actor-critic model on device cuda:0 -[2025-08-29 18:29:53,201][19378] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 18:29:53,205][19400] Worker 7 uses CPU cores [7] -[2025-08-29 18:29:53,209][19378] RunningMeanStd input shape: (1,) -[2025-08-29 18:29:53,227][19378] ConvEncoder: input_channels=3 -[2025-08-29 18:29:53,265][19403] Worker 9 uses CPU cores [9] -[2025-08-29 18:29:53,325][19401] Worker 8 uses CPU cores [8] -[2025-08-29 18:29:53,465][19397] Worker 1 uses CPU cores [1] -[2025-08-29 18:29:53,465][19402] Worker 5 uses CPU cores [5] -[2025-08-29 18:29:53,482][19399] Worker 6 uses CPU cores [6] -[2025-08-29 18:29:53,488][19378] Conv encoder output size: 512 -[2025-08-29 18:29:53,489][19378] Policy head output size: 512 -[2025-08-29 18:29:53,516][19378] Created Actor Critic model with architecture: -[2025-08-29 18:29:53,517][19378] ActorCriticSharedWeights( - (obs_normalizer): ObservationNormalizer( - (running_mean_std): RunningMeanStdDictInPlace( - 
(running_mean_std): ModuleDict( - (obs): RunningMeanStdInPlace() - ) - ) - ) - (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) - (encoder): VizdoomEncoder( - (basic_encoder): ConvEncoder( - (enc): RecursiveScriptModule( - original_name=ConvEncoderImpl - (conv_head): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Conv2d) - (1): RecursiveScriptModule(original_name=ELU) - (2): RecursiveScriptModule(original_name=Conv2d) - (3): RecursiveScriptModule(original_name=ELU) - (4): RecursiveScriptModule(original_name=Conv2d) - (5): RecursiveScriptModule(original_name=ELU) - ) - (mlp_layers): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Linear) - (1): RecursiveScriptModule(original_name=ELU) - ) - ) - ) - ) - (core): ModelCoreRNN( - (core): GRU(512, 512) - ) - (decoder): MlpDecoder( - (mlp): Identity() - ) - (critic_linear): Linear(in_features=512, out_features=1, bias=True) - (action_parameterization): ActionParameterizationDefault( - (distribution_linear): Linear(in_features=512, out_features=5, bias=True) - ) -) -[2025-08-29 18:29:53,597][19393] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:29:53,597][19393] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-08-29 18:29:53,637][19393] Num visible devices: 1 -[2025-08-29 18:29:54,111][19378] Using optimizer -[2025-08-29 18:29:56,115][19378] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:29:56,120][19378] Could not load from checkpoint, attempt 0 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. - WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
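The UnpicklingError above is retried twice more below with the same result: PyTorch 2.6 defaults torch.load to weights_only=True, and this checkpoint pickles a NumPy global that is not on the default allowlist. A hedged sketch of the allowlist workaround the error message itself recommends, to be applied only if the checkpoint is trusted (path copied from the log; under NumPy 2.x the global lives in numpy._core rather than numpy.core):

import numpy as np
import torch
import torch.serialization

# Allowlist the exact global the WeightsUnpickler rejected (trusted checkpoints only).
torch.serialization.add_safe_globals([np.core.multiarray.scalar])

ckpt = torch.load(
    "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
    "default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth",
    map_location="cpu",
)

The message's option (1), passing weights_only=False, also works but executes arbitrary pickled code, so the allowlist route is the narrower fix.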
-[2025-08-29 18:29:56,123][19378] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:29:56,124][19378] Could not load from checkpoint, attempt 1 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. - WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-29 18:29:56,124][19378] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... -[2025-08-29 18:29:56,125][19378] Could not load from checkpoint, attempt 2 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. - (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. - WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. - -Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. -[2025-08-29 18:29:56,125][19378] Did not load from checkpoint, starting from scratch! -[2025-08-29 18:29:56,126][19378] Initialized policy 0 weights for model version 0 -[2025-08-29 18:29:56,134][19378] LearnerWorker_p0 finished initialization! -[2025-08-29 18:29:56,134][19378] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 18:29:56,395][19393] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 18:29:56,397][19393] RunningMeanStd input shape: (1,) -[2025-08-29 18:29:56,407][19393] ConvEncoder: input_channels=3 -[2025-08-29 18:29:56,482][19393] Conv encoder output size: 512 -[2025-08-29 18:29:56,482][19393] Policy head output size: 512 -[2025-08-29 18:29:56,519][15827] Inference worker 0-0 is ready! -[2025-08-29 18:29:56,521][15827] All inference workers are ready! Signal rollout workers to start! -[2025-08-29 18:29:56,604][19402] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,614][19398] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,616][19403] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,620][19401] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,623][19394] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,630][19400] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,631][19396] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,640][19395] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,645][19397] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:56,665][19399] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:29:57,062][19398] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,062][19399] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,062][19395] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,078][19402] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,079][19397] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,079][19401] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,080][19403] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,080][19396] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,081][19400] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,284][19397] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,286][19395] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,299][19402] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,303][19403] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,303][19401] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,329][19399] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,340][19398] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,375][19394] Decorrelating experience for 0 frames... -[2025-08-29 18:29:57,510][19400] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,577][19396] Decorrelating experience for 64 frames... 
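The "ConvEncoder: input_channels=3", "Conv encoder output size: 512", and "Policy head output size: 512" records here match the ActorCriticSharedWeights architecture printed earlier: three Conv2d+ELU stages, a Linear+ELU head, and a GRU(512, 512) core. A hedged PyTorch reconstruction for reference; the conv filter sizes are assumed from sample-factory's convnet_simple, and only the shapes logged above are certain:

import torch
import torch.nn as nn

class ToyVizdoomEncoder(nn.Module):
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.conv_head = nn.Sequential(        # three Conv2d + ELU stages, as printed
            nn.Conv2d(in_channels, 32, 8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, 3, stride=2), nn.ELU(),
        )
        self.mlp = nn.Sequential(nn.LazyLinear(512), nn.ELU())  # "Conv encoder output size: 512"
        self.core = nn.GRU(512, 512)           # ModelCoreRNN: GRU(512, 512)

    def forward(self, obs: torch.Tensor, h: torch.Tensor):
        x = self.conv_head(obs).flatten(1)
        x = self.mlp(x)
        out, h = self.core(x.unsqueeze(0), h)  # one-step pass through the GRU
        return out.squeeze(0), h

enc = ToyVizdoomEncoder()
obs = torch.zeros(4, 3, 72, 128)               # RunningMeanStd input shape from the log
h0 = torch.zeros(1, 4, 512)
feats, h1 = enc(obs, h0)
print(feats.shape)                             # torch.Size([4, 512])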
-[2025-08-29 18:29:57,580][19395] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,603][19397] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,603][19402] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,629][19403] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,752][19400] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,788][19394] Decorrelating experience for 64 frames... -[2025-08-29 18:29:57,790][19399] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,839][19397] Decorrelating experience for 192 frames... -[2025-08-29 18:29:57,848][19396] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,851][19398] Decorrelating experience for 128 frames... -[2025-08-29 18:29:57,898][19403] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,004][19401] Decorrelating experience for 128 frames... -[2025-08-29 18:29:58,047][19395] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,181][19394] Decorrelating experience for 128 frames... -[2025-08-29 18:29:58,257][19400] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,260][19399] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,299][19398] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,353][19396] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,369][19401] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,445][19394] Decorrelating experience for 192 frames... -[2025-08-29 18:29:58,478][19402] Decorrelating experience for 192 frames... -[2025-08-29 18:29:59,061][15827] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:30:00,642][19378] Signal inference workers to stop experience collection... -[2025-08-29 18:30:00,652][19393] InferenceWorker_p0-w0: stopping experience collection -[2025-08-29 18:30:04,069][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 639.5. Samples: 3202. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:30:04,118][15827] Avg episode reward: [(0, '1.964')] -[2025-08-29 18:30:07,895][15827] Heartbeat connected on Batcher_0 -[2025-08-29 18:30:07,909][15827] Heartbeat connected on InferenceWorker_p0-w0 -[2025-08-29 18:30:07,909][15827] Heartbeat connected on RolloutWorker_w0 -[2025-08-29 18:30:07,910][15827] Heartbeat connected on RolloutWorker_w1 -[2025-08-29 18:30:07,911][15827] Heartbeat connected on RolloutWorker_w2 -[2025-08-29 18:30:07,913][15827] Heartbeat connected on RolloutWorker_w3 -[2025-08-29 18:30:07,954][15827] Heartbeat connected on RolloutWorker_w4 -[2025-08-29 18:30:08,038][15827] Heartbeat connected on RolloutWorker_w5 -[2025-08-29 18:30:08,189][15827] Heartbeat connected on RolloutWorker_w6 -[2025-08-29 18:30:08,262][15827] Heartbeat connected on RolloutWorker_w7 -[2025-08-29 18:30:08,362][15827] Heartbeat connected on RolloutWorker_w8 -[2025-08-29 18:30:08,400][15827] Heartbeat connected on RolloutWorker_w9 -[2025-08-29 18:30:09,068][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 320.0. Samples: 3202. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 18:30:09,116][15827] Avg episode reward: [(0, '1.964')] -[2025-08-29 18:30:13,517][19378] Signal inference workers to resume experience collection... 
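The recurring "Fps is (...)" records are the runner's throughput summaries over 10 s / 60 s / 300 s windows, with cumulative frames and samples. A small hedged helper for pulling them out of this log, e.g. for plotting; the regex is written against the lines above (the very first record reports "nan" in every field):

import re

FPS_RE = re.compile(
    r"Fps is \(10 sec: ([\d.na]+), 60 sec: ([\d.na]+), 300 sec: ([\d.na]+)\)\. "
    r"Total num frames: (\d+)\. Throughput: 0: ([\d.na]+)\. Samples: (\d+)\."
)

line = ("[2025-08-29 18:30:04,069][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, "
        "300 sec: 0.0). Total num frames: 0. Throughput: 0: 639.5. Samples: 3202.")
m = FPS_RE.search(line)
print(m.groups())  # ('0.0', '0.0', '0.0', '0', '639.5', '3202')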
-[2025-08-29 18:30:13,525][19393] InferenceWorker_p0-w0: resuming experience collection -[2025-08-29 18:30:14,061][15827] Fps is (10 sec: 409.9, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 213.5. Samples: 3202. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) -[2025-08-29 18:30:14,064][15827] Avg episode reward: [(0, '2.085')] -[2025-08-29 18:30:14,168][15827] Heartbeat connected on LearnerWorker_p0 -[2025-08-29 18:30:16,484][19393] Updated weights for policy 0, policy_version 10 (0.0336) -[2025-08-29 18:30:20,599][15827] Fps is (10 sec: 4261.8, 60 sec: 2282.0, 300 sec: 2282.0). Total num frames: 49152. Throughput: 0: 231.7. Samples: 4990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:30:20,601][15827] Avg episode reward: [(0, '4.441')] -[2025-08-29 18:30:23,007][19393] Updated weights for policy 0, policy_version 20 (0.0013) -[2025-08-29 18:30:24,061][15827] Fps is (10 sec: 9420.7, 60 sec: 3932.2, 300 sec: 3932.2). Total num frames: 98304. Throughput: 0: 951.9. Samples: 23798. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:30:24,062][15827] Avg episode reward: [(0, '4.251')] -[2025-08-29 18:30:25,756][19393] Updated weights for policy 0, policy_version 30 (0.0010) -[2025-08-29 18:30:28,414][19393] Updated weights for policy 0, policy_version 40 (0.0014) -[2025-08-29 18:30:29,060][15827] Fps is (10 sec: 14523.2, 60 sec: 5734.4, 300 sec: 5734.4). Total num frames: 172032. Throughput: 0: 1184.1. Samples: 35522. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:30:29,062][15827] Avg episode reward: [(0, '4.330')] -[2025-08-29 18:30:29,068][19378] Saving new best policy, reward=4.330! -[2025-08-29 18:30:30,993][19393] Updated weights for policy 0, policy_version 50 (0.0011) -[2025-08-29 18:30:33,666][19393] Updated weights for policy 0, policy_version 60 (0.0013) -[2025-08-29 18:30:34,060][15827] Fps is (10 sec: 15155.3, 60 sec: 7138.8, 300 sec: 7138.8). Total num frames: 249856. Throughput: 0: 1674.1. Samples: 58592. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:30:34,062][15827] Avg episode reward: [(0, '4.487')] -[2025-08-29 18:30:34,064][19378] Saving new best policy, reward=4.487! -[2025-08-29 18:30:36,408][19393] Updated weights for policy 0, policy_version 70 (0.0015) -[2025-08-29 18:30:38,988][19393] Updated weights for policy 0, policy_version 80 (0.0013) -[2025-08-29 18:30:39,060][15827] Fps is (10 sec: 15564.6, 60 sec: 8192.0, 300 sec: 8192.0). Total num frames: 327680. Throughput: 0: 2036.5. Samples: 81458. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:30:39,063][15827] Avg episode reward: [(0, '4.568')] -[2025-08-29 18:30:39,069][19378] Saving new best policy, reward=4.568! -[2025-08-29 18:30:41,577][19393] Updated weights for policy 0, policy_version 90 (0.0015) -[2025-08-29 18:30:44,061][15827] Fps is (10 sec: 15564.7, 60 sec: 9011.2, 300 sec: 9011.2). Total num frames: 405504. Throughput: 0: 2070.2. Samples: 93158. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:30:44,062][15827] Avg episode reward: [(0, '4.371')] -[2025-08-29 18:30:44,163][19393] Updated weights for policy 0, policy_version 100 (0.0011) -[2025-08-29 18:30:46,845][19393] Updated weights for policy 0, policy_version 110 (0.0014) -[2025-08-29 18:30:49,060][15827] Fps is (10 sec: 15565.0, 60 sec: 9666.6, 300 sec: 9666.6). Total num frames: 483328. Throughput: 0: 2518.9. Samples: 116534. 
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:30:49,061][15827] Avg episode reward: [(0, '4.453')] -[2025-08-29 18:30:49,470][19393] Updated weights for policy 0, policy_version 120 (0.0013) -[2025-08-29 18:30:52,112][19393] Updated weights for policy 0, policy_version 130 (0.0013) -[2025-08-29 18:30:56,427][15827] Fps is (10 sec: 10930.4, 60 sec: 9424.9, 300 sec: 9424.9). Total num frames: 540672. Throughput: 0: 2639.5. Samples: 128212. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:30:56,431][15827] Avg episode reward: [(0, '4.376')] -[2025-08-29 18:30:58,325][19393] Updated weights for policy 0, policy_version 140 (0.0013) -[2025-08-29 18:30:59,060][15827] Fps is (10 sec: 9830.4, 60 sec: 9693.9, 300 sec: 9693.9). Total num frames: 581632. Throughput: 0: 2988.0. Samples: 137660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:30:59,061][15827] Avg episode reward: [(0, '4.393')] -[2025-08-29 18:31:00,969][19393] Updated weights for policy 0, policy_version 150 (0.0014) -[2025-08-29 18:31:03,665][19393] Updated weights for policy 0, policy_version 160 (0.0012) -[2025-08-29 18:31:04,060][15827] Fps is (10 sec: 15023.9, 60 sec: 10923.9, 300 sec: 10082.5). Total num frames: 655360. Throughput: 0: 3571.2. Samples: 160200. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0) -[2025-08-29 18:31:04,062][15827] Avg episode reward: [(0, '4.436')] -[2025-08-29 18:31:06,363][19393] Updated weights for policy 0, policy_version 170 (0.0010) -[2025-08-29 18:31:09,061][15827] Fps is (10 sec: 15154.9, 60 sec: 12220.9, 300 sec: 10474.1). Total num frames: 733184. Throughput: 0: 3544.3. Samples: 183292. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:31:09,062][15827] Avg episode reward: [(0, '4.287')] -[2025-08-29 18:31:09,116][19393] Updated weights for policy 0, policy_version 180 (0.0013) -[2025-08-29 18:31:11,767][19393] Updated weights for policy 0, policy_version 190 (0.0011) -[2025-08-29 18:31:14,060][15827] Fps is (10 sec: 15974.6, 60 sec: 13516.8, 300 sec: 10868.1). Total num frames: 815104. Throughput: 0: 3542.7. Samples: 194942. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:31:14,061][15827] Avg episode reward: [(0, '4.362')] -[2025-08-29 18:31:14,230][19393] Updated weights for policy 0, policy_version 200 (0.0014) -[2025-08-29 18:31:16,752][19393] Updated weights for policy 0, policy_version 210 (0.0012) -[2025-08-29 18:31:19,061][15827] Fps is (10 sec: 15155.3, 60 sec: 14293.0, 300 sec: 11059.2). Total num frames: 884736. Throughput: 0: 3553.5. Samples: 218498. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:31:19,062][15827] Avg episode reward: [(0, '4.387')] -[2025-08-29 18:31:20,251][19393] Updated weights for policy 0, policy_version 220 (0.0014) -[2025-08-29 18:31:23,187][19393] Updated weights for policy 0, policy_version 230 (0.0013) -[2025-08-29 18:31:24,061][15827] Fps is (10 sec: 13106.7, 60 sec: 14131.2, 300 sec: 11131.5). Total num frames: 946176. Throughput: 0: 3471.4. Samples: 237672. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:31:24,062][15827] Avg episode reward: [(0, '4.617')] -[2025-08-29 18:31:24,068][19378] Saving new best policy, reward=4.617! -[2025-08-29 18:31:26,406][19393] Updated weights for policy 0, policy_version 240 (0.0017) -[2025-08-29 18:31:32,254][15827] Fps is (10 sec: 9623.9, 60 sec: 13287.3, 300 sec: 10856.0). Total num frames: 1011712. Throughput: 0: 3204.8. Samples: 247608. 
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:31:32,255][15827] Avg episode reward: [(0, '4.342')] -[2025-08-29 18:31:33,147][19393] Updated weights for policy 0, policy_version 250 (0.0017) -[2025-08-29 18:31:34,060][15827] Fps is (10 sec: 9011.6, 60 sec: 13107.2, 300 sec: 10908.3). Total num frames: 1036288. Throughput: 0: 3087.3. Samples: 255462. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:31:34,062][15827] Avg episode reward: [(0, '4.375')] -[2025-08-29 18:31:36,778][19393] Updated weights for policy 0, policy_version 260 (0.0018) -[2025-08-29 18:31:39,061][15827] Fps is (10 sec: 12035.4, 60 sec: 12765.8, 300 sec: 10936.3). Total num frames: 1093632. Throughput: 0: 3397.5. Samples: 273060. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:31:39,063][15827] Avg episode reward: [(0, '4.311')] -[2025-08-29 18:31:39,069][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000267_1093632.pth... -[2025-08-29 18:31:39,146][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000267_1093632.pth -[2025-08-29 18:31:40,018][19393] Updated weights for policy 0, policy_version 270 (0.0015) -[2025-08-29 18:31:43,012][19393] Updated weights for policy 0, policy_version 280 (0.0016) -[2025-08-29 18:31:44,060][15827] Fps is (10 sec: 12287.9, 60 sec: 12561.1, 300 sec: 11039.7). Total num frames: 1159168. Throughput: 0: 3238.6. Samples: 283396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:31:44,062][15827] Avg episode reward: [(0, '4.458')] -[2025-08-29 18:31:46,227][19393] Updated weights for policy 0, policy_version 290 (0.0015) -[2025-08-29 18:31:49,061][15827] Fps is (10 sec: 12697.4, 60 sec: 12287.9, 300 sec: 11096.4). Total num frames: 1220608. Throughput: 0: 3169.8. Samples: 302844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:31:49,064][15827] Avg episode reward: [(0, '4.347')] -[2025-08-29 18:31:49,460][19393] Updated weights for policy 0, policy_version 300 (0.0015) -[2025-08-29 18:31:53,455][19393] Updated weights for policy 0, policy_version 310 (0.0025) -[2025-08-29 18:31:54,061][15827] Fps is (10 sec: 11468.7, 60 sec: 12721.4, 300 sec: 11077.0). Total num frames: 1273856. Throughput: 0: 3018.1. Samples: 319106. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:31:54,062][15827] Avg episode reward: [(0, '4.509')] -[2025-08-29 18:31:56,849][19393] Updated weights for policy 0, policy_version 320 (0.0019) -[2025-08-29 18:31:59,060][15827] Fps is (10 sec: 11060.0, 60 sec: 12492.8, 300 sec: 11093.4). Total num frames: 1331200. Throughput: 0: 2958.5. Samples: 328074. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:31:59,061][15827] Avg episode reward: [(0, '4.292')] -[2025-08-29 18:32:00,408][19393] Updated weights for policy 0, policy_version 330 (0.0023) -[2025-08-29 18:32:03,376][19393] Updated weights for policy 0, policy_version 340 (0.0019) -[2025-08-29 18:32:04,060][15827] Fps is (10 sec: 12288.2, 60 sec: 12356.3, 300 sec: 11173.9). Total num frames: 1396736. Throughput: 0: 2839.9. Samples: 346294. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:32:04,062][15827] Avg episode reward: [(0, '4.476')] -[2025-08-29 18:32:09,060][15827] Fps is (10 sec: 8601.6, 60 sec: 11400.6, 300 sec: 10901.7). Total num frames: 1417216. Throughput: 0: 2584.2. Samples: 353962. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:32:09,062][15827] Avg episode reward: [(0, '4.641')] -[2025-08-29 18:32:09,068][19378] Saving new best policy, reward=4.641! -[2025-08-29 18:32:10,004][19393] Updated weights for policy 0, policy_version 350 (0.0017) -[2025-08-29 18:32:13,012][19393] Updated weights for policy 0, policy_version 360 (0.0011) -[2025-08-29 18:32:14,060][15827] Fps is (10 sec: 9011.1, 60 sec: 11195.7, 300 sec: 11013.7). Total num frames: 1486848. Throughput: 0: 2799.5. Samples: 364646. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:32:14,062][15827] Avg episode reward: [(0, '4.525')] -[2025-08-29 18:32:16,067][19393] Updated weights for policy 0, policy_version 370 (0.0015) -[2025-08-29 18:32:19,060][15827] Fps is (10 sec: 13107.1, 60 sec: 11059.2, 300 sec: 11059.2). Total num frames: 1548288. Throughput: 0: 2869.6. Samples: 384592. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:32:19,062][15827] Avg episode reward: [(0, '4.449')] -[2025-08-29 18:32:19,483][19393] Updated weights for policy 0, policy_version 380 (0.0015) -[2025-08-29 18:32:22,801][19393] Updated weights for policy 0, policy_version 390 (0.0013) -[2025-08-29 18:32:24,060][15827] Fps is (10 sec: 12697.8, 60 sec: 11127.6, 300 sec: 11129.8). Total num frames: 1613824. Throughput: 0: 2888.8. Samples: 403056. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:32:24,062][15827] Avg episode reward: [(0, '4.439')] -[2025-08-29 18:32:25,752][19393] Updated weights for policy 0, policy_version 400 (0.0013) -[2025-08-29 18:32:28,603][19393] Updated weights for policy 0, policy_version 410 (0.0014) -[2025-08-29 18:32:29,060][15827] Fps is (10 sec: 13926.4, 60 sec: 11897.3, 300 sec: 11250.4). Total num frames: 1687552. Throughput: 0: 2904.0. Samples: 414078. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:32:29,062][15827] Avg episode reward: [(0, '4.499')] -[2025-08-29 18:32:31,246][19393] Updated weights for policy 0, policy_version 420 (0.0015) -[2025-08-29 18:32:34,029][19393] Updated weights for policy 0, policy_version 430 (0.0012) -[2025-08-29 18:32:34,061][15827] Fps is (10 sec: 14745.2, 60 sec: 12083.2, 300 sec: 11363.1). Total num frames: 1761280. Throughput: 0: 2970.0. Samples: 436492. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:32:34,062][15827] Avg episode reward: [(0, '4.446')] -[2025-08-29 18:32:36,789][19393] Updated weights for policy 0, policy_version 440 (0.0012) -[2025-08-29 18:32:39,060][15827] Fps is (10 sec: 14336.0, 60 sec: 12288.1, 300 sec: 11443.2). Total num frames: 1830912. Throughput: 0: 3096.1. Samples: 458432. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:32:39,062][15827] Avg episode reward: [(0, '4.408')] -[2025-08-29 18:32:39,566][19393] Updated weights for policy 0, policy_version 450 (0.0011) -[2025-08-29 18:32:44,061][15827] Fps is (10 sec: 9420.8, 60 sec: 11605.3, 300 sec: 11245.4). Total num frames: 1855488. Throughput: 0: 3051.8. Samples: 465406. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:32:44,062][15827] Avg episode reward: [(0, '4.473')] -[2025-08-29 18:32:45,723][19393] Updated weights for policy 0, policy_version 460 (0.0014) -[2025-08-29 18:32:48,648][19393] Updated weights for policy 0, policy_version 470 (0.0013) -[2025-08-29 18:32:49,061][15827] Fps is (10 sec: 9830.3, 60 sec: 11810.2, 300 sec: 11348.3). Total num frames: 1929216. Throughput: 0: 2949.0. Samples: 478998.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:32:49,062][15827] Avg episode reward: [(0, '4.338')] -[2025-08-29 18:32:51,478][19393] Updated weights for policy 0, policy_version 480 (0.0011) -[2025-08-29 18:32:54,060][15827] Fps is (10 sec: 14745.9, 60 sec: 12151.5, 300 sec: 11445.4). Total num frames: 2002944. Throughput: 0: 3265.2. Samples: 500894. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:32:54,061][15827] Avg episode reward: [(0, '4.232')] -[2025-08-29 18:32:54,187][19393] Updated weights for policy 0, policy_version 490 (0.0012) -[2025-08-29 18:32:56,797][19393] Updated weights for policy 0, policy_version 500 (0.0012) -[2025-08-29 18:32:59,060][15827] Fps is (10 sec: 14745.7, 60 sec: 12424.5, 300 sec: 11537.1). Total num frames: 2076672. Throughput: 0: 3284.2. Samples: 512436. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:32:59,062][15827] Avg episode reward: [(0, '4.215')] -[2025-08-29 18:32:59,834][19393] Updated weights for policy 0, policy_version 510 (0.0016) -[2025-08-29 18:33:02,533][19393] Updated weights for policy 0, policy_version 520 (0.0013) -[2025-08-29 18:33:04,060][15827] Fps is (10 sec: 14745.3, 60 sec: 12561.0, 300 sec: 11623.8). Total num frames: 2150400. Throughput: 0: 3324.4. Samples: 534192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:33:04,062][15827] Avg episode reward: [(0, '4.277')] -[2025-08-29 18:33:05,157][19393] Updated weights for policy 0, policy_version 530 (0.0012) -[2025-08-29 18:33:07,828][19393] Updated weights for policy 0, policy_version 540 (0.0013) -[2025-08-29 18:33:09,061][15827] Fps is (10 sec: 15155.0, 60 sec: 13516.8, 300 sec: 11727.5). Total num frames: 2228224. Throughput: 0: 3421.3. Samples: 557014. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:33:09,062][15827] Avg episode reward: [(0, '4.432')] -[2025-08-29 18:33:10,641][19393] Updated weights for policy 0, policy_version 550 (0.0014) -[2025-08-29 18:33:13,388][19393] Updated weights for policy 0, policy_version 560 (0.0013) -[2025-08-29 18:33:14,061][15827] Fps is (10 sec: 15155.3, 60 sec: 13585.1, 300 sec: 11804.9). Total num frames: 2301952. Throughput: 0: 3416.9. Samples: 567838. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:33:14,062][15827] Avg episode reward: [(0, '4.490')] -[2025-08-29 18:33:15,964][19393] Updated weights for policy 0, policy_version 570 (0.0012) -[2025-08-29 18:33:19,754][15827] Fps is (10 sec: 9958.6, 60 sec: 12957.3, 300 sec: 11633.2). Total num frames: 2334720. Throughput: 0: 3121.1. Samples: 579108. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:33:19,756][15827] Avg episode reward: [(0, '4.451')] -[2025-08-29 18:33:22,521][19393] Updated weights for policy 0, policy_version 580 (0.0013) -[2025-08-29 18:33:24,060][15827] Fps is (10 sec: 9420.8, 60 sec: 13038.9, 300 sec: 11688.6). Total num frames: 2396160. Throughput: 0: 3122.8. Samples: 598960. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:33:24,062][15827] Avg episode reward: [(0, '4.347')] -[2025-08-29 18:33:25,366][19393] Updated weights for policy 0, policy_version 590 (0.0015) -[2025-08-29 18:33:28,319][19393] Updated weights for policy 0, policy_version 600 (0.0011) -[2025-08-29 18:33:29,060][15827] Fps is (10 sec: 14084.8, 60 sec: 12970.7, 300 sec: 11741.9). Total num frames: 2465792. Throughput: 0: 3208.0. Samples: 609766. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:33:29,062][15827] Avg episode reward: [(0, '4.327')] -[2025-08-29 18:33:31,220][19393] Updated weights for policy 0, policy_version 610 (0.0013) -[2025-08-29 18:33:34,061][15827] Fps is (10 sec: 13926.2, 60 sec: 12902.4, 300 sec: 11792.7). Total num frames: 2535424. Throughput: 0: 3376.4. Samples: 630938. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0) -[2025-08-29 18:33:34,062][15827] Avg episode reward: [(0, '4.548')] -[2025-08-29 18:33:34,225][19393] Updated weights for policy 0, policy_version 620 (0.0015) -[2025-08-29 18:33:37,207][19393] Updated weights for policy 0, policy_version 630 (0.0010) -[2025-08-29 18:33:39,061][15827] Fps is (10 sec: 13926.2, 60 sec: 12902.4, 300 sec: 11841.2). Total num frames: 2605056. Throughput: 0: 3346.4. Samples: 651482. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:33:39,062][15827] Avg episode reward: [(0, '4.336')] -[2025-08-29 18:33:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000636_2605056.pth... -[2025-08-29 18:33:39,158][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000636_2605056.pth -[2025-08-29 18:33:40,079][19393] Updated weights for policy 0, policy_version 640 (0.0011) -[2025-08-29 18:33:42,924][19393] Updated weights for policy 0, policy_version 650 (0.0015) -[2025-08-29 18:33:44,060][15827] Fps is (10 sec: 14336.4, 60 sec: 13721.7, 300 sec: 11905.7). Total num frames: 2678784. Throughput: 0: 3324.5. Samples: 662040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:33:44,062][15827] Avg episode reward: [(0, '4.435')] -[2025-08-29 18:33:45,827][19393] Updated weights for policy 0, policy_version 660 (0.0012) -[2025-08-29 18:33:48,557][19393] Updated weights for policy 0, policy_version 670 (0.0012) -[2025-08-29 18:33:49,061][15827] Fps is (10 sec: 14745.6, 60 sec: 13721.6, 300 sec: 11967.4). Total num frames: 2752512. Throughput: 0: 3326.3. Samples: 683876. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:33:49,062][15827] Avg episode reward: [(0, '4.448')] -[2025-08-29 18:33:51,476][19393] Updated weights for policy 0, policy_version 680 (0.0013) -[2025-08-29 18:33:55,593][15827] Fps is (10 sec: 9944.4, 60 sec: 12847.2, 300 sec: 11810.1). Total num frames: 2793472. Throughput: 0: 2955.4. Samples: 694536. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:33:55,595][15827] Avg episode reward: [(0, '4.359')] -[2025-08-29 18:33:57,929][19393] Updated weights for policy 0, policy_version 690 (0.0013) -[2025-08-29 18:33:59,061][15827] Fps is (10 sec: 9011.2, 60 sec: 12765.8, 300 sec: 11844.3). Total num frames: 2842624. Throughput: 0: 3012.2. Samples: 703388. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:33:59,062][15827] Avg episode reward: [(0, '4.307')] -[2025-08-29 18:34:00,938][19393] Updated weights for policy 0, policy_version 700 (0.0018) -[2025-08-29 18:34:03,902][19393] Updated weights for policy 0, policy_version 710 (0.0014) -[2025-08-29 18:34:04,061][15827] Fps is (10 sec: 13544.8, 60 sec: 12629.3, 300 sec: 11870.0). Total num frames: 2908160. Throughput: 0: 3269.8. Samples: 723980. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:34:04,062][15827] Avg episode reward: [(0, '4.493')] -[2025-08-29 18:34:06,430][19393] Updated weights for policy 0, policy_version 720 (0.0014) -[2025-08-29 18:34:09,061][15827] Fps is (10 sec: 14335.4, 60 sec: 12629.2, 300 sec: 11943.9). Total num frames: 2985984. Throughput: 0: 3286.6. Samples: 746858. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:34:09,063][15827] Avg episode reward: [(0, '4.283')] -[2025-08-29 18:34:09,111][19393] Updated weights for policy 0, policy_version 730 (0.0014) -[2025-08-29 18:34:11,959][19393] Updated weights for policy 0, policy_version 740 (0.0013) -[2025-08-29 18:34:14,060][15827] Fps is (10 sec: 15155.5, 60 sec: 12629.4, 300 sec: 11998.9). Total num frames: 3059712. Throughput: 0: 3284.6. Samples: 757574. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:34:14,061][15827] Avg episode reward: [(0, '4.420')] -[2025-08-29 18:34:14,928][19393] Updated weights for policy 0, policy_version 750 (0.0018) -[2025-08-29 18:34:17,832][19393] Updated weights for policy 0, policy_version 760 (0.0011) -[2025-08-29 18:34:19,060][15827] Fps is (10 sec: 14336.8, 60 sec: 13398.7, 300 sec: 12035.9). Total num frames: 3129344. Throughput: 0: 3283.7. Samples: 778706. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:34:19,062][15827] Avg episode reward: [(0, '4.333')] -[2025-08-29 18:34:20,738][19393] Updated weights for policy 0, policy_version 770 (0.0014) -[2025-08-29 18:34:23,466][19393] Updated weights for policy 0, policy_version 780 (0.0011) -[2025-08-29 18:34:24,060][15827] Fps is (10 sec: 13926.3, 60 sec: 13380.3, 300 sec: 12071.6). Total num frames: 3198976. Throughput: 0: 3303.5. Samples: 800140. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0) -[2025-08-29 18:34:24,063][15827] Avg episode reward: [(0, '4.588')] -[2025-08-29 18:34:26,562][19393] Updated weights for policy 0, policy_version 790 (0.0015) -[2025-08-29 18:34:31,421][15827] Fps is (10 sec: 10272.9, 60 sec: 12676.8, 300 sec: 11955.9). Total num frames: 3256320. Throughput: 0: 3131.4. Samples: 810346. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:34:31,422][15827] Avg episode reward: [(0, '4.347')] -[2025-08-29 18:34:32,725][19393] Updated weights for policy 0, policy_version 800 (0.0011) -[2025-08-29 18:34:34,060][15827] Fps is (10 sec: 9830.4, 60 sec: 12697.6, 300 sec: 11990.1). Total num frames: 3297280. Throughput: 0: 3020.0. Samples: 819778. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:34:34,062][15827] Avg episode reward: [(0, '4.659')] -[2025-08-29 18:34:35,258][19393] Updated weights for policy 0, policy_version 810 (0.0012) -[2025-08-29 18:34:38,139][19393] Updated weights for policy 0, policy_version 820 (0.0012) -[2025-08-29 18:34:39,060][15827] Fps is (10 sec: 15012.0, 60 sec: 12765.9, 300 sec: 12039.3). Total num frames: 3371008. Throughput: 0: 3399.0. Samples: 842282. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:34:39,062][15827] Avg episode reward: [(0, '4.296')] -[2025-08-29 18:34:40,885][19393] Updated weights for policy 0, policy_version 830 (0.0011) -[2025-08-29 18:34:44,061][15827] Fps is (10 sec: 13926.4, 60 sec: 12629.3, 300 sec: 12058.1). Total num frames: 3436544. Throughput: 0: 3338.2. Samples: 853606. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:34:44,063][15827] Avg episode reward: [(0, '4.363')] -[2025-08-29 18:34:44,105][19393] Updated weights for policy 0, policy_version 840 (0.0017) -[2025-08-29 18:34:47,384][19393] Updated weights for policy 0, policy_version 850 (0.0018) -[2025-08-29 18:34:49,060][15827] Fps is (10 sec: 13107.3, 60 sec: 12492.8, 300 sec: 12076.1). Total num frames: 3502080. Throughput: 0: 3291.1. Samples: 872080. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:34:49,061][15827] Avg episode reward: [(0, '4.441')] -[2025-08-29 18:34:50,069][19393] Updated weights for policy 0, policy_version 860 (0.0013) -[2025-08-29 18:34:52,987][19393] Updated weights for policy 0, policy_version 870 (0.0012) -[2025-08-29 18:34:54,060][15827] Fps is (10 sec: 13926.6, 60 sec: 13380.8, 300 sec: 12121.4). Total num frames: 3575808. Throughput: 0: 3274.7. Samples: 894218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:34:54,062][15827] Avg episode reward: [(0, '4.484')] -[2025-08-29 18:34:55,909][19393] Updated weights for policy 0, policy_version 880 (0.0014) -[2025-08-29 18:34:58,925][19393] Updated weights for policy 0, policy_version 890 (0.0013) -[2025-08-29 18:34:59,061][15827] Fps is (10 sec: 14335.5, 60 sec: 13380.2, 300 sec: 12357.7). Total num frames: 3645440. Throughput: 0: 3270.4. Samples: 904744. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:34:59,062][15827] Avg episode reward: [(0, '4.500')] -[2025-08-29 18:35:01,812][19393] Updated weights for policy 0, policy_version 900 (0.0014) -[2025-08-29 18:35:07,255][15827] Fps is (10 sec: 10244.2, 60 sec: 12703.9, 300 sec: 12445.1). Total num frames: 3710976. Throughput: 0: 3043.5. Samples: 925384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:35:07,256][15827] Avg episode reward: [(0, '4.466')] -[2025-08-29 18:35:08,389][19393] Updated weights for policy 0, policy_version 910 (0.0012) -[2025-08-29 18:35:09,060][15827] Fps is (10 sec: 9011.4, 60 sec: 12492.9, 300 sec: 12649.0). Total num frames: 3735552. Throughput: 0: 2973.4. Samples: 933944. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:35:09,062][15827] Avg episode reward: [(0, '4.405')] -[2025-08-29 18:35:11,079][19393] Updated weights for policy 0, policy_version 920 (0.0012) -[2025-08-29 18:35:13,889][19393] Updated weights for policy 0, policy_version 930 (0.0013) -[2025-08-29 18:35:14,060][15827] Fps is (10 sec: 14444.8, 60 sec: 12492.8, 300 sec: 12813.0). Total num frames: 3809280. Throughput: 0: 3157.9. Samples: 944996. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:35:14,061][15827] Avg episode reward: [(0, '4.231')] -[2025-08-29 18:35:16,866][19393] Updated weights for policy 0, policy_version 940 (0.0020) -[2025-08-29 18:35:19,061][15827] Fps is (10 sec: 14335.6, 60 sec: 12492.7, 300 sec: 12815.6). Total num frames: 3878912. Throughput: 0: 3246.3. Samples: 965862. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:35:19,063][15827] Avg episode reward: [(0, '4.572')] -[2025-08-29 18:35:20,011][19393] Updated weights for policy 0, policy_version 950 (0.0020) -[2025-08-29 18:35:23,158][19393] Updated weights for policy 0, policy_version 960 (0.0015) -[2025-08-29 18:35:24,060][15827] Fps is (10 sec: 13516.7, 60 sec: 12424.5, 300 sec: 12787.8). Total num frames: 3944448. Throughput: 0: 3191.1. Samples: 985882. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:35:24,062][15827] Avg episode reward: [(0, '4.458')] -[2025-08-29 18:35:26,073][19393] Updated weights for policy 0, policy_version 970 (0.0013) -[2025-08-29 18:35:29,061][15827] Fps is (10 sec: 13107.2, 60 sec: 13075.4, 300 sec: 12746.2). Total num frames: 4009984. Throughput: 0: 3165.2. Samples: 996040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:35:29,062][15827] Avg episode reward: [(0, '4.519')] -[2025-08-29 18:35:29,270][19393] Updated weights for policy 0, policy_version 980 (0.0013) -[2025-08-29 18:35:32,252][19393] Updated weights for policy 0, policy_version 990 (0.0015) -[2025-08-29 18:35:34,060][15827] Fps is (10 sec: 13516.9, 60 sec: 13039.0, 300 sec: 12718.4). Total num frames: 4079616. Throughput: 0: 3202.5. Samples: 1016192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:35:34,062][15827] Avg episode reward: [(0, '4.477')] -[2025-08-29 18:35:35,282][19393] Updated weights for policy 0, policy_version 1000 (0.0013) -[2025-08-29 18:35:38,308][19393] Updated weights for policy 0, policy_version 1010 (0.0016) -[2025-08-29 18:35:39,060][15827] Fps is (10 sec: 13517.3, 60 sec: 12902.4, 300 sec: 12676.8). Total num frames: 4145152. Throughput: 0: 3161.4. Samples: 1036482. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:35:39,062][15827] Avg episode reward: [(0, '4.567')] -[2025-08-29 18:35:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001012_4145152.pth... -[2025-08-29 18:35:39,156][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth -[2025-08-29 18:35:44,060][15827] Fps is (10 sec: 8601.4, 60 sec: 12151.5, 300 sec: 12482.4). Total num frames: 4165632. Throughput: 0: 3001.5. Samples: 1039810. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:35:44,061][15827] Avg episode reward: [(0, '4.398')] -[2025-08-29 18:35:44,933][19393] Updated weights for policy 0, policy_version 1020 (0.0012) -[2025-08-29 18:35:48,016][19393] Updated weights for policy 0, policy_version 1030 (0.0017) -[2025-08-29 18:35:49,060][15827] Fps is (10 sec: 9011.1, 60 sec: 12219.7, 300 sec: 12625.3). Total num frames: 4235264. Throughput: 0: 3101.0. Samples: 1055024. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:35:49,062][15827] Avg episode reward: [(0, '4.314')] -[2025-08-29 18:35:51,204][19393] Updated weights for policy 0, policy_version 1040 (0.0019) -[2025-08-29 18:35:54,061][15827] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 12593.5). Total num frames: 4296704. Throughput: 0: 3134.6. Samples: 1075000. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:35:54,062][15827] Avg episode reward: [(0, '4.534')] -[2025-08-29 18:35:54,120][19393] Updated weights for policy 0, policy_version 1050 (0.0015) -[2025-08-29 18:35:56,849][19393] Updated weights for policy 0, policy_version 1060 (0.0011) -[2025-08-29 18:35:59,060][15827] Fps is (10 sec: 13516.9, 60 sec: 12083.3, 300 sec: 12593.5). Total num frames: 4370432. Throughput: 0: 3128.8. Samples: 1085794. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:35:59,062][15827] Avg episode reward: [(0, '4.444')] -[2025-08-29 18:35:59,762][19393] Updated weights for policy 0, policy_version 1070 (0.0012) -[2025-08-29 18:36:02,615][19393] Updated weights for policy 0, policy_version 1080 (0.0014) -[2025-08-29 18:36:04,060][15827] Fps is (10 sec: 14336.2, 60 sec: 12834.8, 300 sec: 12565.7). Total num frames: 4440064. Throughput: 0: 3140.2. Samples: 1107172. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:36:04,061][15827] Avg episode reward: [(0, '4.585')] -[2025-08-29 18:36:05,699][19393] Updated weights for policy 0, policy_version 1090 (0.0012) -[2025-08-29 18:36:08,570][19393] Updated weights for policy 0, policy_version 1100 (0.0017) -[2025-08-29 18:36:09,061][15827] Fps is (10 sec: 13926.1, 60 sec: 12902.4, 300 sec: 12524.0). Total num frames: 4509696. Throughput: 0: 3150.2. Samples: 1127640. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:36:09,062][15827] Avg episode reward: [(0, '4.381')] -[2025-08-29 18:36:11,395][19393] Updated weights for policy 0, policy_version 1110 (0.0012) -[2025-08-29 18:36:14,060][15827] Fps is (10 sec: 14336.2, 60 sec: 12902.4, 300 sec: 12537.9). Total num frames: 4583424. Throughput: 0: 3165.9. Samples: 1138506. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:36:14,061][15827] Avg episode reward: [(0, '4.365')] -[2025-08-29 18:36:14,417][19393] Updated weights for policy 0, policy_version 1120 (0.0015) -[2025-08-29 18:36:19,060][15827] Fps is (10 sec: 9421.0, 60 sec: 12083.3, 300 sec: 12399.1). Total num frames: 4603904. Throughput: 0: 3023.2. Samples: 1152234. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:36:19,062][15827] Avg episode reward: [(0, '4.568')] -[2025-08-29 18:36:20,747][19393] Updated weights for policy 0, policy_version 1130 (0.0014) -[2025-08-29 18:36:23,772][19393] Updated weights for policy 0, policy_version 1140 (0.0016) -[2025-08-29 18:36:24,060][15827] Fps is (10 sec: 8601.6, 60 sec: 12083.2, 300 sec: 12534.8). Total num frames: 4669440. Throughput: 0: 2916.0. Samples: 1167702. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:36:24,062][15827] Avg episode reward: [(0, '4.653')] -[2025-08-29 18:36:26,867][19393] Updated weights for policy 0, policy_version 1150 (0.0012) -[2025-08-29 18:36:29,061][15827] Fps is (10 sec: 13516.5, 60 sec: 12151.5, 300 sec: 12551.8). Total num frames: 4739072. Throughput: 0: 3064.8. Samples: 1177728. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:36:29,062][15827] Avg episode reward: [(0, '4.427')] -[2025-08-29 18:36:29,966][19393] Updated weights for policy 0, policy_version 1160 (0.0013) -[2025-08-29 18:36:32,991][19393] Updated weights for policy 0, policy_version 1170 (0.0017) -[2025-08-29 18:36:34,060][15827] Fps is (10 sec: 13926.3, 60 sec: 12151.4, 300 sec: 12593.5). Total num frames: 4808704. Throughput: 0: 3178.8. Samples: 1198070. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) -[2025-08-29 18:36:34,062][15827] Avg episode reward: [(0, '4.450')] -[2025-08-29 18:36:35,678][19393] Updated weights for policy 0, policy_version 1180 (0.0012) -[2025-08-29 18:36:38,608][19393] Updated weights for policy 0, policy_version 1190 (0.0014) -[2025-08-29 18:36:39,060][15827] Fps is (10 sec: 13926.7, 60 sec: 12219.7, 300 sec: 12607.4). Total num frames: 4878336. Throughput: 0: 3221.2. Samples: 1219954. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0) -[2025-08-29 18:36:39,061][15827] Avg episode reward: [(0, '4.421')] -[2025-08-29 18:36:41,488][19393] Updated weights for policy 0, policy_version 1200 (0.0016) -[2025-08-29 18:36:44,060][15827] Fps is (10 sec: 13926.5, 60 sec: 13039.0, 300 sec: 12635.1). Total num frames: 4947968. Throughput: 0: 3213.4. Samples: 1230398. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:36:44,062][15827] Avg episode reward: [(0, '4.472')] -[2025-08-29 18:36:44,599][19393] Updated weights for policy 0, policy_version 1210 (0.0014) -[2025-08-29 18:36:47,468][19393] Updated weights for policy 0, policy_version 1220 (0.0011) -[2025-08-29 18:36:49,060][15827] Fps is (10 sec: 13926.3, 60 sec: 13038.9, 300 sec: 12690.7). Total num frames: 5017600. Throughput: 0: 3196.0. Samples: 1250994. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:36:49,061][15827] Avg episode reward: [(0, '4.491')] -[2025-08-29 18:36:50,308][19393] Updated weights for policy 0, policy_version 1230 (0.0016) -[2025-08-29 18:36:54,753][15827] Fps is (10 sec: 9576.9, 60 sec: 12417.8, 300 sec: 12577.8). Total num frames: 5050368. Throughput: 0: 2936.8. Samples: 1261830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:36:54,754][15827] Avg episode reward: [(0, '4.628')] -[2025-08-29 18:36:57,027][19393] Updated weights for policy 0, policy_version 1240 (0.0015) -[2025-08-29 18:36:59,060][15827] Fps is (10 sec: 9011.2, 60 sec: 12288.0, 300 sec: 12579.6). Total num frames: 5107712. Throughput: 0: 2906.6. Samples: 1269302. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:36:59,062][15827] Avg episode reward: [(0, '4.577')] -[2025-08-29 18:36:59,677][19393] Updated weights for policy 0, policy_version 1250 (0.0012) -[2025-08-29 18:37:02,657][19393] Updated weights for policy 0, policy_version 1260 (0.0017) -[2025-08-29 18:37:04,060][15827] Fps is (10 sec: 13642.2, 60 sec: 12288.0, 300 sec: 12746.2). Total num frames: 5177344. Throughput: 0: 3081.1. Samples: 1290884. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:37:04,061][15827] Avg episode reward: [(0, '4.664')] -[2025-08-29 18:37:05,470][19393] Updated weights for policy 0, policy_version 1270 (0.0012) -[2025-08-29 18:37:08,304][19393] Updated weights for policy 0, policy_version 1280 (0.0013) -[2025-08-29 18:37:09,060][15827] Fps is (10 sec: 14336.1, 60 sec: 12356.3, 300 sec: 12760.1). Total num frames: 5251072. Throughput: 0: 3234.0. Samples: 1313230. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:37:09,062][15827] Avg episode reward: [(0, '4.490')] -[2025-08-29 18:37:10,948][19393] Updated weights for policy 0, policy_version 1290 (0.0010) -[2025-08-29 18:37:13,654][19393] Updated weights for policy 0, policy_version 1300 (0.0010) -[2025-08-29 18:37:14,061][15827] Fps is (10 sec: 15154.8, 60 sec: 12424.5, 300 sec: 12815.6). Total num frames: 5328896. Throughput: 0: 3258.4. Samples: 1324354. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:37:14,062][15827] Avg episode reward: [(0, '4.394')] -[2025-08-29 18:37:16,288][19393] Updated weights for policy 0, policy_version 1310 (0.0014) -[2025-08-29 18:37:19,061][15827] Fps is (10 sec: 15154.9, 60 sec: 13312.0, 300 sec: 12843.4). Total num frames: 5402624. Throughput: 0: 3311.0. Samples: 1347066. 
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) -[2025-08-29 18:37:19,062][15827] Avg episode reward: [(0, '4.494')] -[2025-08-29 18:37:19,244][19393] Updated weights for policy 0, policy_version 1320 (0.0013) -[2025-08-29 18:37:22,009][19393] Updated weights for policy 0, policy_version 1330 (0.0013) -[2025-08-29 18:37:24,060][15827] Fps is (10 sec: 14746.0, 60 sec: 13448.5, 300 sec: 12843.4). Total num frames: 5476352. Throughput: 0: 3312.5. Samples: 1369018. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:37:24,062][15827] Avg episode reward: [(0, '4.358')] -[2025-08-29 18:37:24,702][19393] Updated weights for policy 0, policy_version 1340 (0.0011) -[2025-08-29 18:37:30,587][15827] Fps is (10 sec: 10305.1, 60 sec: 12715.4, 300 sec: 12680.6). Total num frames: 5521408. Throughput: 0: 3230.2. Samples: 1380688. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:37:30,589][15827] Avg episode reward: [(0, '4.231')] -[2025-08-29 18:37:30,972][19393] Updated weights for policy 0, policy_version 1350 (0.0014) -[2025-08-29 18:37:33,597][19393] Updated weights for policy 0, policy_version 1360 (0.0014) -[2025-08-29 18:37:34,061][15827] Fps is (10 sec: 9830.3, 60 sec: 12765.9, 300 sec: 12690.7). Total num frames: 5574656. Throughput: 0: 3082.6. Samples: 1389710. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:37:34,062][15827] Avg episode reward: [(0, '4.357')] -[2025-08-29 18:37:36,344][19393] Updated weights for policy 0, policy_version 1370 (0.0012) -[2025-08-29 18:37:39,060][15827] Fps is (10 sec: 14985.5, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 5648384. Throughput: 0: 3395.0. Samples: 1412256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0) -[2025-08-29 18:37:39,062][15827] Avg episode reward: [(0, '4.262')] -[2025-08-29 18:37:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001379_5648384.pth... -[2025-08-29 18:37:39,151][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth -[2025-08-29 18:37:39,208][19393] Updated weights for policy 0, policy_version 1380 (0.0015) -[2025-08-29 18:37:41,950][19393] Updated weights for policy 0, policy_version 1390 (0.0012) -[2025-08-29 18:37:44,060][15827] Fps is (10 sec: 15155.4, 60 sec: 12970.7, 300 sec: 12871.2). Total num frames: 5726208. Throughput: 0: 3424.2. Samples: 1423390. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:37:44,062][15827] Avg episode reward: [(0, '4.601')] -[2025-08-29 18:37:44,659][19393] Updated weights for policy 0, policy_version 1400 (0.0010) -[2025-08-29 18:37:47,416][19393] Updated weights for policy 0, policy_version 1410 (0.0013) -[2025-08-29 18:37:49,060][15827] Fps is (10 sec: 15155.4, 60 sec: 13039.0, 300 sec: 12871.2). Total num frames: 5799936. Throughput: 0: 3440.7. Samples: 1445716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:37:49,061][15827] Avg episode reward: [(0, '4.228')] -[2025-08-29 18:37:50,025][19393] Updated weights for policy 0, policy_version 1420 (0.0010) -[2025-08-29 18:37:52,751][19393] Updated weights for policy 0, policy_version 1430 (0.0015) -[2025-08-29 18:37:54,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13881.8, 300 sec: 12871.2). Total num frames: 5873664. Throughput: 0: 3447.0. Samples: 1468346. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:37:54,061][15827] Avg episode reward: [(0, '4.244')] -[2025-08-29 18:37:55,529][19393] Updated weights for policy 0, policy_version 1440 (0.0013) -[2025-08-29 18:37:58,103][19393] Updated weights for policy 0, policy_version 1450 (0.0011) -[2025-08-29 18:37:59,060][15827] Fps is (10 sec: 15155.2, 60 sec: 14062.9, 300 sec: 12885.0). Total num frames: 5951488. Throughput: 0: 3453.8. Samples: 1479774. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:37:59,063][15827] Avg episode reward: [(0, '4.463')] -[2025-08-29 18:38:00,575][19393] Updated weights for policy 0, policy_version 1460 (0.0011) -[2025-08-29 18:38:06,419][15827] Fps is (10 sec: 11268.2, 60 sec: 13399.6, 300 sec: 12727.7). Total num frames: 6012928. Throughput: 0: 3304.1. Samples: 1503544. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:38:06,420][15827] Avg episode reward: [(0, '4.535')] -[2025-08-29 18:38:06,830][19393] Updated weights for policy 0, policy_version 1470 (0.0013) -[2025-08-29 18:38:09,061][15827] Fps is (10 sec: 9830.3, 60 sec: 13312.0, 300 sec: 12704.5). Total num frames: 6049792. Throughput: 0: 3189.7. Samples: 1512556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:38:09,062][15827] Avg episode reward: [(0, '4.430')] -[2025-08-29 18:38:09,648][19393] Updated weights for policy 0, policy_version 1480 (0.0011) -[2025-08-29 18:38:12,429][19393] Updated weights for policy 0, policy_version 1490 (0.0013) -[2025-08-29 18:38:14,060][15827] Fps is (10 sec: 14473.3, 60 sec: 13243.8, 300 sec: 12873.7). Total num frames: 6123520. Throughput: 0: 3289.1. Samples: 1523676. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:38:14,062][15827] Avg episode reward: [(0, '4.466')] -[2025-08-29 18:38:15,246][19393] Updated weights for policy 0, policy_version 1500 (0.0011) -[2025-08-29 18:38:18,018][19393] Updated weights for policy 0, policy_version 1510 (0.0014) -[2025-08-29 18:38:19,061][15827] Fps is (10 sec: 14745.6, 60 sec: 13243.7, 300 sec: 12885.0). Total num frames: 6197248. Throughput: 0: 3466.3. Samples: 1545692. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:38:19,062][15827] Avg episode reward: [(0, '4.297')] -[2025-08-29 18:38:21,147][19393] Updated weights for policy 0, policy_version 1520 (0.0015) -[2025-08-29 18:38:23,892][19393] Updated weights for policy 0, policy_version 1530 (0.0013) -[2025-08-29 18:38:24,061][15827] Fps is (10 sec: 14335.9, 60 sec: 13175.4, 300 sec: 12885.0). Total num frames: 6266880. Throughput: 0: 3437.6. Samples: 1566948. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:38:24,062][15827] Avg episode reward: [(0, '4.254')] -[2025-08-29 18:38:26,781][19393] Updated weights for policy 0, policy_version 1540 (0.0012) -[2025-08-29 18:38:29,060][15827] Fps is (10 sec: 14336.2, 60 sec: 14009.8, 300 sec: 12898.9). Total num frames: 6340608. Throughput: 0: 3422.4. Samples: 1577400. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:38:29,062][15827] Avg episode reward: [(0, '4.214')] -[2025-08-29 18:38:29,406][19393] Updated weights for policy 0, policy_version 1550 (0.0011) -[2025-08-29 18:38:31,940][19393] Updated weights for policy 0, policy_version 1560 (0.0011) -[2025-08-29 18:38:34,060][15827] Fps is (10 sec: 15155.4, 60 sec: 14063.0, 300 sec: 12926.7). Total num frames: 6418432. Throughput: 0: 3447.8. Samples: 1600868. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:38:34,062][15827] Avg episode reward: [(0, '4.310')] -[2025-08-29 18:38:34,610][19393] Updated weights for policy 0, policy_version 1570 (0.0011) -[2025-08-29 18:38:37,374][19393] Updated weights for policy 0, policy_version 1580 (0.0014) -[2025-08-29 18:38:42,252][15827] Fps is (10 sec: 11178.3, 60 sec: 13287.9, 300 sec: 12774.6). Total num frames: 6488064. Throughput: 0: 3225.3. Samples: 1623778. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:38:42,253][15827] Avg episode reward: [(0, '4.598')] -[2025-08-29 18:38:43,594][19393] Updated weights for policy 0, policy_version 1590 (0.0014) -[2025-08-29 18:38:44,061][15827] Fps is (10 sec: 9830.2, 60 sec: 13175.4, 300 sec: 12760.1). Total num frames: 6516736. Throughput: 0: 3213.8. Samples: 1624394. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:38:44,062][15827] Avg episode reward: [(0, '4.438')] -[2025-08-29 18:38:46,404][19393] Updated weights for policy 0, policy_version 1600 (0.0014) -[2025-08-29 18:38:49,061][15827] Fps is (10 sec: 15039.2, 60 sec: 13175.4, 300 sec: 12938.4). Total num frames: 6590464. Throughput: 0: 3292.2. Samples: 1643928. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:38:49,062][15827] Avg episode reward: [(0, '4.193')] -[2025-08-29 18:38:49,205][19393] Updated weights for policy 0, policy_version 1610 (0.0013) -[2025-08-29 18:38:51,992][19393] Updated weights for policy 0, policy_version 1620 (0.0012) -[2025-08-29 18:38:54,061][15827] Fps is (10 sec: 14745.4, 60 sec: 13175.4, 300 sec: 12954.5). Total num frames: 6664192. Throughput: 0: 3405.6. Samples: 1665810. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:38:54,062][15827] Avg episode reward: [(0, '4.403')] -[2025-08-29 18:38:55,249][19393] Updated weights for policy 0, policy_version 1630 (0.0016) -[2025-08-29 18:38:57,991][19393] Updated weights for policy 0, policy_version 1640 (0.0013) -[2025-08-29 18:38:59,060][15827] Fps is (10 sec: 13926.5, 60 sec: 12970.7, 300 sec: 12954.5). Total num frames: 6729728. Throughput: 0: 3363.2. Samples: 1675020. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:38:59,062][15827] Avg episode reward: [(0, '4.409')] -[2025-08-29 18:39:00,588][19393] Updated weights for policy 0, policy_version 1650 (0.0013) -[2025-08-29 18:39:03,119][19393] Updated weights for policy 0, policy_version 1660 (0.0013) -[2025-08-29 18:39:04,060][15827] Fps is (10 sec: 14746.0, 60 sec: 13856.8, 300 sec: 12968.4). Total num frames: 6811648. Throughput: 0: 3399.8. Samples: 1698682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:39:04,062][15827] Avg episode reward: [(0, '4.439')] -[2025-08-29 18:39:05,876][19393] Updated weights for policy 0, policy_version 1670 (0.0011) -[2025-08-29 18:39:08,596][19393] Updated weights for policy 0, policy_version 1680 (0.0011) -[2025-08-29 18:39:09,060][15827] Fps is (10 sec: 15564.7, 60 sec: 13926.4, 300 sec: 12968.3). Total num frames: 6885376. Throughput: 0: 3434.9. Samples: 1721518. Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0) -[2025-08-29 18:39:09,063][15827] Avg episode reward: [(0, '4.379')] -[2025-08-29 18:39:11,146][19393] Updated weights for policy 0, policy_version 1690 (0.0016) -[2025-08-29 18:39:13,739][19393] Updated weights for policy 0, policy_version 1700 (0.0013) -[2025-08-29 18:39:14,060][15827] Fps is (10 sec: 15564.8, 60 sec: 14062.9, 300 sec: 13010.0). Total num frames: 6967296. Throughput: 0: 3465.6. Samples: 1733354. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:39:14,061][15827] Avg episode reward: [(0, '4.431')] -[2025-08-29 18:39:19,060][15827] Fps is (10 sec: 10240.0, 60 sec: 13175.5, 300 sec: 12843.4). Total num frames: 6987776. Throughput: 0: 3204.9. Samples: 1745090. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:39:19,062][15827] Avg episode reward: [(0, '4.286')] -[2025-08-29 18:39:20,029][19393] Updated weights for policy 0, policy_version 1710 (0.0016) -[2025-08-29 18:39:22,956][19393] Updated weights for policy 0, policy_version 1720 (0.0011) -[2025-08-29 18:39:24,060][15827] Fps is (10 sec: 9420.9, 60 sec: 13243.8, 300 sec: 13003.0). Total num frames: 7061504. Throughput: 0: 3378.6. Samples: 1765034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:39:24,062][15827] Avg episode reward: [(0, '4.454')] -[2025-08-29 18:39:25,732][19393] Updated weights for policy 0, policy_version 1730 (0.0016) -[2025-08-29 18:39:28,313][19393] Updated weights for policy 0, policy_version 1740 (0.0010) -[2025-08-29 18:39:29,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13243.7, 300 sec: 13010.0). Total num frames: 7135232. Throughput: 0: 3373.5. Samples: 1776200. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:39:29,061][15827] Avg episode reward: [(0, '4.445')] -[2025-08-29 18:39:30,908][19393] Updated weights for policy 0, policy_version 1750 (0.0014) -[2025-08-29 18:39:33,452][19393] Updated weights for policy 0, policy_version 1760 (0.0013) -[2025-08-29 18:39:34,060][15827] Fps is (10 sec: 15564.6, 60 sec: 13312.0, 300 sec: 13037.8). Total num frames: 7217152. Throughput: 0: 3464.7. Samples: 1799840. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:39:34,062][15827] Avg episode reward: [(0, '4.437')] -[2025-08-29 18:39:36,168][19393] Updated weights for policy 0, policy_version 1770 (0.0013) -[2025-08-29 18:39:38,901][19393] Updated weights for policy 0, policy_version 1780 (0.0013) -[2025-08-29 18:39:39,060][15827] Fps is (10 sec: 15564.8, 60 sec: 14131.9, 300 sec: 13065.6). Total num frames: 7290880. Throughput: 0: 3480.4. Samples: 1822426. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:39:39,062][15827] Avg episode reward: [(0, '4.503')] -[2025-08-29 18:39:39,066][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001780_7290880.pth... -[2025-08-29 18:39:39,145][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001012_4145152.pth -[2025-08-29 18:39:41,686][19393] Updated weights for policy 0, policy_version 1790 (0.0014) -[2025-08-29 18:39:44,060][15827] Fps is (10 sec: 15155.2, 60 sec: 14199.5, 300 sec: 13107.2). Total num frames: 7368704. Throughput: 0: 3528.0. Samples: 1833780. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:39:44,062][15827] Avg episode reward: [(0, '4.482')] -[2025-08-29 18:39:44,365][19393] Updated weights for policy 0, policy_version 1800 (0.0013) -[2025-08-29 18:39:47,118][19393] Updated weights for policy 0, policy_version 1810 (0.0015) -[2025-08-29 18:39:49,061][15827] Fps is (10 sec: 15154.5, 60 sec: 14199.4, 300 sec: 13107.2). Total num frames: 7442432. Throughput: 0: 3506.5. Samples: 1856474. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:39:49,062][15827] Avg episode reward: [(0, '4.437')] -[2025-08-29 18:39:49,878][19393] Updated weights for policy 0, policy_version 1820 (0.0012) -[2025-08-29 18:39:54,061][15827] Fps is (10 sec: 9420.5, 60 sec: 13312.0, 300 sec: 12940.6). Total num frames: 7462912. Throughput: 0: 3239.0. Samples: 1867276. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:39:54,062][15827] Avg episode reward: [(0, '4.319')] -[2025-08-29 18:39:56,134][19393] Updated weights for policy 0, policy_version 1830 (0.0011) -[2025-08-29 18:39:58,756][19393] Updated weights for policy 0, policy_version 1840 (0.0009) -[2025-08-29 18:39:59,060][15827] Fps is (10 sec: 9830.9, 60 sec: 13516.8, 300 sec: 13124.4). Total num frames: 7540736. Throughput: 0: 3197.8. Samples: 1877254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:39:59,062][15827] Avg episode reward: [(0, '4.545')] -[2025-08-29 18:40:01,493][19393] Updated weights for policy 0, policy_version 1850 (0.0011) -[2025-08-29 18:40:04,061][15827] Fps is (10 sec: 15155.4, 60 sec: 13380.2, 300 sec: 13148.8). Total num frames: 7614464. Throughput: 0: 3441.4. Samples: 1899954. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:40:04,062][15827] Avg episode reward: [(0, '4.372')] -[2025-08-29 18:40:04,128][19393] Updated weights for policy 0, policy_version 1860 (0.0013) -[2025-08-29 18:40:06,872][19393] Updated weights for policy 0, policy_version 1870 (0.0011) -[2025-08-29 18:40:09,061][15827] Fps is (10 sec: 14745.4, 60 sec: 13380.3, 300 sec: 13148.8). Total num frames: 7688192. Throughput: 0: 3493.5. Samples: 1922240. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:40:09,062][15827] Avg episode reward: [(0, '4.241')] -[2025-08-29 18:40:09,616][19393] Updated weights for policy 0, policy_version 1880 (0.0016) -[2025-08-29 18:40:12,468][19393] Updated weights for policy 0, policy_version 1890 (0.0014) -[2025-08-29 18:40:14,060][15827] Fps is (10 sec: 14746.0, 60 sec: 13243.8, 300 sec: 13162.8). Total num frames: 7761920. Throughput: 0: 3489.5. Samples: 1933226. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:40:14,061][15827] Avg episode reward: [(0, '4.266')] -[2025-08-29 18:40:15,129][19393] Updated weights for policy 0, policy_version 1900 (0.0011) -[2025-08-29 18:40:18,613][19393] Updated weights for policy 0, policy_version 1910 (0.0016) -[2025-08-29 18:40:19,060][15827] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 13162.7). Total num frames: 7827456. Throughput: 0: 3437.3. Samples: 1954520. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:40:19,062][15827] Avg episode reward: [(0, '4.325')] -[2025-08-29 18:40:21,932][19393] Updated weights for policy 0, policy_version 1920 (0.0016) -[2025-08-29 18:40:24,060][15827] Fps is (10 sec: 13107.1, 60 sec: 13858.1, 300 sec: 13162.8). Total num frames: 7892992. Throughput: 0: 3355.0. Samples: 1973400. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:40:24,061][15827] Avg episode reward: [(0, '4.410')] -[2025-08-29 18:40:24,796][19393] Updated weights for policy 0, policy_version 1930 (0.0012) -[2025-08-29 18:40:29,753][15827] Fps is (10 sec: 8810.7, 60 sec: 12957.7, 300 sec: 12993.4). Total num frames: 7921664. Throughput: 0: 3055.6. Samples: 1973400. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:40:29,754][15827] Avg episode reward: [(0, '4.596')] -[2025-08-29 18:40:31,233][19393] Updated weights for policy 0, policy_version 1940 (0.0013) -[2025-08-29 18:40:33,952][19393] Updated weights for policy 0, policy_version 1950 (0.0014) -[2025-08-29 18:40:34,061][15827] Fps is (10 sec: 9420.5, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 7987200. Throughput: 0: 3035.4. Samples: 1993068. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:40:34,063][15827] Avg episode reward: [(0, '4.433')] -[2025-08-29 18:40:36,789][19393] Updated weights for policy 0, policy_version 1960 (0.0013) -[2025-08-29 18:40:39,060][15827] Fps is (10 sec: 14522.5, 60 sec: 12765.9, 300 sec: 13190.5). Total num frames: 8056832. Throughput: 0: 3267.9. Samples: 2014330. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:40:39,062][15827] Avg episode reward: [(0, '4.378')] -[2025-08-29 18:40:39,857][19393] Updated weights for policy 0, policy_version 1970 (0.0021) -[2025-08-29 18:40:43,115][19393] Updated weights for policy 0, policy_version 1980 (0.0016) -[2025-08-29 18:40:44,060][15827] Fps is (10 sec: 13517.2, 60 sec: 12561.1, 300 sec: 13176.6). Total num frames: 8122368. Throughput: 0: 3267.5. Samples: 2024290. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:40:44,063][15827] Avg episode reward: [(0, '4.350')] -[2025-08-29 18:40:46,091][19393] Updated weights for policy 0, policy_version 1990 (0.0015) -[2025-08-29 18:40:48,900][19393] Updated weights for policy 0, policy_version 2000 (0.0013) -[2025-08-29 18:40:49,060][15827] Fps is (10 sec: 13516.8, 60 sec: 12492.9, 300 sec: 13204.4). Total num frames: 8192000. Throughput: 0: 3212.1. Samples: 2044496. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:40:49,062][15827] Avg episode reward: [(0, '4.421')] -[2025-08-29 18:40:51,727][19393] Updated weights for policy 0, policy_version 2010 (0.0012) -[2025-08-29 18:40:54,060][15827] Fps is (10 sec: 13926.6, 60 sec: 13312.1, 300 sec: 13190.5). Total num frames: 8261632. Throughput: 0: 3188.8. Samples: 2065734. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:40:54,061][15827] Avg episode reward: [(0, '4.350')] -[2025-08-29 18:40:54,708][19393] Updated weights for policy 0, policy_version 2020 (0.0014) -[2025-08-29 18:40:57,638][19393] Updated weights for policy 0, policy_version 2030 (0.0015) -[2025-08-29 18:40:59,061][15827] Fps is (10 sec: 14335.8, 60 sec: 13243.7, 300 sec: 13204.4). Total num frames: 8335360. Throughput: 0: 3179.0. Samples: 2076280. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:40:59,062][15827] Avg episode reward: [(0, '4.415')] -[2025-08-29 18:41:00,285][19393] Updated weights for policy 0, policy_version 2040 (0.0014) -[2025-08-29 18:41:05,587][15827] Fps is (10 sec: 10305.0, 60 sec: 12449.1, 300 sec: 13053.5). Total num frames: 8380416. Throughput: 0: 2858.7. Samples: 2087528. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:41:05,590][15827] Avg episode reward: [(0, '4.290')] -[2025-08-29 18:41:06,695][19393] Updated weights for policy 0, policy_version 2050 (0.0014) -[2025-08-29 18:41:09,060][15827] Fps is (10 sec: 9420.9, 60 sec: 12356.3, 300 sec: 13037.8). Total num frames: 8429568. Throughput: 0: 2980.9. Samples: 2107542. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:41:09,062][15827] Avg episode reward: [(0, '4.317')] -[2025-08-29 18:41:09,525][19393] Updated weights for policy 0, policy_version 2060 (0.0015) -[2025-08-29 18:41:12,383][19393] Updated weights for policy 0, policy_version 2070 (0.0011) -[2025-08-29 18:41:14,061][15827] Fps is (10 sec: 14502.0, 60 sec: 12356.2, 300 sec: 13218.3). Total num frames: 8503296. Throughput: 0: 3261.1. Samples: 2117892. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:41:14,062][15827] Avg episode reward: [(0, '4.445')] -[2025-08-29 18:41:15,057][19393] Updated weights for policy 0, policy_version 2080 (0.0012) -[2025-08-29 18:41:17,872][19393] Updated weights for policy 0, policy_version 2090 (0.0014) -[2025-08-29 18:41:19,060][15827] Fps is (10 sec: 14336.1, 60 sec: 12424.5, 300 sec: 13232.2). Total num frames: 8572928. Throughput: 0: 3274.7. Samples: 2140430. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:41:19,061][15827] Avg episode reward: [(0, '4.245')] -[2025-08-29 18:41:20,903][19393] Updated weights for policy 0, policy_version 2100 (0.0012) -[2025-08-29 18:41:23,564][19393] Updated weights for policy 0, policy_version 2110 (0.0014) -[2025-08-29 18:41:24,060][15827] Fps is (10 sec: 14336.3, 60 sec: 12561.1, 300 sec: 13246.1). Total num frames: 8646656. Throughput: 0: 3282.9. Samples: 2162062. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:41:24,061][15827] Avg episode reward: [(0, '4.502')] -[2025-08-29 18:41:26,303][19393] Updated weights for policy 0, policy_version 2120 (0.0015) -[2025-08-29 18:41:29,006][19393] Updated weights for policy 0, policy_version 2130 (0.0013) -[2025-08-29 18:41:29,061][15827] Fps is (10 sec: 15154.7, 60 sec: 13536.4, 300 sec: 13273.8). Total num frames: 8724480. Throughput: 0: 3306.2. Samples: 2173072. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:41:29,062][15827] Avg episode reward: [(0, '4.272')] -[2025-08-29 18:41:31,853][19393] Updated weights for policy 0, policy_version 2140 (0.0013) -[2025-08-29 18:41:34,061][15827] Fps is (10 sec: 14335.3, 60 sec: 13380.2, 300 sec: 13259.9). Total num frames: 8790016. Throughput: 0: 3352.7. Samples: 2195370. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:41:34,064][15827] Avg episode reward: [(0, '4.450')] -[2025-08-29 18:41:35,030][19393] Updated weights for policy 0, policy_version 2150 (0.0018) -[2025-08-29 18:41:37,774][19393] Updated weights for policy 0, policy_version 2160 (0.0014) -[2025-08-29 18:41:41,416][15827] Fps is (10 sec: 9945.3, 60 sec: 12677.7, 300 sec: 13113.6). Total num frames: 8847360. Throughput: 0: 2936.2. Samples: 2204780. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:41:41,417][15827] Avg episode reward: [(0, '4.569')] -[2025-08-29 18:41:41,424][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002160_8847360.pth... -[2025-08-29 18:41:41,507][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001379_5648384.pth -[2025-08-29 18:41:44,061][15827] Fps is (10 sec: 9010.8, 60 sec: 12629.2, 300 sec: 13093.3). Total num frames: 8880128. Throughput: 0: 3054.7. Samples: 2213746. 
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:41:44,064][15827] Avg episode reward: [(0, '4.361')] -[2025-08-29 18:41:44,779][19393] Updated weights for policy 0, policy_version 2170 (0.0015) -[2025-08-29 18:41:47,687][19393] Updated weights for policy 0, policy_version 2180 (0.0016) -[2025-08-29 18:41:49,061][15827] Fps is (10 sec: 12859.2, 60 sec: 12560.9, 300 sec: 13235.4). Total num frames: 8945664. Throughput: 0: 3347.3. Samples: 2233046. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:41:49,063][15827] Avg episode reward: [(0, '4.626')] -[2025-08-29 18:41:50,830][19393] Updated weights for policy 0, policy_version 2190 (0.0013) -[2025-08-29 18:41:53,928][19393] Updated weights for policy 0, policy_version 2200 (0.0016) -[2025-08-29 18:41:54,061][15827] Fps is (10 sec: 13108.2, 60 sec: 12492.8, 300 sec: 13232.2). Total num frames: 9011200. Throughput: 0: 3230.4. Samples: 2252908. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:41:54,062][15827] Avg episode reward: [(0, '4.361')] -[2025-08-29 18:41:56,687][19393] Updated weights for policy 0, policy_version 2210 (0.0012) -[2025-08-29 18:41:59,061][15827] Fps is (10 sec: 14336.2, 60 sec: 12561.0, 300 sec: 13259.9). Total num frames: 9089024. Throughput: 0: 3248.8. Samples: 2264090. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:41:59,062][15827] Avg episode reward: [(0, '4.144')] -[2025-08-29 18:41:59,296][19393] Updated weights for policy 0, policy_version 2220 (0.0013) -[2025-08-29 18:42:02,052][19393] Updated weights for policy 0, policy_version 2230 (0.0009) -[2025-08-29 18:42:04,060][15827] Fps is (10 sec: 14745.8, 60 sec: 13309.3, 300 sec: 13246.0). Total num frames: 9158656. Throughput: 0: 3251.9. Samples: 2286766. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:42:04,062][15827] Avg episode reward: [(0, '4.461')] -[2025-08-29 18:42:05,198][19393] Updated weights for policy 0, policy_version 2240 (0.0013) -[2025-08-29 18:42:08,652][19393] Updated weights for policy 0, policy_version 2250 (0.0015) -[2025-08-29 18:42:09,061][15827] Fps is (10 sec: 13106.7, 60 sec: 13175.3, 300 sec: 13190.5). Total num frames: 9220096. Throughput: 0: 3178.7. Samples: 2305108. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:42:09,064][15827] Avg episode reward: [(0, '4.559')] -[2025-08-29 18:42:11,854][19393] Updated weights for policy 0, policy_version 2260 (0.0017) -[2025-08-29 18:42:17,254][15827] Fps is (10 sec: 9003.3, 60 sec: 12250.4, 300 sec: 12994.3). Total num frames: 9277440. Throughput: 0: 2937.4. Samples: 2314634. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:42:17,255][15827] Avg episode reward: [(0, '4.611')] -[2025-08-29 18:42:19,060][15827] Fps is (10 sec: 7373.4, 60 sec: 12014.9, 300 sec: 12940.6). Total num frames: 9293824. Throughput: 0: 2805.8. Samples: 2321630. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:42:19,062][15827] Avg episode reward: [(0, '4.455')] -[2025-08-29 18:42:19,067][19393] Updated weights for policy 0, policy_version 2270 (0.0021) -[2025-08-29 18:42:22,791][19393] Updated weights for policy 0, policy_version 2280 (0.0018) -[2025-08-29 18:42:24,061][15827] Fps is (10 sec: 10831.7, 60 sec: 11741.8, 300 sec: 13049.8). Total num frames: 9351168. Throughput: 0: 3128.3. Samples: 2338182. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:42:24,063][15827] Avg episode reward: [(0, '4.419')] -[2025-08-29 18:42:26,582][19393] Updated weights for policy 0, policy_version 2290 (0.0021) -[2025-08-29 18:42:29,061][15827] Fps is (10 sec: 11059.0, 60 sec: 11332.3, 300 sec: 12982.2). Total num frames: 9404416. Throughput: 0: 2944.3. Samples: 2346236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:42:29,062][15827] Avg episode reward: [(0, '4.502')] -[2025-08-29 18:42:30,318][19393] Updated weights for policy 0, policy_version 2300 (0.0014) -[2025-08-29 18:42:33,804][19393] Updated weights for policy 0, policy_version 2310 (0.0014) -[2025-08-29 18:42:34,061][15827] Fps is (10 sec: 11058.6, 60 sec: 11195.7, 300 sec: 12926.7). Total num frames: 9461760. Throughput: 0: 2880.7. Samples: 2362680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:42:34,063][15827] Avg episode reward: [(0, '4.433')] -[2025-08-29 18:42:37,110][19393] Updated weights for policy 0, policy_version 2320 (0.0016) -[2025-08-29 18:42:39,061][15827] Fps is (10 sec: 11878.3, 60 sec: 11724.3, 300 sec: 12871.1). Total num frames: 9523200. Throughput: 0: 2863.5. Samples: 2381766. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:42:39,062][15827] Avg episode reward: [(0, '4.413')] -[2025-08-29 18:42:40,262][19393] Updated weights for policy 0, policy_version 2330 (0.0016) -[2025-08-29 18:42:43,382][19393] Updated weights for policy 0, policy_version 2340 (0.0017) -[2025-08-29 18:42:44,060][15827] Fps is (10 sec: 12698.4, 60 sec: 11810.3, 300 sec: 12843.4). Total num frames: 9588736. Throughput: 0: 2829.7. Samples: 2391426. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0) -[2025-08-29 18:42:44,062][15827] Avg episode reward: [(0, '4.435')] -[2025-08-29 18:42:46,719][19393] Updated weights for policy 0, policy_version 2350 (0.0017) -[2025-08-29 18:42:49,060][15827] Fps is (10 sec: 12697.9, 60 sec: 11742.0, 300 sec: 12801.7). Total num frames: 9650176. Throughput: 0: 2744.0. Samples: 2410246. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:42:49,062][15827] Avg episode reward: [(0, '4.199')] -[2025-08-29 18:42:53,630][19393] Updated weights for policy 0, policy_version 2360 (0.0014) -[2025-08-29 18:42:54,061][15827] Fps is (10 sec: 8191.9, 60 sec: 10990.9, 300 sec: 12607.3). Total num frames: 9670656. Throughput: 0: 2504.3. Samples: 2417800. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:42:54,062][15827] Avg episode reward: [(0, '4.225')] -[2025-08-29 18:42:57,093][19393] Updated weights for policy 0, policy_version 2370 (0.0017) -[2025-08-29 18:42:59,061][15827] Fps is (10 sec: 7782.4, 60 sec: 10649.7, 300 sec: 12695.0). Total num frames: 9728000. Throughput: 0: 2681.4. Samples: 2426734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:42:59,062][15827] Avg episode reward: [(0, '4.307')] -[2025-08-29 18:43:00,958][19393] Updated weights for policy 0, policy_version 2380 (0.0022) -[2025-08-29 18:43:04,060][15827] Fps is (10 sec: 11059.4, 60 sec: 10376.5, 300 sec: 12649.0). Total num frames: 9781248. Throughput: 0: 2683.9. Samples: 2442406. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:43:04,062][15827] Avg episode reward: [(0, '4.503')] -[2025-08-29 18:43:04,673][19393] Updated weights for policy 0, policy_version 2390 (0.0020) -[2025-08-29 18:43:07,847][19393] Updated weights for policy 0, policy_version 2400 (0.0016) -[2025-08-29 18:43:09,061][15827] Fps is (10 sec: 11468.6, 60 sec: 10376.6, 300 sec: 12607.3). 
Total num frames: 9842688. Throughput: 0: 2732.1. Samples: 2461126. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:43:09,062][15827] Avg episode reward: [(0, '4.495')] -[2025-08-29 18:43:11,537][19393] Updated weights for policy 0, policy_version 2410 (0.0018) -[2025-08-29 18:43:14,060][15827] Fps is (10 sec: 11878.4, 60 sec: 10959.9, 300 sec: 12551.8). Total num frames: 9900032. Throughput: 0: 2732.4. Samples: 2469192. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:43:14,062][15827] Avg episode reward: [(0, '4.496')] -[2025-08-29 18:43:14,960][19393] Updated weights for policy 0, policy_version 2420 (0.0017) -[2025-08-29 18:43:17,837][19393] Updated weights for policy 0, policy_version 2430 (0.0010) -[2025-08-29 18:43:19,061][15827] Fps is (10 sec: 12697.7, 60 sec: 11264.0, 300 sec: 12551.8). Total num frames: 9969664. Throughput: 0: 2801.5. Samples: 2488748. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:43:19,062][15827] Avg episode reward: [(0, '4.315')] -[2025-08-29 18:43:20,894][19393] Updated weights for policy 0, policy_version 2440 (0.0017) -[2025-08-29 18:43:23,953][19393] Updated weights for policy 0, policy_version 2450 (0.0011) -[2025-08-29 18:43:24,061][15827] Fps is (10 sec: 13516.6, 60 sec: 11400.5, 300 sec: 12524.0). Total num frames: 10035200. Throughput: 0: 2829.3. Samples: 2509086. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:43:24,063][15827] Avg episode reward: [(0, '4.404')] -[2025-08-29 18:43:29,060][15827] Fps is (10 sec: 8192.1, 60 sec: 10786.2, 300 sec: 12315.8). Total num frames: 10051584. Throughput: 0: 2750.3. Samples: 2515188. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:43:29,063][15827] Avg episode reward: [(0, '4.366')] -[2025-08-29 18:43:30,548][19393] Updated weights for policy 0, policy_version 2460 (0.0015) -[2025-08-29 18:43:33,335][19393] Updated weights for policy 0, policy_version 2470 (0.0014) -[2025-08-29 18:43:34,060][15827] Fps is (10 sec: 9011.3, 60 sec: 11059.3, 300 sec: 12464.5). Total num frames: 10125312. Throughput: 0: 2612.4. Samples: 2527802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:43:34,061][15827] Avg episode reward: [(0, '4.263')] -[2025-08-29 18:43:36,279][19393] Updated weights for policy 0, policy_version 2480 (0.0015) -[2025-08-29 18:43:39,061][15827] Fps is (10 sec: 14335.2, 60 sec: 11195.7, 300 sec: 12468.5). Total num frames: 10194944. Throughput: 0: 2925.7. Samples: 2549456. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:43:39,063][15827] Avg episode reward: [(0, '4.259')] -[2025-08-29 18:43:39,089][19393] Updated weights for policy 0, policy_version 2490 (0.0013) -[2025-08-29 18:43:39,090][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002490_10199040.pth... -[2025-08-29 18:43:39,165][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001780_7290880.pth -[2025-08-29 18:43:42,008][19393] Updated weights for policy 0, policy_version 2500 (0.0016) -[2025-08-29 18:43:44,061][15827] Fps is (10 sec: 13926.3, 60 sec: 11264.0, 300 sec: 12454.6). Total num frames: 10264576. Throughput: 0: 2953.8. Samples: 2559654. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:43:44,062][15827] Avg episode reward: [(0, '4.462')] -[2025-08-29 18:43:45,157][19393] Updated weights for policy 0, policy_version 2510 (0.0012) -[2025-08-29 18:43:48,519][19393] Updated weights for policy 0, policy_version 2520 (0.0019) -[2025-08-29 18:43:49,061][15827] Fps is (10 sec: 13107.6, 60 sec: 11264.0, 300 sec: 12413.0). Total num frames: 10326016. Throughput: 0: 3020.9. Samples: 2578346. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:43:49,063][15827] Avg episode reward: [(0, '4.308')] -[2025-08-29 18:43:51,804][19393] Updated weights for policy 0, policy_version 2530 (0.0019) -[2025-08-29 18:43:54,061][15827] Fps is (10 sec: 12288.0, 60 sec: 11946.7, 300 sec: 12399.1). Total num frames: 10387456. Throughput: 0: 3030.8. Samples: 2597512. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:43:54,063][15827] Avg episode reward: [(0, '4.352')] -[2025-08-29 18:43:55,138][19393] Updated weights for policy 0, policy_version 2540 (0.0015) -[2025-08-29 18:43:58,255][19393] Updated weights for policy 0, policy_version 2550 (0.0018) -[2025-08-29 18:43:59,061][15827] Fps is (10 sec: 12697.8, 60 sec: 12083.2, 300 sec: 12343.5). Total num frames: 10452992. Throughput: 0: 3047.4. Samples: 2606326. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:43:59,062][15827] Avg episode reward: [(0, '4.502')] -[2025-08-29 18:44:04,754][15827] Fps is (10 sec: 8809.5, 60 sec: 11540.1, 300 sec: 12162.2). Total num frames: 10481664. Throughput: 0: 2802.4. Samples: 2616800. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:44:04,755][15827] Avg episode reward: [(0, '4.691')] -[2025-08-29 18:44:04,757][19378] Saving new best policy, reward=4.691! -[2025-08-29 18:44:04,915][19393] Updated weights for policy 0, policy_version 2560 (0.0012) -[2025-08-29 18:44:07,901][19393] Updated weights for policy 0, policy_version 2570 (0.0014) -[2025-08-29 18:44:09,061][15827] Fps is (10 sec: 8601.5, 60 sec: 11605.3, 300 sec: 12107.5). Total num frames: 10539008. Throughput: 0: 2803.6. Samples: 2635250. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:44:09,063][15827] Avg episode reward: [(0, '4.231')] -[2025-08-29 18:44:11,310][19393] Updated weights for policy 0, policy_version 2580 (0.0022) -[2025-08-29 18:44:14,060][15827] Fps is (10 sec: 12764.4, 60 sec: 11673.6, 300 sec: 12246.3). Total num frames: 10600448. Throughput: 0: 2860.8. Samples: 2643924. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:44:14,061][15827] Avg episode reward: [(0, '4.388')] -[2025-08-29 18:44:14,450][19393] Updated weights for policy 0, policy_version 2590 (0.0012) -[2025-08-29 18:44:17,346][19393] Updated weights for policy 0, policy_version 2600 (0.0012) -[2025-08-29 18:44:19,061][15827] Fps is (10 sec: 13107.4, 60 sec: 11673.6, 300 sec: 12232.5). Total num frames: 10670080. Throughput: 0: 3037.6. Samples: 2664494. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:44:19,062][15827] Avg episode reward: [(0, '4.446')] -[2025-08-29 18:44:20,443][19393] Updated weights for policy 0, policy_version 2610 (0.0014) -[2025-08-29 18:44:23,226][19393] Updated weights for policy 0, policy_version 2620 (0.0012) -[2025-08-29 18:44:24,061][15827] Fps is (10 sec: 13926.2, 60 sec: 11741.9, 300 sec: 12218.6). Total num frames: 10739712. Throughput: 0: 3017.4. Samples: 2685238. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:44:24,062][15827] Avg episode reward: [(0, '4.418')] -[2025-08-29 18:44:26,099][19393] Updated weights for policy 0, policy_version 2630 (0.0012) -[2025-08-29 18:44:29,060][15827] Fps is (10 sec: 13926.6, 60 sec: 12629.4, 300 sec: 12176.9). Total num frames: 10809344. Throughput: 0: 3029.3. Samples: 2695974. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:44:29,062][15827] Avg episode reward: [(0, '4.299')] -[2025-08-29 18:44:29,209][19393] Updated weights for policy 0, policy_version 2640 (0.0015) -[2025-08-29 18:44:32,017][19393] Updated weights for policy 0, policy_version 2650 (0.0011) -[2025-08-29 18:44:34,060][15827] Fps is (10 sec: 14336.1, 60 sec: 12629.3, 300 sec: 12176.9). Total num frames: 10883072. Throughput: 0: 3078.6. Samples: 2716884. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:44:34,061][15827] Avg episode reward: [(0, '4.530')] -[2025-08-29 18:44:34,763][19393] Updated weights for policy 0, policy_version 2660 (0.0012) -[2025-08-29 18:44:40,583][15827] Fps is (10 sec: 9953.2, 60 sec: 11850.8, 300 sec: 11990.1). Total num frames: 10924032. Throughput: 0: 2803.5. Samples: 2727940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:44:40,584][15827] Avg episode reward: [(0, '4.444')] -[2025-08-29 18:44:41,183][19393] Updated weights for policy 0, policy_version 2670 (0.0011) -[2025-08-29 18:44:44,060][15827] Fps is (10 sec: 9421.0, 60 sec: 11878.4, 300 sec: 11982.6). Total num frames: 10977280. Throughput: 0: 2893.7. Samples: 2736540. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:44:44,062][19393] Updated weights for policy 0, policy_version 2680 (0.0012) -[2025-08-29 18:44:44,062][15827] Avg episode reward: [(0, '4.252')] -[2025-08-29 18:44:46,817][19393] Updated weights for policy 0, policy_version 2690 (0.0017) -[2025-08-29 18:44:49,061][15827] Fps is (10 sec: 14494.9, 60 sec: 12015.0, 300 sec: 12149.2). Total num frames: 11046912. Throughput: 0: 3199.3. Samples: 2758550. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:44:49,063][15827] Avg episode reward: [(0, '4.443')] -[2025-08-29 18:44:49,693][19393] Updated weights for policy 0, policy_version 2700 (0.0015) -[2025-08-29 18:44:52,622][19393] Updated weights for policy 0, policy_version 2710 (0.0012) -[2025-08-29 18:44:54,060][15827] Fps is (10 sec: 13926.3, 60 sec: 12151.5, 300 sec: 12121.4). Total num frames: 11116544. Throughput: 0: 3215.1. Samples: 2779928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:44:54,062][15827] Avg episode reward: [(0, '4.306')] -[2025-08-29 18:44:55,579][19393] Updated weights for policy 0, policy_version 2720 (0.0013) -[2025-08-29 18:44:58,498][19393] Updated weights for policy 0, policy_version 2730 (0.0013) -[2025-08-29 18:44:59,061][15827] Fps is (10 sec: 14335.7, 60 sec: 12288.0, 300 sec: 12121.4). Total num frames: 11190272. Throughput: 0: 3248.7. Samples: 2790118. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:44:59,062][15827] Avg episode reward: [(0, '4.287')] -[2025-08-29 18:45:01,198][19393] Updated weights for policy 0, policy_version 2740 (0.0013) -[2025-08-29 18:45:04,029][19393] Updated weights for policy 0, policy_version 2750 (0.0014) -[2025-08-29 18:45:04,060][15827] Fps is (10 sec: 14745.6, 60 sec: 13191.5, 300 sec: 12121.4). Total num frames: 11264000. Throughput: 0: 3283.3. Samples: 2812240. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:45:04,061][15827] Avg episode reward: [(0, '4.231')] -[2025-08-29 18:45:06,987][19393] Updated weights for policy 0, policy_version 2760 (0.0013) -[2025-08-29 18:45:09,060][15827] Fps is (10 sec: 13927.0, 60 sec: 13175.5, 300 sec: 12093.6). Total num frames: 11329536. Throughput: 0: 3283.2. Samples: 2832980. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:45:09,061][15827] Avg episode reward: [(0, '4.698')] -[2025-08-29 18:45:09,066][19378] Saving new best policy, reward=4.698! -[2025-08-29 18:45:09,869][19393] Updated weights for policy 0, policy_version 2770 (0.0011) -[2025-08-29 18:45:12,758][19393] Updated weights for policy 0, policy_version 2780 (0.0011) -[2025-08-29 18:45:16,421][15827] Fps is (10 sec: 9941.0, 60 sec: 12611.0, 300 sec: 11970.0). Total num frames: 11386880. Throughput: 0: 3112.9. Samples: 2843404. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:45:16,423][15827] Avg episode reward: [(0, '4.367')] -[2025-08-29 18:45:19,061][15827] Fps is (10 sec: 9420.6, 60 sec: 12561.1, 300 sec: 11968.6). Total num frames: 11423744. Throughput: 0: 3005.0. Samples: 2852110. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:45:19,062][15827] Avg episode reward: [(0, '4.320')] -[2025-08-29 18:45:19,309][19393] Updated weights for policy 0, policy_version 2790 (0.0016) -[2025-08-29 18:45:22,558][19393] Updated weights for policy 0, policy_version 2800 (0.0016) -[2025-08-29 18:45:24,060][15827] Fps is (10 sec: 13404.8, 60 sec: 12492.8, 300 sec: 12122.1). Total num frames: 11489280. Throughput: 0: 3309.9. Samples: 2871846. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:45:24,062][15827] Avg episode reward: [(0, '4.421')] -[2025-08-29 18:45:25,531][19393] Updated weights for policy 0, policy_version 2810 (0.0013) -[2025-08-29 18:45:28,318][19393] Updated weights for policy 0, policy_version 2820 (0.0012) -[2025-08-29 18:45:29,061][15827] Fps is (10 sec: 13517.0, 60 sec: 12492.8, 300 sec: 12107.5). Total num frames: 11558912. Throughput: 0: 3240.0. Samples: 2882340. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:45:29,062][15827] Avg episode reward: [(0, '4.640')] -[2025-08-29 18:45:31,184][19393] Updated weights for policy 0, policy_version 2830 (0.0015) -[2025-08-29 18:45:33,869][19393] Updated weights for policy 0, policy_version 2840 (0.0014) -[2025-08-29 18:45:34,060][15827] Fps is (10 sec: 14335.9, 60 sec: 12492.8, 300 sec: 12121.4). Total num frames: 11632640. Throughput: 0: 3240.2. Samples: 2904358. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:45:34,062][15827] Avg episode reward: [(0, '4.403')] -[2025-08-29 18:45:36,716][19393] Updated weights for policy 0, policy_version 2850 (0.0011) -[2025-08-29 18:45:39,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13378.5, 300 sec: 12149.2). Total num frames: 11706368. Throughput: 0: 3252.1. Samples: 2926272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:45:39,062][15827] Avg episode reward: [(0, '4.435')] -[2025-08-29 18:45:39,070][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002858_11706368.pth... 
-[2025-08-29 18:45:39,136][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002160_8847360.pth
-[2025-08-29 18:45:39,524][19393] Updated weights for policy 0, policy_version 2860 (0.0014)
-[2025-08-29 18:45:42,358][19393] Updated weights for policy 0, policy_version 2870 (0.0012)
-[2025-08-29 18:45:44,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13380.3, 300 sec: 12163.0). Total num frames: 11780096. Throughput: 0: 3262.7. Samples: 2936938. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:45:44,062][15827] Avg episode reward: [(0, '4.445')]
-[2025-08-29 18:45:45,184][19393] Updated weights for policy 0, policy_version 2880 (0.0010)
-[2025-08-29 18:45:48,091][19393] Updated weights for policy 0, policy_version 2890 (0.0014)
-[2025-08-29 18:45:52,251][15827] Fps is (10 sec: 10247.6, 60 sec: 12575.2, 300 sec: 12005.4). Total num frames: 11841536. Throughput: 0: 3036.7. Samples: 2958580. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
-[2025-08-29 18:45:52,252][15827] Avg episode reward: [(0, '4.426')]
-[2025-08-29 18:45:54,061][15827] Fps is (10 sec: 9011.1, 60 sec: 12561.0, 300 sec: 11982.5). Total num frames: 11870208. Throughput: 0: 2989.5. Samples: 2967506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
-[2025-08-29 18:45:54,062][15827] Avg episode reward: [(0, '4.455')]
-[2025-08-29 18:45:54,524][19393] Updated weights for policy 0, policy_version 2900 (0.0012)
-[2025-08-29 18:45:57,361][19393] Updated weights for policy 0, policy_version 2910 (0.0013)
-[2025-08-29 18:45:59,060][15827] Fps is (10 sec: 15036.8, 60 sec: 12561.1, 300 sec: 12142.6). Total num frames: 11943936. Throughput: 0: 3161.8. Samples: 2978222. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
-[2025-08-29 18:45:59,062][15827] Avg episode reward: [(0, '4.502')]
-[2025-08-29 18:46:00,146][19393] Updated weights for policy 0, policy_version 2920 (0.0014)
-[2025-08-29 18:46:02,830][19393] Updated weights for policy 0, policy_version 2930 (0.0010)
-[2025-08-29 18:46:04,060][15827] Fps is (10 sec: 14336.3, 60 sec: 12492.8, 300 sec: 12149.2). Total num frames: 12013568. Throughput: 0: 3299.5. Samples: 3000588. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:46:04,061][15827] Avg episode reward: [(0, '4.259')]
-[2025-08-29 18:46:05,541][19393] Updated weights for policy 0, policy_version 2940 (0.0013)
-[2025-08-29 18:46:08,248][19393] Updated weights for policy 0, policy_version 2950 (0.0009)
-[2025-08-29 18:46:09,060][15827] Fps is (10 sec: 14745.8, 60 sec: 12697.6, 300 sec: 12163.0). Total num frames: 12091392. Throughput: 0: 3359.0. Samples: 3023002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 4.0)
-[2025-08-29 18:46:09,062][15827] Avg episode reward: [(0, '4.580')]
-[2025-08-29 18:46:11,079][19393] Updated weights for policy 0, policy_version 2960 (0.0012)
-[2025-08-29 18:46:13,922][19393] Updated weights for policy 0, policy_version 2970 (0.0016)
-[2025-08-29 18:46:14,060][15827] Fps is (10 sec: 15155.0, 60 sec: 13501.9, 300 sec: 12176.9). Total num frames: 12165120. Throughput: 0: 3368.8. Samples: 3033938. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
-[2025-08-29 18:46:14,061][15827] Avg episode reward: [(0, '4.374')]
-[2025-08-29 18:46:16,738][19393] Updated weights for policy 0, policy_version 2980 (0.0013)
-[2025-08-29 18:46:19,060][15827] Fps is (10 sec: 14335.8, 60 sec: 13516.8, 300 sec: 12163.0). Total num frames: 12234752. Throughput: 0: 3351.3. Samples: 3055166. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:46:19,061][15827] Avg episode reward: [(0, '4.473')]
-[2025-08-29 18:46:19,886][19393] Updated weights for policy 0, policy_version 2990 (0.0016)
-[2025-08-29 18:46:22,789][19393] Updated weights for policy 0, policy_version 3000 (0.0017)
-[2025-08-29 18:46:24,060][15827] Fps is (10 sec: 13516.9, 60 sec: 13516.8, 300 sec: 12121.4). Total num frames: 12300288. Throughput: 0: 3318.9. Samples: 3075622. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
-[2025-08-29 18:46:24,063][15827] Avg episode reward: [(0, '4.445')]
-[2025-08-29 18:46:29,060][15827] Fps is (10 sec: 8601.6, 60 sec: 12697.6, 300 sec: 11968.7). Total num frames: 12320768. Throughput: 0: 3153.8. Samples: 3078860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
-[2025-08-29 18:46:29,062][15827] Avg episode reward: [(0, '4.387')]
-[2025-08-29 18:46:29,387][19393] Updated weights for policy 0, policy_version 3010 (0.0014)
-[2025-08-29 18:46:32,323][19393] Updated weights for policy 0, policy_version 3020 (0.0014)
-[2025-08-29 18:46:34,060][15827] Fps is (10 sec: 9011.2, 60 sec: 12629.3, 300 sec: 12107.0). Total num frames: 12390400. Throughput: 0: 3253.4. Samples: 3094604. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
-[2025-08-29 18:46:34,062][15827] Avg episode reward: [(0, '4.182')]
-[2025-08-29 18:46:35,133][19393] Updated weights for policy 0, policy_version 3030 (0.0014)
-[2025-08-29 18:46:37,998][19393] Updated weights for policy 0, policy_version 3040 (0.0012)
-[2025-08-29 18:46:39,061][15827] Fps is (10 sec: 14335.9, 60 sec: 12629.3, 300 sec: 12149.2). Total num frames: 12464128. Throughput: 0: 3307.6. Samples: 3116348. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:46:39,062][15827] Avg episode reward: [(0, '4.507')]
-[2025-08-29 18:46:40,836][19393] Updated weights for policy 0, policy_version 3050 (0.0016)
-[2025-08-29 18:46:43,695][19393] Updated weights for policy 0, policy_version 3060 (0.0014)
-[2025-08-29 18:46:44,061][15827] Fps is (10 sec: 14745.5, 60 sec: 12629.3, 300 sec: 12176.9). Total num frames: 12537856. Throughput: 0: 3310.6. Samples: 3127198. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:46:44,062][15827] Avg episode reward: [(0, '4.395')]
-[2025-08-29 18:46:46,574][19393] Updated weights for policy 0, policy_version 3070 (0.0012)
-[2025-08-29 18:46:49,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13554.8, 300 sec: 12204.7). Total num frames: 12611584. Throughput: 0: 3291.9. Samples: 3148724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
-[2025-08-29 18:46:49,062][15827] Avg episode reward: [(0, '4.618')]
-[2025-08-29 18:46:49,391][19393] Updated weights for policy 0, policy_version 3080 (0.0018)
-[2025-08-29 18:46:52,204][19393] Updated weights for policy 0, policy_version 3090 (0.0012)
-[2025-08-29 18:46:54,060][15827] Fps is (10 sec: 14336.3, 60 sec: 13516.8, 300 sec: 12176.9). Total num frames: 12681216. Throughput: 0: 3269.9. Samples: 3170146. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:46:54,062][15827] Avg episode reward: [(0, '4.170')]
-[2025-08-29 18:46:55,198][19393] Updated weights for policy 0, policy_version 3100 (0.0015)
-[2025-08-29 18:46:58,019][19393] Updated weights for policy 0, policy_version 3110 (0.0013)
-[2025-08-29 18:46:59,061][15827] Fps is (10 sec: 13925.9, 60 sec: 13448.4, 300 sec: 12176.9). Total num frames: 12750848. Throughput: 0: 3265.3. Samples: 3180876. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:46:59,063][15827] Avg episode reward: [(0, '4.565')]
-[2025-08-29 18:47:04,060][15827] Fps is (10 sec: 8601.5, 60 sec: 12561.1, 300 sec: 12024.2). Total num frames: 12767232. Throughput: 0: 3071.2. Samples: 3193370. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
-[2025-08-29 18:47:04,062][15827] Avg episode reward: [(0, '4.576')]
-[2025-08-29 18:47:04,857][19393] Updated weights for policy 0, policy_version 3120 (0.0014)
-[2025-08-29 18:47:08,218][19393] Updated weights for policy 0, policy_version 3130 (0.0015)
-[2025-08-29 18:47:09,061][15827] Fps is (10 sec: 7782.3, 60 sec: 12287.9, 300 sec: 12169.8). Total num frames: 12828672. Throughput: 0: 2939.6. Samples: 3207906. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
-[2025-08-29 18:47:09,063][15827] Avg episode reward: [(0, '4.267')]
-[2025-08-29 18:47:11,413][19393] Updated weights for policy 0, policy_version 3140 (0.0015)
-[2025-08-29 18:47:14,060][15827] Fps is (10 sec: 13107.1, 60 sec: 12219.7, 300 sec: 12218.6). Total num frames: 12898304. Throughput: 0: 3081.6. Samples: 3217530. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:47:14,062][15827] Avg episode reward: [(0, '4.751')]
-[2025-08-29 18:47:14,063][19378] Saving new best policy, reward=4.751!
-[2025-08-29 18:47:14,592][19393] Updated weights for policy 0, policy_version 3150 (0.0016)
-[2025-08-29 18:47:17,597][19393] Updated weights for policy 0, policy_version 3160 (0.0013)
-[2025-08-29 18:47:19,060][15827] Fps is (10 sec: 13108.0, 60 sec: 12083.2, 300 sec: 12232.5). Total num frames: 12959744. Throughput: 0: 3166.6. Samples: 3237100. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:47:19,062][15827] Avg episode reward: [(0, '4.426')]
-[2025-08-29 18:47:21,004][19393] Updated weights for policy 0, policy_version 3170 (0.0015)
-[2025-08-29 18:47:24,061][15827] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12260.2). Total num frames: 13021184. Throughput: 0: 3089.7. Samples: 3255386. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
-[2025-08-29 18:47:24,062][15827] Avg episode reward: [(0, '4.482')]
-[2025-08-29 18:47:24,423][19393] Updated weights for policy 0, policy_version 3180 (0.0018)
-[2025-08-29 18:47:27,574][19393] Updated weights for policy 0, policy_version 3190 (0.0015)
-[2025-08-29 18:47:29,060][15827] Fps is (10 sec: 12697.5, 60 sec: 12765.9, 300 sec: 12288.0). Total num frames: 13086720. Throughput: 0: 3058.0. Samples: 3264810. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:47:29,068][15827] Avg episode reward: [(0, '4.401')]
-[2025-08-29 18:47:30,708][19393] Updated weights for policy 0, policy_version 3200 (0.0015)
-[2025-08-29 18:47:34,060][15827] Fps is (10 sec: 12288.1, 60 sec: 12561.1, 300 sec: 12274.1). Total num frames: 13144064. Throughput: 0: 2989.0. Samples: 3283230. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
-[2025-08-29 18:47:34,062][15827] Avg episode reward: [(0, '4.407')]
-[2025-08-29 18:47:34,117][19393] Updated weights for policy 0, policy_version 3210 (0.0018)
-[2025-08-29 18:47:39,752][15827] Fps is (10 sec: 8045.3, 60 sec: 11675.6, 300 sec: 12120.7). Total num frames: 13172736. Throughput: 0: 2689.2. Samples: 3293018. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
-[2025-08-29 18:47:39,753][15827] Avg episode reward: [(0, '4.305')]
-[2025-08-29 18:47:39,759][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003216_13172736.pth...
-[2025-08-29 18:47:39,843][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002490_10199040.pth
-[2025-08-29 18:47:40,858][19393] Updated weights for policy 0, policy_version 3220 (0.0013)
-[2025-08-29 18:47:43,824][19393] Updated weights for policy 0, policy_version 3230 (0.0014)
-[2025-08-29 18:47:44,060][15827] Fps is (10 sec: 8601.6, 60 sec: 11537.1, 300 sec: 12135.3). Total num frames: 13230080. Throughput: 0: 2671.5. Samples: 3301094. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:47:44,062][15827] Avg episode reward: [(0, '4.470')]
-[2025-08-29 18:47:46,982][19393] Updated weights for policy 0, policy_version 3240 (0.0015)
-[2025-08-29 18:47:49,060][15827] Fps is (10 sec: 13200.9, 60 sec: 11400.6, 300 sec: 12288.0). Total num frames: 13295616. Throughput: 0: 2826.5. Samples: 3320562. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:47:49,061][15827] Avg episode reward: [(0, '4.426')]
-[2025-08-29 18:47:50,069][19393] Updated weights for policy 0, policy_version 3250 (0.0013)
-[2025-08-29 18:47:53,164][19393] Updated weights for policy 0, policy_version 3260 (0.0014)
-[2025-08-29 18:47:54,060][15827] Fps is (10 sec: 13107.3, 60 sec: 11332.3, 300 sec: 12315.8). Total num frames: 13361152. Throughput: 0: 2943.9. Samples: 3340380. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:47:54,062][15827] Avg episode reward: [(0, '4.520')]
-[2025-08-29 18:47:56,215][19393] Updated weights for policy 0, policy_version 3270 (0.0016)
-[2025-08-29 18:47:59,060][15827] Fps is (10 sec: 13516.7, 60 sec: 11332.4, 300 sec: 12371.3). Total num frames: 13430784. Throughput: 0: 2956.6. Samples: 3350576. Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0)
-[2025-08-29 18:47:59,062][15827] Avg episode reward: [(0, '4.450')]
-[2025-08-29 18:47:59,231][19393] Updated weights for policy 0, policy_version 3280 (0.0016)
-[2025-08-29 18:48:02,283][19393] Updated weights for policy 0, policy_version 3290 (0.0014)
-[2025-08-29 18:48:04,061][15827] Fps is (10 sec: 13515.6, 60 sec: 12151.3, 300 sec: 12385.2). Total num frames: 13496320. Throughput: 0: 2976.0. Samples: 3371024. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:48:04,063][15827] Avg episode reward: [(0, '4.382')]
-[2025-08-29 18:48:05,502][19393] Updated weights for policy 0, policy_version 3300 (0.0013)
-[2025-08-29 18:48:08,561][19393] Updated weights for policy 0, policy_version 3310 (0.0014)
-[2025-08-29 18:48:09,061][15827] Fps is (10 sec: 13106.9, 60 sec: 12219.8, 300 sec: 12413.0). Total num frames: 13561856. Throughput: 0: 3007.6. Samples: 3390728. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:48:09,062][15827] Avg episode reward: [(0, '4.440')]
-[2025-08-29 18:48:11,611][19393] Updated weights for policy 0, policy_version 3320 (0.0012)
-[2025-08-29 18:48:15,582][15827] Fps is (10 sec: 9243.8, 60 sec: 11451.5, 300 sec: 12252.6). Total num frames: 13602816. Throughput: 0: 2924.1. Samples: 3400844. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:48:15,583][15827] Avg episode reward: [(0, '4.511')]
-[2025-08-29 18:48:18,250][19393] Updated weights for policy 0, policy_version 3330 (0.0015)
-[2025-08-29 18:48:19,060][15827] Fps is (10 sec: 8601.8, 60 sec: 11468.8, 300 sec: 12246.4). Total num frames: 13647872. Throughput: 0: 2799.1. Samples: 3409188. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:48:19,061][15827] Avg episode reward: [(0, '4.412')]
-[2025-08-29 18:48:21,387][19393] Updated weights for policy 0, policy_version 3340 (0.0014)
-[2025-08-29 18:48:24,061][15827] Fps is (10 sec: 13526.8, 60 sec: 11605.3, 300 sec: 12426.8). Total num frames: 13717504. Throughput: 0: 3061.0. Samples: 3428646. Policy #0 lag: (min: 0.0, avg: 1.8, max: 3.0)
-[2025-08-29 18:48:24,063][15827] Avg episode reward: [(0, '4.649')]
-[2025-08-29 18:48:24,566][19393] Updated weights for policy 0, policy_version 3350 (0.0012)
-[2025-08-29 18:48:27,589][19393] Updated weights for policy 0, policy_version 3360 (0.0014)
-[2025-08-29 18:48:29,060][15827] Fps is (10 sec: 13107.1, 60 sec: 11537.1, 300 sec: 12385.2). Total num frames: 13778944. Throughput: 0: 3055.7. Samples: 3438602. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:48:29,062][15827] Avg episode reward: [(0, '4.415')]
-[2025-08-29 18:48:30,547][19393] Updated weights for policy 0, policy_version 3370 (0.0011)
-[2025-08-29 18:48:33,503][19393] Updated weights for policy 0, policy_version 3380 (0.0014)
-[2025-08-29 18:48:34,060][15827] Fps is (10 sec: 13107.5, 60 sec: 11741.9, 300 sec: 12385.2). Total num frames: 13848576. Throughput: 0: 3085.3. Samples: 3459402. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:48:34,062][15827] Avg episode reward: [(0, '4.437')]
-[2025-08-29 18:48:36,511][19393] Updated weights for policy 0, policy_version 3390 (0.0012)
-[2025-08-29 18:48:39,060][15827] Fps is (10 sec: 13926.4, 60 sec: 12569.4, 300 sec: 12385.2). Total num frames: 13918208. Throughput: 0: 3092.9. Samples: 3479560. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:48:39,062][15827] Avg episode reward: [(0, '4.313')]
-[2025-08-29 18:48:39,556][19393] Updated weights for policy 0, policy_version 3400 (0.0014)
-[2025-08-29 18:48:42,431][19393] Updated weights for policy 0, policy_version 3410 (0.0013)
-[2025-08-29 18:48:44,060][15827] Fps is (10 sec: 13926.2, 60 sec: 12629.3, 300 sec: 12413.0). Total num frames: 13987840. Throughput: 0: 3097.5. Samples: 3489964. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:48:44,063][15827] Avg episode reward: [(0, '4.660')]
-[2025-08-29 18:48:45,499][19393] Updated weights for policy 0, policy_version 3420 (0.0015)
-[2025-08-29 18:48:51,420][15827] Fps is (10 sec: 9610.8, 60 sec: 11888.7, 300 sec: 12273.2). Total num frames: 14036992. Throughput: 0: 2940.6. Samples: 3510288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
-[2025-08-29 18:48:51,423][15827] Avg episode reward: [(0, '4.379')]
-[2025-08-29 18:48:52,043][19393] Updated weights for policy 0, policy_version 3430 (0.0013)
-[2025-08-29 18:48:54,060][15827] Fps is (10 sec: 8192.0, 60 sec: 11810.1, 300 sec: 12260.2). Total num frames: 14069760. Throughput: 0: 2835.2. Samples: 3518312. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
-[2025-08-29 18:48:54,062][15827] Avg episode reward: [(0, '4.461')]
-[2025-08-29 18:48:55,206][19393] Updated weights for policy 0, policy_version 3440 (0.0015)
-[2025-08-29 18:48:58,313][19393] Updated weights for policy 0, policy_version 3450 (0.0014)
-[2025-08-29 18:48:59,060][15827] Fps is (10 sec: 13402.1, 60 sec: 11810.1, 300 sec: 12428.3). Total num frames: 14139392. Throughput: 0: 2924.1. Samples: 3527980. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:48:59,062][15827] Avg episode reward: [(0, '4.392')]
-[2025-08-29 18:49:01,345][19393] Updated weights for policy 0, policy_version 3460 (0.0014)
-[2025-08-29 18:49:04,060][15827] Fps is (10 sec: 13516.8, 60 sec: 11810.3, 300 sec: 12426.9). Total num frames: 14204928. Throughput: 0: 3094.3. Samples: 3548430. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
-[2025-08-29 18:49:04,062][15827] Avg episode reward: [(0, '4.620')]
-[2025-08-29 18:49:04,411][19393] Updated weights for policy 0, policy_version 3470 (0.0013)
-[2025-08-29 18:49:07,540][19393] Updated weights for policy 0, policy_version 3480 (0.0015)
-[2025-08-29 18:49:09,060][15827] Fps is (10 sec: 13517.0, 60 sec: 11878.5, 300 sec: 12454.6). Total num frames: 14274560. Throughput: 0: 3101.6. Samples: 3568218. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:49:09,061][15827] Avg episode reward: [(0, '4.597')]
-[2025-08-29 18:49:10,607][19393] Updated weights for policy 0, policy_version 3490 (0.0011)
-[2025-08-29 18:49:13,749][19393] Updated weights for policy 0, policy_version 3500 (0.0018)
-[2025-08-29 18:49:14,060][15827] Fps is (10 sec: 13107.2, 60 sec: 12537.7, 300 sec: 12426.9). Total num frames: 14336000. Throughput: 0: 3096.6. Samples: 3577948. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
-[2025-08-29 18:49:14,062][15827] Avg episode reward: [(0, '4.463')]
-[2025-08-29 18:49:16,645][19393] Updated weights for policy 0, policy_version 3510 (0.0013)
-[2025-08-29 18:49:19,061][15827] Fps is (10 sec: 12287.7, 60 sec: 12492.8, 300 sec: 12399.1). Total num frames: 14397440. Throughput: 0: 3074.6. Samples: 3597760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:49:19,062][15827] Avg episode reward: [(0, '4.504')]
-[2025-08-29 18:49:20,366][19393] Updated weights for policy 0, policy_version 3520 (0.0018)
-[2025-08-29 18:49:27,254][15827] Fps is (10 sec: 9002.9, 60 sec: 11666.9, 300 sec: 12225.1). Total num frames: 14454784. Throughput: 0: 2804.0. Samples: 3614694. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
-[2025-08-29 18:49:27,256][15827] Avg episode reward: [(0, '4.645')]
-[2025-08-29 18:49:27,458][19393] Updated weights for policy 0, policy_version 3530 (0.0012)
-[2025-08-29 18:49:29,061][15827] Fps is (10 sec: 7782.4, 60 sec: 11605.3, 300 sec: 12176.9). Total num frames: 14475264. Throughput: 0: 2782.3. Samples: 3615168. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
-[2025-08-29 18:49:29,062][15827] Avg episode reward: [(0, '4.469')]
-[2025-08-29 18:49:30,849][19393] Updated weights for policy 0, policy_version 3540 (0.0017)
-[2025-08-29 18:49:34,060][15827] Fps is (10 sec: 12036.7, 60 sec: 11468.8, 300 sec: 12309.9). Total num frames: 14536704. Throughput: 0: 2834.6. Samples: 3631156. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
-[2025-08-29 18:49:34,062][15827] Avg episode reward: [(0, '4.297')]
-[2025-08-29 18:49:34,384][19393] Updated weights for policy 0, policy_version 3550 (0.0017)
-[2025-08-29 18:49:37,678][19393] Updated weights for policy 0, policy_version 3560 (0.0016)
-[2025-08-29 18:49:39,060][15827] Fps is (10 sec: 12288.0, 60 sec: 11332.3, 300 sec: 12274.1). Total num frames: 14598144. Throughput: 0: 2917.1. Samples: 3649582. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
-[2025-08-29 18:49:39,062][15827] Avg episode reward: [(0, '4.168')]
-[2025-08-29 18:49:39,069][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003564_14598144.pth...
-[2025-08-29 18:49:39,149][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002858_11706368.pth -[2025-08-29 18:49:40,981][19393] Updated weights for policy 0, policy_version 3570 (0.0013) -[2025-08-29 18:49:43,986][19393] Updated weights for policy 0, policy_version 3580 (0.0016) -[2025-08-29 18:49:44,060][15827] Fps is (10 sec: 12697.5, 60 sec: 11264.0, 300 sec: 12260.2). Total num frames: 14663680. Throughput: 0: 2915.4. Samples: 3659172. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:49:44,062][15827] Avg episode reward: [(0, '4.408')] -[2025-08-29 18:49:47,214][19393] Updated weights for policy 0, policy_version 3590 (0.0019) -[2025-08-29 18:49:49,060][15827] Fps is (10 sec: 12697.7, 60 sec: 11938.3, 300 sec: 12232.5). Total num frames: 14725120. Throughput: 0: 2891.9. Samples: 3678566. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:49:49,062][15827] Avg episode reward: [(0, '4.397')] -[2025-08-29 18:49:50,716][19393] Updated weights for policy 0, policy_version 3600 (0.0017) -[2025-08-29 18:49:53,965][19393] Updated weights for policy 0, policy_version 3610 (0.0016) -[2025-08-29 18:49:54,061][15827] Fps is (10 sec: 12287.9, 60 sec: 11946.7, 300 sec: 12190.8). Total num frames: 14786560. Throughput: 0: 2849.1. Samples: 3696428. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:49:54,070][15827] Avg episode reward: [(0, '4.462')] -[2025-08-29 18:49:57,230][19393] Updated weights for policy 0, policy_version 3620 (0.0014) -[2025-08-29 18:49:59,060][15827] Fps is (10 sec: 12288.0, 60 sec: 11810.1, 300 sec: 12149.2). Total num frames: 14848000. Throughput: 0: 2849.8. Samples: 3706190. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:49:59,062][15827] Avg episode reward: [(0, '4.356')] -[2025-08-29 18:50:04,061][15827] Fps is (10 sec: 7782.0, 60 sec: 10990.8, 300 sec: 11982.5). Total num frames: 14864384. Throughput: 0: 2608.8. Samples: 3715156. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:50:04,064][15827] Avg episode reward: [(0, '4.396')] -[2025-08-29 18:50:04,322][19393] Updated weights for policy 0, policy_version 3630 (0.0013) -[2025-08-29 18:50:08,247][19393] Updated weights for policy 0, policy_version 3640 (0.0018) -[2025-08-29 18:50:09,061][15827] Fps is (10 sec: 6963.0, 60 sec: 10717.8, 300 sec: 12065.2). Total num frames: 14917632. Throughput: 0: 2753.0. Samples: 3729788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:50:09,064][15827] Avg episode reward: [(0, '4.451')] -[2025-08-29 18:50:12,125][19393] Updated weights for policy 0, policy_version 3650 (0.0025) -[2025-08-29 18:50:14,061][15827] Fps is (10 sec: 10240.4, 60 sec: 10513.0, 300 sec: 12010.3). Total num frames: 14966784. Throughput: 0: 2716.5. Samples: 3737412. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:50:14,063][15827] Avg episode reward: [(0, '4.359')] -[2025-08-29 18:50:16,435][19393] Updated weights for policy 0, policy_version 3660 (0.0027) -[2025-08-29 18:50:19,061][15827] Fps is (10 sec: 10240.1, 60 sec: 10376.5, 300 sec: 11968.6). Total num frames: 15020032. Throughput: 0: 2700.7. Samples: 3752686. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:50:19,065][15827] Avg episode reward: [(0, '4.528')] -[2025-08-29 18:50:20,166][19393] Updated weights for policy 0, policy_version 3670 (0.0026) -[2025-08-29 18:50:22,988][19393] Updated weights for policy 0, policy_version 3680 (0.0015) -[2025-08-29 18:50:24,060][15827] Fps is (10 sec: 12288.1, 60 sec: 11176.3, 300 sec: 11968.6). Total num frames: 15089664. Throughput: 0: 2717.9. Samples: 3771886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:50:24,062][15827] Avg episode reward: [(0, '4.440')] -[2025-08-29 18:50:25,808][19393] Updated weights for policy 0, policy_version 3690 (0.0012) -[2025-08-29 18:50:29,061][15827] Fps is (10 sec: 13107.2, 60 sec: 11264.0, 300 sec: 11927.0). Total num frames: 15151104. Throughput: 0: 2737.4. Samples: 3782356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:50:29,062][15827] Avg episode reward: [(0, '4.444')] -[2025-08-29 18:50:29,377][19393] Updated weights for policy 0, policy_version 3700 (0.0018) -[2025-08-29 18:50:33,244][19393] Updated weights for policy 0, policy_version 3710 (0.0022) -[2025-08-29 18:50:34,061][15827] Fps is (10 sec: 11058.9, 60 sec: 11059.1, 300 sec: 11843.7). Total num frames: 15200256. Throughput: 0: 2670.3. Samples: 3798732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:50:34,064][15827] Avg episode reward: [(0, '4.367')] -[2025-08-29 18:50:39,062][15827] Fps is (10 sec: 6143.5, 60 sec: 10239.8, 300 sec: 11635.4). Total num frames: 15212544. Throughput: 0: 2418.6. Samples: 3805268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:50:39,070][15827] Avg episode reward: [(0, '4.363')] -[2025-08-29 18:50:41,940][19393] Updated weights for policy 0, policy_version 3720 (0.0028) -[2025-08-29 18:50:44,061][15827] Fps is (10 sec: 5324.9, 60 sec: 9830.4, 300 sec: 11692.4). Total num frames: 15253504. Throughput: 0: 2305.2. Samples: 3809924. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:50:44,063][15827] Avg episode reward: [(0, '4.269')] -[2025-08-29 18:50:46,611][19393] Updated weights for policy 0, policy_version 3730 (0.0025) -[2025-08-29 18:50:49,061][15827] Fps is (10 sec: 8602.3, 60 sec: 9557.3, 300 sec: 11621.5). Total num frames: 15298560. Throughput: 0: 2399.1. Samples: 3823116. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:50:49,063][15827] Avg episode reward: [(0, '4.348')] -[2025-08-29 18:50:50,433][19393] Updated weights for policy 0, policy_version 3740 (0.0017) -[2025-08-29 18:50:53,943][19393] Updated weights for policy 0, policy_version 3750 (0.0019) -[2025-08-29 18:50:54,061][15827] Fps is (10 sec: 10649.6, 60 sec: 9557.3, 300 sec: 11579.9). Total num frames: 15360000. Throughput: 0: 2449.4. Samples: 3840012. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:50:54,063][15827] Avg episode reward: [(0, '4.309')] -[2025-08-29 18:50:57,404][19393] Updated weights for policy 0, policy_version 3760 (0.0017) -[2025-08-29 18:50:59,061][15827] Fps is (10 sec: 11468.8, 60 sec: 9420.8, 300 sec: 11524.3). Total num frames: 15413248. Throughput: 0: 2478.7. Samples: 3848952. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:50:59,063][15827] Avg episode reward: [(0, '4.472')] -[2025-08-29 18:51:00,931][19393] Updated weights for policy 0, policy_version 3770 (0.0014) -[2025-08-29 18:51:04,060][15827] Fps is (10 sec: 11468.9, 60 sec: 10171.8, 300 sec: 11468.8). Total num frames: 15474688. Throughput: 0: 2529.2. Samples: 3866500. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:51:04,062][15827] Avg episode reward: [(0, '4.568')] -[2025-08-29 18:51:04,375][19393] Updated weights for policy 0, policy_version 3780 (0.0014) -[2025-08-29 18:51:07,584][19393] Updated weights for policy 0, policy_version 3790 (0.0014) -[2025-08-29 18:51:09,061][15827] Fps is (10 sec: 12288.0, 60 sec: 10308.3, 300 sec: 11427.1). Total num frames: 15536128. Throughput: 0: 2512.0. Samples: 3884928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:51:09,063][15827] Avg episode reward: [(0, '4.354')] -[2025-08-29 18:51:14,749][15827] Fps is (10 sec: 8047.4, 60 sec: 9786.4, 300 sec: 11248.2). Total num frames: 15560704. Throughput: 0: 2245.0. Samples: 3884928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:51:14,751][15827] Avg episode reward: [(0, '4.332')] -[2025-08-29 18:51:15,017][19393] Updated weights for policy 0, policy_version 3800 (0.0019) -[2025-08-29 18:51:18,275][19393] Updated weights for policy 0, policy_version 3810 (0.0013) -[2025-08-29 18:51:19,060][15827] Fps is (10 sec: 7782.5, 60 sec: 9898.7, 300 sec: 11232.8). Total num frames: 15613952. Throughput: 0: 2259.3. Samples: 3900398. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:51:19,062][15827] Avg episode reward: [(0, '4.571')] -[2025-08-29 18:51:21,995][19393] Updated weights for policy 0, policy_version 3820 (0.0017) -[2025-08-29 18:51:24,060][15827] Fps is (10 sec: 11437.3, 60 sec: 9625.6, 300 sec: 11343.8). Total num frames: 15667200. Throughput: 0: 2482.0. Samples: 3916956. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:51:24,063][15827] Avg episode reward: [(0, '4.306')] -[2025-08-29 18:51:25,630][19393] Updated weights for policy 0, policy_version 3830 (0.0016) -[2025-08-29 18:51:28,923][19393] Updated weights for policy 0, policy_version 3840 (0.0015) -[2025-08-29 18:51:29,061][15827] Fps is (10 sec: 11468.1, 60 sec: 9625.5, 300 sec: 11316.0). Total num frames: 15728640. Throughput: 0: 2586.4. Samples: 3926314. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:51:29,063][15827] Avg episode reward: [(0, '4.300')] -[2025-08-29 18:51:32,796][19393] Updated weights for policy 0, policy_version 3850 (0.0017) -[2025-08-29 18:51:34,061][15827] Fps is (10 sec: 11468.7, 60 sec: 9693.9, 300 sec: 11246.6). Total num frames: 15781888. Throughput: 0: 2671.7. Samples: 3943342. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:51:34,062][15827] Avg episode reward: [(0, '4.462')] -[2025-08-29 18:51:36,027][19393] Updated weights for policy 0, policy_version 3860 (0.0015) -[2025-08-29 18:51:38,931][19393] Updated weights for policy 0, policy_version 3870 (0.0012) -[2025-08-29 18:51:39,060][15827] Fps is (10 sec: 12288.9, 60 sec: 10649.8, 300 sec: 11232.8). Total num frames: 15851520. Throughput: 0: 2720.2. Samples: 3962418. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:51:39,061][15827] Avg episode reward: [(0, '4.436')] -[2025-08-29 18:51:39,066][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003870_15851520.pth... -[2025-08-29 18:51:39,176][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003216_13172736.pth -[2025-08-29 18:51:41,644][19393] Updated weights for policy 0, policy_version 3880 (0.0011) -[2025-08-29 18:51:44,060][15827] Fps is (10 sec: 13926.5, 60 sec: 11127.5, 300 sec: 11218.9). 
Total num frames: 15921152. Throughput: 0: 2780.1. Samples: 3974058. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:51:44,062][15827] Avg episode reward: [(0, '4.390')] -[2025-08-29 18:51:44,906][19393] Updated weights for policy 0, policy_version 3890 (0.0016) -[2025-08-29 18:51:50,584][15827] Fps is (10 sec: 8530.5, 60 sec: 10585.6, 300 sec: 11023.1). Total num frames: 15949824. Throughput: 0: 2518.1. Samples: 3983652. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:51:50,586][15827] Avg episode reward: [(0, '4.694')] -[2025-08-29 18:51:52,172][19393] Updated weights for policy 0, policy_version 3900 (0.0014) -[2025-08-29 18:51:54,061][15827] Fps is (10 sec: 7372.8, 60 sec: 10581.3, 300 sec: 10996.7). Total num frames: 15994880. Throughput: 0: 2531.5. Samples: 3998846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:51:54,062][15827] Avg episode reward: [(0, '4.501')] -[2025-08-29 18:51:55,449][19393] Updated weights for policy 0, policy_version 3910 (0.0014) -[2025-08-29 18:51:58,820][19393] Updated weights for policy 0, policy_version 3920 (0.0011) -[2025-08-29 18:51:59,061][15827] Fps is (10 sec: 12562.7, 60 sec: 10717.8, 300 sec: 11149.4). Total num frames: 16056320. Throughput: 0: 2789.1. Samples: 4008518. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:51:59,065][15827] Avg episode reward: [(0, '4.402')] -[2025-08-29 18:52:02,471][19393] Updated weights for policy 0, policy_version 3930 (0.0013) -[2025-08-29 18:52:04,060][15827] Fps is (10 sec: 11878.5, 60 sec: 10649.6, 300 sec: 11135.6). Total num frames: 16113664. Throughput: 0: 2784.8. Samples: 4025716. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:52:04,062][15827] Avg episode reward: [(0, '4.423')] -[2025-08-29 18:52:05,810][19393] Updated weights for policy 0, policy_version 3940 (0.0021) -[2025-08-29 18:52:09,061][15827] Fps is (10 sec: 11469.3, 60 sec: 10581.3, 300 sec: 11093.9). Total num frames: 16171008. Throughput: 0: 2809.1. Samples: 4043366. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:52:09,063][15827] Avg episode reward: [(0, '4.553')] -[2025-08-29 18:52:09,547][19393] Updated weights for policy 0, policy_version 3950 (0.0017) -[2025-08-29 18:52:13,361][19393] Updated weights for policy 0, policy_version 3960 (0.0018) -[2025-08-29 18:52:14,060][15827] Fps is (10 sec: 11059.2, 60 sec: 11187.6, 300 sec: 11066.1). Total num frames: 16224256. Throughput: 0: 2786.3. Samples: 4051694. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:52:14,062][15827] Avg episode reward: [(0, '4.276')] -[2025-08-29 18:52:17,061][19393] Updated weights for policy 0, policy_version 3970 (0.0018) -[2025-08-29 18:52:19,060][15827] Fps is (10 sec: 11059.7, 60 sec: 11127.5, 300 sec: 11052.3). Total num frames: 16281600. Throughput: 0: 2771.6. Samples: 4068064. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:52:19,061][15827] Avg episode reward: [(0, '4.533')] -[2025-08-29 18:52:20,575][19393] Updated weights for policy 0, policy_version 3980 (0.0017) -[2025-08-29 18:52:26,426][15827] Fps is (10 sec: 8612.0, 60 sec: 10639.6, 300 sec: 10909.2). Total num frames: 16330752. Throughput: 0: 2415.1. Samples: 4076814. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:52:26,428][15827] Avg episode reward: [(0, '4.270')] -[2025-08-29 18:52:27,201][19393] Updated weights for policy 0, policy_version 3990 (0.0011) -[2025-08-29 18:52:29,061][15827] Fps is (10 sec: 8191.8, 60 sec: 10581.4, 300 sec: 10913.4). 
Total num frames: 16363520. Throughput: 0: 2462.5. Samples: 4084872. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:52:29,063][15827] Avg episode reward: [(0, '4.417')] -[2025-08-29 18:52:30,437][19393] Updated weights for policy 0, policy_version 4000 (0.0014) -[2025-08-29 18:52:33,476][19393] Updated weights for policy 0, policy_version 4010 (0.0016) -[2025-08-29 18:52:34,060][15827] Fps is (10 sec: 12877.1, 60 sec: 10786.1, 300 sec: 11064.3). Total num frames: 16429056. Throughput: 0: 2767.7. Samples: 4103980. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:52:34,062][15827] Avg episode reward: [(0, '4.551')] -[2025-08-29 18:52:36,707][19393] Updated weights for policy 0, policy_version 4020 (0.0015) -[2025-08-29 18:52:39,060][15827] Fps is (10 sec: 13107.5, 60 sec: 10717.8, 300 sec: 11066.1). Total num frames: 16494592. Throughput: 0: 2767.5. Samples: 4123384. Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0) -[2025-08-29 18:52:39,062][15827] Avg episode reward: [(0, '4.204')] -[2025-08-29 18:52:39,893][19393] Updated weights for policy 0, policy_version 4030 (0.0014) -[2025-08-29 18:52:43,160][19393] Updated weights for policy 0, policy_version 4040 (0.0012) -[2025-08-29 18:52:44,060][15827] Fps is (10 sec: 13107.3, 60 sec: 10649.6, 300 sec: 11066.1). Total num frames: 16560128. Throughput: 0: 2772.0. Samples: 4133258. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:52:44,061][15827] Avg episode reward: [(0, '4.444')] -[2025-08-29 18:52:46,408][19393] Updated weights for policy 0, policy_version 4050 (0.0014) -[2025-08-29 18:52:49,061][15827] Fps is (10 sec: 12287.5, 60 sec: 11417.3, 300 sec: 11038.4). Total num frames: 16617472. Throughput: 0: 2814.8. Samples: 4152382. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:52:49,063][15827] Avg episode reward: [(0, '4.343')] -[2025-08-29 18:52:49,937][19393] Updated weights for policy 0, policy_version 4060 (0.0017) -[2025-08-29 18:52:53,778][19393] Updated weights for policy 0, policy_version 4070 (0.0023) -[2025-08-29 18:52:54,060][15827] Fps is (10 sec: 11468.9, 60 sec: 11332.3, 300 sec: 10996.7). Total num frames: 16674816. Throughput: 0: 2782.0. Samples: 4168554. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:52:54,062][15827] Avg episode reward: [(0, '4.238')] -[2025-08-29 18:52:56,986][19393] Updated weights for policy 0, policy_version 4080 (0.0015) -[2025-08-29 18:53:02,255][15827] Fps is (10 sec: 9002.5, 60 sec: 10759.5, 300 sec: 10865.2). Total num frames: 16736256. Throughput: 0: 2620.3. Samples: 4177978. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:53:02,257][15827] Avg episode reward: [(0, '4.580')] -[2025-08-29 18:53:03,481][19393] Updated weights for policy 0, policy_version 4090 (0.0011) -[2025-08-29 18:53:04,061][15827] Fps is (10 sec: 8191.8, 60 sec: 10717.8, 300 sec: 10830.1). Total num frames: 16756736. Throughput: 0: 2628.3. Samples: 4186338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0) -[2025-08-29 18:53:04,062][15827] Avg episode reward: [(0, '4.269')] -[2025-08-29 18:53:06,200][19393] Updated weights for policy 0, policy_version 4100 (0.0014) -[2025-08-29 18:53:09,061][15827] Fps is (10 sec: 13241.4, 60 sec: 10922.7, 300 sec: 10983.9). Total num frames: 16826368. Throughput: 0: 3057.6. Samples: 4207174. 
Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0) -[2025-08-29 18:53:09,062][15827] Avg episode reward: [(0, '4.358')] -[2025-08-29 18:53:09,527][19393] Updated weights for policy 0, policy_version 4110 (0.0011) -[2025-08-29 18:53:12,941][19393] Updated weights for policy 0, policy_version 4120 (0.0017) -[2025-08-29 18:53:14,060][15827] Fps is (10 sec: 12697.9, 60 sec: 10991.0, 300 sec: 10969.0). Total num frames: 16883712. Throughput: 0: 2906.8. Samples: 4215678. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:53:14,062][15827] Avg episode reward: [(0, '4.316')] -[2025-08-29 18:53:16,061][19393] Updated weights for policy 0, policy_version 4130 (0.0016) -[2025-08-29 18:53:19,061][15827] Fps is (10 sec: 12288.1, 60 sec: 11127.4, 300 sec: 10955.1). Total num frames: 16949248. Throughput: 0: 2904.0. Samples: 4234660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:53:19,063][15827] Avg episode reward: [(0, '4.329')] -[2025-08-29 18:53:19,406][19393] Updated weights for policy 0, policy_version 4140 (0.0015) -[2025-08-29 18:53:22,802][19393] Updated weights for policy 0, policy_version 4150 (0.0018) -[2025-08-29 18:53:24,061][15827] Fps is (10 sec: 12287.8, 60 sec: 11726.4, 300 sec: 10941.2). Total num frames: 17006592. Throughput: 0: 2873.2. Samples: 4252678. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:53:24,062][15827] Avg episode reward: [(0, '4.336')] -[2025-08-29 18:53:26,333][19393] Updated weights for policy 0, policy_version 4160 (0.0016) -[2025-08-29 18:53:29,060][15827] Fps is (10 sec: 12288.2, 60 sec: 11810.2, 300 sec: 10927.3). Total num frames: 17072128. Throughput: 0: 2850.8. Samples: 4261544. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:53:29,062][15827] Avg episode reward: [(0, '4.376')] -[2025-08-29 18:53:29,540][19393] Updated weights for policy 0, policy_version 4170 (0.0020) -[2025-08-29 18:53:32,712][19393] Updated weights for policy 0, policy_version 4180 (0.0014) -[2025-08-29 18:53:34,060][15827] Fps is (10 sec: 13107.3, 60 sec: 11810.1, 300 sec: 10913.4). Total num frames: 17137664. Throughput: 0: 2861.8. Samples: 4281162. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:53:34,062][15827] Avg episode reward: [(0, '4.387')] -[2025-08-29 18:53:39,060][15827] Fps is (10 sec: 8191.9, 60 sec: 10990.9, 300 sec: 10732.9). Total num frames: 17154048. Throughput: 0: 2671.9. Samples: 4288788. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:53:39,062][15827] Avg episode reward: [(0, '4.496')] -[2025-08-29 18:53:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004188_17154048.pth... -[2025-08-29 18:53:39,192][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003564_14598144.pth -[2025-08-29 18:53:39,528][19393] Updated weights for policy 0, policy_version 4190 (0.0011) -[2025-08-29 18:53:42,777][19393] Updated weights for policy 0, policy_version 4200 (0.0019) -[2025-08-29 18:53:44,060][15827] Fps is (10 sec: 8192.0, 60 sec: 10990.9, 300 sec: 10875.4). Total num frames: 17219584. Throughput: 0: 2883.8. Samples: 4298536. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:53:44,062][15827] Avg episode reward: [(0, '4.423')] -[2025-08-29 18:53:45,930][19393] Updated weights for policy 0, policy_version 4210 (0.0014) -[2025-08-29 18:53:49,039][19393] Updated weights for policy 0, policy_version 4220 (0.0012) -[2025-08-29 18:53:49,061][15827] Fps is (10 sec: 13107.0, 60 sec: 11127.5, 300 sec: 10899.5). Total num frames: 17285120. Throughput: 0: 2922.5. Samples: 4317852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:53:49,064][15827] Avg episode reward: [(0, '4.442')] -[2025-08-29 18:53:52,304][19393] Updated weights for policy 0, policy_version 4230 (0.0018) -[2025-08-29 18:53:54,061][15827] Fps is (10 sec: 12697.5, 60 sec: 11195.7, 300 sec: 10871.8). Total num frames: 17346560. Throughput: 0: 2886.4. Samples: 4337062. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:53:54,062][15827] Avg episode reward: [(0, '4.417')] -[2025-08-29 18:53:55,329][19393] Updated weights for policy 0, policy_version 4240 (0.0016) -[2025-08-29 18:53:58,559][19393] Updated weights for policy 0, policy_version 4250 (0.0015) -[2025-08-29 18:53:59,060][15827] Fps is (10 sec: 12697.8, 60 sec: 11897.5, 300 sec: 10871.8). Total num frames: 17412096. Throughput: 0: 2914.7. Samples: 4346840. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:53:59,062][15827] Avg episode reward: [(0, '4.444')] -[2025-08-29 18:54:01,873][19393] Updated weights for policy 0, policy_version 4260 (0.0016) -[2025-08-29 18:54:04,061][15827] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 10857.9). Total num frames: 17477632. Throughput: 0: 2909.7. Samples: 4365598. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:54:04,063][15827] Avg episode reward: [(0, '4.350')] -[2025-08-29 18:54:04,970][19393] Updated weights for policy 0, policy_version 4270 (0.0014) -[2025-08-29 18:54:08,300][19393] Updated weights for policy 0, policy_version 4280 (0.0016) -[2025-08-29 18:54:09,060][15827] Fps is (10 sec: 12697.6, 60 sec: 11878.4, 300 sec: 10857.9). Total num frames: 17539072. Throughput: 0: 2939.1. Samples: 4384936. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:54:09,062][15827] Avg episode reward: [(0, '4.394')] -[2025-08-29 18:54:14,060][15827] Fps is (10 sec: 8192.1, 60 sec: 11264.0, 300 sec: 10719.0). Total num frames: 17559552. Throughput: 0: 2866.7. Samples: 4390544. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:54:14,062][15827] Avg episode reward: [(0, '4.301')] -[2025-08-29 18:54:15,212][19393] Updated weights for policy 0, policy_version 4290 (0.0014) -[2025-08-29 18:54:18,429][19393] Updated weights for policy 0, policy_version 4300 (0.0017) -[2025-08-29 18:54:19,060][15827] Fps is (10 sec: 8192.1, 60 sec: 11195.8, 300 sec: 10850.4). Total num frames: 17620992. Throughput: 0: 2685.5. Samples: 4402008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:54:19,062][15827] Avg episode reward: [(0, '4.280')] -[2025-08-29 18:54:21,708][19393] Updated weights for policy 0, policy_version 4310 (0.0015) -[2025-08-29 18:54:24,060][15827] Fps is (10 sec: 12288.1, 60 sec: 11264.0, 300 sec: 10871.8). Total num frames: 17682432. Throughput: 0: 2935.3. Samples: 4420878. 
Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:54:24,062][15827] Avg episode reward: [(0, '4.485')] -[2025-08-29 18:54:24,851][19393] Updated weights for policy 0, policy_version 4320 (0.0015) -[2025-08-29 18:54:27,964][19393] Updated weights for policy 0, policy_version 4330 (0.0016) -[2025-08-29 18:54:29,061][15827] Fps is (10 sec: 12696.9, 60 sec: 11263.9, 300 sec: 10885.6). Total num frames: 17747968. Throughput: 0: 2936.5. Samples: 4430682. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:54:29,063][15827] Avg episode reward: [(0, '4.383')] -[2025-08-29 18:54:31,209][19393] Updated weights for policy 0, policy_version 4340 (0.0014) -[2025-08-29 18:54:34,060][15827] Fps is (10 sec: 13107.3, 60 sec: 11264.0, 300 sec: 10899.5). Total num frames: 17813504. Throughput: 0: 2934.6. Samples: 4449910. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:54:34,062][15827] Avg episode reward: [(0, '4.478')] -[2025-08-29 18:54:34,480][19393] Updated weights for policy 0, policy_version 4350 (0.0020) -[2025-08-29 18:54:37,716][19393] Updated weights for policy 0, policy_version 4360 (0.0016) -[2025-08-29 18:54:39,060][15827] Fps is (10 sec: 12698.3, 60 sec: 12014.9, 300 sec: 10885.6). Total num frames: 17874944. Throughput: 0: 2927.4. Samples: 4468794. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:54:39,062][15827] Avg episode reward: [(0, '4.622')] -[2025-08-29 18:54:40,989][19393] Updated weights for policy 0, policy_version 4370 (0.0016) -[2025-08-29 18:54:44,061][15827] Fps is (10 sec: 12287.6, 60 sec: 11946.6, 300 sec: 10885.6). Total num frames: 17936384. Throughput: 0: 2921.8. Samples: 4478322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:54:44,062][15827] Avg episode reward: [(0, '4.503')] -[2025-08-29 18:54:44,207][19393] Updated weights for policy 0, policy_version 4380 (0.0015) -[2025-08-29 18:54:49,747][15827] Fps is (10 sec: 8432.6, 60 sec: 11204.2, 300 sec: 10749.6). Total num frames: 17965056. Throughput: 0: 2671.1. Samples: 4487632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:54:49,749][15827] Avg episode reward: [(0, '4.626')] -[2025-08-29 18:54:50,867][19393] Updated weights for policy 0, policy_version 4390 (0.0014) -[2025-08-29 18:54:54,061][15827] Fps is (10 sec: 8192.2, 60 sec: 11195.7, 300 sec: 10746.8). Total num frames: 18018304. Throughput: 0: 2663.0. Samples: 4504772. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:54:54,062][15827] Avg episode reward: [(0, '4.377')] -[2025-08-29 18:54:54,130][19393] Updated weights for policy 0, policy_version 4400 (0.0013) -[2025-08-29 18:54:57,586][19393] Updated weights for policy 0, policy_version 4410 (0.0018) -[2025-08-29 18:54:59,060][15827] Fps is (10 sec: 12313.6, 60 sec: 11127.5, 300 sec: 10899.5). Total num frames: 18079744. Throughput: 0: 2744.4. Samples: 4514040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:54:59,062][15827] Avg episode reward: [(0, '4.470')] -[2025-08-29 18:55:00,949][19393] Updated weights for policy 0, policy_version 4420 (0.0015) -[2025-08-29 18:55:04,061][15827] Fps is (10 sec: 11878.3, 60 sec: 10990.9, 300 sec: 10913.4). Total num frames: 18137088. Throughput: 0: 2891.4. Samples: 4532122. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:55:04,062][15827] Avg episode reward: [(0, '4.506')] -[2025-08-29 18:55:04,425][19393] Updated weights for policy 0, policy_version 4430 (0.0016) -[2025-08-29 18:55:07,704][19393] Updated weights for policy 0, policy_version 4440 (0.0016) -[2025-08-29 18:55:09,060][15827] Fps is (10 sec: 12288.1, 60 sec: 11059.2, 300 sec: 10969.0). Total num frames: 18202624. Throughput: 0: 2887.4. Samples: 4550812. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:55:09,062][15827] Avg episode reward: [(0, '4.333')] -[2025-08-29 18:55:10,939][19393] Updated weights for policy 0, policy_version 4450 (0.0019) -[2025-08-29 18:55:14,060][15827] Fps is (10 sec: 12697.9, 60 sec: 11741.9, 300 sec: 10996.7). Total num frames: 18264064. Throughput: 0: 2877.1. Samples: 4560152. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:55:14,062][15827] Avg episode reward: [(0, '4.228')] -[2025-08-29 18:55:14,146][19393] Updated weights for policy 0, policy_version 4460 (0.0015) -[2025-08-29 18:55:17,364][19393] Updated weights for policy 0, policy_version 4470 (0.0015) -[2025-08-29 18:55:19,060][15827] Fps is (10 sec: 12288.0, 60 sec: 11741.9, 300 sec: 10969.0). Total num frames: 18325504. Throughput: 0: 2866.0. Samples: 4578880. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:55:19,062][15827] Avg episode reward: [(0, '4.319')] -[2025-08-29 18:55:21,036][19393] Updated weights for policy 0, policy_version 4480 (0.0018) -[2025-08-29 18:55:25,582][15827] Fps is (10 sec: 8176.5, 60 sec: 10985.4, 300 sec: 10816.0). Total num frames: 18358272. Throughput: 0: 2545.5. Samples: 4587214. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:55:25,584][15827] Avg episode reward: [(0, '4.504')] -[2025-08-29 18:55:27,875][19393] Updated weights for policy 0, policy_version 4490 (0.0017) -[2025-08-29 18:55:29,061][15827] Fps is (10 sec: 7781.9, 60 sec: 10922.6, 300 sec: 10857.9). Total num frames: 18403328. Throughput: 0: 2575.4. Samples: 4594218. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:55:29,063][15827] Avg episode reward: [(0, '4.278')] -[2025-08-29 18:55:31,110][19393] Updated weights for policy 0, policy_version 4500 (0.0013) -[2025-08-29 18:55:34,060][15827] Fps is (10 sec: 12561.1, 60 sec: 10854.4, 300 sec: 11024.5). Total num frames: 18464768. Throughput: 0: 2841.7. Samples: 4613560. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:55:34,062][15827] Avg episode reward: [(0, '4.289')] -[2025-08-29 18:55:34,305][19393] Updated weights for policy 0, policy_version 4510 (0.0016) -[2025-08-29 18:55:37,806][19393] Updated weights for policy 0, policy_version 4520 (0.0017) -[2025-08-29 18:55:39,061][15827] Fps is (10 sec: 12288.6, 60 sec: 10854.4, 300 sec: 11093.9). Total num frames: 18526208. Throughput: 0: 2820.7. Samples: 4631706. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:55:39,062][15827] Avg episode reward: [(0, '4.602')] -[2025-08-29 18:55:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004523_18526208.pth... -[2025-08-29 18:55:39,175][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003870_15851520.pth -[2025-08-29 18:55:41,247][19393] Updated weights for policy 0, policy_version 4530 (0.0014) -[2025-08-29 18:55:44,061][15827] Fps is (10 sec: 12287.9, 60 sec: 10854.4, 300 sec: 11149.5). 
Total num frames: 18587648. Throughput: 0: 2823.0. Samples: 4641076. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:55:44,062][15827] Avg episode reward: [(0, '4.691')] -[2025-08-29 18:55:44,375][19393] Updated weights for policy 0, policy_version 4540 (0.0011) -[2025-08-29 18:55:47,592][19393] Updated weights for policy 0, policy_version 4550 (0.0016) -[2025-08-29 18:55:49,061][15827] Fps is (10 sec: 12287.8, 60 sec: 11532.4, 300 sec: 11149.4). Total num frames: 18649088. Throughput: 0: 2849.9. Samples: 4660366. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:55:49,062][15827] Avg episode reward: [(0, '4.329')] -[2025-08-29 18:55:50,891][19393] Updated weights for policy 0, policy_version 4560 (0.0017) -[2025-08-29 18:55:54,061][15827] Fps is (10 sec: 12697.5, 60 sec: 11605.3, 300 sec: 11191.1). Total num frames: 18714624. Throughput: 0: 2848.1. Samples: 4678976. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:55:54,064][15827] Avg episode reward: [(0, '4.301')] -[2025-08-29 18:55:54,276][19393] Updated weights for policy 0, policy_version 4570 (0.0018) -[2025-08-29 18:55:57,529][19393] Updated weights for policy 0, policy_version 4580 (0.0016) -[2025-08-29 18:56:01,419][15827] Fps is (10 sec: 9280.1, 60 sec: 10969.3, 300 sec: 11061.0). Total num frames: 18763776. Throughput: 0: 2706.2. Samples: 4688316. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:56:01,421][15827] Avg episode reward: [(0, '4.334')] -[2025-08-29 18:56:04,061][15827] Fps is (10 sec: 8192.1, 60 sec: 10990.9, 300 sec: 11052.3). Total num frames: 18796544. Throughput: 0: 2595.7. Samples: 4695686. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:56:04,063][15827] Avg episode reward: [(0, '4.438')] -[2025-08-29 18:56:04,395][19393] Updated weights for policy 0, policy_version 4590 (0.0020) -[2025-08-29 18:56:08,339][19393] Updated weights for policy 0, policy_version 4600 (0.0016) -[2025-08-29 18:56:09,061][15827] Fps is (10 sec: 11256.4, 60 sec: 10786.1, 300 sec: 11175.5). Total num frames: 18849792. Throughput: 0: 2882.8. Samples: 4712554. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:56:09,065][15827] Avg episode reward: [(0, '4.339')] -[2025-08-29 18:56:12,259][19393] Updated weights for policy 0, policy_version 4610 (0.0016) -[2025-08-29 18:56:14,061][15827] Fps is (10 sec: 10240.0, 60 sec: 10581.3, 300 sec: 11135.6). Total num frames: 18898944. Throughput: 0: 2801.7. Samples: 4720294. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:56:14,063][15827] Avg episode reward: [(0, '4.443')] -[2025-08-29 18:56:15,923][19393] Updated weights for policy 0, policy_version 4620 (0.0021) -[2025-08-29 18:56:19,060][15827] Fps is (10 sec: 11059.6, 60 sec: 10581.3, 300 sec: 11163.3). Total num frames: 18960384. Throughput: 0: 2759.0. Samples: 4737716. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:56:19,062][15827] Avg episode reward: [(0, '4.249')] -[2025-08-29 18:56:19,114][19393] Updated weights for policy 0, policy_version 4630 (0.0013) -[2025-08-29 18:56:23,041][19393] Updated weights for policy 0, policy_version 4640 (0.0016) -[2025-08-29 18:56:24,061][15827] Fps is (10 sec: 11468.8, 60 sec: 11206.9, 300 sec: 11135.6). Total num frames: 19013632. Throughput: 0: 2725.2. Samples: 4754342. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:56:24,062][15827] Avg episode reward: [(0, '4.492')] -[2025-08-29 18:56:26,487][19393] Updated weights for policy 0, policy_version 4650 (0.0018) -[2025-08-29 18:56:29,060][15827] Fps is (10 sec: 11468.8, 60 sec: 11195.8, 300 sec: 11163.3). Total num frames: 19075072. Throughput: 0: 2714.0. Samples: 4763208. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:56:29,062][15827] Avg episode reward: [(0, '4.501')] -[2025-08-29 18:56:29,836][19393] Updated weights for policy 0, policy_version 4660 (0.0012) -[2025-08-29 18:56:33,212][19393] Updated weights for policy 0, policy_version 4670 (0.0018) -[2025-08-29 18:56:37,249][15827] Fps is (10 sec: 9006.4, 60 sec: 10565.9, 300 sec: 11002.7). Total num frames: 19132416. Throughput: 0: 2512.1. Samples: 4781420. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:56:37,250][15827] Avg episode reward: [(0, '4.494')] -[2025-08-29 18:56:39,061][15827] Fps is (10 sec: 7782.2, 60 sec: 10444.8, 300 sec: 10955.1). Total num frames: 19152896. Throughput: 0: 2439.8. Samples: 4788768. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:56:39,064][15827] Avg episode reward: [(0, '4.466')] -[2025-08-29 18:56:40,452][19393] Updated weights for policy 0, policy_version 4680 (0.0013) -[2025-08-29 18:56:44,060][15827] Fps is (10 sec: 10824.8, 60 sec: 10308.3, 300 sec: 11095.7). Total num frames: 19206144. Throughput: 0: 2532.5. Samples: 4796306. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:56:44,062][15827] Avg episode reward: [(0, '4.523')] -[2025-08-29 18:56:44,612][19393] Updated weights for policy 0, policy_version 4690 (0.0019) -[2025-08-29 18:56:47,933][19393] Updated weights for policy 0, policy_version 4700 (0.0016) -[2025-08-29 18:56:49,061][15827] Fps is (10 sec: 11468.9, 60 sec: 10308.3, 300 sec: 11093.9). Total num frames: 19267584. Throughput: 0: 2609.3. Samples: 4813104. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) -[2025-08-29 18:56:49,062][15827] Avg episode reward: [(0, '4.397')] -[2025-08-29 18:56:50,953][19393] Updated weights for policy 0, policy_version 4710 (0.0012) -[2025-08-29 18:56:54,060][15827] Fps is (10 sec: 11878.3, 60 sec: 10171.8, 300 sec: 11080.1). Total num frames: 19324928. Throughput: 0: 2652.6. Samples: 4831922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:56:54,062][15827] Avg episode reward: [(0, '4.393')] -[2025-08-29 18:56:54,637][19393] Updated weights for policy 0, policy_version 4720 (0.0016) -[2025-08-29 18:56:58,134][19393] Updated weights for policy 0, policy_version 4730 (0.0017) -[2025-08-29 18:56:59,061][15827] Fps is (10 sec: 11878.6, 60 sec: 10801.2, 300 sec: 11093.9). Total num frames: 19386368. Throughput: 0: 2657.2. Samples: 4839868. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) -[2025-08-29 18:56:59,062][15827] Avg episode reward: [(0, '4.449')] -[2025-08-29 18:57:01,496][19393] Updated weights for policy 0, policy_version 4740 (0.0016) -[2025-08-29 18:57:04,061][15827] Fps is (10 sec: 12287.4, 60 sec: 10854.3, 300 sec: 11107.8). Total num frames: 19447808. Throughput: 0: 2679.1. Samples: 4858278. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:57:04,065][15827] Avg episode reward: [(0, '4.389')] -[2025-08-29 18:57:05,043][19393] Updated weights for policy 0, policy_version 4750 (0.0015) -[2025-08-29 18:57:09,061][15827] Fps is (10 sec: 10649.5, 60 sec: 10717.9, 300 sec: 11080.0). Total num frames: 19492864. Throughput: 0: 2647.8. Samples: 4873494. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) -[2025-08-29 18:57:09,062][15827] Avg episode reward: [(0, '4.320')] -[2025-08-29 18:57:09,195][19393] Updated weights for policy 0, policy_version 4760 (0.0016) -[2025-08-29 18:57:14,060][15827] Fps is (10 sec: 6554.0, 60 sec: 10240.0, 300 sec: 10955.1). Total num frames: 19513344. Throughput: 0: 2522.0. Samples: 4876700. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:57:14,062][15827] Avg episode reward: [(0, '4.520')] -[2025-08-29 18:57:15,580][19393] Updated weights for policy 0, policy_version 4770 (0.0012) -[2025-08-29 18:57:18,405][19393] Updated weights for policy 0, policy_version 4780 (0.0012) -[2025-08-29 18:57:19,061][15827] Fps is (10 sec: 9420.7, 60 sec: 10444.8, 300 sec: 11127.6). Total num frames: 19587072. Throughput: 0: 2663.0. Samples: 4892764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) -[2025-08-29 18:57:19,063][15827] Avg episode reward: [(0, '4.555')] -[2025-08-29 18:57:21,525][19393] Updated weights for policy 0, policy_version 4790 (0.0019) -[2025-08-29 18:57:24,060][15827] Fps is (10 sec: 12697.6, 60 sec: 10444.8, 300 sec: 11107.8). Total num frames: 19640320. Throughput: 0: 2704.9. Samples: 4910486. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) -[2025-08-29 18:57:24,062][15827] Avg episode reward: [(0, '4.335')] -[2025-08-29 18:57:25,678][19393] Updated weights for policy 0, policy_version 4800 (0.0017) -[2025-08-29 18:57:29,060][15827] Fps is (10 sec: 11059.5, 60 sec: 10376.5, 300 sec: 11080.0). Total num frames: 19697664. Throughput: 0: 2715.7. Samples: 4918512. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:57:29,062][15827] Avg episode reward: [(0, '4.454')] -[2025-08-29 18:57:29,103][19393] Updated weights for policy 0, policy_version 4810 (0.0016) -[2025-08-29 18:57:32,512][19393] Updated weights for policy 0, policy_version 4820 (0.0015) -[2025-08-29 18:57:34,061][15827] Fps is (10 sec: 11878.3, 60 sec: 11031.1, 300 sec: 11066.1). Total num frames: 19759104. Throughput: 0: 2754.0. Samples: 4937032. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:57:34,062][15827] Avg episode reward: [(0, '4.382')] -[2025-08-29 18:57:35,908][19393] Updated weights for policy 0, policy_version 4830 (0.0013) -[2025-08-29 18:57:39,061][15827] Fps is (10 sec: 12287.8, 60 sec: 11127.5, 300 sec: 11052.3). Total num frames: 19820544. Throughput: 0: 2741.8. Samples: 4955304. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:57:39,062][15827] Avg episode reward: [(0, '4.406')] -[2025-08-29 18:57:39,069][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004839_19820544.pth... -[2025-08-29 18:57:39,165][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004188_17154048.pth -[2025-08-29 18:57:39,298][19393] Updated weights for policy 0, policy_version 4840 (0.0013) -[2025-08-29 18:57:42,799][19393] Updated weights for policy 0, policy_version 4850 (0.0015) -[2025-08-29 18:57:44,061][15827] Fps is (10 sec: 11878.2, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 19877888. Throughput: 0: 2761.8. Samples: 4964148. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) -[2025-08-29 18:57:44,062][15827] Avg episode reward: [(0, '4.303')] -[2025-08-29 18:57:49,061][15827] Fps is (10 sec: 7782.3, 60 sec: 10513.1, 300 sec: 10927.3). Total num frames: 19898368. Throughput: 0: 2612.6. Samples: 4975844. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:57:49,062][15827] Avg episode reward: [(0, '4.298')] -[2025-08-29 18:57:49,732][19393] Updated weights for policy 0, policy_version 4860 (0.0014) -[2025-08-29 18:57:52,918][19393] Updated weights for policy 0, policy_version 4870 (0.0016) -[2025-08-29 18:57:54,061][15827] Fps is (10 sec: 7782.5, 60 sec: 10513.0, 300 sec: 11032.9). Total num frames: 19955712. Throughput: 0: 2584.3. Samples: 4989786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) -[2025-08-29 18:57:54,063][15827] Avg episode reward: [(0, '4.506')] -[2025-08-29 18:57:57,044][19393] Updated weights for policy 0, policy_version 4880 (0.0020) -[2025-08-29 18:57:58,132][19378] Stopping Batcher_0... -[2025-08-29 18:57:58,137][19378] Loop batcher_evt_loop terminating... -[2025-08-29 18:57:58,141][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 18:57:58,141][15827] Component Batcher_0 stopped! -[2025-08-29 18:57:58,237][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004523_18526208.pth -[2025-08-29 18:57:58,251][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 18:57:58,403][15827] Component RolloutWorker_w8 stopped! -[2025-08-29 18:57:58,421][15827] Component RolloutWorker_w1 stopped! -[2025-08-29 18:57:58,424][19378] Stopping LearnerWorker_p0... -[2025-08-29 18:57:58,424][19378] Loop learner_proc0_evt_loop terminating... -[2025-08-29 18:57:58,424][15827] Component LearnerWorker_p0 stopped! -[2025-08-29 18:57:58,426][15827] Component RolloutWorker_w7 stopped! -[2025-08-29 18:57:58,432][15827] Component RolloutWorker_w4 stopped! -[2025-08-29 18:57:58,408][19401] Stopping RolloutWorker_w8... -[2025-08-29 18:57:58,437][15827] Component RolloutWorker_w5 stopped! -[2025-08-29 18:57:58,441][19401] Loop rollout_proc8_evt_loop terminating... -[2025-08-29 18:57:58,453][15827] Component RolloutWorker_w9 stopped! -[2025-08-29 18:57:58,433][19400] Stopping RolloutWorker_w7... -[2025-08-29 18:57:58,432][19397] Stopping RolloutWorker_w1... -[2025-08-29 18:57:58,462][19400] Loop rollout_proc7_evt_loop terminating... -[2025-08-29 18:57:58,462][19397] Loop rollout_proc1_evt_loop terminating... -[2025-08-29 18:57:58,460][15827] Component RolloutWorker_w3 stopped! -[2025-08-29 18:57:58,444][19398] Stopping RolloutWorker_w4... -[2025-08-29 18:57:58,443][19402] Stopping RolloutWorker_w5... -[2025-08-29 18:57:58,458][19403] Stopping RolloutWorker_w9... -[2025-08-29 18:57:58,475][19398] Loop rollout_proc4_evt_loop terminating... -[2025-08-29 18:57:58,476][19402] Loop rollout_proc5_evt_loop terminating... -[2025-08-29 18:57:58,479][19403] Loop rollout_proc9_evt_loop terminating... -[2025-08-29 18:57:58,488][15827] Component RolloutWorker_w0 stopped! -[2025-08-29 18:57:58,527][19393] Weights refcount: 2 0 -[2025-08-29 18:57:58,495][19394] Stopping RolloutWorker_w0... -[2025-08-29 18:57:58,537][19394] Loop rollout_proc0_evt_loop terminating... -[2025-08-29 18:57:58,579][19393] Stopping InferenceWorker_p0-w0... -[2025-08-29 18:57:58,579][19393] Loop inference_proc0-0_evt_loop terminating... -[2025-08-29 18:57:58,580][15827] Component InferenceWorker_p0-w0 stopped! -[2025-08-29 18:57:58,587][15827] Component RolloutWorker_w6 stopped! -[2025-08-29 18:57:58,590][19399] Stopping RolloutWorker_w6... 
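Note: each "Saving .../checkpoint_p0/checkpoint_..." entry above is paired with a "Removing ..." of an older file, consistent with a keep-last-N checkpoint policy (keep_checkpoints=2 appears in the configuration dumped later in this log). The filename encodes the policy version and the total env frames at save time; a minimal sketch (filename taken from the log; the 4096-frames-per-version figure is an observation about this run, not a documented invariant):

import re

# checkpoint_000004839_19820544.pth -> policy version 4839, 19,820,544 env frames
name = "checkpoint_000004839_19820544.pth"
version, frames = map(int, re.match(r"checkpoint_(\d+)_(\d+)\.pth", name).groups())
print(version, frames)   # 4839 19820544
print(frames / version)  # 4096.0 -- one policy update per 4096 env frames here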
-[2025-08-29 18:57:58,601][15827] Component RolloutWorker_w2 stopped!
-[2025-08-29 18:57:58,603][19399] Loop rollout_proc6_evt_loop terminating...
-[2025-08-29 18:57:58,604][15827] Waiting for process learner_proc0 to stop...
-[2025-08-29 18:57:58,608][19396] Stopping RolloutWorker_w2...
-[2025-08-29 18:57:58,629][19396] Loop rollout_proc2_evt_loop terminating...
-[2025-08-29 18:57:58,485][19395] Stopping RolloutWorker_w3...
-[2025-08-29 18:57:58,639][19395] Loop rollout_proc3_evt_loop terminating...
-[2025-08-29 18:58:10,820][15827] Waiting for process inference_proc0-0 to join...
-[2025-08-29 18:58:10,822][15827] Waiting for process rollout_proc0 to join...
-[2025-08-29 18:58:10,823][15827] Waiting for process rollout_proc1 to join...
-[2025-08-29 18:58:10,824][15827] Waiting for process rollout_proc2 to join...
-[2025-08-29 18:58:10,825][15827] Waiting for process rollout_proc3 to join...
-[2025-08-29 18:58:10,826][15827] Waiting for process rollout_proc4 to join...
-[2025-08-29 18:58:10,827][15827] Waiting for process rollout_proc5 to join...
-[2025-08-29 18:58:10,828][15827] Waiting for process rollout_proc6 to join...
-[2025-08-29 18:58:10,829][15827] Waiting for process rollout_proc7 to join...
-[2025-08-29 18:58:10,830][15827] Waiting for process rollout_proc8 to join...
-[2025-08-29 18:58:10,831][15827] Waiting for process rollout_proc9 to join...
-[2025-08-29 18:58:10,833][15827] Batcher 0 profile tree view:
-batching: 96.9923, releasing_batches: 0.2179
-[2025-08-29 18:58:10,835][15827] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0000
-  wait_policy_total: 17.7003
-update_model: 26.5437
-  weight_update: 0.0018
-one_step: 0.0047
-  handle_policy_step: 1571.9646
-    deserialize: 60.5948, stack: 8.7009, obs_to_device_normalize: 436.3278, forward: 658.4755, send_messages: 111.6453
-    prepare_outputs: 239.6883
-      to_cpu: 179.7071
-[2025-08-29 18:58:10,839][15827] Learner 0 profile tree view:
-misc: 0.0255, prepare_batch: 66.1001
-train: 223.9064
-  epoch_init: 0.0215, minibatch_init: 0.0330, losses_postprocess: 2.7224, kl_divergence: 3.1769, after_optimizer: 77.3884
-  calculate_losses: 82.4285
-    losses_init: 0.0122, forward_head: 5.8426, bptt_initial: 53.8262, tail: 3.9175, advantages_returns: 1.2866, losses: 8.5309
-    bptt: 7.9398
-      bptt_forward_core: 7.5254
-  update: 55.5816
-    clip: 5.1826
-[2025-08-29 18:58:10,841][15827] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.2804, enqueue_policy_requests: 43.9031, env_step: 387.9207, overhead: 32.5997, complete_rollouts: 0.8219
-save_policy_outputs: 45.3418
-  split_output_tensors: 14.6080
-[2025-08-29 18:58:10,844][15827] RolloutWorker_w9 profile tree view:
-wait_for_trajectories: 0.2721, enqueue_policy_requests: 36.4583, env_step: 371.5879, overhead: 28.0753, complete_rollouts: 0.8656
-save_policy_outputs: 43.8599
-  split_output_tensors: 17.6693
-[2025-08-29 18:58:10,847][15827] Loop Runner_EvtLoop terminating...
-[2025-08-29 18:58:10,850][15827] Runner profile tree view:
-main_loop: 1702.4607
-[2025-08-29 18:58:10,851][15827] Collected {0: 20004864}, FPS: 11750.6
-[2025-08-29 18:58:26,518][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-29 18:58:26,520][15827] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-29 18:58:26,520][15827] Adding new argument 'no_render'=True that is not in the saved config file!
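Note: the Runner summary above reports main_loop: 1702.4607 seconds and "Collected {0: 20004864}, FPS: 11750.6". The final FPS figure is simply total env frames divided by main-loop wall time; a quick sanity check:

total_frames = 20_004_864       # "Collected {0: 20004864}"
main_loop_seconds = 1702.4607   # Runner profile: main_loop
print(f"{total_frames / main_loop_seconds:.1f}")  # -> 11750.6, as logged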
-[2025-08-29 18:58:26,521][15827] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-08-29 18:58:26,521][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 18:58:26,522][15827] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-08-29 18:58:26,524][15827] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 18:58:26,524][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-08-29 18:58:26,525][15827] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2025-08-29 18:58:26,526][15827] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2025-08-29 18:58:26,527][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-08-29 18:58:26,527][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-29 18:58:26,528][15827] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-08-29 18:58:26,528][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-08-29 18:58:26,529][15827] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-29 18:58:26,933][15827] Doom resolution: 160x120, resize resolution: (128, 72) -[2025-08-29 18:58:26,960][15827] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 18:58:27,019][15827] RunningMeanStd input shape: (1,) -[2025-08-29 18:58:27,266][15827] ConvEncoder: input_channels=3 -[2025-08-29 18:58:28,061][15827] Conv encoder output size: 512 -[2025-08-29 18:58:28,064][15827] Policy head output size: 512 -[2025-08-29 18:58:29,963][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 18:58:34,432][15827] Num frames 100... -[2025-08-29 18:58:34,626][15827] Num frames 200... -[2025-08-29 18:58:34,961][15827] Num frames 300... -[2025-08-29 18:58:35,261][15827] Num frames 400... -[2025-08-29 18:58:35,542][15827] Num frames 500... -[2025-08-29 18:58:35,789][15827] Avg episode rewards: #0: 8.760, true rewards: #0: 5.760 -[2025-08-29 18:58:35,790][15827] Avg episode reward: 8.760, avg true_objective: 5.760 -[2025-08-29 18:58:35,846][15827] Num frames 600... -[2025-08-29 18:58:36,074][15827] Num frames 700... -[2025-08-29 18:58:36,308][15827] Num frames 800... -[2025-08-29 18:58:36,555][15827] Avg episode rewards: #0: 6.320, true rewards: #0: 4.320 -[2025-08-29 18:58:36,556][15827] Avg episode reward: 6.320, avg true_objective: 4.320 -[2025-08-29 18:58:36,683][15827] Num frames 900... -[2025-08-29 18:58:37,000][15827] Num frames 1000... -[2025-08-29 18:58:37,343][15827] Num frames 1100... -[2025-08-29 18:58:37,647][15827] Num frames 1200... -[2025-08-29 18:58:37,826][15827] Avg episode rewards: #0: 5.493, true rewards: #0: 4.160 -[2025-08-29 18:58:37,828][15827] Avg episode reward: 5.493, avg true_objective: 4.160 -[2025-08-29 18:58:37,954][15827] Num frames 1300... -[2025-08-29 18:58:38,236][15827] Num frames 1400... -[2025-08-29 18:58:38,573][15827] Num frames 1500... -[2025-08-29 18:58:38,908][15827] Num frames 1600... 
-[2025-08-29 18:58:39,306][15827] Avg episode rewards: #0: 5.490, true rewards: #0: 4.240 -[2025-08-29 18:58:39,308][15827] Avg episode reward: 5.490, avg true_objective: 4.240 -[2025-08-29 18:58:39,322][15827] Num frames 1700... -[2025-08-29 18:58:39,769][15827] Num frames 1800... -[2025-08-29 18:58:40,129][15827] Num frames 1900... -[2025-08-29 18:58:40,501][15827] Num frames 2000... -[2025-08-29 18:58:40,845][15827] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 -[2025-08-29 18:58:40,848][15827] Avg episode reward: 5.160, avg true_objective: 4.160 -[2025-08-29 18:58:40,922][15827] Num frames 2100... -[2025-08-29 18:58:41,159][15827] Num frames 2200... -[2025-08-29 18:58:41,421][15827] Num frames 2300... -[2025-08-29 18:58:41,701][15827] Num frames 2400... -[2025-08-29 18:58:41,971][15827] Avg episode rewards: #0: 4.940, true rewards: #0: 4.107 -[2025-08-29 18:58:41,972][15827] Avg episode reward: 4.940, avg true_objective: 4.107 -[2025-08-29 18:58:42,061][15827] Num frames 2500... -[2025-08-29 18:58:42,353][15827] Num frames 2600... -[2025-08-29 18:58:42,639][15827] Num frames 2700... -[2025-08-29 18:58:42,880][15827] Num frames 2800... -[2025-08-29 18:58:43,072][15827] Avg episode rewards: #0: 4.783, true rewards: #0: 4.069 -[2025-08-29 18:58:43,073][15827] Avg episode reward: 4.783, avg true_objective: 4.069 -[2025-08-29 18:58:43,219][15827] Num frames 2900... -[2025-08-29 18:58:43,468][15827] Num frames 3000... -[2025-08-29 18:58:43,725][15827] Num frames 3100... -[2025-08-29 18:58:43,973][15827] Num frames 3200... -[2025-08-29 18:58:44,026][15827] Avg episode rewards: #0: 4.750, true rewards: #0: 4.000 -[2025-08-29 18:58:44,028][15827] Avg episode reward: 4.750, avg true_objective: 4.000 -[2025-08-29 18:58:44,298][15827] Num frames 3300... -[2025-08-29 18:58:44,561][15827] Num frames 3400... -[2025-08-29 18:58:44,816][15827] Num frames 3500... -[2025-08-29 18:58:45,133][15827] Avg episode rewards: #0: 4.649, true rewards: #0: 3.982 -[2025-08-29 18:58:45,134][15827] Avg episode reward: 4.649, avg true_objective: 3.982 -[2025-08-29 18:58:45,181][15827] Num frames 3600... -[2025-08-29 18:58:45,399][15827] Num frames 3700... -[2025-08-29 18:58:45,637][15827] Num frames 3800... -[2025-08-29 18:58:45,900][15827] Num frames 3900... -[2025-08-29 18:58:46,160][15827] Num frames 4000... -[2025-08-29 18:58:46,294][15827] Avg episode rewards: #0: 4.732, true rewards: #0: 4.032 -[2025-08-29 18:58:46,295][15827] Avg episode reward: 4.732, avg true_objective: 4.032 -[2025-08-29 18:58:53,528][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! -[2025-08-29 18:58:53,559][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 18:58:53,560][15827] Overriding arg 'num_workers' with value 1 passed from command line -[2025-08-29 18:58:53,561][15827] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-08-29 18:58:53,562][15827] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-08-29 18:58:53,563][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 18:58:53,564][15827] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-08-29 18:58:53,565][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
-[2025-08-29 18:58:53,566][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-08-29 18:58:53,567][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2025-08-29 18:58:53,568][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2025-08-29 18:58:53,569][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-08-29 18:58:53,570][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-29 18:58:53,571][15827] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-08-29 18:58:53,572][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-08-29 18:58:53,573][15827] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-29 18:58:53,594][15827] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 18:58:53,596][15827] RunningMeanStd input shape: (1,) -[2025-08-29 18:58:53,629][15827] ConvEncoder: input_channels=3 -[2025-08-29 18:58:53,687][15827] Conv encoder output size: 512 -[2025-08-29 18:58:53,688][15827] Policy head output size: 512 -[2025-08-29 18:58:53,726][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 18:58:54,271][15827] Num frames 100... -[2025-08-29 18:58:54,485][15827] Num frames 200... -[2025-08-29 18:58:54,638][15827] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 -[2025-08-29 18:58:54,639][15827] Avg episode reward: 2.560, avg true_objective: 2.560 -[2025-08-29 18:58:54,716][15827] Num frames 300... -[2025-08-29 18:58:54,899][15827] Num frames 400... -[2025-08-29 18:58:55,080][15827] Num frames 500... -[2025-08-29 18:58:55,292][15827] Num frames 600... -[2025-08-29 18:58:55,463][15827] Avg episode rewards: #0: 3.200, true rewards: #0: 3.200 -[2025-08-29 18:58:55,464][15827] Avg episode reward: 3.200, avg true_objective: 3.200 -[2025-08-29 18:58:55,597][15827] Num frames 700... -[2025-08-29 18:58:55,842][15827] Num frames 800... -[2025-08-29 18:58:56,068][15827] Num frames 900... -[2025-08-29 18:58:56,242][15827] Num frames 1000... -[2025-08-29 18:58:56,338][15827] Avg episode rewards: #0: 3.413, true rewards: #0: 3.413 -[2025-08-29 18:58:56,340][15827] Avg episode reward: 3.413, avg true_objective: 3.413 -[2025-08-29 18:58:56,481][15827] Num frames 1100... -[2025-08-29 18:58:56,650][15827] Num frames 1200... -[2025-08-29 18:58:56,815][15827] Num frames 1300... -[2025-08-29 18:58:56,974][15827] Num frames 1400... -[2025-08-29 18:59:00,683][15827] Avg episode rewards: #0: 3.850, true rewards: #0: 3.600 -[2025-08-29 18:59:00,685][15827] Avg episode reward: 3.850, avg true_objective: 3.600 -[2025-08-29 18:59:00,793][15827] Num frames 1500... -[2025-08-29 18:59:00,973][15827] Num frames 1600... -[2025-08-29 18:59:01,188][15827] Avg episode rewards: #0: 3.592, true rewards: #0: 3.392 -[2025-08-29 18:59:01,189][15827] Avg episode reward: 3.592, avg true_objective: 3.392 -[2025-08-29 18:59:01,198][15827] Num frames 1700... -[2025-08-29 18:59:01,373][15827] Num frames 1800... -[2025-08-29 18:59:01,558][15827] Num frames 1900... -[2025-08-29 18:59:01,798][15827] Num frames 2000... 
-[2025-08-29 18:59:02,067][15827] Avg episode rewards: #0: 3.633, true rewards: #0: 3.467 -[2025-08-29 18:59:02,069][15827] Avg episode reward: 3.633, avg true_objective: 3.467 -[2025-08-29 18:59:02,118][15827] Num frames 2100... -[2025-08-29 18:59:02,355][15827] Num frames 2200... -[2025-08-29 18:59:02,549][15827] Num frames 2300... -[2025-08-29 18:59:02,764][15827] Num frames 2400... -[2025-08-29 18:59:02,956][15827] Num frames 2500... -[2025-08-29 18:59:03,068][15827] Avg episode rewards: #0: 3.897, true rewards: #0: 3.611 -[2025-08-29 18:59:03,070][15827] Avg episode reward: 3.897, avg true_objective: 3.611 -[2025-08-29 18:59:03,308][15827] Num frames 2600... -[2025-08-29 18:59:03,502][15827] Num frames 2700... -[2025-08-29 18:59:03,702][15827] Num frames 2800... -[2025-08-29 18:59:03,933][15827] Num frames 2900... -[2025-08-29 18:59:04,014][15827] Avg episode rewards: #0: 3.890, true rewards: #0: 3.640 -[2025-08-29 18:59:04,017][15827] Avg episode reward: 3.890, avg true_objective: 3.640 -[2025-08-29 18:59:04,198][15827] Num frames 3000... -[2025-08-29 18:59:04,386][15827] Num frames 3100... -[2025-08-29 18:59:04,597][15827] Num frames 3200... -[2025-08-29 18:59:04,866][15827] Avg episode rewards: #0: 3.884, true rewards: #0: 3.662 -[2025-08-29 18:59:04,868][15827] Avg episode reward: 3.884, avg true_objective: 3.662 -[2025-08-29 18:59:04,885][15827] Num frames 3300... -[2025-08-29 18:59:05,157][15827] Num frames 3400... -[2025-08-29 18:59:05,351][15827] Num frames 3500... -[2025-08-29 18:59:05,535][15827] Num frames 3600... -[2025-08-29 18:59:05,710][15827] Num frames 3700... -[2025-08-29 18:59:05,837][15827] Avg episode rewards: #0: 4.044, true rewards: #0: 3.744 -[2025-08-29 18:59:05,838][15827] Avg episode reward: 4.044, avg true_objective: 3.744 -[2025-08-29 18:59:10,753][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! -[2025-08-29 19:00:41,771][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 19:00:41,772][15827] Overriding arg 'num_workers' with value 1 passed from command line -[2025-08-29 19:00:41,773][15827] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-08-29 19:00:41,774][15827] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-08-29 19:00:41,774][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 19:00:41,775][15827] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-08-29 19:00:41,775][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2025-08-29 19:00:41,776][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-08-29 19:00:41,777][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2025-08-29 19:00:41,777][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2025-08-29 19:00:41,778][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-08-29 19:00:41,779][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-29 19:00:41,780][15827] Adding new argument 'train_script'=None that is not in the saved config file! 
-[2025-08-29 19:00:41,780][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-08-29 19:00:41,782][15827] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-29 19:00:41,825][15827] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 19:00:41,827][15827] RunningMeanStd input shape: (1,) -[2025-08-29 19:00:41,839][15827] ConvEncoder: input_channels=3 -[2025-08-29 19:00:41,870][15827] Conv encoder output size: 512 -[2025-08-29 19:00:41,871][15827] Policy head output size: 512 -[2025-08-29 19:00:41,891][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 19:00:42,317][15827] Num frames 100... -[2025-08-29 19:00:42,514][15827] Num frames 200... -[2025-08-29 19:00:42,711][15827] Num frames 300... -[2025-08-29 19:00:42,888][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2025-08-29 19:00:42,889][15827] Avg episode reward: 3.840, avg true_objective: 3.840 -[2025-08-29 19:00:42,925][15827] Num frames 400... -[2025-08-29 19:00:43,102][15827] Num frames 500... -[2025-08-29 19:00:43,317][15827] Num frames 600... -[2025-08-29 19:00:43,510][15827] Num frames 700... -[2025-08-29 19:00:43,697][15827] Num frames 800... -[2025-08-29 19:00:43,750][15827] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 -[2025-08-29 19:00:43,751][15827] Avg episode reward: 4.500, avg true_objective: 4.000 -[2025-08-29 19:00:43,990][15827] Num frames 900... -[2025-08-29 19:00:44,196][15827] Num frames 1000... -[2025-08-29 19:00:44,392][15827] Num frames 1100... -[2025-08-29 19:00:48,193][15827] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 -[2025-08-29 19:00:48,194][15827] Avg episode reward: 4.280, avg true_objective: 3.947 -[2025-08-29 19:00:48,231][15827] Num frames 1200... -[2025-08-29 19:00:48,436][15827] Num frames 1300... -[2025-08-29 19:00:48,618][15827] Num frames 1400... -[2025-08-29 19:00:48,808][15827] Num frames 1500... -[2025-08-29 19:00:48,984][15827] Num frames 1600... -[2025-08-29 19:00:49,096][15827] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 -[2025-08-29 19:00:49,097][15827] Avg episode reward: 4.580, avg true_objective: 4.080 -[2025-08-29 19:00:49,223][15827] Num frames 1700... -[2025-08-29 19:00:49,443][15827] Num frames 1800... -[2025-08-29 19:00:49,699][15827] Num frames 1900... -[2025-08-29 19:00:49,977][15827] Num frames 2000... -[2025-08-29 19:00:50,084][15827] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 -[2025-08-29 19:00:50,085][15827] Avg episode reward: 4.432, avg true_objective: 4.032 -[2025-08-29 19:00:50,302][15827] Num frames 2100... -[2025-08-29 19:00:50,569][15827] Num frames 2200... -[2025-08-29 19:00:50,777][15827] Num frames 2300... -[2025-08-29 19:00:50,964][15827] Num frames 2400... -[2025-08-29 19:00:51,075][15827] Avg episode rewards: #0: 4.553, true rewards: #0: 4.053 -[2025-08-29 19:00:51,076][15827] Avg episode reward: 4.553, avg true_objective: 4.053 -[2025-08-29 19:00:51,213][15827] Num frames 2500... -[2025-08-29 19:00:51,430][15827] Num frames 2600... -[2025-08-29 19:00:51,706][15827] Num frames 2700... -[2025-08-29 19:00:51,958][15827] Num frames 2800... -[2025-08-29 19:00:52,062][15827] Avg episode rewards: #0: 4.451, true rewards: #0: 4.023 -[2025-08-29 19:00:52,063][15827] Avg episode reward: 4.451, avg true_objective: 4.023 -[2025-08-29 19:00:52,335][15827] Num frames 2900... 
-[2025-08-29 19:00:52,623][15827] Num frames 3000... -[2025-08-29 19:00:52,851][15827] Num frames 3100... -[2025-08-29 19:00:53,055][15827] Num frames 3200... -[2025-08-29 19:00:53,108][15827] Avg episode rewards: #0: 4.375, true rewards: #0: 4.000 -[2025-08-29 19:00:53,109][15827] Avg episode reward: 4.375, avg true_objective: 4.000 -[2025-08-29 19:00:53,327][15827] Num frames 3300... -[2025-08-29 19:00:53,537][15827] Num frames 3400... -[2025-08-29 19:00:53,767][15827] Num frames 3500... -[2025-08-29 19:00:53,984][15827] Avg episode rewards: #0: 4.316, true rewards: #0: 3.982 -[2025-08-29 19:00:53,986][15827] Avg episode reward: 4.316, avg true_objective: 3.982 -[2025-08-29 19:00:54,026][15827] Num frames 3600... -[2025-08-29 19:00:54,276][15827] Num frames 3700... -[2025-08-29 19:00:54,534][15827] Num frames 3800... -[2025-08-29 19:00:54,815][15827] Num frames 3900... -[2025-08-29 19:00:55,101][15827] Num frames 4000... -[2025-08-29 19:00:55,153][15827] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000 -[2025-08-29 19:00:55,155][15827] Avg episode reward: 4.300, avg true_objective: 4.000 -[2025-08-29 19:01:01,058][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! -[2025-08-29 19:03:50,532][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 19:03:50,534][15827] Overriding arg 'num_workers' with value 1 passed from command line -[2025-08-29 19:03:50,535][15827] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-08-29 19:03:50,536][15827] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-08-29 19:03:50,536][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 19:03:50,537][15827] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-08-29 19:03:50,538][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2025-08-29 19:03:50,538][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-08-29 19:03:50,539][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2025-08-29 19:03:50,540][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2025-08-29 19:03:50,541][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-08-29 19:03:50,542][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-29 19:03:50,542][15827] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-08-29 19:03:50,543][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
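Note: the "Overriding arg"/"Adding new argument" blocks in this log are the evaluation script's CLI flags being merged into the saved training config. A plausible reconstruction of the invocation behind them (the entry-point module is assumed from sample-factory's sf_examples layout; the actual notebook cell is not part of this log):

import subprocess

# Hypothetical reconstruction -- only the flag values are taken from the log.
subprocess.run([
    "python", "-m", "sf_examples.vizdoom.enjoy_vizdoom",
    "--env=doom_health_gathering_supreme",
    "--experiment=default_experiment",
    "--num_workers=1",        # "Overriding arg 'num_workers' with value 1"
    "--no_render",            # no_render=True
    "--save_video",           # save_video=True
    "--max_num_episodes=10",  # max_num_episodes=10
    "--push_to_hub",          # push_to_hub=True
    "--hf_repository=turbo-maikol/rl_course_vizdoom_health_gathering_supreme",
], check=True)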
-[2025-08-29 19:03:50,545][15827] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-29 19:03:50,580][15827] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 19:03:50,581][15827] RunningMeanStd input shape: (1,) -[2025-08-29 19:03:50,598][15827] ConvEncoder: input_channels=3 -[2025-08-29 19:03:50,631][15827] Conv encoder output size: 512 -[2025-08-29 19:03:50,633][15827] Policy head output size: 512 -[2025-08-29 19:03:50,668][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 19:03:51,206][15827] Num frames 100... -[2025-08-29 19:03:51,458][15827] Num frames 200... -[2025-08-29 19:03:51,669][15827] Num frames 300... -[2025-08-29 19:03:51,877][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2025-08-29 19:03:51,878][15827] Avg episode reward: 3.840, avg true_objective: 3.840 -[2025-08-29 19:03:51,913][15827] Num frames 400... -[2025-08-29 19:03:52,094][15827] Num frames 500... -[2025-08-29 19:03:52,281][15827] Num frames 600... -[2025-08-29 19:03:52,475][15827] Num frames 700... -[2025-08-29 19:03:52,659][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2025-08-29 19:03:52,660][15827] Avg episode reward: 3.840, avg true_objective: 3.840 -[2025-08-29 19:03:52,730][15827] Num frames 800... -[2025-08-29 19:03:52,915][15827] Num frames 900... -[2025-08-29 19:03:53,117][15827] Num frames 1000... -[2025-08-29 19:03:53,320][15827] Num frames 1100... -[2025-08-29 19:03:53,512][15827] Num frames 1200... -[2025-08-29 19:03:53,605][15827] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 -[2025-08-29 19:03:53,607][15827] Avg episode reward: 4.387, avg true_objective: 4.053 -[2025-08-29 19:03:53,780][15827] Num frames 1300... -[2025-08-29 19:03:53,963][15827] Num frames 1400... -[2025-08-29 19:03:54,157][15827] Num frames 1500... -[2025-08-29 19:03:54,360][15827] Num frames 1600... -[2025-08-29 19:03:54,413][15827] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 -[2025-08-29 19:03:54,414][15827] Avg episode reward: 4.250, avg true_objective: 4.000 -[2025-08-29 19:03:54,630][15827] Num frames 1700... -[2025-08-29 19:03:54,865][15827] Num frames 1800... -[2025-08-29 19:03:55,049][15827] Num frames 1900... -[2025-08-29 19:03:55,269][15827] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 -[2025-08-29 19:03:55,270][15827] Avg episode reward: 4.168, avg true_objective: 3.968 -[2025-08-29 19:03:55,308][15827] Num frames 2000... -[2025-08-29 19:03:55,509][15827] Num frames 2100... -[2025-08-29 19:03:55,701][15827] Num frames 2200... -[2025-08-29 19:03:55,882][15827] Num frames 2300... -[2025-08-29 19:03:56,084][15827] Num frames 2400... -[2025-08-29 19:03:56,217][15827] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 -[2025-08-29 19:03:56,219][15827] Avg episode reward: 4.387, avg true_objective: 4.053 -[2025-08-29 19:03:56,364][15827] Num frames 2500... -[2025-08-29 19:03:56,573][15827] Num frames 2600... -[2025-08-29 19:03:56,768][15827] Num frames 2700... -[2025-08-29 19:03:56,968][15827] Num frames 2800... -[2025-08-29 19:03:57,059][15827] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 -[2025-08-29 19:03:57,061][15827] Avg episode reward: 4.309, avg true_objective: 4.023 -[2025-08-29 19:03:57,226][15827] Num frames 2900... -[2025-08-29 19:03:57,433][15827] Num frames 3000... -[2025-08-29 19:03:57,631][15827] Num frames 3100... -[2025-08-29 19:03:57,824][15827] Num frames 3200... 
-[2025-08-29 19:03:57,943][15827] Avg episode rewards: #0: 4.415, true rewards: #0: 4.040 -[2025-08-29 19:03:57,944][15827] Avg episode reward: 4.415, avg true_objective: 4.040 -[2025-08-29 19:03:58,104][15827] Num frames 3300... -[2025-08-29 19:03:58,348][15827] Num frames 3400... -[2025-08-29 19:03:58,555][15827] Num frames 3500... -[2025-08-29 19:03:58,761][15827] Num frames 3600... -[2025-08-29 19:03:58,970][15827] Avg episode rewards: #0: 4.533, true rewards: #0: 4.089 -[2025-08-29 19:03:58,971][15827] Avg episode reward: 4.533, avg true_objective: 4.089 -[2025-08-29 19:03:59,017][15827] Num frames 3700... -[2025-08-29 19:03:59,202][15827] Num frames 3800... -[2025-08-29 19:03:59,394][15827] Num frames 3900... -[2025-08-29 19:03:59,590][15827] Num frames 4000... -[2025-08-29 19:03:59,798][15827] Avg episode rewards: #0: 4.596, true rewards: #0: 4.096 -[2025-08-29 19:03:59,799][15827] Avg episode reward: 4.596, avg true_objective: 4.096 -[2025-08-29 19:04:05,558][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! -[2025-08-29 19:06:00,917][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 19:06:00,918][15827] Overriding arg 'num_workers' with value 1 passed from command line -[2025-08-29 19:06:00,919][15827] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-08-29 19:06:00,920][15827] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-08-29 19:06:00,922][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 19:06:00,924][15827] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-08-29 19:06:00,925][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2025-08-29 19:06:00,926][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-08-29 19:06:00,927][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2025-08-29 19:06:00,927][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2025-08-29 19:06:00,928][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-08-29 19:06:00,929][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-29 19:06:00,929][15827] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-08-29 19:06:00,930][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-08-29 19:06:00,931][15827] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-29 19:06:00,957][15827] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 19:06:00,959][15827] RunningMeanStd input shape: (1,) -[2025-08-29 19:06:00,971][15827] ConvEncoder: input_channels=3 -[2025-08-29 19:06:01,002][15827] Conv encoder output size: 512 -[2025-08-29 19:06:01,003][15827] Policy head output size: 512 -[2025-08-29 19:06:01,045][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 19:06:01,719][15827] Num frames 100... -[2025-08-29 19:06:01,917][15827] Num frames 200... 
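Note: within each evaluation pass, every "Avg episode rewards" line is a running mean over the episodes finished so far, so consecutive averages recover the individual episode returns. Using the true-reward averages from the 19:03 pass above (3.840, 3.840, 4.053, 4.000, 3.968 after episodes 1-5):

# n*mean_n - (n-1)*mean_{n-1} gives episode n's return (approximate, since
# the logged means are rounded to three decimals).
running_avgs = [3.840, 3.840, 4.053, 4.000, 3.968]
prev_sum = 0.0
for n, avg in enumerate(running_avgs, start=1):
    print(f"episode {n}: {n * avg - prev_sum:.2f}")
    prev_sum = n * avg
# -> 3.84, 3.84, 4.48, 3.84, 3.84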
-[2025-08-29 19:06:02,156][15827] Num frames 300... -[2025-08-29 19:06:02,427][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2025-08-29 19:06:02,428][15827] Avg episode reward: 3.840, avg true_objective: 3.840 -[2025-08-29 19:06:02,474][15827] Num frames 400... -[2025-08-29 19:06:02,683][15827] Num frames 500... -[2025-08-29 19:06:02,881][15827] Num frames 600... -[2025-08-29 19:06:03,003][15827] Num frames 700... -[2025-08-29 19:06:03,153][15827] Num frames 800... -[2025-08-29 19:06:03,253][15827] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 -[2025-08-29 19:06:03,254][15827] Avg episode reward: 4.660, avg true_objective: 4.160 -[2025-08-29 19:06:03,358][15827] Num frames 900... -[2025-08-29 19:06:03,509][15827] Num frames 1000... -[2025-08-29 19:06:03,679][15827] Num frames 1100... -[2025-08-29 19:06:03,794][15827] Num frames 1200... -[2025-08-29 19:06:03,867][15827] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 -[2025-08-29 19:06:03,868][15827] Avg episode reward: 4.387, avg true_objective: 4.053 -[2025-08-29 19:06:04,053][15827] Num frames 1300... -[2025-08-29 19:06:04,204][15827] Num frames 1400... -[2025-08-29 19:06:04,377][15827] Num frames 1500... -[2025-08-29 19:06:04,537][15827] Num frames 1600... -[2025-08-29 19:06:04,685][15827] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 -[2025-08-29 19:06:04,686][15827] Avg episode reward: 4.660, avg true_objective: 4.160 -[2025-08-29 19:06:04,750][15827] Num frames 1700... -[2025-08-29 19:06:04,890][15827] Num frames 1800... -[2025-08-29 19:06:05,043][15827] Num frames 1900... -[2025-08-29 19:06:05,178][15827] Num frames 2000... -[2025-08-29 19:06:05,329][15827] Num frames 2100... -[2025-08-29 19:06:05,400][15827] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224 -[2025-08-29 19:06:05,401][15827] Avg episode reward: 4.824, avg true_objective: 4.224 -[2025-08-29 19:06:05,581][15827] Num frames 2200... -[2025-08-29 19:06:05,717][15827] Num frames 2300... -[2025-08-29 19:06:05,860][15827] Num frames 2400... -[2025-08-29 19:06:06,052][15827] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 -[2025-08-29 19:06:06,053][15827] Avg episode reward: 4.660, avg true_objective: 4.160 -[2025-08-29 19:06:06,068][15827] Num frames 2500... -[2025-08-29 19:06:06,206][15827] Num frames 2600... -[2025-08-29 19:06:06,417][15827] Num frames 2700... -[2025-08-29 19:06:06,581][15827] Num frames 2800... -[2025-08-29 19:06:06,746][15827] Avg episode rewards: #0: 4.543, true rewards: #0: 4.114 -[2025-08-29 19:06:06,747][15827] Avg episode reward: 4.543, avg true_objective: 4.114 -[2025-08-29 19:06:06,795][15827] Num frames 2900... -[2025-08-29 19:06:10,630][15827] Num frames 3000... -[2025-08-29 19:06:10,769][15827] Num frames 3100... -[2025-08-29 19:06:10,957][15827] Num frames 3200... -[2025-08-29 19:06:11,107][15827] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 -[2025-08-29 19:06:11,108][15827] Avg episode reward: 4.455, avg true_objective: 4.080 -[2025-08-29 19:06:11,177][15827] Num frames 3300... -[2025-08-29 19:06:11,317][15827] Num frames 3400... -[2025-08-29 19:06:11,483][15827] Num frames 3500... -[2025-08-29 19:06:11,681][15827] Num frames 3600... -[2025-08-29 19:06:11,852][15827] Avg episode rewards: #0: 4.533, true rewards: #0: 4.089 -[2025-08-29 19:06:11,853][15827] Avg episode reward: 4.533, avg true_objective: 4.089 -[2025-08-29 19:06:11,920][15827] Num frames 3700... -[2025-08-29 19:06:12,073][15827] Num frames 3800... -[2025-08-29 19:06:12,202][15827] Num frames 3900... 
-[2025-08-29 19:06:12,483][15827] Num frames 4000... -[2025-08-29 19:06:12,731][15827] Avg episode rewards: #0: 4.464, true rewards: #0: 4.064 -[2025-08-29 19:06:12,733][15827] Avg episode reward: 4.464, avg true_objective: 4.064 -[2025-08-29 19:06:18,499][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! -[2025-08-29 19:06:22,890][15827] The model has been pushed to https://huggingface.co/turbo-maikol/rl_course_vizdoom_health_gathering_supreme -[2025-08-29 19:09:28,003][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 19:09:28,004][15827] Overriding arg 'num_workers' with value 1 passed from command line -[2025-08-29 19:09:28,004][15827] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-08-29 19:09:28,005][15827] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-08-29 19:09:28,006][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 19:09:28,006][15827] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-08-29 19:09:28,007][15827] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! -[2025-08-29 19:09:28,007][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-08-29 19:09:28,008][15827] Adding new argument 'push_to_hub'=False that is not in the saved config file! -[2025-08-29 19:09:28,008][15827] Adding new argument 'hf_repository'=None that is not in the saved config file! -[2025-08-29 19:09:28,009][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-08-29 19:09:28,010][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-08-29 19:09:28,011][15827] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-08-29 19:09:28,011][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-08-29 19:09:28,012][15827] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-08-29 19:09:28,068][15827] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 19:09:28,070][15827] RunningMeanStd input shape: (1,) -[2025-08-29 19:09:28,078][15827] ConvEncoder: input_channels=3 -[2025-08-29 19:09:28,110][15827] Conv encoder output size: 512 -[2025-08-29 19:09:28,111][15827] Policy head output size: 512 -[2025-08-29 19:09:28,147][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 19:09:28,678][15827] Num frames 100... -[2025-08-29 19:09:28,901][15827] Num frames 200... -[2025-08-29 19:09:29,082][15827] Num frames 300... -[2025-08-29 19:09:29,271][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 -[2025-08-29 19:09:29,273][15827] Avg episode reward: 3.840, avg true_objective: 3.840 -[2025-08-29 19:09:29,303][15827] Num frames 400... -[2025-08-29 19:09:29,488][15827] Num frames 500... -[2025-08-29 19:09:29,669][15827] Num frames 600... -[2025-08-29 19:09:29,866][15827] Num frames 700... -[2025-08-29 19:09:30,068][15827] Num frames 800... 
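Note: the repository pushed above (https://huggingface.co/turbo-maikol/rl_course_vizdoom_health_gathering_supreme) can be pulled back down with sample-factory's hub helper; a sketch, assuming the load_from_hub flags documented for sample-factory 2.x:

import subprocess

# Downloads the pushed experiment into a local train_dir ("-d" is assumed to
# name the destination directory, as in the sample-factory docs).
subprocess.run([
    "python", "-m", "sample_factory.huggingface.load_from_hub",
    "-r", "turbo-maikol/rl_course_vizdoom_health_gathering_supreme",
    "-d", "./train_dir",
], check=True)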
-[2025-08-29 19:09:30,279][15827] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320 -[2025-08-29 19:09:30,281][15827] Avg episode reward: 5.320, avg true_objective: 4.320 -[2025-08-29 19:09:30,350][15827] Num frames 900... -[2025-08-29 19:09:30,519][15827] Num frames 1000... -[2025-08-29 19:09:30,721][15827] Num frames 1100... -[2025-08-29 19:09:30,932][15827] Num frames 1200... -[2025-08-29 19:09:31,091][15827] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160 -[2025-08-29 19:09:31,093][15827] Avg episode reward: 4.827, avg true_objective: 4.160 -[2025-08-29 19:09:31,207][15827] Num frames 1300... -[2025-08-29 19:09:31,409][15827] Num frames 1400... -[2025-08-29 19:09:31,677][15827] Num frames 1500... -[2025-08-29 19:09:31,882][15827] Num frames 1600... -[2025-08-29 19:09:32,000][15827] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 -[2025-08-29 19:09:32,002][15827] Avg episode reward: 4.580, avg true_objective: 4.080 -[2025-08-29 19:09:32,154][15827] Num frames 1700... -[2025-08-29 19:09:32,364][15827] Num frames 1800... -[2025-08-29 19:09:32,596][15827] Avg episode rewards: #0: 4.176, true rewards: #0: 3.776 -[2025-08-29 19:09:32,597][15827] Avg episode reward: 4.176, avg true_objective: 3.776 -[2025-08-29 19:09:32,628][15827] Num frames 1900... -[2025-08-29 19:09:32,853][15827] Num frames 2000... -[2025-08-29 19:09:33,054][15827] Num frames 2100... -[2025-08-29 19:09:33,264][15827] Num frames 2200... -[2025-08-29 19:09:33,433][15827] Num frames 2300... -[2025-08-29 19:09:33,602][15827] Avg episode rewards: #0: 4.393, true rewards: #0: 3.893 -[2025-08-29 19:09:33,603][15827] Avg episode reward: 4.393, avg true_objective: 3.893 -[2025-08-29 19:09:33,741][15827] Num frames 2400... -[2025-08-29 19:09:33,951][15827] Num frames 2500... -[2025-08-29 19:09:34,199][15827] Num frames 2600... -[2025-08-29 19:09:34,376][15827] Num frames 2700... -[2025-08-29 19:09:34,540][15827] Avg episode rewards: #0: 4.549, true rewards: #0: 3.977 -[2025-08-29 19:09:34,541][15827] Avg episode reward: 4.549, avg true_objective: 3.977 -[2025-08-29 19:09:34,566][15827] Num frames 2800... -[2025-08-29 19:09:34,788][15827] Num frames 2900... -[2025-08-29 19:09:34,990][15827] Num frames 3000... -[2025-08-29 19:09:35,103][15827] Num frames 3100... -[2025-08-29 19:09:35,292][15827] Num frames 3200... -[2025-08-29 19:09:35,394][15827] Avg episode rewards: #0: 4.665, true rewards: #0: 4.040 -[2025-08-29 19:09:35,396][15827] Avg episode reward: 4.665, avg true_objective: 4.040 -[2025-08-29 19:09:35,502][15827] Num frames 3300... -[2025-08-29 19:09:35,645][15827] Num frames 3400... -[2025-08-29 19:09:35,752][15827] Num frames 3500... -[2025-08-29 19:09:35,878][15827] Num frames 3600... -[2025-08-29 19:09:35,951][15827] Avg episode rewards: #0: 4.573, true rewards: #0: 4.018 -[2025-08-29 19:09:35,952][15827] Avg episode reward: 4.573, avg true_objective: 4.018 -[2025-08-29 19:09:36,061][15827] Num frames 3700... -[2025-08-29 19:09:36,168][15827] Num frames 3800... -[2025-08-29 19:09:36,298][15827] Num frames 3900... -[2025-08-29 19:09:36,417][15827] Num frames 4000... -[2025-08-29 19:09:36,468][15827] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 -[2025-08-29 19:09:36,469][15827] Avg episode reward: 4.500, avg true_objective: 4.000 -[2025-08-29 19:09:41,440][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! -[2025-08-29 19:11:52,387][15827] Environment doom_basic already registered, overwriting... 
-[2025-08-29 19:11:52,390][15827] Environment doom_two_colors_easy already registered, overwriting... -[2025-08-29 19:11:52,391][15827] Environment doom_two_colors_hard already registered, overwriting... -[2025-08-29 19:11:52,392][15827] Environment doom_dm already registered, overwriting... -[2025-08-29 19:11:52,393][15827] Environment doom_dwango5 already registered, overwriting... -[2025-08-29 19:11:52,394][15827] Environment doom_my_way_home_flat_actions already registered, overwriting... -[2025-08-29 19:11:52,395][15827] Environment doom_defend_the_center_flat_actions already registered, overwriting... -[2025-08-29 19:11:52,397][15827] Environment doom_my_way_home already registered, overwriting... -[2025-08-29 19:11:52,399][15827] Environment doom_deadly_corridor already registered, overwriting... -[2025-08-29 19:11:52,401][15827] Environment doom_defend_the_center already registered, overwriting... -[2025-08-29 19:11:52,402][15827] Environment doom_defend_the_line already registered, overwriting... -[2025-08-29 19:11:52,404][15827] Environment doom_health_gathering already registered, overwriting... -[2025-08-29 19:11:52,405][15827] Environment doom_health_gathering_supreme already registered, overwriting... -[2025-08-29 19:11:52,406][15827] Environment doom_battle already registered, overwriting... -[2025-08-29 19:11:52,408][15827] Environment doom_battle2 already registered, overwriting... -[2025-08-29 19:11:52,409][15827] Environment doom_duel_bots already registered, overwriting... -[2025-08-29 19:11:52,411][15827] Environment doom_deathmatch_bots already registered, overwriting... -[2025-08-29 19:11:52,412][15827] Environment doom_duel already registered, overwriting... -[2025-08-29 19:11:52,414][15827] Environment doom_deathmatch_full already registered, overwriting... -[2025-08-29 19:11:52,415][15827] Environment doom_benchmark already registered, overwriting... -[2025-08-29 19:11:52,416][15827] register_encoder_factory: -[2025-08-29 19:11:52,431][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json -[2025-08-29 19:11:52,432][15827] Overriding arg 'num_envs_per_worker' with value 8 passed from command line -[2025-08-29 19:11:52,433][15827] Overriding arg 'batch_size' with value 16384 passed from command line -[2025-08-29 19:11:52,435][15827] Overriding arg 'ppo_clip_ratio' with value 0.2 passed from command line -[2025-08-29 19:11:52,436][15827] Overriding arg 'learning_rate' with value 0.0002 passed from command line -[2025-08-29 19:11:52,437][15827] Overriding arg 'train_for_env_steps' with value 30000000 passed from command line -[2025-08-29 19:11:52,461][15827] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists! -[2025-08-29 19:11:52,462][15827] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment... 
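Note: the five "Overriding arg" lines above resume the finished 20M-frame run and extend it to 30M env steps with a larger batch and fresh learning-rate/clip settings. A plausible command line for this resume (entry point assumed, as before; the five overridden values and restart_behavior=resume come straight from the log):

import subprocess

subprocess.run([
    "python", "-m", "sf_examples.vizdoom.train_vizdoom",
    "--env=doom_health_gathering_supreme",
    "--experiment=default_experiment",
    "--restart_behavior=resume",       # config below shows restart_behavior=resume
    "--num_envs_per_worker=8",
    "--batch_size=16384",
    "--ppo_clip_ratio=0.2",
    "--learning_rate=0.0002",
    "--train_for_env_steps=30000000",
], check=True)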
-[2025-08-29 19:11:52,463][15827] Weights and Biases integration disabled -[2025-08-29 19:11:52,466][15827] Environment var CUDA_VISIBLE_DEVICES is 0 - -[2025-08-29 19:11:56,717][15827] Starting experiment with the following configuration: -help=False -algo=APPO -env=doom_health_gathering_supreme -experiment=default_experiment -train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir -restart_behavior=resume -device=gpu -seed=None -num_policies=1 -async_rl=True -serial_mode=False -batched_sampling=False -num_batches_to_accumulate=2 -worker_num_splits=2 -policy_workers_per_policy=1 -max_policy_lag=1000 -num_workers=10 -num_envs_per_worker=8 -batch_size=16384 -num_batches_per_epoch=1 -num_epochs=1 -rollout=64 -recurrence=32 -shuffle_minibatches=False -gamma=0.99 -reward_scale=1.0 -reward_clip=1000.0 -value_bootstrap=False -normalize_returns=True -exploration_loss_coeff=0.001 -value_loss_coeff=0.5 -kl_loss_coeff=0.0 -exploration_loss=symmetric_kl -gae_lambda=0.95 -ppo_clip_ratio=0.2 -ppo_clip_value=0.2 -with_vtrace=False -vtrace_rho=1.0 -vtrace_c=1.0 -optimizer=adam -adam_eps=1e-06 -adam_beta1=0.9 -adam_beta2=0.999 -max_grad_norm=4.0 -learning_rate=0.0002 -lr_schedule=constant -lr_schedule_kl_threshold=0.008 -lr_adaptive_min=1e-06 -lr_adaptive_max=0.01 -obs_subtract_mean=0.0 -obs_scale=255.0 -normalize_input=True -normalize_input_keys=None -decorrelate_experience_max_seconds=0 -decorrelate_envs_on_one_worker=True -actor_worker_gpus=[] -set_workers_cpu_affinity=True -force_envs_single_thread=False -default_niceness=0 -log_to_file=True -experiment_summaries_interval=10 -flush_summaries_interval=30 -stats_avg=100 -summaries_use_frameskip=True -heartbeat_interval=20 -heartbeat_reporting_interval=600 -train_for_env_steps=30000000 -train_for_seconds=10000000000 -save_every_sec=120 -keep_checkpoints=2 -load_checkpoint_kind=latest -save_milestones_sec=-1 -save_best_every_sec=5 -save_best_metric=reward -save_best_after=100000 -benchmark=False -encoder_mlp_layers=[512, 512] -encoder_conv_architecture=convnet_simple -encoder_conv_mlp_layers=[512] -use_rnn=True -rnn_size=512 -rnn_type=gru -rnn_num_layers=1 -decoder_mlp_layers=[] -nonlinearity=elu -policy_initialization=orthogonal -policy_init_gain=1.0 -actor_critic_share_weights=True -adaptive_stddev=True -continuous_tanh_scale=0.0 -initial_stddev=1.0 -use_env_info_cache=False -env_gpu_actions=False -env_gpu_observations=True -env_frameskip=4 -env_framestack=1 -pixel_format=CHW -use_record_episode_statistics=False -with_wandb=False -wandb_user=None -wandb_project=sample_factory -wandb_group=None -wandb_job_type=SF -wandb_tags=[] -with_pbt=False -pbt_mix_policies_in_one_env=True -pbt_period_env_steps=5000000 -pbt_start_mutation=20000000 -pbt_replace_fraction=0.3 -pbt_mutation_rate=0.15 -pbt_replace_reward_gap=0.1 -pbt_replace_reward_gap_absolute=1e-06 -pbt_optimize_gamma=False -pbt_target_objective=true_objective -pbt_perturb_min=1.1 -pbt_perturb_max=1.5 -num_agents=-1 -num_humans=0 -num_bots=-1 -start_bot_difficulty=None -timelimit=None -res_w=128 -res_h=72 -wide_aspect_ratio=False -eval_env_frameskip=1 -fps=35 -command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 -cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} -git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042 -git_repo_name=git@github.com:huggingface/deep-rl-class.git -[2025-08-29 19:11:56,719][15827] Saving configuration to 
/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... -[2025-08-29 19:11:56,835][15827] Rollout worker 0 uses device cpu -[2025-08-29 19:11:56,837][15827] Rollout worker 1 uses device cpu -[2025-08-29 19:11:56,838][15827] Rollout worker 2 uses device cpu -[2025-08-29 19:11:56,839][15827] Rollout worker 3 uses device cpu -[2025-08-29 19:11:56,840][15827] Rollout worker 4 uses device cpu -[2025-08-29 19:11:56,841][15827] Rollout worker 5 uses device cpu -[2025-08-29 19:11:56,842][15827] Rollout worker 6 uses device cpu -[2025-08-29 19:11:56,842][15827] Rollout worker 7 uses device cpu -[2025-08-29 19:11:56,843][15827] Rollout worker 8 uses device cpu -[2025-08-29 19:11:56,845][15827] Rollout worker 9 uses device cpu -[2025-08-29 19:11:57,783][15827] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 19:11:57,784][15827] InferenceWorker_p0-w0: min num requests: 3 -[2025-08-29 19:11:57,826][15827] Starting all processes... -[2025-08-29 19:11:57,827][15827] Starting process learner_proc0 -[2025-08-29 19:11:57,863][15827] Starting all processes... -[2025-08-29 19:11:57,878][15827] Starting process inference_proc0-0 -[2025-08-29 19:11:57,880][15827] Starting process rollout_proc0 -[2025-08-29 19:11:57,880][15827] Starting process rollout_proc1 -[2025-08-29 19:11:57,881][15827] Starting process rollout_proc2 -[2025-08-29 19:11:57,884][15827] Starting process rollout_proc3 -[2025-08-29 19:11:57,884][15827] Starting process rollout_proc4 -[2025-08-29 19:11:57,886][15827] Starting process rollout_proc5 -[2025-08-29 19:11:57,887][15827] Starting process rollout_proc6 -[2025-08-29 19:11:57,887][15827] Starting process rollout_proc7 -[2025-08-29 19:11:57,888][15827] Starting process rollout_proc8 -[2025-08-29 19:11:57,888][15827] Starting process rollout_proc9 -[2025-08-29 19:12:14,844][26744] Worker 4 uses CPU cores [4] -[2025-08-29 19:12:14,844][26743] Worker 1 uses CPU cores [1] -[2025-08-29 19:12:14,845][26758] Worker 7 uses CPU cores [7] -[2025-08-29 19:12:14,845][26748] Worker 3 uses CPU cores [3] -[2025-08-29 19:12:14,845][26725] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 19:12:14,845][26746] Worker 5 uses CPU cores [5] -[2025-08-29 19:12:14,845][26757] Worker 6 uses CPU cores [6] -[2025-08-29 19:12:14,845][26725] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-08-29 19:12:14,845][26741] Worker 0 uses CPU cores [0] -[2025-08-29 19:12:14,845][26740] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 19:12:14,845][26760] Worker 9 uses CPU cores [9] -[2025-08-29 19:12:14,845][26740] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-08-29 19:12:14,846][26759] Worker 8 uses CPU cores [8] -[2025-08-29 19:12:14,846][26742] Worker 2 uses CPU cores [2] -[2025-08-29 19:12:15,015][26725] Num visible devices: 1 -[2025-08-29 19:12:15,015][26740] Num visible devices: 1 -[2025-08-29 19:12:15,018][26725] Starting seed is not provided -[2025-08-29 19:12:15,018][26725] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 19:12:15,018][26725] Initializing actor-critic model on device cuda:0 -[2025-08-29 19:12:15,019][26725] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 19:12:15,028][26725] RunningMeanStd input shape: (1,) -[2025-08-29 19:12:15,045][26725] ConvEncoder: input_channels=3 -[2025-08-29 19:12:15,412][26725] Conv encoder output size: 512 -[2025-08-29 19:12:15,412][26725] Policy 
head output size: 512 -[2025-08-29 19:12:15,452][26725] Created Actor Critic model with architecture: -[2025-08-29 19:12:15,452][26725] ActorCriticSharedWeights( - (obs_normalizer): ObservationNormalizer( - (running_mean_std): RunningMeanStdDictInPlace( - (running_mean_std): ModuleDict( - (obs): RunningMeanStdInPlace() - ) - ) - ) - (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) - (encoder): VizdoomEncoder( - (basic_encoder): ConvEncoder( - (enc): RecursiveScriptModule( - original_name=ConvEncoderImpl - (conv_head): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Conv2d) - (1): RecursiveScriptModule(original_name=ELU) - (2): RecursiveScriptModule(original_name=Conv2d) - (3): RecursiveScriptModule(original_name=ELU) - (4): RecursiveScriptModule(original_name=Conv2d) - (5): RecursiveScriptModule(original_name=ELU) - ) - (mlp_layers): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Linear) - (1): RecursiveScriptModule(original_name=ELU) - ) - ) - ) - ) - (core): ModelCoreRNN( - (core): GRU(512, 512) - ) - (decoder): MlpDecoder( - (mlp): Identity() - ) - (critic_linear): Linear(in_features=512, out_features=1, bias=True) - (action_parameterization): ActionParameterizationDefault( - (distribution_linear): Linear(in_features=512, out_features=5, bias=True) - ) -) -[2025-08-29 19:12:16,920][26725] Using optimizer -[2025-08-29 19:12:17,773][15827] Heartbeat connected on Batcher_0 -[2025-08-29 19:12:17,784][15827] Heartbeat connected on InferenceWorker_p0-w0 -[2025-08-29 19:12:17,790][15827] Heartbeat connected on RolloutWorker_w0 -[2025-08-29 19:12:17,794][15827] Heartbeat connected on RolloutWorker_w1 -[2025-08-29 19:12:17,799][15827] Heartbeat connected on RolloutWorker_w2 -[2025-08-29 19:12:17,803][15827] Heartbeat connected on RolloutWorker_w3 -[2025-08-29 19:12:17,805][15827] Heartbeat connected on RolloutWorker_w4 -[2025-08-29 19:12:17,808][15827] Heartbeat connected on RolloutWorker_w5 -[2025-08-29 19:12:17,810][15827] Heartbeat connected on RolloutWorker_w6 -[2025-08-29 19:12:17,814][15827] Heartbeat connected on RolloutWorker_w7 -[2025-08-29 19:12:17,816][15827] Heartbeat connected on RolloutWorker_w8 -[2025-08-29 19:12:17,820][15827] Heartbeat connected on RolloutWorker_w9 -[2025-08-29 19:12:19,538][26725] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... -[2025-08-29 19:12:19,549][26725] Could not load from checkpoint, attempt 0 -Traceback (most recent call last): - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint - checkpoint_dict = torch.load(latest_checkpoint, map_location=device) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load - raise pickle.UnpicklingError(_get_wo_message(str(e))) from None -_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. - (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. 
-[2025-08-29 19:12:17,773][15827] Heartbeat connected on Batcher_0
-[2025-08-29 19:12:17,784][15827] Heartbeat connected on InferenceWorker_p0-w0
-[2025-08-29 19:12:17,790][15827] Heartbeat connected on RolloutWorker_w0
-[2025-08-29 19:12:17,794][15827] Heartbeat connected on RolloutWorker_w1
-[2025-08-29 19:12:17,799][15827] Heartbeat connected on RolloutWorker_w2
-[2025-08-29 19:12:17,803][15827] Heartbeat connected on RolloutWorker_w3
-[2025-08-29 19:12:17,805][15827] Heartbeat connected on RolloutWorker_w4
-[2025-08-29 19:12:17,808][15827] Heartbeat connected on RolloutWorker_w5
-[2025-08-29 19:12:17,810][15827] Heartbeat connected on RolloutWorker_w6
-[2025-08-29 19:12:17,814][15827] Heartbeat connected on RolloutWorker_w7
-[2025-08-29 19:12:17,816][15827] Heartbeat connected on RolloutWorker_w8
-[2025-08-29 19:12:17,820][15827] Heartbeat connected on RolloutWorker_w9
-[2025-08-29 19:12:19,538][26725] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:12:19,549][26725] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
- (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
- (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
- WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
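Both remediations proposed in the error text above, as a minimal standalone sketch (only appropriate for a checkpoint you trust; the path is the one from this log):

    import numpy as np
    import torch
    import torch.serialization

    ckpt = ("/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
            "default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth")

    # Option (2): allowlist the single numpy global the unpickler rejected,
    # keeping the safer weights_only=True default introduced in PyTorch 2.6.
    torch.serialization.add_safe_globals([np.core.multiarray.scalar])
    checkpoint_dict = torch.load(ckpt, map_location="cpu")

    # Option (1): for a trusted file only, opt out of weights-only loading.
    # checkpoint_dict = torch.load(ckpt, map_location="cpu", weights_only=False)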
-[2025-08-29 19:12:19,552][26725] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:12:19,553][26725] Could not load from checkpoint, attempt 1
-[2025-08-29 19:12:19,554][26725] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:12:19,554][26725] Could not load from checkpoint, attempt 2
-[2025-08-29 19:12:19,555][26725] Did not load from checkpoint, starting from scratch!
-[2025-08-29 19:12:19,556][26725] Initialized policy 0 weights for model version 0
-[2025-08-29 19:12:19,567][26725] LearnerWorker_p0 finished initialization!
-[2025-08-29 19:12:19,567][26725] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:12:19,574][15827] Heartbeat connected on LearnerWorker_p0
-[2025-08-29 19:12:20,141][26740] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-29 19:12:20,154][26740] RunningMeanStd input shape: (1,)
-[2025-08-29 19:12:20,214][26740] ConvEncoder: input_channels=3
-[2025-08-29 19:12:21,638][26740] Conv encoder output size: 512
-[2025-08-29 19:12:21,641][26740] Policy head output size: 512
-[2025-08-29 19:12:47,435][26929] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
-[2025-08-29 19:12:47,552][26929] Rollout worker 0 uses device cpu
-[2025-08-29 19:12:47,554][26929] Rollout worker 1 uses device cpu
-[2025-08-29 19:12:47,555][26929] Rollout worker 2 uses device cpu
-[2025-08-29 19:12:47,555][26929] Rollout worker 3 uses device cpu
-[2025-08-29 19:12:47,556][26929] Rollout worker 4 uses device cpu
-[2025-08-29 19:12:47,557][26929] Rollout worker 5 uses device cpu
-[2025-08-29 19:12:47,558][26929] Rollout worker 6 uses device cpu
-[2025-08-29 19:12:47,559][26929] Rollout worker 7 uses device cpu
-[2025-08-29 19:12:47,559][26929] Rollout worker 8 uses device cpu
-[2025-08-29 19:12:47,560][26929] Rollout worker 9 uses device cpu
-[2025-08-29 19:12:50,182][26929] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:12:50,189][26929] InferenceWorker_p0-w0: min num requests: 3
-[2025-08-29 19:12:50,268][26929] Starting all processes...
-[2025-08-29 19:12:50,269][26929] Starting process learner_proc0
-[2025-08-29 19:12:50,347][26929] Starting all processes...
-[2025-08-29 19:12:50,356][26929] Starting process inference_proc0-0
-[2025-08-29 19:12:50,356][26929] Starting process rollout_proc0
-[2025-08-29 19:12:50,358][26929] Starting process rollout_proc1
-[2025-08-29 19:12:50,358][26929] Starting process rollout_proc2
-[2025-08-29 19:12:50,359][26929] Starting process rollout_proc3
-[2025-08-29 19:12:50,359][26929] Starting process rollout_proc4
-[2025-08-29 19:12:50,360][26929] Starting process rollout_proc5
-[2025-08-29 19:12:50,361][26929] Starting process rollout_proc6
-[2025-08-29 19:12:50,364][26929] Starting process rollout_proc7
-[2025-08-29 19:12:50,364][26929] Starting process rollout_proc8
-[2025-08-29 19:12:50,365][26929] Starting process rollout_proc9
-[2025-08-29 19:13:07,799][27057] Worker 9 uses CPU cores [9]
-[2025-08-29 19:13:07,800][27051] Worker 4 uses CPU cores [4]
-[2025-08-29 19:13:07,800][27054] Worker 6 uses CPU cores [6]
-[2025-08-29 19:13:07,799][27049] Worker 1 uses CPU cores [1]
-[2025-08-29 19:13:07,800][27047] Worker 0 uses CPU cores [0]
-[2025-08-29 19:13:07,800][27053] Worker 7 uses CPU cores [7]
-[2025-08-29 19:13:07,800][27048] Worker 2 uses CPU cores [2]
-[2025-08-29 19:13:07,800][27046] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:13:07,800][27052] Worker 5 uses CPU cores [5]
-[2025-08-29 19:13:07,800][27056] Worker 8 uses CPU cores [8]
-[2025-08-29 19:13:07,800][27046] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2025-08-29 19:13:07,801][27031] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:13:07,801][27031] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2025-08-29 19:13:07,801][27050] Worker 3 uses CPU cores [3]
-[2025-08-29 19:13:07,873][27031] Num visible devices: 1
-[2025-08-29 19:13:07,873][27046] Num visible devices: 1
-[2025-08-29 19:13:07,875][27031] Starting seed is not provided
-[2025-08-29 19:13:07,875][27031] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:13:07,875][27031] Initializing actor-critic model on device cuda:0
-[2025-08-29 19:13:07,875][27031] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-29 19:13:07,889][27031] RunningMeanStd input shape: (1,)
-[2025-08-29 19:13:07,897][27031] ConvEncoder: input_channels=3
-[2025-08-29 19:13:08,057][27031] Conv encoder output size: 512
-[2025-08-29 19:13:08,057][27031] Policy head output size: 512
-[2025-08-29 19:13:09,580][27031] Using optimizer
-[2025-08-29 19:13:13,825][27031] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:13:13,831][27031] Could not load from checkpoint, attempt 0
-[2025-08-29 19:13:13,833][27031] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:13:13,834][27031] Could not load from checkpoint, attempt 1
-[2025-08-29 19:13:13,834][27031] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:13:13,835][27031] Could not load from checkpoint, attempt 2
-[2025-08-29 19:13:13,835][27031] Did not load from checkpoint, starting from scratch!
-[2025-08-29 19:13:13,837][27031] Initialized policy 0 weights for model version 0
-[2025-08-29 19:13:13,848][27031] LearnerWorker_p0 finished initialization!
-[2025-08-29 19:13:13,849][27031] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:13:16,185][27046] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-29 19:13:16,198][27046] RunningMeanStd input shape: (1,)
-[2025-08-29 19:13:16,305][27046] ConvEncoder: input_channels=3
-[2025-08-29 19:13:16,597][27046] Conv encoder output size: 512
-[2025-08-29 19:13:16,598][27046] Policy head output size: 512
-[2025-08-29 19:16:00,543][32845] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
-[2025-08-29 19:16:00,665][32845] Rollout worker 0 uses device cpu
-[2025-08-29 19:16:00,667][32845] Rollout worker 1 uses device cpu
-[2025-08-29 19:16:00,667][32845] Rollout worker 2 uses device cpu
-[2025-08-29 19:16:00,668][32845] Rollout worker 3 uses device cpu
-[2025-08-29 19:16:00,669][32845] Rollout worker 4 uses device cpu
-[2025-08-29 19:16:00,670][32845] Rollout worker 5 uses device cpu
-[2025-08-29 19:16:00,670][32845] Rollout worker 6 uses device cpu
-[2025-08-29 19:16:00,671][32845] Rollout worker 7 uses device cpu
-[2025-08-29 19:16:00,672][32845] Rollout worker 8 uses device cpu
-[2025-08-29 19:16:00,673][32845] Rollout worker 9 uses device cpu
-[2025-08-29 19:16:01,209][32845] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:16:01,210][32845] InferenceWorker_p0-w0: min num requests: 3
-[2025-08-29 19:16:01,287][32845] Starting all processes...
-[2025-08-29 19:16:01,288][32845] Starting process learner_proc0
-[2025-08-29 19:16:01,321][32845] Starting all processes...
-[2025-08-29 19:16:01,330][32845] Starting process inference_proc0-0
-[2025-08-29 19:16:01,330][32845] Starting process rollout_proc0
-[2025-08-29 19:16:01,330][32845] Starting process rollout_proc1
-[2025-08-29 19:16:01,331][32845] Starting process rollout_proc2
-[2025-08-29 19:16:01,332][32845] Starting process rollout_proc3
-[2025-08-29 19:16:01,333][32845] Starting process rollout_proc4
-[2025-08-29 19:16:01,333][32845] Starting process rollout_proc5
-[2025-08-29 19:16:01,333][32845] Starting process rollout_proc6
-[2025-08-29 19:16:01,334][32845] Starting process rollout_proc7
-[2025-08-29 19:16:01,335][32845] Starting process rollout_proc8
-[2025-08-29 19:16:01,336][32845] Starting process rollout_proc9
-[2025-08-29 19:16:06,968][32989] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:16:06,970][32989] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
-[2025-08-29 19:16:06,995][33009] Worker 5 uses CPU cores [5]
-[2025-08-29 19:16:07,242][33008] Worker 2 uses CPU cores [2]
-[2025-08-29 19:16:07,413][33007] Worker 3 uses CPU cores [3]
-[2025-08-29 19:16:07,496][32989] Num visible devices: 1
-[2025-08-29 19:16:07,512][32989] Starting seed is not provided
-[2025-08-29 19:16:07,513][32989] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:16:07,513][32989] Initializing actor-critic model on device cuda:0
-[2025-08-29 19:16:07,513][32989] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-29 19:16:07,520][32989] RunningMeanStd input shape: (1,)
-[2025-08-29 19:16:07,526][33020] Worker 7 uses CPU cores [7]
-[2025-08-29 19:16:07,538][32989] ConvEncoder: input_channels=3
-[2025-08-29 19:16:07,555][33006] Worker 1 uses CPU cores [1]
-[2025-08-29 19:16:07,628][33004] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:16:07,628][33004] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
-[2025-08-29 19:16:07,708][33004] Num visible devices: 1
-[2025-08-29 19:16:07,740][33012] Worker 8 uses CPU cores [8]
-[2025-08-29 19:16:07,749][33005] Worker 0 uses CPU cores [0]
-[2025-08-29 19:16:07,753][33010] Worker 4 uses CPU cores [4]
-[2025-08-29 19:16:07,786][33011] Worker 6 uses CPU cores [6]
-[2025-08-29 19:16:07,790][32989] Conv encoder output size: 512
-[2025-08-29 19:16:07,790][32989] Policy head output size: 512
-[2025-08-29 19:16:07,804][33023] Worker 9 uses CPU cores [9]
-[2025-08-29 19:16:09,095][32989] Using optimizer
-[2025-08-29 19:16:10,568][32989] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:16:10,574][32989] Could not load from checkpoint, attempt 0
-[2025-08-29 19:16:10,577][32989] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:16:10,577][32989] Could not load from checkpoint, attempt 1
-[2025-08-29 19:16:10,578][32989] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 19:16:10,578][32989] Could not load from checkpoint, attempt 2
-[2025-08-29 19:16:10,578][32989] Did not load from checkpoint, starting from scratch!
-[2025-08-29 19:16:10,578][32989] Initialized policy 0 weights for model version 0
-[2025-08-29 19:16:10,586][32989] LearnerWorker_p0 finished initialization!
-[2025-08-29 19:16:10,586][32989] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 19:16:10,895][32845] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:10,928][33004] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-29 19:16:10,930][33004] RunningMeanStd input shape: (1,)
-[2025-08-29 19:16:10,942][33004] ConvEncoder: input_channels=3
-[2025-08-29 19:16:11,045][33004] Conv encoder output size: 512
-[2025-08-29 19:16:11,046][33004] Policy head output size: 512
-[2025-08-29 19:16:11,118][32845] Inference worker 0-0 is ready!
-[2025-08-29 19:16:11,119][32845] All inference workers are ready! Signal rollout workers to start!
-[2025-08-29 19:16:11,708][33012] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,708][33007] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,709][33008] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,709][33010] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,709][33005] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,709][33009] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,710][33006] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,711][33023] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,711][33011] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 19:16:11,712][33020] Doom resolution: 160x120, resize resolution: (128, 72)
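The resize in these lines (native 160x120 render scaled to (128, 72)) is what yields the (3, 72, 128) CHW observation shape the learner reported. An illustrative sketch, assuming an HWC uint8 frame and OpenCV:

    import cv2
    import numpy as np

    frame = np.zeros((120, 160, 3), dtype=np.uint8)  # Doom render: 160x120 (WxH)
    # cv2.resize takes dsize as (width, height), so (128, 72) -> (72, 128, 3)
    resized = cv2.resize(frame, (128, 72), interpolation=cv2.INTER_AREA)
    chw = np.transpose(resized, (2, 0, 1))  # HWC -> CHW: (3, 72, 128)
    assert chw.shape == (3, 72, 128)        # matches the logged input shape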
-[2025-08-29 19:16:12,240][33008] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,240][33010] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,240][33009] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,240][33020] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,302][33006] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,404][33010] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,404][33009] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,412][33011] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,418][33008] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,419][33012] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,464][33006] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,523][33020] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,575][33023] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,577][33007] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,584][33005] Decorrelating experience for 0 frames...
-[2025-08-29 19:16:12,590][33012] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,599][33009] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:12,620][33008] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:12,641][33010] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:12,748][33023] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,761][33005] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,771][33012] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:12,797][33020] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:12,803][33011] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,815][33008] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:12,857][33007] Decorrelating experience for 64 frames...
-[2025-08-29 19:16:12,939][33023] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:12,972][33009] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:12,974][33010] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,007][33012] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,026][33020] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,050][33011] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:13,142][33008] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:13,162][33007] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:13,169][33005] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:13,267][33006] Decorrelating experience for 128 frames...
-[2025-08-29 19:16:13,287][33011] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,354][33010] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:13,405][33007] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,407][33005] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,449][33020] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:13,495][33012] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:13,597][33006] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,636][33011] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:13,655][33010] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:13,675][33009] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:13,744][33007] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:13,751][33008] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:13,819][33012] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:13,834][33023] Decorrelating experience for 192 frames...
-[2025-08-29 19:16:13,949][33011] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:13,976][33010] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:13,997][33005] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:14,278][33006] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:14,297][33008] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:14,362][33012] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:14,452][33011] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:14,461][33020] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:14,487][33023] Decorrelating experience for 256 frames...
-[2025-08-29 19:16:14,550][33005] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:14,645][33010] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:14,746][33008] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:14,859][33012] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:14,871][33020] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:14,882][33011] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:14,914][33005] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:14,995][33007] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:15,085][33023] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:15,098][33006] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:15,190][33009] Decorrelating experience for 320 frames...
-[2025-08-29 19:16:15,190][33020] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:15,373][33023] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:15,398][33006] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:15,424][33007] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:15,434][33005] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:15,651][33009] Decorrelating experience for 384 frames...
-[2025-08-29 19:16:15,694][33023] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:15,726][33007] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:15,921][33006] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:15,968][33009] Decorrelating experience for 448 frames...
-[2025-08-29 19:16:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:21,205][32845] Heartbeat connected on Batcher_0
-[2025-08-29 19:16:21,207][32845] Heartbeat connected on LearnerWorker_p0
-[2025-08-29 19:16:21,217][32845] Heartbeat connected on RolloutWorker_w0
-[2025-08-29 19:16:21,220][32845] Heartbeat connected on RolloutWorker_w2
-[2025-08-29 19:16:21,227][32845] Heartbeat connected on RolloutWorker_w1
-[2025-08-29 19:16:21,230][32845] Heartbeat connected on RolloutWorker_w6
-[2025-08-29 19:16:21,236][32845] Heartbeat connected on RolloutWorker_w7
-[2025-08-29 19:16:21,237][32845] Heartbeat connected on RolloutWorker_w3
-[2025-08-29 19:16:21,247][32845] Heartbeat connected on RolloutWorker_w5
-[2025-08-29 19:16:21,248][32845] Heartbeat connected on RolloutWorker_w8
-[2025-08-29 19:16:21,251][32845] Heartbeat connected on RolloutWorker_w4
-[2025-08-29 19:16:21,285][32845] Heartbeat connected on RolloutWorker_w9
-[2025-08-29 19:16:21,733][32845] Heartbeat connected on InferenceWorker_p0-w0
-[2025-08-29 19:16:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 276.3. Samples: 4144. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:25,898][32845] Avg episode reward: [(0, '4.231')]
-[2025-08-29 19:16:30,689][32989] Signal inference workers to stop experience collection...
-[2025-08-29 19:16:30,698][33004] InferenceWorker_p0-w0: stopping experience collection
-[2025-08-29 19:16:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1549.2. Samples: 30984. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:30,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:16:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1562.7. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:35,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:16:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1302.3. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:40,897][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:16:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1116.2. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:45,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:16:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 976.7. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:50,895][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:16:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 868.2. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:16:55,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 868.2. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:00,895][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 868.2. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:05,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 776.1. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:10,895][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 179.6. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:15,895][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:20,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:25,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:31,413][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:31,415][32845] Avg episode reward: [(0, '4.560')]
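The throughput figures in these lines are averages over trailing 10/60/300-second windows. A rough sketch of that kind of windowed rate (illustrative only, not Sample Factory's actual reporting code):

    from collections import deque
    import time

    class WindowedFps:
        # Stores (timestamp, total_frames) samples and reports the average
        # frame rate over the trailing window.
        def __init__(self, window_sec):
            self.window_sec = window_sec
            self.samples = deque()

        def record(self, total_frames, now=None):
            now = time.monotonic() if now is None else now
            self.samples.append((now, total_frames))
            while self.samples and now - self.samples[0][0] > self.window_sec:
                self.samples.popleft()

        def fps(self):
            if len(self.samples) < 2:
                return 0.0
            (t0, f0), (t1, f1) = self.samples[0], self.samples[-1]
            return (f1 - f0) / max(t1 - t0, 1e-9)

    fps10, fps60, fps300 = WindowedFps(10), WindowedFps(60), WindowedFps(300)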
-[2025-08-29 19:17:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:35,897][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:17:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:17:40,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:24:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:24:15,895][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:24:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0).
Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:20,899][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:24:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:25,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:24:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:30,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:24:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:35,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:24:41,405][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:41,407][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:24:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:45,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:24:50,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:50,899][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:24:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:24:55,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:00,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:05,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:10,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:17,238][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:17,240][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:20,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:25,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:30,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:35,904][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:45,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:53,071][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:53,073][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:25:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:25:55,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:00,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:05,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:10,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:15,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:20,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:28,903][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:28,906][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:30,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:35,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:45,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:50,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:26:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:26:55,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:00,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:05,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:10,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:15,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:20,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:25,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:30,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:35,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:40,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:45,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:45,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:50,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:50,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:27:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:27:55,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:00,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:05,898][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:10,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:16,406][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:16,407][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:20,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:25,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:25,898][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:30,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:35,904][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:45,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:52,235][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:52,236][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:28:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:28:55,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:00,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:05,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:10,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:15,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:20,904][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:28,072][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:28,075][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:30,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:35,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:45,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:45,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:50,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:29:55,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:29:55,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:03,905][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:03,908][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:05,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:10,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:15,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:20,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:25,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:30,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:30,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:35,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:45,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:50,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:30:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:30:55,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:00,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:00,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:05,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:05,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:10,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:15,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:20,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:25,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:30,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:35,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:45,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:45,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:51,401][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:51,403][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:31:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:31:55,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:00,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:05,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:10,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:15,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:20,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:27,234][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:27,236][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:30,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:35,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:35,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:45,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:50,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:32:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:32:55,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:03,066][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:03,068][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:05,900][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:10,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:15,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:20,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:25,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:30,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:38,900][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:38,901][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:45,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:50,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:33:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:33:55,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:00,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:05,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:10,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:15,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:20,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:25,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:25,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:30,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:30,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:35,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:40,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:40,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:45,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:45,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:50,897][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:34:55,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:34:55,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:00,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:35:00,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:05,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:35:05,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:10,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:35:10,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:15,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:35:15,895][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:20,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:35:20,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:26,396][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:35:26,398][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:30,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) -[2025-08-29 19:35:30,896][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 19:35:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
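The three figures in each "Fps is (...)" entry are rolling throughput averages over trailing 10, 60 and 300 second windows, so all three pinned at 0.0 alongside "Total num frames: 0" suggests the learner never consumed a single environment frame during this whole stretch. A minimal sketch of how such windowed rates can be computed (a hypothetical WindowedFps helper for illustration, not Sample Factory's own implementation):

    from collections import deque
    import time

    class WindowedFps:
        """Rolling frames-per-second over several trailing time windows."""

        def __init__(self, windows=(10, 60, 300)):
            self.windows = windows
            self.samples = deque()  # (monotonic timestamp, cumulative frame count)

        def record(self, total_frames, now=None):
            now = time.monotonic() if now is None else now
            self.samples.append((now, total_frames))
            # Discard samples older than the largest window.
            while now - self.samples[0][0] > max(self.windows):
                self.samples.popleft()

        def fps(self):
            if len(self.samples) < 2:
                return {w: 0.0 for w in self.windows}
            now, frames_now = self.samples[-1]
            rates = {}
            for w in self.windows:
                # Oldest retained sample still inside this window.
                t0, f0 = next((t, f) for t, f in self.samples if now - t <= w)
                dt = now - t0
                rates[w] = (frames_now - f0) / dt if dt > 0 else 0.0
            return rates

Calling record(total_env_frames) once per report and then fps() yields the three averages; they stay at 0.0 whenever the cumulative frame counter stops advancing, exactly as in the entries above and below.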
-[2025-08-29 19:35:35,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:35:35,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:35:40,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:35:40,895][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:35:45,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:35:45,898][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:35:50,894][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:35:50,896][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:35:55,895][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 39068. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 19:35:55,898][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 19:35:55,908][32845] No heartbeat for components: Batcher_0 (994 seconds), LearnerWorker_p0 (1174 seconds)
-[2025-08-29 19:35:55,909][32845] Stopping training due to lack of heartbeats from ,
-[2025-08-29 19:35:56,597][32845] Component LearnerWorker_p0 process died already! Don't wait for it.
-[2025-08-29 19:35:56,599][32845] Component RolloutWorker_w3 stopped!
-[2025-08-29 19:35:56,601][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w6', 'RolloutWorker_w7', 'RolloutWorker_w8', 'RolloutWorker_w9'] to stop...
-[2025-08-29 19:35:56,596][33007] Stopping RolloutWorker_w3...
-[2025-08-29 19:35:56,612][33007] Loop rollout_proc3_evt_loop terminating...
-[2025-08-29 19:35:56,615][32845] Component RolloutWorker_w7 stopped!
-[2025-08-29 19:35:56,617][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w6', 'RolloutWorker_w8', 'RolloutWorker_w9'] to stop...
-[2025-08-29 19:35:56,628][32845] Component RolloutWorker_w0 stopped!
-[2025-08-29 19:35:56,630][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w6', 'RolloutWorker_w8', 'RolloutWorker_w9'] to stop...
-[2025-08-29 19:35:56,632][32845] Component RolloutWorker_w6 stopped!
-[2025-08-29 19:35:56,620][33005] Stopping RolloutWorker_w0...
-[2025-08-29 19:35:56,633][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w4', 'RolloutWorker_w5', 'RolloutWorker_w8', 'RolloutWorker_w9'] to stop...
-[2025-08-29 19:35:56,635][33005] Loop rollout_proc0_evt_loop terminating...
-[2025-08-29 19:35:56,619][33020] Stopping RolloutWorker_w7...
-[2025-08-29 19:35:56,634][33004] Weights refcount: 2 0
-[2025-08-29 19:35:56,643][33020] Loop rollout_proc7_evt_loop terminating...
-[2025-08-29 19:35:56,631][33011] Stopping RolloutWorker_w6...
-[2025-08-29 19:35:56,648][33011] Loop rollout_proc6_evt_loop terminating...
-[2025-08-29 19:35:56,760][32845] Component RolloutWorker_w5 stopped!
-[2025-08-29 19:35:56,762][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w4', 'RolloutWorker_w8', 'RolloutWorker_w9'] to stop...
-[2025-08-29 19:35:56,765][32845] Component RolloutWorker_w9 stopped!
-[2025-08-29 19:35:56,768][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w4', 'RolloutWorker_w8'] to stop...
-[2025-08-29 19:35:56,777][32845] Component RolloutWorker_w8 stopped!
-[2025-08-29 19:35:56,765][33023] Stopping RolloutWorker_w9...
-[2025-08-29 19:35:56,781][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w1', 'RolloutWorker_w2', 'RolloutWorker_w4'] to stop...
-[2025-08-29 19:35:56,787][33023] Loop rollout_proc9_evt_loop terminating...
-[2025-08-29 19:35:56,782][33012] Stopping RolloutWorker_w8...
-[2025-08-29 19:35:56,796][33012] Loop rollout_proc8_evt_loop terminating...
-[2025-08-29 19:35:56,774][33009] Stopping RolloutWorker_w5...
-[2025-08-29 19:35:56,800][33009] Loop rollout_proc5_evt_loop terminating...
-[2025-08-29 19:35:56,797][32845] Component RolloutWorker_w4 stopped!
-[2025-08-29 19:35:56,802][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w1', 'RolloutWorker_w2'] to stop...
-[2025-08-29 19:35:56,804][32845] Component RolloutWorker_w1 stopped!
-[2025-08-29 19:35:56,806][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0', 'RolloutWorker_w2'] to stop...
-[2025-08-29 19:35:56,803][33010] Stopping RolloutWorker_w4...
-[2025-08-29 19:35:56,817][33010] Loop rollout_proc4_evt_loop terminating...
-[2025-08-29 19:35:56,809][33006] Stopping RolloutWorker_w1...
-[2025-08-29 19:35:56,824][33006] Loop rollout_proc1_evt_loop terminating...
-[2025-08-29 19:35:56,862][32845] Component RolloutWorker_w2 stopped!
-[2025-08-29 19:35:56,864][32845] Waiting for ['Batcher_0', 'InferenceWorker_p0-w0'] to stop...
-[2025-08-29 19:35:56,870][33008] Stopping RolloutWorker_w2...
-[2025-08-29 19:35:56,887][33008] Loop rollout_proc2_evt_loop terminating...
-[2025-08-29 19:35:56,893][33004] Stopping InferenceWorker_p0-w0...
-[2025-08-29 19:35:56,894][33004] Loop inference_proc0-0_evt_loop terminating...
-[2025-08-29 19:35:56,894][32845] Component InferenceWorker_p0-w0 stopped!
-[2025-08-29 19:35:56,897][32845] Waiting for ['Batcher_0'] to stop...
-[2025-08-29 19:45:23,737][32845] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 32845], exiting...
-[2025-08-29 19:45:23,751][32845] Runner profile tree view:
-main_loop: 1762.4656
-[2025-08-29 19:45:23,755][32845] Collected {0: 0}, FPS: 0.0
-[2025-08-29 19:52:59,093][32845] Environment doom_basic already registered, overwriting...
-[2025-08-29 19:52:59,095][32845] Environment doom_two_colors_easy already registered, overwriting...
-[2025-08-29 19:52:59,096][32845] Environment doom_two_colors_hard already registered, overwriting...
-[2025-08-29 19:52:59,098][32845] Environment doom_dm already registered, overwriting...
-[2025-08-29 19:52:59,098][32845] Environment doom_dwango5 already registered, overwriting...
-[2025-08-29 19:52:59,099][32845] Environment doom_my_way_home_flat_actions already registered, overwriting...
-[2025-08-29 19:52:59,100][32845] Environment doom_defend_the_center_flat_actions already registered, overwriting...
-[2025-08-29 19:52:59,100][32845] Environment doom_my_way_home already registered, overwriting...
-[2025-08-29 19:52:59,101][32845] Environment doom_deadly_corridor already registered, overwriting...
-[2025-08-29 19:52:59,102][32845] Environment doom_defend_the_center already registered, overwriting...
-[2025-08-29 19:52:59,103][32845] Environment doom_defend_the_line already registered, overwriting...
-[2025-08-29 19:52:59,104][32845] Environment doom_health_gathering already registered, overwriting...
-[2025-08-29 19:52:59,104][32845] Environment doom_health_gathering_supreme already registered, overwriting...
-[2025-08-29 19:52:59,105][32845] Environment doom_battle already registered, overwriting...
-[2025-08-29 19:52:59,106][32845] Environment doom_battle2 already registered, overwriting...
-[2025-08-29 19:52:59,106][32845] Environment doom_duel_bots already registered, overwriting...
-[2025-08-29 19:52:59,107][32845] Environment doom_deathmatch_bots already registered, overwriting...
-[2025-08-29 19:52:59,107][32845] Environment doom_duel already registered, overwriting...
-[2025-08-29 19:52:59,108][32845] Environment doom_deathmatch_full already registered, overwriting...
-[2025-08-29 19:52:59,109][32845] Environment doom_benchmark already registered, overwriting...
-[2025-08-29 19:52:59,109][32845] register_encoder_factory:
-[2025-08-29 19:52:59,191][32845] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-29 19:52:59,209][32845] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists!
-[2025-08-29 19:52:59,224][32845] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment...
-[2025-08-29 19:52:59,225][32845] Weights and Biases integration disabled
-[2025-08-29 19:52:59,235][32845] Environment var CUDA_VISIBLE_DEVICES is 0
-
-[2025-08-29 19:53:01,760][32845] Starting experiment with the following configuration:
-help=False
-algo=APPO
-env=doom_health_gathering_supreme
-experiment=default_experiment
-train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir
-restart_behavior=resume
-device=gpu
-seed=None
-num_policies=1
-async_rl=True
-serial_mode=False
-batched_sampling=False
-num_batches_to_accumulate=2
-worker_num_splits=2
-policy_workers_per_policy=1
-max_policy_lag=1000
-num_workers=10
-num_envs_per_worker=8
-batch_size=16384
-num_batches_per_epoch=1
-num_epochs=1
-rollout=64
-recurrence=32
-shuffle_minibatches=False
-gamma=0.99
-reward_scale=1.0
-reward_clip=1000.0
-value_bootstrap=False
-normalize_returns=True
-exploration_loss_coeff=0.001
-value_loss_coeff=0.5
-kl_loss_coeff=0.0
-exploration_loss=symmetric_kl
-gae_lambda=0.95
-ppo_clip_ratio=0.2
-ppo_clip_value=0.2
-with_vtrace=False
-vtrace_rho=1.0
-vtrace_c=1.0
-optimizer=adam
-adam_eps=1e-06
-adam_beta1=0.9
-adam_beta2=0.999
-max_grad_norm=4.0
-learning_rate=0.0002
-lr_schedule=constant
-lr_schedule_kl_threshold=0.008
-lr_adaptive_min=1e-06
-lr_adaptive_max=0.01
-obs_subtract_mean=0.0
-obs_scale=255.0
-normalize_input=True
-normalize_input_keys=None
-decorrelate_experience_max_seconds=0
-decorrelate_envs_on_one_worker=True
-actor_worker_gpus=[]
-set_workers_cpu_affinity=True
-force_envs_single_thread=False
-default_niceness=0
-log_to_file=True
-experiment_summaries_interval=10
-flush_summaries_interval=30
-stats_avg=100
-summaries_use_frameskip=True
-heartbeat_interval=20
-heartbeat_reporting_interval=600
-train_for_env_steps=30000000
-train_for_seconds=10000000000
-save_every_sec=120
-keep_checkpoints=2
-load_checkpoint_kind=latest
-save_milestones_sec=-1
-save_best_every_sec=5
-save_best_metric=reward
-save_best_after=100000
-benchmark=False
-encoder_mlp_layers=[512, 512]
-encoder_conv_architecture=convnet_simple
-encoder_conv_mlp_layers=[512]
-use_rnn=True
-rnn_size=512
-rnn_type=gru
-rnn_num_layers=1
-decoder_mlp_layers=[]
-nonlinearity=elu
-policy_initialization=orthogonal
-policy_init_gain=1.0
-actor_critic_share_weights=True
-adaptive_stddev=True
-continuous_tanh_scale=0.0
-initial_stddev=1.0
-use_env_info_cache=False
-env_gpu_actions=False
-env_gpu_observations=True
-env_frameskip=4
-env_framestack=1
-pixel_format=CHW
-use_record_episode_statistics=False
-with_wandb=False
-wandb_user=None
-wandb_project=sample_factory
-wandb_group=None
-wandb_job_type=SF
-wandb_tags=[]
-with_pbt=False
-pbt_mix_policies_in_one_env=True
-pbt_period_env_steps=5000000
-pbt_start_mutation=20000000
-pbt_replace_fraction=0.3
-pbt_mutation_rate=0.15
-pbt_replace_reward_gap=0.1
-pbt_replace_reward_gap_absolute=1e-06
-pbt_optimize_gamma=False
-pbt_target_objective=true_objective
-pbt_perturb_min=1.1
-pbt_perturb_max=1.5
-num_agents=-1
-num_humans=0
-num_bots=-1
-start_bot_difficulty=None
-timelimit=None
-res_w=128
-res_h=72
-wide_aspect_ratio=False
-eval_env_frameskip=1
-fps=35
-command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
-cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
-git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042
-git_repo_name=git@github.com:huggingface/deep-rl-class.git
-[2025-08-29 19:53:01,762][32845] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
-[2025-08-29 19:53:01,831][32845] Rollout worker 0 uses device cpu
-[2025-08-29 19:53:01,832][32845] Rollout worker 1 uses device cpu
-[2025-08-29 19:53:01,832][32845] Rollout worker 2 uses device cpu
-[2025-08-29 19:53:01,833][32845] Rollout worker 3 uses device cpu
-[2025-08-29 19:53:01,833][32845] Rollout worker 4 uses device cpu
-[2025-08-29 19:53:01,834][32845] Rollout worker 5 uses device cpu
-[2025-08-29 19:53:01,836][32845] Rollout worker 6 uses device cpu
-[2025-08-29 19:53:01,836][32845] Rollout worker 7 uses device cpu
-[2025-08-29 19:53:01,837][32845] Rollout worker 8 uses device cpu
-[2025-08-29 19:53:01,837][32845] Rollout worker 9 uses device cpu
-[2025-08-29 20:05:29,181][32845] Environment doom_basic already registered, overwriting...
-[2025-08-29 20:05:29,187][32845] Environment doom_two_colors_easy already registered, overwriting...
-[2025-08-29 20:05:29,187][32845] Environment doom_two_colors_hard already registered, overwriting...
-[2025-08-29 20:05:29,188][32845] Environment doom_dm already registered, overwriting...
-[2025-08-29 20:05:29,189][32845] Environment doom_dwango5 already registered, overwriting...
-[2025-08-29 20:05:29,190][32845] Environment doom_my_way_home_flat_actions already registered, overwriting...
-[2025-08-29 20:05:29,191][32845] Environment doom_defend_the_center_flat_actions already registered, overwriting...
-[2025-08-29 20:05:29,192][32845] Environment doom_my_way_home already registered, overwriting...
-[2025-08-29 20:05:29,193][32845] Environment doom_deadly_corridor already registered, overwriting...
-[2025-08-29 20:05:29,194][32845] Environment doom_defend_the_center already registered, overwriting...
-[2025-08-29 20:05:29,196][32845] Environment doom_defend_the_line already registered, overwriting...
-[2025-08-29 20:05:29,196][32845] Environment doom_health_gathering already registered, overwriting...
-[2025-08-29 20:05:29,197][32845] Environment doom_health_gathering_supreme already registered, overwriting...
-[2025-08-29 20:05:29,198][32845] Environment doom_battle already registered, overwriting...
-[2025-08-29 20:05:29,199][32845] Environment doom_battle2 already registered, overwriting...
-[2025-08-29 20:05:29,200][32845] Environment doom_duel_bots already registered, overwriting...
-[2025-08-29 20:05:29,200][32845] Environment doom_deathmatch_bots already registered, overwriting...
-[2025-08-29 20:05:29,201][32845] Environment doom_duel already registered, overwriting...
-[2025-08-29 20:05:29,202][32845] Environment doom_deathmatch_full already registered, overwriting...
-[2025-08-29 20:05:29,203][32845] Environment doom_benchmark already registered, overwriting...
-[2025-08-29 20:05:29,204][32845] register_encoder_factory:
-[2025-08-29 20:05:29,248][32845] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-29 20:05:29,281][32845] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists!
-[2025-08-29 20:05:29,282][32845] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment...
-[2025-08-29 20:05:29,283][32845] Weights and Biases integration disabled
-[2025-08-29 20:05:29,291][32845] Environment var CUDA_VISIBLE_DEVICES is 0
-
-[2025-08-29 20:05:31,804][32845] Starting experiment with the following configuration:
[... configuration dump identical, parameter for parameter, to the 19:53:01,760 dump above ...]
-[2025-08-29 20:05:31,805][32845] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json...
-[2025-08-29 20:05:31,873][32845] Rollout worker 0 uses device cpu
-[2025-08-29 20:05:31,874][32845] Rollout worker 1 uses device cpu
-[2025-08-29 20:05:31,874][32845] Rollout worker 2 uses device cpu
-[2025-08-29 20:05:31,875][32845] Rollout worker 3 uses device cpu
-[2025-08-29 20:05:31,876][32845] Rollout worker 4 uses device cpu
-[2025-08-29 20:05:31,876][32845] Rollout worker 5 uses device cpu
-[2025-08-29 20:05:31,877][32845] Rollout worker 6 uses device cpu
-[2025-08-29 20:05:31,877][32845] Rollout worker 7 uses device cpu
-[2025-08-29 20:05:31,878][32845] Rollout worker 8 uses device cpu
-[2025-08-29 20:05:31,879][32845] Rollout worker 9 uses device cpu
-[2025-08-29 20:05:32,658][32845] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 20:05:32,660][32845] InferenceWorker_p0-w0: min num requests: 3
-[2025-08-29 20:05:32,721][32845] Starting all processes...
-[2025-08-29 20:05:32,722][32845] Starting process learner_proc0
-[2025-08-29 20:05:32,770][32845] Starting all processes...
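Worth noting before the worker processes come up: the log shows the run resuming from the saved config.json, so the effective settings in the dump above (num_workers=10, num_envs_per_worker=8, train_for_env_steps=30000000) are what will actually run, even though the recorded command_line field still shows the original --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000. A quick stdlib check of what a resume will pick up (path taken from the log lines above; assumes the flat JSON layout Sample Factory writes):

    import json
    from pathlib import Path

    cfg_path = Path(
        "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir"
        "/default_experiment/config.json"
    )
    cfg = json.loads(cfg_path.read_text())

    # Spot-check the settings the resumed run will actually use,
    # including the heartbeat settings that ended the previous run.
    for key in ("num_workers", "num_envs_per_worker", "train_for_env_steps",
                "heartbeat_interval", "heartbeat_reporting_interval"):
        print(key, "=", cfg.get(key))
    print("originally launched with:", cfg.get("command_line"))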
-[2025-08-29 20:05:32,779][32845] Starting process inference_proc0-0 -[2025-08-29 20:05:32,784][32845] Starting process rollout_proc0 -[2025-08-29 20:05:32,786][32845] Starting process rollout_proc1 -[2025-08-29 20:05:32,788][32845] Starting process rollout_proc2 -[2025-08-29 20:05:32,789][32845] Starting process rollout_proc3 -[2025-08-29 20:05:32,790][32845] Starting process rollout_proc4 -[2025-08-29 20:05:32,791][32845] Starting process rollout_proc5 -[2025-08-29 20:05:32,792][32845] Starting process rollout_proc6 -[2025-08-29 20:05:32,803][32845] Starting process rollout_proc7 -[2025-08-29 20:05:32,804][32845] Starting process rollout_proc8 -[2025-08-29 20:05:32,811][32845] Starting process rollout_proc9 -[2025-08-29 20:05:35,494][48863] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 20:05:35,495][48863] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 -[2025-08-29 20:05:35,505][48864] Worker 0 uses CPU cores [0] -[2025-08-29 20:05:35,512][48868] Worker 5 uses CPU cores [5] -[2025-08-29 20:05:35,522][48865] Worker 1 uses CPU cores [1] -[2025-08-29 20:05:35,534][48846] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 20:05:35,534][48846] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 -[2025-08-29 20:05:35,562][48867] Worker 3 uses CPU cores [3] -[2025-08-29 20:05:35,572][48879] Worker 6 uses CPU cores [6] -[2025-08-29 20:05:35,579][48863] Num visible devices: 1 -[2025-08-29 20:05:35,579][48846] Num visible devices: 1 -[2025-08-29 20:05:35,581][48846] Starting seed is not provided -[2025-08-29 20:05:35,581][48846] Using GPUs [0] for process 0 (actually maps to GPUs [0]) -[2025-08-29 20:05:35,581][48846] Initializing actor-critic model on device cuda:0 -[2025-08-29 20:05:35,582][48846] RunningMeanStd input shape: (3, 72, 128) -[2025-08-29 20:05:35,585][48846] RunningMeanStd input shape: (1,) -[2025-08-29 20:05:35,586][48880] Worker 7 uses CPU cores [7] -[2025-08-29 20:05:35,593][48878] Worker 4 uses CPU cores [4] -[2025-08-29 20:05:35,597][48846] ConvEncoder: input_channels=3 -[2025-08-29 20:05:35,626][48866] Worker 2 uses CPU cores [2] -[2025-08-29 20:05:35,712][48881] Worker 9 uses CPU cores [9] -[2025-08-29 20:05:35,735][48882] Worker 8 uses CPU cores [8] -[2025-08-29 20:05:35,743][48846] Conv encoder output size: 512 -[2025-08-29 20:05:35,743][48846] Policy head output size: 512 -[2025-08-29 20:05:35,767][48846] Created Actor Critic model with architecture: -[2025-08-29 20:05:35,767][48846] ActorCriticSharedWeights( - (obs_normalizer): ObservationNormalizer( - (running_mean_std): RunningMeanStdDictInPlace( - (running_mean_std): ModuleDict( - (obs): RunningMeanStdInPlace() - ) - ) - ) - (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) - (encoder): VizdoomEncoder( - (basic_encoder): ConvEncoder( - (enc): RecursiveScriptModule( - original_name=ConvEncoderImpl - (conv_head): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Conv2d) - (1): RecursiveScriptModule(original_name=ELU) - (2): RecursiveScriptModule(original_name=Conv2d) - (3): RecursiveScriptModule(original_name=ELU) - (4): RecursiveScriptModule(original_name=Conv2d) - (5): RecursiveScriptModule(original_name=ELU) - ) - (mlp_layers): RecursiveScriptModule( - original_name=Sequential - (0): RecursiveScriptModule(original_name=Linear) - (1): RecursiveScriptModule(original_name=ELU) - ) - ) - ) - ) - (core): ModelCoreRNN( - (core): 
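
For orientation, the tree above is a small recurrent actor-critic. A hand-written PyTorch approximation is below: the shapes (3x72x128 observations, 512-d encoder output, GRU(512, 512), 5 discrete actions) are read off the printout, but the conv kernel sizes and strides are an assumption based on Sample Factory's convnet_simple defaults, since the scripted repr hides them, and the observation/returns normalizers are omitted:

    import torch
    from torch import nn

    class DoomActorCritic(nn.Module):
        """Hand-written approximation of the ActorCriticSharedWeights above."""

        def __init__(self, num_actions: int = 5, rnn_size: int = 512):
            super().__init__()
            # convnet_simple-style head (kernel/stride values are assumptions)
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            # 128 channels x 3 x 6 spatial cells for a 72x128 input,
            # projected to the "Conv encoder output size: 512" from the log
            self.mlp_layers = nn.Sequential(nn.Linear(128 * 3 * 6, 512), nn.ELU())
            self.core = nn.GRU(512, rnn_size)          # ModelCoreRNN, rnn_type=gru
            self.decoder = nn.Identity()               # MlpDecoder with empty layer list
            self.critic_linear = nn.Linear(rnn_size, 1)
            self.distribution_linear = nn.Linear(rnn_size, num_actions)

        def forward(self, obs, rnn_state=None):
            # obs: (T, B, 3, 72, 128), already normalized
            t, b = obs.shape[:2]
            x = self.conv_head(obs.flatten(0, 1)).flatten(1)
            x = self.mlp_layers(x)
            x, rnn_state = self.core(x.view(t, b, -1), rnn_state)
            x = self.decoder(x)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state

    # smoke test: one timestep, one env
    logits, value, h = DoomActorCritic()(torch.zeros(1, 1, 3, 72, 128))
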
-[2025-08-29 20:05:36,459][48846] Using optimizer
-[2025-08-29 20:05:37,777][48846] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 20:05:37,783][48846] Could not load from checkpoint, attempt 0
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
-  (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-  (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
-  WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-29 20:05:37,785][48846] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 20:05:37,786][48846] Could not load from checkpoint, attempt 1
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
-  (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-  (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
-  WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-29 20:05:37,786][48846] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-29 20:05:37,786][48846] Could not load from checkpoint, attempt 2
-Traceback (most recent call last):
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
-    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
-                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-  File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load
-    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
-_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
-  (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
-  (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
-  WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.
-
-Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
-[2025-08-29 20:05:37,787][48846] Did not load from checkpoint, starting from scratch!
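
All three attempts fail identically: the checkpoint predates PyTorch 2.6, whose torch.load now defaults to weights_only=True and refuses to unpickle the numpy.core.multiarray.scalar objects stored inside. After the retries, the learner falls back to a fresh model (next entries), discarding the roughly 20M env steps recorded in the checkpoint name. If the file is trusted, the allowlisting route named in the error message is enough; a minimal sketch (path taken from the log):

    import numpy as np
    import torch
    import torch.serialization

    ckpt = ("/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
            "default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth")

    # Allowlist the one global the WeightsUnpickler complained about,
    # keeping the weights_only=True safety net for everything else.
    torch.serialization.add_safe_globals([np.core.multiarray.scalar])
    checkpoint_dict = torch.load(ckpt, map_location="cpu")

    # Trusted-source alternative: opt out of the check entirely (risks
    # arbitrary code execution if the file came from elsewhere):
    # checkpoint_dict = torch.load(ckpt, map_location="cpu", weights_only=False)
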
-[2025-08-29 20:05:37,787][48846] Initialized policy 0 weights for model version 0
-[2025-08-29 20:05:37,793][48846] LearnerWorker_p0 finished initialization!
-[2025-08-29 20:05:37,793][48846] Using GPUs [0] for process 0 (actually maps to GPUs [0])
-[2025-08-29 20:05:38,156][48863] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-29 20:05:38,162][48863] RunningMeanStd input shape: (1,)
-[2025-08-29 20:05:38,173][48863] ConvEncoder: input_channels=3
-[2025-08-29 20:05:38,310][48863] Conv encoder output size: 512
-[2025-08-29 20:05:38,310][48863] Policy head output size: 512
-[2025-08-29 20:05:38,368][32845] Inference worker 0-0 is ready!
-[2025-08-29 20:05:38,370][32845] All inference workers are ready! Signal rollout workers to start!
-[2025-08-29 20:05:38,534][48880] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,535][48868] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,535][48867] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,548][48878] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,555][48864] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,555][48865] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,556][48881] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,574][48882] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,576][48866] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:38,625][48879] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-29 20:05:39,292][32845] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:05:39,318][48880] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,318][48879] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,318][48864] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,342][48881] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,342][48865] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,343][48882] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,343][48866] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,344][48867] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,344][48878] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,483][48879] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,484][48880] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,504][48882] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,510][48868] Decorrelating experience for 0 frames...
-[2025-08-29 20:05:39,520][48865] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,526][48866] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,526][48867] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,546][48864] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,658][48879] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:39,661][48881] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,666][48868] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,670][48882] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:39,696][48878] Decorrelating experience for 64 frames...
-[2025-08-29 20:05:39,697][48867] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:39,724][48864] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:39,837][48879] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:39,846][48868] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:39,860][48866] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:39,903][48867] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:39,918][48864] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:40,010][48865] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:40,034][48878] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:40,061][48868] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:40,072][48866] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:40,078][48882] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:40,232][48867] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:40,252][48880] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:40,256][48864] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:40,264][48878] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:40,285][48881] Decorrelating experience for 128 frames...
-[2025-08-29 20:05:40,299][48879] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:40,368][48868] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:40,394][48866] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:40,445][48865] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:40,487][48882] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:40,535][48867] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:40,603][48878] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:40,656][48864] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:40,693][48881] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:40,757][48879] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:40,782][48882] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:40,792][48868] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:41,009][48864] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,013][48865] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:41,022][48867] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,022][48880] Decorrelating experience for 192 frames...
-[2025-08-29 20:05:41,026][48878] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:41,076][48881] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:41,100][48882] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,146][48868] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,281][48866] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:41,326][48879] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,357][48865] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:41,371][48878] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,422][48864] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:41,533][48880] Decorrelating experience for 256 frames...
-[2025-08-29 20:05:41,538][48867] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:41,538][48868] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:41,609][48881] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:41,627][48866] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,653][48879] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:41,680][48865] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:41,788][48882] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:41,790][48880] Decorrelating experience for 320 frames...
-[2025-08-29 20:05:41,892][48878] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:41,977][48866] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:42,010][48881] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:42,045][48865] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:42,152][48880] Decorrelating experience for 384 frames...
-[2025-08-29 20:05:42,312][48881] Decorrelating experience for 448 frames...
-[2025-08-29 20:05:42,512][48880] Decorrelating experience for 448 frames...
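
The staggered frame counts above (0, 64, 128, ..., 448) are experience decorrelation: with decorrelate_envs_on_one_worker=True, each of a worker's eight envs is warmed up for a different multiple of the rollout length (64) so the envs do not step through episodes in lockstep. A toy illustration of the idea against a Gym-style API (not the library's actual code):

    def decorrelate_experience(envs, rollout_len=64):
        """Warm env i up by i * rollout_len random-action steps (toy sketch)."""
        for i, env in enumerate(envs):
            env.reset()
            for _ in range(i * rollout_len):
                _, _, terminated, truncated, _ = env.step(env.action_space.sample())
                if terminated or truncated:
                    env.reset()
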
-[2025-08-29 20:05:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:05:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1116.5. Samples: 11164. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:05:49,296][32845] Avg episode reward: [(0, '4.276')]
-[2025-08-29 20:05:53,891][32845] Heartbeat connected on Batcher_0
-[2025-08-29 20:05:53,892][32845] Heartbeat connected on RolloutWorker_w4
-[2025-08-29 20:05:53,893][32845] Heartbeat connected on RolloutWorker_w1
-[2025-08-29 20:05:53,895][32845] Heartbeat connected on RolloutWorker_w6
-[2025-08-29 20:05:53,897][32845] Heartbeat connected on RolloutWorker_w0
-[2025-08-29 20:05:53,899][32845] Heartbeat connected on RolloutWorker_w5
-[2025-08-29 20:05:53,900][32845] Heartbeat connected on RolloutWorker_w9
-[2025-08-29 20:05:53,900][32845] Heartbeat connected on RolloutWorker_w8
-[2025-08-29 20:05:53,901][32845] Heartbeat connected on RolloutWorker_w7
-[2025-08-29 20:05:53,902][32845] Heartbeat connected on RolloutWorker_w3
-[2025-08-29 20:05:53,905][32845] Heartbeat connected on InferenceWorker_p0-w0
-[2025-08-29 20:05:53,913][32845] Heartbeat connected on RolloutWorker_w2
-[2025-08-29 20:05:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2125.4. Samples: 31880. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:05:54,292][32845] Avg episode reward: [(0, '4.334')]
-[2025-08-29 20:05:54,672][48846] Signal inference workers to stop experience collection...
-[2025-08-29 20:05:54,676][48863] InferenceWorker_p0-w0: stopping experience collection
-[2025-08-29 20:05:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1839.1. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:05:59,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1471.2. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:04,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1226.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:09,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:14,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 1050.9. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:14,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 919.5. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:19,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 817.3. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:24,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:29,722][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 809.6. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:29,723][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 569.2. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:34,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:39,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 108.9. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:39,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:44,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:49,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:54,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:06:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:06:59,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:05,553][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:05,554][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:09,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:14,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:14,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:19,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:24,297][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:29,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:29,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:34,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:41,390][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:41,391][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:44,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:49,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:54,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:07:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:07:59,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:04,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:09,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:17,219][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:17,220][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:19,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:24,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:29,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:29,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:34,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:39,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:39,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:44,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:49,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:54,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:08:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:08:59,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:04,300][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:09,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:14,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:14,299][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:19,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:24,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:29,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:29,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:34,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:39,648][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:39,651][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:44,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:49,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:54,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:09:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:09:59,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:04,727][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:04,728][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:09,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:14,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:14,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:19,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:24,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:29,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:29,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:34,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:40,550][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:40,551][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:44,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:49,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:54,293][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:10:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:10:59,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:11:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 36780. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-08-29 20:11:04,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:11:08,245][48846] Signal inference workers to resume experience collection...
-[2025-08-29 20:11:08,246][48863] InferenceWorker_p0-w0: resuming experience collection
-[2025-08-29 20:11:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 1092.3, 300 sec: 222.2). Total num frames: 65536. Throughput: 0: 2.0. Samples: 36872. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-08-29 20:11:09,292][32845] Avg episode reward: [(0, '4.399')]
-[2025-08-29 20:11:16,394][32845] Fps is (10 sec: 5415.1, 60 sec: 1055.3, 300 sec: 220.6). Total num frames: 65536. Throughput: 0: 182.6. Samples: 45380. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-08-29 20:11:16,395][32845] Avg episode reward: [(0, '4.591')]
-[2025-08-29 20:11:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 1092.3, 300 sec: 222.2). Total num frames: 65536. Throughput: 0: 372.0. Samples: 53520. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-08-29 20:11:19,292][32845] Avg episode reward: [(0, '4.489')]
-[2025-08-29 20:11:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 1092.3, 300 sec: 222.5). Total num frames: 65536. Throughput: 0: 657.4. Samples: 65536. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-08-29 20:11:24,292][32845] Avg episode reward: [(0, '4.476')]
-[2025-08-29 20:11:24,593][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_131072.pth...
-[2025-08-29 20:11:25,042][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_131072.pth
-[2025-08-29 20:11:25,046][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_131072.pth...
-[2025-08-29 20:11:25,171][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_131072.pth
-[2025-08-29 20:11:25,175][32845] Heartbeat connected on LearnerWorker_p0
-[2025-08-29 20:11:29,291][32845] Fps is (10 sec: 6553.5, 60 sec: 2184.5, 300 sec: 444.3). Total num frames: 131072. Throughput: 0: 664.0. Samples: 66660. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-08-29 20:11:29,293][32845] Avg episode reward: [(0, '4.489')]
-[2025-08-29 20:11:34,291][32845] Fps is (10 sec: 13107.3, 60 sec: 3276.8, 300 sec: 666.5). Total num frames: 196608. Throughput: 0: 814.0. Samples: 73412. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
-[2025-08-29 20:11:34,293][32845] Avg episode reward: [(0, '4.670')]
-[2025-08-29 20:11:39,291][32845] Fps is (10 sec: 6553.7, 60 sec: 3347.0, 300 sec: 666.5). Total num frames: 196608. Throughput: 0: 1203.3. Samples: 90928. Policy #0 lag: (min: 1.0, avg: 1.2, max: 2.0)
-[2025-08-29 20:11:39,292][32845] Avg episode reward: [(0, '4.287')]
-[2025-08-29 20:11:41,595][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000004_262144.pth...
-[2025-08-29 20:11:41,733][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000004_262144.pth
-[2025-08-29 20:11:41,738][48846] Saving new best policy, reward=4.489!
-[2025-08-29 20:11:41,814][48846] Saving new best policy, reward=4.670!
-[2025-08-29 20:11:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 888.6). Total num frames: 262144. Throughput: 0: 1235.8. Samples: 92392. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:11:44,293][32845] Avg episode reward: [(0, '4.290')]
-[2025-08-29 20:11:52,224][32845] Fps is (10 sec: 5067.6, 60 sec: 4165.5, 300 sec: 879.9). Total num frames: 262144. Throughput: 0: 1499.9. Samples: 108676. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:11:52,225][32845] Avg episode reward: [(0, '4.358')]
-[2025-08-29 20:11:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 1110.8). Total num frames: 327680. Throughput: 0: 1621.5. Samples: 109840. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
-[2025-08-29 20:11:54,292][32845] Avg episode reward: [(0, '4.441')]
-[2025-08-29 20:11:59,291][32845] Fps is (10 sec: 9272.5, 60 sec: 5461.3, 300 sec: 1115.6). Total num frames: 327680. Throughput: 0: 1710.8. Samples: 118768. Policy #0 lag: (min: 1.0, avg: 1.1, max: 2.0)
-[2025-08-29 20:11:59,293][32845] Avg episode reward: [(0, '4.362')]
-[2025-08-29 20:12:04,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 1332.9). Total num frames: 393216. Throughput: 0: 1658.5. Samples: 128152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:04,296][32845] Avg episode reward: [(0, '4.356')]
-[2025-08-29 20:12:09,292][32845] Fps is (10 sec: 6553.5, 60 sec: 5461.3, 300 sec: 1332.9). Total num frames: 393216. Throughput: 0: 1664.3. Samples: 140432. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:09,294][32845] Avg episode reward: [(0, '4.539')]
-[2025-08-29 20:12:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6791.6, 300 sec: 1555.1). Total num frames: 458752. Throughput: 0: 1766.0. Samples: 146128. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:14,294][32845] Avg episode reward: [(0, '4.415')]
-[2025-08-29 20:12:19,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.8, 300 sec: 1777.2). Total num frames: 524288. Throughput: 0: 1863.4. Samples: 157264. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:19,294][32845] Avg episode reward: [(0, '4.286')]
-[2025-08-29 20:12:24,292][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.8, 300 sec: 1777.2). Total num frames: 524288. Throughput: 0: 1795.3. Samples: 171716. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:24,295][32845] Avg episode reward: [(0, '4.592')]
-[2025-08-29 20:12:29,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 1777.2). Total num frames: 524288. Throughput: 0: 1799.8. Samples: 173384. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:29,293][32845] Avg episode reward: [(0, '4.592')]
-[2025-08-29 20:12:34,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 2013.7). Total num frames: 589824. Throughput: 0: 1657.4. Samples: 178400. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:34,294][32845] Avg episode reward: [(0, '4.239')]
-[2025-08-29 20:12:39,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 1999.4). Total num frames: 589824. Throughput: 0: 1925.5. Samples: 196488. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:39,293][32845] Avg episode reward: [(0, '4.470')]
-[2025-08-29 20:12:41,635][48863] Updated weights for policy 0, policy_version 10 (0.0191)
-[2025-08-29 20:12:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 2221.6). Total num frames: 655360. Throughput: 0: 1729.8. Samples: 196608. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:44,293][32845] Avg episode reward: [(0, '4.340')]
-[2025-08-29 20:12:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6890.3, 300 sec: 2221.6). Total num frames: 655360. Throughput: 0: 1870.4. Samples: 212320. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:49,293][32845] Avg episode reward: [(0, '4.411')]
-[2025-08-29 20:12:54,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 2443.7). Total num frames: 720896. Throughput: 0: 1854.2. Samples: 223872. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:12:54,293][32845] Avg episode reward: [(0, '4.358')]
-[2025-08-29 20:12:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 2665.9). Total num frames: 786432. Throughput: 0: 1807.9. Samples: 227484. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:12:59,293][32845] Avg episode reward: [(0, '4.488')]
-[2025-08-29 20:13:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 2665.9). Total num frames: 786432. Throughput: 0: 1732.9. Samples: 235244. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:13:04,293][32845] Avg episode reward: [(0, '4.610')]
-[2025-08-29 20:13:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 2692.6). Total num frames: 786432. Throughput: 0: 1603.7. Samples: 243884. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:13:09,293][32845] Avg episode reward: [(0, '4.482')]
-[2025-08-29 20:13:14,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 2888.0). Total num frames: 851968. Throughput: 0: 1776.6. Samples: 253332. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:13:14,293][32845] Avg episode reward: [(0, '4.412')]
-[2025-08-29 20:13:19,291][32845] Fps is (10 sec: 13107.2, 60 sec: 6553.6, 300 sec: 3110.2). Total num frames: 917504. Throughput: 0: 1876.4. Samples: 262840. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:13:19,294][32845] Avg episode reward: [(0, '4.401')]
-[2025-08-29 20:13:24,291][32845] Fps is (10 sec: 13107.4, 60 sec: 7645.9, 300 sec: 3332.3). Total num frames: 983040. Throughput: 0: 1651.5. Samples: 270804. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:13:24,293][32845] Avg episode reward: [(0, '4.452')]
-[2025-08-29 20:13:29,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.8, 300 sec: 3332.3). Total num frames: 983040. Throughput: 0: 1842.5. Samples: 279520. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:13:29,293][32845] Avg episode reward: [(0, '4.421')]
-[2025-08-29 20:13:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 1694.9. Samples: 288588. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 20:13:34,292][32845] Avg episode reward: [(0, '4.493')]
-[2025-08-29 20:13:39,726][32845] Fps is (10 sec: 6280.5, 60 sec: 7590.8, 300 sec: 3549.3). Total num frames: 1048576. Throughput: 0: 1629.0. Samples: 297884. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 20:13:39,728][32845] Avg episode reward: [(0, '4.738')]
-[2025-08-29 20:13:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 1761.8. Samples: 306764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 20:13:44,292][32845] Avg episode reward: [(0, '4.472')]
-[2025-08-29 20:13:45,301][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000017_1114112.pth...
-[2025-08-29 20:13:45,478][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000017_1114112.pth
-[2025-08-29 20:13:45,679][48846] Saving new best policy, reward=4.738!
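
The Saving/Removing pairs here are the keep_checkpoints=2 rotation at work, and the "Saving new best policy" entries are driven by save_best_metric=reward once save_best_after=100000 env steps have elapsed. A rough sketch of that bookkeeping with hypothetical helpers (not Sample Factory's implementation; the best-policy filename is an assumption):

    from pathlib import Path
    import torch

    def save_checkpoint(state, ckpt_dir: Path, version: int, env_steps: int,
                        keep_checkpoints: int = 2):
        # Same naming scheme as the log: checkpoint_<version>_<env_steps>.pth
        path = ckpt_dir / f"checkpoint_{version:09d}_{env_steps}.pth"
        torch.save(state, path)
        # Rotate: zero-padded versions sort lexicographically, keep the newest N
        for stale in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_checkpoints]:
            stale.unlink()

    def maybe_save_best(state, reward: float, env_steps: int, best_so_far: float,
                        ckpt_dir: Path, save_best_after: int = 100_000):
        if env_steps >= save_best_after and reward > best_so_far:
            torch.save(state, ckpt_dir / "best.pth")  # filename is an assumption
            return reward
        return best_so_far
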
-[2025-08-29 20:13:49,291][32845] Fps is (10 sec: 6851.6, 60 sec: 7645.9, 300 sec: 3776.6). Total num frames: 1114112. Throughput: 0: 1708.6. Samples: 312132. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-08-29 20:13:49,293][32845] Avg episode reward: [(0, '4.404')]
-[2025-08-29 20:13:54,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.9, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1744.5. Samples: 322388. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-08-29 20:13:54,292][32845] Avg episode reward: [(0, '4.488')]
-[2025-08-29 20:13:59,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1754.8. Samples: 332296. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-08-29 20:13:59,292][32845] Avg episode reward: [(0, '4.282')]
-[2025-08-29 20:14:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 4221.0). Total num frames: 1245184. Throughput: 0: 1805.0. Samples: 344064. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:14:04,292][32845] Avg episode reward: [(0, '4.113')]
-[2025-08-29 20:14:09,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.9, 300 sec: 4221.0). Total num frames: 1245184. Throughput: 0: 1940.8. Samples: 358140. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:14:09,293][32845] Avg episode reward: [(0, '4.321')]
-[2025-08-29 20:14:10,402][48863] Updated weights for policy 0, policy_version 20 (0.0020)
-[2025-08-29 20:14:15,562][32845] Fps is (10 sec: 5814.8, 60 sec: 7487.3, 300 sec: 4424.1). Total num frames: 1310720. Throughput: 0: 1776.4. Samples: 361716. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:14:15,563][32845] Avg episode reward: [(0, '4.460')]
-[2025-08-29 20:14:19,291][32845] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 4443.1). Total num frames: 1310720. Throughput: 0: 1784.2. Samples: 368876. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:14:19,293][32845] Avg episode reward: [(0, '4.513')]
-[2025-08-29 20:14:24,291][32845] Fps is (10 sec: 7507.5, 60 sec: 6553.6, 300 sec: 4665.3). Total num frames: 1376256. Throughput: 0: 1775.7. Samples: 377016. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:14:24,293][32845] Avg episode reward: [(0, '4.358')]
-[2025-08-29 20:14:29,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 4665.3). Total num frames: 1376256. Throughput: 0: 1741.4. Samples: 385128. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:14:29,292][32845] Avg episode reward: [(0, '4.537')]
-[2025-08-29 20:14:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 4893.4). Total num frames: 1441792. Throughput: 0: 1817.2. Samples: 393908. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:14:34,292][32845] Avg episode reward: [(0, '4.426')]
-[2025-08-29 20:14:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6601.5, 300 sec: 4887.4). Total num frames: 1441792. Throughput: 0: 1931.8. Samples: 409320. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:14:39,293][32845] Avg episode reward: [(0, '4.436')]
-[2025-08-29 20:14:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 5109.6). Total num frames: 1507328. Throughput: 0: 1720.0. Samples: 409696. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
-[2025-08-29 20:14:44,292][32845] Avg episode reward: [(0, '4.338')]
-[2025-08-29 20:14:51,382][32845] Fps is (10 sec: 5420.1, 60 sec: 6332.9, 300 sec: 5073.6). Total num frames: 1507328. Throughput: 0: 1725.2. Samples: 425304. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
-[2025-08-29 20:14:51,383][32845] Avg episode reward: [(0, '4.393')]
-[2025-08-29 20:14:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 5461.3, 300 sec: 5109.6). Total num frames: 1507328. Throughput: 0: 1509.8. Samples: 426080. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
-[2025-08-29 20:14:54,292][32845] Avg episode reward: [(0, '4.380')]
-[2025-08-29 20:14:59,291][32845] Fps is (10 sec: 8286.4, 60 sec: 6553.6, 300 sec: 5339.6). Total num frames: 1572864. Throughput: 0: 1613.0. Samples: 432252. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:14:59,293][32845] Avg episode reward: [(0, '4.276')]
-[2025-08-29 20:15:04,291][32845] Fps is (10 sec: 13107.1, 60 sec: 6553.6, 300 sec: 5553.9). Total num frames: 1638400. Throughput: 0: 1629.6. Samples: 442208. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:04,293][32845] Avg episode reward: [(0, '4.321')]
-[2025-08-29 20:15:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 5553.9). Total num frames: 1638400. Throughput: 0: 1800.3. Samples: 458028. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:09,292][32845] Avg episode reward: [(0, '4.330')]
-[2025-08-29 20:15:14,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6695.4, 300 sec: 5776.1). Total num frames: 1703936. Throughput: 0: 1639.2. Samples: 458892. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:14,292][32845] Avg episode reward: [(0, '4.604')]
-[2025-08-29 20:15:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 5776.1). Total num frames: 1703936. Throughput: 0: 1785.3. Samples: 474248. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:19,292][32845] Avg episode reward: [(0, '4.431')]
-[2025-08-29 20:15:27,216][32845] Fps is (10 sec: 5070.5, 60 sec: 6249.0, 300 sec: 5939.3). Total num frames: 1769472. Throughput: 0: 1513.9. Samples: 481872. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:15:27,218][32845] Avg episode reward: [(0, '4.526')]
-[2025-08-29 20:15:29,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 5998.2). Total num frames: 1769472. Throughput: 0: 1618.3. Samples: 482520. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:15:29,293][32845] Avg episode reward: [(0, '4.389')]
-[2025-08-29 20:15:33,442][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_1835008.pth...
-[2025-08-29 20:15:33,543][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000028_1835008.pth
-[2025-08-29 20:15:34,291][32845] Fps is (10 sec: 9262.9, 60 sec: 6553.6, 300 sec: 6247.0). Total num frames: 1835008. Throughput: 0: 1531.1. Samples: 491000. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:15:34,293][32845] Avg episode reward: [(0, '4.385')]
-[2025-08-29 20:15:39,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6220.4). Total num frames: 1835008. Throughput: 0: 1767.1. Samples: 505600. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:15:39,292][32845] Avg episode reward: [(0, '4.294')]
-[2025-08-29 20:15:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 1900544. Throughput: 0: 1683.5. Samples: 508008. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:44,292][32845] Avg episode reward: [(0, '4.411')]
-[2025-08-29 20:15:49,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6790.2, 300 sec: 6442.5). Total num frames: 1900544. Throughput: 0: 1757.2. Samples: 521280. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:49,294][32845] Avg episode reward: [(0, '4.414')]
-[2025-08-29 20:15:51,392][48863] Updated weights for policy 0, policy_version 30 (0.0012)
-[2025-08-29 20:15:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 6664.7). Total num frames: 1966080. Throughput: 0: 1566.6. Samples: 528524. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:54,292][32845] Avg episode reward: [(0, '4.345')]
-[2025-08-29 20:15:59,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 1966080. Throughput: 0: 1757.8. Samples: 537992. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:15:59,292][32845] Avg episode reward: [(0, '4.320')]
-[2025-08-29 20:16:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 5461.3, 300 sec: 6442.5). Total num frames: 1966080. Throughput: 0: 1478.1. Samples: 540764. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:04,292][32845] Avg episode reward: [(0, '4.320')]
-[2025-08-29 20:16:09,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6712.5). Total num frames: 2031616. Throughput: 0: 1599.0. Samples: 549152. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:09,293][32845] Avg episode reward: [(0, '4.364')]
-[2025-08-29 20:16:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 6664.7). Total num frames: 2031616. Throughput: 0: 1649.0. Samples: 556724. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:14,292][32845] Avg episode reward: [(0, '4.399')]
-[2025-08-29 20:16:19,292][32845] Fps is (10 sec: 6553.4, 60 sec: 6553.5, 300 sec: 6886.8). Total num frames: 2097152. Throughput: 0: 1678.3. Samples: 566524. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:16:19,295][32845] Avg episode reward: [(0, '4.454')]
-[2025-08-29 20:16:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 6889.5, 300 sec: 6886.8). Total num frames: 2162688. Throughput: 0: 1494.8. Samples: 572868. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:24,293][32845] Avg episode reward: [(0, '4.482')]
-[2025-08-29 20:16:29,291][32845] Fps is (10 sec: 6553.9, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 2162688. Throughput: 0: 1621.4. Samples: 580972. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:29,292][32845] Avg episode reward: [(0, '4.331')]
-[2025-08-29 20:16:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 2228224. Throughput: 0: 1525.3. Samples: 589916. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:34,292][32845] Avg episode reward: [(0, '4.424')]
-[2025-08-29 20:16:39,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 2228224. Throughput: 0: 1503.1. Samples: 596164. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:39,293][32845] Avg episode reward: [(0, '4.446')]
-[2025-08-29 20:16:44,291][32845] Fps is (10 sec: 0.0, 60 sec: 5461.3, 300 sec: 6731.6). Total num frames: 2228224. Throughput: 0: 1441.3. Samples: 602852. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:44,292][32845] Avg episode reward: [(0, '4.254')]
-[2025-08-29 20:16:49,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 2293760. Throughput: 0: 1580.8. Samples: 611900. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:49,293][32845] Avg episode reward: [(0, '4.461')]
-[2025-08-29 20:16:54,291][32845] Fps is (10 sec: 13107.4, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 2359296. Throughput: 0: 1610.9. Samples: 621640. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:54,293][32845] Avg episode reward: [(0, '4.537')]
-[2025-08-29 20:16:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 2359296. Throughput: 0: 1597.2. Samples: 628596. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:16:59,292][32845] Avg episode reward: [(0, '4.348')]
-[2025-08-29 20:17:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 2424832. Throughput: 0: 1612.1. Samples: 639068. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:17:04,292][32845] Avg episode reward: [(0, '4.424')]
-[2025-08-29 20:17:09,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 2424832. Throughput: 0: 1764.0. Samples: 652248. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:17:09,292][32845] Avg episode reward: [(0, '4.391')]
-[2025-08-29 20:17:14,713][32845] Fps is (10 sec: 0.0, 60 sec: 6507.8, 300 sec: 6433.3). Total num frames: 2424832. Throughput: 0: 1569.2. Samples: 652248. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:17:14,715][32845] Avg episode reward: [(0, '4.403')]
-[2025-08-29 20:17:19,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 2490368. Throughput: 0: 1568.1. Samples: 660480. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:17:19,293][32845] Avg episode reward: [(0, '4.454')]
-[2025-08-29 20:17:24,291][32845] Fps is (10 sec: 13684.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 2555904. Throughput: 0: 1669.3. Samples: 671284. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:17:24,293][32845] Avg episode reward: [(0, '4.578')]
-[2025-08-29 20:17:27,691][48863] Updated weights for policy 0, policy_version 40 (0.0016)
-[2025-08-29 20:17:29,291][32845] Fps is (10 sec: 13107.5, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 2621440. Throughput: 0: 1841.0. Samples: 685696. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:17:29,292][32845] Avg episode reward: [(0, '4.463')]
-[2025-08-29 20:17:32,080][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_2686976.pth...
-[2025-08-29 20:17:32,174][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_2686976.pth
-[2025-08-29 20:17:34,291][32845] Fps is (10 sec: 13107.4, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 2686976. Throughput: 0: 1947.6. Samples: 699544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:17:34,293][32845] Avg episode reward: [(0, '4.595')]
-[2025-08-29 20:17:39,291][32845] Fps is (10 sec: 13107.3, 60 sec: 8738.2, 300 sec: 7109.0). Total num frames: 2752512. Throughput: 0: 2278.4. Samples: 724168. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:17:39,292][32845] Avg episode reward: [(0, '4.379')]
-[2025-08-29 20:17:44,291][32845] Fps is (10 sec: 13107.1, 60 sec: 9830.4, 300 sec: 7331.2). Total num frames: 2818048. Throughput: 0: 2266.5. Samples: 730588. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:17:44,292][32845] Avg episode reward: [(0, '4.304')]
-[2025-08-29 20:17:50,556][32845] Fps is (10 sec: 5817.6, 60 sec: 8557.7, 300 sec: 7078.6). Total num frames: 2818048. Throughput: 0: 2178.5. Samples: 739856. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:17:50,560][32845] Avg episode reward: [(0, '4.439')]
-[2025-08-29 20:17:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.1, 300 sec: 7109.0). Total num frames: 2883584. Throughput: 0: 2081.3. Samples: 745908. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:17:54,292][32845] Avg episode reward: [(0, '4.483')]
-[2025-08-29 20:17:59,291][32845] Fps is (10 sec: 7502.7, 60 sec: 8738.1, 300 sec: 7109.0). Total num frames: 2883584. Throughput: 0: 2282.6. Samples: 754000. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:17:59,292][32845] Avg episode reward: [(0, '4.499')]
-[2025-08-29 20:18:04,291][32845] Fps is (10 sec: 6553.4, 60 sec: 8738.1, 300 sec: 7331.1). Total num frames: 2949120. Throughput: 0: 2379.8. Samples: 767572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:18:04,294][32845] Avg episode reward: [(0, '4.467')]
-[2025-08-29 20:18:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.1, 300 sec: 7109.0). Total num frames: 2949120. Throughput: 0: 2446.5. Samples: 781376. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:18:09,293][32845] Avg episode reward: [(0, '4.634')]
-[2025-08-29 20:18:14,291][32845] Fps is (10 sec: 6553.8, 60 sec: 9900.0, 300 sec: 7109.0). Total num frames: 3014656. Throughput: 0: 2240.6. Samples: 786524. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-08-29 20:18:14,292][32845] Avg episode reward: [(0, '4.506')]
-[2025-08-29 20:18:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.2, 300 sec: 6886.8). Total num frames: 3014656. Throughput: 0: 2220.3. Samples: 799456. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-08-29 20:18:19,293][32845] Avg episode reward: [(0, '4.511')]
-[2025-08-29 20:18:26,386][32845] Fps is (10 sec: 5418.4, 60 sec: 8443.3, 300 sec: 7058.9). Total num frames: 3080192. Throughput: 0: 1646.5. Samples: 801708. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0)
-[2025-08-29 20:18:26,389][32845] Avg episode reward: [(0, '4.365')]
-[2025-08-29 20:18:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 3080192. Throughput: 0: 1642.9. Samples: 804520. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0)
-[2025-08-29 20:18:29,292][32845] Avg episode reward: [(0, '4.315')]
-[2025-08-29 20:18:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6897.0). Total num frames: 3080192. Throughput: 0: 1814.2. Samples: 819200. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0)
-[2025-08-29 20:18:34,292][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 20:18:39,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 3145728. Throughput: 0: 1814.2. Samples: 827548. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:18:39,293][32845] Avg episode reward: [(0, '4.470')]
-[2025-08-29 20:18:44,291][32845] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 6886.8). Total num frames: 3145728. Throughput: 0: 1806.5. Samples: 835292. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:18:44,292][32845] Avg episode reward: [(0, '4.511')]
-[2025-08-29 20:18:49,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6694.7, 300 sec: 6886.8). Total num frames: 3211264. Throughput: 0: 1696.7. Samples: 843924. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:18:49,292][32845] Avg episode reward: [(0, '4.461')]
-[2025-08-29 20:18:52,440][48863] Updated weights for policy 0, policy_version 50 (0.0017)
-[2025-08-29 20:18:54,291][32845] Fps is (10 sec: 13107.3, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 3276800. Throughput: 0: 1564.1. Samples: 851760. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:18:54,292][32845] Avg episode reward: [(0, '4.340')]
-[2025-08-29 20:18:58,480][48846] Signal inference workers to stop experience collection... (50 times)
-[2025-08-29 20:18:58,488][48863] InferenceWorker_p0-w0: stopping experience collection (50 times)
-[2025-08-29 20:19:02,221][32845] Fps is (10 sec: 5068.6, 60 sec: 6248.5, 300 sec: 6819.1). Total num frames: 3276800. Throughput: 0: 1528.4. Samples: 859780. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:19:02,222][32845] Avg episode reward: [(0, '4.390')]
-[2025-08-29 20:19:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 5461.4, 300 sec: 6886.8). Total num frames: 3276800. Throughput: 0: 1476.2. Samples: 865884. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:19:04,292][32845] Avg episode reward: [(0, '4.390')]
-[2025-08-29 20:19:04,616][48846] Signal inference workers to resume experience collection... (50 times)
-[2025-08-29 20:19:04,617][48863] InferenceWorker_p0-w0: resuming experience collection (50 times)
-[2025-08-29 20:19:09,291][32845] Fps is (10 sec: 9269.1, 60 sec: 6553.6, 300 sec: 6916.6). Total num frames: 3342336. Throughput: 0: 1739.1. Samples: 876324. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:19:09,293][32845] Avg episode reward: [(0, '4.238')]
-[2025-08-29 20:19:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 6886.8). Total num frames: 3342336. Throughput: 0: 1777.8. Samples: 884520.
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:19:14,292][32845] Avg episode reward: [(0, '4.243')] -[2025-08-29 20:19:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 3407872. Throughput: 0: 1689.2. Samples: 895212. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0) -[2025-08-29 20:19:19,293][32845] Avg episode reward: [(0, '4.271')] -[2025-08-29 20:19:24,291][32845] Fps is (10 sec: 13107.1, 60 sec: 6790.7, 300 sec: 7109.0). Total num frames: 3473408. Throughput: 0: 1649.6. Samples: 901780. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0) -[2025-08-29 20:19:24,292][32845] Avg episode reward: [(0, '4.335')] -[2025-08-29 20:19:29,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 3473408. Throughput: 0: 1653.4. Samples: 909696. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0) -[2025-08-29 20:19:29,293][32845] Avg episode reward: [(0, '4.279')] -[2025-08-29 20:19:32,329][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000054_3538944.pth... -[2025-08-29 20:19:32,502][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000054_3538944.pth -[2025-08-29 20:19:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 3538944. Throughput: 0: 1642.3. Samples: 917828. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:19:34,292][32845] Avg episode reward: [(0, '4.252')] -[2025-08-29 20:19:39,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 3538944. Throughput: 0: 1477.4. Samples: 918244. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:19:39,292][32845] Avg episode reward: [(0, '4.252')] -[2025-08-29 20:19:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7159.7). Total num frames: 3604480. Throughput: 0: 1497.3. Samples: 922772. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:19:44,292][32845] Avg episode reward: [(0, '4.369')] -[2025-08-29 20:19:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 3604480. Throughput: 0: 1655.2. Samples: 940368. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:19:49,292][32845] Avg episode reward: [(0, '4.527')] -[2025-08-29 20:19:54,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 3670016. Throughput: 0: 1713.1. Samples: 953412. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:19:54,292][32845] Avg episode reward: [(0, '4.495')] -[2025-08-29 20:19:59,291][32845] Fps is (10 sec: 13107.3, 60 sec: 8038.4, 300 sec: 7109.0). Total num frames: 3735552. Throughput: 0: 1625.9. Samples: 957684. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:19:59,293][32845] Avg episode reward: [(0, '4.513')] -[2025-08-29 20:20:04,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.8, 300 sec: 7109.0). Total num frames: 3735552. Throughput: 0: 1660.3. Samples: 969924. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:20:04,293][32845] Avg episode reward: [(0, '4.460')] -[2025-08-29 20:20:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 3735552. Throughput: 0: 1787.7. Samples: 982228. 
Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:20:09,292][32845] Avg episode reward: [(0, '4.469')] -[2025-08-29 20:20:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 3801088. Throughput: 0: 1679.9. Samples: 985292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-08-29 20:20:14,293][32845] Avg episode reward: [(0, '4.432')] -[2025-08-29 20:20:19,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6955.8). Total num frames: 3801088. Throughput: 0: 1728.1. Samples: 995592. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-08-29 20:20:19,293][32845] Avg episode reward: [(0, '4.363')] -[2025-08-29 20:20:24,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 3866624. Throughput: 0: 1844.6. Samples: 1001252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 20:20:24,292][32845] Avg episode reward: [(0, '4.394')] -[2025-08-29 20:20:29,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 3866624. Throughput: 0: 1959.0. Samples: 1010928. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 20:20:29,293][32845] Avg episode reward: [(0, '4.337')] -[2025-08-29 20:20:30,043][48863] Updated weights for policy 0, policy_version 60 (0.0015) -[2025-08-29 20:20:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 3932160. Throughput: 0: 1717.4. Samples: 1017652. Policy #0 lag: (min: 1.0, avg: 1.9, max: 2.0) -[2025-08-29 20:20:34,292][32845] Avg episode reward: [(0, '4.315')] -[2025-08-29 20:20:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 3932160. Throughput: 0: 1753.3. Samples: 1032312. Policy #0 lag: (min: 1.0, avg: 1.9, max: 2.0) -[2025-08-29 20:20:39,295][32845] Avg episode reward: [(0, '4.436')] -[2025-08-29 20:20:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 3997696. Throughput: 0: 1709.5. Samples: 1034612. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:20:44,292][32845] Avg episode reward: [(0, '4.348')] -[2025-08-29 20:20:49,726][32845] Fps is (10 sec: 6280.5, 60 sec: 6506.4, 300 sec: 6876.7). Total num frames: 3997696. Throughput: 0: 1612.3. Samples: 1043180. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:20:49,727][32845] Avg episode reward: [(0, '4.471')] -[2025-08-29 20:20:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4063232. Throughput: 0: 1476.3. Samples: 1048660. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) -[2025-08-29 20:20:54,292][32845] Avg episode reward: [(0, '4.605')] -[2025-08-29 20:20:59,292][32845] Fps is (10 sec: 6851.5, 60 sec: 5461.3, 300 sec: 7109.0). Total num frames: 4063232. Throughput: 0: 1632.9. Samples: 1058772. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) -[2025-08-29 20:20:59,293][32845] Avg episode reward: [(0, '4.456')] -[2025-08-29 20:21:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4128768. Throughput: 0: 1564.2. Samples: 1065980. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:21:04,292][32845] Avg episode reward: [(0, '4.422')] -[2025-08-29 20:21:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4128768. Throughput: 0: 1781.8. Samples: 1081432. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:21:09,293][32845] Avg episode reward: [(0, '4.416')] -[2025-08-29 20:21:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4194304. Throughput: 0: 1605.8. Samples: 1083188. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:21:14,292][32845] Avg episode reward: [(0, '4.554')] -[2025-08-29 20:21:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 4194304. Throughput: 0: 1779.5. Samples: 1097728. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:21:19,292][32845] Avg episode reward: [(0, '4.378')] -[2025-08-29 20:21:25,561][32845] Fps is (10 sec: 5814.9, 60 sec: 6417.7, 300 sec: 7078.5). Total num frames: 4259840. Throughput: 0: 1415.9. Samples: 1097828. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) -[2025-08-29 20:21:25,563][32845] Avg episode reward: [(0, '4.324')] -[2025-08-29 20:21:29,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 4259840. Throughput: 0: 1591.5. Samples: 1106232. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) -[2025-08-29 20:21:29,293][32845] Avg episode reward: [(0, '4.276')] -[2025-08-29 20:21:34,291][32845] Fps is (10 sec: 7507.2, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4325376. Throughput: 0: 1601.2. Samples: 1114536. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:21:34,292][32845] Avg episode reward: [(0, '4.418')] -[2025-08-29 20:21:39,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4325376. Throughput: 0: 1816.8. Samples: 1130416. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:21:39,292][32845] Avg episode reward: [(0, '4.392')] -[2025-08-29 20:21:41,113][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_4390912.pth... -[2025-08-29 20:21:41,273][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000067_4390912.pth -[2025-08-29 20:21:44,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4390912. Throughput: 0: 1612.8. Samples: 1131348. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:21:44,292][32845] Avg episode reward: [(0, '4.327')] -[2025-08-29 20:21:49,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7701.7, 300 sec: 7109.0). Total num frames: 4456448. Throughput: 0: 1599.7. Samples: 1137968. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) -[2025-08-29 20:21:49,292][32845] Avg episode reward: [(0, '4.204')] -[2025-08-29 20:21:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4456448. Throughput: 0: 1651.3. Samples: 1155740. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0) -[2025-08-29 20:21:54,292][32845] Avg episode reward: [(0, '4.580')] -[2025-08-29 20:22:01,410][32845] Fps is (10 sec: 5407.6, 60 sec: 7385.1, 300 sec: 7058.3). Total num frames: 4521984. Throughput: 0: 1622.8. Samples: 1159652. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:22:01,412][32845] Avg episode reward: [(0, '4.475')] -[2025-08-29 20:22:04,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4521984. Throughput: 0: 1552.6. Samples: 1167596. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:22:04,294][32845] Avg episode reward: [(0, '4.531')] -[2025-08-29 20:22:06,874][48863] Updated weights for policy 0, policy_version 70 (0.0012) -[2025-08-29 20:22:09,291][32845] Fps is (10 sec: 8315.6, 60 sec: 7645.9, 300 sec: 7341.6). Total num frames: 4587520. Throughput: 0: 1771.2. Samples: 1175284. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:22:09,293][32845] Avg episode reward: [(0, '4.387')] -[2025-08-29 20:22:14,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 4587520. Throughput: 0: 1738.1. Samples: 1184444. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:22:14,293][32845] Avg episode reward: [(0, '4.117')] -[2025-08-29 20:22:19,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 4653056. Throughput: 0: 1800.5. Samples: 1195560. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 20:22:19,292][32845] Avg episode reward: [(0, '4.385')] -[2025-08-29 20:22:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6695.3, 300 sec: 6886.8). Total num frames: 4653056. Throughput: 0: 1776.7. Samples: 1210368. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 20:22:24,293][32845] Avg episode reward: [(0, '4.510')] -[2025-08-29 20:22:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 4718592. Throughput: 0: 1784.4. Samples: 1211648. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 20:22:29,292][32845] Avg episode reward: [(0, '4.511')] -[2025-08-29 20:22:37,216][32845] Fps is (10 sec: 5070.5, 60 sec: 6249.0, 300 sec: 6599.2). Total num frames: 4718592. Throughput: 0: 1883.9. Samples: 1228252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 20:22:37,217][32845] Avg episode reward: [(0, '4.283')] -[2025-08-29 20:22:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 6664.7). Total num frames: 4784128. Throughput: 0: 1625.4. Samples: 1228884. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 20:22:39,292][32845] Avg episode reward: [(0, '4.296')] -[2025-08-29 20:22:44,291][32845] Fps is (10 sec: 9263.1, 60 sec: 6553.6, 300 sec: 6693.4). Total num frames: 4784128. Throughput: 0: 1751.3. Samples: 1234748. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 20:22:44,292][32845] Avg episode reward: [(0, '4.287')] -[2025-08-29 20:22:49,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 4849664. Throughput: 0: 1726.4. Samples: 1245284. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:22:49,292][32845] Avg episode reward: [(0, '4.272')] -[2025-08-29 20:22:54,291][32845] Fps is (10 sec: 6553.3, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 4849664. Throughput: 0: 1858.2. Samples: 1258904. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:22:54,293][32845] Avg episode reward: [(0, '4.368')] -[2025-08-29 20:22:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6793.5, 300 sec: 6664.7). Total num frames: 4915200. Throughput: 0: 1705.9. Samples: 1261208. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:22:59,292][32845] Avg episode reward: [(0, '4.770')] -[2025-08-29 20:23:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 4915200. Throughput: 0: 1822.6. Samples: 1277576. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:23:04,293][32845] Avg episode reward: [(0, '4.345')] -[2025-08-29 20:23:05,470][48846] Saving new best policy, reward=4.770! -[2025-08-29 20:23:09,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 4980736. Throughput: 0: 1522.4. Samples: 1278876. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:23:09,293][32845] Avg episode reward: [(0, '4.277')] -[2025-08-29 20:23:14,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 5046272. Throughput: 0: 1514.1. Samples: 1279784. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0) -[2025-08-29 20:23:14,293][32845] Avg episode reward: [(0, '4.259')] -[2025-08-29 20:23:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6712.3). Total num frames: 5046272. Throughput: 0: 1556.3. Samples: 1293732. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0) -[2025-08-29 20:23:19,293][32845] Avg episode reward: [(0, '4.374')] -[2025-08-29 20:23:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 5046272. Throughput: 0: 1808.6. Samples: 1310272. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0) -[2025-08-29 20:23:24,300][32845] Avg episode reward: [(0, '4.493')] -[2025-08-29 20:23:29,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 5111808. Throughput: 0: 1724.2. Samples: 1312336. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:23:29,292][32845] Avg episode reward: [(0, '4.270')] -[2025-08-29 20:23:34,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6889.4, 300 sec: 6664.7). Total num frames: 5111808. Throughput: 0: 1820.1. Samples: 1327188. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:23:34,292][32845] Avg episode reward: [(0, '4.376')] -[2025-08-29 20:23:34,627][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_5177344.pth... -[2025-08-29 20:23:34,788][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000079_5177344.pth -[2025-08-29 20:23:38,309][48863] Updated weights for policy 0, policy_version 80 (0.0015) -[2025-08-29 20:23:39,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 5242880. Throughput: 0: 1546.3. Samples: 1328488. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:23:39,292][32845] Avg episode reward: [(0, '4.443')] -[2025-08-29 20:23:44,291][32845] Fps is (10 sec: 13107.0, 60 sec: 7645.8, 300 sec: 6886.8). Total num frames: 5242880. Throughput: 0: 1719.6. Samples: 1338592. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:23:44,293][32845] Avg episode reward: [(0, '4.272')] -[2025-08-29 20:23:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 5242880. Throughput: 0: 1622.7. Samples: 1350596. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:23:49,292][32845] Avg episode reward: [(0, '4.292')] -[2025-08-29 20:23:54,291][32845] Fps is (10 sec: 6553.8, 60 sec: 7645.9, 300 sec: 6955.9). Total num frames: 5308416. Throughput: 0: 1799.9. Samples: 1359872. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:23:54,292][32845] Avg episode reward: [(0, '4.374')] -[2025-08-29 20:23:59,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 5308416. Throughput: 0: 1926.1. Samples: 1366460. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:23:59,292][32845] Avg episode reward: [(0, '4.334')] -[2025-08-29 20:24:04,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 5373952. Throughput: 0: 1836.5. Samples: 1376376. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:24:04,292][32845] Avg episode reward: [(0, '4.236')] -[2025-08-29 20:24:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 5373952. Throughput: 0: 1811.8. Samples: 1391804. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:24:09,292][32845] Avg episode reward: [(0, '4.373')] -[2025-08-29 20:24:14,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 5439488. Throughput: 0: 1786.5. Samples: 1392728. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:24:14,292][32845] Avg episode reward: [(0, '4.713')] -[2025-08-29 20:24:19,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 5439488. Throughput: 0: 1785.2. Samples: 1407524. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:24:19,293][32845] Avg episode reward: [(0, '4.490')] -[2025-08-29 20:24:24,711][32845] Fps is (10 sec: 6289.7, 60 sec: 7592.8, 300 sec: 6877.1). Total num frames: 5505024. Throughput: 0: 1763.3. Samples: 1408576. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:24:24,712][32845] Avg episode reward: [(0, '4.372')] -[2025-08-29 20:24:29,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 5505024. Throughput: 0: 1700.6. Samples: 1415120. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:24:29,292][32845] Avg episode reward: [(0, '4.467')] -[2025-08-29 20:24:34,291][32845] Fps is (10 sec: 6840.7, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 5570560. Throughput: 0: 1664.6. Samples: 1425504. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:24:34,293][32845] Avg episode reward: [(0, '4.402')] -[2025-08-29 20:24:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 6664.7). Total num frames: 5570560. Throughput: 0: 1788.4. Samples: 1440348. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:24:39,292][32845] Avg episode reward: [(0, '4.373')] -[2025-08-29 20:24:44,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 5636096. Throughput: 0: 1675.9. Samples: 1441876. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:24:44,292][32845] Avg episode reward: [(0, '4.407')] -[2025-08-29 20:24:49,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 5701632. Throughput: 0: 1632.8. Samples: 1449852. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:24:49,293][32845] Avg episode reward: [(0, '4.422')] -[2025-08-29 20:24:54,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 5767168. Throughput: 0: 1839.0. Samples: 1474560. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:24:54,293][32845] Avg episode reward: [(0, '4.578')] -[2025-08-29 20:25:00,547][32845] Fps is (10 sec: 5822.6, 60 sec: 7489.2, 300 sec: 6857.7). Total num frames: 5767168. Throughput: 0: 1995.9. Samples: 1485048. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:25:00,547][32845] Avg episode reward: [(0, '4.530')] -[2025-08-29 20:25:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 5832704. 
Throughput: 0: 1792.1. Samples: 1488168. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 20:25:04,292][32845] Avg episode reward: [(0, '4.243')] -[2025-08-29 20:25:07,527][48863] Updated weights for policy 0, policy_version 90 (0.0015) -[2025-08-29 20:25:09,291][32845] Fps is (10 sec: 14988.6, 60 sec: 8738.1, 300 sec: 7109.0). Total num frames: 5898240. Throughput: 0: 2302.9. Samples: 1511240. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:25:09,293][32845] Avg episode reward: [(0, '4.372')] -[2025-08-29 20:25:14,291][32845] Fps is (10 sec: 13107.0, 60 sec: 8738.1, 300 sec: 7331.1). Total num frames: 5963776. Throughput: 0: 2278.8. Samples: 1517664. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:25:14,293][32845] Avg episode reward: [(0, '4.382')] -[2025-08-29 20:25:19,291][32845] Fps is (10 sec: 13107.3, 60 sec: 9830.4, 300 sec: 7331.1). Total num frames: 6029312. Throughput: 0: 2542.5. Samples: 1539916. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:25:19,294][32845] Avg episode reward: [(0, '4.604')] -[2025-08-29 20:25:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 9899.6, 300 sec: 7553.3). Total num frames: 6094848. Throughput: 0: 2580.7. Samples: 1556480. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:25:24,292][32845] Avg episode reward: [(0, '4.512')] -[2025-08-29 20:25:29,291][32845] Fps is (10 sec: 13107.2, 60 sec: 10922.6, 300 sec: 7553.3). Total num frames: 6160384. Throughput: 0: 2763.1. Samples: 1566216. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:25:29,293][32845] Avg episode reward: [(0, '4.561')] -[2025-08-29 20:25:36,384][32845] Fps is (10 sec: 5419.3, 60 sec: 9499.0, 300 sec: 7500.1). Total num frames: 6160384. Throughput: 0: 2862.0. Samples: 1584632. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:25:36,385][32845] Avg episode reward: [(0, '4.372')] -[2025-08-29 20:25:37,956][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000095_6225920.pth... -[2025-08-29 20:25:38,134][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000095_6225920.pth -[2025-08-29 20:25:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 10922.6, 300 sec: 7553.3). Total num frames: 6225920. Throughput: 0: 2568.3. Samples: 1590132. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:25:39,292][32845] Avg episode reward: [(0, '4.467')] -[2025-08-29 20:25:44,291][32845] Fps is (10 sec: 16576.7, 60 sec: 10922.7, 300 sec: 7786.9). Total num frames: 6291456. Throughput: 0: 2402.2. Samples: 1590132. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:25:44,292][32845] Avg episode reward: [(0, '4.524')] -[2025-08-29 20:25:49,291][32845] Fps is (10 sec: 13107.3, 60 sec: 10922.7, 300 sec: 7775.5). Total num frames: 6356992. Throughput: 0: 2862.4. Samples: 1616976. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:25:49,293][32845] Avg episode reward: [(0, '4.412')] -[2025-08-29 20:25:54,291][32845] Fps is (10 sec: 13107.3, 60 sec: 10922.7, 300 sec: 7997.6). Total num frames: 6422528. Throughput: 0: 2688.7. Samples: 1632232. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:25:54,292][32845] Avg episode reward: [(0, '4.453')] -[2025-08-29 20:25:59,291][32845] Fps is (10 sec: 13106.9, 60 sec: 12271.6, 300 sec: 7997.6). Total num frames: 6488064. Throughput: 0: 2878.1. Samples: 1647180. 
Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 20:25:59,293][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 20:26:03,257][48863] Updated weights for policy 0, policy_version 100 (0.0020) -[2025-08-29 20:26:04,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 8219.8). Total num frames: 6553600. Throughput: 0: 2762.0. Samples: 1664204. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 20:26:04,292][32845] Avg episode reward: [(0, '4.406')] -[2025-08-29 20:26:12,216][32845] Fps is (10 sec: 5070.6, 60 sec: 10415.0, 300 sec: 7919.1). Total num frames: 6553600. Throughput: 0: 2733.4. Samples: 1687480. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 20:26:12,217][32845] Avg episode reward: [(0, '4.466')] -[2025-08-29 20:26:12,524][48846] Signal inference workers to stop experience collection... (100 times) -[2025-08-29 20:26:12,531][48846] Signal inference workers to resume experience collection... (100 times) -[2025-08-29 20:26:12,538][48863] InferenceWorker_p0-w0: stopping experience collection (100 times) -[2025-08-29 20:26:12,545][48863] InferenceWorker_p0-w0: resuming experience collection (100 times) -[2025-08-29 20:26:14,291][32845] Fps is (10 sec: 6553.5, 60 sec: 10922.7, 300 sec: 8219.8). Total num frames: 6619136. Throughput: 0: 2696.4. Samples: 1687552. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:26:14,292][32845] Avg episode reward: [(0, '4.482')] -[2025-08-29 20:26:19,291][32845] Fps is (10 sec: 18525.9, 60 sec: 10922.7, 300 sec: 8255.3). Total num frames: 6684672. Throughput: 0: 2615.9. Samples: 1696872. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:26:19,292][32845] Avg episode reward: [(0, '4.608')] -[2025-08-29 20:26:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 10922.7, 300 sec: 8441.9). Total num frames: 6750208. Throughput: 0: 2908.3. Samples: 1721004. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:26:24,292][32845] Avg episode reward: [(0, '4.652')] -[2025-08-29 20:26:29,291][32845] Fps is (10 sec: 13107.2, 60 sec: 10922.7, 300 sec: 8441.9). Total num frames: 6815744. Throughput: 0: 3101.4. Samples: 1729696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 20:26:29,292][32845] Avg episode reward: [(0, '4.539')] -[2025-08-29 20:26:34,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12449.2, 300 sec: 8664.1). Total num frames: 6881280. Throughput: 0: 3015.0. Samples: 1752652. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:26:34,292][32845] Avg episode reward: [(0, '4.460')] -[2025-08-29 20:26:39,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12015.0, 300 sec: 8664.1). Total num frames: 6946816. Throughput: 0: 3005.3. Samples: 1767472. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:26:39,292][32845] Avg episode reward: [(0, '4.371')] -[2025-08-29 20:26:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 10922.7, 300 sec: 8441.9). Total num frames: 6946816. Throughput: 0: 2886.1. Samples: 1777052. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:26:44,292][32845] Avg episode reward: [(0, '4.270')] -[2025-08-29 20:26:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 9830.4, 300 sec: 8441.9). Total num frames: 6946816. Throughput: 0: 2703.4. Samples: 1785856. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:26:49,292][32845] Avg episode reward: [(0, '4.270')] -[2025-08-29 20:26:54,291][32845] Fps is (10 sec: 6553.5, 60 sec: 9830.4, 300 sec: 8503.0). Total num frames: 7012352. Throughput: 0: 2530.9. Samples: 1793968. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:26:54,293][32845] Avg episode reward: [(0, '4.442')] -[2025-08-29 20:26:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.2, 300 sec: 8441.9). Total num frames: 7012352. Throughput: 0: 2537.1. Samples: 1801720. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:26:59,292][32845] Avg episode reward: [(0, '4.502')] -[2025-08-29 20:27:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 8738.1, 300 sec: 8441.9). Total num frames: 7077888. Throughput: 0: 2500.7. Samples: 1809404. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:27:04,292][32845] Avg episode reward: [(0, '4.553')] -[2025-08-29 20:27:09,291][32845] Fps is (10 sec: 13107.3, 60 sec: 10334.2, 300 sec: 8664.1). Total num frames: 7143424. Throughput: 0: 2171.5. Samples: 1818720. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:27:09,292][32845] Avg episode reward: [(0, '4.436')] -[2025-08-29 20:27:14,291][32845] Fps is (10 sec: 6553.5, 60 sec: 8738.1, 300 sec: 8441.9). Total num frames: 7143424. Throughput: 0: 2115.1. Samples: 1824876. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:27:14,293][32845] Avg episode reward: [(0, '4.582')] -[2025-08-29 20:27:18,799][48863] Updated weights for policy 0, policy_version 110 (0.0018) -[2025-08-29 20:27:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.1, 300 sec: 8664.1). Total num frames: 7208960. Throughput: 0: 1831.9. Samples: 1835088. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:27:19,292][32845] Avg episode reward: [(0, '4.617')] -[2025-08-29 20:27:24,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 8441.9). Total num frames: 7208960. Throughput: 0: 1611.8. Samples: 1840004. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:27:24,292][32845] Avg episode reward: [(0, '4.550')] -[2025-08-29 20:27:29,292][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 8526.5). Total num frames: 7208960. Throughput: 0: 1569.1. Samples: 1847664. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:27:29,293][32845] Avg episode reward: [(0, '4.206')] -[2025-08-29 20:27:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7274496. Throughput: 0: 1536.7. Samples: 1855008. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:27:34,292][32845] Avg episode reward: [(0, '4.361')] -[2025-08-29 20:27:39,291][32845] Fps is (10 sec: 6553.8, 60 sec: 5461.3, 300 sec: 8441.9). Total num frames: 7274496. Throughput: 0: 1640.2. Samples: 1867776. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:27:39,292][32845] Avg episode reward: [(0, '4.524')] -[2025-08-29 20:27:39,893][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000112_7340032.pth... -[2025-08-29 20:27:40,041][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000112_7340032.pth -[2025-08-29 20:27:44,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 8664.1). Total num frames: 7405568. Throughput: 0: 1489.2. Samples: 1868732. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:27:44,292][32845] Avg episode reward: [(0, '4.538')] -[2025-08-29 20:27:49,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 8664.1). Total num frames: 7405568. Throughput: 0: 1550.7. Samples: 1879184. 
Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:27:49,292][32845] Avg episode reward: [(0, '4.398')] -[2025-08-29 20:27:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7405568. Throughput: 0: 1768.6. Samples: 1898308. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:27:54,292][32845] Avg episode reward: [(0, '4.354')] -[2025-08-29 20:27:59,722][32845] Fps is (10 sec: 0.0, 60 sec: 6506.9, 300 sec: 8429.6). Total num frames: 7405568. Throughput: 0: 1616.4. Samples: 1898308. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:27:59,723][32845] Avg episode reward: [(0, '4.413')] -[2025-08-29 20:28:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7471104. Throughput: 0: 1595.9. Samples: 1906904. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:28:04,292][32845] Avg episode reward: [(0, '4.418')] -[2025-08-29 20:28:09,291][32845] Fps is (10 sec: 13696.9, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7536640. Throughput: 0: 1709.4. Samples: 1916928. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:28:09,292][32845] Avg episode reward: [(0, '4.466')] -[2025-08-29 20:28:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7536640. Throughput: 0: 1700.4. Samples: 1924180. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:28:14,293][32845] Avg episode reward: [(0, '4.288')] -[2025-08-29 20:28:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8664.1). Total num frames: 7602176. Throughput: 0: 1742.0. Samples: 1933396. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:28:19,292][32845] Avg episode reward: [(0, '4.551')] -[2025-08-29 20:28:24,291][32845] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7602176. Throughput: 0: 1766.3. Samples: 1947260. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:28:24,293][32845] Avg episode reward: [(0, '4.326')] -[2025-08-29 20:28:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 8664.1). Total num frames: 7667712. Throughput: 0: 1805.8. Samples: 1949992. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:28:29,293][32845] Avg episode reward: [(0, '4.229')] -[2025-08-29 20:28:35,546][32845] Fps is (10 sec: 5822.9, 60 sec: 6419.3, 300 sec: 8184.9). Total num frames: 7667712. Throughput: 0: 1695.7. Samples: 1957620. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:28:35,548][32845] Avg episode reward: [(0, '4.311')] -[2025-08-29 20:28:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 8441.9). Total num frames: 7733248. Throughput: 0: 1508.1. Samples: 1966172. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:28:39,292][32845] Avg episode reward: [(0, '4.286')] -[2025-08-29 20:28:44,291][32845] Fps is (10 sec: 7494.1, 60 sec: 5461.3, 300 sec: 8441.9). Total num frames: 7733248. Throughput: 0: 1677.8. Samples: 1973088. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:28:44,292][32845] Avg episode reward: [(0, '4.501')] -[2025-08-29 20:28:49,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7798784. Throughput: 0: 1681.1. Samples: 1982552. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:28:49,293][32845] Avg episode reward: [(0, '4.563')] -[2025-08-29 20:28:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7798784. Throughput: 0: 1819.0. 
Samples: 1998784. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:28:54,292][32845] Avg episode reward: [(0, '4.359')] -[2025-08-29 20:28:56,363][48863] Updated weights for policy 0, policy_version 120 (0.0014) -[2025-08-29 20:28:59,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7701.1, 300 sec: 8441.9). Total num frames: 7864320. Throughput: 0: 1663.8. Samples: 1999052. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:28:59,292][32845] Avg episode reward: [(0, '4.599')] -[2025-08-29 20:29:04,292][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7864320. Throughput: 0: 1807.8. Samples: 2014748. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:29:04,293][32845] Avg episode reward: [(0, '4.603')] -[2025-08-29 20:29:11,376][32845] Fps is (10 sec: 5422.9, 60 sec: 6333.5, 300 sec: 8382.7). Total num frames: 7929856. Throughput: 0: 1445.7. Samples: 2015332. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:11,377][32845] Avg episode reward: [(0, '4.635')] -[2025-08-29 20:29:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7929856. Throughput: 0: 1585.1. Samples: 2021320. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:14,292][32845] Avg episode reward: [(0, '4.391')] -[2025-08-29 20:29:19,291][32845] Fps is (10 sec: 8279.8, 60 sec: 6553.6, 300 sec: 8453.9). Total num frames: 7995392. Throughput: 0: 1694.4. Samples: 2031740. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:19,293][32845] Avg episode reward: [(0, '4.446')] -[2025-08-29 20:29:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 7995392. Throughput: 0: 1789.1. Samples: 2046680. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:24,292][32845] Avg episode reward: [(0, '4.259')] -[2025-08-29 20:29:29,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 8060928. Throughput: 0: 1664.7. Samples: 2048000. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:29,293][32845] Avg episode reward: [(0, '4.173')] -[2025-08-29 20:29:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6693.6, 300 sec: 8441.9). Total num frames: 8060928. Throughput: 0: 1786.9. Samples: 2062964. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:34,292][32845] Avg episode reward: [(0, '4.306')] -[2025-08-29 20:29:36,151][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000124_8126464.pth... -[2025-08-29 20:29:36,342][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000124_8126464.pth -[2025-08-29 20:29:39,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 8126464. Throughput: 0: 1477.7. Samples: 2065280. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:39,292][32845] Avg episode reward: [(0, '4.299')] -[2025-08-29 20:29:47,211][32845] Fps is (10 sec: 10144.9, 60 sec: 7291.0, 300 sec: 8359.2). Total num frames: 8192000. Throughput: 0: 1499.0. Samples: 2070884. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:47,212][32845] Avg episode reward: [(0, '4.354')] -[2025-08-29 20:29:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 8192000. Throughput: 0: 1420.2. Samples: 2078656. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:49,292][32845] Avg episode reward: [(0, '4.283')] -[2025-08-29 20:29:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 8254.9). Total num frames: 8192000. Throughput: 0: 1904.5. Samples: 2097064. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:54,292][32845] Avg episode reward: [(0, '4.247')] -[2025-08-29 20:29:59,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 8257536. Throughput: 0: 1734.6. Samples: 2099376. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:29:59,293][32845] Avg episode reward: [(0, '4.460')] -[2025-08-29 20:30:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7997.6). Total num frames: 8257536. Throughput: 0: 1820.0. Samples: 2113640. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:04,293][32845] Avg episode reward: [(0, '4.454')] -[2025-08-29 20:30:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6789.5, 300 sec: 7997.6). Total num frames: 8323072. Throughput: 0: 1715.3. Samples: 2123868. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:30:09,292][32845] Avg episode reward: [(0, '4.355')] -[2025-08-29 20:30:14,291][32845] Fps is (10 sec: 13107.4, 60 sec: 7645.9, 300 sec: 7997.6). Total num frames: 8388608. Throughput: 0: 1822.6. Samples: 2130016. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:30:14,292][32845] Avg episode reward: [(0, '4.435')] -[2025-08-29 20:30:19,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 8388608. Throughput: 0: 1784.4. Samples: 2143264. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:30:19,293][32845] Avg episode reward: [(0, '4.356')] -[2025-08-29 20:30:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 8388608. Throughput: 0: 1783.1. Samples: 2145520. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:30:24,292][32845] Avg episode reward: [(0, '4.356')] -[2025-08-29 20:30:29,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7831.0). Total num frames: 8454144. Throughput: 0: 1870.8. Samples: 2149608. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:29,292][32845] Avg episode reward: [(0, '4.444')] -[2025-08-29 20:30:33,959][48863] Updated weights for policy 0, policy_version 130 (0.0011) -[2025-08-29 20:30:34,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.9, 300 sec: 7775.5). Total num frames: 8519680. Throughput: 0: 1869.2. Samples: 2162772. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:34,292][32845] Avg episode reward: [(0, '4.357')] -[2025-08-29 20:30:39,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 8519680. Throughput: 0: 1738.5. Samples: 2175296. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:39,292][32845] Avg episode reward: [(0, '4.759')] -[2025-08-29 20:30:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6888.9, 300 sec: 7553.3). Total num frames: 8585216. Throughput: 0: 1773.0. Samples: 2179160. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:44,292][32845] Avg episode reward: [(0, '4.477')] -[2025-08-29 20:30:49,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7331.1). Total num frames: 8585216. Throughput: 0: 1771.1. Samples: 2193340. 
Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:49,293][32845] Avg episode reward: [(0, '4.384')] -[2025-08-29 20:30:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7331.2). Total num frames: 8650752. Throughput: 0: 1737.3. Samples: 2202048. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:54,292][32845] Avg episode reward: [(0, '4.551')] -[2025-08-29 20:30:59,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 8650752. Throughput: 0: 1743.2. Samples: 2208460. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:30:59,292][32845] Avg episode reward: [(0, '4.641')] -[2025-08-29 20:31:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7404.6). Total num frames: 8716288. Throughput: 0: 1526.0. Samples: 2211932. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:31:04,292][32845] Avg episode reward: [(0, '4.590')] -[2025-08-29 20:31:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 8716288. Throughput: 0: 1819.4. Samples: 2227392. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:31:09,292][32845] Avg episode reward: [(0, '4.333')] -[2025-08-29 20:31:14,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 8781824. Throughput: 0: 1748.4. Samples: 2228288. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:31:14,292][32845] Avg episode reward: [(0, '4.653')] -[2025-08-29 20:31:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 8781824. Throughput: 0: 1801.1. Samples: 2243820. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0) -[2025-08-29 20:31:19,293][32845] Avg episode reward: [(0, '4.738')] -[2025-08-29 20:31:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.8, 300 sec: 6886.8). Total num frames: 8847360. Throughput: 0: 1734.6. Samples: 2253352. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:31:24,292][32845] Avg episode reward: [(0, '4.554')] -[2025-08-29 20:31:29,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 8847360. Throughput: 0: 1813.2. Samples: 2260752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:31:29,292][32845] Avg episode reward: [(0, '4.445')] -[2025-08-29 20:31:34,711][32845] Fps is (10 sec: 6289.9, 60 sec: 6508.1, 300 sec: 6655.2). Total num frames: 8912896. Throughput: 0: 1491.8. Samples: 2261096. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0) -[2025-08-29 20:31:34,712][32845] Avg episode reward: [(0, '4.395')] -[2025-08-29 20:31:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 8912896. Throughput: 0: 1648.7. Samples: 2276240. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0) -[2025-08-29 20:31:39,292][32845] Avg episode reward: [(0, '4.484')] -[2025-08-29 20:31:42,653][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000137_8978432.pth... -[2025-08-29 20:31:42,859][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000137_8978432.pth -[2025-08-29 20:31:44,291][32845] Fps is (10 sec: 6840.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 8978432. Throughput: 0: 1533.7. Samples: 2277476. 
Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:31:44,292][32845] Avg episode reward: [(0, '4.455')]
-[2025-08-29 20:31:49,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 9043968. Throughput: 0: 1505.4. Samples: 2279676. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:31:49,292][32845] Avg episode reward: [(0, '4.482')]
-[2025-08-29 20:31:54,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 9043968. Throughput: 0: 1555.5. Samples: 2297388. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:31:54,292][32845] Avg episode reward: [(0, '4.394')]
-[2025-08-29 20:31:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 9043968. Throughput: 0: 1746.0. Samples: 2306856. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:31:59,292][32845] Avg episode reward: [(0, '4.398')]
-[2025-08-29 20:32:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 9109504. Throughput: 0: 1702.3. Samples: 2320424. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:32:04,292][32845] Avg episode reward: [(0, '4.498')]
-[2025-08-29 20:32:06,198][48863] Updated weights for policy 0, policy_version 140 (0.0022)
-[2025-08-29 20:32:10,543][32845] Fps is (10 sec: 11648.6, 60 sec: 7489.6, 300 sec: 6857.7). Total num frames: 9175040. Throughput: 0: 1582.1. Samples: 2326528. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:32:10,545][32845] Avg episode reward: [(0, '4.462')]
-[2025-08-29 20:32:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 9175040. Throughput: 0: 1634.4. Samples: 2334300. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:32:14,292][32845] Avg episode reward: [(0, '4.427')]
-[2025-08-29 20:32:19,291][32845] Fps is (10 sec: 7491.6, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 9240576. Throughput: 0: 2015.1. Samples: 2350932. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:32:19,293][32845] Avg episode reward: [(0, '4.382')]
-[2025-08-29 20:32:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 9306112. Throughput: 0: 2073.7. Samples: 2369556. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:32:24,292][32845] Avg episode reward: [(0, '4.416')]
-[2025-08-29 20:32:29,291][32845] Fps is (10 sec: 13107.2, 60 sec: 8738.1, 300 sec: 7109.0). Total num frames: 9371648. Throughput: 0: 2182.3. Samples: 2375680. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:32:29,293][32845] Avg episode reward: [(0, '4.541')]
-[2025-08-29 20:32:34,291][32845] Fps is (10 sec: 13107.2, 60 sec: 8799.7, 300 sec: 7331.1). Total num frames: 9437184. Throughput: 0: 2552.2. Samples: 2394524. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:32:34,292][32845] Avg episode reward: [(0, '4.548')]
-[2025-08-29 20:32:39,291][32845] Fps is (10 sec: 13107.5, 60 sec: 9830.4, 300 sec: 7109.0). Total num frames: 9502720. Throughput: 0: 2601.7. Samples: 2414464. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:32:39,292][32845] Avg episode reward: [(0, '4.595')]
-[2025-08-29 20:32:46,386][32845] Fps is (10 sec: 10836.7, 60 sec: 9498.7, 300 sec: 7279.4). Total num frames: 9568256. Throughput: 0: 2505.1. Samples: 2424832. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:32:46,387][32845] Avg episode reward: [(0, '4.577')]
-[2025-08-29 20:32:49,291][32845] Fps is (10 sec: 6553.5, 60 sec: 8738.1, 300 sec: 7331.1). Total num frames: 9568256. Throughput: 0: 2489.6. Samples: 2432456. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:32:49,292][32845] Avg episode reward: [(0, '4.536')]
-[2025-08-29 20:32:54,291][32845] Fps is (10 sec: 8290.7, 60 sec: 9830.4, 300 sec: 7564.3). Total num frames: 9633792. Throughput: 0: 2823.7. Samples: 2450060. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:32:54,292][32845] Avg episode reward: [(0, '4.695')]
-[2025-08-29 20:32:59,291][32845] Fps is (10 sec: 13107.4, 60 sec: 10922.7, 300 sec: 7553.3). Total num frames: 9699328. Throughput: 0: 2740.0. Samples: 2457600. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:32:59,292][32845] Avg episode reward: [(0, '4.497')]
-[2025-08-29 20:33:04,291][32845] Fps is (10 sec: 13107.1, 60 sec: 10922.7, 300 sec: 7553.3). Total num frames: 9764864. Throughput: 0: 2758.9. Samples: 2475080. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:04,292][32845] Avg episode reward: [(0, '4.463')]
-[2025-08-29 20:33:07,455][48863] Updated weights for policy 0, policy_version 150 (0.0016)
-[2025-08-29 20:33:09,291][32845] Fps is (10 sec: 13106.9, 60 sec: 11155.5, 300 sec: 7775.5). Total num frames: 9830400. Throughput: 0: 2787.5. Samples: 2494992. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:09,293][32845] Avg episode reward: [(0, '4.364')]
-[2025-08-29 20:33:12,744][48846] Signal inference workers to stop experience collection... (150 times)
-[2025-08-29 20:33:12,755][48863] InferenceWorker_p0-w0: stopping experience collection (150 times)
-[2025-08-29 20:33:12,764][48846] Signal inference workers to resume experience collection... (150 times)
-[2025-08-29 20:33:12,765][48863] InferenceWorker_p0-w0: resuming experience collection (150 times)
-[2025-08-29 20:33:14,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12015.0, 300 sec: 7775.5). Total num frames: 9895936. Throughput: 0: 2912.7. Samples: 2506752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:14,292][32845] Avg episode reward: [(0, '4.346')]
-[2025-08-29 20:33:22,211][32845] Fps is (10 sec: 5072.7, 60 sec: 10415.9, 300 sec: 7699.3). Total num frames: 9895936. Throughput: 0: 2667.1. Samples: 2522332. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:22,212][32845] Avg episode reward: [(0, '4.315')]
-[2025-08-29 20:33:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 9830.4, 300 sec: 7553.3). Total num frames: 9895936. Throughput: 0: 2414.9. Samples: 2523136. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:24,292][32845] Avg episode reward: [(0, '4.315')]
-[2025-08-29 20:33:29,291][32845] Fps is (10 sec: 9255.7, 60 sec: 9830.4, 300 sec: 7808.7). Total num frames: 9961472. Throughput: 0: 2395.0. Samples: 2527588. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:29,293][32845] Avg episode reward: [(0, '4.185')]
-[2025-08-29 20:33:34,218][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_10027008.pth...
-[2025-08-29 20:33:34,291][32845] Fps is (10 sec: 13106.9, 60 sec: 9830.4, 300 sec: 7775.5). Total num frames: 10027008. Throughput: 0: 2381.2. Samples: 2539612. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:34,301][32845] Avg episode reward: [(0, '4.232')]
-[2025-08-29 20:33:34,363][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000153_10027008.pth
-[2025-08-29 20:33:39,291][32845] Fps is (10 sec: 13107.2, 60 sec: 9830.4, 300 sec: 7997.6). Total num frames: 10092544. Throughput: 0: 2011.2. Samples: 2540564. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:39,293][32845] Avg episode reward: [(0, '4.358')]
-[2025-08-29 20:33:44,291][32845] Fps is (10 sec: 6553.7, 60 sec: 9054.3, 300 sec: 7775.5). Total num frames: 10092544. Throughput: 0: 2065.4. Samples: 2550544. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:44,292][32845] Avg episode reward: [(0, '4.495')]
-[2025-08-29 20:33:49,292][32845] Fps is (10 sec: 0.0, 60 sec: 8738.1, 300 sec: 7775.4). Total num frames: 10092544. Throughput: 0: 2125.2. Samples: 2570716. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:49,294][32845] Avg episode reward: [(0, '4.501')]
-[2025-08-29 20:33:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.1, 300 sec: 7775.5). Total num frames: 10158080. Throughput: 0: 1870.8. Samples: 2579180. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:54,292][32845] Avg episode reward: [(0, '4.637')]
-[2025-08-29 20:33:59,291][32845] Fps is (10 sec: 6553.8, 60 sec: 7645.8, 300 sec: 7775.5). Total num frames: 10158080. Throughput: 0: 1671.1. Samples: 2581952. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:33:59,293][32845] Avg episode reward: [(0, '4.605')]
-[2025-08-29 20:34:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7830.8). Total num frames: 10223616. Throughput: 0: 1579.2. Samples: 2588784. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:34:04,292][32845] Avg episode reward: [(0, '4.621')]
-[2025-08-29 20:34:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10223616. Throughput: 0: 1755.3. Samples: 2602124. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:34:09,292][32845] Avg episode reward: [(0, '4.352')]
-[2025-08-29 20:34:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10289152. Throughput: 0: 1723.6. Samples: 2605152. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:34:14,292][32845] Avg episode reward: [(0, '4.312')]
-[2025-08-29 20:34:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6888.8, 300 sec: 7775.5). Total num frames: 10289152. Throughput: 0: 1756.3. Samples: 2618644. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:34:19,292][32845] Avg episode reward: [(0, '4.403')]
-[2025-08-29 20:34:24,292][32845] Fps is (10 sec: 6553.4, 60 sec: 7645.8, 300 sec: 7775.5). Total num frames: 10354688. Throughput: 0: 1891.5. Samples: 2625680. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:24,294][32845] Avg episode reward: [(0, '4.375')]
-[2025-08-29 20:34:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10354688. Throughput: 0: 1865.5. Samples: 2634492. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:29,292][32845] Avg episode reward: [(0, '4.362')]
-[2025-08-29 20:34:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 5461.4, 300 sec: 7553.3). Total num frames: 10354688. Throughput: 0: 1494.0. Samples: 2637944. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:34,292][32845] Avg episode reward: [(0, '4.362')]
-[2025-08-29 20:34:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 7628.8). Total num frames: 10420224. Throughput: 0: 1506.0. Samples: 2646952. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:39,292][32845] Avg episode reward: [(0, '4.325')]
-[2025-08-29 20:34:44,273][48863] Updated weights for policy 0, policy_version 160 (0.0031)
-[2025-08-29 20:34:44,291][32845] Fps is (10 sec: 13106.8, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10485760. Throughput: 0: 1605.6. Samples: 2654204. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:44,293][32845] Avg episode reward: [(0, '4.306')]
-[2025-08-29 20:34:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10485760. Throughput: 0: 1690.4. Samples: 2664852. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:49,292][32845] Avg episode reward: [(0, '4.374')]
-[2025-08-29 20:34:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10551296. Throughput: 0: 1579.2. Samples: 2673188. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:54,293][32845] Avg episode reward: [(0, '4.364')]
-[2025-08-29 20:34:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10551296. Throughput: 0: 1688.9. Samples: 2681152. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:34:59,292][32845] Avg episode reward: [(0, '4.282')]
-[2025-08-29 20:35:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10616832. Throughput: 0: 1542.5. Samples: 2688056. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:04,292][32845] Avg episode reward: [(0, '4.447')]
-[2025-08-29 20:35:09,711][32845] Fps is (10 sec: 6289.3, 60 sec: 6508.0, 300 sec: 7542.6). Total num frames: 10616832. Throughput: 0: 1547.4. Samples: 2695964. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:09,713][32845] Avg episode reward: [(0, '4.397')]
-[2025-08-29 20:35:14,291][32845] Fps is (10 sec: 0.0, 60 sec: 5461.3, 300 sec: 7553.3). Total num frames: 10616832. Throughput: 0: 1516.2. Samples: 2702720. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:14,292][32845] Avg episode reward: [(0, '4.046')]
-[2025-08-29 20:35:19,292][32845] Fps is (10 sec: 6840.9, 60 sec: 6553.6, 300 sec: 7775.4). Total num frames: 10682368. Throughput: 0: 1591.9. Samples: 2709580. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:19,293][32845] Avg episode reward: [(0, '4.265')]
-[2025-08-29 20:35:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.4, 300 sec: 7553.3). Total num frames: 10682368. Throughput: 0: 1619.6. Samples: 2719836. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:24,292][32845] Avg episode reward: [(0, '4.346')]
-[2025-08-29 20:35:29,291][32845] Fps is (10 sec: 6553.9, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 10747904. Throughput: 0: 1564.5. Samples: 2724608. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:29,292][32845] Avg episode reward: [(0, '4.344')]
-[2025-08-29 20:35:33,605][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000165_10813440.pth...
-[2025-08-29 20:35:33,756][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000165_10813440.pth
-[2025-08-29 20:35:34,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 7775.5). Total num frames: 10813440. Throughput: 0: 1583.9. Samples: 2736128. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:34,292][32845] Avg episode reward: [(0, '4.342')]
-[2025-08-29 20:35:39,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.9, 300 sec: 7775.5). Total num frames: 10878976. Throughput: 0: 1478.8. Samples: 2739732. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:39,292][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 20:35:45,542][32845] Fps is (10 sec: 5824.9, 60 sec: 6419.8, 300 sec: 7742.6). Total num frames: 10878976. Throughput: 0: 1461.2. Samples: 2748732. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:45,544][32845] Avg episode reward: [(0, '4.468')]
-[2025-08-29 20:35:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 10878976. Throughput: 0: 1527.4. Samples: 2756788. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:49,293][32845] Avg episode reward: [(0, '4.320')]
-[2025-08-29 20:35:54,291][32845] Fps is (10 sec: 7490.6, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 10944512. Throughput: 0: 1636.0. Samples: 2768896. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:54,292][32845] Avg episode reward: [(0, '4.417')]
-[2025-08-29 20:35:59,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 10944512. Throughput: 0: 1607.7. Samples: 2775068. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:35:59,292][32845] Avg episode reward: [(0, '4.618')]
-[2025-08-29 20:36:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 11010048. Throughput: 0: 1684.5. Samples: 2785380. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:36:04,292][32845] Avg episode reward: [(0, '4.537')]
-[2025-08-29 20:36:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6599.8, 300 sec: 7553.3). Total num frames: 11010048. Throughput: 0: 1782.8. Samples: 2800064. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:36:09,292][32845] Avg episode reward: [(0, '4.430')]
-[2025-08-29 20:36:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7775.5). Total num frames: 11075584. Throughput: 0: 1714.6. Samples: 2801764. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:36:14,292][32845] Avg episode reward: [(0, '4.452')]
-[2025-08-29 20:36:21,386][32845] Fps is (10 sec: 5418.5, 60 sec: 6332.5, 300 sec: 7500.0). Total num frames: 11075584. Throughput: 0: 1701.8. Samples: 2816272. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:36:21,388][32845] Avg episode reward: [(0, '4.571')]
-[2025-08-29 20:36:24,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 11075584. Throughput: 0: 1740.4. Samples: 2818048. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:36:24,292][32845] Avg episode reward: [(0, '4.578')]
-[2025-08-29 20:36:25,287][48863] Updated weights for policy 0, policy_version 170 (0.0012)
-[2025-08-29 20:36:29,291][32845] Fps is (10 sec: 8290.4, 60 sec: 6553.6, 300 sec: 7564.1). Total num frames: 11141120. Throughput: 0: 1653.7. Samples: 2821080. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:36:29,292][32845] Avg episode reward: [(0, '4.448')]
-[2025-08-29 20:36:34,291][32845] Fps is (10 sec: 6553.5, 60 sec: 5461.3, 300 sec: 7553.3). Total num frames: 11141120. Throughput: 0: 1727.8. Samples: 2834540. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:36:34,293][32845] Avg episode reward: [(0, '4.399')]
-[2025-08-29 20:36:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 7553.3). Total num frames: 11206656. Throughput: 0: 1708.1. Samples: 2845760. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:36:39,292][32845] Avg episode reward: [(0, '4.331')]
-[2025-08-29 20:36:44,291][32845] Fps is (10 sec: 13107.5, 60 sec: 6693.2, 300 sec: 7553.3). Total num frames: 11272192. Throughput: 0: 1685.3. Samples: 2850908. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:36:44,292][32845] Avg episode reward: [(0, '4.380')]
-[2025-08-29 20:36:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 11272192. Throughput: 0: 1739.4. Samples: 2863652. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:36:49,292][32845] Avg episode reward: [(0, '4.476')]
-[2025-08-29 20:36:57,216][32845] Fps is (10 sec: 5070.6, 60 sec: 6249.0, 300 sec: 7699.1). Total num frames: 11337728. Throughput: 0: 1485.7. Samples: 2871264. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:36:57,217][32845] Avg episode reward: [(0, '4.480')]
-[2025-08-29 20:36:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 11337728. Throughput: 0: 1557.9. Samples: 2871868. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:36:59,292][32845] Avg episode reward: [(0, '4.419')]
-[2025-08-29 20:37:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 5461.3, 300 sec: 7362.4). Total num frames: 11337728. Throughput: 0: 1568.9. Samples: 2883584. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:04,292][32845] Avg episode reward: [(0, '4.511')]
-[2025-08-29 20:37:09,291][32845] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 11403264. Throughput: 0: 1658.9. Samples: 2892700. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:09,293][32845] Avg episode reward: [(0, '4.214')]
-[2025-08-29 20:37:14,291][32845] Fps is (10 sec: 13106.9, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 11468800. Throughput: 0: 1751.0. Samples: 2899876. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:14,293][32845] Avg episode reward: [(0, '4.437')]
-[2025-08-29 20:37:19,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6790.7, 300 sec: 7331.1). Total num frames: 11468800. Throughput: 0: 1686.2. Samples: 2910420. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:19,292][32845] Avg episode reward: [(0, '4.310')]
-[2025-08-29 20:37:24,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 7331.1). Total num frames: 11534336. Throughput: 0: 1589.0. Samples: 2917264. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:24,292][32845] Avg episode reward: [(0, '4.443')]
-[2025-08-29 20:37:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 11534336. Throughput: 0: 1649.4. Samples: 2925132. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:29,292][32845] Avg episode reward: [(0, '4.328')]
-[2025-08-29 20:37:34,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 11534336. Throughput: 0: 1534.0. Samples: 2932680. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:34,292][32845] Avg episode reward: [(0, '4.393')]
-[2025-08-29 20:37:35,641][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_11599872.pth...
-[2025-08-29 20:37:35,793][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_11599872.pth
-[2025-08-29 20:37:39,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.8, 300 sec: 7159.8). Total num frames: 11665408. Throughput: 0: 1482.5. Samples: 2933640. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:39,293][32845] Avg episode reward: [(0, '4.448')]
-[2025-08-29 20:37:44,291][32845] Fps is (10 sec: 13107.0, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 11665408. Throughput: 0: 1554.5. Samples: 2941820. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:44,293][32845] Avg episode reward: [(0, '4.510')]
-[2025-08-29 20:37:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 11665408. Throughput: 0: 1746.7. Samples: 2962184. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:49,293][32845] Avg episode reward: [(0, '4.506')]
-[2025-08-29 20:37:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6889.4, 300 sec: 6886.8). Total num frames: 11730944. Throughput: 0: 1725.3. Samples: 2970340. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:54,292][32845] Avg episode reward: [(0, '4.522')]
-[2025-08-29 20:37:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 11730944. Throughput: 0: 1775.7. Samples: 2979784. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:37:59,293][32845] Avg episode reward: [(0, '4.527')]
-[2025-08-29 20:38:01,046][48863] Updated weights for policy 0, policy_version 180 (0.0012)
-[2025-08-29 20:38:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 6664.7). Total num frames: 11796480. Throughput: 0: 1676.6. Samples: 2985868. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:38:04,292][32845] Avg episode reward: [(0, '4.616')]
-[2025-08-29 20:38:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 11796480. Throughput: 0: 1720.5. Samples: 2994688. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:38:09,292][32845] Avg episode reward: [(0, '4.413')]
-[2025-08-29 20:38:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6731.3). Total num frames: 11862016. Throughput: 0: 1625.3. Samples: 2998272. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:38:14,292][32845] Avg episode reward: [(0, '4.253')]
-[2025-08-29 20:38:19,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 11862016. Throughput: 0: 1654.5. Samples: 3007132. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:38:19,293][32845] Avg episode reward: [(0, '4.482')]
-[2025-08-29 20:38:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 11927552. Throughput: 0: 1802.4. Samples: 3014748. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:38:24,293][32845] Avg episode reward: [(0, '4.496')]
-[2025-08-29 20:38:29,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 11927552. Throughput: 0: 1813.4. Samples: 3023424. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:38:29,292][32845] Avg episode reward: [(0, '4.430')]
-[2025-08-29 20:38:34,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.8, 300 sec: 6442.5). Total num frames: 11993088. Throughput: 0: 1540.0. Samples: 3031484. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:38:34,293][32845] Avg episode reward: [(0, '4.473')]
-[2025-08-29 20:38:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 6442.5). Total num frames: 11993088. Throughput: 0: 1710.0. Samples: 3047292. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:38:39,292][32845] Avg episode reward: [(0, '4.466')]
-[2025-08-29 20:38:44,714][32845] Fps is (10 sec: 0.0, 60 sec: 5423.1, 300 sec: 6433.3). Total num frames: 11993088. Throughput: 0: 1486.2. Samples: 3047292. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:38:44,716][32845] Avg episode reward: [(0, '4.466')]
-[2025-08-29 20:38:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 12058624. Throughput: 0: 1520.9. Samples: 3054308. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:38:49,293][32845] Avg episode reward: [(0, '4.365')]
-[2025-08-29 20:38:54,291][32845] Fps is (10 sec: 13685.9, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 12124160. Throughput: 0: 1538.1. Samples: 3063904. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:38:54,292][32845] Avg episode reward: [(0, '4.388')]
-[2025-08-29 20:38:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 12124160. Throughput: 0: 1660.2. Samples: 3072980. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 20:38:59,292][32845] Avg episode reward: [(0, '4.390')]
-[2025-08-29 20:39:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 12189696. Throughput: 0: 1648.1. Samples: 3081296. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:04,292][32845] Avg episode reward: [(0, '4.460')]
-[2025-08-29 20:39:09,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 12189696. Throughput: 0: 1815.3. Samples: 3096436. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:09,293][32845] Avg episode reward: [(0, '4.291')]
-[2025-08-29 20:39:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 12255232. Throughput: 0: 1645.4. Samples: 3097468. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:14,292][32845] Avg episode reward: [(0, '4.560')]
-[2025-08-29 20:39:20,544][32845] Fps is (10 sec: 5823.9, 60 sec: 6419.5, 300 sec: 6415.3). Total num frames: 12255232. Throughput: 0: 1596.9. Samples: 3105344. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:20,545][32845] Avg episode reward: [(0, '4.802')]
-[2025-08-29 20:39:24,253][48846] Saving new best policy, reward=4.802!
-[2025-08-29 20:39:24,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 12320768. Throughput: 0: 1459.3. Samples: 3112960. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:24,293][32845] Avg episode reward: [(0, '4.710')]
-[2025-08-29 20:39:29,291][32845] Fps is (10 sec: 14984.9, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 12386304. Throughput: 0: 1528.0. Samples: 3115408. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:29,292][32845] Avg episode reward: [(0, '4.551')]
-[2025-08-29 20:39:34,292][32845] Fps is (10 sec: 6553.0, 60 sec: 6553.5, 300 sec: 6664.7). Total num frames: 12386304. Throughput: 0: 1642.4. Samples: 3128216. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:34,293][32845] Avg episode reward: [(0, '4.494')]
-[2025-08-29 20:39:39,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 12386304. Throughput: 0: 1806.2. Samples: 3145184. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:39,292][32845] Avg episode reward: [(0, '4.543')]
-[2025-08-29 20:39:40,756][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_12451840.pth...
-[2025-08-29 20:39:40,759][48863] Updated weights for policy 0, policy_version 190 (0.0041)
-[2025-08-29 20:39:40,880][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000190_12451840.pth
-[2025-08-29 20:39:44,291][32845] Fps is (10 sec: 6554.2, 60 sec: 7700.1, 300 sec: 6664.7). Total num frames: 12451840. Throughput: 0: 1636.2. Samples: 3146608. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:39:44,294][32845] Avg episode reward: [(0, '4.582')]
-[2025-08-29 20:39:49,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 6664.7). Total num frames: 12517376. Throughput: 0: 1694.6. Samples: 3157552. Policy #0 lag: (min: 1.0, avg: 1.2, max: 3.0)
-[2025-08-29 20:39:49,292][32845] Avg episode reward: [(0, '4.304')]
-[2025-08-29 20:39:56,373][32845] Fps is (10 sec: 5424.2, 60 sec: 6333.8, 300 sec: 6618.0). Total num frames: 12517376. Throughput: 0: 1518.5. Samples: 3167928. Policy #0 lag: (min: 1.0, avg: 1.2, max: 3.0)
-[2025-08-29 20:39:56,375][32845] Avg episode reward: [(0, '4.546')]
-[2025-08-29 20:39:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6442.5). Total num frames: 12517376. Throughput: 0: 1759.3. Samples: 3176636. Policy #0 lag: (min: 1.0, avg: 1.2, max: 3.0)
-[2025-08-29 20:39:59,292][32845] Avg episode reward: [(0, '4.575')]
-[2025-08-29 20:40:04,291][32845] Fps is (10 sec: 8277.2, 60 sec: 6553.6, 300 sec: 6674.2). Total num frames: 12582912. Throughput: 0: 1799.5. Samples: 3184068. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:40:04,292][32845] Avg episode reward: [(0, '4.657')]
-[2025-08-29 20:40:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 12582912. Throughput: 0: 1822.5. Samples: 3194972. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:40:09,292][32845] Avg episode reward: [(0, '4.534')]
-[2025-08-29 20:40:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 12648448. Throughput: 0: 1874.8. Samples: 3199772. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:40:14,292][32845] Avg episode reward: [(0, '4.358')]
-[2025-08-29 20:40:19,291][32845] Fps is (10 sec: 6553.4, 60 sec: 6693.4, 300 sec: 6664.7). Total num frames: 12648448. Throughput: 0: 1845.5. Samples: 3211264. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:40:19,293][32845] Avg episode reward: [(0, '4.368')]
-[2025-08-29 20:40:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 12713984. Throughput: 0: 1832.1. Samples: 3227628. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:40:24,292][32845] Avg episode reward: [(0, '4.389')]
-[2025-08-29 20:40:32,219][32845] Fps is (10 sec: 10138.8, 60 sec: 6248.7, 300 sec: 6599.2). Total num frames: 12779520. Throughput: 0: 1788.3. Samples: 3232316. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:40:32,220][32845] Avg episode reward: [(0, '4.310')]
-[2025-08-29 20:40:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.7, 300 sec: 6442.5). Total num frames: 12779520. Throughput: 0: 1911.2. Samples: 3243556. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:40:34,294][32845] Avg episode reward: [(0, '4.340')]
-[2025-08-29 20:40:39,291][32845] Fps is (10 sec: 9267.0, 60 sec: 7645.9, 300 sec: 6693.1). Total num frames: 12845056. Throughput: 0: 2155.0. Samples: 3260416. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:40:39,292][32845] Avg episode reward: [(0, '4.485')]
-[2025-08-29 20:40:44,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 12910592. Throughput: 0: 2085.2. Samples: 3270468. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:40:44,292][32845] Avg episode reward: [(0, '4.434')]
-[2025-08-29 20:40:49,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 12976128. Throughput: 0: 2326.3. Samples: 3288752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:40:49,292][32845] Avg episode reward: [(0, '4.579')]
-[2025-08-29 20:40:54,291][32845] Fps is (10 sec: 13107.1, 60 sec: 9052.3, 300 sec: 7109.0). Total num frames: 13041664. Throughput: 0: 2526.9. Samples: 3308684. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:40:54,292][32845] Avg episode reward: [(0, '4.412')]
-[2025-08-29 20:40:55,521][48863] Updated weights for policy 0, policy_version 200 (0.0018)
-[2025-08-29 20:40:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 9830.4, 300 sec: 7109.0). Total num frames: 13107200. Throughput: 0: 2533.6. Samples: 3313784. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:40:59,292][32845] Avg episode reward: [(0, '4.271')]
-[2025-08-29 20:41:00,759][48846] Signal inference workers to stop experience collection... (200 times)
-[2025-08-29 20:41:00,769][48846] Signal inference workers to resume experience collection... (200 times)
-[2025-08-29 20:41:00,770][48863] InferenceWorker_p0-w0: stopping experience collection (200 times)
-[2025-08-29 20:41:00,795][48863] InferenceWorker_p0-w0: resuming experience collection (200 times)
-[2025-08-29 20:41:04,291][32845] Fps is (10 sec: 13107.3, 60 sec: 9830.4, 300 sec: 7331.1). Total num frames: 13172736. Throughput: 0: 2726.7. Samples: 3333964. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:41:04,292][32845] Avg episode reward: [(0, '4.528')]
-[2025-08-29 20:41:09,291][32845] Fps is (10 sec: 6553.5, 60 sec: 9830.4, 300 sec: 7109.0). Total num frames: 13172736. Throughput: 0: 2549.1. Samples: 3342336. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:41:09,292][32845] Avg episode reward: [(0, '4.528')]
-[2025-08-29 20:41:14,291][32845] Fps is (10 sec: 6553.4, 60 sec: 9830.4, 300 sec: 7383.6). Total num frames: 13238272. Throughput: 0: 2824.4. Samples: 3351144. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:41:14,293][32845] Avg episode reward: [(0, '4.246')]
-[2025-08-29 20:41:19,291][32845] Fps is (10 sec: 13107.1, 60 sec: 10922.7, 300 sec: 7553.3). Total num frames: 13303808. Throughput: 0: 2857.1. Samples: 3372124. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:41:19,293][32845] Avg episode reward: [(0, '4.226')]
-[2025-08-29 20:41:24,291][32845] Fps is (10 sec: 13107.4, 60 sec: 10922.7, 300 sec: 7553.3). Total num frames: 13369344. Throughput: 0: 2775.8. Samples: 3385328. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:41:24,292][32845] Avg episode reward: [(0, '4.196')]
-[2025-08-29 20:41:29,291][32845] Fps is (10 sec: 13107.4, 60 sec: 11483.0, 300 sec: 7775.5). Total num frames: 13434880. Throughput: 0: 2658.6. Samples: 3390104. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:41:29,292][32845] Avg episode reward: [(0, '4.429')]
-[2025-08-29 20:41:31,086][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000206_13500416.pth...
-[2025-08-29 20:41:31,246][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000206_13500416.pth
-[2025-08-29 20:41:34,291][32845] Fps is (10 sec: 13107.0, 60 sec: 12014.9, 300 sec: 7775.5). Total num frames: 13500416. Throughput: 0: 2668.5. Samples: 3408836. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:41:34,292][32845] Avg episode reward: [(0, '4.421')]
-[2025-08-29 20:41:39,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 7775.5). Total num frames: 13565952. Throughput: 0: 2605.6. Samples: 3425936. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:41:39,292][32845] Avg episode reward: [(0, '4.318')]
-[2025-08-29 20:41:44,291][32845] Fps is (10 sec: 6553.5, 60 sec: 10922.6, 300 sec: 7775.5). Total num frames: 13565952. Throughput: 0: 2705.2. Samples: 3435520. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-08-29 20:41:44,293][32845] Avg episode reward: [(0, '4.124')]
-[2025-08-29 20:41:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 10922.7, 300 sec: 7853.3). Total num frames: 13631488. Throughput: 0: 2638.4. Samples: 3452692. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:41:49,293][32845] Avg episode reward: [(0, '4.142')]
-[2025-08-29 20:41:54,291][32845] Fps is (10 sec: 13107.6, 60 sec: 10922.7, 300 sec: 7997.6). Total num frames: 13697024. Throughput: 0: 2761.1. Samples: 3466584. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:41:54,292][32845] Avg episode reward: [(0, '4.336')]
-[2025-08-29 20:41:54,720][48863] Updated weights for policy 0, policy_version 210 (0.0054)
-[2025-08-29 20:41:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 10922.7, 300 sec: 8219.8). Total num frames: 13762560. Throughput: 0: 2749.4. Samples: 3474868. Policy #0 lag: (min: 1.0, avg: 1.9, max: 2.0)
-[2025-08-29 20:41:59,292][32845] Avg episode reward: [(0, '4.548')]
-[2025-08-29 20:42:04,291][32845] Fps is (10 sec: 13107.2, 60 sec: 10922.7, 300 sec: 8219.8). Total num frames: 13828096. Throughput: 0: 2847.9. Samples: 3500280. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:42:04,292][32845] Avg episode reward: [(0, '4.529')]
-[2025-08-29 20:42:09,291][32845] Fps is (10 sec: 13106.9, 60 sec: 12014.9, 300 sec: 8219.8). Total num frames: 13893632. Throughput: 0: 2892.3. Samples: 3515480. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:42:09,293][32845] Avg episode reward: [(0, '4.496')]
-[2025-08-29 20:42:14,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12015.0, 300 sec: 8441.9). Total num frames: 13959168. Throughput: 0: 2966.5. Samples: 3523596. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:42:14,292][32845] Avg episode reward: [(0, '4.324')]
-[2025-08-29 20:42:19,708][32845] Fps is (10 sec: 6291.3, 60 sec: 10847.3, 300 sec: 8208.2). Total num frames: 13959168. Throughput: 0: 2745.7. Samples: 3533536. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:42:19,709][32845] Avg episode reward: [(0, '4.365')]
-[2025-08-29 20:42:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 10922.7, 300 sec: 8441.9). Total num frames: 14024704. Throughput: 0: 2548.5. Samples: 3540620. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:42:24,292][32845] Avg episode reward: [(0, '4.398')]
-[2025-08-29 20:42:29,291][32845] Fps is (10 sec: 6838.8, 60 sec: 9830.4, 300 sec: 8441.9). Total num frames: 14024704. Throughput: 0: 2520.3. Samples: 3548932. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:42:29,292][32845] Avg episode reward: [(0, '4.724')]
-[2025-08-29 20:42:34,291][32845] Fps is (10 sec: 6553.5, 60 sec: 9830.4, 300 sec: 8219.8). Total num frames: 14090240. Throughput: 0: 2305.3. Samples: 3556432. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:42:34,293][32845] Avg episode reward: [(0, '4.712')]
-[2025-08-29 20:42:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.1, 300 sec: 8219.8). Total num frames: 14090240. Throughput: 0: 2336.2. Samples: 3571712. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 20:42:39,292][32845] Avg episode reward: [(0, '4.408')]
-[2025-08-29 20:42:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 9830.4, 300 sec: 8441.9). Total num frames: 14155776. Throughput: 0: 2171.1. Samples: 3572568. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:42:44,293][32845] Avg episode reward: [(0, '4.381')]
-[2025-08-29 20:42:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8738.1, 300 sec: 8219.8). Total num frames: 14155776. Throughput: 0: 1948.4. Samples: 3587956. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:42:49,292][32845] Avg episode reward: [(0, '4.543')]
-[2025-08-29 20:42:55,544][32845] Fps is (10 sec: 5823.9, 60 sec: 8559.4, 300 sec: 8406.2). Total num frames: 14221312. Throughput: 0: 1572.4. Samples: 3588208. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:42:55,545][32845] Avg episode reward: [(0, '4.301')]
-[2025-08-29 20:42:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 8219.8). Total num frames: 14221312. Throughput: 0: 1616.4. Samples: 3596336. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:42:59,292][32845] Avg episode reward: [(0, '4.472')]
-[2025-08-29 20:43:04,291][32845] Fps is (10 sec: 7492.6, 60 sec: 7645.9, 300 sec: 8441.9). Total num frames: 14286848. Throughput: 0: 1613.1. Samples: 3605452. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:04,292][32845] Avg episode reward: [(0, '4.299')]
-[2025-08-29 20:43:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 14286848. Throughput: 0: 1783.1. Samples: 3620860. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:09,292][32845] Avg episode reward: [(0, '4.430')]
-[2025-08-29 20:43:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 14352384. Throughput: 0: 1612.2. Samples: 3621480. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:14,292][32845] Avg episode reward: [(0, '4.465')]
-[2025-08-29 20:43:19,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6599.5, 300 sec: 8219.8). Total num frames: 14352384. Throughput: 0: 1791.2. Samples: 3637036. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:19,293][32845] Avg episode reward: [(0, '4.547')]
-[2025-08-29 20:43:20,955][48863] Updated weights for policy 0, policy_version 220 (0.0012)
-[2025-08-29 20:43:24,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 14417920. Throughput: 0: 1649.0. Samples: 3645916. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:24,292][32845] Avg episode reward: [(0, '4.679')]
-[2025-08-29 20:43:31,375][32845] Fps is (10 sec: 5423.5, 60 sec: 6333.6, 300 sec: 8162.1). Total num frames: 14417920. Throughput: 0: 1720.3. Samples: 3653564. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:31,376][32845] Avg episode reward: [(0, '4.806')]
-[2025-08-29 20:43:32,424][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000221_14483456.pth...
-[2025-08-29 20:43:32,559][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000221_14483456.pth
-[2025-08-29 20:43:32,562][48846] Saving new best policy, reward=4.806!
-[2025-08-29 20:43:34,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 14483456. Throughput: 0: 1459.9. Samples: 3653652. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:34,292][32845] Avg episode reward: [(0, '4.865')]
-[2025-08-29 20:43:39,291][32845] Fps is (10 sec: 16557.5, 60 sec: 7645.9, 300 sec: 8676.5). Total num frames: 14548992. Throughput: 0: 1647.9. Samples: 3660300. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:39,292][32845] Avg episode reward: [(0, '4.520')]
-[2025-08-29 20:43:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 14548992. Throughput: 0: 1621.1. Samples: 3669284. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:44,292][32845] Avg episode reward: [(0, '4.658')]
-[2025-08-29 20:43:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 14548992. Throughput: 0: 1798.8. Samples: 3686396. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:43:49,292][32845] Avg episode reward: [(0, '4.522')]
-[2025-08-29 20:43:49,439][48846] Saving new best policy, reward=4.865!
-[2025-08-29 20:43:54,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7809.0, 300 sec: 8664.1). Total num frames: 14680064. Throughput: 0: 1486.6. Samples: 3687756. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:43:54,292][32845] Avg episode reward: [(0, '4.194')]
-[2025-08-29 20:43:59,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 8441.9). Total num frames: 14680064. Throughput: 0: 1653.0. Samples: 3695864. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:43:59,292][32845] Avg episode reward: [(0, '4.281')]
-[2025-08-29 20:44:07,212][32845] Fps is (10 sec: 0.0, 60 sec: 6249.4, 300 sec: 8359.2). Total num frames: 14680064. Throughput: 0: 1633.7. Samples: 3715324. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:44:07,213][32845] Avg episode reward: [(0, '4.337')]
-[2025-08-29 20:44:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 14680064. Throughput: 0: 1627.8. Samples: 3719168. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:44:09,292][32845] Avg episode reward: [(0, '4.347')]
-[2025-08-29 20:44:14,291][32845] Fps is (10 sec: 9257.4, 60 sec: 6553.6, 300 sec: 8477.9). Total num frames: 14745600. Throughput: 0: 1619.7. Samples: 3723076. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:44:14,292][32845] Avg episode reward: [(0, '4.539')]
-[2025-08-29 20:44:19,292][32845] Fps is (10 sec: 6552.8, 60 sec: 6553.5, 300 sec: 8219.7). Total num frames: 14745600. Throughput: 0: 1819.9. Samples: 3735552. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:44:19,294][32845] Avg episode reward: [(0, '4.616')]
-[2025-08-29 20:44:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 14811136. Throughput: 0: 1945.0. Samples: 3747824. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:44:24,293][32845] Avg episode reward: [(0, '4.283')]
-[2025-08-29 20:44:29,291][32845] Fps is (10 sec: 13108.5, 60 sec: 7920.9, 300 sec: 8441.9). Total num frames: 14876672. Throughput: 0: 1836.7. Samples: 3751936. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:44:29,293][32845] Avg episode reward: [(0, '4.402')]
-[2025-08-29 20:44:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 14876672. Throughput: 0: 1746.6. Samples: 3764992. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:44:34,292][32845] Avg episode reward: [(0, '4.344')]
-[2025-08-29 20:44:39,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 14942208. Throughput: 0: 1860.0. Samples: 3771456. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:44:39,292][32845] Avg episode reward: [(0, '4.285')]
-[2025-08-29 20:44:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 14942208. Throughput: 0: 1747.3. Samples: 3774492. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:44:44,292][32845] Avg episode reward: [(0, '4.282')]
-[2025-08-29 20:44:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 8278.2). Total num frames: 14942208. Throughput: 0: 1648.8. Samples: 3784704. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:44:49,292][32845] Avg episode reward: [(0, '4.356')]
-[2025-08-29 20:44:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 8441.9). Total num frames: 15007744. Throughput: 0: 1712.2. Samples: 3796216. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:44:54,292][32845] Avg episode reward: [(0, '4.398')]
-[2025-08-29 20:44:57,880][48863] Updated weights for policy 0, policy_version 230 (0.0026)
-[2025-08-29 20:44:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 15073280. Throughput: 0: 1733.6. Samples: 3801088. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:44:59,293][32845] Avg episode reward: [(0, '4.349')]
-[2025-08-29 20:45:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6889.0, 300 sec: 8441.9). Total num frames: 15073280. Throughput: 0: 1750.7. Samples: 3814332. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:04,293][32845] Avg episode reward: [(0, '4.586')]
-[2025-08-29 20:45:09,292][32845] Fps is (10 sec: 6553.2, 60 sec: 7645.8, 300 sec: 8441.9). Total num frames: 15138816. Throughput: 0: 1642.1. Samples: 3821720. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:09,293][32845] Avg episode reward: [(0, '4.494')]
-[2025-08-29 20:45:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 15138816. Throughput: 0: 1748.5. Samples: 3830616. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:14,292][32845] Avg episode reward: [(0, '4.352')]
-[2025-08-29 20:45:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.7, 300 sec: 8219.8). Total num frames: 15138816. Throughput: 0: 1530.3. Samples: 3833856. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:19,292][32845] Avg episode reward: [(0, '4.352')]
-[2025-08-29 20:45:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8302.2). Total num frames: 15204352. Throughput: 0: 1642.9. Samples: 3845388. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:24,292][32845] Avg episode reward: [(0, '4.193')]
-[2025-08-29 20:45:29,291][32845] Fps is (10 sec: 13107.2, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 15269888. Throughput: 0: 1683.3. Samples: 3850240. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:29,292][32845] Avg episode reward: [(0, '4.344')]
-[2025-08-29 20:45:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 15269888. Throughput: 0: 1757.2. Samples: 3863780. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:34,292][32845] Avg episode reward: [(0, '4.233')]
-[2025-08-29 20:45:36,689][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000234_15335424.pth...
-[2025-08-29 20:45:36,827][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000234_15335424.pth
-[2025-08-29 20:45:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 15335424. Throughput: 0: 1583.7. Samples: 3867484. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:39,292][32845] Avg episode reward: [(0, '4.207')]
-[2025-08-29 20:45:44,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 8219.8). Total num frames: 15400960. Throughput: 0: 1581.9. Samples: 3872272. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:44,292][32845] Avg episode reward: [(0, '4.356')]
-[2025-08-29 20:45:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7997.6). Total num frames: 15400960. Throughput: 0: 1740.1. Samples: 3892636. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:49,292][32845] Avg episode reward: [(0, '4.352')]
-[2025-08-29 20:45:54,707][32845] Fps is (10 sec: 0.0, 60 sec: 6508.5, 300 sec: 7764.5). Total num frames: 15400960. Throughput: 0: 1710.3. Samples: 3899392. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:54,708][32845] Avg episode reward: [(0, '4.318')]
-[2025-08-29 20:45:59,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 15466496. Throughput: 0: 1573.7. Samples: 3901432. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:45:59,292][32845] Avg episode reward: [(0, '4.489')]
-[2025-08-29 20:46:04,291][32845] Fps is (10 sec: 13676.0, 60 sec: 7645.9, 300 sec: 7997.6). Total num frames: 15532032. Throughput: 0: 1820.4. Samples: 3915776. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:46:04,292][32845] Avg episode reward: [(0, '4.454')]
-[2025-08-29 20:46:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.7, 300 sec: 7775.5). Total num frames: 15532032. Throughput: 0: 1806.2. Samples: 3926668. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:46:09,292][32845] Avg episode reward: [(0, '4.352')]
-[2025-08-29 20:46:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7775.5). Total num frames: 15597568. Throughput: 0: 1820.4. Samples: 3932160. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:46:14,293][32845] Avg episode reward: [(0, '4.186')]
-[2025-08-29 20:46:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7553.3). Total num frames: 15597568. Throughput: 0: 1851.5. Samples: 3947096. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:46:19,292][32845] Avg episode reward: [(0, '4.618')]
-[2025-08-29 20:46:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7553.3). Total num frames: 15663104. Throughput: 0: 1938.3. Samples: 3954708. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:24,292][32845] Avg episode reward: [(0, '4.361')]
-[2025-08-29 20:46:30,550][32845] Fps is (10 sec: 5820.7, 60 sec: 6418.9, 300 sec: 7300.0). Total num frames: 15663104. Throughput: 0: 1978.7. Samples: 3963804. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:30,551][32845] Avg episode reward: [(0, '4.380')]
-[2025-08-29 20:46:33,360][48863] Updated weights for policy 0, policy_version 240 (0.0044)
-[2025-08-29 20:46:34,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.8, 300 sec: 7331.1). Total num frames: 15728640. Throughput: 0: 1606.5. Samples: 3964928. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:34,292][32845] Avg episode reward: [(0, '4.320')]
-[2025-08-29 20:46:39,291][32845] Fps is (10 sec: 7497.5, 60 sec: 6553.6, 300 sec: 7331.2). Total num frames: 15728640. Throughput: 0: 1801.7. Samples: 3979720. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:39,292][32845] Avg episode reward: [(0, '4.371')]
-[2025-08-29 20:46:44,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7331.1). Total num frames: 15794176. Throughput: 0: 1775.1. Samples: 3981312. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:44,292][32845] Avg episode reward: [(0, '4.497')]
-[2025-08-29 20:46:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 15794176. Throughput: 0: 1817.7. Samples: 3997572. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:49,292][32845] Avg episode reward: [(0, '4.306')]
-[2025-08-29 20:46:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7699.2, 300 sec: 7109.0). Total num frames: 15859712. Throughput: 0: 1770.8. Samples: 4006352. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:54,292][32845] Avg episode reward: [(0, '4.387')]
-[2025-08-29 20:46:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 15925248. Throughput: 0: 1816.0. Samples: 4013880. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:46:59,292][32845] Avg episode reward: [(0, '4.308')]
-[2025-08-29 20:47:06,382][32845] Fps is (10 sec: 5420.4, 60 sec: 6333.0, 300 sec: 6838.4). Total num frames: 15925248. Throughput: 0: 1612.0. Samples: 4023008. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:06,383][32845] Avg episode reward: [(0, '4.399')]
-[2025-08-29 20:47:09,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 15925248. Throughput: 0: 1669.1. Samples: 4029816. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:09,292][32845] Avg episode reward: [(0, '4.327')]
-[2025-08-29 20:47:14,292][32845] Fps is (10 sec: 8285.2, 60 sec: 6553.5, 300 sec: 6896.6). Total num frames: 15990784. Throughput: 0: 1524.0. Samples: 4030464. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:14,293][32845] Avg episode reward: [(0, '4.533')]
-[2025-08-29 20:47:19,291][32845] Fps is (10 sec: 6553.4, 60 sec: 6553.6, 300 sec: 6664.7). Total num frames: 15990784. Throughput: 0: 1806.2. Samples: 4046208. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:19,293][32845] Avg episode reward: [(0, '4.503')]
-[2025-08-29 20:47:24,291][32845] Fps is (10 sec: 6553.9, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 16056320. Throughput: 0: 1655.1. Samples: 4054200. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:24,292][32845] Avg episode reward: [(0, '4.444')]
-[2025-08-29 20:47:29,292][32845] Fps is (10 sec: 13107.1, 60 sec: 7809.7, 300 sec: 6886.8). Total num frames: 16121856. Throughput: 0: 1811.1. Samples: 4062812. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:29,293][32845] Avg episode reward: [(0, '4.304')]
-[2025-08-29 20:47:34,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 16121856. Throughput: 0: 1657.8. Samples: 4072172. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:34,292][32845] Avg episode reward: [(0, '4.241')]
-[2025-08-29 20:47:37,715][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000247_16187392.pth...
-[2025-08-29 20:47:37,898][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000247_16187392.pth
-[2025-08-29 20:47:42,209][32845] Fps is (10 sec: 5073.4, 60 sec: 7291.3, 300 sec: 6819.4). Total num frames: 16187392. Throughput: 0: 1550.1. Samples: 4080628. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:42,210][32845] Avg episode reward: [(0, '4.202')]
-[2025-08-29 20:47:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 16187392. Throughput: 0: 1483.3. Samples: 4080628. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:44,292][32845] Avg episode reward: [(0, '4.202')]
-[2025-08-29 20:47:49,291][32845] Fps is (10 sec: 9253.8, 60 sec: 7645.9, 300 sec: 6916.2). Total num frames: 16252928. Throughput: 0: 1675.0. Samples: 4094880. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:49,292][32845] Avg episode reward: [(0, '4.608')]
-[2025-08-29 20:47:54,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 16318464. Throughput: 0: 2109.2. Samples: 4124732. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:47:54,292][32845] Avg episode reward: [(0, '4.365')]
-[2025-08-29 20:47:55,376][48863] Updated weights for policy 0, policy_version 250 (0.0021)
-[2025-08-29 20:47:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 16384000. Throughput: 0: 2295.6. Samples: 4133764. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:47:59,292][32845] Avg episode reward: [(0, '4.549')]
-[2025-08-29 20:47:59,995][48846] Signal inference workers to stop experience collection... (250 times)
-[2025-08-29 20:48:00,002][48863] InferenceWorker_p0-w0: stopping experience collection (250 times)
-[2025-08-29 20:48:00,007][48846] Signal inference workers to resume experience collection... (250 times)
-[2025-08-29 20:48:00,007][48863] InferenceWorker_p0-w0: resuming experience collection (250 times)
-[2025-08-29 20:48:04,291][32845] Fps is (10 sec: 13107.2, 60 sec: 9053.6, 300 sec: 7331.1). Total num frames: 16449536. Throughput: 0: 2454.2. Samples: 4156648. Policy #0 lag: (min: 1.0, avg: 1.9, max: 2.0)
-[2025-08-29 20:48:04,292][32845] Avg episode reward: [(0, '4.257')]
-[2025-08-29 20:48:09,291][32845] Fps is (10 sec: 13107.2, 60 sec: 9830.4, 300 sec: 7331.1). Total num frames: 16515072. Throughput: 0: 2749.2. Samples: 4177912. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:48:09,292][32845] Avg episode reward: [(0, '4.555')]
-[2025-08-29 20:48:14,291][32845] Fps is (10 sec: 19660.5, 60 sec: 10922.7, 300 sec: 7775.5). Total num frames: 16646144. Throughput: 0: 2733.0. Samples: 4185796. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:48:14,293][32845] Avg episode reward: [(0, '4.211')]
-[2025-08-29 20:48:19,291][32845] Fps is (10 sec: 13107.2, 60 sec: 10922.7, 300 sec: 7553.3). Total num frames: 16646144. Throughput: 0: 2633.7. Samples: 4190688. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:48:19,292][32845] Avg episode reward: [(0, '4.418')]
-[2025-08-29 20:48:24,291][32845] Fps is (10 sec: 6553.7, 60 sec: 10922.7, 300 sec: 7830.8). Total num frames: 16711680. Throughput: 0: 3148.9. Samples: 4213140. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:48:24,292][32845] Avg episode reward: [(0, '4.710')]
-[2025-08-29 20:48:29,291][32845] Fps is (10 sec: 13107.1, 60 sec: 10922.7, 300 sec: 7775.5). Total num frames: 16777216. Throughput: 0: 3241.3. Samples: 4226488. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 20:48:29,293][32845] Avg episode reward: [(0, '4.301')]
-[2025-08-29 20:48:34,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 7775.5). Total num frames: 16842752. Throughput: 0: 3306.3. Samples: 4243664. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:48:34,292][32845] Avg episode reward: [(0, '4.509')]
-[2025-08-29 20:48:39,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12629.1, 300 sec: 7997.6). Total num frames: 16908288. Throughput: 0: 3155.9. Samples: 4266748. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:48:39,292][32845] Avg episode reward: [(0, '4.389')]
-[2025-08-29 20:48:44,291][32845] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 8219.8). Total num frames: 16973824. Throughput: 0: 3165.8. Samples: 4276224. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:48:44,293][32845] Avg episode reward: [(0, '4.282')]
-[2025-08-29 20:48:46,337][48863] Updated weights for policy 0, policy_version 260 (0.0014)
-[2025-08-29 20:48:49,291][32845] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 7997.6). Total num frames: 17039360. Throughput: 0: 3120.4. Samples: 4297068. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:48:49,293][32845] Avg episode reward: [(0, '4.462')]
-[2025-08-29 20:48:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 12014.9, 300 sec: 7997.6). Total num frames: 17039360. Throughput: 0: 2912.8. Samples: 4308988. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:48:54,292][32845] Avg episode reward: [(0, '4.513')]
-[2025-08-29 20:48:59,291][32845] Fps is (10 sec: 6553.7, 60 sec: 12014.9, 300 sec: 8302.0). Total num frames: 17104896. Throughput: 0: 2774.3. Samples: 4310640. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:48:59,292][32845] Avg episode reward: [(0, '4.310')]
-[2025-08-29 20:49:04,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 8441.9). Total num frames: 17170432. Throughput: 0: 3085.9. Samples: 4329552. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:04,292][32845] Avg episode reward: [(0, '4.347')]
-[2025-08-29 20:49:09,291][32845] Fps is (10 sec: 13106.7, 60 sec: 12014.9, 300 sec: 8441.9). Total num frames: 17235968. Throughput: 0: 2976.3. Samples: 4347076. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:09,293][32845] Avg episode reward: [(0, '4.543')]
-[2025-08-29 20:49:14,291][32845] Fps is (10 sec: 13107.1, 60 sec: 10922.7, 300 sec: 8664.1). Total num frames: 17301504. Throughput: 0: 2925.7. Samples: 4358144. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:14,292][32845] Avg episode reward: [(0, '4.484')]
-[2025-08-29 20:49:19,291][32845] Fps is (10 sec: 13107.4, 60 sec: 12014.9, 300 sec: 8664.1). Total num frames: 17367040. Throughput: 0: 2908.1. Samples: 4374528. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:19,293][32845] Avg episode reward: [(0, '4.268')]
-[2025-08-29 20:49:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 8664.1). Total num frames: 17432576. Throughput: 0: 2759.2. Samples: 4390912. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:24,292][32845] Avg episode reward: [(0, '4.521')]
-[2025-08-29 20:49:29,703][32845] Fps is (10 sec: 6294.2, 60 sec: 10848.2, 300 sec: 8652.0). Total num frames: 17432576. Throughput: 0: 2525.5. Samples: 4390912. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:29,705][32845] Avg episode reward: [(0, '4.537')]
-[2025-08-29 20:49:31,985][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000267_17498112.pth...
-[2025-08-29 20:49:32,154][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000267_17498112.pth
-[2025-08-29 20:49:34,291][32845] Fps is (10 sec: 6553.5, 60 sec: 10922.7, 300 sec: 8664.1). Total num frames: 17498112. Throughput: 0: 2469.2. Samples: 4408180. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:34,292][32845] Avg episode reward: [(0, '4.470')]
-[2025-08-29 20:49:39,291][32845] Fps is (10 sec: 13670.8, 60 sec: 10922.7, 300 sec: 8886.2). Total num frames: 17563648. Throughput: 0: 2523.2. Samples: 4422532. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:39,292][32845] Avg episode reward: [(0, '4.272')]
-[2025-08-29 20:49:44,291][32845] Fps is (10 sec: 13107.4, 60 sec: 10922.7, 300 sec: 9108.4). Total num frames: 17629184. Throughput: 0: 2824.2. Samples: 4437728. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:49:44,292][32845] Avg episode reward: [(0, '4.451')]
-[2025-08-29 20:49:46,698][48863] Updated weights for policy 0, policy_version 270 (0.0381)
-[2025-08-29 20:49:49,291][32845] Fps is (10 sec: 13107.1, 60 sec: 10922.7, 300 sec: 9108.4). Total num frames: 17694720. Throughput: 0: 2893.3. Samples: 4459752. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 20:49:49,293][32845] Avg episode reward: [(0, '4.352')]
-[2025-08-29 20:49:54,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 9108.4). Total num frames: 17760256. Throughput: 0: 3000.6. Samples: 4482104. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0)
-[2025-08-29 20:49:54,292][32845] Avg episode reward: [(0, '4.316')]
-[2025-08-29 20:49:59,291][32845] Fps is (10 sec: 13107.4, 60 sec: 12014.9, 300 sec: 9330.5). Total num frames: 17825792. Throughput: 0: 2918.6. Samples: 4489480. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 20:49:59,292][32845] Avg episode reward: [(0, '4.465')]
-[2025-08-29 20:50:05,538][32845] Fps is (10 sec: 11654.3, 60 sec: 11770.4, 300 sec: 9291.3). Total num frames: 17891328. Throughput: 0: 2809.4. Samples: 4504452. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:05,539][32845] Avg episode reward: [(0, '4.402')]
-[2025-08-29 20:50:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 10922.7, 300 sec: 9330.5). Total num frames: 17891328. Throughput: 0: 2912.5. Samples: 4521976. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:09,292][32845] Avg episode reward: [(0, '4.466')]
-[2025-08-29 20:50:14,291][32845] Fps is (10 sec: 7487.0, 60 sec: 10922.7, 300 sec: 9552.7). Total num frames: 17956864. Throughput: 0: 3148.6. Samples: 4531300. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:14,292][32845] Avg episode reward: [(0, '4.360')]
-[2025-08-29 20:50:19,292][32845] Fps is (10 sec: 13105.8, 60 sec: 10922.5, 300 sec: 9552.7). Total num frames: 18022400. Throughput: 0: 3232.6. Samples: 4553648. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:19,294][32845] Avg episode reward: [(0, '4.334')]
-[2025-08-29 20:50:24,294][32845] Fps is (10 sec: 13102.8, 60 sec: 10922.1, 300 sec: 9552.6). Total num frames: 18087936. Throughput: 0: 3292.6. Samples: 4570712. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:24,296][32845] Avg episode reward: [(0, '4.525')]
-[2025-08-29 20:50:29,291][32845] Fps is (10 sec: 13108.5, 60 sec: 12098.0, 300 sec: 9774.9). Total num frames: 18153472. Throughput: 0: 3152.9. Samples: 4579608. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:29,293][32845] Avg episode reward: [(0, '4.446')]
-[2025-08-29 20:50:34,291][32845] Fps is (10 sec: 19667.4, 60 sec: 13107.2, 300 sec: 9997.0). Total num frames: 18284544. Throughput: 0: 3177.2. Samples: 4602724. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:34,292][32845] Avg episode reward: [(0, '4.340')]
-[2025-08-29 20:50:41,373][32845] Fps is (10 sec: 10848.6, 60 sec: 11612.0, 300 sec: 9706.4). Total num frames: 18284544. Throughput: 0: 2731.9. Samples: 4610728. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:41,374][32845] Avg episode reward: [(0, '4.430')]
-[2025-08-29 20:50:42,199][48863] Updated weights for policy 0, policy_version 280 (0.0028)
-[2025-08-29 20:50:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 12014.9, 300 sec: 9997.0). Total num frames: 18350080. Throughput: 0: 2906.8. Samples: 4620288. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:44,292][32845] Avg episode reward: [(0, '4.467')]
-[2025-08-29 20:50:49,291][32845] Fps is (10 sec: 16553.8, 60 sec: 12015.0, 300 sec: 10233.6). Total num frames: 18415616. Throughput: 0: 3029.4. Samples: 4637000. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:49,292][32845] Avg episode reward: [(0, '4.348')]
-[2025-08-29 20:50:54,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 10219.2). Total num frames: 18481152. Throughput: 0: 3070.1. Samples: 4660132. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:54,293][32845] Avg episode reward: [(0, '4.500')]
-[2025-08-29 20:50:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12014.9, 300 sec: 10219.2). Total num frames: 18546688. Throughput: 0: 3069.8. Samples: 4669440. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:50:59,292][32845] Avg episode reward: [(0, '4.226')]
-[2025-08-29 20:51:04,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12269.9, 300 sec: 10441.3). Total num frames: 18612224. Throughput: 0: 3061.4. Samples: 4691408. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:04,292][32845] Avg episode reward: [(0, '4.417')]
-[2025-08-29 20:51:09,291][32845] Fps is (10 sec: 13107.1, 60 sec: 13107.2, 300 sec: 10441.3). Total num frames: 18677760. Throughput: 0: 3206.4. Samples: 4714988. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:09,292][32845] Avg episode reward: [(0, '4.385')]
-[2025-08-29 20:51:17,203][32845] Fps is (10 sec: 10151.0, 60 sec: 12500.5, 300 sec: 10559.2). Total num frames: 18743296. Throughput: 0: 3016.2. Samples: 4724120. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:17,204][32845] Avg episode reward: [(0, '4.635')]
-[2025-08-29 20:51:19,291][32845] Fps is (10 sec: 13107.2, 60 sec: 13107.4, 300 sec: 10663.5). Total num frames: 18808832. Throughput: 0: 2928.0. Samples: 4734484. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:19,292][32845] Avg episode reward: [(0, '4.419')]
-[2025-08-29 20:51:24,291][32845] Fps is (10 sec: 18492.9, 60 sec: 13107.9, 300 sec: 10932.3). Total num frames: 18874368. Throughput: 0: 3276.8. Samples: 4751360. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:24,293][32845] Avg episode reward: [(0, '4.680')]
-[2025-08-29 20:51:29,291][32845] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 10885.6). Total num frames: 18939904. Throughput: 0: 3157.2. Samples: 4762364. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:29,292][32845] Avg episode reward: [(0, '4.396')]
-[2025-08-29 20:51:33,343][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000290_19005440.pth...
-[2025-08-29 20:51:33,349][48863] Updated weights for policy 0, policy_version 290 (0.0019)
-[2025-08-29 20:51:33,491][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000290_19005440.pth
-[2025-08-29 20:51:34,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 11107.8). Total num frames: 19005440. Throughput: 0: 3268.2. Samples: 4784068. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:34,292][32845] Avg episode reward: [(0, '4.402')]
-[2025-08-29 20:51:39,291][32845] Fps is (10 sec: 13107.3, 60 sec: 13578.4, 300 sec: 11107.8). Total num frames: 19070976. Throughput: 0: 2937.2. Samples: 4792304. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 20:51:39,292][32845] Avg episode reward: [(0, '4.411')]
-[2025-08-29 20:51:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 12014.9, 300 sec: 11107.8). Total num frames: 19070976. Throughput: 0: 2949.9. Samples: 4802184.
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:51:44,292][32845] Avg episode reward: [(0, '4.341')] -[2025-08-29 20:51:49,291][32845] Fps is (10 sec: 6553.5, 60 sec: 12014.9, 300 sec: 11107.8). Total num frames: 19136512. Throughput: 0: 2788.6. Samples: 4816896. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:51:49,292][32845] Avg episode reward: [(0, '4.474')] -[2025-08-29 20:51:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 10922.7, 300 sec: 10885.6). Total num frames: 19136512. Throughput: 0: 2364.7. Samples: 4821400. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:51:54,292][32845] Avg episode reward: [(0, '4.601')] -[2025-08-29 20:51:59,291][32845] Fps is (10 sec: 0.0, 60 sec: 9830.4, 300 sec: 10963.3). Total num frames: 19136512. Throughput: 0: 2503.6. Samples: 4829492. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:51:59,292][32845] Avg episode reward: [(0, '4.460')] -[2025-08-29 20:52:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 9830.4, 300 sec: 11107.8). Total num frames: 19202048. Throughput: 0: 2307.2. Samples: 4838308. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:52:04,292][32845] Avg episode reward: [(0, '4.460')] -[2025-08-29 20:52:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 8738.1, 300 sec: 10885.7). Total num frames: 19202048. Throughput: 0: 2186.6. Samples: 4849756. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:52:09,292][32845] Avg episode reward: [(0, '4.220')] -[2025-08-29 20:52:14,291][32845] Fps is (10 sec: 6553.5, 60 sec: 9183.9, 300 sec: 11107.8). Total num frames: 19267584. Throughput: 0: 2064.8. Samples: 4855280. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:52:14,292][32845] Avg episode reward: [(0, '4.404')] -[2025-08-29 20:52:19,291][32845] Fps is (10 sec: 13107.1, 60 sec: 8738.1, 300 sec: 11107.8). Total num frames: 19333120. Throughput: 0: 1821.8. Samples: 4866048. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:52:19,292][32845] Avg episode reward: [(0, '4.254')] -[2025-08-29 20:52:24,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 10885.7). Total num frames: 19333120. Throughput: 0: 1968.1. Samples: 4880868. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:52:24,292][32845] Avg episode reward: [(0, '4.699')] -[2025-08-29 20:52:29,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 10885.6). Total num frames: 19333120. Throughput: 0: 1783.3. Samples: 4882432. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:52:29,292][32845] Avg episode reward: [(0, '4.715')] -[2025-08-29 20:52:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 10994.4). Total num frames: 19398656. Throughput: 0: 1596.8. Samples: 4888752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:52:34,292][32845] Avg episode reward: [(0, '4.335')] -[2025-08-29 20:52:39,291][32845] Fps is (10 sec: 13107.1, 60 sec: 6553.6, 300 sec: 11107.8). Total num frames: 19464192. Throughput: 0: 1720.4. Samples: 4898816. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:52:39,292][32845] Avg episode reward: [(0, '4.650')] -[2025-08-29 20:52:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 10885.6). Total num frames: 19464192. Throughput: 0: 1700.8. Samples: 4906028. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:52:44,292][32845] Avg episode reward: [(0, '4.533')] -[2025-08-29 20:52:49,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 10885.6). 
Total num frames: 19529728. Throughput: 0: 1710.7. Samples: 4915288. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:52:49,292][32845] Avg episode reward: [(0, '4.381')] -[2025-08-29 20:52:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 10663.5). Total num frames: 19529728. Throughput: 0: 1818.4. Samples: 4931584. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:52:54,292][32845] Avg episode reward: [(0, '4.533')] -[2025-08-29 20:52:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 10663.5). Total num frames: 19595264. Throughput: 0: 1770.8. Samples: 4934964. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:52:59,292][32845] Avg episode reward: [(0, '4.494')] -[2025-08-29 20:53:04,708][32845] Fps is (10 sec: 6291.3, 60 sec: 6508.4, 300 sec: 10426.6). Total num frames: 19595264. Throughput: 0: 1694.3. Samples: 4942996. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:04,709][32845] Avg episode reward: [(0, '4.467')] -[2025-08-29 20:53:07,201][48863] Updated weights for policy 0, policy_version 300 (0.0014) -[2025-08-29 20:53:09,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7645.8, 300 sec: 10219.2). Total num frames: 19660800. Throughput: 0: 1566.4. Samples: 4951356. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:09,293][32845] Avg episode reward: [(0, '4.459')] -[2025-08-29 20:53:14,291][32845] Fps is (10 sec: 6838.7, 60 sec: 6553.6, 300 sec: 10219.2). Total num frames: 19660800. Throughput: 0: 1700.4. Samples: 4958952. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:14,292][32845] Avg episode reward: [(0, '4.350')] -[2025-08-29 20:53:15,602][48846] Signal inference workers to stop experience collection... (300 times) -[2025-08-29 20:53:15,616][48863] InferenceWorker_p0-w0: stopping experience collection (300 times) -[2025-08-29 20:53:15,622][48846] Signal inference workers to resume experience collection... (300 times) -[2025-08-29 20:53:15,622][48863] InferenceWorker_p0-w0: resuming experience collection (300 times) -[2025-08-29 20:53:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 10219.2). Total num frames: 19726336. Throughput: 0: 1814.9. Samples: 4970424. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:19,293][32845] Avg episode reward: [(0, '4.502')] -[2025-08-29 20:53:24,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.8, 300 sec: 10219.2). Total num frames: 19791872. Throughput: 0: 1820.4. Samples: 4980736. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:24,292][32845] Avg episode reward: [(0, '4.580')] -[2025-08-29 20:53:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9997.0). Total num frames: 19791872. Throughput: 0: 1811.9. Samples: 4987564. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:29,292][32845] Avg episode reward: [(0, '4.478')] -[2025-08-29 20:53:32,487][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000303_19857408.pth... -[2025-08-29 20:53:32,629][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000303_19857408.pth -[2025-08-29 20:53:34,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 9997.0). Total num frames: 19857408. Throughput: 0: 1820.7. Samples: 4997220. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:34,292][32845] Avg episode reward: [(0, '4.465')] -[2025-08-29 20:53:40,542][32845] Fps is (10 sec: 11650.1, 60 sec: 7489.7, 300 sec: 9954.8). Total num frames: 19922944. Throughput: 0: 1434.6. Samples: 4997936. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:40,543][32845] Avg episode reward: [(0, '4.481')] -[2025-08-29 20:53:44,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9774.9). Total num frames: 19922944. Throughput: 0: 1532.1. Samples: 5003908. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:44,292][32845] Avg episode reward: [(0, '4.340')] -[2025-08-29 20:53:49,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 9774.9). Total num frames: 19922944. Throughput: 0: 1767.5. Samples: 5021796. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:49,293][32845] Avg episode reward: [(0, '4.322')] -[2025-08-29 20:53:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 9774.9). Total num frames: 19988480. Throughput: 0: 1891.5. Samples: 5036472. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:54,292][32845] Avg episode reward: [(0, '4.589')] -[2025-08-29 20:53:59,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 9552.7). Total num frames: 19988480. Throughput: 0: 1920.0. Samples: 5045352. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:53:59,292][32845] Avg episode reward: [(0, '4.560')] -[2025-08-29 20:54:04,291][32845] Fps is (10 sec: 6553.5, 60 sec: 7699.3, 300 sec: 9552.7). Total num frames: 20054016. Throughput: 0: 1850.1. Samples: 5053680. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:54:04,293][32845] Avg episode reward: [(0, '4.300')] -[2025-08-29 20:54:09,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 9552.7). Total num frames: 20119552. Throughput: 0: 1820.4. Samples: 5062656. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:54:09,292][32845] Avg episode reward: [(0, '4.395')] -[2025-08-29 20:54:16,369][32845] Fps is (10 sec: 5426.2, 60 sec: 7390.0, 300 sec: 9265.3). Total num frames: 20119552. Throughput: 0: 1757.0. Samples: 5070280. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:54:16,370][32845] Avg episode reward: [(0, '4.351')] -[2025-08-29 20:54:19,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 9108.4). Total num frames: 20119552. Throughput: 0: 1790.7. Samples: 5077800. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:54:19,292][32845] Avg episode reward: [(0, '4.395')] -[2025-08-29 20:54:24,291][32845] Fps is (10 sec: 8272.4, 60 sec: 6553.6, 300 sec: 9343.6). Total num frames: 20185088. Throughput: 0: 1997.4. Samples: 5085320. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:24,294][32845] Avg episode reward: [(0, '4.259')] -[2025-08-29 20:54:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 9108.4). Total num frames: 20185088. Throughput: 0: 2013.8. Samples: 5094528. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:29,292][32845] Avg episode reward: [(0, '4.511')] -[2025-08-29 20:54:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 9108.4). Total num frames: 20250624. Throughput: 0: 1802.9. Samples: 5102924. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:34,293][32845] Avg episode reward: [(0, '4.293')] -[2025-08-29 20:54:38,411][48863] Updated weights for policy 0, policy_version 310 (0.0013) -[2025-08-29 20:54:39,291][32845] Fps is (10 sec: 13107.1, 60 sec: 6693.1, 300 sec: 9108.4). Total num frames: 20316160. Throughput: 0: 1677.2. Samples: 5111948. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:39,292][32845] Avg episode reward: [(0, '4.329')] -[2025-08-29 20:54:44,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 8886.2). Total num frames: 20316160. Throughput: 0: 1652.5. Samples: 5119716. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:44,293][32845] Avg episode reward: [(0, '4.602')] -[2025-08-29 20:54:52,213][32845] Fps is (10 sec: 5071.8, 60 sec: 7290.9, 300 sec: 8799.1). Total num frames: 20381696. Throughput: 0: 1576.2. Samples: 5129216. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:52,214][32845] Avg episode reward: [(0, '4.399')] -[2025-08-29 20:54:54,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 8664.1). Total num frames: 20381696. Throughput: 0: 1643.3. Samples: 5136604. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:54,293][32845] Avg episode reward: [(0, '4.595')] -[2025-08-29 20:54:59,291][32845] Fps is (10 sec: 9258.9, 60 sec: 7645.9, 300 sec: 8700.9). Total num frames: 20447232. Throughput: 0: 1730.9. Samples: 5144576. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:54:59,292][32845] Avg episode reward: [(0, '4.680')] -[2025-08-29 20:55:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8664.1). Total num frames: 20447232. Throughput: 0: 1699.5. Samples: 5154276. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:04,292][32845] Avg episode reward: [(0, '4.684')] -[2025-08-29 20:55:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8664.1). Total num frames: 20512768. Throughput: 0: 1732.4. Samples: 5163280. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:09,292][32845] Avg episode reward: [(0, '4.589')] -[2025-08-29 20:55:14,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6788.7, 300 sec: 8442.0). Total num frames: 20512768. Throughput: 0: 1705.2. Samples: 5171260. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:14,292][32845] Avg episode reward: [(0, '4.520')] -[2025-08-29 20:55:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 8442.0). Total num frames: 20578304. Throughput: 0: 1742.5. Samples: 5181336. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:19,292][32845] Avg episode reward: [(0, '4.236')] -[2025-08-29 20:55:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 20578304. Throughput: 0: 1817.3. Samples: 5193728. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:24,292][32845] Avg episode reward: [(0, '4.522')] -[2025-08-29 20:55:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7997.6). Total num frames: 20643840. Throughput: 0: 1644.7. Samples: 5193728. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:29,292][32845] Avg episode reward: [(0, '4.526')] -[2025-08-29 20:55:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8054.5). Total num frames: 20643840. Throughput: 0: 1776.5. Samples: 5203968. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:34,292][32845] Avg episode reward: [(0, '4.293')] -[2025-08-29 20:55:37,263][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_20709376.pth... -[2025-08-29 20:55:37,423][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_20709376.pth -[2025-08-29 20:55:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7997.6). Total num frames: 20709376. Throughput: 0: 1653.2. Samples: 5211000. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:39,293][32845] Avg episode reward: [(0, '4.248')] -[2025-08-29 20:55:44,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 7997.6). Total num frames: 20774912. Throughput: 0: 1562.4. Samples: 5214884. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:44,292][32845] Avg episode reward: [(0, '4.355')] -[2025-08-29 20:55:49,292][32845] Fps is (10 sec: 6553.3, 60 sec: 6889.0, 300 sec: 7775.4). Total num frames: 20774912. Throughput: 0: 1758.0. Samples: 5233388. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:49,294][32845] Avg episode reward: [(0, '4.612')] -[2025-08-29 20:55:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7775.5). Total num frames: 20840448. Throughput: 0: 1823.6. Samples: 5245344. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:54,292][32845] Avg episode reward: [(0, '4.396')] -[2025-08-29 20:55:59,291][32845] Fps is (10 sec: 6553.9, 60 sec: 6553.6, 300 sec: 7553.3). Total num frames: 20840448. Throughput: 0: 1825.4. Samples: 5253404. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:55:59,292][32845] Avg episode reward: [(0, '4.616')] -[2025-08-29 20:56:04,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 7331.1). Total num frames: 20840448. Throughput: 0: 1734.0. Samples: 5259364. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:04,292][32845] Avg episode reward: [(0, '4.616')] -[2025-08-29 20:56:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7404.2). Total num frames: 20905984. Throughput: 0: 1690.1. Samples: 5269784. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:56:09,292][32845] Avg episode reward: [(0, '4.302')] -[2025-08-29 20:56:13,142][48863] Updated weights for policy 0, policy_version 320 (0.0022) -[2025-08-29 20:56:14,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 7331.1). Total num frames: 20971520. Throughput: 0: 1820.4. Samples: 5275648. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:56:14,292][32845] Avg episode reward: [(0, '4.277')] -[2025-08-29 20:56:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 20971520. Throughput: 0: 1913.8. Samples: 5290088. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:56:19,293][32845] Avg episode reward: [(0, '4.531')] -[2025-08-29 20:56:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 21037056. Throughput: 0: 1934.6. Samples: 5298056. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:24,292][32845] Avg episode reward: [(0, '4.387')] -[2025-08-29 20:56:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21037056. Throughput: 0: 2049.3. Samples: 5307104. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:29,293][32845] Avg episode reward: [(0, '4.398')] -[2025-08-29 20:56:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 21102592. Throughput: 0: 1796.1. Samples: 5314212. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:34,292][32845] Avg episode reward: [(0, '4.372')] -[2025-08-29 20:56:39,708][32845] Fps is (10 sec: 6291.6, 60 sec: 6508.4, 300 sec: 6877.1). Total num frames: 21102592. Throughput: 0: 1720.6. Samples: 5323488. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:39,709][32845] Avg episode reward: [(0, '4.250')] -[2025-08-29 20:56:44,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21168128. Throughput: 0: 1586.6. Samples: 5324800. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:44,293][32845] Avg episode reward: [(0, '4.296')] -[2025-08-29 20:56:49,291][32845] Fps is (10 sec: 6838.2, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21168128. Throughput: 0: 1752.9. Samples: 5338244. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:49,293][32845] Avg episode reward: [(0, '4.706')] -[2025-08-29 20:56:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 21233664. Throughput: 0: 1712.4. Samples: 5346844. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:54,292][32845] Avg episode reward: [(0, '4.489')] -[2025-08-29 20:56:59,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21233664. Throughput: 0: 1793.3. Samples: 5356348. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:56:59,293][32845] Avg episode reward: [(0, '4.534')] -[2025-08-29 20:57:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 21299200. Throughput: 0: 1680.3. Samples: 5365700. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:04,292][32845] Avg episode reward: [(0, '4.491')] -[2025-08-29 20:57:09,291][32845] Fps is (10 sec: 13106.9, 60 sec: 7645.8, 300 sec: 7109.0). Total num frames: 21364736. Throughput: 0: 1695.5. Samples: 5374352. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:09,294][32845] Avg episode reward: [(0, '4.329')] -[2025-08-29 20:57:15,535][32845] Fps is (10 sec: 5828.9, 60 sec: 6420.6, 300 sec: 6857.9). Total num frames: 21364736. Throughput: 0: 1629.9. Samples: 5382476. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:15,536][32845] Avg episode reward: [(0, '4.506')] -[2025-08-29 20:57:19,292][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21364736. Throughput: 0: 1675.3. Samples: 5389600. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:19,293][32845] Avg episode reward: [(0, '4.461')] -[2025-08-29 20:57:24,291][32845] Fps is (10 sec: 7484.1, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 21430272. Throughput: 0: 1644.8. Samples: 5396820. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:24,292][32845] Avg episode reward: [(0, '4.461')] -[2025-08-29 20:57:29,291][32845] Fps is (10 sec: 13107.6, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 21495808. Throughput: 0: 1806.3. Samples: 5406084. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:29,292][32845] Avg episode reward: [(0, '4.549')] -[2025-08-29 20:57:34,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21495808. 
Throughput: 0: 1725.7. Samples: 5415900. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:34,293][32845] Avg episode reward: [(0, '4.595')] -[2025-08-29 20:57:37,872][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000329_21561344.pth... -[2025-08-29 20:57:38,043][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000329_21561344.pth -[2025-08-29 20:57:39,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7699.3, 300 sec: 7109.0). Total num frames: 21561344. Throughput: 0: 1716.7. Samples: 5424096. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:39,292][32845] Avg episode reward: [(0, '4.612')] -[2025-08-29 20:57:41,446][48863] Updated weights for policy 0, policy_version 330 (0.0015) -[2025-08-29 20:57:44,291][32845] Fps is (10 sec: 13107.4, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 21626880. Throughput: 0: 1505.5. Samples: 5424096. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:44,293][32845] Avg episode reward: [(0, '4.348')] -[2025-08-29 20:57:51,385][32845] Fps is (10 sec: 5419.1, 60 sec: 7388.1, 300 sec: 7058.9). Total num frames: 21626880. Throughput: 0: 1677.8. Samples: 5444712. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:51,386][32845] Avg episode reward: [(0, '4.399')] -[2025-08-29 20:57:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21626880. Throughput: 0: 1726.4. Samples: 5452040. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:54,292][32845] Avg episode reward: [(0, '4.507')] -[2025-08-29 20:57:59,291][32845] Fps is (10 sec: 8289.0, 60 sec: 7645.9, 300 sec: 7119.0). Total num frames: 21692416. Throughput: 0: 1677.4. Samples: 5455872. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:57:59,293][32845] Avg episode reward: [(0, '4.382')] -[2025-08-29 20:58:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 21692416. Throughput: 0: 1820.7. Samples: 5471532. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:58:04,292][32845] Avg episode reward: [(0, '4.489')] -[2025-08-29 20:58:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 21757952. Throughput: 0: 1847.9. Samples: 5479976. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:58:09,292][32845] Avg episode reward: [(0, '4.251')] -[2025-08-29 20:58:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6692.3, 300 sec: 6886.8). Total num frames: 21757952. Throughput: 0: 1822.4. Samples: 5488092. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:58:14,292][32845] Avg episode reward: [(0, '4.183')] -[2025-08-29 20:58:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 6886.8). Total num frames: 21823488. Throughput: 0: 1953.7. Samples: 5503816. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:58:19,292][32845] Avg episode reward: [(0, '4.550')] -[2025-08-29 20:58:27,203][32845] Fps is (10 sec: 10151.0, 60 sec: 7291.9, 300 sec: 7039.5). Total num frames: 21889024. Throughput: 0: 2029.5. Samples: 5521336. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:58:27,205][32845] Avg episode reward: [(0, '4.340')] -[2025-08-29 20:58:29,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 21954560. Throughput: 0: 2162.5. Samples: 5521408. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:58:29,292][32845] Avg episode reward: [(0, '4.480')] -[2025-08-29 20:58:34,291][32845] Fps is (10 sec: 18492.3, 60 sec: 8738.1, 300 sec: 7139.3). Total num frames: 22020096. Throughput: 0: 2169.4. Samples: 5537792. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:58:34,293][32845] Avg episode reward: [(0, '4.278')] -[2025-08-29 20:58:39,291][32845] Fps is (10 sec: 13107.2, 60 sec: 8738.1, 300 sec: 7331.1). Total num frames: 22085632. Throughput: 0: 2250.3. Samples: 5553304. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:58:39,292][32845] Avg episode reward: [(0, '4.498')] -[2025-08-29 20:58:44,291][32845] Fps is (10 sec: 13107.3, 60 sec: 8738.1, 300 sec: 7553.3). Total num frames: 22151168. Throughput: 0: 2482.8. Samples: 5567596. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:58:44,292][32845] Avg episode reward: [(0, '4.503')] -[2025-08-29 20:58:49,291][32845] Fps is (10 sec: 13107.1, 60 sec: 10185.8, 300 sec: 7553.3). Total num frames: 22216704. Throughput: 0: 2564.7. Samples: 5586944. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:58:49,292][32845] Avg episode reward: [(0, '4.457')] -[2025-08-29 20:58:52,781][48863] Updated weights for policy 0, policy_version 340 (0.0044) -[2025-08-29 20:58:54,292][32845] Fps is (10 sec: 13107.0, 60 sec: 10922.6, 300 sec: 7775.5). Total num frames: 22282240. Throughput: 0: 2806.1. Samples: 5606252. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:58:54,293][32845] Avg episode reward: [(0, '4.234')] -[2025-08-29 20:58:59,291][32845] Fps is (10 sec: 13107.3, 60 sec: 10922.7, 300 sec: 7775.5). Total num frames: 22347776. Throughput: 0: 2790.2. Samples: 5613652. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:58:59,292][32845] Avg episode reward: [(0, '4.473')] -[2025-08-29 20:59:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 10922.7, 300 sec: 7553.3). Total num frames: 22347776. Throughput: 0: 2696.6. Samples: 5625164. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:59:04,292][32845] Avg episode reward: [(0, '4.776')] -[2025-08-29 20:59:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 10922.7, 300 sec: 7830.6). Total num frames: 22413312. Throughput: 0: 2963.6. Samples: 5646068. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:59:09,292][32845] Avg episode reward: [(0, '4.641')] -[2025-08-29 20:59:14,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 7997.6). Total num frames: 22478848. Throughput: 0: 2889.2. Samples: 5651420. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:59:14,292][32845] Avg episode reward: [(0, '4.229')] -[2025-08-29 20:59:19,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12014.9, 300 sec: 7997.6). Total num frames: 22544384. Throughput: 0: 3071.9. Samples: 5676028. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:59:19,292][32845] Avg episode reward: [(0, '4.456')] -[2025-08-29 20:59:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12627.9, 300 sec: 8219.8). Total num frames: 22609920. Throughput: 0: 3101.1. Samples: 5692852. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:59:24,292][32845] Avg episode reward: [(0, '4.538')] -[2025-08-29 20:59:29,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 8219.8). Total num frames: 22675456. Throughput: 0: 3029.3. Samples: 5703916. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:59:29,292][32845] Avg episode reward: [(0, '4.415')] -[2025-08-29 20:59:30,751][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000347_22740992.pth... -[2025-08-29 20:59:30,885][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000347_22740992.pth -[2025-08-29 20:59:34,292][32845] Fps is (10 sec: 13106.5, 60 sec: 12014.9, 300 sec: 8219.8). Total num frames: 22740992. Throughput: 0: 2931.3. Samples: 5718856. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 20:59:34,293][32845] Avg episode reward: [(0, '4.385')] -[2025-08-29 20:59:39,291][32845] Fps is (10 sec: 13107.0, 60 sec: 12014.9, 300 sec: 8441.9). Total num frames: 22806528. Throughput: 0: 2608.1. Samples: 5723616. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:59:39,293][32845] Avg episode reward: [(0, '4.319')] -[2025-08-29 20:59:44,291][32845] Fps is (10 sec: 13107.7, 60 sec: 12014.9, 300 sec: 8526.4). Total num frames: 22872064. Throughput: 0: 2736.3. Samples: 5736784. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 20:59:44,293][32845] Avg episode reward: [(0, '4.460')] -[2025-08-29 20:59:48,738][48863] Updated weights for policy 0, policy_version 350 (0.0014) -[2025-08-29 20:59:49,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 8664.1). Total num frames: 22937600. Throughput: 0: 3126.4. Samples: 5765852. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 20:59:49,292][32845] Avg episode reward: [(0, '4.485')] -[2025-08-29 20:59:50,688][48846] Signal inference workers to stop experience collection... (350 times) -[2025-08-29 20:59:50,695][48863] InferenceWorker_p0-w0: stopping experience collection (350 times) -[2025-08-29 20:59:52,773][48846] Signal inference workers to resume experience collection... (350 times) -[2025-08-29 20:59:52,773][48863] InferenceWorker_p0-w0: resuming experience collection (350 times) -[2025-08-29 20:59:54,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12015.0, 300 sec: 8664.1). Total num frames: 23003136. Throughput: 0: 2972.3. Samples: 5779820. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0) -[2025-08-29 20:59:54,292][32845] Avg episode reward: [(0, '4.470')] -[2025-08-29 20:59:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12014.9, 300 sec: 8886.2). Total num frames: 23068672. Throughput: 0: 3188.0. Samples: 5794880. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 20:59:59,292][32845] Avg episode reward: [(0, '4.541')] -[2025-08-29 21:00:04,291][32845] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 8886.2). Total num frames: 23134208. Throughput: 0: 3117.6. Samples: 5816320. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 21:00:04,293][32845] Avg episode reward: [(0, '4.303')] -[2025-08-29 21:00:09,291][32845] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 9108.4). Total num frames: 23199744. Throughput: 0: 3161.3. Samples: 5835112. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:00:09,293][32845] Avg episode reward: [(0, '4.185')] -[2025-08-29 21:00:14,711][32845] Fps is (10 sec: 6289.6, 60 sec: 11931.4, 300 sec: 8873.6). Total num frames: 23199744. Throughput: 0: 2888.5. Samples: 5835112. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:00:14,713][32845] Avg episode reward: [(0, '4.402')] -[2025-08-29 21:00:19,291][32845] Fps is (10 sec: 6553.7, 60 sec: 12014.9, 300 sec: 9108.4). Total num frames: 23265280. Throughput: 0: 3075.5. Samples: 5857252. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 21:00:19,292][32845] Avg episode reward: [(0, '4.423')] -[2025-08-29 21:00:24,291][32845] Fps is (10 sec: 13681.7, 60 sec: 12014.9, 300 sec: 9108.4). Total num frames: 23330816. Throughput: 0: 3339.6. Samples: 5873896. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 21:00:24,292][32845] Avg episode reward: [(0, '4.476')] -[2025-08-29 21:00:29,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 9330.5). Total num frames: 23396352. Throughput: 0: 3362.4. Samples: 5888092. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:00:29,292][32845] Avg episode reward: [(0, '4.604')] -[2025-08-29 21:00:34,291][32845] Fps is (10 sec: 19660.9, 60 sec: 13107.3, 300 sec: 9552.7). Total num frames: 23527424. Throughput: 0: 3163.5. Samples: 5908208. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:00:34,293][32845] Avg episode reward: [(0, '4.687')] -[2025-08-29 21:00:38,198][48863] Updated weights for policy 0, policy_version 360 (0.0021) -[2025-08-29 21:00:39,291][32845] Fps is (10 sec: 19660.5, 60 sec: 13107.2, 300 sec: 9552.7). Total num frames: 23592960. Throughput: 0: 3359.7. Samples: 5931008. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:00:39,292][32845] Avg episode reward: [(0, '4.300')] -[2025-08-29 21:00:44,291][32845] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 9774.9). Total num frames: 23658496. Throughput: 0: 3313.4. Samples: 5943984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 21:00:44,292][32845] Avg episode reward: [(0, '4.647')] -[2025-08-29 21:00:50,550][32845] Fps is (10 sec: 5820.6, 60 sec: 11767.9, 300 sec: 9512.1). Total num frames: 23658496. Throughput: 0: 2912.9. Samples: 5951068. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 21:00:50,551][32845] Avg episode reward: [(0, '4.457')] -[2025-08-29 21:00:54,291][32845] Fps is (10 sec: 6553.4, 60 sec: 12014.9, 300 sec: 9774.9). Total num frames: 23724032. Throughput: 0: 3012.4. Samples: 5970672. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:00:54,293][32845] Avg episode reward: [(0, '4.533')] -[2025-08-29 21:00:59,291][32845] Fps is (10 sec: 14995.7, 60 sec: 12015.0, 300 sec: 9997.0). Total num frames: 23789568. Throughput: 0: 3253.7. Samples: 5980160. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:00:59,292][32845] Avg episode reward: [(0, '4.197')] -[2025-08-29 21:01:04,291][32845] Fps is (10 sec: 13107.5, 60 sec: 12014.9, 300 sec: 9997.0). Total num frames: 23855104. Throughput: 0: 3172.5. Samples: 6000016. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 21:01:04,292][32845] Avg episode reward: [(0, '4.255')] -[2025-08-29 21:01:09,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12015.0, 300 sec: 9997.0). Total num frames: 23920640. Throughput: 0: 3217.4. Samples: 6018680. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 21:01:09,293][32845] Avg episode reward: [(0, '4.330')] -[2025-08-29 21:01:14,291][32845] Fps is (10 sec: 13107.2, 60 sec: 13199.6, 300 sec: 10219.2). Total num frames: 23986176. Throughput: 0: 3148.8. Samples: 6029788. 
Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:01:14,292][32845] Avg episode reward: [(0, '4.733')] -[2025-08-29 21:01:19,291][32845] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 10219.2). Total num frames: 24051712. Throughput: 0: 3065.4. Samples: 6046152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 21:01:19,293][32845] Avg episode reward: [(0, '4.547')] -[2025-08-29 21:01:26,373][32845] Fps is (10 sec: 10848.5, 60 sec: 12667.6, 300 sec: 10368.2). Total num frames: 24117248. Throughput: 0: 2729.9. Samples: 6059536. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:01:26,374][32845] Avg episode reward: [(0, '4.670')] -[2025-08-29 21:01:29,291][32845] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 10441.3). Total num frames: 24182784. Throughput: 0: 2818.6. Samples: 6070820. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:01:29,293][32845] Avg episode reward: [(0, '4.653')] -[2025-08-29 21:01:33,242][48863] Updated weights for policy 0, policy_version 370 (0.0014) -[2025-08-29 21:01:33,242][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000370_24248320.pth... -[2025-08-29 21:01:33,310][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000370_24248320.pth -[2025-08-29 21:01:34,291][32845] Fps is (10 sec: 16553.8, 60 sec: 12014.9, 300 sec: 10678.6). Total num frames: 24248320. Throughput: 0: 3137.8. Samples: 6088316. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:01:34,292][32845] Avg episode reward: [(0, '4.319')] -[2025-08-29 21:01:39,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12015.0, 300 sec: 10663.5). Total num frames: 24313856. Throughput: 0: 3123.8. Samples: 6111244. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0) -[2025-08-29 21:01:39,293][32845] Avg episode reward: [(0, '4.534')] -[2025-08-29 21:01:44,291][32845] Fps is (10 sec: 13107.0, 60 sec: 12014.9, 300 sec: 10885.7). Total num frames: 24379392. Throughput: 0: 3107.7. Samples: 6120008. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 21:01:44,292][32845] Avg episode reward: [(0, '4.292')] -[2025-08-29 21:01:49,291][32845] Fps is (10 sec: 13107.3, 60 sec: 13388.2, 300 sec: 10885.6). Total num frames: 24444928. Throughput: 0: 3172.6. Samples: 6142784. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:01:49,292][32845] Avg episode reward: [(0, '4.367')] -[2025-08-29 21:01:54,291][32845] Fps is (10 sec: 13107.2, 60 sec: 13107.2, 300 sec: 11107.8). Total num frames: 24510464. Throughput: 0: 3265.5. Samples: 6165628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:01:54,292][32845] Avg episode reward: [(0, '4.222')] -[2025-08-29 21:02:02,201][32845] Fps is (10 sec: 10152.6, 60 sec: 12500.9, 300 sec: 10999.3). Total num frames: 24576000. Throughput: 0: 2941.5. Samples: 6170716. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:02:02,202][32845] Avg episode reward: [(0, '4.350')] -[2025-08-29 21:02:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 12015.0, 300 sec: 10885.7). Total num frames: 24576000. Throughput: 0: 3040.5. Samples: 6182972. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:02:04,292][32845] Avg episode reward: [(0, '4.447')] -[2025-08-29 21:02:09,291][32845] Fps is (10 sec: 18487.1, 60 sec: 13107.2, 300 sec: 11377.9). Total num frames: 24707072. Throughput: 0: 3317.7. Samples: 6201924. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:02:09,292][32845] Avg episode reward: [(0, '4.379')] -[2025-08-29 21:02:14,291][32845] Fps is (10 sec: 19660.2, 60 sec: 13107.1, 300 sec: 11552.1). Total num frames: 24772608. Throughput: 0: 3245.2. Samples: 6216856. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:02:14,293][32845] Avg episode reward: [(0, '4.392')] -[2025-08-29 21:02:19,291][32845] Fps is (10 sec: 13106.9, 60 sec: 13107.2, 300 sec: 11552.1). Total num frames: 24838144. Throughput: 0: 3252.3. Samples: 6234672. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:02:19,292][32845] Avg episode reward: [(0, '4.192')] -[2025-08-29 21:02:22,656][48863] Updated weights for policy 0, policy_version 380 (0.0011) -[2025-08-29 21:02:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 13578.3, 300 sec: 11552.1). Total num frames: 24903680. Throughput: 0: 3333.4. Samples: 6261248. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:02:24,293][32845] Avg episode reward: [(0, '4.321')] -[2025-08-29 21:02:29,291][32845] Fps is (10 sec: 13107.5, 60 sec: 13107.2, 300 sec: 11774.3). Total num frames: 24969216. Throughput: 0: 3273.4. Samples: 6267312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:02:29,292][32845] Avg episode reward: [(0, '4.414')] -[2025-08-29 21:02:34,291][32845] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 11774.3). Total num frames: 25034752. Throughput: 0: 3415.5. Samples: 6296484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:02:34,293][32845] Avg episode reward: [(0, '4.263')] -[2025-08-29 21:02:39,292][32845] Fps is (10 sec: 13106.2, 60 sec: 13107.0, 300 sec: 11774.2). Total num frames: 25100288. Throughput: 0: 2974.4. Samples: 6299476. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 21:02:39,293][32845] Avg episode reward: [(0, '4.242')] -[2025-08-29 21:02:44,291][32845] Fps is (10 sec: 13107.0, 60 sec: 13107.2, 300 sec: 12082.2). Total num frames: 25165824. Throughput: 0: 3413.0. Samples: 6314368. Policy #0 lag: (min: 1.0, avg: 1.9, max: 2.0) -[2025-08-29 21:02:44,293][32845] Avg episode reward: [(0, '4.295')] -[2025-08-29 21:02:49,291][32845] Fps is (10 sec: 13108.2, 60 sec: 13107.2, 300 sec: 12218.6). Total num frames: 25231360. Throughput: 0: 3298.2. Samples: 6331392. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-08-29 21:02:49,293][32845] Avg episode reward: [(0, '4.477')] -[2025-08-29 21:02:54,291][32845] Fps is (10 sec: 13107.6, 60 sec: 13107.2, 300 sec: 12218.6). Total num frames: 25296896. Throughput: 0: 3457.0. Samples: 6357488. Policy #0 lag: (min: 1.0, avg: 1.9, max: 2.0) -[2025-08-29 21:02:54,292][32845] Avg episode reward: [(0, '4.359')] -[2025-08-29 21:02:59,291][32845] Fps is (10 sec: 13107.1, 60 sec: 13775.3, 300 sec: 12440.7). Total num frames: 25362432. Throughput: 0: 3312.0. Samples: 6365896. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-08-29 21:02:59,292][32845] Avg episode reward: [(0, '4.293')] -[2025-08-29 21:03:04,291][32845] Fps is (10 sec: 13107.1, 60 sec: 14199.5, 300 sec: 12440.7). Total num frames: 25427968. Throughput: 0: 3443.4. Samples: 6389624. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:03:04,292][32845] Avg episode reward: [(0, '4.205')] -[2025-08-29 21:03:09,291][32845] Fps is (10 sec: 13107.3, 60 sec: 13107.2, 300 sec: 12662.9). Total num frames: 25493504. Throughput: 0: 3227.2. Samples: 6406472. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:03:09,292][32845] Avg episode reward: [(0, '4.386')] -[2025-08-29 21:03:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 12015.0, 300 sec: 12440.7). Total num frames: 25493504. Throughput: 0: 3290.9. Samples: 6415404. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:03:14,293][32845] Avg episode reward: [(0, '4.395')] -[2025-08-29 21:03:16,222][48863] Updated weights for policy 0, policy_version 390 (0.0011) -[2025-08-29 21:03:19,292][32845] Fps is (10 sec: 6553.1, 60 sec: 12014.8, 300 sec: 12564.7). Total num frames: 25559040. Throughput: 0: 2932.1. Samples: 6428432. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0) -[2025-08-29 21:03:19,294][32845] Avg episode reward: [(0, '4.387')] -[2025-08-29 21:03:24,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12015.0, 300 sec: 12440.7). Total num frames: 25624576. Throughput: 0: 3216.5. Samples: 6444216. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:03:24,292][32845] Avg episode reward: [(0, '4.308')] -[2025-08-29 21:03:29,291][32845] Fps is (10 sec: 13107.9, 60 sec: 12014.9, 300 sec: 12440.7). Total num frames: 25690112. Throughput: 0: 3136.3. Samples: 6455500. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:03:29,292][32845] Avg episode reward: [(0, '4.283')] -[2025-08-29 21:03:34,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12015.0, 300 sec: 12440.7). Total num frames: 25755648. Throughput: 0: 3158.1. Samples: 6473508. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-08-29 21:03:34,292][32845] Avg episode reward: [(0, '4.511')] -[2025-08-29 21:03:35,911][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000394_25821184.pth... -[2025-08-29 21:03:36,054][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000394_25821184.pth -[2025-08-29 21:03:39,291][32845] Fps is (10 sec: 13107.5, 60 sec: 12015.1, 300 sec: 12440.7). Total num frames: 25821184. Throughput: 0: 2919.1. Samples: 6488848. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:03:39,292][32845] Avg episode reward: [(0, '4.703')] -[2025-08-29 21:03:44,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12015.0, 300 sec: 12440.7). Total num frames: 25886720. Throughput: 0: 2949.6. Samples: 6498628. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-08-29 21:03:44,292][32845] Avg episode reward: [(0, '4.492')] -[2025-08-29 21:03:49,705][32845] Fps is (10 sec: 12586.0, 60 sec: 11932.6, 300 sec: 12423.3). Total num frames: 25952256. Throughput: 0: 2730.4. Samples: 6513624. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0) -[2025-08-29 21:03:49,706][32845] Avg episode reward: [(0, '4.534')] -[2025-08-29 21:03:54,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 12440.7). Total num frames: 26017792. Throughput: 0: 2689.6. Samples: 6527504. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0) -[2025-08-29 21:03:54,293][32845] Avg episode reward: [(0, '4.351')] -[2025-08-29 21:03:59,291][32845] Fps is (10 sec: 13673.5, 60 sec: 12015.0, 300 sec: 12662.9). Total num frames: 26083328. Throughput: 0: 2826.9. Samples: 6542616. Policy #0 lag: (min: 1.0, avg: 1.8, max: 2.0) -[2025-08-29 21:03:59,292][32845] Avg episode reward: [(0, '4.306')] -[2025-08-29 21:04:04,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 12662.9). Total num frames: 26148864. Throughput: 0: 2948.1. Samples: 6561096. 
Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-08-29 21:04:04,292][32845] Avg episode reward: [(0, '4.422')]
-[2025-08-29 21:04:08,065][48863] Updated weights for policy 0, policy_version 400 (0.0014)
-[2025-08-29 21:04:09,291][32845] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 12662.9). Total num frames: 26214400. Throughput: 0: 3200.0. Samples: 6588216. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:04:09,292][32845] Avg episode reward: [(0, '4.431')]
-[2025-08-29 21:04:09,666][48846] Signal inference workers to stop experience collection... (400 times)
-[2025-08-29 21:04:09,677][48863] InferenceWorker_p0-w0: stopping experience collection (400 times)
-[2025-08-29 21:04:12,052][48846] Signal inference workers to resume experience collection... (400 times)
-[2025-08-29 21:04:12,052][48863] InferenceWorker_p0-w0: resuming experience collection (400 times)
-[2025-08-29 21:04:14,291][32845] Fps is (10 sec: 13107.4, 60 sec: 13107.2, 300 sec: 12662.9). Total num frames: 26279936. Throughput: 0: 3093.3. Samples: 6594696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:04:14,292][32845] Avg episode reward: [(0, '4.350')]
-[2025-08-29 21:04:19,291][32845] Fps is (10 sec: 13107.0, 60 sec: 13107.3, 300 sec: 12662.9). Total num frames: 26345472. Throughput: 0: 3230.9. Samples: 6618900. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:04:19,293][32845] Avg episode reward: [(0, '4.539')]
-[2025-08-29 21:04:25,537][32845] Fps is (10 sec: 5827.4, 60 sec: 11770.5, 300 sec: 12388.4). Total num frames: 26345472. Throughput: 0: 2989.7. Samples: 6627112. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:04:25,538][32845] Avg episode reward: [(0, '4.529')]
-[2025-08-29 21:04:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 12015.0, 300 sec: 12440.8). Total num frames: 26411008. Throughput: 0: 2926.8. Samples: 6630332. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:04:29,293][32845] Avg episode reward: [(0, '4.336')]
-[2025-08-29 21:04:34,291][32845] Fps is (10 sec: 14972.8, 60 sec: 12014.9, 300 sec: 12440.7). Total num frames: 26476544. Throughput: 0: 3239.1. Samples: 6658044. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:04:34,293][32845] Avg episode reward: [(0, '4.550')]
-[2025-08-29 21:04:39,291][32845] Fps is (10 sec: 13107.0, 60 sec: 12014.9, 300 sec: 12440.7). Total num frames: 26542080. Throughput: 0: 3323.7. Samples: 6677072. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:04:39,293][32845] Avg episode reward: [(0, '4.608')]
-[2025-08-29 21:04:44,291][32845] Fps is (10 sec: 13107.4, 60 sec: 12014.9, 300 sec: 12440.7). Total num frames: 26607616. Throughput: 0: 3282.1. Samples: 6690312. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:04:44,292][32845] Avg episode reward: [(0, '4.545')]
-[2025-08-29 21:04:49,291][32845] Fps is (10 sec: 13107.2, 60 sec: 12098.4, 300 sec: 12440.7). Total num frames: 26673152. Throughput: 0: 3240.0. Samples: 6706896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:04:49,293][32845] Avg episode reward: [(0, '4.400')]
-[2025-08-29 21:04:54,291][32845] Fps is (10 sec: 19660.5, 60 sec: 13107.2, 300 sec: 12662.9). Total num frames: 26804224. Throughput: 0: 3044.0. Samples: 6725196. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:04:54,293][32845] Avg episode reward: [(0, '4.394')]
-[2025-08-29 21:05:01,370][32845] Fps is (10 sec: 10851.7, 60 sec: 11612.6, 300 sec: 12353.7). Total num frames: 26804224. Throughput: 0: 3046.4. Samples: 6738116. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:05:01,371][32845] Avg episode reward: [(0, '4.539')]
-[2025-08-29 21:05:02,876][48863] Updated weights for policy 0, policy_version 410 (0.0014)
-[2025-08-29 21:05:04,291][32845] Fps is (10 sec: 6553.7, 60 sec: 12015.0, 300 sec: 12440.7). Total num frames: 26869760. Throughput: 0: 2898.4. Samples: 6749328. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:05:04,292][32845] Avg episode reward: [(0, '4.722')]
-[2025-08-29 21:05:09,291][32845] Fps is (10 sec: 16546.3, 60 sec: 12014.9, 300 sec: 12680.9). Total num frames: 26935296. Throughput: 0: 3159.9. Samples: 6765372. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:05:09,293][32845] Avg episode reward: [(0, '4.338')]
-[2025-08-29 21:05:14,291][32845] Fps is (10 sec: 6553.5, 60 sec: 10922.7, 300 sec: 12440.7). Total num frames: 26935296. Throughput: 0: 3213.3. Samples: 6774932. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:05:14,292][32845] Avg episode reward: [(0, '4.344')]
-[2025-08-29 21:05:19,291][32845] Fps is (10 sec: 6553.7, 60 sec: 10922.7, 300 sec: 12440.7). Total num frames: 27000832. Throughput: 0: 2831.9. Samples: 6785480. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:05:19,293][32845] Avg episode reward: [(0, '4.463')]
-[2025-08-29 21:05:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 11154.3, 300 sec: 12218.6). Total num frames: 27000832. Throughput: 0: 2717.5. Samples: 6799360. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:05:24,292][32845] Avg episode reward: [(0, '4.313')]
-[2025-08-29 21:05:29,291][32845] Fps is (10 sec: 6553.7, 60 sec: 10922.7, 300 sec: 11996.4). Total num frames: 27066368. Throughput: 0: 2465.6. Samples: 6801264. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:05:29,292][32845] Avg episode reward: [(0, '4.500')]
-[2025-08-29 21:05:37,203][32845] Fps is (10 sec: 5075.8, 60 sec: 9375.5, 300 sec: 11659.2). Total num frames: 27066368. Throughput: 0: 2273.8. Samples: 6815836. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-08-29 21:05:37,204][32845] Avg episode reward: [(0, '4.463')]
-[2025-08-29 21:05:38,268][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_27131904.pth...
-[2025-08-29 21:05:38,444][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_27131904.pth
-[2025-08-29 21:05:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 9830.4, 300 sec: 11774.3). Total num frames: 27131904. Throughput: 0: 2028.0. Samples: 6816456. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:05:39,292][32845] Avg episode reward: [(0, '4.544')]
-[2025-08-29 21:05:44,291][32845] Fps is (10 sec: 18490.9, 60 sec: 9830.4, 300 sec: 12047.9). Total num frames: 27197440. Throughput: 0: 1828.0. Samples: 6816576. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:05:44,292][32845] Avg episode reward: [(0, '4.726')]
-[2025-08-29 21:05:49,291][32845] Fps is (10 sec: 6553.7, 60 sec: 8738.2, 300 sec: 11774.3). Total num frames: 27197440. Throughput: 0: 1909.6. Samples: 6835260. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:05:49,293][32845] Avg episode reward: [(0, '4.459')]
-[2025-08-29 21:05:54,291][32845] Fps is (10 sec: 6553.7, 60 sec: 7645.9, 300 sec: 11774.3). Total num frames: 27262976. Throughput: 0: 1847.6. Samples: 6848512. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:05:54,292][32845] Avg episode reward: [(0, '4.532')]
-[2025-08-29 21:05:59,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7920.3, 300 sec: 11552.1). Total num frames: 27262976. Throughput: 0: 1817.8. Samples: 6856732. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:05:59,292][32845] Avg episode reward: [(0, '4.454')]
-[2025-08-29 21:06:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 11552.1). Total num frames: 27328512. Throughput: 0: 1788.6. Samples: 6865968. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 21:06:04,292][32845] Avg episode reward: [(0, '4.307')]
-[2025-08-29 21:06:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 11330.0). Total num frames: 27328512. Throughput: 0: 1818.7. Samples: 6881200. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 21:06:09,292][32845] Avg episode reward: [(0, '4.357')]
-[2025-08-29 21:06:14,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 11107.8). Total num frames: 27328512. Throughput: 0: 1781.1. Samples: 6881412. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 21:06:14,292][32845] Avg episode reward: [(0, '4.357')]
-[2025-08-29 21:06:19,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 11186.7). Total num frames: 27394048. Throughput: 0: 1745.7. Samples: 6889308. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:06:19,292][32845] Avg episode reward: [(0, '4.570')]
-[2025-08-29 21:06:24,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.9, 300 sec: 11107.8). Total num frames: 27459584. Throughput: 0: 1806.9. Samples: 6897768. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:06:24,292][32845] Avg episode reward: [(0, '4.427')]
-[2025-08-29 21:06:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 10885.6). Total num frames: 27459584. Throughput: 0: 2016.2. Samples: 6907304. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:06:29,292][32845] Avg episode reward: [(0, '4.498')]
-[2025-08-29 21:06:29,708][48863] Updated weights for policy 0, policy_version 420 (0.0012)
-[2025-08-29 21:06:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 8035.8, 300 sec: 10885.6). Total num frames: 27525120. Throughput: 0: 1821.9. Samples: 6917244. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:06:34,292][32845] Avg episode reward: [(0, '4.459')]
-[2025-08-29 21:06:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 10663.5). Total num frames: 27525120. Throughput: 0: 1821.9. Samples: 6930496. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:06:39,292][32845] Avg episode reward: [(0, '4.543')]
-[2025-08-29 21:06:44,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 10663.5). Total num frames: 27590656. Throughput: 0: 1759.5. Samples: 6935908. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:06:44,293][32845] Avg episode reward: [(0, '4.640')]
-[2025-08-29 21:06:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 10441.3). Total num frames: 27590656. Throughput: 0: 1783.2. Samples: 6946212. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:06:49,292][32845] Avg episode reward: [(0, '4.506')]
-[2025-08-29 21:06:54,291][32845] Fps is (10 sec: 6553.8, 60 sec: 6553.6, 300 sec: 10545.4). Total num frames: 27656192. Throughput: 0: 1578.0. Samples: 6952208. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
-[2025-08-29 21:06:54,292][32845] Avg episode reward: [(0, '4.258')]
-[2025-08-29 21:06:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 10663.5). Total num frames: 27721728. Throughput: 0: 1730.8. Samples: 6959300. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:06:59,292][32845] Avg episode reward: [(0, '4.459')]
-[2025-08-29 21:07:04,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 10219.2). Total num frames: 27721728. Throughput: 0: 1827.0. Samples: 6971524. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:07:04,293][32845] Avg episode reward: [(0, '4.264')]
-[2025-08-29 21:07:09,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 10219.2). Total num frames: 27787264. Throughput: 0: 1866.5. Samples: 6981760. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:07:09,292][32845] Avg episode reward: [(0, '4.606')]
-[2025-08-29 21:07:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9997.0). Total num frames: 27787264. Throughput: 0: 1828.1. Samples: 6989568. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:07:14,293][32845] Avg episode reward: [(0, '4.465')]
-[2025-08-29 21:07:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9997.0). Total num frames: 27852800. Throughput: 0: 1844.0. Samples: 7000224. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
-[2025-08-29 21:07:19,292][32845] Avg episode reward: [(0, '4.317')]
-[2025-08-29 21:07:24,705][32845] Fps is (10 sec: 6293.3, 60 sec: 6508.7, 300 sec: 9761.2). Total num frames: 27852800. Throughput: 0: 1724.9. Samples: 7008832. Policy #0 lag: (min: 1.0, avg: 1.8, max: 3.0)
-[2025-08-29 21:07:24,706][32845] Avg episode reward: [(0, '4.281')]
-[2025-08-29 21:07:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9774.9). Total num frames: 27918336. Throughput: 0: 1669.7. Samples: 7011044. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:07:29,294][32845] Avg episode reward: [(0, '4.406')]
-[2025-08-29 21:07:34,291][32845] Fps is (10 sec: 6836.4, 60 sec: 6553.6, 300 sec: 9552.7). Total num frames: 27918336. Throughput: 0: 1771.6. Samples: 7025936. Policy #0 lag: (min: 1.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:07:34,292][32845] Avg episode reward: [(0, '4.417')]
-[2025-08-29 21:07:36,587][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000427_27983872.pth...
-[2025-08-29 21:07:36,757][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000427_27983872.pth
-[2025-08-29 21:07:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9552.7). Total num frames: 27983872. Throughput: 0: 1722.2. Samples: 7029708. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:07:39,292][32845] Avg episode reward: [(0, '4.384')]
-[2025-08-29 21:07:44,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7645.9, 300 sec: 9552.7). Total num frames: 28049408. Throughput: 0: 1659.8. Samples: 7033992. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 21:07:44,292][32845] Avg episode reward: [(0, '4.527')]
-[2025-08-29 21:07:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9330.5). Total num frames: 28049408. Throughput: 0: 1858.9. Samples: 7055172. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 21:07:49,292][32845] Avg episode reward: [(0, '4.671')]
-[2025-08-29 21:07:54,291][32845] Fps is (10 sec: 6553.4, 60 sec: 7645.8, 300 sec: 9330.5). Total num frames: 28114944. Throughput: 0: 1898.3. Samples: 7067184. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:07:54,293][32845] Avg episode reward: [(0, '4.367')]
-[2025-08-29 21:08:00,533][32845] Fps is (10 sec: 5829.8, 60 sec: 6420.8, 300 sec: 9070.2). Total num frames: 28114944. Throughput: 0: 1858.6. Samples: 7075512. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:08:00,534][32845] Avg episode reward: [(0, '4.392')]
-[2025-08-29 21:08:04,020][48863] Updated weights for policy 0, policy_version 430 (0.0012)
-[2025-08-29 21:08:04,291][32845] Fps is (10 sec: 6553.8, 60 sec: 7645.9, 300 sec: 9108.4). Total num frames: 28180480. Throughput: 0: 1727.9. Samples: 7077980. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 21:08:04,292][32845] Avg episode reward: [(0, '4.431')]
-[2025-08-29 21:08:09,291][32845] Fps is (10 sec: 7482.5, 60 sec: 6553.6, 300 sec: 9108.4). Total num frames: 28180480. Throughput: 0: 1805.8. Samples: 7089348. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 21:08:09,292][32845] Avg episode reward: [(0, '4.413')]
-[2025-08-29 21:08:14,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 9108.4). Total num frames: 28246016. Throughput: 0: 1851.4. Samples: 7094356. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:08:14,292][32845] Avg episode reward: [(0, '4.536')]
-[2025-08-29 21:08:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8886.2). Total num frames: 28246016. Throughput: 0: 1837.5. Samples: 7108624. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:08:19,292][32845] Avg episode reward: [(0, '4.488')]
-[2025-08-29 21:08:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7699.0, 300 sec: 8886.2). Total num frames: 28311552. Throughput: 0: 1925.5. Samples: 7116356. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:08:24,292][32845] Avg episode reward: [(0, '4.205')]
-[2025-08-29 21:08:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8664.1). Total num frames: 28311552. Throughput: 0: 2036.6. Samples: 7125640. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:08:29,292][32845] Avg episode reward: [(0, '4.833')]
-[2025-08-29 21:08:36,372][32845] Fps is (10 sec: 5424.7, 60 sec: 7389.6, 300 sec: 8603.4). Total num frames: 28377088. Throughput: 0: 1677.9. Samples: 7134168. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:08:36,374][32845] Avg episode reward: [(0, '4.381')]
-[2025-08-29 21:08:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8441.9). Total num frames: 28377088. Throughput: 0: 1657.3. Samples: 7141764. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:08:39,292][32845] Avg episode reward: [(0, '4.378')]
-[2025-08-29 21:08:44,291][32845] Fps is (10 sec: 8275.8, 60 sec: 6553.6, 300 sec: 8453.8). Total num frames: 28442624. Throughput: 0: 1555.1. Samples: 7143560. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 21:08:44,292][32845] Avg episode reward: [(0, '4.278')]
-[2025-08-29 21:08:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 28442624. Throughput: 0: 1787.9. Samples: 7158436. Policy #0 lag: (min: 1.0, avg: 1.9, max: 3.0)
-[2025-08-29 21:08:49,292][32845] Avg episode reward: [(0, '4.183')]
-[2025-08-29 21:08:54,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 8219.8). Total num frames: 28508160. Throughput: 0: 1756.2. Samples: 7168376. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:08:54,292][32845] Avg episode reward: [(0, '4.347')]
-[2025-08-29 21:08:59,291][32845] Fps is (10 sec: 13107.2, 60 sec: 7807.4, 300 sec: 8219.8). Total num frames: 28573696. Throughput: 0: 1806.0. Samples: 7175628. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:08:59,293][32845] Avg episode reward: [(0, '4.472')]
-[2025-08-29 21:09:04,292][32845] Fps is (10 sec: 6553.1, 60 sec: 6553.5, 300 sec: 7997.6). Total num frames: 28573696. Throughput: 0: 1748.0. Samples: 7187284. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:04,293][32845] Avg episode reward: [(0, '4.474')]
-[2025-08-29 21:09:12,204][32845] Fps is (10 sec: 5075.2, 60 sec: 7291.8, 300 sec: 7919.4). Total num frames: 28639232. Throughput: 0: 1683.2. Samples: 7197004. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:12,205][32845] Avg episode reward: [(0, '4.415')]
-[2025-08-29 21:09:14,291][32845] Fps is (10 sec: 6554.1, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 28639232. Throughput: 0: 1597.2. Samples: 7197512. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:14,292][32845] Avg episode reward: [(0, '4.348')]
-[2025-08-29 21:09:19,292][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 7808.4). Total num frames: 28639232. Throughput: 0: 1745.0. Samples: 7209064. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:19,294][32845] Avg episode reward: [(0, '4.580')]
-[2025-08-29 21:09:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7775.5). Total num frames: 28704768. Throughput: 0: 1759.6. Samples: 7220944. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:24,292][32845] Avg episode reward: [(0, '4.325')]
-[2025-08-29 21:09:29,291][32845] Fps is (10 sec: 13107.6, 60 sec: 7645.9, 300 sec: 7775.5). Total num frames: 28770304. Throughput: 0: 1802.9. Samples: 7224692. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:29,292][32845] Avg episode reward: [(0, '4.486')]
-[2025-08-29 21:09:34,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6789.1, 300 sec: 7553.3). Total num frames: 28770304. Throughput: 0: 1753.6. Samples: 7237348. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:34,292][32845] Avg episode reward: [(0, '4.622')]
-[2025-08-29 21:09:38,915][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_28835840.pth...
-[2025-08-29 21:09:38,916][48863] Updated weights for policy 0, policy_version 440 (0.0012)
-[2025-08-29 21:09:39,067][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000440_28835840.pth
-[2025-08-29 21:09:39,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7553.3). Total num frames: 28835840. Throughput: 0: 1632.4. Samples: 7241832. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:39,292][32845] Avg episode reward: [(0, '4.569')]
-[2025-08-29 21:09:44,291][32845] Fps is (10 sec: 13107.1, 60 sec: 7645.9, 300 sec: 7553.3). Total num frames: 28901376. Throughput: 0: 1491.5. Samples: 7242744. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:44,292][32845] Avg episode reward: [(0, '4.509')]
-[2025-08-29 21:09:49,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 28901376. Throughput: 0: 1329.6. Samples: 7247116. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:49,292][32845] Avg episode reward: [(0, '4.470')]
-[2025-08-29 21:09:54,291][32845] Fps is (10 sec: 0.0, 60 sec: 6553.6, 300 sec: 7159.4). Total num frames: 28901376. Throughput: 0: 1626.3. Samples: 7265448. Policy #0 lag: (min: 1.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:54,292][32845] Avg episode reward: [(0, '4.543')]
-[2025-08-29 21:09:59,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 7109.0). Total num frames: 28966912. Throughput: 0: 1693.6. Samples: 7273724. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:09:59,293][32845] Avg episode reward: [(0, '4.395')]
-[2025-08-29 21:10:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.7, 300 sec: 6886.8). Total num frames: 28966912. Throughput: 0: 1668.5. Samples: 7284144. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:10:04,292][32845] Avg episode reward: [(0, '4.433')]
-[2025-08-29 21:10:09,291][32845] Fps is (10 sec: 6553.7, 60 sec: 6888.0, 300 sec: 7109.0). Total num frames: 29032448. Throughput: 0: 1571.6. Samples: 7291664. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 21:10:09,292][32845] Avg episode reward: [(0, '4.392')]
-[2025-08-29 21:10:14,291][32845] Fps is (10 sec: 6553.5, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 29032448. Throughput: 0: 1669.5. Samples: 7299820. Policy #0 lag: (min: 1.0, avg: 1.1, max: 3.0)
-[2025-08-29 21:10:14,292][32845] Avg episode reward: [(0, '4.491')]
-[2025-08-29 21:10:19,291][32845] Fps is (10 sec: 6553.6, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 29097984. Throughput: 0: 1555.7. Samples: 7307356. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:10:19,293][32845] Avg episode reward: [(0, '4.384')]
-[2025-08-29 21:10:24,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 6886.8). Total num frames: 29097984. Throughput: 0: 1717.8. Samples: 7319132. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:10:24,292][32845] Avg episode reward: [(0, '4.352')]
-[2025-08-29 21:10:29,291][32845] Fps is (10 sec: 6553.6, 60 sec: 6553.6, 300 sec: 7179.9). Total num frames: 29163520. Throughput: 0: 1797.9. Samples: 7323648. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:10:29,292][32845] Avg episode reward: [(0, '4.619')]
-[2025-08-29 21:10:34,291][32845] Fps is (10 sec: 13106.9, 60 sec: 7645.8, 300 sec: 7109.0). Total num frames: 29229056. Throughput: 0: 2040.6. Samples: 7338944. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:10:34,293][32845] Avg episode reward: [(0, '4.339')]
-[2025-08-29 21:10:39,291][32845] Fps is (10 sec: 13107.3, 60 sec: 7645.9, 300 sec: 7109.0). Total num frames: 29294592. Throughput: 0: 2149.0. Samples: 7362152. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:10:39,292][32845] Avg episode reward: [(0, '4.561')]
-[2025-08-29 21:10:44,291][32845] Fps is (10 sec: 13107.5, 60 sec: 7645.9, 300 sec: 7331.1). Total num frames: 29360128. Throughput: 0: 2055.7. Samples: 7366232. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 21:10:44,292][32845] Avg episode reward: [(0, '4.434')]
-[2025-08-29 21:10:49,291][32845] Fps is (10 sec: 13107.1, 60 sec: 8738.1, 300 sec: 7331.1). Total num frames: 29425664. Throughput: 0: 2334.2. Samples: 7389184. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:10:49,293][32845] Avg episode reward: [(0, '4.533')]
-[2025-08-29 21:10:52,572][48863] Updated weights for policy 0, policy_version 450 (0.0014)
-[2025-08-29 21:10:54,291][32845] Fps is (10 sec: 13107.2, 60 sec: 9830.4, 300 sec: 7553.3). Total num frames: 29491200. Throughput: 0: 2621.3. Samples: 7409624. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 21:10:54,292][32845] Avg episode reward: [(0, '4.377')]
-[2025-08-29 21:10:54,541][48846] Signal inference workers to stop experience collection... (450 times)
-[2025-08-29 21:10:54,548][48863] InferenceWorker_p0-w0: stopping experience collection (450 times)
-[2025-08-29 21:10:59,706][32845] Fps is (10 sec: 6292.3, 60 sec: 8678.1, 300 sec: 7320.8). Total num frames: 29491200. Throughput: 0: 2417.8. Samples: 7409624. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 21:10:59,707][32845] Avg episode reward: [(0, '4.367')]
-[2025-08-29 21:11:00,587][48846] Signal inference workers to resume experience collection... (450 times)
-[2025-08-29 21:11:00,588][48863] InferenceWorker_p0-w0: resuming experience collection (450 times)
-[2025-08-29 21:11:04,291][32845] Fps is (10 sec: 6553.6, 60 sec: 9830.4, 300 sec: 7553.3). Total num frames: 29556736. Throughput: 0: 2606.8. Samples: 7424660. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:11:04,293][32845] Avg episode reward: [(0, '4.346')]
-[2025-08-29 21:11:09,291][32845] Fps is (10 sec: 13675.1, 60 sec: 9830.4, 300 sec: 7775.5). Total num frames: 29622272. Throughput: 0: 2863.1. Samples: 7447972. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:11:09,292][32845] Avg episode reward: [(0, '4.230')]
-[2025-08-29 21:11:14,291][32845] Fps is (10 sec: 13107.0, 60 sec: 10922.6, 300 sec: 7775.5). Total num frames: 29687808. Throughput: 0: 2863.2. Samples: 7452492. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 21:11:14,294][32845] Avg episode reward: [(0, '4.584')]
-[2025-08-29 21:11:19,291][32845] Fps is (10 sec: 13107.0, 60 sec: 10922.7, 300 sec: 7775.5). Total num frames: 29753344. Throughput: 0: 3062.4. Samples: 7476752. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:11:19,293][32845] Avg episode reward: [(0, '4.544')]
-[2025-08-29 21:11:24,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 7997.6). Total num frames: 29818880. Throughput: 0: 2975.0. Samples: 7496028. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-08-29 21:11:24,293][32845] Avg episode reward: [(0, '4.573')]
-[2025-08-29 21:11:29,291][32845] Fps is (10 sec: 13107.3, 60 sec: 12014.9, 300 sec: 7997.6). Total num frames: 29884416. Throughput: 0: 3058.7. Samples: 7503872. Policy #0 lag: (min: 2.0, avg: 2.0, max: 2.0)
-[2025-08-29 21:11:29,292][32845] Avg episode reward: [(0, '4.468')]
-[2025-08-29 21:11:35,532][32845] Fps is (10 sec: 11660.6, 60 sec: 11771.6, 300 sec: 8185.3). Total num frames: 29949952. Throughput: 0: 2691.0. Samples: 7513616. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:11:35,533][32845] Avg episode reward: [(0, '4.445')]
-[2025-08-29 21:11:39,291][32845] Fps is (10 sec: 6553.5, 60 sec: 10922.6, 300 sec: 7997.6). Total num frames: 29949952. Throughput: 0: 2721.6. Samples: 7532096. Policy #0 lag: (min: 2.0, avg: 2.0, max: 3.0)
-[2025-08-29 21:11:39,292][32845] Avg episode reward: [(0, '4.348')]
-[2025-08-29 21:11:40,352][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000458_30015488.pth...
-[2025-08-29 21:11:40,503][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000458_30015488.pth
-[2025-08-29 21:11:44,168][48846] Stopping Batcher_0...
-[2025-08-29 21:11:44,169][48846] Loop batcher_evt_loop terminating...
-[2025-08-29 21:11:44,177][32845] Component Batcher_0 stopped!
-[2025-08-29 21:11:44,314][48863] Weights refcount: 2 0
-[2025-08-29 21:11:44,341][32845] Component RolloutWorker_w9 stopped!
-[2025-08-29 21:11:44,341][48881] Stopping RolloutWorker_w9...
-[2025-08-29 21:11:44,344][48881] Loop rollout_proc9_evt_loop terminating...
-[2025-08-29 21:11:44,348][48863] Stopping InferenceWorker_p0-w0...
-[2025-08-29 21:11:44,349][48863] Loop inference_proc0-0_evt_loop terminating...
-[2025-08-29 21:11:44,349][32845] Component InferenceWorker_p0-w0 stopped!
-[2025-08-29 21:11:44,400][32845] Component RolloutWorker_w4 stopped!
-[2025-08-29 21:11:44,402][32845] Component RolloutWorker_w3 stopped!
-[2025-08-29 21:11:44,403][48867] Stopping RolloutWorker_w3...
-[2025-08-29 21:11:44,401][48878] Stopping RolloutWorker_w4...
-[2025-08-29 21:11:44,405][48867] Loop rollout_proc3_evt_loop terminating...
-[2025-08-29 21:11:44,405][48878] Loop rollout_proc4_evt_loop terminating...
-[2025-08-29 21:11:44,409][32845] Component RolloutWorker_w1 stopped!
-[2025-08-29 21:11:44,410][48865] Stopping RolloutWorker_w1...
-[2025-08-29 21:11:44,418][32845] Component RolloutWorker_w2 stopped!
-[2025-08-29 21:11:44,419][48866] Stopping RolloutWorker_w2...
-[2025-08-29 21:11:44,427][48865] Loop rollout_proc1_evt_loop terminating...
-[2025-08-29 21:11:44,427][32845] Component RolloutWorker_w8 stopped!
-[2025-08-29 21:11:44,428][48866] Loop rollout_proc2_evt_loop terminating...
-[2025-08-29 21:11:44,424][48882] Stopping RolloutWorker_w8...
-[2025-08-29 21:11:44,423][48879] Stopping RolloutWorker_w6...
-[2025-08-29 21:11:44,428][32845] Component RolloutWorker_w6 stopped!
-[2025-08-29 21:11:44,430][48882] Loop rollout_proc8_evt_loop terminating...
-[2025-08-29 21:11:44,430][48880] Stopping RolloutWorker_w7...
-[2025-08-29 21:11:44,430][32845] Component RolloutWorker_w7 stopped!
-[2025-08-29 21:11:44,432][48879] Loop rollout_proc6_evt_loop terminating...
-[2025-08-29 21:11:44,432][48880] Loop rollout_proc7_evt_loop terminating...
-[2025-08-29 21:11:44,436][32845] Component RolloutWorker_w0 stopped!
-[2025-08-29 21:11:44,436][48864] Stopping RolloutWorker_w0...
-[2025-08-29 21:11:44,440][48864] Loop rollout_proc0_evt_loop terminating...
-[2025-08-29 21:11:44,441][32845] Component RolloutWorker_w5 stopped!
-[2025-08-29 21:11:44,441][48868] Stopping RolloutWorker_w5...
-[2025-08-29 21:11:44,444][48868] Loop rollout_proc5_evt_loop terminating...
-[2025-08-29 21:11:48,749][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000460_30146560.pth...
-[2025-08-29 21:11:48,862][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000460_30146560.pth
-[2025-08-29 21:11:48,868][48846] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000460_30146560.pth...
-[2025-08-29 21:11:48,917][48846] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000460_30146560.pth
-[2025-08-29 21:11:48,923][48846] Stopping LearnerWorker_p0...
-[2025-08-29 21:11:48,924][48846] Loop learner_proc0_evt_loop terminating...
-[2025-08-29 21:11:48,923][32845] Component LearnerWorker_p0 stopped!
-[2025-08-29 21:11:48,926][32845] Waiting for process learner_proc0 to stop...
-[2025-08-29 21:11:51,267][32845] Waiting for process inference_proc0-0 to join...
-[2025-08-29 21:11:51,268][32845] Waiting for process rollout_proc0 to join...
-[2025-08-29 21:11:51,269][32845] Waiting for process rollout_proc1 to join...
-[2025-08-29 21:11:51,270][32845] Waiting for process rollout_proc2 to join...
-[2025-08-29 21:11:51,270][32845] Waiting for process rollout_proc3 to join...
-[2025-08-29 21:11:51,271][32845] Waiting for process rollout_proc4 to join...
-[2025-08-29 21:11:51,272][32845] Waiting for process rollout_proc5 to join...
-[2025-08-29 21:11:51,273][32845] Waiting for process rollout_proc6 to join...
-[2025-08-29 21:11:51,273][32845] Waiting for process rollout_proc7 to join...
-[2025-08-29 21:11:51,274][32845] Waiting for process rollout_proc8 to join...
-[2025-08-29 21:11:51,275][32845] Waiting for process rollout_proc9 to join...
-[2025-08-29 21:11:51,275][32845] Batcher 0 profile tree view:
-batching: 730.9102, releasing_batches: 1728.5482
-[2025-08-29 21:11:51,277][32845] InferenceWorker_p0-w0 profile tree view:
-wait_policy: 0.0000
- wait_policy_total: 470.2164
-update_model: 12.8932
- weight_update: 0.0017
-one_step: 0.0051
- handle_policy_step: 2415.5070
- deserialize: 56.2736, stack: 5.6245, obs_to_device_normalize: 490.7794, forward: 429.3519, send_messages: 69.4829
- prepare_outputs: 1323.5790
- to_cpu: 1281.2401
-[2025-08-29 21:11:51,278][32845] Learner 0 profile tree view:
-misc: 0.0014, prepare_batch: 292.6362
-train: 3622.8606
- epoch_init: 0.0015, minibatch_init: 0.0033, losses_postprocess: 1.9000, kl_divergence: 2.4252, after_optimizer: 1637.0893
- calculate_losses: 1660.7779
- losses_init: 0.0011, forward_head: 56.5466, bptt_initial: 1525.7658, tail: 1.5539, advantages_returns: 0.1239, losses: 73.7921
- bptt: 2.8815
- bptt_forward_core: 2.7329
- update: 320.4680
- clip: 10.6376
-[2025-08-29 21:11:51,279][32845] RolloutWorker_w0 profile tree view:
-wait_for_trajectories: 0.2121, enqueue_policy_requests: 28.9411, env_step: 416.1793, overhead: 25.9884, complete_rollouts: 0.3510
-save_policy_outputs: 33.6680
- split_output_tensors: 11.5176
-[2025-08-29 21:11:51,280][32845] RolloutWorker_w9 profile tree view:
-wait_for_trajectories: 0.2356, enqueue_policy_requests: 37.8527, env_step: 450.9692, overhead: 26.6845, complete_rollouts: 0.3988
-save_policy_outputs: 38.9512
- split_output_tensors: 11.9529
-[2025-08-29 21:11:51,281][32845] Loop Runner_EvtLoop terminating...
-[2025-08-29 21:11:51,283][32845] Runner profile tree view:
-main_loop: 3978.5627
-[2025-08-29 21:11:51,283][32845] Collected {0: 30146560}, FPS: 7577.2
-[2025-08-30 00:42:31,720][32845] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-30 00:42:31,722][32845] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-30 00:42:31,723][32845] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-30 00:42:31,724][32845] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-30 00:42:31,725][32845] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-30 00:42:31,725][32845] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-30 00:42:31,726][32845] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-30 00:42:31,727][32845] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-30 00:42:31,728][32845] Adding new argument 'push_to_hub'=False that is not in the saved config file!
-[2025-08-30 00:42:31,728][32845] Adding new argument 'hf_repository'=None that is not in the saved config file!
-[2025-08-30 00:42:31,729][32845] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-30 00:42:31,730][32845] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-30 00:42:31,731][32845] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-30 00:42:31,731][32845] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-30 00:42:31,732][32845] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-30 00:42:31,793][32845] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-08-30 00:42:31,799][32845] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-30 00:42:31,813][32845] RunningMeanStd input shape: (1,)
-[2025-08-30 00:42:31,917][32845] ConvEncoder: input_channels=3
-[2025-08-30 00:42:32,483][32845] Conv encoder output size: 512
-[2025-08-30 00:42:32,485][32845] Policy head output size: 512
-[2025-08-30 00:42:34,183][32845] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-30 00:42:35,993][32845] Num frames 100...
-[2025-08-30 00:42:36,123][32845] Num frames 200...
-[2025-08-30 00:42:36,237][32845] Num frames 300...
-[2025-08-30 00:42:36,392][32845] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
-[2025-08-30 00:42:36,393][32845] Avg episode reward: 3.840, avg true_objective: 3.840
-[2025-08-30 00:42:36,420][32845] Num frames 400...
-[2025-08-30 00:42:36,540][32845] Num frames 500...
-[2025-08-30 00:42:36,664][32845] Num frames 600...
-[2025-08-30 00:42:36,783][32845] Num frames 700...
-[2025-08-30 00:42:36,920][32845] Num frames 800...
-[2025-08-30 00:42:36,971][32845] Avg episode rewards: #0: 5.000, true rewards: #0: 4.000
-[2025-08-30 00:42:36,972][32845] Avg episode reward: 5.000, avg true_objective: 4.000
-[2025-08-30 00:42:37,091][32845] Num frames 900...
-[2025-08-30 00:42:37,208][32845] Num frames 1000...
-[2025-08-30 00:42:37,332][32845] Num frames 1100...
-[2025-08-30 00:42:37,478][32845] Avg episode rewards: #0: 4.613, true rewards: #0: 3.947
-[2025-08-30 00:42:37,479][32845] Avg episode reward: 4.613, avg true_objective: 3.947
-[2025-08-30 00:42:37,506][32845] Num frames 1200...
-[2025-08-30 00:42:37,613][32845] Num frames 1300...
-[2025-08-30 00:42:37,733][32845] Num frames 1400...
-[2025-08-30 00:42:37,849][32845] Num frames 1500...
-[2025-08-30 00:42:37,968][32845] Num frames 1600...
-[2025-08-30 00:42:38,093][32845] Num frames 1700...
-[2025-08-30 00:42:38,189][32845] Avg episode rewards: #0: 5.320, true rewards: #0: 4.320
-[2025-08-30 00:42:38,190][32845] Avg episode reward: 5.320, avg true_objective: 4.320
-[2025-08-30 00:42:38,277][32845] Num frames 1800...
-[2025-08-30 00:42:38,382][32845] Num frames 1900...
-[2025-08-30 00:42:38,496][32845] Num frames 2000...
-[2025-08-30 00:42:38,605][32845] Num frames 2100...
-[2025-08-30 00:42:38,679][32845] Avg episode rewards: #0: 5.024, true rewards: #0: 4.224
-[2025-08-30 00:42:38,680][32845] Avg episode reward: 5.024, avg true_objective: 4.224
-[2025-08-30 00:42:38,790][32845] Num frames 2200...
-[2025-08-30 00:42:38,910][32845] Num frames 2300...
-[2025-08-30 00:42:39,026][32845] Num frames 2400...
-[2025-08-30 00:42:39,199][32845] Avg episode rewards: #0: 4.827, true rewards: #0: 4.160
-[2025-08-30 00:42:39,200][32845] Avg episode reward: 4.827, avg true_objective: 4.160
-[2025-08-30 00:42:39,206][32845] Num frames 2500...
-[2025-08-30 00:42:39,332][32845] Num frames 2600...
-[2025-08-30 00:42:39,448][32845] Num frames 2700...
-[2025-08-30 00:42:39,565][32845] Num frames 2800...
-[2025-08-30 00:42:39,701][32845] Num frames 2900...
-[2025-08-30 00:42:39,817][32845] Avg episode rewards: #0: 4.920, true rewards: #0: 4.206
-[2025-08-30 00:42:39,818][32845] Avg episode reward: 4.920, avg true_objective: 4.206
-[2025-08-30 00:42:39,897][32845] Num frames 3000...
-[2025-08-30 00:42:40,029][32845] Num frames 3100...
-[2025-08-30 00:42:40,162][32845] Num frames 3200...
-[2025-08-30 00:42:40,292][32845] Num frames 3300...
-[2025-08-30 00:42:40,425][32845] Avg episode rewards: #0: 4.825, true rewards: #0: 4.200
-[2025-08-30 00:42:40,426][32845] Avg episode reward: 4.825, avg true_objective: 4.200
-[2025-08-30 00:42:40,487][32845] Num frames 3400...
-[2025-08-30 00:42:40,657][32845] Num frames 3500...
-[2025-08-30 00:42:40,810][32845] Num frames 3600...
-[2025-08-30 00:42:40,888][32845] Avg episode rewards: #0: 4.573, true rewards: #0: 4.018
-[2025-08-30 00:42:40,889][32845] Avg episode reward: 4.573, avg true_objective: 4.018
-[2025-08-30 00:42:41,013][32845] Num frames 3700...
-[2025-08-30 00:42:41,172][32845] Num frames 3800...
-[2025-08-30 00:42:41,324][32845] Num frames 3900...
-[2025-08-30 00:42:41,525][32845] Num frames 4000...
-[2025-08-30 00:42:41,825][32845] Num frames 4100...
-[2025-08-30 00:42:42,045][32845] Avg episode rewards: #0: 4.860, true rewards: #0: 4.160
-[2025-08-30 00:42:42,046][32845] Avg episode reward: 4.860, avg true_objective: 4.160
-[2025-08-30 00:42:47,486][32845] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
-[2025-08-30 00:42:49,617][32845] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-30 00:42:49,618][32845] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-30 00:42:49,619][32845] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-30 00:42:49,621][32845] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-30 00:42:49,621][32845] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-30 00:42:49,622][32845] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-30 00:42:49,623][32845] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-30 00:42:49,623][32845] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-30 00:42:49,624][32845] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-30 00:42:49,625][32845] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-30 00:42:49,625][32845] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-30 00:42:49,626][32845] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-30 00:42:49,627][32845] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-30 00:42:49,628][32845] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-30 00:42:49,628][32845] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-30 00:42:49,641][32845] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-30 00:42:49,642][32845] RunningMeanStd input shape: (1,)
-[2025-08-30 00:42:49,650][32845] ConvEncoder: input_channels=3
-[2025-08-30 00:42:49,693][32845] Conv encoder output size: 512
-[2025-08-30 00:42:49,695][32845] Policy head output size: 512
-[2025-08-30 00:42:49,728][32845] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-30 00:42:50,305][32845] Num frames 100...
-[2025-08-30 00:42:50,620][32845] Num frames 200...
-[2025-08-30 00:42:50,906][32845] Num frames 300...
-[2025-08-30 00:42:51,190][32845] Num frames 400...
-[2025-08-30 00:42:51,391][32845] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
-[2025-08-30 00:42:51,392][32845] Avg episode reward: 5.480, avg true_objective: 4.480
-[2025-08-30 00:42:51,545][32845] Num frames 500...
-[2025-08-30 00:42:51,833][32845] Num frames 600...
-[2025-08-30 00:42:52,110][32845] Num frames 700...
-[2025-08-30 00:42:52,398][32845] Num frames 800...
-[2025-08-30 00:42:52,542][32845] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160
-[2025-08-30 00:42:52,543][32845] Avg episode reward: 4.660, avg true_objective: 4.160
-[2025-08-30 00:42:52,738][32845] Num frames 900...
-[2025-08-30 00:42:53,026][32845] Num frames 1000...
-[2025-08-30 00:42:53,323][32845] Avg episode rewards: #0: 3.960, true rewards: #0: 3.627
-[2025-08-30 00:42:53,324][32845] Avg episode reward: 3.960, avg true_objective: 3.627
-[2025-08-30 00:42:53,359][32845] Num frames 1100...
-[2025-08-30 00:42:53,632][32845] Num frames 1200...
-[2025-08-30 00:42:53,906][32845] Num frames 1300...
-[2025-08-30 00:42:54,179][32845] Num frames 1400...
-[2025-08-30 00:42:54,430][32845] Avg episode rewards: #0: 3.930, true rewards: #0: 3.680
-[2025-08-30 00:42:54,431][32845] Avg episode reward: 3.930, avg true_objective: 3.680
-[2025-08-30 00:42:54,511][32845] Num frames 1500...
-[2025-08-30 00:42:54,794][32845] Num frames 1600...
-[2025-08-30 00:42:55,076][32845] Num frames 1700...
-[2025-08-30 00:42:55,342][32845] Num frames 1800...
-[2025-08-30 00:42:55,552][32845] Avg episode rewards: #0: 3.912, true rewards: #0: 3.712
-[2025-08-30 00:42:55,553][32845] Avg episode reward: 3.912, avg true_objective: 3.712
-[2025-08-30 00:42:55,683][32845] Num frames 1900...
-[2025-08-30 00:42:55,964][32845] Num frames 2000...
-[2025-08-30 00:42:56,251][32845] Num frames 2100...
-[2025-08-30 00:42:56,532][32845] Num frames 2200...
-[2025-08-30 00:42:56,709][32845] Avg episode rewards: #0: 3.900, true rewards: #0: 3.733
-[2025-08-30 00:42:56,711][32845] Avg episode reward: 3.900, avg true_objective: 3.733
-[2025-08-30 00:43:00,475][32845] Num frames 2300...
-[2025-08-30 00:43:00,781][32845] Num frames 2400...
-[2025-08-30 00:43:01,094][32845] Num frames 2500...
-[2025-08-30 00:43:01,420][32845] Num frames 2600...
-[2025-08-30 00:43:01,755][32845] Num frames 2700...
-[2025-08-30 00:43:02,071][32845] Avg episode rewards: #0: 4.406, true rewards: #0: 3.977
-[2025-08-30 00:43:02,072][32845] Avg episode reward: 4.406, avg true_objective: 3.977
-[2025-08-30 00:43:02,118][32845] Num frames 2800...
-[2025-08-30 00:43:02,407][32845] Num frames 2900...
-[2025-08-30 00:43:02,696][32845] Num frames 3000...
-[2025-08-30 00:43:02,978][32845] Num frames 3100...
-[2025-08-30 00:43:03,263][32845] Num frames 3200...
-[2025-08-30 00:43:03,407][32845] Avg episode rewards: #0: 4.540, true rewards: #0: 4.040
-[2025-08-30 00:43:03,408][32845] Avg episode reward: 4.540, avg true_objective: 4.040
-[2025-08-30 00:43:03,612][32845] Num frames 3300...
-[2025-08-30 00:43:03,898][32845] Num frames 3400...
-[2025-08-30 00:43:04,194][32845] Num frames 3500...
-[2025-08-30 00:43:04,481][32845] Num frames 3600...
-[2025-08-30 00:43:04,582][32845] Avg episode rewards: #0: 4.462, true rewards: #0: 4.018
-[2025-08-30 00:43:04,583][32845] Avg episode reward: 4.462, avg true_objective: 4.018
-[2025-08-30 00:43:04,837][32845] Num frames 3700...
-[2025-08-30 00:43:05,118][32845] Num frames 3800...
-[2025-08-30 00:43:05,388][32845] Num frames 3900...
-[2025-08-30 00:43:05,663][32845] Num frames 4000...
-[2025-08-30 00:43:05,895][32845] Avg episode rewards: #0: 4.564, true rewards: #0: 4.064
-[2025-08-30 00:43:05,896][32845] Avg episode reward: 4.564, avg true_objective: 4.064
-[2025-08-30 00:43:11,010][32845] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
-[2025-08-30 00:44:13,610][32845] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
-[2025-08-30 00:44:13,611][32845] Overriding arg 'num_workers' with value 1 passed from command line
-[2025-08-30 00:44:13,612][32845] Adding new argument 'no_render'=True that is not in the saved config file!
-[2025-08-30 00:44:13,612][32845] Adding new argument 'save_video'=True that is not in the saved config file!
-[2025-08-30 00:44:13,613][32845] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
-[2025-08-30 00:44:13,613][32845] Adding new argument 'video_name'=None that is not in the saved config file!
-[2025-08-30 00:44:13,614][32845] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
-[2025-08-30 00:44:13,615][32845] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
-[2025-08-30 00:44:13,616][32845] Adding new argument 'push_to_hub'=True that is not in the saved config file!
-[2025-08-30 00:44:13,617][32845] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
-[2025-08-30 00:44:13,617][32845] Adding new argument 'policy_index'=0 that is not in the saved config file!
-[2025-08-30 00:44:13,618][32845] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
-[2025-08-30 00:44:13,620][32845] Adding new argument 'train_script'=None that is not in the saved config file!
-[2025-08-30 00:44:13,620][32845] Adding new argument 'enjoy_script'=None that is not in the saved config file!
-[2025-08-30 00:44:13,621][32845] Using frameskip 1 and render_action_repeat=4 for evaluation
-[2025-08-30 00:44:13,645][32845] RunningMeanStd input shape: (3, 72, 128)
-[2025-08-30 00:44:13,646][32845] RunningMeanStd input shape: (1,)
-[2025-08-30 00:44:13,655][32845] ConvEncoder: input_channels=3
-[2025-08-30 00:44:13,681][32845] Conv encoder output size: 512
-[2025-08-30 00:44:13,682][32845] Policy head output size: 512
-[2025-08-30 00:44:13,714][32845] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth...
-[2025-08-30 00:44:14,296][32845] Num frames 100...
-[2025-08-30 00:44:14,589][32845] Num frames 200...
-[2025-08-30 00:44:14,858][32845] Num frames 300...
-[2025-08-30 00:44:14,923][32845] Avg episode rewards: #0: 3.020, true rewards: #0: 3.020
-[2025-08-30 00:44:14,926][32845] Avg episode reward: 3.020, avg true_objective: 3.020
-[2025-08-30 00:44:15,210][32845] Num frames 400...
-[2025-08-30 00:44:15,485][32845] Num frames 500...
-[2025-08-30 00:44:15,766][32845] Num frames 600...
-[2025-08-30 00:44:16,078][32845] Avg episode rewards: #0: 3.430, true rewards: #0: 3.430
-[2025-08-30 00:44:16,080][32845] Avg episode reward: 3.430, avg true_objective: 3.430
-[2025-08-30 00:44:16,116][32845] Num frames 700...
-[2025-08-30 00:44:16,399][32845] Num frames 800...
-[2025-08-30 00:44:16,680][32845] Num frames 900...
-[2025-08-30 00:44:16,962][32845] Num frames 1000...
-[2025-08-30 00:44:17,228][32845] Avg episode rewards: #0: 3.567, true rewards: #0: 3.567
-[2025-08-30 00:44:17,231][32845] Avg episode reward: 3.567, avg true_objective: 3.567
-[2025-08-30 00:44:17,322][32845] Num frames 1100...
-[2025-08-30 00:44:17,597][32845] Num frames 1200...
-[2025-08-30 00:44:17,897][32845] Num frames 1300...
-[2025-08-30 00:44:18,192][32845] Num frames 1400...
-[2025-08-30 00:44:18,400][32845] Avg episode rewards: #0: 3.635, true rewards: #0: 3.635
-[2025-08-30 00:44:18,402][32845] Avg episode reward: 3.635, avg true_objective: 3.635
-[2025-08-30 00:44:18,535][32845] Num frames 1500...
-[2025-08-30 00:44:18,821][32845] Num frames 1600...
-[2025-08-30 00:44:19,172][32845] Num frames 1700...
-[2025-08-30 00:44:19,634][32845] Num frames 1800...
-[2025-08-30 00:44:19,831][32845] Num frames 1900...
-[2025-08-30 00:44:20,052][32845] Avg episode rewards: #0: 4.396, true rewards: #0: 3.996
-[2025-08-30 00:44:20,053][32845] Avg episode reward: 4.396, avg true_objective: 3.996
-[2025-08-30 00:44:20,056][32845] Num frames 2000...
-[2025-08-30 00:44:20,220][32845] Num frames 2100...
-[2025-08-30 00:44:20,333][32845] Num frames 2200...
-[2025-08-30 00:44:20,450][32845] Num frames 2300...
-[2025-08-30 00:44:20,609][32845] Avg episode rewards: #0: 4.303, true rewards: #0: 3.970
-[2025-08-30 00:44:20,610][32845] Avg episode reward: 4.303, avg true_objective: 3.970
-[2025-08-30 00:44:20,632][32845] Num frames 2400...
-[2025-08-30 00:44:20,753][32845] Num frames 2500...
-[2025-08-30 00:44:20,883][32845] Num frames 2600...
-[2025-08-30 00:44:21,008][32845] Num frames 2700...
-[2025-08-30 00:44:21,146][32845] Num frames 2800...
-[2025-08-30 00:44:21,242][32845] Avg episode rewards: #0: 4.471, true rewards: #0: 4.043
-[2025-08-30 00:44:21,244][32845] Avg episode reward: 4.471, avg true_objective: 4.043
-[2025-08-30 00:44:21,344][32845] Num frames 2900...
-[2025-08-30 00:44:21,467][32845] Num frames 3000...
-[2025-08-30 00:44:21,587][32845] Num frames 3100...
-[2025-08-30 00:44:21,708][32845] Num frames 3200...
-[2025-08-30 00:44:21,783][32845] Avg episode rewards: #0: 4.393, true rewards: #0: 4.017
-[2025-08-30 00:44:21,784][32845] Avg episode reward: 4.393, avg true_objective: 4.017
-[2025-08-30 00:44:21,930][32845] Num frames 3300...
-[2025-08-30 00:44:22,085][32845] Num frames 3400...
-[2025-08-30 00:44:22,236][32845] Num frames 3500...
-[2025-08-30 00:44:22,413][32845] Num frames 3600...
-[2025-08-30 00:44:22,629][32845] Avg episode rewards: #0: 4.549, true rewards: #0: 4.104
-[2025-08-30 00:44:22,630][32845] Avg episode reward: 4.549, avg true_objective: 4.104
-[2025-08-30 00:44:22,640][32845] Num frames 3700...
-[2025-08-30 00:44:22,812][32845] Num frames 3800...
-[2025-08-30 00:44:22,970][32845] Num frames 3900...
-[2025-08-30 00:44:23,125][32845] Num frames 4000...
-[2025-08-30 00:44:23,298][32845] Avg episode rewards: #0: 4.478, true rewards: #0: 4.078
-[2025-08-30 00:44:23,299][32845] Avg episode reward: 4.478, avg true_objective: 4.078
-[2025-08-30 00:44:28,439][32845] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
+[2025-09-02 16:15:25,266][03375] Using optimizer
+[2025-09-02 16:15:29,946][03375] No checkpoints found
+[2025-09-02 16:15:29,946][03375] Did not load from checkpoint, starting from scratch!
+[2025-09-02 16:15:29,946][03375] Initialized policy 0 weights for model version 0
+[2025-09-02 16:15:29,949][03375] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-02 16:15:29,956][03375] LearnerWorker_p0 finished initialization!
+[2025-09-02 16:15:29,969][03057] Heartbeat connected on LearnerWorker_p0
+[2025-09-02 16:15:30,104][03390] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-02 16:15:30,106][03390] RunningMeanStd input shape: (1,)
+[2025-09-02 16:15:30,144][03390] ConvEncoder: input_channels=3
+[2025-09-02 16:15:30,450][03390] Conv encoder output size: 512
+[2025-09-02 16:15:30,455][03390] Policy head output size: 512
+[2025-09-02 16:15:30,602][03057] Inference worker 0-0 is ready!
+[2025-09-02 16:15:30,604][03057] All inference workers are ready! Signal rollout workers to start!
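The "Decorrelating experience for N frames..." lines that follow show each rollout worker stepping its environments for a different number of frames before learning begins, so that trajectories across workers are out of phase rather than all starting from identical episode boundaries. A minimal sketch of the idea in Python (illustrative only, not Sample Factory's actual implementation; the environment object and the 64-frame chunk size are assumptions):

    import gymnasium as gym

    def decorrelate_worker(env: gym.Env, worker_idx: int, chunk: int = 64, max_chunks: int = 7) -> None:
        # Each worker idles through a different number of random-action frames,
        # so rollouts across workers start at different points in an episode.
        env.reset()
        target_frames = (worker_idx % max_chunks) * chunk
        for frame in range(target_frames):
            if frame % chunk == 0:
                print(f"Decorrelating experience for {frame} frames...")
            _obs, _rew, terminated, truncated, _info = env.step(env.action_space.sample())
            if terminated or truncated:
                env.reset()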
+[2025-09-02 16:15:31,121][03400] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,120][03392] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,157][03393] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,158][03396] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,190][03398] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,567][03391] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,602][03397] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,660][03395] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,686][03399] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,698][03394] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 16:15:31,786][03057] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:15:34,222][03400] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:34,225][03393] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:34,225][03396] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:34,255][03391] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:34,274][03399] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:34,278][03394] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:35,020][03399] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:35,128][03391] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:35,792][03393] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:35,795][03396] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:35,797][03398] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:35,850][03392] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:36,144][03399] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:36,221][03395] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:36,606][03398] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:36,780][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:15:37,184][03392] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:37,267][03393] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:37,730][03391] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:37,732][03394] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:37,819][03397] Decorrelating experience for 0 frames...
+[2025-09-02 16:15:38,228][03395] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:38,426][03399] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:38,514][03392] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:38,841][03393] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:39,405][03400] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:39,570][03396] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:39,643][03397] Decorrelating experience for 64 frames...
+[2025-09-02 16:15:39,884][03391] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:40,230][03394] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:41,034][03395] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:41,120][03392] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:41,229][03400] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:41,385][03397] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:41,512][03393] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:15:42,036][03398] Decorrelating experience for 128 frames...
+[2025-09-02 16:15:42,798][03394] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:42,921][03395] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:42,949][03391] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:43,037][03396] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:43,269][03392] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:43,735][03397] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:44,699][03398] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:46,010][03399] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:46,151][03393] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:46,310][03392] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:46,764][03396] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:46,779][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:15:47,002][03391] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:47,497][03394] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:47,628][03395] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:49,207][03400] Decorrelating experience for 192 frames...
+[2025-09-02 16:15:49,666][03398] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:50,305][03399] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:50,372][03393] Decorrelating experience for 384 frames...
+[2025-09-02 16:15:50,435][03392] Decorrelating experience for 384 frames...
+[2025-09-02 16:15:50,767][03397] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:51,150][03391] Decorrelating experience for 384 frames...
+[2025-09-02 16:15:51,452][03394] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:15:52,039][03396] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:52,945][03399] Decorrelating experience for 384 frames...
+[2025-09-02 16:15:53,012][03398] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:53,016][03393] Decorrelating experience for 448 frames...
+[2025-09-02 16:15:53,272][03397] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:54,068][03391] Decorrelating experience for 448 frames...
+[2025-09-02 16:15:54,219][03400] Decorrelating experience for 256 frames...
+[2025-09-02 16:15:55,219][03396] Decorrelating experience for 384 frames...
+[2025-09-02 16:15:55,265][03394] Decorrelating experience for 384 frames...
+[2025-09-02 16:15:55,415][03392] Decorrelating experience for 448 frames...
+[2025-09-02 16:15:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:15:57,088][03398] Decorrelating experience for 384 frames...
+[2025-09-02 16:15:57,284][03399] Decorrelating experience for 448 frames...
+[2025-09-02 16:15:59,194][03395] Decorrelating experience for 320 frames...
+[2025-09-02 16:15:59,301][03397] Decorrelating experience for 384 frames...
+[2025-09-02 16:16:01,021][03396] Decorrelating experience for 448 frames...
+[2025-09-02 16:16:01,172][03400] Decorrelating experience for 320 frames...
+[2025-09-02 16:16:01,348][03394] Decorrelating experience for 448 frames...
+[2025-09-02 16:16:01,786][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 46.1. Samples: 1384. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:01,787][03057] Avg episode reward: [(0, '1.428')]
+[2025-09-02 16:16:03,253][03398] Decorrelating experience for 448 frames...
+[2025-09-02 16:16:05,648][03395] Decorrelating experience for 384 frames...
+[2025-09-02 16:16:06,073][03397] Decorrelating experience for 448 frames...
+[2025-09-02 16:16:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 117.1. Samples: 4096. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:06,782][03057] Avg episode reward: [(0, '2.169')]
+[2025-09-02 16:16:09,356][03400] Decorrelating experience for 384 frames...
+[2025-09-02 16:16:11,374][03395] Decorrelating experience for 448 frames...
+[2025-09-02 16:16:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 172.7. Samples: 6908. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:11,786][03057] Avg episode reward: [(0, '2.982')]
+[2025-09-02 16:16:14,304][03400] Decorrelating experience for 448 frames...
+[2025-09-02 16:16:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 299.5. Samples: 13476. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:16,781][03057] Avg episode reward: [(0, '4.096')]
+[2025-09-02 16:16:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 418.1. Samples: 18812. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:21,784][03057] Avg episode reward: [(0, '4.056')]
+[2025-09-02 16:16:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 460.0. Samples: 20700. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:26,782][03057] Avg episode reward: [(0, '4.086')]
+[2025-09-02 16:16:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 565.7. Samples: 25456. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:31,782][03057] Avg episode reward: [(0, '4.173')]
+[2025-09-02 16:16:36,779][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 647.6. Samples: 29140. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:36,782][03057] Avg episode reward: [(0, '4.269')]
+[2025-09-02 16:16:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 682.8. Samples: 30724. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:41,782][03057] Avg episode reward: [(0, '4.296')]
+[2025-09-02 16:16:46,525][03375] Signal inference workers to stop experience collection...
+[2025-09-02 16:16:46,564][03390] InferenceWorker_p0-w0: stopping experience collection
+[2025-09-02 16:16:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 758.7. Samples: 35520. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:46,781][03057] Avg episode reward: [(0, '4.319')]
+[2025-09-02 16:16:51,779][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 716.3. Samples: 36328. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:51,780][03057] Avg episode reward: [(0, '4.319')]
+[2025-09-02 16:16:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 653.8. Samples: 36328. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-02 16:16:56,780][03057] Avg episode reward: [(0, '4.319')]
+[2025-09-02 16:16:58,422][03375] Signal inference workers to resume experience collection...
+[2025-09-02 16:16:58,423][03390] InferenceWorker_p0-w0: resuming experience collection
+[2025-09-02 16:16:59,425][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_131072.pth...
+[2025-09-02 16:17:01,778][03057] Fps is (10 sec: 13108.3, 60 sec: 2184.8, 300 sec: 1456.5). Total num frames: 131072. Throughput: 0: 557.9. Samples: 38580. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2025-09-02 16:17:01,783][03057] Avg episode reward: [(0, '4.380')]
+[2025-09-02 16:17:01,786][03375] Saving new best policy, reward=4.380!
+[2025-09-02 16:17:06,778][03057] Fps is (10 sec: 13107.3, 60 sec: 2184.5, 300 sec: 1379.8). Total num frames: 131072. Throughput: 0: 593.9. Samples: 45536. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2025-09-02 16:17:06,780][03057] Avg episode reward: [(0, '4.532')]
+[2025-09-02 16:17:06,795][03375] Saving new best policy, reward=4.532!
+[2025-09-02 16:17:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 2184.5, 300 sec: 1310.8). Total num frames: 131072. Throughput: 0: 604.4. Samples: 47896. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2025-09-02 16:17:11,781][03057] Avg episode reward: [(0, '4.505')]
+[2025-09-02 16:17:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 3276.8, 300 sec: 1872.6). Total num frames: 196608. Throughput: 0: 641.2. Samples: 54308. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-02 16:17:16,784][03057] Avg episode reward: [(0, '4.429')]
+[2025-09-02 16:17:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 3276.8, 300 sec: 1787.5). Total num frames: 196608. Throughput: 0: 734.9. Samples: 62212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-02 16:17:21,782][03057] Avg episode reward: [(0, '4.361')]
+[2025-09-02 16:17:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 1709.8). Total num frames: 196608. Throughput: 0: 760.9. Samples: 64964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-02 16:17:26,780][03057] Avg episode reward: [(0, '4.432')]
+[2025-09-02 16:17:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 2184.7). Total num frames: 262144. Throughput: 0: 778.0. Samples: 70532. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:17:31,779][03057] Avg episode reward: [(0, '4.538')]
+[2025-09-02 16:17:31,786][03375] Saving new best policy, reward=4.538!
+[2025-09-02 16:17:36,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 2097.3). Total num frames: 262144. Throughput: 0: 927.8. Samples: 78080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:17:36,781][03057] Avg episode reward: [(0, '4.523')]
+[2025-09-02 16:17:41,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 2016.6). Total num frames: 262144. Throughput: 0: 1010.7. Samples: 81812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:17:41,785][03057] Avg episode reward: [(0, '4.455')]
+[2025-09-02 16:17:46,780][03057] Fps is (10 sec: 6552.5, 60 sec: 5461.2, 300 sec: 2427.4). Total num frames: 327680. Throughput: 0: 1068.9. Samples: 86684. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:17:46,781][03057] Avg episode reward: [(0, '4.487')]
+[2025-09-02 16:17:51,778][03057] Fps is (10 sec: 6555.4, 60 sec: 5461.4, 300 sec: 2340.7). Total num frames: 327680. Throughput: 0: 1085.1. Samples: 94364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:17:51,780][03057] Avg episode reward: [(0, '4.472')]
+[2025-09-02 16:17:56,783][03057] Fps is (10 sec: 0.0, 60 sec: 5460.9, 300 sec: 2259.9). Total num frames: 327680. Throughput: 0: 1120.1. Samples: 98304. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:17:56,787][03057] Avg episode reward: [(0, '4.383')]
+[2025-09-02 16:18:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 2621.6). Total num frames: 393216. Throughput: 0: 1101.7. Samples: 103884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:01,783][03057] Avg episode reward: [(0, '4.378')]
+[2025-09-02 16:18:06,778][03057] Fps is (10 sec: 6556.5, 60 sec: 4369.1, 300 sec: 2537.0). Total num frames: 393216. Throughput: 0: 1065.5. Samples: 110160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:06,782][03057] Avg episode reward: [(0, '4.530')]
+[2025-09-02 16:18:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 2457.7). Total num frames: 393216. Throughput: 0: 1092.4. Samples: 114120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:11,783][03057] Avg episode reward: [(0, '4.535')]
+[2025-09-02 16:18:16,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 2780.4). Total num frames: 458752. Throughput: 0: 1115.7. Samples: 120740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:16,783][03057] Avg episode reward: [(0, '4.414')]
+[2025-09-02 16:18:21,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 2698.6). Total num frames: 458752. Throughput: 0: 1050.4. Samples: 125352. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:21,781][03057] Avg episode reward: [(0, '4.209')]
+[2025-09-02 16:18:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 2621.6). Total num frames: 458752. Throughput: 0: 1023.0. Samples: 127844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:26,779][03057] Avg episode reward: [(0, '4.280')]
+[2025-09-02 16:18:31,780][03057] Fps is (10 sec: 6553.6, 60 sec: 4368.9, 300 sec: 2912.8). Total num frames: 524288. Throughput: 0: 1060.4. Samples: 134404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:31,784][03057] Avg episode reward: [(0, '4.352')]
+[2025-09-02 16:18:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 2834.1). Total num frames: 524288. Throughput: 0: 1008.1. Samples: 139728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:36,780][03057] Avg episode reward: [(0, '4.441')]
+[2025-09-02 16:18:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 2759.5). Total num frames: 524288. Throughput: 0: 982.8. Samples: 142524. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:41,782][03057] Avg episode reward: [(0, '4.666')]
+[2025-09-02 16:18:41,785][03375] Saving new best policy, reward=4.666!
+[2025-09-02 16:18:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 3024.9). Total num frames: 589824. Throughput: 0: 1026.8. Samples: 150092. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:46,781][03057] Avg episode reward: [(0, '4.531')]
+[2025-09-02 16:18:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 2949.2). Total num frames: 589824. Throughput: 0: 1032.8. Samples: 156636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:51,780][03057] Avg episode reward: [(0, '4.293')]
+[2025-09-02 16:18:56,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.4, 300 sec: 2877.3). Total num frames: 589824. Throughput: 0: 998.8. Samples: 159068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:18:56,780][03057] Avg episode reward: [(0, '4.278')]
+[2025-09-02 16:18:56,787][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000009_589824.pth...
+[2025-09-02 16:19:01,155][03390] Updated weights for policy 0, policy_version 10 (0.0013)
+[2025-09-02 16:19:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3120.9). Total num frames: 655360. Throughput: 0: 1005.8. Samples: 166000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:01,780][03057] Avg episode reward: [(0, '4.425')]
+[2025-09-02 16:19:06,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 3048.3). Total num frames: 655360. Throughput: 0: 1057.0. Samples: 172916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:06,785][03057] Avg episode reward: [(0, '4.460')]
+[2025-09-02 16:19:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 2979.0). Total num frames: 655360. Throughput: 0: 1058.4. Samples: 175472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:11,783][03057] Avg episode reward: [(0, '4.475')]
+[2025-09-02 16:19:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 3204.1). Total num frames: 720896. Throughput: 0: 1054.4. Samples: 181852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:16,782][03057] Avg episode reward: [(0, '4.528')]
+[2025-09-02 16:19:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 3134.4). Total num frames: 720896. Throughput: 0: 1099.8. Samples: 189220. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:21,780][03057] Avg episode reward: [(0, '4.514')]
+[2025-09-02 16:19:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3067.7). Total num frames: 720896. Throughput: 0: 1105.2. Samples: 192256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:26,782][03057] Avg episode reward: [(0, '4.430')]
+[2025-09-02 16:19:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 3003.8). Total num frames: 720896. Throughput: 0: 1059.1. Samples: 197752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:31,779][03057] Avg episode reward: [(0, '4.357')]
+[2025-09-02 16:19:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3210.0). Total num frames: 786432. Throughput: 0: 1075.6. Samples: 205036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:36,779][03057] Avg episode reward: [(0, '4.364')]
+[2025-09-02 16:19:41,781][03057] Fps is (10 sec: 6551.9, 60 sec: 4368.9, 300 sec: 3145.8). Total num frames: 786432. Throughput: 0: 1109.4. Samples: 208992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:41,784][03057] Avg episode reward: [(0, '4.426')]
+[2025-09-02 16:19:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3084.1). Total num frames: 786432. Throughput: 0: 1069.2. Samples: 214116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:46,779][03057] Avg episode reward: [(0, '4.397')]
+[2025-09-02 16:19:51,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 3276.9). Total num frames: 851968. Throughput: 0: 1070.9. Samples: 221108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:51,782][03057] Avg episode reward: [(0, '4.307')]
+[2025-09-02 16:19:56,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4369.0, 300 sec: 3215.0). Total num frames: 851968. Throughput: 0: 1101.4. Samples: 225036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:19:56,790][03057] Avg episode reward: [(0, '4.299')]
+[2025-09-02 16:20:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3155.5). Total num frames: 851968. Throughput: 0: 1095.0. Samples: 231128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:01,782][03057] Avg episode reward: [(0, '4.446')]
+[2025-09-02 16:20:06,778][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.1, 300 sec: 3336.5). Total num frames: 917504. Throughput: 0: 1066.4. Samples: 237208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:06,780][03057] Avg episode reward: [(0, '4.377')]
+[2025-09-02 16:20:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3276.9). Total num frames: 917504. Throughput: 0: 1087.5. Samples: 241192. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:11,780][03057] Avg episode reward: [(0, '4.508')]
+[2025-09-02 16:20:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3449.4). Total num frames: 983040. Throughput: 0: 1123.1. Samples: 248292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:16,780][03057] Avg episode reward: [(0, '4.643')]
+[2025-09-02 16:20:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 3389.9). Total num frames: 983040. Throughput: 0: 1075.9. Samples: 253452. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:21,780][03057] Avg episode reward: [(0, '4.712')]
+[2025-09-02 16:20:21,786][03375] Saving new best policy, reward=4.712!
+[2025-09-02 16:20:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3332.4). Total num frames: 983040. Throughput: 0: 1073.6. Samples: 257300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:26,780][03057] Avg episode reward: [(0, '4.548')]
+[2025-09-02 16:20:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 1129.4. Samples: 264940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:31,780][03057] Avg episode reward: [(0, '4.365')]
+[2025-09-02 16:20:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 1085.6. Samples: 269960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:36,782][03057] Avg episode reward: [(0, '4.449')]
+[2025-09-02 16:20:41,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 3554.5). Total num frames: 1048576. Throughput: 0: 1069.4. Samples: 273156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:41,782][03057] Avg episode reward: [(0, '4.575')]
+[2025-09-02 16:20:46,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 3776.6). Total num frames: 1114112. Throughput: 0: 1109.6. Samples: 281060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:46,780][03057] Avg episode reward: [(0, '4.531')]
+[2025-09-02 16:20:51,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 1105.4. Samples: 286952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:51,781][03057] Avg episode reward: [(0, '4.722')]
+[2025-09-02 16:20:51,783][03375] Saving new best policy, reward=4.722!
+[2025-09-02 16:20:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 1071.7. Samples: 289420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:20:56,782][03057] Avg episode reward: [(0, '4.581')]
+[2025-09-02 16:20:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000017_1114112.pth...
+[2025-09-02 16:20:56,933][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_131072.pth
+[2025-09-02 16:21:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1084.5. Samples: 297096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:01,779][03057] Avg episode reward: [(0, '4.391')]
+[2025-09-02 16:21:06,781][03057] Fps is (10 sec: 6551.9, 60 sec: 4368.9, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1113.7. Samples: 303572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:06,786][03057] Avg episode reward: [(0, '4.421')]
+[2025-09-02 16:21:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 1179648. Throughput: 0: 1085.3. Samples: 306140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:11,779][03057] Avg episode reward: [(0, '4.527')]
+[2025-09-02 16:21:16,778][03057] Fps is (10 sec: 6555.3, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1245184. Throughput: 0: 1069.2. Samples: 313056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:16,779][03057] Avg episode reward: [(0, '4.562')]
+[2025-09-02 16:21:21,783][03057] Fps is (10 sec: 6550.4, 60 sec: 4368.7, 300 sec: 4220.9). Total num frames: 1245184. Throughput: 0: 1124.2. Samples: 320552. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:21,784][03057] Avg episode reward: [(0, '4.513')]
+[2025-09-02 16:21:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1245184. Throughput: 0: 1114.0. Samples: 323284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:26,780][03057] Avg episode reward: [(0, '4.396')]
+[2025-09-02 16:21:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 1245184. Throughput: 0: 1070.8. Samples: 329244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:31,783][03057] Avg episode reward: [(0, '4.456')]
+[2025-09-02 16:21:32,306][03390] Updated weights for policy 0, policy_version 20 (0.0018)
+[2025-09-02 16:21:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 1310720. Throughput: 0: 1101.0. Samples: 336496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:36,781][03057] Avg episode reward: [(0, '4.442')]
+[2025-09-02 16:21:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 1310720. Throughput: 0: 1124.8. Samples: 340036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:41,779][03057] Avg episode reward: [(0, '4.414')]
+[2025-09-02 16:21:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4443.1). Total num frames: 1310720. Throughput: 0: 1068.4. Samples: 345176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:46,780][03057] Avg episode reward: [(0, '4.435')]
+[2025-09-02 16:21:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4665.3). Total num frames: 1376256. Throughput: 0: 1085.6. Samples: 352420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:51,783][03057] Avg episode reward: [(0, '4.437')]
+[2025-09-02 16:21:56,785][03057] Fps is (10 sec: 6549.2, 60 sec: 4368.6, 300 sec: 4220.9). Total num frames: 1376256. Throughput: 0: 1114.1. Samples: 356280. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:21:56,787][03057] Avg episode reward: [(0, '4.473')]
+[2025-09-02 16:22:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 1376256. Throughput: 0: 1082.4. Samples: 361764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:01,779][03057] Avg episode reward: [(0, '4.345')]
+[2025-09-02 16:22:06,778][03057] Fps is (10 sec: 6558.0, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 1441792. Throughput: 0: 1055.5. Samples: 368044. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:06,780][03057] Avg episode reward: [(0, '4.424')]
+[2025-09-02 16:22:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1441792. Throughput: 0: 1080.2. Samples: 371892. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:11,779][03057] Avg episode reward: [(0, '4.578')]
+[2025-09-02 16:22:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 1441792. Throughput: 0: 1090.1. Samples: 378300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:16,782][03057] Avg episode reward: [(0, '4.525')]
+[2025-09-02 16:22:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.4, 300 sec: 4443.1). Total num frames: 1507328. Throughput: 0: 1052.4. Samples: 383852. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:22:21,782][03057] Avg episode reward: [(0, '4.545')]
+[2025-09-02 16:22:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1507328. Throughput: 0: 1061.5. Samples: 387804. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:22:26,786][03057] Avg episode reward: [(0, '4.520')]
+[2025-09-02 16:22:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1507328. Throughput: 0: 1108.4. Samples: 395056. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:22:31,782][03057] Avg episode reward: [(0, '4.458')]
+[2025-09-02 16:22:36,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4443.2). Total num frames: 1572864. Throughput: 0: 1051.0. Samples: 399716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:36,780][03057] Avg episode reward: [(0, '4.478')]
+[2025-09-02 16:22:41,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 1572864. Throughput: 0: 1052.7. Samples: 403644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:41,780][03057] Avg episode reward: [(0, '4.358')]
+[2025-09-02 16:22:46,778][03057] Fps is (10 sec: 6553.9, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 1638400. Throughput: 0: 1103.1. Samples: 411404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:46,782][03057] Avg episode reward: [(0, '4.397')]
+[2025-09-02 16:22:51,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 1638400. Throughput: 0: 1082.7. Samples: 416764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:51,781][03057] Avg episode reward: [(0, '4.538')]
+[2025-09-02 16:22:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.6, 300 sec: 4221.0). Total num frames: 1638400. Throughput: 0: 1061.3. Samples: 419652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:22:56,786][03057] Avg episode reward: [(0, '4.446')]
+[2025-09-02 16:22:56,794][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000025_1638400.pth...
+[2025-09-02 16:22:56,919][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000009_589824.pth
+[2025-09-02 16:23:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1638400. Throughput: 0: 1093.1. Samples: 427488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:01,784][03057] Avg episode reward: [(0, '4.515')]
+[2025-09-02 16:23:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 1703936. Throughput: 0: 1106.5. Samples: 433644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:06,782][03057] Avg episode reward: [(0, '4.628')]
+[2025-09-02 16:23:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1703936. Throughput: 0: 1073.6. Samples: 436116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:11,780][03057] Avg episode reward: [(0, '4.536')]
+[2025-09-02 16:23:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1703936. Throughput: 0: 1073.7. Samples: 443372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:16,780][03057] Avg episode reward: [(0, '4.494')]
+[2025-09-02 16:23:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 1769472. Throughput: 0: 1092.2. Samples: 448864. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:21,784][03057] Avg episode reward: [(0, '4.389')]
+[2025-09-02 16:23:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1769472. Throughput: 0: 1052.4. Samples: 451000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:26,780][03057] Avg episode reward: [(0, '4.558')]
+[2025-09-02 16:23:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1769472. Throughput: 0: 997.7. Samples: 456300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:31,783][03057] Avg episode reward: [(0, '4.610')]
+[2025-09-02 16:23:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 1835008. Throughput: 0: 1044.3. Samples: 463756. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:36,784][03057] Avg episode reward: [(0, '4.458')]
+[2025-09-02 16:23:41,785][03057] Fps is (10 sec: 6549.2, 60 sec: 4368.6, 300 sec: 4220.9). Total num frames: 1835008. Throughput: 0: 1061.8. Samples: 467440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:41,789][03057] Avg episode reward: [(0, '4.630')]
+[2025-09-02 16:23:46,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 1835008. Throughput: 0: 1006.7. Samples: 472792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:46,783][03057] Avg episode reward: [(0, '4.623')]
+[2025-09-02 16:23:51,778][03057] Fps is (10 sec: 6558.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 1900544. Throughput: 0: 1025.1. Samples: 479772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:51,780][03057] Avg episode reward: [(0, '4.344')]
+[2025-09-02 16:23:56,780][03057] Fps is (10 sec: 6552.6, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 1900544. Throughput: 0: 1058.9. Samples: 483768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:23:56,781][03057] Avg episode reward: [(0, '4.555')]
+[2025-09-02 16:24:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1900544. Throughput: 0: 1033.2. Samples: 489864. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:01,787][03057] Avg episode reward: [(0, '4.831')]
+[2025-09-02 16:24:01,791][03375] Saving new best policy, reward=4.831!
+[2025-09-02 16:24:05,592][03390] Updated weights for policy 0, policy_version 30 (0.0017)
+[2025-09-02 16:24:06,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 1966080. Throughput: 0: 1045.9. Samples: 495928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:06,780][03057] Avg episode reward: [(0, '4.659')]
+[2025-09-02 16:24:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1966080. Throughput: 0: 1080.7. Samples: 499632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:11,780][03057] Avg episode reward: [(0, '4.497')]
+[2025-09-02 16:24:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 1966080. Throughput: 0: 1118.3. Samples: 506624. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:16,780][03057] Avg episode reward: [(0, '4.746')]
+[2025-09-02 16:24:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2031616. Throughput: 0: 1072.1. Samples: 512000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:21,780][03057] Avg episode reward: [(0, '4.687')]
+[2025-09-02 16:24:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2031616. Throughput: 0: 1076.0. Samples: 515852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:26,780][03057] Avg episode reward: [(0, '4.350')]
+[2025-09-02 16:24:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2031616. Throughput: 0: 1133.9. Samples: 523816. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:31,781][03057] Avg episode reward: [(0, '4.488')]
+[2025-09-02 16:24:36,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 2097152. Throughput: 0: 1085.1. Samples: 528604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:36,781][03057] Avg episode reward: [(0, '4.515')]
+[2025-09-02 16:24:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.6, 300 sec: 4443.1). Total num frames: 2097152. Throughput: 0: 1068.2. Samples: 531836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:41,780][03057] Avg episode reward: [(0, '4.437')]
+[2025-09-02 16:24:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2097152. Throughput: 0: 1109.0. Samples: 539768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:24:46,780][03057] Avg episode reward: [(0, '4.302')]
+[2025-09-02 16:24:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2162688. Throughput: 0: 1103.2. Samples: 545572. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:24:51,780][03057] Avg episode reward: [(0, '4.452')]
+[2025-09-02 16:24:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 2162688. Throughput: 0: 1077.1. Samples: 548100. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:24:56,780][03057] Avg episode reward: [(0, '4.519')]
+[2025-09-02 16:24:56,790][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000033_2162688.pth...
+[2025-09-02 16:24:56,902][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000017_1114112.pth
+[2025-09-02 16:25:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2162688. Throughput: 0: 1093.4. Samples: 555828. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:25:01,779][03057] Avg episode reward: [(0, '4.520')]
+[2025-09-02 16:25:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2228224. Throughput: 0: 1119.9. Samples: 562396. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:06,783][03057] Avg episode reward: [(0, '4.484')]
+[2025-09-02 16:25:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2228224. Throughput: 0: 1086.8. Samples: 564756. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:11,782][03057] Avg episode reward: [(0, '4.565')]
+[2025-09-02 16:25:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2228224. Throughput: 0: 1056.8. Samples: 571372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:16,780][03057] Avg episode reward: [(0, '4.668')]
+[2025-09-02 16:25:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2293760. Throughput: 0: 1110.4. Samples: 578568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:21,780][03057] Avg episode reward: [(0, '4.522')]
+[2025-09-02 16:25:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2293760. Throughput: 0: 1101.8. Samples: 581416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:26,785][03057] Avg episode reward: [(0, '4.452')]
+[2025-09-02 16:25:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 2293760. Throughput: 0: 1057.5. Samples: 587356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:31,780][03057] Avg episode reward: [(0, '4.594')]
+[2025-09-02 16:25:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 2359296. Throughput: 0: 1095.1. Samples: 594852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:36,782][03057] Avg episode reward: [(0, '4.517')]
+[2025-09-02 16:25:41,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2359296. Throughput: 0: 1117.4. Samples: 598384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:41,780][03057] Avg episode reward: [(0, '4.583')]
+[2025-09-02 16:25:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2359296. Throughput: 0: 1060.4. Samples: 603548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:46,783][03057] Avg episode reward: [(0, '4.686')]
+[2025-09-02 16:25:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2424832. Throughput: 0: 1074.1. Samples: 610732. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:51,780][03057] Avg episode reward: [(0, '4.664')]
+[2025-09-02 16:25:56,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2424832. Throughput: 0: 1107.7. Samples: 614604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:25:56,780][03057] Avg episode reward: [(0, '4.552')]
+[2025-09-02 16:26:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2424832. Throughput: 0: 1093.1. Samples: 620560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:01,783][03057] Avg episode reward: [(0, '4.507')]
+[2025-09-02 16:26:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2490368. Throughput: 0: 1073.8. Samples: 626888. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:06,780][03057] Avg episode reward: [(0, '4.566')]
+[2025-09-02 16:26:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2490368. Throughput: 0: 1098.6. Samples: 630852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:11,784][03057] Avg episode reward: [(0, '4.845')]
+[2025-09-02 16:26:11,787][03375] Saving new best policy, reward=4.845!
+[2025-09-02 16:26:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2490368. Throughput: 0: 1118.4. Samples: 637684. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:16,783][03057] Avg episode reward: [(0, '4.706')]
+[2025-09-02 16:26:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2555904. Throughput: 0: 1072.9. Samples: 643132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:21,779][03057] Avg episode reward: [(0, '4.646')]
+[2025-09-02 16:26:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2555904. Throughput: 0: 1083.5. Samples: 647140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:26,780][03057] Avg episode reward: [(0, '4.848')]
+[2025-09-02 16:26:26,796][03375] Saving new best policy, reward=4.848!
+[2025-09-02 16:26:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2555904. Throughput: 0: 1136.0. Samples: 654668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:31,783][03057] Avg episode reward: [(0, '4.684')]
+[2025-09-02 16:26:34,801][03390] Updated weights for policy 0, policy_version 40 (0.0014)
+[2025-09-02 16:26:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2621440. Throughput: 0: 1082.0. Samples: 659424. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:36,782][03057] Avg episode reward: [(0, '4.452')]
+[2025-09-02 16:26:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2621440. Throughput: 0: 1074.0. Samples: 662936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:41,782][03057] Avg episode reward: [(0, '4.608')]
+[2025-09-02 16:26:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2621440. Throughput: 0: 1107.4. Samples: 670392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:46,782][03057] Avg episode reward: [(0, '4.770')]
+[2025-09-02 16:26:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 2686976. Throughput: 0: 1090.1. Samples: 675944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:51,783][03057] Avg episode reward: [(0, '4.553')]
+[2025-09-02 16:26:56,781][03057] Fps is (10 sec: 6551.7, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 2686976. Throughput: 0: 1064.0. Samples: 678736. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:26:56,783][03057] Avg episode reward: [(0, '4.443')]
+[2025-09-02 16:26:56,792][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_2686976.pth...
+[2025-09-02 16:26:56,917][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000025_1638400.pth
+[2025-09-02 16:27:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2686976. Throughput: 0: 1089.5. Samples: 686712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:01,780][03057] Avg episode reward: [(0, '4.730')]
+[2025-09-02 16:27:06,778][03057] Fps is (10 sec: 6555.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2752512. Throughput: 0: 1110.9. Samples: 693124. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:06,782][03057] Avg episode reward: [(0, '4.797')]
+[2025-09-02 16:27:11,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 2752512. Throughput: 0: 1079.3. Samples: 695708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:11,784][03057] Avg episode reward: [(0, '4.631')]
+[2025-09-02 16:27:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2752512. Throughput: 0: 1071.2. Samples: 702872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:16,783][03057] Avg episode reward: [(0, '4.746')]
+[2025-09-02 16:27:21,780][03057] Fps is (10 sec: 6553.3, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 2818048. Throughput: 0: 1129.4. Samples: 710248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:21,782][03057] Avg episode reward: [(0, '4.809')]
+[2025-09-02 16:27:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2818048. Throughput: 0: 1108.6. Samples: 712824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:26,780][03057] Avg episode reward: [(0, '4.849')]
+[2025-09-02 16:27:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2818048. Throughput: 0: 1083.9. Samples: 719168. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:31,781][03057] Avg episode reward: [(0, '4.945')]
+[2025-09-02 16:27:31,784][03375] Saving new best policy, reward=4.945!
+[2025-09-02 16:27:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2883584. Throughput: 0: 1126.3. Samples: 726628. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:36,779][03057] Avg episode reward: [(0, '4.814')]
+[2025-09-02 16:27:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2883584. Throughput: 0: 1139.3. Samples: 730000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:41,780][03057] Avg episode reward: [(0, '4.537')]
+[2025-09-02 16:27:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 2883584. Throughput: 0: 1080.2. Samples: 735320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:46,779][03057] Avg episode reward: [(0, '4.518')]
+[2025-09-02 16:27:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 2949120. Throughput: 0: 1108.3. Samples: 742996. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:51,780][03057] Avg episode reward: [(0, '4.646')]
+[2025-09-02 16:27:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 2949120. Throughput: 0: 1141.0. Samples: 747052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:27:56,780][03057] Avg episode reward: [(0, '4.832')]
+[2025-09-02 16:28:01,783][03057] Fps is (10 sec: 0.0, 60 sec: 4368.7, 300 sec: 4220.9). Total num frames: 2949120. Throughput: 0: 1104.2. Samples: 752564. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:01,786][03057] Avg episode reward: [(0, '4.685')]
+[2025-09-02 16:28:06,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 3014656. Throughput: 0: 1092.5. Samples: 759408. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:06,782][03057] Avg episode reward: [(0, '4.415')]
+[2025-09-02 16:28:11,778][03057] Fps is (10 sec: 6556.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 3014656. Throughput: 0: 1123.6. Samples: 763384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:11,782][03057] Avg episode reward: [(0, '4.362')]
+[2025-09-02 16:28:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 3014656. Throughput: 0: 1120.2. Samples: 769580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:16,780][03057] Avg episode reward: [(0, '4.456')]
+[2025-09-02 16:28:21,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 3080192. Throughput: 0: 1046.3. Samples: 773712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:21,784][03057] Avg episode reward: [(0, '4.740')]
+[2025-09-02 16:28:26,778][03057] Fps is (10 sec: 6554.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3080192. Throughput: 0: 1026.2. Samples: 776180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:26,783][03057] Avg episode reward: [(0, '4.930')]
+[2025-09-02 16:28:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 3080192. Throughput: 0: 1076.6. Samples: 783768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:31,785][03057] Avg episode reward: [(0, '4.753')]
+[2025-09-02 16:28:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.1). Total num frames: 3080192. Throughput: 0: 1016.3. Samples: 788728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:36,782][03057] Avg episode reward: [(0, '4.760')]
+[2025-09-02 16:28:41,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3145728. Throughput: 0: 994.2. Samples: 791792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:41,782][03057] Avg episode reward: [(0, '4.642')]
+[2025-09-02 16:28:46,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 3145728. Throughput: 0: 1033.1. Samples: 799048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:46,782][03057] Avg episode reward: [(0, '4.719')]
+[2025-09-02 16:28:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 3145728. Throughput: 0: 1006.2. Samples: 804688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:51,781][03057] Avg episode reward: [(0, '4.844')]
+[2025-09-02 16:28:56,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3211264. Throughput: 0: 972.4. Samples: 807144. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:28:56,780][03057] Avg episode reward: [(0, '4.875')]
+[2025-09-02 16:28:56,788][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000049_3211264.pth...
+[2025-09-02 16:28:56,917][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000033_2162688.pth
+[2025-09-02 16:29:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.4, 300 sec: 4221.0). Total num frames: 3211264. Throughput: 0: 1005.6. Samples: 814832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:01,785][03057] Avg episode reward: [(0, '5.035')]
+[2025-09-02 16:29:01,792][03375] Saving new best policy, reward=5.035!
+[2025-09-02 16:29:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 3211264. Throughput: 0: 1059.6. Samples: 821396. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:06,788][03057] Avg episode reward: [(0, '4.858')]
+[2025-09-02 16:29:08,101][03390] Updated weights for policy 0, policy_version 50 (0.0020)
+[2025-09-02 16:29:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3276800. Throughput: 0: 1051.2. Samples: 823484. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:29:11,780][03057] Avg episode reward: [(0, '4.776')]
+[2025-09-02 16:29:16,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3276800. Throughput: 0: 1033.2. Samples: 830260. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:29:16,780][03057] Avg episode reward: [(0, '4.948')]
+[2025-09-02 16:29:21,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 3276800. Throughput: 0: 1091.3. Samples: 837836. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:29:21,781][03057] Avg episode reward: [(0, '5.433')]
+[2025-09-02 16:29:21,785][03375] Saving new best policy, reward=5.433!
+[2025-09-02 16:29:26,781][03057] Fps is (10 sec: 6552.2, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 3342336. Throughput: 0: 1071.4. Samples: 840008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:26,787][03057] Avg episode reward: [(0, '5.271')]
+[2025-09-02 16:29:31,781][03057] Fps is (10 sec: 6552.3, 60 sec: 4368.9, 300 sec: 4221.0). Total num frames: 3342336. Throughput: 0: 1046.4. Samples: 846140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:31,782][03057] Avg episode reward: [(0, '4.943')]
+[2025-09-02 16:29:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3342336. Throughput: 0: 1092.0. Samples: 853828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:36,780][03057] Avg episode reward: [(0, '5.239')]
+[2025-09-02 16:29:41,779][03057] Fps is (10 sec: 6555.2, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 3407872. Throughput: 0: 1100.1. Samples: 856648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:41,780][03057] Avg episode reward: [(0, '5.004')]
+[2025-09-02 16:29:46,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3407872. Throughput: 0: 1044.4. Samples: 861828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:46,779][03057] Avg episode reward: [(0, '4.802')]
+[2025-09-02 16:29:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3407872. Throughput: 0: 1070.5. Samples: 869568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:51,782][03057] Avg episode reward: [(0, '4.849')]
+[2025-09-02 16:29:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3473408. Throughput: 0: 1102.0. Samples: 873072. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:29:56,780][03057] Avg episode reward: [(0, '4.998')]
+[2025-09-02 16:30:01,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 3473408. Throughput: 0: 1069.2. Samples: 878376. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:01,784][03057] Avg episode reward: [(0, '4.880')]
+[2025-09-02 16:30:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3473408. Throughput: 0: 1063.4. Samples: 885688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:06,780][03057] Avg episode reward: [(0, '4.849')]
+[2025-09-02 16:30:11,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3538944. Throughput: 0: 1093.5. Samples: 889212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:11,780][03057] Avg episode reward: [(0, '4.904')]
+[2025-09-02 16:30:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3538944. Throughput: 0: 1096.9. Samples: 895496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:16,783][03057] Avg episode reward: [(0, '5.155')]
+[2025-09-02 16:30:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3538944. Throughput: 0: 1067.3. Samples: 901856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:21,780][03057] Avg episode reward: [(0, '5.057')]
+[2025-09-02 16:30:26,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 3604480. Throughput: 0: 1081.8. Samples: 905328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:26,780][03057] Avg episode reward: [(0, '4.813')]
+[2025-09-02 16:30:31,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 3604480. Throughput: 0: 1132.8. Samples: 912804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:31,780][03057] Avg episode reward: [(0, '5.000')]
+[2025-09-02 16:30:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3604480. Throughput: 0: 1077.6. Samples: 918060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:36,779][03057] Avg episode reward: [(0, '4.997')]
+[2025-09-02 16:30:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3670016. Throughput: 0: 1076.4. Samples: 921508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:41,782][03057] Avg episode reward: [(0, '5.062')]
+[2025-09-02 16:30:46,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3670016. Throughput: 0: 1127.5. Samples: 929112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:46,780][03057] Avg episode reward: [(0, '5.032')]
+[2025-09-02 16:30:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 3670016. Throughput: 0: 1083.6. Samples: 934448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:51,781][03057] Avg episode reward: [(0, '5.332')]
+[2025-09-02 16:30:56,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 3735552. Throughput: 0: 1066.0. Samples: 937184. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:30:56,780][03057] Avg episode reward: [(0, '5.388')]
+[2025-09-02 16:30:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000057_3735552.pth...
+[2025-09-02 16:30:56,943][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000041_2686976.pth
+[2025-09-02 16:31:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3735552. Throughput: 0: 1104.0. Samples: 945176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:01,781][03057] Avg episode reward: [(0, '5.157')]
+[2025-09-02 16:31:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3735552. Throughput: 0: 1103.4. Samples: 951508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:06,781][03057] Avg episode reward: [(0, '5.075')]
+[2025-09-02 16:31:11,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3801088. Throughput: 0: 1078.0. Samples: 953836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:11,781][03057] Avg episode reward: [(0, '5.067')]
+[2025-09-02 16:31:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3801088. Throughput: 0: 1080.9. Samples: 961444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:16,782][03057] Avg episode reward: [(0, '5.098')]
+[2025-09-02 16:31:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3801088. Throughput: 0: 1128.0. Samples: 968820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:21,784][03057] Avg episode reward: [(0, '5.225')]
+[2025-09-02 16:31:26,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3866624. Throughput: 0: 1102.5. Samples: 971120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:26,782][03057] Avg episode reward: [(0, '5.463')]
+[2025-09-02 16:31:26,792][03375] Saving new best policy, reward=5.463!
+[2025-09-02 16:31:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3866624. Throughput: 0: 1079.3. Samples: 977680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:31,784][03057] Avg episode reward: [(0, '5.462')]
+[2025-09-02 16:31:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3866624. Throughput: 0: 1132.9. Samples: 985428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:36,782][03057] Avg episode reward: [(0, '5.234')]
+[2025-09-02 16:31:37,570][03390] Updated weights for policy 0, policy_version 60 (0.0020)
+[2025-09-02 16:31:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3932160. Throughput: 0: 1124.3. Samples: 987776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:41,782][03057] Avg episode reward: [(0, '5.345')]
+[2025-09-02 16:31:46,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3932160. Throughput: 0: 1073.3. Samples: 993476. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:46,780][03057] Avg episode reward: [(0, '5.487')]
+[2025-09-02 16:31:46,789][03375] Saving new best policy, reward=5.487!
+[2025-09-02 16:31:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3932160. Throughput: 0: 1111.6. Samples: 1001528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:51,784][03057] Avg episode reward: [(0, '5.463')]
+[2025-09-02 16:31:56,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 3997696. Throughput: 0: 1130.8. Samples: 1004724. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:31:56,781][03057] Avg episode reward: [(0, '5.594')]
+[2025-09-02 16:31:56,792][03375] Saving new best policy, reward=5.594!
+[2025-09-02 16:32:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3997696. Throughput: 0: 1074.4. Samples: 1009792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:01,780][03057] Avg episode reward: [(0, '5.322')]
+[2025-09-02 16:32:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 3997696. Throughput: 0: 1087.6. Samples: 1017764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:06,783][03057] Avg episode reward: [(0, '5.456')]
+[2025-09-02 16:32:11,789][03057] Fps is (10 sec: 6546.6, 60 sec: 4368.3, 300 sec: 4443.0). Total num frames: 4063232. Throughput: 0: 1114.8. Samples: 1021296. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:11,790][03057] Avg episode reward: [(0, '5.620')]
+[2025-09-02 16:32:11,792][03375] Saving new best policy, reward=5.620!
+[2025-09-02 16:32:16,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 4063232. Throughput: 0: 1094.6. Samples: 1026936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:16,780][03057] Avg episode reward: [(0, '5.601')]
+[2025-09-02 16:32:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4063232. Throughput: 0: 1072.7. Samples: 1033700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:21,780][03057] Avg episode reward: [(0, '5.486')]
+[2025-09-02 16:32:26,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4128768. Throughput: 0: 1100.7. Samples: 1037308. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:26,780][03057] Avg episode reward: [(0, '5.641')]
+[2025-09-02 16:32:26,785][03375] Saving new best policy, reward=5.641!
+[2025-09-02 16:32:31,779][03057] Fps is (10 sec: 6552.9, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 4128768. Throughput: 0: 1122.8. Samples: 1044004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:31,781][03057] Avg episode reward: [(0, '5.846')]
+[2025-09-02 16:32:31,788][03375] Saving new best policy, reward=5.846!
+[2025-09-02 16:32:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4128768. Throughput: 0: 1069.2. Samples: 1049644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:36,788][03057] Avg episode reward: [(0, '5.736')]
+[2025-09-02 16:32:41,778][03057] Fps is (10 sec: 6554.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4194304. Throughput: 0: 1077.4. Samples: 1053208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:41,784][03057] Avg episode reward: [(0, '5.329')]
+[2025-09-02 16:32:46,782][03057] Fps is (10 sec: 6551.2, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 4194304. Throughput: 0: 1134.2. Samples: 1060836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:46,783][03057] Avg episode reward: [(0, '5.378')]
+[2025-09-02 16:32:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4194304. Throughput: 0: 1067.9. Samples: 1065820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:51,782][03057] Avg episode reward: [(0, '5.594')]
+[2025-09-02 16:32:56,778][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 4259840. Throughput: 0: 1059.4. Samples: 1068956. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:32:56,783][03057] Avg episode reward: [(0, '5.716')]
+[2025-09-02 16:32:56,794][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_4259840.pth...
+[2025-09-02 16:32:56,915][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000049_3211264.pth
+[2025-09-02 16:33:01,782][03057] Fps is (10 sec: 6551.3, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 4259840. Throughput: 0: 1104.5. Samples: 1076640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:33:01,783][03057] Avg episode reward: [(0, '5.924')]
+[2025-09-02 16:33:01,785][03375] Saving new best policy, reward=5.924!
+[2025-09-02 16:33:06,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 4259840. Throughput: 0: 1084.0. Samples: 1082484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:33:06,786][03057] Avg episode reward: [(0, '5.854')]
+[2025-09-02 16:33:11,778][03057] Fps is (10 sec: 6555.8, 60 sec: 4369.8, 300 sec: 4443.1). Total num frames: 4325376. Throughput: 0: 1054.5. Samples: 1084760. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:33:11,780][03057] Avg episode reward: [(0, '5.550')]
+[2025-09-02 16:33:16,778][03057] Fps is (10 sec: 6555.2, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4325376. Throughput: 0: 1034.8. Samples: 1090568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:33:16,782][03057] Avg episode reward: [(0, '5.441')]
+[2025-09-02 16:33:21,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 4325376. Throughput: 0: 1026.3. Samples: 1095832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:33:21,782][03057] Avg episode reward: [(0, '5.335')]
+[2025-09-02 16:33:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 4325376. Throughput: 0: 1002.5. Samples: 1098320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:33:26,782][03057] Avg episode reward: [(0, '5.358')]
+[2025-09-02 16:33:31,778][03057] Fps is (10 sec: 6555.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4390912. Throughput: 0: 977.6. Samples: 1104824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:33:31,784][03057] Avg episode reward: [(0, '6.037')]
+[2025-09-02 16:33:31,788][03375] Saving new best policy, reward=6.037!
+[2025-09-02 16:33:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4390912. Throughput: 0: 1036.5. Samples: 1112464. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:33:36,780][03057] Avg episode reward: [(0, '6.554')] +[2025-09-02 16:33:36,795][03375] Saving new best policy, reward=6.554! +[2025-09-02 16:33:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 4390912. Throughput: 0: 1023.6. Samples: 1115016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:33:41,782][03057] Avg episode reward: [(0, '5.949')] +[2025-09-02 16:33:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 4456448. Throughput: 0: 977.8. Samples: 1120636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:33:46,780][03057] Avg episode reward: [(0, '6.270')] +[2025-09-02 16:33:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4456448. Throughput: 0: 1013.7. Samples: 1128100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:33:51,780][03057] Avg episode reward: [(0, '6.336')] +[2025-09-02 16:33:56,781][03057] Fps is (10 sec: 0.0, 60 sec: 3276.7, 300 sec: 4220.9). Total num frames: 4456448. Throughput: 0: 1036.2. Samples: 1131392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:33:56,783][03057] Avg episode reward: [(0, '6.433')] +[2025-09-02 16:34:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 4521984. Throughput: 0: 1012.8. Samples: 1136144. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:34:01,779][03057] Avg episode reward: [(0, '6.371')] +[2025-09-02 16:34:06,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 4521984. Throughput: 0: 1066.9. Samples: 1143840. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:34:06,780][03057] Avg episode reward: [(0, '6.020')] +[2025-09-02 16:34:11,745][03390] Updated weights for policy 0, policy_version 70 (0.0014) +[2025-09-02 16:34:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4587520. Throughput: 0: 1095.2. Samples: 1147604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:11,779][03057] Avg episode reward: [(0, '5.876')] +[2025-09-02 16:34:16,782][03057] Fps is (10 sec: 6551.5, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 4587520. Throughput: 0: 1065.3. Samples: 1152768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:16,783][03057] Avg episode reward: [(0, '6.070')] +[2025-09-02 16:34:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 4587520. Throughput: 0: 1046.1. Samples: 1159540. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:21,780][03057] Avg episode reward: [(0, '6.980')] +[2025-09-02 16:34:21,781][03375] Saving new best policy, reward=6.980! +[2025-09-02 16:34:26,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 4587520. Throughput: 0: 1075.6. Samples: 1163420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:26,785][03057] Avg episode reward: [(0, '7.119')] +[2025-09-02 16:34:26,930][03375] Saving new best policy, reward=7.119! +[2025-09-02 16:34:31,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4653056. Throughput: 0: 1091.7. Samples: 1169764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:31,782][03057] Avg episode reward: [(0, '6.886')] +[2025-09-02 16:34:36,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4653056. 
Throughput: 0: 1055.3. Samples: 1175588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:36,779][03057] Avg episode reward: [(0, '6.821')] +[2025-09-02 16:34:41,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 4653056. Throughput: 0: 1071.6. Samples: 1179616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:41,786][03057] Avg episode reward: [(0, '7.383')] +[2025-09-02 16:34:42,339][03375] Saving new best policy, reward=7.383! +[2025-09-02 16:34:46,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4718592. Throughput: 0: 1123.0. Samples: 1186680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:46,781][03057] Avg episode reward: [(0, '7.603')] +[2025-09-02 16:34:46,793][03375] Saving new best policy, reward=7.603! +[2025-09-02 16:34:51,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4718592. Throughput: 0: 1062.6. Samples: 1191656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:51,779][03057] Avg episode reward: [(0, '7.167')] +[2025-09-02 16:34:56,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 4718592. Throughput: 0: 1063.0. Samples: 1195440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:34:56,782][03057] Avg episode reward: [(0, '7.491')] +[2025-09-02 16:34:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_4718592.pth... +[2025-09-02 16:34:57,065][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000057_3735552.pth +[2025-09-02 16:35:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4784128. Throughput: 0: 1115.3. Samples: 1202952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:01,781][03057] Avg episode reward: [(0, '6.917')] +[2025-09-02 16:35:06,781][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 4784128. Throughput: 0: 1096.0. Samples: 1208864. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:06,784][03057] Avg episode reward: [(0, '7.058')] +[2025-09-02 16:35:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 4784128. Throughput: 0: 1072.2. Samples: 1211668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:11,780][03057] Avg episode reward: [(0, '7.508')] +[2025-09-02 16:35:16,778][03057] Fps is (10 sec: 6555.2, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 4849664. Throughput: 0: 1096.4. Samples: 1219104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:16,779][03057] Avg episode reward: [(0, '8.014')] +[2025-09-02 16:35:16,789][03375] Saving new best policy, reward=8.014! +[2025-09-02 16:35:21,779][03057] Fps is (10 sec: 6552.8, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 4849664. Throughput: 0: 1116.2. Samples: 1225820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:21,788][03057] Avg episode reward: [(0, '7.674')] +[2025-09-02 16:35:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4849664. Throughput: 0: 1082.4. Samples: 1228320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:26,779][03057] Avg episode reward: [(0, '7.633')] +[2025-09-02 16:35:31,778][03057] Fps is (10 sec: 6554.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4915200. Throughput: 0: 1070.0. Samples: 1234832. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:31,780][03057] Avg episode reward: [(0, '7.140')] +[2025-09-02 16:35:36,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 4915200. Throughput: 0: 1133.7. Samples: 1242672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:36,780][03057] Avg episode reward: [(0, '7.594')] +[2025-09-02 16:35:41,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 4915200. Throughput: 0: 1106.5. Samples: 1245232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:41,780][03057] Avg episode reward: [(0, '7.581')] +[2025-09-02 16:35:46,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 4980736. Throughput: 0: 1071.2. Samples: 1251156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:46,784][03057] Avg episode reward: [(0, '7.084')] +[2025-09-02 16:35:51,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4980736. Throughput: 0: 1110.5. Samples: 1258836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:51,784][03057] Avg episode reward: [(0, '7.277')] +[2025-09-02 16:35:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 4980736. Throughput: 0: 1119.6. Samples: 1262052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:35:56,780][03057] Avg episode reward: [(0, '7.315')] +[2025-09-02 16:36:01,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 5046272. Throughput: 0: 1065.7. Samples: 1267064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:01,781][03057] Avg episode reward: [(0, '7.237')] +[2025-09-02 16:36:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 5046272. Throughput: 0: 1086.7. Samples: 1274720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:06,781][03057] Avg episode reward: [(0, '7.313')] +[2025-09-02 16:36:11,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 5046272. Throughput: 0: 1118.5. Samples: 1278652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:11,784][03057] Avg episode reward: [(0, '7.537')] +[2025-09-02 16:36:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 5111808. Throughput: 0: 1085.3. Samples: 1283672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:16,782][03057] Avg episode reward: [(0, '8.084')] +[2025-09-02 16:36:16,791][03375] Saving new best policy, reward=8.084! +[2025-09-02 16:36:21,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 5111808. Throughput: 0: 1059.8. Samples: 1290360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:21,780][03057] Avg episode reward: [(0, '8.621')] +[2025-09-02 16:36:21,782][03375] Saving new best policy, reward=8.621! +[2025-09-02 16:36:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5111808. Throughput: 0: 1086.0. Samples: 1294100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:26,779][03057] Avg episode reward: [(0, '7.938')] +[2025-09-02 16:36:31,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 5177344. Throughput: 0: 1085.9. Samples: 1300020. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:31,785][03057] Avg episode reward: [(0, '7.558')] +[2025-09-02 16:36:36,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5177344. Throughput: 0: 1048.8. Samples: 1306032. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:36,780][03057] Avg episode reward: [(0, '7.334')] +[2025-09-02 16:36:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5177344. Throughput: 0: 1063.6. Samples: 1309916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:41,780][03057] Avg episode reward: [(0, '7.216')] +[2025-09-02 16:36:42,754][03390] Updated weights for policy 0, policy_version 80 (0.0026) +[2025-09-02 16:36:46,783][03057] Fps is (10 sec: 6550.6, 60 sec: 4368.7, 300 sec: 4443.1). Total num frames: 5242880. Throughput: 0: 1104.7. Samples: 1316780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:46,784][03057] Avg episode reward: [(0, '7.480')] +[2025-09-02 16:36:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5242880. Throughput: 0: 1043.7. Samples: 1321688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:51,780][03057] Avg episode reward: [(0, '7.823')] +[2025-09-02 16:36:56,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 5242880. Throughput: 0: 1042.4. Samples: 1325560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:36:56,782][03057] Avg episode reward: [(0, '7.716')] +[2025-09-02 16:36:56,788][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000080_5242880.pth... +[2025-09-02 16:36:56,915][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000065_4259840.pth +[2025-09-02 16:37:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 5308416. Throughput: 0: 1094.1. Samples: 1332908. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:01,780][03057] Avg episode reward: [(0, '6.990')] +[2025-09-02 16:37:06,778][03057] Fps is (10 sec: 6555.3, 60 sec: 4369.1, 300 sec: 4221.1). Total num frames: 5308416. Throughput: 0: 1066.7. Samples: 1338360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:06,783][03057] Avg episode reward: [(0, '7.539')] +[2025-09-02 16:37:11,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5308416. Throughput: 0: 1048.3. Samples: 1341276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:11,780][03057] Avg episode reward: [(0, '7.608')] +[2025-09-02 16:37:16,781][03057] Fps is (10 sec: 6552.0, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 5373952. Throughput: 0: 1075.1. Samples: 1348400. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:16,782][03057] Avg episode reward: [(0, '8.065')] +[2025-09-02 16:37:21,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5373952. Throughput: 0: 1081.6. Samples: 1354704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:21,779][03057] Avg episode reward: [(0, '8.283')] +[2025-09-02 16:37:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5373952. Throughput: 0: 1049.9. Samples: 1357160. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:26,780][03057] Avg episode reward: [(0, '8.315')] +[2025-09-02 16:37:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 5439488. Throughput: 0: 1051.5. Samples: 1364092. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:31,782][03057] Avg episode reward: [(0, '8.548')] +[2025-09-02 16:37:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5439488. Throughput: 0: 1103.9. Samples: 1371364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:36,784][03057] Avg episode reward: [(0, '8.566')] +[2025-09-02 16:37:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5439488. Throughput: 0: 1075.4. Samples: 1373948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:41,781][03057] Avg episode reward: [(0, '8.790')] +[2025-09-02 16:37:41,784][03375] Saving new best policy, reward=8.790! +[2025-09-02 16:37:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.4, 300 sec: 4443.1). Total num frames: 5505024. Throughput: 0: 1038.4. Samples: 1379636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:46,780][03057] Avg episode reward: [(0, '8.590')] +[2025-09-02 16:37:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5505024. Throughput: 0: 1081.9. Samples: 1387044. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:51,780][03057] Avg episode reward: [(0, '7.694')] +[2025-09-02 16:37:56,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 5505024. Throughput: 0: 1084.1. Samples: 1390060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:37:56,782][03057] Avg episode reward: [(0, '7.887')] +[2025-09-02 16:38:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 5570560. Throughput: 0: 1040.7. Samples: 1395228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:01,789][03057] Avg episode reward: [(0, '8.209')] +[2025-09-02 16:38:06,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5570560. Throughput: 0: 1064.1. Samples: 1402588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:06,780][03057] Avg episode reward: [(0, '8.792')] +[2025-09-02 16:38:06,785][03375] Saving new best policy, reward=8.792! +[2025-09-02 16:38:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5570560. Throughput: 0: 1090.1. Samples: 1406216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:11,780][03057] Avg episode reward: [(0, '8.924')] +[2025-09-02 16:38:11,782][03375] Saving new best policy, reward=8.924! +[2025-09-02 16:38:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 4221.0). Total num frames: 5570560. Throughput: 0: 1025.5. Samples: 1410240. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:16,780][03057] Avg episode reward: [(0, '9.041')] +[2025-09-02 16:38:16,787][03375] Saving new best policy, reward=9.041! +[2025-09-02 16:38:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 5636096. Throughput: 0: 963.2. Samples: 1414708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:21,780][03057] Avg episode reward: [(0, '9.323')] +[2025-09-02 16:38:21,784][03375] Saving new best policy, reward=9.323! 
+[2025-09-02 16:38:26,778][03057] Fps is (10 sec: 6554.2, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5636096. Throughput: 0: 991.6. Samples: 1418572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:26,780][03057] Avg episode reward: [(0, '9.517')] +[2025-09-02 16:38:26,789][03375] Saving new best policy, reward=9.517! +[2025-09-02 16:38:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 5636096. Throughput: 0: 1035.6. Samples: 1426236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:31,780][03057] Avg episode reward: [(0, '9.322')] +[2025-09-02 16:38:36,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 5701632. Throughput: 0: 978.0. Samples: 1431056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:36,780][03057] Avg episode reward: [(0, '9.239')] +[2025-09-02 16:38:41,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 5701632. Throughput: 0: 992.6. Samples: 1434728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:41,780][03057] Avg episode reward: [(0, '9.042')] +[2025-09-02 16:38:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 5701632. Throughput: 0: 1052.8. Samples: 1442604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:46,780][03057] Avg episode reward: [(0, '8.475')] +[2025-09-02 16:38:51,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 5767168. Throughput: 0: 1012.8. Samples: 1448164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:51,780][03057] Avg episode reward: [(0, '8.755')] +[2025-09-02 16:38:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5767168. Throughput: 0: 988.1. Samples: 1450680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:38:56,780][03057] Avg episode reward: [(0, '8.742')] +[2025-09-02 16:38:56,790][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000088_5767168.pth... +[2025-09-02 16:38:56,942][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_4718592.pth +[2025-09-02 16:39:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 5767168. Throughput: 0: 1069.3. Samples: 1458356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:39:01,790][03057] Avg episode reward: [(0, '9.290')] +[2025-09-02 16:39:06,780][03057] Fps is (10 sec: 6552.6, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 5832704. Throughput: 0: 1117.0. Samples: 1464976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:39:06,786][03057] Avg episode reward: [(0, '9.089')] +[2025-09-02 16:39:11,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 5832704. Throughput: 0: 1086.9. Samples: 1467484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:39:11,784][03057] Avg episode reward: [(0, '9.432')] +[2025-09-02 16:39:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5832704. Throughput: 0: 1069.4. Samples: 1474360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:39:16,781][03057] Avg episode reward: [(0, '9.476')] +[2025-09-02 16:39:18,114][03390] Updated weights for policy 0, policy_version 90 (0.0021) +[2025-09-02 16:39:21,782][03057] Fps is (10 sec: 6551.4, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 5898240. 
Throughput: 0: 1126.6. Samples: 1481756. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:39:21,790][03057] Avg episode reward: [(0, '9.664')] +[2025-09-02 16:39:21,792][03375] Saving new best policy, reward=9.664! +[2025-09-02 16:39:26,785][03057] Fps is (10 sec: 6549.2, 60 sec: 4368.6, 300 sec: 4220.9). Total num frames: 5898240. Throughput: 0: 1096.8. Samples: 1484092. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:39:26,790][03057] Avg episode reward: [(0, '8.862')] +[2025-09-02 16:39:31,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 5898240. Throughput: 0: 1045.5. Samples: 1489656. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:39:31,787][03057] Avg episode reward: [(0, '8.866')] +[2025-09-02 16:39:36,778][03057] Fps is (10 sec: 6558.1, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 5963776. Throughput: 0: 1078.6. Samples: 1496700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:39:36,779][03057] Avg episode reward: [(0, '9.050')] +[2025-09-02 16:39:41,779][03057] Fps is (10 sec: 6555.1, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 5963776. Throughput: 0: 1100.3. Samples: 1500192. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:39:41,782][03057] Avg episode reward: [(0, '10.191')] +[2025-09-02 16:39:41,784][03375] Saving new best policy, reward=10.191! +[2025-09-02 16:39:46,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 5963776. Throughput: 0: 1035.7. Samples: 1504964. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:39:46,781][03057] Avg episode reward: [(0, '9.937')] +[2025-09-02 16:39:51,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 6029312. Throughput: 0: 1039.6. Samples: 1511756. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:39:51,783][03057] Avg episode reward: [(0, '10.737')] +[2025-09-02 16:39:51,785][03375] Saving new best policy, reward=10.737! +[2025-09-02 16:39:56,780][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 6029312. Throughput: 0: 1059.4. Samples: 1515160. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:39:56,786][03057] Avg episode reward: [(0, '10.254')] +[2025-09-02 16:40:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6029312. Throughput: 0: 1031.6. Samples: 1520780. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:40:01,784][03057] Avg episode reward: [(0, '10.445')] +[2025-09-02 16:40:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 4221.0). Total num frames: 6029312. Throughput: 0: 998.7. Samples: 1526692. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:40:06,783][03057] Avg episode reward: [(0, '10.831')] +[2025-09-02 16:40:07,005][03375] Saving new best policy, reward=10.831! +[2025-09-02 16:40:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6094848. Throughput: 0: 1024.3. Samples: 1530180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:11,785][03057] Avg episode reward: [(0, '11.332')] +[2025-09-02 16:40:11,787][03375] Saving new best policy, reward=11.332! +[2025-09-02 16:40:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6094848. Throughput: 0: 1046.9. Samples: 1536764. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:16,783][03057] Avg episode reward: [(0, '12.120')] +[2025-09-02 16:40:16,792][03375] Saving new best policy, reward=12.120! +[2025-09-02 16:40:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 4221.0). Total num frames: 6094848. Throughput: 0: 1006.1. Samples: 1541976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:21,780][03057] Avg episode reward: [(0, '12.049')] +[2025-09-02 16:40:26,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4369.4, 300 sec: 4220.9). Total num frames: 6160384. Throughput: 0: 999.7. Samples: 1545180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:26,782][03057] Avg episode reward: [(0, '11.288')] +[2025-09-02 16:40:31,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 6160384. Throughput: 0: 1060.1. Samples: 1552668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:31,780][03057] Avg episode reward: [(0, '11.055')] +[2025-09-02 16:40:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 6160384. Throughput: 0: 1016.4. Samples: 1557492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:36,783][03057] Avg episode reward: [(0, '10.444')] +[2025-09-02 16:40:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6225920. Throughput: 0: 1002.6. Samples: 1560276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:41,780][03057] Avg episode reward: [(0, '11.183')] +[2025-09-02 16:40:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 6225920. Throughput: 0: 1044.7. Samples: 1567792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:46,780][03057] Avg episode reward: [(0, '11.206')] +[2025-09-02 16:40:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 6225920. Throughput: 0: 1043.7. Samples: 1573660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:51,783][03057] Avg episode reward: [(0, '11.901')] +[2025-09-02 16:40:56,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6291456. Throughput: 0: 1011.6. Samples: 1575704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:40:56,783][03057] Avg episode reward: [(0, '11.639')] +[2025-09-02 16:40:56,793][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000096_6291456.pth... +[2025-09-02 16:40:56,931][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000080_5242880.pth +[2025-09-02 16:41:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6291456. Throughput: 0: 1021.4. Samples: 1582728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:01,785][03057] Avg episode reward: [(0, '12.072')] +[2025-09-02 16:41:06,782][03057] Fps is (10 sec: 0.0, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 6291456. Throughput: 0: 1059.1. Samples: 1589640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:06,785][03057] Avg episode reward: [(0, '11.617')] +[2025-09-02 16:41:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6356992. Throughput: 0: 1037.8. Samples: 1591880. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:11,780][03057] Avg episode reward: [(0, '11.069')] +[2025-09-02 16:41:16,778][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6356992. Throughput: 0: 1007.2. Samples: 1597992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:16,784][03057] Avg episode reward: [(0, '10.675')] +[2025-09-02 16:41:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6356992. Throughput: 0: 1068.2. Samples: 1605560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:21,780][03057] Avg episode reward: [(0, '10.800')] +[2025-09-02 16:41:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 6422528. Throughput: 0: 1061.1. Samples: 1608024. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:26,780][03057] Avg episode reward: [(0, '10.902')] +[2025-09-02 16:41:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6422528. Throughput: 0: 1006.6. Samples: 1613088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:31,779][03057] Avg episode reward: [(0, '11.775')] +[2025-09-02 16:41:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6422528. Throughput: 0: 1041.0. Samples: 1620504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:36,783][03057] Avg episode reward: [(0, '11.624')] +[2025-09-02 16:41:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6488064. Throughput: 0: 1071.6. Samples: 1623924. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:41,780][03057] Avg episode reward: [(0, '11.808')] +[2025-09-02 16:41:46,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6488064. Throughput: 0: 1024.6. Samples: 1628836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:46,783][03057] Avg episode reward: [(0, '11.881')] +[2025-09-02 16:41:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6488064. Throughput: 0: 1024.1. Samples: 1635720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:41:51,780][03057] Avg episode reward: [(0, '10.952')] +[2025-09-02 16:41:56,238][03390] Updated weights for policy 0, policy_version 100 (0.0017) +[2025-09-02 16:41:56,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6553600. Throughput: 0: 1056.3. Samples: 1639412. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:41:56,780][03057] Avg episode reward: [(0, '11.450')] +[2025-09-02 16:42:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6553600. Throughput: 0: 1040.1. Samples: 1644796. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:42:01,780][03057] Avg episode reward: [(0, '13.180')] +[2025-09-02 16:42:01,782][03375] Saving new best policy, reward=13.180! +[2025-09-02 16:42:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 6553600. Throughput: 0: 1000.4. Samples: 1650576. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:42:06,781][03057] Avg episode reward: [(0, '14.934')] +[2025-09-02 16:42:06,800][03375] Saving new best policy, reward=14.934! +[2025-09-02 16:42:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 6553600. Throughput: 0: 1026.9. Samples: 1654236. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:42:11,780][03057] Avg episode reward: [(0, '16.100')] +[2025-09-02 16:42:11,781][03375] Saving new best policy, reward=16.100! +[2025-09-02 16:42:16,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6619136. Throughput: 0: 1057.5. Samples: 1660676. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:42:16,784][03057] Avg episode reward: [(0, '15.554')] +[2025-09-02 16:42:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6619136. Throughput: 0: 999.7. Samples: 1665492. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:42:21,785][03057] Avg episode reward: [(0, '14.952')] +[2025-09-02 16:42:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 6619136. Throughput: 0: 1006.2. Samples: 1669204. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:42:26,782][03057] Avg episode reward: [(0, '13.944')] +[2025-09-02 16:42:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6684672. Throughput: 0: 1049.3. Samples: 1676056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:42:31,780][03057] Avg episode reward: [(0, '13.312')] +[2025-09-02 16:42:36,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 6684672. Throughput: 0: 1010.6. Samples: 1681196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:42:36,780][03057] Avg episode reward: [(0, '13.769')] +[2025-09-02 16:42:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 6684672. Throughput: 0: 993.7. Samples: 1684128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:42:41,780][03057] Avg episode reward: [(0, '14.059')] +[2025-09-02 16:42:46,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6750208. Throughput: 0: 1031.7. Samples: 1691224. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:42:46,780][03057] Avg episode reward: [(0, '14.344')] +[2025-09-02 16:42:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6750208. Throughput: 0: 1042.9. Samples: 1697508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:42:51,784][03057] Avg episode reward: [(0, '13.869')] +[2025-09-02 16:42:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 6750208. Throughput: 0: 1013.6. Samples: 1699848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:42:56,780][03057] Avg episode reward: [(0, '13.665')] +[2025-09-02 16:42:56,790][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000103_6750208.pth... +[2025-09-02 16:42:56,922][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000088_5767168.pth +[2025-09-02 16:43:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6815744. Throughput: 0: 1013.3. Samples: 1706276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:01,780][03057] Avg episode reward: [(0, '12.978')] +[2025-09-02 16:43:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6815744. Throughput: 0: 1051.5. Samples: 1712808. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:06,783][03057] Avg episode reward: [(0, '14.105')] +[2025-09-02 16:43:11,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 6815744. Throughput: 0: 1011.5. Samples: 1714724. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:11,784][03057] Avg episode reward: [(0, '14.338')] +[2025-09-02 16:43:16,781][03057] Fps is (10 sec: 0.0, 60 sec: 3276.7, 300 sec: 3998.8). Total num frames: 6815744. Throughput: 0: 950.4. Samples: 1718828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:16,782][03057] Avg episode reward: [(0, '13.935')] +[2025-09-02 16:43:21,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6881280. Throughput: 0: 985.4. Samples: 1725540. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:21,779][03057] Avg episode reward: [(0, '14.078')] +[2025-09-02 16:43:26,778][03057] Fps is (10 sec: 6555.3, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6881280. Throughput: 0: 1002.3. Samples: 1729232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:26,780][03057] Avg episode reward: [(0, '12.729')] +[2025-09-02 16:43:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 6881280. Throughput: 0: 968.4. Samples: 1734804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:31,780][03057] Avg episode reward: [(0, '12.376')] +[2025-09-02 16:43:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6946816. Throughput: 0: 956.9. Samples: 1740568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:36,783][03057] Avg episode reward: [(0, '12.702')] +[2025-09-02 16:43:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 6946816. Throughput: 0: 987.7. Samples: 1744296. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:41,783][03057] Avg episode reward: [(0, '12.590')] +[2025-09-02 16:43:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 6946816. Throughput: 0: 992.6. Samples: 1750944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:46,780][03057] Avg episode reward: [(0, '12.920')] +[2025-09-02 16:43:51,781][03057] Fps is (10 sec: 0.0, 60 sec: 3276.6, 300 sec: 3998.8). Total num frames: 6946816. Throughput: 0: 956.7. Samples: 1755864. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:51,782][03057] Avg episode reward: [(0, '12.386')] +[2025-09-02 16:43:56,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 7012352. Throughput: 0: 991.9. Samples: 1759360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:43:56,784][03057] Avg episode reward: [(0, '14.415')] +[2025-09-02 16:44:01,778][03057] Fps is (10 sec: 6555.6, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7012352. Throughput: 0: 1063.9. Samples: 1766700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:01,779][03057] Avg episode reward: [(0, '14.245')] +[2025-09-02 16:44:06,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7012352. Throughput: 0: 1025.1. Samples: 1771668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:06,782][03057] Avg episode reward: [(0, '14.775')] +[2025-09-02 16:44:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7077888. Throughput: 0: 1004.4. 
Samples: 1774428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:11,780][03057] Avg episode reward: [(0, '14.915')] +[2025-09-02 16:44:16,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.3, 300 sec: 3998.9). Total num frames: 7077888. Throughput: 0: 1046.7. Samples: 1781904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:16,780][03057] Avg episode reward: [(0, '12.963')] +[2025-09-02 16:44:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.9). Total num frames: 7077888. Throughput: 0: 1053.3. Samples: 1787968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:21,782][03057] Avg episode reward: [(0, '13.670')] +[2025-09-02 16:44:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7143424. Throughput: 0: 1018.0. Samples: 1790108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:26,785][03057] Avg episode reward: [(0, '13.399')] +[2025-09-02 16:44:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7143424. Throughput: 0: 1023.3. Samples: 1796992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:31,785][03057] Avg episode reward: [(0, '14.204')] +[2025-09-02 16:44:36,782][03057] Fps is (10 sec: 0.0, 60 sec: 3276.6, 300 sec: 3998.8). Total num frames: 7143424. Throughput: 0: 1065.5. Samples: 1803812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:36,783][03057] Avg episode reward: [(0, '14.365')] +[2025-09-02 16:44:38,866][03390] Updated weights for policy 0, policy_version 110 (0.0025) +[2025-09-02 16:44:41,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7208960. Throughput: 0: 1034.0. Samples: 1805888. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:41,783][03057] Avg episode reward: [(0, '14.806')] +[2025-09-02 16:44:46,778][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7208960. Throughput: 0: 1010.0. Samples: 1812152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:46,780][03057] Avg episode reward: [(0, '16.195')] +[2025-09-02 16:44:46,789][03375] Saving new best policy, reward=16.195! +[2025-09-02 16:44:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 3998.8). Total num frames: 7208960. Throughput: 0: 1065.1. Samples: 1819596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:51,783][03057] Avg episode reward: [(0, '15.363')] +[2025-09-02 16:44:56,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4369.1, 300 sec: 4220.9). Total num frames: 7274496. Throughput: 0: 1057.7. Samples: 1822028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:44:56,781][03057] Avg episode reward: [(0, '15.456')] +[2025-09-02 16:44:56,797][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000111_7274496.pth... +[2025-09-02 16:44:56,960][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000096_6291456.pth +[2025-09-02 16:45:01,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 7274496. Throughput: 0: 1002.9. Samples: 1827036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:01,784][03057] Avg episode reward: [(0, '15.517')] +[2025-09-02 16:45:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7274496. Throughput: 0: 1036.4. Samples: 1834604. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:06,780][03057] Avg episode reward: [(0, '15.313')] +[2025-09-02 16:45:11,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7340032. Throughput: 0: 1068.5. Samples: 1838192. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:11,779][03057] Avg episode reward: [(0, '16.182')] +[2025-09-02 16:45:16,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 7340032. Throughput: 0: 1020.6. Samples: 1842924. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:16,788][03057] Avg episode reward: [(0, '15.472')] +[2025-09-02 16:45:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7340032. Throughput: 0: 1023.6. Samples: 1849872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:21,780][03057] Avg episode reward: [(0, '16.344')] +[2025-09-02 16:45:21,790][03375] Saving new best policy, reward=16.344! +[2025-09-02 16:45:26,778][03057] Fps is (10 sec: 6555.3, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7405568. Throughput: 0: 1058.2. Samples: 1853508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:26,780][03057] Avg episode reward: [(0, '15.701')] +[2025-09-02 16:45:31,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7405568. Throughput: 0: 1038.7. Samples: 1858892. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:31,779][03057] Avg episode reward: [(0, '15.446')] +[2025-09-02 16:45:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 3998.8). Total num frames: 7405568. Throughput: 0: 1002.0. Samples: 1864684. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:36,779][03057] Avg episode reward: [(0, '14.956')] +[2025-09-02 16:45:41,782][03057] Fps is (10 sec: 6550.8, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 7471104. Throughput: 0: 1029.5. Samples: 1868356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:41,784][03057] Avg episode reward: [(0, '15.034')] +[2025-09-02 16:45:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7471104. Throughput: 0: 1062.6. Samples: 1874852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:46,780][03057] Avg episode reward: [(0, '14.871')] +[2025-09-02 16:45:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7471104. Throughput: 0: 1004.5. Samples: 1879808. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:51,780][03057] Avg episode reward: [(0, '15.518')] +[2025-09-02 16:45:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 3998.8). Total num frames: 7471104. Throughput: 0: 1008.3. Samples: 1883564. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:45:56,785][03057] Avg episode reward: [(0, '14.500')] +[2025-09-02 16:46:01,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 7536640. Throughput: 0: 1057.0. Samples: 1890488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:46:01,780][03057] Avg episode reward: [(0, '14.354')] +[2025-09-02 16:46:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7536640. Throughput: 0: 1017.7. Samples: 1895668. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:46:06,781][03057] Avg episode reward: [(0, '13.847')] +[2025-09-02 16:46:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7536640. Throughput: 0: 1000.8. Samples: 1898544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:46:11,780][03057] Avg episode reward: [(0, '13.044')] +[2025-09-02 16:46:16,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 7602176. Throughput: 0: 1037.9. Samples: 1905596. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:46:16,780][03057] Avg episode reward: [(0, '14.821')] +[2025-09-02 16:46:21,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4368.9, 300 sec: 3998.8). Total num frames: 7602176. Throughput: 0: 1046.4. Samples: 1911776. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:46:21,781][03057] Avg episode reward: [(0, '14.346')] +[2025-09-02 16:46:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7602176. Throughput: 0: 1018.0. Samples: 1914164. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:46:26,780][03057] Avg episode reward: [(0, '15.263')] +[2025-09-02 16:46:31,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7667712. Throughput: 0: 1017.1. Samples: 1920620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:46:31,785][03057] Avg episode reward: [(0, '14.542')] +[2025-09-02 16:46:36,779][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.0, 300 sec: 3998.8). Total num frames: 7667712. Throughput: 0: 1065.9. Samples: 1927772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:46:36,780][03057] Avg episode reward: [(0, '14.776')] +[2025-09-02 16:46:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 3998.8). Total num frames: 7667712. Throughput: 0: 1035.9. Samples: 1930180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:46:41,780][03057] Avg episode reward: [(0, '14.081')] +[2025-09-02 16:46:46,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7733248. Throughput: 0: 1009.8. Samples: 1935928. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:46:46,780][03057] Avg episode reward: [(0, '14.208')] +[2025-09-02 16:46:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7733248. Throughput: 0: 1056.3. Samples: 1943200. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:46:51,784][03057] Avg episode reward: [(0, '12.675')] +[2025-09-02 16:46:56,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 3998.8). Total num frames: 7733248. Throughput: 0: 1063.6. Samples: 1946408. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:46:56,783][03057] Avg episode reward: [(0, '12.586')] +[2025-09-02 16:46:56,793][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000118_7733248.pth... +[2025-09-02 16:46:56,953][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000103_6750208.pth +[2025-09-02 16:47:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7733248. Throughput: 0: 1009.9. Samples: 1951040. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:47:01,780][03057] Avg episode reward: [(0, '12.881')] +[2025-09-02 16:47:06,778][03057] Fps is (10 sec: 6555.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 7798784. Throughput: 0: 1031.0. Samples: 1958168. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:06,786][03057] Avg episode reward: [(0, '13.629')] +[2025-09-02 16:47:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7798784. Throughput: 0: 1059.6. Samples: 1961848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:11,785][03057] Avg episode reward: [(0, '15.556')] +[2025-09-02 16:47:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7798784. Throughput: 0: 1035.8. Samples: 1967232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:16,785][03057] Avg episode reward: [(0, '15.475')] +[2025-09-02 16:47:17,347][03390] Updated weights for policy 0, policy_version 120 (0.0017) +[2025-09-02 16:47:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 7864320. Throughput: 0: 1013.4. Samples: 1973376. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:21,782][03057] Avg episode reward: [(0, '15.905')] +[2025-09-02 16:47:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7864320. Throughput: 0: 1042.8. Samples: 1977104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:26,781][03057] Avg episode reward: [(0, '16.405')] +[2025-09-02 16:47:26,787][03375] Saving new best policy, reward=16.405! +[2025-09-02 16:47:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7864320. Throughput: 0: 1052.5. Samples: 1983292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:31,784][03057] Avg episode reward: [(0, '15.735')] +[2025-09-02 16:47:36,784][03057] Fps is (10 sec: 6549.8, 60 sec: 4368.7, 300 sec: 4220.9). Total num frames: 7929856. Throughput: 0: 1001.3. Samples: 1988264. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:47:36,785][03057] Avg episode reward: [(0, '16.375')] +[2025-09-02 16:47:41,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7929856. Throughput: 0: 1013.8. Samples: 1992028. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:47:41,779][03057] Avg episode reward: [(0, '16.965')] +[2025-09-02 16:47:41,783][03375] Saving new best policy, reward=16.965! +[2025-09-02 16:47:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 7929856. Throughput: 0: 1078.8. Samples: 1999584. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:47:46,785][03057] Avg episode reward: [(0, '15.975')] +[2025-09-02 16:47:51,781][03057] Fps is (10 sec: 6551.7, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 7995392. Throughput: 0: 1028.1. Samples: 2004436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:51,782][03057] Avg episode reward: [(0, '16.978')] +[2025-09-02 16:47:51,784][03375] Saving new best policy, reward=16.978! +[2025-09-02 16:47:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 3998.8). Total num frames: 7995392. Throughput: 0: 1023.5. Samples: 2007904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:47:56,780][03057] Avg episode reward: [(0, '16.509')] +[2025-09-02 16:48:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 7995392. Throughput: 0: 1064.3. Samples: 2015124. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:01,780][03057] Avg episode reward: [(0, '16.986')] +[2025-09-02 16:48:01,781][03375] Saving new best policy, reward=16.986! 
+[2025-09-02 16:48:06,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 8060928. Throughput: 0: 1016.4. Samples: 2019116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:06,787][03057] Avg episode reward: [(0, '16.922')] +[2025-09-02 16:48:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8060928. Throughput: 0: 984.1. Samples: 2021388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:11,781][03057] Avg episode reward: [(0, '17.180')] +[2025-09-02 16:48:11,784][03375] Saving new best policy, reward=17.180! +[2025-09-02 16:48:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 8060928. Throughput: 0: 1003.3. Samples: 2028440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:16,780][03057] Avg episode reward: [(0, '18.721')] +[2025-09-02 16:48:16,790][03375] Saving new best policy, reward=18.721! +[2025-09-02 16:48:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8126464. Throughput: 0: 1053.3. Samples: 2035656. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:48:21,781][03057] Avg episode reward: [(0, '19.167')] +[2025-09-02 16:48:21,790][03375] Saving new best policy, reward=19.167! +[2025-09-02 16:48:26,780][03057] Fps is (10 sec: 6552.7, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 8126464. Throughput: 0: 1026.6. Samples: 2038228. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:48:26,781][03057] Avg episode reward: [(0, '19.198')] +[2025-09-02 16:48:26,790][03375] Saving new best policy, reward=19.198! +[2025-09-02 16:48:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 3998.8). Total num frames: 8126464. Throughput: 0: 995.4. Samples: 2044376. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:48:31,782][03057] Avg episode reward: [(0, '20.757')] +[2025-09-02 16:48:31,783][03375] Saving new best policy, reward=20.757! +[2025-09-02 16:48:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.1, 300 sec: 3998.8). Total num frames: 8126464. Throughput: 0: 1049.2. Samples: 2051648. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 16:48:36,782][03057] Avg episode reward: [(0, '19.608')] +[2025-09-02 16:48:41,778][03057] Fps is (10 sec: 6554.4, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8192000. Throughput: 0: 1042.0. Samples: 2054792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:41,780][03057] Avg episode reward: [(0, '18.598')] +[2025-09-02 16:48:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8192000. Throughput: 0: 998.1. Samples: 2060040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:46,780][03057] Avg episode reward: [(0, '17.932')] +[2025-09-02 16:48:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 8257536. Throughput: 0: 1078.9. Samples: 2067668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:51,780][03057] Avg episode reward: [(0, '17.240')] +[2025-09-02 16:48:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8257536. Throughput: 0: 1116.1. Samples: 2071612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 16:48:56,780][03057] Avg episode reward: [(0, '17.213')] +[2025-09-02 16:48:56,785][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000126_8257536.pth... 
+[2025-09-02 16:48:56,918][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000111_7274496.pth
+[2025-09-02 16:49:01,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 8257536. Throughput: 0: 1078.9. Samples: 2076992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:01,781][03057] Avg episode reward: [(0, '17.165')]
+[2025-09-02 16:49:06,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 8257536. Throughput: 0: 1071.2. Samples: 2083860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:06,784][03057] Avg episode reward: [(0, '16.449')]
+[2025-09-02 16:49:11,778][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8323072. Throughput: 0: 1095.7. Samples: 2087532. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:11,786][03057] Avg episode reward: [(0, '16.318')]
+[2025-09-02 16:49:16,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 8323072. Throughput: 0: 1105.6. Samples: 2094128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:16,781][03057] Avg episode reward: [(0, '16.120')]
+[2025-09-02 16:49:21,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 8323072. Throughput: 0: 1076.3. Samples: 2100084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:21,786][03057] Avg episode reward: [(0, '17.946')]
+[2025-09-02 16:49:26,778][03057] Fps is (10 sec: 6554.2, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 8388608. Throughput: 0: 1084.6. Samples: 2103600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:26,785][03057] Avg episode reward: [(0, '18.682')]
+[2025-09-02 16:49:31,778][03057] Fps is (10 sec: 6554.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 8388608. Throughput: 0: 1133.2. Samples: 2111032. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:31,780][03057] Avg episode reward: [(0, '18.540')]
+[2025-09-02 16:49:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 8388608. Throughput: 0: 1074.7. Samples: 2116028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:36,781][03057] Avg episode reward: [(0, '19.540')]
+[2025-09-02 16:49:41,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 8454144. Throughput: 0: 1065.8. Samples: 2119572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:41,781][03057] Avg episode reward: [(0, '18.277')]
+[2025-09-02 16:49:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8454144. Throughput: 0: 1125.1. Samples: 2127620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:46,780][03057] Avg episode reward: [(0, '17.395')]
+[2025-09-02 16:49:50,945][03390] Updated weights for policy 0, policy_version 130 (0.0014)
+[2025-09-02 16:49:51,779][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 8519680. Throughput: 0: 1097.2. Samples: 2133232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:51,780][03057] Avg episode reward: [(0, '16.978')]
+[2025-09-02 16:49:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8519680. Throughput: 0: 1077.4. Samples: 2136016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:49:56,780][03057] Avg episode reward: [(0, '17.103')]
+[2025-09-02 16:50:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 8519680. Throughput: 0: 1109.0. Samples: 2144032. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:01,782][03057] Avg episode reward: [(0, '17.078')]
+[2025-09-02 16:50:06,781][03057] Fps is (10 sec: 6551.9, 60 sec: 5461.1, 300 sec: 4220.9). Total num frames: 8585216. Throughput: 0: 1115.5. Samples: 2150284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:06,782][03057] Avg episode reward: [(0, '17.340')]
+[2025-09-02 16:50:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8585216. Throughput: 0: 1094.6. Samples: 2152856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:11,779][03057] Avg episode reward: [(0, '16.882')]
+[2025-09-02 16:50:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8585216. Throughput: 0: 1091.2. Samples: 2160136. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:16,783][03057] Avg episode reward: [(0, '16.593')]
+[2025-09-02 16:50:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 5461.4, 300 sec: 4221.0). Total num frames: 8650752. Throughput: 0: 1143.0. Samples: 2167464. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:21,782][03057] Avg episode reward: [(0, '17.251')]
+[2025-09-02 16:50:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8650752. Throughput: 0: 1125.1. Samples: 2170200. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:26,782][03057] Avg episode reward: [(0, '18.562')]
+[2025-09-02 16:50:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8650752. Throughput: 0: 1084.7. Samples: 2176432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:31,780][03057] Avg episode reward: [(0, '19.089')]
+[2025-09-02 16:50:36,779][03057] Fps is (10 sec: 6553.4, 60 sec: 5461.3, 300 sec: 4221.0). Total num frames: 8716288. Throughput: 0: 1117.1. Samples: 2183500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:36,780][03057] Avg episode reward: [(0, '19.254')]
+[2025-09-02 16:50:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8716288. Throughput: 0: 1129.7. Samples: 2186852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:41,780][03057] Avg episode reward: [(0, '20.125')]
+[2025-09-02 16:50:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8716288. Throughput: 0: 1068.4. Samples: 2192112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:46,784][03057] Avg episode reward: [(0, '19.494')]
+[2025-09-02 16:50:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 8781824. Throughput: 0: 1099.2. Samples: 2199744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:51,780][03057] Avg episode reward: [(0, '20.047')]
+[2025-09-02 16:50:56,784][03057] Fps is (10 sec: 6549.9, 60 sec: 4368.6, 300 sec: 4220.9). Total num frames: 8781824. Throughput: 0: 1129.1. Samples: 2203672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:50:56,785][03057] Avg episode reward: [(0, '20.784')]
+[2025-09-02 16:50:56,795][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000134_8781824.pth...
+[2025-09-02 16:50:56,961][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000118_7733248.pth
+[2025-09-02 16:50:56,983][03375] Saving new best policy, reward=20.784!
+[2025-09-02 16:51:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8781824. Throughput: 0: 1089.7. Samples: 2209172. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:01,787][03057] Avg episode reward: [(0, '19.730')]
+[2025-09-02 16:51:06,778][03057] Fps is (10 sec: 6557.3, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 8847360. Throughput: 0: 1068.5. Samples: 2215548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:06,785][03057] Avg episode reward: [(0, '20.494')]
+[2025-09-02 16:51:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8847360. Throughput: 0: 1094.3. Samples: 2219444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:11,780][03057] Avg episode reward: [(0, '21.088')]
+[2025-09-02 16:51:11,785][03375] Saving new best policy, reward=21.088!
+[2025-09-02 16:51:16,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4221.0). Total num frames: 8847360. Throughput: 0: 1095.1. Samples: 2225712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:16,781][03057] Avg episode reward: [(0, '21.846')]
+[2025-09-02 16:51:16,798][03375] Saving new best policy, reward=21.846!
+[2025-09-02 16:51:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 8847360. Throughput: 0: 1058.2. Samples: 2231120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:21,785][03057] Avg episode reward: [(0, '21.660')]
+[2025-09-02 16:51:26,778][03057] Fps is (10 sec: 6554.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8912896. Throughput: 0: 1063.9. Samples: 2234728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:26,779][03057] Avg episode reward: [(0, '21.699')]
+[2025-09-02 16:51:31,783][03057] Fps is (10 sec: 6550.5, 60 sec: 4368.7, 300 sec: 4220.9). Total num frames: 8912896. Throughput: 0: 1110.0. Samples: 2242068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:31,784][03057] Avg episode reward: [(0, '20.507')]
+[2025-09-02 16:51:36,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 8912896. Throughput: 0: 1050.5. Samples: 2247016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:36,787][03057] Avg episode reward: [(0, '20.803')]
+[2025-09-02 16:51:41,781][03057] Fps is (10 sec: 6555.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 8978432. Throughput: 0: 1041.3. Samples: 2250528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:41,782][03057] Avg episode reward: [(0, '20.058')]
+[2025-09-02 16:51:46,778][03057] Fps is (10 sec: 6554.2, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 8978432. Throughput: 0: 1095.2. Samples: 2258456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:46,780][03057] Avg episode reward: [(0, '21.519')]
+[2025-09-02 16:51:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 8978432. Throughput: 0: 1080.4. Samples: 2264164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:51,784][03057] Avg episode reward: [(0, '19.795')]
+[2025-09-02 16:51:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.5, 300 sec: 4443.1). Total num frames: 9043968. Throughput: 0: 1054.8. Samples: 2266912. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:51:56,780][03057] Avg episode reward: [(0, '18.761')]
+[2025-09-02 16:52:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9043968. Throughput: 0: 1094.1. Samples: 2274944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:01,788][03057] Avg episode reward: [(0, '18.629')]
+[2025-09-02 16:52:06,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9109504. Throughput: 0: 1115.4. Samples: 2281312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:06,780][03057] Avg episode reward: [(0, '18.751')]
+[2025-09-02 16:52:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9109504. Throughput: 0: 1094.3. Samples: 2283972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:11,780][03057] Avg episode reward: [(0, '19.215')]
+[2025-09-02 16:52:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 9109504. Throughput: 0: 1088.1. Samples: 2291028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:16,780][03057] Avg episode reward: [(0, '20.053')]
+[2025-09-02 16:52:20,406][03390] Updated weights for policy 0, policy_version 140 (0.0016)
+[2025-09-02 16:52:21,783][03057] Fps is (10 sec: 6550.5, 60 sec: 5460.9, 300 sec: 4443.0). Total num frames: 9175040. Throughput: 0: 1140.9. Samples: 2298360. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:52:21,784][03057] Avg episode reward: [(0, '21.271')]
+[2025-09-02 16:52:26,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 9175040. Throughput: 0: 1119.8. Samples: 2300916. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:52:26,780][03057] Avg episode reward: [(0, '19.518')]
+[2025-09-02 16:52:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.4, 300 sec: 4221.0). Total num frames: 9175040. Throughput: 0: 1080.4. Samples: 2307076. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:52:31,786][03057] Avg episode reward: [(0, '19.319')]
+[2025-09-02 16:52:36,778][03057] Fps is (10 sec: 6554.0, 60 sec: 5461.4, 300 sec: 4443.1). Total num frames: 9240576. Throughput: 0: 1116.4. Samples: 2314404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:36,780][03057] Avg episode reward: [(0, '19.150')]
+[2025-09-02 16:52:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 9240576. Throughput: 0: 1130.2. Samples: 2317772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:41,780][03057] Avg episode reward: [(0, '17.783')]
+[2025-09-02 16:52:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9240576. Throughput: 0: 1067.6. Samples: 2322988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:46,780][03057] Avg episode reward: [(0, '18.143')]
+[2025-09-02 16:52:51,780][03057] Fps is (10 sec: 6552.4, 60 sec: 5461.2, 300 sec: 4443.1). Total num frames: 9306112. Throughput: 0: 1083.8. Samples: 2330084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:51,782][03057] Avg episode reward: [(0, '18.991')]
+[2025-09-02 16:52:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9306112. Throughput: 0: 1081.1. Samples: 2332620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:52:56,781][03057] Avg episode reward: [(0, '19.472')]
+[2025-09-02 16:52:56,790][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000142_9306112.pth...
+[2025-09-02 16:52:57,001][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000126_8257536.pth
+[2025-09-02 16:53:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9306112. Throughput: 0: 1013.1. Samples: 2336616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:01,783][03057] Avg episode reward: [(0, '19.659')]
+[2025-09-02 16:53:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 9306112. Throughput: 0: 976.6. Samples: 2342304. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:06,780][03057] Avg episode reward: [(0, '21.513')]
+[2025-09-02 16:53:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9371648. Throughput: 0: 997.3. Samples: 2345796. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:11,782][03057] Avg episode reward: [(0, '21.401')]
+[2025-09-02 16:53:16,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 9371648. Throughput: 0: 1013.2. Samples: 2352672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:16,785][03057] Avg episode reward: [(0, '20.280')]
+[2025-09-02 16:53:21,780][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 4221.0). Total num frames: 9371648. Throughput: 0: 962.5. Samples: 2357716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:21,785][03057] Avg episode reward: [(0, '20.207')]
+[2025-09-02 16:53:26,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9437184. Throughput: 0: 970.5. Samples: 2361444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:26,780][03057] Avg episode reward: [(0, '18.237')]
+[2025-09-02 16:53:31,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9437184. Throughput: 0: 1014.1. Samples: 2368624. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:31,784][03057] Avg episode reward: [(0, '18.616')]
+[2025-09-02 16:53:36,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.7, 300 sec: 4220.9). Total num frames: 9437184. Throughput: 0: 970.4. Samples: 2373752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:36,783][03057] Avg episode reward: [(0, '18.858')]
+[2025-09-02 16:53:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9502720. Throughput: 0: 976.8. Samples: 2376576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:41,779][03057] Avg episode reward: [(0, '20.674')]
+[2025-09-02 16:53:46,779][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 9502720. Throughput: 0: 1049.4. Samples: 2383840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:46,780][03057] Avg episode reward: [(0, '21.179')]
+[2025-09-02 16:53:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 4221.0). Total num frames: 9502720. Throughput: 0: 1061.1. Samples: 2390052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:51,780][03057] Avg episode reward: [(0, '20.361')]
+[2025-09-02 16:53:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 9502720. Throughput: 0: 1036.7. Samples: 2392448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:53:56,780][03057] Avg episode reward: [(0, '20.432')]
+[2025-09-02 16:54:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9568256. Throughput: 0: 1033.0. Samples: 2399156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:01,780][03057] Avg episode reward: [(0, '20.512')]
+[2025-09-02 16:54:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9568256. Throughput: 0: 1077.9. Samples: 2406220. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:06,780][03057] Avg episode reward: [(0, '19.356')]
+[2025-09-02 16:54:11,780][03057] Fps is (10 sec: 0.0, 60 sec: 3276.7, 300 sec: 4221.0). Total num frames: 9568256. Throughput: 0: 1046.4. Samples: 2408532. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:11,785][03057] Avg episode reward: [(0, '20.229')]
+[2025-09-02 16:54:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 9633792. Throughput: 0: 1013.8. Samples: 2414244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:16,782][03057] Avg episode reward: [(0, '20.790')]
+[2025-09-02 16:54:21,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 9633792. Throughput: 0: 1063.7. Samples: 2421616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:21,781][03057] Avg episode reward: [(0, '20.289')]
+[2025-09-02 16:54:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 9633792. Throughput: 0: 1066.5. Samples: 2424568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:26,780][03057] Avg episode reward: [(0, '20.164')]
+[2025-09-02 16:54:31,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 9699328. Throughput: 0: 1013.6. Samples: 2429452. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:31,780][03057] Avg episode reward: [(0, '20.995')]
+[2025-09-02 16:54:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9699328. Throughput: 0: 1040.1. Samples: 2436856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:36,784][03057] Avg episode reward: [(0, '20.202')]
+[2025-09-02 16:54:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 9699328. Throughput: 0: 1061.1. Samples: 2440196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:54:41,780][03057] Avg episode reward: [(0, '21.290')]
+[2025-09-02 16:54:46,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 9764864. Throughput: 0: 1013.0. Samples: 2444744. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:54:46,784][03057] Avg episode reward: [(0, '22.100')]
+[2025-09-02 16:54:46,794][03375] Saving new best policy, reward=22.100!
+[2025-09-02 16:54:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9764864. Throughput: 0: 1007.7. Samples: 2451568. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:54:51,790][03057] Avg episode reward: [(0, '23.608')]
+[2025-09-02 16:54:51,794][03375] Saving new best policy, reward=23.608!
+[2025-09-02 16:54:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9764864. Throughput: 0: 1035.6. Samples: 2455132. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:54:56,781][03057] Avg episode reward: [(0, '22.817')]
+[2025-09-02 16:54:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000149_9764864.pth...
+[2025-09-02 16:54:56,911][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000134_8781824.pth
+[2025-09-02 16:54:59,678][03390] Updated weights for policy 0, policy_version 150 (0.0018)
+[2025-09-02 16:55:01,786][03057] Fps is (10 sec: 6548.6, 60 sec: 4368.5, 300 sec: 4220.9). Total num frames: 9830400. Throughput: 0: 1028.4. Samples: 2460532. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:55:01,787][03057] Avg episode reward: [(0, '22.184')]
+[2025-09-02 16:55:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9830400. Throughput: 0: 994.9. Samples: 2466388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:55:06,781][03057] Avg episode reward: [(0, '22.102')]
+[2025-09-02 16:55:11,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 9830400. Throughput: 0: 1007.1. Samples: 2469888. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:55:11,780][03057] Avg episode reward: [(0, '21.001')]
+[2025-09-02 16:55:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9895936. Throughput: 0: 1041.8. Samples: 2476332. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:55:16,786][03057] Avg episode reward: [(0, '20.542')]
+[2025-09-02 16:55:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9895936. Throughput: 0: 982.0. Samples: 2481048. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:55:21,780][03057] Avg episode reward: [(0, '20.478')]
+[2025-09-02 16:55:26,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 9895936. Throughput: 0: 989.1. Samples: 2484708. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:55:26,780][03057] Avg episode reward: [(0, '20.939')]
+[2025-09-02 16:55:31,786][03057] Fps is (10 sec: 0.0, 60 sec: 3276.4, 300 sec: 3998.7). Total num frames: 9895936. Throughput: 0: 1047.7. Samples: 2491900. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:55:31,791][03057] Avg episode reward: [(0, '20.978')]
+[2025-09-02 16:55:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9961472. Throughput: 0: 1002.8. Samples: 2496696. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:55:36,780][03057] Avg episode reward: [(0, '20.541')]
+[2025-09-02 16:55:41,778][03057] Fps is (10 sec: 6558.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 9961472. Throughput: 0: 982.5. Samples: 2499344. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:55:41,779][03057] Avg episode reward: [(0, '19.955')]
+[2025-09-02 16:55:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 9961472. Throughput: 0: 1028.3. Samples: 2506796. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:55:46,784][03057] Avg episode reward: [(0, '22.324')]
+[2025-09-02 16:55:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10027008. Throughput: 0: 1034.1. Samples: 2512924. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:55:51,780][03057] Avg episode reward: [(0, '21.984')]
+[2025-09-02 16:55:56,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 10027008. Throughput: 0: 1010.6. Samples: 2515368. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:55:56,782][03057] Avg episode reward: [(0, '21.571')]
+[2025-09-02 16:56:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.2, 300 sec: 3998.8). Total num frames: 10027008. Throughput: 0: 1017.8. Samples: 2522132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:01,783][03057] Avg episode reward: [(0, '21.686')]
+[2025-09-02 16:56:06,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10092544. Throughput: 0: 1065.2. Samples: 2528984. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:06,779][03057] Avg episode reward: [(0, '22.407')]
+[2025-09-02 16:56:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10092544. Throughput: 0: 1035.5. Samples: 2531304. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:11,780][03057] Avg episode reward: [(0, '21.975')]
+[2025-09-02 16:56:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 10092544. Throughput: 0: 1000.3. Samples: 2536908. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:16,780][03057] Avg episode reward: [(0, '22.438')]
+[2025-09-02 16:56:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10158080. Throughput: 0: 1050.1. Samples: 2543948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:21,779][03057] Avg episode reward: [(0, '23.300')]
+[2025-09-02 16:56:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10158080. Throughput: 0: 1063.6. Samples: 2547204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:26,780][03057] Avg episode reward: [(0, '23.781')]
+[2025-09-02 16:56:26,785][03375] Saving new best policy, reward=23.781!
+[2025-09-02 16:56:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.6, 300 sec: 4221.0). Total num frames: 10158080. Throughput: 0: 1006.7. Samples: 2552096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:31,780][03057] Avg episode reward: [(0, '25.258')]
+[2025-09-02 16:56:31,789][03375] Saving new best policy, reward=25.258!
+[2025-09-02 16:56:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10158080. Throughput: 0: 1029.1. Samples: 2559232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:36,780][03057] Avg episode reward: [(0, '24.984')]
+[2025-09-02 16:56:41,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10223616. Throughput: 0: 1051.4. Samples: 2562676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:41,782][03057] Avg episode reward: [(0, '25.601')]
+[2025-09-02 16:56:41,784][03375] Saving new best policy, reward=25.601!
+[2025-09-02 16:56:46,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10223616. Throughput: 0: 1015.1. Samples: 2567812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:46,780][03057] Avg episode reward: [(0, '25.697')]
+[2025-09-02 16:56:46,792][03375] Saving new best policy, reward=25.697!
+[2025-09-02 16:56:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10223616. Throughput: 0: 1001.6. Samples: 2574056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:51,782][03057] Avg episode reward: [(0, '25.359')]
+[2025-09-02 16:56:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 10289152. Throughput: 0: 1024.4. Samples: 2577404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:56:56,780][03057] Avg episode reward: [(0, '24.204')]
+[2025-09-02 16:56:56,792][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_10289152.pth...
+[2025-09-02 16:56:56,912][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000142_9306112.pth
+[2025-09-02 16:57:01,780][03057] Fps is (10 sec: 6552.7, 60 sec: 4369.0, 300 sec: 3998.8). Total num frames: 10289152. Throughput: 0: 1036.4. Samples: 2583548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:01,783][03057] Avg episode reward: [(0, '25.081')]
+[2025-09-02 16:57:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10289152. Throughput: 0: 999.6. Samples: 2588928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:06,780][03057] Avg episode reward: [(0, '24.123')]
+[2025-09-02 16:57:11,778][03057] Fps is (10 sec: 6554.4, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10354688. Throughput: 0: 1002.9. Samples: 2592336. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:11,785][03057] Avg episode reward: [(0, '23.978')]
+[2025-09-02 16:57:16,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 3998.8). Total num frames: 10354688. Throughput: 0: 1051.7. Samples: 2599424. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:16,786][03057] Avg episode reward: [(0, '23.792')]
+[2025-09-02 16:57:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10354688. Throughput: 0: 1002.4. Samples: 2604340. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:21,783][03057] Avg episode reward: [(0, '22.996')]
+[2025-09-02 16:57:26,780][03057] Fps is (10 sec: 6554.2, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 10420224. Throughput: 0: 1002.3. Samples: 2607780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:26,782][03057] Avg episode reward: [(0, '23.790')]
+[2025-09-02 16:57:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10420224. Throughput: 0: 1049.4. Samples: 2615036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:31,780][03057] Avg episode reward: [(0, '22.965')]
+[2025-09-02 16:57:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10420224. Throughput: 0: 1034.2. Samples: 2620596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:36,784][03057] Avg episode reward: [(0, '22.871')]
+[2025-09-02 16:57:41,095][03390] Updated weights for policy 0, policy_version 160 (0.0014)
+[2025-09-02 16:57:41,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10485760. Throughput: 0: 1016.9. Samples: 2623164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:41,780][03057] Avg episode reward: [(0, '24.035')]
+[2025-09-02 16:57:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10485760. Throughput: 0: 1027.4. Samples: 2629780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:46,785][03057] Avg episode reward: [(0, '23.128')]
+[2025-09-02 16:57:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10485760. Throughput: 0: 1008.5. Samples: 2634312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:51,780][03057] Avg episode reward: [(0, '23.216')]
+[2025-09-02 16:57:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10485760. Throughput: 0: 976.4. Samples: 2636276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:57:56,782][03057] Avg episode reward: [(0, '22.664')]
+[2025-09-02 16:58:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 10551296. Throughput: 0: 945.1. Samples: 2641952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:01,784][03057] Avg episode reward: [(0, '22.877')]
+[2025-09-02 16:58:06,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10551296. Throughput: 0: 998.9. Samples: 2649292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:06,780][03057] Avg episode reward: [(0, '21.688')]
+[2025-09-02 16:58:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10551296. Throughput: 0: 990.4. Samples: 2652344. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:11,780][03057] Avg episode reward: [(0, '21.260')]
+[2025-09-02 16:58:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 10616832. Throughput: 0: 935.5. Samples: 2657132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:16,780][03057] Avg episode reward: [(0, '22.137')]
+[2025-09-02 16:58:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10616832. Throughput: 0: 967.8. Samples: 2664148. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:21,780][03057] Avg episode reward: [(0, '23.054')]
+[2025-09-02 16:58:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 3998.8). Total num frames: 10616832. Throughput: 0: 993.0. Samples: 2667848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:26,781][03057] Avg episode reward: [(0, '24.177')]
+[2025-09-02 16:58:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10616832. Throughput: 0: 960.6. Samples: 2673008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:31,782][03057] Avg episode reward: [(0, '25.129')]
+[2025-09-02 16:58:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10682368. Throughput: 0: 998.1. Samples: 2679228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:36,786][03057] Avg episode reward: [(0, '25.423')]
+[2025-09-02 16:58:41,780][03057] Fps is (10 sec: 6553.0, 60 sec: 3276.7, 300 sec: 3998.8). Total num frames: 10682368. Throughput: 0: 1036.1. Samples: 2682904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:41,784][03057] Avg episode reward: [(0, '24.823')]
+[2025-09-02 16:58:46,780][03057] Fps is (10 sec: 0.0, 60 sec: 3276.7, 300 sec: 3998.8). Total num frames: 10682368. Throughput: 0: 1039.6. Samples: 2688736. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:46,785][03057] Avg episode reward: [(0, '25.706')]
+[2025-09-02 16:58:46,798][03375] Saving new best policy, reward=25.706!
+[2025-09-02 16:58:51,778][03057] Fps is (10 sec: 6554.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10747904. Throughput: 0: 994.1. Samples: 2694028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:51,779][03057] Avg episode reward: [(0, '25.411')]
+[2025-09-02 16:58:56,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10747904. Throughput: 0: 1007.5. Samples: 2697680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:58:56,781][03057] Avg episode reward: [(0, '24.796')]
+[2025-09-02 16:58:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_10747904.pth...
+[2025-09-02 16:58:56,917][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000149_9764864.pth
+[2025-09-02 16:59:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10747904. Throughput: 0: 1055.6. Samples: 2704636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:01,784][03057] Avg episode reward: [(0, '24.485')]
+[2025-09-02 16:59:06,779][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10813440. Throughput: 0: 998.7. Samples: 2709088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:06,780][03057] Avg episode reward: [(0, '24.691')]
+[2025-09-02 16:59:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10813440. Throughput: 0: 997.8. Samples: 2712748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:11,784][03057] Avg episode reward: [(0, '24.893')]
+[2025-09-02 16:59:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 10813440. Throughput: 0: 1043.5. Samples: 2719964. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:16,782][03057] Avg episode reward: [(0, '24.730')]
+[2025-09-02 16:59:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10878976. Throughput: 0: 1013.8. Samples: 2724848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:21,780][03057] Avg episode reward: [(0, '24.486')]
+[2025-09-02 16:59:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 10878976. Throughput: 0: 993.5. Samples: 2727612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:26,780][03057] Avg episode reward: [(0, '24.814')]
+[2025-09-02 16:59:31,782][03057] Fps is (10 sec: 0.0, 60 sec: 4368.8, 300 sec: 3998.8). Total num frames: 10878976. Throughput: 0: 1029.8. Samples: 2735080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:31,784][03057] Avg episode reward: [(0, '24.435')]
+[2025-09-02 16:59:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 10944512. Throughput: 0: 1042.1. Samples: 2740924. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:59:36,783][03057] Avg episode reward: [(0, '23.962')]
+[2025-09-02 16:59:41,778][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.2, 300 sec: 3998.8). Total num frames: 10944512. Throughput: 0: 1014.7. Samples: 2743340. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:59:41,779][03057] Avg episode reward: [(0, '23.171')]
+[2025-09-02 16:59:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 3998.8). Total num frames: 10944512. Throughput: 0: 1010.7. Samples: 2750116. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 16:59:46,780][03057] Avg episode reward: [(0, '23.273')]
+[2025-09-02 16:59:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11010048. Throughput: 0: 1065.9. Samples: 2757052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:51,779][03057] Avg episode reward: [(0, '23.313')]
+[2025-09-02 16:59:56,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 3998.9). Total num frames: 11010048. Throughput: 0: 1037.7. Samples: 2759444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 16:59:56,781][03057] Avg episode reward: [(0, '23.049')]
+[2025-09-02 17:00:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 11010048. Throughput: 0: 1010.6. Samples: 2765440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:01,780][03057] Avg episode reward: [(0, '24.812')]
+[2025-09-02 17:00:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11075584. Throughput: 0: 1062.2. Samples: 2772648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:06,780][03057] Avg episode reward: [(0, '24.599')]
+[2025-09-02 17:00:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 11075584. Throughput: 0: 1070.6. Samples: 2775788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:11,780][03057] Avg episode reward: [(0, '23.908')]
+[2025-09-02 17:00:16,785][03057] Fps is (10 sec: 0.0, 60 sec: 4368.6, 300 sec: 3998.7). Total num frames: 11075584. Throughput: 0: 1014.8. Samples: 2780748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:16,786][03057] Avg episode reward: [(0, '22.831')]
+[2025-09-02 17:00:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 11075584. Throughput: 0: 1051.8. Samples: 2788256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:21,780][03057] Avg episode reward: [(0, '22.371')]
+[2025-09-02 17:00:22,500][03390] Updated weights for policy 0, policy_version 170 (0.0016)
+[2025-09-02 17:00:26,778][03057] Fps is (10 sec: 6557.9, 60 sec: 4369.1, 300 sec: 4221.1). Total num frames: 11141120. Throughput: 0: 1073.8. Samples: 2791660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:26,787][03057] Avg episode reward: [(0, '21.669')]
+[2025-09-02 17:00:31,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 11141120. Throughput: 0: 1037.9. Samples: 2796824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:31,782][03057] Avg episode reward: [(0, '22.591')]
+[2025-09-02 17:00:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 11141120. Throughput: 0: 1034.4. Samples: 2803600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:36,789][03057] Avg episode reward: [(0, '24.174')]
+[2025-09-02 17:00:41,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11206656. Throughput: 0: 1056.1. Samples: 2806968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:41,781][03057] Avg episode reward: [(0, '25.091')]
+[2025-09-02 17:00:46,780][03057] Fps is (10 sec: 6552.3, 60 sec: 4368.9, 300 sec: 3998.8). Total num frames: 11206656. Throughput: 0: 1056.4. Samples: 2812980. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:46,786][03057] Avg episode reward: [(0, '24.664')]
+[2025-09-02 17:00:51,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 11206656. Throughput: 0: 1024.5. Samples: 2818752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:51,780][03057] Avg episode reward: [(0, '24.343')]
+[2025-09-02 17:00:56,778][03057] Fps is (10 sec: 6554.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11272192. Throughput: 0: 1028.3. Samples: 2822060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:00:56,780][03057] Avg episode reward: [(0, '24.893')]
+[2025-09-02 17:00:56,786][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_11272192.pth...
+[2025-09-02 17:00:56,929][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000157_10289152.pth
+[2025-09-02 17:01:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 11272192. Throughput: 0: 1076.3. Samples: 2829176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:01:01,784][03057] Avg episode reward: [(0, '24.012')]
+[2025-09-02 17:01:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 11272192. Throughput: 0: 1016.9. Samples: 2834016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:01:06,781][03057] Avg episode reward: [(0, '23.817')]
+[2025-09-02 17:01:11,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 11337728. Throughput: 0: 1015.0. Samples: 2837336. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:01:11,782][03057] Avg episode reward: [(0, '23.582')]
+[2025-09-02 17:01:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.5, 300 sec: 3998.8). Total num frames: 11337728. Throughput: 0: 1063.4. Samples: 2844672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:01:16,788][03057] Avg episode reward: [(0, '23.366')]
+[2025-09-02 17:01:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 11337728. Throughput: 0: 1036.4. Samples: 2850236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:01:21,785][03057] Avg episode reward: [(0, '22.872')]
+[2025-09-02 17:01:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11403264. Throughput: 0: 1017.7. Samples: 2852764. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:01:26,780][03057] Avg episode reward: [(0, '23.231')]
+[2025-09-02 17:01:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 11403264. Throughput: 0: 1050.4. Samples: 2860244. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:01:31,779][03057] Avg episode reward: [(0, '24.417')]
+[2025-09-02 17:01:36,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 3998.8). Total num frames: 11403264. Throughput: 0: 1063.6. Samples: 2866616. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:01:36,789][03057] Avg episode reward: [(0, '24.875')]
+[2025-09-02 17:01:41,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 11468800. Throughput: 0: 1043.4. Samples: 2869012. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:01:41,783][03057] Avg episode reward: [(0, '24.819')]
+[2025-09-02 17:01:46,778][03057] Fps is (10 sec: 6555.3, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 11468800. Throughput: 0: 1030.8. Samples: 2875564. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:01:46,781][03057] Avg episode reward: [(0, '24.585')]
+[2025-09-02 17:01:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 11468800. Throughput: 0: 1087.9. Samples: 2882972. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:01:51,783][03057] Avg episode reward: [(0, '24.048')]
+[2025-09-02 17:01:56,784][03057] Fps is (10 sec: 6550.1, 60 sec: 4368.7, 300 sec: 4220.9). Total num frames: 11534336. Throughput: 0: 1066.4. Samples: 2885328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:01:56,785][03057] Avg episode reward: [(0, '23.574')]
+[2025-09-02 17:02:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11534336. Throughput: 0: 1045.2. Samples: 2891704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:02:01,780][03057] Avg episode reward: [(0, '24.335')]
+[2025-09-02 17:02:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 3998.8). Total num frames: 11534336. Throughput: 0: 1098.0. Samples: 2899648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:02:06,779][03057] Avg episode reward: [(0, '24.475')]
+[2025-09-02 17:02:11,780][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11599872. Throughput: 0: 1105.2. Samples: 2902496. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:02:11,781][03057] Avg episode reward: [(0, '25.184')]
+[2025-09-02 17:02:16,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11599872. Throughput: 0: 1058.8. Samples: 2907888. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:02:16,779][03057] Avg episode reward: [(0, '25.313')]
+[2025-09-02 17:02:21,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 3998.8). Total num frames: 11599872. Throughput: 0: 1093.4. Samples: 2915816. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:02:21,780][03057] Avg episode reward: [(0, '25.209')]
+[2025-09-02 17:02:26,784][03057] Fps is (10 sec: 6550.1, 60 sec: 4368.7, 300 sec: 4220.9). Total num frames: 11665408. Throughput: 0: 1129.2. Samples: 2919832. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:02:26,785][03057] Avg episode reward: [(0, '24.265')]
+[2025-09-02 17:02:31,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 11665408. Throughput: 0: 1097.4. Samples: 2924948. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:02:31,780][03057] Avg episode reward: [(0, '24.000')]
+[2025-09-02 17:02:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 3998.8). Total num frames: 11665408. Throughput: 0: 1094.8. Samples: 2932240. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:02:36,780][03057] Avg episode reward: [(0, '23.275')]
+[2025-09-02 17:02:41,780][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 11730944. Throughput: 0: 1112.6. Samples: 2935392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:02:41,784][03057] Avg episode reward: [(0, '23.857')]
+[2025-09-02 17:02:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11730944. Throughput: 0: 1062.0. Samples: 2939492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:02:46,782][03057] Avg episode reward: [(0, '22.243')]
+[2025-09-02 17:02:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11730944. Throughput: 0: 1010.6. Samples: 2945124. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:02:51,781][03057] Avg episode reward: [(0, '23.625')]
+[2025-09-02 17:02:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.1, 300 sec: 3998.8). Total num frames: 11730944. Throughput: 0: 1037.0. Samples: 2949160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:02:56,781][03057] Avg episode reward: [(0, '24.441')]
+[2025-09-02 17:02:56,926][03390] Updated weights for policy 0, policy_version 180 (0.0017)
+[2025-09-02 17:02:56,935][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_11796480.pth...
+[2025-09-02 17:02:57,061][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000164_10747904.pth
+[2025-09-02 17:03:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11796480. Throughput: 0: 1082.6. Samples: 2956604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:03:01,779][03057] Avg episode reward: [(0, '26.406')]
+[2025-09-02 17:03:01,781][03375] Saving new best policy, reward=26.406!
+[2025-09-02 17:03:06,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 11796480. Throughput: 0: 1020.3. Samples: 2961732. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:03:06,781][03057] Avg episode reward: [(0, '25.607')]
+[2025-09-02 17:03:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 3998.8). Total num frames: 11796480. Throughput: 0: 1012.3. Samples: 2965380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:03:11,784][03057] Avg episode reward: [(0, '26.065')]
+[2025-09-02 17:03:16,778][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11862016. Throughput: 0: 1066.2. Samples: 2972928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:03:16,780][03057] Avg episode reward: [(0, '25.801')]
+[2025-09-02 17:03:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11862016. Throughput: 0: 1036.3. Samples: 2978872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:03:21,780][03057] Avg episode reward: [(0, '25.205')]
+[2025-09-02 17:03:26,779][03057] Fps is (10 sec: 0.0, 60 sec: 3277.1, 300 sec: 4221.0). Total num frames: 11862016. Throughput: 0: 1026.5. Samples: 2981584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:03:26,787][03057] Avg episode reward: [(0, '23.348')]
+[2025-09-02 17:03:31,783][03057] Fps is (10 sec: 6550.2, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 11927552. Throughput: 0: 1105.0. Samples: 2989224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:03:31,785][03057] Avg episode reward: [(0, '23.348')]
+[2025-09-02 17:03:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11927552. Throughput: 0: 1134.6. Samples: 2996180. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:03:36,787][03057] Avg episode reward: [(0, '23.331')]
+[2025-09-02 17:03:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 4221.0). Total num frames: 11927552. Throughput: 0: 1102.3. Samples: 2998764. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:03:41,779][03057] Avg episode reward: [(0, '23.183')]
+[2025-09-02 17:03:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11993088. Throughput: 0: 1089.2. Samples: 3005620. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:03:46,785][03057] Avg episode reward: [(0, '23.509')]
+[2025-09-02 17:03:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 11993088. Throughput: 0: 1144.8. Samples: 3013244. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:03:51,779][03057] Avg episode reward: [(0, '24.452')]
+[2025-09-02 17:03:56,778][03057] Fps is (10 sec: 6553.5, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 12058624. Throughput: 0: 1122.9. Samples: 3015912. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:03:56,785][03057] Avg episode reward: [(0, '25.265')]
+[2025-09-02 17:04:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12058624. Throughput: 0: 1089.3. Samples: 3021948. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:04:01,785][03057] Avg episode reward: [(0, '26.117')]
+[2025-09-02 17:04:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 12058624. Throughput: 0: 1133.3. Samples: 3029872. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:04:06,784][03057] Avg episode reward: [(0, '25.334')]
+[2025-09-02 17:04:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 12124160. Throughput: 0: 1149.5. Samples: 3033312. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:04:11,786][03057] Avg episode reward: [(0, '26.141')]
+[2025-09-02 17:04:16,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 12124160. Throughput: 0: 1093.3. Samples: 3038416. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:04:16,780][03057] Avg episode reward: [(0, '24.515')]
+[2025-09-02 17:04:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12124160. Throughput: 0: 1113.0. Samples: 3046264. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:04:21,787][03057] Avg episode reward: [(0, '24.364')]
+[2025-09-02 17:04:26,778][03057] Fps is (10 sec: 6553.8, 60 sec: 5461.4, 300 sec: 4443.2). Total num frames: 12189696. Throughput: 0: 1139.6. Samples: 3050044. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:04:26,784][03057] Avg episode reward: [(0, '23.933')]
+[2025-09-02 17:04:31,782][03057] Fps is (10 sec: 6551.1, 60 sec: 4369.2, 300 sec: 4220.9). Total num frames: 12189696. Throughput: 0: 1107.4. Samples: 3055456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:04:31,783][03057] Avg episode reward: [(0, '25.629')]
+[2025-09-02 17:04:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12189696. Throughput: 0: 1097.6. Samples: 3062636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:04:36,780][03057] Avg episode reward: [(0, '25.734')]
+[2025-09-02 17:04:41,778][03057] Fps is (10 sec: 6556.1, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 12255232. Throughput: 0: 1123.7. Samples: 3066480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:04:41,785][03057] Avg episode reward: [(0, '25.138')]
+[2025-09-02 17:04:46,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 12255232. Throughput: 0: 1129.0. Samples: 3072752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:04:46,781][03057] Avg episode reward: [(0, '24.588')]
+[2025-09-02 17:04:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12255232. Throughput: 0: 1089.0. Samples: 3078876. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:04:51,785][03057] Avg episode reward: [(0, '23.761')]
+[2025-09-02 17:04:56,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12320768. Throughput: 0: 1099.6. Samples: 3082792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:04:56,787][03057] Avg episode reward: [(0, '22.579')]
+[2025-09-02 17:04:56,807][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_12320768.pth...
+[2025-09-02 17:04:56,958][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_11272192.pth
+[2025-09-02 17:05:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12320768. Throughput: 0: 1140.6. Samples: 3089744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:01,783][03057] Avg episode reward: [(0, '24.246')]
+[2025-09-02 17:05:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12320768. Throughput: 0: 1086.0. Samples: 3095132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:06,780][03057] Avg episode reward: [(0, '24.341')]
+[2025-09-02 17:05:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 12386304. Throughput: 0: 1091.5. Samples: 3099160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:11,779][03057] Avg episode reward: [(0, '24.562')]
+[2025-09-02 17:05:16,780][03057] Fps is (10 sec: 6552.3, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 12386304. Throughput: 0: 1140.5. Samples: 3106776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:16,782][03057] Avg episode reward: [(0, '24.276')]
+[2025-09-02 17:05:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12386304. Throughput: 0: 1102.0. Samples: 3112228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:21,780][03057] Avg episode reward: [(0, '22.428')]
+[2025-09-02 17:05:25,542][03390] Updated weights for policy 0, policy_version 190 (0.0018)
+[2025-09-02 17:05:26,778][03057] Fps is (10 sec: 6555.0, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 12451840. Throughput: 0: 1088.2. Samples: 3115448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:26,780][03057] Avg episode reward: [(0, '21.151')]
+[2025-09-02 17:05:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 12451840. Throughput: 0: 1120.5. Samples: 3123172. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:31,779][03057] Avg episode reward: [(0, '22.530')]
+[2025-09-02 17:05:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12451840. Throughput: 0: 1129.8. Samples: 3129716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:36,781][03057] Avg episode reward: [(0, '22.525')]
+[2025-09-02 17:05:41,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 12517376. Throughput: 0: 1101.8. Samples: 3132372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:41,781][03057] Avg episode reward: [(0, '22.569')]
+[2025-09-02 17:05:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12517376. Throughput: 0: 1108.7. Samples: 3139636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:46,785][03057] Avg episode reward: [(0, '24.554')]
+[2025-09-02 17:05:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12517376. Throughput: 0: 1151.9. Samples: 3146968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:51,780][03057] Avg episode reward: [(0, '25.266')]
+[2025-09-02 17:05:56,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12582912. Throughput: 0: 1113.7. Samples: 3149276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:05:56,780][03057] Avg episode reward: [(0, '25.318')]
+[2025-09-02 17:06:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12582912. Throughput: 0: 1093.2. Samples: 3155968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:01,780][03057] Avg episode reward: [(0, '24.961')]
+[2025-09-02 17:06:06,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 12582912. Throughput: 0: 1151.2. Samples: 3164032. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:06,787][03057] Avg episode reward: [(0, '25.884')]
+[2025-09-02 17:06:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12648448. Throughput: 0: 1136.3. Samples: 3166580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:11,780][03057] Avg episode reward: [(0, '25.071')]
+[2025-09-02 17:06:16,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 12648448. Throughput: 0: 1093.2. Samples: 3172368. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:16,779][03057] Avg episode reward: [(0, '24.365')]
+[2025-09-02 17:06:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12648448. Throughput: 0: 1122.0. Samples: 3180204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:21,782][03057] Avg episode reward: [(0, '23.912')]
+[2025-09-02 17:06:26,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 12713984. Throughput: 0: 1138.6. Samples: 3183608. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:26,783][03057] Avg episode reward: [(0, '23.235')]
+[2025-09-02 17:06:31,780][03057] Fps is (10 sec: 6552.6, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 12713984. Throughput: 0: 1093.1. Samples: 3188828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:31,789][03057] Avg episode reward: [(0, '23.029')]
+[2025-09-02 17:06:36,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 12713984. Throughput: 0: 1105.9. Samples: 3196732. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:36,780][03057] Avg episode reward: [(0, '22.492')]
+[2025-09-02 17:06:41,778][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12779520. Throughput: 0: 1131.2. Samples: 3200180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:41,779][03057] Avg episode reward: [(0, '22.945')]
+[2025-09-02 17:06:46,779][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 12779520. Throughput: 0: 1112.4. Samples: 3206028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:46,781][03057] Avg episode reward: [(0, '23.504')]
+[2025-09-02 17:06:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12779520. Throughput: 0: 1083.4. Samples: 3212784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:51,785][03057] Avg episode reward: [(0, '24.435')]
+[2025-09-02 17:06:56,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12845056. Throughput: 0: 1106.3. Samples: 3216364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:06:56,780][03057] Avg episode reward: [(0, '24.557')]
+[2025-09-02 17:06:56,793][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_12845056.pth...
+[2025-09-02 17:06:56,928][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000180_11796480.pth
+[2025-09-02 17:07:01,782][03057] Fps is (10 sec: 6551.0, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 12845056. Throughput: 0: 1126.9. Samples: 3223084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:01,784][03057] Avg episode reward: [(0, '25.771')]
+[2025-09-02 17:07:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12845056. Throughput: 0: 1079.7. Samples: 3228792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:06,780][03057] Avg episode reward: [(0, '26.501')]
+[2025-09-02 17:07:06,793][03375] Saving new best policy, reward=26.501!
+[2025-09-02 17:07:11,779][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 12910592. Throughput: 0: 1082.6. Samples: 3232324. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:11,784][03057] Avg episode reward: [(0, '26.440')]
+[2025-09-02 17:07:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12910592. Throughput: 0: 1140.4. Samples: 3240144. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:16,780][03057] Avg episode reward: [(0, '25.843')]
+[2025-09-02 17:07:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12910592. Throughput: 0: 1080.2. Samples: 3245340. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:21,784][03057] Avg episode reward: [(0, '26.362')]
+[2025-09-02 17:07:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 12976128. Throughput: 0: 1074.2. Samples: 3248520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:26,779][03057] Avg episode reward: [(0, '26.240')]
+[2025-09-02 17:07:31,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 12976128. Throughput: 0: 1122.0. Samples: 3256516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:31,780][03057] Avg episode reward: [(0, '26.441')]
+[2025-09-02 17:07:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 12976128. Throughput: 0: 1107.1. Samples: 3262604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:36,783][03057] Avg episode reward: [(0, '26.343')]
+[2025-09-02 17:07:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13041664. Throughput: 0: 1077.7. Samples: 3264860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:41,780][03057] Avg episode reward: [(0, '25.470')]
+[2025-09-02 17:07:46,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13041664. Throughput: 0: 1104.1. Samples: 3272764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:46,779][03057] Avg episode reward: [(0, '26.167')]
+[2025-09-02 17:07:50,752][03390] Updated weights for policy 0, policy_version 200 (0.0022)
+[2025-09-02 17:07:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4665.3). Total num frames: 13107200. Throughput: 0: 1126.5. Samples: 3279484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:51,784][03057] Avg episode reward: [(0, '26.279')]
+[2025-09-02 17:07:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13107200. Throughput: 0: 1103.8. Samples: 3281996. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:07:56,780][03057] Avg episode reward: [(0, '26.515')]
+[2025-09-02 17:07:56,793][03375] Saving new best policy, reward=26.515!
+[2025-09-02 17:08:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.4, 300 sec: 4443.1). Total num frames: 13107200. Throughput: 0: 1084.6. Samples: 3288952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:08:01,782][03057] Avg episode reward: [(0, '27.165')]
+[2025-09-02 17:08:01,785][03375] Saving new best policy, reward=27.165!
+[2025-09-02 17:08:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4665.3). Total num frames: 13172736. Throughput: 0: 1136.4. Samples: 3296480. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:08:06,785][03057] Avg episode reward: [(0, '27.707')]
+[2025-09-02 17:08:06,798][03375] Saving new best policy, reward=27.707!
+[2025-09-02 17:08:11,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13172736. Throughput: 0: 1122.7. Samples: 3299044. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:08:11,780][03057] Avg episode reward: [(0, '29.787')]
+[2025-09-02 17:08:11,782][03375] Saving new best policy, reward=29.787!
+[2025-09-02 17:08:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13172736. Throughput: 0: 1080.3. Samples: 3305128. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:08:16,781][03057] Avg episode reward: [(0, '29.085')]
+[2025-09-02 17:08:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13172736. Throughput: 0: 1116.4. Samples: 3312840. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:08:21,783][03057] Avg episode reward: [(0, '27.788')]
+[2025-09-02 17:08:26,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4443.2). Total num frames: 13238272. Throughput: 0: 1139.8. Samples: 3316152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:08:26,785][03057] Avg episode reward: [(0, '27.843')]
+[2025-09-02 17:08:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13238272. Throughput: 0: 1081.9. Samples: 3321448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:08:31,784][03057] Avg episode reward: [(0, '27.311')]
+[2025-09-02 17:08:36,778][03057] Fps is (10 sec: 6554.0, 60 sec: 5461.3, 300 sec: 4665.3). Total num frames: 13303808. Throughput: 0: 1101.9. Samples: 3329068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:08:36,788][03057] Avg episode reward: [(0, '27.545')]
+[2025-09-02 17:08:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13303808. Throughput: 0: 1132.9. Samples: 3332976.
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:08:41,781][03057] Avg episode reward: [(0, '27.379')] +[2025-09-02 17:08:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13303808. Throughput: 0: 1105.3. Samples: 3338692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:08:46,782][03057] Avg episode reward: [(0, '28.706')] +[2025-09-02 17:08:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13369344. Throughput: 0: 1086.3. Samples: 3345364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:08:51,779][03057] Avg episode reward: [(0, '27.756')] +[2025-09-02 17:08:56,782][03057] Fps is (10 sec: 6551.1, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 13369344. Throughput: 0: 1119.5. Samples: 3349424. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:08:56,783][03057] Avg episode reward: [(0, '27.434')] +[2025-09-02 17:08:56,794][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000204_13369344.pth... +[2025-09-02 17:08:56,951][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_12320768.pth +[2025-09-02 17:09:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13369344. Throughput: 0: 1128.9. Samples: 3355928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:09:01,785][03057] Avg episode reward: [(0, '25.571')] +[2025-09-02 17:09:06,778][03057] Fps is (10 sec: 6556.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13434880. Throughput: 0: 1089.6. Samples: 3361872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:09:06,780][03057] Avg episode reward: [(0, '25.167')] +[2025-09-02 17:09:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13434880. Throughput: 0: 1103.8. Samples: 3365824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:09:11,780][03057] Avg episode reward: [(0, '23.250')] +[2025-09-02 17:09:16,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 13434880. Throughput: 0: 1148.7. Samples: 3373140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:09:16,782][03057] Avg episode reward: [(0, '25.171')] +[2025-09-02 17:09:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 13500416. Throughput: 0: 1092.8. Samples: 3378244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:09:21,780][03057] Avg episode reward: [(0, '24.967')] +[2025-09-02 17:09:26,779][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 13500416. Throughput: 0: 1090.7. Samples: 3382060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:09:26,780][03057] Avg episode reward: [(0, '25.136')] +[2025-09-02 17:09:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 13500416. Throughput: 0: 1142.1. Samples: 3390088. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:09:31,781][03057] Avg episode reward: [(0, '25.808')] +[2025-09-02 17:09:36,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13565952. Throughput: 0: 1108.6. Samples: 3395252. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:09:36,783][03057] Avg episode reward: [(0, '24.890')] +[2025-09-02 17:09:41,778][03057] Fps is (10 sec: 6554.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13565952. Throughput: 0: 1092.0. 
Samples: 3398560. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:09:41,780][03057] Avg episode reward: [(0, '24.149')] +[2025-09-02 17:09:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13565952. Throughput: 0: 1124.7. Samples: 3406540. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:09:46,779][03057] Avg episode reward: [(0, '24.873')] +[2025-09-02 17:09:51,779][03057] Fps is (10 sec: 6553.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 13631488. Throughput: 0: 1127.3. Samples: 3412600. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:09:51,783][03057] Avg episode reward: [(0, '24.405')] +[2025-09-02 17:09:56,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 13631488. Throughput: 0: 1095.5. Samples: 3415124. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:09:56,781][03057] Avg episode reward: [(0, '25.337')] +[2025-09-02 17:10:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13631488. Throughput: 0: 1104.1. Samples: 3422820. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:10:01,780][03057] Avg episode reward: [(0, '24.904')] +[2025-09-02 17:10:06,778][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13697024. Throughput: 0: 1149.0. Samples: 3429948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:06,784][03057] Avg episode reward: [(0, '24.150')] +[2025-09-02 17:10:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 13697024. Throughput: 0: 1120.2. Samples: 3432468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:11,779][03057] Avg episode reward: [(0, '23.468')] +[2025-09-02 17:10:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 13697024. Throughput: 0: 1088.3. Samples: 3439060. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:16,780][03057] Avg episode reward: [(0, '25.111')] +[2025-09-02 17:10:19,851][03390] Updated weights for policy 0, policy_version 210 (0.0023) +[2025-09-02 17:10:21,779][03057] Fps is (10 sec: 6552.9, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 13762560. Throughput: 0: 1139.3. Samples: 3446520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:21,782][03057] Avg episode reward: [(0, '27.139')] +[2025-09-02 17:10:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13762560. Throughput: 0: 1130.7. Samples: 3449440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:26,780][03057] Avg episode reward: [(0, '27.455')] +[2025-09-02 17:10:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13762560. Throughput: 0: 1076.4. Samples: 3454976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:31,779][03057] Avg episode reward: [(0, '27.762')] +[2025-09-02 17:10:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13828096. Throughput: 0: 1103.1. Samples: 3462240. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:36,780][03057] Avg episode reward: [(0, '29.530')] +[2025-09-02 17:10:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13828096. Throughput: 0: 1130.8. Samples: 3466008. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:41,784][03057] Avg episode reward: [(0, '28.932')] +[2025-09-02 17:10:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13828096. Throughput: 0: 1070.6. Samples: 3470996. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:46,780][03057] Avg episode reward: [(0, '27.992')] +[2025-09-02 17:10:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13893632. Throughput: 0: 1063.3. Samples: 3477796. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:51,780][03057] Avg episode reward: [(0, '26.765')] +[2025-09-02 17:10:56,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 13893632. Throughput: 0: 1087.6. Samples: 3481412. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:10:56,784][03057] Avg episode reward: [(0, '26.965')] +[2025-09-02 17:10:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000212_13893632.pth... +[2025-09-02 17:10:56,960][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000196_12845056.pth +[2025-09-02 17:11:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13893632. Throughput: 0: 1073.4. Samples: 3487364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:01,780][03057] Avg episode reward: [(0, '26.290')] +[2025-09-02 17:11:06,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13959168. Throughput: 0: 1038.0. Samples: 3493228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:06,788][03057] Avg episode reward: [(0, '26.119')] +[2025-09-02 17:11:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 13959168. Throughput: 0: 1059.0. Samples: 3497096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:11,785][03057] Avg episode reward: [(0, '25.973')] +[2025-09-02 17:11:16,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 13959168. Throughput: 0: 1091.6. Samples: 3504100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:16,782][03057] Avg episode reward: [(0, '25.342')] +[2025-09-02 17:11:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 4221.0). Total num frames: 13959168. Throughput: 0: 1040.4. Samples: 3509056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:21,781][03057] Avg episode reward: [(0, '25.889')] +[2025-09-02 17:11:26,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14024704. Throughput: 0: 1033.0. Samples: 3512492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:26,785][03057] Avg episode reward: [(0, '25.786')] +[2025-09-02 17:11:31,779][03057] Fps is (10 sec: 6552.8, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 14024704. Throughput: 0: 1092.5. Samples: 3520160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:31,781][03057] Avg episode reward: [(0, '24.676')] +[2025-09-02 17:11:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 14024704. Throughput: 0: 1060.7. Samples: 3525528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:36,780][03057] Avg episode reward: [(0, '23.507')] +[2025-09-02 17:11:41,778][03057] Fps is (10 sec: 6554.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14090240. Throughput: 0: 1042.3. 
Samples: 3528316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:41,780][03057] Avg episode reward: [(0, '23.679')] +[2025-09-02 17:11:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14090240. Throughput: 0: 1085.0. Samples: 3536188. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:46,781][03057] Avg episode reward: [(0, '22.462')] +[2025-09-02 17:11:51,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 14155776. Throughput: 0: 1097.1. Samples: 3542600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:51,781][03057] Avg episode reward: [(0, '23.631')] +[2025-09-02 17:11:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 14155776. Throughput: 0: 1065.2. Samples: 3545032. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:11:56,780][03057] Avg episode reward: [(0, '23.831')] +[2025-09-02 17:12:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14155776. Throughput: 0: 1071.9. Samples: 3552332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:01,780][03057] Avg episode reward: [(0, '25.021')] +[2025-09-02 17:12:06,782][03057] Fps is (10 sec: 6551.1, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 14221312. Throughput: 0: 1120.5. Samples: 3559484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:06,784][03057] Avg episode reward: [(0, '26.998')] +[2025-09-02 17:12:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14221312. Throughput: 0: 1103.4. Samples: 3562144. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:11,783][03057] Avg episode reward: [(0, '27.539')] +[2025-09-02 17:12:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 14221312. Throughput: 0: 1068.9. Samples: 3568260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:16,779][03057] Avg episode reward: [(0, '27.029')] +[2025-09-02 17:12:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 14286848. Throughput: 0: 1115.0. Samples: 3575704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:21,782][03057] Avg episode reward: [(0, '27.504')] +[2025-09-02 17:12:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14286848. Throughput: 0: 1129.8. Samples: 3579156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:26,785][03057] Avg episode reward: [(0, '27.282')] +[2025-09-02 17:12:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 14286848. Throughput: 0: 1065.7. Samples: 3584144. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:31,780][03057] Avg episode reward: [(0, '27.193')] +[2025-09-02 17:12:36,780][03057] Fps is (10 sec: 6552.5, 60 sec: 5461.2, 300 sec: 4443.1). Total num frames: 14352384. Throughput: 0: 1089.4. Samples: 3591624. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:36,781][03057] Avg episode reward: [(0, '26.608')] +[2025-09-02 17:12:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14352384. Throughput: 0: 1120.9. Samples: 3595472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:41,783][03057] Avg episode reward: [(0, '27.073')] +[2025-09-02 17:12:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). 
Total num frames: 14352384. Throughput: 0: 1084.6. Samples: 3601140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:46,781][03057] Avg episode reward: [(0, '26.549')] +[2025-09-02 17:12:51,323][03390] Updated weights for policy 0, policy_version 220 (0.0024) +[2025-09-02 17:12:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14417920. Throughput: 0: 1066.9. Samples: 3607492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:51,780][03057] Avg episode reward: [(0, '25.253')] +[2025-09-02 17:12:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14417920. Throughput: 0: 1093.4. Samples: 3611348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:12:56,780][03057] Avg episode reward: [(0, '25.869')] +[2025-09-02 17:12:56,785][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000220_14417920.pth... +[2025-09-02 17:12:56,908][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000204_13369344.pth +[2025-09-02 17:13:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 14417920. Throughput: 0: 1100.9. Samples: 3617800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:01,787][03057] Avg episode reward: [(0, '26.357')] +[2025-09-02 17:13:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 4221.0). Total num frames: 14417920. Throughput: 0: 1062.6. Samples: 3623520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:06,779][03057] Avg episode reward: [(0, '26.443')] +[2025-09-02 17:13:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14483456. Throughput: 0: 1066.9. Samples: 3627168. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:11,779][03057] Avg episode reward: [(0, '27.532')] +[2025-09-02 17:13:16,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 14483456. Throughput: 0: 1127.8. Samples: 3634896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:16,780][03057] Avg episode reward: [(0, '28.086')] +[2025-09-02 17:13:21,780][03057] Fps is (10 sec: 0.0, 60 sec: 3276.7, 300 sec: 4221.0). Total num frames: 14483456. Throughput: 0: 1081.1. Samples: 3640272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:21,785][03057] Avg episode reward: [(0, '28.526')] +[2025-09-02 17:13:26,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14548992. Throughput: 0: 1068.3. Samples: 3643544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:26,788][03057] Avg episode reward: [(0, '26.322')] +[2025-09-02 17:13:31,778][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 14548992. Throughput: 0: 1116.4. Samples: 3651380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:31,780][03057] Avg episode reward: [(0, '26.659')] +[2025-09-02 17:13:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 14614528. Throughput: 0: 1098.8. Samples: 3656940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:13:36,780][03057] Avg episode reward: [(0, '27.341')] +[2025-09-02 17:13:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14614528. Throughput: 0: 1074.9. Samples: 3659720. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:13:41,779][03057] Avg episode reward: [(0, '26.667')] +[2025-09-02 17:13:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 14614528. Throughput: 0: 1105.5. Samples: 3667548. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:13:46,780][03057] Avg episode reward: [(0, '26.600')] +[2025-09-02 17:13:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 14680064. Throughput: 0: 1114.7. Samples: 3673680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:51,781][03057] Avg episode reward: [(0, '27.252')] +[2025-09-02 17:13:56,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14680064. Throughput: 0: 1088.7. Samples: 3676160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:13:56,780][03057] Avg episode reward: [(0, '25.812')] +[2025-09-02 17:14:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 14680064. Throughput: 0: 1075.7. Samples: 3683300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:01,780][03057] Avg episode reward: [(0, '24.293')] +[2025-09-02 17:14:06,778][03057] Fps is (10 sec: 6553.8, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 14745600. Throughput: 0: 1114.3. Samples: 3690416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:06,781][03057] Avg episode reward: [(0, '24.843')] +[2025-09-02 17:14:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 14745600. Throughput: 0: 1099.8. Samples: 3693036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:11,779][03057] Avg episode reward: [(0, '25.313')] +[2025-09-02 17:14:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 14745600. Throughput: 0: 1060.1. Samples: 3699084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:16,780][03057] Avg episode reward: [(0, '25.235')] +[2025-09-02 17:14:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.5, 300 sec: 4443.1). Total num frames: 14811136. Throughput: 0: 1094.3. Samples: 3706184. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:21,779][03057] Avg episode reward: [(0, '25.073')] +[2025-09-02 17:14:26,782][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14811136. Throughput: 0: 1104.6. Samples: 3709428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:26,783][03057] Avg episode reward: [(0, '25.679')] +[2025-09-02 17:14:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 14811136. Throughput: 0: 1042.8. Samples: 3714472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:31,779][03057] Avg episode reward: [(0, '24.959')] +[2025-09-02 17:14:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 14811136. Throughput: 0: 1068.0. Samples: 3721740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:36,781][03057] Avg episode reward: [(0, '23.892')] +[2025-09-02 17:14:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14876672. Throughput: 0: 1093.8. Samples: 3725380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:41,784][03057] Avg episode reward: [(0, '24.632')] +[2025-09-02 17:14:46,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 14876672. 
Throughput: 0: 1057.1. Samples: 3730868. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:46,780][03057] Avg episode reward: [(0, '26.364')] +[2025-09-02 17:14:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 14876672. Throughput: 0: 1049.9. Samples: 3737660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:51,780][03057] Avg episode reward: [(0, '26.542')] +[2025-09-02 17:14:56,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 14942208. Throughput: 0: 1075.6. Samples: 3741440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:14:56,779][03057] Avg episode reward: [(0, '26.504')] +[2025-09-02 17:14:56,785][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000228_14942208.pth... +[2025-09-02 17:14:56,923][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000212_13893632.pth +[2025-09-02 17:15:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 14942208. Throughput: 0: 1083.0. Samples: 3747820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:01,784][03057] Avg episode reward: [(0, '25.103')] +[2025-09-02 17:15:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 14942208. Throughput: 0: 1061.2. Samples: 3753940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:06,789][03057] Avg episode reward: [(0, '22.204')] +[2025-09-02 17:15:11,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 15007744. Throughput: 0: 1066.6. Samples: 3757424. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:11,780][03057] Avg episode reward: [(0, '21.936')] +[2025-09-02 17:15:16,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15007744. Throughput: 0: 1121.0. Samples: 3764916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:16,780][03057] Avg episode reward: [(0, '20.680')] +[2025-09-02 17:15:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 15007744. Throughput: 0: 1071.1. Samples: 3769940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:21,784][03057] Avg episode reward: [(0, '22.133')] +[2025-09-02 17:15:23,901][03390] Updated weights for policy 0, policy_version 230 (0.0012) +[2025-09-02 17:15:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15073280. Throughput: 0: 1065.1. Samples: 3773308. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:26,780][03057] Avg episode reward: [(0, '23.636')] +[2025-09-02 17:15:31,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 15073280. Throughput: 0: 1109.9. Samples: 3780812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:31,788][03057] Avg episode reward: [(0, '24.265')] +[2025-09-02 17:15:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15073280. Throughput: 0: 1087.6. Samples: 3786600. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:36,782][03057] Avg episode reward: [(0, '26.219')] +[2025-09-02 17:15:41,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15138816. Throughput: 0: 1059.8. Samples: 3789132. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:41,788][03057] Avg episode reward: [(0, '27.746')] +[2025-09-02 17:15:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15138816. Throughput: 0: 1091.3. Samples: 3796928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:46,780][03057] Avg episode reward: [(0, '28.336')] +[2025-09-02 17:15:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15138816. Throughput: 0: 1104.1. Samples: 3803624. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:51,780][03057] Avg episode reward: [(0, '27.146')] +[2025-09-02 17:15:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15204352. Throughput: 0: 1075.4. Samples: 3805816. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:15:56,780][03057] Avg episode reward: [(0, '26.507')] +[2025-09-02 17:16:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15204352. Throughput: 0: 1070.8. Samples: 3813100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:01,780][03057] Avg episode reward: [(0, '25.624')] +[2025-09-02 17:16:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15204352. Throughput: 0: 1121.7. Samples: 3820416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:06,781][03057] Avg episode reward: [(0, '25.608')] +[2025-09-02 17:16:11,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 15269888. Throughput: 0: 1098.0. Samples: 3822720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:11,781][03057] Avg episode reward: [(0, '24.435')] +[2025-09-02 17:16:16,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15269888. Throughput: 0: 1081.3. Samples: 3829468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:16,780][03057] Avg episode reward: [(0, '24.862')] +[2025-09-02 17:16:21,778][03057] Fps is (10 sec: 6554.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 15335424. Throughput: 0: 1123.5. Samples: 3837156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:21,780][03057] Avg episode reward: [(0, '26.240')] +[2025-09-02 17:16:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15335424. Throughput: 0: 1135.1. Samples: 3840212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:26,780][03057] Avg episode reward: [(0, '25.360')] +[2025-09-02 17:16:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15335424. Throughput: 0: 1086.0. Samples: 3845796. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:31,779][03057] Avg episode reward: [(0, '24.766')] +[2025-09-02 17:16:36,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 15335424. Throughput: 0: 1107.7. Samples: 3853472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:36,783][03057] Avg episode reward: [(0, '25.583')] +[2025-09-02 17:16:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15400960. Throughput: 0: 1142.4. Samples: 3857224. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:41,779][03057] Avg episode reward: [(0, '26.306')] +[2025-09-02 17:16:46,778][03057] Fps is (10 sec: 6555.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15400960. 
Throughput: 0: 1097.9. Samples: 3862504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:46,779][03057] Avg episode reward: [(0, '25.699')] +[2025-09-02 17:16:51,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 15400960. Throughput: 0: 1097.8. Samples: 3869816. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:51,780][03057] Avg episode reward: [(0, '27.992')] +[2025-09-02 17:16:56,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 15466496. Throughput: 0: 1128.2. Samples: 3873488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:16:56,781][03057] Avg episode reward: [(0, '27.780')] +[2025-09-02 17:16:56,792][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000236_15466496.pth... +[2025-09-02 17:16:56,934][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000220_14417920.pth +[2025-09-02 17:17:01,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15466496. Throughput: 0: 1106.1. Samples: 3879244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:01,781][03057] Avg episode reward: [(0, '28.726')] +[2025-09-02 17:17:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15466496. Throughput: 0: 1068.7. Samples: 3885248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:06,789][03057] Avg episode reward: [(0, '28.525')] +[2025-09-02 17:17:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 15532032. Throughput: 0: 1080.4. Samples: 3888832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:11,780][03057] Avg episode reward: [(0, '29.179')] +[2025-09-02 17:17:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15532032. Throughput: 0: 1117.2. Samples: 3896072. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:16,780][03057] Avg episode reward: [(0, '27.061')] +[2025-09-02 17:17:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 15532032. Throughput: 0: 1068.2. Samples: 3901540. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:21,780][03057] Avg episode reward: [(0, '27.385')] +[2025-09-02 17:17:26,780][03057] Fps is (10 sec: 6552.6, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 15597568. Throughput: 0: 1068.1. Samples: 3905288. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:26,781][03057] Avg episode reward: [(0, '25.803')] +[2025-09-02 17:17:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15597568. Throughput: 0: 1125.0. Samples: 3913128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:31,786][03057] Avg episode reward: [(0, '23.704')] +[2025-09-02 17:17:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 15597568. Throughput: 0: 1081.1. Samples: 3918464. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:36,780][03057] Avg episode reward: [(0, '22.368')] +[2025-09-02 17:17:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15663104. Throughput: 0: 1068.8. Samples: 3921584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:41,781][03057] Avg episode reward: [(0, '22.537')] +[2025-09-02 17:17:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). 
Total num frames: 15663104. Throughput: 0: 1113.7. Samples: 3929360. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:46,780][03057] Avg episode reward: [(0, '23.443')] +[2025-09-02 17:17:50,755][03390] Updated weights for policy 0, policy_version 240 (0.0026) +[2025-09-02 17:17:51,779][03057] Fps is (10 sec: 6553.3, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 15728640. Throughput: 0: 1113.5. Samples: 3935356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:51,780][03057] Avg episode reward: [(0, '24.209')] +[2025-09-02 17:17:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 15728640. Throughput: 0: 1090.9. Samples: 3937924. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:17:56,779][03057] Avg episode reward: [(0, '24.947')] +[2025-09-02 17:18:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15728640. Throughput: 0: 1102.4. Samples: 3945680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:01,786][03057] Avg episode reward: [(0, '27.275')] +[2025-09-02 17:18:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 15794176. Throughput: 0: 1129.1. Samples: 3952348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:06,784][03057] Avg episode reward: [(0, '26.631')] +[2025-09-02 17:18:11,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 15794176. Throughput: 0: 1104.7. Samples: 3955000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:11,781][03057] Avg episode reward: [(0, '25.834')] +[2025-09-02 17:18:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15794176. Throughput: 0: 1089.2. Samples: 3962140. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:16,787][03057] Avg episode reward: [(0, '25.231')] +[2025-09-02 17:18:21,779][03057] Fps is (10 sec: 6554.4, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 15859712. Throughput: 0: 1134.4. Samples: 3969512. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:18:21,780][03057] Avg episode reward: [(0, '24.470')] +[2025-09-02 17:18:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 15859712. Throughput: 0: 1123.5. Samples: 3972140. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:18:26,785][03057] Avg episode reward: [(0, '24.899')] +[2025-09-02 17:18:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15859712. Throughput: 0: 1081.1. Samples: 3978008. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:18:31,784][03057] Avg episode reward: [(0, '25.363')] +[2025-09-02 17:18:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 15925248. Throughput: 0: 1106.9. Samples: 3985168. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:36,781][03057] Avg episode reward: [(0, '25.916')] +[2025-09-02 17:18:41,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15925248. Throughput: 0: 1124.8. Samples: 3988540. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:41,782][03057] Avg episode reward: [(0, '25.218')] +[2025-09-02 17:18:46,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 15925248. Throughput: 0: 1064.4. Samples: 3993580. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:46,790][03057] Avg episode reward: [(0, '26.721')] +[2025-09-02 17:18:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15990784. Throughput: 0: 1081.5. Samples: 4001016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:51,787][03057] Avg episode reward: [(0, '27.216')] +[2025-09-02 17:18:56,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 15990784. Throughput: 0: 1108.2. Samples: 4004868. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:18:56,780][03057] Avg episode reward: [(0, '27.891')] +[2025-09-02 17:18:56,786][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000244_15990784.pth... +[2025-09-02 17:18:56,956][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000228_14942208.pth +[2025-09-02 17:19:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 15990784. Throughput: 0: 1070.9. Samples: 4010332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:01,782][03057] Avg episode reward: [(0, '27.859')] +[2025-09-02 17:19:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 15990784. Throughput: 0: 1047.2. Samples: 4016636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:06,795][03057] Avg episode reward: [(0, '26.636')] +[2025-09-02 17:19:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 16056320. Throughput: 0: 1072.1. Samples: 4020384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:11,784][03057] Avg episode reward: [(0, '26.461')] +[2025-09-02 17:19:16,779][03057] Fps is (10 sec: 6553.0, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 16056320. Throughput: 0: 1096.3. Samples: 4027344. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:16,781][03057] Avg episode reward: [(0, '27.900')] +[2025-09-02 17:19:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 16056320. Throughput: 0: 1067.7. Samples: 4033216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:21,779][03057] Avg episode reward: [(0, '27.761')] +[2025-09-02 17:19:26,778][03057] Fps is (10 sec: 6554.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16121856. Throughput: 0: 1070.8. Samples: 4036724. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:26,779][03057] Avg episode reward: [(0, '27.999')] +[2025-09-02 17:19:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16121856. Throughput: 0: 1132.2. Samples: 4044528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:31,780][03057] Avg episode reward: [(0, '27.457')] +[2025-09-02 17:19:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 16121856. Throughput: 0: 1081.2. Samples: 4049672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:36,785][03057] Avg episode reward: [(0, '26.690')] +[2025-09-02 17:19:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16187392. Throughput: 0: 1066.8. Samples: 4052872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:41,779][03057] Avg episode reward: [(0, '26.711')] +[2025-09-02 17:19:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 16187392. Throughput: 0: 1125.0. 
Samples: 4060956. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:46,782][03057] Avg episode reward: [(0, '26.371')] +[2025-09-02 17:19:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 16187392. Throughput: 0: 1112.3. Samples: 4066688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:51,784][03057] Avg episode reward: [(0, '26.205')] +[2025-09-02 17:19:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16252928. Throughput: 0: 1089.9. Samples: 4069428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:19:56,779][03057] Avg episode reward: [(0, '27.600')] +[2025-09-02 17:20:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16252928. Throughput: 0: 1114.9. Samples: 4077512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:01,787][03057] Avg episode reward: [(0, '28.653')] +[2025-09-02 17:20:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 16318464. Throughput: 0: 1124.6. Samples: 4083824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:06,782][03057] Avg episode reward: [(0, '28.690')] +[2025-09-02 17:20:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16318464. Throughput: 0: 1104.2. Samples: 4086412. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:11,786][03057] Avg episode reward: [(0, '28.335')] +[2025-09-02 17:20:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16318464. Throughput: 0: 1090.9. Samples: 4093620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:16,780][03057] Avg episode reward: [(0, '28.721')] +[2025-09-02 17:20:20,302][03390] Updated weights for policy 0, policy_version 250 (0.0015) +[2025-09-02 17:20:21,782][03057] Fps is (10 sec: 6550.9, 60 sec: 5460.9, 300 sec: 4443.1). Total num frames: 16384000. Throughput: 0: 1142.7. Samples: 4101100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:21,784][03057] Avg episode reward: [(0, '28.012')] +[2025-09-02 17:20:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16384000. Throughput: 0: 1131.2. Samples: 4103776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:26,781][03057] Avg episode reward: [(0, '29.419')] +[2025-09-02 17:20:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16384000. Throughput: 0: 1088.8. Samples: 4109952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:31,779][03057] Avg episode reward: [(0, '28.691')] +[2025-09-02 17:20:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 16449536. Throughput: 0: 1124.8. Samples: 4117304. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:36,780][03057] Avg episode reward: [(0, '29.139')] +[2025-09-02 17:20:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16449536. Throughput: 0: 1137.2. Samples: 4120604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:41,780][03057] Avg episode reward: [(0, '29.819')] +[2025-09-02 17:20:41,782][03375] Saving new best policy, reward=29.819! +[2025-09-02 17:20:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16449536. Throughput: 0: 1078.0. Samples: 4126024. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:46,780][03057] Avg episode reward: [(0, '30.399')] +[2025-09-02 17:20:46,796][03375] Saving new best policy, reward=30.399! +[2025-09-02 17:20:51,778][03057] Fps is (10 sec: 6553.4, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 16515072. Throughput: 0: 1102.0. Samples: 4133416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:51,780][03057] Avg episode reward: [(0, '30.045')] +[2025-09-02 17:20:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16515072. Throughput: 0: 1133.3. Samples: 4137412. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:20:56,780][03057] Avg episode reward: [(0, '30.215')] +[2025-09-02 17:20:56,794][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_16515072.pth... +[2025-09-02 17:20:56,936][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000236_15466496.pth +[2025-09-02 17:21:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16515072. Throughput: 0: 1093.2. Samples: 4142812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:01,786][03057] Avg episode reward: [(0, '29.331')] +[2025-09-02 17:21:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16580608. Throughput: 0: 1077.3. Samples: 4149576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:06,780][03057] Avg episode reward: [(0, '28.881')] +[2025-09-02 17:21:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16580608. Throughput: 0: 1106.0. Samples: 4153548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:11,780][03057] Avg episode reward: [(0, '26.783')] +[2025-09-02 17:21:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 16580608. Throughput: 0: 1111.2. Samples: 4159956. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:16,782][03057] Avg episode reward: [(0, '27.447')] +[2025-09-02 17:21:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.4, 300 sec: 4443.1). Total num frames: 16646144. Throughput: 0: 1081.6. Samples: 4165976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:21,780][03057] Avg episode reward: [(0, '25.853')] +[2025-09-02 17:21:26,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16646144. Throughput: 0: 1095.4. Samples: 4169896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:26,780][03057] Avg episode reward: [(0, '26.858')] +[2025-09-02 17:21:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4443.2). Total num frames: 16646144. Throughput: 0: 1139.9. Samples: 4177320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:31,782][03057] Avg episode reward: [(0, '25.274')] +[2025-09-02 17:21:36,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16711680. Throughput: 0: 1081.4. Samples: 4182080. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:21:36,780][03057] Avg episode reward: [(0, '26.376')] +[2025-09-02 17:21:41,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16711680. Throughput: 0: 1080.4. Samples: 4186032. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:21:41,779][03057] Avg episode reward: [(0, '26.066')]
+[2025-09-02 17:21:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16711680. Throughput: 0: 1136.9. Samples: 4193972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:21:46,783][03057] Avg episode reward: [(0, '26.799')]
+[2025-09-02 17:21:51,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 16777216. Throughput: 0: 1105.7. Samples: 4199332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:21:51,784][03057] Avg episode reward: [(0, '26.615')]
+[2025-09-02 17:21:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16777216. Throughput: 0: 1087.4. Samples: 4202480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:21:56,785][03057] Avg episode reward: [(0, '26.809')]
+[2025-09-02 17:22:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16777216. Throughput: 0: 1121.0. Samples: 4210400. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:01,785][03057] Avg episode reward: [(0, '26.584')]
+[2025-09-02 17:22:06,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 16842752. Throughput: 0: 1121.0. Samples: 4216420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:06,780][03057] Avg episode reward: [(0, '27.037')]
+[2025-09-02 17:22:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16842752. Throughput: 0: 1092.1. Samples: 4219040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:11,779][03057] Avg episode reward: [(0, '26.279')]
+[2025-09-02 17:22:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16842752. Throughput: 0: 1092.2. Samples: 4226468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:16,781][03057] Avg episode reward: [(0, '27.691')]
+[2025-09-02 17:22:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16908288. Throughput: 0: 1146.5. Samples: 4233672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:21,783][03057] Avg episode reward: [(0, '27.776')]
+[2025-09-02 17:22:26,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 16908288. Throughput: 0: 1116.6. Samples: 4236280. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:26,785][03057] Avg episode reward: [(0, '27.067')]
+[2025-09-02 17:22:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 16908288. Throughput: 0: 1084.7. Samples: 4242784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:31,782][03057] Avg episode reward: [(0, '27.122')]
+[2025-09-02 17:22:36,780][03057] Fps is (10 sec: 6553.9, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 16973824. Throughput: 0: 1129.9. Samples: 4250180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:36,782][03057] Avg episode reward: [(0, '26.881')]
+[2025-09-02 17:22:41,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 16973824. Throughput: 0: 1130.0. Samples: 4253332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:41,783][03057] Avg episode reward: [(0, '26.208')]
+[2025-09-02 17:22:46,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 16973824. Throughput: 0: 1077.2. Samples: 4258876. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:46,780][03057] Avg episode reward: [(0, '26.556')]
+[2025-09-02 17:22:49,205][03390] Updated weights for policy 0, policy_version 260 (0.0014)
+[2025-09-02 17:22:51,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17039360. Throughput: 0: 1111.6. Samples: 4266440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:51,781][03057] Avg episode reward: [(0, '26.864')]
+[2025-09-02 17:22:56,782][03057] Fps is (10 sec: 6551.1, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 17039360. Throughput: 0: 1139.8. Samples: 4270336. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:22:56,784][03057] Avg episode reward: [(0, '26.830')]
+[2025-09-02 17:22:56,796][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000260_17039360.pth...
+[2025-09-02 17:22:56,939][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000244_15990784.pth
+[2025-09-02 17:23:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 17039360. Throughput: 0: 1088.7. Samples: 4275460. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:01,787][03057] Avg episode reward: [(0, '27.418')]
+[2025-09-02 17:23:06,778][03057] Fps is (10 sec: 6556.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17104896. Throughput: 0: 1079.8. Samples: 4282264. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:06,785][03057] Avg episode reward: [(0, '28.579')]
+[2025-09-02 17:23:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17104896. Throughput: 0: 1110.3. Samples: 4286240. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:11,785][03057] Avg episode reward: [(0, '28.059')]
+[2025-09-02 17:23:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 17104896. Throughput: 0: 1107.3. Samples: 4292612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:16,780][03057] Avg episode reward: [(0, '27.760')]
+[2025-09-02 17:23:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17170432. Throughput: 0: 1078.9. Samples: 4298728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:21,786][03057] Avg episode reward: [(0, '28.109')]
+[2025-09-02 17:23:26,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 17170432. Throughput: 0: 1096.0. Samples: 4302652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:26,779][03057] Avg episode reward: [(0, '27.579')]
+[2025-09-02 17:23:31,782][03057] Fps is (10 sec: 0.0, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 17170432. Throughput: 0: 1129.2. Samples: 4309692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:31,783][03057] Avg episode reward: [(0, '27.424')]
+[2025-09-02 17:23:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 17235968. Throughput: 0: 1072.4. Samples: 4314696. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:23:36,785][03057] Avg episode reward: [(0, '27.239')]
+[2025-09-02 17:23:41,778][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.2, 300 sec: 4443.2). Total num frames: 17235968. Throughput: 0: 1074.1. Samples: 4318664. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:23:41,779][03057] Avg episode reward: [(0, '27.144')]
+[2025-09-02 17:23:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 17235968. Throughput: 0: 1137.1. Samples: 4326628. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:23:46,782][03057] Avg episode reward: [(0, '26.788')]
+[2025-09-02 17:23:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17301504. Throughput: 0: 1097.5. Samples: 4331652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:51,784][03057] Avg episode reward: [(0, '28.609')]
+[2025-09-02 17:23:56,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.4, 300 sec: 4443.1). Total num frames: 17301504. Throughput: 0: 1088.4. Samples: 4335216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:23:56,779][03057] Avg episode reward: [(0, '28.007')]
+[2025-09-02 17:24:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17301504. Throughput: 0: 1120.8. Samples: 4343048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:01,787][03057] Avg episode reward: [(0, '27.521')]
+[2025-09-02 17:24:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17367040. Throughput: 0: 1110.0. Samples: 4348676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:06,780][03057] Avg episode reward: [(0, '27.631')]
+[2025-09-02 17:24:11,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 17367040. Throughput: 0: 1078.9. Samples: 4351204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:11,781][03057] Avg episode reward: [(0, '27.306')]
+[2025-09-02 17:24:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17367040. Throughput: 0: 1092.1. Samples: 4358832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:16,780][03057] Avg episode reward: [(0, '26.600')]
+[2025-09-02 17:24:21,778][03057] Fps is (10 sec: 6554.6, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 17432576. Throughput: 0: 1135.1. Samples: 4365776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:21,783][03057] Avg episode reward: [(0, '27.254')]
+[2025-09-02 17:24:26,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17432576. Throughput: 0: 1102.8. Samples: 4368288. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:26,780][03057] Avg episode reward: [(0, '26.529')]
+[2025-09-02 17:24:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 17432576. Throughput: 0: 1077.5. Samples: 4375116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:31,780][03057] Avg episode reward: [(0, '26.990')]
+[2025-09-02 17:24:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17498112. Throughput: 0: 1130.4. Samples: 4382520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:36,779][03057] Avg episode reward: [(0, '27.983')]
+[2025-09-02 17:24:41,779][03057] Fps is (10 sec: 6552.8, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 17498112. Throughput: 0: 1109.8. Samples: 4385156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:41,781][03057] Avg episode reward: [(0, '27.490')]
+[2025-09-02 17:24:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17498112. Throughput: 0: 1065.5. Samples: 4390996. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:46,780][03057] Avg episode reward: [(0, '28.287')]
+[2025-09-02 17:24:51,779][03057] Fps is (10 sec: 6554.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 17563648. Throughput: 0: 1105.9. Samples: 4398444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:51,780][03057] Avg episode reward: [(0, '28.132')]
+[2025-09-02 17:24:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17563648. Throughput: 0: 1131.7. Samples: 4402128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:24:56,782][03057] Avg episode reward: [(0, '28.144')]
+[2025-09-02 17:24:56,790][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_17563648.pth...
+[2025-09-02 17:24:56,928][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000252_16515072.pth
+[2025-09-02 17:25:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 17563648. Throughput: 0: 1074.3. Samples: 4407176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:01,779][03057] Avg episode reward: [(0, '28.090')]
+[2025-09-02 17:25:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17629184. Throughput: 0: 1078.5. Samples: 4414308. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:06,780][03057] Avg episode reward: [(0, '27.598')]
+[2025-09-02 17:25:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 17629184. Throughput: 0: 1106.8. Samples: 4418096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:11,785][03057] Avg episode reward: [(0, '27.005')]
+[2025-09-02 17:25:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 17629184. Throughput: 0: 1086.0. Samples: 4423988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:16,785][03057] Avg episode reward: [(0, '27.137')]
+[2025-09-02 17:25:19,219][03390] Updated weights for policy 0, policy_version 270 (0.0037)
+[2025-09-02 17:25:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17694720. Throughput: 0: 1054.8. Samples: 4429988. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:25:21,780][03057] Avg episode reward: [(0, '28.780')]
+[2025-09-02 17:25:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17694720. Throughput: 0: 1082.8. Samples: 4433880. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:25:26,780][03057] Avg episode reward: [(0, '27.620')]
+[2025-09-02 17:25:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 17694720. Throughput: 0: 1103.8. Samples: 4440668. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:25:31,785][03057] Avg episode reward: [(0, '28.830')]
+[2025-09-02 17:25:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17760256. Throughput: 0: 1059.8. Samples: 4446136. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:36,781][03057] Avg episode reward: [(0, '28.098')]
+[2025-09-02 17:25:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 17760256. Throughput: 0: 1062.3. Samples: 4449932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:41,785][03057] Avg episode reward: [(0, '27.431')]
+[2025-09-02 17:25:46,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 17760256. Throughput: 0: 1114.9. Samples: 4457348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:46,784][03057] Avg episode reward: [(0, '27.924')]
+[2025-09-02 17:25:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17825792. Throughput: 0: 1065.2. Samples: 4462244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:51,780][03057] Avg episode reward: [(0, '27.954')]
+[2025-09-02 17:25:56,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17825792. Throughput: 0: 1063.9. Samples: 4465972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:25:56,780][03057] Avg episode reward: [(0, '27.982')]
+[2025-09-02 17:26:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 17825792. Throughput: 0: 1107.1. Samples: 4473808. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:01,788][03057] Avg episode reward: [(0, '28.603')]
+[2025-09-02 17:26:06,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 17891328. Throughput: 0: 1099.2. Samples: 4479452. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:06,780][03057] Avg episode reward: [(0, '29.714')]
+[2025-09-02 17:26:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17891328. Throughput: 0: 1070.6. Samples: 4482056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:11,788][03057] Avg episode reward: [(0, '29.276')]
+[2025-09-02 17:26:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 17891328. Throughput: 0: 1092.5. Samples: 4489832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:16,788][03057] Avg episode reward: [(0, '30.053')]
+[2025-09-02 17:26:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17956864. Throughput: 0: 1115.1. Samples: 4496316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:21,780][03057] Avg episode reward: [(0, '30.315')]
+[2025-09-02 17:26:26,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 17956864. Throughput: 0: 1087.2. Samples: 4498856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:26,780][03057] Avg episode reward: [(0, '28.403')]
+[2025-09-02 17:26:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 17956864. Throughput: 0: 1084.7. Samples: 4506160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:31,779][03057] Avg episode reward: [(0, '28.631')]
+[2025-09-02 17:26:36,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 18022400. Throughput: 0: 1138.4. Samples: 4513476. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:36,781][03057] Avg episode reward: [(0, '29.218')]
+[2025-09-02 17:26:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18022400. Throughput: 0: 1113.2. Samples: 4516064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:41,782][03057] Avg episode reward: [(0, '27.171')]
+[2025-09-02 17:26:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 18022400. Throughput: 0: 1080.1. Samples: 4522412. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:46,779][03057] Avg episode reward: [(0, '26.396')]
+[2025-09-02 17:26:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18087936. Throughput: 0: 1122.1. Samples: 4529944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:51,780][03057] Avg episode reward: [(0, '25.977')]
+[2025-09-02 17:26:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18087936. Throughput: 0: 1141.4. Samples: 4533420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:26:56,782][03057] Avg episode reward: [(0, '26.156')]
+[2025-09-02 17:26:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000276_18087936.pth...
+[2025-09-02 17:26:56,938][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000260_17039360.pth
+[2025-09-02 17:27:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 18087936. Throughput: 0: 1087.8. Samples: 4538784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:01,779][03057] Avg episode reward: [(0, '23.903')]
+[2025-09-02 17:27:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18153472. Throughput: 0: 1111.8. Samples: 4546348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:06,779][03057] Avg episode reward: [(0, '25.184')]
+[2025-09-02 17:27:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18153472. Throughput: 0: 1142.4. Samples: 4550264. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:11,784][03057] Avg episode reward: [(0, '27.104')]
+[2025-09-02 17:27:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 18153472. Throughput: 0: 1101.2. Samples: 4555716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:16,781][03057] Avg episode reward: [(0, '28.046')]
+[2025-09-02 17:27:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 18219008. Throughput: 0: 1092.6. Samples: 4562640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:21,783][03057] Avg episode reward: [(0, '26.682')]
+[2025-09-02 17:27:26,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18219008. Throughput: 0: 1124.5. Samples: 4566668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:26,781][03057] Avg episode reward: [(0, '27.073')]
+[2025-09-02 17:27:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.2). Total num frames: 18284544. Throughput: 0: 1125.3. Samples: 4573052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:31,780][03057] Avg episode reward: [(0, '26.096')]
+[2025-09-02 17:27:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 18284544. Throughput: 0: 1090.9. Samples: 4579036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:36,779][03057] Avg episode reward: [(0, '26.251')]
+[2025-09-02 17:27:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18284544. Throughput: 0: 1098.9. Samples: 4582872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:27:41,780][03057] Avg episode reward: [(0, '26.728')]
+[2025-09-02 17:27:45,693][03390] Updated weights for policy 0, policy_version 280 (0.0021)
+[2025-09-02 17:27:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 18350080. Throughput: 0: 1130.6. Samples: 4589660. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:27:46,779][03057] Avg episode reward: [(0, '26.821')]
+[2025-09-02 17:27:51,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.2). Total num frames: 18350080. Throughput: 0: 1082.5. Samples: 4595060. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:27:51,782][03057] Avg episode reward: [(0, '27.068')]
+[2025-09-02 17:27:56,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 18350080. Throughput: 0: 1082.8. Samples: 4598992. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:27:56,782][03057] Avg episode reward: [(0, '27.202')]
+[2025-09-02 17:28:01,778][03057] Fps is (10 sec: 6553.9, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 18415616. Throughput: 0: 1137.2. Samples: 4606888. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:28:01,783][03057] Avg episode reward: [(0, '25.284')]
+[2025-09-02 17:28:06,779][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 18415616. Throughput: 0: 1094.4. Samples: 4611888. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:28:06,780][03057] Avg episode reward: [(0, '24.377')]
+[2025-09-02 17:28:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18415616. Throughput: 0: 1077.8. Samples: 4615168. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:28:11,780][03057] Avg episode reward: [(0, '25.245')]
+[2025-09-02 17:28:16,778][03057] Fps is (10 sec: 6554.0, 60 sec: 5461.4, 300 sec: 4443.1). Total num frames: 18481152. Throughput: 0: 1108.6. Samples: 4622940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:16,780][03057] Avg episode reward: [(0, '25.281')]
+[2025-09-02 17:28:21,779][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 18481152. Throughput: 0: 1112.0. Samples: 4629076. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:21,780][03057] Avg episode reward: [(0, '25.597')]
+[2025-09-02 17:28:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 18481152. Throughput: 0: 1084.3. Samples: 4631664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:26,781][03057] Avg episode reward: [(0, '27.064')]
+[2025-09-02 17:28:31,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18546688. Throughput: 0: 1097.5. Samples: 4639048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:31,785][03057] Avg episode reward: [(0, '29.502')]
+[2025-09-02 17:28:36,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 18546688. Throughput: 0: 1132.4. Samples: 4646020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:36,782][03057] Avg episode reward: [(0, '29.450')]
+[2025-09-02 17:28:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 18546688. Throughput: 0: 1103.0. Samples: 4648624. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:41,780][03057] Avg episode reward: [(0, '28.887')]
+[2025-09-02 17:28:46,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18612224. Throughput: 0: 1067.0. Samples: 4654904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:46,787][03057] Avg episode reward: [(0, '30.453')]
+[2025-09-02 17:28:46,796][03375] Saving new best policy, reward=30.453!
+[2025-09-02 17:28:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18612224. Throughput: 0: 1118.8. Samples: 4662232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:51,781][03057] Avg episode reward: [(0, '30.321')]
+[2025-09-02 17:28:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 18612224. Throughput: 0: 1112.1. Samples: 4665212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:28:56,780][03057] Avg episode reward: [(0, '30.420')]
+[2025-09-02 17:28:56,787][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_18612224.pth...
+[2025-09-02 17:28:56,978][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000268_17563648.pth
+[2025-09-02 17:29:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 18612224. Throughput: 0: 1056.2. Samples: 4670468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:01,782][03057] Avg episode reward: [(0, '29.123')]
+[2025-09-02 17:29:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18677760. Throughput: 0: 1080.1. Samples: 4677680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:06,781][03057] Avg episode reward: [(0, '28.343')]
+[2025-09-02 17:29:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18677760. Throughput: 0: 1106.9. Samples: 4681476. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:11,780][03057] Avg episode reward: [(0, '28.992')]
+[2025-09-02 17:29:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 18677760. Throughput: 0: 1055.6. Samples: 4686552. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:16,783][03057] Avg episode reward: [(0, '27.564')]
+[2025-09-02 17:29:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18743296. Throughput: 0: 1049.2. Samples: 4693232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:21,787][03057] Avg episode reward: [(0, '27.364')]
+[2025-09-02 17:29:26,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18743296. Throughput: 0: 1078.7. Samples: 4697164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:26,786][03057] Avg episode reward: [(0, '27.488')]
+[2025-09-02 17:29:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 18743296. Throughput: 0: 1081.4. Samples: 4703568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:31,782][03057] Avg episode reward: [(0, '28.633')]
+[2025-09-02 17:29:36,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 18808832. Throughput: 0: 1051.6. Samples: 4709556. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:36,780][03057] Avg episode reward: [(0, '27.764')]
+[2025-09-02 17:29:41,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18808832. Throughput: 0: 1071.4. Samples: 4713424. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:41,784][03057] Avg episode reward: [(0, '28.770')]
+[2025-09-02 17:29:46,779][03057] Fps is (10 sec: 6552.8, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 18874368. Throughput: 0: 1112.1. Samples: 4720516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:46,781][03057] Avg episode reward: [(0, '28.026')]
+[2025-09-02 17:29:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18874368. Throughput: 0: 1066.5. Samples: 4725672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:51,779][03057] Avg episode reward: [(0, '27.101')]
+[2025-09-02 17:29:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 18874368. Throughput: 0: 1069.0. Samples: 4729580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:29:56,780][03057] Avg episode reward: [(0, '27.310')]
+[2025-09-02 17:30:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 18939904. Throughput: 0: 1135.1. Samples: 4737632. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:30:01,785][03057] Avg episode reward: [(0, '26.819')]
+[2025-09-02 17:30:06,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 18939904. Throughput: 0: 1094.6. Samples: 4742488. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:30:06,786][03057] Avg episode reward: [(0, '25.965')]
+[2025-09-02 17:30:11,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 18939904. Throughput: 0: 1082.0. Samples: 4745856. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:30:11,780][03057] Avg episode reward: [(0, '26.418')]
+[2025-09-02 17:30:16,658][03390] Updated weights for policy 0, policy_version 290 (0.0019)
+[2025-09-02 17:30:16,779][03057] Fps is (10 sec: 6553.8, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 19005440. Throughput: 0: 1111.5. Samples: 4753584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:16,780][03057] Avg episode reward: [(0, '25.692')]
+[2025-09-02 17:30:21,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19005440. Throughput: 0: 1109.1. Samples: 4759464. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:21,783][03057] Avg episode reward: [(0, '25.387')]
+[2025-09-02 17:30:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19005440. Throughput: 0: 1080.2. Samples: 4762032. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:26,785][03057] Avg episode reward: [(0, '26.771')]
+[2025-09-02 17:30:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 19005440. Throughput: 0: 1094.0. Samples: 4769744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:31,779][03057] Avg episode reward: [(0, '26.031')]
+[2025-09-02 17:30:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19070976. Throughput: 0: 1131.6. Samples: 4776596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:36,783][03057] Avg episode reward: [(0, '25.879')]
+[2025-09-02 17:30:41,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19070976. Throughput: 0: 1101.4. Samples: 4779144. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:41,780][03057] Avg episode reward: [(0, '25.210')]
+[2025-09-02 17:30:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.9, 300 sec: 4221.0). Total num frames: 19070976. Throughput: 0: 1063.7. Samples: 4785500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:46,779][03057] Avg episode reward: [(0, '25.848')]
+[2025-09-02 17:30:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19136512. Throughput: 0: 1124.1. Samples: 4793072. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:51,780][03057] Avg episode reward: [(0, '25.074')]
+[2025-09-02 17:30:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19136512. Throughput: 0: 1112.7. Samples: 4795928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:30:56,780][03057] Avg episode reward: [(0, '24.890')]
+[2025-09-02 17:30:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_19136512.pth...
+[2025-09-02 17:30:56,985][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000276_18087936.pth
+[2025-09-02 17:31:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 19136512. Throughput: 0: 1062.8. Samples: 4801408. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:01,781][03057] Avg episode reward: [(0, '24.577')]
+[2025-09-02 17:31:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19202048. Throughput: 0: 1100.9. Samples: 4809004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:06,783][03057] Avg episode reward: [(0, '26.139')]
+[2025-09-02 17:31:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19202048. Throughput: 0: 1128.0. Samples: 4812792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:11,779][03057] Avg episode reward: [(0, '26.450')]
+[2025-09-02 17:31:16,781][03057] Fps is (10 sec: 0.0, 60 sec: 3276.6, 300 sec: 4220.9). Total num frames: 19202048. Throughput: 0: 1069.7. Samples: 4817884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:16,787][03057] Avg episode reward: [(0, '26.255')]
+[2025-09-02 17:31:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19267584. Throughput: 0: 1076.8. Samples: 4825052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:21,781][03057] Avg episode reward: [(0, '27.195')]
+[2025-09-02 17:31:26,781][03057] Fps is (10 sec: 6553.8, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 19267584. Throughput: 0: 1108.0. Samples: 4829008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:26,782][03057] Avg episode reward: [(0, '27.235')]
+[2025-09-02 17:31:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 19267584. Throughput: 0: 1105.2. Samples: 4835232. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:31,781][03057] Avg episode reward: [(0, '27.373')]
+[2025-09-02 17:31:36,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19333120. Throughput: 0: 1075.0. Samples: 4841448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:36,780][03057] Avg episode reward: [(0, '27.061')]
+[2025-09-02 17:31:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19333120. Throughput: 0: 1100.6. Samples: 4845456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:41,781][03057] Avg episode reward: [(0, '26.805')]
+[2025-09-02 17:31:46,779][03057] Fps is (10 sec: 6553.2, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 19398656. Throughput: 0: 1130.7. Samples: 4852292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:46,786][03057] Avg episode reward: [(0, '27.428')]
+[2025-09-02 17:31:51,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 19398656. Throughput: 0: 1080.6. Samples: 4857632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:51,780][03057] Avg episode reward: [(0, '27.875')]
+[2025-09-02 17:31:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19398656. Throughput: 0: 1085.3. Samples: 4861632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:31:56,779][03057] Avg episode reward: [(0, '27.240')]
+[2025-09-02 17:32:01,778][03057] Fps is (10 sec: 6553.9, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 19464192. Throughput: 0: 1147.1. Samples: 4869500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:01,782][03057] Avg episode reward: [(0, '26.669')]
+[2025-09-02 17:32:06,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 19464192. Throughput: 0: 1100.0. Samples: 4874552. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:06,782][03057] Avg episode reward: [(0, '26.059')]
+[2025-09-02 17:32:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19464192. Throughput: 0: 1090.9. Samples: 4878096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:11,780][03057] Avg episode reward: [(0, '25.936')]
+[2025-09-02 17:32:16,778][03057] Fps is (10 sec: 6553.9, 60 sec: 5461.6, 300 sec: 4443.1). Total num frames: 19529728. Throughput: 0: 1120.5. Samples: 4885656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:16,785][03057] Avg episode reward: [(0, '25.851')]
+[2025-09-02 17:32:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19529728. Throughput: 0: 1122.0. Samples: 4891940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:21,784][03057] Avg episode reward: [(0, '26.971')]
+[2025-09-02 17:32:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 19529728. Throughput: 0: 1090.7. Samples: 4894536. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:26,784][03057] Avg episode reward: [(0, '26.145')]
+[2025-09-02 17:32:31,781][03057] Fps is (10 sec: 6551.9, 60 sec: 5461.1, 300 sec: 4443.1). Total num frames: 19595264. Throughput: 0: 1106.1. Samples: 4902068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:31,789][03057] Avg episode reward: [(0, '26.720')]
+[2025-09-02 17:32:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19595264. Throughput: 0: 1148.0. Samples: 4909292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:36,782][03057] Avg episode reward: [(0, '26.844')]
+[2025-09-02 17:32:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 19595264. Throughput: 0: 1117.2. Samples: 4911904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:41,781][03057] Avg episode reward: [(0, '26.762')]
+[2025-09-02 17:32:45,782][03390] Updated weights for policy 0, policy_version 300 (0.0020)
+[2025-09-02 17:32:46,778][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19660800. Throughput: 0: 1086.2. Samples: 4918380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:46,780][03057] Avg episode reward: [(0, '26.236')]
+[2025-09-02 17:32:51,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 19660800. Throughput: 0: 1153.2. Samples: 4926448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:51,782][03057] Avg episode reward: [(0, '24.083')]
+[2025-09-02 17:32:56,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 19660800. Throughput: 0: 1133.0. Samples: 4929084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:32:56,788][03057] Avg episode reward: [(0, '24.545')]
+[2025-09-02 17:32:56,797][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_19660800.pth...
+[2025-09-02 17:32:56,940][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_18612224.pth
+[2025-09-02 17:33:01,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19726336. Throughput: 0: 1092.6. Samples: 4934824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:01,780][03057] Avg episode reward: [(0, '23.800')]
+[2025-09-02 17:33:06,778][03057] Fps is (10 sec: 6555.2, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19726336. Throughput: 0: 1127.1. Samples: 4942660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:06,780][03057] Avg episode reward: [(0, '22.812')]
+[2025-09-02 17:33:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 19726336. Throughput: 0: 1148.1. Samples: 4946200. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:11,779][03057] Avg episode reward: [(0, '22.306')]
+[2025-09-02 17:33:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19791872. Throughput: 0: 1089.1. Samples: 4951076. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:16,780][03057] Avg episode reward: [(0, '23.373')]
+[2025-09-02 17:33:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19791872. Throughput: 0: 1104.6. Samples: 4959000. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:21,786][03057] Avg episode reward: [(0, '23.539')]
+[2025-09-02 17:33:26,784][03057] Fps is (10 sec: 0.0, 60 sec: 4368.6, 300 sec: 4220.9). Total num frames: 19791872. Throughput: 0: 1136.2. Samples: 4963040. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:26,786][03057] Avg episode reward: [(0, '23.648')]
+[2025-09-02 17:33:31,779][03057] Fps is (10 sec: 6552.9, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 19857408. Throughput: 0: 1112.3. Samples: 4968436. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:33:31,781][03057] Avg episode reward: [(0, '25.004')]
+[2025-09-02 17:33:36,778][03057] Fps is (10 sec: 6557.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19857408. Throughput: 0: 1091.3. Samples: 4975552. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:33:36,788][03057] Avg episode reward: [(0, '26.193')]
+[2025-09-02 17:33:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 19857408. Throughput: 0: 1120.2. Samples: 4979488. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:33:41,780][03057] Avg episode reward: [(0, '27.919')]
+[2025-09-02 17:33:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19922944. Throughput: 0: 1130.1. Samples: 4985680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:46,780][03057] Avg episode reward: [(0, '27.882')]
+[2025-09-02 17:33:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 19922944. Throughput: 0: 1090.3. Samples: 4991724. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:51,786][03057] Avg episode reward: [(0, '27.924')]
+[2025-09-02 17:33:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 19922944. Throughput: 0: 1100.2. Samples: 4995708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:33:56,781][03057] Avg episode reward: [(0, '28.532')]
+[2025-09-02 17:34:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19988480. Throughput: 0: 1152.3. Samples: 5002928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:01,783][03057] Avg episode reward: [(0, '28.409')]
+[2025-09-02 17:34:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19988480. Throughput: 0: 1093.5. Samples: 5008208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:06,779][03057] Avg episode reward: [(0, '27.424')]
+[2025-09-02 17:34:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 19988480. Throughput: 0: 1091.8. Samples: 5012164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:11,781][03057] Avg episode reward: [(0, '26.837')]
+[2025-09-02 17:34:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20054016. Throughput: 0: 1140.9. Samples: 5019776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:16,780][03057] Avg episode reward: [(0, '26.551')]
+[2025-09-02 17:34:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20054016. Throughput: 0: 1106.3. Samples: 5025336. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:21,782][03057] Avg episode reward: [(0, '26.571')]
+[2025-09-02 17:34:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.5, 300 sec: 4443.1). Total num frames: 20054016. Throughput: 0: 1085.3. Samples: 5028328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:26,788][03057] Avg episode reward: [(0, '26.756')]
+[2025-09-02 17:34:31,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20119552. Throughput: 0: 1116.4. Samples: 5035916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:31,781][03057] Avg episode reward: [(0, '26.577')]
+[2025-09-02 17:34:36,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 20119552. Throughput: 0: 1127.4. Samples: 5042460. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:36,780][03057] Avg episode reward: [(0, '28.153')]
+[2025-09-02 17:34:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20119552. Throughput: 0: 1094.8. Samples: 5044972. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:41,779][03057] Avg episode reward: [(0, '28.715')]
+[2025-09-02 17:34:46,779][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 20185088. Throughput: 0: 1094.8. Samples: 5052196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:46,780][03057] Avg episode reward: [(0, '28.550')]
+[2025-09-02 17:34:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20185088. Throughput: 0: 1141.9. Samples: 5059592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:51,784][03057] Avg episode reward: [(0, '29.801')]
+[2025-09-02 17:34:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20185088. Throughput: 0: 1112.3. Samples: 5062216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:34:56,780][03057] Avg episode reward: [(0, '28.685')]
+[2025-09-02 17:34:56,787][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_20185088.pth...
+[2025-09-02 17:34:56,923][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_19136512.pth
+[2025-09-02 17:35:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20250624. Throughput: 0: 1078.8. Samples: 5068320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:01,781][03057] Avg episode reward: [(0, '29.467')]
+[2025-09-02 17:35:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20250624. Throughput: 0: 1127.3. Samples: 5076064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:06,781][03057] Avg episode reward: [(0, '28.230')]
+[2025-09-02 17:35:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20250624. Throughput: 0: 1131.3. Samples: 5079236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:11,780][03057] Avg episode reward: [(0, '27.554')]
+[2025-09-02 17:35:12,647][03390] Updated weights for policy 0, policy_version 310 (0.0013)
+[2025-09-02 17:35:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20316160. Throughput: 0: 1077.8. Samples: 5084416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:16,780][03057] Avg episode reward: [(0, '27.613')]
+[2025-09-02 17:35:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20316160. Throughput: 0: 1101.4. Samples: 5092020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:21,780][03057] Avg episode reward: [(0, '26.665')]
+[2025-09-02 17:35:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20316160. Throughput: 0: 1132.8. Samples: 5095948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:26,784][03057] Avg episode reward: [(0, '25.089')]
+[2025-09-02 17:35:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20381696. Throughput: 0: 1081.9. Samples: 5100880. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:35:31,781][03057] Avg episode reward: [(0, '25.481')]
+[2025-09-02 17:35:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20381696. Throughput: 0: 1077.4. Samples: 5108076. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:35:36,780][03057] Avg episode reward: [(0, '25.072')]
+[2025-09-02 17:35:41,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 20381696. Throughput: 0: 1105.7. Samples: 5111972. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:35:41,781][03057] Avg episode reward: [(0, '26.390')]
+[2025-09-02 17:35:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20447232. Throughput: 0: 1095.6. Samples: 5117620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:46,780][03057] Avg episode reward: [(0, '26.930')]
+[2025-09-02 17:35:51,778][03057] Fps is (10 sec: 6554.3, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20447232. Throughput: 0: 1060.2. Samples: 5123772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:51,780][03057] Avg episode reward: [(0, '27.270')]
+[2025-09-02 17:35:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20447232. Throughput: 0: 1077.3. Samples: 5127716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:35:56,780][03057] Avg episode reward: [(0, '26.411')]
+[2025-09-02 17:36:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20512768. Throughput: 0: 1109.8. Samples: 5134356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:01,780][03057] Avg episode reward: [(0, '26.014')]
+[2025-09-02 17:36:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20512768. Throughput: 0: 1058.7. Samples: 5139660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:06,780][03057] Avg episode reward: [(0, '24.833')]
+[2025-09-02 17:36:11,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 20512768. Throughput: 0: 1055.1. Samples: 5143428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:11,781][03057] Avg episode reward: [(0, '24.562')]
+[2025-09-02 17:36:16,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 20578304. Throughput: 0: 1112.6. Samples: 5150948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:16,782][03057] Avg episode reward: [(0, '24.073')]
+[2025-09-02 17:36:21,778][03057] Fps is (10 sec: 6554.3, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 20578304. Throughput: 0: 1067.5. Samples: 5156112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:21,785][03057] Avg episode reward: [(0, '25.218')]
+[2025-09-02 17:36:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20578304. Throughput: 0: 1047.7. Samples: 5159116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:26,780][03057] Avg episode reward: [(0, '25.091')]
+[2025-09-02 17:36:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20643840. Throughput: 0: 1090.1. Samples: 5166676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:31,784][03057] Avg episode reward: [(0, '25.568')]
+[2025-09-02 17:36:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20643840. Throughput: 0: 1096.0. Samples: 5173092. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:36,779][03057] Avg episode reward: [(0, '26.357')]
+[2025-09-02 17:36:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20643840. Throughput: 0: 1065.2. Samples: 5175652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:41,783][03057] Avg episode reward: [(0, '25.424')]
+[2025-09-02 17:36:46,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20709376. Throughput: 0: 1074.8. Samples: 5182720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:46,780][03057] Avg episode reward: [(0, '25.197')]
+[2025-09-02 17:36:51,779][03057] Fps is (10 sec: 6553.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 20709376. Throughput: 0: 1116.6. Samples: 5189908. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:51,780][03057] Avg episode reward: [(0, '24.854')]
+[2025-09-02 17:36:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20709376. Throughput: 0: 1090.1. Samples: 5192480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:36:56,780][03057] Avg episode reward: [(0, '25.372')]
+[2025-09-02 17:36:56,792][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_20709376.pth...
+[2025-09-02 17:36:56,958][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000300_19660800.pth
+[2025-09-02 17:37:01,778][03057] Fps is (10 sec: 6554.2, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20774912. Throughput: 0: 1054.0. Samples: 5198376. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:01,782][03057] Avg episode reward: [(0, '26.519')]
+[2025-09-02 17:37:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20774912. Throughput: 0: 1113.2. Samples: 5206204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:06,780][03057] Avg episode reward: [(0, '28.001')]
+[2025-09-02 17:37:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20774912. Throughput: 0: 1116.4. Samples: 5209352. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:11,780][03057] Avg episode reward: [(0, '27.054')]
+[2025-09-02 17:37:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 20840448. Throughput: 0: 1061.7. Samples: 5214452. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:16,779][03057] Avg episode reward: [(0, '28.599')]
+[2025-09-02 17:37:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20840448. Throughput: 0: 1090.3. Samples: 5222156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:21,780][03057] Avg episode reward: [(0, '26.880')]
+[2025-09-02 17:37:26,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 20840448. Throughput: 0: 1121.0. Samples: 5226096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:26,781][03057] Avg episode reward: [(0, '28.011')]
+[2025-09-02 17:37:31,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 20905984. Throughput: 0: 1074.0. Samples: 5231052. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:37:31,780][03057] Avg episode reward: [(0, '27.775')]
+[2025-09-02 17:37:36,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 20905984. Throughput: 0: 1077.3. Samples: 5238384. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:37:36,784][03057] Avg episode reward: [(0, '28.168')]
+[2025-09-02 17:37:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20905984. Throughput: 0: 1106.3. Samples: 5242264. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:37:41,787][03057] Avg episode reward: [(0, '29.108')]
+[2025-09-02 17:37:42,718][03390] Updated weights for policy 0, policy_version 320 (0.0022)
+[2025-09-02 17:37:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 20971520. Throughput: 0: 1108.3. Samples: 5248248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:46,780][03057] Avg episode reward: [(0, '28.665')]
+[2025-09-02 17:37:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 20971520. Throughput: 0: 1068.8. Samples: 5254300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:51,787][03057] Avg episode reward: [(0, '28.281')]
+[2025-09-02 17:37:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 20971520. Throughput: 0: 1085.1. Samples: 5258180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:37:56,780][03057] Avg episode reward: [(0, '28.807')]
+[2025-09-02 17:38:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21037056. Throughput: 0: 1127.2. Samples: 5265176. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:01,782][03057] Avg episode reward: [(0, '28.507')]
+[2025-09-02 17:38:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21037056. Throughput: 0: 1075.1. Samples: 5270536. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:06,785][03057] Avg episode reward: [(0, '28.032')]
+[2025-09-02 17:38:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21037056. Throughput: 0: 1077.1. Samples: 5274564. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:11,786][03057] Avg episode reward: [(0, '27.476')]
+[2025-09-02 17:38:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21102592. Throughput: 0: 1132.6. Samples: 5282016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:16,784][03057] Avg episode reward: [(0, '27.442')]
+[2025-09-02 17:38:21,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.2). Total num frames: 21102592. Throughput: 0: 1087.2. Samples: 5287308. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:21,780][03057] Avg episode reward: [(0, '25.773')]
+[2025-09-02 17:38:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21102592. Throughput: 0: 1069.9. Samples: 5290408. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:26,786][03057] Avg episode reward: [(0, '26.489')]
+[2025-09-02 17:38:31,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21168128. Throughput: 0: 1098.7. Samples: 5297688. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:31,780][03057] Avg episode reward: [(0, '29.110')]
+[2025-09-02 17:38:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21168128. Throughput: 0: 1105.9. Samples: 5304064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:36,780][03057] Avg episode reward: [(0, '29.357')]
+[2025-09-02 17:38:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21168128. Throughput: 0: 1074.6. Samples: 5306536. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:41,790][03057] Avg episode reward: [(0, '29.093')]
+[2025-09-02 17:38:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21233664. Throughput: 0: 1076.0. Samples: 5313596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:46,779][03057] Avg episode reward: [(0, '30.322')]
+[2025-09-02 17:38:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21233664. Throughput: 0: 1120.5. Samples: 5320960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:51,785][03057] Avg episode reward: [(0, '29.168')]
+[2025-09-02 17:38:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21233664. Throughput: 0: 1089.1. Samples: 5323572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:38:56,782][03057] Avg episode reward: [(0, '28.677')]
+[2025-09-02 17:38:56,792][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_21233664.pth...
+[2025-09-02 17:38:56,970][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000308_20185088.pth
+[2025-09-02 17:39:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21299200. Throughput: 0: 1051.9. Samples: 5329352. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:01,779][03057] Avg episode reward: [(0, '28.893')]
+[2025-09-02 17:39:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21299200. Throughput: 0: 1110.3. Samples: 5337272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:06,779][03057] Avg episode reward: [(0, '28.411')]
+[2025-09-02 17:39:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21299200. Throughput: 0: 1113.0. Samples: 5340492. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:11,780][03057] Avg episode reward: [(0, '28.361')]
+[2025-09-02 17:39:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21364736. Throughput: 0: 1060.7. Samples: 5345420. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:39:16,779][03057] Avg episode reward: [(0, '27.898')]
+[2025-09-02 17:39:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21364736. Throughput: 0: 1089.7. Samples: 5353100. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:39:21,781][03057] Avg episode reward: [(0, '28.940')]
+[2025-09-02 17:39:26,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 21364736. Throughput: 0: 1118.7. Samples: 5356880. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:39:26,789][03057] Avg episode reward: [(0, '27.546')]
+[2025-09-02 17:39:31,782][03057] Fps is (10 sec: 6551.4, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 21430272. Throughput: 0: 1075.7. Samples: 5362004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:31,783][03057] Avg episode reward: [(0, '27.727')]
+[2025-09-02 17:39:36,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21430272. Throughput: 0: 1062.0. Samples: 5368748. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:36,779][03057] Avg episode reward: [(0, '27.626')]
+[2025-09-02 17:39:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21430272. Throughput: 0: 1088.5. Samples: 5372556. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:41,785][03057] Avg episode reward: [(0, '27.384')]
+[2025-09-02 17:39:46,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 21495808. Throughput: 0: 1096.6. Samples: 5378700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:46,781][03057] Avg episode reward: [(0, '25.929')]
+[2025-09-02 17:39:51,781][03057] Fps is (10 sec: 6551.7, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 21495808. Throughput: 0: 1054.5. Samples: 5384728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:51,786][03057] Avg episode reward: [(0, '25.904')]
+[2025-09-02 17:39:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21495808. Throughput: 0: 1065.4. Samples: 5388436. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:39:56,784][03057] Avg episode reward: [(0, '27.475')]
+[2025-09-02 17:40:01,779][03057] Fps is (10 sec: 6555.0, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 21561344. Throughput: 0: 1112.5. Samples: 5395484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:01,780][03057] Avg episode reward: [(0, '27.740')]
+[2025-09-02 17:40:06,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 21561344. Throughput: 0: 1056.7. Samples: 5400652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:06,785][03057] Avg episode reward: [(0, '28.190')]
+[2025-09-02 17:40:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21561344. Throughput: 0: 1060.2. Samples: 5404584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:11,783][03057] Avg episode reward: [(0, '28.959')]
+[2025-09-02 17:40:14,655][03390] Updated weights for policy 0, policy_version 330 (0.0023)
+[2025-09-02 17:40:16,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21626880. Throughput: 0: 1112.7. Samples: 5412072. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:16,780][03057] Avg episode reward: [(0, '28.087')]
+[2025-09-02 17:40:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 21626880. Throughput: 0: 1088.3. Samples: 5417720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:21,783][03057] Avg episode reward: [(0, '26.385')]
+[2025-09-02 17:40:26,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 21626880. Throughput: 0: 1068.5. Samples: 5420640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:26,780][03057] Avg episode reward: [(0, '25.627')]
+[2025-09-02 17:40:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 21692416. Throughput: 0: 1096.6. Samples: 5428044. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:31,780][03057] Avg episode reward: [(0, '25.757')]
+[2025-09-02 17:40:36,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21692416. Throughput: 0: 1111.9. Samples: 5434760. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:36,785][03057] Avg episode reward: [(0, '26.046')]
+[2025-09-02 17:40:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21692416. Throughput: 0: 1084.4. Samples: 5437236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:41,780][03057] Avg episode reward: [(0, '26.260')]
+[2025-09-02 17:40:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 21757952. Throughput: 0: 1077.4. Samples: 5443968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:46,781][03057] Avg episode reward: [(0, '27.017')]
+[2025-09-02 17:40:51,782][03057] Fps is (10 sec: 6551.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 21757952. Throughput: 0: 1133.8. Samples: 5451676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:51,783][03057] Avg episode reward: [(0, '27.389')]
+[2025-09-02 17:40:56,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 21757952. Throughput: 0: 1100.0. Samples: 5454084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:40:56,780][03057] Avg episode reward: [(0, '27.467')]
+[2025-09-02 17:40:56,796][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000332_21757952.pth...
+[2025-09-02 17:40:56,952][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_20709376.pth
+[2025-09-02 17:41:01,778][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21823488. Throughput: 0: 1056.9. Samples: 5459632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:01,782][03057] Avg episode reward: [(0, '28.662')]
+[2025-09-02 17:41:06,780][03057] Fps is (10 sec: 6552.9, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 21823488. Throughput: 0: 1101.2. Samples: 5467276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:06,781][03057] Avg episode reward: [(0, '27.583')]
+[2025-09-02 17:41:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21823488. Throughput: 0: 1109.9. Samples: 5470584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:11,780][03057] Avg episode reward: [(0, '27.512')]
+[2025-09-02 17:41:16,780][03057] Fps is (10 sec: 6553.6, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 21889024. Throughput: 0: 1052.1. Samples: 5475392. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:41:16,785][03057] Avg episode reward: [(0, '29.814')]
+[2025-09-02 17:41:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21889024. Throughput: 0: 1072.5. Samples: 5483024. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:41:21,782][03057] Avg episode reward: [(0, '29.473')]
+[2025-09-02 17:41:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21889024. Throughput: 0: 1099.5. Samples: 5486712. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:41:26,781][03057] Avg episode reward: [(0, '26.958')]
+[2025-09-02 17:41:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21954560. Throughput: 0: 1065.3. Samples: 5491908. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:31,781][03057] Avg episode reward: [(0, '28.364')]
+[2025-09-02 17:41:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 21954560. Throughput: 0: 1048.1. Samples: 5498836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:36,787][03057] Avg episode reward: [(0, '26.743')]
+[2025-09-02 17:41:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 21954560. Throughput: 0: 1082.2. Samples: 5502784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:41,780][03057] Avg episode reward: [(0, '25.554')]
+[2025-09-02 17:41:46,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 22020096. Throughput: 0: 1092.7. Samples: 5508804. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:46,780][03057] Avg episode reward: [(0, '26.017')]
+[2025-09-02 17:41:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 22020096. Throughput: 0: 1058.1. Samples: 5514888. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:51,780][03057] Avg episode reward: [(0, '25.750')]
+[2025-09-02 17:41:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22020096. Throughput: 0: 1066.8. Samples: 5518592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:41:56,780][03057] Avg episode reward: [(0, '26.183')]
+[2025-09-02 17:42:01,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 22085632. Throughput: 0: 1114.7. Samples: 5525552. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:42:01,781][03057] Avg episode reward: [(0, '26.763')]
+[2025-09-02 17:42:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 22085632. Throughput: 0: 1059.1. Samples: 5530684. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:42:06,783][03057] Avg episode reward: [(0, '27.657')]
+[2025-09-02 17:42:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22085632. Throughput: 0: 1064.4. Samples: 5534612. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
+[2025-09-02 17:42:11,781][03057] Avg episode reward: [(0, '27.114')]
+[2025-09-02 17:42:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 22151168. Throughput: 0: 1108.7. Samples: 5541800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:42:16,780][03057] Avg episode reward: [(0, '26.814')]
+[2025-09-02 17:42:21,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 22151168. Throughput: 0: 1079.8. Samples: 5547428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:42:21,781][03057] Avg episode reward: [(0, '26.204')]
+[2025-09-02 17:42:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22151168. Throughput: 0: 1054.0. Samples: 5550212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:42:26,779][03057] Avg episode reward: [(0, '28.409')]
+[2025-09-02 17:42:31,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22216704. Throughput: 0: 1085.6. Samples: 5557656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:42:31,780][03057] Avg episode reward: [(0, '28.105')]
+[2025-09-02 17:42:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22216704. Throughput: 0: 1093.1. Samples: 5564076. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
+[2025-09-02 17:42:36,780][03057] Avg episode reward: [(0, '28.371')]
+[2025-09-02 17:42:41,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 22216704. Throughput: 0: 1066.2. Samples: 5566572.
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:42:41,788][03057] Avg episode reward: [(0, '27.888')] +[2025-09-02 17:42:46,328][03390] Updated weights for policy 0, policy_version 340 (0.0013) +[2025-09-02 17:42:46,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22282240. Throughput: 0: 1072.7. Samples: 5573824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:42:46,784][03057] Avg episode reward: [(0, '28.302')] +[2025-09-02 17:42:51,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22282240. Throughput: 0: 1115.3. Samples: 5580872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:42:51,784][03057] Avg episode reward: [(0, '26.478')] +[2025-09-02 17:42:56,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 22282240. Throughput: 0: 1085.6. Samples: 5583464. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:42:56,781][03057] Avg episode reward: [(0, '25.123')] +[2025-09-02 17:42:56,793][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000340_22282240.pth... +[2025-09-02 17:42:56,943][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_21233664.pth +[2025-09-02 17:43:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 22347776. Throughput: 0: 1063.2. Samples: 5589644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:01,780][03057] Avg episode reward: [(0, '26.009')] +[2025-09-02 17:43:06,779][03057] Fps is (10 sec: 6554.5, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 22347776. Throughput: 0: 1102.7. Samples: 5597048. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:06,784][03057] Avg episode reward: [(0, '26.436')] +[2025-09-02 17:43:11,787][03057] Fps is (10 sec: 0.0, 60 sec: 4368.4, 300 sec: 4220.8). Total num frames: 22347776. Throughput: 0: 1112.7. Samples: 5600292. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:11,790][03057] Avg episode reward: [(0, '25.023')] +[2025-09-02 17:43:16,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22413312. Throughput: 0: 1065.7. Samples: 5605612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:16,779][03057] Avg episode reward: [(0, '25.728')] +[2025-09-02 17:43:21,778][03057] Fps is (10 sec: 6559.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22413312. Throughput: 0: 1090.1. Samples: 5613132. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:21,779][03057] Avg episode reward: [(0, '26.435')] +[2025-09-02 17:43:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22413312. Throughput: 0: 1122.3. Samples: 5617076. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:26,788][03057] Avg episode reward: [(0, '25.306')] +[2025-09-02 17:43:31,781][03057] Fps is (10 sec: 6551.6, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 22478848. Throughput: 0: 1078.2. Samples: 5622344. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:31,782][03057] Avg episode reward: [(0, '25.610')] +[2025-09-02 17:43:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22478848. Throughput: 0: 1071.9. Samples: 5629108. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:36,780][03057] Avg episode reward: [(0, '26.586')] +[2025-09-02 17:43:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22478848. Throughput: 0: 1101.5. Samples: 5633028. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:41,781][03057] Avg episode reward: [(0, '27.319')] +[2025-09-02 17:43:46,780][03057] Fps is (10 sec: 6552.6, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 22544384. Throughput: 0: 1101.6. Samples: 5639216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:46,781][03057] Avg episode reward: [(0, '27.699')] +[2025-09-02 17:43:51,781][03057] Fps is (10 sec: 6551.9, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 22544384. Throughput: 0: 1072.6. Samples: 5645316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:51,782][03057] Avg episode reward: [(0, '28.811')] +[2025-09-02 17:43:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 22544384. Throughput: 0: 1087.0. Samples: 5649196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:43:56,787][03057] Avg episode reward: [(0, '30.846')] +[2025-09-02 17:43:56,797][03375] Saving new best policy, reward=30.846! +[2025-09-02 17:44:01,779][03057] Fps is (10 sec: 6555.2, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 22609920. Throughput: 0: 1121.4. Samples: 5656076. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:44:01,784][03057] Avg episode reward: [(0, '32.215')] +[2025-09-02 17:44:01,786][03375] Saving new best policy, reward=32.215! +[2025-09-02 17:44:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22609920. Throughput: 0: 1071.3. Samples: 5661340. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:44:06,788][03057] Avg episode reward: [(0, '31.217')] +[2025-09-02 17:44:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.7, 300 sec: 4221.0). Total num frames: 22609920. Throughput: 0: 1069.9. Samples: 5665220. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:44:11,785][03057] Avg episode reward: [(0, '31.237')] +[2025-09-02 17:44:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22675456. Throughput: 0: 1121.1. Samples: 5672792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:16,781][03057] Avg episode reward: [(0, '31.176')] +[2025-09-02 17:44:21,780][03057] Fps is (10 sec: 6552.5, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 22675456. Throughput: 0: 1096.4. Samples: 5678448. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:21,781][03057] Avg episode reward: [(0, '30.244')] +[2025-09-02 17:44:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22675456. Throughput: 0: 1074.6. Samples: 5681384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:26,780][03057] Avg episode reward: [(0, '29.600')] +[2025-09-02 17:44:31,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 22740992. Throughput: 0: 1097.0. Samples: 5688580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:31,789][03057] Avg episode reward: [(0, '29.432')] +[2025-09-02 17:44:36,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22740992. Throughput: 0: 1106.8. Samples: 5695120. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:36,779][03057] Avg episode reward: [(0, '28.454')] +[2025-09-02 17:44:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22740992. Throughput: 0: 1076.7. Samples: 5697648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:41,780][03057] Avg episode reward: [(0, '29.082')] +[2025-09-02 17:44:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.2). Total num frames: 22806528. Throughput: 0: 1080.4. Samples: 5704692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:46,781][03057] Avg episode reward: [(0, '27.828')] +[2025-09-02 17:44:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 22806528. Throughput: 0: 1130.6. Samples: 5712216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:51,783][03057] Avg episode reward: [(0, '28.503')] +[2025-09-02 17:44:56,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 22806528. Throughput: 0: 1102.3. Samples: 5714824. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:44:56,783][03057] Avg episode reward: [(0, '29.149')] +[2025-09-02 17:44:56,795][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000348_22806528.pth... +[2025-09-02 17:44:56,988][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000332_21757952.pth +[2025-09-02 17:45:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22872064. Throughput: 0: 1062.0. Samples: 5720584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:01,779][03057] Avg episode reward: [(0, '29.619')] +[2025-09-02 17:45:06,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 22872064. Throughput: 0: 1107.5. Samples: 5728284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:06,780][03057] Avg episode reward: [(0, '29.513')] +[2025-09-02 17:45:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22872064. Throughput: 0: 1118.8. Samples: 5731728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:11,780][03057] Avg episode reward: [(0, '28.250')] +[2025-09-02 17:45:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 22872064. Throughput: 0: 1072.4. Samples: 5736840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:16,788][03057] Avg episode reward: [(0, '28.488')] +[2025-09-02 17:45:16,845][03390] Updated weights for policy 0, policy_version 350 (0.0024) +[2025-09-02 17:45:21,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 22937600. Throughput: 0: 1092.8. Samples: 5744296. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:21,790][03057] Avg episode reward: [(0, '26.985')] +[2025-09-02 17:45:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 22937600. Throughput: 0: 1123.8. Samples: 5748220. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:26,779][03057] Avg episode reward: [(0, '26.350')] +[2025-09-02 17:45:31,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23003136. Throughput: 0: 1087.5. Samples: 5753628. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:31,780][03057] Avg episode reward: [(0, '26.526')] +[2025-09-02 17:45:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23003136. Throughput: 0: 1070.8. Samples: 5760404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:36,780][03057] Avg episode reward: [(0, '29.077')] +[2025-09-02 17:45:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23003136. Throughput: 0: 1099.0. Samples: 5764276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:41,779][03057] Avg episode reward: [(0, '28.003')] +[2025-09-02 17:45:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 23068672. Throughput: 0: 1110.7. Samples: 5770564. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:46,780][03057] Avg episode reward: [(0, '28.356')] +[2025-09-02 17:45:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23068672. Throughput: 0: 1075.3. Samples: 5776672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:51,785][03057] Avg episode reward: [(0, '27.085')] +[2025-09-02 17:45:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 23068672. Throughput: 0: 1083.7. Samples: 5780496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:45:56,780][03057] Avg episode reward: [(0, '26.506')] +[2025-09-02 17:46:01,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 23134208. Throughput: 0: 1125.3. Samples: 5787480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:01,784][03057] Avg episode reward: [(0, '26.868')] +[2025-09-02 17:46:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23134208. Throughput: 0: 1073.7. Samples: 5792612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:06,781][03057] Avg episode reward: [(0, '27.295')] +[2025-09-02 17:46:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23134208. Throughput: 0: 1072.8. Samples: 5796496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:11,780][03057] Avg episode reward: [(0, '27.786')] +[2025-09-02 17:46:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 23199744. Throughput: 0: 1118.5. Samples: 5803960. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:16,780][03057] Avg episode reward: [(0, '28.061')] +[2025-09-02 17:46:21,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23199744. Throughput: 0: 1092.9. Samples: 5809584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:21,780][03057] Avg episode reward: [(0, '28.531')] +[2025-09-02 17:46:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23199744. Throughput: 0: 1071.8. Samples: 5812508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:26,783][03057] Avg episode reward: [(0, '28.485')] +[2025-09-02 17:46:31,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23265280. Throughput: 0: 1092.4. Samples: 5819720. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:31,780][03057] Avg episode reward: [(0, '26.820')] +[2025-09-02 17:46:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23265280. 
Throughput: 0: 1102.0. Samples: 5826264. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:36,780][03057] Avg episode reward: [(0, '28.347')] +[2025-09-02 17:46:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23265280. Throughput: 0: 1074.9. Samples: 5828868. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:41,779][03057] Avg episode reward: [(0, '28.830')] +[2025-09-02 17:46:46,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23330816. Throughput: 0: 1077.8. Samples: 5835980. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:46,780][03057] Avg episode reward: [(0, '28.491')] +[2025-09-02 17:46:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23330816. Throughput: 0: 1128.5. Samples: 5843396. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:51,780][03057] Avg episode reward: [(0, '28.871')] +[2025-09-02 17:46:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23330816. Throughput: 0: 1100.6. Samples: 5846024. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:46:56,781][03057] Avg episode reward: [(0, '29.199')] +[2025-09-02 17:46:56,787][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000356_23330816.pth... +[2025-09-02 17:46:56,944][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000340_22282240.pth +[2025-09-02 17:47:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23396352. Throughput: 0: 1062.9. Samples: 5851792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:01,780][03057] Avg episode reward: [(0, '28.070')] +[2025-09-02 17:47:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23396352. Throughput: 0: 1102.1. Samples: 5859180. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:06,782][03057] Avg episode reward: [(0, '27.248')] +[2025-09-02 17:47:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23396352. Throughput: 0: 1111.9. Samples: 5862544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:11,779][03057] Avg episode reward: [(0, '27.813')] +[2025-09-02 17:47:16,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 23461888. Throughput: 0: 1069.4. Samples: 5867844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:16,780][03057] Avg episode reward: [(0, '28.162')] +[2025-09-02 17:47:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23461888. Throughput: 0: 1090.7. Samples: 5875344. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:21,779][03057] Avg episode reward: [(0, '27.348')] +[2025-09-02 17:47:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23461888. Throughput: 0: 1121.2. Samples: 5879324. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:26,781][03057] Avg episode reward: [(0, '27.865')] +[2025-09-02 17:47:31,783][03057] Fps is (10 sec: 6550.4, 60 sec: 4368.7, 300 sec: 4443.0). Total num frames: 23527424. Throughput: 0: 1084.8. Samples: 5884800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:31,785][03057] Avg episode reward: [(0, '28.882')] +[2025-09-02 17:47:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). 
Total num frames: 23527424. Throughput: 0: 1065.8. Samples: 5891356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:36,781][03057] Avg episode reward: [(0, '28.018')] +[2025-09-02 17:47:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23527424. Throughput: 0: 1095.7. Samples: 5895332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:41,779][03057] Avg episode reward: [(0, '29.756')] +[2025-09-02 17:47:45,031][03390] Updated weights for policy 0, policy_version 360 (0.0025) +[2025-09-02 17:47:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23592960. Throughput: 0: 1106.8. Samples: 5901596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:46,782][03057] Avg episode reward: [(0, '29.038')] +[2025-09-02 17:47:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23592960. Throughput: 0: 1074.0. Samples: 5907508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:51,781][03057] Avg episode reward: [(0, '30.876')] +[2025-09-02 17:47:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23592960. Throughput: 0: 1081.2. Samples: 5911196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:47:56,782][03057] Avg episode reward: [(0, '30.289')] +[2025-09-02 17:48:01,782][03057] Fps is (10 sec: 6551.2, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 23658496. Throughput: 0: 1122.1. Samples: 5918344. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:01,784][03057] Avg episode reward: [(0, '31.015')] +[2025-09-02 17:48:06,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.2). Total num frames: 23658496. Throughput: 0: 1071.4. Samples: 5923556. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:06,784][03057] Avg episode reward: [(0, '31.254')] +[2025-09-02 17:48:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23658496. Throughput: 0: 1061.9. Samples: 5927108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:11,781][03057] Avg episode reward: [(0, '29.565')] +[2025-09-02 17:48:16,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23724032. Throughput: 0: 1105.3. Samples: 5934532. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:16,781][03057] Avg episode reward: [(0, '29.320')] +[2025-09-02 17:48:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23724032. Throughput: 0: 1086.6. Samples: 5940252. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:21,779][03057] Avg episode reward: [(0, '28.793')] +[2025-09-02 17:48:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23724032. Throughput: 0: 1058.1. Samples: 5942948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:26,780][03057] Avg episode reward: [(0, '28.166')] +[2025-09-02 17:48:31,780][03057] Fps is (10 sec: 6552.4, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 23789568. Throughput: 0: 1082.8. Samples: 5950324. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:31,781][03057] Avg episode reward: [(0, '26.197')] +[2025-09-02 17:48:36,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 23789568. Throughput: 0: 1099.6. Samples: 5956992. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:36,780][03057] Avg episode reward: [(0, '27.614')] +[2025-09-02 17:48:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23789568. Throughput: 0: 1072.7. Samples: 5959468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:41,785][03057] Avg episode reward: [(0, '26.987')] +[2025-09-02 17:48:46,778][03057] Fps is (10 sec: 6554.0, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 23855104. Throughput: 0: 1061.7. Samples: 5966116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:46,780][03057] Avg episode reward: [(0, '26.635')] +[2025-09-02 17:48:51,780][03057] Fps is (10 sec: 6552.8, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 23855104. Throughput: 0: 1107.0. Samples: 5973372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:51,782][03057] Avg episode reward: [(0, '27.979')] +[2025-09-02 17:48:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23855104. Throughput: 0: 1084.3. Samples: 5975900. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:48:56,787][03057] Avg episode reward: [(0, '27.934')] +[2025-09-02 17:48:56,796][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000364_23855104.pth... +[2025-09-02 17:48:56,949][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000348_22806528.pth +[2025-09-02 17:49:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 4221.0). Total num frames: 23855104. Throughput: 0: 1045.3. Samples: 5981572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:01,782][03057] Avg episode reward: [(0, '29.710')] +[2025-09-02 17:49:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23920640. Throughput: 0: 1079.9. Samples: 5988848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:06,779][03057] Avg episode reward: [(0, '29.130')] +[2025-09-02 17:49:11,779][03057] Fps is (10 sec: 6553.2, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 23920640. Throughput: 0: 1099.3. Samples: 5992416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:11,780][03057] Avg episode reward: [(0, '30.362')] +[2025-09-02 17:49:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 23920640. Throughput: 0: 1050.1. Samples: 5997576. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:16,779][03057] Avg episode reward: [(0, '30.149')] +[2025-09-02 17:49:21,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 23986176. Throughput: 0: 1064.2. Samples: 6004880. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:21,780][03057] Avg episode reward: [(0, '31.803')] +[2025-09-02 17:49:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 23986176. Throughput: 0: 1095.5. Samples: 6008764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:26,781][03057] Avg episode reward: [(0, '31.259')] +[2025-09-02 17:49:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 24051712. Throughput: 0: 1077.5. Samples: 6014604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:31,780][03057] Avg episode reward: [(0, '30.195')] +[2025-09-02 17:49:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24051712. Throughput: 0: 1057.1. 
Samples: 6020940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:36,780][03057] Avg episode reward: [(0, '30.530')] +[2025-09-02 17:49:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24051712. Throughput: 0: 1088.8. Samples: 6024896. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:49:41,780][03057] Avg episode reward: [(0, '30.008')] +[2025-09-02 17:49:46,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24117248. Throughput: 0: 1114.1. Samples: 6031704. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:49:46,780][03057] Avg episode reward: [(0, '31.134')] +[2025-09-02 17:49:51,782][03057] Fps is (10 sec: 6551.2, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 24117248. Throughput: 0: 1074.2. Samples: 6037192. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:49:51,785][03057] Avg episode reward: [(0, '29.553')] +[2025-09-02 17:49:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24117248. Throughput: 0: 1079.3. Samples: 6040984. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:49:56,780][03057] Avg episode reward: [(0, '30.545')] +[2025-09-02 17:50:01,781][03057] Fps is (10 sec: 6554.2, 60 sec: 5461.1, 300 sec: 4443.1). Total num frames: 24182784. Throughput: 0: 1132.6. Samples: 6048548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:01,782][03057] Avg episode reward: [(0, '30.029')] +[2025-09-02 17:50:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24182784. Throughput: 0: 1080.9. Samples: 6053520. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:06,783][03057] Avg episode reward: [(0, '28.879')] +[2025-09-02 17:50:11,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24182784. Throughput: 0: 1074.5. Samples: 6057116. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:11,781][03057] Avg episode reward: [(0, '28.403')] +[2025-09-02 17:50:16,100][03390] Updated weights for policy 0, policy_version 370 (0.0032) +[2025-09-02 17:50:16,779][03057] Fps is (10 sec: 6553.3, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 24248320. Throughput: 0: 1119.2. Samples: 6064968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:16,780][03057] Avg episode reward: [(0, '29.006')] +[2025-09-02 17:50:21,781][03057] Fps is (10 sec: 6552.4, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 24248320. Throughput: 0: 1109.5. Samples: 6070872. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:21,782][03057] Avg episode reward: [(0, '30.219')] +[2025-09-02 17:50:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24248320. Throughput: 0: 1081.2. Samples: 6073548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:26,780][03057] Avg episode reward: [(0, '29.681')] +[2025-09-02 17:50:31,778][03057] Fps is (10 sec: 6555.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24313856. Throughput: 0: 1096.7. Samples: 6081056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:31,779][03057] Avg episode reward: [(0, '30.016')] +[2025-09-02 17:50:36,781][03057] Fps is (10 sec: 6552.0, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 24313856. Throughput: 0: 1121.4. Samples: 6087652. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:36,782][03057] Avg episode reward: [(0, '29.977')] +[2025-09-02 17:50:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24313856. Throughput: 0: 1092.6. Samples: 6090152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:41,780][03057] Avg episode reward: [(0, '29.615')] +[2025-09-02 17:50:46,778][03057] Fps is (10 sec: 6555.2, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24379392. Throughput: 0: 1081.4. Samples: 6097208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:46,780][03057] Avg episode reward: [(0, '29.738')] +[2025-09-02 17:50:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.3, 300 sec: 4443.1). Total num frames: 24379392. Throughput: 0: 1138.3. Samples: 6104744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:51,783][03057] Avg episode reward: [(0, '29.556')] +[2025-09-02 17:50:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24379392. Throughput: 0: 1116.5. Samples: 6107356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:50:56,782][03057] Avg episode reward: [(0, '28.332')] +[2025-09-02 17:50:56,793][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000372_24379392.pth... +[2025-09-02 17:50:56,968][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000356_23330816.pth +[2025-09-02 17:51:01,779][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 24444928. Throughput: 0: 1074.5. Samples: 6113320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:01,787][03057] Avg episode reward: [(0, '30.100')] +[2025-09-02 17:51:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24444928. Throughput: 0: 1109.0. Samples: 6120776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:06,779][03057] Avg episode reward: [(0, '29.684')] +[2025-09-02 17:51:11,781][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 24444928. Throughput: 0: 1127.4. Samples: 6124284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:11,786][03057] Avg episode reward: [(0, '28.624')] +[2025-09-02 17:51:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 24444928. Throughput: 0: 1074.7. Samples: 6129416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:16,780][03057] Avg episode reward: [(0, '29.050')] +[2025-09-02 17:51:21,779][03057] Fps is (10 sec: 6555.2, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 24510464. Throughput: 0: 1092.0. Samples: 6136788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:21,782][03057] Avg episode reward: [(0, '29.736')] +[2025-09-02 17:51:26,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24510464. Throughput: 0: 1120.9. Samples: 6140592. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:26,783][03057] Avg episode reward: [(0, '30.365')] +[2025-09-02 17:51:31,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24576000. Throughput: 0: 1084.4. Samples: 6146008. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:31,781][03057] Avg episode reward: [(0, '28.386')] +[2025-09-02 17:51:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 24576000. Throughput: 0: 1058.8. 
Samples: 6152388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:36,779][03057] Avg episode reward: [(0, '29.339')] +[2025-09-02 17:51:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24576000. Throughput: 0: 1083.7. Samples: 6156124. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:51:41,780][03057] Avg episode reward: [(0, '29.171')] +[2025-09-02 17:51:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24641536. Throughput: 0: 1100.7. Samples: 6162852. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:51:46,781][03057] Avg episode reward: [(0, '28.834')] +[2025-09-02 17:51:51,780][03057] Fps is (10 sec: 6552.1, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 24641536. Throughput: 0: 1058.3. Samples: 6168400. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:51:51,782][03057] Avg episode reward: [(0, '27.407')] +[2025-09-02 17:51:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24641536. Throughput: 0: 1067.6. Samples: 6172324. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:51:56,788][03057] Avg episode reward: [(0, '29.112')] +[2025-09-02 17:52:01,778][03057] Fps is (10 sec: 6555.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24707072. Throughput: 0: 1113.2. Samples: 6179508. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:01,787][03057] Avg episode reward: [(0, '28.564')] +[2025-09-02 17:52:06,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24707072. Throughput: 0: 1063.5. Samples: 6184644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:06,781][03057] Avg episode reward: [(0, '29.149')] +[2025-09-02 17:52:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 24707072. Throughput: 0: 1058.0. Samples: 6188204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:11,780][03057] Avg episode reward: [(0, '29.010')] +[2025-09-02 17:52:16,779][03057] Fps is (10 sec: 6553.4, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 24772608. Throughput: 0: 1104.3. Samples: 6195700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:16,781][03057] Avg episode reward: [(0, '29.281')] +[2025-09-02 17:52:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24772608. Throughput: 0: 1095.0. Samples: 6201664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:21,780][03057] Avg episode reward: [(0, '28.814')] +[2025-09-02 17:52:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24772608. Throughput: 0: 1072.2. Samples: 6204372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:26,782][03057] Avg episode reward: [(0, '28.863')] +[2025-09-02 17:52:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24838144. Throughput: 0: 1098.8. Samples: 6212300. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:52:31,780][03057] Avg episode reward: [(0, '29.087')] +[2025-09-02 17:52:36,781][03057] Fps is (10 sec: 6551.9, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 24838144. Throughput: 0: 1116.8. Samples: 6218656. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:52:36,787][03057] Avg episode reward: [(0, '30.432')] +[2025-09-02 17:52:41,782][03057] Fps is (10 sec: 0.0, 60 sec: 4368.8, 300 sec: 4220.9). 
Total num frames: 24838144. Throughput: 0: 1084.9. Samples: 6221148. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:52:41,783][03057] Avg episode reward: [(0, '30.920')] +[2025-09-02 17:52:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 24838144. Throughput: 0: 1079.8. Samples: 6228100. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:52:46,789][03057] Avg episode reward: [(0, '33.822')] +[2025-09-02 17:52:47,190][03375] Saving new best policy, reward=33.822! +[2025-09-02 17:52:47,203][03390] Updated weights for policy 0, policy_version 380 (0.0017) +[2025-09-02 17:52:51,778][03057] Fps is (10 sec: 6556.2, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 24903680. Throughput: 0: 1128.3. Samples: 6235416. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:51,780][03057] Avg episode reward: [(0, '34.035')] +[2025-09-02 17:52:51,782][03375] Saving new best policy, reward=34.035! +[2025-09-02 17:52:56,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24903680. Throughput: 0: 1104.0. Samples: 6237884. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:52:56,780][03057] Avg episode reward: [(0, '33.082')] +[2025-09-02 17:52:56,790][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000380_24903680.pth... +[2025-09-02 17:52:56,946][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000364_23855104.pth +[2025-09-02 17:53:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 24903680. Throughput: 0: 1069.2. Samples: 6243812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:01,780][03057] Avg episode reward: [(0, '32.724')] +[2025-09-02 17:53:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 24969216. Throughput: 0: 1104.3. Samples: 6251356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:06,783][03057] Avg episode reward: [(0, '32.965')] +[2025-09-02 17:53:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 24969216. Throughput: 0: 1121.4. Samples: 6254836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:11,785][03057] Avg episode reward: [(0, '33.045')] +[2025-09-02 17:53:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 24969216. Throughput: 0: 1059.2. Samples: 6259964. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:16,780][03057] Avg episode reward: [(0, '31.871')] +[2025-09-02 17:53:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25034752. Throughput: 0: 1081.3. Samples: 6267312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:21,779][03057] Avg episode reward: [(0, '32.758')] +[2025-09-02 17:53:26,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 25034752. Throughput: 0: 1112.1. Samples: 6271188. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:26,784][03057] Avg episode reward: [(0, '32.879')] +[2025-09-02 17:53:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 25034752. Throughput: 0: 1089.7. Samples: 6277136. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:31,784][03057] Avg episode reward: [(0, '30.512')] +[2025-09-02 17:53:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4443.1). 
Total num frames: 25100288. Throughput: 0: 1070.8. Samples: 6283604. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:36,780][03057] Avg episode reward: [(0, '30.145')] +[2025-09-02 17:53:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.4, 300 sec: 4221.0). Total num frames: 25100288. Throughput: 0: 1105.7. Samples: 6287640. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:41,781][03057] Avg episode reward: [(0, '28.962')] +[2025-09-02 17:53:46,779][03057] Fps is (10 sec: 6553.2, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 25165824. Throughput: 0: 1111.7. Samples: 6293840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:46,780][03057] Avg episode reward: [(0, '27.275')] +[2025-09-02 17:53:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25165824. Throughput: 0: 1070.6. Samples: 6299532. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:51,779][03057] Avg episode reward: [(0, '27.565')] +[2025-09-02 17:53:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25165824. Throughput: 0: 1079.8. Samples: 6303428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:53:56,780][03057] Avg episode reward: [(0, '27.865')] +[2025-09-02 17:54:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 25231360. Throughput: 0: 1120.9. Samples: 6310404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:01,779][03057] Avg episode reward: [(0, '28.846')] +[2025-09-02 17:54:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25231360. Throughput: 0: 1070.4. Samples: 6315480. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:06,780][03057] Avg episode reward: [(0, '28.906')] +[2025-09-02 17:54:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25231360. Throughput: 0: 1069.5. Samples: 6319316. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:11,780][03057] Avg episode reward: [(0, '28.735')] +[2025-09-02 17:54:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.4, 300 sec: 4443.1). Total num frames: 25296896. Throughput: 0: 1100.1. Samples: 6326640. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:54:16,780][03057] Avg episode reward: [(0, '29.156')] +[2025-09-02 17:54:21,780][03057] Fps is (10 sec: 6552.7, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 25296896. Throughput: 0: 1084.1. Samples: 6332388. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:54:21,782][03057] Avg episode reward: [(0, '29.333')] +[2025-09-02 17:54:26,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 25296896. Throughput: 0: 1054.9. Samples: 6335112. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:54:26,780][03057] Avg episode reward: [(0, '29.645')] +[2025-09-02 17:54:31,778][03057] Fps is (10 sec: 6554.5, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 25362432. Throughput: 0: 1087.6. Samples: 6342780. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:31,787][03057] Avg episode reward: [(0, '29.784')] +[2025-09-02 17:54:36,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25362432. Throughput: 0: 1105.6. Samples: 6349284. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:36,783][03057] Avg episode reward: [(0, '28.790')] +[2025-09-02 17:54:41,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 25362432. Throughput: 0: 1073.7. Samples: 6351744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:41,784][03057] Avg episode reward: [(0, '27.865')] +[2025-09-02 17:54:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 25427968. Throughput: 0: 1070.7. Samples: 6358584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:46,783][03057] Avg episode reward: [(0, '27.729')] +[2025-09-02 17:54:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25427968. Throughput: 0: 1125.1. Samples: 6366108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:51,780][03057] Avg episode reward: [(0, '28.271')] +[2025-09-02 17:54:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 25427968. Throughput: 0: 1099.3. Samples: 6368784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:54:56,780][03057] Avg episode reward: [(0, '29.058')] +[2025-09-02 17:54:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_25427968.pth... +[2025-09-02 17:54:56,961][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000372_24379392.pth +[2025-09-02 17:55:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 25427968. Throughput: 0: 1069.4. Samples: 6374764. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:55:01,780][03057] Avg episode reward: [(0, '28.415')] +[2025-09-02 17:55:06,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25493504. Throughput: 0: 1107.0. Samples: 6382200. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:55:06,779][03057] Avg episode reward: [(0, '29.293')] +[2025-09-02 17:55:11,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 25493504. Throughput: 0: 1127.6. Samples: 6385852. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:55:11,782][03057] Avg episode reward: [(0, '30.685')] +[2025-09-02 17:55:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 25493504. Throughput: 0: 1068.6. Samples: 6390868. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:55:16,780][03057] Avg episode reward: [(0, '32.446')] +[2025-09-02 17:55:17,968][03390] Updated weights for policy 0, policy_version 390 (0.0029) +[2025-09-02 17:55:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 25559040. Throughput: 0: 1087.8. Samples: 6398236. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:55:21,785][03057] Avg episode reward: [(0, '33.008')] +[2025-09-02 17:55:26,782][03057] Fps is (10 sec: 6551.4, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 25559040. Throughput: 0: 1119.7. Samples: 6402136. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:55:26,783][03057] Avg episode reward: [(0, '31.872')] +[2025-09-02 17:55:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 25559040. Throughput: 0: 1099.3. Samples: 6408052. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:55:31,781][03057] Avg episode reward: [(0, '32.639')] +[2025-09-02 17:55:36,778][03057] Fps is (10 sec: 6555.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25624576. Throughput: 0: 1075.1. Samples: 6414488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:55:36,780][03057] Avg episode reward: [(0, '32.906')] +[2025-09-02 17:55:41,778][03057] Fps is (10 sec: 6554.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 25624576. Throughput: 0: 1102.7. Samples: 6418404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:55:41,780][03057] Avg episode reward: [(0, '30.258')] +[2025-09-02 17:55:46,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 25690112. Throughput: 0: 1115.0. Samples: 6424940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:55:46,782][03057] Avg episode reward: [(0, '29.513')] +[2025-09-02 17:55:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25690112. Throughput: 0: 1074.7. Samples: 6430560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:55:51,779][03057] Avg episode reward: [(0, '28.581')] +[2025-09-02 17:55:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 25690112. Throughput: 0: 1081.3. Samples: 6434512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:55:56,780][03057] Avg episode reward: [(0, '30.461')] +[2025-09-02 17:56:01,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 25755648. Throughput: 0: 1132.8. Samples: 6441844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:01,784][03057] Avg episode reward: [(0, '29.527')] +[2025-09-02 17:56:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 25755648. Throughput: 0: 1082.3. Samples: 6446940. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:06,783][03057] Avg episode reward: [(0, '29.227')] +[2025-09-02 17:56:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25755648. Throughput: 0: 1078.7. Samples: 6450672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:11,780][03057] Avg episode reward: [(0, '31.190')] +[2025-09-02 17:56:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 25821184. Throughput: 0: 1108.9. Samples: 6457952. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:16,779][03057] Avg episode reward: [(0, '31.151')] +[2025-09-02 17:56:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25821184. Throughput: 0: 1098.4. Samples: 6463916. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:21,779][03057] Avg episode reward: [(0, '31.138')] +[2025-09-02 17:56:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 25821184. Throughput: 0: 1070.9. Samples: 6466596. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:26,780][03057] Avg episode reward: [(0, '30.346')] +[2025-09-02 17:56:31,778][03057] Fps is (10 sec: 6553.5, 60 sec: 5461.4, 300 sec: 4443.1). Total num frames: 25886720. Throughput: 0: 1094.1. Samples: 6474172. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:31,785][03057] Avg episode reward: [(0, '31.004')] +[2025-09-02 17:56:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25886720. 
Throughput: 0: 1117.1. Samples: 6480828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:36,780][03057] Avg episode reward: [(0, '29.406')] +[2025-09-02 17:56:41,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 25886720. Throughput: 0: 1086.6. Samples: 6483408. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:41,785][03057] Avg episode reward: [(0, '29.014')] +[2025-09-02 17:56:46,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 25952256. Throughput: 0: 1073.5. Samples: 6490152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:46,780][03057] Avg episode reward: [(0, '29.622')] +[2025-09-02 17:56:51,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 25952256. Throughput: 0: 1125.5. Samples: 6497588. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:51,783][03057] Avg episode reward: [(0, '28.325')] +[2025-09-02 17:56:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 25952256. Throughput: 0: 1100.9. Samples: 6500212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:56:56,780][03057] Avg episode reward: [(0, '28.262')] +[2025-09-02 17:56:56,795][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000396_25952256.pth... +[2025-09-02 17:56:56,954][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000380_24903680.pth +[2025-09-02 17:57:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 25952256. Throughput: 0: 1071.1. Samples: 6506152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:01,780][03057] Avg episode reward: [(0, '28.671')] +[2025-09-02 17:57:06,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 26017792. Throughput: 0: 1101.9. Samples: 6513500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:06,780][03057] Avg episode reward: [(0, '29.623')] +[2025-09-02 17:57:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26017792. Throughput: 0: 1118.5. Samples: 6516928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:11,783][03057] Avg episode reward: [(0, '28.517')] +[2025-09-02 17:57:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 26017792. Throughput: 0: 1061.2. Samples: 6521924. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:16,780][03057] Avg episode reward: [(0, '29.313')] +[2025-09-02 17:57:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26083328. Throughput: 0: 1069.3. Samples: 6528948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:21,780][03057] Avg episode reward: [(0, '29.601')] +[2025-09-02 17:57:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26083328. Throughput: 0: 1098.9. Samples: 6532856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:26,779][03057] Avg episode reward: [(0, '30.260')] +[2025-09-02 17:57:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 26083328. Throughput: 0: 1080.4. Samples: 6538772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:31,782][03057] Avg episode reward: [(0, '30.500')] +[2025-09-02 17:57:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). 
Total num frames: 26148864. Throughput: 0: 1054.1. Samples: 6545024. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:36,787][03057] Avg episode reward: [(0, '30.002')] +[2025-09-02 17:57:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26148864. Throughput: 0: 1083.0. Samples: 6548948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:57:41,780][03057] Avg episode reward: [(0, '31.939')] +[2025-09-02 17:57:46,549][03390] Updated weights for policy 0, policy_version 400 (0.0014) +[2025-09-02 17:57:46,781][03057] Fps is (10 sec: 6551.8, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 26214400. Throughput: 0: 1100.6. Samples: 6555680. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:57:46,782][03057] Avg episode reward: [(0, '31.407')] +[2025-09-02 17:57:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26214400. Throughput: 0: 1055.9. Samples: 6561016. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:57:51,780][03057] Avg episode reward: [(0, '30.331')] +[2025-09-02 17:57:56,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26214400. Throughput: 0: 1066.1. Samples: 6564904. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:57:56,779][03057] Avg episode reward: [(0, '30.761')] +[2025-09-02 17:58:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 26279936. Throughput: 0: 1122.8. Samples: 6572452. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:01,779][03057] Avg episode reward: [(0, '31.990')] +[2025-09-02 17:58:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26279936. Throughput: 0: 1079.7. Samples: 6577536. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:06,780][03057] Avg episode reward: [(0, '30.730')] +[2025-09-02 17:58:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26279936. Throughput: 0: 1072.3. Samples: 6581108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:11,784][03057] Avg episode reward: [(0, '30.769')] +[2025-09-02 17:58:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 26345472. Throughput: 0: 1105.4. Samples: 6588516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:16,781][03057] Avg episode reward: [(0, '30.908')] +[2025-09-02 17:58:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26345472. Throughput: 0: 1099.1. Samples: 6594484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:21,780][03057] Avg episode reward: [(0, '30.139')] +[2025-09-02 17:58:26,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26345472. Throughput: 0: 1069.1. Samples: 6597056. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:26,779][03057] Avg episode reward: [(0, '28.810')] +[2025-09-02 17:58:31,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 26411008. Throughput: 0: 1091.2. Samples: 6604780. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:58:31,780][03057] Avg episode reward: [(0, '28.541')] +[2025-09-02 17:58:36,780][03057] Fps is (10 sec: 6552.7, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 26411008. Throughput: 0: 1124.1. Samples: 6611604. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:58:36,786][03057] Avg episode reward: [(0, '26.556')] +[2025-09-02 17:58:41,780][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 26411008. Throughput: 0: 1092.9. Samples: 6614088. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:58:41,781][03057] Avg episode reward: [(0, '26.587')] +[2025-09-02 17:58:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 4221.0). Total num frames: 26411008. Throughput: 0: 1071.1. Samples: 6620652. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:58:46,779][03057] Avg episode reward: [(0, '27.542')] +[2025-09-02 17:58:51,778][03057] Fps is (10 sec: 6554.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26476544. Throughput: 0: 1126.0. Samples: 6628208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:51,780][03057] Avg episode reward: [(0, '28.254')] +[2025-09-02 17:58:56,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26476544. Throughput: 0: 1107.6. Samples: 6630948. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:58:56,784][03057] Avg episode reward: [(0, '27.408')] +[2025-09-02 17:58:56,794][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_26476544.pth... +[2025-09-02 17:58:56,975][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000388_25427968.pth +[2025-09-02 17:59:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26542080. Throughput: 0: 1071.0. Samples: 6636712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:59:01,780][03057] Avg episode reward: [(0, '26.867')] +[2025-09-02 17:59:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26542080. Throughput: 0: 1106.3. Samples: 6644268. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:59:06,780][03057] Avg episode reward: [(0, '30.816')] +[2025-09-02 17:59:11,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26542080. Throughput: 0: 1129.7. Samples: 6647892. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:59:11,779][03057] Avg episode reward: [(0, '31.796')] +[2025-09-02 17:59:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 26542080. Throughput: 0: 1071.6. Samples: 6653004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:59:16,780][03057] Avg episode reward: [(0, '31.358')] +[2025-09-02 17:59:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26607616. Throughput: 0: 1078.3. Samples: 6660128. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:59:21,780][03057] Avg episode reward: [(0, '33.480')] +[2025-09-02 17:59:26,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26607616. Throughput: 0: 1108.8. Samples: 6663984. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:59:26,780][03057] Avg episode reward: [(0, '34.587')] +[2025-09-02 17:59:26,789][03375] Saving new best policy, reward=34.587! +[2025-09-02 17:59:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 26607616. Throughput: 0: 1094.1. Samples: 6669888. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 17:59:31,783][03057] Avg episode reward: [(0, '33.759')] +[2025-09-02 17:59:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 26673152. Throughput: 0: 1063.3. Samples: 6676056. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:59:36,780][03057] Avg episode reward: [(0, '33.347')] +[2025-09-02 17:59:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 26673152. Throughput: 0: 1088.5. Samples: 6679932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:59:41,779][03057] Avg episode reward: [(0, '33.348')] +[2025-09-02 17:59:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26673152. Throughput: 0: 1112.2. Samples: 6686760. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-09-02 17:59:46,788][03057] Avg episode reward: [(0, '33.384')] +[2025-09-02 17:59:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26738688. Throughput: 0: 1057.0. Samples: 6691832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:59:51,779][03057] Avg episode reward: [(0, '33.429')] +[2025-09-02 17:59:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26738688. Throughput: 0: 1061.8. Samples: 6695672. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 17:59:56,780][03057] Avg episode reward: [(0, '32.416')] +[2025-09-02 18:00:01,781][03057] Fps is (10 sec: 0.0, 60 sec: 3276.6, 300 sec: 4220.9). Total num frames: 26738688. Throughput: 0: 1123.6. Samples: 6703568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:01,783][03057] Avg episode reward: [(0, '31.286')] +[2025-09-02 18:00:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26804224. Throughput: 0: 1070.6. Samples: 6708304. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:06,787][03057] Avg episode reward: [(0, '32.096')] +[2025-09-02 18:00:11,778][03057] Fps is (10 sec: 6555.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26804224. Throughput: 0: 1060.0. Samples: 6711684. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:11,780][03057] Avg episode reward: [(0, '32.116')] +[2025-09-02 18:00:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26804224. Throughput: 0: 1102.2. Samples: 6719488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:16,784][03057] Avg episode reward: [(0, '32.204')] +[2025-09-02 18:00:17,045][03390] Updated weights for policy 0, policy_version 410 (0.0013) +[2025-09-02 18:00:21,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 26869760. Throughput: 0: 1089.9. Samples: 6725100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:21,780][03057] Avg episode reward: [(0, '31.891')] +[2025-09-02 18:00:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26869760. Throughput: 0: 1060.1. Samples: 6727636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:26,788][03057] Avg episode reward: [(0, '31.967')] +[2025-09-02 18:00:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26869760. Throughput: 0: 1081.6. Samples: 6735432. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:31,788][03057] Avg episode reward: [(0, '31.436')] +[2025-09-02 18:00:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 26935296. Throughput: 0: 1117.5. Samples: 6742120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:36,784][03057] Avg episode reward: [(0, '31.619')] +[2025-09-02 18:00:41,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26935296. Throughput: 0: 1087.5. Samples: 6744608. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:41,781][03057] Avg episode reward: [(0, '32.518')] +[2025-09-02 18:00:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 26935296. Throughput: 0: 1061.4. Samples: 6751328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:46,780][03057] Avg episode reward: [(0, '32.757')] +[2025-09-02 18:00:51,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27000832. Throughput: 0: 1116.4. Samples: 6758544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:51,784][03057] Avg episode reward: [(0, '31.370')] +[2025-09-02 18:00:56,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 27000832. Throughput: 0: 1100.6. Samples: 6761212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:00:56,785][03057] Avg episode reward: [(0, '32.476')] +[2025-09-02 18:00:56,790][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_27000832.pth... +[2025-09-02 18:00:56,982][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000396_25952256.pth +[2025-09-02 18:01:01,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 27000832. Throughput: 0: 1051.8. Samples: 6766820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:01,781][03057] Avg episode reward: [(0, '31.515')] +[2025-09-02 18:01:06,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27066368. Throughput: 0: 1092.7. Samples: 6774272. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:01:06,780][03057] Avg episode reward: [(0, '29.818')] +[2025-09-02 18:01:11,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27066368. Throughput: 0: 1118.0. Samples: 6777948. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:01:11,782][03057] Avg episode reward: [(0, '27.192')] +[2025-09-02 18:01:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27066368. Throughput: 0: 1057.8. Samples: 6783032. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:01:16,780][03057] Avg episode reward: [(0, '27.918')] +[2025-09-02 18:01:21,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 27131904. Throughput: 0: 1063.3. Samples: 6789968. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:21,780][03057] Avg episode reward: [(0, '28.279')] +[2025-09-02 18:01:26,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27131904. Throughput: 0: 1091.8. Samples: 6793740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:26,780][03057] Avg episode reward: [(0, '27.112')] +[2025-09-02 18:01:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27131904. Throughput: 0: 1076.1. 
Samples: 6799752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:31,780][03057] Avg episode reward: [(0, '27.292')] +[2025-09-02 18:01:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27197440. Throughput: 0: 1051.5. Samples: 6805860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:36,780][03057] Avg episode reward: [(0, '28.317')] +[2025-09-02 18:01:41,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27197440. Throughput: 0: 1078.3. Samples: 6809736. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:41,779][03057] Avg episode reward: [(0, '29.626')] +[2025-09-02 18:01:46,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27197440. Throughput: 0: 1108.3. Samples: 6816692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:46,780][03057] Avg episode reward: [(0, '30.116')] +[2025-09-02 18:01:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27262976. Throughput: 0: 1052.0. Samples: 6821612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:51,779][03057] Avg episode reward: [(0, '30.821')] +[2025-09-02 18:01:56,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27262976. Throughput: 0: 1056.4. Samples: 6825484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:01:56,780][03057] Avg episode reward: [(0, '31.771')] +[2025-09-02 18:02:01,780][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 27262976. Throughput: 0: 1113.0. Samples: 6833120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:01,781][03057] Avg episode reward: [(0, '32.487')] +[2025-09-02 18:02:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27328512. Throughput: 0: 1068.2. Samples: 6838036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:06,780][03057] Avg episode reward: [(0, '33.391')] +[2025-09-02 18:02:11,778][03057] Fps is (10 sec: 6554.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27328512. Throughput: 0: 1057.2. Samples: 6841312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:11,780][03057] Avg episode reward: [(0, '32.255')] +[2025-09-02 18:02:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27328512. Throughput: 0: 1100.2. Samples: 6849260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:16,784][03057] Avg episode reward: [(0, '31.263')] +[2025-09-02 18:02:21,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27394048. Throughput: 0: 1091.7. Samples: 6854988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:21,780][03057] Avg episode reward: [(0, '31.362')] +[2025-09-02 18:02:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27394048. Throughput: 0: 1061.2. Samples: 6857488. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:26,780][03057] Avg episode reward: [(0, '31.018')] +[2025-09-02 18:02:31,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 27394048. Throughput: 0: 1066.7. Samples: 6864696. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:31,781][03057] Avg episode reward: [(0, '32.230')] +[2025-09-02 18:02:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4443.1). 
Total num frames: 27459584. Throughput: 0: 1110.2. Samples: 6871572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:36,784][03057] Avg episode reward: [(0, '31.135')] +[2025-09-02 18:02:41,778][03057] Fps is (10 sec: 6554.2, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27459584. Throughput: 0: 1080.4. Samples: 6874100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:41,780][03057] Avg episode reward: [(0, '32.077')] +[2025-09-02 18:02:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27459584. Throughput: 0: 1050.7. Samples: 6880400. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:46,780][03057] Avg episode reward: [(0, '31.497')] +[2025-09-02 18:02:48,964][03390] Updated weights for policy 0, policy_version 420 (0.0017) +[2025-09-02 18:02:51,779][03057] Fps is (10 sec: 6553.3, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 27525120. Throughput: 0: 1099.5. Samples: 6887516. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:51,780][03057] Avg episode reward: [(0, '31.465')] +[2025-09-02 18:02:56,780][03057] Fps is (10 sec: 6552.2, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 27525120. Throughput: 0: 1095.7. Samples: 6890620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:02:56,784][03057] Avg episode reward: [(0, '32.741')] +[2025-09-02 18:02:56,796][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000420_27525120.pth... +[2025-09-02 18:02:56,966][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_26476544.pth +[2025-09-02 18:03:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 27525120. Throughput: 0: 1038.6. Samples: 6895996. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:01,782][03057] Avg episode reward: [(0, '31.713')] +[2025-09-02 18:03:06,779][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 27590656. Throughput: 0: 1074.8. Samples: 6903356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:06,781][03057] Avg episode reward: [(0, '33.303')] +[2025-09-02 18:03:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27590656. Throughput: 0: 1104.4. Samples: 6907184. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:11,783][03057] Avg episode reward: [(0, '31.845')] +[2025-09-02 18:03:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27590656. Throughput: 0: 1062.4. Samples: 6912504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:16,782][03057] Avg episode reward: [(0, '31.236')] +[2025-09-02 18:03:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27656192. Throughput: 0: 1058.5. Samples: 6919204. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:21,780][03057] Avg episode reward: [(0, '30.159')] +[2025-09-02 18:03:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27656192. Throughput: 0: 1090.4. Samples: 6923168. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:26,788][03057] Avg episode reward: [(0, '28.825')] +[2025-09-02 18:03:31,780][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 27656192. Throughput: 0: 1088.0. Samples: 6929364. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:31,781][03057] Avg episode reward: [(0, '28.637')] +[2025-09-02 18:03:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27721728. Throughput: 0: 1059.6. Samples: 6935196. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:03:36,782][03057] Avg episode reward: [(0, '28.985')] +[2025-09-02 18:03:41,778][03057] Fps is (10 sec: 6554.9, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27721728. Throughput: 0: 1077.0. Samples: 6939084. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:03:41,780][03057] Avg episode reward: [(0, '31.185')] +[2025-09-02 18:03:46,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 27721728. Throughput: 0: 1115.8. Samples: 6946208. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:03:46,781][03057] Avg episode reward: [(0, '32.285')] +[2025-09-02 18:03:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27787264. Throughput: 0: 1054.8. Samples: 6950820. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:51,781][03057] Avg episode reward: [(0, '30.980')] +[2025-09-02 18:03:56,778][03057] Fps is (10 sec: 6554.7, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 27787264. Throughput: 0: 1055.4. Samples: 6954676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:03:56,780][03057] Avg episode reward: [(0, '30.946')] +[2025-09-02 18:04:01,782][03057] Fps is (10 sec: 0.0, 60 sec: 4368.8, 300 sec: 4220.9). Total num frames: 27787264. Throughput: 0: 1111.7. Samples: 6962536. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:01,783][03057] Avg episode reward: [(0, '29.874')] +[2025-09-02 18:04:06,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 27852800. Throughput: 0: 1076.4. Samples: 6967644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:06,780][03057] Avg episode reward: [(0, '29.793')] +[2025-09-02 18:04:11,779][03057] Fps is (10 sec: 6555.8, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 27852800. Throughput: 0: 1056.5. Samples: 6970712. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:11,780][03057] Avg episode reward: [(0, '28.632')] +[2025-09-02 18:04:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 27852800. Throughput: 0: 1087.7. Samples: 6978308. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:16,781][03057] Avg episode reward: [(0, '27.439')] +[2025-09-02 18:04:21,779][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 27918336. Throughput: 0: 1093.5. Samples: 6984404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:21,785][03057] Avg episode reward: [(0, '28.696')] +[2025-09-02 18:04:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27918336. Throughput: 0: 1062.7. Samples: 6986904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:26,788][03057] Avg episode reward: [(0, '29.839')] +[2025-09-02 18:04:31,783][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 27918336. Throughput: 0: 1064.5. Samples: 6994112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:31,788][03057] Avg episode reward: [(0, '31.376')] +[2025-09-02 18:04:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 27983872. 
Throughput: 0: 1120.6. Samples: 7001248. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:36,782][03057] Avg episode reward: [(0, '31.232')] +[2025-09-02 18:04:41,780][03057] Fps is (10 sec: 6555.3, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 27983872. Throughput: 0: 1091.4. Samples: 7003792. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:41,786][03057] Avg episode reward: [(0, '29.950')] +[2025-09-02 18:04:46,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 27983872. Throughput: 0: 1054.6. Samples: 7009988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:46,789][03057] Avg episode reward: [(0, '30.641')] +[2025-09-02 18:04:51,778][03057] Fps is (10 sec: 6555.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28049408. Throughput: 0: 1100.2. Samples: 7017152. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:51,780][03057] Avg episode reward: [(0, '29.449')] +[2025-09-02 18:04:56,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 28049408. Throughput: 0: 1104.3. Samples: 7020404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:04:56,780][03057] Avg episode reward: [(0, '28.298')] +[2025-09-02 18:04:56,785][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000428_28049408.pth... +[2025-09-02 18:04:56,985][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000412_27000832.pth +[2025-09-02 18:05:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 28049408. Throughput: 0: 1048.6. Samples: 7025496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:01,779][03057] Avg episode reward: [(0, '28.838')] +[2025-09-02 18:05:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28114944. Throughput: 0: 1076.3. Samples: 7032836. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:06,780][03057] Avg episode reward: [(0, '30.881')] +[2025-09-02 18:05:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28114944. Throughput: 0: 1105.7. Samples: 7036660. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:11,783][03057] Avg episode reward: [(0, '30.421')] +[2025-09-02 18:05:16,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 28114944. Throughput: 0: 1065.1. Samples: 7042036. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:16,780][03057] Avg episode reward: [(0, '29.110')] +[2025-09-02 18:05:21,483][03390] Updated weights for policy 0, policy_version 430 (0.0013) +[2025-09-02 18:05:21,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28180480. Throughput: 0: 1050.7. Samples: 7048528. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:21,780][03057] Avg episode reward: [(0, '30.740')] +[2025-09-02 18:05:26,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28180480. Throughput: 0: 1078.0. Samples: 7052300. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:26,779][03057] Avg episode reward: [(0, '30.647')] +[2025-09-02 18:05:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.4, 300 sec: 4221.0). Total num frames: 28180480. Throughput: 0: 1082.8. Samples: 7058712. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:31,780][03057] Avg episode reward: [(0, '31.451')] +[2025-09-02 18:05:36,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28246016. Throughput: 0: 1049.5. Samples: 7064380. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:36,780][03057] Avg episode reward: [(0, '30.874')] +[2025-09-02 18:05:41,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 28246016. Throughput: 0: 1062.9. Samples: 7068236. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:41,785][03057] Avg episode reward: [(0, '31.095')] +[2025-09-02 18:05:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28246016. Throughput: 0: 1113.0. Samples: 7075580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:46,780][03057] Avg episode reward: [(0, '32.793')] +[2025-09-02 18:05:51,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28311552. Throughput: 0: 1054.4. Samples: 7080284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:51,781][03057] Avg episode reward: [(0, '32.190')] +[2025-09-02 18:05:56,779][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28311552. Throughput: 0: 1051.0. Samples: 7083956. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:05:56,785][03057] Avg episode reward: [(0, '32.979')] +[2025-09-02 18:06:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28311552. Throughput: 0: 1103.0. Samples: 7091668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:01,785][03057] Avg episode reward: [(0, '32.737')] +[2025-09-02 18:06:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 28311552. Throughput: 0: 1084.3. Samples: 7097320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:06,782][03057] Avg episode reward: [(0, '31.861')] +[2025-09-02 18:06:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28377088. Throughput: 0: 1059.6. Samples: 7099980. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:06:11,779][03057] Avg episode reward: [(0, '31.185')] +[2025-09-02 18:06:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28377088. Throughput: 0: 1092.2. Samples: 7107860. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-09-02 18:06:16,780][03057] Avg episode reward: [(0, '29.136')] +[2025-09-02 18:06:21,779][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28442624. Throughput: 0: 1108.3. Samples: 7114256. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:21,787][03057] Avg episode reward: [(0, '28.438')] +[2025-09-02 18:06:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28442624. Throughput: 0: 1078.5. Samples: 7116768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:26,788][03057] Avg episode reward: [(0, '28.248')] +[2025-09-02 18:06:31,780][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 28442624. Throughput: 0: 1070.1. Samples: 7123736. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:31,781][03057] Avg episode reward: [(0, '28.597')] +[2025-09-02 18:06:36,782][03057] Fps is (10 sec: 6551.1, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 28508160. 
Throughput: 0: 1129.2. Samples: 7131104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:36,788][03057] Avg episode reward: [(0, '27.700')] +[2025-09-02 18:06:41,780][03057] Fps is (10 sec: 6553.3, 60 sec: 4368.9, 300 sec: 4443.1). Total num frames: 28508160. Throughput: 0: 1106.5. Samples: 7133752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:41,782][03057] Avg episode reward: [(0, '29.084')] +[2025-09-02 18:06:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28508160. Throughput: 0: 1069.6. Samples: 7139800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:46,780][03057] Avg episode reward: [(0, '29.873')] +[2025-09-02 18:06:51,778][03057] Fps is (10 sec: 6555.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28573696. Throughput: 0: 1109.3. Samples: 7147240. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:51,779][03057] Avg episode reward: [(0, '31.678')] +[2025-09-02 18:06:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28573696. Throughput: 0: 1120.5. Samples: 7150404. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:06:56,786][03057] Avg episode reward: [(0, '29.521')] +[2025-09-02 18:06:56,798][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000436_28573696.pth... +[2025-09-02 18:06:56,954][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000420_27525120.pth +[2025-09-02 18:07:01,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28573696. Throughput: 0: 1055.5. Samples: 7155356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:01,781][03057] Avg episode reward: [(0, '29.143')] +[2025-09-02 18:07:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28573696. Throughput: 0: 1079.8. Samples: 7162848. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:06,789][03057] Avg episode reward: [(0, '31.259')] +[2025-09-02 18:07:11,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 28639232. Throughput: 0: 1105.1. Samples: 7166500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:11,785][03057] Avg episode reward: [(0, '30.983')] +[2025-09-02 18:07:16,786][03057] Fps is (10 sec: 6548.6, 60 sec: 4368.5, 300 sec: 4220.9). Total num frames: 28639232. Throughput: 0: 1074.6. Samples: 7172100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:16,794][03057] Avg episode reward: [(0, '31.072')] +[2025-09-02 18:07:21,779][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 28639232. Throughput: 0: 1059.0. Samples: 7178756. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:21,783][03057] Avg episode reward: [(0, '31.692')] +[2025-09-02 18:07:26,778][03057] Fps is (10 sec: 6558.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28704768. Throughput: 0: 1080.4. Samples: 7182368. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:26,781][03057] Avg episode reward: [(0, '32.012')] +[2025-09-02 18:07:31,782][03057] Fps is (10 sec: 6551.4, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 28704768. Throughput: 0: 1089.5. Samples: 7188832. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:31,783][03057] Avg episode reward: [(0, '31.517')] +[2025-09-02 18:07:36,779][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 4221.0). 
Total num frames: 28704768. Throughput: 0: 1056.3. Samples: 7194776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:36,780][03057] Avg episode reward: [(0, '30.771')] +[2025-09-02 18:07:41,778][03057] Fps is (10 sec: 6556.0, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 28770304. Throughput: 0: 1063.1. Samples: 7198244. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:41,780][03057] Avg episode reward: [(0, '30.586')] +[2025-09-02 18:07:46,778][03057] Fps is (10 sec: 6553.9, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28770304. Throughput: 0: 1118.8. Samples: 7205700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:46,783][03057] Avg episode reward: [(0, '31.480')] +[2025-09-02 18:07:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 3276.8, 300 sec: 4221.0). Total num frames: 28770304. Throughput: 0: 1066.6. Samples: 7210844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:51,781][03057] Avg episode reward: [(0, '30.742')] +[2025-09-02 18:07:53,256][03390] Updated weights for policy 0, policy_version 440 (0.0013) +[2025-09-02 18:07:56,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28835840. Throughput: 0: 1055.5. Samples: 7213996. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:07:56,780][03057] Avg episode reward: [(0, '31.539')] +[2025-09-02 18:08:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28835840. Throughput: 0: 1105.9. Samples: 7221856. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:01,779][03057] Avg episode reward: [(0, '33.094')] +[2025-09-02 18:08:06,780][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 28835840. Throughput: 0: 1084.8. Samples: 7227572. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:06,782][03057] Avg episode reward: [(0, '31.812')] +[2025-09-02 18:08:11,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 28901376. Throughput: 0: 1057.2. Samples: 7229944. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:11,779][03057] Avg episode reward: [(0, '32.236')] +[2025-09-02 18:08:16,779][03057] Fps is (10 sec: 6554.6, 60 sec: 4369.6, 300 sec: 4221.0). Total num frames: 28901376. Throughput: 0: 1088.3. Samples: 7237800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:16,782][03057] Avg episode reward: [(0, '31.576')] +[2025-09-02 18:08:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 28901376. Throughput: 0: 1102.2. Samples: 7244376. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:21,781][03057] Avg episode reward: [(0, '30.688')] +[2025-09-02 18:08:26,779][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 28966912. Throughput: 0: 1079.3. Samples: 7246812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:26,780][03057] Avg episode reward: [(0, '30.071')] +[2025-09-02 18:08:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.3, 300 sec: 4221.0). Total num frames: 28966912. Throughput: 0: 1065.5. Samples: 7253648. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:31,779][03057] Avg episode reward: [(0, '29.496')] +[2025-09-02 18:08:36,781][03057] Fps is (10 sec: 6551.9, 60 sec: 5461.1, 300 sec: 4443.1). Total num frames: 29032448. Throughput: 0: 1117.5. Samples: 7261136. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:36,783][03057] Avg episode reward: [(0, '27.966')] +[2025-09-02 18:08:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29032448. Throughput: 0: 1104.9. Samples: 7263716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:41,784][03057] Avg episode reward: [(0, '28.880')] +[2025-09-02 18:08:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29032448. Throughput: 0: 1061.8. Samples: 7269636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:46,780][03057] Avg episode reward: [(0, '28.228')] +[2025-09-02 18:08:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29032448. Throughput: 0: 1107.1. Samples: 7277388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:51,783][03057] Avg episode reward: [(0, '28.145')] +[2025-09-02 18:08:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 29097984. Throughput: 0: 1128.0. Samples: 7280704. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:08:56,779][03057] Avg episode reward: [(0, '28.403')] +[2025-09-02 18:08:56,791][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000444_29097984.pth... +[2025-09-02 18:08:56,952][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000428_28049408.pth +[2025-09-02 18:09:01,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29097984. Throughput: 0: 1066.4. Samples: 7285788. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:01,785][03057] Avg episode reward: [(0, '29.007')] +[2025-09-02 18:09:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 29097984. Throughput: 0: 1084.9. Samples: 7293196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:06,789][03057] Avg episode reward: [(0, '29.905')] +[2025-09-02 18:09:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29163520. Throughput: 0: 1115.6. Samples: 7297012. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:11,780][03057] Avg episode reward: [(0, '30.937')] +[2025-09-02 18:09:16,780][03057] Fps is (10 sec: 6552.3, 60 sec: 4368.9, 300 sec: 4220.9). Total num frames: 29163520. Throughput: 0: 1096.1. Samples: 7302976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:16,784][03057] Avg episode reward: [(0, '29.845')] +[2025-09-02 18:09:21,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29163520. Throughput: 0: 1080.4. Samples: 7309752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:21,780][03057] Avg episode reward: [(0, '29.562')] +[2025-09-02 18:09:26,779][03057] Fps is (10 sec: 6554.6, 60 sec: 4369.1, 300 sec: 4443.2). Total num frames: 29229056. Throughput: 0: 1101.5. Samples: 7313284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:26,785][03057] Avg episode reward: [(0, '29.424')] +[2025-09-02 18:09:31,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29229056. Throughput: 0: 1119.7. Samples: 7320024. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:31,782][03057] Avg episode reward: [(0, '30.034')] +[2025-09-02 18:09:36,778][03057] Fps is (10 sec: 0.0, 60 sec: 3277.0, 300 sec: 4221.0). Total num frames: 29229056. Throughput: 0: 1076.7. 
Samples: 7325840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:36,788][03057] Avg episode reward: [(0, '29.789')] +[2025-09-02 18:09:41,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29294592. Throughput: 0: 1083.7. Samples: 7329472. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:41,780][03057] Avg episode reward: [(0, '29.909')] +[2025-09-02 18:09:46,779][03057] Fps is (10 sec: 6553.1, 60 sec: 4369.0, 300 sec: 4220.9). Total num frames: 29294592. Throughput: 0: 1140.5. Samples: 7337112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:46,781][03057] Avg episode reward: [(0, '30.675')] +[2025-09-02 18:09:51,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29294592. Throughput: 0: 1092.6. Samples: 7342364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:51,780][03057] Avg episode reward: [(0, '30.470')] +[2025-09-02 18:09:56,778][03057] Fps is (10 sec: 6554.1, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29360128. Throughput: 0: 1082.6. Samples: 7345728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:09:56,780][03057] Avg episode reward: [(0, '30.492')] +[2025-09-02 18:10:01,778][03057] Fps is (10 sec: 6553.7, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29360128. Throughput: 0: 1122.7. Samples: 7353496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:01,780][03057] Avg episode reward: [(0, '30.840')] +[2025-09-02 18:10:06,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29360128. Throughput: 0: 1101.5. Samples: 7359320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:06,783][03057] Avg episode reward: [(0, '31.411')] +[2025-09-02 18:10:11,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29425664. Throughput: 0: 1084.0. Samples: 7362064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:11,779][03057] Avg episode reward: [(0, '31.608')] +[2025-09-02 18:10:16,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.2, 300 sec: 4221.0). Total num frames: 29425664. Throughput: 0: 1112.1. Samples: 7370068. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:16,780][03057] Avg episode reward: [(0, '32.201')] +[2025-09-02 18:10:21,095][03390] Updated weights for policy 0, policy_version 450 (0.0023) +[2025-09-02 18:10:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 29491200. Throughput: 0: 1124.0. Samples: 7376420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:21,781][03057] Avg episode reward: [(0, '31.158')] +[2025-09-02 18:10:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29491200. Throughput: 0: 1100.1. Samples: 7378976. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:26,779][03057] Avg episode reward: [(0, '31.580')] +[2025-09-02 18:10:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 29491200. Throughput: 0: 1091.1. Samples: 7386212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:31,780][03057] Avg episode reward: [(0, '31.512')] +[2025-09-02 18:10:36,778][03057] Fps is (10 sec: 6553.7, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 29556736. Throughput: 0: 1140.0. Samples: 7393664. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:36,782][03057] Avg episode reward: [(0, '30.683')] +[2025-09-02 18:10:41,783][03057] Fps is (10 sec: 6550.8, 60 sec: 4368.7, 300 sec: 4443.1). Total num frames: 29556736. Throughput: 0: 1124.0. Samples: 7396312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:41,784][03057] Avg episode reward: [(0, '31.929')] +[2025-09-02 18:10:46,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29556736. Throughput: 0: 1093.0. Samples: 7402680. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:46,779][03057] Avg episode reward: [(0, '30.365')] +[2025-09-02 18:10:51,778][03057] Fps is (10 sec: 6556.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 29622272. Throughput: 0: 1130.9. Samples: 7410212. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:51,780][03057] Avg episode reward: [(0, '31.674')] +[2025-09-02 18:10:56,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29622272. Throughput: 0: 1143.1. Samples: 7413504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:10:56,784][03057] Avg episode reward: [(0, '32.619')] +[2025-09-02 18:10:56,798][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000452_29622272.pth... +[2025-09-02 18:10:56,968][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000436_28573696.pth +[2025-09-02 18:11:01,782][03057] Fps is (10 sec: 0.0, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 29622272. Throughput: 0: 1078.9. Samples: 7418624. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:01,784][03057] Avg episode reward: [(0, '33.095')] +[2025-09-02 18:11:06,778][03057] Fps is (10 sec: 6553.6, 60 sec: 5461.3, 300 sec: 4443.1). Total num frames: 29687808. Throughput: 0: 1105.9. Samples: 7426184. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:06,780][03057] Avg episode reward: [(0, '32.499')] +[2025-09-02 18:11:11,778][03057] Fps is (10 sec: 6556.4, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29687808. Throughput: 0: 1136.4. Samples: 7430112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:11,785][03057] Avg episode reward: [(0, '34.265')] +[2025-09-02 18:11:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4221.0). Total num frames: 29687808. Throughput: 0: 1099.9. Samples: 7435708. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:16,781][03057] Avg episode reward: [(0, '33.569')] +[2025-09-02 18:11:21,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29753344. Throughput: 0: 1088.7. Samples: 7442656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:21,783][03057] Avg episode reward: [(0, '33.847')] +[2025-09-02 18:11:26,783][03057] Fps is (10 sec: 6550.7, 60 sec: 4368.8, 300 sec: 4443.1). Total num frames: 29753344. Throughput: 0: 1117.8. Samples: 7446612. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:26,784][03057] Avg episode reward: [(0, '31.547')] +[2025-09-02 18:11:31,780][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 29753344. Throughput: 0: 1116.7. Samples: 7452932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:31,781][03057] Avg episode reward: [(0, '31.168')] +[2025-09-02 18:11:36,786][03057] Fps is (10 sec: 6551.5, 60 sec: 4368.5, 300 sec: 4443.0). Total num frames: 29818880. Throughput: 0: 1078.0. 
Samples: 7458728. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:36,787][03057] Avg episode reward: [(0, '29.567')] +[2025-09-02 18:11:41,778][03057] Fps is (10 sec: 6554.6, 60 sec: 4369.4, 300 sec: 4443.1). Total num frames: 29818880. Throughput: 0: 1090.6. Samples: 7462580. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:41,780][03057] Avg episode reward: [(0, '28.237')] +[2025-09-02 18:11:46,779][03057] Fps is (10 sec: 0.0, 60 sec: 4369.0, 300 sec: 4221.0). Total num frames: 29818880. Throughput: 0: 1138.2. Samples: 7469840. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:46,780][03057] Avg episode reward: [(0, '28.660')] +[2025-09-02 18:11:51,778][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29884416. Throughput: 0: 1080.0. Samples: 7474784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:51,779][03057] Avg episode reward: [(0, '29.024')] +[2025-09-02 18:11:56,778][03057] Fps is (10 sec: 6553.8, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29884416. Throughput: 0: 1078.3. Samples: 7478636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:11:56,782][03057] Avg episode reward: [(0, '30.272')] +[2025-09-02 18:12:01,785][03057] Fps is (10 sec: 0.0, 60 sec: 4368.9, 300 sec: 4443.0). Total num frames: 29884416. Throughput: 0: 1126.8. Samples: 7486420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:12:01,790][03057] Avg episode reward: [(0, '31.341')] +[2025-09-02 18:12:06,779][03057] Fps is (10 sec: 6553.5, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 29949952. Throughput: 0: 1089.7. Samples: 7491692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:12:06,780][03057] Avg episode reward: [(0, '30.186')] +[2025-09-02 18:12:11,779][03057] Fps is (10 sec: 6557.9, 60 sec: 4369.0, 300 sec: 4443.2). Total num frames: 29949952. Throughput: 0: 1073.3. Samples: 7494904. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:12:11,780][03057] Avg episode reward: [(0, '30.639')] +[2025-09-02 18:12:16,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.1, 300 sec: 4443.1). Total num frames: 29949952. Throughput: 0: 1105.9. Samples: 7502696. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:12:16,780][03057] Avg episode reward: [(0, '31.305')] +[2025-09-02 18:12:21,779][03057] Fps is (10 sec: 6553.4, 60 sec: 4369.0, 300 sec: 4443.1). Total num frames: 30015488. Throughput: 0: 1109.1. Samples: 7508628. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:12:21,784][03057] Avg episode reward: [(0, '30.366')] +[2025-09-02 18:12:26,778][03057] Fps is (10 sec: 6553.6, 60 sec: 4369.4, 300 sec: 4443.2). Total num frames: 30015488. Throughput: 0: 1078.7. Samples: 7511120. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:12:26,780][03057] Avg episode reward: [(0, '30.918')] +[2025-09-02 18:12:31,778][03057] Fps is (10 sec: 0.0, 60 sec: 4369.2, 300 sec: 4443.1). Total num frames: 30015488. Throughput: 0: 1078.6. Samples: 7518376. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-09-02 18:12:31,779][03057] Avg episode reward: [(0, '32.368')] +[2025-09-02 18:12:34,925][03375] Stopping Batcher_0... +[2025-09-02 18:12:34,925][03375] Loop batcher_evt_loop terminating... +[2025-09-02 18:12:34,930][03057] Component Batcher_0 stopped! +[2025-09-02 18:12:34,933][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_30081024.pth... 
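The Saving/Removing pairs that recur through the run (and the final save above) follow a keep-only-the-newest-checkpoints rotation: each new checkpoint_{version}_{frames}.pth is written, then the oldest surviving file is deleted. A minimal sketch of that pattern, assuming PyTorch and a hypothetical helper name -- illustrative only, not Sample Factory's actual code:

import torch
from pathlib import Path

def save_with_rotation(state: dict, ckpt_dir: str, policy_version: int,
                       env_frames: int, keep_last: int = 2) -> Path:
    """Write checkpoint_{version}_{frames}.pth, then prune older checkpoints."""
    out_dir = Path(ckpt_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # e.g. checkpoint_000000459_30081024.pth = policy version 459 at 30081024 env frames
    path = out_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    tmp = path.with_suffix(".tmp")
    torch.save(state, tmp)  # write to a temp file so a crash cannot leave a torn .pth
    tmp.rename(path)
    # zero-padded version numbers make lexicographic order chronological
    for old in sorted(out_dir.glob("checkpoint_*.pth"))[:-keep_last]:
        old.unlink()
    return path

The log is consistent with keep_last=2 for the regular checkpoints: saving version 459 is followed by removing version 444, while version 452 survives.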
+[2025-09-02 18:12:35,001][03390] Weights refcount: 2 0
+[2025-09-02 18:12:35,004][03390] Stopping InferenceWorker_p0-w0...
+[2025-09-02 18:12:35,004][03057] Component InferenceWorker_p0-w0 stopped!
+[2025-09-02 18:12:35,005][03390] Loop inference_proc0-0_evt_loop terminating...
+[2025-09-02 18:12:35,062][03375] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000444_29097984.pth
+[2025-09-02 18:12:35,073][03375] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_30081024.pth...
+[2025-09-02 18:12:35,204][03057] Component LearnerWorker_p0 stopped!
+[2025-09-02 18:12:35,207][03375] Stopping LearnerWorker_p0...
+[2025-09-02 18:12:35,207][03375] Loop learner_proc0_evt_loop terminating...
+[2025-09-02 18:12:36,294][03057] Component RolloutWorker_w4 stopped!
+[2025-09-02 18:12:36,297][03395] Stopping RolloutWorker_w4...
+[2025-09-02 18:12:36,300][03395] Loop rollout_proc4_evt_loop terminating...
+[2025-09-02 18:12:36,341][03057] Component RolloutWorker_w0 stopped!
+[2025-09-02 18:12:36,342][03391] Stopping RolloutWorker_w0...
+[2025-09-02 18:12:36,345][03391] Loop rollout_proc0_evt_loop terminating...
+[2025-09-02 18:12:36,391][03057] Component RolloutWorker_w6 stopped!
+[2025-09-02 18:12:36,393][03397] Stopping RolloutWorker_w6...
+[2025-09-02 18:12:36,394][03397] Loop rollout_proc6_evt_loop terminating...
+[2025-09-02 18:12:36,396][03394] Stopping RolloutWorker_w2...
+[2025-09-02 18:12:36,396][03394] Loop rollout_proc2_evt_loop terminating...
+[2025-09-02 18:12:36,395][03057] Component RolloutWorker_w2 stopped!
+[2025-09-02 18:12:36,444][03057] Component RolloutWorker_w8 stopped!
+[2025-09-02 18:12:36,446][03399] Stopping RolloutWorker_w8...
+[2025-09-02 18:12:36,447][03399] Loop rollout_proc8_evt_loop terminating...
+[2025-09-02 18:12:36,475][03057] Component RolloutWorker_w3 stopped!
+[2025-09-02 18:12:36,475][03393] Stopping RolloutWorker_w3...
+[2025-09-02 18:12:36,484][03393] Loop rollout_proc3_evt_loop terminating...
+[2025-09-02 18:12:36,526][03392] Stopping RolloutWorker_w1...
+[2025-09-02 18:12:36,527][03392] Loop rollout_proc1_evt_loop terminating...
+[2025-09-02 18:12:36,528][03057] Component RolloutWorker_w1 stopped!
+[2025-09-02 18:12:36,620][03398] Stopping RolloutWorker_w7...
+[2025-09-02 18:12:36,620][03057] Component RolloutWorker_w7 stopped!
+[2025-09-02 18:12:36,629][03398] Loop rollout_proc7_evt_loop terminating...
+[2025-09-02 18:12:36,665][03400] Stopping RolloutWorker_w9...
+[2025-09-02 18:12:36,664][03057] Component RolloutWorker_w9 stopped!
+[2025-09-02 18:12:36,671][03400] Loop rollout_proc9_evt_loop terminating...
+[2025-09-02 18:12:36,680][03396] Stopping RolloutWorker_w5...
+[2025-09-02 18:12:36,680][03396] Loop rollout_proc5_evt_loop terminating...
+[2025-09-02 18:12:36,679][03057] Component RolloutWorker_w5 stopped!
+[2025-09-02 18:12:36,682][03057] Waiting for process learner_proc0 to stop...
+[2025-09-02 18:12:38,175][03057] Waiting for process inference_proc0-0 to join...
+[2025-09-02 18:12:38,548][03057] Waiting for process rollout_proc0 to join...
+[2025-09-02 18:12:41,898][03057] Waiting for process rollout_proc1 to join...
+[2025-09-02 18:12:41,911][03057] Waiting for process rollout_proc2 to join...
+[2025-09-02 18:12:41,913][03057] Waiting for process rollout_proc3 to join...
+[2025-09-02 18:12:41,914][03057] Waiting for process rollout_proc4 to join...
+[2025-09-02 18:12:41,915][03057] Waiting for process rollout_proc5 to join...
+[2025-09-02 18:12:41,916][03057] Waiting for process rollout_proc6 to join...
+[2025-09-02 18:12:41,917][03057] Waiting for process rollout_proc7 to join...
+[2025-09-02 18:12:41,919][03057] Waiting for process rollout_proc8 to join...
+[2025-09-02 18:12:41,920][03057] Waiting for process rollout_proc9 to join...
+[2025-09-02 18:12:41,921][03057] Batcher 0 profile tree view:
+batching: 128.4729, releasing_batches: 0.0174
+[2025-09-02 18:12:41,922][03057] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0093
+  wait_policy_total: 4936.8080
+update_model: 21.9419
+  weight_update: 0.0013
+one_step: 0.0028
+  handle_policy_step: 1920.4507
+    deserialize: 72.0978, stack: 11.5074, obs_to_device_normalize: 437.7046, forward: 939.6098, send_messages: 94.9998
+    prepare_outputs: 282.8230
+      to_cpu: 169.2522
+[2025-09-02 18:12:41,923][03057] Learner 0 profile tree view:
+misc: 0.0028, prepare_batch: 60.2502
+train: 270.3081
+  epoch_init: 0.0025, minibatch_init: 0.0154, losses_postprocess: 0.2030, kl_divergence: 0.2463, after_optimizer: 147.6050
+  calculate_losses: 86.5467
+    losses_init: 0.0016, forward_head: 5.0863, bptt_initial: 71.2847, tail: 0.6654, advantages_returns: 0.2357, losses: 7.7762
+    bptt: 1.3207
+      bptt_forward_core: 1.2568
+  update: 35.2737
+    clip: 0.5477
+[2025-09-02 18:12:41,924][03057] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.9038, enqueue_policy_requests: 1147.9684, env_step: 5429.2517, overhead: 109.2582, complete_rollouts: 16.6535
+save_policy_outputs: 127.1976
+  split_output_tensors: 47.6355
+[2025-09-02 18:12:41,925][03057] RolloutWorker_w9 profile tree view:
+wait_for_trajectories: 0.9799, enqueue_policy_requests: 1093.1780, env_step: 5466.9592, overhead: 106.8543, complete_rollouts: 15.2246
+save_policy_outputs: 126.9153
+  split_output_tensors: 47.4081
+[2025-09-02 18:12:41,926][03057] Loop Runner_EvtLoop terminating...
+[2025-09-02 18:12:41,927][03057] Runner profile tree view:
+main_loop: 7058.0076
+[2025-09-02 18:12:41,928][03057] Collected {0: 30081024}, FPS: 4262.0
+[2025-09-02 19:08:32,953][03057] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-09-02 19:08:32,954][03057] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-09-02 19:08:32,955][03057] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-09-02 19:08:32,956][03057] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-09-02 19:08:32,956][03057] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-09-02 19:08:32,958][03057] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-09-02 19:08:32,959][03057] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-09-02 19:08:32,960][03057] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-09-02 19:08:32,962][03057] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-09-02 19:08:32,963][03057] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-09-02 19:08:32,964][03057] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-09-02 19:08:32,965][03057] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-09-02 19:08:32,966][03057] Adding new argument 'train_script'=None that is not in the saved config file!
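The run summary a few lines up is internally consistent: the runner's main_loop took 7058.0076 seconds of wall-clock time, 30081024 environment frames were collected, and 30081024 / 7058.0076 is approximately 4262, matching the reported FPS: 4262.0. A quick check (plain arithmetic, nothing Sample-Factory-specific; the evaluation-session entries continue below):

total_frames = 30_081_024   # from "Collected {0: 30081024}"
wall_clock_s = 7058.0076    # from "Runner profile tree view: main_loop"
print(round(total_frames / wall_clock_s, 1))  # 4262.0, matching "FPS: 4262.0"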
+[2025-09-02 19:08:32,953][03057] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-09-02 19:08:32,954][03057] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-09-02 19:08:32,955][03057] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-09-02 19:08:32,956][03057] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-09-02 19:08:32,956][03057] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-09-02 19:08:32,958][03057] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-09-02 19:08:32,959][03057] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+[2025-09-02 19:08:32,960][03057] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-09-02 19:08:32,962][03057] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+[2025-09-02 19:08:32,963][03057] Adding new argument 'hf_repository'=None that is not in the saved config file!
+[2025-09-02 19:08:32,964][03057] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-09-02 19:08:32,965][03057] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-09-02 19:08:32,966][03057] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-09-02 19:08:32,967][03057] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-09-02 19:08:32,968][03057] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-09-02 19:08:32,999][03057] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-02 19:08:33,003][03057] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-02 19:08:33,004][03057] RunningMeanStd input shape: (1,)
+[2025-09-02 19:08:33,016][03057] ConvEncoder: input_channels=3
+[2025-09-02 19:08:33,106][03057] Conv encoder output size: 512
+[2025-09-02 19:08:33,107][03057] Policy head output size: 512
+[2025-09-02 19:08:33,282][03057] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_30081024.pth...
+[2025-09-02 19:08:34,046][03057] Num frames 100...
+[2025-09-02 19:08:34,176][03057] Num frames 200...
+[2025-09-02 19:08:34,299][03057] Num frames 300...
+[2025-09-02 19:08:34,427][03057] Num frames 400...
+[2025-09-02 19:08:34,565][03057] Num frames 500...
+[2025-09-02 19:08:34,691][03057] Num frames 600...
+[2025-09-02 19:08:34,822][03057] Num frames 700...
+[2025-09-02 19:08:34,959][03057] Num frames 800...
+[2025-09-02 19:08:35,085][03057] Num frames 900...
+[2025-09-02 19:08:35,218][03057] Num frames 1000...
+[2025-09-02 19:08:35,343][03057] Num frames 1100...
+[2025-09-02 19:08:35,503][03057] Avg episode rewards: #0: 26.840, true rewards: #0: 11.840
+[2025-09-02 19:08:35,504][03057] Avg episode reward: 26.840, avg true_objective: 11.840
+[2025-09-02 19:08:35,535][03057] Num frames 1200...
+[2025-09-02 19:08:35,663][03057] Num frames 1300...
+[2025-09-02 19:08:35,792][03057] Num frames 1400...
+[2025-09-02 19:08:35,919][03057] Num frames 1500...
+[2025-09-02 19:08:36,048][03057] Num frames 1600...
+[2025-09-02 19:08:36,176][03057] Num frames 1700...
+[2025-09-02 19:08:36,308][03057] Num frames 1800...
+[2025-09-02 19:08:36,432][03057] Num frames 1900...
+[2025-09-02 19:08:36,573][03057] Num frames 2000...
+[2025-09-02 19:08:36,702][03057] Num frames 2100...
+[2025-09-02 19:08:36,829][03057] Num frames 2200...
+[2025-09-02 19:08:36,955][03057] Num frames 2300...
+[2025-09-02 19:08:37,079][03057] Num frames 2400...
+[2025-09-02 19:08:37,205][03057] Num frames 2500...
+[2025-09-02 19:08:37,330][03057] Num frames 2600...
+[2025-09-02 19:08:37,459][03057] Num frames 2700...
+[2025-09-02 19:08:37,595][03057] Num frames 2800...
+[2025-09-02 19:08:37,729][03057] Num frames 2900...
+[2025-09-02 19:08:37,859][03057] Num frames 3000...
+[2025-09-02 19:08:37,987][03057] Num frames 3100...
+[2025-09-02 19:08:38,127][03057] Avg episode rewards: #0: 39.840, true rewards: #0: 15.840
+[2025-09-02 19:08:38,128][03057] Avg episode reward: 39.840, avg true_objective: 15.840
+[2025-09-02 19:08:38,171][03057] Num frames 3200...
+[2025-09-02 19:08:38,293][03057] Num frames 3300...
+[2025-09-02 19:08:38,417][03057] Num frames 3400...
+[2025-09-02 19:08:38,543][03057] Num frames 3500...
+[2025-09-02 19:08:38,679][03057] Num frames 3600...
+[2025-09-02 19:08:38,811][03057] Num frames 3700...
+[2025-09-02 19:08:38,979][03057] Num frames 3800...
+[2025-09-02 19:08:39,121][03057] Num frames 3900...
+[2025-09-02 19:08:39,300][03057] Avg episode rewards: #0: 31.560, true rewards: #0: 13.227
+[2025-09-02 19:08:39,302][03057] Avg episode reward: 31.560, avg true_objective: 13.227
+[2025-09-02 19:08:39,365][03057] Num frames 4000...
+[2025-09-02 19:08:39,538][03057] Num frames 4100...
+[2025-09-02 19:08:39,718][03057] Num frames 4200...
+[2025-09-02 19:08:39,951][03057] Num frames 4300...
+[2025-09-02 19:08:40,166][03057] Num frames 4400...
+[2025-09-02 19:08:40,338][03057] Num frames 4500...
+[2025-09-02 19:08:40,512][03057] Num frames 4600...
+[2025-09-02 19:08:40,779][03057] Num frames 4700...
+[2025-09-02 19:08:40,962][03057] Num frames 4800...
+[2025-09-02 19:08:41,154][03057] Num frames 4900...
+[2025-09-02 19:08:41,327][03057] Num frames 5000...
+[2025-09-02 19:08:41,415][03057] Avg episode rewards: #0: 29.815, true rewards: #0: 12.565
+[2025-09-02 19:08:41,416][03057] Avg episode reward: 29.815, avg true_objective: 12.565
+[2025-09-02 19:08:41,508][03057] Num frames 5100...
+[2025-09-02 19:08:41,630][03057] Num frames 5200...
+[2025-09-02 19:08:41,760][03057] Num frames 5300...
+[2025-09-02 19:08:41,900][03057] Num frames 5400...
+[2025-09-02 19:08:42,026][03057] Num frames 5500...
+[2025-09-02 19:08:42,154][03057] Num frames 5600...
+[2025-09-02 19:08:42,278][03057] Num frames 5700...
+[2025-09-02 19:08:42,402][03057] Num frames 5800...
+[2025-09-02 19:08:42,530][03057] Num frames 5900...
+[2025-09-02 19:08:42,658][03057] Num frames 6000...
+[2025-09-02 19:08:42,786][03057] Num frames 6100...
+[2025-09-02 19:08:42,929][03057] Num frames 6200...
+[2025-09-02 19:08:43,059][03057] Num frames 6300...
+[2025-09-02 19:08:43,191][03057] Num frames 6400...
+[2025-09-02 19:08:43,318][03057] Num frames 6500...
+[2025-09-02 19:08:43,453][03057] Num frames 6600...
+[2025-09-02 19:08:43,584][03057] Num frames 6700...
+[2025-09-02 19:08:43,717][03057] Num frames 6800...
+[2025-09-02 19:08:43,873][03057] Num frames 6900...
+[2025-09-02 19:08:44,000][03057] Num frames 7000...
+[2025-09-02 19:08:44,076][03057] Avg episode rewards: #0: 35.632, true rewards: #0: 14.032
+[2025-09-02 19:08:44,077][03057] Avg episode reward: 35.632, avg true_objective: 14.032
+[2025-09-02 19:08:44,188][03057] Num frames 7100...
+[2025-09-02 19:08:44,333][03057] Num frames 7200...
+[2025-09-02 19:08:44,465][03057] Num frames 7300...
+[2025-09-02 19:08:44,591][03057] Num frames 7400...
+[2025-09-02 19:08:44,715][03057] Num frames 7500...
+[2025-09-02 19:08:44,844][03057] Num frames 7600...
+[2025-09-02 19:08:44,981][03057] Num frames 7700...
+[2025-09-02 19:08:45,107][03057] Num frames 7800...
+[2025-09-02 19:08:45,246][03057] Num frames 7900...
+[2025-09-02 19:08:45,377][03057] Num frames 8000...
+[2025-09-02 19:08:45,506][03057] Num frames 8100...
+[2025-09-02 19:08:45,634][03057] Num frames 8200...
+[2025-09-02 19:08:45,761][03057] Num frames 8300...
+[2025-09-02 19:08:45,891][03057] Num frames 8400...
+[2025-09-02 19:08:46,030][03057] Num frames 8500...
+[2025-09-02 19:08:46,154][03057] Num frames 8600...
+[2025-09-02 19:08:46,282][03057] Num frames 8700...
+[2025-09-02 19:08:46,429][03057] Avg episode rewards: #0: 37.626, true rewards: #0: 14.627
+[2025-09-02 19:08:46,430][03057] Avg episode reward: 37.626, avg true_objective: 14.627
+[2025-09-02 19:08:46,464][03057] Num frames 8800...
+[2025-09-02 19:08:46,584][03057] Num frames 8900...
+[2025-09-02 19:08:46,709][03057] Num frames 9000...
+[2025-09-02 19:08:46,834][03057] Num frames 9100...
+[2025-09-02 19:08:46,975][03057] Num frames 9200...
+[2025-09-02 19:08:47,097][03057] Num frames 9300...
+[2025-09-02 19:08:47,224][03057] Num frames 9400...
+[2025-09-02 19:08:47,352][03057] Num frames 9500...
+[2025-09-02 19:08:47,484][03057] Num frames 9600...
+[2025-09-02 19:08:47,609][03057] Num frames 9700...
+[2025-09-02 19:08:47,739][03057] Num frames 9800...
+[2025-09-02 19:08:47,870][03057] Num frames 9900...
+[2025-09-02 19:08:48,011][03057] Num frames 10000...
+[2025-09-02 19:08:48,142][03057] Num frames 10100...
+[2025-09-02 19:08:48,274][03057] Num frames 10200...
+[2025-09-02 19:08:48,372][03057] Avg episode rewards: #0: 37.761, true rewards: #0: 14.619
+[2025-09-02 19:08:48,373][03057] Avg episode reward: 37.761, avg true_objective: 14.619
+[2025-09-02 19:08:48,461][03057] Num frames 10300...
+[2025-09-02 19:08:48,584][03057] Num frames 10400...
+[2025-09-02 19:08:48,709][03057] Num frames 10500...
+[2025-09-02 19:08:48,835][03057] Num frames 10600...
+[2025-09-02 19:08:48,962][03057] Num frames 10700...
+[2025-09-02 19:08:49,096][03057] Num frames 10800...
+[2025-09-02 19:08:49,223][03057] Num frames 10900...
+[2025-09-02 19:08:49,348][03057] Num frames 11000...
+[2025-09-02 19:08:49,473][03057] Num frames 11100...
+[2025-09-02 19:08:49,598][03057] Num frames 11200...
+[2025-09-02 19:08:49,725][03057] Num frames 11300...
+[2025-09-02 19:08:49,890][03057] Avg episode rewards: #0: 36.856, true rewards: #0: 14.231
+[2025-09-02 19:08:49,890][03057] Avg episode reward: 36.856, avg true_objective: 14.231
+[2025-09-02 19:08:49,911][03057] Num frames 11400...
+[2025-09-02 19:08:50,034][03057] Num frames 11500...
+[2025-09-02 19:08:50,176][03057] Num frames 11600...
+[2025-09-02 19:08:50,301][03057] Num frames 11700...
+[2025-09-02 19:08:50,433][03057] Num frames 11800...
+[2025-09-02 19:08:50,568][03057] Num frames 11900...
+[2025-09-02 19:08:50,696][03057] Num frames 12000...
+[2025-09-02 19:08:50,823][03057] Num frames 12100...
+[2025-09-02 19:08:50,954][03057] Num frames 12200...
+[2025-09-02 19:08:51,033][03057] Avg episode rewards: #0: 34.574, true rewards: #0: 13.574
+[2025-09-02 19:08:51,034][03057] Avg episode reward: 34.574, avg true_objective: 13.574
+[2025-09-02 19:08:51,152][03057] Num frames 12300...
+[2025-09-02 19:08:51,278][03057] Num frames 12400...
+[2025-09-02 19:08:51,451][03057] Num frames 12500...
+[2025-09-02 19:08:51,632][03057] Num frames 12600...
+[2025-09-02 19:08:51,818][03057] Num frames 12700...
+[2025-09-02 19:08:51,998][03057] Num frames 12800...
+[2025-09-02 19:08:52,182][03057] Num frames 12900...
+[2025-09-02 19:08:52,360][03057] Num frames 13000...
+[2025-09-02 19:08:52,533][03057] Num frames 13100...
+[2025-09-02 19:08:52,708][03057] Num frames 13200...
+[2025-09-02 19:08:52,899][03057] Num frames 13300...
+[2025-09-02 19:08:53,091][03057] Num frames 13400...
+[2025-09-02 19:08:53,294][03057] Num frames 13500...
+[2025-09-02 19:08:53,482][03057] Num frames 13600...
+[2025-09-02 19:08:53,634][03057] Num frames 13700...
+[2025-09-02 19:08:53,763][03057] Num frames 13800...
+[2025-09-02 19:08:53,891][03057] Num frames 13900...
+[2025-09-02 19:08:54,020][03057] Num frames 14000...
+[2025-09-02 19:08:54,167][03057] Avg episode rewards: #0: 35.773, true rewards: #0: 14.073
+[2025-09-02 19:08:54,168][03057] Avg episode reward: 35.773, avg true_objective: 14.073
+[2025-09-02 19:10:27,197][03057] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
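One detail that is easy to misread in this output: the "Avg episode rewards" entries are running means over all episodes completed so far, not per-episode scores. Individual episode returns can be recovered from consecutive running means; the sketch below does this for the ten true-reward entries of the evaluation run above.

# Running means of "true rewards" across the ten episodes above.
running_avg = [11.840, 15.840, 13.227, 12.565, 14.032,
               14.627, 14.619, 14.231, 13.574, 14.073]

per_episode, prev_total = [], 0.0
for n, avg in enumerate(running_avg, start=1):
    total = n * avg                      # sum of true rewards after n episodes
    per_episode.append(round(total - prev_total, 2))
    prev_total = total

print(per_episode)  # e.g. episode 3: 3*13.227 - 2*15.840 ≈ 8.0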
+[2025-09-02 19:52:10,351][03057] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-09-02 19:52:10,353][03057] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-09-02 19:52:10,353][03057] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-09-02 19:52:10,354][03057] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-09-02 19:52:10,355][03057] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-09-02 19:52:10,358][03057] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-09-02 19:52:10,359][03057] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-09-02 19:52:10,360][03057] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-09-02 19:52:10,361][03057] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-09-02 19:52:10,362][03057] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-09-02 19:52:10,363][03057] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-09-02 19:52:10,363][03057] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-09-02 19:52:10,364][03057] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-09-02 19:52:10,368][03057] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-09-02 19:52:10,369][03057] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-09-02 19:52:10,406][03057] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-02 19:52:10,408][03057] RunningMeanStd input shape: (1,)
+[2025-09-02 19:52:10,423][03057] ConvEncoder: input_channels=3
+[2025-09-02 19:52:10,480][03057] Conv encoder output size: 512
+[2025-09-02 19:52:10,482][03057] Policy head output size: 512
+[2025-09-02 19:52:10,510][03057] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_30081024.pth...
+[2025-09-02 19:52:11,143][03057] Num frames 100...
+[2025-09-02 19:52:11,338][03057] Num frames 200...
+[2025-09-02 19:52:11,515][03057] Num frames 300...
+[2025-09-02 19:52:11,686][03057] Num frames 400...
+[2025-09-02 19:52:11,865][03057] Num frames 500...
+[2025-09-02 19:52:12,041][03057] Num frames 600...
+[2025-09-02 19:52:12,226][03057] Num frames 700...
+[2025-09-02 19:52:12,416][03057] Num frames 800...
+[2025-09-02 19:52:12,579][03057] Num frames 900...
+[2025-09-02 19:52:12,704][03057] Num frames 1000...
+[2025-09-02 19:52:12,831][03057] Num frames 1100...
+[2025-09-02 19:52:12,962][03057] Num frames 1200...
+[2025-09-02 19:52:13,091][03057] Num frames 1300...
+[2025-09-02 19:52:13,222][03057] Num frames 1400...
+[2025-09-02 19:52:13,363][03057] Num frames 1500...
+[2025-09-02 19:52:13,493][03057] Num frames 1600...
+[2025-09-02 19:52:13,621][03057] Num frames 1700...
+[2025-09-02 19:52:13,752][03057] Num frames 1800...
+[2025-09-02 19:52:13,877][03057] Num frames 1900...
+[2025-09-02 19:52:14,009][03057] Num frames 2000...
+[2025-09-02 19:52:14,140][03057] Num frames 2100...
+[2025-09-02 19:52:14,192][03057] Avg episode rewards: #0: 60.999, true rewards: #0: 21.000
+[2025-09-02 19:52:14,193][03057] Avg episode reward: 60.999, avg true_objective: 21.000
+[2025-09-02 19:52:14,323][03057] Num frames 2200...
+[2025-09-02 19:52:14,463][03057] Num frames 2300...
+[2025-09-02 19:52:14,593][03057] Num frames 2400...
+[2025-09-02 19:52:14,722][03057] Num frames 2500...
+[2025-09-02 19:52:14,849][03057] Num frames 2600...
+[2025-09-02 19:52:14,976][03057] Num frames 2700...
+[2025-09-02 19:52:15,102][03057] Num frames 2800...
+[2025-09-02 19:52:15,226][03057] Num frames 2900...
+[2025-09-02 19:52:15,350][03057] Num frames 3000...
+[2025-09-02 19:52:15,490][03057] Num frames 3100...
+[2025-09-02 19:52:15,614][03057] Num frames 3200...
+[2025-09-02 19:52:15,745][03057] Num frames 3300...
+[2025-09-02 19:52:15,872][03057] Num frames 3400...
+[2025-09-02 19:52:15,998][03057] Num frames 3500...
+[2025-09-02 19:52:16,123][03057] Num frames 3600...
+[2025-09-02 19:52:16,253][03057] Num frames 3700...
+[2025-09-02 19:52:16,379][03057] Num frames 3800...
+[2025-09-02 19:52:16,518][03057] Num frames 3900...
+[2025-09-02 19:52:16,644][03057] Avg episode rewards: #0: 53.279, true rewards: #0: 19.780
+[2025-09-02 19:52:16,645][03057] Avg episode reward: 53.279, avg true_objective: 19.780
+[2025-09-02 19:52:16,702][03057] Num frames 4000...
+[2025-09-02 19:52:16,827][03057] Num frames 4100...
+[2025-09-02 19:52:16,956][03057] Num frames 4200...
+[2025-09-02 19:52:17,081][03057] Num frames 4300...
+[2025-09-02 19:52:17,210][03057] Num frames 4400...
+[2025-09-02 19:52:17,396][03057] Num frames 4500...
+[2025-09-02 19:52:17,576][03057] Num frames 4600...
+[2025-09-02 19:52:17,753][03057] Num frames 4700...
+[2025-09-02 19:52:17,872][03057] Avg episode rewards: #0: 42.116, true rewards: #0: 15.783
+[2025-09-02 19:52:17,873][03057] Avg episode reward: 42.116, avg true_objective: 15.783
+[2025-09-02 19:52:17,984][03057] Num frames 4800...
+[2025-09-02 19:52:18,105][03057] Num frames 4900...
+[2025-09-02 19:52:18,231][03057] Num frames 5000...
+[2025-09-02 19:52:18,357][03057] Num frames 5100...
+[2025-09-02 19:52:18,500][03057] Num frames 5200...
+[2025-09-02 19:52:18,629][03057] Num frames 5300...
+[2025-09-02 19:52:18,740][03057] Avg episode rewards: #0: 34.857, true rewards: #0: 13.358
+[2025-09-02 19:52:18,741][03057] Avg episode reward: 34.857, avg true_objective: 13.358
+[2025-09-02 19:52:18,812][03057] Num frames 5400...
+[2025-09-02 19:52:18,938][03057] Num frames 5500...
+[2025-09-02 19:52:19,064][03057] Num frames 5600...
+[2025-09-02 19:52:19,186][03057] Num frames 5700...
+[2025-09-02 19:52:19,311][03057] Num frames 5800...
+[2025-09-02 19:52:19,441][03057] Num frames 5900...
+[2025-09-02 19:52:19,574][03057] Num frames 6000...
+[2025-09-02 19:52:19,699][03057] Num frames 6100...
+[2025-09-02 19:52:19,826][03057] Num frames 6200...
+[2025-09-02 19:52:19,958][03057] Num frames 6300...
+[2025-09-02 19:52:20,095][03057] Num frames 6400...
+[2025-09-02 19:52:20,228][03057] Num frames 6500...
+[2025-09-02 19:52:20,353][03057] Num frames 6600...
+[2025-09-02 19:52:20,482][03057] Num frames 6700...
+[2025-09-02 19:52:20,620][03057] Num frames 6800...
+[2025-09-02 19:52:20,746][03057] Num frames 6900...
+[2025-09-02 19:52:20,871][03057] Num frames 7000...
+[2025-09-02 19:52:20,998][03057] Num frames 7100...
+[2025-09-02 19:52:21,122][03057] Num frames 7200...
+[2025-09-02 19:52:21,250][03057] Num frames 7300...
+[2025-09-02 19:52:21,376][03057] Num frames 7400...
+[2025-09-02 19:52:21,485][03057] Avg episode rewards: #0: 39.685, true rewards: #0: 14.886
+[2025-09-02 19:52:21,487][03057] Avg episode reward: 39.685, avg true_objective: 14.886
+[2025-09-02 19:52:21,555][03057] Num frames 7500...
+[2025-09-02 19:52:21,686][03057] Num frames 7600...
+[2025-09-02 19:52:21,810][03057] Num frames 7700...
+[2025-09-02 19:52:21,935][03057] Num frames 7800...
+[2025-09-02 19:52:22,059][03057] Num frames 7900...
+[2025-09-02 19:52:22,186][03057] Num frames 8000...
+[2025-09-02 19:52:22,308][03057] Num frames 8100...
+[2025-09-02 19:52:22,433][03057] Num frames 8200...
+[2025-09-02 19:52:22,575][03057] Num frames 8300...
+[2025-09-02 19:52:22,765][03057] Num frames 8400...
+[2025-09-02 19:52:22,941][03057] Num frames 8500...
+[2025-09-02 19:52:23,119][03057] Num frames 8600...
+[2025-09-02 19:52:23,293][03057] Num frames 8700...
+[2025-09-02 19:52:23,473][03057] Num frames 8800...
+[2025-09-02 19:52:23,647][03057] Num frames 8900...
+[2025-09-02 19:52:23,838][03057] Num frames 9000...
+[2025-09-02 19:52:24,015][03057] Num frames 9100...
+[2025-09-02 19:52:24,203][03057] Num frames 9200...
+[2025-09-02 19:52:24,387][03057] Num frames 9300...
+[2025-09-02 19:52:24,583][03057] Num frames 9400...
+[2025-09-02 19:52:24,768][03057] Num frames 9500...
+[2025-09-02 19:52:24,886][03057] Avg episode rewards: #0: 42.071, true rewards: #0: 15.905
+[2025-09-02 19:52:24,887][03057] Avg episode reward: 42.071, avg true_objective: 15.905
+[2025-09-02 19:52:24,963][03057] Num frames 9600...
+[2025-09-02 19:52:25,090][03057] Num frames 9700...
+[2025-09-02 19:52:25,215][03057] Num frames 9800...
+[2025-09-02 19:52:25,346][03057] Num frames 9900...
+[2025-09-02 19:52:25,474][03057] Num frames 10000...
+[2025-09-02 19:52:25,603][03057] Num frames 10100...
+[2025-09-02 19:52:25,728][03057] Num frames 10200...
+[2025-09-02 19:52:25,869][03057] Num frames 10300...
+[2025-09-02 19:52:25,998][03057] Num frames 10400...
+[2025-09-02 19:52:26,125][03057] Num frames 10500...
+[2025-09-02 19:52:26,251][03057] Num frames 10600...
+[2025-09-02 19:52:26,377][03057] Num frames 10700...
+[2025-09-02 19:52:26,502][03057] Num frames 10800...
+[2025-09-02 19:52:26,629][03057] Num frames 10900...
+[2025-09-02 19:52:26,756][03057] Num frames 11000...
+[2025-09-02 19:52:26,871][03057] Avg episode rewards: #0: 41.352, true rewards: #0: 15.781
+[2025-09-02 19:52:26,872][03057] Avg episode reward: 41.352, avg true_objective: 15.781
+[2025-09-02 19:52:26,941][03057] Num frames 11100...
+[2025-09-02 19:52:27,063][03057] Num frames 11200...
+[2025-09-02 19:52:27,189][03057] Num frames 11300...
+[2025-09-02 19:52:27,312][03057] Num frames 11400...
+[2025-09-02 19:52:27,438][03057] Num frames 11500...
+[2025-09-02 19:52:27,563][03057] Num frames 11600...
+[2025-09-02 19:52:27,688][03057] Num frames 11700...
+[2025-09-02 19:52:27,814][03057] Num frames 11800...
+[2025-09-02 19:52:27,957][03057] Num frames 11900...
+[2025-09-02 19:52:28,084][03057] Num frames 12000...
+[2025-09-02 19:52:28,208][03057] Num frames 12100...
+[2025-09-02 19:52:28,330][03057] Num frames 12200...
+[2025-09-02 19:52:28,398][03057] Avg episode rewards: #0: 39.637, true rewards: #0: 15.263
+[2025-09-02 19:52:28,399][03057] Avg episode reward: 39.637, avg true_objective: 15.263
+[2025-09-02 19:52:28,513][03057] Num frames 12300...
+[2025-09-02 19:52:28,636][03057] Num frames 12400...
+[2025-09-02 19:52:28,772][03057] Num frames 12500...
+[2025-09-02 19:52:28,897][03057] Num frames 12600...
+[2025-09-02 19:52:29,036][03057] Num frames 12700...
+[2025-09-02 19:52:29,163][03057] Num frames 12800...
+[2025-09-02 19:52:29,409][03057] Num frames 12900...
+[2025-09-02 19:52:29,540][03057] Num frames 13000...
+[2025-09-02 19:52:29,670][03057] Num frames 13100...
+[2025-09-02 19:52:29,801][03057] Num frames 13200...
+[2025-09-02 19:52:29,928][03057] Num frames 13300...
+[2025-09-02 19:52:30,210][03057] Num frames 13400...
+[2025-09-02 19:52:30,338][03057] Num frames 13500...
+[2025-09-02 19:52:30,463][03057] Num frames 13600...
+[2025-09-02 19:52:30,589][03057] Num frames 13700...
+[2025-09-02 19:52:30,715][03057] Num frames 13800...
+[2025-09-02 19:52:30,978][03057] Num frames 13900...
+[2025-09-02 19:52:31,079][03057] Avg episode rewards: #0: 40.583, true rewards: #0: 15.472
+[2025-09-02 19:52:31,080][03057] Avg episode reward: 40.583, avg true_objective: 15.472
+[2025-09-02 19:52:31,175][03057] Num frames 14000...
+[2025-09-02 19:52:31,302][03057] Num frames 14100...
+[2025-09-02 19:52:31,428][03057] Num frames 14200...
+[2025-09-02 19:52:31,556][03057] Num frames 14300...
+[2025-09-02 19:52:31,686][03057] Num frames 14400...
+[2025-09-02 19:52:31,815][03057] Num frames 14500...
+[2025-09-02 19:52:31,950][03057] Num frames 14600...
+[2025-09-02 19:52:32,091][03057] Num frames 14700...
+[2025-09-02 19:52:32,219][03057] Num frames 14800...
+[2025-09-02 19:52:32,344][03057] Num frames 14900...
+[2025-09-02 19:52:32,472][03057] Num frames 15000...
+[2025-09-02 19:52:32,601][03057] Num frames 15100...
+[2025-09-02 19:52:32,730][03057] Num frames 15200...
+[2025-09-02 19:52:32,859][03057] Num frames 15300...
+[2025-09-02 19:52:32,986][03057] Num frames 15400...
+[2025-09-02 19:52:33,124][03057] Num frames 15500...
+[2025-09-02 19:52:33,256][03057] Avg episode rewards: #0: 41.256, true rewards: #0: 15.557
+[2025-09-02 19:52:33,257][03057] Avg episode reward: 41.256, avg true_objective: 15.557
+[2025-09-02 19:54:16,331][03057] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
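Each evaluation run begins the same way: the saved experiment config is reloaded and the command-line arguments are layered on top, with existing keys overridden and unknown keys added with a warning. A minimal sketch of that merge pattern, using only keys that appear in the log (illustrative Python, not sample-factory's actual implementation):

import json

def merge_eval_config(config_path, cli_args):
    # Load the saved experiment config, then apply CLI overrides on top.
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_args.items():
        if key in cfg:
            print(f"Overriding arg '{key}' with value {value!r} passed from command line")
        else:
            print(f"Adding new argument '{key}'={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg

# The overrides logged for the runs above:
cfg = merge_eval_config(
    "/content/train_dir/default_experiment/config.json",
    {"num_workers": 1, "no_render": True, "save_video": True,
     "max_num_episodes": 10, "push_to_hub": True,
     "hf_repository": "turbo-maikol/rl_course_vizdoom_health_gathering_supreme"},
)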
+[2025-09-02 19:55:29,757][03057] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-09-02 19:55:29,758][03057] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-09-02 19:55:29,759][03057] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-09-02 19:55:29,760][03057] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-09-02 19:55:29,760][03057] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-09-02 19:55:29,761][03057] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-09-02 19:55:29,762][03057] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-09-02 19:55:29,763][03057] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-09-02 19:55:29,764][03057] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-09-02 19:55:29,765][03057] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-09-02 19:55:29,766][03057] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-09-02 19:55:29,766][03057] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-09-02 19:55:29,767][03057] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-09-02 19:55:29,768][03057] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-09-02 19:55:29,769][03057] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-09-02 19:55:29,798][03057] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-02 19:55:29,799][03057] RunningMeanStd input shape: (1,)
+[2025-09-02 19:55:29,809][03057] ConvEncoder: input_channels=3
+[2025-09-02 19:55:29,840][03057] Conv encoder output size: 512
+[2025-09-02 19:55:29,841][03057] Policy head output size: 512
+[2025-09-02 19:55:29,857][03057] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000459_30081024.pth...
+[2025-09-02 19:55:30,293][03057] Num frames 100...
+[2025-09-02 19:55:30,415][03057] Num frames 200...
+[2025-09-02 19:55:30,539][03057] Num frames 300...
+[2025-09-02 19:55:30,665][03057] Num frames 400...
+[2025-09-02 19:55:30,795][03057] Num frames 500...
+[2025-09-02 19:55:30,925][03057] Num frames 600...
+[2025-09-02 19:55:31,072][03057] Num frames 700...
+[2025-09-02 19:55:31,203][03057] Num frames 800...
+[2025-09-02 19:55:31,336][03057] Num frames 900...
+[2025-09-02 19:55:31,466][03057] Num frames 1000...
+[2025-09-02 19:55:31,579][03057] Avg episode rewards: #0: 24.440, true rewards: #0: 10.440
+[2025-09-02 19:55:31,580][03057] Avg episode reward: 24.440, avg true_objective: 10.440
+[2025-09-02 19:55:31,652][03057] Num frames 1100...
+[2025-09-02 19:55:31,780][03057] Num frames 1200...
+[2025-09-02 19:55:31,909][03057] Num frames 1300...
+[2025-09-02 19:55:32,051][03057] Num frames 1400...
+[2025-09-02 19:55:32,185][03057] Num frames 1500...
+[2025-09-02 19:55:32,313][03057] Num frames 1600...
+[2025-09-02 19:55:32,442][03057] Num frames 1700...
+[2025-09-02 19:55:32,573][03057] Num frames 1800...
+[2025-09-02 19:55:32,701][03057] Num frames 1900...
+[2025-09-02 19:55:32,826][03057] Num frames 2000...
+[2025-09-02 19:55:32,955][03057] Num frames 2100...
+[2025-09-02 19:55:33,093][03057] Num frames 2200...
+[2025-09-02 19:55:33,221][03057] Num frames 2300...
+[2025-09-02 19:55:33,347][03057] Num frames 2400...
+[2025-09-02 19:55:33,475][03057] Num frames 2500...
+[2025-09-02 19:55:33,599][03057] Num frames 2600...
+[2025-09-02 19:55:33,712][03057] Avg episode rewards: #0: 29.220, true rewards: #0: 13.220
+[2025-09-02 19:55:33,712][03057] Avg episode reward: 29.220, avg true_objective: 13.220
+[2025-09-02 19:55:33,783][03057] Num frames 2700...
+[2025-09-02 19:55:33,910][03057] Num frames 2800...
+[2025-09-02 19:55:34,039][03057] Num frames 2900...
+[2025-09-02 19:55:34,184][03057] Num frames 3000...
+[2025-09-02 19:55:34,320][03057] Num frames 3100...
+[2025-09-02 19:55:34,451][03057] Num frames 3200...
+[2025-09-02 19:55:34,579][03057] Num frames 3300...
+[2025-09-02 19:55:34,708][03057] Num frames 3400...
+[2025-09-02 19:55:34,837][03057] Num frames 3500...
+[2025-09-02 19:55:34,963][03057] Num frames 3600...
+[2025-09-02 19:55:35,093][03057] Num frames 3700...
+[2025-09-02 19:55:35,228][03057] Num frames 3800...
+[2025-09-02 19:55:35,363][03057] Num frames 3900...
+[2025-09-02 19:55:35,489][03057] Num frames 4000...
+[2025-09-02 19:55:35,617][03057] Num frames 4100...
+[2025-09-02 19:55:35,747][03057] Num frames 4200...
+[2025-09-02 19:55:35,876][03057] Num frames 4300...
+[2025-09-02 19:55:36,006][03057] Num frames 4400...
+[2025-09-02 19:55:36,144][03057] Num frames 4500...
+[2025-09-02 19:55:36,275][03057] Num frames 4600...
+[2025-09-02 19:55:36,403][03057] Num frames 4700...
+[2025-09-02 19:55:36,503][03057] Avg episode rewards: #0: 39.116, true rewards: #0: 15.783
+[2025-09-02 19:55:36,504][03057] Avg episode reward: 39.116, avg true_objective: 15.783
+[2025-09-02 19:55:36,585][03057] Num frames 4800...
+[2025-09-02 19:55:36,712][03057] Num frames 4900...
+[2025-09-02 19:55:36,838][03057] Num frames 5000...
+[2025-09-02 19:55:36,962][03057] Num frames 5100...
+[2025-09-02 19:55:37,085][03057] Num frames 5200...
+[2025-09-02 19:55:37,227][03057] Num frames 5300...
+[2025-09-02 19:55:37,397][03057] Avg episode rewards: #0: 32.972, true rewards: #0: 13.472
+[2025-09-02 19:55:37,398][03057] Avg episode reward: 32.972, avg true_objective: 13.472
+[2025-09-02 19:55:37,413][03057] Num frames 5400...
+[2025-09-02 19:55:37,538][03057] Num frames 5500...
+[2025-09-02 19:55:37,663][03057] Num frames 5600...
+[2025-09-02 19:55:37,790][03057] Num frames 5700...
+[2025-09-02 19:55:37,918][03057] Num frames 5800...
+[2025-09-02 19:55:38,043][03057] Num frames 5900...
+[2025-09-02 19:55:38,170][03057] Num frames 6000...
+[2025-09-02 19:55:38,309][03057] Num frames 6100...
+[2025-09-02 19:55:38,436][03057] Num frames 6200...
+[2025-09-02 19:55:38,563][03057] Num frames 6300...
+[2025-09-02 19:55:38,689][03057] Num frames 6400...
+[2025-09-02 19:55:38,816][03057] Num frames 6500...
+[2025-09-02 19:55:38,946][03057] Num frames 6600...
+[2025-09-02 19:55:39,071][03057] Num frames 6700...
+[2025-09-02 19:55:39,258][03057] Num frames 6800...
+[2025-09-02 19:55:39,374][03057] Avg episode rewards: #0: 33.658, true rewards: #0: 13.658
+[2025-09-02 19:55:39,375][03057] Avg episode reward: 33.658, avg true_objective: 13.658
+[2025-09-02 19:55:39,505][03057] Num frames 6900...
+[2025-09-02 19:55:39,677][03057] Num frames 7000...
+[2025-09-02 19:55:39,856][03057] Num frames 7100...
+[2025-09-02 19:55:40,037][03057] Num frames 7200...
+[2025-09-02 19:55:40,212][03057] Num frames 7300...
+[2025-09-02 19:55:40,395][03057] Num frames 7400...
+[2025-09-02 19:55:40,573][03057] Num frames 7500...
+[2025-09-02 19:55:40,756][03057] Num frames 7600...
+[2025-09-02 19:55:40,938][03057] Num frames 7700...
+[2025-09-02 19:55:41,126][03057] Num frames 7800...
+[2025-09-02 19:55:41,316][03057] Num frames 7900...
+[2025-09-02 19:55:41,475][03057] Num frames 8000...
+[2025-09-02 19:55:41,602][03057] Num frames 8100...
+[2025-09-02 19:55:41,730][03057] Num frames 8200...
+[2025-09-02 19:55:41,861][03057] Num frames 8300...
+[2025-09-02 19:55:41,992][03057] Num frames 8400...
+[2025-09-02 19:55:42,119][03057] Num frames 8500...
+[2025-09-02 19:55:42,246][03057] Num frames 8600...
+[2025-09-02 19:55:42,378][03057] Num frames 8700...
+[2025-09-02 19:55:42,443][03057] Avg episode rewards: #0: 36.013, true rewards: #0: 14.513
+[2025-09-02 19:55:42,445][03057] Avg episode reward: 36.013, avg true_objective: 14.513
+[2025-09-02 19:55:42,564][03057] Num frames 8800...
+[2025-09-02 19:55:42,690][03057] Num frames 8900...
+[2025-09-02 19:55:42,823][03057] Num frames 9000...
+[2025-09-02 19:55:42,953][03057] Num frames 9100...
+[2025-09-02 19:55:43,078][03057] Num frames 9200...
+[2025-09-02 19:55:43,209][03057] Num frames 9300...
+[2025-09-02 19:55:43,337][03057] Num frames 9400...
+[2025-09-02 19:55:43,496][03057] Avg episode rewards: #0: 32.965, true rewards: #0: 13.537
+[2025-09-02 19:55:43,497][03057] Avg episode reward: 32.965, avg true_objective: 13.537
+[2025-09-02 19:55:43,529][03057] Num frames 9500...
+[2025-09-02 19:55:43,659][03057] Num frames 9600...
+[2025-09-02 19:55:43,787][03057] Num frames 9700...
+[2025-09-02 19:55:43,912][03057] Num frames 9800...
+[2025-09-02 19:55:44,040][03057] Num frames 9900...
+[2025-09-02 19:55:44,207][03057] Avg episode rewards: #0: 29.985, true rewards: #0: 12.485
+[2025-09-02 19:55:44,208][03057] Avg episode reward: 29.985, avg true_objective: 12.485
+[2025-09-02 19:55:44,229][03057] Num frames 10000...
+[2025-09-02 19:55:44,358][03057] Num frames 10100...
+[2025-09-02 19:55:44,499][03057] Num frames 10200...
+[2025-09-02 19:55:44,626][03057] Num frames 10300...
+[2025-09-02 19:55:44,751][03057] Num frames 10400...
+[2025-09-02 19:55:44,878][03057] Num frames 10500...
+[2025-09-02 19:55:44,974][03057] Avg episode rewards: #0: 27.813, true rewards: #0: 11.702
+[2025-09-02 19:55:44,975][03057] Avg episode reward: 27.813, avg true_objective: 11.702
+[2025-09-02 19:55:45,060][03057] Num frames 10600...
+[2025-09-02 19:55:45,189][03057] Num frames 10700...
+[2025-09-02 19:55:45,317][03057] Num frames 10800...
+[2025-09-02 19:55:45,441][03057] Num frames 10900...
+[2025-09-02 19:55:45,582][03057] Num frames 11000...
+[2025-09-02 19:55:45,712][03057] Num frames 11100...
+[2025-09-02 19:55:45,838][03057] Num frames 11200...
+[2025-09-02 19:55:45,929][03057] Avg episode rewards: #0: 26.527, true rewards: #0: 11.227
+[2025-09-02 19:55:45,930][03057] Avg episode reward: 26.527, avg true_objective: 11.227
+[2025-09-02 19:56:58,996][03057] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
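A closing observation the three runs make concrete: the same checkpoint, evaluated three times for 10 episodes each, lands on noticeably different averages (final true-reward means of 14.073, 15.557, and 11.227), so a 10-episode evaluation should be read as a noisy estimate of policy quality. A quick summary, with the numbers taken from the log:

# Final 10-episode true-reward means of the three evaluation runs above.
final_avg_true = [14.073, 15.557, 11.227]
mean = sum(final_avg_true) / len(final_avg_true)
spread = max(final_avg_true) - min(final_avg_true)
print(f"mean {mean:.2f}, spread {spread:.2f}")  # mean 13.62, spread 4.33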