diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1236 @@ +[2025-08-18 16:45:17,470][02710] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-08-18 16:45:17,472][02710] Rollout worker 0 uses device cpu +[2025-08-18 16:45:17,473][02710] Rollout worker 1 uses device cpu +[2025-08-18 16:45:17,475][02710] Rollout worker 2 uses device cpu +[2025-08-18 16:45:17,475][02710] Rollout worker 3 uses device cpu +[2025-08-18 16:45:17,476][02710] Rollout worker 4 uses device cpu +[2025-08-18 16:45:17,477][02710] Rollout worker 5 uses device cpu +[2025-08-18 16:45:17,478][02710] Rollout worker 6 uses device cpu +[2025-08-18 16:45:17,479][02710] Rollout worker 7 uses device cpu +[2025-08-18 16:45:17,646][02710] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:17,647][02710] InferenceWorker_p0-w0: min num requests: 2 +[2025-08-18 16:45:17,677][02710] Starting all processes... +[2025-08-18 16:45:17,678][02710] Starting process learner_proc0 +[2025-08-18 16:45:17,729][02710] Starting all processes... +[2025-08-18 16:45:17,738][02710] Starting process inference_proc0-0 +[2025-08-18 16:45:17,741][02710] Starting process rollout_proc0 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc1 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc2 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc3 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc4 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc5 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc6 +[2025-08-18 16:45:17,751][02710] Starting process rollout_proc7 +[2025-08-18 16:45:34,168][02847] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:34,179][02847] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-18 16:45:34,276][02847] Num visible devices: 1 +[2025-08-18 16:45:34,417][02834] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:34,424][02834] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-18 16:45:34,514][02834] Num visible devices: 1 +[2025-08-18 16:45:34,530][02834] Starting seed is not provided +[2025-08-18 16:45:34,531][02834] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-18 16:45:34,532][02834] Initializing actor-critic model on device cuda:0 +[2025-08-18 16:45:34,533][02834] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 16:45:34,536][02834] RunningMeanStd input shape: (1,) +[2025-08-18 16:45:34,604][02834] ConvEncoder: input_channels=3 +[2025-08-18 16:45:35,470][02849] Worker 1 uses CPU cores [1] +[2025-08-18 16:45:35,692][02852] Worker 5 uses CPU cores [1] +[2025-08-18 16:45:35,692][02854] Worker 7 uses CPU cores [1] +[2025-08-18 16:45:35,700][02851] Worker 3 uses CPU cores [1] +[2025-08-18 16:45:35,724][02850] Worker 2 uses CPU cores [0] +[2025-08-18 16:45:35,789][02848] Worker 0 uses CPU cores [0] +[2025-08-18 16:45:35,851][02855] Worker 6 uses CPU cores [0] +[2025-08-18 16:45:35,856][02834] Conv encoder output size: 512 +[2025-08-18 16:45:35,857][02834] Policy head output size: 512 +[2025-08-18 16:45:35,937][02834] Created Actor Critic model with architecture: +[2025-08-18 16:45:35,938][02834] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + 
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2025-08-18 16:45:35,945][02853] Worker 4 uses CPU cores [0]
+[2025-08-18 16:45:36,224][02834] Using optimizer
+[2025-08-18 16:45:37,641][02710] Heartbeat connected on Batcher_0
+[2025-08-18 16:45:37,647][02710] Heartbeat connected on InferenceWorker_p0-w0
+[2025-08-18 16:45:37,657][02710] Heartbeat connected on RolloutWorker_w0
+[2025-08-18 16:45:37,658][02710] Heartbeat connected on RolloutWorker_w1
+[2025-08-18 16:45:37,660][02710] Heartbeat connected on RolloutWorker_w2
+[2025-08-18 16:45:37,666][02710] Heartbeat connected on RolloutWorker_w4
+[2025-08-18 16:45:37,668][02710] Heartbeat connected on RolloutWorker_w3
+[2025-08-18 16:45:37,669][02710] Heartbeat connected on RolloutWorker_w5
+[2025-08-18 16:45:37,673][02710] Heartbeat connected on RolloutWorker_w6
+[2025-08-18 16:45:37,677][02710] Heartbeat connected on RolloutWorker_w7
+[2025-08-18 16:45:40,966][02834] No checkpoints found
+[2025-08-18 16:45:40,967][02834] Did not load from checkpoint, starting from scratch!
+[2025-08-18 16:45:40,967][02834] Initialized policy 0 weights for model version 0
+[2025-08-18 16:45:40,970][02834] LearnerWorker_p0 finished initialization!
+[2025-08-18 16:45:40,970][02834] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-08-18 16:45:40,982][02710] Heartbeat connected on LearnerWorker_p0
+[2025-08-18 16:45:41,117][02847] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-18 16:45:41,118][02847] RunningMeanStd input shape: (1,)
+[2025-08-18 16:45:41,130][02847] ConvEncoder: input_channels=3
+[2025-08-18 16:45:41,232][02847] Conv encoder output size: 512
+[2025-08-18 16:45:41,232][02847] Policy head output size: 512
+[2025-08-18 16:45:41,270][02710] Inference worker 0-0 is ready!
+[2025-08-18 16:45:41,271][02710] All inference workers are ready! Signal rollout workers to start!
+[2025-08-18 16:45:41,521][02849] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,524][02855] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,534][02848] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,535][02850] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,550][02851] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,557][02852] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,569][02854] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:41,565][02853] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 16:45:42,421][02710] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-18 16:45:42,761][02850] Decorrelating experience for 0 frames... +[2025-08-18 16:45:42,761][02851] Decorrelating experience for 0 frames... +[2025-08-18 16:45:42,763][02853] Decorrelating experience for 0 frames... +[2025-08-18 16:45:43,139][02853] Decorrelating experience for 32 frames... +[2025-08-18 16:45:43,145][02851] Decorrelating experience for 32 frames... +[2025-08-18 16:45:43,701][02851] Decorrelating experience for 64 frames... +[2025-08-18 16:45:43,974][02850] Decorrelating experience for 32 frames... +[2025-08-18 16:45:44,137][02851] Decorrelating experience for 96 frames... +[2025-08-18 16:45:44,186][02853] Decorrelating experience for 64 frames... +[2025-08-18 16:45:44,858][02850] Decorrelating experience for 64 frames... +[2025-08-18 16:45:44,918][02853] Decorrelating experience for 96 frames... +[2025-08-18 16:45:45,337][02850] Decorrelating experience for 96 frames... +[2025-08-18 16:45:47,423][02710] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 336.7. Samples: 1684. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-18 16:45:47,433][02710] Avg episode reward: [(0, '2.807')] +[2025-08-18 16:45:48,920][02834] Signal inference workers to stop experience collection... +[2025-08-18 16:45:48,980][02847] InferenceWorker_p0-w0: stopping experience collection +[2025-08-18 16:45:49,912][02834] Signal inference workers to resume experience collection... +[2025-08-18 16:45:49,913][02847] InferenceWorker_p0-w0: resuming experience collection +[2025-08-18 16:45:52,421][02710] Fps is (10 sec: 1638.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 16384. Throughput: 0: 234.8. Samples: 2348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:45:52,422][02710] Avg episode reward: [(0, '3.675')] +[2025-08-18 16:45:57,421][02710] Fps is (10 sec: 3277.2, 60 sec: 2184.5, 300 sec: 2184.5). Total num frames: 32768. Throughput: 0: 541.5. Samples: 8122. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:45:57,424][02710] Avg episode reward: [(0, '3.983')] +[2025-08-18 16:45:58,451][02847] Updated weights for policy 0, policy_version 10 (0.0014) +[2025-08-18 16:46:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 49152. Throughput: 0: 640.4. Samples: 12808. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:02,424][02710] Avg episode reward: [(0, '4.358')] +[2025-08-18 16:46:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 2621.5, 300 sec: 2621.5). Total num frames: 65536. Throughput: 0: 611.1. Samples: 15278. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:07,426][02710] Avg episode reward: [(0, '4.387')] +[2025-08-18 16:46:10,606][02847] Updated weights for policy 0, policy_version 20 (0.0012) +[2025-08-18 16:46:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 86016. Throughput: 0: 713.8. Samples: 21414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:12,426][02710] Avg episode reward: [(0, '4.366')] +[2025-08-18 16:46:17,421][02710] Fps is (10 sec: 3686.4, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 749.0. Samples: 26214. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:17,607][02710] Avg episode reward: [(0, '4.474')] +[2025-08-18 16:46:17,611][02834] Saving new best policy, reward=4.474! +[2025-08-18 16:46:22,010][02847] Updated weights for policy 0, policy_version 30 (0.0012) +[2025-08-18 16:46:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 122880. Throughput: 0: 728.3. Samples: 29130. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:22,428][02710] Avg episode reward: [(0, '4.356')] +[2025-08-18 16:46:27,423][02710] Fps is (10 sec: 4095.4, 60 sec: 3185.7, 300 sec: 3185.7). Total num frames: 143360. Throughput: 0: 780.8. Samples: 35138. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:27,424][02710] Avg episode reward: [(0, '4.364')] +[2025-08-18 16:46:32,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3113.0, 300 sec: 3113.0). Total num frames: 155648. Throughput: 0: 852.0. Samples: 40024. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:32,425][02710] Avg episode reward: [(0, '4.474')] +[2025-08-18 16:46:33,446][02847] Updated weights for policy 0, policy_version 40 (0.0013) +[2025-08-18 16:46:37,421][02710] Fps is (10 sec: 3277.3, 60 sec: 3202.3, 300 sec: 3202.3). Total num frames: 176128. Throughput: 0: 904.5. Samples: 43052. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:37,423][02710] Avg episode reward: [(0, '4.520')] +[2025-08-18 16:46:37,428][02834] Saving new best policy, reward=4.520! +[2025-08-18 16:46:42,422][02710] Fps is (10 sec: 4095.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 196608. Throughput: 0: 902.2. Samples: 48720. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:46:42,423][02710] Avg episode reward: [(0, '4.406')] +[2025-08-18 16:46:44,910][02847] Updated weights for policy 0, policy_version 50 (0.0012) +[2025-08-18 16:46:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3276.8). Total num frames: 212992. Throughput: 0: 910.2. Samples: 53768. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:46:47,423][02710] Avg episode reward: [(0, '4.282')] +[2025-08-18 16:46:52,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3335.3). Total num frames: 233472. Throughput: 0: 922.8. Samples: 56804. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:52,423][02710] Avg episode reward: [(0, '4.279')] +[2025-08-18 16:46:55,698][02847] Updated weights for policy 0, policy_version 60 (0.0015) +[2025-08-18 16:46:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3331.4). Total num frames: 249856. Throughput: 0: 901.1. Samples: 61962. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:46:57,423][02710] Avg episode reward: [(0, '4.481')] +[2025-08-18 16:47:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3328.0). Total num frames: 266240. 
Throughput: 0: 919.1. Samples: 67574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:47:02,426][02710] Avg episode reward: [(0, '4.473')] +[2025-08-18 16:47:06,567][02847] Updated weights for policy 0, policy_version 70 (0.0015) +[2025-08-18 16:47:07,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3686.3, 300 sec: 3373.1). Total num frames: 286720. Throughput: 0: 921.9. Samples: 70616. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:07,424][02710] Avg episode reward: [(0, '4.403')] +[2025-08-18 16:47:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3367.8). Total num frames: 303104. Throughput: 0: 894.4. Samples: 75384. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:47:12,425][02710] Avg episode reward: [(0, '4.476')] +[2025-08-18 16:47:12,431][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth... +[2025-08-18 16:47:17,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3406.1). Total num frames: 323584. Throughput: 0: 918.7. Samples: 81364. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:17,429][02710] Avg episode reward: [(0, '4.339')] +[2025-08-18 16:47:18,067][02847] Updated weights for policy 0, policy_version 80 (0.0016) +[2025-08-18 16:47:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 918.2. Samples: 84372. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:22,423][02710] Avg episode reward: [(0, '4.223')] +[2025-08-18 16:47:27,422][02710] Fps is (10 sec: 2867.1, 60 sec: 3481.7, 300 sec: 3354.8). Total num frames: 352256. Throughput: 0: 887.9. Samples: 88676. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:47:27,425][02710] Avg episode reward: [(0, '4.369')] +[2025-08-18 16:47:31,789][02847] Updated weights for policy 0, policy_version 90 (0.0014) +[2025-08-18 16:47:32,421][02710] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3351.3). Total num frames: 368640. Throughput: 0: 870.5. Samples: 92942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:47:32,423][02710] Avg episode reward: [(0, '4.571')] +[2025-08-18 16:47:32,433][02834] Saving new best policy, reward=4.571! +[2025-08-18 16:47:37,421][02710] Fps is (10 sec: 3276.9, 60 sec: 3481.6, 300 sec: 3348.0). Total num frames: 385024. Throughput: 0: 857.3. Samples: 95384. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:47:37,423][02710] Avg episode reward: [(0, '4.595')] +[2025-08-18 16:47:37,425][02834] Saving new best policy, reward=4.595! +[2025-08-18 16:47:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 860.6. Samples: 100690. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:42,423][02710] Avg episode reward: [(0, '4.709')] +[2025-08-18 16:47:42,431][02834] Saving new best policy, reward=4.709! +[2025-08-18 16:47:43,242][02847] Updated weights for policy 0, policy_version 100 (0.0013) +[2025-08-18 16:47:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3375.1). Total num frames: 421888. Throughput: 0: 868.0. Samples: 106636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:47,423][02710] Avg episode reward: [(0, '4.724')] +[2025-08-18 16:47:47,452][02834] Saving new best policy, reward=4.724! +[2025-08-18 16:47:52,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3371.3). Total num frames: 438272. Throughput: 0: 842.2. Samples: 108514. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:52,425][02710] Avg episode reward: [(0, '4.591')] +[2025-08-18 16:47:54,745][02847] Updated weights for policy 0, policy_version 110 (0.0012) +[2025-08-18 16:47:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3398.2). Total num frames: 458752. Throughput: 0: 866.7. Samples: 114386. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:47:57,430][02710] Avg episode reward: [(0, '4.709')] +[2025-08-18 16:48:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3393.8). Total num frames: 475136. Throughput: 0: 854.9. Samples: 119834. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:02,423][02710] Avg episode reward: [(0, '4.757')] +[2025-08-18 16:48:02,429][02834] Saving new best policy, reward=4.757! +[2025-08-18 16:48:06,314][02847] Updated weights for policy 0, policy_version 120 (0.0012) +[2025-08-18 16:48:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3418.0). Total num frames: 495616. Throughput: 0: 837.6. Samples: 122064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:07,423][02710] Avg episode reward: [(0, '4.793')] +[2025-08-18 16:48:07,426][02834] Saving new best policy, reward=4.793! +[2025-08-18 16:48:12,429][02710] Fps is (10 sec: 3683.4, 60 sec: 3481.1, 300 sec: 3413.2). Total num frames: 512000. Throughput: 0: 874.1. Samples: 128016. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:12,431][02710] Avg episode reward: [(0, '4.621')] +[2025-08-18 16:48:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3408.9). Total num frames: 528384. Throughput: 0: 888.4. Samples: 132922. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:17,423][02710] Avg episode reward: [(0, '4.410')] +[2025-08-18 16:48:17,796][02847] Updated weights for policy 0, policy_version 130 (0.0013) +[2025-08-18 16:48:22,422][02710] Fps is (10 sec: 3689.3, 60 sec: 3481.6, 300 sec: 3430.4). Total num frames: 548864. Throughput: 0: 896.6. Samples: 135730. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:22,426][02710] Avg episode reward: [(0, '4.345')] +[2025-08-18 16:48:27,424][02710] Fps is (10 sec: 4094.8, 60 sec: 3618.0, 300 sec: 3450.5). Total num frames: 569344. Throughput: 0: 913.0. Samples: 141776. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:27,426][02710] Avg episode reward: [(0, '4.459')] +[2025-08-18 16:48:28,106][02847] Updated weights for policy 0, policy_version 140 (0.0013) +[2025-08-18 16:48:32,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3445.5). Total num frames: 585728. Throughput: 0: 886.6. Samples: 146534. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:32,423][02710] Avg episode reward: [(0, '4.510')] +[2025-08-18 16:48:37,421][02710] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3440.6). Total num frames: 602112. Throughput: 0: 905.5. Samples: 149262. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:37,426][02710] Avg episode reward: [(0, '4.573')] +[2025-08-18 16:48:39,959][02847] Updated weights for policy 0, policy_version 150 (0.0013) +[2025-08-18 16:48:42,423][02710] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3458.8). Total num frames: 622592. Throughput: 0: 906.0. Samples: 155156. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:42,424][02710] Avg episode reward: [(0, '4.590')] +[2025-08-18 16:48:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3453.9). 
Total num frames: 638976. Throughput: 0: 889.0. Samples: 159838. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:47,426][02710] Avg episode reward: [(0, '4.541')] +[2025-08-18 16:48:51,438][02847] Updated weights for policy 0, policy_version 160 (0.0013) +[2025-08-18 16:48:52,421][02710] Fps is (10 sec: 3277.4, 60 sec: 3618.1, 300 sec: 3449.3). Total num frames: 655360. Throughput: 0: 906.7. Samples: 162866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:48:52,425][02710] Avg episode reward: [(0, '4.667')] +[2025-08-18 16:48:57,422][02710] Fps is (10 sec: 3276.7, 60 sec: 3549.9, 300 sec: 3444.8). Total num frames: 671744. Throughput: 0: 893.8. Samples: 168232. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:48:57,427][02710] Avg episode reward: [(0, '4.904')] +[2025-08-18 16:48:57,429][02834] Saving new best policy, reward=4.904! +[2025-08-18 16:49:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3461.1). Total num frames: 692224. Throughput: 0: 900.0. Samples: 173422. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:02,423][02710] Avg episode reward: [(0, '4.826')] +[2025-08-18 16:49:03,084][02847] Updated weights for policy 0, policy_version 170 (0.0014) +[2025-08-18 16:49:07,421][02710] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3476.6). Total num frames: 712704. Throughput: 0: 904.6. Samples: 176438. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:07,423][02710] Avg episode reward: [(0, '4.750')] +[2025-08-18 16:49:12,422][02710] Fps is (10 sec: 3276.5, 60 sec: 3550.3, 300 sec: 3452.3). Total num frames: 724992. Throughput: 0: 881.7. Samples: 181450. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:12,427][02710] Avg episode reward: [(0, '4.574')] +[2025-08-18 16:49:12,435][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth... +[2025-08-18 16:49:14,619][02847] Updated weights for policy 0, policy_version 180 (0.0013) +[2025-08-18 16:49:17,422][02710] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3467.3). Total num frames: 745472. Throughput: 0: 905.3. Samples: 187274. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:17,426][02710] Avg episode reward: [(0, '4.477')] +[2025-08-18 16:49:22,421][02710] Fps is (10 sec: 4096.3, 60 sec: 3618.1, 300 sec: 3481.6). Total num frames: 765952. Throughput: 0: 911.3. Samples: 190272. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:22,425][02710] Avg episode reward: [(0, '4.415')] +[2025-08-18 16:49:26,080][02847] Updated weights for policy 0, policy_version 190 (0.0013) +[2025-08-18 16:49:27,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3549.9, 300 sec: 3477.0). Total num frames: 782336. Throughput: 0: 885.2. Samples: 194990. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:27,426][02710] Avg episode reward: [(0, '4.695')] +[2025-08-18 16:49:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3490.5). Total num frames: 802816. Throughput: 0: 914.8. Samples: 201004. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:32,423][02710] Avg episode reward: [(0, '4.864')] +[2025-08-18 16:49:36,760][02847] Updated weights for policy 0, policy_version 200 (0.0012) +[2025-08-18 16:49:37,422][02710] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3485.9). Total num frames: 819200. Throughput: 0: 913.5. Samples: 203972. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:37,424][02710] Avg episode reward: [(0, '4.826')] +[2025-08-18 16:49:42,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3481.6). Total num frames: 835584. Throughput: 0: 902.1. Samples: 208826. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:42,424][02710] Avg episode reward: [(0, '4.796')] +[2025-08-18 16:49:47,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3494.1). Total num frames: 856064. Throughput: 0: 918.3. Samples: 214744. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:47,424][02710] Avg episode reward: [(0, '4.792')] +[2025-08-18 16:49:47,779][02847] Updated weights for policy 0, policy_version 210 (0.0017) +[2025-08-18 16:49:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3489.8). Total num frames: 872448. Throughput: 0: 904.8. Samples: 217154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:49:52,424][02710] Avg episode reward: [(0, '4.772')] +[2025-08-18 16:49:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3501.7). Total num frames: 892928. Throughput: 0: 911.9. Samples: 222484. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:49:57,423][02710] Avg episode reward: [(0, '4.853')] +[2025-08-18 16:49:59,274][02847] Updated weights for policy 0, policy_version 220 (0.0012) +[2025-08-18 16:50:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3497.4). Total num frames: 909312. Throughput: 0: 912.0. Samples: 228314. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:02,426][02710] Avg episode reward: [(0, '4.972')] +[2025-08-18 16:50:02,445][02834] Saving new best policy, reward=4.972! +[2025-08-18 16:50:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3493.2). Total num frames: 925696. Throughput: 0: 887.0. Samples: 230186. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:07,423][02710] Avg episode reward: [(0, '4.851')] +[2025-08-18 16:50:10,812][02847] Updated weights for policy 0, policy_version 230 (0.0016) +[2025-08-18 16:50:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3504.4). Total num frames: 946176. Throughput: 0: 915.1. Samples: 236166. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:12,423][02710] Avg episode reward: [(0, '4.885')] +[2025-08-18 16:50:17,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3500.2). Total num frames: 962560. Throughput: 0: 902.0. Samples: 241594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:17,425][02710] Avg episode reward: [(0, '4.998')] +[2025-08-18 16:50:17,428][02834] Saving new best policy, reward=4.998! +[2025-08-18 16:50:22,385][02847] Updated weights for policy 0, policy_version 240 (0.0013) +[2025-08-18 16:50:22,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3618.0, 300 sec: 3510.8). Total num frames: 983040. Throughput: 0: 887.1. Samples: 243892. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:50:22,428][02710] Avg episode reward: [(0, '4.881')] +[2025-08-18 16:50:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3506.8). Total num frames: 999424. Throughput: 0: 914.7. Samples: 249988. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:27,423][02710] Avg episode reward: [(0, '4.719')] +[2025-08-18 16:50:32,421][02710] Fps is (10 sec: 3277.7, 60 sec: 3549.9, 300 sec: 3502.8). Total num frames: 1015808. Throughput: 0: 887.1. Samples: 254662. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:32,423][02710] Avg episode reward: [(0, '4.856')] +[2025-08-18 16:50:33,787][02847] Updated weights for policy 0, policy_version 250 (0.0014) +[2025-08-18 16:50:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3512.8). Total num frames: 1036288. Throughput: 0: 901.9. Samples: 257738. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:37,425][02710] Avg episode reward: [(0, '5.034')] +[2025-08-18 16:50:37,428][02834] Saving new best policy, reward=5.034! +[2025-08-18 16:50:42,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1056768. Throughput: 0: 917.1. Samples: 263754. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:42,423][02710] Avg episode reward: [(0, '5.200')] +[2025-08-18 16:50:42,438][02834] Saving new best policy, reward=5.200! +[2025-08-18 16:50:44,994][02847] Updated weights for policy 0, policy_version 260 (0.0013) +[2025-08-18 16:50:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1073152. Throughput: 0: 892.9. Samples: 268496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:50:47,425][02710] Avg episode reward: [(0, '5.178')] +[2025-08-18 16:50:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1093632. Throughput: 0: 918.3. Samples: 271508. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:52,423][02710] Avg episode reward: [(0, '5.199')] +[2025-08-18 16:50:55,422][02847] Updated weights for policy 0, policy_version 270 (0.0012) +[2025-08-18 16:50:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1110016. Throughput: 0: 915.2. Samples: 277348. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:50:57,429][02710] Avg episode reward: [(0, '5.143')] +[2025-08-18 16:51:02,422][02710] Fps is (10 sec: 3276.7, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1126400. Throughput: 0: 899.3. Samples: 282064. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:02,423][02710] Avg episode reward: [(0, '5.262')] +[2025-08-18 16:51:02,428][02834] Saving new best policy, reward=5.262! +[2025-08-18 16:51:07,244][02847] Updated weights for policy 0, policy_version 280 (0.0012) +[2025-08-18 16:51:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3596.1). Total num frames: 1146880. Throughput: 0: 912.7. Samples: 284962. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:07,423][02710] Avg episode reward: [(0, '5.387')] +[2025-08-18 16:51:07,425][02834] Saving new best policy, reward=5.387! +[2025-08-18 16:51:12,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1163264. Throughput: 0: 897.2. Samples: 290362. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:12,433][02710] Avg episode reward: [(0, '5.530')] +[2025-08-18 16:51:12,442][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth... +[2025-08-18 16:51:12,539][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth +[2025-08-18 16:51:12,552][02834] Saving new best policy, reward=5.530! +[2025-08-18 16:51:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1179648. Throughput: 0: 908.3. Samples: 295536. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:17,426][02710] Avg episode reward: [(0, '5.238')] +[2025-08-18 16:51:18,911][02847] Updated weights for policy 0, policy_version 290 (0.0012) +[2025-08-18 16:51:22,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3618.1, 300 sec: 3582.2). Total num frames: 1200128. Throughput: 0: 905.5. Samples: 298488. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:22,426][02710] Avg episode reward: [(0, '5.397')] +[2025-08-18 16:51:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1216512. Throughput: 0: 883.5. Samples: 303512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:27,423][02710] Avg episode reward: [(0, '5.492')] +[2025-08-18 16:51:30,295][02847] Updated weights for policy 0, policy_version 300 (0.0014) +[2025-08-18 16:51:32,421][02710] Fps is (10 sec: 3277.7, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1232896. Throughput: 0: 907.9. Samples: 309350. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:32,425][02710] Avg episode reward: [(0, '5.533')] +[2025-08-18 16:51:32,483][02834] Saving new best policy, reward=5.533! +[2025-08-18 16:51:37,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1253376. Throughput: 0: 907.1. Samples: 312328. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:37,426][02710] Avg episode reward: [(0, '5.478')] +[2025-08-18 16:51:41,804][02847] Updated weights for policy 0, policy_version 310 (0.0013) +[2025-08-18 16:51:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1269760. Throughput: 0: 882.4. Samples: 317054. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:42,423][02710] Avg episode reward: [(0, '5.213')] +[2025-08-18 16:51:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3582.3). Total num frames: 1290240. Throughput: 0: 912.2. Samples: 323114. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:47,426][02710] Avg episode reward: [(0, '5.232')] +[2025-08-18 16:51:52,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1306624. Throughput: 0: 913.0. Samples: 326048. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:51:52,423][02710] Avg episode reward: [(0, '5.474')] +[2025-08-18 16:51:53,123][02847] Updated weights for policy 0, policy_version 320 (0.0012) +[2025-08-18 16:51:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 1327104. Throughput: 0: 901.1. Samples: 330912. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:51:57,426][02710] Avg episode reward: [(0, '5.611')] +[2025-08-18 16:51:57,431][02834] Saving new best policy, reward=5.611! +[2025-08-18 16:52:02,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.2, 300 sec: 3582.3). Total num frames: 1343488. Throughput: 0: 917.7. Samples: 336834. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:02,426][02710] Avg episode reward: [(0, '5.585')] +[2025-08-18 16:52:03,506][02847] Updated weights for policy 0, policy_version 330 (0.0013) +[2025-08-18 16:52:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3582.3). Total num frames: 1359872. Throughput: 0: 905.7. Samples: 339242. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:07,425][02710] Avg episode reward: [(0, '5.505')] +[2025-08-18 16:52:12,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3582.3). 
Total num frames: 1380352. Throughput: 0: 914.2. Samples: 344650. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:12,425][02710] Avg episode reward: [(0, '6.016')] +[2025-08-18 16:52:12,432][02834] Saving new best policy, reward=6.016! +[2025-08-18 16:52:14,949][02847] Updated weights for policy 0, policy_version 340 (0.0012) +[2025-08-18 16:52:17,426][02710] Fps is (10 sec: 4094.1, 60 sec: 3686.1, 300 sec: 3596.1). Total num frames: 1400832. Throughput: 0: 916.5. Samples: 350598. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:17,433][02710] Avg episode reward: [(0, '6.524')] +[2025-08-18 16:52:17,442][02834] Saving new best policy, reward=6.524! +[2025-08-18 16:52:22,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.3, 300 sec: 3610.0). Total num frames: 1417216. Throughput: 0: 891.7. Samples: 352454. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:22,425][02710] Avg episode reward: [(0, '6.589')] +[2025-08-18 16:52:22,435][02834] Saving new best policy, reward=6.589! +[2025-08-18 16:52:26,479][02847] Updated weights for policy 0, policy_version 350 (0.0013) +[2025-08-18 16:52:27,424][02710] Fps is (10 sec: 3277.5, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 1433600. Throughput: 0: 919.4. Samples: 358428. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:27,425][02710] Avg episode reward: [(0, '6.567')] +[2025-08-18 16:52:32,423][02710] Fps is (10 sec: 3685.7, 60 sec: 3686.3, 300 sec: 3623.9). Total num frames: 1454080. Throughput: 0: 902.6. Samples: 363734. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:32,425][02710] Avg episode reward: [(0, '6.339')] +[2025-08-18 16:52:37,421][02710] Fps is (10 sec: 3687.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1470464. Throughput: 0: 890.7. Samples: 366128. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:37,424][02710] Avg episode reward: [(0, '6.348')] +[2025-08-18 16:52:38,000][02847] Updated weights for policy 0, policy_version 360 (0.0013) +[2025-08-18 16:52:42,422][02710] Fps is (10 sec: 3686.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1490944. Throughput: 0: 917.4. Samples: 372196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:42,423][02710] Avg episode reward: [(0, '7.073')] +[2025-08-18 16:52:42,430][02834] Saving new best policy, reward=7.073! +[2025-08-18 16:52:47,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1503232. Throughput: 0: 890.5. Samples: 376906. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:52:47,425][02710] Avg episode reward: [(0, '7.096')] +[2025-08-18 16:52:47,429][02834] Saving new best policy, reward=7.096! +[2025-08-18 16:52:49,568][02847] Updated weights for policy 0, policy_version 370 (0.0013) +[2025-08-18 16:52:52,426][02710] Fps is (10 sec: 3275.4, 60 sec: 3617.9, 300 sec: 3610.0). Total num frames: 1523712. Throughput: 0: 904.3. Samples: 379942. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:52:52,428][02710] Avg episode reward: [(0, '6.977')] +[2025-08-18 16:52:57,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1544192. Throughput: 0: 919.6. Samples: 386030. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:52:57,424][02710] Avg episode reward: [(0, '6.948')] +[2025-08-18 16:53:00,842][02847] Updated weights for policy 0, policy_version 380 (0.0012) +[2025-08-18 16:53:02,421][02710] Fps is (10 sec: 3688.2, 60 sec: 3618.1, 300 sec: 3610.0). 
Total num frames: 1560576. Throughput: 0: 891.7. Samples: 390718. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:02,424][02710] Avg episode reward: [(0, '7.516')] +[2025-08-18 16:53:02,430][02834] Saving new best policy, reward=7.516! +[2025-08-18 16:53:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3624.0). Total num frames: 1581056. Throughput: 0: 917.5. Samples: 393740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:07,424][02710] Avg episode reward: [(0, '8.303')] +[2025-08-18 16:53:07,429][02834] Saving new best policy, reward=8.303! +[2025-08-18 16:53:11,278][02847] Updated weights for policy 0, policy_version 390 (0.0013) +[2025-08-18 16:53:12,429][02710] Fps is (10 sec: 3683.5, 60 sec: 3617.7, 300 sec: 3623.8). Total num frames: 1597440. Throughput: 0: 914.5. Samples: 399584. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:12,443][02710] Avg episode reward: [(0, '8.422')] +[2025-08-18 16:53:12,456][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000390_1597440.pth... +[2025-08-18 16:53:12,573][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000177_724992.pth +[2025-08-18 16:53:12,586][02834] Saving new best policy, reward=8.422! +[2025-08-18 16:53:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.2, 300 sec: 3610.0). Total num frames: 1613824. Throughput: 0: 902.5. Samples: 404344. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:17,423][02710] Avg episode reward: [(0, '8.704')] +[2025-08-18 16:53:17,424][02834] Saving new best policy, reward=8.704! +[2025-08-18 16:53:22,422][02710] Fps is (10 sec: 3689.2, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 1634304. Throughput: 0: 912.7. Samples: 407198. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:22,426][02710] Avg episode reward: [(0, '9.189')] +[2025-08-18 16:53:22,433][02834] Saving new best policy, reward=9.189! +[2025-08-18 16:53:22,897][02847] Updated weights for policy 0, policy_version 400 (0.0014) +[2025-08-18 16:53:27,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1650688. Throughput: 0: 896.1. Samples: 412522. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:27,425][02710] Avg episode reward: [(0, '10.076')] +[2025-08-18 16:53:27,433][02834] Saving new best policy, reward=10.076! +[2025-08-18 16:53:32,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 1671168. Throughput: 0: 910.4. Samples: 417876. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:32,423][02710] Avg episode reward: [(0, '10.495')] +[2025-08-18 16:53:32,434][02834] Saving new best policy, reward=10.495! +[2025-08-18 16:53:34,473][02847] Updated weights for policy 0, policy_version 410 (0.0012) +[2025-08-18 16:53:37,421][02710] Fps is (10 sec: 3687.4, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 1687552. Throughput: 0: 909.2. Samples: 420850. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:37,425][02710] Avg episode reward: [(0, '10.426')] +[2025-08-18 16:53:42,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 1703936. Throughput: 0: 879.6. Samples: 425614. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:53:42,425][02710] Avg episode reward: [(0, '10.032')] +[2025-08-18 16:53:45,935][02847] Updated weights for policy 0, policy_version 420 (0.0016) +[2025-08-18 16:53:47,424][02710] Fps is (10 sec: 3685.4, 60 sec: 3686.2, 300 sec: 3623.9). Total num frames: 1724416. Throughput: 0: 908.7. Samples: 431614. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:47,428][02710] Avg episode reward: [(0, '10.363')] +[2025-08-18 16:53:52,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3686.7, 300 sec: 3637.8). Total num frames: 1744896. Throughput: 0: 909.4. Samples: 434664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:52,429][02710] Avg episode reward: [(0, '11.015')] +[2025-08-18 16:53:52,437][02834] Saving new best policy, reward=11.015! +[2025-08-18 16:53:57,410][02847] Updated weights for policy 0, policy_version 430 (0.0016) +[2025-08-18 16:53:57,422][02710] Fps is (10 sec: 3687.3, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1761280. Throughput: 0: 884.7. Samples: 439388. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:53:57,423][02710] Avg episode reward: [(0, '12.120')] +[2025-08-18 16:53:57,426][02834] Saving new best policy, reward=12.120! +[2025-08-18 16:54:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 1777664. Throughput: 0: 911.9. Samples: 445378. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:02,423][02710] Avg episode reward: [(0, '12.294')] +[2025-08-18 16:54:02,432][02834] Saving new best policy, reward=12.294! +[2025-08-18 16:54:07,423][02710] Fps is (10 sec: 3276.3, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 1794048. Throughput: 0: 911.3. Samples: 448206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:07,424][02710] Avg episode reward: [(0, '12.704')] +[2025-08-18 16:54:07,426][02834] Saving new best policy, reward=12.704! +[2025-08-18 16:54:08,925][02847] Updated weights for policy 0, policy_version 440 (0.0012) +[2025-08-18 16:54:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.6, 300 sec: 3623.9). Total num frames: 1814528. Throughput: 0: 903.1. Samples: 453160. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:12,424][02710] Avg episode reward: [(0, '11.606')] +[2025-08-18 16:54:17,421][02710] Fps is (10 sec: 4096.7, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1835008. Throughput: 0: 917.5. Samples: 459162. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:17,423][02710] Avg episode reward: [(0, '10.698')] +[2025-08-18 16:54:19,590][02847] Updated weights for policy 0, policy_version 450 (0.0015) +[2025-08-18 16:54:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3624.0). Total num frames: 1851392. Throughput: 0: 903.3. Samples: 461500. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:22,425][02710] Avg episode reward: [(0, '9.914')] +[2025-08-18 16:54:27,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.3, 300 sec: 3610.0). Total num frames: 1867776. Throughput: 0: 921.2. Samples: 467068. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:27,428][02710] Avg episode reward: [(0, '10.513')] +[2025-08-18 16:54:30,520][02847] Updated weights for policy 0, policy_version 460 (0.0013) +[2025-08-18 16:54:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1888256. Throughput: 0: 916.9. Samples: 472874. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:32,425][02710] Avg episode reward: [(0, '11.437')] +[2025-08-18 16:54:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1904640. Throughput: 0: 891.9. Samples: 474798. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:54:37,426][02710] Avg episode reward: [(0, '11.324')] +[2025-08-18 16:54:41,869][02847] Updated weights for policy 0, policy_version 470 (0.0013) +[2025-08-18 16:54:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 1925120. Throughput: 0: 922.9. Samples: 480920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:42,423][02710] Avg episode reward: [(0, '11.479')] +[2025-08-18 16:54:47,425][02710] Fps is (10 sec: 3685.1, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1941504. Throughput: 0: 905.7. Samples: 486138. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:47,426][02710] Avg episode reward: [(0, '10.827')] +[2025-08-18 16:54:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1961984. Throughput: 0: 899.0. Samples: 488658. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:52,426][02710] Avg episode reward: [(0, '11.795')] +[2025-08-18 16:54:53,332][02847] Updated weights for policy 0, policy_version 480 (0.0012) +[2025-08-18 16:54:57,421][02710] Fps is (10 sec: 4097.5, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1982464. Throughput: 0: 924.0. Samples: 494740. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:54:57,426][02710] Avg episode reward: [(0, '12.692')] +[2025-08-18 16:55:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 1994752. Throughput: 0: 897.2. Samples: 499534. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:02,423][02710] Avg episode reward: [(0, '13.673')] +[2025-08-18 16:55:02,434][02834] Saving new best policy, reward=13.673! +[2025-08-18 16:55:04,633][02847] Updated weights for policy 0, policy_version 490 (0.0013) +[2025-08-18 16:55:07,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3686.5, 300 sec: 3623.9). Total num frames: 2015232. Throughput: 0: 911.7. Samples: 502526. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:07,425][02710] Avg episode reward: [(0, '14.285')] +[2025-08-18 16:55:07,426][02834] Saving new best policy, reward=14.285! +[2025-08-18 16:55:12,422][02710] Fps is (10 sec: 4095.7, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 2035712. Throughput: 0: 921.1. Samples: 508520. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:12,424][02710] Avg episode reward: [(0, '14.011')] +[2025-08-18 16:55:12,432][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth... +[2025-08-18 16:55:12,521][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000284_1163264.pth +[2025-08-18 16:55:16,225][02847] Updated weights for policy 0, policy_version 500 (0.0012) +[2025-08-18 16:55:17,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2052096. Throughput: 0: 896.8. Samples: 513230. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:17,423][02710] Avg episode reward: [(0, '13.713')] +[2025-08-18 16:55:22,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2072576. Throughput: 0: 920.4. Samples: 516218. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:22,423][02710] Avg episode reward: [(0, '13.395')] +[2025-08-18 16:55:26,931][02847] Updated weights for policy 0, policy_version 510 (0.0013) +[2025-08-18 16:55:27,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2088960. Throughput: 0: 912.8. Samples: 521998. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:27,423][02710] Avg episode reward: [(0, '12.197')] +[2025-08-18 16:55:32,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2105344. Throughput: 0: 908.6. Samples: 527020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:32,423][02710] Avg episode reward: [(0, '11.480')] +[2025-08-18 16:55:37,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2125824. Throughput: 0: 920.9. Samples: 530098. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:37,423][02710] Avg episode reward: [(0, '12.422')] +[2025-08-18 16:55:37,851][02847] Updated weights for policy 0, policy_version 520 (0.0015) +[2025-08-18 16:55:42,422][02710] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2142208. Throughput: 0: 899.8. Samples: 535232. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:55:42,423][02710] Avg episode reward: [(0, '12.747')] +[2025-08-18 16:55:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.6, 300 sec: 3623.9). Total num frames: 2162688. Throughput: 0: 918.9. Samples: 540884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:55:47,426][02710] Avg episode reward: [(0, '13.417')] +[2025-08-18 16:55:49,365][02847] Updated weights for policy 0, policy_version 530 (0.0021) +[2025-08-18 16:55:52,421][02710] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2183168. Throughput: 0: 919.2. Samples: 543890. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:52,423][02710] Avg episode reward: [(0, '15.201')] +[2025-08-18 16:55:52,429][02834] Saving new best policy, reward=15.201! +[2025-08-18 16:55:57,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2195456. Throughput: 0: 889.5. Samples: 548548. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:55:57,423][02710] Avg episode reward: [(0, '15.048')] +[2025-08-18 16:56:00,873][02847] Updated weights for policy 0, policy_version 540 (0.0014) +[2025-08-18 16:56:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 2215936. Throughput: 0: 919.2. Samples: 554596. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:56:02,426][02710] Avg episode reward: [(0, '15.674')] +[2025-08-18 16:56:02,433][02834] Saving new best policy, reward=15.674! +[2025-08-18 16:56:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 2232320. Throughput: 0: 918.7. Samples: 557558. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:07,423][02710] Avg episode reward: [(0, '17.067')] +[2025-08-18 16:56:07,424][02834] Saving new best policy, reward=17.067! +[2025-08-18 16:56:12,265][02847] Updated weights for policy 0, policy_version 550 (0.0013) +[2025-08-18 16:56:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2252800. Throughput: 0: 896.3. Samples: 562332. 
Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:12,423][02710] Avg episode reward: [(0, '17.988')] +[2025-08-18 16:56:12,431][02834] Saving new best policy, reward=17.988! +[2025-08-18 16:56:17,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2273280. Throughput: 0: 919.4. Samples: 568394. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:17,425][02710] Avg episode reward: [(0, '18.517')] +[2025-08-18 16:56:17,427][02834] Saving new best policy, reward=18.517! +[2025-08-18 16:56:22,424][02710] Fps is (10 sec: 3275.8, 60 sec: 3549.7, 300 sec: 3623.9). Total num frames: 2285568. Throughput: 0: 911.5. Samples: 571116. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:22,429][02710] Avg episode reward: [(0, '18.694')] +[2025-08-18 16:56:22,443][02834] Saving new best policy, reward=18.694! +[2025-08-18 16:56:23,875][02847] Updated weights for policy 0, policy_version 560 (0.0013) +[2025-08-18 16:56:27,423][02710] Fps is (10 sec: 3276.2, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2306048. Throughput: 0: 909.0. Samples: 576140. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:56:27,428][02710] Avg episode reward: [(0, '18.572')] +[2025-08-18 16:56:32,424][02710] Fps is (10 sec: 4096.2, 60 sec: 3686.2, 300 sec: 3637.8). Total num frames: 2326528. Throughput: 0: 917.7. Samples: 582184. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:56:32,425][02710] Avg episode reward: [(0, '19.030')] +[2025-08-18 16:56:32,434][02834] Saving new best policy, reward=19.030! +[2025-08-18 16:56:34,623][02847] Updated weights for policy 0, policy_version 570 (0.0013) +[2025-08-18 16:56:37,421][02710] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2342912. Throughput: 0: 896.3. Samples: 584224. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:37,423][02710] Avg episode reward: [(0, '17.226')] +[2025-08-18 16:56:42,422][02710] Fps is (10 sec: 3687.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2363392. Throughput: 0: 919.7. Samples: 589936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:56:42,423][02710] Avg episode reward: [(0, '16.982')] +[2025-08-18 16:56:45,418][02847] Updated weights for policy 0, policy_version 580 (0.0017) +[2025-08-18 16:56:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2379776. Throughput: 0: 910.2. Samples: 595554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:56:47,426][02710] Avg episode reward: [(0, '18.134')] +[2025-08-18 16:56:52,421][02710] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2396160. Throughput: 0: 892.1. Samples: 597702. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:56:52,423][02710] Avg episode reward: [(0, '19.013')] +[2025-08-18 16:56:56,928][02847] Updated weights for policy 0, policy_version 590 (0.0013) +[2025-08-18 16:56:57,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2416640. Throughput: 0: 921.6. Samples: 603806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:56:57,426][02710] Avg episode reward: [(0, '18.117')] +[2025-08-18 16:57:02,427][02710] Fps is (10 sec: 3684.3, 60 sec: 3617.8, 300 sec: 3637.7). Total num frames: 2433024. Throughput: 0: 898.7. Samples: 608842. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:02,430][02710] Avg episode reward: [(0, '19.705')] +[2025-08-18 16:57:02,448][02834] Saving new best policy, reward=19.705! +[2025-08-18 16:57:07,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2453504. Throughput: 0: 898.4. Samples: 611540. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:07,426][02710] Avg episode reward: [(0, '20.072')] +[2025-08-18 16:57:07,430][02834] Saving new best policy, reward=20.072! +[2025-08-18 16:57:08,203][02847] Updated weights for policy 0, policy_version 600 (0.0012) +[2025-08-18 16:57:12,423][02710] Fps is (10 sec: 4097.7, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 2473984. Throughput: 0: 920.0. Samples: 617542. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:12,425][02710] Avg episode reward: [(0, '19.663')] +[2025-08-18 16:57:12,433][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000604_2473984.pth... +[2025-08-18 16:57:12,517][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000390_1597440.pth +[2025-08-18 16:57:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2486272. Throughput: 0: 892.5. Samples: 622344. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:17,426][02710] Avg episode reward: [(0, '19.701')] +[2025-08-18 16:57:19,750][02847] Updated weights for policy 0, policy_version 610 (0.0012) +[2025-08-18 16:57:22,425][02710] Fps is (10 sec: 3276.1, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 2506752. Throughput: 0: 915.3. Samples: 625414. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:22,427][02710] Avg episode reward: [(0, '19.369')] +[2025-08-18 16:57:27,423][02710] Fps is (10 sec: 4095.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2527232. Throughput: 0: 922.4. Samples: 631444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:27,425][02710] Avg episode reward: [(0, '19.081')] +[2025-08-18 16:57:31,291][02847] Updated weights for policy 0, policy_version 620 (0.0021) +[2025-08-18 16:57:32,421][02710] Fps is (10 sec: 3687.8, 60 sec: 3618.3, 300 sec: 3637.8). Total num frames: 2543616. Throughput: 0: 900.2. Samples: 636064. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:32,427][02710] Avg episode reward: [(0, '18.691')] +[2025-08-18 16:57:37,422][02710] Fps is (10 sec: 3687.0, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2564096. Throughput: 0: 920.4. Samples: 639122. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:37,425][02710] Avg episode reward: [(0, '18.801')] +[2025-08-18 16:57:42,124][02847] Updated weights for policy 0, policy_version 630 (0.0013) +[2025-08-18 16:57:42,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 2580480. Throughput: 0: 910.7. Samples: 644788. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:57:42,424][02710] Avg episode reward: [(0, '18.915')] +[2025-08-18 16:57:47,422][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.9). Total num frames: 2596864. Throughput: 0: 914.1. Samples: 649970. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:47,424][02710] Avg episode reward: [(0, '19.465')] +[2025-08-18 16:57:52,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2617344. Throughput: 0: 921.4. Samples: 653004. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:57:52,423][02710] Avg episode reward: [(0, '20.587')] +[2025-08-18 16:57:52,435][02834] Saving new best policy, reward=20.587! +[2025-08-18 16:57:52,849][02847] Updated weights for policy 0, policy_version 640 (0.0013) +[2025-08-18 16:57:57,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2633728. Throughput: 0: 899.7. Samples: 658028. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-18 16:57:57,426][02710] Avg episode reward: [(0, '20.577')] +[2025-08-18 16:58:02,424][02710] Fps is (10 sec: 3685.3, 60 sec: 3686.6, 300 sec: 3637.8). Total num frames: 2654208. Throughput: 0: 920.6. Samples: 663772. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:02,426][02710] Avg episode reward: [(0, '20.364')] +[2025-08-18 16:58:04,387][02847] Updated weights for policy 0, policy_version 650 (0.0014) +[2025-08-18 16:58:07,425][02710] Fps is (10 sec: 3685.0, 60 sec: 3617.9, 300 sec: 3637.9). Total num frames: 2670592. Throughput: 0: 918.3. Samples: 666738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:07,428][02710] Avg episode reward: [(0, '19.567')] +[2025-08-18 16:58:12,421][02710] Fps is (10 sec: 3277.8, 60 sec: 3550.0, 300 sec: 3637.8). Total num frames: 2686976. Throughput: 0: 891.1. Samples: 671540. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:12,423][02710] Avg episode reward: [(0, '19.281')] +[2025-08-18 16:58:15,681][02847] Updated weights for policy 0, policy_version 660 (0.0017) +[2025-08-18 16:58:17,421][02710] Fps is (10 sec: 3687.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2707456. Throughput: 0: 923.1. Samples: 677602. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:17,423][02710] Avg episode reward: [(0, '18.751')] +[2025-08-18 16:58:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3637.8). Total num frames: 2723840. Throughput: 0: 920.8. Samples: 680558. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:22,426][02710] Avg episode reward: [(0, '19.640')] +[2025-08-18 16:58:27,186][02847] Updated weights for policy 0, policy_version 670 (0.0013) +[2025-08-18 16:58:27,422][02710] Fps is (10 sec: 3686.2, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2744320. Throughput: 0: 901.0. Samples: 685332. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:27,427][02710] Avg episode reward: [(0, '20.619')] +[2025-08-18 16:58:27,431][02834] Saving new best policy, reward=20.619! +[2025-08-18 16:58:32,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2760704. Throughput: 0: 918.5. Samples: 691302. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:32,424][02710] Avg episode reward: [(0, '20.258')] +[2025-08-18 16:58:37,421][02710] Fps is (10 sec: 3277.0, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 2777088. Throughput: 0: 903.6. Samples: 693668. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:37,425][02710] Avg episode reward: [(0, '21.294')] +[2025-08-18 16:58:37,431][02834] Saving new best policy, reward=21.294! +[2025-08-18 16:58:39,057][02847] Updated weights for policy 0, policy_version 680 (0.0013) +[2025-08-18 16:58:42,422][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2797568. Throughput: 0: 906.8. Samples: 698836. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:58:42,426][02710] Avg episode reward: [(0, '20.909')] +[2025-08-18 16:58:47,422][02710] Fps is (10 sec: 4095.8, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2818048. Throughput: 0: 913.6. Samples: 704884. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:58:47,427][02710] Avg episode reward: [(0, '19.464')] +[2025-08-18 16:58:50,173][02847] Updated weights for policy 0, policy_version 690 (0.0012) +[2025-08-18 16:58:52,421][02710] Fps is (10 sec: 3276.9, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 2830336. Throughput: 0: 889.5. Samples: 706762. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:58:52,428][02710] Avg episode reward: [(0, '18.312')] +[2025-08-18 16:58:57,421][02710] Fps is (10 sec: 3277.0, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2850816. Throughput: 0: 914.5. Samples: 712692. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:58:57,426][02710] Avg episode reward: [(0, '18.115')] +[2025-08-18 16:59:00,504][02847] Updated weights for policy 0, policy_version 700 (0.0013) +[2025-08-18 16:59:02,424][02710] Fps is (10 sec: 4094.9, 60 sec: 3618.2, 300 sec: 3651.7). Total num frames: 2871296. Throughput: 0: 902.8. Samples: 718230. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:02,429][02710] Avg episode reward: [(0, '17.390')] +[2025-08-18 16:59:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.4, 300 sec: 3637.8). Total num frames: 2887680. Throughput: 0: 886.1. Samples: 720432. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:59:07,427][02710] Avg episode reward: [(0, '17.663')] +[2025-08-18 16:59:12,085][02847] Updated weights for policy 0, policy_version 710 (0.0013) +[2025-08-18 16:59:12,421][02710] Fps is (10 sec: 3687.4, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2908160. Throughput: 0: 914.3. Samples: 726476. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 16:59:12,426][02710] Avg episode reward: [(0, '18.940')] +[2025-08-18 16:59:12,432][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth... +[2025-08-18 16:59:12,508][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000497_2035712.pth +[2025-08-18 16:59:17,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2924544. Throughput: 0: 890.2. Samples: 731362. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:17,425][02710] Avg episode reward: [(0, '19.107')] +[2025-08-18 16:59:22,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2940928. Throughput: 0: 898.6. Samples: 734104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:22,427][02710] Avg episode reward: [(0, '18.679')] +[2025-08-18 16:59:23,622][02847] Updated weights for policy 0, policy_version 720 (0.0013) +[2025-08-18 16:59:27,422][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 2961408. Throughput: 0: 919.3. Samples: 740206. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:27,429][02710] Avg episode reward: [(0, '19.761')] +[2025-08-18 16:59:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 2977792. Throughput: 0: 891.0. Samples: 744980. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:32,426][02710] Avg episode reward: [(0, '19.078')] +[2025-08-18 16:59:35,036][02847] Updated weights for policy 0, policy_version 730 (0.0013) +[2025-08-18 16:59:37,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 2998272. Throughput: 0: 915.8. Samples: 747974. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:37,423][02710] Avg episode reward: [(0, '17.645')] +[2025-08-18 16:59:42,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3014656. Throughput: 0: 919.0. Samples: 754048. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:59:42,426][02710] Avg episode reward: [(0, '17.379')] +[2025-08-18 16:59:46,477][02847] Updated weights for policy 0, policy_version 740 (0.0015) +[2025-08-18 16:59:47,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3031040. Throughput: 0: 900.7. Samples: 758758. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 16:59:47,426][02710] Avg episode reward: [(0, '17.897')] +[2025-08-18 16:59:52,422][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3051520. Throughput: 0: 919.3. Samples: 761800. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:52,424][02710] Avg episode reward: [(0, '17.513')] +[2025-08-18 16:59:57,423][02710] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3637.8). Total num frames: 3067904. Throughput: 0: 905.3. Samples: 767218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 16:59:57,427][02710] Avg episode reward: [(0, '18.457')] +[2025-08-18 16:59:57,777][02847] Updated weights for policy 0, policy_version 750 (0.0012) +[2025-08-18 17:00:02,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.3, 300 sec: 3637.8). Total num frames: 3088384. Throughput: 0: 916.5. Samples: 772606. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:02,423][02710] Avg episode reward: [(0, '19.050')] +[2025-08-18 17:00:07,424][02710] Fps is (10 sec: 4095.8, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 3108864. Throughput: 0: 923.2. Samples: 775652. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:07,425][02710] Avg episode reward: [(0, '20.727')] +[2025-08-18 17:00:08,039][02847] Updated weights for policy 0, policy_version 760 (0.0015) +[2025-08-18 17:00:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3125248. Throughput: 0: 896.5. Samples: 780546. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:00:12,423][02710] Avg episode reward: [(0, '20.139')] +[2025-08-18 17:00:17,421][02710] Fps is (10 sec: 3277.5, 60 sec: 3618.2, 300 sec: 3623.9). Total num frames: 3141632. Throughput: 0: 921.8. Samples: 786460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:00:17,426][02710] Avg episode reward: [(0, '19.983')] +[2025-08-18 17:00:19,536][02847] Updated weights for policy 0, policy_version 770 (0.0012) +[2025-08-18 17:00:22,424][02710] Fps is (10 sec: 3685.6, 60 sec: 3686.3, 300 sec: 3637.8). Total num frames: 3162112. Throughput: 0: 922.7. Samples: 789496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:22,425][02710] Avg episode reward: [(0, '19.690')] +[2025-08-18 17:00:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 3178496. Throughput: 0: 894.4. Samples: 794296. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:27,423][02710] Avg episode reward: [(0, '20.093')] +[2025-08-18 17:00:30,936][02847] Updated weights for policy 0, policy_version 780 (0.0012) +[2025-08-18 17:00:32,421][02710] Fps is (10 sec: 3687.2, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3198976. Throughput: 0: 921.4. Samples: 800222. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:00:32,427][02710] Avg episode reward: [(0, '19.491')] +[2025-08-18 17:00:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3215360. Throughput: 0: 917.4. Samples: 803082. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:37,426][02710] Avg episode reward: [(0, '19.146')] +[2025-08-18 17:00:42,422][02710] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3231744. Throughput: 0: 905.6. Samples: 807970. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:42,425][02710] Avg episode reward: [(0, '19.702')] +[2025-08-18 17:00:42,583][02847] Updated weights for policy 0, policy_version 790 (0.0013) +[2025-08-18 17:00:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3623.9). Total num frames: 3252224. Throughput: 0: 916.9. Samples: 813866. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:47,424][02710] Avg episode reward: [(0, '21.069')] +[2025-08-18 17:00:52,421][02710] Fps is (10 sec: 3686.6, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3268608. Throughput: 0: 900.0. Samples: 816150. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:00:52,426][02710] Avg episode reward: [(0, '20.612')] +[2025-08-18 17:00:54,119][02847] Updated weights for policy 0, policy_version 800 (0.0014) +[2025-08-18 17:00:57,423][02710] Fps is (10 sec: 3685.7, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 3289088. Throughput: 0: 911.3. Samples: 821554. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:00:57,425][02710] Avg episode reward: [(0, '21.625')] +[2025-08-18 17:00:57,429][02834] Saving new best policy, reward=21.625! +[2025-08-18 17:01:02,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3618.1, 300 sec: 3637.8). Total num frames: 3305472. Throughput: 0: 908.8. Samples: 827356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:02,426][02710] Avg episode reward: [(0, '21.996')] +[2025-08-18 17:01:02,434][02834] Saving new best policy, reward=21.996! +[2025-08-18 17:01:05,763][02847] Updated weights for policy 0, policy_version 810 (0.0013) +[2025-08-18 17:01:07,421][02710] Fps is (10 sec: 3277.4, 60 sec: 3550.0, 300 sec: 3623.9). Total num frames: 3321856. Throughput: 0: 882.3. Samples: 829198. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:07,428][02710] Avg episode reward: [(0, '22.750')] +[2025-08-18 17:01:07,433][02834] Saving new best policy, reward=22.750! +[2025-08-18 17:01:12,421][02710] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3342336. Throughput: 0: 906.9. Samples: 835108. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:12,427][02710] Avg episode reward: [(0, '21.894')] +[2025-08-18 17:01:12,439][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000816_3342336.pth... 
+[2025-08-18 17:01:12,535][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000604_2473984.pth +[2025-08-18 17:01:16,993][02847] Updated weights for policy 0, policy_version 820 (0.0013) +[2025-08-18 17:01:17,423][02710] Fps is (10 sec: 3685.8, 60 sec: 3618.0, 300 sec: 3637.8). Total num frames: 3358720. Throughput: 0: 882.9. Samples: 839954. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:17,424][02710] Avg episode reward: [(0, '21.076')] +[2025-08-18 17:01:22,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3623.9). Total num frames: 3375104. Throughput: 0: 869.6. Samples: 842216. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:22,423][02710] Avg episode reward: [(0, '19.336')] +[2025-08-18 17:01:27,421][02710] Fps is (10 sec: 3687.0, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3395584. Throughput: 0: 894.8. Samples: 848236. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:27,426][02710] Avg episode reward: [(0, '17.501')] +[2025-08-18 17:01:28,168][02847] Updated weights for policy 0, policy_version 830 (0.0013) +[2025-08-18 17:01:32,422][02710] Fps is (10 sec: 3686.3, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3411968. Throughput: 0: 870.2. Samples: 853024. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:32,427][02710] Avg episode reward: [(0, '16.579')] +[2025-08-18 17:01:37,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3428352. Throughput: 0: 886.2. Samples: 856028. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:37,431][02710] Avg episode reward: [(0, '17.944')] +[2025-08-18 17:01:39,536][02847] Updated weights for policy 0, policy_version 840 (0.0012) +[2025-08-18 17:01:42,422][02710] Fps is (10 sec: 3686.2, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3448832. Throughput: 0: 901.6. Samples: 862124. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:42,427][02710] Avg episode reward: [(0, '19.347')] +[2025-08-18 17:01:47,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3623.9). Total num frames: 3465216. Throughput: 0: 876.7. Samples: 866808. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:47,426][02710] Avg episode reward: [(0, '19.877')] +[2025-08-18 17:01:51,106][02847] Updated weights for policy 0, policy_version 850 (0.0012) +[2025-08-18 17:01:52,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3623.9). Total num frames: 3485696. Throughput: 0: 901.2. Samples: 869750. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:01:52,425][02710] Avg episode reward: [(0, '22.127')] +[2025-08-18 17:01:57,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3624.0). Total num frames: 3502080. Throughput: 0: 902.4. Samples: 875716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:01:57,426][02710] Avg episode reward: [(0, '23.165')] +[2025-08-18 17:01:57,427][02834] Saving new best policy, reward=23.165! +[2025-08-18 17:02:02,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3518464. Throughput: 0: 901.3. Samples: 880510. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:02:02,429][02710] Avg episode reward: [(0, '25.032')] +[2025-08-18 17:02:02,438][02834] Saving new best policy, reward=25.032! 
+[2025-08-18 17:02:02,651][02847] Updated weights for policy 0, policy_version 860 (0.0012) +[2025-08-18 17:02:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3538944. Throughput: 0: 915.7. Samples: 883422. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:02:07,427][02710] Avg episode reward: [(0, '24.863')] +[2025-08-18 17:02:12,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3623.9). Total num frames: 3555328. Throughput: 0: 898.7. Samples: 888678. Policy #0 lag: (min: 0.0, avg: 0.0, max: 1.0) +[2025-08-18 17:02:12,423][02710] Avg episode reward: [(0, '24.386')] +[2025-08-18 17:02:14,537][02847] Updated weights for policy 0, policy_version 870 (0.0012) +[2025-08-18 17:02:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3550.0, 300 sec: 3610.1). Total num frames: 3571712. Throughput: 0: 908.9. Samples: 893924. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:02:17,425][02710] Avg episode reward: [(0, '23.826')] +[2025-08-18 17:02:22,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3592192. Throughput: 0: 907.0. Samples: 896844. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:22,426][02710] Avg episode reward: [(0, '23.466')] +[2025-08-18 17:02:26,062][02847] Updated weights for policy 0, policy_version 880 (0.0016) +[2025-08-18 17:02:27,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3608576. Throughput: 0: 874.5. Samples: 901474. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:27,423][02710] Avg episode reward: [(0, '23.152')] +[2025-08-18 17:02:32,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3629056. Throughput: 0: 903.0. Samples: 907444. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:32,425][02710] Avg episode reward: [(0, '21.321')] +[2025-08-18 17:02:36,526][02847] Updated weights for policy 0, policy_version 890 (0.0013) +[2025-08-18 17:02:37,423][02710] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3610.0). Total num frames: 3645440. Throughput: 0: 904.5. Samples: 910456. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:37,430][02710] Avg episode reward: [(0, '20.257')] +[2025-08-18 17:02:42,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3661824. Throughput: 0: 875.3. Samples: 915104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:42,425][02710] Avg episode reward: [(0, '19.857')] +[2025-08-18 17:02:47,421][02710] Fps is (10 sec: 3687.1, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3682304. Throughput: 0: 898.4. Samples: 920936. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:47,426][02710] Avg episode reward: [(0, '20.608')] +[2025-08-18 17:02:48,120][02847] Updated weights for policy 0, policy_version 900 (0.0017) +[2025-08-18 17:02:52,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 3694592. Throughput: 0: 893.2. Samples: 923618. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:02:52,427][02710] Avg episode reward: [(0, '20.518')] +[2025-08-18 17:02:57,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 3715072. Throughput: 0: 885.2. Samples: 928510. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:02:57,426][02710] Avg episode reward: [(0, '21.497')] +[2025-08-18 17:02:59,961][02847] Updated weights for policy 0, policy_version 910 (0.0016) +[2025-08-18 17:03:02,421][02710] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3610.1). Total num frames: 3735552. Throughput: 0: 899.1. Samples: 934384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:03:02,427][02710] Avg episode reward: [(0, '22.744')] +[2025-08-18 17:03:07,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3610.0). Total num frames: 3751936. Throughput: 0: 883.9. Samples: 936620. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:03:07,428][02710] Avg episode reward: [(0, '23.443')] +[2025-08-18 17:03:11,522][02847] Updated weights for policy 0, policy_version 920 (0.0017) +[2025-08-18 17:03:12,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3768320. Throughput: 0: 903.1. Samples: 942114. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-18 17:03:12,423][02710] Avg episode reward: [(0, '23.664')] +[2025-08-18 17:03:12,433][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000920_3768320.pth... +[2025-08-18 17:03:12,514][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000710_2908160.pth +[2025-08-18 17:03:17,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3610.0). Total num frames: 3788800. Throughput: 0: 894.5. Samples: 947698. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:17,425][02710] Avg episode reward: [(0, '23.514')] +[2025-08-18 17:03:22,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 3805184. Throughput: 0: 867.5. Samples: 949494. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:22,427][02710] Avg episode reward: [(0, '22.147')] +[2025-08-18 17:03:23,322][02847] Updated weights for policy 0, policy_version 930 (0.0013) +[2025-08-18 17:03:27,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.2). Total num frames: 3821568. Throughput: 0: 897.6. Samples: 955498. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:27,424][02710] Avg episode reward: [(0, '21.641')] +[2025-08-18 17:03:32,422][02710] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3610.0). Total num frames: 3842048. Throughput: 0: 885.3. Samples: 960776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:32,425][02710] Avg episode reward: [(0, '20.297')] +[2025-08-18 17:03:34,873][02847] Updated weights for policy 0, policy_version 940 (0.0014) +[2025-08-18 17:03:37,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3550.0, 300 sec: 3596.2). Total num frames: 3858432. Throughput: 0: 877.1. Samples: 963088. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:03:37,424][02710] Avg episode reward: [(0, '19.880')] +[2025-08-18 17:03:42,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3878912. Throughput: 0: 903.4. Samples: 969162. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0) +[2025-08-18 17:03:42,423][02710] Avg episode reward: [(0, '20.029')] +[2025-08-18 17:03:45,821][02847] Updated weights for policy 0, policy_version 950 (0.0012) +[2025-08-18 17:03:47,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3596.1). Total num frames: 3891200. Throughput: 0: 875.5. Samples: 973782. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:47,423][02710] Avg episode reward: [(0, '20.250')] +[2025-08-18 17:03:52,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3911680. Throughput: 0: 890.1. Samples: 976676. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:52,423][02710] Avg episode reward: [(0, '20.663')] +[2025-08-18 17:03:56,757][02847] Updated weights for policy 0, policy_version 960 (0.0013) +[2025-08-18 17:03:57,422][02710] Fps is (10 sec: 4095.7, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3932160. Throughput: 0: 902.7. Samples: 982738. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:03:57,424][02710] Avg episode reward: [(0, '21.067')] +[2025-08-18 17:04:02,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 3948544. Throughput: 0: 883.2. Samples: 987440. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:02,424][02710] Avg episode reward: [(0, '20.413')] +[2025-08-18 17:04:07,421][02710] Fps is (10 sec: 3686.7, 60 sec: 3618.1, 300 sec: 3596.1). Total num frames: 3969024. Throughput: 0: 910.9. Samples: 990486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:07,424][02710] Avg episode reward: [(0, '20.813')] +[2025-08-18 17:04:08,237][02847] Updated weights for policy 0, policy_version 970 (0.0012) +[2025-08-18 17:04:12,421][02710] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3596.2). Total num frames: 3985408. Throughput: 0: 905.3. Samples: 996238. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:12,424][02710] Avg episode reward: [(0, '20.233')] +[2025-08-18 17:04:17,421][02710] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3596.1). Total num frames: 4001792. Throughput: 0: 896.9. Samples: 1001134. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) +[2025-08-18 17:04:17,423][02710] Avg episode reward: [(0, '20.226')] +[2025-08-18 17:04:17,814][02834] Stopping Batcher_0... +[2025-08-18 17:04:17,816][02710] Component Batcher_0 stopped! +[2025-08-18 17:04:17,816][02834] Loop batcher_evt_loop terminating... +[2025-08-18 17:04:17,821][02710] Component RolloutWorker_w0 process died already! Don't wait for it. +[2025-08-18 17:04:17,823][02710] Component RolloutWorker_w1 process died already! Don't wait for it. +[2025-08-18 17:04:17,815][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:17,825][02710] Component RolloutWorker_w5 process died already! Don't wait for it. +[2025-08-18 17:04:17,828][02710] Component RolloutWorker_w6 process died already! Don't wait for it. +[2025-08-18 17:04:17,834][02710] Component RolloutWorker_w7 process died already! Don't wait for it. +[2025-08-18 17:04:17,886][02847] Weights refcount: 2 0 +[2025-08-18 17:04:17,888][02710] Component InferenceWorker_p0-w0 stopped! +[2025-08-18 17:04:17,890][02847] Stopping InferenceWorker_p0-w0... +[2025-08-18 17:04:17,890][02847] Loop inference_proc0-0_evt_loop terminating... +[2025-08-18 17:04:17,935][02834] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000816_3342336.pth +[2025-08-18 17:04:17,944][02834] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:18,085][02710] Component RolloutWorker_w3 stopped! +[2025-08-18 17:04:18,086][02851] Stopping RolloutWorker_w3... +[2025-08-18 17:04:18,092][02834] Stopping LearnerWorker_p0... 
+[2025-08-18 17:04:18,092][02834] Loop learner_proc0_evt_loop terminating... +[2025-08-18 17:04:18,092][02710] Component LearnerWorker_p0 stopped! +[2025-08-18 17:04:18,090][02851] Loop rollout_proc3_evt_loop terminating... +[2025-08-18 17:04:18,236][02710] Component RolloutWorker_w4 stopped! +[2025-08-18 17:04:18,236][02853] Stopping RolloutWorker_w4... +[2025-08-18 17:04:18,251][02850] Stopping RolloutWorker_w2... +[2025-08-18 17:04:18,251][02710] Component RolloutWorker_w2 stopped! +[2025-08-18 17:04:18,253][02710] Waiting for process learner_proc0 to stop... +[2025-08-18 17:04:18,239][02853] Loop rollout_proc4_evt_loop terminating... +[2025-08-18 17:04:18,252][02850] Loop rollout_proc2_evt_loop terminating... +[2025-08-18 17:04:19,591][02710] Waiting for process inference_proc0-0 to join... +[2025-08-18 17:04:19,596][02710] Waiting for process rollout_proc0 to join... +[2025-08-18 17:04:19,597][02710] Waiting for process rollout_proc1 to join... +[2025-08-18 17:04:19,598][02710] Waiting for process rollout_proc2 to join... +[2025-08-18 17:04:20,329][02710] Waiting for process rollout_proc3 to join... +[2025-08-18 17:04:20,331][02710] Waiting for process rollout_proc4 to join... +[2025-08-18 17:04:20,333][02710] Waiting for process rollout_proc5 to join... +[2025-08-18 17:04:20,334][02710] Waiting for process rollout_proc6 to join... +[2025-08-18 17:04:20,334][02710] Waiting for process rollout_proc7 to join... +[2025-08-18 17:04:20,336][02710] Batcher 0 profile tree view: +batching: 21.0831, releasing_batches: 0.0261 +[2025-08-18 17:04:20,338][02710] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0035 + wait_policy_total: 426.4858 +update_model: 10.0812 + weight_update: 0.0013 +one_step: 0.0026 + handle_policy_step: 640.2106 + deserialize: 15.5210, stack: 4.1158, obs_to_device_normalize: 146.3139, forward: 339.0730, send_messages: 21.3403 + prepare_outputs: 85.7238 + to_cpu: 52.9879 +[2025-08-18 17:04:20,339][02710] Learner 0 profile tree view: +misc: 0.0048, prepare_batch: 11.8849 +train: 65.5446 + epoch_init: 0.0069, minibatch_init: 0.0068, losses_postprocess: 0.5636, kl_divergence: 0.5490, after_optimizer: 31.7256 + calculate_losses: 21.8246 + losses_init: 0.0044, forward_head: 1.2480, bptt_initial: 15.0406, tail: 0.8599, advantages_returns: 0.2057, losses: 2.6615 + bptt: 1.5943 + bptt_forward_core: 1.5325 + update: 10.4315 + clip: 0.9332 +[2025-08-18 17:04:20,340][02710] Loop Runner_EvtLoop terminating... +[2025-08-18 17:04:20,341][02710] Runner profile tree view: +main_loop: 1142.6643 +[2025-08-18 17:04:20,342][02710] Collected {0: 4005888}, FPS: 3505.7 +[2025-08-18 17:04:20,657][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:04:20,658][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:04:20,659][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:04:20,661][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:04:20,662][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:04:20,663][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:04:20,664][02710] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:04:20,666][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! 
+[2025-08-18 17:04:20,667][02710] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-18 17:04:20,669][02710] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-18 17:04:20,670][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:04:20,670][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:04:20,671][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:04:20,672][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:04:20,673][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:04:20,701][02710] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-18 17:04:20,704][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:04:20,706][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:04:20,721][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:04:20,817][02710] Conv encoder output size: 512 +[2025-08-18 17:04:20,819][02710] Policy head output size: 512 +[2025-08-18 17:04:20,976][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:20,979][02710] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:04:20,982][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-08-18 17:04:20,985][02710] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:04:20,986][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:04:20,988][02710] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
+[2025-08-18 17:07:55,828][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:07:55,829][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:07:55,830][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:07:55,831][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:07:55,832][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:07:55,833][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:07:55,833][02710] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:07:55,834][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-18 17:07:55,835][02710] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-18 17:07:55,836][02710] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-18 17:07:55,837][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:07:55,838][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:07:55,838][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:07:55,839][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:07:55,840][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:07:55,870][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:07:55,871][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:07:55,881][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:07:55,919][02710] Conv encoder output size: 512 +[2025-08-18 17:07:55,922][02710] Policy head output size: 512 +[2025-08-18 17:07:55,950][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:07:55,953][02710] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:07:55,955][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:07:55,957][02710] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:07:55,958][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:07:55,960][02710] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-18 17:11:24,695][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:11:24,696][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:11:24,697][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:11:24,698][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:11:24,699][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:11:24,699][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:11:24,700][02710] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:11:24,701][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-18 17:11:24,702][02710] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-18 17:11:24,703][02710] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-18 17:11:24,704][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:11:24,704][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:11:24,705][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:11:24,706][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:11:24,707][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:11:24,734][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:11:24,735][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:11:24,746][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:11:24,778][02710] Conv encoder output size: 512 +[2025-08-18 17:11:24,779][02710] Policy head output size: 512 +[2025-08-18 17:11:24,798][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:11:25,526][02710] Num frames 100... +[2025-08-18 17:11:25,656][02710] Num frames 200... +[2025-08-18 17:11:25,781][02710] Num frames 300... +[2025-08-18 17:11:25,914][02710] Num frames 400... +[2025-08-18 17:11:26,043][02710] Num frames 500... +[2025-08-18 17:11:26,180][02710] Num frames 600... +[2025-08-18 17:11:26,318][02710] Num frames 700... +[2025-08-18 17:11:26,449][02710] Num frames 800... +[2025-08-18 17:11:26,577][02710] Num frames 900... +[2025-08-18 17:11:26,705][02710] Num frames 1000... +[2025-08-18 17:11:26,831][02710] Num frames 1100... +[2025-08-18 17:11:26,957][02710] Num frames 1200... +[2025-08-18 17:11:27,086][02710] Num frames 1300... +[2025-08-18 17:11:27,216][02710] Num frames 1400... +[2025-08-18 17:11:27,353][02710] Num frames 1500... +[2025-08-18 17:11:27,482][02710] Num frames 1600... +[2025-08-18 17:11:27,614][02710] Num frames 1700... +[2025-08-18 17:11:27,744][02710] Num frames 1800... +[2025-08-18 17:11:27,873][02710] Num frames 1900... 
+[2025-08-18 17:11:28,005][02710] Num frames 2000... +[2025-08-18 17:11:28,140][02710] Num frames 2100... +[2025-08-18 17:11:28,192][02710] Avg episode rewards: #0: 58.999, true rewards: #0: 21.000 +[2025-08-18 17:11:28,192][02710] Avg episode reward: 58.999, avg true_objective: 21.000 +[2025-08-18 17:11:28,326][02710] Num frames 2200... +[2025-08-18 17:11:28,451][02710] Num frames 2300... +[2025-08-18 17:11:28,579][02710] Num frames 2400... +[2025-08-18 17:11:28,708][02710] Num frames 2500... +[2025-08-18 17:11:28,837][02710] Num frames 2600... +[2025-08-18 17:11:28,963][02710] Num frames 2700... +[2025-08-18 17:11:29,091][02710] Num frames 2800... +[2025-08-18 17:11:29,219][02710] Num frames 2900... +[2025-08-18 17:11:29,365][02710] Num frames 3000... +[2025-08-18 17:11:29,496][02710] Num frames 3100... +[2025-08-18 17:11:29,629][02710] Num frames 3200... +[2025-08-18 17:11:29,757][02710] Num frames 3300... +[2025-08-18 17:11:29,892][02710] Num frames 3400... +[2025-08-18 17:11:30,027][02710] Num frames 3500... +[2025-08-18 17:11:30,163][02710] Num frames 3600... +[2025-08-18 17:11:30,297][02710] Num frames 3700... +[2025-08-18 17:11:30,446][02710] Num frames 3800... +[2025-08-18 17:11:30,573][02710] Num frames 3900... +[2025-08-18 17:11:30,700][02710] Num frames 4000... +[2025-08-18 17:11:30,824][02710] Num frames 4100... +[2025-08-18 17:11:30,955][02710] Num frames 4200... +[2025-08-18 17:11:31,007][02710] Avg episode rewards: #0: 53.999, true rewards: #0: 21.000 +[2025-08-18 17:11:31,008][02710] Avg episode reward: 53.999, avg true_objective: 21.000 +[2025-08-18 17:11:31,138][02710] Num frames 4300... +[2025-08-18 17:11:31,268][02710] Num frames 4400... +[2025-08-18 17:11:31,409][02710] Num frames 4500... +[2025-08-18 17:11:31,542][02710] Num frames 4600... +[2025-08-18 17:11:31,675][02710] Num frames 4700... +[2025-08-18 17:11:31,802][02710] Num frames 4800... +[2025-08-18 17:11:31,929][02710] Num frames 4900... +[2025-08-18 17:11:32,055][02710] Num frames 5000... +[2025-08-18 17:11:32,183][02710] Num frames 5100... +[2025-08-18 17:11:32,313][02710] Num frames 5200... +[2025-08-18 17:11:32,454][02710] Num frames 5300... +[2025-08-18 17:11:32,579][02710] Num frames 5400... +[2025-08-18 17:11:32,706][02710] Num frames 5500... +[2025-08-18 17:11:32,836][02710] Num frames 5600... +[2025-08-18 17:11:32,969][02710] Num frames 5700... +[2025-08-18 17:11:33,077][02710] Avg episode rewards: #0: 48.133, true rewards: #0: 19.133 +[2025-08-18 17:11:33,078][02710] Avg episode reward: 48.133, avg true_objective: 19.133 +[2025-08-18 17:11:33,155][02710] Num frames 5800... +[2025-08-18 17:11:33,281][02710] Num frames 5900... +[2025-08-18 17:11:33,411][02710] Num frames 6000... +[2025-08-18 17:11:33,551][02710] Num frames 6100... +[2025-08-18 17:11:33,680][02710] Num frames 6200... +[2025-08-18 17:11:33,808][02710] Num frames 6300... +[2025-08-18 17:11:33,938][02710] Num frames 6400... +[2025-08-18 17:11:34,090][02710] Avg episode rewards: #0: 40.189, true rewards: #0: 16.190 +[2025-08-18 17:11:34,091][02710] Avg episode reward: 40.189, avg true_objective: 16.190 +[2025-08-18 17:11:34,127][02710] Num frames 6500... +[2025-08-18 17:11:34,251][02710] Num frames 6600... +[2025-08-18 17:11:34,381][02710] Num frames 6700... +[2025-08-18 17:11:34,522][02710] Num frames 6800... +[2025-08-18 17:11:34,651][02710] Num frames 6900... +[2025-08-18 17:11:34,823][02710] Num frames 7000... +[2025-08-18 17:11:34,995][02710] Num frames 7100... +[2025-08-18 17:11:35,161][02710] Num frames 7200... 
+[2025-08-18 17:11:35,326][02710] Num frames 7300... +[2025-08-18 17:11:35,493][02710] Num frames 7400... +[2025-08-18 17:11:35,677][02710] Num frames 7500... +[2025-08-18 17:11:35,844][02710] Num frames 7600... +[2025-08-18 17:11:35,948][02710] Avg episode rewards: #0: 37.456, true rewards: #0: 15.256 +[2025-08-18 17:11:35,951][02710] Avg episode reward: 37.456, avg true_objective: 15.256 +[2025-08-18 17:11:36,075][02710] Num frames 7700... +[2025-08-18 17:11:36,245][02710] Num frames 7800... +[2025-08-18 17:11:36,418][02710] Num frames 7900... +[2025-08-18 17:11:36,595][02710] Num frames 8000... +[2025-08-18 17:11:36,781][02710] Num frames 8100... +[2025-08-18 17:11:36,925][02710] Num frames 8200... +[2025-08-18 17:11:37,055][02710] Num frames 8300... +[2025-08-18 17:11:37,106][02710] Avg episode rewards: #0: 33.333, true rewards: #0: 13.833 +[2025-08-18 17:11:37,107][02710] Avg episode reward: 33.333, avg true_objective: 13.833 +[2025-08-18 17:11:37,233][02710] Num frames 8400... +[2025-08-18 17:11:37,364][02710] Num frames 8500... +[2025-08-18 17:11:37,491][02710] Num frames 8600... +[2025-08-18 17:11:37,621][02710] Num frames 8700... +[2025-08-18 17:11:37,761][02710] Num frames 8800... +[2025-08-18 17:11:37,893][02710] Num frames 8900... +[2025-08-18 17:11:38,029][02710] Num frames 9000... +[2025-08-18 17:11:38,161][02710] Avg episode rewards: #0: 30.653, true rewards: #0: 12.939 +[2025-08-18 17:11:38,162][02710] Avg episode reward: 30.653, avg true_objective: 12.939 +[2025-08-18 17:11:38,218][02710] Num frames 9100... +[2025-08-18 17:11:38,346][02710] Num frames 9200... +[2025-08-18 17:11:38,474][02710] Num frames 9300... +[2025-08-18 17:11:38,601][02710] Num frames 9400... +[2025-08-18 17:11:38,737][02710] Num frames 9500... +[2025-08-18 17:11:38,800][02710] Avg episode rewards: #0: 27.506, true rewards: #0: 11.881 +[2025-08-18 17:11:38,801][02710] Avg episode reward: 27.506, avg true_objective: 11.881 +[2025-08-18 17:11:38,924][02710] Num frames 9600... +[2025-08-18 17:11:39,051][02710] Num frames 9700... +[2025-08-18 17:11:39,174][02710] Num frames 9800... +[2025-08-18 17:11:39,321][02710] Num frames 9900... +[2025-08-18 17:11:39,448][02710] Num frames 10000... +[2025-08-18 17:11:39,559][02710] Avg episode rewards: #0: 25.602, true rewards: #0: 11.158 +[2025-08-18 17:11:39,560][02710] Avg episode reward: 25.602, avg true_objective: 11.158 +[2025-08-18 17:11:39,635][02710] Num frames 10100... +[2025-08-18 17:11:39,771][02710] Num frames 10200... +[2025-08-18 17:11:39,899][02710] Num frames 10300... +[2025-08-18 17:11:40,025][02710] Num frames 10400... +[2025-08-18 17:11:40,153][02710] Num frames 10500... +[2025-08-18 17:11:40,276][02710] Num frames 10600... +[2025-08-18 17:11:40,402][02710] Num frames 10700... +[2025-08-18 17:11:40,528][02710] Num frames 10800... +[2025-08-18 17:11:40,653][02710] Num frames 10900... +[2025-08-18 17:11:40,791][02710] Num frames 11000... +[2025-08-18 17:11:40,919][02710] Num frames 11100... +[2025-08-18 17:11:41,045][02710] Num frames 11200... +[2025-08-18 17:11:41,172][02710] Num frames 11300... +[2025-08-18 17:11:41,297][02710] Num frames 11400... +[2025-08-18 17:11:41,426][02710] Num frames 11500... +[2025-08-18 17:11:41,554][02710] Num frames 11600... +[2025-08-18 17:11:41,688][02710] Num frames 11700... +[2025-08-18 17:11:41,831][02710] Num frames 11800... 
+[2025-08-18 17:11:41,932][02710] Avg episode rewards: #0: 28.334, true rewards: #0: 11.834 +[2025-08-18 17:11:41,933][02710] Avg episode reward: 28.334, avg true_objective: 11.834 +[2025-08-18 17:12:56,203][02710] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-08-18 17:19:37,465][02710] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-18 17:19:37,466][02710] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-18 17:19:37,467][02710] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-18 17:19:37,467][02710] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-18 17:19:37,468][02710] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-18 17:19:37,469][02710] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-18 17:19:37,470][02710] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-18 17:19:37,471][02710] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-18 17:19:37,472][02710] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-18 17:19:37,472][02710] Adding new argument 'hf_repository'='Nikhil058/vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-18 17:19:37,473][02710] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-18 17:19:37,474][02710] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-18 17:19:37,475][02710] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-18 17:19:37,476][02710] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-18 17:19:37,477][02710] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-18 17:19:37,505][02710] RunningMeanStd input shape: (3, 72, 128) +[2025-08-18 17:19:37,507][02710] RunningMeanStd input shape: (1,) +[2025-08-18 17:19:37,519][02710] ConvEncoder: input_channels=3 +[2025-08-18 17:19:37,554][02710] Conv encoder output size: 512 +[2025-08-18 17:19:37,555][02710] Policy head output size: 512 +[2025-08-18 17:19:37,574][02710] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-18 17:19:38,037][02710] Num frames 100... +[2025-08-18 17:19:38,170][02710] Num frames 200... +[2025-08-18 17:19:38,300][02710] Num frames 300... +[2025-08-18 17:19:38,437][02710] Num frames 400... +[2025-08-18 17:19:38,569][02710] Num frames 500... +[2025-08-18 17:19:38,702][02710] Num frames 600... +[2025-08-18 17:19:38,845][02710] Num frames 700... +[2025-08-18 17:19:38,975][02710] Num frames 800... +[2025-08-18 17:19:39,102][02710] Num frames 900... +[2025-08-18 17:19:39,233][02710] Num frames 1000... +[2025-08-18 17:19:39,358][02710] Num frames 1100... +[2025-08-18 17:19:39,490][02710] Num frames 1200... +[2025-08-18 17:19:39,621][02710] Num frames 1300... +[2025-08-18 17:19:39,750][02710] Num frames 1400... +[2025-08-18 17:19:39,893][02710] Num frames 1500... +[2025-08-18 17:19:40,020][02710] Num frames 1600... 
+[2025-08-18 17:19:40,156][02710] Avg episode rewards: #0: 43.650, true rewards: #0: 16.650
+[2025-08-18 17:19:40,157][02710] Avg episode reward: 43.650, avg true_objective: 16.650
+[2025-08-18 17:19:40,204][02710] Num frames 1700...
+[2025-08-18 17:19:40,328][02710] Num frames 1800...
+[2025-08-18 17:19:40,452][02710] Num frames 1900...
+[2025-08-18 17:19:40,580][02710] Num frames 2000...
+[2025-08-18 17:19:40,709][02710] Num frames 2100...
+[2025-08-18 17:19:40,835][02710] Num frames 2200...
+[2025-08-18 17:19:40,969][02710] Num frames 2300...
+[2025-08-18 17:19:41,106][02710] Num frames 2400...
+[2025-08-18 17:19:41,283][02710] Num frames 2500...
+[2025-08-18 17:19:41,455][02710] Num frames 2600...
+[2025-08-18 17:19:41,622][02710] Num frames 2700...
+[2025-08-18 17:19:41,790][02710] Num frames 2800...
+[2025-08-18 17:19:41,877][02710] Avg episode rewards: #0: 35.585, true rewards: #0: 14.085
+[2025-08-18 17:19:41,878][02710] Avg episode reward: 35.585, avg true_objective: 14.085
+[2025-08-18 17:19:42,024][02710] Num frames 2900...
+[2025-08-18 17:19:42,190][02710] Num frames 3000...
+[2025-08-18 17:19:42,352][02710] Num frames 3100...
+[2025-08-18 17:19:42,524][02710] Num frames 3200...
+[2025-08-18 17:19:42,704][02710] Num frames 3300...
+[2025-08-18 17:19:42,873][02710] Num frames 3400...
+[2025-08-18 17:19:43,031][02710] Avg episode rewards: #0: 27.857, true rewards: #0: 11.523
+[2025-08-18 17:19:43,033][02710] Avg episode reward: 27.857, avg true_objective: 11.523
+[2025-08-18 17:19:43,115][02710] Num frames 3500...
+[2025-08-18 17:19:43,303][02710] Avg episode rewards: #0: 21.213, true rewards: #0: 8.962
+[2025-08-18 17:19:43,304][02710] Avg episode reward: 21.213, avg true_objective: 8.962
+[2025-08-18 17:19:43,324][02710] Num frames 3600...
+[2025-08-18 17:19:43,450][02710] Num frames 3700...
+[2025-08-18 17:19:43,581][02710] Num frames 3800...
+[2025-08-18 17:19:43,710][02710] Num frames 3900...
+[2025-08-18 17:19:43,840][02710] Num frames 4000...
+[2025-08-18 17:19:43,970][02710] Num frames 4100...
+[2025-08-18 17:19:44,115][02710] Num frames 4200...
+[2025-08-18 17:19:44,245][02710] Num frames 4300...
+[2025-08-18 17:19:44,375][02710] Num frames 4400...
+[2025-08-18 17:19:44,501][02710] Num frames 4500...
+[2025-08-18 17:19:44,629][02710] Num frames 4600...
+[2025-08-18 17:19:44,719][02710] Avg episode rewards: #0: 21.050, true rewards: #0: 9.250
+[2025-08-18 17:19:44,720][02710] Avg episode reward: 21.050, avg true_objective: 9.250
+[2025-08-18 17:19:44,824][02710] Num frames 4700...
+[2025-08-18 17:19:44,948][02710] Num frames 4800...
+[2025-08-18 17:19:45,081][02710] Num frames 4900...
+[2025-08-18 17:19:45,209][02710] Num frames 5000...
+[2025-08-18 17:19:45,361][02710] Avg episode rewards: #0: 18.955, true rewards: #0: 8.455
+[2025-08-18 17:19:45,362][02710] Avg episode reward: 18.955, avg true_objective: 8.455
+[2025-08-18 17:19:45,397][02710] Num frames 5100...
+[2025-08-18 17:19:45,522][02710] Num frames 5200...
+[2025-08-18 17:19:45,653][02710] Num frames 5300...
+[2025-08-18 17:19:45,779][02710] Num frames 5400...
+[2025-08-18 17:19:45,910][02710] Num frames 5500...
+[2025-08-18 17:19:46,039][02710] Num frames 5600...
+[2025-08-18 17:19:46,185][02710] Num frames 5700...
+[2025-08-18 17:19:46,315][02710] Num frames 5800...
+[2025-08-18 17:19:46,446][02710] Num frames 5900...
+[2025-08-18 17:19:46,572][02710] Num frames 6000...
+[2025-08-18 17:19:46,702][02710] Num frames 6100...
+[2025-08-18 17:19:46,829][02710] Num frames 6200...
+[2025-08-18 17:19:46,955][02710] Num frames 6300...
+[2025-08-18 17:19:47,084][02710] Num frames 6400...
+[2025-08-18 17:19:47,240][02710] Num frames 6500...
+[2025-08-18 17:19:47,366][02710] Num frames 6600...
+[2025-08-18 17:19:47,502][02710] Num frames 6700...
+[2025-08-18 17:19:47,652][02710] Avg episode rewards: #0: 22.533, true rewards: #0: 9.676
+[2025-08-18 17:19:47,653][02710] Avg episode reward: 22.533, avg true_objective: 9.676
+[2025-08-18 17:19:47,690][02710] Num frames 6800...
+[2025-08-18 17:19:47,815][02710] Num frames 6900...
+[2025-08-18 17:19:47,941][02710] Num frames 7000...
+[2025-08-18 17:19:48,068][02710] Num frames 7100...
+[2025-08-18 17:19:48,207][02710] Num frames 7200...
+[2025-08-18 17:19:48,338][02710] Num frames 7300...
+[2025-08-18 17:19:48,466][02710] Num frames 7400...
+[2025-08-18 17:19:48,593][02710] Num frames 7500...
+[2025-08-18 17:19:48,723][02710] Num frames 7600...
+[2025-08-18 17:19:48,853][02710] Num frames 7700...
+[2025-08-18 17:19:48,981][02710] Num frames 7800...
+[2025-08-18 17:19:49,112][02710] Num frames 7900...
+[2025-08-18 17:19:49,251][02710] Num frames 8000...
+[2025-08-18 17:19:49,382][02710] Num frames 8100...
+[2025-08-18 17:19:49,510][02710] Num frames 8200...
+[2025-08-18 17:19:49,691][02710] Avg episode rewards: #0: 23.871, true rewards: #0: 10.371
+[2025-08-18 17:19:49,692][02710] Avg episode reward: 23.871, avg true_objective: 10.371
+[2025-08-18 17:19:49,697][02710] Num frames 8300...
+[2025-08-18 17:19:49,827][02710] Num frames 8400...
+[2025-08-18 17:19:49,954][02710] Num frames 8500...
+[2025-08-18 17:19:50,079][02710] Num frames 8600...
+[2025-08-18 17:19:50,209][02710] Num frames 8700...
+[2025-08-18 17:19:50,347][02710] Num frames 8800...
+[2025-08-18 17:19:50,480][02710] Num frames 8900...
+[2025-08-18 17:19:50,607][02710] Num frames 9000...
+[2025-08-18 17:19:50,666][02710] Avg episode rewards: #0: 22.668, true rewards: #0: 10.001
+[2025-08-18 17:19:50,667][02710] Avg episode reward: 22.668, avg true_objective: 10.001
+[2025-08-18 17:19:50,792][02710] Num frames 9100...
+[2025-08-18 17:19:50,921][02710] Num frames 9200...
+[2025-08-18 17:19:51,046][02710] Num frames 9300...
+[2025-08-18 17:19:51,171][02710] Num frames 9400...
+[2025-08-18 17:19:51,311][02710] Num frames 9500...
+[2025-08-18 17:19:51,437][02710] Num frames 9600...
+[2025-08-18 17:19:51,562][02710] Num frames 9700...
+[2025-08-18 17:19:51,689][02710] Num frames 9800...
+[2025-08-18 17:19:51,826][02710] Avg episode rewards: #0: 22.065, true rewards: #0: 9.865
+[2025-08-18 17:19:51,827][02710] Avg episode reward: 22.065, avg true_objective: 9.865
+[2025-08-18 17:20:53,325][02710] Replay video saved to /content/train_dir/default_experiment/replay.mp4!