diff --git "a/sf_log.txt" "b/sf_log.txt" new file mode 100644--- /dev/null +++ "b/sf_log.txt" @@ -0,0 +1,1204 @@ +[2025-08-01 17:28:11,657][02698] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-08-01 17:28:11,659][02698] Rollout worker 0 uses device cpu +[2025-08-01 17:28:11,660][02698] Rollout worker 1 uses device cpu +[2025-08-01 17:28:11,662][02698] Rollout worker 2 uses device cpu +[2025-08-01 17:28:11,663][02698] Rollout worker 3 uses device cpu +[2025-08-01 17:28:11,664][02698] Rollout worker 4 uses device cpu +[2025-08-01 17:28:11,665][02698] Rollout worker 5 uses device cpu +[2025-08-01 17:28:11,666][02698] Rollout worker 6 uses device cpu +[2025-08-01 17:28:11,667][02698] Rollout worker 7 uses device cpu +[2025-08-01 17:28:11,804][02698] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:11,804][02698] InferenceWorker_p0-w0: min num requests: 2 +[2025-08-01 17:28:11,833][02698] Starting all processes... +[2025-08-01 17:28:11,834][02698] Starting process learner_proc0 +[2025-08-01 17:28:11,887][02698] Starting all processes... +[2025-08-01 17:28:11,894][02698] Starting process inference_proc0-0 +[2025-08-01 17:28:11,895][02698] Starting process rollout_proc0 +[2025-08-01 17:28:11,895][02698] Starting process rollout_proc1 +[2025-08-01 17:28:11,895][02698] Starting process rollout_proc2 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc3 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc4 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc5 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc6 +[2025-08-01 17:28:11,896][02698] Starting process rollout_proc7 +[2025-08-01 17:28:27,931][02854] Worker 1 uses CPU cores [1] +[2025-08-01 17:28:28,115][02856] Worker 3 uses CPU cores [1] +[2025-08-01 17:28:28,182][02858] Worker 5 uses CPU cores [1] +[2025-08-01 17:28:28,321][02860] Worker 7 uses CPU cores [1] +[2025-08-01 17:28:28,520][02855] Worker 2 uses CPU cores [0] +[2025-08-01 17:28:28,519][02857] Worker 4 uses CPU cores [0] +[2025-08-01 17:28:28,572][02853] Worker 0 uses CPU cores [0] +[2025-08-01 17:28:28,739][02835] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:28,739][02835] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-01 17:28:28,741][02859] Worker 6 uses CPU cores [0] +[2025-08-01 17:28:28,755][02852] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:28,756][02852] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-01 17:28:28,773][02835] Num visible devices: 1 +[2025-08-01 17:28:28,775][02835] Starting seed is not provided +[2025-08-01 17:28:28,775][02835] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-01 17:28:28,775][02835] Initializing actor-critic model on device cuda:0 +[2025-08-01 17:28:28,776][02835] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:28:28,780][02835] RunningMeanStd input shape: (1,) +[2025-08-01 17:28:28,791][02852] Num visible devices: 1 +[2025-08-01 17:28:28,799][02835] ConvEncoder: input_channels=3 +[2025-08-01 17:28:29,140][02835] Conv encoder output size: 512 +[2025-08-01 17:28:29,141][02835] Policy head output size: 512 +[2025-08-01 17:28:29,206][02835] Created Actor Critic model with architecture: +[2025-08-01 17:28:29,207][02835] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
+      (running_mean_std): ModuleDict(
+        (obs): RunningMeanStdInPlace()
+      )
+    )
+  )
+  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+  (encoder): VizdoomEncoder(
+    (basic_encoder): ConvEncoder(
+      (enc): RecursiveScriptModule(
+        original_name=ConvEncoderImpl
+        (conv_head): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Conv2d)
+          (1): RecursiveScriptModule(original_name=ELU)
+          (2): RecursiveScriptModule(original_name=Conv2d)
+          (3): RecursiveScriptModule(original_name=ELU)
+          (4): RecursiveScriptModule(original_name=Conv2d)
+          (5): RecursiveScriptModule(original_name=ELU)
+        )
+        (mlp_layers): RecursiveScriptModule(
+          original_name=Sequential
+          (0): RecursiveScriptModule(original_name=Linear)
+          (1): RecursiveScriptModule(original_name=ELU)
+        )
+      )
+    )
+  )
+  (core): ModelCoreRNN(
+    (core): GRU(512, 512)
+  )
+  (decoder): MlpDecoder(
+    (mlp): Identity()
+  )
+  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+  (action_parameterization): ActionParameterizationDefault(
+    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+  )
+)
+[2025-08-01 17:28:29,535][02835] Using optimizer
+[2025-08-01 17:28:31,797][02698] Heartbeat connected on Batcher_0
+[2025-08-01 17:28:31,804][02698] Heartbeat connected on InferenceWorker_p0-w0
+[2025-08-01 17:28:31,810][02698] Heartbeat connected on RolloutWorker_w0
+[2025-08-01 17:28:31,817][02698] Heartbeat connected on RolloutWorker_w2
+[2025-08-01 17:28:31,819][02698] Heartbeat connected on RolloutWorker_w1
+[2025-08-01 17:28:31,821][02698] Heartbeat connected on RolloutWorker_w3
+[2025-08-01 17:28:31,824][02698] Heartbeat connected on RolloutWorker_w4
+[2025-08-01 17:28:31,827][02698] Heartbeat connected on RolloutWorker_w5
+[2025-08-01 17:28:31,834][02698] Heartbeat connected on RolloutWorker_w7
+[2025-08-01 17:28:31,835][02698] Heartbeat connected on RolloutWorker_w6
+[2025-08-01 17:28:34,347][02835] No checkpoints found
+[2025-08-01 17:28:34,347][02835] Did not load from checkpoint, starting from scratch!
+[2025-08-01 17:28:34,347][02835] Initialized policy 0 weights for model version 0
+[2025-08-01 17:28:34,350][02835] LearnerWorker_p0 finished initialization!
+[2025-08-01 17:28:34,351][02835] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-08-01 17:28:34,351][02698] Heartbeat connected on LearnerWorker_p0
+[2025-08-01 17:28:34,493][02852] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:28:34,494][02852] RunningMeanStd input shape: (1,)
+[2025-08-01 17:28:34,505][02852] ConvEncoder: input_channels=3
+[2025-08-01 17:28:34,606][02852] Conv encoder output size: 512
+[2025-08-01 17:28:34,606][02852] Policy head output size: 512
+[2025-08-01 17:28:34,644][02698] Inference worker 0-0 is ready!
+[2025-08-01 17:28:34,645][02698] All inference workers are ready! Signal rollout workers to start!
+[2025-08-01 17:28:34,932][02855] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,934][02859] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,944][02856] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,992][02857] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:34,999][02853] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:35,023][02860] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:35,024][02854] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:35,043][02858] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:28:36,088][02698] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-01 17:28:36,305][02856] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,307][02860] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,307][02855] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,305][02859] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,309][02854] Decorrelating experience for 0 frames... +[2025-08-01 17:28:36,704][02855] Decorrelating experience for 32 frames... +[2025-08-01 17:28:37,381][02858] Decorrelating experience for 0 frames... +[2025-08-01 17:28:37,408][02860] Decorrelating experience for 32 frames... +[2025-08-01 17:28:37,410][02854] Decorrelating experience for 32 frames... +[2025-08-01 17:28:37,414][02856] Decorrelating experience for 32 frames... +[2025-08-01 17:28:38,241][02859] Decorrelating experience for 32 frames... +[2025-08-01 17:28:38,243][02857] Decorrelating experience for 0 frames... +[2025-08-01 17:28:38,896][02858] Decorrelating experience for 32 frames... +[2025-08-01 17:28:39,397][02860] Decorrelating experience for 64 frames... +[2025-08-01 17:28:39,399][02854] Decorrelating experience for 64 frames... +[2025-08-01 17:28:40,157][02857] Decorrelating experience for 32 frames... +[2025-08-01 17:28:40,166][02855] Decorrelating experience for 64 frames... +[2025-08-01 17:28:40,177][02853] Decorrelating experience for 0 frames... +[2025-08-01 17:28:40,800][02858] Decorrelating experience for 64 frames... +[2025-08-01 17:28:40,955][02860] Decorrelating experience for 96 frames... +[2025-08-01 17:28:40,965][02854] Decorrelating experience for 96 frames... +[2025-08-01 17:28:41,088][02698] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-01 17:28:42,345][02853] Decorrelating experience for 32 frames... +[2025-08-01 17:28:42,351][02859] Decorrelating experience for 64 frames... +[2025-08-01 17:28:43,011][02857] Decorrelating experience for 64 frames... +[2025-08-01 17:28:43,361][02855] Decorrelating experience for 96 frames... +[2025-08-01 17:28:44,657][02859] Decorrelating experience for 96 frames... +[2025-08-01 17:28:45,485][02857] Decorrelating experience for 96 frames... +[2025-08-01 17:28:45,674][02858] Decorrelating experience for 96 frames... +[2025-08-01 17:28:45,695][02856] Decorrelating experience for 64 frames... +[2025-08-01 17:28:46,088][02698] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 19.8. Samples: 198. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-01 17:28:46,089][02698] Avg episode reward: [(0, '2.655')] +[2025-08-01 17:28:47,822][02835] Signal inference workers to stop experience collection... +[2025-08-01 17:28:47,842][02852] InferenceWorker_p0-w0: stopping experience collection +[2025-08-01 17:28:47,894][02853] Decorrelating experience for 64 frames... +[2025-08-01 17:28:48,118][02856] Decorrelating experience for 96 frames... +[2025-08-01 17:28:48,508][02853] Decorrelating experience for 96 frames... +[2025-08-01 17:28:49,321][02835] Signal inference workers to resume experience collection... +[2025-08-01 17:28:49,322][02852] InferenceWorker_p0-w0: resuming experience collection +[2025-08-01 17:28:51,088][02698] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 168.3. Samples: 2524. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-08-01 17:28:51,090][02698] Avg episode reward: [(0, '3.087')] +[2025-08-01 17:28:56,090][02698] Fps is (10 sec: 3276.0, 60 sec: 1638.2, 300 sec: 1638.2). Total num frames: 32768. Throughput: 0: 435.8. Samples: 8718. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:28:56,095][02698] Avg episode reward: [(0, '3.684')] +[2025-08-01 17:28:58,924][02852] Updated weights for policy 0, policy_version 10 (0.0100) +[2025-08-01 17:29:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 425.1. Samples: 10628. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:29:01,092][02698] Avg episode reward: [(0, '4.146')] +[2025-08-01 17:29:06,088][02698] Fps is (10 sec: 3687.3, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 566.3. Samples: 16990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:29:06,092][02698] Avg episode reward: [(0, '4.383')] +[2025-08-01 17:29:08,241][02852] Updated weights for policy 0, policy_version 20 (0.0021) +[2025-08-01 17:29:11,093][02698] Fps is (10 sec: 4093.7, 60 sec: 2574.2, 300 sec: 2574.2). Total num frames: 90112. Throughput: 0: 659.3. Samples: 23078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:11,095][02698] Avg episode reward: [(0, '4.390')] +[2025-08-01 17:29:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 623.4. Samples: 24934. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:29:16,089][02698] Avg episode reward: [(0, '4.256')] +[2025-08-01 17:29:16,097][02835] Saving new best policy, reward=4.256! +[2025-08-01 17:29:19,712][02852] Updated weights for policy 0, policy_version 30 (0.0015) +[2025-08-01 17:29:21,088][02698] Fps is (10 sec: 3688.5, 60 sec: 2821.7, 300 sec: 2821.7). Total num frames: 126976. Throughput: 0: 694.6. Samples: 31256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:29:21,089][02698] Avg episode reward: [(0, '4.311')] +[2025-08-01 17:29:21,093][02835] Saving new best policy, reward=4.311! +[2025-08-01 17:29:26,088][02698] Fps is (10 sec: 4096.0, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 829.0. Samples: 37304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:29:26,089][02698] Avg episode reward: [(0, '4.300')] +[2025-08-01 17:29:26,104][02835] Saving new best policy, reward=4.352! +[2025-08-01 17:29:30,815][02852] Updated weights for policy 0, policy_version 40 (0.0033) +[2025-08-01 17:29:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 2978.9, 300 sec: 2978.9). 
Total num frames: 163840. Throughput: 0: 867.2. Samples: 39220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:31,089][02698] Avg episode reward: [(0, '4.388')] +[2025-08-01 17:29:31,093][02835] Saving new best policy, reward=4.388! +[2025-08-01 17:29:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 958.7. Samples: 45664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:36,089][02698] Avg episode reward: [(0, '4.505')] +[2025-08-01 17:29:36,096][02835] Saving new best policy, reward=4.492! +[2025-08-01 17:29:41,089][02698] Fps is (10 sec: 2866.8, 60 sec: 3208.5, 300 sec: 2961.7). Total num frames: 192512. Throughput: 0: 901.7. Samples: 49294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:29:41,102][02698] Avg episode reward: [(0, '4.531')] +[2025-08-01 17:29:41,107][02835] Saving new best policy, reward=4.531! +[2025-08-01 17:29:44,705][02852] Updated weights for policy 0, policy_version 50 (0.0021) +[2025-08-01 17:29:46,088][02698] Fps is (10 sec: 2457.6, 60 sec: 3481.6, 300 sec: 2984.2). Total num frames: 208896. Throughput: 0: 891.6. Samples: 50750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:46,092][02698] Avg episode reward: [(0, '4.485')] +[2025-08-01 17:29:51,088][02698] Fps is (10 sec: 3686.9, 60 sec: 3618.1, 300 sec: 3058.3). Total num frames: 229376. Throughput: 0: 894.0. Samples: 57218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:51,091][02698] Avg episode reward: [(0, '4.324')] +[2025-08-01 17:29:53,952][02852] Updated weights for policy 0, policy_version 60 (0.0015) +[2025-08-01 17:29:56,088][02698] Fps is (10 sec: 4095.9, 60 sec: 3618.3, 300 sec: 3123.2). Total num frames: 249856. Throughput: 0: 892.4. Samples: 63232. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:29:56,092][02698] Avg episode reward: [(0, '4.333')] +[2025-08-01 17:30:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3132.2). Total num frames: 266240. Throughput: 0: 894.5. Samples: 65186. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:30:01,093][02698] Avg episode reward: [(0, '4.505')] +[2025-08-01 17:30:05,046][02852] Updated weights for policy 0, policy_version 70 (0.0014) +[2025-08-01 17:30:06,088][02698] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3231.3). Total num frames: 290816. Throughput: 0: 901.3. Samples: 71814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:06,091][02698] Avg episode reward: [(0, '4.588')] +[2025-08-01 17:30:06,098][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth... +[2025-08-01 17:30:06,205][02835] Saving new best policy, reward=4.588! +[2025-08-01 17:30:11,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3618.5, 300 sec: 3233.7). Total num frames: 307200. Throughput: 0: 889.2. Samples: 77318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:30:11,103][02698] Avg episode reward: [(0, '4.663')] +[2025-08-01 17:30:11,110][02835] Saving new best policy, reward=4.663! +[2025-08-01 17:30:16,089][02698] Fps is (10 sec: 3276.6, 60 sec: 3618.1, 300 sec: 3235.8). Total num frames: 323584. Throughput: 0: 889.4. Samples: 79244. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:30:16,090][02698] Avg episode reward: [(0, '4.595')] +[2025-08-01 17:30:16,390][02852] Updated weights for policy 0, policy_version 80 (0.0029) +[2025-08-01 17:30:21,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3315.8). Total num frames: 348160. Throughput: 0: 898.6. Samples: 86100. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:30:21,089][02698] Avg episode reward: [(0, '4.642')] +[2025-08-01 17:30:26,088][02698] Fps is (10 sec: 4096.2, 60 sec: 3618.1, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 943.2. Samples: 91736. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:26,093][02698] Avg episode reward: [(0, '4.642')] +[2025-08-01 17:30:27,388][02852] Updated weights for policy 0, policy_version 90 (0.0017) +[2025-08-01 17:30:31,095][02698] Fps is (10 sec: 3684.0, 60 sec: 3686.0, 300 sec: 3347.8). Total num frames: 385024. Throughput: 0: 963.6. Samples: 94120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:31,096][02698] Avg episode reward: [(0, '4.506')] +[2025-08-01 17:30:36,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3379.2). Total num frames: 405504. Throughput: 0: 972.2. Samples: 100968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:30:36,092][02698] Avg episode reward: [(0, '4.570')] +[2025-08-01 17:30:36,476][02852] Updated weights for policy 0, policy_version 100 (0.0025) +[2025-08-01 17:30:41,095][02698] Fps is (10 sec: 3686.3, 60 sec: 3822.6, 300 sec: 3374.9). Total num frames: 421888. Throughput: 0: 954.8. Samples: 106206. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:30:41,098][02698] Avg episode reward: [(0, '4.631')] +[2025-08-01 17:30:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3402.8). Total num frames: 442368. Throughput: 0: 970.1. Samples: 108840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:46,089][02698] Avg episode reward: [(0, '4.628')] +[2025-08-01 17:30:47,609][02852] Updated weights for policy 0, policy_version 110 (0.0015) +[2025-08-01 17:30:51,088][02698] Fps is (10 sec: 4098.8, 60 sec: 3891.2, 300 sec: 3428.5). Total num frames: 462848. Throughput: 0: 971.9. Samples: 115550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:51,089][02698] Avg episode reward: [(0, '4.722')] +[2025-08-01 17:30:51,092][02835] Saving new best policy, reward=4.722! +[2025-08-01 17:30:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3423.1). Total num frames: 479232. Throughput: 0: 959.9. Samples: 120514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:30:56,092][02698] Avg episode reward: [(0, '4.742')] +[2025-08-01 17:30:56,099][02835] Saving new best policy, reward=4.742! +[2025-08-01 17:30:58,672][02852] Updated weights for policy 0, policy_version 120 (0.0020) +[2025-08-01 17:31:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3446.3). Total num frames: 499712. Throughput: 0: 982.4. Samples: 123452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:31:01,093][02698] Avg episode reward: [(0, '4.643')] +[2025-08-01 17:31:06,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 983.6. Samples: 130362. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:31:06,092][02698] Avg episode reward: [(0, '4.681')] +[2025-08-01 17:31:08,142][02852] Updated weights for policy 0, policy_version 130 (0.0012) +[2025-08-01 17:31:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3461.8). Total num frames: 536576. Throughput: 0: 969.5. Samples: 135364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:31:11,093][02698] Avg episode reward: [(0, '4.797')] +[2025-08-01 17:31:11,157][02835] Saving new best policy, reward=4.797! +[2025-08-01 17:31:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3507.2). Total num frames: 561152. Throughput: 0: 985.3. Samples: 138452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:16,089][02698] Avg episode reward: [(0, '5.125')] +[2025-08-01 17:31:16,096][02835] Saving new best policy, reward=5.125! +[2025-08-01 17:31:18,500][02852] Updated weights for policy 0, policy_version 140 (0.0018) +[2025-08-01 17:31:21,088][02698] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 985.6. Samples: 145322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:21,089][02698] Avg episode reward: [(0, '4.893')] +[2025-08-01 17:31:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3517.7). Total num frames: 598016. Throughput: 0: 973.9. Samples: 150026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:26,089][02698] Avg episode reward: [(0, '4.502')] +[2025-08-01 17:31:29,417][02852] Updated weights for policy 0, policy_version 150 (0.0018) +[2025-08-01 17:31:31,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3891.6, 300 sec: 3534.3). Total num frames: 618496. Throughput: 0: 989.5. Samples: 153368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:31,089][02698] Avg episode reward: [(0, '4.596')] +[2025-08-01 17:31:36,088][02698] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3572.6). Total num frames: 643072. Throughput: 0: 989.4. Samples: 160072. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:31:36,092][02698] Avg episode reward: [(0, '5.116')] +[2025-08-01 17:31:40,453][02852] Updated weights for policy 0, policy_version 160 (0.0019) +[2025-08-01 17:31:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.6, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 979.4. Samples: 164586. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:31:41,091][02698] Avg episode reward: [(0, '5.249')] +[2025-08-01 17:31:41,094][02835] Saving new best policy, reward=5.249! +[2025-08-01 17:31:46,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3578.6). Total num frames: 679936. Throughput: 0: 984.4. Samples: 167748. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:31:46,092][02698] Avg episode reward: [(0, '5.159')] +[2025-08-01 17:31:49,815][02852] Updated weights for policy 0, policy_version 170 (0.0016) +[2025-08-01 17:31:51,090][02698] Fps is (10 sec: 4095.3, 60 sec: 3891.1, 300 sec: 3570.8). Total num frames: 696320. Throughput: 0: 978.6. Samples: 174400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:31:51,094][02698] Avg episode reward: [(0, '4.998')] +[2025-08-01 17:31:56,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 971.9. Samples: 179098. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:31:56,093][02698] Avg episode reward: [(0, '5.019')] +[2025-08-01 17:32:00,583][02852] Updated weights for policy 0, policy_version 180 (0.0015) +[2025-08-01 17:32:01,088][02698] Fps is (10 sec: 4096.7, 60 sec: 3959.5, 300 sec: 3596.5). Total num frames: 737280. Throughput: 0: 982.8. Samples: 182680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:32:01,092][02698] Avg episode reward: [(0, '5.457')] +[2025-08-01 17:32:01,098][02835] Saving new best policy, reward=5.457! +[2025-08-01 17:32:06,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3608.4). Total num frames: 757760. Throughput: 0: 979.5. Samples: 189400. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:32:06,091][02698] Avg episode reward: [(0, '5.595')] +[2025-08-01 17:32:06,103][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000185_757760.pth... +[2025-08-01 17:32:06,251][02835] Saving new best policy, reward=5.595! +[2025-08-01 17:32:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3600.7). Total num frames: 774144. Throughput: 0: 982.4. Samples: 194234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:32:11,091][02698] Avg episode reward: [(0, '5.753')] +[2025-08-01 17:32:11,093][02835] Saving new best policy, reward=5.753! +[2025-08-01 17:32:11,552][02852] Updated weights for policy 0, policy_version 190 (0.0021) +[2025-08-01 17:32:16,088][02698] Fps is (10 sec: 4095.9, 60 sec: 3959.5, 300 sec: 3630.5). Total num frames: 798720. Throughput: 0: 979.5. Samples: 197446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:32:16,097][02698] Avg episode reward: [(0, '5.987')] +[2025-08-01 17:32:16,104][02835] Saving new best policy, reward=5.987! +[2025-08-01 17:32:21,088][02698] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3622.7). Total num frames: 815104. Throughput: 0: 976.4. Samples: 204012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:32:21,090][02698] Avg episode reward: [(0, '5.989')] +[2025-08-01 17:32:21,092][02835] Saving new best policy, reward=5.989! +[2025-08-01 17:32:21,791][02852] Updated weights for policy 0, policy_version 200 (0.0047) +[2025-08-01 17:32:26,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3615.2). Total num frames: 831488. Throughput: 0: 986.6. Samples: 208982. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:32:26,092][02698] Avg episode reward: [(0, '6.570')] +[2025-08-01 17:32:26,108][02835] Saving new best policy, reward=6.570! +[2025-08-01 17:32:31,088][02698] Fps is (10 sec: 4096.1, 60 sec: 3959.4, 300 sec: 3642.8). Total num frames: 856064. Throughput: 0: 991.6. Samples: 212370. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:32:31,092][02698] Avg episode reward: [(0, '6.729')] +[2025-08-01 17:32:31,096][02835] Saving new best policy, reward=6.729! +[2025-08-01 17:32:31,495][02852] Updated weights for policy 0, policy_version 210 (0.0012) +[2025-08-01 17:32:36,090][02698] Fps is (10 sec: 4095.2, 60 sec: 3822.8, 300 sec: 3635.2). Total num frames: 872448. Throughput: 0: 983.4. Samples: 218654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:32:36,095][02698] Avg episode reward: [(0, '6.733')] +[2025-08-01 17:32:36,106][02835] Saving new best policy, reward=6.733! +[2025-08-01 17:32:41,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3644.6). Total num frames: 892928. Throughput: 0: 995.7. Samples: 223906. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:32:41,095][02698] Avg episode reward: [(0, '6.304')] +[2025-08-01 17:32:42,339][02852] Updated weights for policy 0, policy_version 220 (0.0023) +[2025-08-01 17:32:46,088][02698] Fps is (10 sec: 4506.5, 60 sec: 3959.5, 300 sec: 3670.0). Total num frames: 917504. Throughput: 0: 989.6. Samples: 227214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:32:46,094][02698] Avg episode reward: [(0, '6.269')] +[2025-08-01 17:32:51,088][02698] Fps is (10 sec: 4095.9, 60 sec: 3959.6, 300 sec: 3662.3). Total num frames: 933888. Throughput: 0: 973.6. Samples: 233212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:32:51,094][02698] Avg episode reward: [(0, '6.060')] +[2025-08-01 17:32:53,305][02852] Updated weights for policy 0, policy_version 230 (0.0017) +[2025-08-01 17:32:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3670.6). Total num frames: 954368. Throughput: 0: 991.4. Samples: 238846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:32:56,089][02698] Avg episode reward: [(0, '6.089')] +[2025-08-01 17:33:01,088][02698] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3678.7). Total num frames: 974848. Throughput: 0: 995.7. Samples: 242254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:33:01,089][02698] Avg episode reward: [(0, '6.167')] +[2025-08-01 17:33:02,270][02852] Updated weights for policy 0, policy_version 240 (0.0027) +[2025-08-01 17:33:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3671.2). Total num frames: 991232. Throughput: 0: 978.6. Samples: 248048. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:33:06,091][02698] Avg episode reward: [(0, '6.106')] +[2025-08-01 17:33:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3679.0). Total num frames: 1011712. Throughput: 0: 987.6. Samples: 253426. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:11,092][02698] Avg episode reward: [(0, '6.346')] +[2025-08-01 17:33:15,097][02852] Updated weights for policy 0, policy_version 250 (0.0015) +[2025-08-01 17:33:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3671.8). Total num frames: 1028096. Throughput: 0: 954.9. Samples: 255342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:16,092][02698] Avg episode reward: [(0, '6.526')] +[2025-08-01 17:33:21,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3650.5). Total num frames: 1040384. Throughput: 0: 931.5. Samples: 260568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:33:21,093][02698] Avg episode reward: [(0, '6.934')] +[2025-08-01 17:33:21,097][02835] Saving new best policy, reward=6.934! +[2025-08-01 17:33:25,907][02852] Updated weights for policy 0, policy_version 260 (0.0025) +[2025-08-01 17:33:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3672.3). Total num frames: 1064960. Throughput: 0: 944.3. Samples: 266398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:26,089][02698] Avg episode reward: [(0, '7.142')] +[2025-08-01 17:33:26,095][02835] Saving new best policy, reward=7.142! +[2025-08-01 17:33:31,088][02698] Fps is (10 sec: 4505.7, 60 sec: 3823.0, 300 sec: 3679.5). Total num frames: 1085440. Throughput: 0: 946.8. Samples: 269822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:33:31,089][02698] Avg episode reward: [(0, '7.686')] +[2025-08-01 17:33:31,091][02835] Saving new best policy, reward=7.686! 
+[2025-08-01 17:33:36,088][02698] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3735.0). Total num frames: 1101824. Throughput: 0: 928.5. Samples: 274994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:36,090][02698] Avg episode reward: [(0, '8.522')] +[2025-08-01 17:33:36,097][02835] Saving new best policy, reward=8.522! +[2025-08-01 17:33:37,009][02852] Updated weights for policy 0, policy_version 270 (0.0025) +[2025-08-01 17:33:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 941.0. Samples: 281190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:33:41,092][02698] Avg episode reward: [(0, '9.137')] +[2025-08-01 17:33:41,094][02835] Saving new best policy, reward=9.137! +[2025-08-01 17:33:46,092][02698] Fps is (10 sec: 4094.3, 60 sec: 3754.4, 300 sec: 3832.1). Total num frames: 1142784. Throughput: 0: 941.1. Samples: 284606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:33:46,094][02698] Avg episode reward: [(0, '8.802')] +[2025-08-01 17:33:46,367][02852] Updated weights for policy 0, policy_version 280 (0.0033) +[2025-08-01 17:33:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 1159168. Throughput: 0: 921.2. Samples: 289502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:51,090][02698] Avg episode reward: [(0, '8.504')] +[2025-08-01 17:33:56,088][02698] Fps is (10 sec: 3688.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 1179648. Throughput: 0: 946.9. Samples: 296036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:33:56,091][02698] Avg episode reward: [(0, '9.128')] +[2025-08-01 17:33:56,995][02852] Updated weights for policy 0, policy_version 290 (0.0022) +[2025-08-01 17:34:01,088][02698] Fps is (10 sec: 4505.5, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1204224. Throughput: 0: 980.4. Samples: 299460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:34:01,091][02698] Avg episode reward: [(0, '9.695')] +[2025-08-01 17:34:01,095][02835] Saving new best policy, reward=9.695! +[2025-08-01 17:34:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.4). Total num frames: 1216512. Throughput: 0: 967.7. Samples: 304114. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:34:06,092][02698] Avg episode reward: [(0, '9.875')] +[2025-08-01 17:34:06,100][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth... +[2025-08-01 17:34:06,225][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000071_290816.pth +[2025-08-01 17:34:06,239][02835] Saving new best policy, reward=9.875! +[2025-08-01 17:34:08,152][02852] Updated weights for policy 0, policy_version 300 (0.0019) +[2025-08-01 17:34:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1241088. Throughput: 0: 985.2. Samples: 310730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:34:11,090][02698] Avg episode reward: [(0, '9.852')] +[2025-08-01 17:34:16,093][02698] Fps is (10 sec: 4503.1, 60 sec: 3890.8, 300 sec: 3846.0). Total num frames: 1261568. Throughput: 0: 985.7. Samples: 314186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:34:16,097][02698] Avg episode reward: [(0, '10.712')] +[2025-08-01 17:34:16,111][02835] Saving new best policy, reward=10.712! 
+[2025-08-01 17:34:18,571][02852] Updated weights for policy 0, policy_version 310 (0.0023) +[2025-08-01 17:34:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3832.2). Total num frames: 1277952. Throughput: 0: 974.5. Samples: 318844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:34:21,093][02698] Avg episode reward: [(0, '10.933')] +[2025-08-01 17:34:21,097][02835] Saving new best policy, reward=10.933! +[2025-08-01 17:34:26,088][02698] Fps is (10 sec: 4098.3, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1302528. Throughput: 0: 988.5. Samples: 325674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:26,092][02698] Avg episode reward: [(0, '12.438')] +[2025-08-01 17:34:26,099][02835] Saving new best policy, reward=12.438! +[2025-08-01 17:34:27,932][02852] Updated weights for policy 0, policy_version 320 (0.0017) +[2025-08-01 17:34:31,095][02698] Fps is (10 sec: 4092.9, 60 sec: 3890.7, 300 sec: 3846.0). Total num frames: 1318912. Throughput: 0: 989.4. Samples: 329130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:31,099][02698] Avg episode reward: [(0, '12.477')] +[2025-08-01 17:34:31,111][02835] Saving new best policy, reward=12.477! +[2025-08-01 17:34:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1339392. Throughput: 0: 984.1. Samples: 333788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:34:36,092][02698] Avg episode reward: [(0, '13.024')] +[2025-08-01 17:34:36,099][02835] Saving new best policy, reward=13.024! +[2025-08-01 17:34:38,660][02852] Updated weights for policy 0, policy_version 330 (0.0014) +[2025-08-01 17:34:41,088][02698] Fps is (10 sec: 4099.1, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1359872. Throughput: 0: 993.2. Samples: 340732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:41,092][02698] Avg episode reward: [(0, '12.956')] +[2025-08-01 17:34:46,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3901.6). Total num frames: 1380352. Throughput: 0: 994.4. Samples: 344206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:34:46,093][02698] Avg episode reward: [(0, '13.146')] +[2025-08-01 17:34:46,103][02835] Saving new best policy, reward=13.146! +[2025-08-01 17:34:49,550][02852] Updated weights for policy 0, policy_version 340 (0.0018) +[2025-08-01 17:34:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1396736. Throughput: 0: 992.1. Samples: 348758. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:34:51,094][02698] Avg episode reward: [(0, '13.243')] +[2025-08-01 17:34:51,096][02835] Saving new best policy, reward=13.243! +[2025-08-01 17:34:56,088][02698] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 1421312. Throughput: 0: 997.3. Samples: 355610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:34:56,090][02698] Avg episode reward: [(0, '13.926')] +[2025-08-01 17:34:56,096][02835] Saving new best policy, reward=13.926! +[2025-08-01 17:34:58,716][02852] Updated weights for policy 0, policy_version 350 (0.0037) +[2025-08-01 17:35:01,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3887.7). Total num frames: 1437696. Throughput: 0: 993.8. Samples: 358902. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:35:01,099][02698] Avg episode reward: [(0, '14.527')] +[2025-08-01 17:35:01,103][02835] Saving new best policy, reward=14.527! 
+[2025-08-01 17:35:06,088][02698] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3887.7). Total num frames: 1454080. Throughput: 0: 990.5. Samples: 363418. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:35:06,089][02698] Avg episode reward: [(0, '13.978')] +[2025-08-01 17:35:09,641][02852] Updated weights for policy 0, policy_version 360 (0.0027) +[2025-08-01 17:35:11,088][02698] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1478656. Throughput: 0: 992.0. Samples: 370312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:35:11,094][02698] Avg episode reward: [(0, '14.135')] +[2025-08-01 17:35:16,094][02698] Fps is (10 sec: 4093.5, 60 sec: 3891.2, 300 sec: 3887.6). Total num frames: 1495040. Throughput: 0: 983.9. Samples: 373406. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:35:16,096][02698] Avg episode reward: [(0, '15.391')] +[2025-08-01 17:35:16,111][02835] Saving new best policy, reward=15.391! +[2025-08-01 17:35:20,772][02852] Updated weights for policy 0, policy_version 370 (0.0015) +[2025-08-01 17:35:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 1515520. Throughput: 0: 989.3. Samples: 378308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:35:21,092][02698] Avg episode reward: [(0, '15.315')] +[2025-08-01 17:35:26,088][02698] Fps is (10 sec: 4508.4, 60 sec: 3959.5, 300 sec: 3915.6). Total num frames: 1540096. Throughput: 0: 988.4. Samples: 385212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:35:26,089][02698] Avg episode reward: [(0, '15.540')] +[2025-08-01 17:35:26,096][02835] Saving new best policy, reward=15.540! +[2025-08-01 17:35:30,683][02852] Updated weights for policy 0, policy_version 380 (0.0014) +[2025-08-01 17:35:31,090][02698] Fps is (10 sec: 4095.3, 60 sec: 3959.8, 300 sec: 3901.6). Total num frames: 1556480. Throughput: 0: 976.8. Samples: 388164. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:35:31,097][02698] Avg episode reward: [(0, '16.398')] +[2025-08-01 17:35:31,103][02835] Saving new best policy, reward=16.398! +[2025-08-01 17:35:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.6). Total num frames: 1576960. Throughput: 0: 991.4. Samples: 393372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:35:36,089][02698] Avg episode reward: [(0, '16.327')] +[2025-08-01 17:35:40,243][02852] Updated weights for policy 0, policy_version 390 (0.0014) +[2025-08-01 17:35:41,088][02698] Fps is (10 sec: 4096.6, 60 sec: 3959.4, 300 sec: 3915.5). Total num frames: 1597440. Throughput: 0: 993.2. Samples: 400306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:35:41,092][02698] Avg episode reward: [(0, '16.604')] +[2025-08-01 17:35:41,156][02835] Saving new best policy, reward=16.604! +[2025-08-01 17:35:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1613824. Throughput: 0: 979.8. Samples: 402992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:35:46,095][02698] Avg episode reward: [(0, '17.030')] +[2025-08-01 17:35:46,107][02835] Saving new best policy, reward=17.030! +[2025-08-01 17:35:51,088][02698] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1634304. Throughput: 0: 996.2. Samples: 408248. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:35:51,089][02698] Avg episode reward: [(0, '15.661')] +[2025-08-01 17:35:51,278][02852] Updated weights for policy 0, policy_version 400 (0.0014) +[2025-08-01 17:35:56,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1658880. Throughput: 0: 998.2. Samples: 415230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:35:56,090][02698] Avg episode reward: [(0, '14.373')] +[2025-08-01 17:36:01,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 1675264. Throughput: 0: 987.4. Samples: 417834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:01,089][02698] Avg episode reward: [(0, '13.823')] +[2025-08-01 17:36:02,036][02852] Updated weights for policy 0, policy_version 410 (0.0021) +[2025-08-01 17:36:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1695744. Throughput: 0: 1006.0. Samples: 423576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:36:06,090][02698] Avg episode reward: [(0, '12.577')] +[2025-08-01 17:36:06,097][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth... +[2025-08-01 17:36:06,215][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000185_757760.pth +[2025-08-01 17:36:10,606][02852] Updated weights for policy 0, policy_version 420 (0.0024) +[2025-08-01 17:36:11,088][02698] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1720320. Throughput: 0: 1009.1. Samples: 430622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:36:11,089][02698] Avg episode reward: [(0, '12.525')] +[2025-08-01 17:36:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.9, 300 sec: 3901.6). Total num frames: 1732608. Throughput: 0: 995.1. Samples: 432942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:16,089][02698] Avg episode reward: [(0, '12.542')] +[2025-08-01 17:36:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1757184. Throughput: 0: 1010.2. Samples: 438832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:21,089][02698] Avg episode reward: [(0, '13.692')] +[2025-08-01 17:36:21,420][02852] Updated weights for policy 0, policy_version 430 (0.0017) +[2025-08-01 17:36:26,090][02698] Fps is (10 sec: 4914.2, 60 sec: 4027.6, 300 sec: 3943.2). Total num frames: 1781760. Throughput: 0: 1011.9. Samples: 445842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:36:26,095][02698] Avg episode reward: [(0, '15.525')] +[2025-08-01 17:36:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3901.6). Total num frames: 1794048. Throughput: 0: 999.4. Samples: 447966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:36:31,093][02698] Avg episode reward: [(0, '16.342')] +[2025-08-01 17:36:32,026][02852] Updated weights for policy 0, policy_version 440 (0.0027) +[2025-08-01 17:36:36,088][02698] Fps is (10 sec: 3687.1, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 1818624. Throughput: 0: 1024.1. Samples: 454332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:36:36,090][02698] Avg episode reward: [(0, '18.705')] +[2025-08-01 17:36:36,098][02835] Saving new best policy, reward=18.705! +[2025-08-01 17:36:41,088][02698] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1839104. Throughput: 0: 1008.9. Samples: 460630. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:36:41,098][02698] Avg episode reward: [(0, '19.674')] +[2025-08-01 17:36:41,100][02835] Saving new best policy, reward=19.674! +[2025-08-01 17:36:42,346][02852] Updated weights for policy 0, policy_version 450 (0.0037) +[2025-08-01 17:36:46,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 1847296. Throughput: 0: 976.4. Samples: 461774. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:36:46,092][02698] Avg episode reward: [(0, '19.751')] +[2025-08-01 17:36:46,106][02835] Saving new best policy, reward=19.751! +[2025-08-01 17:36:51,088][02698] Fps is (10 sec: 3277.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1871872. Throughput: 0: 975.3. Samples: 467466. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:36:51,095][02698] Avg episode reward: [(0, '20.146')] +[2025-08-01 17:36:51,099][02835] Saving new best policy, reward=20.146! +[2025-08-01 17:36:53,319][02852] Updated weights for policy 0, policy_version 460 (0.0020) +[2025-08-01 17:36:56,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1892352. Throughput: 0: 973.9. Samples: 474448. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:36:56,092][02698] Avg episode reward: [(0, '20.184')] +[2025-08-01 17:36:56,101][02835] Saving new best policy, reward=20.184! +[2025-08-01 17:37:01,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 1912832. Throughput: 0: 968.1. Samples: 476508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:01,094][02698] Avg episode reward: [(0, '18.801')] +[2025-08-01 17:37:03,649][02852] Updated weights for policy 0, policy_version 470 (0.0014) +[2025-08-01 17:37:06,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 1933312. Throughput: 0: 980.3. Samples: 482944. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:37:06,089][02698] Avg episode reward: [(0, '18.533')] +[2025-08-01 17:37:11,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 1953792. Throughput: 0: 975.5. Samples: 489736. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:11,091][02698] Avg episode reward: [(0, '18.537')] +[2025-08-01 17:37:14,172][02852] Updated weights for policy 0, policy_version 480 (0.0019) +[2025-08-01 17:37:16,089][02698] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 1974272. Throughput: 0: 972.4. Samples: 491726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:16,092][02698] Avg episode reward: [(0, '18.844')] +[2025-08-01 17:37:21,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3943.3). Total num frames: 1994752. Throughput: 0: 976.3. Samples: 498264. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:21,089][02698] Avg episode reward: [(0, '19.873')] +[2025-08-01 17:37:23,107][02852] Updated weights for policy 0, policy_version 490 (0.0026) +[2025-08-01 17:37:26,089][02698] Fps is (10 sec: 4095.8, 60 sec: 3891.3, 300 sec: 3929.4). Total num frames: 2015232. Throughput: 0: 977.1. Samples: 504600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:26,091][02698] Avg episode reward: [(0, '19.959')] +[2025-08-01 17:37:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2031616. Throughput: 0: 993.2. Samples: 506466. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:31,089][02698] Avg episode reward: [(0, '19.443')] +[2025-08-01 17:37:34,234][02852] Updated weights for policy 0, policy_version 500 (0.0021) +[2025-08-01 17:37:36,088][02698] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3929.4). Total num frames: 2052096. Throughput: 0: 1014.3. Samples: 513110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:36,093][02698] Avg episode reward: [(0, '19.624')] +[2025-08-01 17:37:41,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2072576. Throughput: 0: 985.5. Samples: 518794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:37:41,092][02698] Avg episode reward: [(0, '20.270')] +[2025-08-01 17:37:41,095][02835] Saving new best policy, reward=20.270! +[2025-08-01 17:37:45,460][02852] Updated weights for policy 0, policy_version 510 (0.0033) +[2025-08-01 17:37:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3915.5). Total num frames: 2088960. Throughput: 0: 985.1. Samples: 520838. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:37:46,091][02698] Avg episode reward: [(0, '20.035')] +[2025-08-01 17:37:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3915.5). Total num frames: 2109440. Throughput: 0: 983.2. Samples: 527188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:51,091][02698] Avg episode reward: [(0, '20.689')] +[2025-08-01 17:37:51,095][02835] Saving new best policy, reward=20.689! +[2025-08-01 17:37:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 2125824. Throughput: 0: 954.3. Samples: 532678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:37:56,092][02698] Avg episode reward: [(0, '21.080')] +[2025-08-01 17:37:56,105][02835] Saving new best policy, reward=21.080! +[2025-08-01 17:37:56,376][02852] Updated weights for policy 0, policy_version 520 (0.0014) +[2025-08-01 17:38:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2146304. Throughput: 0: 958.9. Samples: 534874. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:38:01,092][02698] Avg episode reward: [(0, '21.029')] +[2025-08-01 17:38:06,061][02852] Updated weights for policy 0, policy_version 530 (0.0019) +[2025-08-01 17:38:06,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3929.4). Total num frames: 2170880. Throughput: 0: 959.6. Samples: 541446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) +[2025-08-01 17:38:06,089][02698] Avg episode reward: [(0, '22.070')] +[2025-08-01 17:38:06,097][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth... +[2025-08-01 17:38:06,234][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000297_1216512.pth +[2025-08-01 17:38:06,244][02835] Saving new best policy, reward=22.070! +[2025-08-01 17:38:11,089][02698] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2183168. Throughput: 0: 936.0. Samples: 546720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:38:11,090][02698] Avg episode reward: [(0, '21.287')] +[2025-08-01 17:38:16,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3943.3). Total num frames: 2203648. Throughput: 0: 952.8. Samples: 549342. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:38:16,091][02698] Avg episode reward: [(0, '20.897')] +[2025-08-01 17:38:17,579][02852] Updated weights for policy 0, policy_version 540 (0.0036) +[2025-08-01 17:38:21,088][02698] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2224128. Throughput: 0: 950.7. Samples: 555892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:38:21,090][02698] Avg episode reward: [(0, '20.909')] +[2025-08-01 17:38:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2240512. Throughput: 0: 934.8. Samples: 560862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:38:26,093][02698] Avg episode reward: [(0, '20.215')] +[2025-08-01 17:38:28,658][02852] Updated weights for policy 0, policy_version 550 (0.0029) +[2025-08-01 17:38:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2260992. Throughput: 0: 953.0. Samples: 563722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:38:31,092][02698] Avg episode reward: [(0, '20.111')] +[2025-08-01 17:38:36,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2281472. Throughput: 0: 958.4. Samples: 570314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:38:36,092][02698] Avg episode reward: [(0, '19.849')] +[2025-08-01 17:38:38,769][02852] Updated weights for policy 0, policy_version 560 (0.0024) +[2025-08-01 17:38:41,090][02698] Fps is (10 sec: 3685.6, 60 sec: 3754.5, 300 sec: 3915.5). Total num frames: 2297856. Throughput: 0: 942.9. Samples: 575110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:38:41,094][02698] Avg episode reward: [(0, '21.402')] +[2025-08-01 17:38:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3929.4). Total num frames: 2318336. Throughput: 0: 963.5. Samples: 578232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:38:46,089][02698] Avg episode reward: [(0, '20.967')] +[2025-08-01 17:38:49,113][02852] Updated weights for policy 0, policy_version 570 (0.0026) +[2025-08-01 17:38:51,088][02698] Fps is (10 sec: 4506.6, 60 sec: 3891.2, 300 sec: 3943.3). Total num frames: 2342912. Throughput: 0: 962.8. Samples: 584772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:38:51,089][02698] Avg episode reward: [(0, '21.115')] +[2025-08-01 17:38:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2355200. Throughput: 0: 946.3. Samples: 589304. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:38:56,089][02698] Avg episode reward: [(0, '21.189')] +[2025-08-01 17:39:00,495][02852] Updated weights for policy 0, policy_version 580 (0.0014) +[2025-08-01 17:39:01,098][02698] Fps is (10 sec: 3273.4, 60 sec: 3822.3, 300 sec: 3929.2). Total num frames: 2375680. Throughput: 0: 960.0. Samples: 592550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:39:01,102][02698] Avg episode reward: [(0, '22.028')] +[2025-08-01 17:39:06,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3915.5). Total num frames: 2396160. Throughput: 0: 961.9. Samples: 599178. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:06,096][02698] Avg episode reward: [(0, '21.148')] +[2025-08-01 17:39:11,088][02698] Fps is (10 sec: 3690.3, 60 sec: 3823.0, 300 sec: 3901.7). Total num frames: 2412544. Throughput: 0: 951.8. Samples: 603694. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:11,090][02698] Avg episode reward: [(0, '21.181')] +[2025-08-01 17:39:11,680][02852] Updated weights for policy 0, policy_version 590 (0.0029) +[2025-08-01 17:39:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3915.5). Total num frames: 2433024. Throughput: 0: 960.6. Samples: 606950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:39:16,090][02698] Avg episode reward: [(0, '21.999')] +[2025-08-01 17:39:21,089][02698] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2453504. Throughput: 0: 962.0. Samples: 613604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:21,095][02698] Avg episode reward: [(0, '20.971')] +[2025-08-01 17:39:21,286][02852] Updated weights for policy 0, policy_version 600 (0.0017) +[2025-08-01 17:39:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3901.7). Total num frames: 2469888. Throughput: 0: 955.8. Samples: 618118. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:26,093][02698] Avg episode reward: [(0, '21.228')] +[2025-08-01 17:39:31,088][02698] Fps is (10 sec: 4096.4, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2494464. Throughput: 0: 960.1. Samples: 621436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:39:31,102][02698] Avg episode reward: [(0, '20.927')] +[2025-08-01 17:39:31,970][02852] Updated weights for policy 0, policy_version 610 (0.0012) +[2025-08-01 17:39:36,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 2510848. Throughput: 0: 964.1. Samples: 628156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:39:36,093][02698] Avg episode reward: [(0, '22.250')] +[2025-08-01 17:39:36,100][02835] Saving new best policy, reward=22.250! +[2025-08-01 17:39:41,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3887.7). Total num frames: 2527232. Throughput: 0: 964.3. Samples: 632696. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:39:41,093][02698] Avg episode reward: [(0, '22.538')] +[2025-08-01 17:39:41,096][02835] Saving new best policy, reward=22.538! +[2025-08-01 17:39:43,225][02852] Updated weights for policy 0, policy_version 620 (0.0029) +[2025-08-01 17:39:46,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3915.5). Total num frames: 2551808. Throughput: 0: 964.4. Samples: 635940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:39:46,094][02698] Avg episode reward: [(0, '22.108')] +[2025-08-01 17:39:51,089][02698] Fps is (10 sec: 4095.7, 60 sec: 3754.6, 300 sec: 3887.7). Total num frames: 2568192. Throughput: 0: 957.9. Samples: 642282. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:39:51,098][02698] Avg episode reward: [(0, '23.506')] +[2025-08-01 17:39:51,100][02835] Saving new best policy, reward=23.506! +[2025-08-01 17:39:54,295][02852] Updated weights for policy 0, policy_version 630 (0.0014) +[2025-08-01 17:39:56,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3887.8). Total num frames: 2584576. Throughput: 0: 966.8. Samples: 647200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:39:56,093][02698] Avg episode reward: [(0, '23.901')] +[2025-08-01 17:39:56,100][02835] Saving new best policy, reward=23.901! +[2025-08-01 17:40:01,088][02698] Fps is (10 sec: 4096.3, 60 sec: 3891.9, 300 sec: 3915.5). Total num frames: 2609152. Throughput: 0: 969.4. Samples: 650572. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:40:01,090][02698] Avg episode reward: [(0, '23.833')] +[2025-08-01 17:40:03,446][02852] Updated weights for policy 0, policy_version 640 (0.0022) +[2025-08-01 17:40:06,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3887.7). Total num frames: 2625536. Throughput: 0: 958.1. Samples: 656718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:40:06,092][02698] Avg episode reward: [(0, '23.579')] +[2025-08-01 17:40:06,104][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth... +[2025-08-01 17:40:06,257][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000414_1695744.pth +[2025-08-01 17:40:11,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3754.7, 300 sec: 3873.9). Total num frames: 2637824. Throughput: 0: 940.4. Samples: 660434. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:40:11,091][02698] Avg episode reward: [(0, '24.078')] +[2025-08-01 17:40:11,097][02835] Saving new best policy, reward=24.078! +[2025-08-01 17:40:16,088][02698] Fps is (10 sec: 3277.6, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 2658304. Throughput: 0: 925.2. Samples: 663068. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:16,092][02698] Avg episode reward: [(0, '23.849')] +[2025-08-01 17:40:16,484][02852] Updated weights for policy 0, policy_version 650 (0.0028) +[2025-08-01 17:40:21,091][02698] Fps is (10 sec: 3685.3, 60 sec: 3686.3, 300 sec: 3846.0). Total num frames: 2674688. Throughput: 0: 903.1. Samples: 668800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:40:21,093][02698] Avg episode reward: [(0, '23.409')] +[2025-08-01 17:40:26,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2695168. Throughput: 0: 925.3. Samples: 674336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:40:26,089][02698] Avg episode reward: [(0, '21.979')] +[2025-08-01 17:40:27,599][02852] Updated weights for policy 0, policy_version 660 (0.0031) +[2025-08-01 17:40:31,088][02698] Fps is (10 sec: 4097.2, 60 sec: 3686.4, 300 sec: 3860.0). Total num frames: 2715648. Throughput: 0: 927.8. Samples: 677692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:31,092][02698] Avg episode reward: [(0, '24.166')] +[2025-08-01 17:40:31,096][02835] Saving new best policy, reward=24.166! +[2025-08-01 17:40:36,091][02698] Fps is (10 sec: 3685.5, 60 sec: 3686.2, 300 sec: 3846.0). Total num frames: 2732032. Throughput: 0: 912.3. Samples: 683338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:36,097][02698] Avg episode reward: [(0, '25.130')] +[2025-08-01 17:40:36,112][02835] Saving new best policy, reward=25.130! +[2025-08-01 17:40:38,704][02852] Updated weights for policy 0, policy_version 670 (0.0026) +[2025-08-01 17:40:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2752512. Throughput: 0: 928.8. Samples: 688998. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:40:41,091][02698] Avg episode reward: [(0, '24.134')] +[2025-08-01 17:40:46,088][02698] Fps is (10 sec: 4506.7, 60 sec: 3754.7, 300 sec: 3873.8). Total num frames: 2777088. Throughput: 0: 930.2. Samples: 692430. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:46,091][02698] Avg episode reward: [(0, '24.382')] +[2025-08-01 17:40:48,100][02852] Updated weights for policy 0, policy_version 680 (0.0017) +[2025-08-01 17:40:51,089][02698] Fps is (10 sec: 3685.9, 60 sec: 3686.3, 300 sec: 3832.2). Total num frames: 2789376. Throughput: 0: 914.1. Samples: 697850. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:51,091][02698] Avg episode reward: [(0, '24.034')] +[2025-08-01 17:40:56,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2813952. Throughput: 0: 962.8. Samples: 703760. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:40:56,093][02698] Avg episode reward: [(0, '24.029')] +[2025-08-01 17:40:58,633][02852] Updated weights for policy 0, policy_version 690 (0.0031) +[2025-08-01 17:41:01,088][02698] Fps is (10 sec: 4506.2, 60 sec: 3754.7, 300 sec: 3860.0). Total num frames: 2834432. Throughput: 0: 979.4. Samples: 707142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:01,094][02698] Avg episode reward: [(0, '23.236')] +[2025-08-01 17:41:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 2850816. Throughput: 0: 970.7. Samples: 712480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:06,093][02698] Avg episode reward: [(0, '22.939')] +[2025-08-01 17:41:09,630][02852] Updated weights for policy 0, policy_version 700 (0.0024) +[2025-08-01 17:41:11,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2871296. Throughput: 0: 983.3. Samples: 718586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:11,092][02698] Avg episode reward: [(0, '23.654')] +[2025-08-01 17:41:16,091][02698] Fps is (10 sec: 4504.2, 60 sec: 3959.3, 300 sec: 3859.9). Total num frames: 2895872. Throughput: 0: 985.5. Samples: 722044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:16,095][02698] Avg episode reward: [(0, '23.847')] +[2025-08-01 17:41:20,379][02852] Updated weights for policy 0, policy_version 710 (0.0017) +[2025-08-01 17:41:21,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3818.3). Total num frames: 2908160. Throughput: 0: 972.9. Samples: 727116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:21,094][02698] Avg episode reward: [(0, '23.896')] +[2025-08-01 17:41:26,088][02698] Fps is (10 sec: 3687.5, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 2932736. Throughput: 0: 987.7. Samples: 733444. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:41:26,093][02698] Avg episode reward: [(0, '24.909')] +[2025-08-01 17:41:29,692][02852] Updated weights for policy 0, policy_version 720 (0.0014) +[2025-08-01 17:41:31,088][02698] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 2953216. Throughput: 0: 985.9. Samples: 736796. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:31,092][02698] Avg episode reward: [(0, '23.671')] +[2025-08-01 17:41:36,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3832.2). Total num frames: 2969600. Throughput: 0: 972.7. Samples: 741618. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:41:36,089][02698] Avg episode reward: [(0, '23.621')] +[2025-08-01 17:41:40,770][02852] Updated weights for policy 0, policy_version 730 (0.0035) +[2025-08-01 17:41:41,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2990080. Throughput: 0: 987.3. 
Samples: 748188. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) +[2025-08-01 17:41:41,089][02698] Avg episode reward: [(0, '22.847')] +[2025-08-01 17:41:46,092][02698] Fps is (10 sec: 4094.2, 60 sec: 3890.9, 300 sec: 3859.9). Total num frames: 3010560. Throughput: 0: 984.3. Samples: 751442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:46,094][02698] Avg episode reward: [(0, '22.065')] +[2025-08-01 17:41:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3846.1). Total num frames: 3026944. Throughput: 0: 964.9. Samples: 755902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:41:51,093][02698] Avg episode reward: [(0, '21.646')] +[2025-08-01 17:41:51,942][02852] Updated weights for policy 0, policy_version 740 (0.0014) +[2025-08-01 17:41:56,088][02698] Fps is (10 sec: 3688.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3047424. Throughput: 0: 981.2. Samples: 762740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:41:56,089][02698] Avg episode reward: [(0, '22.730')] +[2025-08-01 17:42:01,093][02698] Fps is (10 sec: 4093.7, 60 sec: 3890.8, 300 sec: 3846.0). Total num frames: 3067904. Throughput: 0: 978.5. Samples: 766080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:01,098][02698] Avg episode reward: [(0, '23.303')] +[2025-08-01 17:42:01,795][02852] Updated weights for policy 0, policy_version 750 (0.0025) +[2025-08-01 17:42:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3084288. Throughput: 0: 966.0. Samples: 770588. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:06,089][02698] Avg episode reward: [(0, '23.703')] +[2025-08-01 17:42:06,099][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth... +[2025-08-01 17:42:06,211][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000530_2170880.pth +[2025-08-01 17:42:11,088][02698] Fps is (10 sec: 4098.1, 60 sec: 3959.4, 300 sec: 3846.1). Total num frames: 3108864. Throughput: 0: 972.6. Samples: 777210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:42:11,096][02698] Avg episode reward: [(0, '24.151')] +[2025-08-01 17:42:12,111][02852] Updated weights for policy 0, policy_version 760 (0.0014) +[2025-08-01 17:42:16,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 3125248. Throughput: 0: 969.2. Samples: 780408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:16,093][02698] Avg episode reward: [(0, '25.022')] +[2025-08-01 17:42:21,088][02698] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3141632. Throughput: 0: 962.0. Samples: 784906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:21,093][02698] Avg episode reward: [(0, '24.697')] +[2025-08-01 17:42:23,276][02852] Updated weights for policy 0, policy_version 770 (0.0019) +[2025-08-01 17:42:26,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3166208. Throughput: 0: 964.8. Samples: 791604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:26,089][02698] Avg episode reward: [(0, '24.642')] +[2025-08-01 17:42:31,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3832.2). Total num frames: 3182592. Throughput: 0: 968.3. Samples: 795014. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:31,098][02698] Avg episode reward: [(0, '23.410')] +[2025-08-01 17:42:34,486][02852] Updated weights for policy 0, policy_version 780 (0.0033) +[2025-08-01 17:42:36,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3198976. Throughput: 0: 971.4. Samples: 799616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:36,093][02698] Avg episode reward: [(0, '23.647')] +[2025-08-01 17:42:41,088][02698] Fps is (10 sec: 4097.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3223552. Throughput: 0: 968.8. Samples: 806336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:41,093][02698] Avg episode reward: [(0, '23.133')] +[2025-08-01 17:42:43,737][02852] Updated weights for policy 0, policy_version 790 (0.0025) +[2025-08-01 17:42:46,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 3239936. Throughput: 0: 962.7. Samples: 809394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-08-01 17:42:46,090][02698] Avg episode reward: [(0, '22.424')] +[2025-08-01 17:42:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3260416. Throughput: 0: 967.8. Samples: 814138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:42:51,090][02698] Avg episode reward: [(0, '22.679')] +[2025-08-01 17:42:54,697][02852] Updated weights for policy 0, policy_version 800 (0.0012) +[2025-08-01 17:42:56,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3280896. Throughput: 0: 970.2. Samples: 820870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:42:56,094][02698] Avg episode reward: [(0, '23.330')] +[2025-08-01 17:43:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3818.3). Total num frames: 3297280. Throughput: 0: 961.5. Samples: 823676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:43:01,093][02698] Avg episode reward: [(0, '23.242')] +[2025-08-01 17:43:05,642][02852] Updated weights for policy 0, policy_version 810 (0.0036) +[2025-08-01 17:43:06,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3317760. Throughput: 0: 978.1. Samples: 828922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:43:06,089][02698] Avg episode reward: [(0, '22.768')] +[2025-08-01 17:43:11,090][02698] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3846.0). Total num frames: 3338240. Throughput: 0: 976.3. Samples: 835542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:11,096][02698] Avg episode reward: [(0, '23.608')] +[2025-08-01 17:43:16,095][02698] Fps is (10 sec: 3684.0, 60 sec: 3822.5, 300 sec: 3832.1). Total num frames: 3354624. Throughput: 0: 956.0. Samples: 838038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:43:16,099][02698] Avg episode reward: [(0, '23.432')] +[2025-08-01 17:43:16,960][02852] Updated weights for policy 0, policy_version 820 (0.0018) +[2025-08-01 17:43:21,088][02698] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3375104. Throughput: 0: 970.8. Samples: 843302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:21,093][02698] Avg episode reward: [(0, '23.359')] +[2025-08-01 17:43:26,079][02852] Updated weights for policy 0, policy_version 830 (0.0014) +[2025-08-01 17:43:26,088][02698] Fps is (10 sec: 4508.6, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3399680. Throughput: 0: 971.8. 
Samples: 850066. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:26,090][02698] Avg episode reward: [(0, '22.985')] +[2025-08-01 17:43:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3832.2). Total num frames: 3411968. Throughput: 0: 952.8. Samples: 852270. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:43:31,094][02698] Avg episode reward: [(0, '23.403')] +[2025-08-01 17:43:36,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3432448. Throughput: 0: 970.0. Samples: 857786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:43:36,108][02698] Avg episode reward: [(0, '24.934')] +[2025-08-01 17:43:38,419][02852] Updated weights for policy 0, policy_version 840 (0.0020) +[2025-08-01 17:43:41,088][02698] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3444736. Throughput: 0: 919.0. Samples: 862226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:43:41,089][02698] Avg episode reward: [(0, '24.175')] +[2025-08-01 17:43:46,088][02698] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 3461120. Throughput: 0: 901.1. Samples: 864224. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:43:46,093][02698] Avg episode reward: [(0, '24.371')] +[2025-08-01 17:43:50,617][02852] Updated weights for policy 0, policy_version 850 (0.0026) +[2025-08-01 17:43:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3481600. Throughput: 0: 914.0. Samples: 870050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:43:51,093][02698] Avg episode reward: [(0, '24.411')] +[2025-08-01 17:43:56,092][02698] Fps is (10 sec: 4094.2, 60 sec: 3686.1, 300 sec: 3818.4). Total num frames: 3502080. Throughput: 0: 912.0. Samples: 876584. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:43:56,094][02698] Avg episode reward: [(0, '25.077')] +[2025-08-01 17:44:01,088][02698] Fps is (10 sec: 3686.2, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3518464. Throughput: 0: 900.7. Samples: 878564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:01,092][02698] Avg episode reward: [(0, '24.845')] +[2025-08-01 17:44:01,685][02852] Updated weights for policy 0, policy_version 860 (0.0023) +[2025-08-01 17:44:06,088][02698] Fps is (10 sec: 3688.1, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3538944. Throughput: 0: 915.7. Samples: 884510. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:44:06,090][02698] Avg episode reward: [(0, '25.047')] +[2025-08-01 17:44:06,101][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000864_3538944.pth... +[2025-08-01 17:44:06,236][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000641_2625536.pth +[2025-08-01 17:44:11,092][02698] Fps is (10 sec: 4094.6, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 3559424. Throughput: 0: 903.1. Samples: 890710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:44:11,096][02698] Avg episode reward: [(0, '26.152')] +[2025-08-01 17:44:11,099][02835] Saving new best policy, reward=26.152! +[2025-08-01 17:44:11,927][02852] Updated weights for policy 0, policy_version 870 (0.0024) +[2025-08-01 17:44:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3686.8, 300 sec: 3804.4). Total num frames: 3575808. Throughput: 0: 896.2. Samples: 892598. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:16,092][02698] Avg episode reward: [(0, '25.547')] +[2025-08-01 17:44:21,088][02698] Fps is (10 sec: 3687.8, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3596288. Throughput: 0: 909.5. Samples: 898712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-08-01 17:44:21,092][02698] Avg episode reward: [(0, '26.016')] +[2025-08-01 17:44:22,450][02852] Updated weights for policy 0, policy_version 880 (0.0016) +[2025-08-01 17:44:26,093][02698] Fps is (10 sec: 4093.9, 60 sec: 3617.8, 300 sec: 3804.4). Total num frames: 3616768. Throughput: 0: 946.0. Samples: 904802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:26,097][02698] Avg episode reward: [(0, '26.250')] +[2025-08-01 17:44:26,104][02835] Saving new best policy, reward=26.250! +[2025-08-01 17:44:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 3633152. Throughput: 0: 944.6. Samples: 906732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:31,090][02698] Avg episode reward: [(0, '26.601')] +[2025-08-01 17:44:31,092][02835] Saving new best policy, reward=26.601! +[2025-08-01 17:44:33,699][02852] Updated weights for policy 0, policy_version 890 (0.0017) +[2025-08-01 17:44:36,088][02698] Fps is (10 sec: 3688.3, 60 sec: 3686.4, 300 sec: 3818.3). Total num frames: 3653632. Throughput: 0: 954.8. Samples: 913014. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:36,090][02698] Avg episode reward: [(0, '26.049')] +[2025-08-01 17:44:41,088][02698] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3674112. Throughput: 0: 943.7. Samples: 919048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:44:41,089][02698] Avg episode reward: [(0, '24.543')] +[2025-08-01 17:44:44,778][02852] Updated weights for policy 0, policy_version 900 (0.0022) +[2025-08-01 17:44:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 3690496. Throughput: 0: 943.2. Samples: 921008. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:46,089][02698] Avg episode reward: [(0, '25.079')] +[2025-08-01 17:44:51,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3710976. Throughput: 0: 956.9. Samples: 927570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:51,092][02698] Avg episode reward: [(0, '25.591')] +[2025-08-01 17:44:54,208][02852] Updated weights for policy 0, policy_version 910 (0.0027) +[2025-08-01 17:44:56,090][02698] Fps is (10 sec: 4095.1, 60 sec: 3823.1, 300 sec: 3804.4). Total num frames: 3731456. Throughput: 0: 944.9. Samples: 933228. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:44:56,094][02698] Avg episode reward: [(0, '24.867')] +[2025-08-01 17:45:01,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3804.5). Total num frames: 3747840. Throughput: 0: 948.0. Samples: 935258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:45:01,092][02698] Avg episode reward: [(0, '26.617')] +[2025-08-01 17:45:01,095][02835] Saving new best policy, reward=26.617! +[2025-08-01 17:45:05,458][02852] Updated weights for policy 0, policy_version 920 (0.0026) +[2025-08-01 17:45:06,088][02698] Fps is (10 sec: 3687.2, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3768320. Throughput: 0: 956.0. Samples: 941732. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:06,092][02698] Avg episode reward: [(0, '26.247')] +[2025-08-01 17:45:11,093][02698] Fps is (10 sec: 3684.5, 60 sec: 3754.6, 300 sec: 3818.2). Total num frames: 3784704. Throughput: 0: 945.4. Samples: 947346. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-08-01 17:45:11,099][02698] Avg episode reward: [(0, '26.608')] +[2025-08-01 17:45:16,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3805184. Throughput: 0: 952.1. Samples: 949576. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:45:16,092][02698] Avg episode reward: [(0, '26.507')] +[2025-08-01 17:45:16,636][02852] Updated weights for policy 0, policy_version 930 (0.0027) +[2025-08-01 17:45:21,088][02698] Fps is (10 sec: 4098.1, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3825664. Throughput: 0: 961.6. Samples: 956288. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:21,089][02698] Avg episode reward: [(0, '26.349')] +[2025-08-01 17:45:26,089][02698] Fps is (10 sec: 4095.4, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 3846144. Throughput: 0: 949.2. Samples: 961764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:45:26,091][02698] Avg episode reward: [(0, '25.475')] +[2025-08-01 17:45:27,659][02852] Updated weights for policy 0, policy_version 940 (0.0017) +[2025-08-01 17:45:31,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3862528. Throughput: 0: 962.8. Samples: 964332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-08-01 17:45:31,092][02698] Avg episode reward: [(0, '24.982')] +[2025-08-01 17:45:36,088][02698] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3887104. Throughput: 0: 969.5. Samples: 971198. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:36,092][02698] Avg episode reward: [(0, '26.190')] +[2025-08-01 17:45:36,653][02852] Updated weights for policy 0, policy_version 950 (0.0019) +[2025-08-01 17:45:41,089][02698] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3903488. Throughput: 0: 956.7. Samples: 976276. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:41,098][02698] Avg episode reward: [(0, '26.063')] +[2025-08-01 17:45:46,088][02698] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3923968. Throughput: 0: 975.9. Samples: 979172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-08-01 17:45:46,093][02698] Avg episode reward: [(0, '25.614')] +[2025-08-01 17:45:47,635][02852] Updated weights for policy 0, policy_version 960 (0.0025) +[2025-08-01 17:45:51,088][02698] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3944448. Throughput: 0: 984.2. Samples: 986020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:45:51,089][02698] Avg episode reward: [(0, '25.870')] +[2025-08-01 17:45:56,089][02698] Fps is (10 sec: 3685.9, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 3960832. Throughput: 0: 966.3. Samples: 990826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-08-01 17:45:56,091][02698] Avg episode reward: [(0, '25.931')] +[2025-08-01 17:45:58,586][02852] Updated weights for policy 0, policy_version 970 (0.0029) +[2025-08-01 17:46:01,088][02698] Fps is (10 sec: 3686.2, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3981312. Throughput: 0: 986.9. Samples: 993988. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:46:01,090][02698] Avg episode reward: [(0, '25.352')] +[2025-08-01 17:46:06,091][02698] Fps is (10 sec: 4095.4, 60 sec: 3891.0, 300 sec: 3832.2). Total num frames: 4001792. Throughput: 0: 984.3. Samples: 1000586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-08-01 17:46:06,102][02698] Avg episode reward: [(0, '23.697')] +[2025-08-01 17:46:06,119][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000977_4001792.pth... +[2025-08-01 17:46:06,448][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000753_3084288.pth +[2025-08-01 17:46:08,589][02835] Stopping Batcher_0... +[2025-08-01 17:46:08,590][02835] Loop batcher_evt_loop terminating... +[2025-08-01 17:46:08,590][02698] Component Batcher_0 stopped! +[2025-08-01 17:46:08,597][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:08,717][02852] Weights refcount: 2 0 +[2025-08-01 17:46:08,733][02852] Stopping InferenceWorker_p0-w0... +[2025-08-01 17:46:08,734][02852] Loop inference_proc0-0_evt_loop terminating... +[2025-08-01 17:46:08,739][02698] Component InferenceWorker_p0-w0 stopped! +[2025-08-01 17:46:08,811][02835] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000864_3538944.pth +[2025-08-01 17:46:08,845][02835] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:09,055][02854] Stopping RolloutWorker_w1... +[2025-08-01 17:46:09,055][02698] Component RolloutWorker_w1 stopped! +[2025-08-01 17:46:09,060][02854] Loop rollout_proc1_evt_loop terminating... +[2025-08-01 17:46:09,067][02698] Component RolloutWorker_w7 stopped! +[2025-08-01 17:46:09,072][02860] Stopping RolloutWorker_w7... +[2025-08-01 17:46:09,073][02860] Loop rollout_proc7_evt_loop terminating... +[2025-08-01 17:46:09,082][02698] Component RolloutWorker_w5 stopped! +[2025-08-01 17:46:09,085][02858] Stopping RolloutWorker_w5... +[2025-08-01 17:46:09,096][02698] Component RolloutWorker_w3 stopped! +[2025-08-01 17:46:09,100][02856] Stopping RolloutWorker_w3... +[2025-08-01 17:46:09,092][02858] Loop rollout_proc5_evt_loop terminating... +[2025-08-01 17:46:09,101][02856] Loop rollout_proc3_evt_loop terminating... +[2025-08-01 17:46:09,149][02835] Stopping LearnerWorker_p0... +[2025-08-01 17:46:09,149][02835] Loop learner_proc0_evt_loop terminating... +[2025-08-01 17:46:09,149][02698] Component LearnerWorker_p0 stopped! +[2025-08-01 17:46:09,451][02857] Stopping RolloutWorker_w4... +[2025-08-01 17:46:09,452][02857] Loop rollout_proc4_evt_loop terminating... +[2025-08-01 17:46:09,452][02698] Component RolloutWorker_w4 stopped! +[2025-08-01 17:46:09,479][02698] Component RolloutWorker_w0 stopped! +[2025-08-01 17:46:09,473][02853] Stopping RolloutWorker_w0... +[2025-08-01 17:46:09,485][02859] Stopping RolloutWorker_w6... +[2025-08-01 17:46:09,485][02859] Loop rollout_proc6_evt_loop terminating... +[2025-08-01 17:46:09,488][02853] Loop rollout_proc0_evt_loop terminating... +[2025-08-01 17:46:09,487][02698] Component RolloutWorker_w6 stopped! +[2025-08-01 17:46:09,499][02855] Stopping RolloutWorker_w2... +[2025-08-01 17:46:09,500][02855] Loop rollout_proc2_evt_loop terminating... +[2025-08-01 17:46:09,503][02698] Component RolloutWorker_w2 stopped! +[2025-08-01 17:46:09,515][02698] Waiting for process learner_proc0 to stop... 
+[2025-08-01 17:46:11,423][02698] Waiting for process inference_proc0-0 to join... +[2025-08-01 17:46:11,425][02698] Waiting for process rollout_proc0 to join... +[2025-08-01 17:46:13,790][02698] Waiting for process rollout_proc1 to join... +[2025-08-01 17:46:13,791][02698] Waiting for process rollout_proc2 to join... +[2025-08-01 17:46:13,793][02698] Waiting for process rollout_proc3 to join... +[2025-08-01 17:46:13,795][02698] Waiting for process rollout_proc4 to join... +[2025-08-01 17:46:13,798][02698] Waiting for process rollout_proc5 to join... +[2025-08-01 17:46:13,799][02698] Waiting for process rollout_proc6 to join... +[2025-08-01 17:46:13,801][02698] Waiting for process rollout_proc7 to join... +[2025-08-01 17:46:13,803][02698] Batcher 0 profile tree view: +batching: 26.8052, releasing_batches: 0.0308 +[2025-08-01 17:46:13,804][02698] InferenceWorker_p0-w0 profile tree view: +wait_policy: 0.0190 + wait_policy_total: 426.3653 +update_model: 8.8217 + weight_update: 0.0017 +one_step: 0.0034 + handle_policy_step: 580.7459 + deserialize: 14.2973, stack: 3.1013, obs_to_device_normalize: 121.9128, forward: 302.5888, send_messages: 27.5275 + prepare_outputs: 86.9119 + to_cpu: 52.2295 +[2025-08-01 17:46:13,806][02698] Learner 0 profile tree view: +misc: 0.0038, prepare_batch: 12.3597 +train: 73.3198 + epoch_init: 0.0065, minibatch_init: 0.0073, losses_postprocess: 0.6659, kl_divergence: 0.7058, after_optimizer: 33.3045 + calculate_losses: 26.0606 + losses_init: 0.0129, forward_head: 1.4762, bptt_initial: 17.0544, tail: 1.2080, advantages_returns: 0.2881, losses: 3.5637 + bptt: 2.1475 + bptt_forward_core: 2.0454 + update: 11.9187 + clip: 0.9734 +[2025-08-01 17:46:13,808][02698] RolloutWorker_w0 profile tree view: +wait_for_trajectories: 0.2858, enqueue_policy_requests: 105.0386, env_step: 825.2810, overhead: 13.8404, complete_rollouts: 7.4670 +save_policy_outputs: 19.5860 + split_output_tensors: 7.4430 +[2025-08-01 17:46:13,809][02698] RolloutWorker_w7 profile tree view: +wait_for_trajectories: 0.3324, enqueue_policy_requests: 112.0578, env_step: 818.5043, overhead: 13.9942, complete_rollouts: 6.5757 +save_policy_outputs: 19.7271 + split_output_tensors: 7.4854 +[2025-08-01 17:46:13,811][02698] Loop Runner_EvtLoop terminating... +[2025-08-01 17:46:13,813][02698] Runner profile tree view: +main_loop: 1081.9796 +[2025-08-01 17:46:13,814][02698] Collected {0: 4005888}, FPS: 3702.4 +[2025-08-01 17:46:14,115][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-01 17:46:14,116][02698] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-01 17:46:14,118][02698] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-01 17:46:14,119][02698] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-01 17:46:14,120][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:46:14,122][02698] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-01 17:46:14,123][02698] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:46:14,125][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-01 17:46:14,126][02698] Adding new argument 'push_to_hub'=False that is not in the saved config file! 
+[2025-08-01 17:46:14,127][02698] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-01 17:46:14,128][02698] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-01 17:46:14,129][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-01 17:46:14,130][02698] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-01 17:46:14,131][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-01 17:46:14,133][02698] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-01 17:46:14,163][02698] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-01 17:46:14,166][02698] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:46:14,168][02698] RunningMeanStd input shape: (1,) +[2025-08-01 17:46:14,182][02698] ConvEncoder: input_channels=3 +[2025-08-01 17:46:14,278][02698] Conv encoder output size: 512 +[2025-08-01 17:46:14,279][02698] Policy head output size: 512 +[2025-08-01 17:46:14,447][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:14,449][02698] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:14,453][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:14,454][02698] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. 
This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:14,456][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:14,457][02698] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:46,985][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-01 17:46:46,986][02698] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-01 17:46:46,988][02698] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-01 17:46:46,988][02698] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-01 17:46:46,989][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:46:46,990][02698] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-01 17:46:46,991][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
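(Editorial aside, not part of the original log.) Every "Could not load from checkpoint" failure in this log has the same cause, spelled out in the traceback: PyTorch 2.6 changed the default of `torch.load` to `weights_only=True`, and this checkpoint pickles `numpy.core.multiarray.scalar`, which is not on the default allowlist. The error text itself offers two remedies. Below is a minimal, hypothetical helper illustrating both; it is not part of Sample Factory or of this log, and it is only appropriate because the checkpoint was produced by this same training run (i.e. a trusted source).

import numpy as np
import torch
import torch.serialization

CKPT = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"

def load_trusted_checkpoint(path, device="cpu"):
    # Option (2) from the error message: allowlist the offending numpy global
    # and keep the safer weights_only=True behaviour.
    try:
        torch.serialization.add_safe_globals([np.core.multiarray.scalar])
        return torch.load(path, map_location=device, weights_only=True)
    except Exception:
        # Option (1): fall back to the pre-2.6 behaviour. This can execute
        # arbitrary pickled code, so use it only for checkpoints you created.
        return torch.load(path, map_location=device, weights_only=False)

# checkpoint_dict = load_trusted_checkpoint(CKPT, device="cuda:0")

The same two-line workaround could equally be applied at the call site named in the traceback (sample_factory/algo/learning/learner.py, line 281), since that is where `torch.load` is invoked on the checkpoint.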
+[2025-08-01 17:46:46,992][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-01 17:46:46,993][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-01 17:46:46,994][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-01 17:46:46,995][02698] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-01 17:46:46,995][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-01 17:46:46,996][02698] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-01 17:46:46,997][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-01 17:46:46,998][02698] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-01 17:46:47,029][02698] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:46:47,030][02698] RunningMeanStd input shape: (1,) +[2025-08-01 17:46:47,041][02698] ConvEncoder: input_channels=3 +[2025-08-01 17:46:47,078][02698] Conv encoder output size: 512 +[2025-08-01 17:46:47,079][02698] Policy head output size: 512 +[2025-08-01 17:46:47,099][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:47,100][02698] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:47,102][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... 
+[2025-08-01 17:46:47,103][02698] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:46:47,104][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:46:47,106][02698] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
+[2025-08-01 17:47:45,094][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json +[2025-08-01 17:47:45,095][02698] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-01 17:47:45,096][02698] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-01 17:47:45,097][02698] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-01 17:47:45,098][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:47:45,099][02698] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-01 17:47:45,100][02698] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-01 17:47:45,101][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-01 17:47:45,102][02698] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-01 17:47:45,103][02698] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-01 17:47:45,104][02698] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-01 17:47:45,105][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-01 17:47:45,106][02698] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-01 17:47:45,107][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-01 17:47:45,108][02698] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-01 17:47:45,153][02698] RunningMeanStd input shape: (3, 72, 128) +[2025-08-01 17:47:45,157][02698] RunningMeanStd input shape: (1,) +[2025-08-01 17:47:45,175][02698] ConvEncoder: input_channels=3 +[2025-08-01 17:47:45,227][02698] Conv encoder output size: 512 +[2025-08-01 17:47:45,228][02698] Policy head output size: 512 +[2025-08-01 17:47:45,253][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:47:45,255][02698] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:47:45,256][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:47:45,257][02698] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-01 17:47:45,259][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-01 17:47:45,260][02698] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:47:48,908][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-01 17:47:48,910][02698] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-01 17:47:48,911][02698] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-01 17:47:48,913][02698] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-01 17:47:48,914][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-01 17:47:48,914][02698] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-01 17:47:48,915][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-01 17:47:48,917][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-01 17:47:48,919][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-01 17:47:48,921][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-01 17:47:48,923][02698] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-01 17:47:48,925][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-01 17:47:48,925][02698] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-01 17:47:48,926][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-01 17:47:48,927][02698] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-01 17:47:48,952][02698] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:47:48,954][02698] RunningMeanStd input shape: (1,)
+[2025-08-01 17:47:48,965][02698] ConvEncoder: input_channels=3
+[2025-08-01 17:47:48,999][02698] Conv encoder output size: 512
+[2025-08-01 17:47:49,000][02698] Policy head output size: 512
+[2025-08-01 17:47:49,020][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:47:49,021][02698] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:47:49,022][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:47:49,024][02698] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:47:49,025][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:47:49,027][02698] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:31,235][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-01 17:48:31,236][02698] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-01 17:48:31,237][02698] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-01 17:48:31,238][02698] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-01 17:48:31,239][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-01 17:48:31,241][02698] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-01 17:48:31,242][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-01 17:48:31,242][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-01 17:48:31,245][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-01 17:48:31,246][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-01 17:48:31,248][02698] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-01 17:48:31,249][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-01 17:48:31,251][02698] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-01 17:48:31,253][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-01 17:48:31,253][02698] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-01 17:48:31,307][02698] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:48:31,310][02698] RunningMeanStd input shape: (1,)
+[2025-08-01 17:48:31,326][02698] ConvEncoder: input_channels=3
+[2025-08-01 17:48:31,381][02698] Conv encoder output size: 512
+[2025-08-01 17:48:31,383][02698] Policy head output size: 512
+[2025-08-01 17:48:31,411][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:31,413][02698] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:31,415][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:31,417][02698] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:31,419][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:31,421][02698] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:32,927][02698] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-08-01 17:48:32,928][02698] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-01 17:48:32,930][02698] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-01 17:48:32,931][02698] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-08-01 17:48:32,932][02698] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-08-01 17:48:32,933][02698] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-08-01 17:48:32,934][02698] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-08-01 17:48:32,935][02698] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-08-01 17:48:32,936][02698] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-08-01 17:48:32,937][02698] Adding new argument 'hf_repository'='TayJen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-08-01 17:48:32,938][02698] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-08-01 17:48:32,939][02698] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-08-01 17:48:32,940][02698] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-08-01 17:48:32,941][02698] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-08-01 17:48:32,942][02698] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-08-01 17:48:32,968][02698] RunningMeanStd input shape: (3, 72, 128)
+[2025-08-01 17:48:32,970][02698] RunningMeanStd input shape: (1,)
+[2025-08-01 17:48:32,980][02698] ConvEncoder: input_channels=3
+[2025-08-01 17:48:33,012][02698] Conv encoder output size: 512
+[2025-08-01 17:48:33,013][02698] Policy head output size: 512
+[2025-08-01 17:48:33,032][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:33,033][02698] Could not load from checkpoint, attempt 0
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:33,035][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:33,036][02698] Could not load from checkpoint, attempt 1
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+[2025-08-01 17:48:33,037][02698] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-08-01 17:48:33,039][02698] Could not load from checkpoint, attempt 2
+Traceback (most recent call last):
+  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
+    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
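Note on the repeated failures above (not part of the original log output): every attempt fails for the same reason. PyTorch 2.6 changed the default of `weights_only` in `torch.load` to `True`, and this Sample Factory checkpoint pickles `numpy.core.multiarray.scalar`, which is not on the default allowlist. A minimal sketch of the workaround the error message itself suggests, assuming the checkpoint is trusted and the snippet runs in the same Python process before the enjoy/evaluation script calls `load_checkpoint`; the variable name `checkpoint_path` is illustrative, while the path and the `scalar` global are taken from the log:

    # Sketch only. Allowlist the exact global named in the WeightsUnpickler error,
    # then let Sample Factory retry the load. Do this only for checkpoints you
    # trust, e.g. ones you trained yourself.
    import torch
    import torch.serialization
    from numpy.core.multiarray import scalar  # import path mirrors the error; may warn on NumPy 2.x

    torch.serialization.add_safe_globals([scalar])

    # Or scope the allowlist to a single call with the context manager:
    checkpoint_path = "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"
    with torch.serialization.safe_globals([scalar]):
        checkpoint = torch.load(checkpoint_path, map_location="cpu")

For fully trusted files, passing `weights_only=False` to `torch.load` restores the pre-2.6 behaviour instead, at the cost of allowing arbitrary code execution during unpickling, which is the trade-off the traceback warns about.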