[2025-02-11 22:42:42,756][00403] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-11 22:42:42,760][00403] Rollout worker 0 uses device cpu
[2025-02-11 22:42:42,762][00403] Rollout worker 1 uses device cpu
[2025-02-11 22:42:42,764][00403] Rollout worker 2 uses device cpu
[2025-02-11 22:42:42,766][00403] Rollout worker 3 uses device cpu
[2025-02-11 22:42:42,767][00403] Rollout worker 4 uses device cpu
[2025-02-11 22:42:42,769][00403] Rollout worker 5 uses device cpu
[2025-02-11 22:42:42,770][00403] Rollout worker 6 uses device cpu
[2025-02-11 22:42:42,771][00403] Rollout worker 7 uses device cpu
[2025-02-11 22:42:42,957][00403] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 22:42:42,961][00403] InferenceWorker_p0-w0: min num requests: 2
[2025-02-11 22:42:43,005][00403] Starting all processes...
[2025-02-11 22:42:43,009][00403] Starting process learner_proc0
[2025-02-11 22:42:43,094][00403] Starting all processes...
[2025-02-11 22:42:43,102][00403] Starting process inference_proc0-0
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc0
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc1
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc2
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc3
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc4
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc5
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc6
[2025-02-11 22:42:43,103][00403] Starting process rollout_proc7
[2025-02-11 22:42:59,118][02607] Worker 1 uses CPU cores [1]
[2025-02-11 22:42:59,112][02608] Worker 2 uses CPU cores [0]
[2025-02-11 22:42:59,127][02612] Worker 6 uses CPU cores [0]
[2025-02-11 22:42:59,157][02609] Worker 3 uses CPU cores [1]
[2025-02-11 22:42:59,279][02592] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 22:42:59,280][02592] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-11 22:42:59,322][02606] Worker 0 uses CPU cores [0]
[2025-02-11 22:42:59,331][02592] Num visible devices: 1
[2025-02-11 22:42:59,367][02592] Starting seed is not provided
[2025-02-11 22:42:59,368][02592] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 22:42:59,368][02592] Initializing actor-critic model on device cuda:0
[2025-02-11 22:42:59,369][02592] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 22:42:59,372][02592] RunningMeanStd input shape: (1,)
[2025-02-11 22:42:59,403][02613] Worker 7 uses CPU cores [1]
[2025-02-11 22:42:59,415][02605] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 22:42:59,415][02605] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-11 22:42:59,417][02611] Worker 5 uses CPU cores [1]
[2025-02-11 22:42:59,418][02592] ConvEncoder: input_channels=3
[2025-02-11 22:42:59,447][02605] Num visible devices: 1
[2025-02-11 22:42:59,449][02610] Worker 4 uses CPU cores [0]
[2025-02-11 22:42:59,679][02592] Conv encoder output size: 512
[2025-02-11 22:42:59,679][02592] Policy head output size: 512
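The two "RunningMeanStd input shape" entries above are the learner building running-statistics normalizers for the (3, 72, 128) image observations and for the scalar returns. A minimal sketch of the idea, assuming NumPy; the class name and the parallel mean/variance update rule are the standard construction, not Sample Factory's exact implementation:

```python
import numpy as np

class RunningMeanStd:
    """Tracks running mean/variance so inputs can be normalized online."""

    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, dtype=np.float64)
        self.var = np.ones(shape, dtype=np.float64)
        self.count = eps  # avoids division by zero before the first update

    def update(self, batch):
        # Chan et al. parallel mean/variance combination over a batch.
        batch_mean = batch.mean(axis=0)
        batch_var = batch.var(axis=0)
        batch_count = batch.shape[0]

        delta = batch_mean - self.mean
        total = self.count + batch_count
        new_mean = self.mean + delta * batch_count / total
        m_a = self.var * self.count
        m_b = batch_var * batch_count
        m2 = m_a + m_b + delta**2 * self.count * batch_count / total
        self.mean, self.var, self.count = new_mean, m2 / total, total

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)
```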
[2025-02-11 22:42:59,729][02592] Created Actor Critic model with architecture:
[2025-02-11 22:42:59,729][02592] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
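The module tree printed above is a shared-weights actor-critic: a three-conv encoder flattened into a 512-unit MLP, a GRU core, and separate linear heads for the value and a 5-way action distribution. A rough PyTorch equivalent, inferred from the dump; the kernel sizes and strides are assumptions (the log prints only module names), chosen so a 3x72x128 observation maps to the logged 512-d encoder output:

```python
import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    """Shared-weights actor-critic matching the logged module tree."""

    def __init__(self, num_actions=5, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            nn.Flatten(),
            nn.Linear(128 * 3 * 6, hidden), nn.ELU(),  # 3x6 is the conv output for 72x128 input
        )
        self.core = nn.GRU(hidden, hidden)          # "ModelCoreRNN" in the dump
        self.critic_linear = nn.Linear(hidden, 1)   # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)  # action logits

    def forward(self, obs, rnn_state):
        x = self.encoder(obs)                       # (batch, 512)
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

# Shape check against the log: 5 action logits and 1 value per observation.
model = ActorCriticSketch()
obs = torch.zeros(4, 3, 72, 128)
state = torch.zeros(1, 4, 512)
logits, value, state = model(obs, state)
assert logits.shape == (4, 5) and value.shape == (4, 1)
```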
[2025-02-11 22:42:59,955][02592] Using optimizer
[2025-02-11 22:43:02,946][00403] Heartbeat connected on Batcher_0
[2025-02-11 22:43:02,958][00403] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-11 22:43:02,971][00403] Heartbeat connected on RolloutWorker_w0
[2025-02-11 22:43:02,978][00403] Heartbeat connected on RolloutWorker_w1
[2025-02-11 22:43:02,982][00403] Heartbeat connected on RolloutWorker_w2
[2025-02-11 22:43:02,987][00403] Heartbeat connected on RolloutWorker_w3
[2025-02-11 22:43:02,993][00403] Heartbeat connected on RolloutWorker_w4
[2025-02-11 22:43:02,997][00403] Heartbeat connected on RolloutWorker_w5
[2025-02-11 22:43:03,000][00403] Heartbeat connected on RolloutWorker_w6
[2025-02-11 22:43:03,005][00403] Heartbeat connected on RolloutWorker_w7
[2025-02-11 22:43:03,934][02592] No checkpoints found
[2025-02-11 22:43:03,934][02592] Did not load from checkpoint, starting from scratch!
[2025-02-11 22:43:03,935][02592] Initialized policy 0 weights for model version 0
[2025-02-11 22:43:03,937][02592] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 22:43:03,945][02592] LearnerWorker_p0 finished initialization!
[2025-02-11 22:43:03,945][00403] Heartbeat connected on LearnerWorker_p0
[2025-02-11 22:43:04,099][02605] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 22:43:04,100][02605] RunningMeanStd input shape: (1,)
[2025-02-11 22:43:04,112][02605] ConvEncoder: input_channels=3
[2025-02-11 22:43:04,215][02605] Conv encoder output size: 512
[2025-02-11 22:43:04,216][02605] Policy head output size: 512
[2025-02-11 22:43:04,251][00403] Inference worker 0-0 is ready!
[2025-02-11 22:43:04,252][00403] All inference workers are ready! Signal rollout workers to start!
[2025-02-11 22:43:04,563][02612] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:04,570][02609] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:04,571][02606] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:04,607][02613] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:04,616][02610] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:04,728][02607] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:04,725][02608] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:04,738][02611] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 22:43:05,954][02608] Decorrelating experience for 0 frames...
[2025-02-11 22:43:05,954][02607] Decorrelating experience for 0 frames...
[2025-02-11 22:43:05,953][02606] Decorrelating experience for 0 frames...
[2025-02-11 22:43:05,996][02611] Decorrelating experience for 0 frames...
[2025-02-11 22:43:06,701][02607] Decorrelating experience for 32 frames...
[2025-02-11 22:43:06,729][02608] Decorrelating experience for 32 frames...
[2025-02-11 22:43:06,731][02606] Decorrelating experience for 32 frames...
[2025-02-11 22:43:06,763][02611] Decorrelating experience for 32 frames...
[2025-02-11 22:43:07,817][02608] Decorrelating experience for 64 frames...
[2025-02-11 22:43:07,963][02607] Decorrelating experience for 64 frames...
[2025-02-11 22:43:08,044][02611] Decorrelating experience for 64 frames...
[2025-02-11 22:43:08,756][02606] Decorrelating experience for 64 frames...
[2025-02-11 22:43:08,797][00403] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-11 22:43:08,914][02607] Decorrelating experience for 96 frames...
[2025-02-11 22:43:09,010][02611] Decorrelating experience for 96 frames...
[2025-02-11 22:43:09,993][02608] Decorrelating experience for 96 frames...
[2025-02-11 22:43:10,107][02606] Decorrelating experience for 96 frames...
[2025-02-11 22:43:13,521][02592] Signal inference workers to stop experience collection...
[2025-02-11 22:43:13,528][02605] InferenceWorker_p0-w0: stopping experience collection
[2025-02-11 22:43:13,789][00403] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 423.0. Samples: 2112. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-11 22:43:13,791][00403] Avg episode reward: [(0, '3.107')]
[2025-02-11 22:43:15,157][02592] Signal inference workers to resume experience collection...
[2025-02-11 22:43:15,159][02605] InferenceWorker_p0-w0: resuming experience collection
[2025-02-11 22:43:18,789][00403] Fps is (10 sec: 2049.5, 60 sec: 2049.5, 300 sec: 2049.5). Total num frames: 20480. Throughput: 0: 322.8. Samples: 3226. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:43:18,794][00403] Avg episode reward: [(0, '3.848')]
[2025-02-11 22:43:23,791][00403] Fps is (10 sec: 3685.8, 60 sec: 2458.6, 300 sec: 2458.6). Total num frames: 36864. Throughput: 0: 619.0. Samples: 9282. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:43:23,795][00403] Avg episode reward: [(0, '4.129')]
[2025-02-11 22:43:24,160][02605] Updated weights for policy 0, policy_version 10 (0.0091)
[2025-02-11 22:43:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 2868.3, 300 sec: 2868.3). Total num frames: 57344. Throughput: 0: 730.0. Samples: 14594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:43:28,794][00403] Avg episode reward: [(0, '4.329')]
[2025-02-11 22:43:33,789][00403] Fps is (10 sec: 4096.6, 60 sec: 3113.9, 300 sec: 3113.9). Total num frames: 77824. Throughput: 0: 713.2. Samples: 17824. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:43:33,795][00403] Avg episode reward: [(0, '4.450')]
[2025-02-11 22:43:34,011][02605] Updated weights for policy 0, policy_version 20 (0.0012)
[2025-02-11 22:43:38,790][00403] Fps is (10 sec: 3686.1, 60 sec: 3141.0, 300 sec: 3141.0). Total num frames: 94208. Throughput: 0: 777.2. Samples: 23312. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:43:38,797][00403] Avg episode reward: [(0, '4.481')]
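Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports frame throughput averaged over three trailing windows, which is why the values read nan until enough history has accumulated. A sketch of windowed FPS from (timestamp, total_frames) snapshots; the class and its API are illustrative, not the reporter Sample Factory actually uses:

```python
from collections import deque
import time

class FpsReporter:
    """Trailing-window FPS from (wall_time, total_env_frames) snapshots."""

    def __init__(self, max_window=300.0):
        self.history = deque()
        self.max_window = max_window

    def record(self, total_frames, now=None):
        now = time.monotonic() if now is None else now
        self.history.append((now, total_frames))
        # Keep just enough history to cover the largest window.
        while now - self.history[0][0] > self.max_window:
            self.history.popleft()

    def fps(self, window):
        if not self.history:
            return float("nan")
        now, frames = self.history[-1]
        past = [(t, f) for t, f in self.history if now - t <= window]
        if len(past) < 2:
            return float("nan")  # matches the nan readings at startup
        t0, f0 = past[0]
        return (frames - f0) / (now - t0)
```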
[2025-02-11 22:43:43,790][00403] Fps is (10 sec: 3686.3, 60 sec: 3277.5, 300 sec: 3277.5). Total num frames: 114688. Throughput: 0: 826.8. Samples: 28932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:43:43,792][00403] Avg episode reward: [(0, '4.475')]
[2025-02-11 22:43:43,799][02592] Saving new best policy, reward=4.475!
[2025-02-11 22:43:45,251][02605] Updated weights for policy 0, policy_version 30 (0.0016)
[2025-02-11 22:43:48,790][00403] Fps is (10 sec: 4096.2, 60 sec: 3379.8, 300 sec: 3379.8). Total num frames: 135168. Throughput: 0: 800.4. Samples: 32010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:43:48,794][00403] Avg episode reward: [(0, '4.290')]
[2025-02-11 22:43:53,790][00403] Fps is (10 sec: 3686.5, 60 sec: 3368.4, 300 sec: 3368.4). Total num frames: 151552. Throughput: 0: 826.1. Samples: 37170. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:43:53,795][00403] Avg episode reward: [(0, '4.299')]
[2025-02-11 22:43:56,263][02605] Updated weights for policy 0, policy_version 40 (0.0016)
[2025-02-11 22:43:58,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3441.2, 300 sec: 3441.2). Total num frames: 172032. Throughput: 0: 919.0. Samples: 43468. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:43:58,792][00403] Avg episode reward: [(0, '4.493')]
[2025-02-11 22:43:58,797][02592] Saving new best policy, reward=4.493!
[2025-02-11 22:44:03,791][00403] Fps is (10 sec: 4095.4, 60 sec: 3500.6, 300 sec: 3500.6). Total num frames: 192512. Throughput: 0: 966.6. Samples: 46726. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:44:03,795][00403] Avg episode reward: [(0, '4.450')]
[2025-02-11 22:44:07,771][02605] Updated weights for policy 0, policy_version 50 (0.0014)
[2025-02-11 22:44:08,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3482.0, 300 sec: 3482.0). Total num frames: 208896. Throughput: 0: 925.1. Samples: 50908. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:44:08,791][00403] Avg episode reward: [(0, '4.475')]
[2025-02-11 22:44:13,789][00403] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3529.3). Total num frames: 229376. Throughput: 0: 946.9. Samples: 57206. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:44:13,796][00403] Avg episode reward: [(0, '4.536')]
[2025-02-11 22:44:13,810][02592] Saving new best policy, reward=4.536!
[2025-02-11 22:44:18,253][02605] Updated weights for policy 0, policy_version 60 (0.0013)
[2025-02-11 22:44:18,790][00403] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3511.2). Total num frames: 245760. Throughput: 0: 945.4. Samples: 60366. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:44:18,795][00403] Avg episode reward: [(0, '4.559')]
[2025-02-11 22:44:18,800][02592] Saving new best policy, reward=4.559!
[2025-02-11 22:44:23,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3550.2). Total num frames: 266240. Throughput: 0: 931.3. Samples: 65218. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:44:23,795][00403] Avg episode reward: [(0, '4.629')]
[2025-02-11 22:44:23,801][02592] Saving new best policy, reward=4.629!
[2025-02-11 22:44:28,548][02605] Updated weights for policy 0, policy_version 70 (0.0013)
[2025-02-11 22:44:28,789][00403] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3584.3). Total num frames: 286720. Throughput: 0: 948.4. Samples: 71610. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:44:28,796][00403] Avg episode reward: [(0, '4.454')]
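"Saving new best policy, reward=..." fires whenever the running average episode reward beats the best seen so far, which is why it triggers often in early training (4.475, 4.493, 4.536, 4.559, 4.629 within a minute). A sketch of that bookkeeping; `save_fn` is a hypothetical callback standing in for the actual checkpoint writer:

```python
import math

class BestPolicyTracker:
    """Save a 'best' checkpoint whenever avg episode reward improves."""

    def __init__(self, save_fn):
        self.best_reward = -math.inf
        self.save_fn = save_fn  # hypothetical: persists the current policy

    def maybe_save(self, avg_episode_reward, policy_version):
        if avg_episode_reward > self.best_reward:
            self.best_reward = avg_episode_reward
            self.save_fn(policy_version)
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")

# tracker = BestPolicyTracker(save_fn=lambda version: ...)
# tracker.maybe_save(4.475, policy_version=28)  # saves
# tracker.maybe_save(4.290, policy_version=33)  # no save: below best
```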
[2025-02-11 22:44:33,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3566.2). Total num frames: 303104. Throughput: 0: 941.7. Samples: 74386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:44:33,792][00403] Avg episode reward: [(0, '4.304')]
[2025-02-11 22:44:33,801][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
[2025-02-11 22:44:38,789][00403] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3504.6). Total num frames: 315392. Throughput: 0: 909.9. Samples: 78114. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:44:38,792][00403] Avg episode reward: [(0, '4.300')]
[2025-02-11 22:44:41,140][02605] Updated weights for policy 0, policy_version 80 (0.0029)
[2025-02-11 22:44:43,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3535.8). Total num frames: 335872. Throughput: 0: 912.1. Samples: 84512. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:44:43,792][00403] Avg episode reward: [(0, '4.396')]
[2025-02-11 22:44:48,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3522.8). Total num frames: 352256. Throughput: 0: 894.7. Samples: 86984. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:44:48,794][00403] Avg episode reward: [(0, '4.330')]
[2025-02-11 22:44:52,084][02605] Updated weights for policy 0, policy_version 90 (0.0012)
[2025-02-11 22:44:53,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3550.1). Total num frames: 372736. Throughput: 0: 926.9. Samples: 92620. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:44:53,793][00403] Avg episode reward: [(0, '4.420')]
[2025-02-11 22:44:58,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3574.9). Total num frames: 393216. Throughput: 0: 932.8. Samples: 99184. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 22:44:58,794][00403] Avg episode reward: [(0, '4.488')]
[2025-02-11 22:45:02,928][02605] Updated weights for policy 0, policy_version 100 (0.0016)
[2025-02-11 22:45:03,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3562.0). Total num frames: 409600. Throughput: 0: 906.9. Samples: 101176. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:45:03,795][00403] Avg episode reward: [(0, '4.415')]
[2025-02-11 22:45:08,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3618.3). Total num frames: 434176. Throughput: 0: 937.6. Samples: 107410. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:45:08,794][00403] Avg episode reward: [(0, '4.490')]
[2025-02-11 22:45:12,806][02605] Updated weights for policy 0, policy_version 110 (0.0016)
[2025-02-11 22:45:13,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3604.7). Total num frames: 450560. Throughput: 0: 922.8. Samples: 113136. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:13,794][00403] Avg episode reward: [(0, '4.547')]
[2025-02-11 22:45:18,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3623.6). Total num frames: 471040. Throughput: 0: 910.5. Samples: 115358. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:18,794][00403] Avg episode reward: [(0, '4.488')]
[2025-02-11 22:45:23,317][02605] Updated weights for policy 0, policy_version 120 (0.0015)
[2025-02-11 22:45:23,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3754.6, 300 sec: 3641.1). Total num frames: 491520. Throughput: 0: 974.3. Samples: 121956. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:23,792][00403] Avg episode reward: [(0, '4.409')]
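The first periodic checkpoint above is checkpoint_000000074_303104.pth: a zero-padded policy version (74) followed by the env-frame count (303104, which is 74 x 4096, and the same ratio holds for every checkpoint in this log). A parser for that naming convention; the regex is written against the exact filenames that appear here:

```python
import re
from pathlib import Path

CKPT_RE = re.compile(r"checkpoint_(\d{9})_(\d+)\.pth$")

def parse_checkpoint(path):
    """Extract (policy_version, env_frames) from a checkpoint filename."""
    m = CKPT_RE.search(Path(path).name)
    if m is None:
        raise ValueError(f"not a checkpoint file: {path}")
    return int(m.group(1)), int(m.group(2))

version, frames = parse_checkpoint(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth"
)
assert (version, frames) == (74, 303104)
assert frames == version * 4096  # holds for every checkpoint in this log
```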
[2025-02-11 22:45:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3628.1). Total num frames: 507904. Throughput: 0: 951.1. Samples: 127312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:28,792][00403] Avg episode reward: [(0, '4.300')]
[2025-02-11 22:45:33,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3644.2). Total num frames: 528384. Throughput: 0: 958.0. Samples: 130094. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:33,797][00403] Avg episode reward: [(0, '4.511')]
[2025-02-11 22:45:34,268][02605] Updated weights for policy 0, policy_version 130 (0.0013)
[2025-02-11 22:45:38,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3659.3). Total num frames: 548864. Throughput: 0: 977.2. Samples: 136594. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:38,798][00403] Avg episode reward: [(0, '4.631')]
[2025-02-11 22:45:38,801][02592] Saving new best policy, reward=4.631!
[2025-02-11 22:45:43,790][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3646.9). Total num frames: 565248. Throughput: 0: 937.5. Samples: 141370. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:45:43,792][00403] Avg episode reward: [(0, '4.592')]
[2025-02-11 22:45:45,436][02605] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-02-11 22:45:48,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3661.0). Total num frames: 585728. Throughput: 0: 964.1. Samples: 144560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:48,797][00403] Avg episode reward: [(0, '4.646')]
[2025-02-11 22:45:48,800][02592] Saving new best policy, reward=4.646!
[2025-02-11 22:45:53,792][00403] Fps is (10 sec: 4094.9, 60 sec: 3891.0, 300 sec: 3674.1). Total num frames: 606208. Throughput: 0: 967.1. Samples: 150932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:45:53,796][00403] Avg episode reward: [(0, '4.569')]
[2025-02-11 22:45:56,064][02605] Updated weights for policy 0, policy_version 150 (0.0018)
[2025-02-11 22:45:58,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3662.5). Total num frames: 622592. Throughput: 0: 949.9. Samples: 155880. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:45:58,796][00403] Avg episode reward: [(0, '4.382')]
[2025-02-11 22:46:03,789][00403] Fps is (10 sec: 3687.4, 60 sec: 3891.2, 300 sec: 3674.9). Total num frames: 643072. Throughput: 0: 971.8. Samples: 159088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:46:03,796][00403] Avg episode reward: [(0, '4.556')]
[2025-02-11 22:46:05,883][02605] Updated weights for policy 0, policy_version 160 (0.0013)
[2025-02-11 22:46:08,791][00403] Fps is (10 sec: 4095.3, 60 sec: 3822.9, 300 sec: 3686.5). Total num frames: 663552. Throughput: 0: 965.2. Samples: 165392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:46:08,796][00403] Avg episode reward: [(0, '4.601')]
[2025-02-11 22:46:13,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3675.5). Total num frames: 679936. Throughput: 0: 952.3. Samples: 170166. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:46:13,795][00403] Avg episode reward: [(0, '4.465')]
[2025-02-11 22:46:17,115][02605] Updated weights for policy 0, policy_version 170 (0.0013)
[2025-02-11 22:46:18,789][00403] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3686.5). Total num frames: 700416. Throughput: 0: 961.6. Samples: 173368. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:46:18,794][00403] Avg episode reward: [(0, '4.386')]
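The recurring "Updated weights for policy 0, policy_version N (...)" entries come from the inference worker pulling fresh learner weights, here once every 10 versions; the parenthesized number appears to be how long the swap took in seconds. A version-gated sync sketch under those assumptions; `fetch_state_dict` is a hypothetical stand-in for the real shared-memory/IPC mechanism:

```python
import time

class WeightSync:
    """Refresh inference-side weights when the learner's version advances."""

    def __init__(self, fetch_state_dict, model, min_version_delta=10):
        self.fetch = fetch_state_dict  # hypothetical: returns (version, state_dict)
        self.model = model
        self.min_delta = min_version_delta
        self.local_version = 0

    def maybe_update(self):
        version, state = self.fetch()
        if version - self.local_version >= self.min_delta:
            t0 = time.monotonic()
            self.model.load_state_dict(state)
            self.local_version = version
            print(f"Updated weights for policy 0, "
                  f"policy_version {version} ({time.monotonic() - t0:.4f})")
```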
[2025-02-11 22:46:23,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3676.0). Total num frames: 716800. Throughput: 0: 950.3. Samples: 179358. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:46:23,791][00403] Avg episode reward: [(0, '4.459')]
[2025-02-11 22:46:28,125][02605] Updated weights for policy 0, policy_version 180 (0.0012)
[2025-02-11 22:46:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3686.5). Total num frames: 737280. Throughput: 0: 964.6. Samples: 184778. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:46:28,794][00403] Avg episode reward: [(0, '4.392')]
[2025-02-11 22:46:33,789][00403] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3716.5). Total num frames: 761856. Throughput: 0: 965.8. Samples: 188020. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:46:33,792][00403] Avg episode reward: [(0, '4.463')]
[2025-02-11 22:46:33,803][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth...
[2025-02-11 22:46:38,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3686.5). Total num frames: 774144. Throughput: 0: 943.8. Samples: 193400. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:46:38,796][00403] Avg episode reward: [(0, '4.473')]
[2025-02-11 22:46:39,077][02605] Updated weights for policy 0, policy_version 190 (0.0021)
[2025-02-11 22:46:43,790][00403] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3696.1). Total num frames: 794624. Throughput: 0: 964.3. Samples: 199274. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:46:43,792][00403] Avg episode reward: [(0, '4.398')]
[2025-02-11 22:46:48,707][02605] Updated weights for policy 0, policy_version 200 (0.0013)
[2025-02-11 22:46:48,790][00403] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3723.8). Total num frames: 819200. Throughput: 0: 965.9. Samples: 202554. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:46:48,791][00403] Avg episode reward: [(0, '4.475')]
[2025-02-11 22:46:53,789][00403] Fps is (10 sec: 4096.1, 60 sec: 3823.1, 300 sec: 3713.8). Total num frames: 835584. Throughput: 0: 935.4. Samples: 207482. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:46:53,796][00403] Avg episode reward: [(0, '4.394')]
[2025-02-11 22:46:58,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3722.1). Total num frames: 856064. Throughput: 0: 975.1. Samples: 214044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 22:46:58,796][00403] Avg episode reward: [(0, '4.454')]
[2025-02-11 22:46:59,581][02605] Updated weights for policy 0, policy_version 210 (0.0013)
[2025-02-11 22:47:03,790][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3712.7). Total num frames: 872448. Throughput: 0: 975.6. Samples: 217270. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:47:03,792][00403] Avg episode reward: [(0, '4.526')]
[2025-02-11 22:47:08,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3720.6). Total num frames: 892928. Throughput: 0: 951.4. Samples: 222172. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 22:47:08,792][00403] Avg episode reward: [(0, '4.610')]
[2025-02-11 22:47:10,464][02605] Updated weights for policy 0, policy_version 220 (0.0018)
[2025-02-11 22:47:13,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3728.3). Total num frames: 913408. Throughput: 0: 972.8. Samples: 228556. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:47:13,797][00403] Avg episode reward: [(0, '4.478')]
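"Policy #0 lag" summarizes, over a training batch, how many versions old the policy that collected each sample is relative to the learner's current version; with weight refreshes every ~10 versions, the per-sample lag in this run stays between 0 and 1. A sketch of the statistic (the units-in-versions reading is an inference from the log, not a quote from the source):

```python
def policy_lag_stats(sample_versions, learner_version):
    """Min/avg/max staleness of the samples in a training batch.

    sample_versions: the policy version that generated each sample; the gap
    to the learner's current version is the 'lag' printed in the log.
    """
    lags = [learner_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# E.g. a batch collected under versions 219-220 while the learner is at 220:
print(policy_lag_stats([219, 220, 220, 220, 219], learner_version=220))
# -> (0, 0.4, 1)
```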
[2025-02-11 22:47:18,791][00403] Fps is (10 sec: 3685.8, 60 sec: 3822.8, 300 sec: 3719.3). Total num frames: 929792. Throughput: 0: 969.8. Samples: 231664. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:47:18,793][00403] Avg episode reward: [(0, '4.545')]
[2025-02-11 22:47:21,378][02605] Updated weights for policy 0, policy_version 230 (0.0013)
[2025-02-11 22:47:23,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3726.7). Total num frames: 950272. Throughput: 0: 964.4. Samples: 236796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:47:23,794][00403] Avg episode reward: [(0, '4.554')]
[2025-02-11 22:47:28,789][00403] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3718.0). Total num frames: 966656. Throughput: 0: 951.5. Samples: 242090. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:47:28,793][00403] Avg episode reward: [(0, '4.664')]
[2025-02-11 22:47:28,799][02592] Saving new best policy, reward=4.664!
[2025-02-11 22:47:33,789][00403] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3694.2). Total num frames: 978944. Throughput: 0: 923.2. Samples: 244096. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:47:33,800][00403] Avg episode reward: [(0, '4.660')]
[2025-02-11 22:47:33,964][02605] Updated weights for policy 0, policy_version 240 (0.0015)
[2025-02-11 22:47:38,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3716.8). Total num frames: 1003520. Throughput: 0: 936.6. Samples: 249630. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:47:38,797][00403] Avg episode reward: [(0, '4.715')]
[2025-02-11 22:47:38,799][02592] Saving new best policy, reward=4.715!
[2025-02-11 22:47:43,792][02605] Updated weights for policy 0, policy_version 250 (0.0015)
[2025-02-11 22:47:43,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3708.8). Total num frames: 1019904. Throughput: 0: 930.8. Samples: 255932. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:47:43,793][00403] Avg episode reward: [(0, '4.569')]
[2025-02-11 22:47:48,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3701.1). Total num frames: 1036288. Throughput: 0: 905.3. Samples: 258008. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:47:48,791][00403] Avg episode reward: [(0, '4.623')]
[2025-02-11 22:47:53,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3722.4). Total num frames: 1060864. Throughput: 0: 930.9. Samples: 264062. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:47:53,796][00403] Avg episode reward: [(0, '4.710')]
[2025-02-11 22:47:54,558][02605] Updated weights for policy 0, policy_version 260 (0.0017)
[2025-02-11 22:47:58,791][00403] Fps is (10 sec: 4095.2, 60 sec: 3686.3, 300 sec: 3714.7). Total num frames: 1077248. Throughput: 0: 923.9. Samples: 270132. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:47:58,796][00403] Avg episode reward: [(0, '4.655')]
[2025-02-11 22:48:03,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3721.2). Total num frames: 1097728. Throughput: 0: 902.7. Samples: 272284. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:03,792][00403] Avg episode reward: [(0, '4.554')]
[2025-02-11 22:48:05,448][02605] Updated weights for policy 0, policy_version 270 (0.0025)
[2025-02-11 22:48:08,789][00403] Fps is (10 sec: 4096.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1118208. Throughput: 0: 932.3. Samples: 278750. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:08,795][00403] Avg episode reward: [(0, '4.694')]
[2025-02-11 22:48:13,793][00403] Fps is (10 sec: 3685.1, 60 sec: 3686.2, 300 sec: 3776.6). Total num frames: 1134592. Throughput: 0: 934.0. Samples: 284124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:48:13,795][00403] Avg episode reward: [(0, '4.838')]
[2025-02-11 22:48:13,808][02592] Saving new best policy, reward=4.838!
[2025-02-11 22:48:16,438][02605] Updated weights for policy 0, policy_version 280 (0.0020)
[2025-02-11 22:48:18,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3790.6). Total num frames: 1155072. Throughput: 0: 948.8. Samples: 286790. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:18,797][00403] Avg episode reward: [(0, '4.954')]
[2025-02-11 22:48:18,799][02592] Saving new best policy, reward=4.954!
[2025-02-11 22:48:23,789][00403] Fps is (10 sec: 4097.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1175552. Throughput: 0: 967.6. Samples: 293172. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:48:23,794][00403] Avg episode reward: [(0, '4.928')]
[2025-02-11 22:48:26,854][02605] Updated weights for policy 0, policy_version 290 (0.0023)
[2025-02-11 22:48:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1191936. Throughput: 0: 938.7. Samples: 298174. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:48:28,796][00403] Avg episode reward: [(0, '4.970')]
[2025-02-11 22:48:28,799][02592] Saving new best policy, reward=4.970!
[2025-02-11 22:48:33,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1212416. Throughput: 0: 963.0. Samples: 301342. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:33,795][00403] Avg episode reward: [(0, '4.733')]
[2025-02-11 22:48:33,803][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth...
[2025-02-11 22:48:33,907][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
[2025-02-11 22:48:37,011][02605] Updated weights for policy 0, policy_version 300 (0.0017)
[2025-02-11 22:48:38,792][00403] Fps is (10 sec: 4095.0, 60 sec: 3822.8, 300 sec: 3790.5). Total num frames: 1232896. Throughput: 0: 970.3. Samples: 307730. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:38,794][00403] Avg episode reward: [(0, '4.804')]
[2025-02-11 22:48:43,790][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1249280. Throughput: 0: 944.4. Samples: 312630. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:43,793][00403] Avg episode reward: [(0, '5.048')]
[2025-02-11 22:48:43,803][02592] Saving new best policy, reward=5.048!
[2025-02-11 22:48:48,034][02605] Updated weights for policy 0, policy_version 310 (0.0016)
[2025-02-11 22:48:48,789][00403] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1269760. Throughput: 0: 966.7. Samples: 315784. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:48,803][00403] Avg episode reward: [(0, '4.962')]
[2025-02-11 22:48:53,789][00403] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1290240. Throughput: 0: 963.8. Samples: 322122. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:53,792][00403] Avg episode reward: [(0, '4.992')]
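The Saving/Removing pair above is checkpoint rotation: once a new periodic checkpoint lands, the oldest one is deleted, and judging by this log only the two most recent survive (296 is saved, 74 is removed, 186 remains). A sketch under that assumption, reusing the zero-padded naming convention parsed earlier:

```python
from pathlib import Path

def rotate_checkpoints(ckpt_dir, keep=2):
    """Delete the oldest periodic checkpoints, keeping the newest `keep`."""
    # Lexicographic order equals version order because versions are zero-padded.
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for old in ckpts[:-keep]:
        print(f"Removing {old}")
        old.unlink()

# rotate_checkpoints("/content/train_dir/default_experiment/checkpoint_p0")
```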
[2025-02-11 22:48:58,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3776.7). Total num frames: 1306624. Throughput: 0: 954.7. Samples: 327084. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:48:58,795][00403] Avg episode reward: [(0, '5.192')]
[2025-02-11 22:48:58,797][02592] Saving new best policy, reward=5.192!
[2025-02-11 22:48:59,230][02605] Updated weights for policy 0, policy_version 320 (0.0017)
[2025-02-11 22:49:03,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1327104. Throughput: 0: 966.1. Samples: 330266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:49:03,793][00403] Avg episode reward: [(0, '5.095')]
[2025-02-11 22:49:08,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1347584. Throughput: 0: 955.4. Samples: 336164. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:49:08,795][00403] Avg episode reward: [(0, '5.109')]
[2025-02-11 22:49:09,968][02605] Updated weights for policy 0, policy_version 330 (0.0015)
[2025-02-11 22:49:13,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3790.5). Total num frames: 1363968. Throughput: 0: 965.0. Samples: 341598. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:49:13,795][00403] Avg episode reward: [(0, '4.896')]
[2025-02-11 22:49:18,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1388544. Throughput: 0: 966.0. Samples: 344810. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:49:18,796][00403] Avg episode reward: [(0, '4.758')]
[2025-02-11 22:49:19,531][02605] Updated weights for policy 0, policy_version 340 (0.0015)
[2025-02-11 22:49:23,790][00403] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1404928. Throughput: 0: 947.1. Samples: 350348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:49:23,796][00403] Avg episode reward: [(0, '4.813')]
[2025-02-11 22:49:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1425408. Throughput: 0: 971.7. Samples: 356356. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:49:28,795][00403] Avg episode reward: [(0, '4.807')]
[2025-02-11 22:49:30,284][02605] Updated weights for policy 0, policy_version 350 (0.0012)
[2025-02-11 22:49:33,792][00403] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3832.2). Total num frames: 1445888. Throughput: 0: 975.0. Samples: 359660. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:49:33,798][00403] Avg episode reward: [(0, '4.543')]
[2025-02-11 22:49:38,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3818.3). Total num frames: 1462272. Throughput: 0: 946.2. Samples: 364700. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:49:38,796][00403] Avg episode reward: [(0, '4.765')]
[2025-02-11 22:49:41,073][02605] Updated weights for policy 0, policy_version 360 (0.0013)
[2025-02-11 22:49:43,789][00403] Fps is (10 sec: 3687.2, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1482752. Throughput: 0: 979.0. Samples: 371140. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:49:43,791][00403] Avg episode reward: [(0, '5.392')]
[2025-02-11 22:49:43,800][02592] Saving new best policy, reward=5.392!
[2025-02-11 22:49:48,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1503232. Throughput: 0: 979.5. Samples: 374342. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:49:48,792][00403] Avg episode reward: [(0, '5.435')]
[2025-02-11 22:49:48,800][02592] Saving new best policy, reward=5.435!
[2025-02-11 22:49:52,306][02605] Updated weights for policy 0, policy_version 370 (0.0012)
[2025-02-11 22:49:53,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1519616. Throughput: 0: 956.6. Samples: 379212. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:49:53,793][00403] Avg episode reward: [(0, '5.237')]
[2025-02-11 22:49:58,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 1544192. Throughput: 0: 980.8. Samples: 385736. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:49:58,797][00403] Avg episode reward: [(0, '5.364')]
[2025-02-11 22:50:02,186][02605] Updated weights for policy 0, policy_version 380 (0.0015)
[2025-02-11 22:50:03,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1556480. Throughput: 0: 975.3. Samples: 388698. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:50:03,792][00403] Avg episode reward: [(0, '5.500')]
[2025-02-11 22:50:03,803][02592] Saving new best policy, reward=5.500!
[2025-02-11 22:50:08,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 1576960. Throughput: 0: 966.0. Samples: 393816. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:50:08,792][00403] Avg episode reward: [(0, '5.808')]
[2025-02-11 22:50:08,868][02592] Saving new best policy, reward=5.808!
[2025-02-11 22:50:12,881][02605] Updated weights for policy 0, policy_version 390 (0.0013)
[2025-02-11 22:50:13,790][00403] Fps is (10 sec: 4505.4, 60 sec: 3959.4, 300 sec: 3832.2). Total num frames: 1601536. Throughput: 0: 971.8. Samples: 400086. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:50:13,792][00403] Avg episode reward: [(0, '6.195')]
[2025-02-11 22:50:13,803][02592] Saving new best policy, reward=6.195!
[2025-02-11 22:50:18,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1613824. Throughput: 0: 948.2. Samples: 402328. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:50:18,791][00403] Avg episode reward: [(0, '6.115')]
[2025-02-11 22:50:23,789][00403] Fps is (10 sec: 2867.3, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1630208. Throughput: 0: 928.8. Samples: 406498. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:50:23,792][00403] Avg episode reward: [(0, '6.407')]
[2025-02-11 22:50:23,811][02592] Saving new best policy, reward=6.407!
[2025-02-11 22:50:25,552][02605] Updated weights for policy 0, policy_version 400 (0.0016)
[2025-02-11 22:50:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1650688. Throughput: 0: 926.8. Samples: 412848. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:50:28,796][00403] Avg episode reward: [(0, '6.149')]
[2025-02-11 22:50:33,790][00403] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3790.5). Total num frames: 1667072. Throughput: 0: 913.1. Samples: 415430. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:50:33,794][00403] Avg episode reward: [(0, '5.820')]
[2025-02-11 22:50:33,803][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000407_1667072.pth...
[2025-02-11 22:50:33,903][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000186_761856.pth
[2025-02-11 22:50:36,593][02605] Updated weights for policy 0, policy_version 410 (0.0013)
[2025-02-11 22:50:38,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1687552. Throughput: 0: 925.8. Samples: 420872. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:50:38,796][00403] Avg episode reward: [(0, '5.435')]
[2025-02-11 22:50:43,791][00403] Fps is (10 sec: 4095.5, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 1708032. Throughput: 0: 920.5. Samples: 427160. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:50:43,795][00403] Avg episode reward: [(0, '6.129')]
[2025-02-11 22:50:47,746][02605] Updated weights for policy 0, policy_version 420 (0.0018)
[2025-02-11 22:50:48,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1724416. Throughput: 0: 902.0. Samples: 429288. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:50:48,797][00403] Avg episode reward: [(0, '6.363')]
[2025-02-11 22:50:53,789][00403] Fps is (10 sec: 3686.9, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1744896. Throughput: 0: 921.6. Samples: 435290. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:50:53,796][00403] Avg episode reward: [(0, '6.577')]
[2025-02-11 22:50:53,804][02592] Saving new best policy, reward=6.577!
[2025-02-11 22:50:57,286][02605] Updated weights for policy 0, policy_version 430 (0.0014)
[2025-02-11 22:50:58,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3804.4). Total num frames: 1765376. Throughput: 0: 916.6. Samples: 441332. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:50:58,798][00403] Avg episode reward: [(0, '5.915')]
[2025-02-11 22:51:03,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.6). Total num frames: 1781760. Throughput: 0: 912.4. Samples: 443386. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:51:03,796][00403] Avg episode reward: [(0, '6.076')]
[2025-02-11 22:51:08,092][02605] Updated weights for policy 0, policy_version 440 (0.0013)
[2025-02-11 22:51:08,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1802240. Throughput: 0: 963.4. Samples: 449852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:51:08,794][00403] Avg episode reward: [(0, '6.334')]
[2025-02-11 22:51:13,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3790.5). Total num frames: 1818624. Throughput: 0: 944.3. Samples: 455342. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:51:13,794][00403] Avg episode reward: [(0, '6.376')]
[2025-02-11 22:51:18,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 1839104. Throughput: 0: 944.2. Samples: 457920. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:51:18,794][00403] Avg episode reward: [(0, '6.399')]
[2025-02-11 22:51:19,200][02605] Updated weights for policy 0, policy_version 450 (0.0015)
[2025-02-11 22:51:23,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1859584. Throughput: 0: 967.8. Samples: 464424. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:51:23,794][00403] Avg episode reward: [(0, '6.371')]
[2025-02-11 22:51:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1875968. Throughput: 0: 941.6. Samples: 469530. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:51:28,791][00403] Avg episode reward: [(0, '6.489')]
[2025-02-11 22:51:30,065][02605] Updated weights for policy 0, policy_version 460 (0.0012)
[2025-02-11 22:51:33,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1896448. Throughput: 0: 961.9. Samples: 472574. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:51:33,796][00403] Avg episode reward: [(0, '6.722')]
[2025-02-11 22:51:33,803][02592] Saving new best policy, reward=6.722!
[2025-02-11 22:51:38,791][00403] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 1916928. Throughput: 0: 969.1. Samples: 478900. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:51:38,795][00403] Avg episode reward: [(0, '6.672')]
[2025-02-11 22:51:40,356][02605] Updated weights for policy 0, policy_version 470 (0.0015)
[2025-02-11 22:51:43,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3776.7). Total num frames: 1933312. Throughput: 0: 940.0. Samples: 483634. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:51:43,792][00403] Avg episode reward: [(0, '6.835')]
[2025-02-11 22:51:43,802][02592] Saving new best policy, reward=6.835!
[2025-02-11 22:51:48,789][00403] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1953792. Throughput: 0: 965.0. Samples: 486812. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:51:48,796][00403] Avg episode reward: [(0, '8.066')]
[2025-02-11 22:51:48,798][02592] Saving new best policy, reward=8.066!
[2025-02-11 22:51:50,945][02605] Updated weights for policy 0, policy_version 480 (0.0015)
[2025-02-11 22:51:53,790][00403] Fps is (10 sec: 4095.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1974272. Throughput: 0: 964.3. Samples: 493248. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:51:53,794][00403] Avg episode reward: [(0, '8.494')]
[2025-02-11 22:51:53,809][02592] Saving new best policy, reward=8.494!
[2025-02-11 22:51:58,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1990656. Throughput: 0: 950.3. Samples: 498104. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:51:58,792][00403] Avg episode reward: [(0, '8.660')]
[2025-02-11 22:51:58,795][02592] Saving new best policy, reward=8.660!
[2025-02-11 22:52:01,875][02605] Updated weights for policy 0, policy_version 490 (0.0022)
[2025-02-11 22:52:03,789][00403] Fps is (10 sec: 3686.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2011136. Throughput: 0: 964.7. Samples: 501330. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:52:03,795][00403] Avg episode reward: [(0, '8.308')]
[2025-02-11 22:52:08,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2031616. Throughput: 0: 952.3. Samples: 507276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:52:08,794][00403] Avg episode reward: [(0, '7.925')]
[2025-02-11 22:52:13,009][02605] Updated weights for policy 0, policy_version 500 (0.0017)
[2025-02-11 22:52:13,790][00403] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2048000. Throughput: 0: 955.1. Samples: 512510. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:52:13,792][00403] Avg episode reward: [(0, '8.261')]
[2025-02-11 22:52:18,792][00403] Fps is (10 sec: 4095.0, 60 sec: 3891.0, 300 sec: 3804.4). Total num frames: 2072576. Throughput: 0: 959.6. Samples: 515758. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:52:18,794][00403] Avg episode reward: [(0, '7.988')]
[2025-02-11 22:52:23,789][00403] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2084864. Throughput: 0: 941.9. Samples: 521282. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:52:23,796][00403] Avg episode reward: [(0, '8.182')]
[2025-02-11 22:52:23,990][02605] Updated weights for policy 0, policy_version 510 (0.0012)
[2025-02-11 22:52:28,789][00403] Fps is (10 sec: 3687.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2109440. Throughput: 0: 968.0. Samples: 527194. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:52:28,792][00403] Avg episode reward: [(0, '7.929')]
[2025-02-11 22:52:33,350][02605] Updated weights for policy 0, policy_version 520 (0.0016)
[2025-02-11 22:52:33,791][00403] Fps is (10 sec: 4504.8, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 2129920. Throughput: 0: 970.9. Samples: 530502. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:52:33,793][00403] Avg episode reward: [(0, '8.583')]
[2025-02-11 22:52:33,802][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth...
[2025-02-11 22:52:33,911][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth
[2025-02-11 22:52:38,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2146304. Throughput: 0: 935.7. Samples: 535356. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:52:38,798][00403] Avg episode reward: [(0, '8.131')]
[2025-02-11 22:52:43,789][00403] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2166784. Throughput: 0: 969.4. Samples: 541728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:52:43,796][00403] Avg episode reward: [(0, '9.031')]
[2025-02-11 22:52:43,806][02592] Saving new best policy, reward=9.031!
[2025-02-11 22:52:44,651][02605] Updated weights for policy 0, policy_version 530 (0.0012)
[2025-02-11 22:52:48,790][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2183168. Throughput: 0: 965.6. Samples: 544782. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:52:48,791][00403] Avg episode reward: [(0, '9.489')]
[2025-02-11 22:52:48,800][02592] Saving new best policy, reward=9.489!
[2025-02-11 22:52:53,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2199552. Throughput: 0: 938.0. Samples: 549486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:52:53,793][00403] Avg episode reward: [(0, '10.664')]
[2025-02-11 22:52:53,858][02592] Saving new best policy, reward=10.664!
[2025-02-11 22:52:55,810][02605] Updated weights for policy 0, policy_version 540 (0.0025)
[2025-02-11 22:52:58,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2224128. Throughput: 0: 964.6. Samples: 555916. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:52:58,791][00403] Avg episode reward: [(0, '11.485')]
[2025-02-11 22:52:58,798][02592] Saving new best policy, reward=11.485!
[2025-02-11 22:53:03,791][00403] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 2240512. Throughput: 0: 961.1. Samples: 559006. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:53:03,793][00403] Avg episode reward: [(0, '10.718')]
[2025-02-11 22:53:08,409][02605] Updated weights for policy 0, policy_version 550 (0.0014)
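By this point the average episode reward has climbed from roughly 3.1 at startup to above 11, which is easiest to see by scraping the "Total num frames" / "Avg episode reward" pairs out of the log. A throwaway parser, with regexes written against the exact line formats above ("train.log" is a hypothetical path to this log):

```python
import re

FRAMES_RE = re.compile(r"Total num frames: (\d+)\.")
REWARD_RE = re.compile(r"Avg episode reward: \[\(0, '([-\d.]+)'\)\]")

def reward_curve(log_lines):
    """Pair each reward report with the frame count reported just before it."""
    frames, curve = None, []
    for line in log_lines:
        if (m := FRAMES_RE.search(line)):
            frames = int(m.group(1))
        elif (m := REWARD_RE.search(line)) and frames is not None:
            curve.append((frames, float(m.group(1))))
    return curve

with open("train.log") as f:  # hypothetical path to this log
    for frames, reward in reward_curve(f):
        print(frames, reward)
```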
[2025-02-11 22:53:08,789][00403] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 2252800. Throughput: 0: 919.4. Samples: 562656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:53:08,792][00403] Avg episode reward: [(0, '10.606')]
[2025-02-11 22:53:13,789][00403] Fps is (10 sec: 3277.3, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2273280. Throughput: 0: 922.1. Samples: 568690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:53:13,792][00403] Avg episode reward: [(0, '9.855')]
[2025-02-11 22:53:18,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3618.3, 300 sec: 3776.7). Total num frames: 2289664. Throughput: 0: 912.5. Samples: 571562. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:53:18,800][00403] Avg episode reward: [(0, '9.816')]
[2025-02-11 22:53:19,471][02605] Updated weights for policy 0, policy_version 560 (0.0017)
[2025-02-11 22:53:23,790][00403] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 2310144. Throughput: 0: 918.7. Samples: 576698. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:53:23,798][00403] Avg episode reward: [(0, '10.401')]
[2025-02-11 22:53:28,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2330624. Throughput: 0: 919.9. Samples: 583124. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:53:28,791][00403] Avg episode reward: [(0, '11.775')]
[2025-02-11 22:53:28,798][02592] Saving new best policy, reward=11.775!
[2025-02-11 22:53:29,044][02605] Updated weights for policy 0, policy_version 570 (0.0015)
[2025-02-11 22:53:33,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3618.2, 300 sec: 3776.7). Total num frames: 2347008. Throughput: 0: 905.8. Samples: 585544. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:53:33,797][00403] Avg episode reward: [(0, '12.828')]
[2025-02-11 22:53:33,806][02592] Saving new best policy, reward=12.828!
[2025-02-11 22:53:38,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3790.5). Total num frames: 2367488. Throughput: 0: 924.5. Samples: 591090. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:53:38,795][00403] Avg episode reward: [(0, '13.259')]
[2025-02-11 22:53:38,802][02592] Saving new best policy, reward=13.259!
[2025-02-11 22:53:40,156][02605] Updated weights for policy 0, policy_version 580 (0.0012)
[2025-02-11 22:53:43,792][00403] Fps is (10 sec: 4095.0, 60 sec: 3686.2, 300 sec: 3790.5). Total num frames: 2387968. Throughput: 0: 921.5. Samples: 597384. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:53:43,794][00403] Avg episode reward: [(0, '13.153')]
[2025-02-11 22:53:48,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 2404352. Throughput: 0: 895.9. Samples: 599318. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:53:48,795][00403] Avg episode reward: [(0, '12.942')]
[2025-02-11 22:53:51,257][02605] Updated weights for policy 0, policy_version 590 (0.0027)
[2025-02-11 22:53:53,789][00403] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2424832. Throughput: 0: 951.2. Samples: 605460. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:53:53,791][00403] Avg episode reward: [(0, '13.840')]
[2025-02-11 22:53:53,801][02592] Saving new best policy, reward=13.840!
[2025-02-11 22:53:58,793][00403] Fps is (10 sec: 4094.6, 60 sec: 3686.2, 300 sec: 3790.5). Total num frames: 2445312. Throughput: 0: 947.3. Samples: 611320. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:53:58,795][00403] Avg episode reward: [(0, '14.693')]
[2025-02-11 22:53:58,800][02592] Saving new best policy, reward=14.693!
[2025-02-11 22:54:02,279][02605] Updated weights for policy 0, policy_version 600 (0.0019)
[2025-02-11 22:54:03,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3776.7). Total num frames: 2461696. Throughput: 0: 933.3. Samples: 613560. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:03,792][00403] Avg episode reward: [(0, '15.598')]
[2025-02-11 22:54:03,802][02592] Saving new best policy, reward=15.598!
[2025-02-11 22:54:08,789][00403] Fps is (10 sec: 3687.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2482176. Throughput: 0: 958.2. Samples: 619818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:54:08,792][00403] Avg episode reward: [(0, '15.815')]
[2025-02-11 22:54:08,801][02592] Saving new best policy, reward=15.815!
[2025-02-11 22:54:13,033][02605] Updated weights for policy 0, policy_version 610 (0.0014)
[2025-02-11 22:54:13,791][00403] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3762.8). Total num frames: 2498560. Throughput: 0: 930.1. Samples: 624980. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:13,797][00403] Avg episode reward: [(0, '14.118')]
[2025-02-11 22:54:18,790][00403] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2519040. Throughput: 0: 938.4. Samples: 627772. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:54:18,792][00403] Avg episode reward: [(0, '13.242')]
[2025-02-11 22:54:23,244][02605] Updated weights for policy 0, policy_version 620 (0.0015)
[2025-02-11 22:54:23,789][00403] Fps is (10 sec: 4096.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2539520. Throughput: 0: 956.0. Samples: 634112. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:23,795][00403] Avg episode reward: [(0, '11.928')]
[2025-02-11 22:54:28,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 2555904. Throughput: 0: 925.1. Samples: 639010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:54:28,798][00403] Avg episode reward: [(0, '11.666')]
[2025-02-11 22:54:33,790][00403] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 2576384. Throughput: 0: 953.1. Samples: 642210. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:33,796][00403] Avg episode reward: [(0, '13.903')]
[2025-02-11 22:54:33,806][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth...
[2025-02-11 22:54:33,909][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000407_1667072.pth
[2025-02-11 22:54:34,207][02605] Updated weights for policy 0, policy_version 630 (0.0013)
[2025-02-11 22:54:38,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2596864. Throughput: 0: 958.0. Samples: 648568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:38,798][00403] Avg episode reward: [(0, '14.362')]
[2025-02-11 22:54:43,790][00403] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 2613248. Throughput: 0: 936.6. Samples: 653464. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:43,792][00403] Avg episode reward: [(0, '15.748')]
[2025-02-11 22:54:45,450][02605] Updated weights for policy 0, policy_version 640 (0.0023)
[2025-02-11 22:54:48,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2633728. Throughput: 0: 955.7. Samples: 656566. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:54:48,792][00403] Avg episode reward: [(0, '16.588')]
[2025-02-11 22:54:48,797][02592] Saving new best policy, reward=16.588!
[2025-02-11 22:54:53,790][00403] Fps is (10 sec: 3686.2, 60 sec: 3754.6, 300 sec: 3748.9). Total num frames: 2650112. Throughput: 0: 951.9. Samples: 662656. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:53,792][00403] Avg episode reward: [(0, '15.490')]
[2025-02-11 22:54:56,557][02605] Updated weights for policy 0, policy_version 650 (0.0012)
[2025-02-11 22:54:58,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.9, 300 sec: 3776.7). Total num frames: 2670592. Throughput: 0: 948.5. Samples: 667662. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:54:58,796][00403] Avg episode reward: [(0, '15.626')]
[2025-02-11 22:55:03,789][00403] Fps is (10 sec: 4096.2, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 2691072. Throughput: 0: 958.5. Samples: 670904. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:55:03,796][00403] Avg episode reward: [(0, '15.932')]
[2025-02-11 22:55:06,520][02605] Updated weights for policy 0, policy_version 660 (0.0014)
[2025-02-11 22:55:08,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 2707456. Throughput: 0: 942.7. Samples: 676534. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:08,795][00403] Avg episode reward: [(0, '16.401')]
[2025-02-11 22:55:13,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3776.7). Total num frames: 2727936. Throughput: 0: 960.3. Samples: 682224. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:13,796][00403] Avg episode reward: [(0, '16.942')]
[2025-02-11 22:55:13,802][02592] Saving new best policy, reward=16.942!
[2025-02-11 22:55:17,240][02605] Updated weights for policy 0, policy_version 670 (0.0014)
[2025-02-11 22:55:18,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2748416. Throughput: 0: 958.1. Samples: 685322. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:18,791][00403] Avg episode reward: [(0, '16.882')]
[2025-02-11 22:55:23,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2764800. Throughput: 0: 930.8. Samples: 690452. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:23,792][00403] Avg episode reward: [(0, '17.669')]
[2025-02-11 22:55:23,803][02592] Saving new best policy, reward=17.669!
[2025-02-11 22:55:28,277][02605] Updated weights for policy 0, policy_version 680 (0.0013)
[2025-02-11 22:55:28,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2785280. Throughput: 0: 959.3. Samples: 696634. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:28,795][00403] Avg episode reward: [(0, '18.832')]
[2025-02-11 22:55:28,798][02592] Saving new best policy, reward=18.832!
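With throughput holding near 3,700-3,800 FPS, progress is easy to project from these lines: the run reached 2,785,280 frames in roughly 12.5 minutes of collection, and each further million frames costs about 1e6 / 3800, around 260 seconds. A quick ETA helper for a hypothetical frame budget (the 4M target below is an assumption, not a setting from this run):

```python
def eta_seconds(current_frames, target_frames, fps):
    """Seconds left to reach target_frames at the given throughput."""
    return max(0.0, (target_frames - current_frames) / fps)

# Hypothetical 4M-frame budget, measured from the state at 22:55:28:
print(eta_seconds(2_785_280, 4_000_000, fps=3800.0))  # ~319.7 s, about 5.3 min
```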
[2025-02-11 22:55:33,794][00403] Avg episode reward: [(0, '19.248')]
[2025-02-11 22:55:33,803][02592] Saving new best policy, reward=19.248!
[2025-02-11 22:55:38,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2822144. Throughput: 0: 932.9. Samples: 704636. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:55:38,795][00403] Avg episode reward: [(0, '19.708')]
[2025-02-11 22:55:38,798][02592] Saving new best policy, reward=19.708!
[2025-02-11 22:55:39,321][02605] Updated weights for policy 0, policy_version 690 (0.0018)
[2025-02-11 22:55:43,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2842624. Throughput: 0: 964.2. Samples: 711052. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:55:43,792][00403] Avg episode reward: [(0, '19.873')]
[2025-02-11 22:55:43,800][02592] Saving new best policy, reward=19.873!
[2025-02-11 22:55:48,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2859008. Throughput: 0: 959.8. Samples: 714094. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:48,793][00403] Avg episode reward: [(0, '20.584')]
[2025-02-11 22:55:48,795][02592] Saving new best policy, reward=20.584!
[2025-02-11 22:55:50,539][02605] Updated weights for policy 0, policy_version 700 (0.0012)
[2025-02-11 22:55:53,792][00403] Fps is (10 sec: 3276.0, 60 sec: 3754.5, 300 sec: 3762.7). Total num frames: 2875392. Throughput: 0: 931.0. Samples: 718432. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:53,798][00403] Avg episode reward: [(0, '20.264')]
[2025-02-11 22:55:58,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 2891776. Throughput: 0: 922.4. Samples: 723734. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:55:58,792][00403] Avg episode reward: [(0, '19.727')]
[2025-02-11 22:56:02,283][02605] Updated weights for policy 0, policy_version 710 (0.0018)
[2025-02-11 22:56:03,789][00403] Fps is (10 sec: 3277.6, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2908160. Throughput: 0: 917.3. Samples: 726602. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 22:56:03,793][00403] Avg episode reward: [(0, '20.651')]
[2025-02-11 22:56:03,870][02592] Saving new best policy, reward=20.651!
[2025-02-11 22:56:08,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2932736. Throughput: 0: 920.6. Samples: 731880. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:56:08,792][00403] Avg episode reward: [(0, '21.478')]
[2025-02-11 22:56:08,796][02592] Saving new best policy, reward=21.478!
[2025-02-11 22:56:12,618][02605] Updated weights for policy 0, policy_version 720 (0.0023)
[2025-02-11 22:56:13,790][00403] Fps is (10 sec: 4505.6, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2953216. Throughput: 0: 926.6. Samples: 738332. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:56:13,792][00403] Avg episode reward: [(0, '19.226')]
[2025-02-11 22:56:18,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 2965504. Throughput: 0: 904.5. Samples: 740484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:56:18,794][00403] Avg episode reward: [(0, '19.814')]
[2025-02-11 22:56:23,679][02605] Updated weights for policy 0, policy_version 730 (0.0015)
[2025-02-11 22:56:23,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 2990080. Throughput: 0: 924.0. Samples: 746216. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:56:23,797][00403] Avg episode reward: [(0, '20.468')]
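
The "Saving new best policy, reward=..." lines fire only when the running average episode reward strictly exceeds the best value seen so far, which is why they cluster while the agent is improving (19.248 -> 19.708 -> 19.873 -> 20.584 above) and vanish when the average dips. A minimal sketch of that bookkeeping (hypothetical names; the real learner serializes a full training state, not just model weights):

    import torch

    class BestPolicyTracker:
        def __init__(self):
            self.best_reward = float("-inf")

        def maybe_save(self, model, avg_episode_reward, path):
            # Strict improvement only: equal or lower averages do not save.
            if avg_episode_reward > self.best_reward:
                self.best_reward = avg_episode_reward
                torch.save(model.state_dict(), path)
                print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
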
[2025-02-11 22:56:28,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3006464. Throughput: 0: 916.9. Samples: 752312. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:56:28,800][00403] Avg episode reward: [(0, '19.953')]
[2025-02-11 22:56:33,790][00403] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3748.9). Total num frames: 3022848. Throughput: 0: 892.1. Samples: 754240. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:56:33,792][00403] Avg episode reward: [(0, '19.686')]
[2025-02-11 22:56:33,798][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth...
[2025-02-11 22:56:33,913][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000520_2129920.pth
[2025-02-11 22:56:34,997][02605] Updated weights for policy 0, policy_version 740 (0.0015)
[2025-02-11 22:56:38,789][00403] Fps is (10 sec: 4096.1, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3047424. Throughput: 0: 935.2. Samples: 760514. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:56:38,796][00403] Avg episode reward: [(0, '21.209')]
[2025-02-11 22:56:43,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 3063808. Throughput: 0: 941.8. Samples: 766114. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:56:43,797][00403] Avg episode reward: [(0, '23.152')]
[2025-02-11 22:56:43,809][02592] Saving new best policy, reward=23.152!
[2025-02-11 22:56:46,190][02605] Updated weights for policy 0, policy_version 750 (0.0017)
[2025-02-11 22:56:48,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3748.9). Total num frames: 3080192. Throughput: 0: 925.7. Samples: 768260. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:56:48,795][00403] Avg episode reward: [(0, '22.910')]
[2025-02-11 22:56:53,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3762.8). Total num frames: 3100672. Throughput: 0: 949.8. Samples: 774620. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:56:53,793][00403] Avg episode reward: [(0, '23.895')]
[2025-02-11 22:56:53,801][02592] Saving new best policy, reward=23.895!
[2025-02-11 22:56:56,411][02605] Updated weights for policy 0, policy_version 760 (0.0012)
[2025-02-11 22:56:58,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3117056. Throughput: 0: 918.7. Samples: 779674. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:56:58,797][00403] Avg episode reward: [(0, '24.567')]
[2025-02-11 22:56:58,799][02592] Saving new best policy, reward=24.567!
[2025-02-11 22:57:03,790][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3137536. Throughput: 0: 933.7. Samples: 782502. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:57:03,792][00403] Avg episode reward: [(0, '24.623')]
[2025-02-11 22:57:03,800][02592] Saving new best policy, reward=24.623!
[2025-02-11 22:57:07,089][02605] Updated weights for policy 0, policy_version 770 (0.0020)
[2025-02-11 22:57:08,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3158016. Throughput: 0: 949.5. Samples: 788942. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 22:57:08,797][00403] Avg episode reward: [(0, '24.637')]
[2025-02-11 22:57:08,803][02592] Saving new best policy, reward=24.637!
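
Periodic checkpoints are rotated: each save of a new checkpoint_<policy_version>_<env_frames>.pth is paired with removal of the oldest one (checkpoint_000000738_3022848.pth replaces checkpoint_000000520_2129920.pth above), so only a couple stay on disk besides the separately kept best policy. A sketch of that rotation under the same naming scheme (hypothetical helper):

    from pathlib import Path
    import torch

    def save_checkpoint(state, ckpt_dir, policy_version, env_frames, keep_last=2):
        ckpt_dir = Path(ckpt_dir)
        path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
        print(f"Saving {path}...")
        torch.save(state, path)
        # Zero-padded versions make lexicographic order chronological.
        checkpoints = sorted(ckpt_dir.glob("checkpoint_*.pth"))
        for old in checkpoints[:-keep_last]:
            print(f"Removing {old}")
            old.unlink()
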
[2025-02-11 22:57:13,790][00403] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3735.0). Total num frames: 3174400. Throughput: 0: 922.9. Samples: 793842. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:57:13,795][00403] Avg episode reward: [(0, '23.247')]
[2025-02-11 22:57:18,116][02605] Updated weights for policy 0, policy_version 780 (0.0012)
[2025-02-11 22:57:18,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3194880. Throughput: 0: 948.9. Samples: 796940. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:57:18,795][00403] Avg episode reward: [(0, '21.562')]
[2025-02-11 22:57:23,789][00403] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3215360. Throughput: 0: 955.3. Samples: 803504. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:57:23,792][00403] Avg episode reward: [(0, '21.182')]
[2025-02-11 22:57:28,794][00403] Fps is (10 sec: 3684.7, 60 sec: 3754.4, 300 sec: 3735.0). Total num frames: 3231744. Throughput: 0: 941.5. Samples: 808486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:57:28,798][00403] Avg episode reward: [(0, '18.622')]
[2025-02-11 22:57:28,867][02605] Updated weights for policy 0, policy_version 790 (0.0020)
[2025-02-11 22:57:33,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3256320. Throughput: 0: 967.3. Samples: 811788. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:57:33,796][00403] Avg episode reward: [(0, '18.667')]
[2025-02-11 22:57:38,789][00403] Fps is (10 sec: 4097.8, 60 sec: 3754.7, 300 sec: 3748.9). Total num frames: 3272704. Throughput: 0: 964.9. Samples: 818042. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:57:38,794][00403] Avg episode reward: [(0, '18.958')]
[2025-02-11 22:57:39,312][02605] Updated weights for policy 0, policy_version 800 (0.0013)
[2025-02-11 22:57:43,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3762.8). Total num frames: 3293184. Throughput: 0: 967.3. Samples: 823204. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:57:43,794][00403] Avg episode reward: [(0, '19.735')]
[2025-02-11 22:57:48,790][00403] Fps is (10 sec: 4095.7, 60 sec: 3891.1, 300 sec: 3776.6). Total num frames: 3313664. Throughput: 0: 975.7. Samples: 826408. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:57:48,792][00403] Avg episode reward: [(0, '20.740')]
[2025-02-11 22:57:49,215][02605] Updated weights for policy 0, policy_version 810 (0.0013)
[2025-02-11 22:57:53,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3748.9). Total num frames: 3330048. Throughput: 0: 960.6. Samples: 832170. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:57:53,794][00403] Avg episode reward: [(0, '21.071')]
[2025-02-11 22:57:58,789][00403] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3762.8). Total num frames: 3350528. Throughput: 0: 978.5. Samples: 837872. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 22:57:58,796][00403] Avg episode reward: [(0, '24.670')]
[2025-02-11 22:57:58,801][02592] Saving new best policy, reward=24.670!
[2025-02-11 22:58:00,221][02605] Updated weights for policy 0, policy_version 820 (0.0016)
[2025-02-11 22:58:03,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3371008. Throughput: 0: 980.4. Samples: 841058. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:58:03,794][00403] Avg episode reward: [(0, '24.389')]
[2025-02-11 22:58:08,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3387392. Throughput: 0: 951.0. Samples: 846298. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:58:08,793][00403] Avg episode reward: [(0, '23.514')]
[2025-02-11 22:58:11,111][02605] Updated weights for policy 0, policy_version 830 (0.0013)
[2025-02-11 22:58:13,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3407872. Throughput: 0: 978.4. Samples: 852510. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:58:13,791][00403] Avg episode reward: [(0, '24.201')]
[2025-02-11 22:58:18,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3428352. Throughput: 0: 973.6. Samples: 855602. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:58:18,796][00403] Avg episode reward: [(0, '25.055')]
[2025-02-11 22:58:18,800][02592] Saving new best policy, reward=25.055!
[2025-02-11 22:58:22,171][02605] Updated weights for policy 0, policy_version 840 (0.0012)
[2025-02-11 22:58:23,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 3444736. Throughput: 0: 943.6. Samples: 860502. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:58:23,800][00403] Avg episode reward: [(0, '23.543')]
[2025-02-11 22:58:28,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3959.8, 300 sec: 3804.4). Total num frames: 3469312. Throughput: 0: 975.2. Samples: 867090. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:58:28,794][00403] Avg episode reward: [(0, '23.152')]
[2025-02-11 22:58:31,536][02605] Updated weights for policy 0, policy_version 850 (0.0014)
[2025-02-11 22:58:33,791][00403] Fps is (10 sec: 4095.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3485696. Throughput: 0: 978.2. Samples: 870428. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:58:33,795][00403] Avg episode reward: [(0, '23.896')]
[2025-02-11 22:58:33,801][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000851_3485696.pth...
[2025-02-11 22:58:33,934][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000629_2576384.pth
[2025-02-11 22:58:38,790][00403] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 3506176. Throughput: 0: 957.2. Samples: 875244. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:58:38,792][00403] Avg episode reward: [(0, '23.765')]
[2025-02-11 22:58:43,325][02605] Updated weights for policy 0, policy_version 860 (0.0013)
[2025-02-11 22:58:43,789][00403] Fps is (10 sec: 3686.8, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3522560. Throughput: 0: 954.3. Samples: 880814. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:58:43,792][00403] Avg episode reward: [(0, '23.473')]
[2025-02-11 22:58:48,789][00403] Fps is (10 sec: 2867.3, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 3534848. Throughput: 0: 931.6. Samples: 882982. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:58:48,794][00403] Avg episode reward: [(0, '23.540')]
[2025-02-11 22:58:53,789][00403] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3762.8). Total num frames: 3555328. Throughput: 0: 932.8. Samples: 888276. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:58:53,793][00403] Avg episode reward: [(0, '23.645')]
[2025-02-11 22:58:54,903][02605] Updated weights for policy 0, policy_version 870 (0.0017)
[2025-02-11 22:58:58,789][00403] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3579904. Throughput: 0: 941.1. Samples: 894860. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:58:58,793][00403] Avg episode reward: [(0, '23.522')]
[2025-02-11 22:59:03,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3596288. Throughput: 0: 925.9. Samples: 897266. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:59:03,796][00403] Avg episode reward: [(0, '23.788')]
[2025-02-11 22:59:05,689][02605] Updated weights for policy 0, policy_version 880 (0.0013)
[2025-02-11 22:59:08,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3616768. Throughput: 0: 946.6. Samples: 903098. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:59:08,797][00403] Avg episode reward: [(0, '23.323')]
[2025-02-11 22:59:13,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3637248. Throughput: 0: 941.7. Samples: 909468. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:59:13,792][00403] Avg episode reward: [(0, '23.347')]
[2025-02-11 22:59:16,815][02605] Updated weights for policy 0, policy_version 890 (0.0013)
[2025-02-11 22:59:18,790][00403] Fps is (10 sec: 3686.1, 60 sec: 3754.6, 300 sec: 3776.6). Total num frames: 3653632. Throughput: 0: 909.9. Samples: 911374. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:59:18,797][00403] Avg episode reward: [(0, '23.984')]
[2025-02-11 22:59:23,790][00403] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3674112. Throughput: 0: 939.0. Samples: 917498. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:59:23,798][00403] Avg episode reward: [(0, '24.668')]
[2025-02-11 22:59:26,348][02605] Updated weights for policy 0, policy_version 900 (0.0019)
[2025-02-11 22:59:28,789][00403] Fps is (10 sec: 3686.7, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 3690496. Throughput: 0: 942.9. Samples: 923246. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:59:28,793][00403] Avg episode reward: [(0, '25.244')]
[2025-02-11 22:59:28,809][02592] Saving new best policy, reward=25.244!
[2025-02-11 22:59:33,790][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3710976. Throughput: 0: 944.2. Samples: 925470. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:59:33,792][00403] Avg episode reward: [(0, '24.715')]
[2025-02-11 22:59:37,719][02605] Updated weights for policy 0, policy_version 910 (0.0013)
[2025-02-11 22:59:38,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 3731456. Throughput: 0: 966.1. Samples: 931752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 22:59:38,796][00403] Avg episode reward: [(0, '25.196')]
[2025-02-11 22:59:43,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3747840. Throughput: 0: 937.6. Samples: 937054. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:59:43,794][00403] Avg episode reward: [(0, '25.317')]
[2025-02-11 22:59:43,808][02592] Saving new best policy, reward=25.317!
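
The "Updated weights for policy 0, policy_version N (0.00xx)" lines come from the inference worker adopting fresh parameters whenever the learner advances the policy version; the number in parentheses is the time the swap took, in seconds. Schematically (a sketch assuming some shared-memory handoff, not Sample Factory's actual API):

    import time

    class WeightSync:
        """Inference-side: adopt new weights when the learner's version advances."""

        def __init__(self, model):
            self.model = model
            self.version = 0

        def maybe_update(self, learner_version, learner_state_dict):
            if learner_version <= self.version:
                return  # already up to date
            start = time.time()
            self.model.load_state_dict(learner_state_dict)
            self.version = learner_version
            print(f"Updated weights for policy 0, policy_version {learner_version} "
                  f"({time.time() - start:.4f})")
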
[2025-02-11 22:59:48,698][02605] Updated weights for policy 0, policy_version 920 (0.0015)
[2025-02-11 22:59:48,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3768320. Throughput: 0: 945.2. Samples: 939800. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 22:59:48,800][00403] Avg episode reward: [(0, '24.621')]
[2025-02-11 22:59:53,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 3788800. Throughput: 0: 959.0. Samples: 946252. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:59:53,792][00403] Avg episode reward: [(0, '24.544')]
[2025-02-11 22:59:58,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3805184. Throughput: 0: 928.0. Samples: 951228. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 22:59:58,792][00403] Avg episode reward: [(0, '24.333')]
[2025-02-11 22:59:59,476][02605] Updated weights for policy 0, policy_version 930 (0.0019)
[2025-02-11 23:00:03,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3825664. Throughput: 0: 958.0. Samples: 954482. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 23:00:03,797][00403] Avg episode reward: [(0, '26.230')]
[2025-02-11 23:00:03,810][02592] Saving new best policy, reward=26.230!
[2025-02-11 23:00:08,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3846144. Throughput: 0: 965.1. Samples: 960928. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 23:00:08,795][00403] Avg episode reward: [(0, '25.080')]
[2025-02-11 23:00:09,876][02605] Updated weights for policy 0, policy_version 940 (0.0014)
[2025-02-11 23:00:13,789][00403] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 3862528. Throughput: 0: 947.4. Samples: 965880. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 23:00:13,797][00403] Avg episode reward: [(0, '24.410')]
[2025-02-11 23:00:18,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3823.0, 300 sec: 3790.5). Total num frames: 3883008. Throughput: 0: 969.6. Samples: 969102. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 23:00:18,797][00403] Avg episode reward: [(0, '22.407')]
[2025-02-11 23:00:19,974][02605] Updated weights for policy 0, policy_version 950 (0.0012)
[2025-02-11 23:00:23,789][00403] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 3903488. Throughput: 0: 967.6. Samples: 975292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 23:00:23,792][00403] Avg episode reward: [(0, '21.679')]
[2025-02-11 23:00:28,790][00403] Fps is (10 sec: 3686.2, 60 sec: 3822.9, 300 sec: 3776.6). Total num frames: 3919872. Throughput: 0: 966.4. Samples: 980542. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-11 23:00:28,792][00403] Avg episode reward: [(0, '21.095')]
[2025-02-11 23:00:30,849][02605] Updated weights for policy 0, policy_version 960 (0.0017)
[2025-02-11 23:00:33,790][00403] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3944448. Throughput: 0: 977.9. Samples: 983806. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 23:00:33,798][00403] Avg episode reward: [(0, '21.907')]
[2025-02-11 23:00:33,811][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000963_3944448.pth...
[2025-02-11 23:00:33,911][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000738_3022848.pth
[2025-02-11 23:00:38,790][00403] Fps is (10 sec: 3686.6, 60 sec: 3754.7, 300 sec: 3776.6). Total num frames: 3956736. Throughput: 0: 961.0. Samples: 989496. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 23:00:38,796][00403] Avg episode reward: [(0, '22.162')]
[2025-02-11 23:00:41,617][02605] Updated weights for policy 0, policy_version 970 (0.0017)
[2025-02-11 23:00:43,789][00403] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 3981312. Throughput: 0: 980.0. Samples: 995330. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-11 23:00:43,795][00403] Avg episode reward: [(0, '23.806')]
[2025-02-11 23:00:48,789][00403] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 4001792. Throughput: 0: 980.5. Samples: 998606. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-11 23:00:48,792][00403] Avg episode reward: [(0, '24.151')]
[2025-02-11 23:00:49,344][02592] Stopping Batcher_0...
[2025-02-11 23:00:49,345][02592] Loop batcher_evt_loop terminating...
[2025-02-11 23:00:49,344][00403] Component Batcher_0 stopped!
[2025-02-11 23:00:49,350][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 23:00:49,347][00403] Component RolloutWorker_w3 process died already! Don't wait for it.
[2025-02-11 23:00:49,352][00403] Component RolloutWorker_w4 process died already! Don't wait for it.
[2025-02-11 23:00:49,354][00403] Component RolloutWorker_w6 process died already! Don't wait for it.
[2025-02-11 23:00:49,358][00403] Component RolloutWorker_w7 process died already! Don't wait for it.
[2025-02-11 23:00:49,459][02605] Weights refcount: 2 0
[2025-02-11 23:00:49,467][00403] Component InferenceWorker_p0-w0 stopped!
[2025-02-11 23:00:49,469][02605] Stopping InferenceWorker_p0-w0...
[2025-02-11 23:00:49,469][02605] Loop inference_proc0-0_evt_loop terminating...
[2025-02-11 23:00:49,483][02592] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000851_3485696.pth
[2025-02-11 23:00:49,507][02592] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 23:00:49,679][02592] Stopping LearnerWorker_p0...
[2025-02-11 23:00:49,680][02592] Loop learner_proc0_evt_loop terminating...
[2025-02-11 23:00:49,687][00403] Component LearnerWorker_p0 stopped!
[2025-02-11 23:00:49,949][02611] Stopping RolloutWorker_w5...
[2025-02-11 23:00:49,948][00403] Component RolloutWorker_w5 stopped!
[2025-02-11 23:00:49,958][00403] Component RolloutWorker_w1 stopped!
[2025-02-11 23:00:49,960][02607] Stopping RolloutWorker_w1...
[2025-02-11 23:00:49,961][02607] Loop rollout_proc1_evt_loop terminating...
[2025-02-11 23:00:49,953][02611] Loop rollout_proc5_evt_loop terminating...
[2025-02-11 23:00:50,059][00403] Component RolloutWorker_w2 stopped!
[2025-02-11 23:00:50,058][02608] Stopping RolloutWorker_w2...
[2025-02-11 23:00:50,070][02608] Loop rollout_proc2_evt_loop terminating...
[2025-02-11 23:00:50,102][02606] Stopping RolloutWorker_w0...
[2025-02-11 23:00:50,103][00403] Component RolloutWorker_w0 stopped!
[2025-02-11 23:00:50,105][00403] Waiting for process learner_proc0 to stop...
[2025-02-11 23:00:50,104][02606] Loop rollout_proc0_evt_loop terminating...
[2025-02-11 23:00:51,875][00403] Waiting for process inference_proc0-0 to join...
[2025-02-11 23:00:51,878][00403] Waiting for process rollout_proc0 to join...
[2025-02-11 23:00:53,131][00403] Waiting for process rollout_proc1 to join...
[2025-02-11 23:00:53,136][00403] Waiting for process rollout_proc2 to join...
[2025-02-11 23:00:53,138][00403] Waiting for process rollout_proc3 to join...
[2025-02-11 23:00:53,140][00403] Waiting for process rollout_proc4 to join...
[2025-02-11 23:00:53,141][00403] Waiting for process rollout_proc5 to join...
[2025-02-11 23:00:53,143][00403] Waiting for process rollout_proc6 to join...
[2025-02-11 23:00:53,144][00403] Waiting for process rollout_proc7 to join...
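
Shutdown is cooperative: the runner signals every component, then joins the worker processes. Workers whose processes already exited (the four "process died already" rollout workers above) are skipped rather than joined. A minimal sketch of that loop (hypothetical process table, not the runner's actual code):

    import multiprocessing as mp

    def stop_and_join(processes: dict[str, mp.Process], timeout: float = 5.0):
        for name, proc in processes.items():
            if not proc.is_alive():
                print(f"Component {name} process died already! Don't wait for it.")
                continue
            proc.join(timeout)
            print(f"Component {name} stopped!")
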
[2025-02-11 23:00:53,145][00403] Batcher 0 profile tree view:
batching: 22.0685, releasing_batches: 0.0256
[2025-02-11 23:00:53,147][00403] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0046
  wait_policy_total: 412.1148
update_model: 9.4380
  weight_update: 0.0033
one_step: 0.0170
  handle_policy_step: 604.3865
    deserialize: 14.7351, stack: 3.8250, obs_to_device_normalize: 136.1266, forward: 315.1475, send_messages: 22.8283
    prepare_outputs: 85.0467
      to_cpu: 53.3859
[2025-02-11 23:00:53,149][00403] Learner 0 profile tree view:
misc: 0.0039, prepare_batch: 12.0250
train: 66.5571
  epoch_init: 0.0070, minibatch_init: 0.0056, losses_postprocess: 0.5468, kl_divergence: 0.5216, after_optimizer: 32.1451
  calculate_losses: 22.5719
    losses_init: 0.0090, forward_head: 1.2521, bptt_initial: 15.4799, tail: 0.9039, advantages_returns: 0.2161, losses: 2.8667
    bptt: 1.6160
      bptt_forward_core: 1.5476
  update: 10.2957
    clip: 0.8468
[2025-02-11 23:00:53,159][00403] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3923, enqueue_policy_requests: 147.6949, env_step: 742.5564, overhead: 17.8267, complete_rollouts: 6.4000
save_policy_outputs: 24.3030
  split_output_tensors: 9.3499
[2025-02-11 23:00:53,161][00403] Loop Runner_EvtLoop terminating...
[2025-02-11 23:00:53,162][00403] Runner profile tree view:
main_loop: 1090.1572
[2025-02-11 23:00:53,163][00403] Collected {0: 4005888}, FPS: 3674.6
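
The final figure is simply total collected frames divided by the runner's main-loop wall time:

    frames = 4005888        # Collected {0: 4005888}
    wall_time = 1090.1572   # Runner main_loop, in seconds
    print(f"{frames / wall_time:.1f}")  # 3674.6, the reported FPS
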
[2025-02-11 23:00:53,566][00403] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 23:00:53,568][00403] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 23:00:53,570][00403] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 23:00:53,573][00403] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 23:00:53,575][00403] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 23:00:53,576][00403] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 23:00:53,577][00403] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 23:00:53,579][00403] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 23:00:53,580][00403] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-11 23:00:53,582][00403] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-11 23:00:53,585][00403] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 23:00:53,588][00403] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 23:00:53,589][00403] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 23:00:53,594][00403] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 23:00:53,597][00403] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-11 23:00:53,632][00403] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 23:00:53,637][00403] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 23:00:53,643][00403] RunningMeanStd input shape: (1,)
[2025-02-11 23:00:53,661][00403] ConvEncoder: input_channels=3
[2025-02-11 23:00:53,790][00403] Conv encoder output size: 512
[2025-02-11 23:00:53,792][00403] Policy head output size: 512
[2025-02-11 23:00:53,958][00403] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 23:00:54,705][00403] Num frames 100...
[2025-02-11 23:00:54,841][00403] Num frames 200...
[2025-02-11 23:00:54,970][00403] Num frames 300...
[2025-02-11 23:00:55,101][00403] Num frames 400...
[2025-02-11 23:00:55,237][00403] Num frames 500...
[2025-02-11 23:00:55,367][00403] Num frames 600...
[2025-02-11 23:00:55,504][00403] Num frames 700...
[2025-02-11 23:00:55,695][00403] Avg episode rewards: #0: 19.890, true rewards: #0: 7.890
[2025-02-11 23:00:55,696][00403] Avg episode reward: 19.890, avg true_objective: 7.890
[2025-02-11 23:00:55,716][00403] Num frames 800...
[2025-02-11 23:00:55,864][00403] Num frames 900...
[2025-02-11 23:00:55,997][00403] Num frames 1000...
[2025-02-11 23:00:56,140][00403] Num frames 1100...
[2025-02-11 23:00:56,270][00403] Num frames 1200...
[2025-02-11 23:00:56,399][00403] Num frames 1300...
[2025-02-11 23:00:56,530][00403] Num frames 1400...
[2025-02-11 23:00:56,671][00403] Num frames 1500...
[2025-02-11 23:00:56,811][00403] Num frames 1600...
[2025-02-11 23:00:56,958][00403] Num frames 1700...
[2025-02-11 23:00:57,087][00403] Num frames 1800...
[2025-02-11 23:00:57,218][00403] Num frames 1900...
[2025-02-11 23:00:57,353][00403] Num frames 2000...
[2025-02-11 23:00:57,482][00403] Num frames 2100...
[2025-02-11 23:00:57,618][00403] Num frames 2200...
[2025-02-11 23:00:57,748][00403] Num frames 2300...
[2025-02-11 23:00:57,890][00403] Num frames 2400...
[2025-02-11 23:00:58,025][00403] Num frames 2500...
[2025-02-11 23:00:58,159][00403] Num frames 2600...
[2025-02-11 23:00:58,317][00403] Avg episode rewards: #0: 35.395, true rewards: #0: 13.395
[2025-02-11 23:00:58,320][00403] Avg episode reward: 35.395, avg true_objective: 13.395
[2025-02-11 23:00:58,352][00403] Num frames 2700...
[2025-02-11 23:00:58,483][00403] Num frames 2800...
[2025-02-11 23:00:58,618][00403] Num frames 2900...
[2025-02-11 23:00:58,750][00403] Num frames 3000...
[2025-02-11 23:00:58,883][00403] Num frames 3100...
[2025-02-11 23:00:59,019][00403] Num frames 3200...
[2025-02-11 23:00:59,152][00403] Num frames 3300...
[2025-02-11 23:00:59,282][00403] Num frames 3400...
[2025-02-11 23:00:59,412][00403] Num frames 3500...
[2025-02-11 23:00:59,544][00403] Num frames 3600...
[2025-02-11 23:00:59,720][00403] Avg episode rewards: #0: 31.936, true rewards: #0: 12.270
[2025-02-11 23:00:59,722][00403] Avg episode reward: 31.936, avg true_objective: 12.270
[2025-02-11 23:00:59,750][00403] Num frames 3700...
[2025-02-11 23:00:59,885][00403] Num frames 3800...
[2025-02-11 23:01:00,023][00403] Num frames 3900...
[2025-02-11 23:01:00,155][00403] Num frames 4000...
[2025-02-11 23:01:00,298][00403] Num frames 4100...
[2025-02-11 23:01:00,440][00403] Num frames 4200...
[2025-02-11 23:01:00,581][00403] Num frames 4300...
[2025-02-11 23:01:00,672][00403] Avg episode rewards: #0: 26.802, true rewards: #0: 10.802
[2025-02-11 23:01:00,673][00403] Avg episode reward: 26.802, avg true_objective: 10.802
[2025-02-11 23:01:00,776][00403] Num frames 4400...
[2025-02-11 23:01:00,905][00403] Num frames 4500...
[2025-02-11 23:01:01,047][00403] Num frames 4600...
[2025-02-11 23:01:01,179][00403] Num frames 4700...
[2025-02-11 23:01:01,311][00403] Num frames 4800...
[2025-02-11 23:01:01,440][00403] Num frames 4900...
[2025-02-11 23:01:01,572][00403] Num frames 5000...
[2025-02-11 23:01:01,713][00403] Num frames 5100...
[2025-02-11 23:01:01,851][00403] Num frames 5200...
[2025-02-11 23:01:01,995][00403] Num frames 5300...
[2025-02-11 23:01:02,178][00403] Num frames 5400...
[2025-02-11 23:01:02,349][00403] Num frames 5500...
[2025-02-11 23:01:02,523][00403] Num frames 5600...
[2025-02-11 23:01:02,585][00403] Avg episode rewards: #0: 27.002, true rewards: #0: 11.202
[2025-02-11 23:01:02,591][00403] Avg episode reward: 27.002, avg true_objective: 11.202
[2025-02-11 23:01:02,771][00403] Num frames 5700...
[2025-02-11 23:01:02,940][00403] Num frames 5800...
[2025-02-11 23:01:03,116][00403] Num frames 5900...
[2025-02-11 23:01:03,289][00403] Num frames 6000...
[2025-02-11 23:01:03,467][00403] Num frames 6100...
[2025-02-11 23:01:03,659][00403] Num frames 6200...
[2025-02-11 23:01:03,839][00403] Num frames 6300...
[2025-02-11 23:01:04,018][00403] Num frames 6400...
[2025-02-11 23:01:04,166][00403] Num frames 6500...
[2025-02-11 23:01:04,294][00403] Num frames 6600...
[2025-02-11 23:01:04,420][00403] Num frames 6700...
[2025-02-11 23:01:04,550][00403] Num frames 6800...
[2025-02-11 23:01:04,629][00403] Avg episode rewards: #0: 27.362, true rewards: #0: 11.362
[2025-02-11 23:01:04,630][00403] Avg episode reward: 27.362, avg true_objective: 11.362
[2025-02-11 23:01:04,739][00403] Num frames 6900...
[2025-02-11 23:01:04,869][00403] Num frames 7000...
[2025-02-11 23:01:04,999][00403] Num frames 7100...
[2025-02-11 23:01:05,139][00403] Num frames 7200...
[2025-02-11 23:01:05,240][00403] Avg episode rewards: #0: 24.333, true rewards: #0: 10.333
[2025-02-11 23:01:05,241][00403] Avg episode reward: 24.333, avg true_objective: 10.333
[2025-02-11 23:01:05,328][00403] Num frames 7300...
[2025-02-11 23:01:05,460][00403] Num frames 7400...
[2025-02-11 23:01:05,633][00403] Avg episode rewards: #0: 21.611, true rewards: #0: 9.361
[2025-02-11 23:01:05,635][00403] Avg episode reward: 21.611, avg true_objective: 9.361
[2025-02-11 23:01:05,651][00403] Num frames 7500...
[2025-02-11 23:01:05,789][00403] Num frames 7600...
[2025-02-11 23:01:05,920][00403] Num frames 7700...
[2025-02-11 23:01:06,047][00403] Num frames 7800...
[2025-02-11 23:01:06,190][00403] Num frames 7900...
[2025-02-11 23:01:06,357][00403] Avg episode rewards: #0: 20.098, true rewards: #0: 8.876
[2025-02-11 23:01:06,359][00403] Avg episode reward: 20.098, avg true_objective: 8.876
[2025-02-11 23:01:06,378][00403] Num frames 8000...
[2025-02-11 23:01:06,508][00403] Num frames 8100...
[2025-02-11 23:01:06,644][00403] Num frames 8200...
[2025-02-11 23:01:06,769][00403] Num frames 8300...
[2025-02-11 23:01:06,899][00403] Num frames 8400...
[2025-02-11 23:01:07,023][00403] Num frames 8500...
[2025-02-11 23:01:07,155][00403] Num frames 8600...
[2025-02-11 23:01:07,276][00403] Avg episode rewards: #0: 19.545, true rewards: #0: 8.645
[2025-02-11 23:01:07,277][00403] Avg episode reward: 19.545, avg true_objective: 8.645
[2025-02-11 23:01:59,403][00403] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
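
During evaluation, each "Avg episode rewards" line is a running mean over the episodes finished so far, tracked separately for the shaped reward and the true objective. A minimal sketch of that accounting, assuming a Gym-style env whose info dict exposes the true objective (the "true_objective" key name is an assumption):

    def evaluate(env, policy, max_episodes=10):
        sum_reward = sum_true = 0.0
        for episode in range(1, max_episodes + 1):
            obs, done = env.reset(), False
            ep_reward = ep_true = 0.0
            while not done:
                obs, reward, done, info = env.step(policy(obs))
                ep_reward += reward
                # Fall back to the shaped reward if no true objective is reported.
                ep_true += info.get("true_objective", reward)
            sum_reward += ep_reward
            sum_true += ep_true
            print(f"Avg episode rewards: #0: {sum_reward / episode:.3f}, "
                  f"true rewards: #0: {sum_true / episode:.3f}")
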
[2025-02-11 23:01:59,891][00403] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 23:01:59,893][00403] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 23:01:59,895][00403] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 23:01:59,896][00403] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 23:01:59,897][00403] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 23:01:59,898][00403] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 23:01:59,899][00403] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-11 23:01:59,899][00403] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 23:01:59,900][00403] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-11 23:01:59,901][00403] Adding new argument 'hf_repository'='ThomasSimonini/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-11 23:01:59,902][00403] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 23:01:59,903][00403] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 23:01:59,904][00403] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 23:01:59,905][00403] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 23:01:59,905][00403] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-11 23:01:59,947][00403] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 23:01:59,950][00403] RunningMeanStd input shape: (1,)
[2025-02-11 23:01:59,965][00403] ConvEncoder: input_channels=3
[2025-02-11 23:02:00,021][00403] Conv encoder output size: 512
[2025-02-11 23:02:00,024][00403] Policy head output size: 512
[2025-02-11 23:02:00,053][00403] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 23:02:00,735][00403] Num frames 100...
[2025-02-11 23:02:00,900][00403] Num frames 200...
[2025-02-11 23:02:01,063][00403] Num frames 300...
[2025-02-11 23:02:01,223][00403] Num frames 400...
[2025-02-11 23:02:01,411][00403] Num frames 500...
[2025-02-11 23:02:01,574][00403] Num frames 600...
[2025-02-11 23:02:01,745][00403] Num frames 700...
[2025-02-11 23:02:01,912][00403] Num frames 800...
[2025-02-11 23:02:02,086][00403] Num frames 900...
[2025-02-11 23:02:02,256][00403] Num frames 1000...
[2025-02-11 23:02:02,409][00403] Avg episode rewards: #0: 28.600, true rewards: #0: 10.600
[2025-02-11 23:02:02,411][00403] Avg episode reward: 28.600, avg true_objective: 10.600
[2025-02-11 23:02:02,495][00403] Num frames 1100...
[2025-02-11 23:02:02,702][00403] Num frames 1200...
[2025-02-11 23:02:02,887][00403] Num frames 1300...
[2025-02-11 23:02:03,082][00403] Num frames 1400...
[2025-02-11 23:02:03,264][00403] Num frames 1500...
[2025-02-11 23:02:03,441][00403] Num frames 1600...
[2025-02-11 23:02:03,635][00403] Num frames 1700...
[2025-02-11 23:02:03,829][00403] Num frames 1800...
[2025-02-11 23:02:04,014][00403] Num frames 1900...
[2025-02-11 23:02:04,201][00403] Num frames 2000...
[2025-02-11 23:02:04,323][00403] Avg episode rewards: #0: 26.180, true rewards: #0: 10.180
[2025-02-11 23:02:04,325][00403] Avg episode reward: 26.180, avg true_objective: 10.180
[2025-02-11 23:02:04,448][00403] Num frames 2100...
[2025-02-11 23:02:04,649][00403] Num frames 2200...
[2025-02-11 23:02:04,833][00403] Num frames 2300...
[2025-02-11 23:02:04,961][00403] Num frames 2400...
[2025-02-11 23:02:05,092][00403] Num frames 2500...
[2025-02-11 23:02:05,223][00403] Num frames 2600...
[2025-02-11 23:02:05,352][00403] Num frames 2700...
[2025-02-11 23:02:05,481][00403] Num frames 2800...
[2025-02-11 23:02:05,625][00403] Num frames 2900...
[2025-02-11 23:02:05,712][00403] Avg episode rewards: #0: 23.740, true rewards: #0: 9.740
[2025-02-11 23:02:05,713][00403] Avg episode reward: 23.740, avg true_objective: 9.740
[2025-02-11 23:02:05,816][00403] Num frames 3000...
[2025-02-11 23:02:05,948][00403] Num frames 3100...
[2025-02-11 23:02:06,075][00403] Num frames 3200...
[2025-02-11 23:02:06,202][00403] Num frames 3300...
[2025-02-11 23:02:06,328][00403] Num frames 3400...
[2025-02-11 23:02:06,427][00403] Avg episode rewards: #0: 19.835, true rewards: #0: 8.585
[2025-02-11 23:02:06,428][00403] Avg episode reward: 19.835, avg true_objective: 8.585
[2025-02-11 23:02:06,517][00403] Num frames 3500...
[2025-02-11 23:02:06,658][00403] Num frames 3600...
[2025-02-11 23:02:06,788][00403] Num frames 3700...
[2025-02-11 23:02:06,916][00403] Num frames 3800...
[2025-02-11 23:02:07,035][00403] Avg episode rewards: #0: 17.100, true rewards: #0: 7.700
[2025-02-11 23:02:07,038][00403] Avg episode reward: 17.100, avg true_objective: 7.700
[2025-02-11 23:02:07,103][00403] Num frames 3900...
[2025-02-11 23:02:07,257][00403] Num frames 4000...
[2025-02-11 23:02:07,436][00403] Num frames 4100...
[2025-02-11 23:02:07,612][00403] Num frames 4200...
[2025-02-11 23:02:07,796][00403] Num frames 4300...
[2025-02-11 23:02:08,016][00403] Avg episode rewards: #0: 15.490, true rewards: #0: 7.323
[2025-02-11 23:02:08,018][00403] Avg episode reward: 15.490, avg true_objective: 7.323
[2025-02-11 23:02:08,034][00403] Num frames 4400...
[2025-02-11 23:02:08,207][00403] Num frames 4500...
[2025-02-11 23:02:08,367][00403] Num frames 4600...
[2025-02-11 23:02:08,535][00403] Num frames 4700...
[2025-02-11 23:02:08,715][00403] Num frames 4800...
[2025-02-11 23:02:08,892][00403] Num frames 4900...
[2025-02-11 23:02:09,075][00403] Num frames 5000...
[2025-02-11 23:02:09,192][00403] Avg episode rewards: #0: 15.477, true rewards: #0: 7.191
[2025-02-11 23:02:09,195][00403] Avg episode reward: 15.477, avg true_objective: 7.191
[2025-02-11 23:02:09,310][00403] Num frames 5100...
[2025-02-11 23:02:09,439][00403] Num frames 5200...
[2025-02-11 23:02:09,568][00403] Num frames 5300...
[2025-02-11 23:02:09,705][00403] Num frames 5400...
[2025-02-11 23:02:09,843][00403] Num frames 5500...
[2025-02-11 23:02:10,001][00403] Avg episode rewards: #0: 14.848, true rewards: #0: 6.972
[2025-02-11 23:02:10,003][00403] Avg episode reward: 14.848, avg true_objective: 6.972
[2025-02-11 23:02:10,033][00403] Num frames 5600...
[2025-02-11 23:02:10,177][00403] Num frames 5700...
[2025-02-11 23:02:10,321][00403] Num frames 5800...
[2025-02-11 23:02:10,454][00403] Num frames 5900...
[2025-02-11 23:02:10,585][00403] Num frames 6000...
[2025-02-11 23:02:10,721][00403] Num frames 6100...
[2025-02-11 23:02:10,858][00403] Num frames 6200...
[2025-02-11 23:02:10,986][00403] Num frames 6300...
[2025-02-11 23:02:11,116][00403] Num frames 6400...
[2025-02-11 23:02:11,269][00403] Avg episode rewards: #0: 15.417, true rewards: #0: 7.194
[2025-02-11 23:02:11,271][00403] Avg episode reward: 15.417, avg true_objective: 7.194
[2025-02-11 23:02:11,305][00403] Num frames 6500...
[2025-02-11 23:02:11,432][00403] Num frames 6600...
[2025-02-11 23:02:11,561][00403] Num frames 6700...
[2025-02-11 23:02:11,700][00403] Num frames 6800...
[2025-02-11 23:02:11,837][00403] Num frames 6900...
[2025-02-11 23:02:11,975][00403] Num frames 7000...
[2025-02-11 23:02:12,056][00403] Avg episode rewards: #0: 15.019, true rewards: #0: 7.019
[2025-02-11 23:02:12,057][00403] Avg episode reward: 15.019, avg true_objective: 7.019
[2025-02-11 23:02:52,951][00403] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-11 23:03:05,283][00403] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 23:03:05,284][00403] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 23:03:05,286][00403] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 23:03:05,288][00403] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 23:03:05,290][00403] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 23:03:05,292][00403] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 23:03:05,293][00403] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-11 23:03:05,295][00403] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 23:03:05,296][00403] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-11 23:03:05,297][00403] Adding new argument 'hf_repository'='Nfanlo/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-11 23:03:05,298][00403] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 23:03:05,303][00403] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 23:03:05,304][00403] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 23:03:05,304][00403] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 23:03:05,305][00403] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-11 23:03:05,337][00403] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 23:03:05,339][00403] RunningMeanStd input shape: (1,)
[2025-02-11 23:03:05,353][00403] ConvEncoder: input_channels=3
[2025-02-11 23:03:05,385][00403] Conv encoder output size: 512
[2025-02-11 23:03:05,386][00403] Policy head output size: 512
[2025-02-11 23:03:05,404][00403] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 23:03:05,862][00403] Num frames 100...
[2025-02-11 23:03:05,994][00403] Num frames 200...
[2025-02-11 23:03:06,141][00403] Num frames 300...
[2025-02-11 23:03:06,339][00403] Num frames 400...
[2025-02-11 23:03:06,642][00403] Num frames 500...
[2025-02-11 23:03:06,834][00403] Num frames 600...
[2025-02-11 23:03:06,963][00403] Num frames 700...
[2025-02-11 23:03:07,095][00403] Num frames 800...
[2025-02-11 23:03:07,226][00403] Num frames 900...
[2025-02-11 23:03:07,362][00403] Avg episode rewards: #0: 22.600, true rewards: #0: 9.600
[2025-02-11 23:03:07,363][00403] Avg episode reward: 22.600, avg true_objective: 9.600
[2025-02-11 23:03:07,424][00403] Num frames 1000...
[2025-02-11 23:03:07,561][00403] Num frames 1100...
[2025-02-11 23:03:07,702][00403] Num frames 1200...
[2025-02-11 23:03:07,835][00403] Num frames 1300...
[2025-02-11 23:03:07,916][00403] Avg episode rewards: #0: 14.595, true rewards: #0: 6.595
[2025-02-11 23:03:07,918][00403] Avg episode reward: 14.595, avg true_objective: 6.595
[2025-02-11 23:03:08,028][00403] Num frames 1400...
[2025-02-11 23:03:08,166][00403] Num frames 1500...
[2025-02-11 23:03:08,301][00403] Num frames 1600...
[2025-02-11 23:03:08,431][00403] Num frames 1700...
[2025-02-11 23:03:08,564][00403] Num frames 1800...
[2025-02-11 23:03:08,715][00403] Num frames 1900...
[2025-02-11 23:03:08,892][00403] Num frames 2000...
[2025-02-11 23:03:09,074][00403] Num frames 2100...
[2025-02-11 23:03:09,245][00403] Num frames 2200...
[2025-02-11 23:03:09,388][00403] Avg episode rewards: #0: 15.500, true rewards: #0: 7.500
[2025-02-11 23:03:09,391][00403] Avg episode reward: 15.500, avg true_objective: 7.500
[2025-02-11 23:03:09,478][00403] Num frames 2300...
[2025-02-11 23:03:09,663][00403] Num frames 2400...
[2025-02-11 23:03:09,835][00403] Num frames 2500...
[2025-02-11 23:03:10,000][00403] Num frames 2600...
[2025-02-11 23:03:10,176][00403] Num frames 2700...
[2025-02-11 23:03:10,402][00403] Avg episode rewards: #0: 13.485, true rewards: #0: 6.985
[2025-02-11 23:03:10,404][00403] Avg episode reward: 13.485, avg true_objective: 6.985
[2025-02-11 23:03:10,417][00403] Num frames 2800...
[2025-02-11 23:03:10,597][00403] Num frames 2900...
[2025-02-11 23:03:10,799][00403] Num frames 3000...
[2025-02-11 23:03:10,964][00403] Num frames 3100...
[2025-02-11 23:03:11,093][00403] Num frames 3200...
[2025-02-11 23:03:11,206][00403] Avg episode rewards: #0: 11.884, true rewards: #0: 6.484
[2025-02-11 23:03:11,209][00403] Avg episode reward: 11.884, avg true_objective: 6.484
[2025-02-11 23:03:11,283][00403] Num frames 3300...
[2025-02-11 23:03:11,417][00403] Num frames 3400...
[2025-02-11 23:03:11,552][00403] Num frames 3500...
[2025-02-11 23:03:11,697][00403] Num frames 3600...
[2025-02-11 23:03:11,833][00403] Num frames 3700...
[2025-02-11 23:03:11,965][00403] Num frames 3800...
[2025-02-11 23:03:12,095][00403] Num frames 3900...
[2025-02-11 23:03:12,226][00403] Num frames 4000...
[2025-02-11 23:03:12,359][00403] Num frames 4100...
[2025-02-11 23:03:12,486][00403] Num frames 4200...
[2025-02-11 23:03:12,626][00403] Num frames 4300...
[2025-02-11 23:03:12,768][00403] Num frames 4400...
[2025-02-11 23:03:12,899][00403] Num frames 4500...
[2025-02-11 23:03:13,029][00403] Num frames 4600...
[2025-02-11 23:03:13,160][00403] Num frames 4700...
[2025-02-11 23:03:13,316][00403] Num frames 4800...
[2025-02-11 23:03:13,386][00403] Avg episode rewards: #0: 16.017, true rewards: #0: 8.017
[2025-02-11 23:03:13,388][00403] Avg episode reward: 16.017, avg true_objective: 8.017
[2025-02-11 23:03:13,506][00403] Num frames 4900...
[2025-02-11 23:03:13,648][00403] Num frames 5000...
[2025-02-11 23:03:13,788][00403] Num frames 5100...
[2025-02-11 23:03:13,918][00403] Num frames 5200...
[2025-02-11 23:03:14,048][00403] Num frames 5300...
[2025-02-11 23:03:14,177][00403] Num frames 5400...
[2025-02-11 23:03:14,311][00403] Num frames 5500...
[2025-02-11 23:03:14,451][00403] Num frames 5600...
[2025-02-11 23:03:14,580][00403] Num frames 5700...
[2025-02-11 23:03:14,720][00403] Num frames 5800...
[2025-02-11 23:03:14,864][00403] Num frames 5900...
[2025-02-11 23:03:14,999][00403] Num frames 6000...
[2025-02-11 23:03:15,132][00403] Num frames 6100...
[2025-02-11 23:03:15,301][00403] Avg episode rewards: #0: 18.123, true rewards: #0: 8.837
[2025-02-11 23:03:15,303][00403] Avg episode reward: 18.123, avg true_objective: 8.837
[2025-02-11 23:03:15,323][00403] Num frames 6200...
[2025-02-11 23:03:15,454][00403] Num frames 6300...
[2025-02-11 23:03:15,582][00403] Num frames 6400...
[2025-02-11 23:03:15,723][00403] Num frames 6500...
[2025-02-11 23:03:15,862][00403] Num frames 6600...
[2025-02-11 23:03:15,990][00403] Num frames 6700...
[2025-02-11 23:03:16,120][00403] Num frames 6800...
[2025-02-11 23:03:16,253][00403] Num frames 6900...
[2025-02-11 23:03:16,388][00403] Num frames 7000...
[2025-02-11 23:03:16,524][00403] Num frames 7100...
[2025-02-11 23:03:16,657][00403] Num frames 7200...
[2025-02-11 23:03:16,792][00403] Num frames 7300...
[2025-02-11 23:03:16,949][00403] Avg episode rewards: #0: 18.963, true rewards: #0: 9.212
[2025-02-11 23:03:16,950][00403] Avg episode reward: 18.963, avg true_objective: 9.212
[2025-02-11 23:03:16,991][00403] Num frames 7400...
[2025-02-11 23:03:17,130][00403] Num frames 7500...
[2025-02-11 23:03:17,259][00403] Num frames 7600...
[2025-02-11 23:03:17,393][00403] Num frames 7700...
[2025-02-11 23:03:17,547][00403] Num frames 7800...
[2025-02-11 23:03:17,723][00403] Avg episode rewards: #0: 17.761, true rewards: #0: 8.761
[2025-02-11 23:03:17,725][00403] Avg episode reward: 17.761, avg true_objective: 8.761
[2025-02-11 23:03:17,747][00403] Num frames 7900...
[2025-02-11 23:03:17,880][00403] Num frames 8000...
[2025-02-11 23:03:18,010][00403] Num frames 8100...
[2025-02-11 23:03:18,141][00403] Num frames 8200...
[2025-02-11 23:03:18,271][00403] Num frames 8300...
[2025-02-11 23:03:18,404][00403] Num frames 8400...
[2025-02-11 23:03:18,534][00403] Num frames 8500...
[2025-02-11 23:03:18,673][00403] Num frames 8600...
[2025-02-11 23:03:18,803][00403] Num frames 8700...
[2025-02-11 23:03:18,947][00403] Num frames 8800...
[2025-02-11 23:03:19,077][00403] Num frames 8900...
[2025-02-11 23:03:19,196][00403] Avg episode rewards: #0: 18.346, true rewards: #0: 8.946
[2025-02-11 23:03:19,197][00403] Avg episode reward: 18.346, avg true_objective: 8.946
[2025-02-11 23:04:11,234][00403] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
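
With push_to_hub=True and hf_repository set, the enjoy script finishes by uploading the experiment directory (config.json, the latest checkpoint, replay.mp4) to the Hugging Face Hub. Roughly, in terms of the huggingface_hub client (a sketch, not the script's exact code):

    from huggingface_hub import HfApi

    api = HfApi()
    api.upload_folder(
        repo_id="Nfanlo/rl_course_vizdoom_health_gathering_supreme",
        folder_path="/content/train_dir/default_experiment",
        repo_type="model",
    )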