[2025-01-27 14:13:20,569][01491] Saving configuration to /content/train_dir/default_experiment/config.json... [2025-01-27 14:13:20,572][01491] Rollout worker 0 uses device cpu [2025-01-27 14:13:20,573][01491] Rollout worker 1 uses device cpu [2025-01-27 14:13:20,574][01491] Rollout worker 2 uses device cpu [2025-01-27 14:13:20,575][01491] Rollout worker 3 uses device cpu [2025-01-27 14:13:20,577][01491] Rollout worker 4 uses device cpu [2025-01-27 14:13:20,578][01491] Rollout worker 5 uses device cpu [2025-01-27 14:13:20,579][01491] Rollout worker 6 uses device cpu [2025-01-27 14:13:20,584][01491] Rollout worker 7 uses device cpu [2025-01-27 14:13:20,752][01491] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:13:20,755][01491] InferenceWorker_p0-w0: min num requests: 2 [2025-01-27 14:13:20,787][01491] Starting all processes... [2025-01-27 14:13:20,790][01491] Starting process learner_proc0 [2025-01-27 14:13:20,836][01491] Starting all processes... [2025-01-27 14:13:20,844][01491] Starting process inference_proc0-0 [2025-01-27 14:13:20,844][01491] Starting process rollout_proc0 [2025-01-27 14:13:20,847][01491] Starting process rollout_proc1 [2025-01-27 14:13:20,847][01491] Starting process rollout_proc2 [2025-01-27 14:13:20,847][01491] Starting process rollout_proc3 [2025-01-27 14:13:20,847][01491] Starting process rollout_proc4 [2025-01-27 14:13:20,848][01491] Starting process rollout_proc5 [2025-01-27 14:13:20,848][01491] Starting process rollout_proc6 [2025-01-27 14:13:20,848][01491] Starting process rollout_proc7 [2025-01-27 14:13:37,132][04427] Worker 5 uses CPU cores [1] [2025-01-27 14:13:37,220][04422] Worker 0 uses CPU cores [0] [2025-01-27 14:13:37,270][04428] Worker 6 uses CPU cores [0] [2025-01-27 14:13:37,355][04421] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:13:37,357][04421] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-27 14:13:37,383][04429] Worker 7 uses CPU cores [1] [2025-01-27 14:13:37,405][04421] Num visible devices: 1 [2025-01-27 14:13:37,415][04408] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:13:37,415][04408] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-27 14:13:37,443][04425] Worker 3 uses CPU cores [1] [2025-01-27 14:13:37,448][04424] Worker 2 uses CPU cores [0] [2025-01-27 14:13:37,465][04408] Num visible devices: 1 [2025-01-27 14:13:37,482][04408] Starting seed is not provided [2025-01-27 14:13:37,482][04408] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:13:37,483][04408] Initializing actor-critic model on device cuda:0 [2025-01-27 14:13:37,484][04408] RunningMeanStd input shape: (3, 72, 128) [2025-01-27 14:13:37,488][04408] RunningMeanStd input shape: (1,) [2025-01-27 14:13:37,489][04423] Worker 1 uses CPU cores [1] [2025-01-27 14:13:37,507][04408] ConvEncoder: input_channels=3 [2025-01-27 14:13:37,537][04426] Worker 4 uses CPU cores [0] [2025-01-27 14:13:37,802][04408] Conv encoder output size: 512 [2025-01-27 14:13:37,803][04408] Policy head output size: 512 [2025-01-27 14:13:37,875][04408] Created Actor Critic model with architecture: [2025-01-27 14:13:37,875][04408] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-27 14:13:38,199][04408] Using optimizer [2025-01-27 14:13:40,745][01491] Heartbeat connected on Batcher_0 [2025-01-27 14:13:40,753][01491] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-27 14:13:40,766][01491] Heartbeat connected on RolloutWorker_w1 [2025-01-27 14:13:40,768][01491] Heartbeat connected on RolloutWorker_w0 [2025-01-27 14:13:40,769][01491] Heartbeat connected on RolloutWorker_w2 [2025-01-27 14:13:40,775][01491] Heartbeat connected on RolloutWorker_w3 [2025-01-27 14:13:40,777][01491] Heartbeat connected on RolloutWorker_w4 [2025-01-27 14:13:40,780][01491] Heartbeat connected on RolloutWorker_w5 [2025-01-27 14:13:40,788][01491] Heartbeat connected on RolloutWorker_w6 [2025-01-27 14:13:40,790][01491] Heartbeat connected on RolloutWorker_w7 [2025-01-27 14:13:43,467][04408] No checkpoints found [2025-01-27 14:13:43,467][04408] Did not load from checkpoint, starting from scratch! [2025-01-27 14:13:43,468][04408] Initialized policy 0 weights for model version 0 [2025-01-27 14:13:43,471][04408] LearnerWorker_p0 finished initialization! [2025-01-27 14:13:43,473][04408] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:13:43,471][01491] Heartbeat connected on LearnerWorker_p0 [2025-01-27 14:13:43,582][04421] RunningMeanStd input shape: (3, 72, 128) [2025-01-27 14:13:43,583][04421] RunningMeanStd input shape: (1,) [2025-01-27 14:13:43,594][04421] ConvEncoder: input_channels=3 [2025-01-27 14:13:43,694][04421] Conv encoder output size: 512 [2025-01-27 14:13:43,694][04421] Policy head output size: 512 [2025-01-27 14:13:43,729][01491] Inference worker 0-0 is ready! [2025-01-27 14:13:43,731][01491] All inference workers are ready! Signal rollout workers to start! [2025-01-27 14:13:43,944][04426] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:43,948][04422] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:43,954][04427] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:43,952][04428] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:43,958][04429] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:43,959][04423] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:43,960][04425] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:43,951][04424] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:13:44,749][01491] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-27 14:13:45,366][04428] Decorrelating experience for 0 frames... [2025-01-27 14:13:45,367][04422] Decorrelating experience for 0 frames... [2025-01-27 14:13:45,369][04426] Decorrelating experience for 0 frames... [2025-01-27 14:13:45,693][04427] Decorrelating experience for 0 frames... [2025-01-27 14:13:45,695][04429] Decorrelating experience for 0 frames... [2025-01-27 14:13:45,698][04425] Decorrelating experience for 0 frames... [2025-01-27 14:13:45,700][04423] Decorrelating experience for 0 frames... [2025-01-27 14:13:46,801][04429] Decorrelating experience for 32 frames... [2025-01-27 14:13:46,804][04425] Decorrelating experience for 32 frames... [2025-01-27 14:13:46,805][04423] Decorrelating experience for 32 frames... [2025-01-27 14:13:46,805][04424] Decorrelating experience for 0 frames... [2025-01-27 14:13:46,823][04426] Decorrelating experience for 32 frames... [2025-01-27 14:13:46,831][04428] Decorrelating experience for 32 frames... [2025-01-27 14:13:47,217][04422] Decorrelating experience for 32 frames... [2025-01-27 14:13:47,743][04428] Decorrelating experience for 64 frames... [2025-01-27 14:13:48,171][04428] Decorrelating experience for 96 frames... [2025-01-27 14:13:48,352][04427] Decorrelating experience for 32 frames... [2025-01-27 14:13:48,784][04423] Decorrelating experience for 64 frames... [2025-01-27 14:13:48,786][04425] Decorrelating experience for 64 frames... [2025-01-27 14:13:48,788][04429] Decorrelating experience for 64 frames... [2025-01-27 14:13:49,339][04426] Decorrelating experience for 64 frames... [2025-01-27 14:13:49,749][01491] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-27 14:13:50,113][04422] Decorrelating experience for 64 frames... [2025-01-27 14:13:50,404][04427] Decorrelating experience for 64 frames... [2025-01-27 14:13:50,493][04425] Decorrelating experience for 96 frames... [2025-01-27 14:13:50,510][04429] Decorrelating experience for 96 frames... [2025-01-27 14:13:50,933][04426] Decorrelating experience for 96 frames... [2025-01-27 14:13:51,056][04423] Decorrelating experience for 96 frames... [2025-01-27 14:13:52,237][04422] Decorrelating experience for 96 frames... [2025-01-27 14:13:53,692][04427] Decorrelating experience for 96 frames... [2025-01-27 14:13:54,055][04424] Decorrelating experience for 32 frames... [2025-01-27 14:13:54,547][04408] Signal inference workers to stop experience collection... [2025-01-27 14:13:54,578][04421] InferenceWorker_p0-w0: stopping experience collection [2025-01-27 14:13:54,749][01491] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 147.0. Samples: 1470. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-27 14:13:54,751][01491] Avg episode reward: [(0, '2.610')] [2025-01-27 14:13:55,840][04424] Decorrelating experience for 64 frames... [2025-01-27 14:13:56,878][04424] Decorrelating experience for 96 frames... [2025-01-27 14:13:58,988][04408] Signal inference workers to resume experience collection... [2025-01-27 14:13:58,989][04421] InferenceWorker_p0-w0: resuming experience collection [2025-01-27 14:13:59,749][01491] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 151.1. Samples: 2266. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) [2025-01-27 14:13:59,753][01491] Avg episode reward: [(0, '2.665')] [2025-01-27 14:14:04,749][01491] Fps is (10 sec: 2867.2, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 28672. Throughput: 0: 421.4. Samples: 8428. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:14:04,751][01491] Avg episode reward: [(0, '3.631')] [2025-01-27 14:14:06,847][04421] Updated weights for policy 0, policy_version 10 (0.0189) [2025-01-27 14:14:09,749][01491] Fps is (10 sec: 4915.2, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 477.2. Samples: 11930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:14:09,751][01491] Avg episode reward: [(0, '4.271')] [2025-01-27 14:14:14,751][01491] Fps is (10 sec: 3685.6, 60 sec: 2184.4, 300 sec: 2184.4). Total num frames: 65536. Throughput: 0: 549.8. Samples: 16494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:14:14,753][01491] Avg episode reward: [(0, '4.317')] [2025-01-27 14:14:18,940][04421] Updated weights for policy 0, policy_version 20 (0.0046) [2025-01-27 14:14:19,749][01491] Fps is (10 sec: 2867.2, 60 sec: 2340.6, 300 sec: 2340.6). Total num frames: 81920. Throughput: 0: 622.3. Samples: 21782. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:14:19,755][01491] Avg episode reward: [(0, '4.304')] [2025-01-27 14:14:24,751][01491] Fps is (10 sec: 4096.1, 60 sec: 2662.3, 300 sec: 2662.3). Total num frames: 106496. Throughput: 0: 630.2. Samples: 25210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:14:24,753][01491] Avg episode reward: [(0, '4.359')] [2025-01-27 14:14:24,765][04408] Saving new best policy, reward=4.359! [2025-01-27 14:14:28,440][04421] Updated weights for policy 0, policy_version 30 (0.0019) [2025-01-27 14:14:29,749][01491] Fps is (10 sec: 4096.0, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 122880. Throughput: 0: 699.3. Samples: 31470. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:14:29,751][01491] Avg episode reward: [(0, '4.421')] [2025-01-27 14:14:29,753][04408] Saving new best policy, reward=4.421! [2025-01-27 14:14:34,749][01491] Fps is (10 sec: 3277.4, 60 sec: 2785.3, 300 sec: 2785.3). Total num frames: 139264. Throughput: 0: 794.8. Samples: 35768. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:14:34,751][01491] Avg episode reward: [(0, '4.488')] [2025-01-27 14:14:34,757][04408] Saving new best policy, reward=4.488! [2025-01-27 14:14:39,749][01491] Fps is (10 sec: 3686.4, 60 sec: 2904.4, 300 sec: 2904.4). Total num frames: 159744. Throughput: 0: 834.9. Samples: 39042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:14:39,753][01491] Avg episode reward: [(0, '4.398')] [2025-01-27 14:14:39,938][04421] Updated weights for policy 0, policy_version 40 (0.0012) [2025-01-27 14:14:44,749][01491] Fps is (10 sec: 4505.6, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 184320. Throughput: 0: 968.5. Samples: 45850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:14:44,753][01491] Avg episode reward: [(0, '4.432')] [2025-01-27 14:14:49,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3024.7). Total num frames: 196608. Throughput: 0: 936.5. Samples: 50570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:14:49,755][01491] Avg episode reward: [(0, '4.434')] [2025-01-27 14:14:51,374][04421] Updated weights for policy 0, policy_version 50 (0.0012) [2025-01-27 14:14:54,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3101.3). Total num frames: 217088. Throughput: 0: 913.2. Samples: 53022. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:14:54,751][01491] Avg episode reward: [(0, '4.383')] [2025-01-27 14:14:59,749][01491] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3222.2). Total num frames: 241664. Throughput: 0: 969.4. Samples: 60116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:14:59,754][01491] Avg episode reward: [(0, '4.459')] [2025-01-27 14:15:00,151][04421] Updated weights for policy 0, policy_version 60 (0.0014) [2025-01-27 14:15:04,750][01491] Fps is (10 sec: 4505.3, 60 sec: 3891.1, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 930.7. Samples: 63664. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:15:04,757][01491] Avg episode reward: [(0, '4.448')] [2025-01-27 14:15:09,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3228.6). Total num frames: 274432. Throughput: 0: 954.4. Samples: 68156. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:15:09,755][01491] Avg episode reward: [(0, '4.367')] [2025-01-27 14:15:11,725][04421] Updated weights for policy 0, policy_version 70 (0.0020) [2025-01-27 14:15:14,749][01491] Fps is (10 sec: 3686.7, 60 sec: 3891.3, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 956.0. Samples: 74488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:15:14,756][01491] Avg episode reward: [(0, '4.454')] [2025-01-27 14:15:14,767][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth... [2025-01-27 14:15:19,749][01491] Fps is (10 sec: 4505.5, 60 sec: 3959.5, 300 sec: 3363.0). Total num frames: 319488. Throughput: 0: 1014.0. Samples: 81398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:15:19,751][01491] Avg episode reward: [(0, '4.421')] [2025-01-27 14:15:21,273][04421] Updated weights for policy 0, policy_version 80 (0.0026) [2025-01-27 14:15:24,749][01491] Fps is (10 sec: 3686.3, 60 sec: 3823.0, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 989.9. Samples: 83586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:15:24,752][01491] Avg episode reward: [(0, '4.454')] [2025-01-27 14:15:29,749][01491] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 961.7. Samples: 89128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:15:29,755][01491] Avg episode reward: [(0, '4.541')] [2025-01-27 14:15:29,757][04408] Saving new best policy, reward=4.541! [2025-01-27 14:15:31,746][04421] Updated weights for policy 0, policy_version 90 (0.0013) [2025-01-27 14:15:34,749][01491] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3463.0). Total num frames: 380928. Throughput: 0: 1015.6. Samples: 96272. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:15:34,755][01491] Avg episode reward: [(0, '4.606')] [2025-01-27 14:15:34,760][04408] Saving new best policy, reward=4.606! [2025-01-27 14:15:39,749][01491] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 1027.2. Samples: 99246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:15:39,756][01491] Avg episode reward: [(0, '4.621')] [2025-01-27 14:15:39,761][04408] Saving new best policy, reward=4.621! [2025-01-27 14:15:42,996][04421] Updated weights for policy 0, policy_version 100 (0.0012) [2025-01-27 14:15:44,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 966.7. Samples: 103618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:15:44,756][01491] Avg episode reward: [(0, '4.601')] [2025-01-27 14:15:49,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3506.2). Total num frames: 438272. Throughput: 0: 1048.6. Samples: 110848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:15:49,751][01491] Avg episode reward: [(0, '4.515')] [2025-01-27 14:15:51,709][04421] Updated weights for policy 0, policy_version 110 (0.0013) [2025-01-27 14:15:54,749][01491] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3560.4). Total num frames: 462848. Throughput: 0: 1028.0. Samples: 114414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:15:54,756][01491] Avg episode reward: [(0, '4.448')] [2025-01-27 14:15:59,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3519.5). Total num frames: 475136. Throughput: 0: 999.2. Samples: 119452. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:15:59,755][01491] Avg episode reward: [(0, '4.448')] [2025-01-27 14:16:02,921][04421] Updated weights for policy 0, policy_version 120 (0.0020) [2025-01-27 14:16:04,749][01491] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3569.4). Total num frames: 499712. Throughput: 0: 983.7. Samples: 125664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:16:04,754][01491] Avg episode reward: [(0, '4.728')] [2025-01-27 14:16:04,759][04408] Saving new best policy, reward=4.728! [2025-01-27 14:16:09,749][01491] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 3587.5). Total num frames: 520192. Throughput: 0: 1015.0. Samples: 129260. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:16:09,758][01491] Avg episode reward: [(0, '4.719')] [2025-01-27 14:16:11,874][04421] Updated weights for policy 0, policy_version 130 (0.0028) [2025-01-27 14:16:14,753][01491] Fps is (10 sec: 4094.4, 60 sec: 4027.5, 300 sec: 3604.4). Total num frames: 540672. Throughput: 0: 1020.3. Samples: 135046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:16:14,755][01491] Avg episode reward: [(0, '4.642')] [2025-01-27 14:16:19,749][01491] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3593.9). Total num frames: 557056. Throughput: 0: 975.1. Samples: 140154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:16:19,755][01491] Avg episode reward: [(0, '4.508')] [2025-01-27 14:16:22,872][04421] Updated weights for policy 0, policy_version 140 (0.0036) [2025-01-27 14:16:24,749][01491] Fps is (10 sec: 4097.7, 60 sec: 4096.0, 300 sec: 3635.2). Total num frames: 581632. Throughput: 0: 988.5. Samples: 143728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:16:24,755][01491] Avg episode reward: [(0, '4.413')] [2025-01-27 14:16:29,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3649.2). Total num frames: 602112. Throughput: 0: 1049.5. Samples: 150846. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:16:29,752][01491] Avg episode reward: [(0, '4.351')] [2025-01-27 14:16:32,872][04421] Updated weights for policy 0, policy_version 150 (0.0020) [2025-01-27 14:16:34,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3638.2). Total num frames: 618496. Throughput: 0: 938.9. Samples: 153098. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:16:34,756][01491] Avg episode reward: [(0, '4.318')] [2025-01-27 14:16:39,749][01491] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3651.3). Total num frames: 638976. Throughput: 0: 981.2. Samples: 158570. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:16:39,752][01491] Avg episode reward: [(0, '4.597')] [2025-01-27 14:16:42,678][04421] Updated weights for policy 0, policy_version 160 (0.0015) [2025-01-27 14:16:44,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3686.4). Total num frames: 663552. Throughput: 0: 1030.5. Samples: 165826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:16:44,751][01491] Avg episode reward: [(0, '4.749')] [2025-01-27 14:16:44,761][04408] Saving new best policy, reward=4.749! [2025-01-27 14:16:49,749][01491] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3675.3). Total num frames: 679936. Throughput: 0: 1012.3. Samples: 171218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:16:49,754][01491] Avg episode reward: [(0, '4.579')] [2025-01-27 14:16:53,963][04421] Updated weights for policy 0, policy_version 170 (0.0012) [2025-01-27 14:16:54,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3664.8). Total num frames: 696320. Throughput: 0: 981.9. Samples: 173444. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:16:54,751][01491] Avg episode reward: [(0, '4.385')] [2025-01-27 14:16:59,752][01491] Fps is (10 sec: 4504.4, 60 sec: 4164.1, 300 sec: 3717.9). Total num frames: 724992. Throughput: 0: 1013.2. Samples: 180638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:16:59,757][01491] Avg episode reward: [(0, '4.670')] [2025-01-27 14:17:02,458][04421] Updated weights for policy 0, policy_version 180 (0.0019) [2025-01-27 14:17:04,753][01491] Fps is (10 sec: 4913.2, 60 sec: 4095.7, 300 sec: 3727.3). Total num frames: 745472. Throughput: 0: 1040.8. Samples: 186996. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:17:04,755][01491] Avg episode reward: [(0, '4.791')] [2025-01-27 14:17:04,762][04408] Saving new best policy, reward=4.791! [2025-01-27 14:17:09,749][01491] Fps is (10 sec: 3277.8, 60 sec: 3959.5, 300 sec: 3696.4). Total num frames: 757760. Throughput: 0: 1008.8. Samples: 189124. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:17:09,753][01491] Avg episode reward: [(0, '4.767')] [2025-01-27 14:17:13,554][04421] Updated weights for policy 0, policy_version 190 (0.0018) [2025-01-27 14:17:14,749][01491] Fps is (10 sec: 3687.9, 60 sec: 4028.0, 300 sec: 3725.4). Total num frames: 782336. Throughput: 0: 986.4. Samples: 195232. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:17:14,754][01491] Avg episode reward: [(0, '4.885')] [2025-01-27 14:17:14,763][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth... [2025-01-27 14:17:14,896][04408] Saving new best policy, reward=4.885! [2025-01-27 14:17:19,749][01491] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3753.1). Total num frames: 806912. Throughput: 0: 1091.2. Samples: 202200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:17:19,750][01491] Avg episode reward: [(0, '5.085')] [2025-01-27 14:17:19,758][04408] Saving new best policy, reward=5.085! [2025-01-27 14:17:23,817][04421] Updated weights for policy 0, policy_version 200 (0.0013) [2025-01-27 14:17:24,749][01491] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3723.6). Total num frames: 819200. Throughput: 0: 1022.4. Samples: 204580. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:17:24,752][01491] Avg episode reward: [(0, '4.749')] [2025-01-27 14:17:29,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3731.9). Total num frames: 839680. Throughput: 0: 974.8. Samples: 209692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:17:29,756][01491] Avg episode reward: [(0, '4.475')] [2025-01-27 14:17:33,550][04421] Updated weights for policy 0, policy_version 210 (0.0022) [2025-01-27 14:17:34,749][01491] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3757.6). Total num frames: 864256. Throughput: 0: 1018.2. Samples: 217038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:17:34,755][01491] Avg episode reward: [(0, '4.577')] [2025-01-27 14:17:39,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3764.8). Total num frames: 884736. Throughput: 0: 1047.8. Samples: 220594. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:17:39,751][01491] Avg episode reward: [(0, '4.641')] [2025-01-27 14:17:44,428][04421] Updated weights for policy 0, policy_version 220 (0.0032) [2025-01-27 14:17:44,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3754.7). Total num frames: 901120. Throughput: 0: 988.4. Samples: 225114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:17:44,753][01491] Avg episode reward: [(0, '4.840')] [2025-01-27 14:17:49,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3761.6). Total num frames: 921600. Throughput: 0: 997.7. Samples: 231888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:17:49,762][01491] Avg episode reward: [(0, '4.750')] [2025-01-27 14:17:53,374][04421] Updated weights for policy 0, policy_version 230 (0.0018) [2025-01-27 14:17:54,750][01491] Fps is (10 sec: 4505.1, 60 sec: 4164.2, 300 sec: 3784.7). Total num frames: 946176. Throughput: 0: 1030.8. Samples: 235510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:17:54,757][01491] Avg episode reward: [(0, '4.635')] [2025-01-27 14:17:59,749][01491] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 3774.7). Total num frames: 962560. Throughput: 0: 1014.9. Samples: 240904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:17:59,754][01491] Avg episode reward: [(0, '4.628')] [2025-01-27 14:18:04,504][04421] Updated weights for policy 0, policy_version 240 (0.0029) [2025-01-27 14:18:04,749][01491] Fps is (10 sec: 3686.8, 60 sec: 3959.7, 300 sec: 3780.9). Total num frames: 983040. Throughput: 0: 986.4. Samples: 246586. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:18:04,751][01491] Avg episode reward: [(0, '4.599')] [2025-01-27 14:18:09,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3802.3). Total num frames: 1007616. Throughput: 0: 1015.2. Samples: 250262. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:18:09,754][01491] Avg episode reward: [(0, '4.607')] [2025-01-27 14:18:13,406][04421] Updated weights for policy 0, policy_version 250 (0.0028) [2025-01-27 14:18:14,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3807.8). Total num frames: 1028096. Throughput: 0: 1046.0. Samples: 256764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:18:14,753][01491] Avg episode reward: [(0, '4.684')] [2025-01-27 14:18:19,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3783.2). Total num frames: 1040384. Throughput: 0: 986.0. Samples: 261406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:18:19,751][01491] Avg episode reward: [(0, '5.018')] [2025-01-27 14:18:24,298][04421] Updated weights for policy 0, policy_version 260 (0.0020) [2025-01-27 14:18:24,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3803.4). Total num frames: 1064960. Throughput: 0: 985.2. Samples: 264926. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:18:24,754][01491] Avg episode reward: [(0, '4.884')] [2025-01-27 14:18:29,749][01491] Fps is (10 sec: 4914.8, 60 sec: 4164.2, 300 sec: 3822.9). Total num frames: 1089536. Throughput: 0: 1044.3. Samples: 272110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:18:29,753][01491] Avg episode reward: [(0, '4.707')] [2025-01-27 14:18:34,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3799.4). Total num frames: 1101824. Throughput: 0: 953.6. Samples: 274800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:18:34,752][01491] Avg episode reward: [(0, '4.689')] [2025-01-27 14:18:34,830][04421] Updated weights for policy 0, policy_version 270 (0.0012) [2025-01-27 14:18:39,749][01491] Fps is (10 sec: 3277.1, 60 sec: 3959.5, 300 sec: 3804.4). Total num frames: 1122304. Throughput: 0: 982.0. Samples: 279698. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:18:39,751][01491] Avg episode reward: [(0, '4.837')] [2025-01-27 14:18:44,003][04421] Updated weights for policy 0, policy_version 280 (0.0022) [2025-01-27 14:18:44,749][01491] Fps is (10 sec: 4915.3, 60 sec: 4164.3, 300 sec: 3901.6). Total num frames: 1150976. Throughput: 0: 1025.1. Samples: 287032. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:18:44,751][01491] Avg episode reward: [(0, '5.329')] [2025-01-27 14:18:44,760][04408] Saving new best policy, reward=5.329! [2025-01-27 14:18:49,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1167360. Throughput: 0: 1022.8. Samples: 292614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:18:49,755][01491] Avg episode reward: [(0, '5.345')] [2025-01-27 14:18:49,757][04408] Saving new best policy, reward=5.345! [2025-01-27 14:18:54,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1183744. Throughput: 0: 989.7. Samples: 294800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:18:54,751][01491] Avg episode reward: [(0, '5.450')] [2025-01-27 14:18:54,758][04408] Saving new best policy, reward=5.450! [2025-01-27 14:18:55,638][04421] Updated weights for policy 0, policy_version 290 (0.0030) [2025-01-27 14:18:59,749][01491] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1204224. Throughput: 0: 992.6. Samples: 301432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:18:59,752][01491] Avg episode reward: [(0, '5.052')] [2025-01-27 14:19:04,048][04421] Updated weights for policy 0, policy_version 300 (0.0017) [2025-01-27 14:19:04,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1228800. Throughput: 0: 1041.1. Samples: 308256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:19:04,753][01491] Avg episode reward: [(0, '5.046')] [2025-01-27 14:19:09,749][01491] Fps is (10 sec: 4096.1, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1245184. Throughput: 0: 1013.2. Samples: 310522. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:19:09,756][01491] Avg episode reward: [(0, '5.277')] [2025-01-27 14:19:14,749][01491] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1265664. Throughput: 0: 975.3. Samples: 315998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:19:14,753][01491] Avg episode reward: [(0, '5.323')] [2025-01-27 14:19:14,759][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth... [2025-01-27 14:19:14,879][04408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth [2025-01-27 14:19:15,493][04421] Updated weights for policy 0, policy_version 310 (0.0012) [2025-01-27 14:19:19,749][01491] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3998.8). Total num frames: 1286144. Throughput: 0: 1069.2. Samples: 322914. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:19:19,754][01491] Avg episode reward: [(0, '5.376')] [2025-01-27 14:19:24,749][01491] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 1302528. Throughput: 0: 1019.2. Samples: 325564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:19:24,755][01491] Avg episode reward: [(0, '5.329')] [2025-01-27 14:19:26,509][04421] Updated weights for policy 0, policy_version 320 (0.0014) [2025-01-27 14:19:29,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 4012.7). Total num frames: 1323008. Throughput: 0: 960.0. Samples: 330234. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:19:29,751][01491] Avg episode reward: [(0, '5.431')] [2025-01-27 14:19:34,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1347584. Throughput: 0: 996.4. Samples: 337450. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:19:34,756][01491] Avg episode reward: [(0, '5.527')] [2025-01-27 14:19:34,763][04408] Saving new best policy, reward=5.527! [2025-01-27 14:19:35,621][04421] Updated weights for policy 0, policy_version 330 (0.0017) [2025-01-27 14:19:39,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4012.7). Total num frames: 1368064. Throughput: 0: 1025.6. Samples: 340952. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:19:39,755][01491] Avg episode reward: [(0, '5.395')] [2025-01-27 14:19:44,755][01491] Fps is (10 sec: 3274.8, 60 sec: 3822.5, 300 sec: 4012.6). Total num frames: 1380352. Throughput: 0: 982.7. Samples: 345658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:19:44,757][01491] Avg episode reward: [(0, '5.312')] [2025-01-27 14:19:46,911][04421] Updated weights for policy 0, policy_version 340 (0.0020) [2025-01-27 14:19:49,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1404928. Throughput: 0: 973.1. Samples: 352046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:19:49,751][01491] Avg episode reward: [(0, '5.214')] [2025-01-27 14:19:54,749][01491] Fps is (10 sec: 4918.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1429504. Throughput: 0: 1005.2. Samples: 355756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:19:54,755][01491] Avg episode reward: [(0, '4.914')] [2025-01-27 14:19:55,364][04421] Updated weights for policy 0, policy_version 350 (0.0016) [2025-01-27 14:19:59,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1445888. Throughput: 0: 1010.0. Samples: 361446. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:19:59,754][01491] Avg episode reward: [(0, '5.093')] [2025-01-27 14:20:04,749][01491] Fps is (10 sec: 3276.9, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 1462272. Throughput: 0: 979.5. Samples: 366990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:20:04,751][01491] Avg episode reward: [(0, '5.417')] [2025-01-27 14:20:06,534][04421] Updated weights for policy 0, policy_version 360 (0.0031) [2025-01-27 14:20:09,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1486848. Throughput: 0: 1000.2. Samples: 370574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:20:09,754][01491] Avg episode reward: [(0, '5.334')] [2025-01-27 14:20:14,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1507328. Throughput: 0: 1048.7. Samples: 377424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:20:14,752][01491] Avg episode reward: [(0, '5.603')] [2025-01-27 14:20:14,765][04408] Saving new best policy, reward=5.603! [2025-01-27 14:20:16,703][04421] Updated weights for policy 0, policy_version 370 (0.0012) [2025-01-27 14:20:19,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1523712. Throughput: 0: 985.0. Samples: 381774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:20:19,753][01491] Avg episode reward: [(0, '5.498')] [2025-01-27 14:20:24,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1548288. Throughput: 0: 988.0. Samples: 385414. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:20:24,753][01491] Avg episode reward: [(0, '5.380')] [2025-01-27 14:20:26,294][04421] Updated weights for policy 0, policy_version 380 (0.0012) [2025-01-27 14:20:29,749][01491] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1572864. Throughput: 0: 1045.8. Samples: 392714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:20:29,756][01491] Avg episode reward: [(0, '5.769')] [2025-01-27 14:20:29,758][04408] Saving new best policy, reward=5.769! [2025-01-27 14:20:34,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1585152. Throughput: 0: 1011.8. Samples: 397576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:20:34,759][01491] Avg episode reward: [(0, '5.709')] [2025-01-27 14:20:37,635][04421] Updated weights for policy 0, policy_version 390 (0.0017) [2025-01-27 14:20:39,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1605632. Throughput: 0: 989.6. Samples: 400286. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) [2025-01-27 14:20:39,752][01491] Avg episode reward: [(0, '5.780')] [2025-01-27 14:20:39,756][04408] Saving new best policy, reward=5.780! [2025-01-27 14:20:44,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.7, 300 sec: 4040.5). Total num frames: 1630208. Throughput: 0: 1023.8. Samples: 407518. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:20:44,751][01491] Avg episode reward: [(0, '5.886')] [2025-01-27 14:20:44,761][04408] Saving new best policy, reward=5.886! [2025-01-27 14:20:46,078][04421] Updated weights for policy 0, policy_version 400 (0.0028) [2025-01-27 14:20:49,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1650688. Throughput: 0: 1031.3. Samples: 413398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:20:49,751][01491] Avg episode reward: [(0, '5.528')] [2025-01-27 14:20:54,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1667072. Throughput: 0: 1001.1. Samples: 415622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:20:54,751][01491] Avg episode reward: [(0, '5.513')] [2025-01-27 14:20:57,170][04421] Updated weights for policy 0, policy_version 410 (0.0017) [2025-01-27 14:20:59,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1691648. Throughput: 0: 998.8. Samples: 422372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:20:59,752][01491] Avg episode reward: [(0, '5.610')] [2025-01-27 14:21:04,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1712128. Throughput: 0: 1058.8. Samples: 429420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:21:04,754][01491] Avg episode reward: [(0, '5.801')] [2025-01-27 14:21:06,311][04421] Updated weights for policy 0, policy_version 420 (0.0016) [2025-01-27 14:21:09,749][01491] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1728512. Throughput: 0: 1026.6. Samples: 431612. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:21:09,753][01491] Avg episode reward: [(0, '5.859')] [2025-01-27 14:21:14,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1753088. Throughput: 0: 998.7. Samples: 437656. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:21:14,756][01491] Avg episode reward: [(0, '6.180')] [2025-01-27 14:21:14,764][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000428_1753088.pth... [2025-01-27 14:21:14,907][04408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000191_782336.pth [2025-01-27 14:21:14,926][04408] Saving new best policy, reward=6.180! [2025-01-27 14:21:16,539][04421] Updated weights for policy 0, policy_version 430 (0.0014) [2025-01-27 14:21:19,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1773568. Throughput: 0: 1047.5. Samples: 444714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:21:19,757][01491] Avg episode reward: [(0, '6.650')] [2025-01-27 14:21:19,760][04408] Saving new best policy, reward=6.650! [2025-01-27 14:21:24,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1789952. Throughput: 0: 1045.6. Samples: 447340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:21:24,754][01491] Avg episode reward: [(0, '7.123')] [2025-01-27 14:21:24,762][04408] Saving new best policy, reward=7.123! [2025-01-27 14:21:28,045][04421] Updated weights for policy 0, policy_version 440 (0.0042) [2025-01-27 14:21:29,749][01491] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1810432. Throughput: 0: 992.4. Samples: 452178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:21:29,759][01491] Avg episode reward: [(0, '6.574')] [2025-01-27 14:21:34,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 1835008. Throughput: 0: 1024.0. Samples: 459476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:21:34,752][01491] Avg episode reward: [(0, '6.487')] [2025-01-27 14:21:36,265][04421] Updated weights for policy 0, policy_version 450 (0.0019) [2025-01-27 14:21:39,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1855488. Throughput: 0: 1057.9. Samples: 463226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:21:39,752][01491] Avg episode reward: [(0, '6.926')] [2025-01-27 14:21:44,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1871872. Throughput: 0: 1008.3. Samples: 467746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:21:44,752][01491] Avg episode reward: [(0, '7.360')] [2025-01-27 14:21:44,765][04408] Saving new best policy, reward=7.360! [2025-01-27 14:21:47,581][04421] Updated weights for policy 0, policy_version 460 (0.0022) [2025-01-27 14:21:49,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1892352. Throughput: 0: 991.8. Samples: 474052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:21:49,757][01491] Avg episode reward: [(0, '8.195')] [2025-01-27 14:21:49,760][04408] Saving new best policy, reward=8.195! [2025-01-27 14:21:54,749][01491] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 1916928. Throughput: 0: 1019.2. Samples: 477474. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:21:54,756][01491] Avg episode reward: [(0, '7.978')] [2025-01-27 14:21:56,757][04421] Updated weights for policy 0, policy_version 470 (0.0017) [2025-01-27 14:21:59,749][01491] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1933312. Throughput: 0: 1012.1. Samples: 483200. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:21:59,754][01491] Avg episode reward: [(0, '7.935')] [2025-01-27 14:22:04,749][01491] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1949696. Throughput: 0: 978.8. Samples: 488758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:22:04,751][01491] Avg episode reward: [(0, '7.826')] [2025-01-27 14:22:07,595][04421] Updated weights for policy 0, policy_version 480 (0.0035) [2025-01-27 14:22:09,749][01491] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1974272. Throughput: 0: 1000.5. Samples: 492362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:22:09,752][01491] Avg episode reward: [(0, '8.773')] [2025-01-27 14:22:09,757][04408] Saving new best policy, reward=8.773! [2025-01-27 14:22:14,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1994752. Throughput: 0: 1043.6. Samples: 499140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:22:14,751][01491] Avg episode reward: [(0, '9.113')] [2025-01-27 14:22:14,758][04408] Saving new best policy, reward=9.113! [2025-01-27 14:22:18,379][04421] Updated weights for policy 0, policy_version 490 (0.0025) [2025-01-27 14:22:19,749][01491] Fps is (10 sec: 3276.7, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2007040. Throughput: 0: 927.0. Samples: 501190. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:22:19,752][01491] Avg episode reward: [(0, '9.961')] [2025-01-27 14:22:19,755][04408] Saving new best policy, reward=9.961! [2025-01-27 14:22:24,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2031616. Throughput: 0: 965.3. Samples: 506664. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:22:24,753][01491] Avg episode reward: [(0, '10.608')] [2025-01-27 14:22:24,759][04408] Saving new best policy, reward=10.608! [2025-01-27 14:22:27,656][04421] Updated weights for policy 0, policy_version 500 (0.0019) [2025-01-27 14:22:29,749][01491] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2056192. Throughput: 0: 1025.2. Samples: 513880. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:22:29,755][01491] Avg episode reward: [(0, '11.612')] [2025-01-27 14:22:29,759][04408] Saving new best policy, reward=11.612! [2025-01-27 14:22:34,749][01491] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2072576. Throughput: 0: 996.6. Samples: 518900. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:22:34,751][01491] Avg episode reward: [(0, '11.191')] [2025-01-27 14:22:38,736][04421] Updated weights for policy 0, policy_version 510 (0.0024) [2025-01-27 14:22:39,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2093056. Throughput: 0: 980.0. Samples: 521574. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:22:39,755][01491] Avg episode reward: [(0, '10.221')] [2025-01-27 14:22:44,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2117632. Throughput: 0: 1017.3. Samples: 528976. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:22:44,753][01491] Avg episode reward: [(0, '9.377')] [2025-01-27 14:22:47,027][04421] Updated weights for policy 0, policy_version 520 (0.0015) [2025-01-27 14:22:49,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2134016. Throughput: 0: 976.4. Samples: 532694. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:22:49,755][01491] Avg episode reward: [(0, '9.044')] [2025-01-27 14:22:54,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2150400. Throughput: 0: 996.9. Samples: 537222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:22:54,751][01491] Avg episode reward: [(0, '9.592')] [2025-01-27 14:22:58,090][04421] Updated weights for policy 0, policy_version 530 (0.0013) [2025-01-27 14:22:59,755][01491] Fps is (10 sec: 4093.6, 60 sec: 4027.4, 300 sec: 4040.4). Total num frames: 2174976. Throughput: 0: 995.9. Samples: 543960. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:22:59,756][01491] Avg episode reward: [(0, '9.684')] [2025-01-27 14:23:04,749][01491] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2199552. Throughput: 0: 1101.5. Samples: 550756. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-01-27 14:23:04,751][01491] Avg episode reward: [(0, '9.495')] [2025-01-27 14:23:08,645][04421] Updated weights for policy 0, policy_version 540 (0.0037) [2025-01-27 14:23:09,749][01491] Fps is (10 sec: 3688.6, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2211840. Throughput: 0: 1026.5. Samples: 552858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:23:09,752][01491] Avg episode reward: [(0, '8.976')] [2025-01-27 14:23:14,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2232320. Throughput: 0: 990.9. Samples: 558470. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:23:14,751][01491] Avg episode reward: [(0, '9.110')] [2025-01-27 14:23:14,758][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000545_2232320.pth... [2025-01-27 14:23:14,920][04408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000309_1265664.pth [2025-01-27 14:23:18,480][04421] Updated weights for policy 0, policy_version 550 (0.0014) [2025-01-27 14:23:19,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2256896. Throughput: 0: 1033.9. Samples: 565426. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:23:19,751][01491] Avg episode reward: [(0, '10.173')] [2025-01-27 14:23:24,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2273280. Throughput: 0: 1035.9. Samples: 568190. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:23:24,751][01491] Avg episode reward: [(0, '11.255')] [2025-01-27 14:23:29,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2289664. Throughput: 0: 969.6. Samples: 572606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:23:29,751][01491] Avg episode reward: [(0, '13.006')] [2025-01-27 14:23:29,754][04408] Saving new best policy, reward=13.006! [2025-01-27 14:23:30,032][04421] Updated weights for policy 0, policy_version 560 (0.0017) [2025-01-27 14:23:34,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2314240. Throughput: 0: 1042.0. Samples: 579584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:23:34,752][01491] Avg episode reward: [(0, '14.035')] [2025-01-27 14:23:34,763][04408] Saving new best policy, reward=14.035! [2025-01-27 14:23:38,971][04421] Updated weights for policy 0, policy_version 570 (0.0012) [2025-01-27 14:23:39,754][01491] Fps is (10 sec: 4503.3, 60 sec: 4027.4, 300 sec: 4012.6). Total num frames: 2334720. Throughput: 0: 1021.2. Samples: 583182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:23:39,756][01491] Avg episode reward: [(0, '14.321')] [2025-01-27 14:23:39,767][04408] Saving new best policy, reward=14.321! [2025-01-27 14:23:44,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 2351104. Throughput: 0: 975.6. Samples: 587858. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:23:44,751][01491] Avg episode reward: [(0, '13.621')] [2025-01-27 14:23:49,705][04421] Updated weights for policy 0, policy_version 580 (0.0014) [2025-01-27 14:23:49,749][01491] Fps is (10 sec: 4098.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2375680. Throughput: 0: 973.0. Samples: 594540. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:23:49,756][01491] Avg episode reward: [(0, '13.182')] [2025-01-27 14:23:54,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2396160. Throughput: 0: 1004.0. Samples: 598038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:23:54,754][01491] Avg episode reward: [(0, '13.599')] [2025-01-27 14:23:59,749][01491] Fps is (10 sec: 3686.3, 60 sec: 3959.8, 300 sec: 4012.7). Total num frames: 2412544. Throughput: 0: 1005.4. Samples: 603712. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:23:59,753][01491] Avg episode reward: [(0, '13.450')] [2025-01-27 14:24:00,189][04421] Updated weights for policy 0, policy_version 590 (0.0026) [2025-01-27 14:24:04,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2433024. Throughput: 0: 978.9. Samples: 609476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:24:04,755][01491] Avg episode reward: [(0, '13.261')] [2025-01-27 14:24:09,230][04421] Updated weights for policy 0, policy_version 600 (0.0015) [2025-01-27 14:24:09,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2457600. Throughput: 0: 998.8. Samples: 613136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:24:09,755][01491] Avg episode reward: [(0, '13.166')] [2025-01-27 14:24:14,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2478080. Throughput: 0: 1052.5. Samples: 619970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:24:14,754][01491] Avg episode reward: [(0, '13.170')] [2025-01-27 14:24:19,749][01491] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2494464. Throughput: 0: 1002.3. Samples: 624688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:24:19,751][01491] Avg episode reward: [(0, '13.898')] [2025-01-27 14:24:20,118][04421] Updated weights for policy 0, policy_version 610 (0.0018) [2025-01-27 14:24:24,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2519040. Throughput: 0: 1001.0. Samples: 628222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:24:24,757][01491] Avg episode reward: [(0, '15.071')] [2025-01-27 14:24:24,765][04408] Saving new best policy, reward=15.071! [2025-01-27 14:24:28,639][04421] Updated weights for policy 0, policy_version 620 (0.0014) [2025-01-27 14:24:29,749][01491] Fps is (10 sec: 4915.1, 60 sec: 4232.5, 300 sec: 4054.3). Total num frames: 2543616. Throughput: 0: 1061.5. Samples: 635624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:24:29,756][01491] Avg episode reward: [(0, '16.530')] [2025-01-27 14:24:29,761][04408] Saving new best policy, reward=16.530! [2025-01-27 14:24:34,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2555904. Throughput: 0: 1019.6. Samples: 640420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:24:34,752][01491] Avg episode reward: [(0, '16.810')] [2025-01-27 14:24:34,765][04408] Saving new best policy, reward=16.810! [2025-01-27 14:24:39,637][04421] Updated weights for policy 0, policy_version 630 (0.0024) [2025-01-27 14:24:39,749][01491] Fps is (10 sec: 3686.5, 60 sec: 4096.4, 300 sec: 4068.3). Total num frames: 2580480. Throughput: 0: 1005.1. Samples: 643266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:24:39,753][01491] Avg episode reward: [(0, '17.865')] [2025-01-27 14:24:39,757][04408] Saving new best policy, reward=17.865! [2025-01-27 14:24:44,749][01491] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 2605056. Throughput: 0: 1041.8. Samples: 650592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:24:44,753][01491] Avg episode reward: [(0, '18.431')] [2025-01-27 14:24:44,762][04408] Saving new best policy, reward=18.431! [2025-01-27 14:24:49,247][04421] Updated weights for policy 0, policy_version 640 (0.0027) [2025-01-27 14:24:49,749][01491] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2621440. Throughput: 0: 1040.9. Samples: 656316. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:24:49,751][01491] Avg episode reward: [(0, '18.383')] [2025-01-27 14:24:54,749][01491] Fps is (10 sec: 3276.7, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2637824. Throughput: 0: 1008.4. Samples: 658512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:24:54,751][01491] Avg episode reward: [(0, '17.083')] [2025-01-27 14:24:59,159][04421] Updated weights for policy 0, policy_version 650 (0.0016) [2025-01-27 14:24:59,749][01491] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 2662400. Throughput: 0: 1014.1. Samples: 665606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:24:59,751][01491] Avg episode reward: [(0, '15.533')] [2025-01-27 14:25:04,755][01491] Fps is (10 sec: 4912.4, 60 sec: 4232.1, 300 sec: 4068.1). Total num frames: 2686976. Throughput: 0: 1061.3. Samples: 672452. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:04,759][01491] Avg episode reward: [(0, '13.279')] [2025-01-27 14:25:09,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 2699264. Throughput: 0: 1034.0. Samples: 674750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:25:09,754][01491] Avg episode reward: [(0, '12.875')] [2025-01-27 14:25:09,972][04421] Updated weights for policy 0, policy_version 660 (0.0012) [2025-01-27 14:25:14,749][01491] Fps is (10 sec: 3688.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2723840. Throughput: 0: 1004.0. Samples: 680802. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:14,755][01491] Avg episode reward: [(0, '13.395')] [2025-01-27 14:25:14,766][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000665_2723840.pth... [2025-01-27 14:25:14,890][04408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000428_1753088.pth [2025-01-27 14:25:18,583][04421] Updated weights for policy 0, policy_version 670 (0.0012) [2025-01-27 14:25:19,749][01491] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 2748416. Throughput: 0: 1060.4. Samples: 688136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:19,755][01491] Avg episode reward: [(0, '12.901')] [2025-01-27 14:25:24,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2764800. Throughput: 0: 1052.9. Samples: 690646. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-01-27 14:25:24,754][01491] Avg episode reward: [(0, '12.882')] [2025-01-27 14:25:29,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 2781184. Throughput: 0: 1001.6. Samples: 695666. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:29,751][01491] Avg episode reward: [(0, '14.076')] [2025-01-27 14:25:29,786][04421] Updated weights for policy 0, policy_version 680 (0.0023) [2025-01-27 14:25:34,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 2805760. Throughput: 0: 1037.8. Samples: 703018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:34,751][01491] Avg episode reward: [(0, '15.686')] [2025-01-27 14:25:38,282][04421] Updated weights for policy 0, policy_version 690 (0.0019) [2025-01-27 14:25:39,752][01491] Fps is (10 sec: 4913.7, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 2830336. Throughput: 0: 1071.9. Samples: 706752. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:39,761][01491] Avg episode reward: [(0, '16.537')] [2025-01-27 14:25:44,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2842624. Throughput: 0: 1014.7. Samples: 711268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:44,756][01491] Avg episode reward: [(0, '16.877')] [2025-01-27 14:25:49,097][04421] Updated weights for policy 0, policy_version 700 (0.0013) [2025-01-27 14:25:49,749][01491] Fps is (10 sec: 3687.5, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2867200. Throughput: 0: 1016.5. Samples: 718190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:25:49,751][01491] Avg episode reward: [(0, '18.067')] [2025-01-27 14:25:54,749][01491] Fps is (10 sec: 4915.2, 60 sec: 4232.6, 300 sec: 4068.2). Total num frames: 2891776. Throughput: 0: 1043.1. Samples: 721688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:54,751][01491] Avg episode reward: [(0, '17.277')] [2025-01-27 14:25:59,066][04421] Updated weights for policy 0, policy_version 710 (0.0029) [2025-01-27 14:25:59,749][01491] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2908160. Throughput: 0: 1035.2. Samples: 727384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:25:59,755][01491] Avg episode reward: [(0, '17.771')] [2025-01-27 14:26:04,749][01491] Fps is (10 sec: 3686.3, 60 sec: 4028.1, 300 sec: 4068.2). Total num frames: 2928640. Throughput: 0: 1003.7. Samples: 733302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:04,751][01491] Avg episode reward: [(0, '17.801')] [2025-01-27 14:26:08,543][04421] Updated weights for policy 0, policy_version 720 (0.0027) [2025-01-27 14:26:09,749][01491] Fps is (10 sec: 4505.7, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 2953216. Throughput: 0: 1029.1. Samples: 736956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:09,752][01491] Avg episode reward: [(0, '16.846')] [2025-01-27 14:26:14,752][01491] Fps is (10 sec: 4504.3, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 2973696. Throughput: 0: 1066.7. Samples: 743670. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:14,755][01491] Avg episode reward: [(0, '15.949')] [2025-01-27 14:26:19,541][04421] Updated weights for policy 0, policy_version 730 (0.0020) [2025-01-27 14:26:19,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2990080. Throughput: 0: 1008.8. Samples: 748414. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:26:19,750][01491] Avg episode reward: [(0, '16.281')] [2025-01-27 14:26:24,751][01491] Fps is (10 sec: 4096.4, 60 sec: 4164.1, 300 sec: 4082.1). Total num frames: 3014656. Throughput: 0: 1008.5. Samples: 752132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:24,754][01491] Avg episode reward: [(0, '16.224')] [2025-01-27 14:26:27,971][04421] Updated weights for policy 0, policy_version 740 (0.0019) [2025-01-27 14:26:29,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4068.2). Total num frames: 3035136. Throughput: 0: 1070.9. Samples: 759460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:29,753][01491] Avg episode reward: [(0, '16.642')] [2025-01-27 14:26:34,751][01491] Fps is (10 sec: 3686.4, 60 sec: 4095.9, 300 sec: 4054.3). Total num frames: 3051520. Throughput: 0: 972.8. Samples: 761970. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:34,753][01491] Avg episode reward: [(0, '17.416')] [2025-01-27 14:26:38,795][04421] Updated weights for policy 0, policy_version 750 (0.0019) [2025-01-27 14:26:39,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4082.1). Total num frames: 3076096. Throughput: 0: 1012.3. Samples: 767240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:39,753][01491] Avg episode reward: [(0, '17.234')] [2025-01-27 14:26:44,749][01491] Fps is (10 sec: 4916.2, 60 sec: 4300.8, 300 sec: 4096.0). Total num frames: 3100672. Throughput: 0: 1050.2. Samples: 774644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:44,751][01491] Avg episode reward: [(0, '18.803')] [2025-01-27 14:26:44,765][04408] Saving new best policy, reward=18.803! [2025-01-27 14:26:47,614][04421] Updated weights for policy 0, policy_version 760 (0.0021) [2025-01-27 14:26:49,750][01491] Fps is (10 sec: 4095.4, 60 sec: 4164.2, 300 sec: 4068.2). Total num frames: 3117056. Throughput: 0: 994.3. Samples: 778048. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:26:49,755][01491] Avg episode reward: [(0, '19.065')] [2025-01-27 14:26:49,756][04408] Saving new best policy, reward=19.065! [2025-01-27 14:26:54,749][01491] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3133440. Throughput: 0: 1011.6. Samples: 782476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:26:54,755][01491] Avg episode reward: [(0, '19.808')] [2025-01-27 14:26:54,764][04408] Saving new best policy, reward=19.808! [2025-01-27 14:26:58,443][04421] Updated weights for policy 0, policy_version 770 (0.0014) [2025-01-27 14:26:59,749][01491] Fps is (10 sec: 4096.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3158016. Throughput: 0: 1018.0. Samples: 789478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:26:59,755][01491] Avg episode reward: [(0, '20.807')] [2025-01-27 14:26:59,757][04408] Saving new best policy, reward=20.807! [2025-01-27 14:27:04,754][01491] Fps is (10 sec: 4503.1, 60 sec: 4163.9, 300 sec: 4082.0). Total num frames: 3178496. Throughput: 0: 1058.7. Samples: 796060. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:27:04,760][01491] Avg episode reward: [(0, '23.291')] [2025-01-27 14:27:04,766][04408] Saving new best policy, reward=23.291! [2025-01-27 14:27:09,054][04421] Updated weights for policy 0, policy_version 780 (0.0013) [2025-01-27 14:27:09,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3194880. Throughput: 0: 1024.2. Samples: 798220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:27:09,751][01491] Avg episode reward: [(0, '22.722')] [2025-01-27 14:27:14,749][01491] Fps is (10 sec: 4098.3, 60 sec: 4096.2, 300 sec: 4109.9). Total num frames: 3219456. Throughput: 0: 998.2. Samples: 804378. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:27:14,751][01491] Avg episode reward: [(0, '21.856')] [2025-01-27 14:27:14,757][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000786_3219456.pth... [2025-01-27 14:27:14,873][04408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000545_2232320.pth [2025-01-27 14:27:18,057][04421] Updated weights for policy 0, policy_version 790 (0.0015) [2025-01-27 14:27:19,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3239936. Throughput: 0: 1102.4. Samples: 811574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:27:19,751][01491] Avg episode reward: [(0, '22.014')] [2025-01-27 14:27:24,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.9, 300 sec: 4068.2). Total num frames: 3256320. Throughput: 0: 1040.1. Samples: 814046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:27:24,752][01491] Avg episode reward: [(0, '20.653')] [2025-01-27 14:27:29,578][04421] Updated weights for policy 0, policy_version 800 (0.0029) [2025-01-27 14:27:29,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3276800. Throughput: 0: 982.1. Samples: 818838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:27:29,753][01491] Avg episode reward: [(0, '19.731')] [2025-01-27 14:27:34,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.4, 300 sec: 4096.0). Total num frames: 3301376. Throughput: 0: 1068.9. Samples: 826146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:27:34,751][01491] Avg episode reward: [(0, '19.720')] [2025-01-27 14:27:37,981][04421] Updated weights for policy 0, policy_version 810 (0.0029) [2025-01-27 14:27:39,751][01491] Fps is (10 sec: 4504.7, 60 sec: 4095.9, 300 sec: 4082.1). Total num frames: 3321856. Throughput: 0: 1052.5. Samples: 829840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:27:39,753][01491] Avg episode reward: [(0, '20.339')] [2025-01-27 14:27:44,751][01491] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 4082.1). Total num frames: 3338240. Throughput: 0: 997.9. Samples: 834384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:27:44,756][01491] Avg episode reward: [(0, '20.357')] [2025-01-27 14:27:48,933][04421] Updated weights for policy 0, policy_version 820 (0.0037) [2025-01-27 14:27:49,749][01491] Fps is (10 sec: 4096.8, 60 sec: 4096.1, 300 sec: 4109.9). Total num frames: 3362816. Throughput: 0: 1003.7. Samples: 841220. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:27:49,754][01491] Avg episode reward: [(0, '20.585')] [2025-01-27 14:27:54,753][01491] Fps is (10 sec: 4504.7, 60 sec: 4164.0, 300 sec: 4096.0). Total num frames: 3383296. Throughput: 0: 1038.3. Samples: 844950. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:27:54,760][01491] Avg episode reward: [(0, '20.571')] [2025-01-27 14:27:58,846][04421] Updated weights for policy 0, policy_version 830 (0.0020) [2025-01-27 14:27:59,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 3399680. Throughput: 0: 1024.6. Samples: 850486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:27:59,756][01491] Avg episode reward: [(0, '20.837')] [2025-01-27 14:28:04,754][01491] Fps is (10 sec: 3686.0, 60 sec: 4027.8, 300 sec: 4095.9). Total num frames: 3420160. Throughput: 0: 989.5. Samples: 856106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:28:04,759][01491] Avg episode reward: [(0, '19.978')] [2025-01-27 14:28:08,716][04421] Updated weights for policy 0, policy_version 840 (0.0012) [2025-01-27 14:28:09,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3444736. Throughput: 0: 1014.4. Samples: 859694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:28:09,752][01491] Avg episode reward: [(0, '20.140')] [2025-01-27 14:28:14,750][01491] Fps is (10 sec: 4097.6, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3461120. Throughput: 0: 1043.9. Samples: 865816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:28:14,752][01491] Avg episode reward: [(0, '20.906')] [2025-01-27 14:28:19,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3477504. Throughput: 0: 974.5. Samples: 869998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:28:19,752][01491] Avg episode reward: [(0, '20.282')] [2025-01-27 14:28:20,502][04421] Updated weights for policy 0, policy_version 850 (0.0026) [2025-01-27 14:28:24,749][01491] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3497984. Throughput: 0: 973.2. Samples: 873630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:28:24,752][01491] Avg episode reward: [(0, '20.587')] [2025-01-27 14:28:29,183][04421] Updated weights for policy 0, policy_version 860 (0.0017) [2025-01-27 14:28:29,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3522560. Throughput: 0: 1030.3. Samples: 880746. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:28:29,755][01491] Avg episode reward: [(0, '21.443')] [2025-01-27 14:28:34,749][01491] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4082.2). Total num frames: 3538944. Throughput: 0: 984.4. Samples: 885520. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:28:34,755][01491] Avg episode reward: [(0, '20.103')] [2025-01-27 14:28:39,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 4096.0). Total num frames: 3559424. Throughput: 0: 962.4. Samples: 888254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:28:39,756][01491] Avg episode reward: [(0, '18.700')] [2025-01-27 14:28:40,438][04421] Updated weights for policy 0, policy_version 870 (0.0022) [2025-01-27 14:28:44,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.1, 300 sec: 4096.0). Total num frames: 3584000. Throughput: 0: 1002.9. Samples: 895618. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:28:44,760][01491] Avg episode reward: [(0, '20.105')] [2025-01-27 14:28:49,749][01491] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3600384. Throughput: 0: 1010.1. Samples: 901556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:28:49,755][01491] Avg episode reward: [(0, '19.629')] [2025-01-27 14:28:50,019][04421] Updated weights for policy 0, policy_version 880 (0.0012) [2025-01-27 14:28:54,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3891.5, 300 sec: 4082.1). Total num frames: 3616768. Throughput: 0: 979.7. Samples: 903782. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:28:54,756][01491] Avg episode reward: [(0, '18.951')] [2025-01-27 14:28:59,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3641344. Throughput: 0: 995.4. Samples: 910606. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:28:59,751][01491] Avg episode reward: [(0, '20.294')] [2025-01-27 14:28:59,908][04421] Updated weights for policy 0, policy_version 890 (0.0018) [2025-01-27 14:29:04,750][01491] Fps is (10 sec: 4914.5, 60 sec: 4096.3, 300 sec: 4096.0). Total num frames: 3665920. Throughput: 0: 982.7. Samples: 914220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:29:04,753][01491] Avg episode reward: [(0, '21.475')] [2025-01-27 14:29:09,749][01491] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3682304. Throughput: 0: 1025.2. Samples: 919764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-01-27 14:29:09,752][01491] Avg episode reward: [(0, '20.481')] [2025-01-27 14:29:10,971][04421] Updated weights for policy 0, policy_version 900 (0.0019) [2025-01-27 14:29:14,749][01491] Fps is (10 sec: 3686.9, 60 sec: 4027.8, 300 sec: 4096.0). Total num frames: 3702784. Throughput: 0: 998.8. Samples: 925692. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:29:14,755][01491] Avg episode reward: [(0, '20.974')] [2025-01-27 14:29:14,765][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000904_3702784.pth... [2025-01-27 14:29:14,881][04408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000665_2723840.pth [2025-01-27 14:29:19,185][04421] Updated weights for policy 0, policy_version 910 (0.0014) [2025-01-27 14:29:19,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3727360. Throughput: 0: 1056.0. Samples: 933038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:29:19,753][01491] Avg episode reward: [(0, '22.835')] [2025-01-27 14:29:24,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3743744. Throughput: 0: 1053.7. Samples: 935670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:29:24,758][01491] Avg episode reward: [(0, '24.281')] [2025-01-27 14:29:24,769][04408] Saving new best policy, reward=24.281! [2025-01-27 14:29:29,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3760128. Throughput: 0: 991.5. Samples: 940236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:29:29,751][01491] Avg episode reward: [(0, '24.537')] [2025-01-27 14:29:29,761][04408] Saving new best policy, reward=24.537! [2025-01-27 14:29:30,723][04421] Updated weights for policy 0, policy_version 920 (0.0026) [2025-01-27 14:29:34,749][01491] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 3784704. Throughput: 0: 1017.0. Samples: 947322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:29:34,755][01491] Avg episode reward: [(0, '24.663')] [2025-01-27 14:29:34,765][04408] Saving new best policy, reward=24.663! [2025-01-27 14:29:39,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3805184. Throughput: 0: 1044.2. Samples: 950772. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:29:39,750][01491] Avg episode reward: [(0, '23.785')] [2025-01-27 14:29:40,187][04421] Updated weights for policy 0, policy_version 930 (0.0023) [2025-01-27 14:29:44,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3821568. Throughput: 0: 994.8. Samples: 955374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-01-27 14:29:44,751][01491] Avg episode reward: [(0, '23.111')] [2025-01-27 14:29:49,749][01491] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3842048. Throughput: 0: 1055.0. Samples: 961692. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:29:49,751][01491] Avg episode reward: [(0, '21.571')] [2025-01-27 14:29:51,191][04421] Updated weights for policy 0, policy_version 940 (0.0022) [2025-01-27 14:29:54,749][01491] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3866624. Throughput: 0: 1008.1. Samples: 965130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:29:54,755][01491] Avg episode reward: [(0, '21.615')] [2025-01-27 14:29:59,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3878912. Throughput: 0: 992.7. Samples: 970362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:29:59,751][01491] Avg episode reward: [(0, '22.543')] [2025-01-27 14:30:03,069][04421] Updated weights for policy 0, policy_version 950 (0.0019) [2025-01-27 14:30:04,749][01491] Fps is (10 sec: 2867.2, 60 sec: 3823.0, 300 sec: 4054.3). Total num frames: 3895296. Throughput: 0: 939.2. Samples: 975304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:30:04,752][01491] Avg episode reward: [(0, '23.488')] [2025-01-27 14:30:09,749][01491] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3919872. Throughput: 0: 954.3. Samples: 978614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:30:09,751][01491] Avg episode reward: [(0, '24.458')] [2025-01-27 14:30:12,254][04421] Updated weights for policy 0, policy_version 960 (0.0022) [2025-01-27 14:30:14,749][01491] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3940352. Throughput: 0: 1000.8. Samples: 985272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0) [2025-01-27 14:30:14,754][01491] Avg episode reward: [(0, '25.233')] [2025-01-27 14:30:14,763][04408] Saving new best policy, reward=25.233! [2025-01-27 14:30:19,749][01491] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 4026.6). Total num frames: 3952640. Throughput: 0: 939.8. Samples: 989612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-01-27 14:30:19,755][01491] Avg episode reward: [(0, '25.792')] [2025-01-27 14:30:19,758][04408] Saving new best policy, reward=25.792! [2025-01-27 14:30:23,802][04421] Updated weights for policy 0, policy_version 970 (0.0020) [2025-01-27 14:30:24,749][01491] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4054.3). Total num frames: 3977216. Throughput: 0: 930.4. Samples: 992640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-01-27 14:30:24,751][01491] Avg episode reward: [(0, '25.521')] [2025-01-27 14:30:29,749][01491] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 3997696. Throughput: 0: 976.8. Samples: 999332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-01-27 14:30:29,751][01491] Avg episode reward: [(0, '25.109')] [2025-01-27 14:30:31,612][04408] Stopping Batcher_0... [2025-01-27 14:30:31,613][04408] Loop batcher_evt_loop terminating... [2025-01-27 14:30:31,614][01491] Component Batcher_0 stopped! [2025-01-27 14:30:31,628][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-27 14:30:31,744][04421] Weights refcount: 2 0 [2025-01-27 14:30:31,748][04421] Stopping InferenceWorker_p0-w0... [2025-01-27 14:30:31,748][04421] Loop inference_proc0-0_evt_loop terminating... [2025-01-27 14:30:31,748][01491] Component InferenceWorker_p0-w0 stopped! [2025-01-27 14:30:31,867][04408] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000786_3219456.pth [2025-01-27 14:30:31,910][04408] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-27 14:30:32,182][04408] Stopping LearnerWorker_p0... [2025-01-27 14:30:32,183][04408] Loop learner_proc0_evt_loop terminating... [2025-01-27 14:30:32,181][01491] Component LearnerWorker_p0 stopped! [2025-01-27 14:30:32,227][04428] Stopping RolloutWorker_w6... [2025-01-27 14:30:32,228][01491] Component RolloutWorker_w6 stopped! [2025-01-27 14:30:32,228][04428] Loop rollout_proc6_evt_loop terminating... [2025-01-27 14:30:32,260][04426] Stopping RolloutWorker_w4... [2025-01-27 14:30:32,260][01491] Component RolloutWorker_w4 stopped! [2025-01-27 14:30:32,268][04424] Stopping RolloutWorker_w2... [2025-01-27 14:30:32,268][01491] Component RolloutWorker_w2 stopped! [2025-01-27 14:30:32,261][04426] Loop rollout_proc4_evt_loop terminating... [2025-01-27 14:30:32,269][04424] Loop rollout_proc2_evt_loop terminating... [2025-01-27 14:30:32,291][04422] Stopping RolloutWorker_w0... [2025-01-27 14:30:32,292][04422] Loop rollout_proc0_evt_loop terminating... [2025-01-27 14:30:32,291][01491] Component RolloutWorker_w0 stopped! [2025-01-27 14:30:32,375][04429] Stopping RolloutWorker_w7... [2025-01-27 14:30:32,376][01491] Component RolloutWorker_w7 stopped! [2025-01-27 14:30:32,396][04429] Loop rollout_proc7_evt_loop terminating... [2025-01-27 14:30:32,415][04427] Stopping RolloutWorker_w5... [2025-01-27 14:30:32,416][04427] Loop rollout_proc5_evt_loop terminating... [2025-01-27 14:30:32,415][01491] Component RolloutWorker_w5 stopped! [2025-01-27 14:30:32,444][04423] Stopping RolloutWorker_w1... [2025-01-27 14:30:32,444][01491] Component RolloutWorker_w1 stopped! [2025-01-27 14:30:32,462][04423] Loop rollout_proc1_evt_loop terminating... [2025-01-27 14:30:32,480][01491] Component RolloutWorker_w3 stopped! [2025-01-27 14:30:32,490][01491] Waiting for process learner_proc0 to stop... [2025-01-27 14:30:32,492][04425] Stopping RolloutWorker_w3... [2025-01-27 14:30:32,492][04425] Loop rollout_proc3_evt_loop terminating... [2025-01-27 14:30:34,624][01491] Waiting for process inference_proc0-0 to join... [2025-01-27 14:30:34,629][01491] Waiting for process rollout_proc0 to join... [2025-01-27 14:30:37,030][01491] Waiting for process rollout_proc1 to join... [2025-01-27 14:30:37,034][01491] Waiting for process rollout_proc2 to join... [2025-01-27 14:30:37,038][01491] Waiting for process rollout_proc3 to join... [2025-01-27 14:30:37,043][01491] Waiting for process rollout_proc4 to join... [2025-01-27 14:30:37,047][01491] Waiting for process rollout_proc5 to join... [2025-01-27 14:30:37,051][01491] Waiting for process rollout_proc6 to join... [2025-01-27 14:30:37,055][01491] Waiting for process rollout_proc7 to join... [2025-01-27 14:30:37,059][01491] Batcher 0 profile tree view: batching: 26.6115, releasing_batches: 0.0240 [2025-01-27 14:30:37,061][01491] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0064 wait_policy_total: 426.8439 update_model: 7.8201 weight_update: 0.0012 one_step: 0.0067 handle_policy_step: 534.2215 deserialize: 13.2644, stack: 2.9001, obs_to_device_normalize: 114.6855, forward: 273.2472, send_messages: 26.0295 prepare_outputs: 80.5325 to_cpu: 50.2809 [2025-01-27 14:30:37,062][01491] Learner 0 profile tree view: misc: 0.0039, prepare_batch: 13.9206 train: 73.7261 epoch_init: 0.0164, minibatch_init: 0.0067, losses_postprocess: 0.6460, kl_divergence: 0.5414, after_optimizer: 33.6727 calculate_losses: 26.2535 losses_init: 0.0036, forward_head: 1.1913, bptt_initial: 17.7688, tail: 1.0427, advantages_returns: 0.2775, losses: 3.8651 bptt: 1.8273 bptt_forward_core: 1.7564 update: 12.0534 clip: 0.8801 [2025-01-27 14:30:37,063][01491] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.2289, enqueue_policy_requests: 109.8570, env_step: 784.8590, overhead: 11.1409, complete_rollouts: 6.4451 save_policy_outputs: 17.1294 split_output_tensors: 6.5711 [2025-01-27 14:30:37,065][01491] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.3162, enqueue_policy_requests: 110.3278, env_step: 783.0540, overhead: 10.9786, complete_rollouts: 7.0070 save_policy_outputs: 17.4347 split_output_tensors: 6.8573 [2025-01-27 14:30:37,066][01491] Loop Runner_EvtLoop terminating... [2025-01-27 14:30:37,067][01491] Runner profile tree view: main_loop: 1036.2805 [2025-01-27 14:30:37,068][01491] Collected {0: 4005888}, FPS: 3865.6 [2025-01-27 14:30:37,483][01491] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-27 14:30:37,485][01491] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-27 14:30:37,487][01491] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-27 14:30:37,489][01491] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-27 14:30:37,490][01491] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-27 14:30:37,492][01491] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-27 14:30:37,494][01491] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-27 14:30:37,495][01491] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-27 14:30:37,496][01491] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-27 14:30:37,497][01491] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-27 14:30:37,498][01491] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-27 14:30:37,501][01491] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-27 14:30:37,502][01491] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-27 14:30:37,503][01491] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-27 14:30:37,511][01491] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-27 14:30:37,544][01491] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:30:37,548][01491] RunningMeanStd input shape: (3, 72, 128) [2025-01-27 14:30:37,552][01491] RunningMeanStd input shape: (1,) [2025-01-27 14:30:37,566][01491] ConvEncoder: input_channels=3 [2025-01-27 14:30:37,677][01491] Conv encoder output size: 512 [2025-01-27 14:30:37,679][01491] Policy head output size: 512 [2025-01-27 14:30:37,854][01491] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-27 14:30:38,664][01491] Num frames 100... [2025-01-27 14:30:38,807][01491] Num frames 200... [2025-01-27 14:30:38,934][01491] Num frames 300... [2025-01-27 14:30:39,060][01491] Num frames 400... [2025-01-27 14:30:39,193][01491] Num frames 500... [2025-01-27 14:30:39,322][01491] Num frames 600... [2025-01-27 14:30:39,450][01491] Num frames 700... [2025-01-27 14:30:39,575][01491] Num frames 800... [2025-01-27 14:30:39,706][01491] Num frames 900... [2025-01-27 14:30:39,842][01491] Num frames 1000... [2025-01-27 14:30:39,972][01491] Num frames 1100... [2025-01-27 14:30:40,133][01491] Avg episode rewards: #0: 24.840, true rewards: #0: 11.840 [2025-01-27 14:30:40,135][01491] Avg episode reward: 24.840, avg true_objective: 11.840 [2025-01-27 14:30:40,162][01491] Num frames 1200... [2025-01-27 14:30:40,290][01491] Num frames 1300... [2025-01-27 14:30:40,418][01491] Num frames 1400... [2025-01-27 14:30:40,544][01491] Num frames 1500... [2025-01-27 14:30:40,676][01491] Num frames 1600... [2025-01-27 14:30:40,819][01491] Num frames 1700... [2025-01-27 14:30:40,948][01491] Num frames 1800... [2025-01-27 14:30:41,074][01491] Num frames 1900... [2025-01-27 14:30:41,212][01491] Num frames 2000... [2025-01-27 14:30:41,339][01491] Num frames 2100... [2025-01-27 14:30:41,469][01491] Num frames 2200... [2025-01-27 14:30:41,596][01491] Avg episode rewards: #0: 25.260, true rewards: #0: 11.260 [2025-01-27 14:30:41,597][01491] Avg episode reward: 25.260, avg true_objective: 11.260 [2025-01-27 14:30:41,662][01491] Num frames 2300... [2025-01-27 14:30:41,803][01491] Num frames 2400... [2025-01-27 14:30:41,934][01491] Num frames 2500... [2025-01-27 14:30:42,065][01491] Num frames 2600... [2025-01-27 14:30:42,208][01491] Num frames 2700... [2025-01-27 14:30:42,336][01491] Num frames 2800... [2025-01-27 14:30:42,463][01491] Num frames 2900... [2025-01-27 14:30:42,596][01491] Num frames 3000... [2025-01-27 14:30:42,726][01491] Num frames 3100... [2025-01-27 14:30:42,862][01491] Num frames 3200... [2025-01-27 14:30:42,993][01491] Num frames 3300... [2025-01-27 14:30:43,127][01491] Num frames 3400... [2025-01-27 14:30:43,262][01491] Num frames 3500... [2025-01-27 14:30:43,394][01491] Num frames 3600... [2025-01-27 14:30:43,529][01491] Num frames 3700... [2025-01-27 14:30:43,661][01491] Num frames 3800... [2025-01-27 14:30:43,799][01491] Num frames 3900... [2025-01-27 14:30:43,956][01491] Num frames 4000... [2025-01-27 14:30:44,087][01491] Num frames 4100... [2025-01-27 14:30:44,231][01491] Num frames 4200... [2025-01-27 14:30:44,366][01491] Num frames 4300... [2025-01-27 14:30:44,490][01491] Avg episode rewards: #0: 35.840, true rewards: #0: 14.507 [2025-01-27 14:30:44,492][01491] Avg episode reward: 35.840, avg true_objective: 14.507 [2025-01-27 14:30:44,558][01491] Num frames 4400... [2025-01-27 14:30:44,688][01491] Num frames 4500... [2025-01-27 14:30:44,829][01491] Num frames 4600... [2025-01-27 14:30:44,965][01491] Num frames 4700... [2025-01-27 14:30:45,094][01491] Num frames 4800... [2025-01-27 14:30:45,244][01491] Num frames 4900... [2025-01-27 14:30:45,424][01491] Num frames 5000... [2025-01-27 14:30:45,594][01491] Num frames 5100... [2025-01-27 14:30:45,762][01491] Num frames 5200... [2025-01-27 14:30:45,940][01491] Num frames 5300... [2025-01-27 14:30:46,114][01491] Num frames 5400... [2025-01-27 14:30:46,297][01491] Num frames 5500... [2025-01-27 14:30:46,469][01491] Num frames 5600... [2025-01-27 14:30:46,653][01491] Num frames 5700... [2025-01-27 14:30:46,741][01491] Avg episode rewards: #0: 35.042, true rewards: #0: 14.292 [2025-01-27 14:30:46,743][01491] Avg episode reward: 35.042, avg true_objective: 14.292 [2025-01-27 14:30:46,890][01491] Num frames 5800... [2025-01-27 14:30:47,066][01491] Num frames 5900... [2025-01-27 14:30:47,253][01491] Num frames 6000... [2025-01-27 14:30:47,438][01491] Num frames 6100... [2025-01-27 14:30:47,619][01491] Num frames 6200... [2025-01-27 14:30:47,791][01491] Num frames 6300... [2025-01-27 14:30:47,919][01491] Num frames 6400... [2025-01-27 14:30:48,060][01491] Num frames 6500... [2025-01-27 14:30:48,197][01491] Num frames 6600... [2025-01-27 14:30:48,324][01491] Num frames 6700... [2025-01-27 14:30:48,453][01491] Num frames 6800... [2025-01-27 14:30:48,579][01491] Num frames 6900... [2025-01-27 14:30:48,711][01491] Num frames 7000... [2025-01-27 14:30:48,838][01491] Num frames 7100... [2025-01-27 14:30:48,970][01491] Num frames 7200... [2025-01-27 14:30:49,114][01491] Num frames 7300... [2025-01-27 14:30:49,254][01491] Num frames 7400... [2025-01-27 14:30:49,383][01491] Num frames 7500... [2025-01-27 14:30:49,519][01491] Num frames 7600... [2025-01-27 14:30:49,654][01491] Num frames 7700... [2025-01-27 14:30:49,787][01491] Num frames 7800... [2025-01-27 14:30:49,867][01491] Avg episode rewards: #0: 38.834, true rewards: #0: 15.634 [2025-01-27 14:30:49,868][01491] Avg episode reward: 38.834, avg true_objective: 15.634 [2025-01-27 14:30:49,978][01491] Num frames 7900... [2025-01-27 14:30:50,115][01491] Num frames 8000... [2025-01-27 14:30:50,254][01491] Num frames 8100... [2025-01-27 14:30:50,378][01491] Num frames 8200... [2025-01-27 14:30:50,507][01491] Num frames 8300... [2025-01-27 14:30:50,632][01491] Num frames 8400... [2025-01-27 14:30:50,800][01491] Avg episode rewards: #0: 34.315, true rewards: #0: 14.148 [2025-01-27 14:30:50,803][01491] Avg episode reward: 34.315, avg true_objective: 14.148 [2025-01-27 14:30:50,820][01491] Num frames 8500... [2025-01-27 14:30:50,945][01491] Num frames 8600... [2025-01-27 14:30:51,080][01491] Num frames 8700... [2025-01-27 14:30:51,223][01491] Num frames 8800... [2025-01-27 14:30:51,350][01491] Num frames 8900... [2025-01-27 14:30:51,476][01491] Num frames 9000... [2025-01-27 14:30:51,604][01491] Num frames 9100... [2025-01-27 14:30:51,735][01491] Num frames 9200... [2025-01-27 14:30:51,866][01491] Num frames 9300... [2025-01-27 14:30:52,005][01491] Num frames 9400... [2025-01-27 14:30:52,138][01491] Num frames 9500... [2025-01-27 14:30:52,287][01491] Num frames 9600... [2025-01-27 14:30:52,415][01491] Num frames 9700... [2025-01-27 14:30:52,545][01491] Num frames 9800... [2025-01-27 14:30:52,673][01491] Num frames 9900... [2025-01-27 14:30:52,803][01491] Num frames 10000... [2025-01-27 14:30:52,934][01491] Num frames 10100... [2025-01-27 14:30:53,063][01491] Num frames 10200... [2025-01-27 14:30:53,205][01491] Num frames 10300... [2025-01-27 14:30:53,335][01491] Num frames 10400... [2025-01-27 14:30:53,458][01491] Num frames 10500... [2025-01-27 14:30:53,627][01491] Avg episode rewards: #0: 37.984, true rewards: #0: 15.127 [2025-01-27 14:30:53,628][01491] Avg episode reward: 37.984, avg true_objective: 15.127 [2025-01-27 14:30:53,648][01491] Num frames 10600... [2025-01-27 14:30:53,773][01491] Num frames 10700... [2025-01-27 14:30:53,897][01491] Num frames 10800... [2025-01-27 14:30:54,022][01491] Num frames 10900... [2025-01-27 14:30:54,160][01491] Num frames 11000... [2025-01-27 14:30:54,300][01491] Num frames 11100... [2025-01-27 14:30:54,428][01491] Num frames 11200... [2025-01-27 14:30:54,556][01491] Num frames 11300... [2025-01-27 14:30:54,687][01491] Num frames 11400... [2025-01-27 14:30:54,820][01491] Num frames 11500... [2025-01-27 14:30:54,950][01491] Num frames 11600... [2025-01-27 14:30:55,077][01491] Num frames 11700... [2025-01-27 14:30:55,218][01491] Num frames 11800... [2025-01-27 14:30:55,361][01491] Num frames 11900... [2025-01-27 14:30:55,491][01491] Num frames 12000... [2025-01-27 14:30:55,623][01491] Num frames 12100... [2025-01-27 14:30:55,752][01491] Num frames 12200... [2025-01-27 14:30:55,885][01491] Num frames 12300... [2025-01-27 14:30:56,020][01491] Num frames 12400... [2025-01-27 14:30:56,154][01491] Num frames 12500... [2025-01-27 14:30:56,295][01491] Num frames 12600... [2025-01-27 14:30:56,465][01491] Avg episode rewards: #0: 39.986, true rewards: #0: 15.861 [2025-01-27 14:30:56,466][01491] Avg episode reward: 39.986, avg true_objective: 15.861 [2025-01-27 14:30:56,484][01491] Num frames 12700... [2025-01-27 14:30:56,613][01491] Num frames 12800... [2025-01-27 14:30:56,748][01491] Num frames 12900... [2025-01-27 14:30:56,879][01491] Num frames 13000... [2025-01-27 14:30:57,009][01491] Num frames 13100... [2025-01-27 14:30:57,143][01491] Num frames 13200... [2025-01-27 14:30:57,279][01491] Num frames 13300... [2025-01-27 14:30:57,419][01491] Num frames 13400... [2025-01-27 14:30:57,570][01491] Num frames 13500... [2025-01-27 14:30:57,683][01491] Avg episode rewards: #0: 37.373, true rewards: #0: 15.040 [2025-01-27 14:30:57,684][01491] Avg episode reward: 37.373, avg true_objective: 15.040 [2025-01-27 14:30:57,774][01491] Num frames 13600... [2025-01-27 14:30:57,951][01491] Num frames 13700... [2025-01-27 14:30:58,122][01491] Num frames 13800... [2025-01-27 14:30:58,296][01491] Num frames 13900... [2025-01-27 14:30:58,489][01491] Num frames 14000... [2025-01-27 14:30:58,671][01491] Num frames 14100... [2025-01-27 14:30:58,840][01491] Num frames 14200... [2025-01-27 14:30:59,010][01491] Num frames 14300... [2025-01-27 14:30:59,186][01491] Num frames 14400... [2025-01-27 14:30:59,365][01491] Num frames 14500... [2025-01-27 14:30:59,557][01491] Num frames 14600... [2025-01-27 14:30:59,744][01491] Num frames 14700... [2025-01-27 14:30:59,953][01491] Avg episode rewards: #0: 36.784, true rewards: #0: 14.784 [2025-01-27 14:30:59,956][01491] Avg episode reward: 36.784, avg true_objective: 14.784 [2025-01-27 14:32:32,099][01491] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-01-27 14:46:40,340][16219] Saving configuration to /content/train_dir/default_experiment/config.json... [2025-01-27 14:46:40,344][16219] Rollout worker 0 uses device cpu [2025-01-27 14:46:40,348][16219] Rollout worker 1 uses device cpu [2025-01-27 14:46:40,353][16219] Rollout worker 2 uses device cpu [2025-01-27 14:46:40,356][16219] Rollout worker 3 uses device cpu [2025-01-27 14:46:40,359][16219] Rollout worker 4 uses device cpu [2025-01-27 14:46:40,362][16219] Rollout worker 5 uses device cpu [2025-01-27 14:46:40,365][16219] Rollout worker 6 uses device cpu [2025-01-27 14:46:40,368][16219] Rollout worker 7 uses device cpu [2025-01-27 14:46:40,539][16219] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:46:40,546][16219] InferenceWorker_p0-w0: min num requests: 2 [2025-01-27 14:46:40,594][16219] Starting all processes... [2025-01-27 14:46:40,600][16219] Starting process learner_proc0 [2025-01-27 14:46:40,671][16219] Starting all processes... [2025-01-27 14:46:40,809][16219] Starting process inference_proc0-0 [2025-01-27 14:46:40,809][16219] Starting process rollout_proc0 [2025-01-27 14:46:40,811][16219] Starting process rollout_proc1 [2025-01-27 14:46:40,814][16219] Starting process rollout_proc2 [2025-01-27 14:46:40,819][16219] Starting process rollout_proc3 [2025-01-27 14:46:40,819][16219] Starting process rollout_proc4 [2025-01-27 14:46:40,819][16219] Starting process rollout_proc5 [2025-01-27 14:46:40,819][16219] Starting process rollout_proc6 [2025-01-27 14:46:40,819][16219] Starting process rollout_proc7 [2025-01-27 14:46:55,163][16634] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:46:55,164][16634] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-01-27 14:46:55,225][16634] Num visible devices: 1 [2025-01-27 14:46:55,268][16634] Starting seed is not provided [2025-01-27 14:46:55,268][16634] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:46:55,269][16634] Initializing actor-critic model on device cuda:0 [2025-01-27 14:46:55,270][16634] RunningMeanStd input shape: (3, 72, 128) [2025-01-27 14:46:55,271][16634] RunningMeanStd input shape: (1,) [2025-01-27 14:46:55,348][16634] ConvEncoder: input_channels=3 [2025-01-27 14:46:56,282][16634] Conv encoder output size: 512 [2025-01-27 14:46:56,285][16634] Policy head output size: 512 [2025-01-27 14:46:56,300][16656] Worker 4 uses CPU cores [0] [2025-01-27 14:46:56,307][16658] Worker 6 uses CPU cores [0] [2025-01-27 14:46:56,337][16654] Worker 2 uses CPU cores [0] [2025-01-27 14:46:56,375][16634] Created Actor Critic model with architecture: [2025-01-27 14:46:56,383][16634] ActorCriticSharedWeights( (obs_normalizer): ObservationNormalizer( (running_mean_std): RunningMeanStdDictInPlace( (running_mean_std): ModuleDict( (obs): RunningMeanStdInPlace() ) ) ) (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) (encoder): VizdoomEncoder( (basic_encoder): ConvEncoder( (enc): RecursiveScriptModule( original_name=ConvEncoderImpl (conv_head): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Conv2d) (1): RecursiveScriptModule(original_name=ELU) (2): RecursiveScriptModule(original_name=Conv2d) (3): RecursiveScriptModule(original_name=ELU) (4): RecursiveScriptModule(original_name=Conv2d) (5): RecursiveScriptModule(original_name=ELU) ) (mlp_layers): RecursiveScriptModule( original_name=Sequential (0): RecursiveScriptModule(original_name=Linear) (1): RecursiveScriptModule(original_name=ELU) ) ) ) ) (core): ModelCoreRNN( (core): GRU(512, 512) ) (decoder): MlpDecoder( (mlp): Identity() ) (critic_linear): Linear(in_features=512, out_features=1, bias=True) (action_parameterization): ActionParameterizationDefault( (distribution_linear): Linear(in_features=512, out_features=5, bias=True) ) ) [2025-01-27 14:46:56,407][16659] Worker 7 uses CPU cores [1] [2025-01-27 14:46:56,444][16655] Worker 3 uses CPU cores [1] [2025-01-27 14:46:56,513][16653] Worker 1 uses CPU cores [1] [2025-01-27 14:46:56,515][16657] Worker 5 uses CPU cores [1] [2025-01-27 14:46:56,583][16651] Worker 0 uses CPU cores [0] [2025-01-27 14:46:56,642][16652] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:46:56,643][16652] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-01-27 14:46:56,661][16652] Num visible devices: 1 [2025-01-27 14:46:56,684][16634] Using optimizer [2025-01-27 14:46:57,629][16634] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-01-27 14:46:57,669][16634] Loading model from checkpoint [2025-01-27 14:46:57,671][16634] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2025-01-27 14:46:57,671][16634] Initialized policy 0 weights for model version 978 [2025-01-27 14:46:57,674][16634] LearnerWorker_p0 finished initialization! [2025-01-27 14:46:57,676][16634] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-01-27 14:46:57,777][16652] RunningMeanStd input shape: (3, 72, 128) [2025-01-27 14:46:57,778][16652] RunningMeanStd input shape: (1,) [2025-01-27 14:46:57,792][16652] ConvEncoder: input_channels=3 [2025-01-27 14:46:57,892][16652] Conv encoder output size: 512 [2025-01-27 14:46:57,893][16652] Policy head output size: 512 [2025-01-27 14:46:57,927][16219] Inference worker 0-0 is ready! [2025-01-27 14:46:57,928][16219] All inference workers are ready! Signal rollout workers to start! [2025-01-27 14:46:58,145][16656] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:46:58,144][16654] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:46:58,147][16651] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:46:58,149][16658] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:46:58,192][16659] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:46:58,194][16653] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:46:58,199][16655] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:46:58,200][16657] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:47:00,069][16655] Decorrelating experience for 0 frames... [2025-01-27 14:47:00,075][16659] Decorrelating experience for 0 frames... [2025-01-27 14:47:00,100][16657] Decorrelating experience for 0 frames... [2025-01-27 14:47:00,138][16658] Decorrelating experience for 0 frames... [2025-01-27 14:47:00,141][16654] Decorrelating experience for 0 frames... [2025-01-27 14:47:00,140][16656] Decorrelating experience for 0 frames... [2025-01-27 14:47:00,147][16651] Decorrelating experience for 0 frames... [2025-01-27 14:47:00,454][16219] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-27 14:47:00,530][16219] Heartbeat connected on Batcher_0 [2025-01-27 14:47:00,533][16219] Heartbeat connected on LearnerWorker_p0 [2025-01-27 14:47:00,576][16219] Heartbeat connected on InferenceWorker_p0-w0 [2025-01-27 14:47:00,773][16658] Decorrelating experience for 32 frames... [2025-01-27 14:47:01,583][16655] Decorrelating experience for 32 frames... [2025-01-27 14:47:01,625][16653] Decorrelating experience for 0 frames... [2025-01-27 14:47:01,635][16659] Decorrelating experience for 32 frames... [2025-01-27 14:47:01,815][16651] Decorrelating experience for 32 frames... [2025-01-27 14:47:03,027][16653] Decorrelating experience for 32 frames... [2025-01-27 14:47:03,151][16656] Decorrelating experience for 32 frames... [2025-01-27 14:47:03,215][16658] Decorrelating experience for 64 frames... [2025-01-27 14:47:03,354][16655] Decorrelating experience for 64 frames... [2025-01-27 14:47:04,095][16651] Decorrelating experience for 64 frames... [2025-01-27 14:47:04,939][16659] Decorrelating experience for 64 frames... [2025-01-27 14:47:05,113][16654] Decorrelating experience for 32 frames... [2025-01-27 14:47:05,223][16653] Decorrelating experience for 64 frames... [2025-01-27 14:47:05,248][16658] Decorrelating experience for 96 frames... [2025-01-27 14:47:05,321][16655] Decorrelating experience for 96 frames... [2025-01-27 14:47:05,454][16219] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-27 14:47:05,481][16219] Heartbeat connected on RolloutWorker_w6 [2025-01-27 14:47:05,533][16656] Decorrelating experience for 64 frames... [2025-01-27 14:47:05,555][16219] Heartbeat connected on RolloutWorker_w3 [2025-01-27 14:47:06,437][16657] Decorrelating experience for 32 frames... [2025-01-27 14:47:06,806][16651] Decorrelating experience for 96 frames... [2025-01-27 14:47:07,032][16653] Decorrelating experience for 96 frames... [2025-01-27 14:47:07,063][16219] Heartbeat connected on RolloutWorker_w0 [2025-01-27 14:47:07,332][16654] Decorrelating experience for 64 frames... [2025-01-27 14:47:07,377][16656] Decorrelating experience for 96 frames... [2025-01-27 14:47:07,495][16219] Heartbeat connected on RolloutWorker_w1 [2025-01-27 14:47:07,712][16219] Heartbeat connected on RolloutWorker_w4 [2025-01-27 14:47:09,203][16659] Decorrelating experience for 96 frames... [2025-01-27 14:47:10,226][16219] Heartbeat connected on RolloutWorker_w7 [2025-01-27 14:47:10,454][16219] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 172.2. Samples: 1722. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-01-27 14:47:10,456][16219] Avg episode reward: [(0, '4.118')] [2025-01-27 14:47:10,828][16657] Decorrelating experience for 64 frames... [2025-01-27 14:47:11,405][16654] Decorrelating experience for 96 frames... [2025-01-27 14:47:11,840][16219] Heartbeat connected on RolloutWorker_w2 [2025-01-27 14:47:12,566][16634] Signal inference workers to stop experience collection... [2025-01-27 14:47:12,608][16652] InferenceWorker_p0-w0: stopping experience collection [2025-01-27 14:47:12,730][16657] Decorrelating experience for 96 frames... [2025-01-27 14:47:12,827][16219] Heartbeat connected on RolloutWorker_w5 [2025-01-27 14:47:14,108][16634] Signal inference workers to resume experience collection... [2025-01-27 14:47:14,114][16634] Stopping Batcher_0... [2025-01-27 14:47:14,117][16634] Loop batcher_evt_loop terminating... [2025-01-27 14:47:14,116][16219] Component Batcher_0 stopped! [2025-01-27 14:47:14,180][16652] Weights refcount: 2 0 [2025-01-27 14:47:14,184][16219] Component InferenceWorker_p0-w0 stopped! [2025-01-27 14:47:14,189][16652] Stopping InferenceWorker_p0-w0... [2025-01-27 14:47:14,191][16652] Loop inference_proc0-0_evt_loop terminating... [2025-01-27 14:47:14,468][16219] Component RolloutWorker_w2 stopped! [2025-01-27 14:47:14,474][16654] Stopping RolloutWorker_w2... [2025-01-27 14:47:14,490][16657] Stopping RolloutWorker_w5... [2025-01-27 14:47:14,492][16657] Loop rollout_proc5_evt_loop terminating... [2025-01-27 14:47:14,492][16219] Component RolloutWorker_w5 stopped! [2025-01-27 14:47:14,510][16653] Stopping RolloutWorker_w1... [2025-01-27 14:47:14,505][16219] Component RolloutWorker_w4 stopped! [2025-01-27 14:47:14,511][16653] Loop rollout_proc1_evt_loop terminating... [2025-01-27 14:47:14,512][16219] Component RolloutWorker_w1 stopped! [2025-01-27 14:47:14,517][16656] Stopping RolloutWorker_w4... [2025-01-27 14:47:14,475][16654] Loop rollout_proc2_evt_loop terminating... [2025-01-27 14:47:14,524][16655] Stopping RolloutWorker_w3... [2025-01-27 14:47:14,523][16219] Component RolloutWorker_w6 stopped! [2025-01-27 14:47:14,524][16658] Stopping RolloutWorker_w6... [2025-01-27 14:47:14,518][16656] Loop rollout_proc4_evt_loop terminating... [2025-01-27 14:47:14,527][16219] Component RolloutWorker_w3 stopped! [2025-01-27 14:47:14,528][16658] Loop rollout_proc6_evt_loop terminating... [2025-01-27 14:47:14,530][16655] Loop rollout_proc3_evt_loop terminating... [2025-01-27 14:47:14,541][16659] Stopping RolloutWorker_w7... [2025-01-27 14:47:14,542][16219] Component RolloutWorker_w7 stopped! [2025-01-27 14:47:14,542][16651] Stopping RolloutWorker_w0... [2025-01-27 14:47:14,543][16219] Component RolloutWorker_w0 stopped! [2025-01-27 14:47:14,551][16651] Loop rollout_proc0_evt_loop terminating... [2025-01-27 14:47:14,547][16659] Loop rollout_proc7_evt_loop terminating... [2025-01-27 14:47:15,390][16634] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-27 14:47:15,550][16634] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000904_3702784.pth [2025-01-27 14:47:15,571][16634] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-27 14:47:15,764][16219] Component LearnerWorker_p0 stopped! [2025-01-27 14:47:15,764][16634] Stopping LearnerWorker_p0... [2025-01-27 14:47:15,768][16634] Loop learner_proc0_evt_loop terminating... [2025-01-27 14:47:15,765][16219] Waiting for process learner_proc0 to stop... [2025-01-27 14:47:17,237][16219] Waiting for process inference_proc0-0 to join... [2025-01-27 14:47:17,241][16219] Waiting for process rollout_proc0 to join... [2025-01-27 14:47:19,670][16219] Waiting for process rollout_proc1 to join... [2025-01-27 14:47:19,814][16219] Waiting for process rollout_proc2 to join... [2025-01-27 14:47:19,821][16219] Waiting for process rollout_proc3 to join... [2025-01-27 14:47:19,825][16219] Waiting for process rollout_proc4 to join... [2025-01-27 14:47:19,828][16219] Waiting for process rollout_proc5 to join... [2025-01-27 14:47:19,832][16219] Waiting for process rollout_proc6 to join... [2025-01-27 14:47:19,835][16219] Waiting for process rollout_proc7 to join... [2025-01-27 14:47:19,839][16219] Batcher 0 profile tree view: batching: 1.4211, releasing_batches: 0.0049 [2025-01-27 14:47:19,840][16219] InferenceWorker_p0-w0 profile tree view: update_model: 0.0230 wait_policy: 0.0007 wait_policy_total: 10.7861 one_step: 0.0025 handle_policy_step: 3.6354 deserialize: 0.0618, stack: 0.0124, obs_to_device_normalize: 0.7022, forward: 2.2998, send_messages: 0.0773 prepare_outputs: 0.3720 to_cpu: 0.2345 [2025-01-27 14:47:19,844][16219] Learner 0 profile tree view: misc: 0.0000, prepare_batch: 3.2811 train: 3.5078 epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0141, after_optimizer: 0.0427 calculate_losses: 2.0856 losses_init: 0.0000, forward_head: 0.4075, bptt_initial: 1.5426, tail: 0.0849, advantages_returns: 0.0011, losses: 0.0457 bptt: 0.0035 bptt_forward_core: 0.0034 update: 1.3635 clip: 0.0427 [2025-01-27 14:47:19,845][16219] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0020, enqueue_policy_requests: 0.9381, env_step: 3.6045, overhead: 0.0772, complete_rollouts: 0.0353 save_policy_outputs: 0.0919 split_output_tensors: 0.0384 [2025-01-27 14:47:19,850][16219] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0006, enqueue_policy_requests: 0.5804, env_step: 1.4623, overhead: 0.0210, complete_rollouts: 0.0005 save_policy_outputs: 0.0357 split_output_tensors: 0.0129 [2025-01-27 14:47:19,852][16219] Loop Runner_EvtLoop terminating... [2025-01-27 14:47:19,854][16219] Runner profile tree view: main_loop: 39.2603 [2025-01-27 14:47:19,855][16219] Collected {0: 4014080}, FPS: 208.7 [2025-01-27 14:47:20,060][16219] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-27 14:47:20,062][16219] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-27 14:47:20,064][16219] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-27 14:47:20,066][16219] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-27 14:47:20,068][16219] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-27 14:47:20,069][16219] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-27 14:47:20,073][16219] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-01-27 14:47:20,074][16219] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-27 14:47:20,075][16219] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-01-27 14:47:20,079][16219] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-01-27 14:47:20,080][16219] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-27 14:47:20,081][16219] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-27 14:47:20,082][16219] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-27 14:47:20,084][16219] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-27 14:47:20,085][16219] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-27 14:47:20,117][16219] Doom resolution: 160x120, resize resolution: (128, 72) [2025-01-27 14:47:20,120][16219] RunningMeanStd input shape: (3, 72, 128) [2025-01-27 14:47:20,124][16219] RunningMeanStd input shape: (1,) [2025-01-27 14:47:20,144][16219] ConvEncoder: input_channels=3 [2025-01-27 14:47:20,280][16219] Conv encoder output size: 512 [2025-01-27 14:47:20,282][16219] Policy head output size: 512 [2025-01-27 14:47:20,459][16219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-27 14:47:21,268][16219] Num frames 100... [2025-01-27 14:47:21,397][16219] Num frames 200... [2025-01-27 14:47:21,523][16219] Num frames 300... [2025-01-27 14:47:21,652][16219] Num frames 400... [2025-01-27 14:47:21,780][16219] Num frames 500... [2025-01-27 14:47:21,906][16219] Num frames 600... [2025-01-27 14:47:22,036][16219] Num frames 700... [2025-01-27 14:47:22,167][16219] Num frames 800... [2025-01-27 14:47:22,307][16219] Num frames 900... [2025-01-27 14:47:22,432][16219] Num frames 1000... [2025-01-27 14:47:22,556][16219] Num frames 1100... [2025-01-27 14:47:22,679][16219] Num frames 1200... [2025-01-27 14:47:22,834][16219] Avg episode rewards: #0: 25.800, true rewards: #0: 12.800 [2025-01-27 14:47:22,835][16219] Avg episode reward: 25.800, avg true_objective: 12.800 [2025-01-27 14:47:22,864][16219] Num frames 1300... [2025-01-27 14:47:22,992][16219] Num frames 1400... [2025-01-27 14:47:23,121][16219] Num frames 1500... [2025-01-27 14:47:23,251][16219] Num frames 1600... [2025-01-27 14:47:23,381][16219] Num frames 1700... [2025-01-27 14:47:23,506][16219] Num frames 1800... [2025-01-27 14:47:23,636][16219] Num frames 1900... [2025-01-27 14:47:23,760][16219] Num frames 2000... [2025-01-27 14:47:23,888][16219] Num frames 2100... [2025-01-27 14:47:24,017][16219] Num frames 2200... [2025-01-27 14:47:24,153][16219] Num frames 2300... [2025-01-27 14:47:24,277][16219] Num frames 2400... [2025-01-27 14:47:24,410][16219] Num frames 2500... [2025-01-27 14:47:24,536][16219] Num frames 2600... [2025-01-27 14:47:24,661][16219] Num frames 2700... [2025-01-27 14:47:24,789][16219] Num frames 2800... [2025-01-27 14:47:24,875][16219] Avg episode rewards: #0: 31.620, true rewards: #0: 14.120 [2025-01-27 14:47:24,876][16219] Avg episode reward: 31.620, avg true_objective: 14.120 [2025-01-27 14:47:24,975][16219] Num frames 2900... [2025-01-27 14:47:25,100][16219] Num frames 3000... [2025-01-27 14:47:25,229][16219] Num frames 3100... [2025-01-27 14:47:25,361][16219] Num frames 3200... [2025-01-27 14:47:25,484][16219] Num frames 3300... [2025-01-27 14:47:25,545][16219] Avg episode rewards: #0: 23.347, true rewards: #0: 11.013 [2025-01-27 14:47:25,546][16219] Avg episode reward: 23.347, avg true_objective: 11.013 [2025-01-27 14:47:25,670][16219] Num frames 3400... [2025-01-27 14:47:25,793][16219] Num frames 3500... [2025-01-27 14:47:25,919][16219] Num frames 3600... [2025-01-27 14:47:26,047][16219] Num frames 3700... [2025-01-27 14:47:26,181][16219] Num frames 3800... [2025-01-27 14:47:26,306][16219] Num frames 3900... [2025-01-27 14:47:26,438][16219] Num frames 4000... [2025-01-27 14:47:26,562][16219] Num frames 4100... [2025-01-27 14:47:26,692][16219] Num frames 4200... [2025-01-27 14:47:26,820][16219] Num frames 4300... [2025-01-27 14:47:26,947][16219] Num frames 4400... [2025-01-27 14:47:27,073][16219] Avg episode rewards: #0: 24.140, true rewards: #0: 11.140 [2025-01-27 14:47:27,075][16219] Avg episode reward: 24.140, avg true_objective: 11.140 [2025-01-27 14:47:27,140][16219] Num frames 4500... [2025-01-27 14:47:27,268][16219] Num frames 4600... [2025-01-27 14:47:27,404][16219] Num frames 4700... [2025-01-27 14:47:27,529][16219] Num frames 4800... [2025-01-27 14:47:27,655][16219] Num frames 4900... [2025-01-27 14:47:27,792][16219] Num frames 5000... [2025-01-27 14:47:27,918][16219] Num frames 5100... [2025-01-27 14:47:28,045][16219] Num frames 5200... [2025-01-27 14:47:28,150][16219] Avg episode rewards: #0: 22.480, true rewards: #0: 10.480 [2025-01-27 14:47:28,152][16219] Avg episode reward: 22.480, avg true_objective: 10.480 [2025-01-27 14:47:28,228][16219] Num frames 5300... [2025-01-27 14:47:28,352][16219] Num frames 5400... [2025-01-27 14:47:28,482][16219] Num frames 5500... [2025-01-27 14:47:28,605][16219] Num frames 5600... [2025-01-27 14:47:28,731][16219] Num frames 5700... [2025-01-27 14:47:28,856][16219] Num frames 5800... [2025-01-27 14:47:28,986][16219] Num frames 5900... [2025-01-27 14:47:29,116][16219] Num frames 6000... [2025-01-27 14:47:29,247][16219] Num frames 6100... [2025-01-27 14:47:29,371][16219] Num frames 6200... [2025-01-27 14:47:29,503][16219] Num frames 6300... [2025-01-27 14:47:29,680][16219] Num frames 6400... [2025-01-27 14:47:29,853][16219] Num frames 6500... [2025-01-27 14:47:30,025][16219] Num frames 6600... [2025-01-27 14:47:30,199][16219] Num frames 6700... [2025-01-27 14:47:30,366][16219] Num frames 6800... [2025-01-27 14:47:30,527][16219] Avg episode rewards: #0: 25.428, true rewards: #0: 11.428 [2025-01-27 14:47:30,529][16219] Avg episode reward: 25.428, avg true_objective: 11.428 [2025-01-27 14:47:30,602][16219] Num frames 6900... [2025-01-27 14:47:30,774][16219] Num frames 7000... [2025-01-27 14:47:30,946][16219] Num frames 7100... [2025-01-27 14:47:31,126][16219] Num frames 7200... [2025-01-27 14:47:31,308][16219] Num frames 7300... [2025-01-27 14:47:31,484][16219] Num frames 7400... [2025-01-27 14:47:31,547][16219] Avg episode rewards: #0: 22.859, true rewards: #0: 10.573 [2025-01-27 14:47:31,548][16219] Avg episode reward: 22.859, avg true_objective: 10.573 [2025-01-27 14:47:31,722][16219] Num frames 7500... [2025-01-27 14:47:31,904][16219] Num frames 7600... [2025-01-27 14:47:32,045][16219] Num frames 7700... [2025-01-27 14:47:32,178][16219] Num frames 7800... [2025-01-27 14:47:32,304][16219] Num frames 7900... [2025-01-27 14:47:32,454][16219] Avg episode rewards: #0: 21.221, true rewards: #0: 9.971 [2025-01-27 14:47:32,456][16219] Avg episode reward: 21.221, avg true_objective: 9.971 [2025-01-27 14:47:32,485][16219] Num frames 8000... [2025-01-27 14:47:32,615][16219] Num frames 8100... [2025-01-27 14:47:32,738][16219] Num frames 8200... [2025-01-27 14:47:32,878][16219] Num frames 8300... [2025-01-27 14:47:33,006][16219] Num frames 8400... [2025-01-27 14:47:33,138][16219] Num frames 8500... [2025-01-27 14:47:33,266][16219] Num frames 8600... [2025-01-27 14:47:33,394][16219] Num frames 8700... [2025-01-27 14:47:33,479][16219] Avg episode rewards: #0: 20.910, true rewards: #0: 9.688 [2025-01-27 14:47:33,481][16219] Avg episode reward: 20.910, avg true_objective: 9.688 [2025-01-27 14:47:33,592][16219] Num frames 8800... [2025-01-27 14:47:33,723][16219] Num frames 8900... [2025-01-27 14:47:33,850][16219] Num frames 9000... [2025-01-27 14:47:33,979][16219] Num frames 9100... [2025-01-27 14:47:34,108][16219] Num frames 9200... [2025-01-27 14:47:34,243][16219] Num frames 9300... [2025-01-27 14:47:34,371][16219] Num frames 9400... [2025-01-27 14:47:34,497][16219] Num frames 9500... [2025-01-27 14:47:34,576][16219] Avg episode rewards: #0: 20.819, true rewards: #0: 9.519 [2025-01-27 14:47:34,578][16219] Avg episode reward: 20.819, avg true_objective: 9.519 [2025-01-27 14:48:30,305][16219] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-01-27 14:49:38,883][16219] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-01-27 14:49:38,885][16219] Overriding arg 'num_workers' with value 1 passed from command line [2025-01-27 14:49:38,887][16219] Adding new argument 'no_render'=True that is not in the saved config file! [2025-01-27 14:49:38,889][16219] Adding new argument 'save_video'=True that is not in the saved config file! [2025-01-27 14:49:38,890][16219] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-01-27 14:49:38,892][16219] Adding new argument 'video_name'=None that is not in the saved config file! [2025-01-27 14:49:38,894][16219] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-01-27 14:49:38,895][16219] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-01-27 14:49:38,896][16219] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-01-27 14:49:38,897][16219] Adding new argument 'hf_repository'='earian/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! [2025-01-27 14:49:38,898][16219] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-01-27 14:49:38,899][16219] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-01-27 14:49:38,900][16219] Adding new argument 'train_script'=None that is not in the saved config file! [2025-01-27 14:49:38,901][16219] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-01-27 14:49:38,902][16219] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-01-27 14:49:38,938][16219] RunningMeanStd input shape: (3, 72, 128) [2025-01-27 14:49:38,940][16219] RunningMeanStd input shape: (1,) [2025-01-27 14:49:38,956][16219] ConvEncoder: input_channels=3 [2025-01-27 14:49:38,990][16219] Conv encoder output size: 512 [2025-01-27 14:49:38,991][16219] Policy head output size: 512 [2025-01-27 14:49:39,010][16219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth... [2025-01-27 14:49:39,445][16219] Num frames 100... [2025-01-27 14:49:39,571][16219] Num frames 200... [2025-01-27 14:49:39,696][16219] Num frames 300... [2025-01-27 14:49:39,821][16219] Num frames 400... [2025-01-27 14:49:39,956][16219] Num frames 500... [2025-01-27 14:49:40,083][16219] Num frames 600... [2025-01-27 14:49:40,214][16219] Num frames 700... [2025-01-27 14:49:40,340][16219] Num frames 800... [2025-01-27 14:49:40,515][16219] Avg episode rewards: #0: 19.960, true rewards: #0: 8.960 [2025-01-27 14:49:40,517][16219] Avg episode reward: 19.960, avg true_objective: 8.960 [2025-01-27 14:49:40,525][16219] Num frames 900... [2025-01-27 14:49:40,650][16219] Num frames 1000... [2025-01-27 14:49:40,776][16219] Num frames 1100... [2025-01-27 14:49:40,914][16219] Num frames 1200... [2025-01-27 14:49:41,041][16219] Num frames 1300... [2025-01-27 14:49:41,176][16219] Num frames 1400... [2025-01-27 14:49:41,302][16219] Num frames 1500... [2025-01-27 14:49:41,432][16219] Num frames 1600... [2025-01-27 14:49:41,560][16219] Num frames 1700... [2025-01-27 14:49:41,688][16219] Num frames 1800... [2025-01-27 14:49:41,818][16219] Num frames 1900... [2025-01-27 14:49:41,954][16219] Num frames 2000... [2025-01-27 14:49:42,087][16219] Num frames 2100... [2025-01-27 14:49:42,241][16219] Num frames 2200... [2025-01-27 14:49:42,375][16219] Num frames 2300... [2025-01-27 14:49:42,515][16219] Avg episode rewards: #0: 29.840, true rewards: #0: 11.840 [2025-01-27 14:49:42,516][16219] Avg episode reward: 29.840, avg true_objective: 11.840 [2025-01-27 14:49:42,560][16219] Num frames 2400... [2025-01-27 14:49:42,682][16219] Num frames 2500... [2025-01-27 14:49:42,810][16219] Num frames 2600... [2025-01-27 14:49:42,945][16219] Num frames 2700... [2025-01-27 14:49:43,073][16219] Num frames 2800... [2025-01-27 14:49:43,207][16219] Num frames 2900... [2025-01-27 14:49:43,334][16219] Num frames 3000... [2025-01-27 14:49:43,460][16219] Num frames 3100... [2025-01-27 14:49:43,589][16219] Num frames 3200... [2025-01-27 14:49:43,718][16219] Num frames 3300... [2025-01-27 14:49:43,846][16219] Num frames 3400... [2025-01-27 14:49:43,983][16219] Num frames 3500... [2025-01-27 14:49:44,112][16219] Num frames 3600... [2025-01-27 14:49:44,271][16219] Num frames 3700... [2025-01-27 14:49:44,443][16219] Num frames 3800... [2025-01-27 14:49:44,612][16219] Num frames 3900... [2025-01-27 14:49:44,778][16219] Num frames 4000... [2025-01-27 14:49:44,986][16219] Avg episode rewards: #0: 34.986, true rewards: #0: 13.653 [2025-01-27 14:49:44,989][16219] Avg episode reward: 34.986, avg true_objective: 13.653 [2025-01-27 14:49:44,998][16219] Num frames 4100... [2025-01-27 14:49:45,173][16219] Num frames 4200... [2025-01-27 14:49:45,334][16219] Num frames 4300... [2025-01-27 14:49:45,498][16219] Num frames 4400... [2025-01-27 14:49:45,670][16219] Num frames 4500... [2025-01-27 14:49:45,842][16219] Num frames 4600... [2025-01-27 14:49:46,020][16219] Avg episode rewards: #0: 28.680, true rewards: #0: 11.680 [2025-01-27 14:49:46,022][16219] Avg episode reward: 28.680, avg true_objective: 11.680 [2025-01-27 14:49:46,077][16219] Num frames 4700... [2025-01-27 14:49:46,256][16219] Num frames 4800... [2025-01-27 14:49:46,427][16219] Num frames 4900... [2025-01-27 14:49:46,599][16219] Num frames 5000... [2025-01-27 14:49:46,752][16219] Num frames 5100... [2025-01-27 14:49:46,886][16219] Num frames 5200... [2025-01-27 14:49:47,008][16219] Num frames 5300... [2025-01-27 14:49:47,148][16219] Num frames 5400... [2025-01-27 14:49:47,275][16219] Num frames 5500... [2025-01-27 14:49:47,401][16219] Num frames 5600... [2025-01-27 14:49:47,527][16219] Num frames 5700... [2025-01-27 14:49:47,656][16219] Num frames 5800... [2025-01-27 14:49:47,788][16219] Num frames 5900... [2025-01-27 14:49:47,913][16219] Num frames 6000... [2025-01-27 14:49:48,070][16219] Avg episode rewards: #0: 29.966, true rewards: #0: 12.166 [2025-01-27 14:49:48,072][16219] Avg episode reward: 29.966, avg true_objective: 12.166 [2025-01-27 14:49:48,098][16219] Num frames 6100... [2025-01-27 14:49:48,233][16219] Num frames 6200... [2025-01-27 14:49:48,357][16219] Num frames 6300... [2025-01-27 14:49:48,479][16219] Num frames 6400... [2025-01-27 14:49:48,606][16219] Num frames 6500... [2025-01-27 14:49:48,733][16219] Num frames 6600... [2025-01-27 14:49:48,862][16219] Num frames 6700... [2025-01-27 14:49:48,988][16219] Num frames 6800... [2025-01-27 14:49:49,068][16219] Avg episode rewards: #0: 27.365, true rewards: #0: 11.365 [2025-01-27 14:49:49,069][16219] Avg episode reward: 27.365, avg true_objective: 11.365 [2025-01-27 14:49:49,187][16219] Num frames 6900... [2025-01-27 14:49:49,314][16219] Num frames 7000... [2025-01-27 14:49:49,441][16219] Num frames 7100... [2025-01-27 14:49:49,569][16219] Num frames 7200... [2025-01-27 14:49:49,693][16219] Num frames 7300... [2025-01-27 14:49:49,821][16219] Num frames 7400... [2025-01-27 14:49:49,946][16219] Num frames 7500... [2025-01-27 14:49:50,073][16219] Num frames 7600... [2025-01-27 14:49:50,217][16219] Num frames 7700... [2025-01-27 14:49:50,341][16219] Num frames 7800... [2025-01-27 14:49:50,468][16219] Num frames 7900... [2025-01-27 14:49:50,593][16219] Num frames 8000... [2025-01-27 14:49:50,724][16219] Num frames 8100... [2025-01-27 14:49:50,851][16219] Num frames 8200... [2025-01-27 14:49:50,977][16219] Num frames 8300... [2025-01-27 14:49:51,101][16219] Num frames 8400... [2025-01-27 14:49:51,246][16219] Num frames 8500... [2025-01-27 14:49:51,374][16219] Num frames 8600... [2025-01-27 14:49:51,454][16219] Avg episode rewards: #0: 30.741, true rewards: #0: 12.313 [2025-01-27 14:49:51,456][16219] Avg episode reward: 30.741, avg true_objective: 12.313 [2025-01-27 14:49:51,558][16219] Num frames 8700... [2025-01-27 14:49:51,690][16219] Num frames 8800... [2025-01-27 14:49:51,815][16219] Num frames 8900... [2025-01-27 14:49:51,940][16219] Num frames 9000... [2025-01-27 14:49:52,068][16219] Num frames 9100... [2025-01-27 14:49:52,209][16219] Num frames 9200... [2025-01-27 14:49:52,339][16219] Num frames 9300... [2025-01-27 14:49:52,463][16219] Num frames 9400... [2025-01-27 14:49:52,590][16219] Num frames 9500... [2025-01-27 14:49:52,715][16219] Num frames 9600... [2025-01-27 14:49:52,841][16219] Num frames 9700... [2025-01-27 14:49:52,965][16219] Num frames 9800... [2025-01-27 14:49:53,091][16219] Num frames 9900... [2025-01-27 14:49:53,230][16219] Num frames 10000... [2025-01-27 14:49:53,398][16219] Avg episode rewards: #0: 31.239, true rewards: #0: 12.614 [2025-01-27 14:49:53,400][16219] Avg episode reward: 31.239, avg true_objective: 12.614 [2025-01-27 14:49:53,413][16219] Num frames 10100... [2025-01-27 14:49:53,536][16219] Num frames 10200... [2025-01-27 14:49:53,663][16219] Num frames 10300... [2025-01-27 14:49:53,789][16219] Num frames 10400... [2025-01-27 14:49:53,914][16219] Num frames 10500... [2025-01-27 14:49:54,018][16219] Avg episode rewards: #0: 28.821, true rewards: #0: 11.710 [2025-01-27 14:49:54,020][16219] Avg episode reward: 28.821, avg true_objective: 11.710 [2025-01-27 14:49:54,100][16219] Num frames 10600... [2025-01-27 14:49:54,238][16219] Num frames 10700... [2025-01-27 14:49:54,365][16219] Num frames 10800... [2025-01-27 14:49:54,488][16219] Num frames 10900... [2025-01-27 14:49:54,616][16219] Num frames 11000... [2025-01-27 14:49:54,747][16219] Num frames 11100... [2025-01-27 14:49:54,872][16219] Num frames 11200... [2025-01-27 14:49:54,999][16219] Num frames 11300... [2025-01-27 14:49:55,132][16219] Num frames 11400... [2025-01-27 14:49:55,267][16219] Num frames 11500... [2025-01-27 14:49:55,398][16219] Num frames 11600... [2025-01-27 14:49:55,533][16219] Num frames 11700... [2025-01-27 14:49:55,664][16219] Num frames 11800... [2025-01-27 14:49:55,790][16219] Num frames 11900... [2025-01-27 14:49:55,920][16219] Num frames 12000... [2025-01-27 14:49:56,002][16219] Avg episode rewards: #0: 29.720, true rewards: #0: 12.020 [2025-01-27 14:49:56,003][16219] Avg episode reward: 29.720, avg true_objective: 12.020 [2025-01-27 14:51:11,020][16219] Replay video saved to /content/train_dir/default_experiment/replay.mp4!