[2025-08-21 19:28:00,745][02158] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-08-21 19:28:00,747][02158] Rollout worker 0 uses device cpu
[2025-08-21 19:28:00,748][02158] Rollout worker 1 uses device cpu
[2025-08-21 19:28:00,750][02158] Rollout worker 2 uses device cpu
[2025-08-21 19:28:00,750][02158] Rollout worker 3 uses device cpu
[2025-08-21 19:28:00,752][02158] Rollout worker 4 uses device cpu
[2025-08-21 19:28:00,752][02158] Rollout worker 5 uses device cpu
[2025-08-21 19:28:00,753][02158] Rollout worker 6 uses device cpu
[2025-08-21 19:28:00,754][02158] Rollout worker 7 uses device cpu
[2025-08-21 19:28:00,898][02158] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:28:00,899][02158] InferenceWorker_p0-w0: min num requests: 2
[2025-08-21 19:28:00,936][02158] Starting all processes...
[2025-08-21 19:28:00,937][02158] Starting process learner_proc0
[2025-08-21 19:28:00,999][02158] Starting all processes...
[2025-08-21 19:28:01,005][02158] Starting process inference_proc0-0
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc0
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc1
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc2
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc3
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc4
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc5
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc6
[2025-08-21 19:28:01,007][02158] Starting process rollout_proc7
[2025-08-21 19:28:18,263][02316] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:28:18,264][02316] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-08-21 19:28:18,395][02316] Num visible devices: 1
[2025-08-21 19:28:18,410][02316] Starting seed is not provided
[2025-08-21 19:28:18,410][02316] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:28:18,411][02316] Initializing actor-critic model on device cuda:0
[2025-08-21 19:28:18,412][02316] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:28:18,414][02316] RunningMeanStd input shape: (1,)
[2025-08-21 19:28:18,568][02316] ConvEncoder: input_channels=3
[2025-08-21 19:28:18,729][02329] Worker 0 uses CPU cores [0]
[2025-08-21 19:28:19,178][02336] Worker 6 uses CPU cores [0]
[2025-08-21 19:28:19,206][02337] Worker 7 uses CPU cores [1]
[2025-08-21 19:28:19,479][02331] Worker 2 uses CPU cores [0]
[2025-08-21 19:28:19,568][02334] Worker 5 uses CPU cores [1]
[2025-08-21 19:28:19,718][02330] Worker 1 uses CPU cores [1]
[2025-08-21 19:28:19,730][02332] Worker 3 uses CPU cores [1]
[2025-08-21 19:28:19,864][02316] Conv encoder output size: 512
[2025-08-21 19:28:19,866][02316] Policy head output size: 512
[2025-08-21 19:28:19,934][02335] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:28:19,935][02335] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-08-21 19:28:19,993][02335] Num visible devices: 1
[2025-08-21 19:28:19,994][02316] Created Actor Critic model with architecture:
[2025-08-21 19:28:19,994][02316] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-08-21 19:28:20,148][02333] Worker 4 uses CPU cores [0]
[2025-08-21 19:28:20,558][02316] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-08-21 19:28:20,890][02158] Heartbeat connected on Batcher_0
[2025-08-21 19:28:20,900][02158] Heartbeat connected on InferenceWorker_p0-w0
[2025-08-21 19:28:20,910][02158] Heartbeat connected on RolloutWorker_w0
[2025-08-21 19:28:20,918][02158] Heartbeat connected on RolloutWorker_w2
[2025-08-21 19:28:20,921][02158] Heartbeat connected on RolloutWorker_w1
[2025-08-21 19:28:20,921][02158] Heartbeat connected on RolloutWorker_w3
[2025-08-21 19:28:20,924][02158] Heartbeat connected on RolloutWorker_w4
[2025-08-21 19:28:20,929][02158] Heartbeat connected on RolloutWorker_w5
[2025-08-21 19:28:20,932][02158] Heartbeat connected on RolloutWorker_w6
[2025-08-21 19:28:20,936][02158] Heartbeat connected on RolloutWorker_w7
[2025-08-21 19:28:26,543][02316] No checkpoints found
[2025-08-21 19:28:26,543][02316] Did not load from checkpoint, starting from scratch!
[2025-08-21 19:28:26,543][02316] Initialized policy 0 weights for model version 0
[2025-08-21 19:28:26,547][02316] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:28:26,553][02316] LearnerWorker_p0 finished initialization!
[2025-08-21 19:28:26,556][02158] Heartbeat connected on LearnerWorker_p0
[2025-08-21 19:28:26,658][02335] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:28:26,659][02335] RunningMeanStd input shape: (1,)
[2025-08-21 19:28:26,669][02335] ConvEncoder: input_channels=3
[2025-08-21 19:28:26,759][02335] Conv encoder output size: 512
[2025-08-21 19:28:26,759][02335] Policy head output size: 512
[2025-08-21 19:28:26,792][02158] Inference worker 0-0 is ready!
[2025-08-21 19:28:26,793][02158] All inference workers are ready! Signal rollout workers to start!
[2025-08-21 19:28:26,981][02331] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:26,980][02330] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:26,979][02337] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:26,982][02333] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:26,978][02329] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:26,985][02336] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:26,989][02332] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:26,987][02334] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:28:27,951][02337] Decorrelating experience for 0 frames...
[2025-08-21 19:28:27,955][02330] Decorrelating experience for 0 frames...
[2025-08-21 19:28:28,256][02329] Decorrelating experience for 0 frames...
[2025-08-21 19:28:28,259][02336] Decorrelating experience for 0 frames...
[2025-08-21 19:28:28,266][02333] Decorrelating experience for 0 frames...
[2025-08-21 19:28:29,008][02329] Decorrelating experience for 32 frames...
[2025-08-21 19:28:29,019][02333] Decorrelating experience for 32 frames...
[2025-08-21 19:28:29,113][02330] Decorrelating experience for 32 frames...
[2025-08-21 19:28:29,118][02332] Decorrelating experience for 0 frames...
[2025-08-21 19:28:29,787][02337] Decorrelating experience for 32 frames...
[2025-08-21 19:28:29,784][02334] Decorrelating experience for 0 frames...
[2025-08-21 19:28:30,015][02336] Decorrelating experience for 32 frames...
[2025-08-21 19:28:30,080][02158] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-08-21 19:28:30,199][02329] Decorrelating experience for 64 frames...
[2025-08-21 19:28:30,652][02332] Decorrelating experience for 32 frames...
[2025-08-21 19:28:31,034][02330] Decorrelating experience for 64 frames...
[2025-08-21 19:28:31,374][02334] Decorrelating experience for 32 frames...
[2025-08-21 19:28:31,608][02331] Decorrelating experience for 0 frames...
[2025-08-21 19:28:31,954][02333] Decorrelating experience for 64 frames...
[2025-08-21 19:28:32,515][02336] Decorrelating experience for 64 frames...
[2025-08-21 19:28:32,554][02337] Decorrelating experience for 64 frames...
[2025-08-21 19:28:33,214][02332] Decorrelating experience for 64 frames...
[2025-08-21 19:28:33,573][02330] Decorrelating experience for 96 frames...
[2025-08-21 19:28:33,589][02331] Decorrelating experience for 32 frames...
[2025-08-21 19:28:33,592][02329] Decorrelating experience for 96 frames...
[2025-08-21 19:28:34,280][02333] Decorrelating experience for 96 frames...
[2025-08-21 19:28:34,320][02337] Decorrelating experience for 96 frames...
[2025-08-21 19:28:35,073][02336] Decorrelating experience for 96 frames...
[2025-08-21 19:28:35,080][02158] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-08-21 19:28:35,086][02332] Decorrelating experience for 96 frames...
[2025-08-21 19:28:35,476][02334] Decorrelating experience for 64 frames...
[2025-08-21 19:28:36,039][02331] Decorrelating experience for 64 frames...
[2025-08-21 19:28:37,498][02334] Decorrelating experience for 96 frames...
[2025-08-21 19:28:38,649][02331] Decorrelating experience for 96 frames...
[2025-08-21 19:28:39,003][02316] Signal inference workers to stop experience collection...
[2025-08-21 19:28:39,025][02335] InferenceWorker_p0-w0: stopping experience collection
[2025-08-21 19:28:40,080][02158] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 241.8. Samples: 2418. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-08-21 19:28:40,081][02158] Avg episode reward: [(0, '2.952')]
[2025-08-21 19:28:41,235][02316] Signal inference workers to resume experience collection...
[2025-08-21 19:28:41,237][02335] InferenceWorker_p0-w0: resuming experience collection
[2025-08-21 19:28:45,080][02158] Fps is (10 sec: 2457.6, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 24576. Throughput: 0: 439.7. Samples: 6596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:28:45,086][02158] Avg episode reward: [(0, '3.619')]
[2025-08-21 19:28:50,080][02158] Fps is (10 sec: 3686.4, 60 sec: 1843.2, 300 sec: 1843.2). Total num frames: 36864. Throughput: 0: 437.6. Samples: 8752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:28:50,085][02158] Avg episode reward: [(0, '3.902')]
[2025-08-21 19:28:50,225][02335] Updated weights for policy 0, policy_version 10 (0.0089)
[2025-08-21 19:28:55,080][02158] Fps is (10 sec: 3686.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 603.4. Samples: 15086. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:28:55,081][02158] Avg episode reward: [(0, '4.337')]
[2025-08-21 19:28:58,868][02335] Updated weights for policy 0, policy_version 20 (0.0014)
[2025-08-21 19:29:00,083][02158] Fps is (10 sec: 4913.5, 60 sec: 2866.9, 300 sec: 2866.9). Total num frames: 86016. Throughput: 0: 735.0. Samples: 22052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:29:00,085][02158] Avg episode reward: [(0, '4.350')]
[2025-08-21 19:29:05,080][02158] Fps is (10 sec: 4096.0, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 693.0. Samples: 24254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:29:05,083][02158] Avg episode reward: [(0, '4.342')]
[2025-08-21 19:29:05,090][02316] Saving new best policy, reward=4.342!
[2025-08-21 19:29:09,462][02335] Updated weights for policy 0, policy_version 30 (0.0021)
[2025-08-21 19:29:10,080][02158] Fps is (10 sec: 3687.7, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 122880. Throughput: 0: 765.1. Samples: 30602. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:29:10,084][02158] Avg episode reward: [(0, '4.404')]
[2025-08-21 19:29:10,087][02316] Saving new best policy, reward=4.404!
[2025-08-21 19:29:15,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3185.8, 300 sec: 3185.8). Total num frames: 143360. Throughput: 0: 814.5. Samples: 36654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:29:15,083][02158] Avg episode reward: [(0, '4.343')]
[2025-08-21 19:29:20,080][02158] Fps is (10 sec: 3686.3, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 159744. Throughput: 0: 860.6. Samples: 38728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:29:20,084][02158] Avg episode reward: [(0, '4.422')]
[2025-08-21 19:29:20,087][02316] Saving new best policy, reward=4.422!
[2025-08-21 19:29:20,845][02335] Updated weights for policy 0, policy_version 40 (0.0012)
[2025-08-21 19:29:25,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 180224. Throughput: 0: 945.8. Samples: 44978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:29:25,081][02158] Avg episode reward: [(0, '4.362')]
[2025-08-21 19:29:30,082][02158] Fps is (10 sec: 4095.2, 60 sec: 3344.9, 300 sec: 3344.9). Total num frames: 200704. Throughput: 0: 999.7. Samples: 51584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:29:30,083][02158] Avg episode reward: [(0, '4.410')]
[2025-08-21 19:29:30,213][02335] Updated weights for policy 0, policy_version 50 (0.0023)
[2025-08-21 19:29:35,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 217088. Throughput: 0: 997.2. Samples: 53624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:29:35,081][02158] Avg episode reward: [(0, '4.460')]
[2025-08-21 19:29:35,087][02316] Saving new best policy, reward=4.460!
[2025-08-21 19:29:40,080][02158] Fps is (10 sec: 4096.8, 60 sec: 4027.7, 300 sec: 3452.3). Total num frames: 241664. Throughput: 0: 992.6. Samples: 59754. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:29:40,081][02158] Avg episode reward: [(0, '4.504')]
[2025-08-21 19:29:40,089][02316] Saving new best policy, reward=4.504!
[2025-08-21 19:29:40,810][02335] Updated weights for policy 0, policy_version 60 (0.0015)
[2025-08-21 19:29:45,081][02158] Fps is (10 sec: 4095.5, 60 sec: 3891.1, 300 sec: 3440.6). Total num frames: 258048. Throughput: 0: 966.4. Samples: 65540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:29:45,082][02158] Avg episode reward: [(0, '4.595')]
[2025-08-21 19:29:45,087][02316] Saving new best policy, reward=4.595!
[2025-08-21 19:29:50,080][02158] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 958.8. Samples: 67400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:29:50,082][02158] Avg episode reward: [(0, '4.531')]
[2025-08-21 19:29:52,178][02335] Updated weights for policy 0, policy_version 70 (0.0011)
[2025-08-21 19:29:55,080][02158] Fps is (10 sec: 4096.5, 60 sec: 3959.5, 300 sec: 3517.7). Total num frames: 299008. Throughput: 0: 969.3. Samples: 74220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:29:55,084][02158] Avg episode reward: [(0, '4.446')]
[2025-08-21 19:29:55,091][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2025-08-21 19:30:00,080][02158] Fps is (10 sec: 4505.6, 60 sec: 3891.4, 300 sec: 3549.9). Total num frames: 319488. Throughput: 0: 969.0. Samples: 80258. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:30:00,090][02158] Avg episode reward: [(0, '4.331')]
[2025-08-21 19:30:02,747][02335] Updated weights for policy 0, policy_version 80 (0.0017)
[2025-08-21 19:30:05,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3535.5). Total num frames: 335872. Throughput: 0: 968.6. Samples: 82316. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 19:30:05,084][02158] Avg episode reward: [(0, '4.274')]
[2025-08-21 19:30:10,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3563.5). Total num frames: 356352. Throughput: 0: 980.9. Samples: 89118. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:30:10,083][02158] Avg episode reward: [(0, '4.268')]
[2025-08-21 19:30:12,006][02335] Updated weights for policy 0, policy_version 90 (0.0019)
[2025-08-21 19:30:15,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3588.9). Total num frames: 376832. Throughput: 0: 967.5. Samples: 95120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:30:15,084][02158] Avg episode reward: [(0, '4.318')]
[2025-08-21 19:30:20,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3574.7). Total num frames: 393216. Throughput: 0: 966.6. Samples: 97120. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:30:20,081][02158] Avg episode reward: [(0, '4.311')]
[2025-08-21 19:30:22,908][02335] Updated weights for policy 0, policy_version 100 (0.0032)
[2025-08-21 19:30:25,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3633.0). Total num frames: 417792. Throughput: 0: 982.6. Samples: 103972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:30:25,083][02158] Avg episode reward: [(0, '4.630')]
[2025-08-21 19:30:25,089][02316] Saving new best policy, reward=4.630!
[2025-08-21 19:30:30,084][02158] Fps is (10 sec: 4094.1, 60 sec: 3891.0, 300 sec: 3618.0). Total num frames: 434176. Throughput: 0: 984.8. Samples: 109858. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-08-21 19:30:30,086][02158] Avg episode reward: [(0, '4.643')]
[2025-08-21 19:30:30,088][02316] Saving new best policy, reward=4.643!
[2025-08-21 19:30:33,866][02335] Updated weights for policy 0, policy_version 110 (0.0012)
[2025-08-21 19:30:35,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3637.2). Total num frames: 454656. Throughput: 0: 991.9. Samples: 112036. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:30:35,082][02158] Avg episode reward: [(0, '4.356')]
[2025-08-21 19:30:40,080][02158] Fps is (10 sec: 4507.7, 60 sec: 3959.5, 300 sec: 3686.4). Total num frames: 479232. Throughput: 0: 999.8. Samples: 119210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:30:40,083][02158] Avg episode reward: [(0, '4.361')]
[2025-08-21 19:30:42,489][02335] Updated weights for policy 0, policy_version 120 (0.0021)
[2025-08-21 19:30:45,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3671.2). Total num frames: 495616. Throughput: 0: 995.1. Samples: 125038. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:30:45,081][02158] Avg episode reward: [(0, '4.522')]
[2025-08-21 19:30:50,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 516096. Throughput: 0: 1000.0. Samples: 127318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:30:50,081][02158] Avg episode reward: [(0, '4.632')]
[2025-08-21 19:30:53,377][02335] Updated weights for policy 0, policy_version 130 (0.0022)
[2025-08-21 19:30:55,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3700.5). Total num frames: 536576. Throughput: 0: 1002.2. Samples: 134218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:30:55,084][02158] Avg episode reward: [(0, '4.758')]
[2025-08-21 19:30:55,090][02316] Saving new best policy, reward=4.758!
[2025-08-21 19:31:00,081][02158] Fps is (10 sec: 4095.6, 60 sec: 3959.4, 300 sec: 3713.7). Total num frames: 557056. Throughput: 0: 992.9. Samples: 139800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:31:00,082][02158] Avg episode reward: [(0, '4.691')]
[2025-08-21 19:31:05,087][02158] Fps is (10 sec: 3274.5, 60 sec: 3890.7, 300 sec: 3673.0). Total num frames: 569344. Throughput: 0: 987.6. Samples: 141570. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:31:05,090][02158] Avg episode reward: [(0, '4.630')]
[2025-08-21 19:31:06,214][02335] Updated weights for policy 0, policy_version 140 (0.0017)
[2025-08-21 19:31:10,080][02158] Fps is (10 sec: 3277.0, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 589824. Throughput: 0: 954.6. Samples: 146930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:31:10,081][02158] Avg episode reward: [(0, '4.455')]
[2025-08-21 19:31:15,080][02158] Fps is (10 sec: 3689.0, 60 sec: 3822.9, 300 sec: 3674.0). Total num frames: 606208. Throughput: 0: 952.9. Samples: 152732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:31:15,084][02158] Avg episode reward: [(0, '4.363')]
[2025-08-21 19:31:16,870][02335] Updated weights for policy 0, policy_version 150 (0.0020)
[2025-08-21 19:31:20,080][02158] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3686.4). Total num frames: 626688. Throughput: 0: 953.8. Samples: 154956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:31:20,087][02158] Avg episode reward: [(0, '4.421')]
[2025-08-21 19:31:25,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3698.1). Total num frames: 647168. Throughput: 0: 946.1. Samples: 161784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:31:25,081][02158] Avg episode reward: [(0, '4.609')]
[2025-08-21 19:31:26,168][02335] Updated weights for policy 0, policy_version 160 (0.0020)
[2025-08-21 19:31:30,081][02158] Fps is (10 sec: 4095.5, 60 sec: 3891.4, 300 sec: 3709.1). Total num frames: 667648. Throughput: 0: 945.5. Samples: 167588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:31:30,082][02158] Avg episode reward: [(0, '4.615')]
[2025-08-21 19:31:35,081][02158] Fps is (10 sec: 3686.0, 60 sec: 3822.9, 300 sec: 3697.4). Total num frames: 684032. Throughput: 0: 946.0. Samples: 169890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:31:35,084][02158] Avg episode reward: [(0, '4.605')]
[2025-08-21 19:31:36,954][02335] Updated weights for policy 0, policy_version 170 (0.0024)
[2025-08-21 19:31:40,080][02158] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3729.5). Total num frames: 708608. Throughput: 0: 945.1. Samples: 176748. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:31:40,081][02158] Avg episode reward: [(0, '4.501')]
[2025-08-21 19:31:45,080][02158] Fps is (10 sec: 4096.5, 60 sec: 3822.9, 300 sec: 3717.9). Total num frames: 724992. Throughput: 0: 943.6. Samples: 182260. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:31:45,081][02158] Avg episode reward: [(0, '4.322')]
[2025-08-21 19:31:47,861][02335] Updated weights for policy 0, policy_version 180 (0.0018)
[2025-08-21 19:31:50,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3727.4). Total num frames: 745472. Throughput: 0: 960.3. Samples: 184778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:31:50,081][02158] Avg episode reward: [(0, '4.526')]
[2025-08-21 19:31:55,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3736.4). Total num frames: 765952. Throughput: 0: 994.2. Samples: 191668. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:31:55,084][02158] Avg episode reward: [(0, '4.784')]
[2025-08-21 19:31:55,110][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth...
[2025-08-21 19:31:55,224][02316] Saving new best policy, reward=4.784!
[2025-08-21 19:31:57,170][02335] Updated weights for policy 0, policy_version 190 (0.0022)
[2025-08-21 19:32:00,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3744.9). Total num frames: 786432. Throughput: 0: 983.6. Samples: 196994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:00,081][02158] Avg episode reward: [(0, '4.607')]
[2025-08-21 19:32:05,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3891.7, 300 sec: 3734.0). Total num frames: 802816. Throughput: 0: 992.8. Samples: 199632. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:32:05,083][02158] Avg episode reward: [(0, '4.632')]
[2025-08-21 19:32:07,792][02335] Updated weights for policy 0, policy_version 200 (0.0016)
[2025-08-21 19:32:10,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3760.9). Total num frames: 827392. Throughput: 0: 993.5. Samples: 206490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:10,083][02158] Avg episode reward: [(0, '4.740')]
[2025-08-21 19:32:15,080][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3750.1). Total num frames: 843776. Throughput: 0: 983.8. Samples: 211856. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:15,084][02158] Avg episode reward: [(0, '4.690')]
[2025-08-21 19:32:18,804][02335] Updated weights for policy 0, policy_version 210 (0.0013)
[2025-08-21 19:32:20,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3757.6). Total num frames: 864256. Throughput: 0: 992.1. Samples: 214534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:20,083][02158] Avg episode reward: [(0, '4.600')]
[2025-08-21 19:32:25,081][02158] Fps is (10 sec: 4505.1, 60 sec: 4027.7, 300 sec: 3782.2). Total num frames: 888832. Throughput: 0: 992.7. Samples: 221420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:25,084][02158] Avg episode reward: [(0, '4.535')]
[2025-08-21 19:32:28,425][02335] Updated weights for policy 0, policy_version 220 (0.0011)
[2025-08-21 19:32:30,083][02158] Fps is (10 sec: 4094.6, 60 sec: 3959.3, 300 sec: 3771.7). Total num frames: 905216. Throughput: 0: 988.9. Samples: 226764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:32:30,085][02158] Avg episode reward: [(0, '4.607')]
[2025-08-21 19:32:35,080][02158] Fps is (10 sec: 3686.8, 60 sec: 4027.8, 300 sec: 3778.4). Total num frames: 925696. Throughput: 0: 994.8. Samples: 229544. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:35,084][02158] Avg episode reward: [(0, '4.567')]
[2025-08-21 19:32:38,656][02335] Updated weights for policy 0, policy_version 230 (0.0019)
[2025-08-21 19:32:40,080][02158] Fps is (10 sec: 4097.4, 60 sec: 3959.5, 300 sec: 3784.7). Total num frames: 946176. Throughput: 0: 993.7. Samples: 236386. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:40,089][02158] Avg episode reward: [(0, '4.807')]
[2025-08-21 19:32:40,091][02316] Saving new best policy, reward=4.807!
[2025-08-21 19:32:45,084][02158] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 3774.7). Total num frames: 962560. Throughput: 0: 989.7. Samples: 241536. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:45,085][02158] Avg episode reward: [(0, '4.755')]
[2025-08-21 19:32:49,291][02335] Updated weights for policy 0, policy_version 240 (0.0020)
[2025-08-21 19:32:50,080][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3780.9). Total num frames: 983040. Throughput: 0: 997.1. Samples: 244500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:50,081][02158] Avg episode reward: [(0, '4.559')]
[2025-08-21 19:32:55,080][02158] Fps is (10 sec: 4507.5, 60 sec: 4027.7, 300 sec: 3802.3). Total num frames: 1007616. Throughput: 0: 1006.4. Samples: 251778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:32:55,081][02158] Avg episode reward: [(0, '4.365')]
[2025-08-21 19:32:59,098][02335] Updated weights for policy 0, policy_version 250 (0.0014)
[2025-08-21 19:33:00,081][02158] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 3792.6). Total num frames: 1024000. Throughput: 0: 1005.3. Samples: 257094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:33:00,084][02158] Avg episode reward: [(0, '4.445')]
[2025-08-21 19:33:05,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3813.0). Total num frames: 1048576. Throughput: 0: 1012.8. Samples: 260110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:33:05,084][02158] Avg episode reward: [(0, '4.688')]
[2025-08-21 19:33:08,647][02335] Updated weights for policy 0, policy_version 260 (0.0028)
[2025-08-21 19:33:10,080][02158] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 3818.1). Total num frames: 1069056. Throughput: 0: 1012.7. Samples: 266990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:33:10,084][02158] Avg episode reward: [(0, '4.870')]
[2025-08-21 19:33:10,088][02316] Saving new best policy, reward=4.870!
[2025-08-21 19:33:15,083][02158] Fps is (10 sec: 3685.2, 60 sec: 4027.5, 300 sec: 3808.5). Total num frames: 1085440. Throughput: 0: 1004.1. Samples: 271946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:33:15,084][02158] Avg episode reward: [(0, '4.764')]
[2025-08-21 19:33:19,531][02335] Updated weights for policy 0, policy_version 270 (0.0017)
[2025-08-21 19:33:20,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3813.5). Total num frames: 1105920. Throughput: 0: 1010.8. Samples: 275030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:33:20,081][02158] Avg episode reward: [(0, '4.716')]
[2025-08-21 19:33:25,080][02158] Fps is (10 sec: 4507.0, 60 sec: 4027.8, 300 sec: 3832.2). Total num frames: 1130496. Throughput: 0: 1014.5. Samples: 282038. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:33:25,088][02158] Avg episode reward: [(0, '4.593')]
[2025-08-21 19:33:30,083][02158] Fps is (10 sec: 3685.3, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 1142784. Throughput: 0: 1014.8. Samples: 287200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:33:30,086][02158] Avg episode reward: [(0, '4.832')]
[2025-08-21 19:33:30,148][02335] Updated weights for policy 0, policy_version 280 (0.0020)
[2025-08-21 19:33:35,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3957.2). Total num frames: 1167360. Throughput: 0: 1021.5. Samples: 290468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:33:35,084][02158] Avg episode reward: [(0, '4.984')]
[2025-08-21 19:33:35,090][02316] Saving new best policy, reward=4.984!
[2025-08-21 19:33:38,802][02335] Updated weights for policy 0, policy_version 290 (0.0028)
[2025-08-21 19:33:40,080][02158] Fps is (10 sec: 4916.7, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1191936. Throughput: 0: 1016.3. Samples: 297512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:33:40,081][02158] Avg episode reward: [(0, '4.819')]
[2025-08-21 19:33:45,080][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.3, 300 sec: 3971.0). Total num frames: 1208320. Throughput: 0: 1013.1. Samples: 302682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:33:45,084][02158] Avg episode reward: [(0, '5.092')]
[2025-08-21 19:33:45,090][02316] Saving new best policy, reward=5.092!
[2025-08-21 19:33:49,141][02335] Updated weights for policy 0, policy_version 300 (0.0036)
[2025-08-21 19:33:50,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 1232896. Throughput: 0: 1023.5. Samples: 306168. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:33:50,084][02158] Avg episode reward: [(0, '5.382')]
[2025-08-21 19:33:50,087][02316] Saving new best policy, reward=5.382!
[2025-08-21 19:33:55,080][02158] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1253376. Throughput: 0: 1031.2. Samples: 313392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:33:55,083][02158] Avg episode reward: [(0, '5.702')]
[2025-08-21 19:33:55,152][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000307_1257472.pth...
[2025-08-21 19:33:55,276][02316] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2025-08-21 19:33:55,287][02316] Saving new best policy, reward=5.702!
[2025-08-21 19:33:59,611][02335] Updated weights for policy 0, policy_version 310 (0.0020)
[2025-08-21 19:34:00,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3957.2). Total num frames: 1269760. Throughput: 0: 1030.3. Samples: 318306. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:34:00,084][02158] Avg episode reward: [(0, '5.593')]
[2025-08-21 19:34:05,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3971.0). Total num frames: 1294336. Throughput: 0: 1041.7. Samples: 321906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:34:05,081][02158] Avg episode reward: [(0, '5.400')]
[2025-08-21 19:34:08,070][02335] Updated weights for policy 0, policy_version 320 (0.0022)
[2025-08-21 19:34:10,080][02158] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 3984.9). Total num frames: 1318912. Throughput: 0: 1047.9. Samples: 329194. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:34:10,083][02158] Avg episode reward: [(0, '5.315')]
[2025-08-21 19:34:15,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.2, 300 sec: 3971.0). Total num frames: 1331200. Throughput: 0: 1042.2. Samples: 334098. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:34:15,081][02158] Avg episode reward: [(0, '5.260')]
[2025-08-21 19:34:18,385][02335] Updated weights for policy 0, policy_version 330 (0.0020)
[2025-08-21 19:34:20,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3984.9). Total num frames: 1355776. Throughput: 0: 1049.7. Samples: 337706. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:34:20,081][02158] Avg episode reward: [(0, '5.562')]
[2025-08-21 19:34:25,083][02158] Fps is (10 sec: 4913.7, 60 sec: 4164.1, 300 sec: 3998.8). Total num frames: 1380352. Throughput: 0: 1053.8. Samples: 344934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:34:25,085][02158] Avg episode reward: [(0, '5.435')]
[2025-08-21 19:34:28,378][02335] Updated weights for policy 0, policy_version 340 (0.0015)
[2025-08-21 19:34:30,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4232.8, 300 sec: 3998.8). Total num frames: 1396736. Throughput: 0: 1050.5. Samples: 349956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:34:30,084][02158] Avg episode reward: [(0, '5.254')]
[2025-08-21 19:34:35,080][02158] Fps is (10 sec: 4097.2, 60 sec: 4232.5, 300 sec: 3998.8). Total num frames: 1421312. Throughput: 0: 1054.2. Samples: 353608. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:34:35,084][02158] Avg episode reward: [(0, '5.550')]
[2025-08-21 19:34:37,315][02335] Updated weights for policy 0, policy_version 350 (0.0015)
[2025-08-21 19:34:40,080][02158] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4026.6). Total num frames: 1445888. Throughput: 0: 1053.9. Samples: 360818. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:34:40,081][02158] Avg episode reward: [(0, '5.635')]
[2025-08-21 19:34:45,080][02158] Fps is (10 sec: 3686.3, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 1458176. Throughput: 0: 1055.1. Samples: 365784. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:34:45,081][02158] Avg episode reward: [(0, '5.662')]
[2025-08-21 19:34:47,656][02335] Updated weights for policy 0, policy_version 360 (0.0015)
[2025-08-21 19:34:50,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4012.7). Total num frames: 1482752. Throughput: 0: 1055.1. Samples: 369384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:34:50,081][02158] Avg episode reward: [(0, '5.602')]
[2025-08-21 19:34:55,080][02158] Fps is (10 sec: 4915.3, 60 sec: 4232.5, 300 sec: 4026.6). Total num frames: 1507328. Throughput: 0: 1054.3. Samples: 376638. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:34:55,083][02158] Avg episode reward: [(0, '5.167')]
[2025-08-21 19:34:57,568][02335] Updated weights for policy 0, policy_version 370 (0.0022)
[2025-08-21 19:35:00,080][02158] Fps is (10 sec: 4095.9, 60 sec: 4232.5, 300 sec: 4026.6). Total num frames: 1523712. Throughput: 0: 1055.6. Samples: 381602. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:35:00,083][02158] Avg episode reward: [(0, '5.232')]
[2025-08-21 19:35:05,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4040.5). Total num frames: 1548288. Throughput: 0: 1055.7. Samples: 385212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:35:05,081][02158] Avg episode reward: [(0, '6.109')]
[2025-08-21 19:35:05,087][02316] Saving new best policy, reward=6.109!
[2025-08-21 19:35:06,527][02335] Updated weights for policy 0, policy_version 380 (0.0011)
[2025-08-21 19:35:10,083][02158] Fps is (10 sec: 4504.1, 60 sec: 4164.0, 300 sec: 4040.4). Total num frames: 1568768. Throughput: 0: 1054.3. Samples: 392376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:35:10,084][02158] Avg episode reward: [(0, '6.492')]
[2025-08-21 19:35:10,098][02316] Saving new best policy, reward=6.492!
[2025-08-21 19:35:15,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 4040.5). Total num frames: 1585152. Throughput: 0: 1053.5. Samples: 397362. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:35:15,081][02158] Avg episode reward: [(0, '6.100')]
[2025-08-21 19:35:16,920][02335] Updated weights for policy 0, policy_version 390 (0.0021)
[2025-08-21 19:35:20,080][02158] Fps is (10 sec: 4097.4, 60 sec: 4232.5, 300 sec: 4040.5). Total num frames: 1609728. Throughput: 0: 1051.8. Samples: 400940. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:35:20,081][02158] Avg episode reward: [(0, '6.141')]
[2025-08-21 19:35:25,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.5, 300 sec: 4054.4). Total num frames: 1630208. Throughput: 0: 1048.3. Samples: 407992. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:35:25,086][02158] Avg episode reward: [(0, '6.681')]
[2025-08-21 19:35:25,093][02316] Saving new best policy, reward=6.681!
[2025-08-21 19:35:26,769][02335] Updated weights for policy 0, policy_version 400 (0.0011)
[2025-08-21 19:35:30,082][02158] Fps is (10 sec: 3685.7, 60 sec: 4164.1, 300 sec: 4040.4). Total num frames: 1646592. Throughput: 0: 1032.3. Samples: 412240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:35:30,083][02158] Avg episode reward: [(0, '6.904')]
[2025-08-21 19:35:30,087][02316] Saving new best policy, reward=6.904!
[2025-08-21 19:35:35,080][02158] Fps is (10 sec: 3276.7, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1662976. Throughput: 0: 1005.0. Samples: 414608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:35:35,081][02158] Avg episode reward: [(0, '6.771')]
[2025-08-21 19:35:37,628][02335] Updated weights for policy 0, policy_version 410 (0.0023)
[2025-08-21 19:35:40,087][02158] Fps is (10 sec: 4094.0, 60 sec: 4027.3, 300 sec: 4040.4). Total num frames: 1687552. Throughput: 0: 1000.9. Samples: 421684. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:35:40,089][02158] Avg episode reward: [(0, '7.111')]
[2025-08-21 19:35:40,093][02316] Saving new best policy, reward=7.111!
[2025-08-21 19:35:45,080][02158] Fps is (10 sec: 4096.2, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1703936. Throughput: 0: 1004.1. Samples: 426788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:35:45,081][02158] Avg episode reward: [(0, '7.367')]
[2025-08-21 19:35:45,092][02316] Saving new best policy, reward=7.367!
[2025-08-21 19:35:48,066][02335] Updated weights for policy 0, policy_version 420 (0.0011)
[2025-08-21 19:35:50,080][02158] Fps is (10 sec: 4098.8, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1728512. Throughput: 0: 1003.8. Samples: 430382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:35:50,081][02158] Avg episode reward: [(0, '7.651')]
[2025-08-21 19:35:50,085][02316] Saving new best policy, reward=7.651!
[2025-08-21 19:35:55,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1748992. Throughput: 0: 996.5. Samples: 437216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:35:55,084][02158] Avg episode reward: [(0, '7.611')]
[2025-08-21 19:35:55,093][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000427_1748992.pth...
[2025-08-21 19:35:55,245][02316] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000188_770048.pth
[2025-08-21 19:35:58,464][02335] Updated weights for policy 0, policy_version 430 (0.0016)
[2025-08-21 19:36:00,083][02158] Fps is (10 sec: 3685.4, 60 sec: 4027.5, 300 sec: 4054.4). Total num frames: 1765376. Throughput: 0: 1005.0. Samples: 442592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:36:00,085][02158] Avg episode reward: [(0, '8.203')]
[2025-08-21 19:36:00,112][02316] Saving new best policy, reward=8.203!
[2025-08-21 19:36:05,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1789952. Throughput: 0: 1003.2. Samples: 446086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:36:05,081][02158] Avg episode reward: [(0, '8.632')]
[2025-08-21 19:36:05,090][02316] Saving new best policy, reward=8.632!
[2025-08-21 19:36:07,082][02335] Updated weights for policy 0, policy_version 440 (0.0033)
[2025-08-21 19:36:10,080][02158] Fps is (10 sec: 4507.0, 60 sec: 4028.0, 300 sec: 4082.1). Total num frames: 1810432. Throughput: 0: 992.8. Samples: 452668. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:36:10,085][02158] Avg episode reward: [(0, '8.371')]
[2025-08-21 19:36:15,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1830912. Throughput: 0: 1020.8. Samples: 458174. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:36:15,081][02158] Avg episode reward: [(0, '7.797')]
[2025-08-21 19:36:17,505][02335] Updated weights for policy 0, policy_version 450 (0.0029)
[2025-08-21 19:36:20,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 1855488. Throughput: 0: 1048.0. Samples: 461768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:36:20,081][02158] Avg episode reward: [(0, '7.527')]
[2025-08-21 19:36:25,080][02158] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 1871872. Throughput: 0: 1038.5. Samples: 468410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:36:25,087][02158] Avg episode reward: [(0, '7.681')]
[2025-08-21 19:36:27,837][02335] Updated weights for policy 0, policy_version 460 (0.0011)
[2025-08-21 19:36:30,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4096.0). Total num frames: 1892352. Throughput: 0: 1052.0. Samples: 474128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:36:30,083][02158] Avg episode reward: [(0, '8.667')]
[2025-08-21 19:36:30,085][02316] Saving new best policy, reward=8.667!
[2025-08-21 19:36:35,080][02158] Fps is (10 sec: 4505.7, 60 sec: 4232.6, 300 sec: 4096.0). Total num frames: 1916928. Throughput: 0: 1051.0. Samples: 477678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:36:35,082][02158] Avg episode reward: [(0, '9.759')]
[2025-08-21 19:36:35,088][02316] Saving new best policy, reward=9.759!
[2025-08-21 19:36:36,385][02335] Updated weights for policy 0, policy_version 470 (0.0012)
[2025-08-21 19:36:40,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.7, 300 sec: 4109.9). Total num frames: 1937408. Throughput: 0: 1040.2. Samples: 484026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:36:40,083][02158] Avg episode reward: [(0, '10.829')]
[2025-08-21 19:36:40,084][02316] Saving new best policy, reward=10.829!
[2025-08-21 19:36:45,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 1953792. Throughput: 0: 1045.9. Samples: 489654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:36:45,081][02158] Avg episode reward: [(0, '10.728')]
[2025-08-21 19:36:46,925][02335] Updated weights for policy 0, policy_version 480 (0.0023)
[2025-08-21 19:36:50,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 1978368. Throughput: 0: 1046.6. Samples: 493182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:36:50,081][02158] Avg episode reward: [(0, '10.105')]
[2025-08-21 19:36:55,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 1998848. Throughput: 0: 1041.9. Samples: 499554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:36:55,084][02158] Avg episode reward: [(0, '9.095')]
[2025-08-21 19:36:57,330][02335] Updated weights for policy 0, policy_version 490 (0.0015)
[2025-08-21 19:37:00,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4232.7, 300 sec: 4123.8). Total num frames: 2019328. Throughput: 0: 1049.2. Samples: 505390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:37:00,085][02158] Avg episode reward: [(0, '9.966')]
[2025-08-21 19:37:05,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 2043904. Throughput: 0: 1050.0. Samples: 509016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:37:05,085][02158] Avg episode reward: [(0, '10.056')]
[2025-08-21 19:37:05,810][02335] Updated weights for policy 0, policy_version 500 (0.0017)
[2025-08-21 19:37:10,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2060288. Throughput: 0: 1040.5. Samples: 515230. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:37:10,087][02158] Avg episode reward: [(0, '9.808')]
[2025-08-21 19:37:15,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2080768. Throughput: 0: 1047.8. Samples: 521280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:37:15,083][02158] Avg episode reward: [(0, '9.347')]
[2025-08-21 19:37:16,176][02335] Updated weights for policy 0, policy_version 510 (0.0018)
[2025-08-21 19:37:20,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2105344. Throughput: 0: 1047.5. Samples: 524816. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:37:20,081][02158] Avg episode reward: [(0, '9.463')]
[2025-08-21 19:37:25,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 2121728. Throughput: 0: 1041.3. Samples: 530886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:37:25,084][02158] Avg episode reward: [(0, '10.514')]
[2025-08-21 19:37:26,532][02335] Updated weights for policy 0, policy_version 520 (0.0012)
[2025-08-21 19:37:30,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 2146304. Throughput: 0: 1054.9. Samples: 537124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:37:30,084][02158] Avg episode reward: [(0, '12.477')]
[2025-08-21 19:37:30,087][02316] Saving new best policy, reward=12.477!
[2025-08-21 19:37:35,055][02335] Updated weights for policy 0, policy_version 530 (0.0022)
[2025-08-21 19:37:35,080][02158] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 2170880. Throughput: 0: 1056.0. Samples: 540704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:37:35,081][02158] Avg episode reward: [(0, '14.177')]
[2025-08-21 19:37:35,085][02316] Saving new best policy, reward=14.177!
[2025-08-21 19:37:40,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.6). Total num frames: 2187264. Throughput: 0: 1043.6. Samples: 546516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:37:40,086][02158] Avg episode reward: [(0, '14.876')]
[2025-08-21 19:37:40,091][02316] Saving new best policy, reward=14.876!
[2025-08-21 19:37:45,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 2207744. Throughput: 0: 1052.6. Samples: 552756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:37:45,085][02158] Avg episode reward: [(0, '15.394')]
[2025-08-21 19:37:45,097][02316] Saving new best policy, reward=15.394!
[2025-08-21 19:37:45,517][02335] Updated weights for policy 0, policy_version 540 (0.0016)
[2025-08-21 19:37:50,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4151.5). Total num frames: 2232320. Throughput: 0: 1049.0. Samples: 556220. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:37:50,081][02158] Avg episode reward: [(0, '14.481')]
[2025-08-21 19:37:55,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.6). Total num frames: 2248704. Throughput: 0: 1038.9. Samples: 561980. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:37:55,081][02158] Avg episode reward: [(0, '15.061')]
[2025-08-21 19:37:55,089][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000549_2248704.pth...
[2025-08-21 19:37:55,235][02316] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000307_1257472.pth
[2025-08-21 19:37:56,075][02335] Updated weights for policy 0, policy_version 550 (0.0022)
[2025-08-21 19:38:00,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 2269184. Throughput: 0: 1046.3. Samples: 568362. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:38:00,085][02158] Avg episode reward: [(0, '13.618')]
[2025-08-21 19:38:04,481][02335] Updated weights for policy 0, policy_version 560 (0.0015)
[2025-08-21 19:38:05,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 2293760. Throughput: 0: 1047.2. Samples: 571940. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:38:05,086][02158] Avg episode reward: [(0, '12.990')]
[2025-08-21 19:38:10,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4151.6). Total num frames: 2310144. Throughput: 0: 1036.9. Samples: 577548. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:38:10,082][02158] Avg episode reward: [(0, '12.770')]
[2025-08-21 19:38:14,988][02335] Updated weights for policy 0, policy_version 570 (0.0022)
[2025-08-21 19:38:15,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 2334720. Throughput: 0: 1046.1. Samples: 584200. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:38:15,083][02158] Avg episode reward: [(0, '14.517')]
[2025-08-21 19:38:20,080][02158] Fps is (10 sec: 4505.3, 60 sec: 4164.2, 300 sec: 4151.5). Total num frames: 2355200. Throughput: 0: 1042.7. Samples: 587628. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:38:20,083][02158] Avg episode reward: [(0, '14.403')]
[2025-08-21 19:38:25,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4165.5). Total num frames: 2371584. Throughput: 0: 1034.7. Samples: 593076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:38:25,084][02158] Avg episode reward: [(0, '15.084')]
[2025-08-21 19:38:25,541][02335] Updated weights for policy 0, policy_version 580 (0.0020)
[2025-08-21 19:38:30,080][02158] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 2396160. Throughput: 0: 1043.2. Samples: 599700. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:38:30,084][02158] Avg episode reward: [(0, '16.020')]
[2025-08-21 19:38:30,087][02316] Saving new best policy, reward=16.020!
[2025-08-21 19:38:34,123][02335] Updated weights for policy 0, policy_version 590 (0.0019)
[2025-08-21 19:38:35,080][02158] Fps is (10 sec: 4915.1, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 2420736. Throughput: 0: 1043.6. Samples: 603180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:38:35,081][02158] Avg episode reward: [(0, '15.834')]
[2025-08-21 19:38:40,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 2433024. Throughput: 0: 1034.2. Samples: 608520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:38:40,084][02158] Avg episode reward: [(0, '16.158')]
[2025-08-21 19:38:40,088][02316] Saving new best policy, reward=16.158!
[2025-08-21 19:38:44,754][02335] Updated weights for policy 0, policy_version 600 (0.0020)
[2025-08-21 19:38:45,080][02158] Fps is (10 sec: 3686.5, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 2457600. Throughput: 0: 1039.2. Samples: 615128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:38:45,085][02158] Avg episode reward: [(0, '16.416')]
[2025-08-21 19:38:45,097][02316] Saving new best policy, reward=16.416!
[2025-08-21 19:38:50,080][02158] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 2482176. Throughput: 0: 1035.2. Samples: 618522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:38:50,082][02158] Avg episode reward: [(0, '15.465')]
[2025-08-21 19:38:55,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 2494464. Throughput: 0: 1028.4. Samples: 623824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:38:55,081][02158] Avg episode reward: [(0, '16.250')]
[2025-08-21 19:38:55,245][02335] Updated weights for policy 0, policy_version 610 (0.0014)
[2025-08-21 19:39:00,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 2519040. Throughput: 0: 1029.2. Samples: 630512. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:39:00,081][02158] Avg episode reward: [(0, '16.379')]
[2025-08-21 19:39:04,175][02335] Updated weights for policy 0, policy_version 620 (0.0014)
[2025-08-21 19:39:05,083][02158] Fps is (10 sec: 4504.2, 60 sec: 4095.8, 300 sec: 4137.6). Total num frames: 2539520. Throughput: 0: 1032.1. Samples: 634076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:39:05,084][02158] Avg episode reward: [(0, '15.571')]
[2025-08-21 19:39:10,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 2555904. Throughput: 0: 1026.1. Samples: 639250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:39:10,084][02158] Avg episode reward: [(0, '15.970')]
[2025-08-21 19:39:14,436][02335] Updated weights for policy 0, policy_version 630 (0.0032)
[2025-08-21 19:39:15,080][02158] Fps is (10 sec: 4097.3, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 2580480. Throughput: 0: 1030.8. Samples: 646086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:39:15,082][02158] Avg episode reward: [(0, '15.880')]
[2025-08-21 19:39:20,082][02158] Fps is (10 sec: 4914.2, 60 sec: 4164.2, 300 sec: 4151.6). Total num frames: 2605056. Throughput: 0: 1031.0. Samples: 649578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:39:20,084][02158] Avg episode reward: [(0, '17.246')]
[2025-08-21 19:39:20,087][02316] Saving new best policy, reward=17.246!
[2025-08-21 19:39:25,079][02335] Updated weights for policy 0, policy_version 640 (0.0031)
[2025-08-21 19:39:25,081][02158] Fps is (10 sec: 4095.7, 60 sec: 4164.2, 300 sec: 4151.5). Total num frames: 2621440. Throughput: 0: 1024.1. Samples: 654606. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:39:25,082][02158] Avg episode reward: [(0, '17.860')]
[2025-08-21 19:39:25,087][02316] Saving new best policy, reward=17.860!
[2025-08-21 19:39:30,080][02158] Fps is (10 sec: 3687.0, 60 sec: 4096.0, 300 sec: 4137.6). Total num frames: 2641920. Throughput: 0: 1029.5. Samples: 661458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:39:30,081][02158] Avg episode reward: [(0, '18.267')]
[2025-08-21 19:39:30,088][02316] Saving new best policy, reward=18.267!
[2025-08-21 19:39:33,879][02335] Updated weights for policy 0, policy_version 650 (0.0017)
[2025-08-21 19:39:35,084][02158] Fps is (10 sec: 4504.2, 60 sec: 4095.7, 300 sec: 4137.6). Total num frames: 2666496. Throughput: 0: 1032.3. Samples: 664980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:39:35,085][02158] Avg episode reward: [(0, '19.785')]
[2025-08-21 19:39:35,097][02316] Saving new best policy, reward=19.785!
[2025-08-21 19:39:40,080][02158] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4151.5). Total num frames: 2682880. Throughput: 0: 1021.3. Samples: 669784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:39:40,081][02158] Avg episode reward: [(0, '19.947')]
[2025-08-21 19:39:40,083][02316] Saving new best policy, reward=19.947!
[2025-08-21 19:39:44,535][02335] Updated weights for policy 0, policy_version 660 (0.0023)
[2025-08-21 19:39:45,080][02158] Fps is (10 sec: 3687.9, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2703360. Throughput: 0: 1028.0. Samples: 676770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:39:45,081][02158] Avg episode reward: [(0, '19.140')]
[2025-08-21 19:39:50,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 2723840. Throughput: 0: 1026.9. Samples: 680284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:39:50,085][02158] Avg episode reward: [(0, '18.758')]
[2025-08-21 19:39:55,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2740224. Throughput: 0: 1021.2. Samples: 685206. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:39:55,082][02158] Avg episode reward: [(0, '19.772')]
[2025-08-21 19:39:55,136][02335] Updated weights for policy 0, policy_version 670 (0.0017)
[2025-08-21 19:39:55,137][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000670_2744320.pth...
[2025-08-21 19:39:55,248][02316] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000427_1748992.pth
[2025-08-21 19:40:00,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2764800. Throughput: 0: 1021.6. Samples: 692058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:40:00,087][02158] Avg episode reward: [(0, '19.472')]
[2025-08-21 19:40:04,069][02335] Updated weights for policy 0, policy_version 680 (0.0014)
[2025-08-21 19:40:05,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 4123.8). Total num frames: 2785280. Throughput: 0: 1022.1. Samples: 695570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:40:05,083][02158] Avg episode reward: [(0, '20.974')]
[2025-08-21 19:40:05,091][02316] Saving new best policy, reward=20.974!
[2025-08-21 19:40:10,080][02158] Fps is (10 sec: 3686.3, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2801664. Throughput: 0: 1018.5. Samples: 700436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:40:10,085][02158] Avg episode reward: [(0, '21.681')]
[2025-08-21 19:40:10,096][02316] Saving new best policy, reward=21.681!
[2025-08-21 19:40:14,462][02335] Updated weights for policy 0, policy_version 690 (0.0018)
[2025-08-21 19:40:15,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 2826240. Throughput: 0: 1020.8. Samples: 707394. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:40:15,085][02158] Avg episode reward: [(0, '21.505')]
[2025-08-21 19:40:20,080][02158] Fps is (10 sec: 4505.7, 60 sec: 4027.9, 300 sec: 4123.8). Total num frames: 2846720. Throughput: 0: 1019.2. Samples: 710840. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:40:20,084][02158] Avg episode reward: [(0, '21.167')]
[2025-08-21 19:40:25,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4123.8). Total num frames: 2863104. Throughput: 0: 1018.4. Samples: 715614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:40:25,081][02158] Avg episode reward: [(0, '20.521')]
[2025-08-21 19:40:25,336][02335] Updated weights for policy 0, policy_version 700 (0.0013)
[2025-08-21 19:40:30,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4151.5). Total num frames: 2887680. Throughput: 0: 1020.2. Samples: 722678. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:40:30,081][02158] Avg episode reward: [(0, '19.435')]
[2025-08-21 19:40:34,410][02335] Updated weights for policy 0, policy_version 710 (0.0012)
[2025-08-21 19:40:35,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4028.0, 300 sec: 4137.8). Total num frames: 2908160. Throughput: 0: 1022.0. Samples: 726272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:40:35,084][02158] Avg episode reward: [(0, '18.893')]
[2025-08-21 19:40:40,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4137.7). Total num frames: 2924544. Throughput: 0: 1018.4. Samples: 731036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:40:40,081][02158] Avg episode reward: [(0, '20.301')]
[2025-08-21 19:40:44,671][02335] Updated weights for policy 0, policy_version 720 (0.0014)
[2025-08-21 19:40:45,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2949120. Throughput: 0: 1023.6. Samples: 738120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:40:45,081][02158] Avg episode reward: [(0, '18.974')]
[2025-08-21 19:40:50,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2969600. Throughput: 0: 1020.3. Samples: 741482. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:40:50,081][02158] Avg episode reward: [(0, '19.337')]
[2025-08-21 19:40:55,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 2985984. Throughput: 0: 1022.3. Samples: 746438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:40:55,085][02158] Avg episode reward: [(0, '19.696')]
[2025-08-21 19:40:55,265][02335] Updated weights for policy 0, policy_version 730 (0.0041)
[2025-08-21 19:41:00,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 3010560. Throughput: 0: 1023.2. Samples: 753438. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:41:00,084][02158] Avg episode reward: [(0, '17.958')]
[2025-08-21 19:41:04,672][02335] Updated weights for policy 0, policy_version 740 (0.0014)
[2025-08-21 19:41:05,081][02158] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 4137.6). Total num frames: 3031040. Throughput: 0: 1025.2. Samples: 756976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:41:05,083][02158] Avg episode reward: [(0, '17.979')]
[2025-08-21 19:41:10,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3051520. Throughput: 0: 1029.3. Samples: 761934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:41:10,081][02158] Avg episode reward: [(0, '18.327')]
[2025-08-21 19:41:14,388][02335] Updated weights for policy 0, policy_version 750 (0.0021)
[2025-08-21 19:41:15,080][02158] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3072000. Throughput: 0: 1028.9. Samples: 768980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:41:15,084][02158] Avg episode reward: [(0, '18.867')]
[2025-08-21 19:41:20,080][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4137.7). Total num frames: 3092480. Throughput: 0: 1023.4. Samples: 772326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:41:20,085][02158] Avg episode reward: [(0, '19.830')]
[2025-08-21 19:41:25,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3108864. Throughput: 0: 1028.6. Samples: 777324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:41:25,081][02158] Avg episode reward: [(0, '19.722')]
[2025-08-21 19:41:25,102][02335] Updated weights for policy 0, policy_version 760 (0.0032)
[2025-08-21 19:41:30,080][02158] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3133440. Throughput: 0: 1029.7. Samples: 784458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:41:30,081][02158] Avg episode reward: [(0, '19.530')]
[2025-08-21 19:41:34,276][02335] Updated weights for policy 0, policy_version 770 (0.0022)
[2025-08-21 19:41:35,082][02158] Fps is (10 sec: 4504.7, 60 sec: 4095.9, 300 sec: 4123.7). Total num frames: 3153920. Throughput: 0: 1030.6. Samples: 787860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:41:35,083][02158] Avg episode reward: [(0, '21.378')]
[2025-08-21 19:41:40,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3174400. Throughput: 0: 1032.0. Samples: 792876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:41:40,085][02158] Avg episode reward: [(0, '21.307')]
[2025-08-21 19:41:44,327][02335] Updated weights for policy 0, policy_version 780 (0.0027)
[2025-08-21 19:41:45,080][02158] Fps is (10 sec: 4096.8, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3194880. Throughput: 0: 1033.4. Samples: 799942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:41:45,081][02158] Avg episode reward: [(0, '20.714')]
[2025-08-21 19:41:50,080][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3215360. Throughput: 0: 1029.0. Samples: 803278. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:41:50,081][02158] Avg episode reward: [(0, '20.585')]
[2025-08-21 19:41:54,832][02335] Updated weights for policy 0, policy_version 790 (0.0014)
[2025-08-21 19:41:55,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3235840. Throughput: 0: 1030.9. Samples: 808326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:41:55,081][02158] Avg episode reward: [(0, '19.940')]
[2025-08-21 19:41:55,088][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth...
[2025-08-21 19:41:55,207][02316] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000549_2248704.pth
[2025-08-21 19:42:00,080][02158] Fps is (10 sec: 4505.7, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3260416. Throughput: 0: 1030.0. Samples: 815330. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:42:00,081][02158] Avg episode reward: [(0, '17.558')]
[2025-08-21 19:42:04,032][02335] Updated weights for policy 0, policy_version 800 (0.0011)
[2025-08-21 19:42:05,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4123.8). Total num frames: 3276800. Throughput: 0: 1031.4. Samples: 818740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:42:05,084][02158] Avg episode reward: [(0, '17.162')]
[2025-08-21 19:42:10,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3297280. Throughput: 0: 1032.8. Samples: 823800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:42:10,082][02158] Avg episode reward: [(0, '18.265')]
[2025-08-21 19:42:13,906][02335] Updated weights for policy 0, policy_version 810 (0.0014)
[2025-08-21 19:42:15,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3321856. Throughput: 0: 1033.6. Samples: 830968. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:42:15,082][02158] Avg episode reward: [(0, '19.944')]
[2025-08-21 19:42:20,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3338240. Throughput: 0: 1030.3. Samples: 834222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:42:20,084][02158] Avg episode reward: [(0, '20.334')]
[2025-08-21 19:42:24,463][02335] Updated weights for policy 0, policy_version 820 (0.0018)
[2025-08-21 19:42:25,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3358720. Throughput: 0: 1031.6. Samples: 839300. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:42:25,081][02158] Avg episode reward: [(0, '22.194')]
[2025-08-21 19:42:25,096][02316] Saving new best policy, reward=22.194!
[2025-08-21 19:42:30,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3383296. Throughput: 0: 1030.0. Samples: 846290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:42:30,081][02158] Avg episode reward: [(0, '23.070')]
[2025-08-21 19:42:30,084][02316] Saving new best policy, reward=23.070!
[2025-08-21 19:42:34,335][02335] Updated weights for policy 0, policy_version 830 (0.0019)
[2025-08-21 19:42:35,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4109.9). Total num frames: 3399680. Throughput: 0: 1026.6. Samples: 849476. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:42:35,081][02158] Avg episode reward: [(0, '22.532')]
[2025-08-21 19:42:40,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3420160. Throughput: 0: 1028.3. Samples: 854600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:42:40,081][02158] Avg episode reward: [(0, '21.858')]
[2025-08-21 19:42:44,011][02335] Updated weights for policy 0, policy_version 840 (0.0012)
[2025-08-21 19:42:45,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3444736. Throughput: 0: 1031.8. Samples: 861762. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:42:45,084][02158] Avg episode reward: [(0, '20.251')]
[2025-08-21 19:42:50,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3461120. Throughput: 0: 1027.5. Samples: 864978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:42:50,081][02158] Avg episode reward: [(0, '19.968')]
[2025-08-21 19:42:54,464][02335] Updated weights for policy 0, policy_version 850 (0.0024)
[2025-08-21 19:42:55,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3481600. Throughput: 0: 1032.4. Samples: 870256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:42:55,081][02158] Avg episode reward: [(0, '20.787')]
[2025-08-21 19:43:00,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3506176. Throughput: 0: 1030.8. Samples: 877352. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:43:00,085][02158] Avg episode reward: [(0, '20.669')]
[2025-08-21 19:43:03,685][02335] Updated weights for policy 0, policy_version 860 (0.0014)
[2025-08-21 19:43:05,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3526656. Throughput: 0: 1030.4. Samples: 880588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:43:05,084][02158] Avg episode reward: [(0, '20.597')]
[2025-08-21 19:43:10,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3547136. Throughput: 0: 1039.0. Samples: 886056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:43:10,085][02158] Avg episode reward: [(0, '21.015')]
[2025-08-21 19:43:13,289][02335] Updated weights for policy 0, policy_version 870 (0.0021)
[2025-08-21 19:43:15,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3567616. Throughput: 0: 1041.2. Samples: 893142. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:43:15,085][02158] Avg episode reward: [(0, '19.756')]
[2025-08-21 19:43:20,080][02158] Fps is (10 sec: 4095.8, 60 sec: 4164.2, 300 sec: 4123.8). Total num frames: 3588096. Throughput: 0: 1038.0. Samples: 896188. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:43:20,084][02158] Avg episode reward: [(0, '19.390')]
[2025-08-21 19:43:23,844][02335] Updated weights for policy 0, policy_version 880 (0.0023)
[2025-08-21 19:43:25,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3608576. Throughput: 0: 1045.0. Samples: 901626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:43:25,084][02158] Avg episode reward: [(0, '20.778')]
[2025-08-21 19:43:30,080][02158] Fps is (10 sec: 4505.8, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3633152. Throughput: 0: 1042.5. Samples: 908676. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:43:30,081][02158] Avg episode reward: [(0, '22.068')]
[2025-08-21 19:43:32,844][02335] Updated weights for policy 0, policy_version 890 (0.0011)
[2025-08-21 19:43:35,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3649536. Throughput: 0: 1039.6. Samples: 911762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:43:35,085][02158] Avg episode reward: [(0, '22.330')]
[2025-08-21 19:43:40,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3670016. Throughput: 0: 1045.2. Samples: 917290. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:43:40,081][02158] Avg episode reward: [(0, '23.611')]
[2025-08-21 19:43:40,082][02316] Saving new best policy, reward=23.611!
[2025-08-21 19:43:42,970][02335] Updated weights for policy 0, policy_version 900 (0.0028)
[2025-08-21 19:43:45,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3694592. Throughput: 0: 1046.0. Samples: 924420. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:43:45,085][02158] Avg episode reward: [(0, '22.402')]
[2025-08-21 19:43:50,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3710976. Throughput: 0: 1038.5. Samples: 927320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:43:50,082][02158] Avg episode reward: [(0, '21.547')]
[2025-08-21 19:43:53,451][02335] Updated weights for policy 0, policy_version 910 (0.0026)
[2025-08-21 19:43:55,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3731456. Throughput: 0: 1042.3. Samples: 932960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:43:55,081][02158] Avg episode reward: [(0, '21.501')]
[2025-08-21 19:43:55,106][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000912_3735552.pth...
[2025-08-21 19:43:55,218][02316] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000670_2744320.pth
[2025-08-21 19:44:00,080][02158] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3756032. Throughput: 0: 1042.3. Samples: 940044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:44:00,082][02158] Avg episode reward: [(0, '19.866')]
[2025-08-21 19:44:02,325][02335] Updated weights for policy 0, policy_version 920 (0.0017)
[2025-08-21 19:44:05,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3776512. Throughput: 0: 1038.5. Samples: 942922. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:44:05,085][02158] Avg episode reward: [(0, '19.345')]
[2025-08-21 19:44:10,081][02158] Fps is (10 sec: 4095.6, 60 sec: 4164.2, 300 sec: 4123.8). Total num frames: 3796992. Throughput: 0: 1045.4. Samples: 948670. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:44:10,082][02158] Avg episode reward: [(0, '20.593')]
[2025-08-21 19:44:13,878][02335] Updated weights for policy 0, policy_version 930 (0.0014)
[2025-08-21 19:44:15,080][02158] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 3809280. Throughput: 0: 998.9. Samples: 953626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:44:15,084][02158] Avg episode reward: [(0, '19.446')]
[2025-08-21 19:44:20,080][02158] Fps is (10 sec: 2867.5, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3825664. Throughput: 0: 992.2. Samples: 956410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:44:20,081][02158] Avg episode reward: [(0, '18.481')]
[2025-08-21 19:44:24,709][02335] Updated weights for policy 0, policy_version 940 (0.0016)
[2025-08-21 19:44:25,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3850240. Throughput: 0: 996.4. Samples: 962126. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:44:25,088][02158] Avg episode reward: [(0, '18.672')]
[2025-08-21 19:44:30,080][02158] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 4096.1). Total num frames: 3874816. Throughput: 0: 999.8. Samples: 969410. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:44:30,085][02158] Avg episode reward: [(0, '18.944')]
[2025-08-21 19:44:34,811][02335] Updated weights for policy 0, policy_version 950 (0.0011)
[2025-08-21 19:44:35,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3891200. Throughput: 0: 995.8. Samples: 972132. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-08-21 19:44:35,081][02158] Avg episode reward: [(0, '17.985')]
[2025-08-21 19:44:40,080][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3911680. Throughput: 0: 1000.2. Samples: 977968. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:44:40,081][02158] Avg episode reward: [(0, '18.378')]
[2025-08-21 19:44:43,574][02335] Updated weights for policy 0, policy_version 960 (0.0013)
[2025-08-21 19:44:45,080][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 3936256. Throughput: 0: 1002.3. Samples: 985148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:44:45,085][02158] Avg episode reward: [(0, '21.339')]
[2025-08-21 19:44:50,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 3952640. Throughput: 0: 995.8. Samples: 987732. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:44:50,081][02158] Avg episode reward: [(0, '22.020')]
[2025-08-21 19:44:54,077][02335] Updated weights for policy 0, policy_version 970 (0.0023)
[2025-08-21 19:44:55,080][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3977216. Throughput: 0: 999.8. Samples: 993660. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:44:55,081][02158] Avg episode reward: [(0, '22.876')]
[2025-08-21 19:45:00,084][02158] Fps is (10 sec: 4913.2, 60 sec: 4095.7, 300 sec: 4123.7). Total num frames: 4001792. Throughput: 0: 1049.4. Samples: 1000854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:45:00,086][02158] Avg episode reward: [(0, '23.877')]
[2025-08-21 19:45:00,087][02316] Saving new best policy, reward=23.877!
[2025-08-21 19:45:01,315][02316] Stopping Batcher_0...
[2025-08-21 19:45:01,315][02316] Loop batcher_evt_loop terminating...
[2025-08-21 19:45:01,316][02158] Component Batcher_0 stopped!
[2025-08-21 19:45:01,321][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:01,406][02335] Weights refcount: 2 0
[2025-08-21 19:45:01,414][02158] Component InferenceWorker_p0-w0 stopped!
[2025-08-21 19:45:01,417][02335] Stopping InferenceWorker_p0-w0...
[2025-08-21 19:45:01,418][02335] Loop inference_proc0-0_evt_loop terminating...
[2025-08-21 19:45:01,501][02316] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000790_3235840.pth
[2025-08-21 19:45:01,509][02316] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:01,714][02158] Component LearnerWorker_p0 stopped!
[2025-08-21 19:45:01,717][02316] Stopping LearnerWorker_p0...
[2025-08-21 19:45:01,717][02316] Loop learner_proc0_evt_loop terminating...
[2025-08-21 19:45:01,886][02158] Component RolloutWorker_w4 stopped!
[2025-08-21 19:45:01,890][02333] Stopping RolloutWorker_w4...
[2025-08-21 19:45:01,890][02333] Loop rollout_proc4_evt_loop terminating...
[2025-08-21 19:45:01,934][02158] Component RolloutWorker_w0 stopped!
[2025-08-21 19:45:01,935][02329] Stopping RolloutWorker_w0...
[2025-08-21 19:45:01,947][02329] Loop rollout_proc0_evt_loop terminating...
[2025-08-21 19:45:01,998][02158] Component RolloutWorker_w6 stopped!
[2025-08-21 19:45:02,003][02336] Stopping RolloutWorker_w6...
[2025-08-21 19:45:02,003][02336] Loop rollout_proc6_evt_loop terminating...
[2025-08-21 19:45:02,043][02158] Component RolloutWorker_w2 stopped!
[2025-08-21 19:45:02,047][02331] Stopping RolloutWorker_w2...
[2025-08-21 19:45:02,048][02331] Loop rollout_proc2_evt_loop terminating...
[2025-08-21 19:45:02,166][02158] Component RolloutWorker_w3 stopped!
[2025-08-21 19:45:02,167][02332] Stopping RolloutWorker_w3...
[2025-08-21 19:45:02,168][02332] Loop rollout_proc3_evt_loop terminating...
[2025-08-21 19:45:02,179][02158] Component RolloutWorker_w7 stopped!
[2025-08-21 19:45:02,180][02337] Stopping RolloutWorker_w7...
[2025-08-21 19:45:02,180][02337] Loop rollout_proc7_evt_loop terminating...
[2025-08-21 19:45:02,217][02158] Component RolloutWorker_w1 stopped!
[2025-08-21 19:45:02,218][02330] Stopping RolloutWorker_w1...
[2025-08-21 19:45:02,219][02330] Loop rollout_proc1_evt_loop terminating...
[2025-08-21 19:45:02,231][02158] Component RolloutWorker_w5 stopped!
[2025-08-21 19:45:02,236][02158] Waiting for process learner_proc0 to stop...
[2025-08-21 19:45:02,237][02334] Stopping RolloutWorker_w5...
[2025-08-21 19:45:02,238][02334] Loop rollout_proc5_evt_loop terminating...
[2025-08-21 19:45:04,062][02158] Waiting for process inference_proc0-0 to join...
[2025-08-21 19:45:04,113][02158] Waiting for process rollout_proc0 to join...
[2025-08-21 19:45:06,275][02158] Waiting for process rollout_proc1 to join...
[2025-08-21 19:45:06,276][02158] Waiting for process rollout_proc2 to join...
[2025-08-21 19:45:06,281][02158] Waiting for process rollout_proc3 to join...
[2025-08-21 19:45:06,282][02158] Waiting for process rollout_proc4 to join...
[2025-08-21 19:45:06,283][02158] Waiting for process rollout_proc5 to join...
[2025-08-21 19:45:06,284][02158] Waiting for process rollout_proc6 to join...
[2025-08-21 19:45:06,285][02158] Waiting for process rollout_proc7 to join...
[2025-08-21 19:45:06,286][02158] Batcher 0 profile tree view:
batching: 25.2408, releasing_batches: 0.0231
[2025-08-21 19:45:06,287][02158] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0009
  wait_policy_total: 420.7426
update_model: 7.3242
  weight_update: 0.0013
one_step: 0.0024
  handle_policy_step: 529.3381
    deserialize: 13.8035, stack: 2.8135, obs_to_device_normalize: 112.7857, forward: 268.2359, send_messages: 27.3678
    prepare_outputs: 80.8961
      to_cpu: 51.1559
[2025-08-21 19:45:06,288][02158] Learner 0 profile tree view:
misc: 0.0043, prepare_batch: 12.3363
train: 70.8892
  epoch_init: 0.0047, minibatch_init: 0.0062, losses_postprocess: 0.6432, kl_divergence: 0.6006, after_optimizer: 32.4209
  calculate_losses: 25.0066
    losses_init: 0.0077, forward_head: 1.3103, bptt_initial: 16.4919, tail: 1.0465, advantages_returns: 0.2785, losses: 3.5946
    bptt: 2.0570
      bptt_forward_core: 1.9683
  update: 11.6119
    clip: 0.8903
[2025-08-21 19:45:06,289][02158] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2611, enqueue_policy_requests: 105.8157, env_step: 781.4104, overhead: 11.6536, complete_rollouts: 7.2810
save_policy_outputs: 18.2975
  split_output_tensors: 7.2448
[2025-08-21 19:45:06,290][02158] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3584, enqueue_policy_requests: 113.3558, env_step: 769.0173, overhead: 12.9300, complete_rollouts: 6.1323
save_policy_outputs: 18.2378
  split_output_tensors: 7.1571
[2025-08-21 19:45:06,292][02158] Loop Runner_EvtLoop terminating...
[2025-08-21 19:45:06,293][02158] Runner profile tree view:
main_loop: 1025.3570
[2025-08-21 19:45:06,294][02158] Collected {0: 4005888}, FPS: 3906.8
[2025-08-21 19:45:06,583][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 19:45:06,584][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 19:45:06,585][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 19:45:06,587][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 19:45:06,588][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 19:45:06,589][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 19:45:06,590][02158] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 19:45:06,591][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 19:45:06,592][02158] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-08-21 19:45:06,593][02158] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-08-21 19:45:06,594][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 19:45:06,595][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 19:45:06,596][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 19:45:06,597][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 19:45:06,598][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 19:45:06,626][02158] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:45:06,628][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:45:06,630][02158] RunningMeanStd input shape: (1,)
[2025-08-21 19:45:06,641][02158] ConvEncoder: input_channels=3
[2025-08-21 19:45:06,729][02158] Conv encoder output size: 512
[2025-08-21 19:45:06,730][02158] Policy head output size: 512
[2025-08-21 19:45:06,894][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:06,898][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:45:06,900][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:06,902][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:45:06,904][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:06,905][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:45:29,585][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 19:45:29,586][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 19:45:29,587][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 19:45:29,588][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 19:45:29,589][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 19:45:29,590][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 19:45:29,591][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 19:45:29,592][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 19:45:29,592][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 19:45:29,593][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 19:45:29,594][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 19:45:29,595][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 19:45:29,596][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 19:45:29,597][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 19:45:29,599][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 19:45:29,628][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:45:29,629][02158] RunningMeanStd input shape: (1,)
[2025-08-21 19:45:29,638][02158] ConvEncoder: input_channels=3
[2025-08-21 19:45:29,683][02158] Conv encoder output size: 512
[2025-08-21 19:45:29,684][02158] Policy head output size: 512
[2025-08-21 19:45:29,700][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:29,701][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:45:29,702][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:29,704][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:45:29,705][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:45:29,707][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:47:35,631][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 19:47:35,632][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 19:47:35,633][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 19:47:35,634][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 19:47:35,636][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 19:47:35,637][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 19:47:35,638][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 19:47:35,639][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 19:47:35,640][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 19:47:35,641][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 19:47:35,642][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 19:47:35,643][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 19:47:35,644][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 19:47:35,645][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 19:47:35,646][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 19:47:35,668][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:47:35,669][02158] RunningMeanStd input shape: (1,)
[2025-08-21 19:47:35,677][02158] ConvEncoder: input_channels=3
[2025-08-21 19:47:35,709][02158] Conv encoder output size: 512
[2025-08-21 19:47:35,710][02158] Policy head output size: 512
[2025-08-21 19:47:35,726][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:47:35,728][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:47:35,729][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:47:35,730][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:47:35,731][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:47:35,733][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:48:09,236][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 19:48:09,237][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 19:48:09,238][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 19:48:09,239][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 19:48:09,240][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 19:48:09,240][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 19:48:09,241][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 19:48:09,242][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 19:48:09,243][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 19:48:09,243][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 19:48:09,244][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 19:48:09,246][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 19:48:09,247][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 19:48:09,248][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 19:48:09,249][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 19:48:09,276][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:48:09,279][02158] RunningMeanStd input shape: (1,)
[2025-08-21 19:48:09,287][02158] ConvEncoder: input_channels=3
[2025-08-21 19:48:09,318][02158] Conv encoder output size: 512
[2025-08-21 19:48:09,319][02158] Policy head output size: 512
[2025-08-21 19:48:09,338][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:48:10,061][02158] Num frames 100...
[2025-08-21 19:48:10,185][02158] Num frames 200...
[2025-08-21 19:48:10,309][02158] Num frames 300...
[2025-08-21 19:48:10,432][02158] Num frames 400...
[2025-08-21 19:48:10,564][02158] Num frames 500...
[2025-08-21 19:48:10,692][02158] Num frames 600...
[2025-08-21 19:48:10,816][02158] Num frames 700...
[2025-08-21 19:48:10,940][02158] Num frames 800...
[2025-08-21 19:48:11,062][02158] Num frames 900...
[2025-08-21 19:48:11,196][02158] Avg episode rewards: #0: 22.600, true rewards: #0: 9.600
[2025-08-21 19:48:11,197][02158] Avg episode reward: 22.600, avg true_objective: 9.600
[2025-08-21 19:48:11,248][02158] Num frames 1000...
[2025-08-21 19:48:11,371][02158] Num frames 1100...
[2025-08-21 19:48:11,492][02158] Num frames 1200...
[2025-08-21 19:48:11,622][02158] Num frames 1300...
[2025-08-21 19:48:11,747][02158] Num frames 1400...
[2025-08-21 19:48:11,866][02158] Num frames 1500...
[2025-08-21 19:48:11,989][02158] Num frames 1600...
[2025-08-21 19:48:12,110][02158] Num frames 1700...
[2025-08-21 19:48:12,239][02158] Num frames 1800...
[2025-08-21 19:48:12,403][02158] Avg episode rewards: #0: 20.940, true rewards: #0: 9.440
[2025-08-21 19:48:12,404][02158] Avg episode reward: 20.940, avg true_objective: 9.440
[2025-08-21 19:48:12,423][02158] Num frames 1900...
[2025-08-21 19:48:12,540][02158] Num frames 2000...
[2025-08-21 19:48:12,670][02158] Num frames 2100...
[2025-08-21 19:48:12,788][02158] Num frames 2200...
[2025-08-21 19:48:12,905][02158] Num frames 2300...
[2025-08-21 19:48:13,028][02158] Num frames 2400...
[2025-08-21 19:48:13,150][02158] Num frames 2500...
[2025-08-21 19:48:13,273][02158] Num frames 2600...
[2025-08-21 19:48:13,396][02158] Num frames 2700...
[2025-08-21 19:48:13,518][02158] Avg episode rewards: #0: 19.507, true rewards: #0: 9.173
[2025-08-21 19:48:13,519][02158] Avg episode reward: 19.507, avg true_objective: 9.173
[2025-08-21 19:48:13,581][02158] Num frames 2800...
[2025-08-21 19:48:13,713][02158] Num frames 2900...
[2025-08-21 19:48:13,832][02158] Num frames 3000...
[2025-08-21 19:48:13,958][02158] Num frames 3100...
[2025-08-21 19:48:14,078][02158] Num frames 3200...
[2025-08-21 19:48:14,202][02158] Num frames 3300...
[2025-08-21 19:48:14,323][02158] Num frames 3400...
[2025-08-21 19:48:14,444][02158] Num frames 3500...
[2025-08-21 19:48:14,564][02158] Num frames 3600...
[2025-08-21 19:48:14,694][02158] Num frames 3700...
[2025-08-21 19:48:14,812][02158] Num frames 3800...
[2025-08-21 19:48:14,931][02158] Num frames 3900...
[2025-08-21 19:48:15,048][02158] Num frames 4000...
[2025-08-21 19:48:15,166][02158] Num frames 4100...
[2025-08-21 19:48:15,263][02158] Avg episode rewards: #0: 22.335, true rewards: #0: 10.335
[2025-08-21 19:48:15,264][02158] Avg episode reward: 22.335, avg true_objective: 10.335
[2025-08-21 19:48:15,346][02158] Num frames 4200...
[2025-08-21 19:48:15,466][02158] Num frames 4300...
[2025-08-21 19:48:15,590][02158] Num frames 4400...
[2025-08-21 19:48:15,724][02158] Num frames 4500...
[2025-08-21 19:48:15,846][02158] Num frames 4600...
[2025-08-21 19:48:15,967][02158] Num frames 4700...
[2025-08-21 19:48:16,087][02158] Num frames 4800...
[2025-08-21 19:48:16,211][02158] Num frames 4900...
[2025-08-21 19:48:16,341][02158] Num frames 5000...
[2025-08-21 19:48:16,468][02158] Num frames 5100...
[2025-08-21 19:48:16,589][02158] Num frames 5200...
[2025-08-21 19:48:16,712][02158] Num frames 5300...
[2025-08-21 19:48:16,844][02158] Num frames 5400...
[2025-08-21 19:48:16,975][02158] Num frames 5500...
[2025-08-21 19:48:17,110][02158] Num frames 5600...
[2025-08-21 19:48:17,234][02158] Num frames 5700...
[2025-08-21 19:48:17,390][02158] Avg episode rewards: #0: 25.962, true rewards: #0: 11.562
[2025-08-21 19:48:17,391][02158] Avg episode reward: 25.962, avg true_objective: 11.562
[2025-08-21 19:48:17,417][02158] Num frames 5800...
[2025-08-21 19:48:17,537][02158] Num frames 5900...
[2025-08-21 19:48:17,667][02158] Num frames 6000...
[2025-08-21 19:48:17,808][02158] Num frames 6100...
[2025-08-21 19:48:17,927][02158] Num frames 6200...
[2025-08-21 19:48:18,046][02158] Num frames 6300...
[2025-08-21 19:48:18,166][02158] Num frames 6400...
[2025-08-21 19:48:18,288][02158] Num frames 6500...
[2025-08-21 19:48:18,414][02158] Num frames 6600...
[2025-08-21 19:48:18,572][02158] Num frames 6700...
[2025-08-21 19:48:18,750][02158] Num frames 6800...
[2025-08-21 19:48:18,929][02158] Num frames 6900...
[2025-08-21 19:48:18,995][02158] Avg episode rewards: #0: 25.502, true rewards: #0: 11.502
[2025-08-21 19:48:18,996][02158] Avg episode reward: 25.502, avg true_objective: 11.502
[2025-08-21 19:48:19,188][02158] Num frames 7000...
[2025-08-21 19:48:19,405][02158] Num frames 7100...
[2025-08-21 19:48:19,576][02158] Num frames 7200...
[2025-08-21 19:48:19,751][02158] Num frames 7300...
[2025-08-21 19:48:19,947][02158] Num frames 7400...
[2025-08-21 19:48:20,126][02158] Num frames 7500...
[2025-08-21 19:48:20,203][02158] Avg episode rewards: #0: 23.299, true rewards: #0: 10.727
[2025-08-21 19:48:20,204][02158] Avg episode reward: 23.299, avg true_objective: 10.727
[2025-08-21 19:48:20,366][02158] Num frames 7600...
[2025-08-21 19:48:20,547][02158] Num frames 7700...
[2025-08-21 19:48:20,612][02158] Avg episode rewards: #0: 20.876, true rewards: #0: 9.626
[2025-08-21 19:48:20,613][02158] Avg episode reward: 20.876, avg true_objective: 9.626
[2025-08-21 19:48:20,775][02158] Num frames 7800...
[2025-08-21 19:48:20,898][02158] Num frames 7900...
[2025-08-21 19:48:21,029][02158] Num frames 8000...
[2025-08-21 19:48:21,153][02158] Num frames 8100...
[2025-08-21 19:48:21,294][02158] Num frames 8200...
[2025-08-21 19:48:21,457][02158] Avg episode rewards: #0: 19.641, true rewards: #0: 9.197
[2025-08-21 19:48:21,458][02158] Avg episode reward: 19.641, avg true_objective: 9.197
[2025-08-21 19:48:21,489][02158] Num frames 8300...
[2025-08-21 19:48:21,616][02158] Num frames 8400...
[2025-08-21 19:48:21,745][02158] Num frames 8500...
[2025-08-21 19:48:21,872][02158] Num frames 8600...
[2025-08-21 19:48:22,022][02158] Avg episode rewards: #0: 18.061, true rewards: #0: 8.661
[2025-08-21 19:48:22,023][02158] Avg episode reward: 18.061, avg true_objective: 8.661
[2025-08-21 19:49:14,491][02158] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-08-21 19:49:20,462][02158] The model has been pushed to https://huggingface.co/jmartin233/rl_course_vizdoom_health_gathering_supreme
[2025-08-21 19:50:28,916][02158] Environment doom_basic already registered, overwriting...
[2025-08-21 19:50:28,917][02158] Environment doom_two_colors_easy already registered, overwriting...
[2025-08-21 19:50:28,918][02158] Environment doom_two_colors_hard already registered, overwriting...
[2025-08-21 19:50:28,919][02158] Environment doom_dm already registered, overwriting...
[2025-08-21 19:50:28,920][02158] Environment doom_dwango5 already registered, overwriting...
[2025-08-21 19:50:28,921][02158] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-08-21 19:50:28,922][02158] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-08-21 19:50:28,924][02158] Environment doom_my_way_home already registered, overwriting...
[2025-08-21 19:50:28,924][02158] Environment doom_deadly_corridor already registered, overwriting...
[2025-08-21 19:50:28,926][02158] Environment doom_defend_the_center already registered, overwriting...
[2025-08-21 19:50:28,926][02158] Environment doom_defend_the_line already registered, overwriting...
[2025-08-21 19:50:28,927][02158] Environment doom_health_gathering already registered, overwriting...
[2025-08-21 19:50:28,927][02158] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-08-21 19:50:28,928][02158] Environment doom_battle already registered, overwriting...
[2025-08-21 19:50:28,929][02158] Environment doom_battle2 already registered, overwriting...
[2025-08-21 19:50:28,930][02158] Environment doom_duel_bots already registered, overwriting...
[2025-08-21 19:50:28,930][02158] Environment doom_deathmatch_bots already registered, overwriting...
[2025-08-21 19:50:28,931][02158] Environment doom_duel already registered, overwriting...
[2025-08-21 19:50:28,932][02158] Environment doom_deathmatch_full already registered, overwriting...
[2025-08-21 19:50:28,932][02158] Environment doom_benchmark already registered, overwriting...
[2025-08-21 19:50:28,933][02158] register_encoder_factory: <function make_vizdoom_encoder at 0x793079f458a0>
[2025-08-21 19:50:28,942][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 19:50:28,943][02158] Overriding arg 'train_for_env_steps' with value 5000000 passed from command line
[2025-08-21 19:50:28,948][02158] Experiment dir /content/train_dir/default_experiment already exists!
[2025-08-21 19:50:28,948][02158] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-08-21 19:50:28,949][02158] Weights and Biases integration disabled
[2025-08-21 19:50:28,952][02158] Environment var CUDA_VISIBLE_DEVICES is 0

[2025-08-21 19:50:31,047][02158] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=5000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-08-21 19:50:31,048][02158] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-08-21 19:50:31,050][02158] Rollout worker 0 uses device cpu
[2025-08-21 19:50:31,051][02158] Rollout worker 1 uses device cpu
[2025-08-21 19:50:31,052][02158] Rollout worker 2 uses device cpu
[2025-08-21 19:50:31,053][02158] Rollout worker 3 uses device cpu
[2025-08-21 19:50:31,055][02158] Rollout worker 4 uses device cpu
[2025-08-21 19:50:31,056][02158] Rollout worker 5 uses device cpu
[2025-08-21 19:50:31,057][02158] Rollout worker 6 uses device cpu
[2025-08-21 19:50:31,058][02158] Rollout worker 7 uses device cpu
[2025-08-21 19:50:31,125][02158] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:50:31,126][02158] InferenceWorker_p0-w0: min num requests: 2
[2025-08-21 19:50:31,155][02158] Starting all processes...
[2025-08-21 19:50:31,155][02158] Starting process learner_proc0
[2025-08-21 19:50:31,217][02158] Starting all processes...
[2025-08-21 19:50:31,222][02158] Starting process inference_proc0-0
[2025-08-21 19:50:31,222][02158] Starting process rollout_proc0
[2025-08-21 19:50:31,224][02158] Starting process rollout_proc1
[2025-08-21 19:50:31,224][02158] Starting process rollout_proc2
[2025-08-21 19:50:31,224][02158] Starting process rollout_proc3
[2025-08-21 19:50:31,224][02158] Starting process rollout_proc4
[2025-08-21 19:50:31,224][02158] Starting process rollout_proc5
[2025-08-21 19:50:31,224][02158] Starting process rollout_proc6
[2025-08-21 19:50:31,224][02158] Starting process rollout_proc7
[2025-08-21 19:50:46,821][11373] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:50:46,821][11373] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-08-21 19:50:46,954][11373] Num visible devices: 1
[2025-08-21 19:50:46,982][11373] Starting seed is not provided
[2025-08-21 19:50:46,982][11373] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:50:46,982][11373] Initializing actor-critic model on device cuda:0
[2025-08-21 19:50:46,982][11373] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:50:46,987][11373] RunningMeanStd input shape: (1,)
[2025-08-21 19:50:47,106][11373] ConvEncoder: input_channels=3
[2025-08-21 19:50:47,468][11386] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:50:47,470][11386] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-08-21 19:50:47,624][11386] Num visible devices: 1
[2025-08-21 19:50:47,658][11394] Worker 4 uses CPU cores [0]
[2025-08-21 19:50:47,686][11393] Worker 6 uses CPU cores [0]
[2025-08-21 19:50:47,721][11391] Worker 5 uses CPU cores [1]
[2025-08-21 19:50:47,812][11387] Worker 0 uses CPU cores [0]
[2025-08-21 19:50:47,831][11392] Worker 7 uses CPU cores [1]
[2025-08-21 19:50:47,837][11389] Worker 2 uses CPU cores [0]
[2025-08-21 19:50:47,906][11388] Worker 1 uses CPU cores [1]
[2025-08-21 19:50:47,941][11390] Worker 3 uses CPU cores [1]
[2025-08-21 19:50:47,950][11373] Conv encoder output size: 512
[2025-08-21 19:50:47,950][11373] Policy head output size: 512
[2025-08-21 19:50:47,972][11373] Created Actor Critic model with architecture:
[2025-08-21 19:50:47,972][11373] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-08-21 19:50:48,129][11373] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-08-21 19:50:49,134][11373] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:50:49,137][11373] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:50:49,138][11373] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:50:49,139][11373] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:50:49,139][11373] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-08-21 19:50:49,140][11373] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/serialization.py", line 1529, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-08-21 19:50:49,141][11373] Did not load from checkpoint, starting from scratch!
[2025-08-21 19:50:49,141][11373] Initialized policy 0 weights for model version 0
[2025-08-21 19:50:49,144][11373] LearnerWorker_p0 finished initialization!
[2025-08-21 19:50:49,145][11373] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-08-21 19:50:49,272][11386] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 19:50:49,273][11386] RunningMeanStd input shape: (1,)
[2025-08-21 19:50:49,283][11386] ConvEncoder: input_channels=3
[2025-08-21 19:50:49,378][11386] Conv encoder output size: 512
[2025-08-21 19:50:49,379][11386] Policy head output size: 512
[2025-08-21 19:50:49,422][02158] Inference worker 0-0 is ready!
[2025-08-21 19:50:49,423][02158] All inference workers are ready! Signal rollout workers to start!
[2025-08-21 19:50:49,647][11388] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:49,656][11389] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:49,669][11392] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:49,682][11393] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:49,723][11391] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:49,725][11390] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:49,730][11394] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:49,742][11387] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-21 19:50:50,979][11388] Decorrelating experience for 0 frames...
[2025-08-21 19:50:50,981][11392] Decorrelating experience for 0 frames...
[2025-08-21 19:50:50,993][11389] Decorrelating experience for 0 frames...
[2025-08-21 19:50:51,005][11391] Decorrelating experience for 0 frames...
[2025-08-21 19:50:51,017][11394] Decorrelating experience for 0 frames...
[2025-08-21 19:50:51,020][11387] Decorrelating experience for 0 frames...
[2025-08-21 19:50:51,118][02158] Heartbeat connected on Batcher_0
[2025-08-21 19:50:51,122][02158] Heartbeat connected on LearnerWorker_p0
[2025-08-21 19:50:51,161][02158] Heartbeat connected on InferenceWorker_p0-w0
[2025-08-21 19:50:52,227][11388] Decorrelating experience for 32 frames...
[2025-08-21 19:50:52,306][11387] Decorrelating experience for 32 frames...
[2025-08-21 19:50:52,343][11391] Decorrelating experience for 32 frames...
[2025-08-21 19:50:52,345][11394] Decorrelating experience for 32 frames...
[2025-08-21 19:50:52,372][11390] Decorrelating experience for 0 frames...
[2025-08-21 19:50:53,394][11389] Decorrelating experience for 32 frames...
[2025-08-21 19:50:53,607][11387] Decorrelating experience for 64 frames...
[2025-08-21 19:50:53,616][11392] Decorrelating experience for 32 frames...
[2025-08-21 19:50:53,686][11390] Decorrelating experience for 32 frames...
[2025-08-21 19:50:53,952][02158] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-08-21 19:50:54,069][11391] Decorrelating experience for 64 frames...
[2025-08-21 19:50:54,587][11394] Decorrelating experience for 64 frames...
[2025-08-21 19:50:54,606][11388] Decorrelating experience for 64 frames...
[2025-08-21 19:50:54,741][11393] Decorrelating experience for 0 frames...
[2025-08-21 19:50:54,879][11392] Decorrelating experience for 64 frames...
[2025-08-21 19:50:54,901][11387] Decorrelating experience for 96 frames...
[2025-08-21 19:50:55,054][02158] Heartbeat connected on RolloutWorker_w0
[2025-08-21 19:50:55,750][11393] Decorrelating experience for 32 frames...
[2025-08-21 19:50:56,169][11388] Decorrelating experience for 96 frames...
[2025-08-21 19:50:56,404][02158] Heartbeat connected on RolloutWorker_w1
[2025-08-21 19:50:56,505][11390] Decorrelating experience for 64 frames...
[2025-08-21 19:50:56,606][11392] Decorrelating experience for 96 frames...
[2025-08-21 19:50:56,894][02158] Heartbeat connected on RolloutWorker_w7
[2025-08-21 19:50:57,660][11391] Decorrelating experience for 96 frames...
[2025-08-21 19:50:57,728][11393] Decorrelating experience for 64 frames...
[2025-08-21 19:50:58,062][02158] Heartbeat connected on RolloutWorker_w5
[2025-08-21 19:50:58,952][02158] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 262.4. Samples: 1312. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-08-21 19:50:58,953][02158] Avg episode reward: [(0, '3.213')]
[2025-08-21 19:50:59,945][11390] Decorrelating experience for 96 frames...
[2025-08-21 19:50:59,971][11389] Decorrelating experience for 64 frames...
[2025-08-21 19:51:00,891][02158] Heartbeat connected on RolloutWorker_w3
[2025-08-21 19:51:02,749][11394] Decorrelating experience for 96 frames...
[2025-08-21 19:51:02,991][02158] Heartbeat connected on RolloutWorker_w4
[2025-08-21 19:51:03,952][02158] Fps is (10 sec: 1228.8, 60 sec: 1228.8, 300 sec: 1228.8). Total num frames: 12288. Throughput: 0: 203.8. Samples: 2038. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-08-21 19:51:03,956][02158] Avg episode reward: [(0, '3.349')]
[2025-08-21 19:51:04,106][11393] Decorrelating experience for 96 frames...
[2025-08-21 19:51:04,434][11389] Decorrelating experience for 96 frames...
[2025-08-21 19:51:04,597][02158] Heartbeat connected on RolloutWorker_w6
[2025-08-21 19:51:04,770][02158] Heartbeat connected on RolloutWorker_w2
[2025-08-21 19:51:08,952][02158] Fps is (10 sec: 3276.8, 60 sec: 2184.6, 300 sec: 2184.6). Total num frames: 32768. Throughput: 0: 507.1. Samples: 7606. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:51:08,953][02158] Avg episode reward: [(0, '3.774')]
[2025-08-21 19:51:10,335][11386] Updated weights for policy 0, policy_version 10 (0.0012)
[2025-08-21 19:51:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 53248. Throughput: 0: 716.7. Samples: 14334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:51:13,956][02158] Avg episode reward: [(0, '4.148')]
[2025-08-21 19:51:18,952][02158] Fps is (10 sec: 4096.0, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 73728. Throughput: 0: 658.3. Samples: 16458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:51:18,953][02158] Avg episode reward: [(0, '4.275')]
[2025-08-21 19:51:20,719][11386] Updated weights for policy 0, policy_version 20 (0.0044)
[2025-08-21 19:51:23,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 94208. Throughput: 0: 765.6. Samples: 22968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:51:23,953][02158] Avg episode reward: [(0, '4.449')]
[2025-08-21 19:51:28,956][02158] Fps is (10 sec: 4094.3, 60 sec: 3276.4, 300 sec: 3276.4). Total num frames: 114688. Throughput: 0: 831.4. Samples: 29104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:51:28,962][02158] Avg episode reward: [(0, '4.313')]
[2025-08-21 19:51:28,967][11373] Saving new best policy, reward=4.313!
[2025-08-21 19:51:31,646][11386] Updated weights for policy 0, policy_version 30 (0.0020)
[2025-08-21 19:51:33,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 131072. Throughput: 0: 780.4. Samples: 31216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:51:33,957][02158] Avg episode reward: [(0, '4.347')]
[2025-08-21 19:51:33,960][11373] Saving new best policy, reward=4.347!
[2025-08-21 19:51:38,952][02158] Fps is (10 sec: 4097.7, 60 sec: 3458.9, 300 sec: 3458.9). Total num frames: 155648. Throughput: 0: 843.1. Samples: 37938. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:51:38,956][02158] Avg episode reward: [(0, '4.473')]
[2025-08-21 19:51:38,962][11373] Saving new best policy, reward=4.473!
[2025-08-21 19:51:40,505][11386] Updated weights for policy 0, policy_version 40 (0.0018)
[2025-08-21 19:51:43,954][02158] Fps is (10 sec: 4504.8, 60 sec: 3522.4, 300 sec: 3522.4). Total num frames: 176128. Throughput: 0: 958.7. Samples: 44454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:51:43,955][02158] Avg episode reward: [(0, '4.363')]
[2025-08-21 19:51:48,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3500.2, 300 sec: 3500.2). Total num frames: 192512. Throughput: 0: 990.9. Samples: 46628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:51:48,954][02158] Avg episode reward: [(0, '4.401')]
[2025-08-21 19:51:51,016][11386] Updated weights for policy 0, policy_version 50 (0.0021)
[2025-08-21 19:51:53,952][02158] Fps is (10 sec: 4096.8, 60 sec: 3618.1, 300 sec: 3618.1). Total num frames: 217088. Throughput: 0: 1024.1. Samples: 53690. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:51:53,956][02158] Avg episode reward: [(0, '4.474')]
[2025-08-21 19:51:53,959][11373] Saving new best policy, reward=4.474!
[2025-08-21 19:51:58,953][02158] Fps is (10 sec: 4505.1, 60 sec: 3959.4, 300 sec: 3654.8). Total num frames: 237568. Throughput: 0: 1010.5. Samples: 59806. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:51:58,954][02158] Avg episode reward: [(0, '4.491')]
[2025-08-21 19:51:58,958][11373] Saving new best policy, reward=4.491!
[2025-08-21 19:52:01,362][11386] Updated weights for policy 0, policy_version 60 (0.0016)
[2025-08-21 19:52:03,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3627.9). Total num frames: 253952. Throughput: 0: 1009.4. Samples: 61882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 19:52:03,954][02158] Avg episode reward: [(0, '4.529')]
[2025-08-21 19:52:03,955][11373] Saving new best policy, reward=4.529!
[2025-08-21 19:52:08,952][02158] Fps is (10 sec: 4096.4, 60 sec: 4096.0, 300 sec: 3713.7). Total num frames: 278528. Throughput: 0: 1022.8. Samples: 68992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:52:08,953][02158] Avg episode reward: [(0, '4.425')]
[2025-08-21 19:52:10,083][11386] Updated weights for policy 0, policy_version 70 (0.0023)
[2025-08-21 19:52:13,952][02158] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3737.6). Total num frames: 299008. Throughput: 0: 1027.8. Samples: 75350. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-08-21 19:52:13,953][02158] Avg episode reward: [(0, '4.430')]
[2025-08-21 19:52:18,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3758.7). Total num frames: 319488. Throughput: 0: 1031.0. Samples: 77612. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:52:18,956][02158] Avg episode reward: [(0, '4.325')]
[2025-08-21 19:52:20,621][11386] Updated weights for policy 0, policy_version 80 (0.0018)
[2025-08-21 19:52:23,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3777.4). Total num frames: 339968. Throughput: 0: 1039.8. Samples: 84730. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:52:23,956][02158] Avg episode reward: [(0, '4.423')]
[2025-08-21 19:52:28,953][02158] Fps is (10 sec: 4095.6, 60 sec: 4096.2, 300 sec: 3794.2). Total num frames: 360448. Throughput: 0: 1030.5. Samples: 90826. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:52:28,955][02158] Avg episode reward: [(0, '4.404')]
[2025-08-21 19:52:28,965][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000088_360448.pth...
[2025-08-21 19:52:29,121][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000088_360448.pth
[2025-08-21 19:52:31,090][11386] Updated weights for policy 0, policy_version 90 (0.0032)
[2025-08-21 19:52:33,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3809.3). Total num frames: 380928. Throughput: 0: 1032.2. Samples: 93076. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:52:33,955][02158] Avg episode reward: [(0, '4.345')]
[2025-08-21 19:52:38,952][02158] Fps is (10 sec: 4506.0, 60 sec: 4164.3, 300 sec: 3862.0). Total num frames: 405504. Throughput: 0: 1033.7. Samples: 100208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:52:38,956][02158] Avg episode reward: [(0, '4.407')]
[2025-08-21 19:52:39,783][11386] Updated weights for policy 0, policy_version 100 (0.0013)
[2025-08-21 19:52:43,953][02158] Fps is (10 sec: 4095.7, 60 sec: 4096.1, 300 sec: 3835.3). Total num frames: 421888. Throughput: 0: 1031.8. Samples: 106236. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:52:43,954][02158] Avg episode reward: [(0, '4.560')]
[2025-08-21 19:52:43,960][11373] Saving new best policy, reward=4.560!
[2025-08-21 19:52:48,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3846.7). Total num frames: 442368. Throughput: 0: 1036.0. Samples: 108504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:52:48,953][02158] Avg episode reward: [(0, '4.643')]
[2025-08-21 19:52:48,958][11373] Saving new best policy, reward=4.643!
[2025-08-21 19:52:51,011][11386] Updated weights for policy 0, policy_version 110 (0.0022)
[2025-08-21 19:52:53,952][02158] Fps is (10 sec: 3686.7, 60 sec: 4027.7, 300 sec: 3822.9). Total num frames: 458752. Throughput: 0: 1003.1. Samples: 114130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:52:53,957][02158] Avg episode reward: [(0, '4.656')]
[2025-08-21 19:52:53,960][11373] Saving new best policy, reward=4.656!
[2025-08-21 19:52:58,952][02158] Fps is (10 sec: 2867.2, 60 sec: 3891.3, 300 sec: 3768.3). Total num frames: 471040. Throughput: 0: 965.4. Samples: 118794. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:52:58,953][02158] Avg episode reward: [(0, '4.468')]
[2025-08-21 19:53:03,544][11386] Updated weights for policy 0, policy_version 120 (0.0029)
[2025-08-21 19:53:03,952][02158] Fps is (10 sec: 3276.8, 60 sec: 3959.5, 300 sec: 3780.9). Total num frames: 491520. Throughput: 0: 970.0. Samples: 121260. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:53:03,953][02158] Avg episode reward: [(0, '4.417')]
[2025-08-21 19:53:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3792.6). Total num frames: 512000. Throughput: 0: 959.7. Samples: 127916. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:53:08,956][02158] Avg episode reward: [(0, '4.505')]
[2025-08-21 19:53:13,673][11386] Updated weights for policy 0, policy_version 130 (0.0020)
[2025-08-21 19:53:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3803.4). Total num frames: 532480. Throughput: 0: 951.4. Samples: 133640. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:53:13,954][02158] Avg episode reward: [(0, '4.564')]
[2025-08-21 19:53:18,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3813.5). Total num frames: 552960. Throughput: 0: 957.3. Samples: 136154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:53:18,957][02158] Avg episode reward: [(0, '4.709')]
[2025-08-21 19:53:18,964][11373] Saving new best policy, reward=4.709!
[2025-08-21 19:53:23,114][11386] Updated weights for policy 0, policy_version 140 (0.0013)
[2025-08-21 19:53:23,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3822.9). Total num frames: 573440. Throughput: 0: 953.2. Samples: 143100. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:53:23,953][02158] Avg episode reward: [(0, '4.582')]
[2025-08-21 19:53:28,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3805.3). Total num frames: 589824. Throughput: 0: 942.7. Samples: 148658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:53:28,953][02158] Avg episode reward: [(0, '4.401')]
[2025-08-21 19:53:33,764][11386] Updated weights for policy 0, policy_version 150 (0.0026)
[2025-08-21 19:53:33,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3840.0). Total num frames: 614400. Throughput: 0: 950.8. Samples: 151292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:53:33,953][02158] Avg episode reward: [(0, '4.551')]
[2025-08-21 19:53:38,952][02158] Fps is (10 sec: 4505.6, 60 sec: 3822.9, 300 sec: 3847.8). Total num frames: 634880. Throughput: 0: 982.9. Samples: 158360. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:53:38,957][02158] Avg episode reward: [(0, '4.554')]
[2025-08-21 19:53:43,938][11386] Updated weights for policy 0, policy_version 160 (0.0012)
[2025-08-21 19:53:43,954][02158] Fps is (10 sec: 4095.1, 60 sec: 3891.1, 300 sec: 3855.0). Total num frames: 655360. Throughput: 0: 1004.9. Samples: 164016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:53:43,958][02158] Avg episode reward: [(0, '4.429')]
[2025-08-21 19:53:48,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3861.9). Total num frames: 675840. Throughput: 0: 1014.9. Samples: 166930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:53:48,954][02158] Avg episode reward: [(0, '4.520')]
[2025-08-21 19:53:52,942][11386] Updated weights for policy 0, policy_version 170 (0.0012)
[2025-08-21 19:53:53,952][02158] Fps is (10 sec: 4506.6, 60 sec: 4027.7, 300 sec: 3891.2). Total num frames: 700416. Throughput: 0: 1025.7. Samples: 174072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:53:53,953][02158] Avg episode reward: [(0, '4.569')]
[2025-08-21 19:53:58,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3874.6). Total num frames: 716800. Throughput: 0: 1017.8. Samples: 179440. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:53:58,953][02158] Avg episode reward: [(0, '4.627')]
[2025-08-21 19:54:03,581][11386] Updated weights for policy 0, policy_version 180 (0.0016)
[2025-08-21 19:54:03,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3880.4). Total num frames: 737280. Throughput: 0: 1026.6. Samples: 182352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:54:03,953][02158] Avg episode reward: [(0, '4.713')]
[2025-08-21 19:54:03,955][11373] Saving new best policy, reward=4.713!
[2025-08-21 19:54:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3885.9). Total num frames: 757760. Throughput: 0: 1025.2. Samples: 189234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:54:08,956][02158] Avg episode reward: [(0, '4.681')]
[2025-08-21 19:54:13,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3870.7). Total num frames: 774144. Throughput: 0: 1015.9. Samples: 194372. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:54:13,953][02158] Avg episode reward: [(0, '4.603')]
[2025-08-21 19:54:14,429][11386] Updated weights for policy 0, policy_version 190 (0.0024)
[2025-08-21 19:54:18,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3896.2). Total num frames: 798720. Throughput: 0: 1025.7. Samples: 197448. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:54:18,954][02158] Avg episode reward: [(0, '4.365')]
[2025-08-21 19:54:23,241][11386] Updated weights for policy 0, policy_version 200 (0.0020)
[2025-08-21 19:54:23,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3901.0). Total num frames: 819200. Throughput: 0: 1023.3. Samples: 204408. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:54:23,956][02158] Avg episode reward: [(0, '4.449')]
[2025-08-21 19:54:28,955][02158] Fps is (10 sec: 3685.4, 60 sec: 4095.8, 300 sec: 3886.4). Total num frames: 835584. Throughput: 0: 1010.2. Samples: 209476. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:54:28,956][02158] Avg episode reward: [(0, '4.520')]
[2025-08-21 19:54:28,967][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000204_835584.pth...
[2025-08-21 19:54:29,131][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000204_835584.pth
[2025-08-21 19:54:33,952][02158] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3891.2). Total num frames: 856064. Throughput: 0: 1014.8. Samples: 212598. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-08-21 19:54:33,953][02158] Avg episode reward: [(0, '4.653')]
[2025-08-21 19:54:34,016][11386] Updated weights for policy 0, policy_version 210 (0.0028)
[2025-08-21 19:54:38,952][02158] Fps is (10 sec: 4506.9, 60 sec: 4096.0, 300 sec: 3914.0). Total num frames: 880640. Throughput: 0: 1014.0. Samples: 219702. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:54:38,956][02158] Avg episode reward: [(0, '4.551')]
[2025-08-21 19:54:43,952][02158] Fps is (10 sec: 4096.1, 60 sec: 4027.9, 300 sec: 3900.1). Total num frames: 897024. Throughput: 0: 1013.6. Samples: 225050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:54:43,954][02158] Avg episode reward: [(0, '4.337')]
[2025-08-21 19:54:44,534][11386] Updated weights for policy 0, policy_version 220 (0.0018)
[2025-08-21 19:54:48,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3921.7). Total num frames: 921600. Throughput: 0: 1018.4. Samples: 228182. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:54:48,957][02158] Avg episode reward: [(0, '4.568')]
[2025-08-21 19:54:53,195][11386] Updated weights for policy 0, policy_version 230 (0.0023)
[2025-08-21 19:54:53,952][02158] Fps is (10 sec: 4915.3, 60 sec: 4096.0, 300 sec: 3942.4). Total num frames: 946176. Throughput: 0: 1024.6. Samples: 235340. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:54:53,956][02158] Avg episode reward: [(0, '4.938')]
[2025-08-21 19:54:53,959][11373] Saving new best policy, reward=4.938!
[2025-08-21 19:54:58,952][02158] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3912.1). Total num frames: 958464. Throughput: 0: 1023.3. Samples: 240420. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-08-21 19:54:58,955][02158] Avg episode reward: [(0, '4.913')]
[2025-08-21 19:55:03,866][11386] Updated weights for policy 0, policy_version 240 (0.0011)
[2025-08-21 19:55:03,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3932.2). Total num frames: 983040. Throughput: 0: 1026.8. Samples: 243652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:55:03,955][02158] Avg episode reward: [(0, '4.589')]
[2025-08-21 19:55:08,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3935.4). Total num frames: 1003520. Throughput: 0: 1029.1. Samples: 250716. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-08-21 19:55:08,957][02158] Avg episode reward: [(0, '4.424')]
[2025-08-21 19:55:13,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3922.7). Total num frames: 1019904. Throughput: 0: 1031.3. Samples: 255880. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:55:13,953][02158] Avg episode reward: [(0, '4.468')]
[2025-08-21 19:55:14,400][11386] Updated weights for policy 0, policy_version 250 (0.0019)
[2025-08-21 19:55:18,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3941.4). Total num frames: 1044480. Throughput: 0: 1035.1. Samples: 259176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:55:18,957][02158] Avg episode reward: [(0, '4.453')]
[2025-08-21 19:55:22,940][11386] Updated weights for policy 0, policy_version 260 (0.0014)
[2025-08-21 19:55:23,955][02158] Fps is (10 sec: 4913.8, 60 sec: 4164.1, 300 sec: 3959.4). Total num frames: 1069056. Throughput: 0: 1035.2. Samples: 266290. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:55:23,960][02158] Avg episode reward: [(0, '4.380')]
[2025-08-21 19:55:28,955][02158] Fps is (10 sec: 3685.3, 60 sec: 4096.0, 300 sec: 3932.1). Total num frames: 1081344. Throughput: 0: 1028.3. Samples: 271326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:55:28,956][02158] Avg episode reward: [(0, '4.429')]
[2025-08-21 19:55:33,637][11386] Updated weights for policy 0, policy_version 270 (0.0023)
[2025-08-21 19:55:33,952][02158] Fps is (10 sec: 3687.5, 60 sec: 4164.3, 300 sec: 3949.7). Total num frames: 1105920. Throughput: 0: 1030.9. Samples: 274574. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:55:33,955][02158] Avg episode reward: [(0, '4.567')]
[2025-08-21 19:55:38,952][02158] Fps is (10 sec: 4916.8, 60 sec: 4164.3, 300 sec: 3966.7). Total num frames: 1130496. Throughput: 0: 1029.0. Samples: 281644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:55:38,956][02158] Avg episode reward: [(0, '4.547')]
[2025-08-21 19:55:43,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3940.6). Total num frames: 1142784. Throughput: 0: 1026.8. Samples: 286624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:55:43,954][02158] Avg episode reward: [(0, '4.782')]
[2025-08-21 19:55:44,137][11386] Updated weights for policy 0, policy_version 280 (0.0016)
[2025-08-21 19:55:48,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 1167360. Throughput: 0: 1030.0. Samples: 290004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:55:48,956][02158] Avg episode reward: [(0, '4.779')]
[2025-08-21 19:55:52,941][11386] Updated weights for policy 0, policy_version 290 (0.0018)
[2025-08-21 19:55:53,955][02158] Fps is (10 sec: 4913.8, 60 sec: 4095.8, 300 sec: 4040.4). Total num frames: 1191936. Throughput: 0: 1029.6. Samples: 297050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:55:53,956][02158] Avg episode reward: [(0, '4.813')]
[2025-08-21 19:55:58,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1204224. Throughput: 0: 1018.0. Samples: 301692. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:55:58,957][02158] Avg episode reward: [(0, '4.948')]
[2025-08-21 19:55:58,963][11373] Saving new best policy, reward=4.948!
[2025-08-21 19:56:03,781][11386] Updated weights for policy 0, policy_version 300 (0.0014)
[2025-08-21 19:56:03,952][02158] Fps is (10 sec: 3687.5, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1228800. Throughput: 0: 1017.8. Samples: 304978. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:56:03,956][02158] Avg episode reward: [(0, '4.890')]
[2025-08-21 19:56:08,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1249280. Throughput: 0: 1017.3. Samples: 312064. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:56:08,955][02158] Avg episode reward: [(0, '4.846')]
[2025-08-21 19:56:13,953][02158] Fps is (10 sec: 3686.1, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 1265664. Throughput: 0: 1014.0. Samples: 316954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:56:13,960][02158] Avg episode reward: [(0, '4.947')]
[2025-08-21 19:56:14,339][11386] Updated weights for policy 0, policy_version 310 (0.0014)
[2025-08-21 19:56:18,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1290240. Throughput: 0: 1021.3. Samples: 320532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:56:18,954][02158] Avg episode reward: [(0, '4.840')]
[2025-08-21 19:56:23,035][11386] Updated weights for policy 0, policy_version 320 (0.0017)
[2025-08-21 19:56:23,953][02158] Fps is (10 sec: 4505.4, 60 sec: 4027.9, 300 sec: 4054.4). Total num frames: 1310720. Throughput: 0: 1020.2. Samples: 327552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:56:23,954][02158] Avg episode reward: [(0, '4.695')]
[2025-08-21 19:56:28,952][02158] Fps is (10 sec: 3686.5, 60 sec: 4096.2, 300 sec: 4054.3). Total num frames: 1327104. Throughput: 0: 1013.4. Samples: 332228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:56:28,953][02158] Avg episode reward: [(0, '4.691')]
[2025-08-21 19:56:28,959][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_1327104.pth...
[2025-08-21 19:56:29,067][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000324_1327104.pth
[2025-08-21 19:56:33,948][11386] Updated weights for policy 0, policy_version 330 (0.0018)
[2025-08-21 19:56:33,952][02158] Fps is (10 sec: 4096.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1351680. Throughput: 0: 1011.8. Samples: 335534. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:56:33,953][02158] Avg episode reward: [(0, '4.550')]
[2025-08-21 19:56:38,957][02158] Fps is (10 sec: 4503.4, 60 sec: 4027.4, 300 sec: 4054.3). Total num frames: 1372160. Throughput: 0: 1012.8. Samples: 342628. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:56:38,968][02158] Avg episode reward: [(0, '4.702')]
[2025-08-21 19:56:43,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1388544. Throughput: 0: 1017.9. Samples: 347496. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:56:43,955][02158] Avg episode reward: [(0, '4.746')]
[2025-08-21 19:56:44,504][11386] Updated weights for policy 0, policy_version 340 (0.0022)
[2025-08-21 19:56:48,952][02158] Fps is (10 sec: 3688.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1409024. Throughput: 0: 1023.0. Samples: 351012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:56:48,957][02158] Avg episode reward: [(0, '4.895')]
[2025-08-21 19:56:53,382][11386] Updated weights for policy 0, policy_version 350 (0.0018)
[2025-08-21 19:56:53,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.9, 300 sec: 4054.4). Total num frames: 1433600. Throughput: 0: 1022.9. Samples: 358094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:56:53,955][02158] Avg episode reward: [(0, '4.852')]
[2025-08-21 19:56:58,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1449984. Throughput: 0: 1019.3. Samples: 362824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:56:58,953][02158] Avg episode reward: [(0, '4.796')]
[2025-08-21 19:57:03,838][11386] Updated weights for policy 0, policy_version 360 (0.0015)
[2025-08-21 19:57:03,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1474560. Throughput: 0: 1019.2. Samples: 366398. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:57:03,955][02158] Avg episode reward: [(0, '4.977')]
[2025-08-21 19:57:03,960][11373] Saving new best policy, reward=4.977!
[2025-08-21 19:57:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1490944. Throughput: 0: 1017.4. Samples: 373336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:57:08,953][02158] Avg episode reward: [(0, '4.857')]
[2025-08-21 19:57:13,952][02158] Fps is (10 sec: 2867.3, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1503232. Throughput: 0: 997.7. Samples: 377126. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:57:13,953][02158] Avg episode reward: [(0, '4.617')]
[2025-08-21 19:57:16,202][11386] Updated weights for policy 0, policy_version 370 (0.0022)
[2025-08-21 19:57:18,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1527808. Throughput: 0: 983.4. Samples: 379786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:57:18,956][02158] Avg episode reward: [(0, '4.567')]
[2025-08-21 19:57:23,952][02158] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1548288. Throughput: 0: 978.4. Samples: 386652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:57:23,953][02158] Avg episode reward: [(0, '4.531')]
[2025-08-21 19:57:25,099][11386] Updated weights for policy 0, policy_version 380 (0.0015)
[2025-08-21 19:57:28,954][02158] Fps is (10 sec: 3685.7, 60 sec: 3959.3, 300 sec: 4012.7). Total num frames: 1564672. Throughput: 0: 996.4. Samples: 392336. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:57:28,955][02158] Avg episode reward: [(0, '4.548')]
[2025-08-21 19:57:33,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 1585152. Throughput: 0: 975.0. Samples: 394888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:57:33,956][02158] Avg episode reward: [(0, '4.784')]
[2025-08-21 19:57:35,683][11386] Updated weights for policy 0, policy_version 390 (0.0020)
[2025-08-21 19:57:38,952][02158] Fps is (10 sec: 4506.4, 60 sec: 3959.8, 300 sec: 4026.6). Total num frames: 1609728. Throughput: 0: 975.2. Samples: 401980. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:57:38,953][02158] Avg episode reward: [(0, '4.840')]
[2025-08-21 19:57:43,953][02158] Fps is (10 sec: 4095.5, 60 sec: 3959.4, 300 sec: 4012.7). Total num frames: 1626112. Throughput: 0: 995.1. Samples: 407604. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:57:43,958][02158] Avg episode reward: [(0, '5.066')]
[2025-08-21 19:57:44,021][11373] Saving new best policy, reward=5.066!
[2025-08-21 19:57:46,230][11386] Updated weights for policy 0, policy_version 400 (0.0012)
[2025-08-21 19:57:48,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1650688. Throughput: 0: 973.9. Samples: 410222. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:57:48,954][02158] Avg episode reward: [(0, '5.127')]
[2025-08-21 19:57:48,964][11373] Saving new best policy, reward=5.127!
[2025-08-21 19:57:53,952][02158] Fps is (10 sec: 4506.1, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 1671168. Throughput: 0: 976.6. Samples: 417282. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:57:53,953][02158] Avg episode reward: [(0, '4.903')]
[2025-08-21 19:57:54,921][11386] Updated weights for policy 0, policy_version 410 (0.0020)
[2025-08-21 19:57:58,952][02158] Fps is (10 sec: 3686.3, 60 sec: 3959.4, 300 sec: 4054.3). Total num frames: 1687552. Throughput: 0: 1017.3. Samples: 422904. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:57:58,954][02158] Avg episode reward: [(0, '4.923')]
[2025-08-21 19:58:03,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 1712128. Throughput: 0: 1021.1. Samples: 425734. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:58:03,955][02158] Avg episode reward: [(0, '5.177')]
[2025-08-21 19:58:03,958][11373] Saving new best policy, reward=5.177!
[2025-08-21 19:58:05,814][11386] Updated weights for policy 0, policy_version 420 (0.0014)
[2025-08-21 19:58:08,952][02158] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1732608. Throughput: 0: 1018.3. Samples: 432476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:58:08,956][02158] Avg episode reward: [(0, '5.248')]
[2025-08-21 19:58:08,963][11373] Saving new best policy, reward=5.248!
[2025-08-21 19:58:13,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1748992. Throughput: 0: 1015.2. Samples: 438016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:58:13,957][02158] Avg episode reward: [(0, '5.500')]
[2025-08-21 19:58:13,959][11373] Saving new best policy, reward=5.500!
[2025-08-21 19:58:16,412][11386] Updated weights for policy 0, policy_version 430 (0.0027)
[2025-08-21 19:58:18,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1769472. Throughput: 0: 1021.2. Samples: 440842. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:58:18,957][02158] Avg episode reward: [(0, '5.619')]
[2025-08-21 19:58:19,033][11373] Saving new best policy, reward=5.619!
[2025-08-21 19:58:23,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1794048. Throughput: 0: 1017.6. Samples: 447770. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:58:23,953][02158] Avg episode reward: [(0, '6.154')]
[2025-08-21 19:58:23,954][11373] Saving new best policy, reward=6.154!
[2025-08-21 19:58:25,354][11386] Updated weights for policy 0, policy_version 440 (0.0016)
[2025-08-21 19:58:28,954][02158] Fps is (10 sec: 4095.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1810432. Throughput: 0: 1009.1. Samples: 453016. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:58:28,957][02158] Avg episode reward: [(0, '6.782')]
[2025-08-21 19:58:28,967][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000442_1810432.pth...
[2025-08-21 19:58:29,113][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000442_1810432.pth
[2025-08-21 19:58:29,127][11373] Saving new best policy, reward=6.782!
[2025-08-21 19:58:33,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1830912. Throughput: 0: 1014.8. Samples: 455890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 19:58:33,957][02158] Avg episode reward: [(0, '6.613')]
[2025-08-21 19:58:36,032][11386] Updated weights for policy 0, policy_version 450 (0.0024)
[2025-08-21 19:58:38,952][02158] Fps is (10 sec: 4506.7, 60 sec: 4096.0, 300 sec: 4068.3). Total num frames: 1855488. Throughput: 0: 1016.2. Samples: 463012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:58:38,953][02158] Avg episode reward: [(0, '6.566')]
[2025-08-21 19:58:43,954][02158] Fps is (10 sec: 4095.3, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1871872. Throughput: 0: 1009.5. Samples: 468332. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:58:43,955][02158] Avg episode reward: [(0, '6.783')]
[2025-08-21 19:58:43,958][11373] Saving new best policy, reward=6.783!
[2025-08-21 19:58:46,739][11386] Updated weights for policy 0, policy_version 460 (0.0034)
[2025-08-21 19:58:48,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1892352. Throughput: 0: 1014.1. Samples: 471368. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:58:48,955][02158] Avg episode reward: [(0, '7.128')]
[2025-08-21 19:58:48,965][11373] Saving new best policy, reward=7.128!
[2025-08-21 19:58:53,952][02158] Fps is (10 sec: 4506.2, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1916928. Throughput: 0: 1019.1. Samples: 478334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:58:53,957][02158] Avg episode reward: [(0, '6.561')]
[2025-08-21 19:58:56,114][11386] Updated weights for policy 0, policy_version 470 (0.0016)
[2025-08-21 19:58:58,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 1933312. Throughput: 0: 1012.9. Samples: 483598. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:58:58,955][02158] Avg episode reward: [(0, '6.267')]
[2025-08-21 19:59:03,952][02158] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1953792. Throughput: 0: 1014.6. Samples: 486500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 19:59:03,953][02158] Avg episode reward: [(0, '6.402')]
[2025-08-21 19:59:06,168][11386] Updated weights for policy 0, policy_version 480 (0.0023)
[2025-08-21 19:59:08,952][02158] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1978368. Throughput: 0: 1019.7. Samples: 493658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:59:08,957][02158] Avg episode reward: [(0, '7.022')]
[2025-08-21 19:59:13,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1990656. Throughput: 0: 1019.2. Samples: 498878. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:59:13,955][02158] Avg episode reward: [(0, '7.348')]
[2025-08-21 19:59:13,958][11373] Saving new best policy, reward=7.348!
[2025-08-21 19:59:16,721][11386] Updated weights for policy 0, policy_version 490 (0.0012)
[2025-08-21 19:59:18,952][02158] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2015232. Throughput: 0: 1024.4. Samples: 501988. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:59:18,956][02158] Avg episode reward: [(0, '8.149')]
[2025-08-21 19:59:18,962][11373] Saving new best policy, reward=8.149!
[2025-08-21 19:59:23,952][02158] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 2039808. Throughput: 0: 1024.3. Samples: 509106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 19:59:23,956][02158] Avg episode reward: [(0, '8.768')]
[2025-08-21 19:59:23,960][11373] Saving new best policy, reward=8.768!
[2025-08-21 19:59:26,285][11386] Updated weights for policy 0, policy_version 500 (0.0026)
[2025-08-21 19:59:28,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 4068.2). Total num frames: 2056192. Throughput: 0: 1020.3. Samples: 514244. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 19:59:28,953][02158] Avg episode reward: [(0, '9.150')]
[2025-08-21 19:59:28,960][11373] Saving new best policy, reward=9.150!
[2025-08-21 19:59:33,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2076672. Throughput: 0: 1020.9. Samples: 517310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 19:59:33,956][02158] Avg episode reward: [(0, '9.589')]
[2025-08-21 19:59:33,960][11373] Saving new best policy, reward=9.589!
[2025-08-21 19:59:36,503][11386] Updated weights for policy 0, policy_version 510 (0.0031)
[2025-08-21 19:59:38,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2097152. Throughput: 0: 1010.5. Samples: 523808. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 19:59:38,953][02158] Avg episode reward: [(0, '9.928')]
[2025-08-21 19:59:39,015][11373] Saving new best policy, reward=9.928!
[2025-08-21 19:59:43,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 2113536. Throughput: 0: 1001.9. Samples: 528684. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 19:59:43,955][02158] Avg episode reward: [(0, '10.221')]
[2025-08-21 19:59:43,960][11373] Saving new best policy, reward=10.221!
[2025-08-21 19:59:47,290][11386] Updated weights for policy 0, policy_version 520 (0.0011)
[2025-08-21 19:59:48,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2134016. Throughput: 0: 1013.2. Samples: 532094. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 19:59:48,953][02158] Avg episode reward: [(0, '9.997')]
[2025-08-21 19:59:53,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2158592. Throughput: 0: 1010.2. Samples: 539118. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 19:59:53,955][02158] Avg episode reward: [(0, '9.318')]
[2025-08-21 19:59:56,968][11386] Updated weights for policy 0, policy_version 530 (0.0014)
[2025-08-21 19:59:58,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2174976. Throughput: 0: 1001.2. Samples: 543932. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 19:59:58,955][02158] Avg episode reward: [(0, '8.927')]
[2025-08-21 20:00:03,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2195456. Throughput: 0: 1005.2. Samples: 547220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:00:03,953][02158] Avg episode reward: [(0, '9.344')]
[2025-08-21 20:00:06,576][11386] Updated weights for policy 0, policy_version 540 (0.0015)
[2025-08-21 20:00:08,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2220032. Throughput: 0: 1009.6. Samples: 554536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:00:08,956][02158] Avg episode reward: [(0, '9.447')]
[2025-08-21 20:00:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2236416. Throughput: 0: 1007.6. Samples: 559584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:00:13,956][02158] Avg episode reward: [(0, '9.531')]
[2025-08-21 20:00:17,229][11386] Updated weights for policy 0, policy_version 550 (0.0015)
[2025-08-21 20:00:18,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2260992. Throughput: 0: 1012.8. Samples: 562888. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:00:18,956][02158] Avg episode reward: [(0, '9.328')]
[2025-08-21 20:00:23,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4068.3). Total num frames: 2281472. Throughput: 0: 1026.6. Samples: 570006. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:00:23,957][02158] Avg episode reward: [(0, '10.144')]
[2025-08-21 20:00:27,439][11386] Updated weights for policy 0, policy_version 560 (0.0021)
[2025-08-21 20:00:28,953][02158] Fps is (10 sec: 3686.1, 60 sec: 4027.7, 300 sec: 4040.4). Total num frames: 2297856. Throughput: 0: 1027.5. Samples: 574924. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:00:28,954][02158] Avg episode reward: [(0, '10.493')]
[2025-08-21 20:00:28,963][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000561_2297856.pth...
[2025-08-21 20:00:29,086][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000561_2297856.pth
[2025-08-21 20:00:29,097][11373] Saving new best policy, reward=10.493!
[2025-08-21 20:00:33,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2318336. Throughput: 0: 1020.9. Samples: 578036. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:00:33,956][02158] Avg episode reward: [(0, '10.958')]
[2025-08-21 20:00:33,958][11373] Saving new best policy, reward=10.958!
[2025-08-21 20:00:36,826][11386] Updated weights for policy 0, policy_version 570 (0.0024)
[2025-08-21 20:00:38,955][02158] Fps is (10 sec: 4095.3, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 2338816. Throughput: 0: 1020.0. Samples: 585020. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:00:38,959][02158] Avg episode reward: [(0, '11.665')]
[2025-08-21 20:00:38,985][11373] Saving new best policy, reward=11.665!
[2025-08-21 20:00:43,953][02158] Fps is (10 sec: 4095.6, 60 sec: 4095.9, 300 sec: 4040.4). Total num frames: 2359296. Throughput: 0: 1023.2. Samples: 589978. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:00:43,958][02158] Avg episode reward: [(0, '12.141')]
[2025-08-21 20:00:43,964][11373] Saving new best policy, reward=12.141!
[2025-08-21 20:00:47,301][11386] Updated weights for policy 0, policy_version 580 (0.0011)
[2025-08-21 20:00:48,952][02158] Fps is (10 sec: 4097.1, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2379776. Throughput: 0: 1025.0. Samples: 593344. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 20:00:48,957][02158] Avg episode reward: [(0, '12.436')]
[2025-08-21 20:00:48,965][11373] Saving new best policy, reward=12.436!
[2025-08-21 20:00:53,952][02158] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2404352. Throughput: 0: 1016.9. Samples: 600296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:00:53,955][02158] Avg episode reward: [(0, '12.543')]
[2025-08-21 20:00:53,956][11373] Saving new best policy, reward=12.543!
[2025-08-21 20:00:58,135][11386] Updated weights for policy 0, policy_version 590 (0.0019)
[2025-08-21 20:00:58,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2420736. Throughput: 0: 1013.4. Samples: 605186. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 20:00:58,957][02158] Avg episode reward: [(0, '12.270')]
[2025-08-21 20:01:03,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2441216. Throughput: 0: 1018.1. Samples: 608704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:01:03,959][02158] Avg episode reward: [(0, '11.996')]
[2025-08-21 20:01:06,685][11386] Updated weights for policy 0, policy_version 600 (0.0020)
[2025-08-21 20:01:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 2461696. Throughput: 0: 1011.6. Samples: 615528. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 20:01:08,955][02158] Avg episode reward: [(0, '11.801')]
[2025-08-21 20:01:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2482176. Throughput: 0: 1015.2. Samples: 620608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:01:13,953][02158] Avg episode reward: [(0, '12.340')]
[2025-08-21 20:01:17,269][11386] Updated weights for policy 0, policy_version 610 (0.0016)
[2025-08-21 20:01:18,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2502656. Throughput: 0: 1024.8. Samples: 624152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:01:18,953][02158] Avg episode reward: [(0, '12.233')]
[2025-08-21 20:01:23,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2519040. Throughput: 0: 1005.1. Samples: 630246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:01:23,955][02158] Avg episode reward: [(0, '12.637')]
[2025-08-21 20:01:23,958][11373] Saving new best policy, reward=12.637!
[2025-08-21 20:01:28,952][02158] Fps is (10 sec: 2867.1, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2531328. Throughput: 0: 974.5. Samples: 633832. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:01:28,955][02158] Avg episode reward: [(0, '12.977')]
[2025-08-21 20:01:28,963][11373] Saving new best policy, reward=12.977!
[2025-08-21 20:01:30,064][11386] Updated weights for policy 0, policy_version 620 (0.0034)
[2025-08-21 20:01:33,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.8). Total num frames: 2555904. Throughput: 0: 965.2. Samples: 636776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-08-21 20:01:33,953][02158] Avg episode reward: [(0, '12.797')]
[2025-08-21 20:01:38,810][11386] Updated weights for policy 0, policy_version 630 (0.0021)
[2025-08-21 20:01:38,953][02158] Fps is (10 sec: 4914.8, 60 sec: 4027.8, 300 sec: 4040.4). Total num frames: 2580480. Throughput: 0: 971.7. Samples: 644022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:01:38,955][02158] Avg episode reward: [(0, '13.390')]
[2025-08-21 20:01:38,965][11373] Saving new best policy, reward=13.390!
[2025-08-21 20:01:43,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 4012.7). Total num frames: 2592768. Throughput: 0: 976.0. Samples: 649106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:01:43,956][02158] Avg episode reward: [(0, '12.968')]
[2025-08-21 20:01:48,952][02158] Fps is (10 sec: 3686.8, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 2617344. Throughput: 0: 975.9. Samples: 652620. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:01:48,956][02158] Avg episode reward: [(0, '13.766')]
[2025-08-21 20:01:48,964][11373] Saving new best policy, reward=13.766!
[2025-08-21 20:01:49,371][11386] Updated weights for policy 0, policy_version 640 (0.0035)
[2025-08-21 20:01:53,952][02158] Fps is (10 sec: 4915.0, 60 sec: 3959.4, 300 sec: 4040.5). Total num frames: 2641920. Throughput: 0: 978.7. Samples: 659572. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:01:53,956][02158] Avg episode reward: [(0, '13.949')]
[2025-08-21 20:01:53,960][11373] Saving new best policy, reward=13.949!
[2025-08-21 20:01:58,952][02158] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3998.8). Total num frames: 2654208. Throughput: 0: 973.0. Samples: 664394. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:01:58,957][02158] Avg episode reward: [(0, '13.550')]
[2025-08-21 20:02:00,021][11386] Updated weights for policy 0, policy_version 650 (0.0019)
[2025-08-21 20:02:03,952][02158] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2678784. Throughput: 0: 968.8. Samples: 667750. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:02:03,957][02158] Avg episode reward: [(0, '14.365')]
[2025-08-21 20:02:03,961][11373] Saving new best policy, reward=14.365!
[2025-08-21 20:02:08,843][11386] Updated weights for policy 0, policy_version 660 (0.0013)
[2025-08-21 20:02:08,952][02158] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2703360. Throughput: 0: 991.8. Samples: 674876. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:02:08,953][02158] Avg episode reward: [(0, '14.826')]
[2025-08-21 20:02:08,964][11373] Saving new best policy, reward=14.826!
[2025-08-21 20:02:13,952][02158] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2715648. Throughput: 0: 1021.8. Samples: 679814. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:02:13,957][02158] Avg episode reward: [(0, '16.664')]
[2025-08-21 20:02:13,962][11373] Saving new best policy, reward=16.664!
[2025-08-21 20:02:18,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2740224. Throughput: 0: 1035.5. Samples: 683372. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 20:02:18,953][02158] Avg episode reward: [(0, '17.674')]
[2025-08-21 20:02:18,963][11373] Saving new best policy, reward=17.674!
[2025-08-21 20:02:19,396][11386] Updated weights for policy 0, policy_version 670 (0.0031)
[2025-08-21 20:02:23,952][02158] Fps is (10 sec: 4915.1, 60 sec: 4096.0, 300 sec: 4068.3). Total num frames: 2764800. Throughput: 0: 1031.2. Samples: 690426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:02:23,954][02158] Avg episode reward: [(0, '18.301')]
[2025-08-21 20:02:23,955][11373] Saving new best policy, reward=18.301!
[2025-08-21 20:02:28,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2777088. Throughput: 0: 1024.6. Samples: 695212. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:02:28,953][02158] Avg episode reward: [(0, '18.961')]
[2025-08-21 20:02:28,993][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000679_2781184.pth...
[2025-08-21 20:02:29,103][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000679_2781184.pth
[2025-08-21 20:02:29,117][11373] Saving new best policy, reward=18.961!
[2025-08-21 20:02:30,013][11386] Updated weights for policy 0, policy_version 680 (0.0023)
[2025-08-21 20:02:33,952][02158] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2801664. Throughput: 0: 1021.0. Samples: 698564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:02:33,956][02158] Avg episode reward: [(0, '20.341')]
[2025-08-21 20:02:33,960][11373] Saving new best policy, reward=20.341!
[2025-08-21 20:02:38,957][02158] Fps is (10 sec: 4503.4, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 2822144. Throughput: 0: 1025.7. Samples: 705732. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:02:38,959][02158] Avg episode reward: [(0, '19.651')]
[2025-08-21 20:02:39,133][11386] Updated weights for policy 0, policy_version 690 (0.0016)
[2025-08-21 20:02:43,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4040.5). Total num frames: 2842624. Throughput: 0: 1025.7. Samples: 710550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:02:43,953][02158] Avg episode reward: [(0, '19.938')]
[2025-08-21 20:02:48,952][02158] Fps is (10 sec: 4098.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2863104. Throughput: 0: 1033.1. Samples: 714238. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:02:48,953][02158] Avg episode reward: [(0, '17.840')]
[2025-08-21 20:02:49,183][11386] Updated weights for policy 0, policy_version 700 (0.0031)
[2025-08-21 20:02:53,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 2887680. Throughput: 0: 1036.3. Samples: 721510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:02:53,956][02158] Avg episode reward: [(0, '16.792')]
[2025-08-21 20:02:58,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 4040.5). Total num frames: 2904064. Throughput: 0: 1031.1. Samples: 726212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:02:58,957][02158] Avg episode reward: [(0, '16.327')]
[2025-08-21 20:02:59,677][11386] Updated weights for policy 0, policy_version 710 (0.0011)
[2025-08-21 20:03:03,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2924544. Throughput: 0: 1026.6. Samples: 729568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:03:03,955][02158] Avg episode reward: [(0, '17.190')]
[2025-08-21 20:03:08,952][02158] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2945024. Throughput: 0: 1026.7. Samples: 736626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:03:08,953][02158] Avg episode reward: [(0, '17.399')]
[2025-08-21 20:03:08,975][11386] Updated weights for policy 0, policy_version 720 (0.0018)
[2025-08-21 20:03:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 2965504. Throughput: 0: 1032.1. Samples: 741658. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:03:13,956][02158] Avg episode reward: [(0, '19.338')]
[2025-08-21 20:03:18,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2985984. Throughput: 0: 1034.3. Samples: 745106. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:03:18,956][02158] Avg episode reward: [(0, '19.399')]
[2025-08-21 20:03:19,106][11386] Updated weights for policy 0, policy_version 730 (0.0015)
[2025-08-21 20:03:23,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4054.4). Total num frames: 3006464. Throughput: 0: 1028.1. Samples: 751992. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:03:23,955][02158] Avg episode reward: [(0, '17.165')]
[2025-08-21 20:03:28,952][02158] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3026944. Throughput: 0: 1032.2. Samples: 756998. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:03:28,955][02158] Avg episode reward: [(0, '16.841')]
[2025-08-21 20:03:29,654][11386] Updated weights for policy 0, policy_version 740 (0.0021)
[2025-08-21 20:03:33,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3047424. Throughput: 0: 1025.8. Samples: 760398. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:03:33,953][02158] Avg episode reward: [(0, '16.885')]
[2025-08-21 20:03:38,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.3, 300 sec: 4054.4). Total num frames: 3067904. Throughput: 0: 1013.6. Samples: 767124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:03:38,954][02158] Avg episode reward: [(0, '16.441')]
[2025-08-21 20:03:39,348][11386] Updated weights for policy 0, policy_version 750 (0.0016)
[2025-08-21 20:03:43,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3088384. Throughput: 0: 1025.0. Samples: 772338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:03:43,956][02158] Avg episode reward: [(0, '17.512')]
[2025-08-21 20:03:48,900][11386] Updated weights for policy 0, policy_version 760 (0.0011)
[2025-08-21 20:03:48,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3112960. Throughput: 0: 1030.3. Samples: 775932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:03:48,953][02158] Avg episode reward: [(0, '19.792')]
[2025-08-21 20:03:53,955][02158] Fps is (10 sec: 4094.9, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 3129344. Throughput: 0: 1022.6. Samples: 782644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:03:53,956][02158] Avg episode reward: [(0, '19.469')]
[2025-08-21 20:03:58,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3149824. Throughput: 0: 1028.2. Samples: 787928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 20:03:58,955][02158] Avg episode reward: [(0, '20.072')]
[2025-08-21 20:03:59,517][11386] Updated weights for policy 0, policy_version 770 (0.0028)
[2025-08-21 20:04:03,952][02158] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3170304. Throughput: 0: 1026.7. Samples: 791306. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:04:03,955][02158] Avg episode reward: [(0, '20.490')]
[2025-08-21 20:04:04,008][11373] Saving new best policy, reward=20.490!
[2025-08-21 20:04:08,956][02158] Fps is (10 sec: 4094.3, 60 sec: 4095.7, 300 sec: 4068.2). Total num frames: 3190784. Throughput: 0: 1019.2. Samples: 797860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:04:08,963][02158] Avg episode reward: [(0, '19.201')]
[2025-08-21 20:04:09,521][11386] Updated weights for policy 0, policy_version 780 (0.0014)
[2025-08-21 20:04:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3211264. Throughput: 0: 1030.6. Samples: 803374. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:04:13,954][02158] Avg episode reward: [(0, '18.859')]
[2025-08-21 20:04:18,393][11386] Updated weights for policy 0, policy_version 790 (0.0013)
[2025-08-21 20:04:18,952][02158] Fps is (10 sec: 4507.4, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3235840. Throughput: 0: 1036.1. Samples: 807022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:04:18,955][02158] Avg episode reward: [(0, '18.449')]
[2025-08-21 20:04:23,954][02158] Fps is (10 sec: 4504.9, 60 sec: 4164.1, 300 sec: 4068.2). Total num frames: 3256320. Throughput: 0: 1031.4. Samples: 813540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:04:23,956][02158] Avg episode reward: [(0, '17.691')]
[2025-08-21 20:04:28,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3272704. Throughput: 0: 1037.1. Samples: 819008. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:04:28,953][02158] Avg episode reward: [(0, '18.019')]
[2025-08-21 20:04:29,029][11386] Updated weights for policy 0, policy_version 800 (0.0026)
[2025-08-21 20:04:29,035][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000800_3276800.pth...
[2025-08-21 20:04:29,139][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000800_3276800.pth
[2025-08-21 20:04:33,952][02158] Fps is (10 sec: 4096.8, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3297280. Throughput: 0: 1036.0. Samples: 822554. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:04:33,953][02158] Avg episode reward: [(0, '18.609')]
[2025-08-21 20:04:38,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 3313664. Throughput: 0: 1027.2. Samples: 828866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:04:38,956][02158] Avg episode reward: [(0, '18.731')]
[2025-08-21 20:04:39,115][11386] Updated weights for policy 0, policy_version 810 (0.0026)
[2025-08-21 20:04:43,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3338240. Throughput: 0: 1034.8. Samples: 834494. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:04:43,953][02158] Avg episode reward: [(0, '19.464')]
[2025-08-21 20:04:48,085][11386] Updated weights for policy 0, policy_version 820 (0.0016)
[2025-08-21 20:04:48,952][02158] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3362816. Throughput: 0: 1039.7. Samples: 838094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:04:48,953][02158] Avg episode reward: [(0, '17.473')]
[2025-08-21 20:04:53,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4082.1). Total num frames: 3379200. Throughput: 0: 1037.5. Samples: 844542. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:04:53,954][02158] Avg episode reward: [(0, '16.843')]
[2025-08-21 20:04:58,696][11386] Updated weights for policy 0, policy_version 830 (0.0017)
[2025-08-21 20:04:58,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3399680. Throughput: 0: 1039.6. Samples: 850156. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 20:04:58,957][02158] Avg episode reward: [(0, '16.699')]
[2025-08-21 20:05:03,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4232.6, 300 sec: 4082.1). Total num frames: 3424256. Throughput: 0: 1039.8. Samples: 853812. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:05:03,955][02158] Avg episode reward: [(0, '16.793')]
[2025-08-21 20:05:08,478][11386] Updated weights for policy 0, policy_version 840 (0.0011)
[2025-08-21 20:05:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.6, 300 sec: 4082.1). Total num frames: 3440640. Throughput: 0: 1035.0. Samples: 860114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:05:08,955][02158] Avg episode reward: [(0, '18.635')]
[2025-08-21 20:05:13,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 3461120. Throughput: 0: 1044.5. Samples: 866012. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 20:05:13,957][02158] Avg episode reward: [(0, '18.603')]
[2025-08-21 20:05:17,799][11386] Updated weights for policy 0, policy_version 850 (0.0013)
[2025-08-21 20:05:18,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3485696. Throughput: 0: 1041.2. Samples: 869408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:05:18,957][02158] Avg episode reward: [(0, '18.628')]
[2025-08-21 20:05:23,955][02158] Fps is (10 sec: 4094.7, 60 sec: 4095.9, 300 sec: 4082.1). Total num frames: 3502080. Throughput: 0: 1040.5. Samples: 875690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:05:23,957][02158] Avg episode reward: [(0, '17.757')]
[2025-08-21 20:05:28,053][11386] Updated weights for policy 0, policy_version 860 (0.0013)
[2025-08-21 20:05:28,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 3522560. Throughput: 0: 1049.4. Samples: 881716. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:05:28,953][02158] Avg episode reward: [(0, '18.858')]
[2025-08-21 20:05:33,958][02158] Fps is (10 sec: 4094.9, 60 sec: 4095.6, 300 sec: 4082.1). Total num frames: 3543040. Throughput: 0: 1045.6. Samples: 885154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 20:05:33,960][02158] Avg episode reward: [(0, '18.018')]
[2025-08-21 20:05:38,956][02158] Fps is (10 sec: 3275.5, 60 sec: 4027.5, 300 sec: 4054.3). Total num frames: 3555328. Throughput: 0: 994.4. Samples: 889294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:05:38,957][02158] Avg episode reward: [(0, '19.531')]
[2025-08-21 20:05:40,711][11386] Updated weights for policy 0, policy_version 870 (0.0033)
[2025-08-21 20:05:43,952][02158] Fps is (10 sec: 3278.7, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3575808. Throughput: 0: 986.5. Samples: 894550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-08-21 20:05:43,957][02158] Avg episode reward: [(0, '19.468')]
[2025-08-21 20:05:48,952][02158] Fps is (10 sec: 4507.2, 60 sec: 3959.4, 300 sec: 4054.3). Total num frames: 3600384. Throughput: 0: 984.7. Samples: 898122. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:05:48,958][02158] Avg episode reward: [(0, '20.806')]
[2025-08-21 20:05:48,970][11373] Saving new best policy, reward=20.806!
[2025-08-21 20:05:49,506][11386] Updated weights for policy 0, policy_version 880 (0.0019)
[2025-08-21 20:05:53,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3616768. Throughput: 0: 987.1. Samples: 904532. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:05:53,956][02158] Avg episode reward: [(0, '21.748')]
[2025-08-21 20:05:53,957][11373] Saving new best policy, reward=21.748!
[2025-08-21 20:05:58,952][02158] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3637248. Throughput: 0: 977.7. Samples: 910010. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 20:05:58,953][02158] Avg episode reward: [(0, '21.998')]
[2025-08-21 20:05:58,958][11373] Saving new best policy, reward=21.998!
[2025-08-21 20:06:00,226][11386] Updated weights for policy 0, policy_version 890 (0.0015)
[2025-08-21 20:06:03,952][02158] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4068.2). Total num frames: 3661824. Throughput: 0: 980.5. Samples: 913530. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-08-21 20:06:03,953][02158] Avg episode reward: [(0, '20.701')]
[2025-08-21 20:06:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3678208. Throughput: 0: 981.4. Samples: 919848. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:06:08,953][02158] Avg episode reward: [(0, '19.266')]
[2025-08-21 20:06:10,628][11386] Updated weights for policy 0, policy_version 900 (0.0013)
[2025-08-21 20:06:13,952][02158] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4054.3). Total num frames: 3698688. Throughput: 0: 978.1. Samples: 925730. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-08-21 20:06:13,956][02158] Avg episode reward: [(0, '20.564')]
[2025-08-21 20:06:18,952][02158] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3723264. Throughput: 0: 982.3. Samples: 929352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:06:18,953][02158] Avg episode reward: [(0, '20.301')]
[2025-08-21 20:06:19,120][11386] Updated weights for policy 0, policy_version 910 (0.0018)
[2025-08-21 20:06:23,952][02158] Fps is (10 sec: 4505.4, 60 sec: 4027.9, 300 sec: 4109.9). Total num frames: 3743744. Throughput: 0: 1026.9. Samples: 935502. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:06:23,961][02158] Avg episode reward: [(0, '20.050')]
[2025-08-21 20:06:28,961][02158] Fps is (10 sec: 4092.4, 60 sec: 4027.1, 300 sec: 4095.9). Total num frames: 3764224. Throughput: 0: 1042.7. Samples: 941482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-08-21 20:06:28,962][02158] Avg episode reward: [(0, '21.715')]
[2025-08-21 20:06:28,970][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000919_3764224.pth...
[2025-08-21 20:06:29,077][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000912_3735552.pth
[2025-08-21 20:06:29,602][11386] Updated weights for policy 0, policy_version 920 (0.0017)
[2025-08-21 20:06:33,952][02158] Fps is (10 sec: 4505.8, 60 sec: 4096.4, 300 sec: 4096.0). Total num frames: 3788800. Throughput: 0: 1043.5. Samples: 945078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:06:33,957][02158] Avg episode reward: [(0, '22.786')]
[2025-08-21 20:06:33,962][11373] Saving new best policy, reward=22.786!
[2025-08-21 20:06:38,952][02158] Fps is (10 sec: 4099.6, 60 sec: 4164.5, 300 sec: 4109.9). Total num frames: 3805184. Throughput: 0: 1029.9. Samples: 950876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:06:38,953][02158] Avg episode reward: [(0, '22.562')]
[2025-08-21 20:06:40,051][11386] Updated weights for policy 0, policy_version 930 (0.0030)
[2025-08-21 20:06:43,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3825664. Throughput: 0: 1037.8. Samples: 956710. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:06:43,957][02158] Avg episode reward: [(0, '22.332')]
[2025-08-21 20:06:48,840][11386] Updated weights for policy 0, policy_version 940 (0.0016)
[2025-08-21 20:06:48,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3850240. Throughput: 0: 1038.2. Samples: 960248. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:06:48,957][02158] Avg episode reward: [(0, '21.134')]
[2025-08-21 20:06:53,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3862528. Throughput: 0: 1028.2. Samples: 966116. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:06:53,956][02158] Avg episode reward: [(0, '19.595')]
[2025-08-21 20:06:58,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3887104. Throughput: 0: 1039.2. Samples: 972496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:06:58,953][02158] Avg episode reward: [(0, '19.782')]
[2025-08-21 20:06:59,124][11386] Updated weights for policy 0, policy_version 950 (0.0013)
[2025-08-21 20:07:03,952][02158] Fps is (10 sec: 4915.1, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3911680. Throughput: 0: 1039.7. Samples: 976138. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:07:03,956][02158] Avg episode reward: [(0, '19.679')]
[2025-08-21 20:07:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3928064. Throughput: 0: 1030.1. Samples: 981858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:07:08,955][02158] Avg episode reward: [(0, '20.115')]
[2025-08-21 20:07:09,477][11386] Updated weights for policy 0, policy_version 960 (0.0021)
[2025-08-21 20:07:13,952][02158] Fps is (10 sec: 3686.5, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3948544. Throughput: 0: 1036.0. Samples: 988094. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:07:13,956][02158] Avg episode reward: [(0, '22.514')]
[2025-08-21 20:07:18,352][11386] Updated weights for policy 0, policy_version 970 (0.0015)
[2025-08-21 20:07:18,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 3973120. Throughput: 0: 1034.3. Samples: 991622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:07:18,953][02158] Avg episode reward: [(0, '23.073')]
[2025-08-21 20:07:18,960][11373] Saving new best policy, reward=23.073!
[2025-08-21 20:07:23,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3989504. Throughput: 0: 1028.8. Samples: 997170. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:07:23,953][02158] Avg episode reward: [(0, '23.472')]
[2025-08-21 20:07:23,954][11373] Saving new best policy, reward=23.472!
[2025-08-21 20:07:28,839][11386] Updated weights for policy 0, policy_version 980 (0.0015)
[2025-08-21 20:07:28,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.9, 300 sec: 4109.9). Total num frames: 4014080. Throughput: 0: 1045.7. Samples: 1003768. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:07:28,953][02158] Avg episode reward: [(0, '24.309')]
[2025-08-21 20:07:28,959][11373] Saving new best policy, reward=24.309!
[2025-08-21 20:07:33,952][02158] Fps is (10 sec: 4505.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4034560. Throughput: 0: 1040.9. Samples: 1007088. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:07:33,957][02158] Avg episode reward: [(0, '24.210')]
[2025-08-21 20:07:38,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 4050944. Throughput: 0: 1030.7. Samples: 1012498. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:07:38,961][02158] Avg episode reward: [(0, '23.768')]
[2025-08-21 20:07:39,373][11386] Updated weights for policy 0, policy_version 990 (0.0018)
[2025-08-21 20:07:43,952][02158] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 4075520. Throughput: 0: 1034.5. Samples: 1019050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:07:43,958][02158] Avg episode reward: [(0, '22.798')]
[2025-08-21 20:07:48,058][11386] Updated weights for policy 0, policy_version 1000 (0.0018)
[2025-08-21 20:07:48,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 4096000. Throughput: 0: 1033.8. Samples: 1022658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:07:48,953][02158] Avg episode reward: [(0, '23.291')]
[2025-08-21 20:07:53,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4109.9). Total num frames: 4116480. Throughput: 0: 1029.6. Samples: 1028190. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:07:53,954][02158] Avg episode reward: [(0, '23.355')]
[2025-08-21 20:07:58,495][11386] Updated weights for policy 0, policy_version 1010 (0.0033)
[2025-08-21 20:07:58,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 4136960. Throughput: 0: 1036.2. Samples: 1034722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:07:58,957][02158] Avg episode reward: [(0, '23.236')]
[2025-08-21 20:08:03,954][02158] Fps is (10 sec: 4504.5, 60 sec: 4164.1, 300 sec: 4123.7). Total num frames: 4161536. Throughput: 0: 1037.5. Samples: 1038310. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:08:03,957][02158] Avg episode reward: [(0, '23.880')]
[2025-08-21 20:08:08,907][11386] Updated weights for policy 0, policy_version 1020 (0.0020)
[2025-08-21 20:08:08,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 4177920. Throughput: 0: 1032.5. Samples: 1043634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:08:08,956][02158] Avg episode reward: [(0, '24.317')]
[2025-08-21 20:08:08,966][11373] Saving new best policy, reward=24.317!
[2025-08-21 20:08:13,956][02158] Fps is (10 sec: 3685.9, 60 sec: 4164.0, 300 sec: 4109.8). Total num frames: 4198400. Throughput: 0: 1035.1. Samples: 1050352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:08:13,958][02158] Avg episode reward: [(0, '23.078')]
[2025-08-21 20:08:17,570][11386] Updated weights for policy 0, policy_version 1030 (0.0017)
[2025-08-21 20:08:18,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4222976. Throughput: 0: 1043.3. Samples: 1054034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:08:18,959][02158] Avg episode reward: [(0, '22.987')]
[2025-08-21 20:08:23,952][02158] Fps is (10 sec: 4097.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 4239360. Throughput: 0: 1036.4. Samples: 1059136. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:08:23,954][02158] Avg episode reward: [(0, '23.026')]
[2025-08-21 20:08:27,942][11386] Updated weights for policy 0, policy_version 1040 (0.0015)
[2025-08-21 20:08:28,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4263936. Throughput: 0: 1046.2. Samples: 1066128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:08:28,953][02158] Avg episode reward: [(0, '22.227')]
[2025-08-21 20:08:28,964][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001041_4263936.pth...
[2025-08-21 20:08:29,074][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000919_3764224.pth
[2025-08-21 20:08:33,952][02158] Fps is (10 sec: 4505.4, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4284416. Throughput: 0: 1043.3. Samples: 1069608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:08:33,956][02158] Avg episode reward: [(0, '22.211')]
[2025-08-21 20:08:38,525][11386] Updated weights for policy 0, policy_version 1050 (0.0017)
[2025-08-21 20:08:38,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 4300800. Throughput: 0: 1031.4. Samples: 1074604. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:08:38,953][02158] Avg episode reward: [(0, '22.115')]
[2025-08-21 20:08:43,952][02158] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 4325376. Throughput: 0: 1042.0. Samples: 1081610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:08:43,953][02158] Avg episode reward: [(0, '22.268')]
[2025-08-21 20:08:47,058][11386] Updated weights for policy 0, policy_version 1060 (0.0019)
[2025-08-21 20:08:48,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4345856. Throughput: 0: 1042.3. Samples: 1085210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:08:48,956][02158] Avg episode reward: [(0, '21.926')]
[2025-08-21 20:08:53,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4362240. Throughput: 0: 1037.5. Samples: 1090320. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:08:53,954][02158] Avg episode reward: [(0, '22.616')]
[2025-08-21 20:08:57,595][11386] Updated weights for policy 0, policy_version 1070 (0.0017)
[2025-08-21 20:08:58,954][02158] Fps is (10 sec: 4095.2, 60 sec: 4164.1, 300 sec: 4123.7). Total num frames: 4386816. Throughput: 0: 1043.4. Samples: 1097302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:08:58,955][02158] Avg episode reward: [(0, '24.177')]
[2025-08-21 20:09:03,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 4123.8). Total num frames: 4407296. Throughput: 0: 1038.7. Samples: 1100776. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:09:03,959][02158] Avg episode reward: [(0, '25.300')]
[2025-08-21 20:09:03,960][11373] Saving new best policy, reward=25.300!
[2025-08-21 20:09:08,178][11386] Updated weights for policy 0, policy_version 1080 (0.0017)
[2025-08-21 20:09:08,952][02158] Fps is (10 sec: 3687.2, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4423680. Throughput: 0: 1031.8. Samples: 1105566. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:09:08,953][02158] Avg episode reward: [(0, '25.976')]
[2025-08-21 20:09:08,975][11373] Saving new best policy, reward=25.976!
[2025-08-21 20:09:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 4109.9). Total num frames: 4448256. Throughput: 0: 1036.7. Samples: 1112778. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:09:13,957][02158] Avg episode reward: [(0, '25.579')]
[2025-08-21 20:09:16,877][11386] Updated weights for policy 0, policy_version 1090 (0.0012)
[2025-08-21 20:09:18,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4468736. Throughput: 0: 1037.8. Samples: 1116308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:09:18,953][02158] Avg episode reward: [(0, '25.280')]
[2025-08-21 20:09:23,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4485120. Throughput: 0: 1031.7. Samples: 1121032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:09:23,953][02158] Avg episode reward: [(0, '26.251')]
[2025-08-21 20:09:24,003][11373] Saving new best policy, reward=26.251!
[2025-08-21 20:09:27,646][11386] Updated weights for policy 0, policy_version 1100 (0.0023)
[2025-08-21 20:09:28,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4509696. Throughput: 0: 1028.3. Samples: 1127884. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:09:28,957][02158] Avg episode reward: [(0, '25.822')]
[2025-08-21 20:09:33,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 4530176. Throughput: 0: 1028.5. Samples: 1131492. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:09:33,955][02158] Avg episode reward: [(0, '26.183')]
[2025-08-21 20:09:38,952][02158] Fps is (10 sec: 3276.8, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 4542464. Throughput: 0: 1016.9. Samples: 1136082. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:09:38,953][02158] Avg episode reward: [(0, '25.250')]
[2025-08-21 20:09:39,313][11386] Updated weights for policy 0, policy_version 1110 (0.0017)
[2025-08-21 20:09:43,952][02158] Fps is (10 sec: 3276.7, 60 sec: 3959.4, 300 sec: 4068.2). Total num frames: 4562944. Throughput: 0: 978.3. Samples: 1141326. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:09:43,954][02158] Avg episode reward: [(0, '25.842')]
[2025-08-21 20:09:48,863][11386] Updated weights for policy 0, policy_version 1120 (0.0015)
[2025-08-21 20:09:48,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 4587520. Throughput: 0: 982.5. Samples: 1144990. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:09:48,956][02158] Avg episode reward: [(0, '24.291')]
[2025-08-21 20:09:53,952][02158] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 4603904. Throughput: 0: 986.1. Samples: 1149942. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:09:53,953][02158] Avg episode reward: [(0, '22.924')]
[2025-08-21 20:09:58,866][11386] Updated weights for policy 0, policy_version 1130 (0.0013)
[2025-08-21 20:09:58,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4027.9, 300 sec: 4082.1). Total num frames: 4628480. Throughput: 0: 984.2. Samples: 1157066. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:09:58,956][02158] Avg episode reward: [(0, '24.246')]
[2025-08-21 20:10:03,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 4648960. Throughput: 0: 985.6. Samples: 1160658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-08-21 20:10:03,957][02158] Avg episode reward: [(0, '24.179')]
[2025-08-21 20:10:08,952][02158] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 4665344. Throughput: 0: 986.6. Samples: 1165430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:10:08,953][02158] Avg episode reward: [(0, '23.789')]
[2025-08-21 20:10:09,549][11386] Updated weights for policy 0, policy_version 1140 (0.0016)
[2025-08-21 20:10:13,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 4689920. Throughput: 0: 993.0. Samples: 1172570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:10:13,957][02158] Avg episode reward: [(0, '25.649')]
[2025-08-21 20:10:18,417][11386] Updated weights for policy 0, policy_version 1150 (0.0020)
[2025-08-21 20:10:18,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 4710400. Throughput: 0: 994.2. Samples: 1176230. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:10:18,958][02158] Avg episode reward: [(0, '25.748')]
[2025-08-21 20:10:23,952][02158] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 4726784. Throughput: 0: 1004.5. Samples: 1181284. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:10:23,954][02158] Avg episode reward: [(0, '24.591')]
[2025-08-21 20:10:28,625][11386] Updated weights for policy 0, policy_version 1160 (0.0018)
[2025-08-21 20:10:28,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4096.1). Total num frames: 4751360. Throughput: 0: 1044.8. Samples: 1188340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:10:28,953][02158] Avg episode reward: [(0, '23.024')]
[2025-08-21 20:10:28,960][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001160_4751360.pth...
[2025-08-21 20:10:29,066][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2025-08-21 20:10:33,952][02158] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 4123.8). Total num frames: 4771840. Throughput: 0: 1042.4. Samples: 1191896. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:10:33,953][02158] Avg episode reward: [(0, '23.460')]
[2025-08-21 20:10:38,952][02158] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4788224. Throughput: 0: 1034.4. Samples: 1196488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:10:38,957][02158] Avg episode reward: [(0, '22.489')]
[2025-08-21 20:10:39,280][11386] Updated weights for policy 0, policy_version 1170 (0.0012)
[2025-08-21 20:10:43,952][02158] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 4812800. Throughput: 0: 1036.0. Samples: 1203684. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:10:43,956][02158] Avg episode reward: [(0, '22.847')]
[2025-08-21 20:10:48,301][11386] Updated weights for policy 0, policy_version 1180 (0.0015)
[2025-08-21 20:10:48,952][02158] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 4833280. Throughput: 0: 1035.2. Samples: 1207242. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-08-21 20:10:48,953][02158] Avg episode reward: [(0, '22.707')]
[2025-08-21 20:10:53,952][02158] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4853760. Throughput: 0: 1041.0. Samples: 1212274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:10:53,953][02158] Avg episode reward: [(0, '24.260')]
[2025-08-21 20:10:58,357][11386] Updated weights for policy 0, policy_version 1190 (0.0028)
[2025-08-21 20:10:58,952][02158] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4874240. Throughput: 0: 1035.2. Samples: 1219154. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:10:58,957][02158] Avg episode reward: [(0, '24.211')]
[2025-08-21 20:11:03,955][02158] Fps is (10 sec: 4094.8, 60 sec: 4095.8, 300 sec: 4123.7). Total num frames: 4894720. Throughput: 0: 1033.0. Samples: 1222720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:11:03,959][02158] Avg episode reward: [(0, '25.685')]
[2025-08-21 20:11:08,873][11386] Updated weights for policy 0, policy_version 1200 (0.0029)
[2025-08-21 20:11:08,952][02158] Fps is (10 sec: 4096.2, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4915200. Throughput: 0: 1030.9. Samples: 1227674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:11:08,953][02158] Avg episode reward: [(0, '26.252')]
[2025-08-21 20:11:08,958][11373] Saving new best policy, reward=26.252!
[2025-08-21 20:11:13,952][02158] Fps is (10 sec: 4097.2, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4935680. Throughput: 0: 1032.0. Samples: 1234782. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:11:13,958][02158] Avg episode reward: [(0, '26.092')]
[2025-08-21 20:11:17,909][11386] Updated weights for policy 0, policy_version 1210 (0.0024)
[2025-08-21 20:11:18,953][02158] Fps is (10 sec: 4095.7, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 4956160. Throughput: 0: 1034.3. Samples: 1238440. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-08-21 20:11:18,954][02158] Avg episode reward: [(0, '26.419')]
[2025-08-21 20:11:18,962][11373] Saving new best policy, reward=26.419!
[2025-08-21 20:11:23,952][02158] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4110.0). Total num frames: 4976640. Throughput: 0: 1040.6. Samples: 1243316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-08-21 20:11:23,959][02158] Avg episode reward: [(0, '25.139')]
[2025-08-21 20:11:27,993][11386] Updated weights for policy 0, policy_version 1220 (0.0013)
[2025-08-21 20:11:28,952][02158] Fps is (10 sec: 4505.9, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 5001216. Throughput: 0: 1037.2. Samples: 1250360. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-08-21 20:11:28,953][02158] Avg episode reward: [(0, '24.128')]
[2025-08-21 20:11:29,733][02158] Component Batcher_0 stopped!
[2025-08-21 20:11:29,733][11373] Stopping Batcher_0...
[2025-08-21 20:11:29,738][11373] Loop batcher_evt_loop terminating...
[2025-08-21 20:11:29,740][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:11:29,792][11386] Weights refcount: 2 0
[2025-08-21 20:11:29,797][11386] Stopping InferenceWorker_p0-w0...
[2025-08-21 20:11:29,797][02158] Component InferenceWorker_p0-w0 stopped!
[2025-08-21 20:11:29,803][11386] Loop inference_proc0-0_evt_loop terminating...
[2025-08-21 20:11:29,841][11373] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001041_4263936.pth
[2025-08-21 20:11:29,857][11373] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:11:30,014][11373] Stopping LearnerWorker_p0...
[2025-08-21 20:11:30,014][11373] Loop learner_proc0_evt_loop terminating...
[2025-08-21 20:11:30,017][02158] Component LearnerWorker_p0 stopped!
[2025-08-21 20:11:30,164][02158] Component RolloutWorker_w4 stopped!
[2025-08-21 20:11:30,166][11394] Stopping RolloutWorker_w4...
[2025-08-21 20:11:30,166][11394] Loop rollout_proc4_evt_loop terminating...
[2025-08-21 20:11:30,178][02158] Component RolloutWorker_w5 stopped!
[2025-08-21 20:11:30,178][11391] Stopping RolloutWorker_w5...
[2025-08-21 20:11:30,185][02158] Component RolloutWorker_w3 stopped!
[2025-08-21 20:11:30,185][11390] Stopping RolloutWorker_w3...
[2025-08-21 20:11:30,183][11391] Loop rollout_proc5_evt_loop terminating...
[2025-08-21 20:11:30,192][11390] Loop rollout_proc3_evt_loop terminating...
[2025-08-21 20:11:30,201][02158] Component RolloutWorker_w1 stopped!
[2025-08-21 20:11:30,206][11388] Stopping RolloutWorker_w1...
[2025-08-21 20:11:30,209][11388] Loop rollout_proc1_evt_loop terminating...
[2025-08-21 20:11:30,226][11389] Stopping RolloutWorker_w2...
[2025-08-21 20:11:30,227][11389] Loop rollout_proc2_evt_loop terminating...
[2025-08-21 20:11:30,226][02158] Component RolloutWorker_w2 stopped!
[2025-08-21 20:11:30,250][11393] Stopping RolloutWorker_w6...
[2025-08-21 20:11:30,251][11393] Loop rollout_proc6_evt_loop terminating...
[2025-08-21 20:11:30,250][02158] Component RolloutWorker_w6 stopped!
[2025-08-21 20:11:30,264][02158] Component RolloutWorker_w7 stopped!
[2025-08-21 20:11:30,264][11392] Stopping RolloutWorker_w7...
[2025-08-21 20:11:30,270][11392] Loop rollout_proc7_evt_loop terminating...
[2025-08-21 20:11:30,273][02158] Component RolloutWorker_w0 stopped!
[2025-08-21 20:11:30,276][02158] Waiting for process learner_proc0 to stop...
[2025-08-21 20:11:30,279][11387] Stopping RolloutWorker_w0...
[2025-08-21 20:11:30,283][11387] Loop rollout_proc0_evt_loop terminating...
[2025-08-21 20:11:31,496][02158] Waiting for process inference_proc0-0 to join...
[2025-08-21 20:11:31,502][02158] Waiting for process rollout_proc0 to join...
[2025-08-21 20:11:34,602][02158] Waiting for process rollout_proc1 to join...
[2025-08-21 20:11:34,603][02158] Waiting for process rollout_proc2 to join...
[2025-08-21 20:11:34,604][02158] Waiting for process rollout_proc3 to join...
[2025-08-21 20:11:34,605][02158] Waiting for process rollout_proc4 to join...
[2025-08-21 20:11:34,606][02158] Waiting for process rollout_proc5 to join...
[2025-08-21 20:11:34,607][02158] Waiting for process rollout_proc6 to join...
[2025-08-21 20:11:34,608][02158] Waiting for process rollout_proc7 to join...
[2025-08-21 20:11:34,609][02158] Batcher 0 profile tree view:
batching: 33.5809, releasing_batches: 0.0323
[2025-08-21 20:11:34,610][02158] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0035
  wait_policy_total: 535.4931
update_model: 9.3376
  weight_update: 0.0020
one_step: 0.0025
  handle_policy_step: 651.5964
    deserialize: 17.3898, stack: 3.3949, obs_to_device_normalize: 139.9688, forward: 329.0427, send_messages: 33.6492
    prepare_outputs: 99.5629
      to_cpu: 62.1138
[2025-08-21 20:11:34,611][02158] Learner 0 profile tree view:
misc: 0.0062, prepare_batch: 14.9270
train: 89.6135
  epoch_init: 0.0065, minibatch_init: 0.0145, losses_postprocess: 0.7535, kl_divergence: 0.8283, after_optimizer: 41.6698
  calculate_losses: 31.1672
    losses_init: 0.0040, forward_head: 1.5524, bptt_initial: 20.6017, tail: 1.1391, advantages_returns: 0.3993, losses: 4.4069
    bptt: 2.7556
      bptt_forward_core: 2.6089
  update: 14.5035
    clip: 1.1377
[2025-08-21 20:11:34,613][02158] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2983, enqueue_policy_requests: 133.6954, env_step: 979.3261, overhead: 14.3485, complete_rollouts: 9.0930
save_policy_outputs: 22.6923
  split_output_tensors: 9.0335
[2025-08-21 20:11:34,614][02158] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3712, enqueue_policy_requests: 143.4527, env_step: 967.5485, overhead: 15.0307, complete_rollouts: 7.7438
save_policy_outputs: 21.7696
  split_output_tensors: 8.1752
[2025-08-21 20:11:34,614][02158] Loop Runner_EvtLoop terminating...
[2025-08-21 20:11:34,619][02158] Runner profile tree view:
main_loop: 1263.4643
[2025-08-21 20:11:34,619][02158] Collected {0: 5005312}, FPS: 3961.6
[2025-08-21 20:11:34,645][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:11:34,646][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:11:34,647][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:11:34,648][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:11:34,649][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:11:34,650][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:11:34,651][02158] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:11:34,653][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:11:34,654][02158] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-08-21 20:11:34,656][02158] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-08-21 20:11:34,657][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:11:34,661][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:11:34,661][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:11:34,662][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:11:34,663][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:11:34,709][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:11:34,711][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:11:34,726][02158] ConvEncoder: input_channels=3
[2025-08-21 20:11:34,786][02158] Conv encoder output size: 512
[2025-08-21 20:11:34,787][02158] Policy head output size: 512
[2025-08-21 20:11:34,814][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:11:35,234][02158] Num frames 100...
[2025-08-21 20:11:35,351][02158] Num frames 200...
[2025-08-21 20:11:35,468][02158] Num frames 300...
[2025-08-21 20:11:35,591][02158] Num frames 400...
[2025-08-21 20:11:35,707][02158] Num frames 500...
[2025-08-21 20:11:35,833][02158] Num frames 600...
[2025-08-21 20:11:35,950][02158] Num frames 700...
[2025-08-21 20:11:36,069][02158] Num frames 800...
[2025-08-21 20:11:36,187][02158] Num frames 900...
[2025-08-21 20:11:36,306][02158] Num frames 1000...
[2025-08-21 20:11:36,463][02158] Num frames 1100...
[2025-08-21 20:11:36,583][02158] Num frames 1200...
[2025-08-21 20:11:36,703][02158] Num frames 1300...
[2025-08-21 20:11:36,821][02158] Num frames 1400...
[2025-08-21 20:11:36,951][02158] Num frames 1500...
[2025-08-21 20:11:37,068][02158] Num frames 1600...
[2025-08-21 20:11:37,185][02158] Num frames 1700...
[2025-08-21 20:11:37,309][02158] Num frames 1800...
[2025-08-21 20:11:37,451][02158] Num frames 1900...
[2025-08-21 20:11:37,569][02158] Num frames 2000...
[2025-08-21 20:11:37,695][02158] Num frames 2100...
[2025-08-21 20:11:37,747][02158] Avg episode rewards: #0: 57.999, true rewards: #0: 21.000
[2025-08-21 20:11:37,748][02158] Avg episode reward: 57.999, avg true_objective: 21.000
[2025-08-21 20:11:37,864][02158] Num frames 2200...
[2025-08-21 20:11:37,993][02158] Num frames 2300...
[2025-08-21 20:11:38,109][02158] Num frames 2400...
[2025-08-21 20:11:38,232][02158] Num frames 2500...
[2025-08-21 20:11:38,382][02158] Avg episode rewards: #0: 32.899, true rewards: #0: 12.900
[2025-08-21 20:11:38,383][02158] Avg episode reward: 32.899, avg true_objective: 12.900
[2025-08-21 20:11:38,409][02158] Num frames 2600...
[2025-08-21 20:11:38,529][02158] Num frames 2700...
[2025-08-21 20:11:38,648][02158] Num frames 2800...
[2025-08-21 20:11:38,770][02158] Num frames 2900...
[2025-08-21 20:11:38,889][02158] Num frames 3000...
[2025-08-21 20:11:39,020][02158] Num frames 3100...
[2025-08-21 20:11:39,140][02158] Num frames 3200...
[2025-08-21 20:11:39,264][02158] Num frames 3300...
[2025-08-21 20:11:39,391][02158] Num frames 3400...
[2025-08-21 20:11:39,518][02158] Num frames 3500...
[2025-08-21 20:11:39,642][02158] Num frames 3600...
[2025-08-21 20:11:39,761][02158] Num frames 3700...
[2025-08-21 20:11:39,880][02158] Num frames 3800...
[2025-08-21 20:11:40,012][02158] Num frames 3900...
[2025-08-21 20:11:40,169][02158] Avg episode rewards: #0: 31.960, true rewards: #0: 13.293
[2025-08-21 20:11:40,170][02158] Avg episode reward: 31.960, avg true_objective: 13.293
[2025-08-21 20:11:40,190][02158] Num frames 4000...
[2025-08-21 20:11:40,310][02158] Num frames 4100...
[2025-08-21 20:11:40,431][02158] Num frames 4200...
[2025-08-21 20:11:40,554][02158] Num frames 4300...
[2025-08-21 20:11:40,677][02158] Num frames 4400...
[2025-08-21 20:11:40,795][02158] Num frames 4500...
[2025-08-21 20:11:40,913][02158] Num frames 4600...
[2025-08-21 20:11:41,045][02158] Num frames 4700...
[2025-08-21 20:11:41,166][02158] Num frames 4800...
[2025-08-21 20:11:41,286][02158] Num frames 4900...
[2025-08-21 20:11:41,406][02158] Num frames 5000...
[2025-08-21 20:11:41,528][02158] Num frames 5100...
[2025-08-21 20:11:41,652][02158] Num frames 5200...
[2025-08-21 20:11:41,772][02158] Num frames 5300...
[2025-08-21 20:11:41,891][02158] Num frames 5400...
[2025-08-21 20:11:42,015][02158] Num frames 5500...
[2025-08-21 20:11:42,102][02158] Avg episode rewards: #0: 33.060, true rewards: #0: 13.810
[2025-08-21 20:11:42,103][02158] Avg episode reward: 33.060, avg true_objective: 13.810
[2025-08-21 20:11:42,196][02158] Num frames 5600...
[2025-08-21 20:11:42,314][02158] Num frames 5700...
[2025-08-21 20:11:42,435][02158] Num frames 5800...
[2025-08-21 20:11:42,554][02158] Num frames 5900...
[2025-08-21 20:11:42,673][02158] Num frames 6000...
[2025-08-21 20:11:42,800][02158] Num frames 6100...
[2025-08-21 20:11:42,930][02158] Avg episode rewards: #0: 28.728, true rewards: #0: 12.328
[2025-08-21 20:11:42,931][02158] Avg episode reward: 28.728, avg true_objective: 12.328
[2025-08-21 20:11:42,975][02158] Num frames 6200...
[2025-08-21 20:11:43,102][02158] Num frames 6300...
[2025-08-21 20:11:43,223][02158] Num frames 6400...
[2025-08-21 20:11:43,343][02158] Num frames 6500...
[2025-08-21 20:11:43,463][02158] Num frames 6600...
[2025-08-21 20:11:43,581][02158] Num frames 6700...
[2025-08-21 20:11:43,698][02158] Num frames 6800...
[2025-08-21 20:11:43,794][02158] Avg episode rewards: #0: 26.060, true rewards: #0: 11.393
[2025-08-21 20:11:43,795][02158] Avg episode reward: 26.060, avg true_objective: 11.393
[2025-08-21 20:11:43,870][02158] Num frames 6900...
[2025-08-21 20:11:43,987][02158] Num frames 7000...
[2025-08-21 20:11:44,116][02158] Num frames 7100...
[2025-08-21 20:11:44,245][02158] Num frames 7200...
[2025-08-21 20:11:44,362][02158] Num frames 7300...
[2025-08-21 20:11:44,490][02158] Num frames 7400...
[2025-08-21 20:11:44,611][02158] Num frames 7500...
[2025-08-21 20:11:44,729][02158] Num frames 7600...
[2025-08-21 20:11:44,845][02158] Num frames 7700...
[2025-08-21 20:11:44,987][02158] Num frames 7800...
[2025-08-21 20:11:45,164][02158] Num frames 7900...
[2025-08-21 20:11:45,241][02158] Avg episode rewards: #0: 25.587, true rewards: #0: 11.301
[2025-08-21 20:11:45,242][02158] Avg episode reward: 25.587, avg true_objective: 11.301
[2025-08-21 20:11:45,389][02158] Num frames 8000...
[2025-08-21 20:11:45,555][02158] Num frames 8100...
[2025-08-21 20:11:45,721][02158] Num frames 8200...
[2025-08-21 20:11:45,885][02158] Num frames 8300...
[2025-08-21 20:11:46,047][02158] Num frames 8400...
[2025-08-21 20:11:46,221][02158] Num frames 8500...
[2025-08-21 20:11:46,392][02158] Num frames 8600...
[2025-08-21 20:11:46,562][02158] Num frames 8700...
[2025-08-21 20:11:46,730][02158] Num frames 8800...
[2025-08-21 20:11:46,903][02158] Num frames 8900...
[2025-08-21 20:11:47,083][02158] Num frames 9000...
[2025-08-21 20:11:47,206][02158] Num frames 9100...
[2025-08-21 20:11:47,337][02158] Num frames 9200...
[2025-08-21 20:11:47,458][02158] Num frames 9300...
[2025-08-21 20:11:47,614][02158] Avg episode rewards: #0: 26.855, true rewards: #0: 11.730
[2025-08-21 20:11:47,615][02158] Avg episode reward: 26.855, avg true_objective: 11.730
[2025-08-21 20:11:47,638][02158] Num frames 9400...
[2025-08-21 20:11:47,759][02158] Num frames 9500...
[2025-08-21 20:11:47,880][02158] Num frames 9600...
[2025-08-21 20:11:48,000][02158] Num frames 9700...
[2025-08-21 20:11:48,118][02158] Num frames 9800...
[2025-08-21 20:11:48,239][02158] Num frames 9900...
[2025-08-21 20:11:48,370][02158] Num frames 10000...
[2025-08-21 20:11:48,491][02158] Num frames 10100...
[2025-08-21 20:11:48,612][02158] Num frames 10200...
[2025-08-21 20:11:48,733][02158] Num frames 10300...
[2025-08-21 20:11:48,852][02158] Num frames 10400...
[2025-08-21 20:11:49,025][02158] Avg episode rewards: #0: 26.554, true rewards: #0: 11.666
[2025-08-21 20:11:49,026][02158] Avg episode reward: 26.554, avg true_objective: 11.666
[2025-08-21 20:11:49,029][02158] Num frames 10500...
[2025-08-21 20:11:49,151][02158] Num frames 10600...
[2025-08-21 20:11:49,272][02158] Num frames 10700...
[2025-08-21 20:11:49,400][02158] Num frames 10800...
[2025-08-21 20:11:49,519][02158] Num frames 10900...
[2025-08-21 20:11:49,636][02158] Num frames 11000...
[2025-08-21 20:11:49,755][02158] Num frames 11100...
[2025-08-21 20:11:49,873][02158] Num frames 11200...
[2025-08-21 20:11:49,990][02158] Num frames 11300...
[2025-08-21 20:11:50,109][02158] Num frames 11400...
[2025-08-21 20:11:50,232][02158] Num frames 11500...
[2025-08-21 20:11:50,363][02158] Num frames 11600...
[2025-08-21 20:11:50,486][02158] Num frames 11700...
[2025-08-21 20:11:50,606][02158] Num frames 11800...
[2025-08-21 20:11:50,728][02158] Num frames 11900...
[2025-08-21 20:11:50,848][02158] Num frames 12000...
[2025-08-21 20:11:50,969][02158] Num frames 12100...
[2025-08-21 20:11:51,091][02158] Num frames 12200...
[2025-08-21 20:11:51,215][02158] Num frames 12300...
[2025-08-21 20:11:51,334][02158] Num frames 12400...
[2025-08-21 20:11:51,466][02158] Num frames 12500...
[2025-08-21 20:11:51,636][02158] Avg episode rewards: #0: 29.799, true rewards: #0: 12.599
[2025-08-21 20:11:51,637][02158] Avg episode reward: 29.799, avg true_objective: 12.599
[2025-08-21 20:11:51,640][02158] Num frames 12600...
[2025-08-21 20:13:08,120][02158] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-08-21 20:13:08,158][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:13:08,159][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:13:08,160][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:13:08,161][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:13:08,162][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:13:08,164][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:13:08,165][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:13:08,166][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:13:08,167][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:13:08,168][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:13:08,169][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:13:08,170][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:13:08,170][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:13:08,171][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:13:08,172][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:13:08,199][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:13:08,201][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:13:08,212][02158] ConvEncoder: input_channels=3
[2025-08-21 20:13:08,248][02158] Conv encoder output size: 512
[2025-08-21 20:13:08,249][02158] Policy head output size: 512
[2025-08-21 20:13:08,266][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:13:08,267][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:13:08,291][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:13:08,432][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:13:08,454][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:13:08,456][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:14:02,225][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:14:02,226][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:14:02,227][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:14:02,228][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:14:02,229][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:14:02,229][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:14:02,230][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:14:02,231][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:14:02,232][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:14:02,232][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:14:02,234][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:14:02,235][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:14:02,236][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:14:02,237][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:14:02,238][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:14:02,266][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:14:02,267][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:14:02,282][02158] ConvEncoder: input_channels=3
[2025-08-21 20:14:02,316][02158] Conv encoder output size: 512
[2025-08-21 20:14:02,317][02158] Policy head output size: 512
[2025-08-21 20:14:02,333][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:14:02,335][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:14:02,357][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:14:02,359][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:14:02,380][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:14:02,382][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:14:06,535][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:14:06,536][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:14:06,537][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:14:06,537][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:14:06,538][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:14:06,539][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:14:06,540][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:14:06,541][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:14:06,542][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:14:06,543][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:14:06,543][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:14:06,544][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:14:06,545][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:14:06,546][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:14:06,547][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:14:06,572][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:14:06,574][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:14:06,582][02158] ConvEncoder: input_channels=3
[2025-08-21 20:14:06,613][02158] Conv encoder output size: 512
[2025-08-21 20:14:06,614][02158] Policy head output size: 512
[2025-08-21 20:14:06,630][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:14:06,633][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:14:06,654][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:14:06,656][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:14:06,678][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:14:06,681][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:15:10,526][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:15:10,527][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:15:10,528][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:15:10,529][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:15:10,530][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:15:10,531][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:15:10,532][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:15:10,533][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:15:10,534][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:15:10,534][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:15:10,535][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:15:10,536][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:15:10,537][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:15:10,538][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:15:10,539][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:15:10,564][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:15:10,565][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:15:10,574][02158] ConvEncoder: input_channels=3
[2025-08-21 20:15:10,603][02158] Conv encoder output size: 512
[2025-08-21 20:15:10,603][02158] Policy head output size: 512
[2025-08-21 20:15:10,619][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:15:10,621][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:15:10,642][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:15:10,644][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:15:10,666][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:15:10,667][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:15:35,955][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:15:35,956][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:15:35,957][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:15:35,957][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:15:35,958][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:15:35,959][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:15:35,960][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:15:35,961][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:15:35,962][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:15:35,963][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:15:35,964][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:15:35,965][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:15:35,965][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:15:35,966][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:15:35,967][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:15:35,992][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:15:35,993][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:15:36,003][02158] ConvEncoder: input_channels=3
[2025-08-21 20:15:36,032][02158] Conv encoder output size: 512
[2025-08-21 20:15:36,033][02158] Policy head output size: 512
[2025-08-21 20:15:36,049][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:15:36,051][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:15:36,073][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:15:36,074][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:15:36,097][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:15:36,099][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:17:06,806][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:17:06,807][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:17:06,809][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:17:06,810][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:17:06,811][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:17:06,811][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:17:06,812][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:17:06,813][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:17:06,815][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:17:06,815][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:17:06,816][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:17:06,817][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:17:06,818][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:17:06,819][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:17:06,820][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:17:06,843][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:17:06,844][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:17:06,853][02158] ConvEncoder: input_channels=3
[2025-08-21 20:17:06,883][02158] Conv encoder output size: 512
[2025-08-21 20:17:06,884][02158] Policy head output size: 512
[2025-08-21 20:17:06,900][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:17:06,902][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:17:06,924][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:17:06,926][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:17:06,947][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:17:06,950][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:16,541][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:18:16,542][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:18:16,543][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:18:16,545][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:18:16,545][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:18:16,546][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:18:16,547][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:18:16,548][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:18:16,549][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:18:16,550][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:18:16,551][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:18:16,551][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:18:16,552][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:18:16,553][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:18:16,554][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:18:16,579][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:18:16,580][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:18:16,589][02158] ConvEncoder: input_channels=3
[2025-08-21 20:18:16,619][02158] Conv encoder output size: 512
[2025-08-21 20:18:16,620][02158] Policy head output size: 512
[2025-08-21 20:18:16,635][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:16,637][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:16,658][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:16,660][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:16,682][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:16,684][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:40,179][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:18:40,180][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:18:40,181][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:18:40,183][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:18:40,184][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:18:40,185][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:18:40,186][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:18:40,187][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:18:40,188][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:18:40,189][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:18:40,190][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:18:40,191][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:18:40,192][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:18:40,192][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:18:40,193][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:18:40,216][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:18:40,217][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:18:40,226][02158] ConvEncoder: input_channels=3
[2025-08-21 20:18:40,255][02158] Conv encoder output size: 512
[2025-08-21 20:18:40,255][02158] Policy head output size: 512
[2025-08-21 20:18:40,271][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:40,273][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:40,295][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:40,298][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:40,320][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:40,323][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:53,859][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:18:53,860][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:18:53,861][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:18:53,862][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:18:53,863][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:18:53,864][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:18:53,865][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:18:53,866][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:18:53,867][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:18:53,868][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:18:53,871][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:18:53,872][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:18:53,873][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:18:53,874][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:18:53,875][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:18:53,909][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:18:53,912][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:18:53,922][02158] ConvEncoder: input_channels=3
[2025-08-21 20:18:53,952][02158] Conv encoder output size: 512
[2025-08-21 20:18:53,953][02158] Policy head output size: 512
[2025-08-21 20:18:53,969][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:53,971][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:53,997][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:53,999][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:18:54,020][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:18:54,022][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 973 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:19:09,813][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:19:09,814][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:19:09,815][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:19:09,816][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:19:09,817][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:19:09,818][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:19:09,819][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:19:09,820][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:19:09,820][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:19:09,821][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:19:09,822][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:19:09,823][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:19:09,824][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:19:09,825][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:19:09,826][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:19:09,849][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:19:09,850][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:19:09,858][02158] ConvEncoder: input_channels=3
[2025-08-21 20:19:09,887][02158] Conv encoder output size: 512
[2025-08-21 20:19:09,888][02158] Policy head output size: 512
[2025-08-21 20:19:09,903][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:19:09,905][02158] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-844920808.py", line 6, in _torch_load_allow_pickle
    return _orig_torch_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 972 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:19:09,926][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:19:09,928][02158] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-844920808.py", line 6, in _torch_load_allow_pickle
    return _orig_torch_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 972 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:19:09,950][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:19:09,953][02158] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-844920808.py", line 6, in _torch_load_allow_pickle
    return _orig_torch_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/ipython-input-1477929787.py", line 5, in _load_allow_pickle
    return _real_load(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  [Previous line repeated 972 more times]
RecursionError: maximum recursion depth exceeded
[2025-08-21 20:23:04,342][02158] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-08-21 20:23:04,343][02158] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-21 20:23:04,344][02158] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-21 20:23:04,345][02158] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-21 20:23:04,346][02158] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-21 20:23:04,347][02158] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-21 20:23:04,348][02158] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-21 20:23:04,348][02158] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-21 20:23:04,349][02158] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-21 20:23:04,350][02158] Adding new argument 'hf_repository'='jmartin233/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-21 20:23:04,353][02158] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-21 20:23:04,354][02158] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-21 20:23:04,355][02158] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-21 20:23:04,356][02158] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-21 20:23:04,357][02158] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-08-21 20:23:04,380][02158] RunningMeanStd input shape: (3, 72, 128)
[2025-08-21 20:23:04,381][02158] RunningMeanStd input shape: (1,)
[2025-08-21 20:23:04,390][02158] ConvEncoder: input_channels=3
[2025-08-21 20:23:04,420][02158] Conv encoder output size: 512
[2025-08-21 20:23:04,420][02158] Policy head output size: 512
[2025-08-21 20:23:04,437][02158] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth...
[2025-08-21 20:23:04,852][02158] Num frames 100...
[2025-08-21 20:23:04,972][02158] Num frames 200...
[2025-08-21 20:23:05,090][02158] Num frames 300...
[2025-08-21 20:23:05,209][02158] Num frames 400...
[2025-08-21 20:23:05,327][02158] Num frames 500...
[2025-08-21 20:23:05,444][02158] Num frames 600...
[2025-08-21 20:23:05,577][02158] Num frames 700...
[2025-08-21 20:23:05,699][02158] Num frames 800...
[2025-08-21 20:23:05,819][02158] Num frames 900...
[2025-08-21 20:23:05,939][02158] Num frames 1000...
[2025-08-21 20:23:06,060][02158] Num frames 1100...
[2025-08-21 20:23:06,177][02158] Num frames 1200...
[2025-08-21 20:23:06,299][02158] Num frames 1300...
[2025-08-21 20:23:06,410][02158] Avg episode rewards: #0: 32.440, true rewards: #0: 13.440
[2025-08-21 20:23:06,412][02158] Avg episode reward: 32.440, avg true_objective: 13.440
[2025-08-21 20:23:06,479][02158] Num frames 1400...
[2025-08-21 20:23:06,608][02158] Num frames 1500...
[2025-08-21 20:23:06,724][02158] Num frames 1600...
[2025-08-21 20:23:06,839][02158] Num frames 1700...
[2025-08-21 20:23:06,957][02158] Num frames 1800...
[2025-08-21 20:23:07,113][02158] Avg episode rewards: #0: 19.940, true rewards: #0: 9.440
[2025-08-21 20:23:07,114][02158] Avg episode reward: 19.940, avg true_objective: 9.440
[2025-08-21 20:23:07,137][02158] Num frames 1900...
[2025-08-21 20:23:07,314][02158] Num frames 2000...
[2025-08-21 20:23:07,489][02158] Num frames 2100...
[2025-08-21 20:23:07,677][02158] Num frames 2200...
[2025-08-21 20:23:07,861][02158] Num frames 2300...
[2025-08-21 20:23:08,025][02158] Num frames 2400...
[2025-08-21 20:23:08,189][02158] Num frames 2500...
[2025-08-21 20:23:08,283][02158] Avg episode rewards: #0: 17.740, true rewards: #0: 8.407
[2025-08-21 20:23:08,285][02158] Avg episode reward: 17.740, avg true_objective: 8.407
[2025-08-21 20:23:08,423][02158] Num frames 2600...
[2025-08-21 20:23:08,600][02158] Num frames 2700...
[2025-08-21 20:23:08,780][02158] Num frames 2800...
[2025-08-21 20:23:08,978][02158] Num frames 2900...
[2025-08-21 20:23:09,156][02158] Num frames 3000...
[2025-08-21 20:23:09,334][02158] Num frames 3100...
[2025-08-21 20:23:09,466][02158] Num frames 3200...
[2025-08-21 20:23:09,583][02158] Num frames 3300...
[2025-08-21 20:23:09,699][02158] Num frames 3400...
[2025-08-21 20:23:09,824][02158] Avg episode rewards: #0: 18.625, true rewards: #0: 8.625
[2025-08-21 20:23:09,825][02158] Avg episode reward: 18.625, avg true_objective: 8.625
[2025-08-21 20:23:09,886][02158] Num frames 3500...
[2025-08-21 20:23:10,008][02158] Num frames 3600...
[2025-08-21 20:23:10,126][02158] Num frames 3700...
[2025-08-21 20:23:10,247][02158] Num frames 3800...
[2025-08-21 20:23:10,366][02158] Num frames 3900...
[2025-08-21 20:23:10,485][02158] Num frames 4000...
[2025-08-21 20:23:10,606][02158] Num frames 4100...
[2025-08-21 20:23:10,725][02158] Num frames 4200...
[2025-08-21 20:23:10,859][02158] Num frames 4300...
[2025-08-21 20:23:10,993][02158] Num frames 4400...
[2025-08-21 20:23:11,122][02158] Num frames 4500...
[2025-08-21 20:23:11,259][02158] Avg episode rewards: #0: 20.130, true rewards: #0: 9.130
[2025-08-21 20:23:11,260][02158] Avg episode reward: 20.130, avg true_objective: 9.130
[2025-08-21 20:23:11,305][02158] Num frames 4600...
[2025-08-21 20:23:11,426][02158] Num frames 4700...
[2025-08-21 20:23:11,544][02158] Num frames 4800...
[2025-08-21 20:23:11,660][02158] Num frames 4900...
[2025-08-21 20:23:11,779][02158] Num frames 5000...
[2025-08-21 20:23:11,908][02158] Num frames 5100...
[2025-08-21 20:23:12,028][02158] Num frames 5200...
[2025-08-21 20:23:12,146][02158] Num frames 5300...
[2025-08-21 20:23:12,243][02158] Avg episode rewards: #0: 19.055, true rewards: #0: 8.888
[2025-08-21 20:23:12,244][02158] Avg episode reward: 19.055, avg true_objective: 8.888
[2025-08-21 20:23:12,324][02158] Num frames 5400...
[2025-08-21 20:23:12,443][02158] Num frames 5500...
[2025-08-21 20:23:12,563][02158] Num frames 5600...
[2025-08-21 20:23:12,682][02158] Num frames 5700...
[2025-08-21 20:23:12,801][02158] Num frames 5800...
[2025-08-21 20:23:12,932][02158] Num frames 5900...
[2025-08-21 20:23:13,053][02158] Num frames 6000...
[2025-08-21 20:23:13,178][02158] Num frames 6100...
[2025-08-21 20:23:13,300][02158] Num frames 6200...
[2025-08-21 20:23:13,421][02158] Num frames 6300...
[2025-08-21 20:23:13,539][02158] Num frames 6400...
[2025-08-21 20:23:13,659][02158] Avg episode rewards: #0: 19.933, true rewards: #0: 9.219
[2025-08-21 20:23:13,660][02158] Avg episode reward: 19.933, avg true_objective: 9.219
[2025-08-21 20:23:13,715][02158] Num frames 6500...
[2025-08-21 20:23:13,829][02158] Num frames 6600...
[2025-08-21 20:23:13,956][02158] Num frames 6700...
[2025-08-21 20:23:14,073][02158] Num frames 6800...
[2025-08-21 20:23:14,192][02158] Num frames 6900...
[2025-08-21 20:23:14,321][02158] Avg episode rewards: #0: 18.456, true rewards: #0: 8.706
[2025-08-21 20:23:14,322][02158] Avg episode reward: 18.456, avg true_objective: 8.706
[2025-08-21 20:23:14,365][02158] Num frames 7000...
[2025-08-21 20:23:14,480][02158] Num frames 7100...
[2025-08-21 20:23:14,597][02158] Num frames 7200...
[2025-08-21 20:23:14,713][02158] Num frames 7300...
[2025-08-21 20:23:14,830][02158] Num frames 7400...
[2025-08-21 20:23:14,965][02158] Num frames 7500...
[2025-08-21 20:23:15,091][02158] Num frames 7600...
[2025-08-21 20:23:15,210][02158] Num frames 7700...
[2025-08-21 20:23:15,330][02158] Num frames 7800...
[2025-08-21 20:23:15,482][02158] Avg episode rewards: #0: 18.753, true rewards: #0: 8.753
[2025-08-21 20:23:15,483][02158] Avg episode reward: 18.753, avg true_objective: 8.753
[2025-08-21 20:23:15,511][02158] Num frames 7900...
[2025-08-21 20:23:15,628][02158] Num frames 8000...
[2025-08-21 20:23:15,746][02158] Num frames 8100...
[2025-08-21 20:23:15,860][02158] Num frames 8200...
[2025-08-21 20:23:15,995][02158] Num frames 8300...
[2025-08-21 20:23:16,116][02158] Num frames 8400...
[2025-08-21 20:23:16,243][02158] Num frames 8500...
[2025-08-21 20:23:16,364][02158] Num frames 8600...
[2025-08-21 20:23:16,491][02158] Num frames 8700...
[2025-08-21 20:23:16,613][02158] Num frames 8800...
[2025-08-21 20:23:16,737][02158] Num frames 8900...
[2025-08-21 20:23:16,861][02158] Num frames 9000...
[2025-08-21 20:23:16,999][02158] Num frames 9100...
[2025-08-21 20:23:17,124][02158] Num frames 9200...
[2025-08-21 20:23:17,246][02158] Num frames 9300...
[2025-08-21 20:23:17,333][02158] Avg episode rewards: #0: 20.325, true rewards: #0: 9.325
[2025-08-21 20:23:17,334][02158] Avg episode reward: 20.325, avg true_objective: 9.325
[2025-08-21 20:24:14,103][02158] Replay video saved to /content/train_dir/default_experiment/replay.mp4!