	[2025-04-08 16:58:17,065][02736] Saving configuration to /content/train_dir/default_experiment/config.json...
	[2025-04-08 16:58:17,067][02736] Rollout worker 0 uses device cpu
	[2025-04-08 16:58:17,069][02736] Rollout worker 1 uses device cpu
	[2025-04-08 16:58:17,070][02736] Rollout worker 2 uses device cpu
	[2025-04-08 16:58:17,071][02736] Rollout worker 3 uses device cpu
	[2025-04-08 16:58:17,072][02736] Rollout worker 4 uses device cpu
	[2025-04-08 16:58:17,074][02736] Rollout worker 5 uses device cpu
	[2025-04-08 16:58:17,074][02736] Rollout worker 6 uses device cpu
	[2025-04-08 16:58:17,075][02736] Rollout worker 7 uses device cpu
	[2025-04-08 16:58:17,228][02736] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-04-08 16:58:17,230][02736] InferenceWorker_p0-w0: min num requests: 2
	[2025-04-08 16:58:17,261][02736] Starting all processes...
	[2025-04-08 16:58:17,261][02736] Starting process learner_proc0
	[2025-04-08 16:58:17,312][02736] Starting all processes...
	[2025-04-08 16:58:17,321][02736] Starting process inference_proc0-0
	[2025-04-08 16:58:17,321][02736] Starting process rollout_proc0
	[2025-04-08 16:58:17,324][02736] Starting process rollout_proc1
	[2025-04-08 16:58:17,324][02736] Starting process rollout_proc2
	[2025-04-08 16:58:17,324][02736] Starting process rollout_proc3
	[2025-04-08 16:58:17,324][02736] Starting process rollout_proc4
	[2025-04-08 16:58:17,324][02736] Starting process rollout_proc5
	[2025-04-08 16:58:17,324][02736] Starting process rollout_proc6
	[2025-04-08 16:58:17,324][02736] Starting process rollout_proc7
	[2025-04-08 16:58:33,276][02956] Worker 7 uses CPU cores [1]
	[2025-04-08 16:58:33,492][02952] Worker 3 uses CPU cores [1]
	[2025-04-08 16:58:33,661][02951] Worker 2 uses CPU cores [0]
	[2025-04-08 16:58:33,731][02949] Worker 0 uses CPU cores [0]
	[2025-04-08 16:58:33,738][02953] Worker 4 uses CPU cores [0]
	[2025-04-08 16:58:33,913][02935] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-04-08 16:58:33,914][02935] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
	[2025-04-08 16:58:33,935][02955] Worker 6 uses CPU cores [0]
	[2025-04-08 16:58:33,956][02935] Num visible devices: 1
	[2025-04-08 16:58:33,966][02935] Starting seed is not provided
	[2025-04-08 16:58:33,966][02935] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-04-08 16:58:33,967][02935] Initializing actor-critic model on device cuda:0
	[2025-04-08 16:58:33,968][02935] RunningMeanStd input shape: (3, 72, 128)
	[2025-04-08 16:58:33,972][02935] RunningMeanStd input shape: (1,)
	[2025-04-08 16:58:34,002][02954] Worker 5 uses CPU cores [1]
	[2025-04-08 16:58:34,005][02935] ConvEncoder: input_channels=3
	[2025-04-08 16:58:34,042][02950] Worker 1 uses CPU cores [1]
	[2025-04-08 16:58:34,155][02948] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-04-08 16:58:34,156][02948] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
	[2025-04-08 16:58:34,180][02948] Num visible devices: 1
	[2025-04-08 16:58:34,374][02935] Conv encoder output size: 512
	[2025-04-08 16:58:34,374][02935] Policy head output size: 512
	[2025-04-08 16:58:34,440][02935] Created Actor Critic model with architecture:
	[2025-04-08 16:58:34,440][02935] ActorCriticSharedWeights(
	  (obs_normalizer): ObservationNormalizer(
	    (running_mean_std): RunningMeanStdDictInPlace(
	      (running_mean_std): ModuleDict(
	        (obs): RunningMeanStdInPlace()
	      )
	    )
	  )
	  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
	  (encoder): VizdoomEncoder(
	    (basic_encoder): ConvEncoder(
	      (enc): RecursiveScriptModule(
	        original_name=ConvEncoderImpl
	        (conv_head): RecursiveScriptModule(
	          original_name=Sequential
	          (0): RecursiveScriptModule(original_name=Conv2d)
	          (1): RecursiveScriptModule(original_name=ELU)
	          (2): RecursiveScriptModule(original_name=Conv2d)
	          (3): RecursiveScriptModule(original_name=ELU)
	          (4): RecursiveScriptModule(original_name=Conv2d)
	          (5): RecursiveScriptModule(original_name=ELU)
	        )
	        (mlp_layers): RecursiveScriptModule(
	          original_name=Sequential
	          (0): RecursiveScriptModule(original_name=Linear)
	          (1): RecursiveScriptModule(original_name=ELU)
	        )
	      )
	    )
	  )
	  (core): ModelCoreRNN(
	    (core): GRU(512, 512)
	  )
	  (decoder): MlpDecoder(
	    (mlp): Identity()
	  )
	  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
	  (action_parameterization): ActionParameterizationDefault(
	    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
	  )
	)
	[2025-04-08 16:58:34,819][02935] Using optimizer <class 'torch.optim.adam.Adam'>
	[2025-04-08 16:58:37,221][02736] Heartbeat connected on Batcher_0
	[2025-04-08 16:58:37,229][02736] Heartbeat connected on InferenceWorker_p0-w0
	[2025-04-08 16:58:37,236][02736] Heartbeat connected on RolloutWorker_w0
	[2025-04-08 16:58:37,240][02736] Heartbeat connected on RolloutWorker_w1
	[2025-04-08 16:58:37,244][02736] Heartbeat connected on RolloutWorker_w2
	[2025-04-08 16:58:37,247][02736] Heartbeat connected on RolloutWorker_w3
	[2025-04-08 16:58:37,252][02736] Heartbeat connected on RolloutWorker_w4
	[2025-04-08 16:58:37,255][02736] Heartbeat connected on RolloutWorker_w5
	[2025-04-08 16:58:37,257][02736] Heartbeat connected on RolloutWorker_w6
	[2025-04-08 16:58:37,260][02736] Heartbeat connected on RolloutWorker_w7
	[2025-04-08 16:58:39,708][02935] No checkpoints found
	[2025-04-08 16:58:39,708][02935] Did not load from checkpoint, starting from scratch!
	[2025-04-08 16:58:39,708][02935] Initialized policy 0 weights for model version 0
	[2025-04-08 16:58:39,711][02935] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-04-08 16:58:39,722][02935] LearnerWorker_p0 finished initialization!
	[2025-04-08 16:58:39,744][02736] Heartbeat connected on LearnerWorker_p0
	[2025-04-08 16:58:39,936][02948] RunningMeanStd input shape: (3, 72, 128)
	[2025-04-08 16:58:39,938][02948] RunningMeanStd input shape: (1,)
	[2025-04-08 16:58:39,953][02948] ConvEncoder: input_channels=3
	[2025-04-08 16:58:40,052][02948] Conv encoder output size: 512
	[2025-04-08 16:58:40,052][02948] Policy head output size: 512
	[2025-04-08 16:58:40,087][02736] Inference worker 0-0 is ready!
	[2025-04-08 16:58:40,087][02736] All inference workers are ready! Signal rollout workers to start!
	[2025-04-08 16:58:40,320][02949] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:40,322][02951] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:40,364][02950] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:40,369][02956] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:40,374][02955] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:40,384][02952] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:40,386][02954] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:40,397][02953] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 16:58:42,011][02956] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:42,013][02954] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:42,010][02952] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:42,010][02949] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:42,013][02951] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:42,014][02955] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:42,753][02953] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:42,755][02949] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:43,011][02736] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-04-08 16:58:43,167][02954] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:43,169][02952] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:43,171][02956] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:43,325][02953] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:44,116][02955] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:44,149][02949] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:44,476][02950] Decorrelating experience for 0 frames...
	[2025-04-08 16:58:44,832][02952] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:44,833][02956] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:45,101][02951] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:45,230][02949] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:45,955][02950] Decorrelating experience for 32 frames...
	[2025-04-08 16:58:45,992][02954] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:46,093][02955] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:46,579][02956] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:47,258][02951] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:48,011][02736] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-04-08 16:58:48,014][02736] Avg episode reward: [(0, '0.480')]
	[2025-04-08 16:58:48,390][02952] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:48,715][02954] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:48,763][02953] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:49,018][02955] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:52,080][02935] Signal inference workers to stop experience collection...
	[2025-04-08 16:58:52,092][02948] InferenceWorker_p0-w0: stopping experience collection
	[2025-04-08 16:58:52,185][02950] Decorrelating experience for 64 frames...
	[2025-04-08 16:58:52,360][02951] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:52,589][02953] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:52,638][02950] Decorrelating experience for 96 frames...
	[2025-04-08 16:58:53,011][02736] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 217.8. Samples: 2178. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-04-08 16:58:53,014][02736] Avg episode reward: [(0, '2.906')]
	[2025-04-08 16:58:54,091][02935] Signal inference workers to resume experience collection...
	[2025-04-08 16:58:54,092][02948] InferenceWorker_p0-w0: resuming experience collection
	[2025-04-08 16:58:58,011][02736] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 20480. Throughput: 0: 166.7. Samples: 2500. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 16:58:58,015][02736] Avg episode reward: [(0, '3.518')]
	[2025-04-08 16:59:02,430][02948] Updated weights for policy 0, policy_version 10 (0.0029)
	[2025-04-08 16:59:03,011][02736] Fps is (10 sec: 4096.0, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 40960. Throughput: 0: 484.9. Samples: 9698. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 16:59:03,013][02736] Avg episode reward: [(0, '3.950')]
	[2025-04-08 16:59:08,011][02736] Fps is (10 sec: 4096.0, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 61440. Throughput: 0: 592.9. Samples: 14822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 16:59:08,013][02736] Avg episode reward: [(0, '4.388')]
	[2025-04-08 16:59:12,827][02948] Updated weights for policy 0, policy_version 20 (0.0053)
	[2025-04-08 16:59:13,011][02736] Fps is (10 sec: 4096.0, 60 sec: 2730.7, 300 sec: 2730.7). Total num frames: 81920. Throughput: 0: 613.5. Samples: 18406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 16:59:13,015][02736] Avg episode reward: [(0, '4.329')]
	[2025-04-08 16:59:18,011][02736] Fps is (10 sec: 4096.0, 60 sec: 2925.7, 300 sec: 2925.7). Total num frames: 102400. Throughput: 0: 711.1. Samples: 24890. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 16:59:18,015][02736] Avg episode reward: [(0, '4.318')]
	[2025-04-08 16:59:18,018][02935] Saving new best policy, reward=4.318!
	[2025-04-08 16:59:22,998][02948] Updated weights for policy 0, policy_version 30 (0.0018)
	[2025-04-08 16:59:23,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 122880. Throughput: 0: 754.5. Samples: 30182. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 16:59:23,015][02736] Avg episode reward: [(0, '4.409')]
	[2025-04-08 16:59:23,029][02935] Saving new best policy, reward=4.409!
	[2025-04-08 16:59:28,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3185.8, 300 sec: 3185.8). Total num frames: 143360. Throughput: 0: 751.2. Samples: 33804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 16:59:28,012][02736] Avg episode reward: [(0, '4.421')]
	[2025-04-08 16:59:28,014][02935] Saving new best policy, reward=4.421!
	[2025-04-08 16:59:33,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3194.9, 300 sec: 3194.9). Total num frames: 159744. Throughput: 0: 896.3. Samples: 40344. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 16:59:33,012][02736] Avg episode reward: [(0, '4.383')]
	[2025-04-08 16:59:33,141][02948] Updated weights for policy 0, policy_version 40 (0.0022)
	[2025-04-08 16:59:38,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3202.3, 300 sec: 3202.3). Total num frames: 176128. Throughput: 0: 927.0. Samples: 43894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 16:59:38,012][02736] Avg episode reward: [(0, '4.353')]
	[2025-04-08 16:59:43,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 200704. Throughput: 0: 999.7. Samples: 47488. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 16:59:43,012][02736] Avg episode reward: [(0, '4.359')]
	[2025-04-08 16:59:43,701][02948] Updated weights for policy 0, policy_version 50 (0.0014)
	[2025-04-08 16:59:48,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3339.8). Total num frames: 217088. Throughput: 0: 988.8. Samples: 54192. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 16:59:48,014][02736] Avg episode reward: [(0, '4.326')]
	[2025-04-08 16:59:53,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3335.3). Total num frames: 233472. Throughput: 0: 975.9. Samples: 58738. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 16:59:53,013][02736] Avg episode reward: [(0, '4.376')]
	[2025-04-08 16:59:55,433][02948] Updated weights for policy 0, policy_version 60 (0.0032)
	[2025-04-08 16:59:58,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3386.0). Total num frames: 253952. Throughput: 0: 960.5. Samples: 61630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 16:59:58,018][02736] Avg episode reward: [(0, '4.425')]
	[2025-04-08 16:59:58,055][02935] Saving new best policy, reward=4.425!
	[2025-04-08 17:00:03,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 970.4. Samples: 68560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:00:03,021][02736] Avg episode reward: [(0, '4.581')]
	[2025-04-08 17:00:03,075][02935] Saving new best policy, reward=4.581!
	[2025-04-08 17:00:06,104][02948] Updated weights for policy 0, policy_version 70 (0.0035)
	[2025-04-08 17:00:08,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3469.6). Total num frames: 294912. Throughput: 0: 960.4. Samples: 73400. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:00:08,017][02736] Avg episode reward: [(0, '4.561')]
	[2025-04-08 17:00:13,011][02736] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3549.9). Total num frames: 319488. Throughput: 0: 961.4. Samples: 77068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:00:13,012][02736] Avg episode reward: [(0, '4.465')]
	[2025-04-08 17:00:13,017][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth...
	[2025-04-08 17:00:14,702][02948] Updated weights for policy 0, policy_version 80 (0.0019)
	[2025-04-08 17:00:18,011][02736] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3578.6). Total num frames: 339968. Throughput: 0: 977.3. Samples: 84322. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:00:18,014][02736] Avg episode reward: [(0, '4.431')]
	[2025-04-08 17:00:23,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3563.5). Total num frames: 356352. Throughput: 0: 1011.6. Samples: 89414. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:00:23,015][02736] Avg episode reward: [(0, '4.484')]
	[2025-04-08 17:00:24,903][02948] Updated weights for policy 0, policy_version 90 (0.0028)
	[2025-04-08 17:00:28,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3627.9). Total num frames: 380928. Throughput: 0: 1011.9. Samples: 93024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:00:28,012][02736] Avg episode reward: [(0, '4.564')]
	[2025-04-08 17:00:33,013][02736] Fps is (10 sec: 4504.7, 60 sec: 4027.6, 300 sec: 3649.1). Total num frames: 401408. Throughput: 0: 1020.8. Samples: 100130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:00:33,015][02736] Avg episode reward: [(0, '4.398')]
	[2025-04-08 17:00:34,893][02948] Updated weights for policy 0, policy_version 100 (0.0013)
	[2025-04-08 17:00:38,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3668.6). Total num frames: 421888. Throughput: 0: 1035.7. Samples: 105346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:00:38,015][02736] Avg episode reward: [(0, '4.495')]
	[2025-04-08 17:00:43,011][02736] Fps is (10 sec: 4506.5, 60 sec: 4096.0, 300 sec: 3720.5). Total num frames: 446464. Throughput: 0: 1053.8. Samples: 109050. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:00:43,015][02736] Avg episode reward: [(0, '4.606')]
	[2025-04-08 17:00:43,023][02935] Saving new best policy, reward=4.606!
	[2025-04-08 17:00:43,680][02948] Updated weights for policy 0, policy_version 110 (0.0014)
	[2025-04-08 17:00:48,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3735.5). Total num frames: 466944. Throughput: 0: 1054.4. Samples: 116006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:00:48,013][02736] Avg episode reward: [(0, '4.487')]
	[2025-04-08 17:00:53,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3749.4). Total num frames: 487424. Throughput: 0: 1065.2. Samples: 121332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:00:53,016][02736] Avg episode reward: [(0, '4.319')]
	[2025-04-08 17:00:53,850][02948] Updated weights for policy 0, policy_version 120 (0.0022)
	[2025-04-08 17:00:58,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3762.3). Total num frames: 507904. Throughput: 0: 1065.8. Samples: 125030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:00:58,016][02736] Avg episode reward: [(0, '4.414')]
	[2025-04-08 17:01:03,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3774.2). Total num frames: 528384. Throughput: 0: 1054.1. Samples: 131758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:01:03,015][02736] Avg episode reward: [(0, '4.672')]
	[2025-04-08 17:01:03,018][02935] Saving new best policy, reward=4.672!
	[2025-04-08 17:01:03,958][02948] Updated weights for policy 0, policy_version 130 (0.0022)
	[2025-04-08 17:01:08,011][02736] Fps is (10 sec: 4095.9, 60 sec: 4232.5, 300 sec: 3785.3). Total num frames: 548864. Throughput: 0: 1062.4. Samples: 137222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:01:08,013][02736] Avg episode reward: [(0, '4.682')]
	[2025-04-08 17:01:08,014][02935] Saving new best policy, reward=4.682!
	[2025-04-08 17:01:12,725][02948] Updated weights for policy 0, policy_version 140 (0.0024)
	[2025-04-08 17:01:13,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3822.9). Total num frames: 573440. Throughput: 0: 1060.6. Samples: 140750. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:01:13,013][02736] Avg episode reward: [(0, '4.546')]
	[2025-04-08 17:01:18,016][02736] Fps is (10 sec: 4094.2, 60 sec: 4163.9, 300 sec: 3805.2). Total num frames: 589824. Throughput: 0: 1049.9. Samples: 147380. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:01:18,017][02736] Avg episode reward: [(0, '4.377')]
	[2025-04-08 17:01:22,849][02948] Updated weights for policy 0, policy_version 150 (0.0018)
	[2025-04-08 17:01:23,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3840.0). Total num frames: 614400. Throughput: 0: 1062.6. Samples: 153164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:01:23,013][02736] Avg episode reward: [(0, '4.250')]
	[2025-04-08 17:01:28,011][02736] Fps is (10 sec: 4507.7, 60 sec: 4232.5, 300 sec: 3847.8). Total num frames: 634880. Throughput: 0: 1062.3. Samples: 156852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:01:28,013][02736] Avg episode reward: [(0, '4.467')]
	[2025-04-08 17:01:32,853][02948] Updated weights for policy 0, policy_version 160 (0.0040)
	[2025-04-08 17:01:33,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.7, 300 sec: 3855.1). Total num frames: 655360. Throughput: 0: 1047.2. Samples: 163130. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:01:33,013][02736] Avg episode reward: [(0, '4.457')]
	[2025-04-08 17:01:38,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3861.9). Total num frames: 675840. Throughput: 0: 1062.0. Samples: 169120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:01:38,016][02736] Avg episode reward: [(0, '4.457')]
	[2025-04-08 17:01:41,499][02948] Updated weights for policy 0, policy_version 170 (0.0015)
	[2025-04-08 17:01:43,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 3891.2). Total num frames: 700416. Throughput: 0: 1061.6. Samples: 172802. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:01:43,016][02736] Avg episode reward: [(0, '4.478')]
	[2025-04-08 17:01:48,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3874.6). Total num frames: 716800. Throughput: 0: 1049.4. Samples: 178980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:01:48,013][02736] Avg episode reward: [(0, '4.437')]
	[2025-04-08 17:01:51,746][02948] Updated weights for policy 0, policy_version 180 (0.0031)
	[2025-04-08 17:01:53,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3902.0). Total num frames: 741376. Throughput: 0: 1066.3. Samples: 185206. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:01:53,015][02736] Avg episode reward: [(0, '4.498')]
	[2025-04-08 17:01:58,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 3928.0). Total num frames: 765952. Throughput: 0: 1069.7. Samples: 188886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:01:58,014][02736] Avg episode reward: [(0, '4.563')]
	[2025-04-08 17:02:01,324][02948] Updated weights for policy 0, policy_version 190 (0.0022)
	[2025-04-08 17:02:03,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3911.7). Total num frames: 782336. Throughput: 0: 1052.9. Samples: 194756. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-04-08 17:02:03,020][02736] Avg episode reward: [(0, '4.671')]
	[2025-04-08 17:02:08,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3936.2). Total num frames: 806912. Throughput: 0: 1068.1. Samples: 201228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:02:08,015][02736] Avg episode reward: [(0, '4.503')]
	[2025-04-08 17:02:10,452][02948] Updated weights for policy 0, policy_version 200 (0.0012)
	[2025-04-08 17:02:13,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 3959.5). Total num frames: 831488. Throughput: 0: 1067.6. Samples: 204892. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:02:13,013][02736] Avg episode reward: [(0, '4.559')]
	[2025-04-08 17:02:13,019][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000203_831488.pth...
	[2025-04-08 17:02:18,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4232.9, 300 sec: 3924.5). Total num frames: 843776. Throughput: 0: 1055.7. Samples: 210638. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:02:18,013][02736] Avg episode reward: [(0, '4.632')]
	[2025-04-08 17:02:20,703][02948] Updated weights for policy 0, policy_version 210 (0.0016)
	[2025-04-08 17:02:23,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 3947.1). Total num frames: 868352. Throughput: 0: 1073.2. Samples: 217412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:02:23,015][02736] Avg episode reward: [(0, '4.548')]
	[2025-04-08 17:02:28,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 3968.6). Total num frames: 892928. Throughput: 0: 1074.5. Samples: 221154. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-04-08 17:02:28,012][02736] Avg episode reward: [(0, '4.430')]
	[2025-04-08 17:02:30,140][02948] Updated weights for policy 0, policy_version 220 (0.0029)
	[2025-04-08 17:02:33,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 3953.5). Total num frames: 909312. Throughput: 0: 1060.2. Samples: 226690. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:02:33,012][02736] Avg episode reward: [(0, '4.296')]
	[2025-04-08 17:02:38,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3974.0). Total num frames: 933888. Throughput: 0: 1074.9. Samples: 233578. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:02:38,012][02736] Avg episode reward: [(0, '4.378')]
	[2025-04-08 17:02:39,184][02948] Updated weights for policy 0, policy_version 230 (0.0018)
	[2025-04-08 17:02:43,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 3993.6). Total num frames: 958464. Throughput: 0: 1076.3. Samples: 237320. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:02:43,012][02736] Avg episode reward: [(0, '4.459')]
	[2025-04-08 17:02:48,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3979.0). Total num frames: 974848. Throughput: 0: 1063.9. Samples: 242632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:02:48,013][02736] Avg episode reward: [(0, '4.514')]
	[2025-04-08 17:02:49,473][02948] Updated weights for policy 0, policy_version 240 (0.0015)
	[2025-04-08 17:02:53,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3997.7). Total num frames: 999424. Throughput: 0: 1077.2. Samples: 249704. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:02:53,016][02736] Avg episode reward: [(0, '4.629')]
	[2025-04-08 17:02:58,013][02736] Fps is (10 sec: 4504.6, 60 sec: 4232.4, 300 sec: 3999.6). Total num frames: 1019904. Throughput: 0: 1077.7. Samples: 253392. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:02:58,015][02736] Avg episode reward: [(0, '4.594')]
	[2025-04-08 17:02:58,549][02948] Updated weights for policy 0, policy_version 250 (0.0019)
	[2025-04-08 17:03:03,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 4001.5). Total num frames: 1040384. Throughput: 0: 1061.6. Samples: 258412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:03:03,016][02736] Avg episode reward: [(0, '4.547')]
	[2025-04-08 17:03:08,014][02736] Fps is (10 sec: 4095.9, 60 sec: 4232.4, 300 sec: 4003.2). Total num frames: 1060864. Throughput: 0: 1070.3. Samples: 265580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:03:08,017][02736] Avg episode reward: [(0, '4.602')]
	[2025-04-08 17:03:08,076][02948] Updated weights for policy 0, policy_version 260 (0.0031)
	[2025-04-08 17:03:13,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4005.0). Total num frames: 1081344. Throughput: 0: 1066.4. Samples: 269142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:03:13,018][02736] Avg episode reward: [(0, '4.913')]
	[2025-04-08 17:03:13,053][02935] Saving new best policy, reward=4.913!
	[2025-04-08 17:03:18,011][02736] Fps is (10 sec: 4097.1, 60 sec: 4300.8, 300 sec: 4006.6). Total num frames: 1101824. Throughput: 0: 1050.0. Samples: 273938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:03:18,012][02736] Avg episode reward: [(0, '4.989')]
	[2025-04-08 17:03:18,016][02935] Saving new best policy, reward=4.989!
	[2025-04-08 17:03:18,719][02948] Updated weights for policy 0, policy_version 270 (0.0024)
	[2025-04-08 17:03:23,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4022.9). Total num frames: 1126400. Throughput: 0: 1057.3. Samples: 281158. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:03:23,015][02736] Avg episode reward: [(0, '4.921')]
	[2025-04-08 17:03:28,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4009.8). Total num frames: 1142784. Throughput: 0: 1054.8. Samples: 284788. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:03:28,014][02736] Avg episode reward: [(0, '4.881')]
	[2025-04-08 17:03:28,143][02948] Updated weights for policy 0, policy_version 280 (0.0020)
	[2025-04-08 17:03:33,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 4011.3). Total num frames: 1163264. Throughput: 0: 1044.8. Samples: 289650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-04-08 17:03:33,016][02736] Avg episode reward: [(0, '4.654')]
	[2025-04-08 17:03:37,629][02948] Updated weights for policy 0, policy_version 290 (0.0023)
	[2025-04-08 17:03:38,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4026.6). Total num frames: 1187840. Throughput: 0: 1050.0. Samples: 296954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:03:38,013][02736] Avg episode reward: [(0, '4.727')]
	[2025-04-08 17:03:43,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 1208320. Throughput: 0: 1047.6. Samples: 300532. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:03:43,016][02736] Avg episode reward: [(0, '4.846')]
	[2025-04-08 17:03:47,937][02948] Updated weights for policy 0, policy_version 300 (0.0013)
	[2025-04-08 17:03:48,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4165.4). Total num frames: 1228800. Throughput: 0: 1048.4. Samples: 305590. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:03:48,013][02736] Avg episode reward: [(0, '4.953')]
	[2025-04-08 17:03:53,012][02736] Fps is (10 sec: 4505.4, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 1253376. Throughput: 0: 1052.3. Samples: 312930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:03:53,014][02736] Avg episode reward: [(0, '4.874')]
	[2025-04-08 17:03:56,790][02948] Updated weights for policy 0, policy_version 310 (0.0019)
	[2025-04-08 17:03:58,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4165.4). Total num frames: 1269760. Throughput: 0: 1055.1. Samples: 316622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:03:58,014][02736] Avg episode reward: [(0, '5.012')]
	[2025-04-08 17:03:58,019][02935] Saving new best policy, reward=5.012!
	[2025-04-08 17:04:03,011][02736] Fps is (10 sec: 3686.5, 60 sec: 4164.3, 300 sec: 4165.4). Total num frames: 1290240. Throughput: 0: 1058.4. Samples: 321568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:04:03,021][02736] Avg episode reward: [(0, '5.273')]
	[2025-04-08 17:04:03,033][02935] Saving new best policy, reward=5.273!
	[2025-04-08 17:04:06,603][02948] Updated weights for policy 0, policy_version 320 (0.0035)
	[2025-04-08 17:04:08,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4232.7, 300 sec: 4179.3). Total num frames: 1314816. Throughput: 0: 1058.8. Samples: 328804. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:04:08,016][02736] Avg episode reward: [(0, '5.084')]
	[2025-04-08 17:04:13,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 1335296. Throughput: 0: 1057.2. Samples: 332364. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:04:13,017][02736] Avg episode reward: [(0, '4.833')]
	[2025-04-08 17:04:13,022][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000326_1335296.pth...
	[2025-04-08 17:04:13,189][02935] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000078_319488.pth
	[2025-04-08 17:04:17,072][02948] Updated weights for policy 0, policy_version 330 (0.0044)
	[2025-04-08 17:04:18,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4179.3). Total num frames: 1355776. Throughput: 0: 1062.8. Samples: 337474. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-04-08 17:04:18,015][02736] Avg episode reward: [(0, '4.919')]
	[2025-04-08 17:04:23,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 1380352. Throughput: 0: 1062.6. Samples: 344772. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:04:23,020][02736] Avg episode reward: [(0, '5.170')]
	[2025-04-08 17:04:25,951][02948] Updated weights for policy 0, policy_version 340 (0.0023)
	[2025-04-08 17:04:28,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4193.2). Total num frames: 1396736. Throughput: 0: 1059.5. Samples: 348210. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
	[2025-04-08 17:04:28,014][02736] Avg episode reward: [(0, '5.362')]
	[2025-04-08 17:04:28,019][02935] Saving new best policy, reward=5.362!
	[2025-04-08 17:04:33,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 4207.1). Total num frames: 1417216. Throughput: 0: 1061.0. Samples: 353336. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:04:33,015][02736] Avg episode reward: [(0, '5.284')]
	[2025-04-08 17:04:35,634][02948] Updated weights for policy 0, policy_version 350 (0.0028)
	[2025-04-08 17:04:38,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4232.5, 300 sec: 4207.1). Total num frames: 1441792. Throughput: 0: 1062.5. Samples: 360744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:04:38,017][02736] Avg episode reward: [(0, '4.903')]
	[2025-04-08 17:04:43,014][02736] Fps is (10 sec: 3685.4, 60 sec: 4095.8, 300 sec: 4193.2). Total num frames: 1454080. Throughput: 0: 1030.0. Samples: 362976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:04:43,018][02736] Avg episode reward: [(0, '4.971')]
	[2025-04-08 17:04:47,794][02948] Updated weights for policy 0, policy_version 360 (0.0022)
	[2025-04-08 17:04:48,011][02736] Fps is (10 sec: 3276.8, 60 sec: 4096.0, 300 sec: 4207.1). Total num frames: 1474560. Throughput: 0: 1013.8. Samples: 367188. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:04:48,016][02736] Avg episode reward: [(0, '5.284')]
	[2025-04-08 17:04:53,011][02736] Fps is (10 sec: 4506.8, 60 sec: 4096.0, 300 sec: 4221.0). Total num frames: 1499136. Throughput: 0: 1015.6. Samples: 374508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:04:53,016][02736] Avg episode reward: [(0, '5.321')]
	[2025-04-08 17:04:56,607][02948] Updated weights for policy 0, policy_version 370 (0.0034)
	[2025-04-08 17:04:58,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4221.0). Total num frames: 1519616. Throughput: 0: 1018.1. Samples: 378180. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:04:58,016][02736] Avg episode reward: [(0, '5.233')]
	[2025-04-08 17:05:03,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4207.1). Total num frames: 1536000. Throughput: 0: 1010.9. Samples: 382964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:05:03,016][02736] Avg episode reward: [(0, '5.377')]
	[2025-04-08 17:05:03,022][02935] Saving new best policy, reward=5.377!
	[2025-04-08 17:05:06,841][02948] Updated weights for policy 0, policy_version 380 (0.0035)
	[2025-04-08 17:05:08,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4207.1). Total num frames: 1560576. Throughput: 0: 1004.1. Samples: 389958. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:05:08,016][02736] Avg episode reward: [(0, '5.311')]
	[2025-04-08 17:05:13,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4193.2). Total num frames: 1576960. Throughput: 0: 1002.0. Samples: 393298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:05:13,013][02736] Avg episode reward: [(0, '5.190')]
	[2025-04-08 17:05:17,439][02948] Updated weights for policy 0, policy_version 390 (0.0027)
	[2025-04-08 17:05:18,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4207.1). Total num frames: 1597440. Throughput: 0: 996.7. Samples: 398188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:05:18,013][02736] Avg episode reward: [(0, '5.203')]
	[2025-04-08 17:05:23,011][02736] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4207.1). Total num frames: 1622016. Throughput: 0: 997.0. Samples: 405610. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:05:23,016][02736] Avg episode reward: [(0, '5.554')]
	[2025-04-08 17:05:23,022][02935] Saving new best policy, reward=5.554!
	[2025-04-08 17:05:26,895][02948] Updated weights for policy 0, policy_version 400 (0.0018)
	[2025-04-08 17:05:28,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4193.2). Total num frames: 1638400. Throughput: 0: 1019.7. Samples: 408858. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:05:28,012][02736] Avg episode reward: [(0, '5.389')]
	[2025-04-08 17:05:33,011][02736] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4207.1). Total num frames: 1662976. Throughput: 0: 1042.6. Samples: 414106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:05:33,013][02736] Avg episode reward: [(0, '5.297')]
	[2025-04-08 17:05:36,300][02948] Updated weights for policy 0, policy_version 410 (0.0018)
	[2025-04-08 17:05:38,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4193.2). Total num frames: 1683456. Throughput: 0: 1041.0. Samples: 421352. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:05:38,013][02736] Avg episode reward: [(0, '5.323')]
	[2025-04-08 17:05:43,016][02736] Fps is (10 sec: 4094.1, 60 sec: 4164.1, 300 sec: 4193.1). Total num frames: 1703936. Throughput: 0: 1028.2. Samples: 424452. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:05:43,023][02736] Avg episode reward: [(0, '5.509')]
	[2025-04-08 17:05:47,234][02948] Updated weights for policy 0, policy_version 420 (0.0020)
	[2025-04-08 17:05:48,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4179.3). Total num frames: 1720320. Throughput: 0: 1036.7. Samples: 429616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:05:48,017][02736] Avg episode reward: [(0, '5.800')]
	[2025-04-08 17:05:48,019][02935] Saving new best policy, reward=5.800!
	[2025-04-08 17:05:53,011][02736] Fps is (10 sec: 3688.0, 60 sec: 4027.7, 300 sec: 4179.3). Total num frames: 1740800. Throughput: 0: 1012.9. Samples: 435540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:05:53,013][02736] Avg episode reward: [(0, '6.145')]
	[2025-04-08 17:05:53,021][02935] Saving new best policy, reward=6.145!
	[2025-04-08 17:05:58,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4165.4). Total num frames: 1757184. Throughput: 0: 994.0. Samples: 438028. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:05:58,012][02736] Avg episode reward: [(0, '6.206')]
	[2025-04-08 17:05:58,016][02935] Saving new best policy, reward=6.206!
	[2025-04-08 17:05:59,457][02948] Updated weights for policy 0, policy_version 430 (0.0022)
	[2025-04-08 17:06:03,011][02736] Fps is (10 sec: 3276.9, 60 sec: 3959.5, 300 sec: 4151.5). Total num frames: 1773568. Throughput: 0: 977.0. Samples: 442154. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:06:03,016][02736] Avg episode reward: [(0, '6.693')]
	[2025-04-08 17:06:03,027][02935] Saving new best policy, reward=6.693!
	[2025-04-08 17:06:08,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 4123.8). Total num frames: 1789952. Throughput: 0: 932.8. Samples: 447584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:06:08,016][02736] Avg episode reward: [(0, '6.390')]
	[2025-04-08 17:06:10,922][02948] Updated weights for policy 0, policy_version 440 (0.0023)
	[2025-04-08 17:06:13,013][02736] Fps is (10 sec: 3276.3, 60 sec: 3822.8, 300 sec: 4123.8). Total num frames: 1806336. Throughput: 0: 923.6. Samples: 450420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:06:13,020][02736] Avg episode reward: [(0, '6.297')]
	[2025-04-08 17:06:13,031][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000441_1806336.pth...
	[2025-04-08 17:06:13,268][02935] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000203_831488.pth
	[2025-04-08 17:06:18,011][02736] Fps is (10 sec: 2867.2, 60 sec: 3686.4, 300 sec: 4082.1). Total num frames: 1818624. Throughput: 0: 887.4. Samples: 454038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:06:18,013][02736] Avg episode reward: [(0, '6.355')]
	[2025-04-08 17:06:23,011][02736] Fps is (10 sec: 3277.3, 60 sec: 3618.1, 300 sec: 4082.1). Total num frames: 1839104. Throughput: 0: 853.5. Samples: 459758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:06:23,013][02736] Avg episode reward: [(0, '6.719')]
	[2025-04-08 17:06:23,021][02935] Saving new best policy, reward=6.719!
	[2025-04-08 17:06:23,675][02948] Updated weights for policy 0, policy_version 450 (0.0039)
	[2025-04-08 17:06:28,012][02736] Fps is (10 sec: 3686.1, 60 sec: 3618.1, 300 sec: 4068.2). Total num frames: 1855488. Throughput: 0: 846.7. Samples: 462550. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:06:28,025][02736] Avg episode reward: [(0, '6.784')]
	[2025-04-08 17:06:28,031][02935] Saving new best policy, reward=6.784!
	[2025-04-08 17:06:33,012][02736] Fps is (10 sec: 2866.9, 60 sec: 3413.3, 300 sec: 4040.4). Total num frames: 1867776. Throughput: 0: 817.8. Samples: 466416. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:06:33,013][02736] Avg episode reward: [(0, '6.510')]
	[2025-04-08 17:06:36,998][02948] Updated weights for policy 0, policy_version 460 (0.0037)
	[2025-04-08 17:06:38,011][02736] Fps is (10 sec: 2867.4, 60 sec: 3345.1, 300 sec: 4012.7). Total num frames: 1884160. Throughput: 0: 801.5. Samples: 471606. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:06:38,016][02736] Avg episode reward: [(0, '6.896')]
	[2025-04-08 17:06:38,019][02935] Saving new best policy, reward=6.896!
	[2025-04-08 17:06:43,011][02736] Fps is (10 sec: 3686.8, 60 sec: 3345.3, 300 sec: 4026.6). Total num frames: 1904640. Throughput: 0: 810.1. Samples: 474484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:06:43,012][02736] Avg episode reward: [(0, '7.668')]
	[2025-04-08 17:06:43,020][02935] Saving new best policy, reward=7.668!
	[2025-04-08 17:06:48,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3984.9). Total num frames: 1916928. Throughput: 0: 814.5. Samples: 478808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:06:48,013][02736] Avg episode reward: [(0, '7.839')]
	[2025-04-08 17:06:48,015][02935] Saving new best policy, reward=7.839!
	[2025-04-08 17:06:50,415][02948] Updated weights for policy 0, policy_version 470 (0.0037)
	[2025-04-08 17:06:53,011][02736] Fps is (10 sec: 2867.2, 60 sec: 3208.5, 300 sec: 3957.2). Total num frames: 1933312. Throughput: 0: 799.5. Samples: 483560. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:06:53,012][02736] Avg episode reward: [(0, '8.669')]
	[2025-04-08 17:06:53,023][02935] Saving new best policy, reward=8.669!
	[2025-04-08 17:06:58,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3971.0). Total num frames: 1953792. Throughput: 0: 797.4. Samples: 486302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:06:58,016][02736] Avg episode reward: [(0, '9.010')]
	[2025-04-08 17:06:58,021][02935] Saving new best policy, reward=9.010!
	[2025-04-08 17:07:02,388][02948] Updated weights for policy 0, policy_version 480 (0.0028)
	[2025-04-08 17:07:03,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3208.5, 300 sec: 3929.4). Total num frames: 1966080. Throughput: 0: 826.6. Samples: 491234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:07:03,015][02736] Avg episode reward: [(0, '9.281')]
	[2025-04-08 17:07:03,031][02935] Saving new best policy, reward=9.281!
	[2025-04-08 17:07:08,012][02736] Fps is (10 sec: 2867.1, 60 sec: 3208.5, 300 sec: 3901.6). Total num frames: 1982464. Throughput: 0: 799.6. Samples: 495740. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:07:08,013][02736] Avg episode reward: [(0, '8.886')]
	[2025-04-08 17:07:13,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3276.9, 300 sec: 3929.4). Total num frames: 2002944. Throughput: 0: 805.1. Samples: 498780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:07:13,013][02736] Avg episode reward: [(0, '9.010')]
	[2025-04-08 17:07:13,656][02948] Updated weights for policy 0, policy_version 490 (0.0012)
	[2025-04-08 17:07:18,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3901.6). Total num frames: 2019328. Throughput: 0: 847.4. Samples: 504550. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:07:18,019][02736] Avg episode reward: [(0, '9.027')]
	[2025-04-08 17:07:23,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3873.8). Total num frames: 2035712. Throughput: 0: 823.3. Samples: 508654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:07:23,015][02736] Avg episode reward: [(0, '9.087')]
	[2025-04-08 17:07:25,959][02948] Updated weights for policy 0, policy_version 500 (0.0035)
	[2025-04-08 17:07:28,011][02736] Fps is (10 sec: 3686.5, 60 sec: 3345.1, 300 sec: 3887.7). Total num frames: 2056192. Throughput: 0: 826.8. Samples: 511688. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:07:28,012][02736] Avg episode reward: [(0, '10.268')]
	[2025-04-08 17:07:28,017][02935] Saving new best policy, reward=10.268!
	[2025-04-08 17:07:33,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3860.0). Total num frames: 2072576. Throughput: 0: 861.0. Samples: 517552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:07:33,012][02736] Avg episode reward: [(0, '9.242')]
	[2025-04-08 17:07:38,011][02736] Fps is (10 sec: 2867.2, 60 sec: 3345.1, 300 sec: 3818.3). Total num frames: 2084864. Throughput: 0: 844.8. Samples: 521576. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:07:38,015][02736] Avg episode reward: [(0, '9.283')]
	[2025-04-08 17:07:38,545][02948] Updated weights for policy 0, policy_version 510 (0.0024)
	[2025-04-08 17:07:43,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3832.2). Total num frames: 2105344. Throughput: 0: 846.4. Samples: 524390. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:07:43,016][02736] Avg episode reward: [(0, '8.635')]
	[2025-04-08 17:07:48,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3804.4). Total num frames: 2121728. Throughput: 0: 867.4. Samples: 530266. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:07:48,014][02736] Avg episode reward: [(0, '9.050')]
	[2025-04-08 17:07:49,418][02948] Updated weights for policy 0, policy_version 520 (0.0021)
	[2025-04-08 17:07:53,012][02736] Fps is (10 sec: 3276.6, 60 sec: 3413.3, 300 sec: 3790.6). Total num frames: 2138112. Throughput: 0: 866.7. Samples: 534742. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:07:53,015][02736] Avg episode reward: [(0, '9.695')]
	[2025-04-08 17:07:58,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3776.7). Total num frames: 2154496. Throughput: 0: 853.6. Samples: 537192. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:07:58,013][02736] Avg episode reward: [(0, '9.966')]
	[2025-04-08 17:08:01,108][02948] Updated weights for policy 0, policy_version 530 (0.0029)
	[2025-04-08 17:08:03,011][02736] Fps is (10 sec: 3686.6, 60 sec: 3481.6, 300 sec: 3776.7). Total num frames: 2174976. Throughput: 0: 859.7. Samples: 543238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:08:03,012][02736] Avg episode reward: [(0, '10.713')]
	[2025-04-08 17:08:03,018][02935] Saving new best policy, reward=10.713!
	[2025-04-08 17:08:08,014][02736] Fps is (10 sec: 3275.9, 60 sec: 3413.2, 300 sec: 3748.8). Total num frames: 2187264. Throughput: 0: 870.8. Samples: 547842. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:08:08,019][02736] Avg episode reward: [(0, '11.332')]
	[2025-04-08 17:08:08,095][02935] Saving new best policy, reward=11.332!
	[2025-04-08 17:08:13,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3748.9). Total num frames: 2207744. Throughput: 0: 846.9. Samples: 549800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:08:13,015][02736] Avg episode reward: [(0, '10.992')]
	[2025-04-08 17:08:13,035][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000539_2207744.pth...
	[2025-04-08 17:08:13,175][02935] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000326_1335296.pth
	[2025-04-08 17:08:13,844][02948] Updated weights for policy 0, policy_version 540 (0.0036)
	[2025-04-08 17:08:18,015][02736] Fps is (10 sec: 4095.5, 60 sec: 3481.4, 300 sec: 3734.9). Total num frames: 2228224. Throughput: 0: 850.0. Samples: 555804. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:08:18,017][02736] Avg episode reward: [(0, '10.632')]
	[2025-04-08 17:08:23,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3721.1). Total num frames: 2240512. Throughput: 0: 879.6. Samples: 561158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:08:23,013][02736] Avg episode reward: [(0, '11.263')]
	[2025-04-08 17:08:26,115][02948] Updated weights for policy 0, policy_version 550 (0.0025)
	[2025-04-08 17:08:28,011][02736] Fps is (10 sec: 3278.1, 60 sec: 3413.3, 300 sec: 3721.1). Total num frames: 2260992. Throughput: 0: 859.6. Samples: 563074. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-04-08 17:08:28,012][02736] Avg episode reward: [(0, '11.331')]
	[2025-04-08 17:08:33,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3693.3). Total num frames: 2277376. Throughput: 0: 862.0. Samples: 569058. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:08:33,020][02736] Avg episode reward: [(0, '12.522')]
	[2025-04-08 17:08:33,116][02935] Saving new best policy, reward=12.522!
	[2025-04-08 17:08:36,294][02948] Updated weights for policy 0, policy_version 560 (0.0021)
	[2025-04-08 17:08:38,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3693.3). Total num frames: 2297856. Throughput: 0: 882.2. Samples: 574442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:08:38,012][02736] Avg episode reward: [(0, '13.504')]
	[2025-04-08 17:08:38,017][02935] Saving new best policy, reward=13.504!
	[2025-04-08 17:08:43,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3665.6). Total num frames: 2310144. Throughput: 0: 869.6. Samples: 576326. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:08:43,013][02736] Avg episode reward: [(0, '12.474')]
	[2025-04-08 17:08:48,011][02736] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 2330624. Throughput: 0: 860.5. Samples: 581960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:08:48,014][02736] Avg episode reward: [(0, '12.605')]
	[2025-04-08 17:08:48,477][02948] Updated weights for policy 0, policy_version 570 (0.0018)
	[2025-04-08 17:08:53,017][02736] Fps is (10 sec: 4093.7, 60 sec: 3549.6, 300 sec: 3665.5). Total num frames: 2351104. Throughput: 0: 899.5. Samples: 588324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:08:53,020][02736] Avg episode reward: [(0, '12.133')]
	[2025-04-08 17:08:58,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 2371584. Throughput: 0: 905.2. Samples: 590534. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:08:58,012][02736] Avg episode reward: [(0, '11.753')]
	[2025-04-08 17:08:58,781][02948] Updated weights for policy 0, policy_version 580 (0.0021)
	[2025-04-08 17:09:03,011][02736] Fps is (10 sec: 4508.1, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 2396160. Throughput: 0: 927.2. Samples: 597526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:03,012][02736] Avg episode reward: [(0, '12.616')]
	[2025-04-08 17:09:07,613][02948] Updated weights for policy 0, policy_version 590 (0.0023)
	[2025-04-08 17:09:08,011][02736] Fps is (10 sec: 4505.6, 60 sec: 3823.1, 300 sec: 3665.6). Total num frames: 2416640. Throughput: 0: 961.6. Samples: 604432. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:08,012][02736] Avg episode reward: [(0, '12.740')]
	[2025-04-08 17:09:13,012][02736] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3651.7). Total num frames: 2433024. Throughput: 0: 966.2. Samples: 606552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:13,016][02736] Avg episode reward: [(0, '13.067')]
	[2025-04-08 17:09:17,285][02948] Updated weights for policy 0, policy_version 600 (0.0017)
	[2025-04-08 17:09:18,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3823.2, 300 sec: 3651.7). Total num frames: 2457600. Throughput: 0: 990.1. Samples: 613614. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:18,012][02736] Avg episode reward: [(0, '13.956')]
	[2025-04-08 17:09:18,014][02935] Saving new best policy, reward=13.956!
	[2025-04-08 17:09:23,011][02736] Fps is (10 sec: 4505.8, 60 sec: 3959.5, 300 sec: 3665.6). Total num frames: 2478080. Throughput: 0: 1017.6. Samples: 620236. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:23,012][02736] Avg episode reward: [(0, '14.027')]
	[2025-04-08 17:09:23,019][02935] Saving new best policy, reward=14.027!
	[2025-04-08 17:09:27,571][02948] Updated weights for policy 0, policy_version 610 (0.0016)
	[2025-04-08 17:09:28,011][02736] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3665.6). Total num frames: 2498560. Throughput: 0: 1023.0. Samples: 622362. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:09:28,016][02736] Avg episode reward: [(0, '13.931')]
	[2025-04-08 17:09:33,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3665.6). Total num frames: 2523136. Throughput: 0: 1057.9. Samples: 629564. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:33,013][02736] Avg episode reward: [(0, '14.599')]
	[2025-04-08 17:09:33,019][02935] Saving new best policy, reward=14.599!
	[2025-04-08 17:09:36,798][02948] Updated weights for policy 0, policy_version 620 (0.0017)
	[2025-04-08 17:09:38,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3679.5). Total num frames: 2539520. Throughput: 0: 1054.3. Samples: 635760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:09:38,013][02736] Avg episode reward: [(0, '15.451')]
	[2025-04-08 17:09:38,014][02935] Saving new best policy, reward=15.451!
	[2025-04-08 17:09:43,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 3679.5). Total num frames: 2560000. Throughput: 0: 1051.7. Samples: 637860. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:43,014][02736] Avg episode reward: [(0, '15.656')]
	[2025-04-08 17:09:43,024][02935] Saving new best policy, reward=15.656!
	[2025-04-08 17:09:48,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3651.7). Total num frames: 2576384. Throughput: 0: 1020.6. Samples: 643454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:48,012][02736] Avg episode reward: [(0, '15.647')]
	[2025-04-08 17:09:48,620][02948] Updated weights for policy 0, policy_version 630 (0.0034)
	[2025-04-08 17:09:53,011][02736] Fps is (10 sec: 3276.8, 60 sec: 4028.1, 300 sec: 3637.8). Total num frames: 2592768. Throughput: 0: 989.1. Samples: 648940. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:09:53,013][02736] Avg episode reward: [(0, '15.079')]
	[2025-04-08 17:09:58,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3665.6). Total num frames: 2617344. Throughput: 0: 997.1. Samples: 651420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:09:58,013][02736] Avg episode reward: [(0, '15.209')]
	[2025-04-08 17:09:58,772][02948] Updated weights for policy 0, policy_version 640 (0.0017)
	[2025-04-08 17:10:03,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3651.7). Total num frames: 2637824. Throughput: 0: 1004.1. Samples: 658800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:10:03,013][02736] Avg episode reward: [(0, '14.902')]
	[2025-04-08 17:10:08,015][02736] Fps is (10 sec: 4094.3, 60 sec: 4027.4, 300 sec: 3665.5). Total num frames: 2658304. Throughput: 0: 988.4. Samples: 664716. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:10:08,017][02736] Avg episode reward: [(0, '15.717')]
	[2025-04-08 17:10:08,020][02935] Saving new best policy, reward=15.717!
	[2025-04-08 17:10:09,131][02948] Updated weights for policy 0, policy_version 650 (0.0013)
	[2025-04-08 17:10:13,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3665.6). Total num frames: 2678784. Throughput: 0: 997.8. Samples: 667264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-04-08 17:10:13,022][02736] Avg episode reward: [(0, '15.282')]
	[2025-04-08 17:10:13,033][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000654_2678784.pth...
	[2025-04-08 17:10:13,179][02935] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000441_1806336.pth
	[2025-04-08 17:10:17,598][02948] Updated weights for policy 0, policy_version 660 (0.0013)
	[2025-04-08 17:10:18,011][02736] Fps is (10 sec: 4507.5, 60 sec: 4096.0, 300 sec: 3665.6). Total num frames: 2703360. Throughput: 0: 998.8. Samples: 674512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:10:18,015][02736] Avg episode reward: [(0, '15.133')]
	[2025-04-08 17:10:23,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3665.6). Total num frames: 2719744. Throughput: 0: 994.8. Samples: 680528. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:10:23,015][02736] Avg episode reward: [(0, '15.649')]
	[2025-04-08 17:10:27,471][02948] Updated weights for policy 0, policy_version 670 (0.0030)
	[2025-04-08 17:10:28,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3665.6). Total num frames: 2744320. Throughput: 0: 1015.7. Samples: 683568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:10:28,016][02736] Avg episode reward: [(0, '15.459')]
	[2025-04-08 17:10:33,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4096.0, 300 sec: 3679.5). Total num frames: 2768896. Throughput: 0: 1063.1. Samples: 691294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:10:33,017][02736] Avg episode reward: [(0, '14.493')]
	[2025-04-08 17:10:37,077][02948] Updated weights for policy 0, policy_version 680 (0.0023)
	[2025-04-08 17:10:38,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3665.6). Total num frames: 2785280. Throughput: 0: 1064.2. Samples: 696828. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:10:38,015][02736] Avg episode reward: [(0, '14.116')]
	[2025-04-08 17:10:43,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3693.3). Total num frames: 2809856. Throughput: 0: 1083.6. Samples: 700182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:10:43,016][02736] Avg episode reward: [(0, '14.541')]
	[2025-04-08 17:10:45,527][02948] Updated weights for policy 0, policy_version 690 (0.0020)
	[2025-04-08 17:10:48,011][02736] Fps is (10 sec: 5324.8, 60 sec: 4369.1, 300 sec: 3721.1). Total num frames: 2838528. Throughput: 0: 1091.0. Samples: 707896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:10:48,013][02736] Avg episode reward: [(0, '15.924')]
	[2025-04-08 17:10:48,017][02935] Saving new best policy, reward=15.924!
	[2025-04-08 17:10:53,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3707.2). Total num frames: 2850816. Throughput: 0: 1079.2. Samples: 713274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:10:53,015][02736] Avg episode reward: [(0, '17.422')]
	[2025-04-08 17:10:53,091][02935] Saving new best policy, reward=17.422!
	[2025-04-08 17:10:55,583][02948] Updated weights for policy 0, policy_version 700 (0.0012)
	[2025-04-08 17:10:58,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 3735.0). Total num frames: 2875392. Throughput: 0: 1100.4. Samples: 716784. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:10:58,016][02736] Avg episode reward: [(0, '19.315')]
	[2025-04-08 17:10:58,019][02935] Saving new best policy, reward=19.315!
	[2025-04-08 17:11:03,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4369.1, 300 sec: 3762.8). Total num frames: 2899968. Throughput: 0: 1098.9. Samples: 723964. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:11:03,014][02736] Avg episode reward: [(0, '18.933')]
	[2025-04-08 17:11:05,107][02948] Updated weights for policy 0, policy_version 710 (0.0026)
	[2025-04-08 17:11:08,012][02736] Fps is (10 sec: 4095.8, 60 sec: 4301.1, 300 sec: 3762.8). Total num frames: 2916352. Throughput: 0: 1077.1. Samples: 728996. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:11:08,017][02736] Avg episode reward: [(0, '18.621')]
	[2025-04-08 17:11:13,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 3790.5). Total num frames: 2936832. Throughput: 0: 1077.8. Samples: 732068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:11:13,016][02736] Avg episode reward: [(0, '16.796')]
	[2025-04-08 17:11:14,964][02948] Updated weights for policy 0, policy_version 720 (0.0033)
	[2025-04-08 17:11:18,014][02736] Fps is (10 sec: 4504.6, 60 sec: 4300.6, 300 sec: 3804.4). Total num frames: 2961408. Throughput: 0: 1064.4. Samples: 739194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:11:18,015][02736] Avg episode reward: [(0, '17.211')]
	[2025-04-08 17:11:23,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3804.4). Total num frames: 2977792. Throughput: 0: 1055.5. Samples: 744326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:11:23,012][02736] Avg episode reward: [(0, '17.878')]
	[2025-04-08 17:11:25,202][02948] Updated weights for policy 0, policy_version 730 (0.0018)
	[2025-04-08 17:11:28,011][02736] Fps is (10 sec: 4097.1, 60 sec: 4300.8, 300 sec: 3846.1). Total num frames: 3002368. Throughput: 0: 1062.8. Samples: 748006. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:11:28,012][02736] Avg episode reward: [(0, '17.317')]
	[2025-04-08 17:11:33,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4300.8, 300 sec: 3873.8). Total num frames: 3026944. Throughput: 0: 1059.6. Samples: 755580. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:11:33,013][02736] Avg episode reward: [(0, '19.101')]
	[2025-04-08 17:11:34,075][02948] Updated weights for policy 0, policy_version 740 (0.0030)
	[2025-04-08 17:11:38,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3860.0). Total num frames: 3043328. Throughput: 0: 1050.8. Samples: 760558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:11:38,012][02736] Avg episode reward: [(0, '20.240')]
	[2025-04-08 17:11:38,014][02935] Saving new best policy, reward=20.240!
	[2025-04-08 17:11:43,012][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.8, 300 sec: 3901.6). Total num frames: 3067904. Throughput: 0: 1052.6. Samples: 764150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:11:43,016][02736] Avg episode reward: [(0, '19.932')]
	[2025-04-08 17:11:43,767][02948] Updated weights for policy 0, policy_version 750 (0.0030)
	[2025-04-08 17:11:48,013][02736] Fps is (10 sec: 4504.8, 60 sec: 4164.1, 300 sec: 3915.5). Total num frames: 3088384. Throughput: 0: 1053.5. Samples: 771372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-04-08 17:11:48,019][02736] Avg episode reward: [(0, '19.584')]
	[2025-04-08 17:11:53,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 3901.6). Total num frames: 3104768. Throughput: 0: 1046.5. Samples: 776086. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:11:53,013][02736] Avg episode reward: [(0, '20.059')]
	[2025-04-08 17:11:54,504][02948] Updated weights for policy 0, policy_version 760 (0.0025)
	[2025-04-08 17:11:58,012][02736] Fps is (10 sec: 4096.6, 60 sec: 4232.5, 300 sec: 3943.3). Total num frames: 3129344. Throughput: 0: 1057.4. Samples: 779652. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-04-08 17:11:58,017][02736] Avg episode reward: [(0, '21.273')]
	[2025-04-08 17:11:58,021][02935] Saving new best policy, reward=21.273!
	[2025-04-08 17:12:03,012][02736] Fps is (10 sec: 4505.2, 60 sec: 4164.2, 300 sec: 3957.1). Total num frames: 3149824. Throughput: 0: 1056.7. Samples: 786746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:12:03,015][02736] Avg episode reward: [(0, '22.017')]
	[2025-04-08 17:12:03,031][02935] Saving new best policy, reward=22.017!
	[2025-04-08 17:12:04,505][02948] Updated weights for policy 0, policy_version 770 (0.0021)
	[2025-04-08 17:12:08,011][02736] Fps is (10 sec: 3686.5, 60 sec: 4164.3, 300 sec: 3943.3). Total num frames: 3166208. Throughput: 0: 1042.8. Samples: 791250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:12:08,015][02736] Avg episode reward: [(0, '22.282')]
	[2025-04-08 17:12:08,018][02935] Saving new best policy, reward=22.282!
	[2025-04-08 17:12:13,011][02736] Fps is (10 sec: 3686.7, 60 sec: 4164.3, 300 sec: 3957.2). Total num frames: 3186688. Throughput: 0: 1038.1. Samples: 794722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:12:13,015][02736] Avg episode reward: [(0, '22.812')]
	[2025-04-08 17:12:13,043][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000779_3190784.pth...
	[2025-04-08 17:12:13,170][02935] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000539_2207744.pth
	[2025-04-08 17:12:13,192][02935] Saving new best policy, reward=22.812!
	[2025-04-08 17:12:14,023][02948] Updated weights for policy 0, policy_version 780 (0.0030)
	[2025-04-08 17:12:18,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4096.2, 300 sec: 3971.0). Total num frames: 3207168. Throughput: 0: 1027.2. Samples: 801806. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:12:18,013][02736] Avg episode reward: [(0, '21.864')]
	[2025-04-08 17:12:23,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3971.0). Total num frames: 3227648. Throughput: 0: 1032.4. Samples: 807016. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:12:23,012][02736] Avg episode reward: [(0, '20.309')]
	[2025-04-08 17:12:24,166][02948] Updated weights for policy 0, policy_version 790 (0.0027)
	[2025-04-08 17:12:28,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 3998.8). Total num frames: 3252224. Throughput: 0: 1036.1. Samples: 810774. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:12:28,012][02736] Avg episode reward: [(0, '19.323')]
	[2025-04-08 17:12:33,012][02736] Fps is (10 sec: 4505.1, 60 sec: 4095.9, 300 sec: 4026.6). Total num frames: 3272704. Throughput: 0: 1039.1. Samples: 818132. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:12:33,017][02736] Avg episode reward: [(0, '19.928')]
	[2025-04-08 17:12:33,272][02948] Updated weights for policy 0, policy_version 800 (0.0026)
	[2025-04-08 17:12:38,011][02736] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4026.6). Total num frames: 3293184. Throughput: 0: 1053.6. Samples: 823496. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:12:38,013][02736] Avg episode reward: [(0, '20.053')]
	[2025-04-08 17:12:42,269][02948] Updated weights for policy 0, policy_version 810 (0.0025)
	[2025-04-08 17:12:43,011][02736] Fps is (10 sec: 4506.1, 60 sec: 4164.3, 300 sec: 4054.3). Total num frames: 3317760. Throughput: 0: 1056.9. Samples: 827214. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:12:43,012][02736] Avg episode reward: [(0, '19.449')]
	[2025-04-08 17:12:48,011][02736] Fps is (10 sec: 4505.7, 60 sec: 4164.4, 300 sec: 4068.2). Total num frames: 3338240. Throughput: 0: 1062.6. Samples: 834560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:12:48,016][02736] Avg episode reward: [(0, '19.886')]
	[2025-04-08 17:12:52,341][02948] Updated weights for policy 0, policy_version 820 (0.0021)
	[2025-04-08 17:12:53,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4082.1). Total num frames: 3358720. Throughput: 0: 1082.7. Samples: 839970. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:12:53,013][02736] Avg episode reward: [(0, '19.667')]
	[2025-04-08 17:12:58,011][02736] Fps is (10 sec: 4915.1, 60 sec: 4300.8, 300 sec: 4109.9). Total num frames: 3387392. Throughput: 0: 1088.8. Samples: 843718. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:12:58,013][02736] Avg episode reward: [(0, '18.125')]
	[2025-04-08 17:13:00,609][02948] Updated weights for policy 0, policy_version 830 (0.0013)
	[2025-04-08 17:13:03,014][02736] Fps is (10 sec: 4504.4, 60 sec: 4232.4, 300 sec: 4123.8). Total num frames: 3403776. Throughput: 0: 1089.3. Samples: 850826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:13:03,019][02736] Avg episode reward: [(0, '18.140')]
	[2025-04-08 17:13:08,011][02736] Fps is (10 sec: 3686.5, 60 sec: 4300.8, 300 sec: 4123.8). Total num frames: 3424256. Throughput: 0: 1098.1. Samples: 856432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:13:08,016][02736] Avg episode reward: [(0, '17.216')]
	[2025-04-08 17:13:10,591][02948] Updated weights for policy 0, policy_version 840 (0.0015)
	[2025-04-08 17:13:13,011][02736] Fps is (10 sec: 4506.8, 60 sec: 4369.1, 300 sec: 4137.7). Total num frames: 3448832. Throughput: 0: 1096.4. Samples: 860110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:13:13,016][02736] Avg episode reward: [(0, '17.991')]
	[2025-04-08 17:13:18,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4165.4). Total num frames: 3469312. Throughput: 0: 1083.4. Samples: 866882. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:13:18,014][02736] Avg episode reward: [(0, '18.006')]
	[2025-04-08 17:13:20,977][02948] Updated weights for policy 0, policy_version 850 (0.0015)
	[2025-04-08 17:13:23,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4369.1, 300 sec: 4165.4). Total num frames: 3489792. Throughput: 0: 1086.2. Samples: 872374. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:13:23,015][02736] Avg episode reward: [(0, '20.734')]
	[2025-04-08 17:13:28,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4369.1, 300 sec: 4193.2). Total num frames: 3514368. Throughput: 0: 1082.9. Samples: 875944. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:13:28,016][02736] Avg episode reward: [(0, '21.217')]
	[2025-04-08 17:13:29,691][02948] Updated weights for policy 0, policy_version 860 (0.0016)
	[2025-04-08 17:13:33,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4300.9, 300 sec: 4179.3). Total num frames: 3530752. Throughput: 0: 1062.0. Samples: 882350. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:13:33,015][02736] Avg episode reward: [(0, '21.547')]
	[2025-04-08 17:13:38,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4300.8, 300 sec: 4207.1). Total num frames: 3551232. Throughput: 0: 1066.2. Samples: 887948. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:13:38,013][02736] Avg episode reward: [(0, '22.140')]
	[2025-04-08 17:13:39,999][02948] Updated weights for policy 0, policy_version 870 (0.0029)
	[2025-04-08 17:13:43,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4300.8, 300 sec: 4221.0). Total num frames: 3575808. Throughput: 0: 1062.7. Samples: 891538. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:13:43,012][02736] Avg episode reward: [(0, '22.828')]
	[2025-04-08 17:13:43,017][02935] Saving new best policy, reward=22.828!
	[2025-04-08 17:13:48,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4207.2). Total num frames: 3592192. Throughput: 0: 1037.4. Samples: 897506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:13:48,012][02736] Avg episode reward: [(0, '22.508')]
	[2025-04-08 17:13:50,699][02948] Updated weights for policy 0, policy_version 880 (0.0015)
	[2025-04-08 17:13:53,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4232.5, 300 sec: 4207.1). Total num frames: 3612672. Throughput: 0: 1040.0. Samples: 903232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:13:53,013][02736] Avg episode reward: [(0, '21.527')]
	[2025-04-08 17:13:58,011][02736] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4207.1). Total num frames: 3637248. Throughput: 0: 1040.9. Samples: 906950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:13:58,013][02736] Avg episode reward: [(0, '22.429')]
	[2025-04-08 17:13:59,433][02948] Updated weights for policy 0, policy_version 890 (0.0012)
	[2025-04-08 17:14:03,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 4193.2). Total num frames: 3653632. Throughput: 0: 1028.6. Samples: 913168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:14:03,012][02736] Avg episode reward: [(0, '22.370')]
	[2025-04-08 17:14:08,013][02736] Fps is (10 sec: 4095.4, 60 sec: 4232.4, 300 sec: 4220.9). Total num frames: 3678208. Throughput: 0: 1041.0. Samples: 919222. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:14:08,014][02736] Avg episode reward: [(0, '21.229')]
	[2025-04-08 17:14:09,519][02948] Updated weights for policy 0, policy_version 900 (0.0023)
	[2025-04-08 17:14:13,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4221.0). Total num frames: 3702784. Throughput: 0: 1043.4. Samples: 922896. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:14:13,013][02736] Avg episode reward: [(0, '20.807')]
	[2025-04-08 17:14:13,018][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000904_3702784.pth...
	[2025-04-08 17:14:13,157][02935] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000654_2678784.pth
	[2025-04-08 17:14:18,011][02736] Fps is (10 sec: 3687.0, 60 sec: 4096.0, 300 sec: 4193.2). Total num frames: 3715072. Throughput: 0: 1032.0. Samples: 928790. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:14:18,025][02736] Avg episode reward: [(0, '21.620')]
	[2025-04-08 17:14:20,256][02948] Updated weights for policy 0, policy_version 910 (0.0019)
	[2025-04-08 17:14:23,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4207.1). Total num frames: 3739648. Throughput: 0: 1036.0. Samples: 934568. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:14:23,016][02736] Avg episode reward: [(0, '21.861')]
	[2025-04-08 17:14:28,011][02736] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4193.2). Total num frames: 3760128. Throughput: 0: 1037.2. Samples: 938214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:14:28,022][02736] Avg episode reward: [(0, '20.046')]
	[2025-04-08 17:14:29,499][02948] Updated weights for policy 0, policy_version 920 (0.0020)
	[2025-04-08 17:14:33,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4193.2). Total num frames: 3776512. Throughput: 0: 1027.2. Samples: 943728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:14:33,013][02736] Avg episode reward: [(0, '19.532')]
	[2025-04-08 17:14:38,011][02736] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4207.1). Total num frames: 3801088. Throughput: 0: 1043.8. Samples: 950202. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-04-08 17:14:38,013][02736] Avg episode reward: [(0, '18.890')]
	[2025-04-08 17:14:39,310][02948] Updated weights for policy 0, policy_version 930 (0.0027)
	[2025-04-08 17:14:43,011][02736] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4234.8). Total num frames: 3825664. Throughput: 0: 1043.4. Samples: 953904. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:14:43,012][02736] Avg episode reward: [(0, '17.706')]
	[2025-04-08 17:14:48,011][02736] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4221.0). Total num frames: 3837952. Throughput: 0: 1025.3. Samples: 959308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:14:48,016][02736] Avg episode reward: [(0, '18.452')]
	[2025-04-08 17:14:50,167][02948] Updated weights for policy 0, policy_version 940 (0.0015)
	[2025-04-08 17:14:53,011][02736] Fps is (10 sec: 2867.2, 60 sec: 4027.7, 300 sec: 4193.2). Total num frames: 3854336. Throughput: 0: 1005.5. Samples: 964468. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:14:53,019][02736] Avg episode reward: [(0, '19.714')]
	[2025-04-08 17:14:58,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4193.2). Total num frames: 3874816. Throughput: 0: 977.1. Samples: 966864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:14:58,015][02736] Avg episode reward: [(0, '19.256')]
	[2025-04-08 17:15:02,837][02948] Updated weights for policy 0, policy_version 950 (0.0034)
	[2025-04-08 17:15:03,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4179.4). Total num frames: 3891200. Throughput: 0: 956.9. Samples: 971852. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:15:03,020][02736] Avg episode reward: [(0, '19.493')]
	[2025-04-08 17:15:08,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 4179.3). Total num frames: 3911680. Throughput: 0: 967.6. Samples: 978108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-04-08 17:15:08,012][02736] Avg episode reward: [(0, '20.117')]
	[2025-04-08 17:15:11,958][02948] Updated weights for policy 0, policy_version 960 (0.0031)
	[2025-04-08 17:15:13,018][02736] Fps is (10 sec: 4093.3, 60 sec: 3822.5, 300 sec: 4165.3). Total num frames: 3932160. Throughput: 0: 963.0. Samples: 981556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:15:13,019][02736] Avg episode reward: [(0, '18.960')]
	[2025-04-08 17:15:18,011][02736] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 4165.4). Total num frames: 3948544. Throughput: 0: 950.8. Samples: 986516. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:15:18,020][02736] Avg episode reward: [(0, '19.287')]
	[2025-04-08 17:15:22,777][02948] Updated weights for policy 0, policy_version 970 (0.0020)
	[2025-04-08 17:15:23,011][02736] Fps is (10 sec: 4098.7, 60 sec: 3891.2, 300 sec: 4165.4). Total num frames: 3973120. Throughput: 0: 951.6. Samples: 993024. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-04-08 17:15:23,012][02736] Avg episode reward: [(0, '19.852')]
	[2025-04-08 17:15:28,011][02736] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 4151.5). Total num frames: 3993600. Throughput: 0: 946.4. Samples: 996492. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-04-08 17:15:28,013][02736] Avg episode reward: [(0, '19.842')]
	[2025-04-08 17:15:31,865][02935] Stopping Batcher_0...
	[2025-04-08 17:15:31,865][02935] Loop batcher_evt_loop terminating...
	[2025-04-08 17:15:31,867][02736] Component Batcher_0 stopped!
	[2025-04-08 17:15:31,874][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:31,931][02948] Weights refcount: 2 0
	[2025-04-08 17:15:31,935][02736] Component InferenceWorker_p0-w0 stopped!
	[2025-04-08 17:15:31,935][02948] Stopping InferenceWorker_p0-w0...
	[2025-04-08 17:15:31,940][02948] Loop inference_proc0-0_evt_loop terminating...
	[2025-04-08 17:15:31,992][02935] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000779_3190784.pth
	[2025-04-08 17:15:32,001][02935] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:32,184][02935] Stopping LearnerWorker_p0...
	[2025-04-08 17:15:32,185][02935] Loop learner_proc0_evt_loop terminating...
	[2025-04-08 17:15:32,187][02736] Component LearnerWorker_p0 stopped!
	[2025-04-08 17:15:32,287][02950] Stopping RolloutWorker_w1...
	[2025-04-08 17:15:32,288][02950] Loop rollout_proc1_evt_loop terminating...
	[2025-04-08 17:15:32,287][02736] Component RolloutWorker_w1 stopped!
	[2025-04-08 17:15:32,307][02956] Stopping RolloutWorker_w7...
	[2025-04-08 17:15:32,307][02956] Loop rollout_proc7_evt_loop terminating...
	[2025-04-08 17:15:32,307][02736] Component RolloutWorker_w7 stopped!
	[2025-04-08 17:15:32,320][02954] Stopping RolloutWorker_w5...
	[2025-04-08 17:15:32,320][02736] Component RolloutWorker_w5 stopped!
	[2025-04-08 17:15:32,324][02954] Loop rollout_proc5_evt_loop terminating...
	[2025-04-08 17:15:32,327][02736] Component RolloutWorker_w6 stopped!
	[2025-04-08 17:15:32,328][02955] Stopping RolloutWorker_w6...
	[2025-04-08 17:15:32,329][02955] Loop rollout_proc6_evt_loop terminating...
	[2025-04-08 17:15:32,345][02736] Component RolloutWorker_w0 stopped!
	[2025-04-08 17:15:32,347][02949] Stopping RolloutWorker_w0...
	[2025-04-08 17:15:32,354][02952] Stopping RolloutWorker_w3...
	[2025-04-08 17:15:32,354][02736] Component RolloutWorker_w3 stopped!
	[2025-04-08 17:15:32,350][02949] Loop rollout_proc0_evt_loop terminating...
	[2025-04-08 17:15:32,355][02952] Loop rollout_proc3_evt_loop terminating...
	[2025-04-08 17:15:32,377][02736] Component RolloutWorker_w2 stopped!
	[2025-04-08 17:15:32,378][02951] Stopping RolloutWorker_w2...
	[2025-04-08 17:15:32,383][02951] Loop rollout_proc2_evt_loop terminating...
	[2025-04-08 17:15:32,390][02736] Component RolloutWorker_w4 stopped!
	[2025-04-08 17:15:32,391][02736] Waiting for process learner_proc0 to stop...
	[2025-04-08 17:15:32,394][02953] Stopping RolloutWorker_w4...
	[2025-04-08 17:15:32,400][02953] Loop rollout_proc4_evt_loop terminating...
	[2025-04-08 17:15:34,302][02736] Waiting for process inference_proc0-0 to join...
	[2025-04-08 17:15:34,303][02736] Waiting for process rollout_proc0 to join...
	[2025-04-08 17:15:36,509][02736] Waiting for process rollout_proc1 to join...
	[2025-04-08 17:15:36,511][02736] Waiting for process rollout_proc2 to join...
	[2025-04-08 17:15:36,512][02736] Waiting for process rollout_proc3 to join...
	[2025-04-08 17:15:36,513][02736] Waiting for process rollout_proc4 to join...
	[2025-04-08 17:15:36,515][02736] Waiting for process rollout_proc5 to join...
	[2025-04-08 17:15:36,519][02736] Waiting for process rollout_proc6 to join...
	[2025-04-08 17:15:36,523][02736] Waiting for process rollout_proc7 to join...
	[2025-04-08 17:15:36,525][02736] Batcher 0 profile tree view:
	batching: 26.1460, releasing_batches: 0.0291
	[2025-04-08 17:15:36,526][02736] InferenceWorker_p0-w0 profile tree view:
	wait_policy: 0.0000
	wait_policy_total: 381.2002
	update_model: 8.6767
	weight_update: 0.0019
	one_step: 0.0058
	handle_policy_step: 583.3812
	deserialize: 14.4878, stack: 3.1639, obs_to_device_normalize: 123.0522, forward: 300.4311, send_messages: 28.5172
	prepare_outputs: 88.0724
	to_cpu: 53.9278
	[2025-04-08 17:15:36,528][02736] Learner 0 profile tree view:
	misc: 0.0036, prepare_batch: 13.0368
	train: 73.6850
	epoch_init: 0.0092, minibatch_init: 0.0056, losses_postprocess: 0.5932, kl_divergence: 0.6354, after_optimizer: 33.5808
	calculate_losses: 26.4430
	losses_init: 0.0034, forward_head: 1.3648, bptt_initial: 17.2912, tail: 1.1170, advantages_returns: 0.3002, losses: 3.8731
	bptt: 2.2211
	bptt_forward_core: 2.1263
	update: 11.8079
	clip: 1.0102
	[2025-04-08 17:15:36,529][02736] RolloutWorker_w0 profile tree view:
	wait_for_trajectories: 0.2383, enqueue_policy_requests: 77.6868, env_step: 814.4551, overhead: 12.1996, complete_rollouts: 6.9115
	save_policy_outputs: 19.1450
	split_output_tensors: 7.3124
	[2025-04-08 17:15:36,530][02736] RolloutWorker_w7 profile tree view:
	wait_for_trajectories: 0.3000, enqueue_policy_requests: 87.1055, env_step: 802.0956, overhead: 12.9610, complete_rollouts: 5.8610
	save_policy_outputs: 18.3994
	split_output_tensors: 7.1962
	[2025-04-08 17:15:36,531][02736] Loop Runner_EvtLoop terminating...
	[2025-04-08 17:15:36,533][02736] Runner profile tree view:
	main_loop: 1039.2725
	[2025-04-08 17:15:36,534][02736] Collected {0: 4005888}, FPS: 3854.5
	[2025-04-08 17:15:37,040][02736] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-04-08 17:15:37,042][02736] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-04-08 17:15:37,043][02736] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-04-08 17:15:37,043][02736] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-04-08 17:15:37,045][02736] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-04-08 17:15:37,046][02736] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-04-08 17:15:37,048][02736] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-04-08 17:15:37,048][02736] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-04-08 17:15:37,049][02736] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-04-08 17:15:37,050][02736] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-04-08 17:15:37,051][02736] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-04-08 17:15:37,054][02736] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-04-08 17:15:37,055][02736] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-04-08 17:15:37,055][02736] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-04-08 17:15:37,056][02736] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-04-08 17:15:37,084][02736] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-04-08 17:15:37,087][02736] RunningMeanStd input shape: (3, 72, 128)
	[2025-04-08 17:15:37,089][02736] RunningMeanStd input shape: (1,)
	[2025-04-08 17:15:37,103][02736] ConvEncoder: input_channels=3
	[2025-04-08 17:15:37,202][02736] Conv encoder output size: 512
	[2025-04-08 17:15:37,203][02736] Policy head output size: 512
	[2025-04-08 17:15:37,476][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:37,478][02736] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:15:37,483][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:37,485][02736] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:15:37,486][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:37,488][02736] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:15:56,078][02736] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-04-08 17:15:56,079][02736] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-04-08 17:15:56,080][02736] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-04-08 17:15:56,081][02736] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-04-08 17:15:56,082][02736] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-04-08 17:15:56,083][02736] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-04-08 17:15:56,084][02736] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-04-08 17:15:56,085][02736] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-04-08 17:15:56,085][02736] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-04-08 17:15:56,086][02736] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-04-08 17:15:56,087][02736] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-04-08 17:15:56,089][02736] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-04-08 17:15:56,090][02736] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-04-08 17:15:56,090][02736] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-04-08 17:15:56,091][02736] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-04-08 17:15:56,119][02736] RunningMeanStd input shape: (3, 72, 128)
	[2025-04-08 17:15:56,121][02736] RunningMeanStd input shape: (1,)
	[2025-04-08 17:15:56,131][02736] ConvEncoder: input_channels=3
	[2025-04-08 17:15:56,167][02736] Conv encoder output size: 512
	[2025-04-08 17:15:56,167][02736] Policy head output size: 512
	[2025-04-08 17:15:56,185][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:56,186][02736] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:15:56,187][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:56,189][02736] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:15:56,190][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:15:56,192][02736] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:18:05,983][02736] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-04-08 17:18:05,984][02736] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-04-08 17:18:05,984][02736] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-04-08 17:18:05,986][02736] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-04-08 17:18:05,987][02736] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-04-08 17:18:05,987][02736] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-04-08 17:18:05,988][02736] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
	[2025-04-08 17:18:05,989][02736] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-04-08 17:18:05,990][02736] Adding new argument 'push_to_hub'=True that is not in the saved config file!
	[2025-04-08 17:18:05,991][02736] Adding new argument 'hf_repository'='xwind/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
	[2025-04-08 17:18:05,992][02736] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-04-08 17:18:05,993][02736] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-04-08 17:18:05,993][02736] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-04-08 17:18:05,994][02736] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-04-08 17:18:05,995][02736] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-04-08 17:18:06,018][02736] RunningMeanStd input shape: (3, 72, 128)
	[2025-04-08 17:18:06,023][02736] RunningMeanStd input shape: (1,)
	[2025-04-08 17:18:06,041][02736] ConvEncoder: input_channels=3
	[2025-04-08 17:18:06,072][02736] Conv encoder output size: 512
	[2025-04-08 17:18:06,073][02736] Policy head output size: 512
	[2025-04-08 17:18:06,093][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:18:06,095][02736] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:18:06,097][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:18:06,099][02736] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:18:06,100][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:18:06,101][02736] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:26:04,454][02736] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-04-08 17:26:04,455][02736] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-04-08 17:26:04,457][02736] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-04-08 17:26:04,458][02736] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-04-08 17:26:04,459][02736] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-04-08 17:26:04,459][02736] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-04-08 17:26:04,460][02736] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
	[2025-04-08 17:26:04,461][02736] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-04-08 17:26:04,462][02736] Adding new argument 'push_to_hub'=True that is not in the saved config file!
	[2025-04-08 17:26:04,463][02736] Adding new argument 'hf_repository'='xwind/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
	[2025-04-08 17:26:04,464][02736] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-04-08 17:26:04,465][02736] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-04-08 17:26:04,466][02736] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-04-08 17:26:04,467][02736] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-04-08 17:26:04,469][02736] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-04-08 17:26:04,500][02736] RunningMeanStd input shape: (3, 72, 128)
	[2025-04-08 17:26:04,501][02736] RunningMeanStd input shape: (1,)
	[2025-04-08 17:26:04,511][02736] ConvEncoder: input_channels=3
	[2025-04-08 17:26:04,544][02736] Conv encoder output size: 512
	[2025-04-08 17:26:04,545][02736] Policy head output size: 512
	[2025-04-08 17:26:04,563][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:26:04,565][02736] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:26:04,567][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:26:04,568][02736] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:26:04,569][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:26:04,571][02736] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:31:00,568][02736] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-04-08 17:31:00,570][02736] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-04-08 17:31:00,571][02736] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-04-08 17:31:00,572][02736] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-04-08 17:31:00,573][02736] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-04-08 17:31:00,574][02736] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-04-08 17:31:00,575][02736] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
	[2025-04-08 17:31:00,575][02736] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-04-08 17:31:00,576][02736] Adding new argument 'push_to_hub'=True that is not in the saved config file!
	[2025-04-08 17:31:00,577][02736] Adding new argument 'hf_repository'='xwind/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
	[2025-04-08 17:31:00,578][02736] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-04-08 17:31:00,579][02736] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-04-08 17:31:00,580][02736] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-04-08 17:31:00,581][02736] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-04-08 17:31:00,582][02736] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-04-08 17:31:00,614][02736] RunningMeanStd input shape: (3, 72, 128)
	[2025-04-08 17:31:00,616][02736] RunningMeanStd input shape: (1,)
	[2025-04-08 17:31:00,625][02736] ConvEncoder: input_channels=3
	[2025-04-08 17:31:00,661][02736] Conv encoder output size: 512
	[2025-04-08 17:31:00,662][02736] Policy head output size: 512
	[2025-04-08 17:31:00,682][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:31:00,684][02736] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:31:00,685][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:31:00,687][02736] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-04-08 17:31:00,688][02736] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-04-08 17:31:00,689][02736] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.