HFRLC_U8_health_gathering_supreme / sf_log.txt

	[2025-03-22 15:39:06,957][03219] Saving configuration to /content/train_dir/default_experiment/config.json...
	[2025-03-22 15:39:06,959][03219] Rollout worker 0 uses device cpu
	[2025-03-22 15:39:06,960][03219] Rollout worker 1 uses device cpu
	[2025-03-22 15:39:06,961][03219] Rollout worker 2 uses device cpu
	[2025-03-22 15:39:06,962][03219] Rollout worker 3 uses device cpu
	[2025-03-22 15:39:06,963][03219] Rollout worker 4 uses device cpu
	[2025-03-22 15:39:06,964][03219] Rollout worker 5 uses device cpu
	[2025-03-22 15:39:06,966][03219] Rollout worker 6 uses device cpu
	[2025-03-22 15:39:06,967][03219] Rollout worker 7 uses device cpu
	[2025-03-22 15:39:07,114][03219] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 15:39:07,114][03219] InferenceWorker_p0-w0: min num requests: 2
	[2025-03-22 15:39:07,146][03219] Starting all processes...
	[2025-03-22 15:39:07,147][03219] Starting process learner_proc0
	[2025-03-22 15:39:07,199][03219] Starting all processes...
	[2025-03-22 15:39:07,208][03219] Starting process inference_proc0-0
	[2025-03-22 15:39:07,209][03219] Starting process rollout_proc0
	[2025-03-22 15:39:07,209][03219] Starting process rollout_proc1
	[2025-03-22 15:39:07,211][03219] Starting process rollout_proc2
	[2025-03-22 15:39:07,211][03219] Starting process rollout_proc3
	[2025-03-22 15:39:07,211][03219] Starting process rollout_proc4
	[2025-03-22 15:39:07,211][03219] Starting process rollout_proc5
	[2025-03-22 15:39:07,211][03219] Starting process rollout_proc6
	[2025-03-22 15:39:07,211][03219] Starting process rollout_proc7
	[2025-03-22 15:39:25,225][03414] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 15:39:25,225][03414] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
	[2025-03-22 15:39:25,310][03414] Num visible devices: 1
	[2025-03-22 15:39:25,326][03414] Starting seed is not provided
	[2025-03-22 15:39:25,326][03414] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 15:39:25,326][03414] Initializing actor-critic model on device cuda:0
	[2025-03-22 15:39:25,327][03414] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 15:39:25,332][03414] RunningMeanStd input shape: (1,)
	[2025-03-22 15:39:25,417][03414] ConvEncoder: input_channels=3
	[2025-03-22 15:39:25,484][03428] Worker 1 uses CPU cores [1]
	[2025-03-22 15:39:25,765][03429] Worker 0 uses CPU cores [0]
	[2025-03-22 15:39:25,873][03435] Worker 7 uses CPU cores [1]
	[2025-03-22 15:39:25,888][03432] Worker 4 uses CPU cores [0]
	[2025-03-22 15:39:25,911][03431] Worker 3 uses CPU cores [1]
	[2025-03-22 15:39:25,921][03434] Worker 6 uses CPU cores [0]
	[2025-03-22 15:39:25,954][03430] Worker 2 uses CPU cores [0]
	[2025-03-22 15:39:25,971][03427] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 15:39:25,972][03427] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
	[2025-03-22 15:39:25,998][03427] Num visible devices: 1
	[2025-03-22 15:39:26,039][03433] Worker 5 uses CPU cores [1]
	[2025-03-22 15:39:26,057][03414] Conv encoder output size: 512
	[2025-03-22 15:39:26,057][03414] Policy head output size: 512
	[2025-03-22 15:39:26,111][03414] Created Actor Critic model with architecture:
	[2025-03-22 15:39:26,111][03414] ActorCriticSharedWeights(
	(obs_normalizer): ObservationNormalizer(
	(running_mean_std): RunningMeanStdDictInPlace(
	(running_mean_std): ModuleDict(
	(obs): RunningMeanStdInPlace()
	)
	)
	)
	(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
	(encoder): VizdoomEncoder(
	(basic_encoder): ConvEncoder(
	(enc): RecursiveScriptModule(
	original_name=ConvEncoderImpl
	(conv_head): RecursiveScriptModule(
	original_name=Sequential
	(0): RecursiveScriptModule(original_name=Conv2d)
	(1): RecursiveScriptModule(original_name=ELU)
	(2): RecursiveScriptModule(original_name=Conv2d)
	(3): RecursiveScriptModule(original_name=ELU)
	(4): RecursiveScriptModule(original_name=Conv2d)
	(5): RecursiveScriptModule(original_name=ELU)
	)
	(mlp_layers): RecursiveScriptModule(
	original_name=Sequential
	(0): RecursiveScriptModule(original_name=Linear)
	(1): RecursiveScriptModule(original_name=ELU)
	)
	)
	)
	)
	(core): ModelCoreRNN(
	(core): GRU(512, 512)
	)
	(decoder): MlpDecoder(
	(mlp): Identity()
	)
	(critic_linear): Linear(in_features=512, out_features=1, bias=True)
	(action_parameterization): ActionParameterizationDefault(
	(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
	)
	)
	[2025-03-22 15:39:26,358][03414] Using optimizer <class 'torch.optim.adam.Adam'>
	[2025-03-22 15:39:27,108][03219] Heartbeat connected on Batcher_0
	[2025-03-22 15:39:27,114][03219] Heartbeat connected on InferenceWorker_p0-w0
	[2025-03-22 15:39:27,121][03219] Heartbeat connected on RolloutWorker_w0
	[2025-03-22 15:39:27,125][03219] Heartbeat connected on RolloutWorker_w1
	[2025-03-22 15:39:27,132][03219] Heartbeat connected on RolloutWorker_w3
	[2025-03-22 15:39:27,132][03219] Heartbeat connected on RolloutWorker_w2
	[2025-03-22 15:39:27,135][03219] Heartbeat connected on RolloutWorker_w4
	[2025-03-22 15:39:27,142][03219] Heartbeat connected on RolloutWorker_w6
	[2025-03-22 15:39:27,144][03219] Heartbeat connected on RolloutWorker_w5
	[2025-03-22 15:39:27,146][03219] Heartbeat connected on RolloutWorker_w7
	[2025-03-22 15:39:30,747][03414] No checkpoints found
	[2025-03-22 15:39:30,747][03414] Did not load from checkpoint, starting from scratch!
	[2025-03-22 15:39:30,747][03414] Initialized policy 0 weights for model version 0
	[2025-03-22 15:39:30,750][03414] LearnerWorker_p0 finished initialization!
	[2025-03-22 15:39:30,751][03219] Heartbeat connected on LearnerWorker_p0
	[2025-03-22 15:39:30,753][03414] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 15:39:30,924][03427] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 15:39:30,926][03427] RunningMeanStd input shape: (1,)
	[2025-03-22 15:39:30,938][03427] ConvEncoder: input_channels=3
	[2025-03-22 15:39:31,041][03427] Conv encoder output size: 512
	[2025-03-22 15:39:31,041][03427] Policy head output size: 512
	[2025-03-22 15:39:31,077][03219] Inference worker 0-0 is ready!
	[2025-03-22 15:39:31,077][03219] All inference workers are ready! Signal rollout workers to start!
	[2025-03-22 15:39:31,357][03431] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:31,403][03433] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:31,434][03435] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:31,467][03434] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:31,528][03432] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:31,532][03430] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:31,565][03429] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:31,573][03428] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:39:32,776][03219] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-22 15:39:32,877][03431] Decorrelating experience for 0 frames...
	[2025-03-22 15:39:32,879][03432] Decorrelating experience for 0 frames...
	[2025-03-22 15:39:32,879][03435] Decorrelating experience for 0 frames...
	[2025-03-22 15:39:32,877][03434] Decorrelating experience for 0 frames...
	[2025-03-22 15:39:33,670][03432] Decorrelating experience for 32 frames...
	[2025-03-22 15:39:33,673][03434] Decorrelating experience for 32 frames...
	[2025-03-22 15:39:34,201][03431] Decorrelating experience for 32 frames...
	[2025-03-22 15:39:34,203][03435] Decorrelating experience for 32 frames...
	[2025-03-22 15:39:34,198][03433] Decorrelating experience for 0 frames...
	[2025-03-22 15:39:34,505][03434] Decorrelating experience for 64 frames...
	[2025-03-22 15:39:35,059][03428] Decorrelating experience for 0 frames...
	[2025-03-22 15:39:35,622][03432] Decorrelating experience for 64 frames...
	[2025-03-22 15:39:35,628][03430] Decorrelating experience for 0 frames...
	[2025-03-22 15:39:35,967][03433] Decorrelating experience for 32 frames...
	[2025-03-22 15:39:36,175][03434] Decorrelating experience for 96 frames...
	[2025-03-22 15:39:36,604][03428] Decorrelating experience for 32 frames...
	[2025-03-22 15:39:37,235][03430] Decorrelating experience for 32 frames...
	[2025-03-22 15:39:37,404][03431] Decorrelating experience for 64 frames...
	[2025-03-22 15:39:37,776][03219] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-22 15:39:39,376][03435] Decorrelating experience for 64 frames...
	[2025-03-22 15:39:39,374][03428] Decorrelating experience for 64 frames...
	[2025-03-22 15:39:39,551][03433] Decorrelating experience for 64 frames...
	[2025-03-22 15:39:39,671][03431] Decorrelating experience for 96 frames...
	[2025-03-22 15:39:41,468][03432] Decorrelating experience for 96 frames...
	[2025-03-22 15:39:41,852][03435] Decorrelating experience for 96 frames...
	[2025-03-22 15:39:41,904][03428] Decorrelating experience for 96 frames...
	[2025-03-22 15:39:42,116][03433] Decorrelating experience for 96 frames...
	[2025-03-22 15:39:42,227][03430] Decorrelating experience for 64 frames...
	[2025-03-22 15:39:42,776][03219] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 72.6. Samples: 726. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-22 15:39:42,779][03219] Avg episode reward: [(0, '3.513')]
	[2025-03-22 15:39:44,152][03414] Signal inference workers to stop experience collection...
	[2025-03-22 15:39:44,171][03427] InferenceWorker_p0-w0: stopping experience collection
	[2025-03-22 15:39:44,367][03430] Decorrelating experience for 96 frames...
	[2025-03-22 15:39:44,720][03414] Signal inference workers to resume experience collection...
	[2025-03-22 15:39:44,720][03427] InferenceWorker_p0-w0: resuming experience collection
	[2025-03-22 15:39:47,776][03219] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 12288. Throughput: 0: 204.9. Samples: 3074. Policy #0 lag: (min: 0.0, avg: 1.1, max: 2.0)
	[2025-03-22 15:39:47,779][03219] Avg episode reward: [(0, '3.337')]
	[2025-03-22 15:39:52,778][03219] Fps is (10 sec: 2866.6, 60 sec: 1433.4, 300 sec: 1433.4). Total num frames: 28672. Throughput: 0: 374.5. Samples: 7490. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:39:52,782][03219] Avg episode reward: [(0, '3.873')]
	[2025-03-22 15:39:55,703][03427] Updated weights for policy 0, policy_version 10 (0.0022)
	[2025-03-22 15:39:57,776][03219] Fps is (10 sec: 3686.4, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 49152. Throughput: 0: 399.8. Samples: 9994. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:39:57,779][03219] Avg episode reward: [(0, '4.432')]
	[2025-03-22 15:40:02,776][03219] Fps is (10 sec: 4097.0, 60 sec: 2321.1, 300 sec: 2321.1). Total num frames: 69632. Throughput: 0: 557.7. Samples: 16730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:40:02,781][03219] Avg episode reward: [(0, '4.395')]
	[2025-03-22 15:40:05,553][03427] Updated weights for policy 0, policy_version 20 (0.0020)
	[2025-03-22 15:40:07,776][03219] Fps is (10 sec: 3686.4, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 86016. Throughput: 0: 629.8. Samples: 22042. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:40:07,780][03219] Avg episode reward: [(0, '4.258')]
	[2025-03-22 15:40:12,776][03219] Fps is (10 sec: 3686.4, 60 sec: 2662.4, 300 sec: 2662.4). Total num frames: 106496. Throughput: 0: 624.3. Samples: 24972. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:40:12,778][03219] Avg episode reward: [(0, '4.416')]
	[2025-03-22 15:40:12,786][03414] Saving new best policy, reward=4.416!
	[2025-03-22 15:40:15,677][03427] Updated weights for policy 0, policy_version 30 (0.0022)
	[2025-03-22 15:40:17,776][03219] Fps is (10 sec: 4505.7, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 708.5. Samples: 31884. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
	[2025-03-22 15:40:17,778][03219] Avg episode reward: [(0, '4.478')]
	[2025-03-22 15:40:17,780][03414] Saving new best policy, reward=4.478!
	[2025-03-22 15:40:22,776][03219] Fps is (10 sec: 3686.4, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 143360. Throughput: 0: 818.6. Samples: 36838. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:40:22,779][03219] Avg episode reward: [(0, '4.385')]
	[2025-03-22 15:40:26,491][03427] Updated weights for policy 0, policy_version 40 (0.0031)
	[2025-03-22 15:40:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 871.2. Samples: 39928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:40:27,783][03219] Avg episode reward: [(0, '4.331')]
	[2025-03-22 15:40:32,776][03219] Fps is (10 sec: 4915.2, 60 sec: 3208.5, 300 sec: 3208.5). Total num frames: 192512. Throughput: 0: 973.1. Samples: 46862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:40:32,779][03219] Avg episode reward: [(0, '4.434')]
	[2025-03-22 15:40:36,747][03427] Updated weights for policy 0, policy_version 50 (0.0029)
	[2025-03-22 15:40:37,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3413.4, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 986.2. Samples: 51866. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:40:37,781][03219] Avg episode reward: [(0, '4.430')]
	[2025-03-22 15:40:42,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3218.3). Total num frames: 225280. Throughput: 0: 998.4. Samples: 54922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:40:42,778][03219] Avg episode reward: [(0, '4.420')]
	[2025-03-22 15:40:46,680][03427] Updated weights for policy 0, policy_version 60 (0.0032)
	[2025-03-22 15:40:47,779][03219] Fps is (10 sec: 4094.9, 60 sec: 3891.0, 300 sec: 3276.7). Total num frames: 245760. Throughput: 0: 991.7. Samples: 61360. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:40:47,780][03219] Avg episode reward: [(0, '4.594')]
	[2025-03-22 15:40:47,821][03414] Saving new best policy, reward=4.594!
	[2025-03-22 15:40:52,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3225.6). Total num frames: 258048. Throughput: 0: 964.3. Samples: 65436. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:40:52,777][03219] Avg episode reward: [(0, '4.473')]
	[2025-03-22 15:40:57,776][03219] Fps is (10 sec: 3687.4, 60 sec: 3891.2, 300 sec: 3325.0). Total num frames: 282624. Throughput: 0: 967.1. Samples: 68490. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:40:57,780][03219] Avg episode reward: [(0, '4.183')]
	[2025-03-22 15:40:58,564][03427] Updated weights for policy 0, policy_version 70 (0.0019)
	[2025-03-22 15:41:02,779][03219] Fps is (10 sec: 4504.2, 60 sec: 3891.0, 300 sec: 3367.7). Total num frames: 303104. Throughput: 0: 957.8. Samples: 74988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:41:02,780][03219] Avg episode reward: [(0, '4.145')]
	[2025-03-22 15:41:02,785][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth...
	[2025-03-22 15:41:07,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 942.9. Samples: 79268. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:41:07,779][03219] Avg episode reward: [(0, '4.476')]
	[2025-03-22 15:41:10,003][03427] Updated weights for policy 0, policy_version 80 (0.0027)
	[2025-03-22 15:41:12,776][03219] Fps is (10 sec: 3687.6, 60 sec: 3891.2, 300 sec: 3399.7). Total num frames: 339968. Throughput: 0: 946.0. Samples: 82496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:41:12,779][03219] Avg episode reward: [(0, '4.634')]
	[2025-03-22 15:41:12,784][03414] Saving new best policy, reward=4.634!
	[2025-03-22 15:41:17,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 931.1. Samples: 88760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:41:17,785][03219] Avg episode reward: [(0, '4.548')]
	[2025-03-22 15:41:21,336][03427] Updated weights for policy 0, policy_version 90 (0.0021)
	[2025-03-22 15:41:22,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3388.5). Total num frames: 372736. Throughput: 0: 918.7. Samples: 93208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:41:22,778][03219] Avg episode reward: [(0, '4.506')]
	[2025-03-22 15:41:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 925.6. Samples: 96574. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:41:27,781][03219] Avg episode reward: [(0, '4.727')]
	[2025-03-22 15:41:27,784][03414] Saving new best policy, reward=4.727!
	[2025-03-22 15:41:30,756][03427] Updated weights for policy 0, policy_version 100 (0.0025)
	[2025-03-22 15:41:32,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 931.3. Samples: 103264. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:41:32,778][03219] Avg episode reward: [(0, '4.635')]
	[2025-03-22 15:41:37,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 943.1. Samples: 107874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:41:37,778][03219] Avg episode reward: [(0, '4.534')]
	[2025-03-22 15:41:42,159][03427] Updated weights for policy 0, policy_version 110 (0.0033)
	[2025-03-22 15:41:42,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 945.8. Samples: 111052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:41:42,778][03219] Avg episode reward: [(0, '4.657')]
	[2025-03-22 15:41:47,776][03219] Fps is (10 sec: 4095.9, 60 sec: 3754.8, 300 sec: 3489.2). Total num frames: 471040. Throughput: 0: 940.5. Samples: 117306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:41:47,778][03219] Avg episode reward: [(0, '4.765')]
	[2025-03-22 15:41:47,786][03414] Saving new best policy, reward=4.765!
	[2025-03-22 15:41:52,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3452.3). Total num frames: 483328. Throughput: 0: 937.7. Samples: 121464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:41:52,780][03219] Avg episode reward: [(0, '4.668')]
	[2025-03-22 15:41:53,776][03427] Updated weights for policy 0, policy_version 120 (0.0039)
	[2025-03-22 15:41:57,776][03219] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3474.5). Total num frames: 503808. Throughput: 0: 935.5. Samples: 124592. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:41:57,777][03219] Avg episode reward: [(0, '4.507')]
	[2025-03-22 15:42:02,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3686.6, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 940.9. Samples: 131102. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:42:02,778][03219] Avg episode reward: [(0, '4.626')]
	[2025-03-22 15:42:04,772][03427] Updated weights for policy 0, policy_version 130 (0.0019)
	[2025-03-22 15:42:07,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3488.2). Total num frames: 540672. Throughput: 0: 936.0. Samples: 135326. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:42:07,778][03219] Avg episode reward: [(0, '4.887')]
	[2025-03-22 15:42:07,782][03414] Saving new best policy, reward=4.887!
	[2025-03-22 15:42:12,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3507.2). Total num frames: 561152. Throughput: 0: 929.6. Samples: 138406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:42:12,781][03219] Avg episode reward: [(0, '4.708')]
	[2025-03-22 15:42:15,292][03427] Updated weights for policy 0, policy_version 140 (0.0013)
	[2025-03-22 15:42:17,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3500.2). Total num frames: 577536. Throughput: 0: 920.8. Samples: 144700. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:42:17,778][03219] Avg episode reward: [(0, '4.558')]
	[2025-03-22 15:42:22,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3493.6). Total num frames: 593920. Throughput: 0: 909.3. Samples: 148792. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:42:22,778][03219] Avg episode reward: [(0, '4.584')]
	[2025-03-22 15:42:27,078][03427] Updated weights for policy 0, policy_version 150 (0.0024)
	[2025-03-22 15:42:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3510.9). Total num frames: 614400. Throughput: 0: 908.3. Samples: 151926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:42:27,777][03219] Avg episode reward: [(0, '4.658')]
	[2025-03-22 15:42:32,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3527.1). Total num frames: 634880. Throughput: 0: 910.3. Samples: 158270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:42:32,780][03219] Avg episode reward: [(0, '4.507')]
	[2025-03-22 15:42:37,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3498.2). Total num frames: 647168. Throughput: 0: 911.0. Samples: 162458. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:42:37,779][03219] Avg episode reward: [(0, '4.625')]
	[2025-03-22 15:42:38,832][03427] Updated weights for policy 0, policy_version 160 (0.0018)
	[2025-03-22 15:42:42,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3535.5). Total num frames: 671744. Throughput: 0: 912.0. Samples: 165632. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:42:42,781][03219] Avg episode reward: [(0, '4.971')]
	[2025-03-22 15:42:42,788][03414] Saving new best policy, reward=4.971!
	[2025-03-22 15:42:47,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3528.9). Total num frames: 688128. Throughput: 0: 906.3. Samples: 171886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:42:47,777][03219] Avg episode reward: [(0, '5.008')]
	[2025-03-22 15:42:47,780][03414] Saving new best policy, reward=5.008!
	[2025-03-22 15:42:50,378][03427] Updated weights for policy 0, policy_version 170 (0.0015)
	[2025-03-22 15:42:52,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3522.6). Total num frames: 704512. Throughput: 0: 901.6. Samples: 175898. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:42:52,778][03219] Avg episode reward: [(0, '5.009')]
	[2025-03-22 15:42:52,789][03414] Saving new best policy, reward=5.009!
	[2025-03-22 15:42:57,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3536.5). Total num frames: 724992. Throughput: 0: 901.2. Samples: 178962. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:42:57,782][03219] Avg episode reward: [(0, '4.845')]
	[2025-03-22 15:43:00,690][03427] Updated weights for policy 0, policy_version 180 (0.0020)
	[2025-03-22 15:43:02,779][03219] Fps is (10 sec: 3685.3, 60 sec: 3617.9, 300 sec: 3530.3). Total num frames: 741376. Throughput: 0: 904.3. Samples: 185396. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:43:02,783][03219] Avg episode reward: [(0, '4.987')]
	[2025-03-22 15:43:02,792][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth...
	[2025-03-22 15:43:07,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3524.5). Total num frames: 757760. Throughput: 0: 909.0. Samples: 189698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:43:07,781][03219] Avg episode reward: [(0, '4.844')]
	[2025-03-22 15:43:12,306][03427] Updated weights for policy 0, policy_version 190 (0.0019)
	[2025-03-22 15:43:12,776][03219] Fps is (10 sec: 3687.5, 60 sec: 3618.1, 300 sec: 3537.5). Total num frames: 778240. Throughput: 0: 909.1. Samples: 192836. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:43:12,777][03219] Avg episode reward: [(0, '4.586')]
	[2025-03-22 15:43:17,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3549.9). Total num frames: 798720. Throughput: 0: 908.0. Samples: 199132. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:43:17,777][03219] Avg episode reward: [(0, '4.607')]
	[2025-03-22 15:43:22,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3526.1). Total num frames: 811008. Throughput: 0: 908.7. Samples: 203348. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:43:22,781][03219] Avg episode reward: [(0, '4.601')]
	[2025-03-22 15:43:23,932][03427] Updated weights for policy 0, policy_version 200 (0.0016)
	[2025-03-22 15:43:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3555.7). Total num frames: 835584. Throughput: 0: 910.6. Samples: 206608. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:43:27,778][03219] Avg episode reward: [(0, '4.838')]
	[2025-03-22 15:43:32,776][03219] Fps is (10 sec: 4096.1, 60 sec: 3618.1, 300 sec: 3549.9). Total num frames: 851968. Throughput: 0: 922.7. Samples: 213406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:43:32,780][03219] Avg episode reward: [(0, '4.721')]
	[2025-03-22 15:43:34,379][03427] Updated weights for policy 0, policy_version 210 (0.0028)
	[2025-03-22 15:43:37,776][03219] Fps is (10 sec: 3276.7, 60 sec: 3686.4, 300 sec: 3544.3). Total num frames: 868352. Throughput: 0: 934.8. Samples: 217966. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:43:37,778][03219] Avg episode reward: [(0, '4.728')]
	[2025-03-22 15:43:42,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3571.7). Total num frames: 892928. Throughput: 0: 941.5. Samples: 221330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:43:42,781][03219] Avg episode reward: [(0, '5.023')]
	[2025-03-22 15:43:42,787][03414] Saving new best policy, reward=5.023!
	[2025-03-22 15:43:44,267][03427] Updated weights for policy 0, policy_version 220 (0.0030)
	[2025-03-22 15:43:47,776][03219] Fps is (10 sec: 4096.1, 60 sec: 3686.4, 300 sec: 3565.9). Total num frames: 909312. Throughput: 0: 934.8. Samples: 227460. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:43:47,779][03219] Avg episode reward: [(0, '4.732')]
	[2025-03-22 15:43:52,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3560.4). Total num frames: 925696. Throughput: 0: 934.9. Samples: 231770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:43:52,778][03219] Avg episode reward: [(0, '4.711')]
	[2025-03-22 15:43:56,261][03427] Updated weights for policy 0, policy_version 230 (0.0033)
	[2025-03-22 15:43:57,776][03219] Fps is (10 sec: 3686.3, 60 sec: 3686.4, 300 sec: 3570.5). Total num frames: 946176. Throughput: 0: 934.6. Samples: 234894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:43:57,778][03219] Avg episode reward: [(0, '4.790')]
	[2025-03-22 15:44:02,778][03219] Fps is (10 sec: 3685.7, 60 sec: 3686.5, 300 sec: 3565.0). Total num frames: 962560. Throughput: 0: 928.8. Samples: 240928. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:44:02,780][03219] Avg episode reward: [(0, '4.827')]
	[2025-03-22 15:44:07,776][03219] Fps is (10 sec: 3276.9, 60 sec: 3686.4, 300 sec: 3559.8). Total num frames: 978944. Throughput: 0: 933.8. Samples: 245368. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:44:07,778][03219] Avg episode reward: [(0, '4.752')]
	[2025-03-22 15:44:08,141][03427] Updated weights for policy 0, policy_version 240 (0.0023)
	[2025-03-22 15:44:12,776][03219] Fps is (10 sec: 3687.1, 60 sec: 3686.4, 300 sec: 3569.4). Total num frames: 999424. Throughput: 0: 930.1. Samples: 248464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:44:12,778][03219] Avg episode reward: [(0, '4.757')]
	[2025-03-22 15:44:17,779][03219] Fps is (10 sec: 4094.8, 60 sec: 3686.2, 300 sec: 3578.6). Total num frames: 1019904. Throughput: 0: 913.3. Samples: 254506. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:44:17,781][03219] Avg episode reward: [(0, '4.863')]
	[2025-03-22 15:44:19,210][03427] Updated weights for policy 0, policy_version 250 (0.0039)
	[2025-03-22 15:44:22,777][03219] Fps is (10 sec: 3686.3, 60 sec: 3754.6, 300 sec: 3573.4). Total num frames: 1036288. Throughput: 0: 911.5. Samples: 258984. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:44:22,778][03219] Avg episode reward: [(0, '4.586')]
	[2025-03-22 15:44:27,776][03219] Fps is (10 sec: 3687.5, 60 sec: 3686.4, 300 sec: 3582.3). Total num frames: 1056768. Throughput: 0: 907.2. Samples: 262156. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:44:27,778][03219] Avg episode reward: [(0, '4.705')]
	[2025-03-22 15:44:29,370][03427] Updated weights for policy 0, policy_version 260 (0.0017)
	[2025-03-22 15:44:32,776][03219] Fps is (10 sec: 3686.5, 60 sec: 3686.4, 300 sec: 3637.8). Total num frames: 1073152. Throughput: 0: 903.3. Samples: 268108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:44:32,781][03219] Avg episode reward: [(0, '4.830')]
	[2025-03-22 15:44:37,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1089536. Throughput: 0: 911.8. Samples: 272800. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:44:37,785][03219] Avg episode reward: [(0, '4.862')]
	[2025-03-22 15:44:41,248][03427] Updated weights for policy 0, policy_version 270 (0.0013)
	[2025-03-22 15:44:42,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1110016. Throughput: 0: 910.6. Samples: 275870. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:44:42,781][03219] Avg episode reward: [(0, '4.801')]
	[2025-03-22 15:44:47,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3721.1). Total num frames: 1126400. Throughput: 0: 904.0. Samples: 281606. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:44:47,778][03219] Avg episode reward: [(0, '4.670')]
	[2025-03-22 15:44:52,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3707.2). Total num frames: 1142784. Throughput: 0: 907.6. Samples: 286210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:44:52,782][03219] Avg episode reward: [(0, '4.722')]
	[2025-03-22 15:44:53,061][03427] Updated weights for policy 0, policy_version 280 (0.0028)
	[2025-03-22 15:44:57,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3707.2). Total num frames: 1163264. Throughput: 0: 908.5. Samples: 289346. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:44:57,781][03219] Avg episode reward: [(0, '4.662')]
	[2025-03-22 15:45:02,776][03219] Fps is (10 sec: 3686.3, 60 sec: 3618.2, 300 sec: 3707.2). Total num frames: 1179648. Throughput: 0: 902.6. Samples: 295122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:45:02,781][03219] Avg episode reward: [(0, '4.748')]
	[2025-03-22 15:45:02,788][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth...
	[2025-03-22 15:45:02,938][03414] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000074_303104.pth
	[2025-03-22 15:45:04,800][03427] Updated weights for policy 0, policy_version 290 (0.0022)
	[2025-03-22 15:45:07,777][03219] Fps is (10 sec: 3276.5, 60 sec: 3618.1, 300 sec: 3693.3). Total num frames: 1196032. Throughput: 0: 910.5. Samples: 299958. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:45:07,781][03219] Avg episode reward: [(0, '4.825')]
	[2025-03-22 15:45:12,776][03219] Fps is (10 sec: 4096.2, 60 sec: 3686.4, 300 sec: 3693.3). Total num frames: 1220608. Throughput: 0: 908.9. Samples: 303056. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:45:12,778][03219] Avg episode reward: [(0, '4.978')]
	[2025-03-22 15:45:14,324][03427] Updated weights for policy 0, policy_version 300 (0.0017)
	[2025-03-22 15:45:17,776][03219] Fps is (10 sec: 3686.7, 60 sec: 3550.1, 300 sec: 3693.3). Total num frames: 1232896. Throughput: 0: 899.4. Samples: 308580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:45:17,780][03219] Avg episode reward: [(0, '4.923')]
	[2025-03-22 15:45:22,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3618.2, 300 sec: 3679.5). Total num frames: 1253376. Throughput: 0: 902.8. Samples: 313426. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:45:22,778][03219] Avg episode reward: [(0, '4.979')]
	[2025-03-22 15:45:26,221][03427] Updated weights for policy 0, policy_version 310 (0.0039)
	[2025-03-22 15:45:27,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3618.1, 300 sec: 3665.6). Total num frames: 1273856. Throughput: 0: 905.6. Samples: 316620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:45:27,777][03219] Avg episode reward: [(0, '5.056')]
	[2025-03-22 15:45:27,781][03414] Saving new best policy, reward=5.056!
	[2025-03-22 15:45:32,778][03219] Fps is (10 sec: 3685.7, 60 sec: 3618.0, 300 sec: 3679.4). Total num frames: 1290240. Throughput: 0: 898.1. Samples: 322020. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:45:32,786][03219] Avg episode reward: [(0, '4.992')]
	[2025-03-22 15:45:37,776][03219] Fps is (10 sec: 2867.2, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 1302528. Throughput: 0: 899.2. Samples: 326674. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:45:37,781][03219] Avg episode reward: [(0, '5.153')]
	[2025-03-22 15:45:37,791][03414] Saving new best policy, reward=5.153!
	[2025-03-22 15:45:39,982][03427] Updated weights for policy 0, policy_version 320 (0.0019)
	[2025-03-22 15:45:42,776][03219] Fps is (10 sec: 2867.8, 60 sec: 3481.6, 300 sec: 3637.8). Total num frames: 1318912. Throughput: 0: 860.0. Samples: 328048. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:45:42,777][03219] Avg episode reward: [(0, '5.192')]
	[2025-03-22 15:45:42,785][03414] Saving new best policy, reward=5.192!
	[2025-03-22 15:45:47,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3651.7). Total num frames: 1335296. Throughput: 0: 850.8. Samples: 333406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:45:47,784][03219] Avg episode reward: [(0, '5.130')]
	[2025-03-22 15:45:51,818][03427] Updated weights for policy 0, policy_version 330 (0.0027)
	[2025-03-22 15:45:52,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 1355776. Throughput: 0: 863.2. Samples: 338800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:45:52,779][03219] Avg episode reward: [(0, '4.820')]
	[2025-03-22 15:45:57,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3549.9, 300 sec: 3637.8). Total num frames: 1376256. Throughput: 0: 867.5. Samples: 342094. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:45:57,781][03219] Avg episode reward: [(0, '4.730')]
	[2025-03-22 15:46:02,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3637.8). Total num frames: 1388544. Throughput: 0: 866.4. Samples: 347568. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
	[2025-03-22 15:46:02,781][03219] Avg episode reward: [(0, '4.788')]
	[2025-03-22 15:46:03,047][03427] Updated weights for policy 0, policy_version 340 (0.0025)
	[2025-03-22 15:46:07,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3637.8). Total num frames: 1413120. Throughput: 0: 884.1. Samples: 353212. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:46:07,781][03219] Avg episode reward: [(0, '4.894')]
	[2025-03-22 15:46:12,058][03427] Updated weights for policy 0, policy_version 350 (0.0018)
	[2025-03-22 15:46:12,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3549.9, 300 sec: 3651.7). Total num frames: 1433600. Throughput: 0: 889.5. Samples: 356646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:46:12,781][03219] Avg episode reward: [(0, '4.585')]
	[2025-03-22 15:46:17,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 1449984. Throughput: 0: 892.6. Samples: 362184. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
	[2025-03-22 15:46:17,777][03219] Avg episode reward: [(0, '4.664')]
	[2025-03-22 15:46:22,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3651.7). Total num frames: 1470464. Throughput: 0: 920.7. Samples: 368104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:46:22,778][03219] Avg episode reward: [(0, '5.074')]
	[2025-03-22 15:46:23,090][03427] Updated weights for policy 0, policy_version 360 (0.0019)
	[2025-03-22 15:46:27,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3665.6). Total num frames: 1495040. Throughput: 0: 965.8. Samples: 371508. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:46:27,778][03219] Avg episode reward: [(0, '5.271')]
	[2025-03-22 15:46:27,783][03414] Saving new best policy, reward=5.271!
	[2025-03-22 15:46:32,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3618.2, 300 sec: 3651.7). Total num frames: 1507328. Throughput: 0: 962.0. Samples: 376698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
	[2025-03-22 15:46:32,782][03219] Avg episode reward: [(0, '5.093')]
	[2025-03-22 15:46:34,049][03427] Updated weights for policy 0, policy_version 370 (0.0021)
	[2025-03-22 15:46:37,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3651.7). Total num frames: 1527808. Throughput: 0: 979.5. Samples: 382876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:46:37,781][03219] Avg episode reward: [(0, '4.752')]
	[2025-03-22 15:46:42,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3665.6). Total num frames: 1552384. Throughput: 0: 975.6. Samples: 385996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:46:42,783][03219] Avg episode reward: [(0, '5.007')]
	[2025-03-22 15:46:44,027][03427] Updated weights for policy 0, policy_version 380 (0.0021)
	[2025-03-22 15:46:47,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 1568768. Throughput: 0: 965.6. Samples: 391018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:46:47,778][03219] Avg episode reward: [(0, '4.844')]
	[2025-03-22 15:46:52,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3679.5). Total num frames: 1589248. Throughput: 0: 981.4. Samples: 397374. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:46:52,777][03219] Avg episode reward: [(0, '4.992')]
	[2025-03-22 15:46:54,330][03427] Updated weights for policy 0, policy_version 390 (0.0019)
	[2025-03-22 15:46:57,779][03219] Fps is (10 sec: 4094.8, 60 sec: 3891.0, 300 sec: 3679.4). Total num frames: 1609728. Throughput: 0: 981.6. Samples: 400822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:46:57,780][03219] Avg episode reward: [(0, '4.914')]
	[2025-03-22 15:47:02,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3679.5). Total num frames: 1626112. Throughput: 0: 970.2. Samples: 405844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:47:02,782][03219] Avg episode reward: [(0, '4.665')]
	[2025-03-22 15:47:02,790][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000397_1626112.pth...
	[2025-03-22 15:47:02,926][03414] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000181_741376.pth
	[2025-03-22 15:47:05,137][03427] Updated weights for policy 0, policy_version 400 (0.0025)
	[2025-03-22 15:47:07,776][03219] Fps is (10 sec: 4097.2, 60 sec: 3959.5, 300 sec: 3693.3). Total num frames: 1650688. Throughput: 0: 989.3. Samples: 412622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:47:07,777][03219] Avg episode reward: [(0, '4.715')]
	[2025-03-22 15:47:12,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3707.2). Total num frames: 1671168. Throughput: 0: 991.1. Samples: 416106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:47:12,777][03219] Avg episode reward: [(0, '5.115')]
	[2025-03-22 15:47:15,269][03427] Updated weights for policy 0, policy_version 410 (0.0019)
	[2025-03-22 15:47:17,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3707.2). Total num frames: 1687552. Throughput: 0: 982.4. Samples: 420906. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:47:17,778][03219] Avg episode reward: [(0, '5.374')]
	[2025-03-22 15:47:17,779][03414] Saving new best policy, reward=5.374!
	[2025-03-22 15:47:22,776][03219] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3707.2). Total num frames: 1708032. Throughput: 0: 991.1. Samples: 427476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:47:22,783][03219] Avg episode reward: [(0, '5.421')]
	[2025-03-22 15:47:22,796][03414] Saving new best policy, reward=5.421!
	[2025-03-22 15:47:24,741][03427] Updated weights for policy 0, policy_version 420 (0.0016)
	[2025-03-22 15:47:27,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3707.2). Total num frames: 1728512. Throughput: 0: 996.0. Samples: 430818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:47:27,779][03219] Avg episode reward: [(0, '5.277')]
	[2025-03-22 15:47:32,776][03219] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3721.1). Total num frames: 1744896. Throughput: 0: 991.2. Samples: 435624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:47:32,781][03219] Avg episode reward: [(0, '5.300')]
	[2025-03-22 15:47:35,455][03427] Updated weights for policy 0, policy_version 430 (0.0015)
	[2025-03-22 15:47:37,776][03219] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3721.1). Total num frames: 1769472. Throughput: 0: 1007.5. Samples: 442712. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:47:37,778][03219] Avg episode reward: [(0, '5.463')]
	[2025-03-22 15:47:37,780][03414] Saving new best policy, reward=5.463!
	[2025-03-22 15:47:42,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 1789952. Throughput: 0: 1005.4. Samples: 446062. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:47:42,778][03219] Avg episode reward: [(0, '5.501')]
	[2025-03-22 15:47:42,784][03414] Saving new best policy, reward=5.501!
	[2025-03-22 15:47:46,208][03427] Updated weights for policy 0, policy_version 440 (0.0026)
	[2025-03-22 15:47:47,776][03219] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3735.0). Total num frames: 1806336. Throughput: 0: 1000.7. Samples: 450874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:47:47,778][03219] Avg episode reward: [(0, '5.582')]
	[2025-03-22 15:47:47,781][03414] Saving new best policy, reward=5.582!
	[2025-03-22 15:47:52,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3748.9). Total num frames: 1830912. Throughput: 0: 1000.5. Samples: 457644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:47:52,777][03219] Avg episode reward: [(0, '5.768')]
	[2025-03-22 15:47:52,784][03414] Saving new best policy, reward=5.768!
	[2025-03-22 15:47:55,531][03427] Updated weights for policy 0, policy_version 450 (0.0027)
	[2025-03-22 15:47:57,779][03219] Fps is (10 sec: 4095.0, 60 sec: 3959.5, 300 sec: 3748.9). Total num frames: 1847296. Throughput: 0: 996.7. Samples: 460962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:47:57,790][03219] Avg episode reward: [(0, '5.919')]
	[2025-03-22 15:47:57,796][03414] Saving new best policy, reward=5.919!
	[2025-03-22 15:48:02,778][03219] Fps is (10 sec: 3685.6, 60 sec: 4027.6, 300 sec: 3762.7). Total num frames: 1867776. Throughput: 0: 995.3. Samples: 465698. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:48:02,780][03219] Avg episode reward: [(0, '5.886')]
	[2025-03-22 15:48:06,249][03427] Updated weights for policy 0, policy_version 460 (0.0021)
	[2025-03-22 15:48:07,776][03219] Fps is (10 sec: 4096.9, 60 sec: 3959.4, 300 sec: 3762.8). Total num frames: 1888256. Throughput: 0: 1001.5. Samples: 472544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:48:07,781][03219] Avg episode reward: [(0, '6.234')]
	[2025-03-22 15:48:07,783][03414] Saving new best policy, reward=6.234!
	[2025-03-22 15:48:12,776][03219] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3762.8). Total num frames: 1908736. Throughput: 0: 1000.3. Samples: 475832. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:48:12,780][03219] Avg episode reward: [(0, '6.184')]
	[2025-03-22 15:48:16,782][03427] Updated weights for policy 0, policy_version 470 (0.0019)
	[2025-03-22 15:48:17,776][03219] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 1929216. Throughput: 0: 1006.4. Samples: 480914. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:48:17,780][03219] Avg episode reward: [(0, '6.274')]
	[2025-03-22 15:48:17,783][03414] Saving new best policy, reward=6.274!
	[2025-03-22 15:48:22,776][03219] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3776.7). Total num frames: 1949696. Throughput: 0: 1003.1. Samples: 487850. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:48:22,777][03219] Avg episode reward: [(0, '6.703')]
	[2025-03-22 15:48:22,787][03414] Saving new best policy, reward=6.703!
	[2025-03-22 15:48:26,652][03427] Updated weights for policy 0, policy_version 480 (0.0016)
	[2025-03-22 15:48:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3776.7). Total num frames: 1966080. Throughput: 0: 997.0. Samples: 490928. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:48:27,781][03219] Avg episode reward: [(0, '6.651')]
	[2025-03-22 15:48:32,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 1986560. Throughput: 0: 1006.6. Samples: 496172. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:48:32,781][03219] Avg episode reward: [(0, '6.803')]
	[2025-03-22 15:48:32,807][03414] Saving new best policy, reward=6.803!
	[2025-03-22 15:48:36,306][03427] Updated weights for policy 0, policy_version 490 (0.0027)
	[2025-03-22 15:48:37,776][03219] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3790.5). Total num frames: 2011136. Throughput: 0: 1013.6. Samples: 503254. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:48:37,782][03219] Avg episode reward: [(0, '6.779')]
	[2025-03-22 15:48:42,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3790.5). Total num frames: 2027520. Throughput: 0: 1003.3. Samples: 506108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:48:42,779][03219] Avg episode reward: [(0, '6.342')]
	[2025-03-22 15:48:46,703][03427] Updated weights for policy 0, policy_version 500 (0.0018)
	[2025-03-22 15:48:47,776][03219] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 3818.3). Total num frames: 2052096. Throughput: 0: 1021.2. Samples: 511650. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:48:47,778][03219] Avg episode reward: [(0, '6.072')]
	[2025-03-22 15:48:52,776][03219] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3818.3). Total num frames: 2072576. Throughput: 0: 1025.8. Samples: 518704. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:48:52,781][03219] Avg episode reward: [(0, '5.767')]
	[2025-03-22 15:48:57,063][03427] Updated weights for policy 0, policy_version 510 (0.0015)
	[2025-03-22 15:48:57,778][03219] Fps is (10 sec: 3685.8, 60 sec: 4027.8, 300 sec: 3818.3). Total num frames: 2088960. Throughput: 0: 1012.8. Samples: 521410. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:48:57,779][03219] Avg episode reward: [(0, '5.660')]
	[2025-03-22 15:49:02,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 3846.1). Total num frames: 2113536. Throughput: 0: 1026.9. Samples: 527124. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:49:02,782][03219] Avg episode reward: [(0, '6.020')]
	[2025-03-22 15:49:02,790][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000516_2113536.pth...
	[2025-03-22 15:49:02,918][03414] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000288_1179648.pth
	[2025-03-22 15:49:06,065][03427] Updated weights for policy 0, policy_version 520 (0.0022)
	[2025-03-22 15:49:07,776][03219] Fps is (10 sec: 4506.5, 60 sec: 4096.0, 300 sec: 3846.1). Total num frames: 2134016. Throughput: 0: 1028.4. Samples: 534130. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:49:07,781][03219] Avg episode reward: [(0, '6.409')]
	[2025-03-22 15:49:12,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3832.2). Total num frames: 2150400. Throughput: 0: 1018.4. Samples: 536758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:49:12,780][03219] Avg episode reward: [(0, '6.596')]
	[2025-03-22 15:49:16,585][03427] Updated weights for policy 0, policy_version 530 (0.0025)
	[2025-03-22 15:49:17,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 2174976. Throughput: 0: 1032.3. Samples: 542624. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:49:17,780][03219] Avg episode reward: [(0, '7.498')]
	[2025-03-22 15:49:17,784][03414] Saving new best policy, reward=7.498!
	[2025-03-22 15:49:22,777][03219] Fps is (10 sec: 4505.2, 60 sec: 4095.9, 300 sec: 3859.9). Total num frames: 2195456. Throughput: 0: 1027.4. Samples: 549488. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:49:22,779][03219] Avg episode reward: [(0, '8.144')]
	[2025-03-22 15:49:22,895][03414] Saving new best policy, reward=8.144!
	[2025-03-22 15:49:27,372][03427] Updated weights for policy 0, policy_version 540 (0.0021)
	[2025-03-22 15:49:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 2211840. Throughput: 0: 1014.6. Samples: 551764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:49:27,777][03219] Avg episode reward: [(0, '8.331')]
	[2025-03-22 15:49:27,782][03414] Saving new best policy, reward=8.331!
	[2025-03-22 15:49:32,776][03219] Fps is (10 sec: 4096.4, 60 sec: 4164.3, 300 sec: 3887.7). Total num frames: 2236416. Throughput: 0: 1025.0. Samples: 557776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:49:32,777][03219] Avg episode reward: [(0, '8.851')]
	[2025-03-22 15:49:32,783][03414] Saving new best policy, reward=8.851!
	[2025-03-22 15:49:36,194][03427] Updated weights for policy 0, policy_version 550 (0.0018)
	[2025-03-22 15:49:37,778][03219] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 3887.7). Total num frames: 2256896. Throughput: 0: 1024.9. Samples: 564824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:49:37,779][03219] Avg episode reward: [(0, '8.774')]
	[2025-03-22 15:49:42,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3887.7). Total num frames: 2273280. Throughput: 0: 1010.1. Samples: 566864. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:49:42,777][03219] Avg episode reward: [(0, '8.335')]
	[2025-03-22 15:49:46,765][03427] Updated weights for policy 0, policy_version 560 (0.0023)
	[2025-03-22 15:49:47,776][03219] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 2297856. Throughput: 0: 1020.6. Samples: 573052. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:49:47,785][03219] Avg episode reward: [(0, '8.480')]
	[2025-03-22 15:49:52,776][03219] Fps is (10 sec: 4505.7, 60 sec: 4096.0, 300 sec: 3915.5). Total num frames: 2318336. Throughput: 0: 1018.5. Samples: 579962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:49:52,779][03219] Avg episode reward: [(0, '8.690')]
	[2025-03-22 15:49:57,538][03427] Updated weights for policy 0, policy_version 570 (0.0018)
	[2025-03-22 15:49:57,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3915.5). Total num frames: 2334720. Throughput: 0: 1004.9. Samples: 581978. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:49:57,777][03219] Avg episode reward: [(0, '9.524')]
	[2025-03-22 15:49:57,783][03414] Saving new best policy, reward=9.524!
	[2025-03-22 15:50:02,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3929.4). Total num frames: 2355200. Throughput: 0: 1014.1. Samples: 588260. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
	[2025-03-22 15:50:02,780][03219] Avg episode reward: [(0, '9.506')]
	[2025-03-22 15:50:06,324][03427] Updated weights for policy 0, policy_version 580 (0.0024)
	[2025-03-22 15:50:07,783][03219] Fps is (10 sec: 4502.4, 60 sec: 4095.5, 300 sec: 3929.3). Total num frames: 2379776. Throughput: 0: 1013.0. Samples: 595080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:50:07,785][03219] Avg episode reward: [(0, '9.683')]
	[2025-03-22 15:50:07,786][03414] Saving new best policy, reward=9.683!
	[2025-03-22 15:50:12,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3943.3). Total num frames: 2396160. Throughput: 0: 1008.0. Samples: 597122. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:50:12,779][03219] Avg episode reward: [(0, '9.998')]
	[2025-03-22 15:50:12,787][03414] Saving new best policy, reward=9.998!
	[2025-03-22 15:50:16,995][03427] Updated weights for policy 0, policy_version 590 (0.0015)
	[2025-03-22 15:50:17,776][03219] Fps is (10 sec: 3689.0, 60 sec: 4027.7, 300 sec: 3943.3). Total num frames: 2416640. Throughput: 0: 1017.4. Samples: 603558. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:50:17,779][03219] Avg episode reward: [(0, '10.634')]
	[2025-03-22 15:50:17,782][03414] Saving new best policy, reward=10.634!
	[2025-03-22 15:50:22,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3943.3). Total num frames: 2437120. Throughput: 0: 1007.5. Samples: 610158. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:50:22,777][03219] Avg episode reward: [(0, '12.168')]
	[2025-03-22 15:50:22,782][03414] Saving new best policy, reward=12.168!
	[2025-03-22 15:50:27,775][03427] Updated weights for policy 0, policy_version 600 (0.0030)
	[2025-03-22 15:50:27,777][03219] Fps is (10 sec: 4095.7, 60 sec: 4096.0, 300 sec: 3957.2). Total num frames: 2457600. Throughput: 0: 1005.0. Samples: 612090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:50:27,783][03219] Avg episode reward: [(0, '11.299')]
	[2025-03-22 15:50:32,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 2478080. Throughput: 0: 1015.2. Samples: 618734. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:50:32,780][03219] Avg episode reward: [(0, '11.376')]
	[2025-03-22 15:50:36,984][03427] Updated weights for policy 0, policy_version 610 (0.0014)
	[2025-03-22 15:50:37,777][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3998.8). Total num frames: 2498560. Throughput: 0: 1005.1. Samples: 625190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:50:37,780][03219] Avg episode reward: [(0, '10.408')]
	[2025-03-22 15:50:42,778][03219] Fps is (10 sec: 3685.6, 60 sec: 4027.6, 300 sec: 3998.8). Total num frames: 2514944. Throughput: 0: 1006.1. Samples: 627254. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:50:42,780][03219] Avg episode reward: [(0, '10.239')]
	[2025-03-22 15:50:47,218][03427] Updated weights for policy 0, policy_version 620 (0.0019)
	[2025-03-22 15:50:47,776][03219] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2539520. Throughput: 0: 1019.6. Samples: 634142. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:50:47,780][03219] Avg episode reward: [(0, '9.877')]
	[2025-03-22 15:50:52,776][03219] Fps is (10 sec: 4506.6, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2560000. Throughput: 0: 1006.2. Samples: 640352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:50:52,783][03219] Avg episode reward: [(0, '10.574')]
	[2025-03-22 15:50:57,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2576384. Throughput: 0: 1006.8. Samples: 642430. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:50:57,777][03219] Avg episode reward: [(0, '10.810')]
	[2025-03-22 15:50:57,897][03427] Updated weights for policy 0, policy_version 630 (0.0024)
	[2025-03-22 15:51:02,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2600960. Throughput: 0: 1016.6. Samples: 649304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:51:02,781][03219] Avg episode reward: [(0, '11.097')]
	[2025-03-22 15:51:02,790][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth...
	[2025-03-22 15:51:02,949][03414] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000397_1626112.pth
	[2025-03-22 15:51:07,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3959.9, 300 sec: 4012.7). Total num frames: 2617344. Throughput: 0: 999.0. Samples: 655112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:51:07,779][03219] Avg episode reward: [(0, '12.026')]
	[2025-03-22 15:51:08,173][03427] Updated weights for policy 0, policy_version 640 (0.0018)
	[2025-03-22 15:51:12,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2637824. Throughput: 0: 1003.5. Samples: 657246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:51:12,781][03219] Avg episode reward: [(0, '11.942')]
	[2025-03-22 15:51:17,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2658304. Throughput: 0: 1006.6. Samples: 664030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:51:17,782][03219] Avg episode reward: [(0, '12.031')]
	[2025-03-22 15:51:17,973][03427] Updated weights for policy 0, policy_version 650 (0.0024)
	[2025-03-22 15:51:22,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2678784. Throughput: 0: 995.6. Samples: 669990. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:51:22,780][03219] Avg episode reward: [(0, '13.076')]
	[2025-03-22 15:51:22,785][03414] Saving new best policy, reward=13.076!
	[2025-03-22 15:51:27,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 2699264. Throughput: 0: 1000.6. Samples: 672278. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:51:27,781][03219] Avg episode reward: [(0, '12.828')]
	[2025-03-22 15:51:28,611][03427] Updated weights for policy 0, policy_version 660 (0.0013)
	[2025-03-22 15:51:32,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2719744. Throughput: 0: 1004.4. Samples: 679340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:51:32,782][03219] Avg episode reward: [(0, '13.384')]
	[2025-03-22 15:51:32,790][03414] Saving new best policy, reward=13.384!
	[2025-03-22 15:51:37,780][03219] Fps is (10 sec: 3684.9, 60 sec: 3959.2, 300 sec: 4012.6). Total num frames: 2736128. Throughput: 0: 993.1. Samples: 685044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:51:37,786][03219] Avg episode reward: [(0, '12.395')]
	[2025-03-22 15:51:39,259][03427] Updated weights for policy 0, policy_version 670 (0.0015)
	[2025-03-22 15:51:42,776][03219] Fps is (10 sec: 4096.1, 60 sec: 4096.2, 300 sec: 4040.5). Total num frames: 2760704. Throughput: 0: 1004.3. Samples: 687624. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:51:42,780][03219] Avg episode reward: [(0, '11.518')]
	[2025-03-22 15:51:47,776][03219] Fps is (10 sec: 4507.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2781184. Throughput: 0: 1008.0. Samples: 694662. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:51:47,778][03219] Avg episode reward: [(0, '12.925')]
	[2025-03-22 15:51:47,992][03427] Updated weights for policy 0, policy_version 680 (0.0017)
	[2025-03-22 15:51:52,777][03219] Fps is (10 sec: 3685.9, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 2797568. Throughput: 0: 999.8. Samples: 700104. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:51:52,779][03219] Avg episode reward: [(0, '12.732')]
	[2025-03-22 15:51:57,776][03219] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2818048. Throughput: 0: 1011.5. Samples: 702764. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:51:57,781][03219] Avg episode reward: [(0, '13.907')]
	[2025-03-22 15:51:57,847][03414] Saving new best policy, reward=13.907!
	[2025-03-22 15:51:58,908][03427] Updated weights for policy 0, policy_version 690 (0.0018)
	[2025-03-22 15:52:02,776][03219] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2842624. Throughput: 0: 1013.8. Samples: 709650. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:52:02,778][03219] Avg episode reward: [(0, '14.310')]
	[2025-03-22 15:52:02,791][03414] Saving new best policy, reward=14.310!
	[2025-03-22 15:52:07,776][03219] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2859008. Throughput: 0: 997.4. Samples: 714874. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:52:07,780][03219] Avg episode reward: [(0, '13.965')]
	[2025-03-22 15:52:09,632][03427] Updated weights for policy 0, policy_version 700 (0.0014)
	[2025-03-22 15:52:12,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2879488. Throughput: 0: 1009.3. Samples: 717696. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:52:12,782][03219] Avg episode reward: [(0, '14.523')]
	[2025-03-22 15:52:12,789][03414] Saving new best policy, reward=14.523!
	[2025-03-22 15:52:17,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2899968. Throughput: 0: 1003.3. Samples: 724490. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:52:17,777][03219] Avg episode reward: [(0, '14.812')]
	[2025-03-22 15:52:17,854][03414] Saving new best policy, reward=14.812!
	[2025-03-22 15:52:19,249][03427] Updated weights for policy 0, policy_version 710 (0.0029)
	[2025-03-22 15:52:22,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2916352. Throughput: 0: 988.4. Samples: 729516. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:52:22,781][03219] Avg episode reward: [(0, '15.211')]
	[2025-03-22 15:52:22,791][03414] Saving new best policy, reward=15.211!
	[2025-03-22 15:52:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2936832. Throughput: 0: 993.2. Samples: 732316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:52:27,777][03219] Avg episode reward: [(0, '15.374')]
	[2025-03-22 15:52:27,786][03414] Saving new best policy, reward=15.374!
	[2025-03-22 15:52:29,967][03427] Updated weights for policy 0, policy_version 720 (0.0015)
	[2025-03-22 15:52:32,776][03219] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2961408. Throughput: 0: 986.3. Samples: 739046. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:52:32,777][03219] Avg episode reward: [(0, '16.102')]
	[2025-03-22 15:52:32,785][03414] Saving new best policy, reward=16.102!
	[2025-03-22 15:52:37,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.7, 300 sec: 4012.7). Total num frames: 2973696. Throughput: 0: 974.5. Samples: 743956. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:52:37,778][03219] Avg episode reward: [(0, '15.941')]
	[2025-03-22 15:52:41,235][03427] Updated weights for policy 0, policy_version 730 (0.0026)
	[2025-03-22 15:52:42,776][03219] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 4026.6). Total num frames: 2994176. Throughput: 0: 980.3. Samples: 746878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:52:42,782][03219] Avg episode reward: [(0, '15.761')]
	[2025-03-22 15:52:47,780][03219] Fps is (10 sec: 4503.8, 60 sec: 3959.2, 300 sec: 4026.5). Total num frames: 3018752. Throughput: 0: 977.6. Samples: 753648. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:52:47,781][03219] Avg episode reward: [(0, '16.045')]
	[2025-03-22 15:52:51,944][03427] Updated weights for policy 0, policy_version 740 (0.0021)
	[2025-03-22 15:52:52,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 4012.7). Total num frames: 3031040. Throughput: 0: 969.9. Samples: 758520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:52:52,778][03219] Avg episode reward: [(0, '15.885')]
	[2025-03-22 15:52:57,776][03219] Fps is (10 sec: 3687.8, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 3055616. Throughput: 0: 978.9. Samples: 761746. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:52:57,778][03219] Avg episode reward: [(0, '16.483')]
	[2025-03-22 15:52:57,781][03414] Saving new best policy, reward=16.483!
	[2025-03-22 15:53:01,205][03427] Updated weights for policy 0, policy_version 750 (0.0017)
	[2025-03-22 15:53:02,779][03219] Fps is (10 sec: 4504.2, 60 sec: 3891.0, 300 sec: 4026.5). Total num frames: 3076096. Throughput: 0: 979.2. Samples: 768558. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-22 15:53:02,785][03219] Avg episode reward: [(0, '15.047')]
	[2025-03-22 15:53:02,797][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000751_3076096.pth...
	[2025-03-22 15:53:02,968][03414] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000516_2113536.pth
	[2025-03-22 15:53:07,776][03219] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 3092480. Throughput: 0: 973.8. Samples: 773338. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:53:07,778][03219] Avg episode reward: [(0, '14.267')]
	[2025-03-22 15:53:11,876][03427] Updated weights for policy 0, policy_version 760 (0.0033)
	[2025-03-22 15:53:12,776][03219] Fps is (10 sec: 3687.5, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 3112960. Throughput: 0: 987.6. Samples: 776756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:53:12,780][03219] Avg episode reward: [(0, '15.728')]
	[2025-03-22 15:53:17,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3137536. Throughput: 0: 992.0. Samples: 783688. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:53:17,779][03219] Avg episode reward: [(0, '15.968')]
	[2025-03-22 15:53:22,457][03427] Updated weights for policy 0, policy_version 770 (0.0017)
	[2025-03-22 15:53:22,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3153920. Throughput: 0: 990.7. Samples: 788538. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
	[2025-03-22 15:53:22,782][03219] Avg episode reward: [(0, '16.199')]
	[2025-03-22 15:53:27,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 3174400. Throughput: 0: 998.9. Samples: 791828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:53:27,780][03219] Avg episode reward: [(0, '17.856')]
	[2025-03-22 15:53:27,783][03414] Saving new best policy, reward=17.856!
	[2025-03-22 15:53:31,660][03427] Updated weights for policy 0, policy_version 780 (0.0017)
	[2025-03-22 15:53:32,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 4012.7). Total num frames: 3194880. Throughput: 0: 999.7. Samples: 798630. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:53:32,780][03219] Avg episode reward: [(0, '18.213')]
	[2025-03-22 15:53:32,793][03414] Saving new best policy, reward=18.213!
	[2025-03-22 15:53:37,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 3211264. Throughput: 0: 997.5. Samples: 803406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:53:37,781][03219] Avg episode reward: [(0, '17.842')]
	[2025-03-22 15:53:42,506][03427] Updated weights for policy 0, policy_version 790 (0.0015)
	[2025-03-22 15:53:42,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3235840. Throughput: 0: 999.6. Samples: 806728. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:53:42,778][03219] Avg episode reward: [(0, '18.180')]
	[2025-03-22 15:53:47,781][03219] Fps is (10 sec: 4503.4, 60 sec: 3959.4, 300 sec: 4012.6). Total num frames: 3256320. Throughput: 0: 1002.0. Samples: 813650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:53:47,783][03219] Avg episode reward: [(0, '17.431')]
	[2025-03-22 15:53:52,776][03219] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3272704. Throughput: 0: 1001.0. Samples: 818384. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:53:52,778][03219] Avg episode reward: [(0, '17.132')]
	[2025-03-22 15:53:53,305][03427] Updated weights for policy 0, policy_version 800 (0.0033)
	[2025-03-22 15:53:57,776][03219] Fps is (10 sec: 3688.2, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3293184. Throughput: 0: 998.0. Samples: 821666. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:53:57,777][03219] Avg episode reward: [(0, '17.750')]
	[2025-03-22 15:54:02,776][03219] Fps is (10 sec: 4096.1, 60 sec: 3959.7, 300 sec: 3998.8). Total num frames: 3313664. Throughput: 0: 994.1. Samples: 828424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:54:02,780][03219] Avg episode reward: [(0, '17.928')]
	[2025-03-22 15:54:03,283][03427] Updated weights for policy 0, policy_version 810 (0.0016)
	[2025-03-22 15:54:07,776][03219] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 3334144. Throughput: 0: 992.5. Samples: 833200. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:54:07,778][03219] Avg episode reward: [(0, '18.541')]
	[2025-03-22 15:54:07,779][03414] Saving new best policy, reward=18.541!
	[2025-03-22 15:54:12,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3998.8). Total num frames: 3354624. Throughput: 0: 995.3. Samples: 836618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:54:12,777][03219] Avg episode reward: [(0, '19.677')]
	[2025-03-22 15:54:12,785][03414] Saving new best policy, reward=19.677!
	[2025-03-22 15:54:13,309][03427] Updated weights for policy 0, policy_version 820 (0.0016)
	[2025-03-22 15:54:17,777][03219] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3984.9). Total num frames: 3371008. Throughput: 0: 989.1. Samples: 843140. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:54:17,778][03219] Avg episode reward: [(0, '19.758')]
	[2025-03-22 15:54:17,806][03414] Saving new best policy, reward=19.758!
	[2025-03-22 15:54:22,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3998.8). Total num frames: 3391488. Throughput: 0: 993.4. Samples: 848108. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:54:22,778][03219] Avg episode reward: [(0, '20.368')]
	[2025-03-22 15:54:22,790][03414] Saving new best policy, reward=20.368!
	[2025-03-22 15:54:24,150][03427] Updated weights for policy 0, policy_version 830 (0.0015)
	[2025-03-22 15:54:27,776][03219] Fps is (10 sec: 4096.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3411968. Throughput: 0: 992.0. Samples: 851370. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:54:27,782][03219] Avg episode reward: [(0, '18.790')]
	[2025-03-22 15:54:32,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3432448. Throughput: 0: 979.7. Samples: 857730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:54:32,780][03219] Avg episode reward: [(0, '20.068')]
	[2025-03-22 15:54:35,098][03427] Updated weights for policy 0, policy_version 840 (0.0028)
	[2025-03-22 15:54:37,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3448832. Throughput: 0: 990.1. Samples: 862938. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:54:37,780][03219] Avg episode reward: [(0, '19.636')]
	[2025-03-22 15:54:42,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3473408. Throughput: 0: 991.7. Samples: 866292. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:54:42,782][03219] Avg episode reward: [(0, '20.835')]
	[2025-03-22 15:54:42,789][03414] Saving new best policy, reward=20.835!
	[2025-03-22 15:54:44,142][03427] Updated weights for policy 0, policy_version 850 (0.0013)
	[2025-03-22 15:54:47,779][03219] Fps is (10 sec: 4094.8, 60 sec: 3891.3, 300 sec: 3971.0). Total num frames: 3489792. Throughput: 0: 980.6. Samples: 872556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-22 15:54:47,780][03219] Avg episode reward: [(0, '19.599')]
	[2025-03-22 15:54:52,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3510272. Throughput: 0: 987.1. Samples: 877620. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:54:52,782][03219] Avg episode reward: [(0, '21.726')]
	[2025-03-22 15:54:52,790][03414] Saving new best policy, reward=21.726!
	[2025-03-22 15:54:55,245][03427] Updated weights for policy 0, policy_version 860 (0.0020)
	[2025-03-22 15:54:57,776][03219] Fps is (10 sec: 4097.2, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3530752. Throughput: 0: 987.8. Samples: 881068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:54:57,782][03219] Avg episode reward: [(0, '20.489')]
	[2025-03-22 15:55:02,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3547136. Throughput: 0: 976.2. Samples: 887068. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:55:02,782][03219] Avg episode reward: [(0, '20.337')]
	[2025-03-22 15:55:02,790][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000866_3547136.pth...
	[2025-03-22 15:55:02,955][03414] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000635_2600960.pth
	[2025-03-22 15:55:06,066][03427] Updated weights for policy 0, policy_version 870 (0.0013)
	[2025-03-22 15:55:07,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3971.0). Total num frames: 3567616. Throughput: 0: 983.5. Samples: 892366. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:55:07,777][03219] Avg episode reward: [(0, '19.988')]
	[2025-03-22 15:55:12,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3984.9). Total num frames: 3592192. Throughput: 0: 987.8. Samples: 895822. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:55:12,782][03219] Avg episode reward: [(0, '20.494')]
	[2025-03-22 15:55:15,940][03427] Updated weights for policy 0, policy_version 880 (0.0026)
	[2025-03-22 15:55:17,778][03219] Fps is (10 sec: 4095.2, 60 sec: 3959.4, 300 sec: 3971.0). Total num frames: 3608576. Throughput: 0: 979.2. Samples: 901798. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:55:17,779][03219] Avg episode reward: [(0, '21.034')]
	[2025-03-22 15:55:22,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3629056. Throughput: 0: 991.2. Samples: 907542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-22 15:55:22,777][03219] Avg episode reward: [(0, '20.752')]
	[2025-03-22 15:55:26,038][03427] Updated weights for policy 0, policy_version 890 (0.0014)
	[2025-03-22 15:55:27,776][03219] Fps is (10 sec: 4096.8, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3649536. Throughput: 0: 992.3. Samples: 910944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:55:27,777][03219] Avg episode reward: [(0, '20.562')]
	[2025-03-22 15:55:32,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3957.2). Total num frames: 3665920. Throughput: 0: 979.6. Samples: 916634. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:55:32,781][03219] Avg episode reward: [(0, '20.165')]
	[2025-03-22 15:55:36,924][03427] Updated weights for policy 0, policy_version 900 (0.0029)
	[2025-03-22 15:55:37,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3985.0). Total num frames: 3690496. Throughput: 0: 995.3. Samples: 922410. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:55:37,777][03219] Avg episode reward: [(0, '20.053')]
	[2025-03-22 15:55:42,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3710976. Throughput: 0: 994.3. Samples: 925810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:55:42,777][03219] Avg episode reward: [(0, '20.387')]
	[2025-03-22 15:55:47,282][03427] Updated weights for policy 0, policy_version 910 (0.0020)
	[2025-03-22 15:55:47,776][03219] Fps is (10 sec: 3686.3, 60 sec: 3959.7, 300 sec: 3957.2). Total num frames: 3727360. Throughput: 0: 986.4. Samples: 931456. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:55:47,778][03219] Avg episode reward: [(0, '20.776')]
	[2025-03-22 15:55:52,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3747840. Throughput: 0: 1000.8. Samples: 937404. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:55:52,782][03219] Avg episode reward: [(0, '20.376')]
	[2025-03-22 15:55:56,571][03427] Updated weights for policy 0, policy_version 920 (0.0013)
	[2025-03-22 15:55:57,776][03219] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3772416. Throughput: 0: 1001.6. Samples: 940896. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:55:57,779][03219] Avg episode reward: [(0, '21.251')]
	[2025-03-22 15:56:02,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3784704. Throughput: 0: 987.2. Samples: 946218. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-22 15:56:02,780][03219] Avg episode reward: [(0, '20.183')]
	[2025-03-22 15:56:07,389][03427] Updated weights for policy 0, policy_version 930 (0.0020)
	[2025-03-22 15:56:07,776][03219] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3971.0). Total num frames: 3809280. Throughput: 0: 997.7. Samples: 952438. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:56:07,777][03219] Avg episode reward: [(0, '20.141')]
	[2025-03-22 15:56:12,776][03219] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3971.0). Total num frames: 3829760. Throughput: 0: 997.2. Samples: 955818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:56:12,780][03219] Avg episode reward: [(0, '19.740')]
	[2025-03-22 15:56:17,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.6, 300 sec: 3957.2). Total num frames: 3846144. Throughput: 0: 989.2. Samples: 961146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:56:17,777][03219] Avg episode reward: [(0, '20.032')]
	[2025-03-22 15:56:18,141][03427] Updated weights for policy 0, policy_version 940 (0.0017)
	[2025-03-22 15:56:22,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3866624. Throughput: 0: 1002.0. Samples: 967502. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-22 15:56:22,778][03219] Avg episode reward: [(0, '19.139')]
	[2025-03-22 15:56:27,396][03427] Updated weights for policy 0, policy_version 950 (0.0016)
	[2025-03-22 15:56:27,780][03219] Fps is (10 sec: 4503.7, 60 sec: 4027.4, 300 sec: 3971.0). Total num frames: 3891200. Throughput: 0: 1002.9. Samples: 970946. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-22 15:56:27,785][03219] Avg episode reward: [(0, '19.745')]
	[2025-03-22 15:56:32,776][03219] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3971.1). Total num frames: 3907584. Throughput: 0: 986.4. Samples: 975846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:56:32,778][03219] Avg episode reward: [(0, '19.496')]
	[2025-03-22 15:56:37,776][03219] Fps is (10 sec: 3687.9, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3928064. Throughput: 0: 999.7. Samples: 982392. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:56:37,778][03219] Avg episode reward: [(0, '18.704')]
	[2025-03-22 15:56:38,073][03427] Updated weights for policy 0, policy_version 960 (0.0014)
	[2025-03-22 15:56:42,776][03219] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3948544. Throughput: 0: 995.6. Samples: 985700. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:56:42,778][03219] Avg episode reward: [(0, '18.520')]
	[2025-03-22 15:56:47,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3964928. Throughput: 0: 988.0. Samples: 990678. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-22 15:56:47,782][03219] Avg episode reward: [(0, '18.811')]
	[2025-03-22 15:56:49,624][03427] Updated weights for policy 0, policy_version 970 (0.0026)
	[2025-03-22 15:56:52,776][03219] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3957.2). Total num frames: 3985408. Throughput: 0: 981.7. Samples: 996616. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-22 15:56:52,781][03219] Avg episode reward: [(0, '19.875')]
	[2025-03-22 15:56:56,717][03414] Stopping Batcher_0...
	[2025-03-22 15:56:56,717][03219] Component Batcher_0 stopped!
	[2025-03-22 15:56:56,718][03414] Loop batcher_evt_loop terminating...
	[2025-03-22 15:56:56,720][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 15:56:56,722][03219] Component RolloutWorker_w0 process died already! Don't wait for it.
	[2025-03-22 15:56:56,791][03427] Weights refcount: 2 0
	[2025-03-22 15:56:56,794][03219] Component InferenceWorker_p0-w0 stopped!
	[2025-03-22 15:56:56,801][03427] Stopping InferenceWorker_p0-w0...
	[2025-03-22 15:56:56,801][03427] Loop inference_proc0-0_evt_loop terminating...
	[2025-03-22 15:56:56,874][03414] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000751_3076096.pth
	[2025-03-22 15:56:56,902][03414] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 15:56:57,104][03219] Component LearnerWorker_p0 stopped!
	[2025-03-22 15:56:57,103][03414] Stopping LearnerWorker_p0...
	[2025-03-22 15:56:57,105][03414] Loop learner_proc0_evt_loop terminating...
	[2025-03-22 15:56:57,298][03431] Stopping RolloutWorker_w3...
	[2025-03-22 15:56:57,299][03431] Loop rollout_proc3_evt_loop terminating...
	[2025-03-22 15:56:57,298][03219] Component RolloutWorker_w3 stopped!
	[2025-03-22 15:56:57,348][03433] Stopping RolloutWorker_w5...
	[2025-03-22 15:56:57,349][03219] Component RolloutWorker_w5 stopped!
	[2025-03-22 15:56:57,351][03428] Stopping RolloutWorker_w1...
	[2025-03-22 15:56:57,351][03219] Component RolloutWorker_w1 stopped!
	[2025-03-22 15:56:57,349][03433] Loop rollout_proc5_evt_loop terminating...
	[2025-03-22 15:56:57,352][03428] Loop rollout_proc1_evt_loop terminating...
	[2025-03-22 15:56:57,392][03435] Stopping RolloutWorker_w7...
	[2025-03-22 15:56:57,393][03435] Loop rollout_proc7_evt_loop terminating...
	[2025-03-22 15:56:57,392][03219] Component RolloutWorker_w7 stopped!
	[2025-03-22 15:56:57,425][03219] Component RolloutWorker_w4 stopped!
	[2025-03-22 15:56:57,429][03432] Stopping RolloutWorker_w4...
	[2025-03-22 15:56:57,430][03432] Loop rollout_proc4_evt_loop terminating...
	[2025-03-22 15:56:57,436][03219] Component RolloutWorker_w2 stopped!
	[2025-03-22 15:56:57,438][03430] Stopping RolloutWorker_w2...
	[2025-03-22 15:56:57,439][03430] Loop rollout_proc2_evt_loop terminating...
	[2025-03-22 15:56:57,506][03219] Component RolloutWorker_w6 stopped!
	[2025-03-22 15:56:57,510][03219] Waiting for process learner_proc0 to stop...
	[2025-03-22 15:56:57,511][03434] Stopping RolloutWorker_w6...
	[2025-03-22 15:56:57,516][03434] Loop rollout_proc6_evt_loop terminating...
	[2025-03-22 15:56:59,575][03219] Waiting for process inference_proc0-0 to join...
	[2025-03-22 15:56:59,687][03219] Waiting for process rollout_proc0 to join...
	[2025-03-22 15:56:59,688][03219] Waiting for process rollout_proc1 to join...
	[2025-03-22 15:57:02,117][03219] Waiting for process rollout_proc2 to join...
	[2025-03-22 15:57:02,118][03219] Waiting for process rollout_proc3 to join...
	[2025-03-22 15:57:02,119][03219] Waiting for process rollout_proc4 to join...
	[2025-03-22 15:57:02,120][03219] Waiting for process rollout_proc5 to join...
	[2025-03-22 15:57:02,121][03219] Waiting for process rollout_proc6 to join...
	[2025-03-22 15:57:02,123][03219] Waiting for process rollout_proc7 to join...
	[2025-03-22 15:57:02,124][03219] Batcher 0 profile tree view:
	batching: 26.0754, releasing_batches: 0.0299
	[2025-03-22 15:57:02,125][03219] InferenceWorker_p0-w0 profile tree view:
	wait_policy: 0.0000
	wait_policy_total: 384.9343
	update_model: 8.9280
	weight_update: 0.0017
	one_step: 0.0026
	handle_policy_step: 613.8213
	deserialize: 14.4172, stack: 3.3808, obs_to_device_normalize: 131.0309, forward: 321.4640, send_messages: 27.6357
	prepare_outputs: 89.5364
	to_cpu: 55.5379
	[2025-03-22 15:57:02,127][03219] Learner 0 profile tree view:
	misc: 0.0042, prepare_batch: 12.7394
	train: 73.5539
	epoch_init: 0.0051, minibatch_init: 0.0062, losses_postprocess: 0.7201, kl_divergence: 0.7135, after_optimizer: 33.2484
	calculate_losses: 26.0906
	losses_init: 0.0114, forward_head: 1.3828, bptt_initial: 17.2202, tail: 1.2022, advantages_returns: 0.3084, losses: 3.5212
	bptt: 2.1486
	bptt_forward_core: 2.0358
	update: 12.2106
	clip: 1.0642
	[2025-03-22 15:57:02,128][03219] RolloutWorker_w7 profile tree view:
	wait_for_trajectories: 0.2852, enqueue_policy_requests: 81.3337, env_step: 837.6647, overhead: 12.5524, complete_rollouts: 8.0634
	save_policy_outputs: 21.7719
	split_output_tensors: 8.0823
	[2025-03-22 15:57:02,130][03219] Loop Runner_EvtLoop terminating...
	[2025-03-22 15:57:02,131][03219] Runner profile tree view:
	main_loop: 1074.9850
	[2025-03-22 15:57:02,132][03219] Collected {0: 4005888}, FPS: 3726.5
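Note: the overall FPS reported above looks consistent with the total collected frames divided by the Runner `main_loop` time. A quick check, assuming that is how the figure is derived (the variable names are illustrative and do not come from Sample Factory):

```python
# Values copied from the log lines above.
total_frames = 4005888         # from "Collected {0: 4005888}"
main_loop_seconds = 1074.9850  # from "main_loop: 1074.9850"

# Assumption: overall FPS = frames collected / wall-clock seconds of the main loop.
overall_fps = total_frames / main_loop_seconds
print(f"{overall_fps:.1f}")  # 3726.5, matching the logged value
```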
	[2025-03-22 15:57:39,396][03219] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-22 15:57:39,397][03219] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-22 15:57:39,398][03219] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-22 15:57:39,399][03219] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-22 15:57:39,400][03219] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 15:57:39,401][03219] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-22 15:57:39,402][03219] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 15:57:39,403][03219] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-22 15:57:39,404][03219] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-22 15:57:39,405][03219] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-22 15:57:39,406][03219] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-22 15:57:39,407][03219] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-22 15:57:39,409][03219] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-22 15:57:39,410][03219] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-22 15:57:39,411][03219] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-22 15:57:39,440][03219] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 15:57:39,443][03219] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 15:57:39,445][03219] RunningMeanStd input shape: (1,)
	[2025-03-22 15:57:39,462][03219] ConvEncoder: input_channels=3
	[2025-03-22 15:57:39,575][03219] Conv encoder output size: 512
	[2025-03-22 15:57:39,576][03219] Policy head output size: 512
	[2025-03-22 15:57:39,762][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 15:57:39,765][03219] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy._core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 15:57:39,767][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 15:57:39,770][03219] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy._core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 15:57:39,771][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 15:57:39,774][03219] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy._core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
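Note: the message above describes two ways around the PyTorch 2.6 `weights_only` default. For a checkpoint produced by your own training run, as here, option (1) can be sketched as below. `load_trusted_checkpoint` is a hypothetical helper, not part of Sample Factory; the traceback shows Sample Factory calling `torch.load` inside `learner.py`, so using this approach there would mean patching that call.

```python
import numpy as np
import torch


def load_trusted_checkpoint(path, device="cpu"):
    """Load a checkpoint with full unpickling (hypothetical helper).

    weights_only=False restores the pre-2.6 default. It can execute
    arbitrary code during unpickling, so use it only for files you
    created yourself or otherwise trust.
    """
    return torch.load(path, map_location=device, weights_only=False)


if __name__ == "__main__":
    import os
    import tempfile

    # Round-trip a small dict holding a NumPy scalar, the kind of object
    # that the weights-only unpickler rejects by default.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo_checkpoint.pth")
    torch.save({"best_reward": np.float64(21.726), "policy_version": 978}, demo_path)
    print(load_trusted_checkpoint(demo_path)["policy_version"])
```

Allowlisting with `torch.serialization.add_safe_globals` is the stricter alternative, but it may take several entries: the later attempts in this log fail on `numpy.dtype` rather than `numpy._core.multiarray.scalar`, which suggests each unsupported global has to be added in turn.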
	[2025-03-22 16:02:16,459][03219] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-22 16:02:16,460][03219] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-22 16:02:16,461][03219] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-22 16:02:16,461][03219] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-22 16:02:16,462][03219] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:02:16,463][03219] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-22 16:02:16,464][03219] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:02:16,465][03219] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-22 16:02:16,466][03219] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-22 16:02:16,467][03219] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-22 16:02:16,469][03219] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-22 16:02:16,471][03219] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-22 16:02:16,472][03219] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-22 16:02:16,474][03219] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-22 16:02:16,475][03219] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-22 16:02:16,502][03219] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:02:16,504][03219] RunningMeanStd input shape: (1,)
	[2025-03-22 16:02:16,515][03219] ConvEncoder: input_channels=3
	[2025-03-22 16:02:16,552][03219] Conv encoder output size: 512
	[2025-03-22 16:02:16,553][03219] Policy head output size: 512
	[2025-03-22 16:02:16,570][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:02:16,572][03219] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([dtype])` or the `torch.serialization.safe_globals([dtype])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:02:16,573][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:02:16,575][03219] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([dtype])` or the `torch.serialization.safe_globals([dtype])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:02:16,576][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:02:16,578][03219] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.dtype was not an allowed global by default. Please use `torch.serialization.add_safe_globals([dtype])` or the `torch.serialization.safe_globals([dtype])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:03:41,547][03219] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-22 16:03:41,548][03219] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-22 16:03:41,549][03219] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-22 16:03:41,550][03219] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-22 16:03:41,551][03219] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:03:41,552][03219] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-22 16:03:41,553][03219] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:03:41,554][03219] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-22 16:03:41,555][03219] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-22 16:03:41,556][03219] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-22 16:03:41,557][03219] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-22 16:03:41,558][03219] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-22 16:03:41,558][03219] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-22 16:03:41,559][03219] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-22 16:03:41,560][03219] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-22 16:03:41,585][03219] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:03:41,587][03219] RunningMeanStd input shape: (1,)
	[2025-03-22 16:03:41,599][03219] ConvEncoder: input_channels=3
	[2025-03-22 16:03:41,633][03219] Conv encoder output size: 512
	[2025-03-22 16:03:41,634][03219] Policy head output size: 512
	[2025-03-22 16:03:41,652][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:03:41,654][03219] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:03:41,655][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:03:41,657][03219] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:03:41,658][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:03:41,660][03219] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:08:58,716][03219] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-22 16:08:58,717][03219] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-22 16:08:58,718][03219] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-22 16:08:58,718][03219] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-22 16:08:58,719][03219] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:08:58,720][03219] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-22 16:08:58,721][03219] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:08:58,722][03219] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-22 16:08:58,723][03219] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-22 16:08:58,723][03219] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-22 16:08:58,724][03219] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-22 16:08:58,725][03219] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-22 16:08:58,726][03219] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-22 16:08:58,727][03219] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-22 16:08:58,728][03219] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-22 16:08:58,758][03219] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:08:58,760][03219] RunningMeanStd input shape: (1,)
	[2025-03-22 16:08:58,771][03219] ConvEncoder: input_channels=3
	[2025-03-22 16:08:58,806][03219] Conv encoder output size: 512
	[2025-03-22 16:08:58,807][03219] Policy head output size: 512
	[2025-03-22 16:08:58,828][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:08:58,830][03219] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	# noinspection PyBroadException
	^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:08:58,832][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:08:58,834][03219] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	# noinspection PyBroadException
	^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:08:58,836][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:08:58,837][03219] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	# noinspection PyBroadException
	^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:10:29,619][03219] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-22 16:10:29,620][03219] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-22 16:10:29,621][03219] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-22 16:10:29,622][03219] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-22 16:10:29,623][03219] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:10:29,624][03219] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-22 16:10:29,625][03219] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:10:29,626][03219] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-22 16:10:29,627][03219] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-22 16:10:29,628][03219] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-22 16:10:29,629][03219] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-22 16:10:29,630][03219] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-22 16:10:29,631][03219] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-22 16:10:29,632][03219] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-22 16:10:29,633][03219] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-22 16:10:29,662][03219] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:10:29,664][03219] RunningMeanStd input shape: (1,)
	[2025-03-22 16:10:29,676][03219] ConvEncoder: input_channels=3
	[2025-03-22 16:10:29,712][03219] Conv encoder output size: 512
	[2025-03-22 16:10:29,713][03219] Policy head output size: 512
	[2025-03-22 16:10:29,733][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:10:29,734][03219] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:10:29,736][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:10:29,738][03219] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:10:29,739][03219] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:10:29,741][03219] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	Please file an issue with the following so that we can make `weights_only=True` compatible with your use case: WeightsUnpickler error: Can only build Tensor, Parameter, OrderedDict or types allowlisted via `add_safe_globals`, but got <class 'numpy.dtypes.Float64DType'>

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-22 16:15:05,490][15900] Saving configuration to /content/train_dir/default_experiment/config.json...
	[2025-03-22 16:15:05,493][15900] Rollout worker 0 uses device cpu
	[2025-03-22 16:15:05,494][15900] Rollout worker 1 uses device cpu
	[2025-03-22 16:15:05,494][15900] Rollout worker 2 uses device cpu
	[2025-03-22 16:15:05,495][15900] Rollout worker 3 uses device cpu
	[2025-03-22 16:15:05,496][15900] Rollout worker 4 uses device cpu
	[2025-03-22 16:15:05,497][15900] Rollout worker 5 uses device cpu
	[2025-03-22 16:15:05,498][15900] Rollout worker 6 uses device cpu
	[2025-03-22 16:15:05,499][15900] Rollout worker 7 uses device cpu
	[2025-03-22 16:15:05,604][15900] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 16:15:05,605][15900] InferenceWorker_p0-w0: min num requests: 2
	[2025-03-22 16:15:05,638][15900] Starting all processes...
	[2025-03-22 16:15:05,639][15900] Starting process learner_proc0
	[2025-03-22 16:15:05,798][15900] Starting all processes...
	[2025-03-22 16:15:05,809][15900] Starting process inference_proc0-0
	[2025-03-22 16:15:05,809][15900] Starting process rollout_proc0
	[2025-03-22 16:15:05,813][15900] Starting process rollout_proc1
	[2025-03-22 16:15:05,813][15900] Starting process rollout_proc2
	[2025-03-22 16:15:05,813][15900] Starting process rollout_proc3
	[2025-03-22 16:15:05,813][15900] Starting process rollout_proc4
	[2025-03-22 16:15:05,813][15900] Starting process rollout_proc5
	[2025-03-22 16:15:05,813][15900] Starting process rollout_proc6
	[2025-03-22 16:15:05,813][15900] Starting process rollout_proc7
	[2025-03-22 16:15:21,735][16056] Worker 2 uses CPU cores [0]
	[2025-03-22 16:15:21,738][16060] Worker 5 uses CPU cores [1]
	[2025-03-22 16:15:21,742][16061] Worker 6 uses CPU cores [0]
	[2025-03-22 16:15:21,885][16041] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 16:15:21,886][16041] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
	[2025-03-22 16:15:21,957][16041] Num visible devices: 1
	[2025-03-22 16:15:21,975][16041] Starting seed is not provided
	[2025-03-22 16:15:21,976][16041] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 16:15:21,976][16041] Initializing actor-critic model on device cuda:0
	[2025-03-22 16:15:21,977][16041] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:15:21,982][16041] RunningMeanStd input shape: (1,)
	[2025-03-22 16:15:21,998][16059] Worker 4 uses CPU cores [0]
	[2025-03-22 16:15:22,046][16062] Worker 7 uses CPU cores [1]
	[2025-03-22 16:15:22,050][16058] Worker 0 uses CPU cores [0]
	[2025-03-22 16:15:22,050][16055] Worker 1 uses CPU cores [1]
	[2025-03-22 16:15:22,102][16057] Worker 3 uses CPU cores [1]
	[2025-03-22 16:15:22,125][16054] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 16:15:22,126][16054] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
	[2025-03-22 16:15:22,148][16041] ConvEncoder: input_channels=3
	[2025-03-22 16:15:22,149][16054] Num visible devices: 1
	[2025-03-22 16:15:22,268][16041] Conv encoder output size: 512
	[2025-03-22 16:15:22,268][16041] Policy head output size: 512
	[2025-03-22 16:15:22,284][16041] Created Actor Critic model with architecture:
	[2025-03-22 16:15:22,285][16041] ActorCriticSharedWeights(
	(obs_normalizer): ObservationNormalizer(
	(running_mean_std): RunningMeanStdDictInPlace(
	(running_mean_std): ModuleDict(
	(obs): RunningMeanStdInPlace()
	)
	)
	)
	(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
	(encoder): VizdoomEncoder(
	(basic_encoder): ConvEncoder(
	(enc): RecursiveScriptModule(
	original_name=ConvEncoderImpl
	(conv_head): RecursiveScriptModule(
	original_name=Sequential
	(0): RecursiveScriptModule(original_name=Conv2d)
	(1): RecursiveScriptModule(original_name=ELU)
	(2): RecursiveScriptModule(original_name=Conv2d)
	(3): RecursiveScriptModule(original_name=ELU)
	(4): RecursiveScriptModule(original_name=Conv2d)
	(5): RecursiveScriptModule(original_name=ELU)
	)
	(mlp_layers): RecursiveScriptModule(
	original_name=Sequential
	(0): RecursiveScriptModule(original_name=Linear)
	(1): RecursiveScriptModule(original_name=ELU)
	)
	)
	)
	)
	(core): ModelCoreRNN(
	(core): GRU(512, 512)
	)
	(decoder): MlpDecoder(
	(mlp): Identity()
	)
	(critic_linear): Linear(in_features=512, out_features=1, bias=True)
	(action_parameterization): ActionParameterizationDefault(
	(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
	)
	)
	[2025-03-22 16:15:22,457][16041] Using optimizer <class 'torch.optim.adam.Adam'>
	[2025-03-22 16:15:23,923][16041] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-22 16:15:24,130][16041] Loading model from checkpoint
	[2025-03-22 16:15:24,134][16041] Loaded experiment state at self.train_step=978, self.env_steps=4005888
	[2025-03-22 16:15:24,134][16041] Initialized policy 0 weights for model version 978
	[2025-03-22 16:15:24,139][16041] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-22 16:15:24,150][16041] LearnerWorker_p0 finished initialization!
	[2025-03-22 16:15:24,393][16054] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:15:24,396][16054] RunningMeanStd input shape: (1,)
	[2025-03-22 16:15:24,485][16054] ConvEncoder: input_channels=3
	[2025-03-22 16:15:24,672][16054] Conv encoder output size: 512
	[2025-03-22 16:15:24,673][16054] Policy head output size: 512
	[2025-03-22 16:15:24,727][15900] Inference worker 0-0 is ready!
	[2025-03-22 16:15:24,730][15900] All inference workers are ready! Signal rollout workers to start!
	[2025-03-22 16:15:25,060][16057] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,044][16062] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,070][16055] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,077][16060] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,219][16058] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,381][16059] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,397][16056] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,388][16061] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:15:25,594][15900] Heartbeat connected on Batcher_0
	[2025-03-22 16:15:25,603][15900] Heartbeat connected on LearnerWorker_p0
	[2025-03-22 16:15:25,655][15900] Heartbeat connected on InferenceWorker_p0-w0
	[2025-03-22 16:15:26,293][15900] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-22 16:15:27,483][16061] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:27,484][16059] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:27,486][16058] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:27,556][16060] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:27,561][16057] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:27,559][16055] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:27,563][16062] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:28,716][16058] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:28,722][16059] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:28,772][16060] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:28,775][16055] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:28,784][16062] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:28,844][16056] Decorrelating experience for 0 frames...
	[2025-03-22 16:15:30,044][16061] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:30,146][16056] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:30,450][16055] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:30,455][16060] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:30,452][16062] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:30,486][16058] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:31,210][16057] Decorrelating experience for 32 frames...
	[2025-03-22 16:15:31,295][15900] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-22 16:15:31,622][16059] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:31,885][16060] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:32,028][16061] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:32,112][15900] Heartbeat connected on RolloutWorker_w5
	[2025-03-22 16:15:32,293][16058] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:32,855][15900] Heartbeat connected on RolloutWorker_w0
	[2025-03-22 16:15:33,499][16056] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:33,503][16055] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:33,919][15900] Heartbeat connected on RolloutWorker_w1
	[2025-03-22 16:15:34,438][16056] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:34,538][16057] Decorrelating experience for 64 frames...
	[2025-03-22 16:15:34,751][15900] Heartbeat connected on RolloutWorker_w2
	[2025-03-22 16:15:35,954][16062] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:36,293][15900] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 30.2. Samples: 302. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-22 16:15:36,299][15900] Avg episode reward: [(0, '4.154')]
	[2025-03-22 16:15:36,519][15900] Heartbeat connected on RolloutWorker_w7
	[2025-03-22 16:15:37,311][16041] Signal inference workers to stop experience collection...
	[2025-03-22 16:15:37,320][16054] InferenceWorker_p0-w0: stopping experience collection
	[2025-03-22 16:15:37,492][16057] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:37,587][15900] Heartbeat connected on RolloutWorker_w3
	[2025-03-22 16:15:37,649][16059] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:37,785][15900] Heartbeat connected on RolloutWorker_w4
	[2025-03-22 16:15:38,397][16061] Decorrelating experience for 96 frames...
	[2025-03-22 16:15:38,827][15900] Heartbeat connected on RolloutWorker_w6
	[2025-03-22 16:15:39,140][16041] Signal inference workers to resume experience collection...
	[2025-03-22 16:15:39,141][16054] InferenceWorker_p0-w0: resuming experience collection
	[2025-03-22 16:15:39,159][16041] Stopping Batcher_0...
	[2025-03-22 16:15:39,159][16041] Loop batcher_evt_loop terminating...
	[2025-03-22 16:15:39,161][15900] Component Batcher_0 stopped!
	[2025-03-22 16:15:39,320][16054] Weights refcount: 2 0
	[2025-03-22 16:15:39,329][15900] Component InferenceWorker_p0-w0 stopped!
	[2025-03-22 16:15:39,332][16054] Stopping InferenceWorker_p0-w0...
	[2025-03-22 16:15:39,337][16054] Loop inference_proc0-0_evt_loop terminating...
	[2025-03-22 16:15:39,732][15900] Component RolloutWorker_w7 stopped!
	[2025-03-22 16:15:39,735][16062] Stopping RolloutWorker_w7...
	[2025-03-22 16:15:39,741][16062] Loop rollout_proc7_evt_loop terminating...
	[2025-03-22 16:15:39,748][15900] Component RolloutWorker_w1 stopped!
	[2025-03-22 16:15:39,750][16055] Stopping RolloutWorker_w1...
	[2025-03-22 16:15:39,752][16055] Loop rollout_proc1_evt_loop terminating...
	[2025-03-22 16:15:39,756][15900] Component RolloutWorker_w3 stopped!
	[2025-03-22 16:15:39,759][16057] Stopping RolloutWorker_w3...
	[2025-03-22 16:15:39,760][16057] Loop rollout_proc3_evt_loop terminating...
	[2025-03-22 16:15:39,785][15900] Component RolloutWorker_w5 stopped!
	[2025-03-22 16:15:39,787][16060] Stopping RolloutWorker_w5...
	[2025-03-22 16:15:39,791][16041] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
	[2025-03-22 16:15:39,788][16060] Loop rollout_proc5_evt_loop terminating...
	[2025-03-22 16:15:39,883][16061] Stopping RolloutWorker_w6...
	[2025-03-22 16:15:39,878][15900] Component RolloutWorker_w6 stopped!
	[2025-03-22 16:15:39,897][15900] Component RolloutWorker_w0 stopped!
	[2025-03-22 16:15:39,883][16061] Loop rollout_proc6_evt_loop terminating...
	[2025-03-22 16:15:39,896][16058] Stopping RolloutWorker_w0...
	[2025-03-22 16:15:39,902][16058] Loop rollout_proc0_evt_loop terminating...
	[2025-03-22 16:15:39,914][15900] Component RolloutWorker_w4 stopped!
	[2025-03-22 16:15:39,915][16059] Stopping RolloutWorker_w4...
	[2025-03-22 16:15:39,916][16059] Loop rollout_proc4_evt_loop terminating...
	[2025-03-22 16:15:39,959][15900] Component RolloutWorker_w2 stopped!
	[2025-03-22 16:15:39,960][16056] Stopping RolloutWorker_w2...
	[2025-03-22 16:15:39,961][16056] Loop rollout_proc2_evt_loop terminating...
	[2025-03-22 16:15:39,981][16041] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000866_3547136.pth
	[2025-03-22 16:15:39,988][16041] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
	[2025-03-22 16:15:40,205][16041] Stopping LearnerWorker_p0...
	[2025-03-22 16:15:40,207][16041] Loop learner_proc0_evt_loop terminating...
	[2025-03-22 16:15:40,211][15900] Component LearnerWorker_p0 stopped!
	[2025-03-22 16:15:40,212][15900] Waiting for process learner_proc0 to stop...
	[2025-03-22 16:15:42,205][15900] Waiting for process inference_proc0-0 to join...
	[2025-03-22 16:15:42,206][15900] Waiting for process rollout_proc0 to join...
	[2025-03-22 16:15:44,163][15900] Waiting for process rollout_proc1 to join...
	[2025-03-22 16:15:44,296][15900] Waiting for process rollout_proc2 to join...
	[2025-03-22 16:15:44,300][15900] Waiting for process rollout_proc3 to join...
	[2025-03-22 16:15:44,304][15900] Waiting for process rollout_proc4 to join...
	[2025-03-22 16:15:44,305][15900] Waiting for process rollout_proc5 to join...
	[2025-03-22 16:15:44,307][15900] Waiting for process rollout_proc6 to join...
	[2025-03-22 16:15:44,308][15900] Waiting for process rollout_proc7 to join...
	[2025-03-22 16:15:44,309][15900] Batcher 0 profile tree view:
	batching: 0.0475, releasing_batches: 0.0004
	[2025-03-22 16:15:44,310][15900] InferenceWorker_p0-w0 profile tree view:
	wait_policy: 0.0051
	  wait_policy_total: 9.5581
	update_model: 0.0227
	  weight_update: 0.0013
	one_step: 0.0892
	  handle_policy_step: 2.9471
	    deserialize: 0.0564, stack: 0.0093, obs_to_device_normalize: 0.5743, forward: 1.9118, send_messages: 0.0664
	    prepare_outputs: 0.2480
	      to_cpu: 0.1698
	[2025-03-22 16:15:44,311][15900] Learner 0 profile tree view:
	misc: 0.0000, prepare_batch: 2.0491
	train: 2.4677
	  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0005, kl_divergence: 0.0173, after_optimizer: 0.0624
	  calculate_losses: 0.7091
	    losses_init: 0.0000, forward_head: 0.3859, bptt_initial: 0.2138, tail: 0.0541, advantages_returns: 0.0012, losses: 0.0413
	    bptt: 0.0122
	      bptt_forward_core: 0.0121
	  update: 1.6725
	    clip: 0.0682
	[2025-03-22 16:15:44,312][15900] RolloutWorker_w0 profile tree view:
	wait_for_trajectories: 0.0011, enqueue_policy_requests: 1.0230, env_step: 2.4242, overhead: 0.0986, complete_rollouts: 0.0278
	save_policy_outputs: 0.0770
	  split_output_tensors: 0.0315
	[2025-03-22 16:15:44,313][15900] RolloutWorker_w7 profile tree view:
	wait_for_trajectories: 0.0029, enqueue_policy_requests: 0.0343, env_step: 0.7313, overhead: 0.0167, complete_rollouts: 0.0000
	save_policy_outputs: 0.0230
	  split_output_tensors: 0.0104
	[2025-03-22 16:15:44,315][15900] Loop Runner_EvtLoop terminating...
	[2025-03-22 16:15:44,316][15900] Runner profile tree view:
	main_loop: 38.6781
	[2025-03-22 16:15:44,317][15900] Collected {0: 4014080}, FPS: 211.8
	[2025-03-22 16:16:19,731][15900] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-22 16:16:19,732][15900] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-22 16:16:19,733][15900] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-22 16:16:19,734][15900] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-22 16:16:19,735][15900] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:16:19,736][15900] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-22 16:16:19,737][15900] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:16:19,741][15900] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-22 16:16:19,741][15900] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-22 16:16:19,742][15900] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-22 16:16:19,743][15900] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-22 16:16:19,744][15900] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-22 16:16:19,746][15900] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-22 16:16:19,747][15900] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-22 16:16:19,748][15900] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-22 16:16:19,791][15900] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-22 16:16:19,795][15900] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:16:19,796][15900] RunningMeanStd input shape: (1,)
	[2025-03-22 16:16:19,811][15900] ConvEncoder: input_channels=3
	[2025-03-22 16:16:19,913][15900] Conv encoder output size: 512
	[2025-03-22 16:16:19,914][15900] Policy head output size: 512
	[2025-03-22 16:16:20,094][15900] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
	[2025-03-22 16:16:20,844][15900] Num frames 100...
	[2025-03-22 16:16:20,977][15900] Num frames 200...
	[2025-03-22 16:16:21,117][15900] Num frames 300...
	[2025-03-22 16:16:21,241][15900] Avg episode rewards: #0: 4.520, true rewards: #0: 3.520
	[2025-03-22 16:16:21,242][15900] Avg episode reward: 4.520, avg true_objective: 3.520
	[2025-03-22 16:16:21,307][15900] Num frames 400...
	[2025-03-22 16:16:21,439][15900] Num frames 500...
	[2025-03-22 16:16:21,568][15900] Num frames 600...
	[2025-03-22 16:16:21,696][15900] Num frames 700...
	[2025-03-22 16:16:21,826][15900] Num frames 800...
	[2025-03-22 16:16:21,958][15900] Num frames 900...
	[2025-03-22 16:16:22,146][15900] Avg episode rewards: #0: 7.460, true rewards: #0: 4.960
	[2025-03-22 16:16:22,147][15900] Avg episode reward: 7.460, avg true_objective: 4.960
	[2025-03-22 16:16:22,161][15900] Num frames 1000...
	[2025-03-22 16:16:22,290][15900] Num frames 1100...
	[2025-03-22 16:16:22,421][15900] Num frames 1200...
	[2025-03-22 16:16:22,549][15900] Num frames 1300...
	[2025-03-22 16:16:22,682][15900] Num frames 1400...
	[2025-03-22 16:16:22,815][15900] Num frames 1500...
	[2025-03-22 16:16:22,944][15900] Num frames 1600...
	[2025-03-22 16:16:23,080][15900] Num frames 1700...
	[2025-03-22 16:16:23,211][15900] Num frames 1800...
	[2025-03-22 16:16:23,296][15900] Avg episode rewards: #0: 9.747, true rewards: #0: 6.080
	[2025-03-22 16:16:23,297][15900] Avg episode reward: 9.747, avg true_objective: 6.080
	[2025-03-22 16:16:23,398][15900] Num frames 1900...
	[2025-03-22 16:16:23,528][15900] Num frames 2000...
	[2025-03-22 16:16:23,656][15900] Num frames 2100...
	[2025-03-22 16:16:23,792][15900] Num frames 2200...
	[2025-03-22 16:16:23,925][15900] Num frames 2300...
	[2025-03-22 16:16:24,057][15900] Num frames 2400...
	[2025-03-22 16:16:24,196][15900] Num frames 2500...
	[2025-03-22 16:16:24,328][15900] Num frames 2600...
	[2025-03-22 16:16:24,459][15900] Num frames 2700...
	[2025-03-22 16:16:24,594][15900] Num frames 2800...
	[2025-03-22 16:16:24,727][15900] Num frames 2900...
	[2025-03-22 16:16:24,865][15900] Num frames 3000...
	[2025-03-22 16:16:25,014][15900] Num frames 3100...
	[2025-03-22 16:16:25,181][15900] Num frames 3200...
	[2025-03-22 16:16:25,367][15900] Num frames 3300...
	[2025-03-22 16:16:25,553][15900] Num frames 3400...
	[2025-03-22 16:16:25,735][15900] Num frames 3500...
	[2025-03-22 16:16:25,917][15900] Num frames 3600...
	[2025-03-22 16:16:25,994][15900] Avg episode rewards: #0: 19.025, true rewards: #0: 9.025
	[2025-03-22 16:16:25,995][15900] Avg episode reward: 19.025, avg true_objective: 9.025
	[2025-03-22 16:16:26,153][15900] Num frames 3700...
	[2025-03-22 16:16:26,328][15900] Num frames 3800...
	[2025-03-22 16:16:26,502][15900] Num frames 3900...
	[2025-03-22 16:16:26,686][15900] Num frames 4000...
	[2025-03-22 16:16:26,849][15900] Avg episode rewards: #0: 16.714, true rewards: #0: 8.114
	[2025-03-22 16:16:26,850][15900] Avg episode reward: 16.714, avg true_objective: 8.114
	[2025-03-22 16:16:26,932][15900] Num frames 4100...
	[2025-03-22 16:16:27,113][15900] Num frames 4200...
	[2025-03-22 16:16:27,297][15900] Num frames 4300...
	[2025-03-22 16:16:27,432][15900] Num frames 4400...
	[2025-03-22 16:16:27,561][15900] Num frames 4500...
	[2025-03-22 16:16:27,691][15900] Num frames 4600...
	[2025-03-22 16:16:27,827][15900] Num frames 4700...
	[2025-03-22 16:16:27,958][15900] Num frames 4800...
	[2025-03-22 16:16:28,049][15900] Avg episode rewards: #0: 16.375, true rewards: #0: 8.042
	[2025-03-22 16:16:28,050][15900] Avg episode reward: 16.375, avg true_objective: 8.042
	[2025-03-22 16:16:28,153][15900] Num frames 4900...
	[2025-03-22 16:16:28,293][15900] Num frames 5000...
	[2025-03-22 16:16:28,424][15900] Num frames 5100...
	[2025-03-22 16:16:28,556][15900] Num frames 5200...
	[2025-03-22 16:16:28,690][15900] Num frames 5300...
	[2025-03-22 16:16:28,827][15900] Num frames 5400...
	[2025-03-22 16:16:28,885][15900] Avg episode rewards: #0: 16.001, true rewards: #0: 7.716
	[2025-03-22 16:16:28,885][15900] Avg episode reward: 16.001, avg true_objective: 7.716
	[2025-03-22 16:16:29,014][15900] Num frames 5500...
	[2025-03-22 16:16:29,146][15900] Num frames 5600...
	[2025-03-22 16:16:29,282][15900] Num frames 5700...
	[2025-03-22 16:16:29,416][15900] Num frames 5800...
	[2025-03-22 16:16:29,547][15900] Num frames 5900...
	[2025-03-22 16:16:29,680][15900] Num frames 6000...
	[2025-03-22 16:16:29,816][15900] Num frames 6100...
	[2025-03-22 16:16:29,950][15900] Num frames 6200...
	[2025-03-22 16:16:30,083][15900] Num frames 6300...
	[2025-03-22 16:16:30,217][15900] Num frames 6400...
	[2025-03-22 16:16:30,398][15900] Avg episode rewards: #0: 17.236, true rewards: #0: 8.111
	[2025-03-22 16:16:30,399][15900] Avg episode reward: 17.236, avg true_objective: 8.111
	[2025-03-22 16:16:30,414][15900] Num frames 6500...
	[2025-03-22 16:16:30,546][15900] Num frames 6600...
	[2025-03-22 16:16:30,681][15900] Num frames 6700...
	[2025-03-22 16:16:30,814][15900] Num frames 6800...
	[2025-03-22 16:16:30,953][15900] Num frames 6900...
	[2025-03-22 16:16:31,092][15900] Num frames 7000...
	[2025-03-22 16:16:31,144][15900] Avg episode rewards: #0: 16.333, true rewards: #0: 7.778
	[2025-03-22 16:16:31,145][15900] Avg episode reward: 16.333, avg true_objective: 7.778
	[2025-03-22 16:16:31,277][15900] Num frames 7100...
	[2025-03-22 16:16:31,421][15900] Num frames 7200...
	[2025-03-22 16:16:31,553][15900] Num frames 7300...
	[2025-03-22 16:16:31,689][15900] Num frames 7400...
	[2025-03-22 16:16:31,824][15900] Num frames 7500...
	[2025-03-22 16:16:31,920][15900] Avg episode rewards: #0: 15.631, true rewards: #0: 7.531
	[2025-03-22 16:16:31,921][15900] Avg episode reward: 15.631, avg true_objective: 7.531
	[2025-03-22 16:17:20,425][15900] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
	[2025-03-22 16:18:57,804][15900] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-22 16:18:57,805][15900] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-22 16:18:57,807][15900] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-22 16:18:57,808][15900] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-22 16:18:57,809][15900] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-22 16:18:57,810][15900] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-22 16:18:57,811][15900] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
	[2025-03-22 16:18:57,812][15900] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-22 16:18:57,814][15900] Adding new argument 'push_to_hub'=True that is not in the saved config file!
	[2025-03-22 16:18:57,815][15900] Adding new argument 'hf_repository'='zimka/HFRLC_U8_health_gathering_supreme' that is not in the saved config file!
	[2025-03-22 16:18:57,816][15900] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-22 16:18:57,817][15900] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-22 16:18:57,818][15900] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-22 16:18:57,819][15900] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-22 16:18:57,820][15900] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-22 16:18:57,846][15900] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-22 16:18:57,847][15900] RunningMeanStd input shape: (1,)
	[2025-03-22 16:18:57,859][15900] ConvEncoder: input_channels=3
	[2025-03-22 16:18:57,895][15900] Conv encoder output size: 512
	[2025-03-22 16:18:57,896][15900] Policy head output size: 512
	[2025-03-22 16:18:57,916][15900] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
	[2025-03-22 16:18:58,379][15900] Num frames 100...
	[2025-03-22 16:18:58,513][15900] Num frames 200...
	[2025-03-22 16:18:58,640][15900] Num frames 300...
	[2025-03-22 16:18:58,763][15900] Avg episode rewards: #0: 4.520, true rewards: #0: 3.520
	[2025-03-22 16:18:58,764][15900] Avg episode reward: 4.520, avg true_objective: 3.520
	[2025-03-22 16:18:58,841][15900] Num frames 400...
	[2025-03-22 16:18:58,974][15900] Num frames 500...
	[2025-03-22 16:18:59,118][15900] Num frames 600...
	[2025-03-22 16:18:59,249][15900] Num frames 700...
	[2025-03-22 16:18:59,380][15900] Num frames 800...
	[2025-03-22 16:18:59,511][15900] Num frames 900...
	[2025-03-22 16:18:59,645][15900] Num frames 1000...
	[2025-03-22 16:18:59,788][15900] Num frames 1100...
	[2025-03-22 16:18:59,922][15900] Num frames 1200...
	[2025-03-22 16:19:00,065][15900] Num frames 1300...
	[2025-03-22 16:19:00,221][15900] Avg episode rewards: #0: 13.880, true rewards: #0: 6.880
	[2025-03-22 16:19:00,222][15900] Avg episode reward: 13.880, avg true_objective: 6.880
	[2025-03-22 16:19:00,259][15900] Num frames 1400...
	[2025-03-22 16:19:00,389][15900] Num frames 1500...
	[2025-03-22 16:19:00,520][15900] Num frames 1600...
	[2025-03-22 16:19:00,692][15900] Num frames 1700...
	[2025-03-22 16:19:00,872][15900] Num frames 1800...
	[2025-03-22 16:19:00,973][15900] Avg episode rewards: #0: 11.080, true rewards: #0: 6.080
	[2025-03-22 16:19:00,974][15900] Avg episode reward: 11.080, avg true_objective: 6.080
	[2025-03-22 16:19:01,112][15900] Num frames 1900...
	[2025-03-22 16:19:01,281][15900] Num frames 2000...
	[2025-03-22 16:19:01,452][15900] Num frames 2100...
	[2025-03-22 16:19:01,621][15900] Num frames 2200...
	[2025-03-22 16:19:01,793][15900] Num frames 2300...
	[2025-03-22 16:19:01,972][15900] Num frames 2400...
	[2025-03-22 16:19:02,165][15900] Num frames 2500...
	[2025-03-22 16:19:02,340][15900] Num frames 2600...
	[2025-03-22 16:19:02,528][15900] Num frames 2700...
	[2025-03-22 16:19:02,622][15900] Avg episode rewards: #0: 12.800, true rewards: #0: 6.800
	[2025-03-22 16:19:02,623][15900] Avg episode reward: 12.800, avg true_objective: 6.800
	[2025-03-22 16:19:02,760][15900] Num frames 2800...
	[2025-03-22 16:19:02,894][15900] Num frames 2900...
	[2025-03-22 16:19:03,028][15900] Num frames 3000...
	[2025-03-22 16:19:03,171][15900] Num frames 3100...
	[2025-03-22 16:19:03,301][15900] Num frames 3200...
	[2025-03-22 16:19:03,435][15900] Num frames 3300...
	[2025-03-22 16:19:03,566][15900] Num frames 3400...
	[2025-03-22 16:19:03,701][15900] Num frames 3500...
	[2025-03-22 16:19:03,836][15900] Num frames 3600...
	[2025-03-22 16:19:03,965][15900] Num frames 3700...
	[2025-03-22 16:19:04,095][15900] Num frames 3800...
	[2025-03-22 16:19:04,265][15900] Num frames 3900...
	[2025-03-22 16:19:04,394][15900] Num frames 4000...
	[2025-03-22 16:19:04,564][15900] Avg episode rewards: #0: 16.976, true rewards: #0: 8.176
	[2025-03-22 16:19:04,565][15900] Avg episode reward: 16.976, avg true_objective: 8.176
	[2025-03-22 16:19:04,583][15900] Num frames 4100...
	[2025-03-22 16:19:04,710][15900] Num frames 4200...
	[2025-03-22 16:19:04,846][15900] Num frames 4300...
	[2025-03-22 16:19:04,980][15900] Num frames 4400...
	[2025-03-22 16:19:05,115][15900] Num frames 4500...
	[2025-03-22 16:19:05,250][15900] Num frames 4600...
	[2025-03-22 16:19:05,378][15900] Num frames 4700...
	[2025-03-22 16:19:05,510][15900] Avg episode rewards: #0: 16.100, true rewards: #0: 7.933
	[2025-03-22 16:19:05,512][15900] Avg episode reward: 16.100, avg true_objective: 7.933
	[2025-03-22 16:19:05,567][15900] Num frames 4800...
	[2025-03-22 16:19:05,707][15900] Num frames 4900...
	[2025-03-22 16:19:05,841][15900] Num frames 5000...
	[2025-03-22 16:19:05,972][15900] Num frames 5100...
	[2025-03-22 16:19:06,105][15900] Num frames 5200...
	[2025-03-22 16:19:06,237][15900] Num frames 5300...
	[2025-03-22 16:19:06,374][15900] Num frames 5400...
	[2025-03-22 16:19:06,505][15900] Num frames 5500...
	[2025-03-22 16:19:06,638][15900] Num frames 5600...
	[2025-03-22 16:19:06,770][15900] Num frames 5700...
	[2025-03-22 16:19:06,854][15900] Avg episode rewards: #0: 16.600, true rewards: #0: 8.171
	[2025-03-22 16:19:06,855][15900] Avg episode reward: 16.600, avg true_objective: 8.171
	[2025-03-22 16:19:06,962][15900] Num frames 5800...
	[2025-03-22 16:19:07,099][15900] Num frames 5900...
	[2025-03-22 16:19:07,231][15900] Num frames 6000...
	[2025-03-22 16:19:07,370][15900] Num frames 6100...
	[2025-03-22 16:19:07,497][15900] Num frames 6200...
	[2025-03-22 16:19:07,629][15900] Num frames 6300...
	[2025-03-22 16:19:07,760][15900] Num frames 6400...
	[2025-03-22 16:19:07,851][15900] Avg episode rewards: #0: 16.405, true rewards: #0: 8.030
	[2025-03-22 16:19:07,852][15900] Avg episode reward: 16.405, avg true_objective: 8.030
	[2025-03-22 16:19:07,951][15900] Num frames 6500...
	[2025-03-22 16:19:08,088][15900] Num frames 6600...
	[2025-03-22 16:19:08,227][15900] Num frames 6700...
	[2025-03-22 16:19:08,367][15900] Num frames 6800...
	[2025-03-22 16:19:08,503][15900] Num frames 6900...
	[2025-03-22 16:19:08,636][15900] Num frames 7000...
	[2025-03-22 16:19:08,774][15900] Num frames 7100...
	[2025-03-22 16:19:08,908][15900] Num frames 7200...
	[2025-03-22 16:19:09,044][15900] Num frames 7300...
	[2025-03-22 16:19:09,181][15900] Num frames 7400...
	[2025-03-22 16:19:09,312][15900] Num frames 7500...
	[2025-03-22 16:19:09,451][15900] Num frames 7600...
	[2025-03-22 16:19:09,584][15900] Num frames 7700...
	[2025-03-22 16:19:09,722][15900] Num frames 7800...
	[2025-03-22 16:19:09,773][15900] Avg episode rewards: #0: 18.222, true rewards: #0: 8.667
	[2025-03-22 16:19:09,775][15900] Avg episode reward: 18.222, avg true_objective: 8.667
	[2025-03-22 16:19:09,903][15900] Num frames 7900...
	[2025-03-22 16:19:10,032][15900] Num frames 8000...
	[2025-03-22 16:19:10,168][15900] Num frames 8100...
	[2025-03-22 16:19:10,298][15900] Num frames 8200...
	[2025-03-22 16:19:10,438][15900] Num frames 8300...
	[2025-03-22 16:19:10,574][15900] Num frames 8400...
	[2025-03-22 16:19:10,760][15900] Avg episode rewards: #0: 17.797, true rewards: #0: 8.497
	[2025-03-22 16:19:10,761][15900] Avg episode reward: 17.797, avg true_objective: 8.497
	[2025-03-22 16:19:10,769][15900] Num frames 8500...
	[2025-03-22 16:20:03,421][15900] Replay video saved to /content/train_dir/default_experiment/replay.mp4!