	[2025-03-21 03:49:02,412][07156] Saving configuration to /content/train_dir/default_experiment/config.json...
	[2025-03-21 03:49:02,414][07156] Rollout worker 0 uses device cpu
	[2025-03-21 03:49:02,415][07156] Rollout worker 1 uses device cpu
	[2025-03-21 03:49:02,416][07156] Rollout worker 2 uses device cpu
	[2025-03-21 03:49:02,417][07156] Rollout worker 3 uses device cpu
	[2025-03-21 03:49:02,418][07156] Rollout worker 4 uses device cpu
	[2025-03-21 03:49:02,419][07156] Rollout worker 5 uses device cpu
	[2025-03-21 03:49:02,420][07156] Rollout worker 6 uses device cpu
	[2025-03-21 03:49:02,421][07156] Rollout worker 7 uses device cpu
	[2025-03-21 03:49:02,669][07156] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 03:49:02,672][07156] InferenceWorker_p0-w0: min num requests: 2
	[2025-03-21 03:49:02,725][07156] Starting all processes...
	[2025-03-21 03:49:02,734][07156] Starting process learner_proc0
	[2025-03-21 03:49:02,866][07156] Starting all processes...
	[2025-03-21 03:49:02,948][07156] Starting process inference_proc0-0
	[2025-03-21 03:49:02,948][07156] Starting process rollout_proc0
	[2025-03-21 03:49:02,949][07156] Starting process rollout_proc1
	[2025-03-21 03:49:02,950][07156] Starting process rollout_proc2
	[2025-03-21 03:49:02,951][07156] Starting process rollout_proc3
	[2025-03-21 03:49:02,951][07156] Starting process rollout_proc4
	[2025-03-21 03:49:02,951][07156] Starting process rollout_proc5
	[2025-03-21 03:49:02,951][07156] Starting process rollout_proc6
	[2025-03-21 03:49:02,951][07156] Starting process rollout_proc7
	[2025-03-21 03:49:22,182][07681] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 03:49:22,188][07681] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
	[2025-03-21 03:49:22,284][07681] Num visible devices: 1
	[2025-03-21 03:49:22,307][07681] Starting seed is not provided
	[2025-03-21 03:49:22,307][07681] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 03:49:22,307][07681] Initializing actor-critic model on device cuda:0
	[2025-03-21 03:49:22,308][07681] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 03:49:22,313][07681] RunningMeanStd input shape: (1,)
	[2025-03-21 03:49:22,341][07695] Worker 0 uses CPU cores [0]
	[2025-03-21 03:49:22,348][07697] Worker 2 uses CPU cores [0]
	[2025-03-21 03:49:22,385][07696] Worker 1 uses CPU cores [1]
	[2025-03-21 03:49:22,390][07681] ConvEncoder: input_channels=3
	[2025-03-21 03:49:22,667][07702] Worker 7 uses CPU cores [1]
	[2025-03-21 03:49:22,669][07156] Heartbeat connected on Batcher_0
	[2025-03-21 03:49:22,682][07156] Heartbeat connected on RolloutWorker_w0
	[2025-03-21 03:49:22,686][07156] Heartbeat connected on RolloutWorker_w1
	[2025-03-21 03:49:22,691][07156] Heartbeat connected on RolloutWorker_w2
	[2025-03-21 03:49:22,725][07156] Heartbeat connected on RolloutWorker_w7
	[2025-03-21 03:49:22,784][07701] Worker 6 uses CPU cores [0]
	[2025-03-21 03:49:22,800][07156] Heartbeat connected on RolloutWorker_w6
	[2025-03-21 03:49:22,930][07700] Worker 5 uses CPU cores [1]
	[2025-03-21 03:49:22,939][07698] Worker 3 uses CPU cores [1]
	[2025-03-21 03:49:22,940][07156] Heartbeat connected on RolloutWorker_w5
	[2025-03-21 03:49:22,947][07156] Heartbeat connected on RolloutWorker_w3
	[2025-03-21 03:49:22,991][07694] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 03:49:22,992][07694] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
	[2025-03-21 03:49:23,020][07694] Num visible devices: 1
	[2025-03-21 03:49:23,021][07681] Conv encoder output size: 512
	[2025-03-21 03:49:23,022][07681] Policy head output size: 512
	[2025-03-21 03:49:23,030][07156] Heartbeat connected on InferenceWorker_p0-w0
	[2025-03-21 03:49:23,030][07699] Worker 4 uses CPU cores [0]
	[2025-03-21 03:49:23,034][07156] Heartbeat connected on RolloutWorker_w4
	[2025-03-21 03:49:23,084][07681] Created Actor Critic model with architecture:
	[2025-03-21 03:49:23,084][07681] ActorCriticSharedWeights(
	(obs_normalizer): ObservationNormalizer(
	(running_mean_std): RunningMeanStdDictInPlace(
	(running_mean_std): ModuleDict(
	(obs): RunningMeanStdInPlace()
	)
	)
	)
	(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
	(encoder): VizdoomEncoder(
	(basic_encoder): ConvEncoder(
	(enc): RecursiveScriptModule(
	original_name=ConvEncoderImpl
	(conv_head): RecursiveScriptModule(
	original_name=Sequential
	(0): RecursiveScriptModule(original_name=Conv2d)
	(1): RecursiveScriptModule(original_name=ELU)
	(2): RecursiveScriptModule(original_name=Conv2d)
	(3): RecursiveScriptModule(original_name=ELU)
	(4): RecursiveScriptModule(original_name=Conv2d)
	(5): RecursiveScriptModule(original_name=ELU)
	)
	(mlp_layers): RecursiveScriptModule(
	original_name=Sequential
	(0): RecursiveScriptModule(original_name=Linear)
	(1): RecursiveScriptModule(original_name=ELU)
	)
	)
	)
	)
	(core): ModelCoreRNN(
	(core): GRU(512, 512)
	)
	(decoder): MlpDecoder(
	(mlp): Identity()
	)
	(critic_linear): Linear(in_features=512, out_features=1, bias=True)
	(action_parameterization): ActionParameterizationDefault(
	(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
	)
	)
	[2025-03-21 03:49:23,337][07681] Using optimizer <class 'torch.optim.adam.Adam'>
	[2025-03-21 03:49:27,739][07681] No checkpoints found
	[2025-03-21 03:49:27,739][07681] Did not load from checkpoint, starting from scratch!
	[2025-03-21 03:49:27,739][07681] Initialized policy 0 weights for model version 0
	[2025-03-21 03:49:27,746][07681] LearnerWorker_p0 finished initialization!
	[2025-03-21 03:49:27,749][07681] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 03:49:27,748][07156] Heartbeat connected on LearnerWorker_p0
	[2025-03-21 03:49:27,930][07694] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 03:49:27,932][07694] RunningMeanStd input shape: (1,)
	[2025-03-21 03:49:27,945][07694] ConvEncoder: input_channels=3
	[2025-03-21 03:49:28,073][07694] Conv encoder output size: 512
	[2025-03-21 03:49:28,073][07694] Policy head output size: 512
	[2025-03-21 03:49:28,110][07156] Inference worker 0-0 is ready!
	[2025-03-21 03:49:28,111][07156] All inference workers are ready! Signal rollout workers to start!
	[2025-03-21 03:49:28,369][07700] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:28,387][07695] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:28,386][07699] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:28,397][07697] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:28,442][07702] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:28,453][07696] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:28,454][07701] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:28,468][07698] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 03:49:29,460][07700] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:29,669][07695] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:29,672][07699] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:29,674][07701] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:30,767][07700] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:30,856][07702] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:31,063][07701] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:31,066][07699] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:31,630][07696] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:32,524][07156] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-21 03:49:32,942][07695] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:33,381][07702] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:33,416][07698] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:33,610][07699] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:33,670][07701] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:33,956][07700] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:34,076][07696] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:34,995][07698] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:35,507][07695] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:35,526][07702] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:35,560][07697] Decorrelating experience for 0 frames...
	[2025-03-21 03:49:35,816][07701] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:36,776][07698] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:36,839][07696] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:36,878][07700] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:37,235][07697] Decorrelating experience for 32 frames...
	[2025-03-21 03:49:37,239][07699] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:37,523][07156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 2.4. Samples: 12. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-21 03:49:37,837][07698] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:38,931][07702] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:39,317][07695] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:40,465][07696] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:41,995][07681] Signal inference workers to stop experience collection...
	[2025-03-21 03:49:41,999][07694] InferenceWorker_p0-w0: stopping experience collection
	[2025-03-21 03:49:42,142][07697] Decorrelating experience for 64 frames...
	[2025-03-21 03:49:42,523][07156] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 158.2. Samples: 1582. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
	[2025-03-21 03:49:42,524][07156] Avg episode reward: [(0, '2.760')]
	[2025-03-21 03:49:42,582][07697] Decorrelating experience for 96 frames...
	[2025-03-21 03:49:43,248][07681] Signal inference workers to resume experience collection...
	[2025-03-21 03:49:43,250][07694] InferenceWorker_p0-w0: resuming experience collection
	[2025-03-21 03:49:47,523][07156] Fps is (10 sec: 2048.0, 60 sec: 1365.4, 300 sec: 1365.4). Total num frames: 20480. Throughput: 0: 347.4. Samples: 5210. Policy #0 lag: (min: 0.0, avg: 0.4, max: 3.0)
	[2025-03-21 03:49:47,525][07156] Avg episode reward: [(0, '3.355')]
	[2025-03-21 03:49:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 1843.3, 300 sec: 1843.3). Total num frames: 36864. Throughput: 0: 501.3. Samples: 10026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 03:49:52,525][07156] Avg episode reward: [(0, '3.771')]
	[2025-03-21 03:49:52,736][07694] Updated weights for policy 0, policy_version 10 (0.0097)
	[2025-03-21 03:49:57,523][07156] Fps is (10 sec: 3686.4, 60 sec: 2293.9, 300 sec: 2293.9). Total num frames: 57344. Throughput: 0: 533.0. Samples: 13324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 03:49:57,525][07156] Avg episode reward: [(0, '4.537')]
	[2025-03-21 03:50:02,523][07156] Fps is (10 sec: 4096.0, 60 sec: 2594.2, 300 sec: 2594.2). Total num frames: 77824. Throughput: 0: 654.1. Samples: 19622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:50:02,527][07156] Avg episode reward: [(0, '4.644')]
	[2025-03-21 03:50:03,329][07694] Updated weights for policy 0, policy_version 20 (0.0039)
	[2025-03-21 03:50:07,523][07156] Fps is (10 sec: 3276.8, 60 sec: 2574.7, 300 sec: 2574.7). Total num frames: 90112. Throughput: 0: 672.9. Samples: 23552. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:50:07,528][07156] Avg episode reward: [(0, '4.614')]
	[2025-03-21 03:50:12,523][07156] Fps is (10 sec: 3686.4, 60 sec: 2867.3, 300 sec: 2867.3). Total num frames: 114688. Throughput: 0: 663.4. Samples: 26536. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:50:12,524][07156] Avg episode reward: [(0, '4.503')]
	[2025-03-21 03:50:12,531][07681] Saving new best policy, reward=4.503!
	[2025-03-21 03:50:14,289][07694] Updated weights for policy 0, policy_version 30 (0.0021)
	[2025-03-21 03:50:17,523][07156] Fps is (10 sec: 4096.0, 60 sec: 2912.8, 300 sec: 2912.8). Total num frames: 131072. Throughput: 0: 740.1. Samples: 33302. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:50:17,528][07156] Avg episode reward: [(0, '4.622')]
	[2025-03-21 03:50:17,533][07681] Saving new best policy, reward=4.622!
	[2025-03-21 03:50:22,523][07156] Fps is (10 sec: 3276.8, 60 sec: 2949.2, 300 sec: 2949.2). Total num frames: 147456. Throughput: 0: 838.2. Samples: 37730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 03:50:22,524][07156] Avg episode reward: [(0, '4.409')]
	[2025-03-21 03:50:25,504][07694] Updated weights for policy 0, policy_version 40 (0.0029)
	[2025-03-21 03:50:27,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 172032. Throughput: 0: 877.2. Samples: 41056. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:50:27,527][07156] Avg episode reward: [(0, '4.497')]
	[2025-03-21 03:50:32,528][07156] Fps is (10 sec: 4094.0, 60 sec: 3140.1, 300 sec: 3140.1). Total num frames: 188416. Throughput: 0: 942.7. Samples: 47636. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:50:32,529][07156] Avg episode reward: [(0, '4.662')]
	[2025-03-21 03:50:32,589][07681] Saving new best policy, reward=4.662!
	[2025-03-21 03:50:36,648][07694] Updated weights for policy 0, policy_version 50 (0.0030)
	[2025-03-21 03:50:37,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 204800. Throughput: 0: 934.7. Samples: 52088. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
	[2025-03-21 03:50:37,524][07156] Avg episode reward: [(0, '4.665')]
	[2025-03-21 03:50:37,582][07681] Saving new best policy, reward=4.665!
	[2025-03-21 03:50:42,523][07156] Fps is (10 sec: 4098.0, 60 sec: 3822.9, 300 sec: 3276.9). Total num frames: 229376. Throughput: 0: 933.3. Samples: 55324. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-21 03:50:42,527][07156] Avg episode reward: [(0, '4.454')]
	[2025-03-21 03:50:46,138][07694] Updated weights for policy 0, policy_version 60 (0.0035)
	[2025-03-21 03:50:47,523][07156] Fps is (10 sec: 4095.9, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 245760. Throughput: 0: 941.6. Samples: 61992. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-21 03:50:47,525][07156] Avg episode reward: [(0, '4.421')]
	[2025-03-21 03:50:52,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 262144. Throughput: 0: 945.7. Samples: 66110. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:50:52,527][07156] Avg episode reward: [(0, '4.434')]
	[2025-03-21 03:50:57,525][07156] Fps is (10 sec: 3685.8, 60 sec: 3754.6, 300 sec: 3325.0). Total num frames: 282624. Throughput: 0: 949.2. Samples: 69250. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
	[2025-03-21 03:50:57,530][07156] Avg episode reward: [(0, '4.695')]
	[2025-03-21 03:50:57,537][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth...
	[2025-03-21 03:50:57,688][07681] Saving new best policy, reward=4.695!
	[2025-03-21 03:50:58,157][07694] Updated weights for policy 0, policy_version 70 (0.0026)
	[2025-03-21 03:51:02,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 933.5. Samples: 75310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 03:51:02,530][07156] Avg episode reward: [(0, '4.632')]
	[2025-03-21 03:51:07,523][07156] Fps is (10 sec: 2867.7, 60 sec: 3686.4, 300 sec: 3276.8). Total num frames: 311296. Throughput: 0: 916.1. Samples: 78956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:51:07,525][07156] Avg episode reward: [(0, '4.413')]
	[2025-03-21 03:51:10,784][07694] Updated weights for policy 0, policy_version 80 (0.0039)
	[2025-03-21 03:51:12,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3317.8). Total num frames: 331776. Throughput: 0: 907.9. Samples: 81910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:51:12,526][07156] Avg episode reward: [(0, '4.482')]
	[2025-03-21 03:51:17,523][07156] Fps is (10 sec: 4095.9, 60 sec: 3686.4, 300 sec: 3354.8). Total num frames: 352256. Throughput: 0: 895.3. Samples: 87922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:51:17,525][07156] Avg episode reward: [(0, '4.553')]
	[2025-03-21 03:51:22,527][07156] Fps is (10 sec: 3275.7, 60 sec: 3617.9, 300 sec: 3314.0). Total num frames: 364544. Throughput: 0: 885.8. Samples: 91954. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:51:22,531][07156] Avg episode reward: [(0, '4.451')]
	[2025-03-21 03:51:22,947][07694] Updated weights for policy 0, policy_version 90 (0.0015)
	[2025-03-21 03:51:27,523][07156] Fps is (10 sec: 3686.5, 60 sec: 3618.1, 300 sec: 3383.7). Total num frames: 389120. Throughput: 0: 887.9. Samples: 95280. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 03:51:27,524][07156] Avg episode reward: [(0, '4.607')]
	[2025-03-21 03:51:32,273][07694] Updated weights for policy 0, policy_version 100 (0.0013)
	[2025-03-21 03:51:32,525][07156] Fps is (10 sec: 4506.2, 60 sec: 3686.6, 300 sec: 3413.3). Total num frames: 409600. Throughput: 0: 886.6. Samples: 101892. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:51:32,527][07156] Avg episode reward: [(0, '4.763')]
	[2025-03-21 03:51:32,528][07681] Saving new best policy, reward=4.763!
	[2025-03-21 03:51:37,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3375.1). Total num frames: 421888. Throughput: 0: 892.4. Samples: 106266. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:51:37,527][07156] Avg episode reward: [(0, '4.710')]
	[2025-03-21 03:51:42,523][07156] Fps is (10 sec: 3687.2, 60 sec: 3618.1, 300 sec: 3434.4). Total num frames: 446464. Throughput: 0: 896.9. Samples: 109608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:51:42,528][07156] Avg episode reward: [(0, '4.685')]
	[2025-03-21 03:51:43,389][07694] Updated weights for policy 0, policy_version 110 (0.0035)
	[2025-03-21 03:51:47,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3686.4, 300 sec: 3458.9). Total num frames: 466944. Throughput: 0: 913.2. Samples: 116402. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:51:47,526][07156] Avg episode reward: [(0, '4.841')]
	[2025-03-21 03:51:47,534][07681] Saving new best policy, reward=4.841!
	[2025-03-21 03:51:52,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3618.1, 300 sec: 3423.1). Total num frames: 479232. Throughput: 0: 931.6. Samples: 120878. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:51:52,529][07156] Avg episode reward: [(0, '4.862')]
	[2025-03-21 03:51:52,536][07681] Saving new best policy, reward=4.862!
	[2025-03-21 03:51:54,414][07694] Updated weights for policy 0, policy_version 120 (0.0021)
	[2025-03-21 03:51:57,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3474.6). Total num frames: 503808. Throughput: 0: 938.7. Samples: 124150. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:51:57,528][07156] Avg episode reward: [(0, '4.736')]
	[2025-03-21 03:52:02,526][07156] Fps is (10 sec: 4504.4, 60 sec: 3754.5, 300 sec: 3495.2). Total num frames: 524288. Throughput: 0: 954.2. Samples: 130862. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:52:02,531][07156] Avg episode reward: [(0, '4.874')]
	[2025-03-21 03:52:02,537][07681] Saving new best policy, reward=4.874!
	[2025-03-21 03:52:04,918][07694] Updated weights for policy 0, policy_version 130 (0.0017)
	[2025-03-21 03:52:07,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3461.8). Total num frames: 536576. Throughput: 0: 959.8. Samples: 135144. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 03:52:07,528][07156] Avg episode reward: [(0, '4.731')]
	[2025-03-21 03:52:12,523][07156] Fps is (10 sec: 3687.4, 60 sec: 3822.9, 300 sec: 3507.2). Total num frames: 561152. Throughput: 0: 961.6. Samples: 138554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:52:12,528][07156] Avg episode reward: [(0, '4.725')]
	[2025-03-21 03:52:14,810][07694] Updated weights for policy 0, policy_version 140 (0.0022)
	[2025-03-21 03:52:17,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3525.1). Total num frames: 581632. Throughput: 0: 966.5. Samples: 145384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:52:17,525][07156] Avg episode reward: [(0, '4.862')]
	[2025-03-21 03:52:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3517.8). Total num frames: 598016. Throughput: 0: 972.1. Samples: 150010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:52:22,525][07156] Avg episode reward: [(0, '5.050')]
	[2025-03-21 03:52:22,530][07681] Saving new best policy, reward=5.050!
	[2025-03-21 03:52:25,939][07694] Updated weights for policy 0, policy_version 150 (0.0019)
	[2025-03-21 03:52:27,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3534.3). Total num frames: 618496. Throughput: 0: 969.5. Samples: 153234. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:52:27,530][07156] Avg episode reward: [(0, '5.026')]
	[2025-03-21 03:52:32,526][07156] Fps is (10 sec: 4094.8, 60 sec: 3822.9, 300 sec: 3549.8). Total num frames: 638976. Throughput: 0: 967.5. Samples: 159942. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:52:32,527][07156] Avg episode reward: [(0, '4.885')]
	[2025-03-21 03:52:37,185][07694] Updated weights for policy 0, policy_version 160 (0.0027)
	[2025-03-21 03:52:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3542.5). Total num frames: 655360. Throughput: 0: 966.6. Samples: 164376. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 03:52:37,527][07156] Avg episode reward: [(0, '5.099')]
	[2025-03-21 03:52:37,534][07681] Saving new best policy, reward=5.099!
	[2025-03-21 03:52:42,523][07156] Fps is (10 sec: 3687.4, 60 sec: 3822.9, 300 sec: 3557.1). Total num frames: 675840. Throughput: 0: 967.0. Samples: 167664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:52:42,525][07156] Avg episode reward: [(0, '5.035')]
	[2025-03-21 03:52:46,308][07694] Updated weights for policy 0, policy_version 170 (0.0018)
	[2025-03-21 03:52:47,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 968.2. Samples: 174430. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:52:47,525][07156] Avg episode reward: [(0, '4.881')]
	[2025-03-21 03:52:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3563.5). Total num frames: 712704. Throughput: 0: 972.6. Samples: 178910. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:52:52,527][07156] Avg episode reward: [(0, '5.243')]
	[2025-03-21 03:52:52,530][07681] Saving new best policy, reward=5.243!
	[2025-03-21 03:52:57,524][07156] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3576.5). Total num frames: 733184. Throughput: 0: 968.7. Samples: 182148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:52:57,528][07156] Avg episode reward: [(0, '5.268')]
	[2025-03-21 03:52:57,536][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth...
	[2025-03-21 03:52:57,685][07681] Saving new best policy, reward=5.268!
	[2025-03-21 03:52:57,971][07694] Updated weights for policy 0, policy_version 180 (0.0036)
	[2025-03-21 03:53:02,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3588.9). Total num frames: 753664. Throughput: 0: 958.7. Samples: 188526. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:53:02,526][07156] Avg episode reward: [(0, '5.178')]
	[2025-03-21 03:53:07,523][07156] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3581.6). Total num frames: 770048. Throughput: 0: 951.3. Samples: 192820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:53:07,525][07156] Avg episode reward: [(0, '5.131')]
	[2025-03-21 03:53:09,335][07694] Updated weights for policy 0, policy_version 190 (0.0019)
	[2025-03-21 03:53:12,523][07156] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3593.3). Total num frames: 790528. Throughput: 0: 954.0. Samples: 196162. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:53:12,527][07156] Avg episode reward: [(0, '5.573')]
	[2025-03-21 03:53:12,531][07681] Saving new best policy, reward=5.573!
	[2025-03-21 03:53:17,527][07156] Fps is (10 sec: 4094.3, 60 sec: 3822.7, 300 sec: 3604.4). Total num frames: 811008. Throughput: 0: 949.1. Samples: 202652. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:53:17,528][07156] Avg episode reward: [(0, '5.725')]
	[2025-03-21 03:53:17,537][07681] Saving new best policy, reward=5.725!
	[2025-03-21 03:53:19,955][07694] Updated weights for policy 0, policy_version 200 (0.0026)
	[2025-03-21 03:53:22,523][07156] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3597.4). Total num frames: 827392. Throughput: 0: 950.1. Samples: 207132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:53:22,527][07156] Avg episode reward: [(0, '5.629')]
	[2025-03-21 03:53:27,523][07156] Fps is (10 sec: 3687.9, 60 sec: 3822.9, 300 sec: 3608.0). Total num frames: 847872. Throughput: 0: 951.9. Samples: 210500. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:53:27,527][07156] Avg episode reward: [(0, '5.540')]
	[2025-03-21 03:53:29,727][07694] Updated weights for policy 0, policy_version 210 (0.0020)
	[2025-03-21 03:53:32,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3618.2). Total num frames: 868352. Throughput: 0: 951.6. Samples: 217250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:53:32,526][07156] Avg episode reward: [(0, '5.946')]
	[2025-03-21 03:53:32,531][07681] Saving new best policy, reward=5.946!
	[2025-03-21 03:53:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3611.2). Total num frames: 884736. Throughput: 0: 948.4. Samples: 221590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:53:37,527][07156] Avg episode reward: [(0, '6.215')]
	[2025-03-21 03:53:37,536][07681] Saving new best policy, reward=6.215!
	[2025-03-21 03:53:41,089][07694] Updated weights for policy 0, policy_version 220 (0.0015)
	[2025-03-21 03:53:42,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3620.9). Total num frames: 905216. Throughput: 0: 949.2. Samples: 224862. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
	[2025-03-21 03:53:42,529][07156] Avg episode reward: [(0, '6.056')]
	[2025-03-21 03:53:47,526][07156] Fps is (10 sec: 4094.9, 60 sec: 3822.8, 300 sec: 3630.2). Total num frames: 925696. Throughput: 0: 957.9. Samples: 231634. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 03:53:47,530][07156] Avg episode reward: [(0, '5.905')]
	[2025-03-21 03:53:52,033][07694] Updated weights for policy 0, policy_version 230 (0.0019)
	[2025-03-21 03:53:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3623.4). Total num frames: 942080. Throughput: 0: 963.9. Samples: 236194. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 03:53:52,529][07156] Avg episode reward: [(0, '6.090')]
	[2025-03-21 03:53:57,523][07156] Fps is (10 sec: 3687.4, 60 sec: 3823.0, 300 sec: 3632.3). Total num frames: 962560. Throughput: 0: 964.0. Samples: 239540. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:53:57,528][07156] Avg episode reward: [(0, '5.729')]
	[2025-03-21 03:54:01,283][07694] Updated weights for policy 0, policy_version 240 (0.0031)
	[2025-03-21 03:54:02,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 969.4. Samples: 246270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:54:02,526][07156] Avg episode reward: [(0, '5.795')]
	[2025-03-21 03:54:07,524][07156] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3634.3). Total num frames: 999424. Throughput: 0: 964.9. Samples: 250552. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 03:54:07,528][07156] Avg episode reward: [(0, '5.897')]
	[2025-03-21 03:54:12,445][07694] Updated weights for policy 0, policy_version 250 (0.0017)
	[2025-03-21 03:54:12,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3657.2). Total num frames: 1024000. Throughput: 0: 964.8. Samples: 253916. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:54:12,528][07156] Avg episode reward: [(0, '5.936')]
	[2025-03-21 03:54:17,523][07156] Fps is (10 sec: 4096.2, 60 sec: 3823.2, 300 sec: 3650.5). Total num frames: 1040384. Throughput: 0: 962.6. Samples: 260566. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:54:17,527][07156] Avg episode reward: [(0, '6.635')]
	[2025-03-21 03:54:17,533][07681] Saving new best policy, reward=6.635!
	[2025-03-21 03:54:22,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3644.0). Total num frames: 1056768. Throughput: 0: 963.7. Samples: 264956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:54:22,529][07156] Avg episode reward: [(0, '7.041')]
	[2025-03-21 03:54:22,535][07681] Saving new best policy, reward=7.041!
	[2025-03-21 03:54:23,937][07694] Updated weights for policy 0, policy_version 260 (0.0028)
	[2025-03-21 03:54:27,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 1077248. Throughput: 0: 962.6. Samples: 268180. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 03:54:27,530][07156] Avg episode reward: [(0, '7.489')]
	[2025-03-21 03:54:27,539][07681] Saving new best policy, reward=7.489!
	[2025-03-21 03:54:32,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 958.6. Samples: 274770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:54:32,527][07156] Avg episode reward: [(0, '7.547')]
	[2025-03-21 03:54:32,528][07681] Saving new best policy, reward=7.547!
	[2025-03-21 03:54:34,816][07694] Updated weights for policy 0, policy_version 270 (0.0050)
	[2025-03-21 03:54:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 956.4. Samples: 279230. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:54:37,525][07156] Avg episode reward: [(0, '7.654')]
	[2025-03-21 03:54:37,531][07681] Saving new best policy, reward=7.654!
	[2025-03-21 03:54:42,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1134592. Throughput: 0: 953.9. Samples: 282466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:54:42,528][07156] Avg episode reward: [(0, '7.989')]
	[2025-03-21 03:54:42,536][07681] Saving new best policy, reward=7.989!
	[2025-03-21 03:54:44,430][07694] Updated weights for policy 0, policy_version 280 (0.0019)
	[2025-03-21 03:54:47,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3790.5). Total num frames: 1155072. Throughput: 0: 949.8. Samples: 289010. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:54:47,529][07156] Avg episode reward: [(0, '7.398')]
	[2025-03-21 03:54:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1171456. Throughput: 0: 958.3. Samples: 293676. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:54:52,525][07156] Avg episode reward: [(0, '7.400')]
	[2025-03-21 03:54:55,604][07694] Updated weights for policy 0, policy_version 290 (0.0022)
	[2025-03-21 03:54:57,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1196032. Throughput: 0: 957.3. Samples: 296994. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:54:57,525][07156] Avg episode reward: [(0, '8.147')]
	[2025-03-21 03:54:57,532][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth...
	[2025-03-21 03:54:57,656][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000069_282624.pth
	[2025-03-21 03:54:57,668][07681] Saving new best policy, reward=8.147!
	[2025-03-21 03:55:02,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1212416. Throughput: 0: 952.0. Samples: 303404. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:55:02,524][07156] Avg episode reward: [(0, '8.664')]
	[2025-03-21 03:55:02,530][07681] Saving new best policy, reward=8.664!
	[2025-03-21 03:55:06,923][07694] Updated weights for policy 0, policy_version 300 (0.0019)
	[2025-03-21 03:55:07,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3776.6). Total num frames: 1228800. Throughput: 0: 958.3. Samples: 308078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:55:07,528][07156] Avg episode reward: [(0, '8.599')]
	[2025-03-21 03:55:12,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1253376. Throughput: 0: 959.3. Samples: 311348. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:55:12,528][07156] Avg episode reward: [(0, '8.169')]
	[2025-03-21 03:55:16,378][07694] Updated weights for policy 0, policy_version 310 (0.0016)
	[2025-03-21 03:55:17,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1269760. Throughput: 0: 958.3. Samples: 317894. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:55:17,526][07156] Avg episode reward: [(0, '7.813')]
	[2025-03-21 03:55:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1290240. Throughput: 0: 966.9. Samples: 322742. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:55:22,527][07156] Avg episode reward: [(0, '7.894')]
	[2025-03-21 03:55:27,033][07694] Updated weights for policy 0, policy_version 320 (0.0021)
	[2025-03-21 03:55:27,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.5). Total num frames: 1310720. Throughput: 0: 968.8. Samples: 326064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 03:55:27,527][07156] Avg episode reward: [(0, '7.514')]
	[2025-03-21 03:55:32,523][07156] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1331200. Throughput: 0: 966.0. Samples: 332478. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:55:32,527][07156] Avg episode reward: [(0, '8.129')]
	[2025-03-21 03:55:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1347584. Throughput: 0: 974.5. Samples: 337528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:55:37,525][07156] Avg episode reward: [(0, '8.871')]
	[2025-03-21 03:55:37,530][07681] Saving new best policy, reward=8.871!
	[2025-03-21 03:55:38,058][07694] Updated weights for policy 0, policy_version 330 (0.0028)
	[2025-03-21 03:55:42,523][07156] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1368064. Throughput: 0: 972.7. Samples: 340764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:55:42,525][07156] Avg episode reward: [(0, '10.002')]
	[2025-03-21 03:55:42,574][07681] Saving new best policy, reward=10.002!
	[2025-03-21 03:55:47,523][07156] Fps is (10 sec: 4095.9, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1388544. Throughput: 0: 970.8. Samples: 347092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:55:47,526][07156] Avg episode reward: [(0, '9.897')]
	[2025-03-21 03:55:48,359][07694] Updated weights for policy 0, policy_version 340 (0.0023)
	[2025-03-21 03:55:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1404928. Throughput: 0: 977.5. Samples: 352064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 03:55:52,527][07156] Avg episode reward: [(0, '9.918')]
	[2025-03-21 03:55:57,529][07156] Fps is (10 sec: 4093.7, 60 sec: 3890.8, 300 sec: 3832.1). Total num frames: 1429504. Throughput: 0: 978.4. Samples: 355384. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:55:57,534][07156] Avg episode reward: [(0, '9.968')]
	[2025-03-21 03:55:58,437][07694] Updated weights for policy 0, policy_version 350 (0.0023)
	[2025-03-21 03:56:02,526][07156] Fps is (10 sec: 4094.9, 60 sec: 3891.0, 300 sec: 3846.0). Total num frames: 1445888. Throughput: 0: 972.0. Samples: 361638. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:56:02,527][07156] Avg episode reward: [(0, '10.534')]
	[2025-03-21 03:56:02,530][07681] Saving new best policy, reward=10.534!
	[2025-03-21 03:56:07,523][07156] Fps is (10 sec: 3278.7, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 1462272. Throughput: 0: 975.1. Samples: 366622. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:56:07,531][07156] Avg episode reward: [(0, '10.944')]
	[2025-03-21 03:56:07,620][07681] Saving new best policy, reward=10.944!
	[2025-03-21 03:56:09,695][07694] Updated weights for policy 0, policy_version 360 (0.0034)
	[2025-03-21 03:56:12,523][07156] Fps is (10 sec: 4097.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1486848. Throughput: 0: 970.9. Samples: 369756. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:56:12,525][07156] Avg episode reward: [(0, '12.112')]
	[2025-03-21 03:56:12,527][07681] Saving new best policy, reward=12.112!
	[2025-03-21 03:56:17,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1503232. Throughput: 0: 966.1. Samples: 375954. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:56:17,525][07156] Avg episode reward: [(0, '12.408')]
	[2025-03-21 03:56:17,536][07681] Saving new best policy, reward=12.408!
	[2025-03-21 03:56:20,852][07694] Updated weights for policy 0, policy_version 370 (0.0014)
	[2025-03-21 03:56:22,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1519616. Throughput: 0: 963.2. Samples: 380872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:56:22,525][07156] Avg episode reward: [(0, '13.122')]
	[2025-03-21 03:56:22,530][07681] Saving new best policy, reward=13.122!
	[2025-03-21 03:56:27,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1544192. Throughput: 0: 965.4. Samples: 384208. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:56:27,527][07156] Avg episode reward: [(0, '14.136')]
	[2025-03-21 03:56:27,534][07681] Saving new best policy, reward=14.136!
	[2025-03-21 03:56:30,528][07694] Updated weights for policy 0, policy_version 380 (0.0028)
	[2025-03-21 03:56:32,523][07156] Fps is (10 sec: 4095.9, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 1560576. Throughput: 0: 958.3. Samples: 390216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:56:32,525][07156] Avg episode reward: [(0, '13.955')]
	[2025-03-21 03:56:37,523][07156] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 1576960. Throughput: 0: 962.3. Samples: 395366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:56:37,525][07156] Avg episode reward: [(0, '13.844')]
	[2025-03-21 03:56:41,289][07694] Updated weights for policy 0, policy_version 390 (0.0028)
	[2025-03-21 03:56:42,523][07156] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1601536. Throughput: 0: 961.3. Samples: 398636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
	[2025-03-21 03:56:42,528][07156] Avg episode reward: [(0, '14.360')]
	[2025-03-21 03:56:42,532][07681] Saving new best policy, reward=14.360!
	[2025-03-21 03:56:47,523][07156] Fps is (10 sec: 4096.1, 60 sec: 3823.0, 300 sec: 3860.0). Total num frames: 1617920. Throughput: 0: 957.4. Samples: 404720. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:56:47,524][07156] Avg episode reward: [(0, '15.116')]
	[2025-03-21 03:56:47,529][07681] Saving new best policy, reward=15.116!
	[2025-03-21 03:56:52,453][07694] Updated weights for policy 0, policy_version 400 (0.0023)
	[2025-03-21 03:56:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1638400. Throughput: 0: 962.2. Samples: 409920. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:56:52,524][07156] Avg episode reward: [(0, '15.398')]
	[2025-03-21 03:56:52,526][07681] Saving new best policy, reward=15.398!
	[2025-03-21 03:56:57,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3823.3, 300 sec: 3846.1). Total num frames: 1658880. Throughput: 0: 965.0. Samples: 413182. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 03:56:57,525][07156] Avg episode reward: [(0, '15.805')]
	[2025-03-21 03:56:57,530][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000405_1658880.pth...
	[2025-03-21 03:56:57,661][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000179_733184.pth
	[2025-03-21 03:56:57,673][07681] Saving new best policy, reward=15.805!
	[2025-03-21 03:57:02,524][07156] Fps is (10 sec: 3686.3, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 1675264. Throughput: 0: 957.9. Samples: 419058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:57:02,525][07156] Avg episode reward: [(0, '15.438')]
	[2025-03-21 03:57:03,204][07694] Updated weights for policy 0, policy_version 410 (0.0035)
	[2025-03-21 03:57:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1695744. Throughput: 0: 965.2. Samples: 424308. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:57:07,525][07156] Avg episode reward: [(0, '14.188')]
	[2025-03-21 03:57:12,523][07156] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1716224. Throughput: 0: 964.1. Samples: 427592. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:57:12,527][07156] Avg episode reward: [(0, '13.482')]
	[2025-03-21 03:57:12,802][07694] Updated weights for policy 0, policy_version 420 (0.0025)
	[2025-03-21 03:57:17,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 1732608. Throughput: 0: 966.4. Samples: 433706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:57:17,524][07156] Avg episode reward: [(0, '14.326')]
	[2025-03-21 03:57:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 1753088. Throughput: 0: 970.8. Samples: 439050. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 03:57:22,527][07156] Avg episode reward: [(0, '14.555')]
	[2025-03-21 03:57:23,660][07694] Updated weights for policy 0, policy_version 430 (0.0017)
	[2025-03-21 03:57:27,527][07156] Fps is (10 sec: 4504.0, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 1777664. Throughput: 0: 973.0. Samples: 442424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:57:27,530][07156] Avg episode reward: [(0, '14.404')]
	[2025-03-21 03:57:32,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1794048. Throughput: 0: 969.8. Samples: 448362. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
	[2025-03-21 03:57:32,528][07156] Avg episode reward: [(0, '15.745')]
	[2025-03-21 03:57:34,813][07694] Updated weights for policy 0, policy_version 440 (0.0024)
	[2025-03-21 03:57:37,523][07156] Fps is (10 sec: 3687.8, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 1814528. Throughput: 0: 976.8. Samples: 453876. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:57:37,525][07156] Avg episode reward: [(0, '15.976')]
	[2025-03-21 03:57:37,533][07681] Saving new best policy, reward=15.976!
	[2025-03-21 03:57:42,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1835008. Throughput: 0: 975.0. Samples: 457058. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:57:42,524][07156] Avg episode reward: [(0, '16.526')]
	[2025-03-21 03:57:42,528][07681] Saving new best policy, reward=16.526!
	[2025-03-21 03:57:44,034][07694] Updated weights for policy 0, policy_version 450 (0.0023)
	[2025-03-21 03:57:47,524][07156] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3859.9). Total num frames: 1851392. Throughput: 0: 973.4. Samples: 462862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:57:47,528][07156] Avg episode reward: [(0, '16.711')]
	[2025-03-21 03:57:47,539][07681] Saving new best policy, reward=16.711!
	[2025-03-21 03:57:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1871872. Throughput: 0: 978.9. Samples: 468360. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 03:57:52,527][07156] Avg episode reward: [(0, '17.628')]
	[2025-03-21 03:57:52,530][07681] Saving new best policy, reward=17.628!
	[2025-03-21 03:57:55,135][07694] Updated weights for policy 0, policy_version 460 (0.0019)
	[2025-03-21 03:57:57,523][07156] Fps is (10 sec: 4096.3, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1892352. Throughput: 0: 979.1. Samples: 471650. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:57:57,528][07156] Avg episode reward: [(0, '18.921')]
	[2025-03-21 03:57:57,536][07681] Saving new best policy, reward=18.921!
	[2025-03-21 03:58:02,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1908736. Throughput: 0: 968.9. Samples: 477306. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 03:58:02,525][07156] Avg episode reward: [(0, '18.053')]
	[2025-03-21 03:58:06,182][07694] Updated weights for policy 0, policy_version 470 (0.0034)
	[2025-03-21 03:58:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1929216. Throughput: 0: 975.8. Samples: 482962. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:58:07,526][07156] Avg episode reward: [(0, '17.808')]
	[2025-03-21 03:58:12,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1949696. Throughput: 0: 972.5. Samples: 486184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:58:12,528][07156] Avg episode reward: [(0, '16.961')]
	[2025-03-21 03:58:16,512][07694] Updated weights for policy 0, policy_version 480 (0.0030)
	[2025-03-21 03:58:17,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 1966080. Throughput: 0: 967.4. Samples: 491894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:58:17,526][07156] Avg episode reward: [(0, '16.155')]
	[2025-03-21 03:58:22,526][07156] Fps is (10 sec: 3685.4, 60 sec: 3891.0, 300 sec: 3859.9). Total num frames: 1986560. Throughput: 0: 970.6. Samples: 497556. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:58:22,529][07156] Avg episode reward: [(0, '16.643')]
	[2025-03-21 03:58:26,453][07694] Updated weights for policy 0, policy_version 490 (0.0026)
	[2025-03-21 03:58:27,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 2011136. Throughput: 0: 976.1. Samples: 500984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:58:27,526][07156] Avg episode reward: [(0, '18.102')]
	[2025-03-21 03:58:32,524][07156] Fps is (10 sec: 4096.6, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 2027520. Throughput: 0: 971.2. Samples: 506568. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 03:58:32,527][07156] Avg episode reward: [(0, '18.240')]
	[2025-03-21 03:58:37,390][07694] Updated weights for policy 0, policy_version 500 (0.0040)
	[2025-03-21 03:58:37,524][07156] Fps is (10 sec: 3686.1, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 2048000. Throughput: 0: 977.1. Samples: 512332. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:58:37,525][07156] Avg episode reward: [(0, '16.882')]
	[2025-03-21 03:58:42,523][07156] Fps is (10 sec: 4096.5, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 2068480. Throughput: 0: 975.1. Samples: 515530. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:58:42,527][07156] Avg episode reward: [(0, '17.699')]
	[2025-03-21 03:58:47,525][07156] Fps is (10 sec: 3686.1, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2084864. Throughput: 0: 972.8. Samples: 521082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:58:47,526][07156] Avg episode reward: [(0, '17.197')]
	[2025-03-21 03:58:48,556][07694] Updated weights for policy 0, policy_version 510 (0.0040)
	[2025-03-21 03:58:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2105344. Throughput: 0: 976.0. Samples: 526882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 03:58:52,524][07156] Avg episode reward: [(0, '16.585')]
	[2025-03-21 03:58:57,523][07156] Fps is (10 sec: 4096.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2125824. Throughput: 0: 977.2. Samples: 530156. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:58:57,527][07156] Avg episode reward: [(0, '18.299')]
	[2025-03-21 03:58:57,535][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth...
	[2025-03-21 03:58:57,671][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000292_1196032.pth
	[2025-03-21 03:58:57,851][07694] Updated weights for policy 0, policy_version 520 (0.0018)
	[2025-03-21 03:59:02,524][07156] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2142208. Throughput: 0: 971.6. Samples: 535616. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:59:02,527][07156] Avg episode reward: [(0, '18.679')]
	[2025-03-21 03:59:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2162688. Throughput: 0: 973.4. Samples: 541356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:59:07,525][07156] Avg episode reward: [(0, '17.562')]
	[2025-03-21 03:59:08,767][07694] Updated weights for policy 0, policy_version 530 (0.0034)
	[2025-03-21 03:59:12,523][07156] Fps is (10 sec: 4096.1, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2183168. Throughput: 0: 968.4. Samples: 544560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:59:12,525][07156] Avg episode reward: [(0, '19.234')]
	[2025-03-21 03:59:12,534][07681] Saving new best policy, reward=19.234!
	[2025-03-21 03:59:17,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2199552. Throughput: 0: 963.7. Samples: 549934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:59:17,528][07156] Avg episode reward: [(0, '19.610')]
	[2025-03-21 03:59:17,534][07681] Saving new best policy, reward=19.610!
	[2025-03-21 03:59:20,186][07694] Updated weights for policy 0, policy_version 540 (0.0034)
	[2025-03-21 03:59:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 2220032. Throughput: 0: 962.0. Samples: 555622. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 03:59:22,528][07156] Avg episode reward: [(0, '20.648')]
	[2025-03-21 03:59:22,533][07681] Saving new best policy, reward=20.648!
	[2025-03-21 03:59:27,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2240512. Throughput: 0: 963.6. Samples: 558890. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:59:27,525][07156] Avg episode reward: [(0, '20.211')]
	[2025-03-21 03:59:30,362][07694] Updated weights for policy 0, policy_version 550 (0.0019)
	[2025-03-21 03:59:32,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 2256896. Throughput: 0: 960.0. Samples: 564280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:59:32,527][07156] Avg episode reward: [(0, '20.976')]
	[2025-03-21 03:59:32,532][07681] Saving new best policy, reward=20.976!
	[2025-03-21 03:59:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 2277376. Throughput: 0: 959.2. Samples: 570046. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:59:37,529][07156] Avg episode reward: [(0, '21.676')]
	[2025-03-21 03:59:37,538][07681] Saving new best policy, reward=21.676!
	[2025-03-21 03:59:40,832][07694] Updated weights for policy 0, policy_version 560 (0.0027)
	[2025-03-21 03:59:42,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2297856. Throughput: 0: 957.1. Samples: 573226. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 03:59:42,525][07156] Avg episode reward: [(0, '22.099')]
	[2025-03-21 03:59:42,532][07681] Saving new best policy, reward=22.099!
	[2025-03-21 03:59:47,526][07156] Fps is (10 sec: 3685.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2314240. Throughput: 0: 956.2. Samples: 578646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:59:47,529][07156] Avg episode reward: [(0, '22.153')]
	[2025-03-21 03:59:47,541][07681] Saving new best policy, reward=22.153!
	[2025-03-21 03:59:52,120][07694] Updated weights for policy 0, policy_version 570 (0.0016)
	[2025-03-21 03:59:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2334720. Throughput: 0: 953.6. Samples: 584270. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 03:59:52,524][07156] Avg episode reward: [(0, '22.888')]
	[2025-03-21 03:59:52,530][07681] Saving new best policy, reward=22.888!
	[2025-03-21 03:59:57,523][07156] Fps is (10 sec: 4097.1, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2355200. Throughput: 0: 954.5. Samples: 587514. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 03:59:57,524][07156] Avg episode reward: [(0, '22.905')]
	[2025-03-21 03:59:57,536][07681] Saving new best policy, reward=22.905!
	[2025-03-21 04:00:02,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 2371584. Throughput: 0: 953.8. Samples: 592854. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:00:02,527][07156] Avg episode reward: [(0, '22.359')]
	[2025-03-21 04:00:03,357][07694] Updated weights for policy 0, policy_version 580 (0.0025)
	[2025-03-21 04:00:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2392064. Throughput: 0: 957.5. Samples: 598710. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:00:07,525][07156] Avg episode reward: [(0, '22.004')]
	[2025-03-21 04:00:12,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 2412544. Throughput: 0: 959.6. Samples: 602074. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:00:12,525][07156] Avg episode reward: [(0, '20.517')]
	[2025-03-21 04:00:12,556][07694] Updated weights for policy 0, policy_version 590 (0.0022)
	[2025-03-21 04:00:17,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2428928. Throughput: 0: 959.4. Samples: 607454. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:00:17,526][07156] Avg episode reward: [(0, '19.842')]
	[2025-03-21 04:00:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2449408. Throughput: 0: 961.7. Samples: 613322. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:00:22,525][07156] Avg episode reward: [(0, '19.549')]
	[2025-03-21 04:00:23,714][07694] Updated weights for policy 0, policy_version 600 (0.0022)
	[2025-03-21 04:00:27,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2473984. Throughput: 0: 964.8. Samples: 616644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:00:27,525][07156] Avg episode reward: [(0, '19.721')]
	[2025-03-21 04:00:32,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2486272. Throughput: 0: 963.0. Samples: 621978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 04:00:32,529][07156] Avg episode reward: [(0, '19.702')]
	[2025-03-21 04:00:34,591][07694] Updated weights for policy 0, policy_version 610 (0.0013)
	[2025-03-21 04:00:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2510848. Throughput: 0: 972.1. Samples: 628016. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:00:37,528][07156] Avg episode reward: [(0, '19.955')]
	[2025-03-21 04:00:42,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2531328. Throughput: 0: 974.2. Samples: 631352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:00:42,528][07156] Avg episode reward: [(0, '20.878')]
	[2025-03-21 04:00:44,435][07694] Updated weights for policy 0, policy_version 620 (0.0023)
	[2025-03-21 04:00:47,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3823.1, 300 sec: 3860.0). Total num frames: 2543616. Throughput: 0: 968.2. Samples: 636424. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:00:47,526][07156] Avg episode reward: [(0, '21.064')]
	[2025-03-21 04:00:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2568192. Throughput: 0: 975.2. Samples: 642592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:00:52,530][07156] Avg episode reward: [(0, '21.254')]
	[2025-03-21 04:00:54,918][07694] Updated weights for policy 0, policy_version 630 (0.0020)
	[2025-03-21 04:00:57,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 2588672. Throughput: 0: 974.1. Samples: 645908. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:00:57,527][07156] Avg episode reward: [(0, '21.373')]
	[2025-03-21 04:00:57,534][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000632_2588672.pth...
	[2025-03-21 04:00:57,680][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000405_1658880.pth
	[2025-03-21 04:01:02,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 2600960. Throughput: 0: 966.8. Samples: 650960. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:01:02,526][07156] Avg episode reward: [(0, '21.490')]
	[2025-03-21 04:01:06,130][07694] Updated weights for policy 0, policy_version 640 (0.0038)
	[2025-03-21 04:01:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2625536. Throughput: 0: 971.4. Samples: 657034. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:01:07,529][07156] Avg episode reward: [(0, '22.211')]
	[2025-03-21 04:01:12,523][07156] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2646016. Throughput: 0: 974.2. Samples: 660484. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
	[2025-03-21 04:01:12,528][07156] Avg episode reward: [(0, '22.039')]
	[2025-03-21 04:01:17,419][07694] Updated weights for policy 0, policy_version 650 (0.0029)
	[2025-03-21 04:01:17,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2662400. Throughput: 0: 967.2. Samples: 665504. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:01:17,524][07156] Avg episode reward: [(0, '23.091')]
	[2025-03-21 04:01:17,531][07681] Saving new best policy, reward=23.091!
	[2025-03-21 04:01:22,523][07156] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 2682880. Throughput: 0: 969.5. Samples: 671644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:01:22,525][07156] Avg episode reward: [(0, '22.890')]
	[2025-03-21 04:01:26,333][07694] Updated weights for policy 0, policy_version 660 (0.0018)
	[2025-03-21 04:01:27,524][07156] Fps is (10 sec: 4505.3, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2707456. Throughput: 0: 970.4. Samples: 675022. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 04:01:27,525][07156] Avg episode reward: [(0, '22.329')]
	[2025-03-21 04:01:32,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2719744. Throughput: 0: 971.6. Samples: 680146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:01:32,527][07156] Avg episode reward: [(0, '21.978')]
	[2025-03-21 04:01:37,358][07694] Updated weights for policy 0, policy_version 670 (0.0022)
	[2025-03-21 04:01:37,523][07156] Fps is (10 sec: 3686.7, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2744320. Throughput: 0: 974.8. Samples: 686460. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:01:37,530][07156] Avg episode reward: [(0, '20.824')]
	[2025-03-21 04:01:42,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2764800. Throughput: 0: 975.4. Samples: 689800. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:01:42,527][07156] Avg episode reward: [(0, '20.228')]
	[2025-03-21 04:01:47,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2781184. Throughput: 0: 974.2. Samples: 694800. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:01:47,528][07156] Avg episode reward: [(0, '19.930')]
	[2025-03-21 04:01:48,332][07694] Updated weights for policy 0, policy_version 680 (0.0022)
	[2025-03-21 04:01:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2801664. Throughput: 0: 978.2. Samples: 701054. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:01:52,524][07156] Avg episode reward: [(0, '19.502')]
	[2025-03-21 04:01:57,525][07156] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3887.7). Total num frames: 2822144. Throughput: 0: 975.1. Samples: 704366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:01:57,526][07156] Avg episode reward: [(0, '19.054')]
	[2025-03-21 04:01:57,587][07694] Updated weights for policy 0, policy_version 690 (0.0032)
	[2025-03-21 04:02:02,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2838528. Throughput: 0: 973.6. Samples: 709318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
	[2025-03-21 04:02:02,527][07156] Avg episode reward: [(0, '19.413')]
	[2025-03-21 04:02:07,523][07156] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2859008. Throughput: 0: 979.2. Samples: 715708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:02:07,528][07156] Avg episode reward: [(0, '19.455')]
	[2025-03-21 04:02:08,647][07694] Updated weights for policy 0, policy_version 700 (0.0017)
	[2025-03-21 04:02:12,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3901.6). Total num frames: 2883584. Throughput: 0: 979.7. Samples: 719106. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:02:12,528][07156] Avg episode reward: [(0, '19.593')]
	[2025-03-21 04:02:17,523][07156] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2895872. Throughput: 0: 974.9. Samples: 724018. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:02:17,528][07156] Avg episode reward: [(0, '20.226')]
	[2025-03-21 04:02:19,682][07694] Updated weights for policy 0, policy_version 710 (0.0016)
	[2025-03-21 04:02:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.9). Total num frames: 2920448. Throughput: 0: 979.0. Samples: 730516. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:02:22,525][07156] Avg episode reward: [(0, '21.616')]
	[2025-03-21 04:02:27,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2940928. Throughput: 0: 980.0. Samples: 733900. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:02:27,526][07156] Avg episode reward: [(0, '21.343')]
	[2025-03-21 04:02:29,887][07694] Updated weights for policy 0, policy_version 720 (0.0013)
	[2025-03-21 04:02:32,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3873.8). Total num frames: 2957312. Throughput: 0: 976.4. Samples: 738738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:02:32,529][07156] Avg episode reward: [(0, '21.427')]
	[2025-03-21 04:02:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 2977792. Throughput: 0: 979.8. Samples: 745146. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:02:37,528][07156] Avg episode reward: [(0, '21.191')]
	[2025-03-21 04:02:39,770][07694] Updated weights for policy 0, policy_version 730 (0.0021)
	[2025-03-21 04:02:42,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 2998272. Throughput: 0: 981.2. Samples: 748518. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:02:42,524][07156] Avg episode reward: [(0, '22.027')]
	[2025-03-21 04:02:47,525][07156] Fps is (10 sec: 3685.8, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 3014656. Throughput: 0: 975.4. Samples: 753212. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:02:47,526][07156] Avg episode reward: [(0, '20.950')]
	[2025-03-21 04:02:50,817][07694] Updated weights for policy 0, policy_version 740 (0.0022)
	[2025-03-21 04:02:52,524][07156] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3035136. Throughput: 0: 976.5. Samples: 759650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:02:52,529][07156] Avg episode reward: [(0, '22.884')]
	[2025-03-21 04:02:57,523][07156] Fps is (10 sec: 4096.7, 60 sec: 3891.3, 300 sec: 3887.7). Total num frames: 3055616. Throughput: 0: 976.1. Samples: 763030. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:02:57,526][07156] Avg episode reward: [(0, '22.351')]
	[2025-03-21 04:02:57,537][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000746_3055616.pth...
	[2025-03-21 04:02:57,708][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth
	[2025-03-21 04:03:02,042][07694] Updated weights for policy 0, policy_version 750 (0.0029)
	[2025-03-21 04:03:02,523][07156] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3072000. Throughput: 0: 970.8. Samples: 767702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:03:02,525][07156] Avg episode reward: [(0, '22.132')]
	[2025-03-21 04:03:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3092480. Throughput: 0: 970.1. Samples: 774172. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:03:07,525][07156] Avg episode reward: [(0, '22.147')]
	[2025-03-21 04:03:11,221][07694] Updated weights for policy 0, policy_version 760 (0.0020)
	[2025-03-21 04:03:12,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3117056. Throughput: 0: 969.9. Samples: 777546. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 04:03:12,526][07156] Avg episode reward: [(0, '21.652')]
	[2025-03-21 04:03:17,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 3129344. Throughput: 0: 967.6. Samples: 782278. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 04:03:17,525][07156] Avg episode reward: [(0, '21.999')]
	[2025-03-21 04:03:22,436][07694] Updated weights for policy 0, policy_version 770 (0.0019)
	[2025-03-21 04:03:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3153920. Throughput: 0: 970.0. Samples: 788798. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:03:22,525][07156] Avg episode reward: [(0, '21.839')]
	[2025-03-21 04:03:27,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3174400. Throughput: 0: 968.0. Samples: 792080. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:03:27,524][07156] Avg episode reward: [(0, '23.619')]
	[2025-03-21 04:03:27,531][07681] Saving new best policy, reward=23.619!
	[2025-03-21 04:03:32,523][07156] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3860.0). Total num frames: 3186688. Throughput: 0: 968.0. Samples: 796770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:03:32,530][07156] Avg episode reward: [(0, '22.708')]
	[2025-03-21 04:03:33,635][07694] Updated weights for policy 0, policy_version 780 (0.0013)
	[2025-03-21 04:03:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3211264. Throughput: 0: 969.6. Samples: 803282. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:03:37,526][07156] Avg episode reward: [(0, '22.358')]
	[2025-03-21 04:03:42,523][07156] Fps is (10 sec: 4505.7, 60 sec: 3891.2, 300 sec: 3887.8). Total num frames: 3231744. Throughput: 0: 966.5. Samples: 806522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:03:42,528][07156] Avg episode reward: [(0, '23.501')]
	[2025-03-21 04:03:43,190][07694] Updated weights for policy 0, policy_version 790 (0.0019)
	[2025-03-21 04:03:47,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3873.8). Total num frames: 3248128. Throughput: 0: 967.9. Samples: 811256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 04:03:47,529][07156] Avg episode reward: [(0, '24.111')]
	[2025-03-21 04:03:47,536][07681] Saving new best policy, reward=24.111!
	[2025-03-21 04:03:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3268608. Throughput: 0: 967.4. Samples: 817704. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:03:52,528][07156] Avg episode reward: [(0, '22.876')]
	[2025-03-21 04:03:53,977][07694] Updated weights for policy 0, policy_version 800 (0.0020)
	[2025-03-21 04:03:57,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3289088. Throughput: 0: 966.0. Samples: 821018. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:03:57,528][07156] Avg episode reward: [(0, '22.687')]
	[2025-03-21 04:04:02,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3305472. Throughput: 0: 966.7. Samples: 825778. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:04:02,527][07156] Avg episode reward: [(0, '22.776')]
	[2025-03-21 04:04:04,925][07694] Updated weights for policy 0, policy_version 810 (0.0027)
	[2025-03-21 04:04:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3325952. Throughput: 0: 968.4. Samples: 832378. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:04:07,524][07156] Avg episode reward: [(0, '21.259')]
	[2025-03-21 04:04:12,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3887.7). Total num frames: 3346432. Throughput: 0: 971.9. Samples: 835814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:04:12,525][07156] Avg episode reward: [(0, '22.074')]
	[2025-03-21 04:04:15,901][07694] Updated weights for policy 0, policy_version 820 (0.0028)
	[2025-03-21 04:04:17,525][07156] Fps is (10 sec: 3685.7, 60 sec: 3891.1, 300 sec: 3873.8). Total num frames: 3362816. Throughput: 0: 970.1. Samples: 840426. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:04:17,526][07156] Avg episode reward: [(0, '20.969')]
	[2025-03-21 04:04:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3383296. Throughput: 0: 970.9. Samples: 846974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:04:22,524][07156] Avg episode reward: [(0, '22.292')]
	[2025-03-21 04:04:25,201][07694] Updated weights for policy 0, policy_version 830 (0.0020)
	[2025-03-21 04:04:27,525][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 3407872. Throughput: 0: 973.9. Samples: 850350. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:04:27,526][07156] Avg episode reward: [(0, '21.985')]
	[2025-03-21 04:04:32,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3420160. Throughput: 0: 969.5. Samples: 854882. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:04:32,525][07156] Avg episode reward: [(0, '22.956')]
	[2025-03-21 04:04:36,451][07694] Updated weights for policy 0, policy_version 840 (0.0014)
	[2025-03-21 04:04:37,523][07156] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3444736. Throughput: 0: 974.3. Samples: 861546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:04:37,525][07156] Avg episode reward: [(0, '21.816')]
	[2025-03-21 04:04:42,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.7). Total num frames: 3465216. Throughput: 0: 975.7. Samples: 864926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
	[2025-03-21 04:04:42,527][07156] Avg episode reward: [(0, '22.435')]
	[2025-03-21 04:04:47,432][07694] Updated weights for policy 0, policy_version 850 (0.0036)
	[2025-03-21 04:04:47,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3481600. Throughput: 0: 973.5. Samples: 869584. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:04:47,527][07156] Avg episode reward: [(0, '23.098')]
	[2025-03-21 04:04:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3502080. Throughput: 0: 970.1. Samples: 876032. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:04:52,527][07156] Avg episode reward: [(0, '24.312')]
	[2025-03-21 04:04:52,530][07681] Saving new best policy, reward=24.312!
	[2025-03-21 04:04:57,443][07694] Updated weights for policy 0, policy_version 860 (0.0023)
	[2025-03-21 04:04:57,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3522560. Throughput: 0: 964.5. Samples: 879218. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:04:57,526][07156] Avg episode reward: [(0, '24.673')]
	[2025-03-21 04:04:57,536][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000860_3522560.pth...
	[2025-03-21 04:04:57,722][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000632_2588672.pth
	[2025-03-21 04:04:57,754][07681] Saving new best policy, reward=24.673!
	[2025-03-21 04:05:02,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3534848. Throughput: 0: 961.3. Samples: 883682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:05:02,531][07156] Avg episode reward: [(0, '24.271')]
	[2025-03-21 04:05:07,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3559424. Throughput: 0: 962.0. Samples: 890266. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:05:07,529][07156] Avg episode reward: [(0, '27.000')]
	[2025-03-21 04:05:07,539][07681] Saving new best policy, reward=27.000!
	[2025-03-21 04:05:08,352][07694] Updated weights for policy 0, policy_version 870 (0.0022)
	[2025-03-21 04:05:12,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3579904. Throughput: 0: 959.2. Samples: 893512. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:12,529][07156] Avg episode reward: [(0, '27.259')]
	[2025-03-21 04:05:12,531][07681] Saving new best policy, reward=27.259!
	[2025-03-21 04:05:17,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3823.0, 300 sec: 3873.8). Total num frames: 3592192. Throughput: 0: 957.3. Samples: 897960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:17,528][07156] Avg episode reward: [(0, '26.492')]
	[2025-03-21 04:05:19,581][07694] Updated weights for policy 0, policy_version 880 (0.0023)
	[2025-03-21 04:05:22,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3616768. Throughput: 0: 957.4. Samples: 904630. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:22,527][07156] Avg episode reward: [(0, '26.015')]
	[2025-03-21 04:05:27,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3823.0, 300 sec: 3901.6). Total num frames: 3637248. Throughput: 0: 957.5. Samples: 908012. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:27,526][07156] Avg episode reward: [(0, '25.702')]
	[2025-03-21 04:05:30,099][07694] Updated weights for policy 0, policy_version 890 (0.0034)
	[2025-03-21 04:05:32,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3653632. Throughput: 0: 957.0. Samples: 912648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:32,529][07156] Avg episode reward: [(0, '25.766')]
	[2025-03-21 04:05:37,527][07156] Fps is (10 sec: 3685.1, 60 sec: 3822.7, 300 sec: 3873.8). Total num frames: 3674112. Throughput: 0: 962.6. Samples: 919354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:05:37,532][07156] Avg episode reward: [(0, '26.244')]
	[2025-03-21 04:05:39,641][07694] Updated weights for policy 0, policy_version 900 (0.0013)
	[2025-03-21 04:05:42,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3694592. Throughput: 0: 967.6. Samples: 922758. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:42,527][07156] Avg episode reward: [(0, '25.308')]
	[2025-03-21 04:05:47,523][07156] Fps is (10 sec: 3687.7, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3710976. Throughput: 0: 968.7. Samples: 927272. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:47,524][07156] Avg episode reward: [(0, '25.362')]
	[2025-03-21 04:05:50,944][07694] Updated weights for policy 0, policy_version 910 (0.0028)
	[2025-03-21 04:05:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3873.8). Total num frames: 3731456. Throughput: 0: 969.0. Samples: 933872. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
	[2025-03-21 04:05:52,528][07156] Avg episode reward: [(0, '26.356')]
	[2025-03-21 04:05:57,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3901.6). Total num frames: 3751936. Throughput: 0: 972.4. Samples: 937268. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
	[2025-03-21 04:05:57,528][07156] Avg episode reward: [(0, '26.686')]
	[2025-03-21 04:06:01,858][07694] Updated weights for policy 0, policy_version 920 (0.0028)
	[2025-03-21 04:06:02,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3768320. Throughput: 0: 975.8. Samples: 941872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:06:02,528][07156] Avg episode reward: [(0, '25.478')]
	[2025-03-21 04:06:07,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3792896. Throughput: 0: 974.0. Samples: 948462. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:06:07,524][07156] Avg episode reward: [(0, '23.953')]
	[2025-03-21 04:06:11,196][07694] Updated weights for policy 0, policy_version 930 (0.0021)
	[2025-03-21 04:06:12,525][07156] Fps is (10 sec: 4504.7, 60 sec: 3891.1, 300 sec: 3901.6). Total num frames: 3813376. Throughput: 0: 973.5. Samples: 951822. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:06:12,527][07156] Avg episode reward: [(0, '24.352')]
	[2025-03-21 04:06:17,523][07156] Fps is (10 sec: 3276.8, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3825664. Throughput: 0: 970.7. Samples: 956330. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:06:17,528][07156] Avg episode reward: [(0, '24.941')]
	[2025-03-21 04:06:22,407][07694] Updated weights for policy 0, policy_version 940 (0.0019)
	[2025-03-21 04:06:22,523][07156] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 3850240. Throughput: 0: 968.0. Samples: 962912. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:06:22,525][07156] Avg episode reward: [(0, '23.972')]
	[2025-03-21 04:06:27,523][07156] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3901.6). Total num frames: 3870720. Throughput: 0: 966.3. Samples: 966240. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
	[2025-03-21 04:06:27,528][07156] Avg episode reward: [(0, '23.781')]
	[2025-03-21 04:06:32,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3887104. Throughput: 0: 970.2. Samples: 970930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:06:32,524][07156] Avg episode reward: [(0, '23.663')]
	[2025-03-21 04:06:33,332][07694] Updated weights for policy 0, policy_version 950 (0.0018)
	[2025-03-21 04:06:37,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3873.8). Total num frames: 3907584. Throughput: 0: 972.0. Samples: 977614. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:06:37,529][07156] Avg episode reward: [(0, '25.329')]
	[2025-03-21 04:06:42,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3928064. Throughput: 0: 971.9. Samples: 981002. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
	[2025-03-21 04:06:42,529][07156] Avg episode reward: [(0, '25.713')]
	[2025-03-21 04:06:43,120][07694] Updated weights for policy 0, policy_version 960 (0.0021)
	[2025-03-21 04:06:47,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 3944448. Throughput: 0: 973.2. Samples: 985668. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:06:47,526][07156] Avg episode reward: [(0, '24.859')]
	[2025-03-21 04:06:52,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.9). Total num frames: 3964928. Throughput: 0: 974.5. Samples: 992314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:06:52,528][07156] Avg episode reward: [(0, '24.699')]
	[2025-03-21 04:06:53,421][07694] Updated weights for policy 0, policy_version 970 (0.0013)
	[2025-03-21 04:06:57,523][07156] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3887.7). Total num frames: 3985408. Throughput: 0: 975.4. Samples: 995714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
	[2025-03-21 04:06:57,525][07156] Avg episode reward: [(0, '26.049')]
	[2025-03-21 04:06:57,539][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000973_3985408.pth...
	[2025-03-21 04:06:57,691][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000746_3055616.pth
	[2025-03-21 04:07:02,523][07156] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3873.8). Total num frames: 4001792. Throughput: 0: 977.2. Samples: 1000302. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
	[2025-03-21 04:07:02,533][07156] Avg episode reward: [(0, '24.755')]
	[2025-03-21 04:07:02,716][07681] Stopping Batcher_0...
	[2025-03-21 04:07:02,718][07681] Loop batcher_evt_loop terminating...
	[2025-03-21 04:07:02,718][07156] Component Batcher_0 stopped!
	[2025-03-21 04:07:02,725][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:07:02,784][07694] Weights refcount: 2 0
	[2025-03-21 04:07:02,786][07694] Stopping InferenceWorker_p0-w0...
	[2025-03-21 04:07:02,787][07694] Loop inference_proc0-0_evt_loop terminating...
	[2025-03-21 04:07:02,786][07156] Component InferenceWorker_p0-w0 stopped!
	[2025-03-21 04:07:02,862][07681] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000860_3522560.pth
	[2025-03-21 04:07:02,896][07681] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:07:03,073][07156] Component LearnerWorker_p0 stopped!
	[2025-03-21 04:07:03,073][07681] Stopping LearnerWorker_p0...
	[2025-03-21 04:07:03,078][07681] Loop learner_proc0_evt_loop terminating...
	[2025-03-21 04:07:03,161][07156] Component RolloutWorker_w3 stopped!
	[2025-03-21 04:07:03,163][07698] Stopping RolloutWorker_w3...
	[2025-03-21 04:07:03,167][07156] Component RolloutWorker_w7 stopped!
	[2025-03-21 04:07:03,168][07702] Stopping RolloutWorker_w7...
	[2025-03-21 04:07:03,172][07702] Loop rollout_proc7_evt_loop terminating...
	[2025-03-21 04:07:03,168][07698] Loop rollout_proc3_evt_loop terminating...
	[2025-03-21 04:07:03,178][07156] Component RolloutWorker_w5 stopped!
	[2025-03-21 04:07:03,179][07700] Stopping RolloutWorker_w5...
	[2025-03-21 04:07:03,181][07700] Loop rollout_proc5_evt_loop terminating...
	[2025-03-21 04:07:03,241][07156] Component RolloutWorker_w1 stopped!
	[2025-03-21 04:07:03,242][07696] Stopping RolloutWorker_w1...
	[2025-03-21 04:07:03,244][07696] Loop rollout_proc1_evt_loop terminating...
	[2025-03-21 04:07:03,278][07156] Component RolloutWorker_w0 stopped!
	[2025-03-21 04:07:03,278][07695] Stopping RolloutWorker_w0...
	[2025-03-21 04:07:03,285][07695] Loop rollout_proc0_evt_loop terminating...
	[2025-03-21 04:07:03,305][07156] Component RolloutWorker_w6 stopped!
	[2025-03-21 04:07:03,306][07701] Stopping RolloutWorker_w6...
	[2025-03-21 04:07:03,313][07701] Loop rollout_proc6_evt_loop terminating...
	[2025-03-21 04:07:03,329][07156] Component RolloutWorker_w2 stopped!
	[2025-03-21 04:07:03,329][07697] Stopping RolloutWorker_w2...
	[2025-03-21 04:07:03,334][07697] Loop rollout_proc2_evt_loop terminating...
	[2025-03-21 04:07:03,348][07699] Stopping RolloutWorker_w4...
	[2025-03-21 04:07:03,348][07156] Component RolloutWorker_w4 stopped!
	[2025-03-21 04:07:03,355][07156] Waiting for process learner_proc0 to stop...
	[2025-03-21 04:07:03,356][07699] Loop rollout_proc4_evt_loop terminating...
	[2025-03-21 04:07:05,059][07156] Waiting for process inference_proc0-0 to join...
	[2025-03-21 04:07:05,061][07156] Waiting for process rollout_proc0 to join...
	[2025-03-21 04:07:07,347][07156] Waiting for process rollout_proc1 to join...
	[2025-03-21 04:07:07,350][07156] Waiting for process rollout_proc2 to join...
	[2025-03-21 04:07:07,351][07156] Waiting for process rollout_proc3 to join...
	[2025-03-21 04:07:07,353][07156] Waiting for process rollout_proc4 to join...
	[2025-03-21 04:07:07,356][07156] Waiting for process rollout_proc5 to join...
	[2025-03-21 04:07:07,358][07156] Waiting for process rollout_proc6 to join...
	[2025-03-21 04:07:07,359][07156] Waiting for process rollout_proc7 to join...
	[2025-03-21 04:07:07,362][07156] Batcher 0 profile tree view:
	batching: 27.6787, releasing_batches: 0.0261
	[2025-03-21 04:07:07,364][07156] InferenceWorker_p0-w0 profile tree view:
	wait_policy: 0.0001
	wait_policy_total: 389.8431
	update_model: 9.1450
	weight_update: 0.0021
	one_step: 0.0058
	handle_policy_step: 616.4750
	deserialize: 14.5212, stack: 3.3913, obs_to_device_normalize: 129.2742, forward: 319.7633, send_messages: 30.1000
	prepare_outputs: 92.2645
	to_cpu: 56.2804
	[2025-03-21 04:07:07,366][07156] Learner 0 profile tree view:
	misc: 0.0054, prepare_batch: 12.5754
	train: 72.9136
	epoch_init: 0.0050, minibatch_init: 0.0058, losses_postprocess: 0.7120, kl_divergence: 0.6749, after_optimizer: 32.8155
	calculate_losses: 25.9117
	losses_init: 0.0123, forward_head: 1.4982, bptt_initial: 16.9884, tail: 1.1329, advantages_returns: 0.2712, losses: 3.6256
	bptt: 2.1408
	bptt_forward_core: 2.0515
	update: 12.1128
	clip: 1.0294
	[2025-03-21 04:07:07,368][07156] RolloutWorker_w0 profile tree view:
	wait_for_trajectories: 0.2524, enqueue_policy_requests: 102.0880, env_step: 827.3383, overhead: 12.7739, complete_rollouts: 7.5939
	save_policy_outputs: 19.7709
	split_output_tensors: 7.5715
	[2025-03-21 04:07:07,369][07156] RolloutWorker_w7 profile tree view:
	wait_for_trajectories: 0.3094, enqueue_policy_requests: 102.6377, env_step: 827.4641, overhead: 12.7623, complete_rollouts: 6.6195
	save_policy_outputs: 19.0376
	split_output_tensors: 7.5935
	[2025-03-21 04:07:07,371][07156] Loop Runner_EvtLoop terminating...
	[2025-03-21 04:07:07,372][07156] Runner profile tree view:
	main_loop: 1084.6471
	[2025-03-21 04:07:07,374][07156] Collected {0: 4005888}, FPS: 3693.3
	[2025-03-21 04:14:23,448][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:14:23,449][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:14:23,451][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:14:23,453][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:14:23,454][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:14:23,455][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:14:23,456][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:14:23,457][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:14:23,458][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:14:23,460][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:14:23,461][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:14:23,462][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:14:23,464][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:14:23,465][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:14:23,467][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:14:23,496][07156] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 04:14:23,499][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:14:23,501][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:14:23,516][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:14:23,619][07156] Conv encoder output size: 512
	[2025-03-21 04:14:23,620][07156] Policy head output size: 512
	[2025-03-21 04:14:23,803][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:14:23,806][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:14:23,808][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:14:23,811][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:14:23,812][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:14:23,814][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:24:36,738][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:24:36,739][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:24:36,740][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:24:36,741][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:24:36,742][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:24:36,743][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:24:36,744][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:24:36,745][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:24:36,746][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:24:36,746][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:24:36,747][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:24:36,749][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:24:36,750][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:24:36,751][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:24:36,753][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:24:36,786][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:24:36,788][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:24:36,800][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:24:36,842][07156] Conv encoder output size: 512
	[2025-03-21 04:24:36,843][07156] Policy head output size: 512
	[2025-03-21 04:24:36,865][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:24:36,867][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:24:36,869][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:24:36,871][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:24:36,871][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:24:36,874][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:26:53,063][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:26:53,064][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:26:53,065][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:26:53,066][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:26:53,067][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:26:53,068][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:26:53,068][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:26:53,069][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:26:53,070][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:26:53,071][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:26:53,072][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:26:53,073][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:26:53,074][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:26:53,075][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:26:53,075][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:26:53,104][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:26:53,105][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:26:53,116][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:26:53,152][07156] Conv encoder output size: 512
	[2025-03-21 04:26:53,153][07156] Policy head output size: 512
	[2025-03-21 04:26:53,172][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:26:53,174][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:26:53,175][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:26:53,176][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:26:53,178][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:26:53,179][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:28:00,719][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:28:00,720][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:28:00,721][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:28:00,722][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:28:00,723][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:28:00,724][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:28:00,725][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:28:00,725][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:28:00,726][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:28:00,727][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:28:00,728][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:28:00,729][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:28:00,730][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:28:00,730][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:28:00,731][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:28:00,759][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:28:00,760][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:28:00,770][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:28:00,809][07156] Conv encoder output size: 512
	[2025-03-21 04:28:00,810][07156] Policy head output size: 512
	[2025-03-21 04:28:00,830][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:28:00,832][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:28:00,833][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:28:00,835][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:28:00,837][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:28:00,838][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:28:09,150][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:28:09,151][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:28:09,152][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:28:09,153][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:28:09,154][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:28:09,155][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:28:09,155][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:28:09,156][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:28:09,157][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:28:09,159][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:28:09,160][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:28:09,161][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:28:09,163][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:28:09,164][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:28:09,165][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:28:09,193][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:28:09,194][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:28:09,204][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:28:09,239][07156] Conv encoder output size: 512
	[2025-03-21 04:28:09,240][07156] Policy head output size: 512
	[2025-03-21 04:28:09,260][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:28:09,261][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:28:09,263][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:28:09,264][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:28:09,265][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:28:09,267][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:32:45,328][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:32:45,329][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:32:45,331][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:32:45,331][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:32:45,332][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:32:45,333][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:32:45,334][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:32:45,335][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:32:45,336][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:32:45,336][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:32:45,337][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:32:45,338][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:32:45,339][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:32:45,340][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:32:45,341][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:32:45,373][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:32:45,375][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:32:45,388][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:32:45,424][07156] Conv encoder output size: 512
	[2025-03-21 04:32:45,424][07156] Policy head output size: 512
	[2025-03-21 04:32:45,442][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:32:45,444][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:32:45,446][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:32:45,448][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:32:45,450][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:32:45,452][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:33:16,910][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:33:16,911][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:33:16,913][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:33:16,913][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:33:16,914][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:33:16,915][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:33:16,916][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:33:16,917][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:33:16,918][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:33:16,921][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:33:16,922][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:33:16,923][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:33:16,924][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:33:16,925][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:33:16,926][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:33:16,974][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:33:16,978][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:33:16,997][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:33:17,069][07156] Conv encoder output size: 512
	[2025-03-21 04:33:17,072][07156] Policy head output size: 512
	[2025-03-21 04:33:17,101][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:33:17,103][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=True)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:33:17,105][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:33:17,107][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=True)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:33:17,108][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:33:17,109][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=True)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:33:58,579][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:33:58,580][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:33:58,581][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:33:58,582][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:33:58,583][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:33:58,584][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:33:58,585][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:33:58,586][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:33:58,587][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:33:58,588][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:33:58,589][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:33:58,590][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:33:58,591][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:33:58,592][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:33:58,593][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:33:58,627][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:33:58,628][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:33:58,640][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:33:58,677][07156] Conv encoder output size: 512
	[2025-03-21 04:33:58,678][07156] Policy head output size: 512
	[2025-03-21 04:33:58,698][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:33:58,700][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:33:58,701][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:33:58,703][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:33:58,704][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:33:58,706][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:35:32,573][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:35:32,574][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:35:32,575][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:35:32,577][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:35:32,578][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:35:32,579][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:35:32,580][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:35:32,581][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:35:32,582][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:35:32,583][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:35:32,584][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:35:32,585][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:35:32,587][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:35:32,588][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:35:32,589][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:35:32,619][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:35:32,621][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:35:32,633][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:35:32,670][07156] Conv encoder output size: 512
	[2025-03-21 04:35:32,671][07156] Policy head output size: 512
	[2025-03-21 04:35:32,692][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:35:32,694][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:35:32,695][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:35:32,696][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:35:32,698][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:35:32,699][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:36:43,545][07156] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:36:43,549][07156] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:36:43,550][07156] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:36:43,553][07156] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:36:43,554][07156] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:36:43,555][07156] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:36:43,556][07156] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:36:43,557][07156] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:36:43,557][07156] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:36:43,558][07156] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:36:43,559][07156] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:36:43,560][07156] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:36:43,564][07156] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:36:43,565][07156] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:36:43,566][07156] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:36:43,748][07156] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:36:43,749][07156] RunningMeanStd input shape: (1,)
	[2025-03-21 04:36:43,780][07156] ConvEncoder: input_channels=3
	[2025-03-21 04:36:43,918][07156] Conv encoder output size: 512
	[2025-03-21 04:36:43,919][07156] Policy head output size: 512
	[2025-03-21 04:36:44,014][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:36:44,020][07156] Could not load from checkpoint, attempt 0
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:36:44,024][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:36:44,025][07156] Could not load from checkpoint, attempt 1
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:36:44,040][07156] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:36:44,045][07156] Could not load from checkpoint, attempt 2
	Traceback (most recent call last):
	File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
	checkpoint_dict = torch.load(latest_checkpoint, map_location=device, weights_only=False)
	^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
	File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
	if typed_storage._data_ptr() != 0:
	^^^^^^^^^^^^^^^^^^^
	_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

	Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
	[2025-03-21 04:37:35,117][23576] Saving configuration to /content/train_dir/default_experiment/config.json...
	[2025-03-21 04:37:35,120][23576] Rollout worker 0 uses device cpu
	[2025-03-21 04:37:35,121][23576] Rollout worker 1 uses device cpu
	[2025-03-21 04:37:35,124][23576] Rollout worker 2 uses device cpu
	[2025-03-21 04:37:35,124][23576] Rollout worker 3 uses device cpu
	[2025-03-21 04:37:35,125][23576] Rollout worker 4 uses device cpu
	[2025-03-21 04:37:35,128][23576] Rollout worker 5 uses device cpu
	[2025-03-21 04:37:35,129][23576] Rollout worker 6 uses device cpu
	[2025-03-21 04:37:35,129][23576] Rollout worker 7 uses device cpu
	[2025-03-21 04:37:35,278][23576] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 04:37:35,280][23576] InferenceWorker_p0-w0: min num requests: 2
	[2025-03-21 04:37:35,324][23576] Starting all processes...
	[2025-03-21 04:37:35,327][23576] Starting process learner_proc0
	[2025-03-21 04:37:36,042][23576] Starting all processes...
	[2025-03-21 04:37:36,055][23576] Starting process inference_proc0-0
	[2025-03-21 04:37:36,056][23576] Starting process rollout_proc0
	[2025-03-21 04:37:36,056][23576] Starting process rollout_proc1
	[2025-03-21 04:37:36,056][23576] Starting process rollout_proc2
	[2025-03-21 04:37:36,056][23576] Starting process rollout_proc3
	[2025-03-21 04:37:36,057][23576] Starting process rollout_proc4
	[2025-03-21 04:37:36,057][23576] Starting process rollout_proc5
	[2025-03-21 04:37:36,057][23576] Starting process rollout_proc6
	[2025-03-21 04:37:36,057][23576] Starting process rollout_proc7
	[2025-03-21 04:37:40,424][23576] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 23576], exiting...
	[2025-03-21 04:37:40,427][23576] Runner profile tree view:
	main_loop: 5.1030
	[2025-03-21 04:37:40,429][23576] Collected {}, FPS: 0.0
	[2025-03-21 04:37:50,940][23576] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:37:50,946][23576] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:37:50,952][23576] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:37:50,954][23576] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:37:50,956][23576] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:37:50,958][23576] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:37:50,963][23576] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:37:50,965][23576] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:37:50,966][23576] Adding new argument 'push_to_hub'=False that is not in the saved config file!
	[2025-03-21 04:37:50,968][23576] Adding new argument 'hf_repository'=None that is not in the saved config file!
	[2025-03-21 04:37:50,970][23576] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:37:50,975][23576] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:37:50,980][23576] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:37:50,982][23576] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:37:50,984][23576] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:37:51,225][23576] Doom resolution: 160x120, resize resolution: (128, 72)
	[2025-03-21 04:37:51,235][23576] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:37:51,257][23576] RunningMeanStd input shape: (1,)
	[2025-03-21 04:37:51,376][23576] ConvEncoder: input_channels=3
	[2025-03-21 04:37:51,429][23786] Worker 3 uses CPU cores [1]
	[2025-03-21 04:37:51,439][23786] Stopping RolloutWorker_w3...
	[2025-03-21 04:37:51,440][23786] Loop rollout_proc3_evt_loop terminating...
	[2025-03-21 04:37:51,525][23771] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 04:37:51,535][23771] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
	[2025-03-21 04:37:51,587][23790] Worker 5 uses CPU cores [1]
	[2025-03-21 04:37:51,637][23771] Num visible devices: 1
	[2025-03-21 04:37:51,671][23771] Starting seed is not provided
	[2025-03-21 04:37:51,671][23771] Using GPUs [0] for process 0 (actually maps to GPUs [0])
	[2025-03-21 04:37:51,671][23771] Initializing actor-critic model on device cuda:0
	[2025-03-21 04:37:51,672][23771] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:37:51,674][23771] RunningMeanStd input shape: (1,)
	[2025-03-21 04:37:51,691][23771] Stopping Batcher_0...
	[2025-03-21 04:37:51,691][23771] Loop batcher_evt_loop terminating...
	[2025-03-21 04:37:51,701][23790] Stopping RolloutWorker_w5...
	[2025-03-21 04:37:51,702][23790] Loop rollout_proc5_evt_loop terminating...
	[2025-03-21 04:37:51,725][23788] Worker 2 uses CPU cores [0]
	[2025-03-21 04:37:51,762][23771] ConvEncoder: input_channels=3
	[2025-03-21 04:37:51,807][23789] Worker 1 uses CPU cores [1]
	[2025-03-21 04:37:51,840][23785] Worker 0 uses CPU cores [0]
	[2025-03-21 04:37:51,852][23788] Stopping RolloutWorker_w2...
	[2025-03-21 04:37:51,862][23788] Loop rollout_proc2_evt_loop terminating...
	[2025-03-21 04:37:51,873][23789] Stopping RolloutWorker_w1...
	[2025-03-21 04:37:51,879][23789] Loop rollout_proc1_evt_loop terminating...
	[2025-03-21 04:37:51,928][23787] Worker 4 uses CPU cores [0]
	[2025-03-21 04:37:51,947][23785] Stopping RolloutWorker_w0...
	[2025-03-21 04:37:51,955][23785] Loop rollout_proc0_evt_loop terminating...
	[2025-03-21 04:37:52,020][23787] Stopping RolloutWorker_w4...
	[2025-03-21 04:37:52,021][23787] Loop rollout_proc4_evt_loop terminating...
	[2025-03-21 04:37:52,033][23791] Worker 7 uses CPU cores [1]
	[2025-03-21 04:37:52,055][23576] Conv encoder output size: 512
	[2025-03-21 04:37:52,059][23576] Policy head output size: 512
	[2025-03-21 04:37:52,060][23792] Worker 6 uses CPU cores [0]
	[2025-03-21 04:37:52,098][23791] Stopping RolloutWorker_w7...
	[2025-03-21 04:37:52,104][23791] Loop rollout_proc7_evt_loop terminating...
	[2025-03-21 04:37:52,116][23792] Stopping RolloutWorker_w6...
	[2025-03-21 04:37:52,117][23792] Loop rollout_proc6_evt_loop terminating...
	[2025-03-21 04:37:52,160][23771] Conv encoder output size: 512
	[2025-03-21 04:37:52,160][23771] Policy head output size: 512
	[2025-03-21 04:37:52,199][23771] Created Actor Critic model with architecture:
	[2025-03-21 04:37:52,199][23771] ActorCriticSharedWeights(
	  (obs_normalizer): ObservationNormalizer(
	    (running_mean_std): RunningMeanStdDictInPlace(
	      (running_mean_std): ModuleDict(
	        (obs): RunningMeanStdInPlace()
	      )
	    )
	  )
	  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
	  (encoder): VizdoomEncoder(
	    (basic_encoder): ConvEncoder(
	      (enc): RecursiveScriptModule(
	        original_name=ConvEncoderImpl
	        (conv_head): RecursiveScriptModule(
	          original_name=Sequential
	          (0): RecursiveScriptModule(original_name=Conv2d)
	          (1): RecursiveScriptModule(original_name=ELU)
	          (2): RecursiveScriptModule(original_name=Conv2d)
	          (3): RecursiveScriptModule(original_name=ELU)
	          (4): RecursiveScriptModule(original_name=Conv2d)
	          (5): RecursiveScriptModule(original_name=ELU)
	        )
	        (mlp_layers): RecursiveScriptModule(
	          original_name=Sequential
	          (0): RecursiveScriptModule(original_name=Linear)
	          (1): RecursiveScriptModule(original_name=ELU)
	        )
	      )
	    )
	  )
	  (core): ModelCoreRNN(
	    (core): GRU(512, 512)
	  )
	  (decoder): MlpDecoder(
	    (mlp): Identity()
	  )
	  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
	  (action_parameterization): ActionParameterizationDefault(
	    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
	  )
	)
	[2025-03-21 04:37:52,560][23576] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:37:52,571][23771] Using optimizer <class 'torch.optim.adam.Adam'>
	[2025-03-21 04:37:55,123][23576] Num frames 100...
	[2025-03-21 04:37:55,228][23771] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:37:55,299][23771] Loading model from checkpoint
	[2025-03-21 04:37:55,309][23771] Loaded experiment state at self.train_step=978, self.env_steps=4005888
	[2025-03-21 04:37:55,311][23771] Initialized policy 0 weights for model version 978
	[2025-03-21 04:37:55,317][23771] LearnerWorker_p0 finished initialization!
	[2025-03-21 04:37:55,320][23771] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:37:55,421][23576] Num frames 200...
	[2025-03-21 04:37:55,551][23771] Stopping LearnerWorker_p0...
	[2025-03-21 04:37:55,559][23771] Loop learner_proc0_evt_loop terminating...
	[2025-03-21 04:37:55,704][23576] Num frames 300...
	[2025-03-21 04:37:55,952][23576] Num frames 400...
	[2025-03-21 04:37:56,182][23576] Num frames 500...
	[2025-03-21 04:37:56,460][23576] Num frames 600...
	[2025-03-21 04:37:56,719][23576] Num frames 700...
	[2025-03-21 04:37:56,954][23576] Num frames 800...
	[2025-03-21 04:37:57,180][23576] Num frames 900...
	[2025-03-21 04:37:57,421][23576] Num frames 1000...
	[2025-03-21 04:37:57,663][23576] Num frames 1100...
	[2025-03-21 04:37:57,915][23576] Num frames 1200...
	[2025-03-21 04:37:58,111][23576] Num frames 1300...
	[2025-03-21 04:37:58,296][23576] Num frames 1400...
	[2025-03-21 04:37:58,481][23576] Num frames 1500...
	[2025-03-21 04:37:58,648][23576] Num frames 1600...
	[2025-03-21 04:37:58,837][23576] Num frames 1700...
	[2025-03-21 04:37:58,993][23576] Num frames 1800...
	[2025-03-21 04:37:59,215][23576] Avg episode rewards: #0: 43.889, true rewards: #0: 18.890
	[2025-03-21 04:37:59,216][23576] Avg episode reward: 43.889, avg true_objective: 18.890
	[2025-03-21 04:37:59,238][23576] Num frames 1900...
	[2025-03-21 04:37:59,427][23576] Num frames 2000...
	[2025-03-21 04:37:59,623][23576] Num frames 2100...
	[2025-03-21 04:37:59,886][23576] Num frames 2200...
	[2025-03-21 04:38:00,108][23576] Num frames 2300...
	[2025-03-21 04:38:00,302][23576] Num frames 2400...
	[2025-03-21 04:38:00,483][23576] Num frames 2500...
	[2025-03-21 04:38:00,660][23576] Num frames 2600...
	[2025-03-21 04:38:00,826][23576] Avg episode rewards: #0: 28.785, true rewards: #0: 13.285
	[2025-03-21 04:38:00,828][23576] Avg episode reward: 28.785, avg true_objective: 13.285
	[2025-03-21 04:38:00,911][23576] Num frames 2700...
	[2025-03-21 04:38:01,092][23576] Num frames 2800...
	[2025-03-21 04:38:01,288][23576] Num frames 2900...
	[2025-03-21 04:38:01,479][23576] Num frames 3000...
	[2025-03-21 04:38:01,666][23576] Num frames 3100...
	[2025-03-21 04:38:01,825][23576] Num frames 3200...
	[2025-03-21 04:38:01,962][23576] Num frames 3300...
	[2025-03-21 04:38:02,107][23576] Num frames 3400...
	[2025-03-21 04:38:02,237][23576] Num frames 3500...
	[2025-03-21 04:38:02,374][23576] Num frames 3600...
	[2025-03-21 04:38:02,508][23576] Num frames 3700...
	[2025-03-21 04:38:02,641][23576] Num frames 3800...
	[2025-03-21 04:38:02,779][23576] Num frames 3900...
	[2025-03-21 04:38:02,913][23576] Num frames 4000...
	[2025-03-21 04:38:03,056][23576] Num frames 4100...
	[2025-03-21 04:38:03,200][23576] Num frames 4200...
	[2025-03-21 04:38:03,337][23576] Num frames 4300...
	[2025-03-21 04:38:03,475][23576] Num frames 4400...
	[2025-03-21 04:38:03,608][23576] Num frames 4500...
	[2025-03-21 04:38:03,741][23576] Num frames 4600...
	[2025-03-21 04:38:03,878][23576] Num frames 4700...
	[2025-03-21 04:38:03,991][23576] Avg episode rewards: #0: 38.806, true rewards: #0: 15.807
	[2025-03-21 04:38:03,993][23576] Avg episode reward: 38.806, avg true_objective: 15.807
	[2025-03-21 04:38:04,077][23576] Num frames 4800...
	[2025-03-21 04:38:04,208][23576] Num frames 4900...
	[2025-03-21 04:38:04,338][23576] Num frames 5000...
	[2025-03-21 04:38:04,479][23576] Num frames 5100...
	[2025-03-21 04:38:04,611][23576] Num frames 5200...
	[2025-03-21 04:38:04,745][23576] Num frames 5300...
	[2025-03-21 04:38:04,879][23576] Num frames 5400...
	[2025-03-21 04:38:05,014][23576] Num frames 5500...
	[2025-03-21 04:38:05,156][23576] Num frames 5600...
	[2025-03-21 04:38:05,288][23576] Num frames 5700...
	[2025-03-21 04:38:05,425][23576] Num frames 5800...
	[2025-03-21 04:38:05,563][23576] Num frames 5900...
	[2025-03-21 04:38:05,695][23576] Num frames 6000...
	[2025-03-21 04:38:05,833][23576] Num frames 6100...
	[2025-03-21 04:38:05,971][23576] Num frames 6200...
	[2025-03-21 04:38:06,109][23576] Num frames 6300...
	[2025-03-21 04:38:06,247][23576] Num frames 6400...
	[2025-03-21 04:38:06,386][23576] Num frames 6500...
	[2025-03-21 04:38:06,526][23576] Num frames 6600...
	[2025-03-21 04:38:06,661][23576] Num frames 6700...
	[2025-03-21 04:38:06,799][23576] Num frames 6800...
	[2025-03-21 04:38:06,912][23576] Avg episode rewards: #0: 42.854, true rewards: #0: 17.105
	[2025-03-21 04:38:06,913][23576] Avg episode reward: 42.854, avg true_objective: 17.105
	[2025-03-21 04:38:06,991][23576] Num frames 6900...
	[2025-03-21 04:38:07,124][23576] Num frames 7000...
	[2025-03-21 04:38:07,265][23576] Num frames 7100...
	[2025-03-21 04:38:07,401][23576] Num frames 7200...
	[2025-03-21 04:38:07,534][23576] Num frames 7300...
	[2025-03-21 04:38:07,670][23576] Num frames 7400...
	[2025-03-21 04:38:07,806][23576] Num frames 7500...
	[2025-03-21 04:38:07,940][23576] Num frames 7600...
	[2025-03-21 04:38:08,073][23576] Num frames 7700...
	[2025-03-21 04:38:08,216][23576] Num frames 7800...
	[2025-03-21 04:38:08,355][23576] Num frames 7900...
	[2025-03-21 04:38:08,510][23576] Num frames 8000...
	[2025-03-21 04:38:08,646][23576] Num frames 8100...
	[2025-03-21 04:38:08,783][23576] Num frames 8200...
	[2025-03-21 04:38:08,929][23576] Num frames 8300...
	[2025-03-21 04:38:09,086][23576] Avg episode rewards: #0: 41.347, true rewards: #0: 16.748
	[2025-03-21 04:38:09,087][23576] Avg episode reward: 41.347, avg true_objective: 16.748
	[2025-03-21 04:38:09,123][23576] Num frames 8400...
	[2025-03-21 04:38:09,261][23576] Num frames 8500...
	[2025-03-21 04:38:09,398][23576] Num frames 8600...
	[2025-03-21 04:38:09,534][23576] Num frames 8700...
	[2025-03-21 04:38:09,667][23576] Num frames 8800...
	[2025-03-21 04:38:09,801][23576] Num frames 8900...
	[2025-03-21 04:38:09,932][23576] Num frames 9000...
	[2025-03-21 04:38:10,062][23576] Num frames 9100...
	[2025-03-21 04:38:10,198][23576] Num frames 9200...
	[2025-03-21 04:38:10,348][23576] Num frames 9300...
	[2025-03-21 04:38:10,492][23576] Num frames 9400...
	[2025-03-21 04:38:10,625][23576] Num frames 9500...
	[2025-03-21 04:38:10,759][23576] Num frames 9600...
	[2025-03-21 04:38:10,891][23576] Num frames 9700...
	[2025-03-21 04:38:10,994][23576] Avg episode rewards: #0: 40.055, true rewards: #0: 16.222
	[2025-03-21 04:38:10,995][23576] Avg episode reward: 40.055, avg true_objective: 16.222
	[2025-03-21 04:38:11,087][23576] Num frames 9800...
	[2025-03-21 04:38:11,220][23576] Num frames 9900...
	[2025-03-21 04:38:11,373][23576] Num frames 10000...
	[2025-03-21 04:38:11,509][23576] Num frames 10100...
	[2025-03-21 04:38:11,644][23576] Num frames 10200...
	[2025-03-21 04:38:11,789][23576] Num frames 10300...
	[2025-03-21 04:38:11,973][23576] Num frames 10400...
	[2025-03-21 04:38:12,155][23576] Num frames 10500...
	[2025-03-21 04:38:12,340][23576] Num frames 10600...
	[2025-03-21 04:38:12,521][23576] Num frames 10700...
	[2025-03-21 04:38:12,697][23576] Num frames 10800...
	[2025-03-21 04:38:12,878][23576] Num frames 10900...
	[2025-03-21 04:38:13,052][23576] Num frames 11000...
	[2025-03-21 04:38:13,242][23576] Num frames 11100...
	[2025-03-21 04:38:13,439][23576] Num frames 11200...
	[2025-03-21 04:38:13,630][23576] Num frames 11300...
	[2025-03-21 04:38:13,814][23576] Num frames 11400...
	[2025-03-21 04:38:14,008][23576] Num frames 11500...
	[2025-03-21 04:38:14,170][23576] Num frames 11600...
	[2025-03-21 04:38:14,306][23576] Num frames 11700...
	[2025-03-21 04:38:14,453][23576] Num frames 11800...
	[2025-03-21 04:38:14,527][23576] Avg episode rewards: #0: 40.875, true rewards: #0: 16.876
	[2025-03-21 04:38:14,528][23576] Avg episode reward: 40.875, avg true_objective: 16.876
	[2025-03-21 04:38:14,640][23576] Num frames 11900...
	[2025-03-21 04:38:14,776][23576] Num frames 12000...
	[2025-03-21 04:38:14,910][23576] Num frames 12100...
	[2025-03-21 04:38:15,044][23576] Num frames 12200...
	[2025-03-21 04:38:15,174][23576] Num frames 12300...
	[2025-03-21 04:38:15,306][23576] Num frames 12400...
	[2025-03-21 04:38:15,451][23576] Num frames 12500...
	[2025-03-21 04:38:15,537][23576] Avg episode rewards: #0: 37.528, true rewards: #0: 15.654
	[2025-03-21 04:38:15,537][23576] Avg episode reward: 37.528, avg true_objective: 15.654
	[2025-03-21 04:38:15,641][23576] Num frames 12600...
	[2025-03-21 04:38:15,775][23576] Num frames 12700...
	[2025-03-21 04:38:15,907][23576] Num frames 12800...
	[2025-03-21 04:38:16,040][23576] Num frames 12900...
	[2025-03-21 04:38:16,174][23576] Num frames 13000...
	[2025-03-21 04:38:16,312][23576] Num frames 13100...
	[2025-03-21 04:38:16,481][23576] Num frames 13200...
	[2025-03-21 04:38:16,685][23576] Num frames 13300...
	[2025-03-21 04:38:16,863][23576] Num frames 13400...
	[2025-03-21 04:38:17,006][23576] Num frames 13500...
	[2025-03-21 04:38:17,140][23576] Num frames 13600...
	[2025-03-21 04:38:17,271][23576] Num frames 13700...
	[2025-03-21 04:38:17,414][23576] Num frames 13800...
	[2025-03-21 04:38:17,521][23576] Avg episode rewards: #0: 36.705, true rewards: #0: 15.372
	[2025-03-21 04:38:17,522][23576] Avg episode reward: 36.705, avg true_objective: 15.372
	[2025-03-21 04:38:17,607][23576] Num frames 13900...
	[2025-03-21 04:38:17,737][23576] Num frames 14000...
	[2025-03-21 04:38:17,871][23576] Num frames 14100...
	[2025-03-21 04:38:18,004][23576] Num frames 14200...
	[2025-03-21 04:38:18,140][23576] Num frames 14300...
	[2025-03-21 04:38:18,273][23576] Num frames 14400...
	[2025-03-21 04:38:18,425][23576] Num frames 14500...
	[2025-03-21 04:38:18,571][23576] Num frames 14600...
	[2025-03-21 04:38:18,708][23576] Num frames 14700...
	[2025-03-21 04:38:18,850][23576] Num frames 14800...
	[2025-03-21 04:38:18,988][23576] Num frames 14900...
	[2025-03-21 04:38:19,121][23576] Num frames 15000...
	[2025-03-21 04:38:19,253][23576] Num frames 15100...
	[2025-03-21 04:38:19,416][23576] Avg episode rewards: #0: 35.579, true rewards: #0: 15.179
	[2025-03-21 04:38:19,417][23576] Avg episode reward: 35.579, avg true_objective: 15.179
	[2025-03-21 04:40:03,121][23576] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
	[2025-03-21 04:40:55,386][23576] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
	[2025-03-21 04:40:55,387][23576] Overriding arg 'num_workers' with value 1 passed from command line
	[2025-03-21 04:40:55,388][23576] Adding new argument 'no_render'=True that is not in the saved config file!
	[2025-03-21 04:40:55,389][23576] Adding new argument 'save_video'=True that is not in the saved config file!
	[2025-03-21 04:40:55,390][23576] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
	[2025-03-21 04:40:55,390][23576] Adding new argument 'video_name'=None that is not in the saved config file!
	[2025-03-21 04:40:55,391][23576] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
	[2025-03-21 04:40:55,392][23576] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
	[2025-03-21 04:40:55,393][23576] Adding new argument 'push_to_hub'=True that is not in the saved config file!
	[2025-03-21 04:40:55,394][23576] Adding new argument 'hf_repository'='Slyne/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
	[2025-03-21 04:40:55,395][23576] Adding new argument 'policy_index'=0 that is not in the saved config file!
	[2025-03-21 04:40:55,397][23576] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
	[2025-03-21 04:40:55,399][23576] Adding new argument 'train_script'=None that is not in the saved config file!
	[2025-03-21 04:40:55,400][23576] Adding new argument 'enjoy_script'=None that is not in the saved config file!
	[2025-03-21 04:40:55,401][23576] Using frameskip 1 and render_action_repeat=4 for evaluation
	[2025-03-21 04:40:55,427][23576] RunningMeanStd input shape: (3, 72, 128)
	[2025-03-21 04:40:55,428][23576] RunningMeanStd input shape: (1,)
	[2025-03-21 04:40:55,440][23576] ConvEncoder: input_channels=3
	[2025-03-21 04:40:55,475][23576] Conv encoder output size: 512
	[2025-03-21 04:40:55,476][23576] Policy head output size: 512
	[2025-03-21 04:40:55,494][23576] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
	[2025-03-21 04:40:55,959][23576] Num frames 100...
	[2025-03-21 04:40:56,130][23576] Num frames 200...
	[2025-03-21 04:40:56,385][23576] Num frames 300...
	[2025-03-21 04:40:56,820][23576] Num frames 400...
	[2025-03-21 04:40:57,159][23576] Num frames 500...
	[2025-03-21 04:40:57,694][23576] Num frames 600...
	[2025-03-21 04:40:58,123][23576] Num frames 700...
	[2025-03-21 04:40:58,441][23576] Avg episode rewards: #0: 14.680, true rewards: #0: 7.680
	[2025-03-21 04:40:58,442][23576] Avg episode reward: 14.680, avg true_objective: 7.680
	[2025-03-21 04:40:58,532][23576] Num frames 800...
	[2025-03-21 04:40:58,986][23576] Num frames 900...
	[2025-03-21 04:40:59,511][23576] Num frames 1000...
	[2025-03-21 04:40:59,872][23576] Num frames 1100...
	[2025-03-21 04:41:00,111][23576] Num frames 1200...
	[2025-03-21 04:41:00,400][23576] Num frames 1300...
	[2025-03-21 04:41:00,586][23576] Num frames 1400...
	[2025-03-21 04:41:00,878][23576] Num frames 1500...
	[2025-03-21 04:41:01,103][23576] Num frames 1600...
	[2025-03-21 04:41:01,289][23576] Num frames 1700...
	[2025-03-21 04:41:01,483][23576] Num frames 1800...
	[2025-03-21 04:41:01,613][23576] Avg episode rewards: #0: 19.280, true rewards: #0: 9.280
	[2025-03-21 04:41:01,614][23576] Avg episode reward: 19.280, avg true_objective: 9.280
	[2025-03-21 04:41:01,673][23576] Num frames 1900...
	[2025-03-21 04:41:01,806][23576] Num frames 2000...
	[2025-03-21 04:41:01,937][23576] Num frames 2100...
	[2025-03-21 04:41:02,076][23576] Num frames 2200...
	[2025-03-21 04:41:02,210][23576] Num frames 2300...
	[2025-03-21 04:41:02,342][23576] Num frames 2400...
	[2025-03-21 04:41:02,483][23576] Num frames 2500...
	[2025-03-21 04:41:02,615][23576] Num frames 2600...
	[2025-03-21 04:41:02,750][23576] Num frames 2700...
	[2025-03-21 04:41:02,886][23576] Num frames 2800...
	[2025-03-21 04:41:03,024][23576] Num frames 2900...
	[2025-03-21 04:41:03,172][23576] Num frames 3000...
	[2025-03-21 04:41:03,309][23576] Num frames 3100...
	[2025-03-21 04:41:03,455][23576] Num frames 3200...
	[2025-03-21 04:41:03,591][23576] Num frames 3300...
	[2025-03-21 04:41:03,725][23576] Num frames 3400...
	[2025-03-21 04:41:03,864][23576] Num frames 3500...
	[2025-03-21 04:41:03,997][23576] Num frames 3600...
	[2025-03-21 04:41:04,130][23576] Num frames 3700...
	[2025-03-21 04:41:04,311][23576] Avg episode rewards: #0: 28.313, true rewards: #0: 12.647
	[2025-03-21 04:41:04,312][23576] Avg episode reward: 28.313, avg true_objective: 12.647
	[2025-03-21 04:41:04,322][23576] Num frames 3800...
	[2025-03-21 04:41:04,465][23576] Num frames 3900...
	[2025-03-21 04:41:04,601][23576] Num frames 4000...
	[2025-03-21 04:41:04,734][23576] Num frames 4100...
	[2025-03-21 04:41:04,869][23576] Num frames 4200...
	[2025-03-21 04:41:05,002][23576] Num frames 4300...
	[2025-03-21 04:41:05,134][23576] Num frames 4400...
	[2025-03-21 04:41:05,235][23576] Avg episode rewards: #0: 23.835, true rewards: #0: 11.085
	[2025-03-21 04:41:05,236][23576] Avg episode reward: 23.835, avg true_objective: 11.085
	[2025-03-21 04:41:05,324][23576] Num frames 4500...
	[2025-03-21 04:41:05,468][23576] Num frames 4600...
	[2025-03-21 04:41:05,607][23576] Num frames 4700...
	[2025-03-21 04:41:05,743][23576] Num frames 4800...
	[2025-03-21 04:41:05,875][23576] Num frames 4900...
	[2025-03-21 04:41:06,011][23576] Num frames 5000...
	[2025-03-21 04:41:06,146][23576] Num frames 5100...
	[2025-03-21 04:41:06,278][23576] Num frames 5200...
	[2025-03-21 04:41:06,417][23576] Num frames 5300...
	[2025-03-21 04:41:06,560][23576] Num frames 5400...
	[2025-03-21 04:41:06,701][23576] Num frames 5500...
	[2025-03-21 04:41:06,761][23576] Avg episode rewards: #0: 23.606, true rewards: #0: 11.006
	[2025-03-21 04:41:06,762][23576] Avg episode reward: 23.606, avg true_objective: 11.006
	[2025-03-21 04:41:06,892][23576] Num frames 5600...
	[2025-03-21 04:41:07,028][23576] Num frames 5700...
	[2025-03-21 04:41:07,166][23576] Num frames 5800...
	[2025-03-21 04:41:07,313][23576] Avg episode rewards: #0: 20.782, true rewards: #0: 9.782
	[2025-03-21 04:41:07,314][23576] Avg episode reward: 20.782, avg true_objective: 9.782
	[2025-03-21 04:41:07,357][23576] Num frames 5900...
	[2025-03-21 04:41:07,491][23576] Num frames 6000...
	[2025-03-21 04:41:07,631][23576] Num frames 6100...
	[2025-03-21 04:41:07,766][23576] Num frames 6200...
	[2025-03-21 04:41:07,903][23576] Num frames 6300...
	[2025-03-21 04:41:08,036][23576] Num frames 6400...
	[2025-03-21 04:41:08,112][23576] Avg episode rewards: #0: 19.593, true rewards: #0: 9.164
	[2025-03-21 04:41:08,112][23576] Avg episode reward: 19.593, avg true_objective: 9.164
	[2025-03-21 04:41:08,226][23576] Num frames 6500...
	[2025-03-21 04:41:08,369][23576] Num frames 6600...
	[2025-03-21 04:41:08,521][23576] Num frames 6700...
	[2025-03-21 04:41:08,717][23576] Num frames 6800...
	[2025-03-21 04:41:08,912][23576] Num frames 6900...
	[2025-03-21 04:41:09,071][23576] Num frames 7000...
	[2025-03-21 04:41:09,218][23576] Num frames 7100...
	[2025-03-21 04:41:09,364][23576] Num frames 7200...
	[2025-03-21 04:41:09,533][23576] Num frames 7300...
	[2025-03-21 04:41:09,733][23576] Num frames 7400...
	[2025-03-21 04:41:09,913][23576] Num frames 7500...
	[2025-03-21 04:41:10,089][23576] Num frames 7600...
	[2025-03-21 04:41:10,269][23576] Num frames 7700...
	[2025-03-21 04:41:10,449][23576] Num frames 7800...
	[2025-03-21 04:41:10,629][23576] Num frames 7900...
	[2025-03-21 04:41:10,760][23576] Avg episode rewards: #0: 21.174, true rewards: #0: 9.924
	[2025-03-21 04:41:10,762][23576] Avg episode reward: 21.174, avg true_objective: 9.924
	[2025-03-21 04:41:10,874][23576] Num frames 8000...
	[2025-03-21 04:41:11,062][23576] Num frames 8100...
	[2025-03-21 04:41:11,246][23576] Num frames 8200...
	[2025-03-21 04:41:11,444][23576] Num frames 8300...
	[2025-03-21 04:41:11,641][23576] Num frames 8400...
	[2025-03-21 04:41:11,841][23576] Num frames 8500...
	[2025-03-21 04:41:11,984][23576] Num frames 8600...
	[2025-03-21 04:41:12,120][23576] Num frames 8700...
	[2025-03-21 04:41:12,254][23576] Num frames 8800...
	[2025-03-21 04:41:12,355][23576] Avg episode rewards: #0: 21.039, true rewards: #0: 9.817
	[2025-03-21 04:41:12,356][23576] Avg episode reward: 21.039, avg true_objective: 9.817
	[2025-03-21 04:41:12,445][23576] Num frames 8900...
	[2025-03-21 04:41:12,578][23576] Num frames 9000...
	[2025-03-21 04:41:12,719][23576] Num frames 9100...
	[2025-03-21 04:41:12,855][23576] Num frames 9200...
	[2025-03-21 04:41:12,990][23576] Num frames 9300...
	[2025-03-21 04:41:13,128][23576] Num frames 9400...
	[2025-03-21 04:41:13,265][23576] Num frames 9500...
	[2025-03-21 04:41:13,405][23576] Num frames 9600...
	[2025-03-21 04:41:13,539][23576] Num frames 9700...
	[2025-03-21 04:41:13,675][23576] Num frames 9800...
	[2025-03-21 04:41:13,818][23576] Num frames 9900...
	[2025-03-21 04:41:13,950][23576] Num frames 10000...
	[2025-03-21 04:41:14,083][23576] Num frames 10100...
	[2025-03-21 04:41:14,214][23576] Num frames 10200...
	[2025-03-21 04:41:14,352][23576] Num frames 10300...
	[2025-03-21 04:41:14,489][23576] Num frames 10400...
	[2025-03-21 04:41:14,627][23576] Num frames 10500...
	[2025-03-21 04:41:14,772][23576] Num frames 10600...
	[2025-03-21 04:41:14,912][23576] Num frames 10700...
	[2025-03-21 04:41:15,048][23576] Num frames 10800...
	[2025-03-21 04:41:15,185][23576] Num frames 10900...
	[2025-03-21 04:41:15,288][23576] Avg episode rewards: #0: 25.135, true rewards: #0: 10.935
	[2025-03-21 04:41:15,289][23576] Avg episode reward: 25.135, avg true_objective: 10.935
	[2025-03-21 04:42:27,673][23576] Replay video saved to /content/train_dir/default_experiment/replay.mp4!