[2025-03-06 13:01:21,323][00031] Saving configuration to /kaggle/working/train_dir/default_experiment/config.json...
[2025-03-06 13:01:21,325][00031] Rollout worker 0 uses device cpu
[2025-03-06 13:01:21,326][00031] Rollout worker 1 uses device cpu
[2025-03-06 13:01:21,326][00031] Rollout worker 2 uses device cpu
[2025-03-06 13:01:21,327][00031] Rollout worker 3 uses device cpu
[2025-03-06 13:01:21,328][00031] Rollout worker 4 uses device cpu
[2025-03-06 13:01:21,329][00031] Rollout worker 5 uses device cpu
[2025-03-06 13:01:21,330][00031] Rollout worker 6 uses device cpu
[2025-03-06 13:01:21,331][00031] Rollout worker 7 uses device cpu
[2025-03-06 13:01:21,468][00031] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-06 13:01:21,468][00031] InferenceWorker_p0-w0: min num requests: 2
[2025-03-06 13:01:21,513][00031] Starting all processes...
[2025-03-06 13:01:21,513][00031] Starting process learner_proc0
[2025-03-06 13:01:21,609][00031] Starting all processes...
[2025-03-06 13:01:21,617][00031] Starting process inference_proc0-0
[2025-03-06 13:01:21,618][00031] Starting process rollout_proc0
[2025-03-06 13:01:21,619][00031] Starting process rollout_proc1
[2025-03-06 13:01:21,619][00031] Starting process rollout_proc2
[2025-03-06 13:01:21,619][00031] Starting process rollout_proc3
[2025-03-06 13:01:21,620][00031] Starting process rollout_proc4
[2025-03-06 13:01:21,621][00031] Starting process rollout_proc5
[2025-03-06 13:01:21,622][00031] Starting process rollout_proc6
[2025-03-06 13:01:21,627][00031] Starting process rollout_proc7
[2025-03-06 13:01:30,327][00184] Worker 2 uses CPU cores [2]
[2025-03-06 13:01:30,330][00186] Worker 5 uses CPU cores [1]
[2025-03-06 13:01:30,418][00181] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-06 13:01:30,419][00181] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-03-06 13:01:30,420][00180] Worker 0 uses CPU cores [0]
[2025-03-06 13:01:30,467][00181] Num visible devices: 1
[2025-03-06 13:01:30,551][00183] Worker 3 uses CPU cores [3]
[2025-03-06 13:01:30,580][00167] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-06 13:01:30,581][00167] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-03-06 13:01:30,592][00187] Worker 6 uses CPU cores [2]
[2025-03-06 13:01:30,614][00167] Num visible devices: 1
[2025-03-06 13:01:30,625][00167] Starting seed is not provided
[2025-03-06 13:01:30,625][00167] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-06 13:01:30,626][00167] Initializing actor-critic model on device cuda:0
[2025-03-06 13:01:30,626][00167] RunningMeanStd input shape: (3, 72, 128)
[2025-03-06 13:01:30,632][00167] RunningMeanStd input shape: (1,)
[2025-03-06 13:01:30,674][00167] ConvEncoder: input_channels=3
[2025-03-06 13:01:30,705][00182] Worker 1 uses CPU cores [1]
[2025-03-06 13:01:30,838][00185] Worker 4 uses CPU cores [0]
[2025-03-06 13:01:30,872][00188] Worker 7 uses CPU cores [3]
[2025-03-06 13:01:30,956][00167] Conv encoder output size: 512
[2025-03-06 13:01:30,956][00167] Policy head output size: 512
[2025-03-06 13:01:31,019][00167] Created Actor Critic model with architecture:
[2025-03-06 13:01:31,020][00167] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
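For reference, the module tree printed above corresponds roughly to the following standalone PyTorch sketch. It is an illustrative approximation, not Sample Factory's implementation: the conv kernel sizes and strides are assumptions, and the RunningMeanStd observation/returns normalizers are omitted. The log itself only pins down three Conv2d+ELU stages, a Linear+ELU projection to a 512-dim encoder output, a GRU(512, 512) core, a 512->1 value head, and a 512->5 action-logits head over (3, 72, 128) observations.

import torch
import torch.nn as nn

class ActorCriticSketch(nn.Module):
    def __init__(self, obs_shape=(3, 72, 128), num_actions=5, hidden=512):
        super().__init__()
        c, h, w = obs_shape
        # conv_head: three Conv2d + ELU stages (kernel sizes/strides assumed)
        self.conv_head = nn.Sequential(
            nn.Conv2d(c, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        # infer the flattened conv output size with a dummy forward pass
        with torch.no_grad():
            flat = self.conv_head(torch.zeros(1, c, h, w)).flatten(1).shape[1]
        # mlp_layers: Linear + ELU projecting to the 512-dim encoder output
        self.mlp_layers = nn.Sequential(nn.Linear(flat, hidden), nn.ELU())
        self.core = nn.GRU(hidden, hidden)                           # GRU(512, 512) core
        self.critic_linear = nn.Linear(hidden, 1)                    # value head
        self.distribution_linear = nn.Linear(hidden, num_actions)    # action logits

    def forward(self, obs, rnn_state):
        # obs: (B, 3, 72, 128) normalized frames; rnn_state: (1, B, 512)
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state

model = ActorCriticSketch()
logits, value, h = model(torch.zeros(4, 3, 72, 128), torch.zeros(1, 4, 512))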
[2025-03-06 13:01:31,349][00167] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-03-06 13:01:33,272][00167] No checkpoints found
[2025-03-06 13:01:33,272][00167] Did not load from checkpoint, starting from scratch!
[2025-03-06 13:01:33,273][00167] Initialized policy 0 weights for model version 0
[2025-03-06 13:01:33,278][00167] LearnerWorker_p0 finished initialization!
[2025-03-06 13:01:33,278][00167] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-06 13:01:33,383][00181] RunningMeanStd input shape: (3, 72, 128)
[2025-03-06 13:01:33,384][00181] RunningMeanStd input shape: (1,)
[2025-03-06 13:01:33,397][00181] ConvEncoder: input_channels=3
[2025-03-06 13:01:33,517][00181] Conv encoder output size: 512
[2025-03-06 13:01:33,517][00181] Policy head output size: 512
[2025-03-06 13:01:33,602][00031] Inference worker 0-0 is ready!
[2025-03-06 13:01:33,603][00031] All inference workers are ready! Signal rollout workers to start!
[2025-03-06 13:01:33,727][00180] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:33,730][00182] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:33,731][00184] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:33,729][00188] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:33,729][00185] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:33,732][00186] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:33,734][00183] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:33,735][00187] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:01:34,378][00184] Decorrelating experience for 0 frames...
[2025-03-06 13:01:34,809][00180] Decorrelating experience for 0 frames...
[2025-03-06 13:01:34,810][00185] Decorrelating experience for 0 frames...
[2025-03-06 13:01:34,826][00183] Decorrelating experience for 0 frames...
[2025-03-06 13:01:34,835][00186] Decorrelating experience for 0 frames...
[2025-03-06 13:01:34,838][00188] Decorrelating experience for 0 frames...
[2025-03-06 13:01:34,837][00182] Decorrelating experience for 0 frames...
[2025-03-06 13:01:34,940][00184] Decorrelating experience for 32 frames...
[2025-03-06 13:01:35,285][00185] Decorrelating experience for 32 frames...
[2025-03-06 13:01:35,511][00184] Decorrelating experience for 64 frames...
[2025-03-06 13:01:35,734][00186] Decorrelating experience for 32 frames...
[2025-03-06 13:01:35,768][00183] Decorrelating experience for 32 frames...
[2025-03-06 13:01:35,771][00188] Decorrelating experience for 32 frames...
[2025-03-06 13:01:35,926][00182] Decorrelating experience for 32 frames...
[2025-03-06 13:01:36,113][00184] Decorrelating experience for 96 frames...
[2025-03-06 13:01:36,362][00185] Decorrelating experience for 64 frames...
[2025-03-06 13:01:36,504][00031] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-06 13:01:36,623][00180] Decorrelating experience for 32 frames...
[2025-03-06 13:01:36,662][00183] Decorrelating experience for 64 frames...
[2025-03-06 13:01:36,996][00186] Decorrelating experience for 64 frames...
[2025-03-06 13:01:37,156][00182] Decorrelating experience for 64 frames...
[2025-03-06 13:01:37,168][00187] Decorrelating experience for 0 frames...
[2025-03-06 13:01:37,378][00185] Decorrelating experience for 96 frames...
[2025-03-06 13:01:37,651][00187] Decorrelating experience for 32 frames...
[2025-03-06 13:01:37,794][00180] Decorrelating experience for 64 frames...
[2025-03-06 13:01:38,037][00186] Decorrelating experience for 96 frames...
[2025-03-06 13:01:38,150][00188] Decorrelating experience for 64 frames...
[2025-03-06 13:01:38,202][00182] Decorrelating experience for 96 frames...
[2025-03-06 13:01:38,379][00183] Decorrelating experience for 96 frames...
[2025-03-06 13:01:38,526][00187] Decorrelating experience for 64 frames...
[2025-03-06 13:01:38,997][00180] Decorrelating experience for 96 frames...
[2025-03-06 13:01:39,468][00187] Decorrelating experience for 96 frames...
[2025-03-06 13:01:40,058][00188] Decorrelating experience for 96 frames...
[2025-03-06 13:01:40,133][00167] Signal inference workers to stop experience collection...
[2025-03-06 13:01:40,151][00181] InferenceWorker_p0-w0: stopping experience collection
[2025-03-06 13:01:41,456][00031] Heartbeat connected on Batcher_0
[2025-03-06 13:01:41,468][00031] Heartbeat connected on InferenceWorker_p0-w0
[2025-03-06 13:01:41,475][00031] Heartbeat connected on RolloutWorker_w0
[2025-03-06 13:01:41,481][00031] Heartbeat connected on RolloutWorker_w1
[2025-03-06 13:01:41,486][00031] Heartbeat connected on RolloutWorker_w2
[2025-03-06 13:01:41,491][00031] Heartbeat connected on RolloutWorker_w3
[2025-03-06 13:01:41,496][00031] Heartbeat connected on RolloutWorker_w4
[2025-03-06 13:01:41,503][00031] Heartbeat connected on RolloutWorker_w5
[2025-03-06 13:01:41,504][00031] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 357.2. Samples: 1786. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-06 13:01:41,505][00031] Avg episode reward: [(0, '2.741')]
[2025-03-06 13:01:41,507][00031] Heartbeat connected on RolloutWorker_w6
[2025-03-06 13:01:41,512][00031] Heartbeat connected on RolloutWorker_w7
[2025-03-06 13:01:42,605][00167] Signal inference workers to resume experience collection...
[2025-03-06 13:01:42,606][00181] InferenceWorker_p0-w0: resuming experience collection
[2025-03-06 13:01:43,217][00031] Heartbeat connected on LearnerWorker_p0
[2025-03-06 13:01:46,504][00031] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 32768. Throughput: 0: 781.2. Samples: 7812. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:01:46,508][00031] Avg episode reward: [(0, '3.847')]
[2025-03-06 13:01:47,317][00181] Updated weights for policy 0, policy_version 10 (0.0160)
[2025-03-06 13:01:51,504][00031] Fps is (10 sec: 7372.8, 60 sec: 4915.2, 300 sec: 4915.2). Total num frames: 73728. Throughput: 0: 908.5. Samples: 13628. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:01:51,506][00031] Avg episode reward: [(0, '4.311')]
[2025-03-06 13:01:51,982][00181] Updated weights for policy 0, policy_version 20 (0.0018)
[2025-03-06 13:01:56,504][00031] Fps is (10 sec: 8601.7, 60 sec: 5939.2, 300 sec: 5939.2). Total num frames: 118784. Throughput: 0: 1334.3. Samples: 26686. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-06 13:01:56,506][00031] Avg episode reward: [(0, '4.415')]
[2025-03-06 13:01:56,515][00167] Saving new best policy, reward=4.415!
[2025-03-06 13:01:56,765][00181] Updated weights for policy 0, policy_version 30 (0.0017)
[2025-03-06 13:02:01,504][00031] Fps is (10 sec: 8601.7, 60 sec: 6389.8, 300 sec: 6389.8). Total num frames: 159744. Throughput: 0: 1581.1. Samples: 39528. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-06 13:02:01,506][00031] Avg episode reward: [(0, '4.598')]
[2025-03-06 13:02:01,507][00167] Saving new best policy, reward=4.598!
[2025-03-06 13:02:01,686][00181] Updated weights for policy 0, policy_version 40 (0.0017)
[2025-03-06 13:02:06,504][00031] Fps is (10 sec: 7782.4, 60 sec: 6553.6, 300 sec: 6553.6). Total num frames: 196608. Throughput: 0: 1497.7. Samples: 44932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:02:06,507][00031] Avg episode reward: [(0, '4.474')]
[2025-03-06 13:02:07,080][00181] Updated weights for policy 0, policy_version 50 (0.0018)
[2025-03-06 13:02:11,504][00031] Fps is (10 sec: 8192.0, 60 sec: 6904.7, 300 sec: 6904.7). Total num frames: 241664. Throughput: 0: 1643.1. Samples: 57510. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:02:11,506][00031] Avg episode reward: [(0, '4.303')]
[2025-03-06 13:02:11,692][00181] Updated weights for policy 0, policy_version 60 (0.0019)
[2025-03-06 13:02:16,353][00181] Updated weights for policy 0, policy_version 70 (0.0018)
[2025-03-06 13:02:16,504][00031] Fps is (10 sec: 9011.2, 60 sec: 7168.0, 300 sec: 7168.0). Total num frames: 286720. Throughput: 0: 1766.1. Samples: 70644. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:02:16,506][00031] Avg episode reward: [(0, '4.573')]
[2025-03-06 13:02:20,986][00181] Updated weights for policy 0, policy_version 80 (0.0022)
[2025-03-06 13:02:21,504][00031] Fps is (10 sec: 9011.2, 60 sec: 7372.8, 300 sec: 7372.8). Total num frames: 331776. Throughput: 0: 1717.3. Samples: 77280. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:02:21,506][00031] Avg episode reward: [(0, '4.501')]
[2025-03-06 13:02:25,827][00181] Updated weights for policy 0, policy_version 90 (0.0019)
[2025-03-06 13:02:26,504][00031] Fps is (10 sec: 8601.6, 60 sec: 7454.7, 300 sec: 7454.7). Total num frames: 372736. Throughput: 0: 1970.8. Samples: 90470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:02:26,506][00031] Avg episode reward: [(0, '4.573')]
[2025-03-06 13:02:30,478][00181] Updated weights for policy 0, policy_version 100 (0.0020)
[2025-03-06 13:02:31,505][00031] Fps is (10 sec: 8600.9, 60 sec: 7596.1, 300 sec: 7596.1). Total num frames: 417792. Throughput: 0: 2123.3. Samples: 103364. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-06 13:02:31,506][00031] Avg episode reward: [(0, '5.086')]
[2025-03-06 13:02:31,510][00167] Saving new best policy, reward=5.086!
[2025-03-06 13:02:35,402][00181] Updated weights for policy 0, policy_version 110 (0.0019)
[2025-03-06 13:02:36,506][00031] Fps is (10 sec: 8190.5, 60 sec: 7577.4, 300 sec: 7577.4). Total num frames: 454656. Throughput: 0: 2140.5. Samples: 109954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:02:36,509][00031] Avg episode reward: [(0, '5.144')]
[2025-03-06 13:02:36,514][00167] Saving new best policy, reward=5.144!
[2025-03-06 13:02:40,523][00181] Updated weights for policy 0, policy_version 120 (0.0023)
[2025-03-06 13:02:41,504][00031] Fps is (10 sec: 8192.7, 60 sec: 8328.6, 300 sec: 7687.9). Total num frames: 499712. Throughput: 0: 2106.5. Samples: 121480. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:02:41,506][00031] Avg episode reward: [(0, '5.039')]
[2025-03-06 13:02:45,163][00181] Updated weights for policy 0, policy_version 130 (0.0018)
[2025-03-06 13:02:46,504][00031] Fps is (10 sec: 8603.2, 60 sec: 8465.1, 300 sec: 7723.9). Total num frames: 540672. Throughput: 0: 2114.0. Samples: 134660. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:02:46,506][00031] Avg episode reward: [(0, '4.850')]
[2025-03-06 13:02:49,887][00181] Updated weights for policy 0, policy_version 140 (0.0021)
[2025-03-06 13:02:51,504][00031] Fps is (10 sec: 8601.3, 60 sec: 8533.3, 300 sec: 7809.7). Total num frames: 585728. Throughput: 0: 2140.7. Samples: 141262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:02:51,507][00031] Avg episode reward: [(0, '5.259')]
[2025-03-06 13:02:51,509][00167] Saving new best policy, reward=5.259!
[2025-03-06 13:02:54,577][00181] Updated weights for policy 0, policy_version 150 (0.0019)
[2025-03-06 13:02:56,505][00031] Fps is (10 sec: 9010.4, 60 sec: 8533.2, 300 sec: 7884.7). Total num frames: 630784. Throughput: 0: 2152.0. Samples: 154352. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:02:56,507][00031] Avg episode reward: [(0, '5.256')]
[2025-03-06 13:02:59,351][00181] Updated weights for policy 0, policy_version 160 (0.0022)
[2025-03-06 13:03:01,504][00031] Fps is (10 sec: 8601.8, 60 sec: 8533.3, 300 sec: 7902.9). Total num frames: 671744. Throughput: 0: 2150.0. Samples: 167392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:03:01,506][00031] Avg episode reward: [(0, '5.256')]
[2025-03-06 13:03:03,976][00181] Updated weights for policy 0, policy_version 170 (0.0019)
[2025-03-06 13:03:06,504][00031] Fps is (10 sec: 8602.4, 60 sec: 8669.9, 300 sec: 7964.4). Total num frames: 716800. Throughput: 0: 2146.7. Samples: 173882. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-06 13:03:06,506][00031] Avg episode reward: [(0, '5.727')]
[2025-03-06 13:03:06,513][00167] Saving new best policy, reward=5.727!
[2025-03-06 13:03:09,232][00181] Updated weights for policy 0, policy_version 180 (0.0021)
[2025-03-06 13:03:11,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 7933.3). Total num frames: 753664. Throughput: 0: 2114.2. Samples: 185610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:03:11,506][00031] Avg episode reward: [(0, '6.364')]
[2025-03-06 13:03:11,511][00167] Saving new best policy, reward=6.364!
[2025-03-06 13:03:14,092][00181] Updated weights for policy 0, policy_version 190 (0.0023)
[2025-03-06 13:03:16,504][00031] Fps is (10 sec: 7782.4, 60 sec: 8465.1, 300 sec: 7946.2). Total num frames: 794624. Throughput: 0: 2112.7. Samples: 198432. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:03:16,506][00031] Avg episode reward: [(0, '5.609')]
[2025-03-06 13:03:16,529][00167] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth...
[2025-03-06 13:03:18,854][00181] Updated weights for policy 0, policy_version 200 (0.0018)
[2025-03-06 13:03:21,504][00031] Fps is (10 sec: 8601.7, 60 sec: 8465.1, 300 sec: 7997.0). Total num frames: 839680. Throughput: 0: 2112.4. Samples: 205006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:03:21,508][00031] Avg episode reward: [(0, '6.364')]
[2025-03-06 13:03:23,442][00181] Updated weights for policy 0, policy_version 210 (0.0018)
[2025-03-06 13:03:26,504][00031] Fps is (10 sec: 9011.2, 60 sec: 8533.3, 300 sec: 8043.1). Total num frames: 884736. Throughput: 0: 2150.4. Samples: 218250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:03:26,505][00031] Avg episode reward: [(0, '7.231')]
[2025-03-06 13:03:26,514][00167] Saving new best policy, reward=7.231!
[2025-03-06 13:03:28,196][00181] Updated weights for policy 0, policy_version 220 (0.0017)
[2025-03-06 13:03:31,504][00031] Fps is (10 sec: 9011.1, 60 sec: 8533.5, 300 sec: 8085.2). Total num frames: 929792. Throughput: 0: 2145.3. Samples: 231198. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:03:31,505][00031] Avg episode reward: [(0, '6.982')]
[2025-03-06 13:03:32,811][00181] Updated weights for policy 0, policy_version 230 (0.0021)
[2025-03-06 13:03:36,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.9, 300 sec: 8089.6). Total num frames: 970752. Throughput: 0: 2147.9. Samples: 237918. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-06 13:03:36,507][00031] Avg episode reward: [(0, '7.616')]
[2025-03-06 13:03:36,538][00167] Saving new best policy, reward=7.616!
[2025-03-06 13:03:37,508][00181] Updated weights for policy 0, policy_version 240 (0.0017)
[2025-03-06 13:03:41,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 8093.7). Total num frames: 1011712. Throughput: 0: 2147.0. Samples: 250964. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:03:41,506][00031] Avg episode reward: [(0, '7.940')]
[2025-03-06 13:03:41,511][00167] Saving new best policy, reward=7.940!
[2025-03-06 13:03:42,892][00181] Updated weights for policy 0, policy_version 250 (0.0019)
[2025-03-06 13:03:46,504][00031] Fps is (10 sec: 8191.8, 60 sec: 8533.3, 300 sec: 8097.5). Total num frames: 1052672. Throughput: 0: 2111.5. Samples: 262412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:03:46,506][00031] Avg episode reward: [(0, '7.947')]
[2025-03-06 13:03:46,514][00167] Saving new best policy, reward=7.947!
[2025-03-06 13:03:47,730][00181] Updated weights for policy 0, policy_version 260 (0.0021)
[2025-03-06 13:03:51,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8533.4, 300 sec: 8131.3). Total num frames: 1097728. Throughput: 0: 2108.6. Samples: 268770. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-06 13:03:51,506][00031] Avg episode reward: [(0, '9.377')]
[2025-03-06 13:03:51,512][00167] Saving new best policy, reward=9.377!
[2025-03-06 13:03:52,502][00181] Updated weights for policy 0, policy_version 270 (0.0019)
[2025-03-06 13:03:56,504][00031] Fps is (10 sec: 8601.8, 60 sec: 8465.2, 300 sec: 8133.5). Total num frames: 1138688. Throughput: 0: 2139.6. Samples: 281890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:03:56,506][00031] Avg episode reward: [(0, '8.659')]
[2025-03-06 13:03:57,155][00181] Updated weights for policy 0, policy_version 280 (0.0024)
[2025-03-06 13:04:01,505][00031] Fps is (10 sec: 8601.1, 60 sec: 8533.3, 300 sec: 8163.7). Total num frames: 1183744. Throughput: 0: 2144.2. Samples: 294922. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:04:01,506][00031] Avg episode reward: [(0, '10.756')]
[2025-03-06 13:04:01,508][00167] Saving new best policy, reward=10.756!
[2025-03-06 13:04:01,866][00181] Updated weights for policy 0, policy_version 290 (0.0021)
[2025-03-06 13:04:06,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8465.1, 300 sec: 8164.7). Total num frames: 1224704. Throughput: 0: 2142.3. Samples: 301412. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:04:06,506][00031] Avg episode reward: [(0, '10.491')]
[2025-03-06 13:04:06,613][00181] Updated weights for policy 0, policy_version 300 (0.0020)
[2025-03-06 13:04:11,274][00181] Updated weights for policy 0, policy_version 310 (0.0019)
[2025-03-06 13:04:11,505][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.5, 300 sec: 8192.0). Total num frames: 1269760. Throughput: 0: 2139.0. Samples: 314504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-06 13:04:11,506][00031] Avg episode reward: [(0, '10.344')]
[2025-03-06 13:04:16,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 8166.4). Total num frames: 1306624. Throughput: 0: 2109.9. Samples: 326144. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-06 13:04:16,507][00031] Avg episode reward: [(0, '9.798')]
[2025-03-06 13:04:16,675][00181] Updated weights for policy 0, policy_version 320 (0.0021)
[2025-03-06 13:04:21,187][00181] Updated weights for policy 0, policy_version 330 (0.0016)
[2025-03-06 13:04:21,504][00031] Fps is (10 sec: 8192.3, 60 sec: 8533.3, 300 sec: 8192.0). Total num frames: 1351680. Throughput: 0: 2107.1. Samples: 332736. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-03-06 13:04:21,506][00031] Avg episode reward: [(0, '10.601')]
[2025-03-06 13:04:26,018][00181] Updated weights for policy 0, policy_version 340 (0.0020)
[2025-03-06 13:04:26,504][00031] Fps is (10 sec: 9011.2, 60 sec: 8533.3, 300 sec: 8216.1). Total num frames: 1396736. Throughput: 0: 2112.3. Samples: 346016. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:04:26,506][00031] Avg episode reward: [(0, '12.147')]
[2025-03-06 13:04:26,514][00167] Saving new best policy, reward=12.147!
[2025-03-06 13:04:30,589][00181] Updated weights for policy 0, policy_version 350 (0.0023)
[2025-03-06 13:04:31,504][00031] Fps is (10 sec: 9011.1, 60 sec: 8533.3, 300 sec: 8238.8). Total num frames: 1441792. Throughput: 0: 2148.9. Samples: 359114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:04:31,507][00031] Avg episode reward: [(0, '11.999')]
[2025-03-06 13:04:35,256][00181] Updated weights for policy 0, policy_version 360 (0.0016)
[2025-03-06 13:04:36,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8237.5). Total num frames: 1482752. Throughput: 0: 2154.6. Samples: 365728. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-06 13:04:36,506][00031] Avg episode reward: [(0, '12.190')]
[2025-03-06 13:04:36,516][00167] Saving new best policy, reward=12.190!
[2025-03-06 13:04:40,006][00181] Updated weights for policy 0, policy_version 370 (0.0018)
[2025-03-06 13:04:41,504][00031] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8258.4). Total num frames: 1527808. Throughput: 0: 2153.7. Samples: 378808. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:04:41,507][00031] Avg episode reward: [(0, '14.363')]
[2025-03-06 13:04:41,509][00167] Saving new best policy, reward=14.363!
[2025-03-06 13:04:44,488][00181] Updated weights for policy 0, policy_version 380 (0.0020)
[2025-03-06 13:04:46,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8256.7). Total num frames: 1568768. Throughput: 0: 2155.8. Samples: 391934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-06 13:04:46,505][00031] Avg episode reward: [(0, '15.765')]
[2025-03-06 13:04:46,515][00167] Saving new best policy, reward=15.765!
[2025-03-06 13:04:49,953][00181] Updated weights for policy 0, policy_version 390 (0.0017)
[2025-03-06 13:04:51,504][00031] Fps is (10 sec: 8191.7, 60 sec: 8533.3, 300 sec: 8255.0). Total num frames: 1609728. Throughput: 0: 2126.6. Samples: 397108. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:04:51,508][00031] Avg episode reward: [(0, '16.518')]
[2025-03-06 13:04:51,511][00167] Saving new best policy, reward=16.518!
[2025-03-06 13:04:54,573][00181] Updated weights for policy 0, policy_version 400 (0.0020)
[2025-03-06 13:04:56,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8273.9). Total num frames: 1654784. Throughput: 0: 2128.7. Samples: 410294. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:04:56,505][00031] Avg episode reward: [(0, '16.343')]
[2025-03-06 13:04:59,240][00181] Updated weights for policy 0, policy_version 410 (0.0016)
[2025-03-06 13:05:01,504][00031] Fps is (10 sec: 9011.5, 60 sec: 8601.7, 300 sec: 8291.9). Total num frames: 1699840. Throughput: 0: 2164.0. Samples: 423524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:05:01,507][00031] Avg episode reward: [(0, '16.923')]
[2025-03-06 13:05:01,510][00167] Saving new best policy, reward=16.923!
[2025-03-06 13:05:03,870][00181] Updated weights for policy 0, policy_version 420 (0.0018)
[2025-03-06 13:05:06,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8289.5). Total num frames: 1740800. Throughput: 0: 2161.4. Samples: 430000. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-06 13:05:06,506][00031] Avg episode reward: [(0, '17.265')]
[2025-03-06 13:05:06,516][00167] Saving new best policy, reward=17.265!
[2025-03-06 13:05:08,588][00181] Updated weights for policy 0, policy_version 430 (0.0018)
[2025-03-06 13:05:11,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.7, 300 sec: 8306.3). Total num frames: 1785856. Throughput: 0: 2157.4. Samples: 443098. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-06 13:05:11,506][00031] Avg episode reward: [(0, '15.804')]
[2025-03-06 13:05:13,313][00181] Updated weights for policy 0, policy_version 440 (0.0019)
[2025-03-06 13:05:16,504][00031] Fps is (10 sec: 9011.2, 60 sec: 8738.1, 300 sec: 8322.3). Total num frames: 1830912. Throughput: 0: 2160.4. Samples: 456332. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:05:16,507][00031] Avg episode reward: [(0, '17.082')]
[2025-03-06 13:05:16,517][00167] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000447_1830912.pth...
[2025-03-06 13:05:17,948][00181] Updated weights for policy 0, policy_version 450 (0.0017)
[2025-03-06 13:05:21,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8301.2). Total num frames: 1867776. Throughput: 0: 2155.5. Samples: 462726. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-06 13:05:21,505][00031] Avg episode reward: [(0, '18.413')]
[2025-03-06 13:05:21,507][00167] Saving new best policy, reward=18.413!
[2025-03-06 13:05:23,157][00181] Updated weights for policy 0, policy_version 460 (0.0017)
[2025-03-06 13:05:26,504][00031] Fps is (10 sec: 8191.9, 60 sec: 8601.6, 300 sec: 8316.7). Total num frames: 1912832. Throughput: 0: 2132.0. Samples: 474748. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:05:26,508][00031] Avg episode reward: [(0, '18.506')]
[2025-03-06 13:05:26,520][00167] Saving new best policy, reward=18.506!
[2025-03-06 13:05:27,863][00181] Updated weights for policy 0, policy_version 470 (0.0018)
[2025-03-06 13:05:31,504][00031] Fps is (10 sec: 8601.3, 60 sec: 8533.3, 300 sec: 8314.0). Total num frames: 1953792. Throughput: 0: 2133.8. Samples: 487954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:05:31,507][00031] Avg episode reward: [(0, '18.413')]
[2025-03-06 13:05:32,528][00181] Updated weights for policy 0, policy_version 480 (0.0018)
[2025-03-06 13:05:36,504][00031] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8328.5). Total num frames: 1998848. Throughput: 0: 2166.7. Samples: 494608. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:05:36,506][00031] Avg episode reward: [(0, '17.594')]
[2025-03-06 13:05:37,176][00181] Updated weights for policy 0, policy_version 490 (0.0021)
[2025-03-06 13:05:41,504][00031] Fps is (10 sec: 9011.5, 60 sec: 8601.6, 300 sec: 8342.5). Total num frames: 2043904. Throughput: 0: 2169.6. Samples: 507928. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:05:41,506][00031] Avg episode reward: [(0, '17.033')]
[2025-03-06 13:05:41,714][00181] Updated weights for policy 0, policy_version 500 (0.0020)
[2025-03-06 13:05:46,318][00181] Updated weights for policy 0, policy_version 510 (0.0016)
[2025-03-06 13:05:46,504][00031] Fps is (10 sec: 9011.2, 60 sec: 8669.9, 300 sec: 8355.8). Total num frames: 2088960. Throughput: 0: 2169.0. Samples: 521128. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:05:46,506][00031] Avg episode reward: [(0, '17.880')]
[2025-03-06 13:05:50,976][00181] Updated weights for policy 0, policy_version 520 (0.0021)
[2025-03-06 13:05:51,505][00031] Fps is (10 sec: 8601.2, 60 sec: 8669.8, 300 sec: 8352.6). Total num frames: 2129920. Throughput: 0: 2173.8. Samples: 527820. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-06 13:05:51,508][00031] Avg episode reward: [(0, '19.069')]
[2025-03-06 13:05:51,511][00167] Saving new best policy, reward=19.069!
[2025-03-06 13:05:56,386][00181] Updated weights for policy 0, policy_version 530 (0.0015)
[2025-03-06 13:05:56,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8349.5). Total num frames: 2170880. Throughput: 0: 2143.2. Samples: 539540. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:05:56,506][00031] Avg episode reward: [(0, '19.326')]
[2025-03-06 13:05:56,514][00167] Saving new best policy, reward=19.326!
[2025-03-06 13:06:01,111][00181] Updated weights for policy 0, policy_version 540 (0.0018)
[2025-03-06 13:06:01,504][00031] Fps is (10 sec: 8192.4, 60 sec: 8533.3, 300 sec: 8346.6). Total num frames: 2211840. Throughput: 0: 2138.2. Samples: 552552. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-03-06 13:06:01,505][00031] Avg episode reward: [(0, '19.903')]
[2025-03-06 13:06:01,507][00167] Saving new best policy, reward=19.903!
[2025-03-06 13:06:05,789][00181] Updated weights for policy 0, policy_version 550 (0.0022)
[2025-03-06 13:06:06,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8358.9). Total num frames: 2256896. Throughput: 0: 2141.8. Samples: 559108. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:06:06,505][00031] Avg episode reward: [(0, '20.427')]
[2025-03-06 13:06:06,515][00167] Saving new best policy, reward=20.427!
[2025-03-06 13:06:10,457][00181] Updated weights for policy 0, policy_version 560 (0.0021)
[2025-03-06 13:06:11,504][00031] Fps is (10 sec: 9011.1, 60 sec: 8601.6, 300 sec: 8370.7). Total num frames: 2301952. Throughput: 0: 2166.8. Samples: 572254. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:06:11,506][00031] Avg episode reward: [(0, '20.298')]
[2025-03-06 13:06:15,116][00181] Updated weights for policy 0, policy_version 570 (0.0018)
[2025-03-06 13:06:16,504][00031] Fps is (10 sec: 9011.0, 60 sec: 8601.6, 300 sec: 8382.2). Total num frames: 2347008. Throughput: 0: 2165.1. Samples: 585382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:06:16,506][00031] Avg episode reward: [(0, '21.035')]
[2025-03-06 13:06:16,513][00167] Saving new best policy, reward=21.035!
[2025-03-06 13:06:19,735][00181] Updated weights for policy 0, policy_version 580 (0.0021)
[2025-03-06 13:06:21,504][00031] Fps is (10 sec: 8601.7, 60 sec: 8669.9, 300 sec: 8378.8). Total num frames: 2387968. Throughput: 0: 2166.0. Samples: 592076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:06:21,505][00031] Avg episode reward: [(0, '20.691')]
[2025-03-06 13:06:24,519][00181] Updated weights for policy 0, policy_version 590 (0.0021)
[2025-03-06 13:06:26,504][00031] Fps is (10 sec: 8192.2, 60 sec: 8601.6, 300 sec: 8375.6). Total num frames: 2428928. Throughput: 0: 2153.5. Samples: 604834. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:06:26,507][00031] Avg episode reward: [(0, '20.706')]
[2025-03-06 13:06:29,684][00181] Updated weights for policy 0, policy_version 600 (0.0016)
[2025-03-06 13:06:31,505][00031] Fps is (10 sec: 8191.5, 60 sec: 8601.6, 300 sec: 8372.5). Total num frames: 2469888. Throughput: 0: 2133.5. Samples: 617138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:06:31,507][00031] Avg episode reward: [(0, '21.479')]
[2025-03-06 13:06:31,558][00167] Saving new best policy, reward=21.479!
[2025-03-06 13:06:34,311][00181] Updated weights for policy 0, policy_version 610 (0.0019)
[2025-03-06 13:06:36,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8525.2). Total num frames: 2514944. Throughput: 0: 2130.3. Samples: 623682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:06:36,505][00031] Avg episode reward: [(0, '21.428')]
[2025-03-06 13:06:39,022][00181] Updated weights for policy 0, policy_version 620 (0.0016)
[2025-03-06 13:06:41,504][00031] Fps is (10 sec: 9011.8, 60 sec: 8601.6, 300 sec: 8566.9). Total num frames: 2560000. Throughput: 0: 2161.7. Samples: 636816. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:06:41,506][00031] Avg episode reward: [(0, '19.060')]
[2025-03-06 13:06:43,655][00181] Updated weights for policy 0, policy_version 630 (0.0016)
[2025-03-06 13:06:46,504][00031] Fps is (10 sec: 9011.1, 60 sec: 8601.6, 300 sec: 8580.8). Total num frames: 2605056. Throughput: 0: 2167.3. Samples: 650080. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:06:46,505][00031] Avg episode reward: [(0, '17.845')]
[2025-03-06 13:06:48,273][00181] Updated weights for policy 0, policy_version 640 (0.0019)
[2025-03-06 13:06:51,513][00031] Fps is (10 sec: 9003.1, 60 sec: 8668.6, 300 sec: 8580.5). Total num frames: 2650112. Throughput: 0: 2171.5. Samples: 656846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-06 13:06:51,517][00031] Avg episode reward: [(0, '20.448')]
[2025-03-06 13:06:52,909][00181] Updated weights for policy 0, policy_version 650 (0.0019)
[2025-03-06 13:06:56,504][00031] Fps is (10 sec: 8601.5, 60 sec: 8669.9, 300 sec: 8580.8). Total num frames: 2691072. Throughput: 0: 2176.2. Samples: 670184. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:06:56,507][00031] Avg episode reward: [(0, '17.318')]
[2025-03-06 13:06:57,804][00181] Updated weights for policy 0, policy_version 660 (0.0017)
[2025-03-06 13:07:01,504][00031] Fps is (10 sec: 8199.4, 60 sec: 8669.9, 300 sec: 8594.7). Total num frames: 2732032. Throughput: 0: 2143.3. Samples: 681830. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:07:01,507][00031] Avg episode reward: [(0, '18.783')]
[2025-03-06 13:07:02,816][00181] Updated weights for policy 0, policy_version 670 (0.0020)
[2025-03-06 13:07:06,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8580.8). Total num frames: 2772992. Throughput: 0: 2143.3. Samples: 688526. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:07:06,506][00031] Avg episode reward: [(0, '21.983')]
[2025-03-06 13:07:06,517][00167] Saving new best policy, reward=21.983!
[2025-03-06 13:07:07,577][00181] Updated weights for policy 0, policy_version 680 (0.0018)
[2025-03-06 13:07:11,504][00031] Fps is (10 sec: 8601.5, 60 sec: 8601.6, 300 sec: 8580.8). Total num frames: 2818048. Throughput: 0: 2150.1. Samples: 701588. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:07:11,506][00031] Avg episode reward: [(0, '23.102')]
[2025-03-06 13:07:11,508][00167] Saving new best policy, reward=23.102!
[2025-03-06 13:07:12,174][00181] Updated weights for policy 0, policy_version 690 (0.0020)
[2025-03-06 13:07:16,504][00031] Fps is (10 sec: 9011.2, 60 sec: 8601.6, 300 sec: 8580.8). Total num frames: 2863104. Throughput: 0: 2166.7. Samples: 714636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:07:16,506][00031] Avg episode reward: [(0, '21.690')]
[2025-03-06 13:07:16,513][00167] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000699_2863104.pth...
[2025-03-06 13:07:16,604][00167] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000195_798720.pth
[2025-03-06 13:07:16,877][00181] Updated weights for policy 0, policy_version 700 (0.0018)
[2025-03-06 13:07:21,492][00181] Updated weights for policy 0, policy_version 710 (0.0019)
[2025-03-06 13:07:21,504][00031] Fps is (10 sec: 9011.3, 60 sec: 8669.9, 300 sec: 8594.7). Total num frames: 2908160. Throughput: 0: 2169.0. Samples: 721288. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:07:21,506][00031] Avg episode reward: [(0, '21.939')]
[2025-03-06 13:07:26,106][00181] Updated weights for policy 0, policy_version 720 (0.0015)
[2025-03-06 13:07:26,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8580.8). Total num frames: 2949120. Throughput: 0: 2176.2. Samples: 734746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:07:26,506][00031] Avg episode reward: [(0, '23.221')]
[2025-03-06 13:07:26,572][00167] Saving new best policy, reward=23.221!
[2025-03-06 13:07:31,278][00181] Updated weights for policy 0, policy_version 730 (0.0018)
[2025-03-06 13:07:31,504][00031] Fps is (10 sec: 8191.9, 60 sec: 8670.0, 300 sec: 8594.7). Total num frames: 2990080. Throughput: 0: 2155.6. Samples: 747082. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-03-06 13:07:31,506][00031] Avg episode reward: [(0, '22.569')]
[2025-03-06 13:07:36,157][00181] Updated weights for policy 0, policy_version 740 (0.0019)
[2025-03-06 13:07:36,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8580.8). Total num frames: 3031040. Throughput: 0: 2135.8. Samples: 752936. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:07:36,506][00031] Avg episode reward: [(0, '21.362')]
[2025-03-06 13:07:40,791][00181] Updated weights for policy 0, policy_version 750 (0.0016)
[2025-03-06 13:07:41,504][00031] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8594.7). Total num frames: 3076096. Throughput: 0: 2129.0. Samples: 765988. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:07:41,505][00031] Avg episode reward: [(0, '20.925')]
[2025-03-06 13:07:45,436][00181] Updated weights for policy 0, policy_version 760 (0.0018)
[2025-03-06 13:07:46,504][00031] Fps is (10 sec: 9011.3, 60 sec: 8601.6, 300 sec: 8594.7). Total num frames: 3121152. Throughput: 0: 2163.4. Samples: 779184. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:07:46,506][00031] Avg episode reward: [(0, '19.695')]
[2025-03-06 13:07:50,047][00181] Updated weights for policy 0, policy_version 770 (0.0019)
[2025-03-06 13:07:51,504][00031] Fps is (10 sec: 9011.1, 60 sec: 8602.9, 300 sec: 8594.7). Total num frames: 3166208. Throughput: 0: 2165.2. Samples: 785960. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:07:51,507][00031] Avg episode reward: [(0, '22.039')]
[2025-03-06 13:07:54,716][00181] Updated weights for policy 0, policy_version 780 (0.0022)
[2025-03-06 13:07:56,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8594.7). Total num frames: 3207168. Throughput: 0: 2170.1. Samples: 799244. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-03-06 13:07:56,509][00031] Avg episode reward: [(0, '21.809')]
[2025-03-06 13:07:59,412][00181] Updated weights for policy 0, policy_version 790 (0.0022)
[2025-03-06 13:08:01,504][00031] Fps is (10 sec: 8601.5, 60 sec: 8669.8, 300 sec: 8594.7). Total num frames: 3252224. Throughput: 0: 2164.5. Samples: 812038. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:08:01,506][00031] Avg episode reward: [(0, '21.226')]
[2025-03-06 13:08:04,826][00181] Updated weights for policy 0, policy_version 800 (0.0025)
[2025-03-06 13:08:06,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8594.7). Total num frames: 3289088. Throughput: 0: 2143.9. Samples: 817766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:08:06,506][00031] Avg episode reward: [(0, '21.619')]
[2025-03-06 13:08:09,532][00181] Updated weights for policy 0, policy_version 810 (0.0016)
[2025-03-06 13:08:11,504][00031] Fps is (10 sec: 8192.1, 60 sec: 8601.6, 300 sec: 8608.5). Total num frames: 3334144. Throughput: 0: 2123.0. Samples: 830280. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:08:11,506][00031] Avg episode reward: [(0, '23.595')]
[2025-03-06 13:08:11,508][00167] Saving new best policy, reward=23.595!
[2025-03-06 13:08:14,142][00181] Updated weights for policy 0, policy_version 820 (0.0017)
[2025-03-06 13:08:16,504][00031] Fps is (10 sec: 9011.2, 60 sec: 8601.6, 300 sec: 8608.5). Total num frames: 3379200. Throughput: 0: 2142.5. Samples: 843496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:08:16,506][00031] Avg episode reward: [(0, '23.917')]
[2025-03-06 13:08:16,514][00167] Saving new best policy, reward=23.917!
[2025-03-06 13:08:18,742][00181] Updated weights for policy 0, policy_version 830 (0.0018)
[2025-03-06 13:08:21,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8533.3, 300 sec: 8594.7). Total num frames: 3420160. Throughput: 0: 2161.8. Samples: 850216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:08:21,506][00031] Avg episode reward: [(0, '21.575')]
[2025-03-06 13:08:23,387][00181] Updated weights for policy 0, policy_version 840 (0.0016)
[2025-03-06 13:08:26,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8594.7). Total num frames: 3465216. Throughput: 0: 2167.5. Samples: 863524. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:08:26,505][00031] Avg episode reward: [(0, '22.078')]
[2025-03-06 13:08:28,062][00181] Updated weights for policy 0, policy_version 850 (0.0021)
[2025-03-06 13:08:31,504][00031] Fps is (10 sec: 9011.1, 60 sec: 8669.9, 300 sec: 8608.5). Total num frames: 3510272. Throughput: 0: 2167.3. Samples: 876714. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:08:31,505][00031] Avg episode reward: [(0, '22.642')]
[2025-03-06 13:08:32,604][00181] Updated weights for policy 0, policy_version 860 (0.0021)
[2025-03-06 13:08:36,504][00031] Fps is (10 sec: 8601.5, 60 sec: 8669.9, 300 sec: 8608.5). Total num frames: 3551232. Throughput: 0: 2163.9. Samples: 883338. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:08:36,506][00031] Avg episode reward: [(0, '22.787')]
[2025-03-06 13:08:37,958][00181] Updated weights for policy 0, policy_version 870 (0.0019)
[2025-03-06 13:08:41,504][00031] Fps is (10 sec: 8192.1, 60 sec: 8601.6, 300 sec: 8608.6). Total num frames: 3592192. Throughput: 0: 2128.4. Samples: 895020. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:08:41,506][00031] Avg episode reward: [(0, '23.438')]
[2025-03-06 13:08:42,694][00181] Updated weights for policy 0, policy_version 880 (0.0019)
[2025-03-06 13:08:46,504][00031] Fps is (10 sec: 8601.7, 60 sec: 8601.6, 300 sec: 8608.5). Total num frames: 3637248. Throughput: 0: 2136.4. Samples: 908174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:08:46,506][00031] Avg episode reward: [(0, '22.388')]
[2025-03-06 13:08:47,306][00181] Updated weights for policy 0, policy_version 890 (0.0016)
[2025-03-06 13:08:51,504][00031] Fps is (10 sec: 8601.5, 60 sec: 8533.3, 300 sec: 8608.5). Total num frames: 3678208. Throughput: 0: 2158.6. Samples: 914902. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:08:51,508][00031] Avg episode reward: [(0, '24.378')]
[2025-03-06 13:08:51,522][00167] Saving new best policy, reward=24.378!
[2025-03-06 13:08:52,061][00181] Updated weights for policy 0, policy_version 900 (0.0021)
[2025-03-06 13:08:56,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8608.6). Total num frames: 3723264. Throughput: 0: 2166.4. Samples: 927766. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-03-06 13:08:56,506][00031] Avg episode reward: [(0, '22.820')]
[2025-03-06 13:08:56,818][00181] Updated weights for policy 0, policy_version 910 (0.0023)
[2025-03-06 13:09:01,426][00181] Updated weights for policy 0, policy_version 920 (0.0017)
[2025-03-06 13:09:01,504][00031] Fps is (10 sec: 9011.3, 60 sec: 8601.6, 300 sec: 8622.4). Total num frames: 3768320. Throughput: 0: 2164.1. Samples: 940880. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:09:01,506][00031] Avg episode reward: [(0, '23.220')]
[2025-03-06 13:09:06,143][00181] Updated weights for policy 0, policy_version 930 (0.0022)
[2025-03-06 13:09:06,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8669.9, 300 sec: 8608.6). Total num frames: 3809280. Throughput: 0: 2162.1. Samples: 947510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:09:06,506][00031] Avg episode reward: [(0, '20.819')]
[2025-03-06 13:09:11,476][00181] Updated weights for policy 0, policy_version 940 (0.0018)
[2025-03-06 13:09:11,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8601.6, 300 sec: 8622.4). Total num frames: 3850240. Throughput: 0: 2132.3. Samples: 959476. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-03-06 13:09:11,506][00031] Avg episode reward: [(0, '22.313')]
[2025-03-06 13:09:16,131][00181] Updated weights for policy 0, policy_version 950 (0.0016)
[2025-03-06 13:09:16,504][00031] Fps is (10 sec: 8192.0, 60 sec: 8533.3, 300 sec: 8608.5). Total num frames: 3891200. Throughput: 0: 2124.7. Samples: 972324. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:09:16,506][00031] Avg episode reward: [(0, '23.058')]
[2025-03-06 13:09:16,516][00167] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000950_3891200.pth...
[2025-03-06 13:09:16,608][00167] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000447_1830912.pth
[2025-03-06 13:09:20,738][00181] Updated weights for policy 0, policy_version 960 (0.0018)
[2025-03-06 13:09:21,504][00031] Fps is (10 sec: 8601.6, 60 sec: 8601.6, 300 sec: 8608.5). Total num frames: 3936256. Throughput: 0: 2123.0. Samples: 978872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:09:21,506][00031] Avg episode reward: [(0, '24.867')]
[2025-03-06 13:09:21,508][00167] Saving new best policy, reward=24.867!
[2025-03-06 13:09:25,472][00181] Updated weights for policy 0, policy_version 970 (0.0022)
[2025-03-06 13:09:26,504][00031] Fps is (10 sec: 9011.2, 60 sec: 8601.6, 300 sec: 8608.5). Total num frames: 3981312. Throughput: 0: 2157.6. Samples: 992114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-03-06 13:09:26,506][00031] Avg episode reward: [(0, '22.337')]
[2025-03-06 13:09:29,128][00167] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-06 13:09:29,135][00167] Stopping Batcher_0...
[2025-03-06 13:09:29,135][00167] Loop batcher_evt_loop terminating...
[2025-03-06 13:09:29,135][00031] Component Batcher_0 stopped!
[2025-03-06 13:09:29,161][00181] Weights refcount: 2 0
[2025-03-06 13:09:29,163][00181] Stopping InferenceWorker_p0-w0...
[2025-03-06 13:09:29,164][00181] Loop inference_proc0-0_evt_loop terminating...
[2025-03-06 13:09:29,166][00031] Component InferenceWorker_p0-w0 stopped!
[2025-03-06 13:09:29,225][00167] Removing /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000699_2863104.pth
[2025-03-06 13:09:29,239][00186] Stopping RolloutWorker_w5...
[2025-03-06 13:09:29,239][00186] Loop rollout_proc5_evt_loop terminating...
[2025-03-06 13:09:29,240][00167] Saving /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-06 13:09:29,243][00182] Stopping RolloutWorker_w1...
[2025-03-06 13:09:29,244][00182] Loop rollout_proc1_evt_loop terminating...
[2025-03-06 13:09:29,246][00187] Stopping RolloutWorker_w6...
[2025-03-06 13:09:29,243][00031] Component RolloutWorker_w5 stopped!
[2025-03-06 13:09:29,246][00187] Loop rollout_proc6_evt_loop terminating...
[2025-03-06 13:09:29,247][00031] Component RolloutWorker_w1 stopped!
[2025-03-06 13:09:29,248][00184] Stopping RolloutWorker_w2...
[2025-03-06 13:09:29,249][00184] Loop rollout_proc2_evt_loop terminating...
[2025-03-06 13:09:29,248][00031] Component RolloutWorker_w6 stopped!
[2025-03-06 13:09:29,254][00031] Component RolloutWorker_w2 stopped!
[2025-03-06 13:09:29,394][00167] Stopping LearnerWorker_p0...
[2025-03-06 13:09:29,394][00167] Loop learner_proc0_evt_loop terminating...
[2025-03-06 13:09:29,395][00031] Component LearnerWorker_p0 stopped!
[2025-03-06 13:09:29,429][00183] Stopping RolloutWorker_w3...
[2025-03-06 13:09:29,430][00183] Loop rollout_proc3_evt_loop terminating...
[2025-03-06 13:09:29,429][00031] Component RolloutWorker_w3 stopped!
[2025-03-06 13:09:29,492][00188] Stopping RolloutWorker_w7...
[2025-03-06 13:09:29,495][00188] Loop rollout_proc7_evt_loop terminating...
[2025-03-06 13:09:29,492][00031] Component RolloutWorker_w7 stopped!
[2025-03-06 13:09:29,548][00180] Stopping RolloutWorker_w0...
[2025-03-06 13:09:29,548][00180] Loop rollout_proc0_evt_loop terminating...
[2025-03-06 13:09:29,550][00031] Component RolloutWorker_w0 stopped!
[2025-03-06 13:09:29,581][00185] Stopping RolloutWorker_w4...
[2025-03-06 13:09:29,582][00185] Loop rollout_proc4_evt_loop terminating...
[2025-03-06 13:09:29,581][00031] Component RolloutWorker_w4 stopped!
[2025-03-06 13:09:29,583][00031] Waiting for process learner_proc0 to stop...
[2025-03-06 13:09:30,815][00031] Waiting for process inference_proc0-0 to join...
[2025-03-06 13:09:30,817][00031] Waiting for process rollout_proc0 to join...
[2025-03-06 13:09:31,264][00031] Waiting for process rollout_proc1 to join...
[2025-03-06 13:09:31,266][00031] Waiting for process rollout_proc2 to join...
[2025-03-06 13:09:31,268][00031] Waiting for process rollout_proc3 to join...
[2025-03-06 13:09:31,301][00031] Waiting for process rollout_proc4 to join...
[2025-03-06 13:09:31,302][00031] Waiting for process rollout_proc5 to join...
[2025-03-06 13:09:31,303][00031] Waiting for process rollout_proc6 to join...
[2025-03-06 13:09:31,304][00031] Waiting for process rollout_proc7 to join...
[2025-03-06 13:09:31,306][00031] Batcher 0 profile tree view:
batching: 20.2462, releasing_batches: 0.0272
[2025-03-06 13:09:31,307][00031] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
wait_policy_total: 15.6770
update_model: 6.8758
weight_update: 0.0019
one_step: 0.0027
handle_policy_step: 427.1768
deserialize: 13.0785, stack: 2.6267, obs_to_device_normalize: 103.1842, forward: 206.8819, send_messages: 21.9097
prepare_outputs: 58.3068
to_cpu: 36.8862
[2025-03-06 13:09:31,308][00031] Learner 0 profile tree view:
misc: 0.0050, prepare_batch: 12.6316
train: 52.5828
epoch_init: 0.0056, minibatch_init: 0.0069, losses_postprocess: 0.5668, kl_divergence: 0.5328, after_optimizer: 23.6469
calculate_losses: 17.4709
losses_init: 0.0038, forward_head: 0.9851, bptt_initial: 12.1632, tail: 0.7452, advantages_returns: 0.1970, losses: 1.7975
bptt: 1.3622
bptt_forward_core: 1.2944
update: 9.9071
clip: 0.8449
[2025-03-06 13:09:31,309][00031] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1821, enqueue_policy_requests: 8.2119, env_step: 347.7651, overhead: 7.1564, complete_rollouts: 1.1611
save_policy_outputs: 10.1411
split_output_tensors: 4.0777
[2025-03-06 13:09:31,310][00031] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1946, enqueue_policy_requests: 8.5301, env_step: 343.9408, overhead: 7.5370, complete_rollouts: 1.3102
save_policy_outputs: 10.5550
split_output_tensors: 4.2548
[2025-03-06 13:09:31,311][00031] Loop Runner_EvtLoop terminating...
[2025-03-06 13:09:31,312][00031] Runner profile tree view:
main_loop: 489.7997
[2025-03-06 13:09:31,313][00031] Collected {0: 4005888}, FPS: 8178.6
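Two consistency checks using only numbers reported in this log; the 4096-frames-per-policy-version relationship is an observation from this run's checkpoint filenames, not a documented guarantee.

# Average throughput: total frames / main_loop wall time matches the reported FPS.
assert abs(4_005_888 / 489.7997 - 8178.6) < 1.0

# Checkpoint names encode {policy_version}_{env_frames}; here frames = 4096 * version.
assert 978 * 4096 == 4_005_888   # checkpoint_000000978_4005888.pth
assert 195 * 4096 == 798_720     # checkpoint_000000195_798720.pth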
[2025-03-06 13:12:39,274][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
[2025-03-06 13:12:39,275][00031] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-06 13:12:39,276][00031] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-06 13:12:39,277][00031] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-06 13:12:39,278][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-06 13:12:39,279][00031] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-06 13:12:39,280][00031] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-03-06 13:12:39,281][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-06 13:12:39,282][00031] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-03-06 13:12:39,282][00031] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-03-06 13:12:39,283][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-06 13:12:39,284][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-06 13:12:39,285][00031] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-06 13:12:39,286][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-06 13:12:39,287][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-03-06 13:12:39,324][00031] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-06 13:12:39,327][00031] RunningMeanStd input shape: (3, 72, 128)
[2025-03-06 13:12:39,330][00031] RunningMeanStd input shape: (1,)
[2025-03-06 13:12:39,345][00031] ConvEncoder: input_channels=3
[2025-03-06 13:12:39,455][00031] Conv encoder output size: 512
[2025-03-06 13:12:39,456][00031] Policy head output size: 512
[2025-03-06 13:12:39,681][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-03-06 13:12:40,508][00031] Num frames 100...
[2025-03-06 13:12:40,634][00031] Num frames 200...
[2025-03-06 13:12:40,773][00031] Num frames 300...
[2025-03-06 13:12:40,912][00031] Avg episode rewards: #0: 6.650, true rewards: #0: 3.650
[2025-03-06 13:12:40,913][00031] Avg episode reward: 6.650, avg true_objective: 3.650
[2025-03-06 13:12:40,957][00031] Num frames 400...
[2025-03-06 13:12:41,078][00031] Num frames 500...
[2025-03-06 13:12:41,201][00031] Num frames 600...
[2025-03-06 13:12:41,371][00031] Avg episode rewards: #0: 5.970, true rewards: #0: 3.470
[2025-03-06 13:12:41,371][00031] Avg episode reward: 5.970, avg true_objective: 3.470
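The "Avg episode rewards" figures appear to be running means over the episodes completed so far, so individual episode scores can be recovered by differencing; e.g. from 6.650 after one episode and 5.970 after two, the second episode scored roughly 2 x 5.970 - 6.650 = 5.290. This is an inference from the numbers, not something the log states explicitly.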
[2025-03-06 13:12:41,380][00031] Num frames 700...
[2025-03-06 13:12:41,502][00031] Num frames 800...
[2025-03-06 13:12:41,627][00031] Num frames 900...
[2025-03-06 13:12:41,756][00031] Num frames 1000...
[2025-03-06 13:12:41,899][00031] Num frames 1100...
[2025-03-06 13:12:42,056][00031] Avg episode rewards: #0: 7.580, true rewards: #0: 3.913
[2025-03-06 13:12:42,057][00031] Avg episode reward: 7.580, avg true_objective: 3.913
[2025-03-06 13:12:42,090][00031] Num frames 1200...
[2025-03-06 13:12:42,210][00031] Num frames 1300...
[2025-03-06 13:12:42,333][00031] Num frames 1400...
[2025-03-06 13:12:42,457][00031] Num frames 1500...
[2025-03-06 13:12:42,585][00031] Num frames 1600...
[2025-03-06 13:12:42,708][00031] Num frames 1700...
[2025-03-06 13:12:42,829][00031] Num frames 1800...
[2025-03-06 13:12:42,955][00031] Num frames 1900...
[2025-03-06 13:12:43,063][00031] Avg episode rewards: #0: 9.605, true rewards: #0: 4.855
[2025-03-06 13:12:43,064][00031] Avg episode reward: 9.605, avg true_objective: 4.855
[2025-03-06 13:12:43,137][00031] Num frames 2000...
[2025-03-06 13:12:43,264][00031] Num frames 2100...
[2025-03-06 13:12:43,391][00031] Num frames 2200...
[2025-03-06 13:12:43,511][00031] Num frames 2300...
[2025-03-06 13:12:43,631][00031] Num frames 2400...
[2025-03-06 13:12:43,752][00031] Num frames 2500...
[2025-03-06 13:12:43,872][00031] Num frames 2600...
[2025-03-06 13:12:43,994][00031] Num frames 2700...
[2025-03-06 13:12:44,122][00031] Num frames 2800...
[2025-03-06 13:12:44,252][00031] Num frames 2900...
[2025-03-06 13:12:44,375][00031] Num frames 3000...
[2025-03-06 13:12:44,497][00031] Num frames 3100...
[2025-03-06 13:12:44,639][00031] Num frames 3200...
[2025-03-06 13:12:44,778][00031] Num frames 3300...
[2025-03-06 13:12:44,924][00031] Num frames 3400...
[2025-03-06 13:12:45,047][00031] Num frames 3500...
[2025-03-06 13:12:45,171][00031] Num frames 3600...
[2025-03-06 13:12:45,293][00031] Num frames 3700...
[2025-03-06 13:12:45,418][00031] Num frames 3800...
[2025-03-06 13:12:45,543][00031] Num frames 3900...
[2025-03-06 13:12:45,670][00031] Num frames 4000...
[2025-03-06 13:12:45,777][00031] Avg episode rewards: #0: 18.084, true rewards: #0: 8.084
[2025-03-06 13:12:45,777][00031] Avg episode reward: 18.084, avg true_objective: 8.084
[2025-03-06 13:12:45,847][00031] Num frames 4100...
[2025-03-06 13:12:45,967][00031] Num frames 4200...
[2025-03-06 13:12:46,093][00031] Num frames 4300...
[2025-03-06 13:12:46,215][00031] Num frames 4400...
[2025-03-06 13:12:46,337][00031] Num frames 4500...
[2025-03-06 13:12:46,459][00031] Num frames 4600...
[2025-03-06 13:12:46,584][00031] Num frames 4700...
[2025-03-06 13:12:46,706][00031] Num frames 4800...
[2025-03-06 13:12:46,828][00031] Num frames 4900...
[2025-03-06 13:12:46,956][00031] Num frames 5000...
[2025-03-06 13:12:47,093][00031] Num frames 5100...
[2025-03-06 13:12:47,223][00031] Num frames 5200...
[2025-03-06 13:12:47,355][00031] Avg episode rewards: #0: 19.430, true rewards: #0: 8.763
[2025-03-06 13:12:47,356][00031] Avg episode reward: 19.430, avg true_objective: 8.763
[2025-03-06 13:12:47,409][00031] Num frames 5300...
[2025-03-06 13:12:47,530][00031] Num frames 5400...
[2025-03-06 13:12:47,651][00031] Num frames 5500...
[2025-03-06 13:12:47,772][00031] Num frames 5600...
[2025-03-06 13:12:47,895][00031] Num frames 5700...
[2025-03-06 13:12:48,016][00031] Num frames 5800...
[2025-03-06 13:12:48,144][00031] Num frames 5900...
[2025-03-06 13:12:48,269][00031] Num frames 6000...
[2025-03-06 13:12:48,397][00031] Num frames 6100...
[2025-03-06 13:12:48,528][00031] Num frames 6200...
[2025-03-06 13:12:48,651][00031] Num frames 6300...
[2025-03-06 13:12:48,724][00031] Avg episode rewards: #0: 20.306, true rewards: #0: 9.020
[2025-03-06 13:12:48,725][00031] Avg episode reward: 20.306, avg true_objective: 9.020
[2025-03-06 13:12:48,827][00031] Num frames 6400...
[2025-03-06 13:12:48,948][00031] Num frames 6500...
[2025-03-06 13:12:49,074][00031] Num frames 6600...
[2025-03-06 13:12:49,199][00031] Num frames 6700...
[2025-03-06 13:12:49,325][00031] Num frames 6800...
[2025-03-06 13:12:49,446][00031] Num frames 6900...
[2025-03-06 13:12:49,573][00031] Num frames 7000...
[2025-03-06 13:12:49,728][00031] Num frames 7100...
[2025-03-06 13:12:49,891][00031] Num frames 7200...
[2025-03-06 13:12:50,038][00031] Num frames 7300...
[2025-03-06 13:12:50,152][00031] Avg episode rewards: #0: 20.417, true rewards: #0: 9.167
[2025-03-06 13:12:50,153][00031] Avg episode reward: 20.417, avg true_objective: 9.167
[2025-03-06 13:12:50,236][00031] Num frames 7400...
[2025-03-06 13:12:50,368][00031] Num frames 7500...
[2025-03-06 13:12:50,506][00031] Num frames 7600...
[2025-03-06 13:12:50,644][00031] Num frames 7700...
[2025-03-06 13:12:50,789][00031] Num frames 7800...
[2025-03-06 13:12:50,927][00031] Num frames 7900...
[2025-03-06 13:12:51,059][00031] Num frames 8000...
[2025-03-06 13:12:51,181][00031] Num frames 8100...
[2025-03-06 13:12:51,299][00031] Avg episode rewards: #0: 20.167, true rewards: #0: 9.056
[2025-03-06 13:12:51,301][00031] Avg episode reward: 20.167, avg true_objective: 9.056
[2025-03-06 13:12:51,367][00031] Num frames 8200...
[2025-03-06 13:12:51,492][00031] Num frames 8300...
[2025-03-06 13:12:51,620][00031] Num frames 8400...
[2025-03-06 13:12:51,748][00031] Num frames 8500...
[2025-03-06 13:12:51,871][00031] Num frames 8600...
[2025-03-06 13:12:51,991][00031] Num frames 8700...
[2025-03-06 13:12:52,116][00031] Num frames 8800...
[2025-03-06 13:12:52,243][00031] Avg episode rewards: #0: 19.659, true rewards: #0: 8.859
[2025-03-06 13:12:52,244][00031] Avg episode reward: 19.659, avg true_objective: 8.859
[2025-03-06 13:13:22,940][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!
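
Editor's note: the replay is simply the sequence of rendered frames written out as an .mp4. A minimal way to do the same with imageio (placeholder frames; the 35 fps playback speed, VizDoom's native tick rate, is an assumption):

import numpy as np
import imageio                                     # needs imageio-ffmpeg for .mp4 output

frames = [np.zeros((120, 160, 3), dtype=np.uint8) for _ in range(70)]   # placeholder frames
imageio.mimwrite("/kaggle/working/train_dir/default_experiment/replay.mp4",
                 frames, fps=35)
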
[2025-03-06 13:16:17,322][00031] Loading existing experiment configuration from /kaggle/working/train_dir/default_experiment/config.json
[2025-03-06 13:16:17,323][00031] Overriding arg 'num_workers' with value 1 passed from command line
[2025-03-06 13:16:17,324][00031] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-03-06 13:16:17,325][00031] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-03-06 13:16:17,326][00031] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-03-06 13:16:17,327][00031] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-03-06 13:16:17,328][00031] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-03-06 13:16:17,329][00031] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-03-06 13:16:17,330][00031] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-03-06 13:16:17,331][00031] Adding new argument 'hf_repository'='faelwen/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-03-06 13:16:17,332][00031] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-03-06 13:16:17,333][00031] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-03-06 13:16:17,334][00031] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-03-06 13:16:17,335][00031] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-03-06 13:16:17,337][00031] Using frameskip 1 and render_action_repeat=4 for evaluation
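
Editor's note: this second run repeats the evaluation with push_to_hub=True and hf_repository set, so after the stats and replay are produced the experiment directory is uploaded to the Hub. A rough equivalent using huggingface_hub directly (not Sample Factory's internal upload helper):

from huggingface_hub import HfApi

repo_id = "faelwen/rl_course_vizdoom_health_gathering_supreme"
api = HfApi()
api.create_repo(repo_id, exist_ok=True)            # requires a valid HF token
api.upload_folder(
    folder_path="/kaggle/working/train_dir/default_experiment",
    repo_id=repo_id,
    repo_type="model",
)
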
[2025-03-06 13:16:17,370][00031] RunningMeanStd input shape: (3, 72, 128)
[2025-03-06 13:16:17,372][00031] RunningMeanStd input shape: (1,)
[2025-03-06 13:16:17,384][00031] ConvEncoder: input_channels=3
[2025-03-06 13:16:17,426][00031] Conv encoder output size: 512
[2025-03-06 13:16:17,427][00031] Policy head output size: 512
[2025-03-06 13:16:17,447][00031] Loading state from checkpoint /kaggle/working/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
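
Editor's note: the checkpoint reused here encodes both the training iteration (978) and the env-step count (4,005,888) in its filename; the ratio works out to exactly 4096 frames per iteration, presumably the rollout batch size (an inference from the filename, not something the log states):

assert 978 * 4096 == 4_005_888    # iterations x frames-per-iteration = env steps in the filename
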
[2025-03-06 13:16:17,898][00031] Num frames 100...
[2025-03-06 13:16:18,024][00031] Num frames 200...
[2025-03-06 13:16:18,149][00031] Num frames 300...
[2025-03-06 13:16:18,267][00031] Num frames 400...
[2025-03-06 13:16:18,388][00031] Num frames 500...
[2025-03-06 13:16:18,510][00031] Num frames 600...
[2025-03-06 13:16:18,634][00031] Num frames 700...
[2025-03-06 13:16:18,753][00031] Num frames 800...
[2025-03-06 13:16:18,879][00031] Num frames 900...
[2025-03-06 13:16:18,970][00031] Avg episode rewards: #0: 23.280, true rewards: #0: 9.280
[2025-03-06 13:16:18,972][00031] Avg episode reward: 23.280, avg true_objective: 9.280
[2025-03-06 13:16:19,060][00031] Num frames 1000...
[2025-03-06 13:16:19,180][00031] Num frames 1100...
[2025-03-06 13:16:19,303][00031] Num frames 1200...
[2025-03-06 13:16:19,428][00031] Num frames 1300...
[2025-03-06 13:16:19,555][00031] Num frames 1400...
[2025-03-06 13:16:19,682][00031] Num frames 1500...
[2025-03-06 13:16:19,830][00031] Avg episode rewards: #0: 19.380, true rewards: #0: 7.880
[2025-03-06 13:16:19,831][00031] Avg episode reward: 19.380, avg true_objective: 7.880
[2025-03-06 13:16:19,862][00031] Num frames 1600...
[2025-03-06 13:16:19,990][00031] Num frames 1700...
[2025-03-06 13:16:20,119][00031] Num frames 1800...
[2025-03-06 13:16:20,246][00031] Num frames 1900...
[2025-03-06 13:16:20,373][00031] Num frames 2000...
[2025-03-06 13:16:20,501][00031] Num frames 2100...
[2025-03-06 13:16:20,630][00031] Num frames 2200...
[2025-03-06 13:16:20,751][00031] Num frames 2300...
[2025-03-06 13:16:20,873][00031] Num frames 2400...
[2025-03-06 13:16:20,996][00031] Num frames 2500...
[2025-03-06 13:16:21,120][00031] Num frames 2600...
[2025-03-06 13:16:21,238][00031] Num frames 2700...
[2025-03-06 13:16:21,403][00031] Avg episode rewards: #0: 22.640, true rewards: #0: 9.307
[2025-03-06 13:16:21,404][00031] Avg episode reward: 22.640, avg true_objective: 9.307
[2025-03-06 13:16:21,416][00031] Num frames 2800...
[2025-03-06 13:16:21,532][00031] Num frames 2900...
[2025-03-06 13:16:21,651][00031] Num frames 3000...
[2025-03-06 13:16:21,775][00031] Num frames 3100...
[2025-03-06 13:16:21,904][00031] Num frames 3200...
[2025-03-06 13:16:22,029][00031] Num frames 3300...
[2025-03-06 13:16:22,151][00031] Num frames 3400...
[2025-03-06 13:16:22,285][00031] Avg episode rewards: #0: 19.660, true rewards: #0: 8.660
[2025-03-06 13:16:22,286][00031] Avg episode reward: 19.660, avg true_objective: 8.660
[2025-03-06 13:16:22,331][00031] Num frames 3500...
[2025-03-06 13:16:22,458][00031] Num frames 3600...
[2025-03-06 13:16:22,588][00031] Num frames 3700...
[2025-03-06 13:16:22,712][00031] Num frames 3800...
[2025-03-06 13:16:22,842][00031] Num frames 3900...
[2025-03-06 13:16:22,976][00031] Num frames 4000...
[2025-03-06 13:16:23,106][00031] Num frames 4100...
[2025-03-06 13:16:23,235][00031] Num frames 4200...
[2025-03-06 13:16:23,365][00031] Num frames 4300...
[2025-03-06 13:16:23,455][00031] Avg episode rewards: #0: 19.256, true rewards: #0: 8.656
[2025-03-06 13:16:23,456][00031] Avg episode reward: 19.256, avg true_objective: 8.656
[2025-03-06 13:16:23,546][00031] Num frames 4400...
[2025-03-06 13:16:23,673][00031] Num frames 4500...
[2025-03-06 13:16:23,791][00031] Num frames 4600...
[2025-03-06 13:16:23,911][00031] Num frames 4700...
[2025-03-06 13:16:24,036][00031] Num frames 4800...
[2025-03-06 13:16:24,161][00031] Num frames 4900...
[2025-03-06 13:16:24,279][00031] Num frames 5000...
[2025-03-06 13:16:24,402][00031] Num frames 5100...
[2025-03-06 13:16:24,525][00031] Num frames 5200...
[2025-03-06 13:16:24,672][00031] Num frames 5300...
[2025-03-06 13:16:24,796][00031] Num frames 5400...
[2025-03-06 13:16:24,939][00031] Num frames 5500...
[2025-03-06 13:16:25,062][00031] Num frames 5600...
[2025-03-06 13:16:25,182][00031] Num frames 5700...
[2025-03-06 13:16:25,304][00031] Num frames 5800...
[2025-03-06 13:16:25,426][00031] Num frames 5900...
[2025-03-06 13:16:25,550][00031] Num frames 6000...
[2025-03-06 13:16:25,636][00031] Avg episode rewards: #0: 23.707, true rewards: #0: 10.040
[2025-03-06 13:16:25,636][00031] Avg episode reward: 23.707, avg true_objective: 10.040
[2025-03-06 13:16:25,728][00031] Num frames 6100...
[2025-03-06 13:16:25,853][00031] Num frames 6200...
[2025-03-06 13:16:25,976][00031] Num frames 6300...
[2025-03-06 13:16:26,102][00031] Num frames 6400...
[2025-03-06 13:16:26,230][00031] Num frames 6500...
[2025-03-06 13:16:26,358][00031] Num frames 6600...
[2025-03-06 13:16:26,485][00031] Num frames 6700...
[2025-03-06 13:16:26,614][00031] Num frames 6800...
[2025-03-06 13:16:26,730][00031] Num frames 6900...
[2025-03-06 13:16:26,846][00031] Num frames 7000...
[2025-03-06 13:16:26,965][00031] Num frames 7100...
[2025-03-06 13:16:27,083][00031] Num frames 7200...
[2025-03-06 13:16:27,199][00031] Num frames 7300...
[2025-03-06 13:16:27,320][00031] Num frames 7400...
[2025-03-06 13:16:27,440][00031] Num frames 7500...
[2025-03-06 13:16:27,500][00031] Avg episode rewards: #0: 25.434, true rewards: #0: 10.720
[2025-03-06 13:16:27,501][00031] Avg episode reward: 25.434, avg true_objective: 10.720
[2025-03-06 13:16:27,616][00031] Num frames 7600...
[2025-03-06 13:16:27,741][00031] Num frames 7700...
[2025-03-06 13:16:27,866][00031] Num frames 7800...
[2025-03-06 13:16:27,993][00031] Num frames 7900...
[2025-03-06 13:16:28,119][00031] Num frames 8000...
[2025-03-06 13:16:28,244][00031] Num frames 8100...
[2025-03-06 13:16:28,374][00031] Num frames 8200...
[2025-03-06 13:16:28,506][00031] Num frames 8300...
[2025-03-06 13:16:28,632][00031] Num frames 8400...
[2025-03-06 13:16:28,750][00031] Num frames 8500...
[2025-03-06 13:16:28,922][00031] Avg episode rewards: #0: 25.240, true rewards: #0: 10.740
[2025-03-06 13:16:28,923][00031] Avg episode reward: 25.240, avg true_objective: 10.740
[2025-03-06 13:16:28,935][00031] Num frames 8600...
[2025-03-06 13:16:29,055][00031] Num frames 8700...
[2025-03-06 13:16:29,174][00031] Num frames 8800...
[2025-03-06 13:16:29,296][00031] Num frames 8900...
[2025-03-06 13:16:29,416][00031] Num frames 9000...
[2025-03-06 13:16:29,544][00031] Num frames 9100...
[2025-03-06 13:16:29,717][00031] Avg episode rewards: #0: 23.886, true rewards: #0: 10.219
[2025-03-06 13:16:29,718][00031] Avg episode reward: 23.886, avg true_objective: 10.219
[2025-03-06 13:16:29,722][00031] Num frames 9200...
[2025-03-06 13:16:29,840][00031] Num frames 9300...
[2025-03-06 13:16:29,963][00031] Num frames 9400...
[2025-03-06 13:16:30,086][00031] Num frames 9500...
[2025-03-06 13:16:30,206][00031] Num frames 9600...
[2025-03-06 13:16:30,327][00031] Num frames 9700...
[2025-03-06 13:16:30,393][00031] Avg episode rewards: #0: 22.309, true rewards: #0: 9.709
[2025-03-06 13:16:30,394][00031] Avg episode reward: 22.309, avg true_objective: 9.709
[2025-03-06 13:17:03,897][00031] Replay video saved to /kaggle/working/train_dir/default_experiment/replay.mp4!