diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1,19 +1,50 @@ -[2025-09-03 02:46:28,056][03240] Worker 1 uses CPU cores [1] -[2025-09-03 02:46:28,272][03226] Starting seed is not provided -[2025-09-03 02:46:28,275][03226] Initializing actor-critic model on device cpu -[2025-09-03 02:46:28,277][03226] RunningMeanStd input shape: (3, 72, 128) -[2025-09-03 02:46:28,283][03226] RunningMeanStd input shape: (1,) -[2025-09-03 02:46:28,437][03226] ConvEncoder: input_channels=3 -[2025-09-03 02:46:28,636][03245] Worker 5 uses CPU cores [1] -[2025-09-03 02:46:29,176][03247] Worker 7 uses CPU cores [1] -[2025-09-03 02:46:29,393][03242] Worker 0 uses CPU cores [0] -[2025-09-03 02:46:29,503][03244] Worker 4 uses CPU cores [0] -[2025-09-03 02:46:29,508][03246] Worker 6 uses CPU cores [0] -[2025-09-03 02:46:29,629][03241] Worker 2 uses CPU cores [0] -[2025-09-03 02:46:29,649][03226] Conv encoder output size: 512 -[2025-09-03 02:46:29,652][03226] Policy head output size: 512 -[2025-09-03 02:46:29,736][03226] Created Actor Critic model with architecture: -[2025-09-03 02:46:29,738][03226] ActorCriticSharedWeights( +[2025-09-03 03:53:10,985][02795] Saving configuration to /content/train_dir/default_experiment/config.json... +[2025-09-03 03:53:10,987][02795] Rollout worker 0 uses device cpu +[2025-09-03 03:53:10,988][02795] Rollout worker 1 uses device cpu +[2025-09-03 03:53:10,989][02795] Rollout worker 2 uses device cpu +[2025-09-03 03:53:10,989][02795] Rollout worker 3 uses device cpu +[2025-09-03 03:53:10,991][02795] Rollout worker 4 uses device cpu +[2025-09-03 03:53:10,992][02795] Rollout worker 5 uses device cpu +[2025-09-03 03:53:10,994][02795] Rollout worker 6 uses device cpu +[2025-09-03 03:53:10,994][02795] Rollout worker 7 uses device cpu +[2025-09-03 03:53:11,167][02795] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-09-03 03:53:11,168][02795] InferenceWorker_p0-w0: min num requests: 2 +[2025-09-03 03:53:11,205][02795] Starting all processes... +[2025-09-03 03:53:11,206][02795] Starting process learner_proc0 +[2025-09-03 03:53:11,291][02795] Starting all processes... 
+[2025-09-03 03:53:11,299][02795] Starting process inference_proc0-0
+[2025-09-03 03:53:11,299][02795] Starting process rollout_proc0
+[2025-09-03 03:53:11,300][02795] Starting process rollout_proc1
+[2025-09-03 03:53:11,300][02795] Starting process rollout_proc2
+[2025-09-03 03:53:11,300][02795] Starting process rollout_proc3
+[2025-09-03 03:53:11,300][02795] Starting process rollout_proc4
+[2025-09-03 03:53:11,300][02795] Starting process rollout_proc5
+[2025-09-03 03:53:11,301][02795] Starting process rollout_proc6
+[2025-09-03 03:53:11,301][02795] Starting process rollout_proc7
+[2025-09-03 03:53:28,526][03490] Worker 5 uses CPU cores [1]
+[2025-09-03 03:53:28,545][03472] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:53:28,551][03472] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2025-09-03 03:53:28,602][03491] Worker 4 uses CPU cores [0]
+[2025-09-03 03:53:28,629][03472] Num visible devices: 1
+[2025-09-03 03:53:28,634][03472] Starting seed is not provided
+[2025-09-03 03:53:28,635][03472] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:53:28,635][03472] Initializing actor-critic model on device cuda:0
+[2025-09-03 03:53:28,639][03472] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-03 03:53:28,643][03472] RunningMeanStd input shape: (1,)
+[2025-09-03 03:53:28,689][03489] Worker 3 uses CPU cores [1]
+[2025-09-03 03:53:28,690][03472] ConvEncoder: input_channels=3
+[2025-09-03 03:53:28,722][03487] Worker 1 uses CPU cores [1]
+[2025-09-03 03:53:28,974][03488] Worker 2 uses CPU cores [0]
+[2025-09-03 03:53:28,979][03485] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:53:28,980][03485] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
+[2025-09-03 03:53:28,992][03493] Worker 7 uses CPU cores [1]
+[2025-09-03 03:53:29,017][03485] Num visible devices: 1
+[2025-09-03 03:53:29,023][03492] Worker 6 uses CPU cores [0]
+[2025-09-03 03:53:29,069][03486] Worker 0 uses CPU cores [0]
+[2025-09-03 03:53:29,103][03472] Conv encoder output size: 512
+[2025-09-03 03:53:29,104][03472] Policy head output size: 512
+[2025-09-03 03:53:29,157][03472] Created Actor Critic model with architecture:
+[2025-09-03 03:53:29,157][03472] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -54,167 +85,204 @@
       (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
     )
   )
-[2025-09-03 02:46:29,766][03243] Worker 3 uses CPU cores [1]
-[2025-09-03 02:46:30,126][03226] Using optimizer <class 'torch.optim.adam.Adam'>
-[2025-09-03 02:46:38,640][03226] No checkpoints found
-[2025-09-03 02:46:38,640][03226] Did not load from checkpoint, starting from scratch!
-[2025-09-03 02:46:38,641][03226] Initialized policy 0 weights for model version 0
-[2025-09-03 02:46:38,644][03226] LearnerWorker_p0 finished initialization!
-[2025-09-03 02:46:38,656][03239] RunningMeanStd input shape: (3, 72, 128)
-[2025-09-03 02:46:38,660][03239] RunningMeanStd input shape: (1,)
-[2025-09-03 02:46:38,695][03239] ConvEncoder: input_channels=3
-[2025-09-03 02:46:38,885][03239] Conv encoder output size: 512
-[2025-09-03 02:46:38,886][03239] Policy head output size: 512
-[2025-09-03 02:46:39,233][03241] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:39,239][03246] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:39,261][03244] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:39,259][03242] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:39,270][03240] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:39,272][03245] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:39,274][03243] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:39,282][03247] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 02:46:40,599][03246] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:40,599][03244] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:40,688][03245] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:40,696][03243] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:41,645][03246] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:41,694][03241] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:42,167][03247] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:42,211][03240] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:42,221][03243] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:43,266][03241] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:43,360][03240] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:43,779][03246] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:43,857][03243] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:44,140][03242] Decorrelating experience for 0 frames...
-[2025-09-03 02:46:44,199][03244] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:44,784][03247] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:44,794][03246] Decorrelating experience for 96 frames...
-[2025-09-03 02:46:45,756][03243] Decorrelating experience for 96 frames...
-[2025-09-03 02:46:46,344][03240] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:47,190][03241] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:49,420][03247] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:50,669][03245] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:50,873][03242] Decorrelating experience for 32 frames...
-[2025-09-03 02:46:51,178][03240] Decorrelating experience for 96 frames...
-[2025-09-03 02:46:53,391][03244] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:53,516][03247] Decorrelating experience for 96 frames...
-[2025-09-03 02:46:54,675][03242] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:55,066][03245] Decorrelating experience for 64 frames...
-[2025-09-03 02:46:59,229][03241] Decorrelating experience for 96 frames...
-[2025-09-03 02:46:59,808][03244] Decorrelating experience for 96 frames...
-[2025-09-03 02:47:00,304][03245] Decorrelating experience for 96 frames...
-[2025-09-03 02:47:01,552][03226] Signal inference workers to stop experience collection...
-[2025-09-03 02:47:01,636][03239] InferenceWorker_p0-w0: stopping experience collection
-[2025-09-03 02:47:01,632][03242] Decorrelating experience for 96 frames...
-[2025-09-03 02:47:02,639][03226] Signal inference workers to resume experience collection...
-[2025-09-03 02:47:02,642][03239] InferenceWorker_p0-w0: resuming experience collection
-[2025-09-03 02:48:00,561][03226] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000010_40960.pth...
-[2025-09-03 02:48:00,570][03239] Updated weights for policy 0, policy_version 10 (0.1478)
-[2025-09-03 02:48:58,804][03239] Updated weights for policy 0, policy_version 20 (0.1955)
-[2025-09-03 02:49:33,865][03226] Saving new best policy, reward=4.331!
-[2025-09-03 02:49:40,915][03226] Saving new best policy, reward=4.400!
-[2025-09-03 02:49:46,146][03226] Saving new best policy, reward=4.435!
-[2025-09-03 02:49:51,216][03226] Saving new best policy, reward=4.457!
-[2025-09-03 02:49:58,552][03226] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth...
-[2025-09-03 02:49:58,559][03239] Updated weights for policy 0, policy_version 30 (0.0714)
-[2025-09-03 02:49:58,676][03226] Saving new best policy, reward=4.492!
-[2025-09-03 02:50:09,279][03226] Saving new best policy, reward=4.502!
-[2025-09-03 02:50:16,831][03226] Saving new best policy, reward=4.520!
-[2025-09-03 02:50:21,732][03226] Saving new best policy, reward=4.524!
-[2025-09-03 02:50:27,005][03226] Saving new best policy, reward=4.544!
-[2025-09-03 02:50:55,902][03239] Updated weights for policy 0, policy_version 40 (0.1275)
-[2025-09-03 02:51:54,138][03239] Updated weights for policy 0, policy_version 50 (0.0082)
-[2025-09-03 02:51:58,747][03226] Signal inference workers to stop experience collection... (50 times)
-[2025-09-03 02:51:58,835][03239] InferenceWorker_p0-w0: stopping experience collection (50 times)
-[2025-09-03 02:52:00,256][03226] Signal inference workers to resume experience collection... (50 times)
-[2025-09-03 02:52:00,258][03226] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000051_208896.pth...
-[2025-09-03 02:52:00,258][03239] InferenceWorker_p0-w0: resuming experience collection (50 times)
-[2025-09-03 02:52:00,365][03226] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000010_40960.pth
-[2025-09-03 02:52:52,047][03239] Updated weights for policy 0, policy_version 60 (0.0673)
-[2025-09-03 02:52:57,042][03226] Saving new best policy, reward=4.552!
-[2025-09-03 02:53:21,888][03226] Saving new best policy, reward=4.592!
-[2025-09-03 02:53:32,172][03226] Saving new best policy, reward=4.645!
-[2025-09-03 02:53:50,100][03239] Updated weights for policy 0, policy_version 70 (0.0075)
-[2025-09-03 02:54:01,635][03226] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth...
-[2025-09-03 02:54:01,757][03226] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000030_122880.pth
-[2025-09-03 02:54:48,292][03239] Updated weights for policy 0, policy_version 80 (0.3457)
-[2025-09-03 02:55:44,867][03239] Updated weights for policy 0, policy_version 90 (0.0705)
-[2025-09-03 02:56:01,796][03226] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000093_380928.pth...
-[2025-09-03 02:56:01,890][03226] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000051_208896.pth
-[2025-09-03 02:56:38,800][03226] Stopping Batcher_0...
-[2025-09-03 02:56:38,802][03226] Loop batcher_evt_loop terminating...
-[2025-09-03 02:56:39,182][03239] Weights refcount: 2 0
-[2025-09-03 02:56:39,200][03239] Stopping InferenceWorker_p0-w0...
-[2025-09-03 02:56:39,200][03239] Loop inference_proc0-0_evt_loop terminating...
-[2025-09-03 02:56:39,775][03243] Stopping RolloutWorker_w3...
-[2025-09-03 02:56:39,813][03243] Loop rollout_proc3_evt_loop terminating...
-[2025-09-03 02:56:39,922][03247] Stopping RolloutWorker_w7...
-[2025-09-03 02:56:39,932][03247] Loop rollout_proc7_evt_loop terminating...
-[2025-09-03 02:56:39,900][03240] Stopping RolloutWorker_w1...
-[2025-09-03 02:56:39,953][03246] Stopping RolloutWorker_w6...
-[2025-09-03 02:56:39,975][03240] Loop rollout_proc1_evt_loop terminating...
-[2025-09-03 02:56:39,990][03246] Loop rollout_proc6_evt_loop terminating...
-[2025-09-03 02:56:40,061][03241] Stopping RolloutWorker_w2...
-[2025-09-03 02:56:40,131][03241] Loop rollout_proc2_evt_loop terminating...
-[2025-09-03 02:56:40,150][03244] Stopping RolloutWorker_w4...
-[2025-09-03 02:56:40,205][03244] Loop rollout_proc4_evt_loop terminating...
-[2025-09-03 02:56:40,192][03245] Stopping RolloutWorker_w5...
-[2025-09-03 02:56:40,243][03245] Loop rollout_proc5_evt_loop terminating...
-[2025-09-03 02:56:40,266][03242] Stopping RolloutWorker_w0...
-[2025-09-03 02:56:40,307][03242] Loop rollout_proc0_evt_loop terminating...
-[2025-09-03 02:56:46,010][03226] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth...
-[2025-09-03 02:56:46,136][03226] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000072_294912.pth
-[2025-09-03 02:56:46,156][03226] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth...
-[2025-09-03 02:56:46,321][03226] Stopping LearnerWorker_p0...
-[2025-09-03 02:56:46,324][03226] Loop learner_proc0_evt_loop terminating...
-[2025-09-03 03:12:33,849][14933] Saving configuration to /content/train_dir/default_experiment/config.json...
-[2025-09-03 03:12:33,854][14933] Rollout worker 0 uses device cpu
-[2025-09-03 03:12:33,856][14933] Rollout worker 1 uses device cpu
-[2025-09-03 03:12:33,857][14933] Rollout worker 2 uses device cpu
-[2025-09-03 03:12:33,858][14933] Rollout worker 3 uses device cpu
-[2025-09-03 03:12:33,859][14933] Rollout worker 4 uses device cpu
-[2025-09-03 03:12:33,860][14933] Rollout worker 5 uses device cpu
-[2025-09-03 03:12:33,862][14933] Rollout worker 6 uses device cpu
-[2025-09-03 03:12:33,862][14933] Rollout worker 7 uses device cpu
-[2025-09-03 03:12:33,964][14933] InferenceWorker_p0-w0: min num requests: 2
-[2025-09-03 03:12:34,018][14933] Starting all processes...
-[2025-09-03 03:12:34,019][14933] Starting process learner_proc0
-[2025-09-03 03:12:34,130][14933] Starting all processes...
-[2025-09-03 03:12:34,143][14933] Starting process inference_proc0-0
-[2025-09-03 03:12:34,145][14933] Starting process rollout_proc0
-[2025-09-03 03:12:34,160][14933] Starting process rollout_proc1
-[2025-09-03 03:12:34,160][14933] Starting process rollout_proc2
-[2025-09-03 03:12:34,161][14933] Starting process rollout_proc3
-[2025-09-03 03:12:34,161][14933] Starting process rollout_proc4
-[2025-09-03 03:12:34,161][14933] Starting process rollout_proc5
-[2025-09-03 03:12:34,161][14933] Starting process rollout_proc6
-[2025-09-03 03:12:34,161][14933] Starting process rollout_proc7
-[2025-09-03 03:13:05,552][15657] Starting seed is not provided
-[2025-09-03 03:13:05,553][15657] Initializing actor-critic model on device cpu
-[2025-09-03 03:13:05,555][15657] RunningMeanStd input shape: (3, 72, 128)
-[2025-09-03 03:13:05,558][15657] RunningMeanStd input shape: (1,)
-[2025-09-03 03:13:05,567][14933] Heartbeat connected on Batcher_0
-[2025-09-03 03:13:05,607][15657] ConvEncoder: input_channels=3
-[2025-09-03 03:13:06,118][15674] Worker 1 uses CPU cores [1]
-[2025-09-03 03:13:06,157][14933] Heartbeat connected on RolloutWorker_w1
-[2025-09-03 03:13:06,159][15671] Worker 0 uses CPU cores [0]
-[2025-09-03 03:13:06,165][14933] Heartbeat connected on RolloutWorker_w0
-[2025-09-03 03:13:06,461][15677] Worker 7 uses CPU cores [1]
-[2025-09-03 03:13:06,502][14933] Heartbeat connected on RolloutWorker_w7
-[2025-09-03 03:13:06,518][14933] Heartbeat connected on InferenceWorker_p0-w0
-[2025-09-03 03:13:06,634][15676] Worker 6 uses CPU cores [0]
-[2025-09-03 03:13:06,642][14933] Heartbeat connected on RolloutWorker_w6
-[2025-09-03 03:13:06,679][15672] Worker 5 uses CPU cores [1]
-[2025-09-03 03:13:06,700][14933] Heartbeat connected on RolloutWorker_w5
-[2025-09-03 03:13:06,722][15678] Worker 4 uses CPU cores [0]
-[2025-09-03 03:13:06,726][14933] Heartbeat connected on RolloutWorker_w4
-[2025-09-03 03:13:06,747][15657] Conv encoder output size: 512
-[2025-09-03 03:13:06,750][15657] Policy head output size: 512
-[2025-09-03 03:13:06,777][15675] Worker 2 uses CPU cores [0]
-[2025-09-03 03:13:06,781][14933] Heartbeat connected on RolloutWorker_w2
-[2025-09-03 03:13:06,814][15657] Created Actor Critic model with architecture:
-[2025-09-03 03:13:06,817][15657] ActorCriticSharedWeights(
+[2025-09-03 03:53:29,565][03472] Using optimizer <class 'torch.optim.adam.Adam'>
+[2025-09-03 03:53:31,158][02795] Heartbeat connected on Batcher_0
+[2025-09-03 03:53:31,168][02795] Heartbeat connected on InferenceWorker_p0-w0
+[2025-09-03 03:53:31,176][02795] Heartbeat connected on RolloutWorker_w0
+[2025-09-03 03:53:31,184][02795] Heartbeat connected on RolloutWorker_w1
+[2025-09-03 03:53:31,185][02795] Heartbeat connected on RolloutWorker_w2
+[2025-09-03 03:53:31,192][02795] Heartbeat connected on RolloutWorker_w3
+[2025-09-03 03:53:31,194][02795] Heartbeat connected on RolloutWorker_w4
+[2025-09-03 03:53:31,198][02795] Heartbeat connected on RolloutWorker_w5
+[2025-09-03 03:53:31,201][02795] Heartbeat connected on RolloutWorker_w6
+[2025-09-03 03:53:31,205][02795] Heartbeat connected on RolloutWorker_w7
+[2025-09-03 03:53:34,266][03472] No checkpoints found
+[2025-09-03 03:53:34,266][03472] Did not load from checkpoint, starting from scratch!
+[2025-09-03 03:53:34,266][03472] Initialized policy 0 weights for model version 0
+[2025-09-03 03:53:34,269][03472] LearnerWorker_p0 finished initialization!
+[2025-09-03 03:53:34,270][02795] Heartbeat connected on LearnerWorker_p0
+[2025-09-03 03:53:34,272][03472] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:53:34,403][02795] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-03 03:53:34,485][03485] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-03 03:53:34,486][03485] RunningMeanStd input shape: (1,)
+[2025-09-03 03:53:34,496][03485] ConvEncoder: input_channels=3
+[2025-09-03 03:53:34,601][03485] Conv encoder output size: 512
+[2025-09-03 03:53:34,601][03485] Policy head output size: 512
+[2025-09-03 03:53:34,637][02795] Inference worker 0-0 is ready!
+[2025-09-03 03:53:34,638][02795] All inference workers are ready! Signal rollout workers to start!
+[2025-09-03 03:53:34,819][03486] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:34,821][03490] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:34,824][03489] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:34,825][03493] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:34,819][03487] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:34,825][03491] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:34,826][03488] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:34,823][03492] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:53:35,812][03491] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:36,586][03489] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:36,588][03490] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:36,590][03487] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:36,592][03493] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:37,405][03491] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:38,616][03488] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:38,640][03492] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:38,642][03486] Decorrelating experience for 0 frames...
+[2025-09-03 03:53:38,752][03490] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:38,754][03489] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:38,764][03487] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:39,347][03493] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:39,402][02795] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-03 03:53:39,730][03491] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:40,119][03486] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:40,131][03492] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:40,318][03490] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:40,585][03487] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:40,820][03491] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:41,242][03493] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:41,305][03486] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:41,890][03489] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:42,159][03490] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:42,165][03492] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:42,397][03487] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:42,809][03493] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:42,817][03488] Decorrelating experience for 32 frames...
+[2025-09-03 03:53:43,002][03486] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:44,112][03492] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:44,402][02795] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 38.6. Samples: 386. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-03 03:53:44,403][02795] Avg episode reward: [(0, '1.088')]
+[2025-09-03 03:53:46,186][03488] Decorrelating experience for 64 frames...
+[2025-09-03 03:53:46,526][03489] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:46,995][03472] Signal inference workers to stop experience collection...
+[2025-09-03 03:53:47,027][03485] InferenceWorker_p0-w0: stopping experience collection
+[2025-09-03 03:53:47,343][03488] Decorrelating experience for 96 frames...
+[2025-09-03 03:53:49,211][03472] Signal inference workers to resume experience collection...
+[2025-09-03 03:53:49,212][03485] InferenceWorker_p0-w0: resuming experience collection
+[2025-09-03 03:53:49,402][02795] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 142.0. Samples: 2130. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2025-09-03 03:53:49,403][02795] Avg episode reward: [(0, '3.046')]
+[2025-09-03 03:53:49,889][03472] Stopping Batcher_0...
+[2025-09-03 03:53:49,889][03472] Loop batcher_evt_loop terminating...
+[2025-09-03 03:53:49,891][02795] Component Batcher_0 stopped!
+[2025-09-03 03:53:49,894][03472] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
+[2025-09-03 03:53:50,028][03485] Weights refcount: 2 0
+[2025-09-03 03:53:50,056][02795] Component InferenceWorker_p0-w0 stopped!
+[2025-09-03 03:53:50,059][03485] Stopping InferenceWorker_p0-w0...
+[2025-09-03 03:53:50,059][03485] Loop inference_proc0-0_evt_loop terminating...
+[2025-09-03 03:53:50,083][03472] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth...
+[2025-09-03 03:53:50,330][02795] Component LearnerWorker_p0 stopped!
+[2025-09-03 03:53:50,334][03472] Stopping LearnerWorker_p0...
+[2025-09-03 03:53:50,334][03472] Loop learner_proc0_evt_loop terminating...
+[2025-09-03 03:53:50,428][02795] Component RolloutWorker_w4 stopped!
+[2025-09-03 03:53:50,432][03491] Stopping RolloutWorker_w4...
+[2025-09-03 03:53:50,432][03491] Loop rollout_proc4_evt_loop terminating...
+[2025-09-03 03:53:50,508][02795] Component RolloutWorker_w6 stopped!
+[2025-09-03 03:53:50,511][03492] Stopping RolloutWorker_w6...
+[2025-09-03 03:53:50,517][02795] Component RolloutWorker_w0 stopped!
+[2025-09-03 03:53:50,520][03492] Loop rollout_proc6_evt_loop terminating...
+[2025-09-03 03:53:50,520][03486] Stopping RolloutWorker_w0...
+[2025-09-03 03:53:50,522][03486] Loop rollout_proc0_evt_loop terminating...
+[2025-09-03 03:53:50,526][02795] Component RolloutWorker_w2 stopped!
+[2025-09-03 03:53:50,528][03488] Stopping RolloutWorker_w2...
+[2025-09-03 03:53:50,529][03488] Loop rollout_proc2_evt_loop terminating...
+[2025-09-03 03:53:50,645][02795] Component RolloutWorker_w3 stopped!
+[2025-09-03 03:53:50,648][03489] Stopping RolloutWorker_w3...
+[2025-09-03 03:53:50,649][03489] Loop rollout_proc3_evt_loop terminating...
+[2025-09-03 03:53:50,677][02795] Component RolloutWorker_w5 stopped!
+[2025-09-03 03:53:50,678][03490] Stopping RolloutWorker_w5...
+[2025-09-03 03:53:50,684][03490] Loop rollout_proc5_evt_loop terminating...
+[2025-09-03 03:53:50,696][02795] Component RolloutWorker_w1 stopped!
+[2025-09-03 03:53:50,697][03487] Stopping RolloutWorker_w1...
+[2025-09-03 03:53:50,698][03487] Loop rollout_proc1_evt_loop terminating...
+[2025-09-03 03:53:50,785][02795] Component RolloutWorker_w7 stopped!
+[2025-09-03 03:53:50,787][02795] Waiting for process learner_proc0 to stop...
+[2025-09-03 03:53:50,788][03493] Stopping RolloutWorker_w7...
+[2025-09-03 03:53:50,788][03493] Loop rollout_proc7_evt_loop terminating...
+[2025-09-03 03:53:52,639][02795] Waiting for process inference_proc0-0 to join...
+[2025-09-03 03:53:52,852][02795] Waiting for process rollout_proc0 to join...
+[2025-09-03 03:53:54,979][02795] Waiting for process rollout_proc1 to join...
+[2025-09-03 03:53:54,980][02795] Waiting for process rollout_proc2 to join...
+[2025-09-03 03:53:54,981][02795] Waiting for process rollout_proc3 to join...
+[2025-09-03 03:53:54,982][02795] Waiting for process rollout_proc4 to join...
+[2025-09-03 03:53:54,983][02795] Waiting for process rollout_proc5 to join...
+[2025-09-03 03:53:54,984][02795] Waiting for process rollout_proc6 to join...
+[2025-09-03 03:53:54,985][02795] Waiting for process rollout_proc7 to join...
+[2025-09-03 03:53:54,986][02795] Batcher 0 profile tree view:
+batching: 0.0738, releasing_batches: 0.0004
+[2025-09-03 03:53:54,987][02795] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 8.3224
+update_model: 0.0364
+  weight_update: 0.0018
+one_step: 0.0185
+  handle_policy_step: 4.5182
+    deserialize: 0.0500, stack: 0.0118, obs_to_device_normalize: 0.7552, forward: 3.2239, send_messages: 0.1353
+    prepare_outputs: 0.2458
+      to_cpu: 0.1390
+[2025-09-03 03:53:54,989][02795] Learner 0 profile tree view:
+misc: 0.0000, prepare_batch: 2.4031
+train: 2.3071
+  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0011, kl_divergence: 0.0093, after_optimizer: 0.1153
+  calculate_losses: 0.6184
+    losses_init: 0.0000, forward_head: 0.4467, bptt_initial: 0.0725, tail: 0.0364, advantages_returns: 0.0009, losses: 0.0562
+    bptt: 0.0053
+      bptt_forward_core: 0.0052
+  update: 1.5621
+    clip: 0.0854
+[2025-09-03 03:53:54,990][02795] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.0084, enqueue_policy_requests: 0.3402, env_step: 3.4139, overhead: 0.0679, complete_rollouts: 0.0008
+save_policy_outputs: 0.1217
+  split_output_tensors: 0.0499
+[2025-09-03 03:53:54,991][02795] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.0013, enqueue_policy_requests: 0.7778, env_step: 3.1723, overhead: 0.0885, complete_rollouts: 0.0417
+save_policy_outputs: 0.0993
+  split_output_tensors: 0.0403
+[2025-09-03 03:53:54,992][02795] Loop Runner_EvtLoop terminating...
+[2025-09-03 03:53:54,993][02795] Runner profile tree view:
+main_loop: 43.7883
+[2025-09-03 03:53:54,994][02795] Collected {0: 8192}, FPS: 187.1
+[2025-09-03 03:57:01,407][08012] Saving configuration to /content/train_dir/default_experiment/config.json...
+[2025-09-03 03:57:01,411][08012] Rollout worker 0 uses device cpu
+[2025-09-03 03:57:01,412][08012] Rollout worker 1 uses device cpu
+[2025-09-03 03:57:01,413][08012] Rollout worker 2 uses device cpu
+[2025-09-03 03:57:01,414][08012] Rollout worker 3 uses device cpu
+[2025-09-03 03:57:01,415][08012] Rollout worker 4 uses device cpu
+[2025-09-03 03:57:01,416][08012] Rollout worker 5 uses device cpu
+[2025-09-03 03:57:01,416][08012] Rollout worker 6 uses device cpu
+[2025-09-03 03:57:01,417][08012] Rollout worker 7 uses device cpu
+[2025-09-03 03:57:01,518][08012] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:57:01,519][08012] InferenceWorker_p0-w0: min num requests: 2
+[2025-09-03 03:57:01,558][08012] Starting all processes...
+[2025-09-03 03:57:01,559][08012] Starting process learner_proc0
+[2025-09-03 03:57:01,630][08012] Starting all processes...
+[2025-09-03 03:57:01,637][08012] Starting process inference_proc0-0
+[2025-09-03 03:57:01,638][08012] Starting process rollout_proc0
+[2025-09-03 03:57:01,639][08012] Starting process rollout_proc1
+[2025-09-03 03:57:01,640][08012] Starting process rollout_proc2
+[2025-09-03 03:57:01,640][08012] Starting process rollout_proc3
+[2025-09-03 03:57:01,640][08012] Starting process rollout_proc4
+[2025-09-03 03:57:01,640][08012] Starting process rollout_proc5
+[2025-09-03 03:57:01,640][08012] Starting process rollout_proc6
+[2025-09-03 03:57:01,640][08012] Starting process rollout_proc7
+[2025-09-03 03:57:17,088][08388] Worker 4 uses CPU cores [0]
+[2025-09-03 03:57:17,387][08389] Worker 5 uses CPU cores [1]
+[2025-09-03 03:57:17,577][08385] Worker 0 uses CPU cores [0]
+[2025-09-03 03:57:17,603][08387] Worker 3 uses CPU cores [1]
+[2025-09-03 03:57:17,680][08370] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:57:17,680][08370] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
+[2025-09-03 03:57:17,750][08391] Worker 7 uses CPU cores [1]
+[2025-09-03 03:57:17,782][08370] Num visible devices: 1
+[2025-09-03 03:57:17,788][08370] Starting seed is not provided
+[2025-09-03 03:57:17,788][08370] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:57:17,788][08370] Initializing actor-critic model on device cuda:0
+[2025-09-03 03:57:17,789][08370] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-03 03:57:17,802][08370] RunningMeanStd input shape: (1,)
+[2025-09-03 03:57:17,820][08390] Worker 6 uses CPU cores [0]
+[2025-09-03 03:57:17,833][08386] Worker 2 uses CPU cores [0]
+[2025-09-03 03:57:17,870][08384] Worker 1 uses CPU cores [1]
+[2025-09-03 03:57:17,874][08370] ConvEncoder: input_channels=3
+[2025-09-03 03:57:18,108][08370] Conv encoder output size: 512
+[2025-09-03 03:57:18,110][08370] Policy head output size: 512
+[2025-09-03 03:57:18,134][08370] Created Actor Critic model with architecture:
+[2025-09-03 03:57:18,135][08370] ActorCriticSharedWeights(
   (obs_normalizer): ObservationNormalizer(
     (running_mean_std): RunningMeanStdDictInPlace(
       (running_mean_std): ModuleDict(
@@ -255,1313 +323,853 @@
       (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
     )
   )
-[2025-09-03 03:13:06,819][15673] Worker 3 uses CPU cores [1]
-[2025-09-03 03:13:06,822][14933] Heartbeat connected on RolloutWorker_w3
-[2025-09-03 03:13:07,342][15657] Using optimizer <class 'torch.optim.adam.Adam'>
-[2025-09-03 03:13:09,831][15657] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth...
-[2025-09-03 03:13:09,924][15657] Loading model from checkpoint
-[2025-09-03 03:13:09,932][15657] Loaded experiment state at self.train_step=100, self.env_steps=409600
-[2025-09-03 03:13:09,933][15657] Initialized policy 0 weights for model version 100
-[2025-09-03 03:13:09,941][15670] RunningMeanStd input shape: (3, 72, 128)
-[2025-09-03 03:13:09,946][15670] RunningMeanStd input shape: (1,)
-[2025-09-03 03:13:09,948][15657] LearnerWorker_p0 finished initialization!
-[2025-09-03 03:13:09,951][14933] Heartbeat connected on LearnerWorker_p0
-[2025-09-03 03:13:09,994][15670] ConvEncoder: input_channels=3
-[2025-09-03 03:13:10,364][15670] Conv encoder output size: 512
-[2025-09-03 03:13:10,365][15670] Policy head output size: 512
-[2025-09-03 03:13:10,403][14933] Inference worker 0-0 is ready!
-[2025-09-03 03:13:10,405][14933] All inference workers are ready! Signal rollout workers to start!
-[2025-09-03 03:13:10,752][15677] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:10,773][15672] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:10,775][15673] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:10,811][15674] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:11,015][15671] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:11,023][15676] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:11,032][15678] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:11,039][15675] Doom resolution: 160x120, resize resolution: (128, 72)
-[2025-09-03 03:13:13,049][15678] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:13,055][15676] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:13,468][15672] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:13,471][15677] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:13,507][15673] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:13,533][15674] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:13,947][15678] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:14,823][14933] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 409600. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-09-03 03:13:15,312][15676] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:15,325][15671] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:15,391][15672] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:15,417][15675] Decorrelating experience for 0 frames...
-[2025-09-03 03:13:15,421][15677] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:15,490][15674] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:15,501][15673] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:16,816][15671] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:16,859][15678] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:17,500][15676] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:18,044][15672] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:18,070][15677] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:18,177][15673] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:18,467][15675] Decorrelating experience for 32 frames...
-[2025-09-03 03:13:18,778][15678] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:18,992][15674] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:19,823][14933] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 409600. Throughput: 0: 4.8. Samples: 24. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-09-03 03:13:20,091][15672] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:20,243][15673] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:20,351][15671] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:20,455][15676] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:21,326][15674] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:23,316][15677] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:24,448][15675] Decorrelating experience for 64 frames...
-[2025-09-03 03:13:24,823][14933] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 409600. Throughput: 0: 50.4. Samples: 504. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-09-03 03:13:24,825][14933] Avg episode reward: [(0, '1.480')]
-[2025-09-03 03:13:25,415][15671] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:29,823][14933] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 409600. Throughput: 0: 105.1. Samples: 1576. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
-[2025-09-03 03:13:29,825][14933] Avg episode reward: [(0, '2.357')]
-[2025-09-03 03:13:31,338][15675] Decorrelating experience for 96 frames...
-[2025-09-03 03:13:31,636][15657] Signal inference workers to stop experience collection...
-[2025-09-03 03:13:31,722][15670] InferenceWorker_p0-w0: stopping experience collection
-[2025-09-03 03:13:33,874][15657] Signal inference workers to resume experience collection...
-[2025-09-03 03:13:33,875][15670] InferenceWorker_p0-w0: resuming experience collection
-[2025-09-03 03:13:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 204.8, 300 sec: 204.8). Total num frames: 413696. Throughput: 0: 127.9. Samples: 2558. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-09-03 03:13:34,826][14933] Avg episode reward: [(0, '2.722')]
-[2025-09-03 03:13:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 327.7, 300 sec: 327.7). Total num frames: 417792. Throughput: 0: 137.7. Samples: 3442. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-09-03 03:13:39,825][14933] Avg episode reward: [(0, '3.085')]
-[2025-09-03 03:13:44,823][14933] Fps is (10 sec: 409.6, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 417792. Throughput: 0: 139.9. Samples: 4196. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
-[2025-09-03 03:13:44,827][14933] Avg episode reward: [(0, '3.426')]
-[2025-09-03 03:13:49,823][14933] Fps is (10 sec: 409.6, 60 sec: 351.1, 300 sec: 351.1). Total num frames: 421888. Throughput: 0: 150.0. Samples: 5250. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:13:49,828][14933] Avg episode reward: [(0, '3.534')]
-[2025-09-03 03:13:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 409.6, 300 sec: 409.6). Total num frames: 425984. Throughput: 0: 143.7. Samples: 5748. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:13:54,825][14933] Avg episode reward: [(0, '3.790')]
-[2025-09-03 03:13:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 455.1, 300 sec: 455.1). Total num frames: 430080. Throughput: 0: 149.9. Samples: 6746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:13:59,827][14933] Avg episode reward: [(0, '3.829')]
-[2025-09-03 03:14:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 491.5, 300 sec: 491.5). Total num frames: 434176. Throughput: 0: 169.0. Samples: 7628. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:14:04,825][14933] Avg episode reward: [(0, '3.910')]
-[2025-09-03 03:14:09,827][14933] Fps is (10 sec: 818.9, 60 sec: 521.3, 300 sec: 521.3). Total num frames: 438272. Throughput: 0: 179.2. Samples: 8570. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:14:09,829][14933] Avg episode reward: [(0, '3.957')]
-[2025-09-03 03:14:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 546.1, 300 sec: 546.1). Total num frames: 442368. Throughput: 0: 181.6. Samples: 9746. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:14:14,828][14933] Avg episode reward: [(0, '4.001')]
-[2025-09-03 03:14:19,823][14933] Fps is (10 sec: 409.8, 60 sec: 546.1, 300 sec: 504.1). Total num frames: 442368. Throughput: 0: 177.2. Samples: 10530. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:14:19,828][14933] Avg episode reward: [(0, '4.057')]
-[2025-09-03 03:14:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 526.6). Total num frames: 446464. Throughput: 0: 164.4. Samples: 10840. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:24,825][14933] Avg episode reward: [(0, '4.130')]
-[2025-09-03 03:14:26,874][15670] Updated weights for policy 0, policy_version 110 (0.0778)
-[2025-09-03 03:14:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 546.1). Total num frames: 450560. Throughput: 0: 179.5. Samples: 12272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:29,825][14933] Avg episode reward: [(0, '4.365')]
-[2025-09-03 03:14:31,544][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000111_454656.pth...
-[2025-09-03 03:14:31,637][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000093_380928.pth
-[2025-09-03 03:14:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 563.2). Total num frames: 454656. Throughput: 0: 178.7. Samples: 13290. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:34,827][14933] Avg episode reward: [(0, '4.463')]
-[2025-09-03 03:14:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 578.3). Total num frames: 458752. Throughput: 0: 178.5. Samples: 13782. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:39,827][14933] Avg episode reward: [(0, '4.478')]
-[2025-09-03 03:14:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 591.6). Total num frames: 462848. Throughput: 0: 179.5. Samples: 14824. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:44,828][14933] Avg episode reward: [(0, '4.475')]
-[2025-09-03 03:14:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 603.6). Total num frames: 466944. Throughput: 0: 183.7. Samples: 15896. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:49,826][14933] Avg episode reward: [(0, '4.538')]
-[2025-09-03 03:14:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 573.4). Total num frames: 466944. Throughput: 0: 176.2. Samples: 16498. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:54,825][14933] Avg episode reward: [(0, '4.647')]
-[2025-09-03 03:14:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 585.1). Total num frames: 471040. Throughput: 0: 175.6. Samples: 17646. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:14:59,831][14933] Avg episode reward: [(0, '4.692')]
-[2025-09-03 03:15:01,205][15657] Saving new best policy, reward=4.647!
-[2025-09-03 03:15:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 595.8). Total num frames: 475136. Throughput: 0: 185.0. Samples: 18856. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:15:04,825][14933] Avg episode reward: [(0, '4.623')]
-[2025-09-03 03:15:07,428][15657] Saving new best policy, reward=4.692!
-[2025-09-03 03:15:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 605.5). Total num frames: 479232. Throughput: 0: 182.2. Samples: 19040. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:15:09,829][14933] Avg episode reward: [(0, '4.576')]
-[2025-09-03 03:15:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 614.4). Total num frames: 483328. Throughput: 0: 173.1. Samples: 20060. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:15:14,828][14933] Avg episode reward: [(0, '4.593')]
-[2025-09-03 03:15:19,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 622.6). Total num frames: 487424. Throughput: 0: 177.1. Samples: 21258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:15:19,827][14933] Avg episode reward: [(0, '4.531')]
-[2025-09-03 03:15:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 598.6). Total num frames: 487424. Throughput: 0: 183.0. Samples: 22018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:15:24,825][14933] Avg episode reward: [(0, '4.469')]
-[2025-09-03 03:15:25,163][15670] Updated weights for policy 0, policy_version 120 (0.0696)
-[2025-09-03 03:15:29,823][14933] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 606.8). Total num frames: 491520. Throughput: 0: 174.1. Samples: 22658. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:15:29,827][14933] Avg episode reward: [(0, '4.427')]
-[2025-09-03 03:15:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 614.4). Total num frames: 495616. Throughput: 0: 179.8. Samples: 23986. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:15:34,830][14933] Avg episode reward: [(0, '4.499')]
-[2025-09-03 03:15:39,824][14933] Fps is (10 sec: 819.1, 60 sec: 682.6, 300 sec: 621.5). Total num frames: 499712. Throughput: 0: 176.4. Samples: 24438. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:15:39,826][14933] Avg episode reward: [(0, '4.426')]
-[2025-09-03 03:15:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 628.1). Total num frames: 503808. Throughput: 0: 171.7. Samples: 25372. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:15:44,828][14933] Avg episode reward: [(0, '4.405')]
-[2025-09-03 03:15:49,823][14933] Fps is (10 sec: 819.3, 60 sec: 682.7, 300 sec: 634.2). Total num frames: 507904. Throughput: 0: 170.2. Samples: 26516. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:15:49,830][14933] Avg episode reward: [(0, '4.379')]
-[2025-09-03 03:15:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 640.0). Total num frames: 512000. Throughput: 0: 180.5. Samples: 27164. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:15:54,829][14933] Avg episode reward: [(0, '4.235')]
-[2025-09-03 03:15:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 645.4). Total num frames: 516096. Throughput: 0: 181.2. Samples: 28212. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:15:59,827][14933] Avg episode reward: [(0, '4.213')]
-[2025-09-03 03:16:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 626.4). Total num frames: 516096. Throughput: 0: 177.2. Samples: 29230. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:16:04,830][14933] Avg episode reward: [(0, '4.286')]
-[2025-09-03 03:16:09,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 632.0). Total num frames: 520192. Throughput: 0: 171.2. Samples: 29720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:16:09,829][14933] Avg episode reward: [(0, '4.371')]
-[2025-09-03 03:16:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 637.2). Total num frames: 524288. Throughput: 0: 182.5. Samples: 30870. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:16:14,833][14933] Avg episode reward: [(0, '4.400')]
-[2025-09-03 03:16:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 642.1). Total num frames: 528384. Throughput: 0: 169.3. Samples: 31604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:16:19,828][14933] Avg episode reward: [(0, '4.488')]
-[2025-09-03 03:16:23,386][15670] Updated weights for policy 0, policy_version 130 (0.2471)
-[2025-09-03 03:16:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 646.7). Total num frames: 532480. Throughput: 0: 173.6. Samples: 32250. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:16:24,828][14933] Avg episode reward: [(0, '4.530')]
-[2025-09-03 03:16:29,831][14933] Fps is (10 sec: 818.5, 60 sec: 750.8, 300 sec: 651.1). Total num frames: 536576. Throughput: 0: 175.7. Samples: 33280. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:16:29,833][14933] Avg episode reward: [(0, '4.525')]
-[2025-09-03 03:16:34,825][14933] Fps is (10 sec: 409.5, 60 sec: 682.6, 300 sec: 634.9). Total num frames: 536576. Throughput: 0: 172.0. Samples: 34258. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:16:34,827][14933] Avg episode reward: [(0, '4.525')]
-[2025-09-03 03:16:35,327][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000132_540672.pth...
-[2025-09-03 03:16:35,438][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000100_409600.pth
-[2025-09-03 03:16:39,823][14933] Fps is (10 sec: 409.9, 60 sec: 682.7, 300 sec: 639.4). Total num frames: 540672. Throughput: 0: 170.3. Samples: 34828. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:16:39,830][14933] Avg episode reward: [(0, '4.552')]
-[2025-09-03 03:16:44,823][14933] Fps is (10 sec: 819.4, 60 sec: 682.7, 300 sec: 643.7). Total num frames: 544768. Throughput: 0: 175.8. Samples: 36122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:16:44,829][14933] Avg episode reward: [(0, '4.529')]
-[2025-09-03 03:16:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 647.7). Total num frames: 548864. Throughput: 0: 173.1. Samples: 37018. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:16:49,833][14933] Avg episode reward: [(0, '4.534')]
-[2025-09-03 03:16:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 651.6). Total num frames: 552960. Throughput: 0: 169.4. Samples: 37344. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:16:54,825][14933] Avg episode reward: [(0, '4.518')]
-[2025-09-03 03:16:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 655.4). Total num frames: 557056. Throughput: 0: 167.4. Samples: 38402. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:16:59,827][14933] Avg episode reward: [(0, '4.557')]
-[2025-09-03 03:17:04,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 658.9). Total num frames: 561152. Throughput: 0: 178.9. Samples: 39656. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:17:04,830][14933] Avg episode reward: [(0, '4.596')]
-[2025-09-03 03:17:09,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 644.9). Total num frames: 561152. Throughput: 0: 176.4. Samples: 40190. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:17:09,827][14933] Avg episode reward: [(0, '4.596')]
-[2025-09-03 03:17:14,823][14933] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 648.5). Total num frames: 565248. Throughput: 0: 181.0. Samples: 41424. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:14,828][14933] Avg episode reward: [(0, '4.540')]
-[2025-09-03 03:17:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 652.0). Total num frames: 569344. Throughput: 0: 182.8. Samples: 42482. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:19,831][14933] Avg episode reward: [(0, '4.531')]
-[2025-09-03 03:17:20,851][15670] Updated weights for policy 0, policy_version 140 (0.0686)
-[2025-09-03 03:17:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 655.4). Total num frames: 573440. Throughput: 0: 176.8. Samples: 42784. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:24,826][14933] Avg episode reward: [(0, '4.547')]
-[2025-09-03 03:17:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.8, 300 sec: 658.6). Total num frames: 577536. Throughput: 0: 168.8. Samples: 43718. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:29,825][14933] Avg episode reward: [(0, '4.525')]
-[2025-09-03 03:17:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 661.7). Total num frames: 581632. Throughput: 0: 178.5. Samples: 45052. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:34,825][14933] Avg episode reward: [(0, '4.471')]
-[2025-09-03 03:17:39,827][14933] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 664.6). Total num frames: 585728. Throughput: 0: 183.1. Samples: 45586. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:39,832][14933] Avg episode reward: [(0, '4.497')]
-[2025-09-03 03:17:44,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 652.3). Total num frames: 585728. Throughput: 0: 180.0. Samples: 46502. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:44,830][14933] Avg episode reward: [(0, '4.599')]
-[2025-09-03 03:17:49,823][14933] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 655.4). Total num frames: 589824. Throughput: 0: 178.0. Samples: 47664. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:49,827][14933] Avg episode reward: [(0, '4.594')]
-[2025-09-03 03:17:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 658.3). Total num frames: 593920. Throughput: 0: 181.3. Samples: 48348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:54,824][14933] Avg episode reward: [(0, '4.546')]
-[2025-09-03 03:17:59,824][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 661.1). Total num frames: 598016. Throughput: 0: 169.4. Samples: 49048. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:17:59,831][14933] Avg episode reward: [(0, '4.539')]
-[2025-09-03 03:18:04,828][14933] Fps is (10 sec: 409.4, 60 sec: 614.4, 300 sec: 649.7). Total num frames: 598016. Throughput: 0: 159.8. Samples: 49672. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:18:04,845][14933] Avg episode reward: [(0, '4.493')]
-[2025-09-03 03:18:09,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 652.6). Total num frames: 602112. Throughput: 0: 154.9. Samples: 49756. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:18:09,825][14933] Avg episode reward: [(0, '4.460')]
-[2025-09-03 03:18:14,823][14933] Fps is (10 sec: 819.6, 60 sec: 682.7, 300 sec: 666.5). Total num frames: 606208. Throughput: 0: 159.2. Samples: 50880. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:18:14,828][14933] Avg episode reward: [(0, '4.442')]
-[2025-09-03 03:18:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 666.5). Total num frames: 606208. Throughput: 0: 149.6. Samples: 51786. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:18:19,825][14933] Avg episode reward: [(0, '4.455')]
-[2025-09-03 03:18:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 610304. Throughput: 0: 151.1. Samples: 52386. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:18:24,828][14933] Avg episode reward: [(0, '4.435')]
-[2025-09-03 03:18:25,426][15670] Updated weights for policy 0, policy_version 150 (0.0600)
-[2025-09-03 03:18:28,406][15657] Signal inference workers to stop experience collection... (50 times)
-[2025-09-03 03:18:28,485][15670] InferenceWorker_p0-w0: stopping experience collection (50 times)
-[2025-09-03 03:18:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 614400. Throughput: 0: 162.9. Samples: 53834. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:18:29,829][14933] Avg episode reward: [(0, '4.471')]
-[2025-09-03 03:18:30,298][15657] Signal inference workers to resume experience collection... (50 times)
-[2025-09-03 03:18:30,299][15670] InferenceWorker_p0-w0: resuming experience collection (50 times)
-[2025-09-03 03:18:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 618496. Throughput: 0: 153.5. Samples: 54572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:18:34,829][14933] Avg episode reward: [(0, '4.443')]
-[2025-09-03 03:18:37,360][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000152_622592.pth...
-[2025-09-03 03:18:37,515][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000111_454656.pth
-[2025-09-03 03:18:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 622592. Throughput: 0: 145.4. Samples: 54892. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:18:39,825][14933] Avg episode reward: [(0, '4.497')]
-[2025-09-03 03:18:44,824][14933] Fps is (10 sec: 819.1, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 626688. Throughput: 0: 156.2. Samples: 56078. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:18:44,828][14933] Avg episode reward: [(0, '4.343')]
-[2025-09-03 03:18:49,828][14933] Fps is (10 sec: 818.7, 60 sec: 682.6, 300 sec: 694.2). Total num frames: 630784. Throughput: 0: 164.3. Samples: 57066. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:18:49,830][14933] Avg episode reward: [(0, '4.333')]
-[2025-09-03 03:18:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 630784. Throughput: 0: 173.7. Samples: 57572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:18:54,827][14933] Avg episode reward: [(0, '4.489')]
-[2025-09-03 03:18:59,823][14933] Fps is (10 sec: 409.8, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 634880. Throughput: 0: 179.1. Samples: 58940. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:18:59,825][14933] Avg episode reward: [(0, '4.482')]
-[2025-09-03 03:19:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 638976. Throughput: 0: 182.1. Samples: 59982. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:04,832][14933] Avg episode reward: [(0, '4.577')]
-[2025-09-03 03:19:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 643072. Throughput: 0: 177.2. Samples: 60360. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:09,829][14933] Avg episode reward: [(0, '4.598')]
-[2025-09-03 03:19:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 647168. Throughput: 0: 160.2. Samples: 61042. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:14,825][14933] Avg episode reward: [(0, '4.630')]
-[2025-09-03 03:19:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 651264. Throughput: 0: 175.6. Samples: 62474. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:19,824][14933] Avg episode reward: [(0, '4.630')]
-[2025-09-03 03:19:24,204][15670] Updated weights for policy 0, policy_version 160 (0.2720)
-[2025-09-03 03:19:24,823][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 655360. Throughput: 0: 181.7. Samples: 63070. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:24,830][14933] Avg episode reward: [(0, '4.630')]
-[2025-09-03 03:19:29,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 655360. Throughput: 0: 172.4. Samples: 63836. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:29,830][14933] Avg episode reward: [(0, '4.656')]
-[2025-09-03 03:19:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 659456. Throughput: 0: 178.2. Samples: 65086. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:34,830][14933] Avg episode reward: [(0, '4.672')]
-[2025-09-03 03:19:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 663552. Throughput: 0: 180.1. Samples: 65676. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:19:39,824][14933] Avg episode reward: [(0, '4.739')]
-[2025-09-03 03:19:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 667648. Throughput: 0: 166.3. Samples: 66422. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:19:44,825][14933] Avg episode reward: [(0, '4.739')]
-[2025-09-03 03:19:47,689][15657] Saving new best policy, reward=4.739!
-[2025-09-03 03:19:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 671744. Throughput: 0: 171.8. Samples: 67712. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:19:49,825][14933] Avg episode reward: [(0, '4.713')]
-[2025-09-03 03:19:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 675840. Throughput: 0: 173.7. Samples: 68178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:19:54,825][14933] Avg episode reward: [(0, '4.677')]
-[2025-09-03 03:19:59,827][14933] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 679936. Throughput: 0: 181.1. Samples: 69192. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:19:59,832][14933] Avg episode reward: [(0, '4.679')] -[2025-09-03 03:20:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 679936. Throughput: 0: 173.5. Samples: 70282. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:20:04,829][14933] Avg episode reward: [(0, '4.569')] -[2025-09-03 03:20:09,823][14933] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 684032. Throughput: 0: 168.3. Samples: 70644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:09,825][14933] Avg episode reward: [(0, '4.582')] -[2025-09-03 03:20:14,827][14933] Fps is (10 sec: 818.8, 60 sec: 682.6, 300 sec: 680.3). Total num frames: 688128. Throughput: 0: 172.1. Samples: 71582. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:14,834][14933] Avg episode reward: [(0, '4.565')] -[2025-09-03 03:20:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 692224. Throughput: 0: 162.9. Samples: 72416. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:19,825][14933] Avg episode reward: [(0, '4.510')] -[2025-09-03 03:20:23,443][15670] Updated weights for policy 0, policy_version 170 (0.2178) -[2025-09-03 03:20:24,823][14933] Fps is (10 sec: 819.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 696320. Throughput: 0: 169.2. Samples: 73288. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:24,825][14933] Avg episode reward: [(0, '4.530')] -[2025-09-03 03:20:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 700416. Throughput: 0: 176.5. Samples: 74366. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:20:29,825][14933] Avg episode reward: [(0, '4.530')] -[2025-09-03 03:20:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 700416. Throughput: 0: 168.4. Samples: 75292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:20:34,831][14933] Avg episode reward: [(0, '4.530')] -[2025-09-03 03:20:35,167][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_704512.pth... -[2025-09-03 03:20:35,257][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000132_540672.pth -[2025-09-03 03:20:39,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 704512. Throughput: 0: 173.8. Samples: 75998. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:20:39,830][14933] Avg episode reward: [(0, '4.382')] -[2025-09-03 03:20:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 708608. Throughput: 0: 180.8. Samples: 77326. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:44,828][14933] Avg episode reward: [(0, '4.364')] -[2025-09-03 03:20:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 712704. Throughput: 0: 173.9. Samples: 78108. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:49,825][14933] Avg episode reward: [(0, '4.394')] -[2025-09-03 03:20:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 716800. Throughput: 0: 171.2. Samples: 78348. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:54,825][14933] Avg episode reward: [(0, '4.336')] -[2025-09-03 03:20:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 720896. Throughput: 0: 179.1. 
Samples: 79640. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:20:59,824][14933] Avg episode reward: [(0, '4.349')] -[2025-09-03 03:21:04,829][14933] Fps is (10 sec: 818.7, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 724992. Throughput: 0: 182.8. Samples: 80644. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:04,839][14933] Avg episode reward: [(0, '4.451')] -[2025-09-03 03:21:09,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 724992. Throughput: 0: 175.2. Samples: 81174. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:09,825][14933] Avg episode reward: [(0, '4.421')] -[2025-09-03 03:21:14,823][14933] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 729088. Throughput: 0: 178.9. Samples: 82416. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:14,831][14933] Avg episode reward: [(0, '4.298')] -[2025-09-03 03:21:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 733184. Throughput: 0: 181.7. Samples: 83470. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:19,825][14933] Avg episode reward: [(0, '4.259')] -[2025-09-03 03:21:20,937][15670] Updated weights for policy 0, policy_version 180 (0.0091) -[2025-09-03 03:21:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 737280. Throughput: 0: 171.6. Samples: 83720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:24,831][14933] Avg episode reward: [(0, '4.258')] -[2025-09-03 03:21:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 741376. Throughput: 0: 165.3. Samples: 84766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:29,829][14933] Avg episode reward: [(0, '4.190')] -[2025-09-03 03:21:34,823][14933] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 745472. Throughput: 0: 176.0. Samples: 86030. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:34,826][14933] Avg episode reward: [(0, '4.231')] -[2025-09-03 03:21:39,827][14933] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 749568. Throughput: 0: 183.2. Samples: 86594. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:21:39,829][14933] Avg episode reward: [(0, '4.319')] -[2025-09-03 03:21:44,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 749568. Throughput: 0: 172.2. Samples: 87390. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:21:44,825][14933] Avg episode reward: [(0, '4.284')] -[2025-09-03 03:21:49,823][14933] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 753664. Throughput: 0: 177.8. Samples: 88642. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:21:49,829][14933] Avg episode reward: [(0, '4.317')] -[2025-09-03 03:21:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 757760. Throughput: 0: 176.3. Samples: 89108. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:21:54,828][14933] Avg episode reward: [(0, '4.325')] -[2025-09-03 03:21:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 761856. Throughput: 0: 165.5. Samples: 89864. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:21:59,828][14933] Avg episode reward: [(0, '4.360')] -[2025-09-03 03:22:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 765952. 
Throughput: 0: 168.1. Samples: 91036. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:04,827][14933] Avg episode reward: [(0, '4.274')] -[2025-09-03 03:22:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 770048. Throughput: 0: 177.4. Samples: 91702. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:09,825][14933] Avg episode reward: [(0, '4.389')] -[2025-09-03 03:22:14,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 770048. Throughput: 0: 172.9. Samples: 92548. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:14,825][14933] Avg episode reward: [(0, '4.398')] -[2025-09-03 03:22:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 774144. Throughput: 0: 168.1. Samples: 93596. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:19,825][14933] Avg episode reward: [(0, '4.512')] -[2025-09-03 03:22:21,472][15670] Updated weights for policy 0, policy_version 190 (0.0704) -[2025-09-03 03:22:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 778240. Throughput: 0: 164.5. Samples: 93996. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:22:24,827][14933] Avg episode reward: [(0, '4.572')] -[2025-09-03 03:22:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 782336. Throughput: 0: 172.8. Samples: 95164. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:22:29,825][14933] Avg episode reward: [(0, '4.664')] -[2025-09-03 03:22:33,942][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth... -[2025-09-03 03:22:34,039][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000152_622592.pth -[2025-09-03 03:22:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 786432. Throughput: 0: 162.8. Samples: 95966. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:34,828][14933] Avg episode reward: [(0, '4.608')] -[2025-09-03 03:22:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 790528. Throughput: 0: 170.6. Samples: 96786. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:39,824][14933] Avg episode reward: [(0, '4.561')] -[2025-09-03 03:22:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 794624. Throughput: 0: 175.9. Samples: 97780. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:44,826][14933] Avg episode reward: [(0, '4.508')] -[2025-09-03 03:22:49,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 794624. Throughput: 0: 172.3. Samples: 98790. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:49,830][14933] Avg episode reward: [(0, '4.553')] -[2025-09-03 03:22:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 798720. Throughput: 0: 167.0. Samples: 99218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:54,830][14933] Avg episode reward: [(0, '4.652')] -[2025-09-03 03:22:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 802816. Throughput: 0: 175.0. Samples: 100422. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:22:59,825][14933] Avg episode reward: [(0, '4.627')] -[2025-09-03 03:23:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). 
Total num frames: 806912. Throughput: 0: 173.8. Samples: 101418. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:23:04,827][14933] Avg episode reward: [(0, '4.625')] -[2025-09-03 03:23:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 811008. Throughput: 0: 175.4. Samples: 101890. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:23:09,825][14933] Avg episode reward: [(0, '4.546')] -[2025-09-03 03:23:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 815104. Throughput: 0: 171.3. Samples: 102874. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:23:14,828][14933] Avg episode reward: [(0, '4.621')] -[2025-09-03 03:23:19,412][15670] Updated weights for policy 0, policy_version 200 (0.2648) -[2025-09-03 03:23:19,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 819200. Throughput: 0: 176.5. Samples: 103910. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:23:19,827][14933] Avg episode reward: [(0, '4.570')] -[2025-09-03 03:23:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 819200. Throughput: 0: 168.2. Samples: 104356. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:23:24,828][14933] Avg episode reward: [(0, '4.540')] -[2025-09-03 03:23:25,277][15657] Signal inference workers to stop experience collection... (100 times) -[2025-09-03 03:23:25,323][15670] InferenceWorker_p0-w0: stopping experience collection (100 times) -[2025-09-03 03:23:26,721][15657] Signal inference workers to resume experience collection... (100 times) -[2025-09-03 03:23:26,725][15670] InferenceWorker_p0-w0: resuming experience collection (100 times) -[2025-09-03 03:23:29,823][14933] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 823296. Throughput: 0: 166.2. Samples: 105260. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:23:29,830][14933] Avg episode reward: [(0, '4.499')] -[2025-09-03 03:23:34,825][14933] Fps is (10 sec: 819.0, 60 sec: 682.6, 300 sec: 694.2). Total num frames: 827392. Throughput: 0: 176.7. Samples: 106744. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:23:34,827][14933] Avg episode reward: [(0, '4.542')] -[2025-09-03 03:23:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 831488. Throughput: 0: 170.7. Samples: 106900. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:23:39,829][14933] Avg episode reward: [(0, '4.542')] -[2025-09-03 03:23:44,823][14933] Fps is (10 sec: 819.4, 60 sec: 682.7, 300 sec: 694.3). Total num frames: 835584. Throughput: 0: 166.1. Samples: 107896. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:23:44,830][14933] Avg episode reward: [(0, '4.642')] -[2025-09-03 03:23:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 839680. Throughput: 0: 170.1. Samples: 109072. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:23:49,824][14933] Avg episode reward: [(0, '4.668')] -[2025-09-03 03:23:54,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 843776. Throughput: 0: 178.2. Samples: 109908. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:23:54,832][14933] Avg episode reward: [(0, '4.673')] -[2025-09-03 03:23:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 843776. Throughput: 0: 174.1. Samples: 110708. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:23:59,830][14933] Avg episode reward: [(0, '4.620')] -[2025-09-03 03:24:04,832][14933] Fps is (10 sec: 409.3, 60 sec: 682.6, 300 sec: 694.2). Total num frames: 847872. Throughput: 0: 177.1. Samples: 111882. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:24:04,835][14933] Avg episode reward: [(0, '4.674')] -[2025-09-03 03:24:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 851968. Throughput: 0: 180.6. Samples: 112484. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:24:09,825][14933] Avg episode reward: [(0, '4.728')] -[2025-09-03 03:24:14,823][14933] Fps is (10 sec: 819.9, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 856064. Throughput: 0: 177.9. Samples: 113266. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:24:14,828][14933] Avg episode reward: [(0, '4.678')] -[2025-09-03 03:24:18,830][15670] Updated weights for policy 0, policy_version 210 (0.3224) -[2025-09-03 03:24:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 860160. Throughput: 0: 166.1. Samples: 114220. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:24:19,825][14933] Avg episode reward: [(0, '4.682')] -[2025-09-03 03:24:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 864256. Throughput: 0: 180.8. Samples: 115038. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:24:24,825][14933] Avg episode reward: [(0, '4.637')] -[2025-09-03 03:24:29,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 864256. Throughput: 0: 181.8. Samples: 116076. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:24:29,825][14933] Avg episode reward: [(0, '4.654')] -[2025-09-03 03:24:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 868352. Throughput: 0: 178.0. Samples: 117080. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:24:34,830][14933] Avg episode reward: [(0, '4.634')] -[2025-09-03 03:24:35,554][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000213_872448.pth... -[2025-09-03 03:24:35,659][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000172_704512.pth -[2025-09-03 03:24:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 872448. Throughput: 0: 172.7. Samples: 117680. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:24:39,825][14933] Avg episode reward: [(0, '4.650')] -[2025-09-03 03:24:44,826][14933] Fps is (10 sec: 819.0, 60 sec: 682.6, 300 sec: 694.2). Total num frames: 876544. Throughput: 0: 179.5. Samples: 118786. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:24:44,831][14933] Avg episode reward: [(0, '4.533')] -[2025-09-03 03:24:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 880640. Throughput: 0: 174.3. Samples: 119726. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) -[2025-09-03 03:24:49,825][14933] Avg episode reward: [(0, '4.527')] -[2025-09-03 03:24:54,823][14933] Fps is (10 sec: 819.4, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 884736. Throughput: 0: 170.6. Samples: 120162. Policy #0 lag: (min: 1.0, avg: 1.3, max: 3.0) -[2025-09-03 03:24:54,825][14933] Avg episode reward: [(0, '4.590')] -[2025-09-03 03:24:59,825][14933] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 888832. 
Throughput: 0: 185.1. Samples: 121596. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:24:59,828][14933] Avg episode reward: [(0, '4.632')] -[2025-09-03 03:25:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 708.1). Total num frames: 892928. Throughput: 0: 179.2. Samples: 122284. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:25:04,825][14933] Avg episode reward: [(0, '4.581')] -[2025-09-03 03:25:09,822][14933] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 897024. Throughput: 0: 182.2. Samples: 123236. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:09,824][14933] Avg episode reward: [(0, '4.557')] -[2025-09-03 03:25:14,418][15670] Updated weights for policy 0, policy_version 220 (0.1131) -[2025-09-03 03:25:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 901120. Throughput: 0: 182.8. Samples: 124300. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:14,830][14933] Avg episode reward: [(0, '4.527')] -[2025-09-03 03:25:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 901120. Throughput: 0: 180.0. Samples: 125180. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:19,832][14933] Avg episode reward: [(0, '4.554')] -[2025-09-03 03:25:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 905216. Throughput: 0: 177.5. Samples: 125666. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:25:24,831][14933] Avg episode reward: [(0, '4.644')] -[2025-09-03 03:25:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 909312. Throughput: 0: 178.4. Samples: 126814. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:25:29,831][14933] Avg episode reward: [(0, '4.798')] -[2025-09-03 03:25:31,273][15657] Saving new best policy, reward=4.798! -[2025-09-03 03:25:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 913408. Throughput: 0: 182.2. Samples: 127924. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:25:34,825][14933] Avg episode reward: [(0, '4.830')] -[2025-09-03 03:25:38,547][15657] Saving new best policy, reward=4.830! -[2025-09-03 03:25:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 917504. Throughput: 0: 182.7. Samples: 128384. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:39,825][14933] Avg episode reward: [(0, '4.754')] -[2025-09-03 03:25:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 708.1). Total num frames: 921600. Throughput: 0: 173.2. Samples: 129388. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:44,824][14933] Avg episode reward: [(0, '4.752')] -[2025-09-03 03:25:49,827][14933] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 925696. Throughput: 0: 185.0. Samples: 130608. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:49,833][14933] Avg episode reward: [(0, '4.710')] -[2025-09-03 03:25:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 925696. Throughput: 0: 176.2. Samples: 131166. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:54,824][14933] Avg episode reward: [(0, '4.687')] -[2025-09-03 03:25:59,823][14933] Fps is (10 sec: 409.8, 60 sec: 682.7, 300 sec: 694.3). Total num frames: 929792. Throughput: 0: 174.3. Samples: 132144. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:25:59,830][14933] Avg episode reward: [(0, '4.763')] -[2025-09-03 03:26:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 933888. Throughput: 0: 182.6. Samples: 133396. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:26:04,828][14933] Avg episode reward: [(0, '4.822')] -[2025-09-03 03:26:09,826][14933] Fps is (10 sec: 818.9, 60 sec: 682.6, 300 sec: 708.1). Total num frames: 937984. Throughput: 0: 178.5. Samples: 133700. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:09,829][14933] Avg episode reward: [(0, '4.828')] -[2025-09-03 03:26:13,727][15670] Updated weights for policy 0, policy_version 230 (0.1917) -[2025-09-03 03:26:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 942080. Throughput: 0: 172.8. Samples: 134592. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:14,830][14933] Avg episode reward: [(0, '4.906')] -[2025-09-03 03:26:18,633][15657] Saving new best policy, reward=4.906! -[2025-09-03 03:26:19,823][14933] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 946176. Throughput: 0: 176.3. Samples: 135858. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:19,825][14933] Avg episode reward: [(0, '4.850')] -[2025-09-03 03:26:24,825][14933] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 950272. Throughput: 0: 182.7. Samples: 136604. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:26:24,829][14933] Avg episode reward: [(0, '4.886')] -[2025-09-03 03:26:29,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 950272. Throughput: 0: 177.1. Samples: 137358. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:26:29,825][14933] Avg episode reward: [(0, '4.892')] -[2025-09-03 03:26:34,823][14933] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 954368. Throughput: 0: 177.7. Samples: 138604. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:34,830][14933] Avg episode reward: [(0, '4.869')] -[2025-09-03 03:26:35,843][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000234_958464.pth... -[2025-09-03 03:26:35,966][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000192_786432.pth -[2025-09-03 03:26:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 958464. Throughput: 0: 178.2. Samples: 139186. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:39,828][14933] Avg episode reward: [(0, '4.904')] -[2025-09-03 03:26:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 962560. Throughput: 0: 174.7. Samples: 140006. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:44,828][14933] Avg episode reward: [(0, '4.904')] -[2025-09-03 03:26:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 966656. Throughput: 0: 172.6. Samples: 141162. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:49,824][14933] Avg episode reward: [(0, '4.855')] -[2025-09-03 03:26:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 970752. Throughput: 0: 179.0. Samples: 141756. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:54,825][14933] Avg episode reward: [(0, '4.817')] -[2025-09-03 03:26:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 974848. Throughput: 0: 182.0. Samples: 142784. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:26:59,825][14933] Avg episode reward: [(0, '4.777')] -[2025-09-03 03:27:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 974848. Throughput: 0: 175.4. Samples: 143752. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:27:04,832][14933] Avg episode reward: [(0, '4.808')] -[2025-09-03 03:27:09,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 978944. Throughput: 0: 170.5. Samples: 144274. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:27:09,833][14933] Avg episode reward: [(0, '4.870')] -[2025-09-03 03:27:11,414][15670] Updated weights for policy 0, policy_version 240 (0.3326) -[2025-09-03 03:27:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 983040. Throughput: 0: 180.1. Samples: 145464. Policy #0 lag: (min: 1.0, avg: 1.4, max: 3.0) -[2025-09-03 03:27:14,828][14933] Avg episode reward: [(0, '4.881')] -[2025-09-03 03:27:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 987136. Throughput: 0: 170.4. Samples: 146272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:27:19,825][14933] Avg episode reward: [(0, '4.849')] -[2025-09-03 03:27:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 991232. Throughput: 0: 172.7. Samples: 146956. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:27:24,827][14933] Avg episode reward: [(0, '4.791')] -[2025-09-03 03:27:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 995328. Throughput: 0: 177.2. Samples: 147982. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:27:29,824][14933] Avg episode reward: [(0, '4.680')] -[2025-09-03 03:27:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 995328. Throughput: 0: 173.9. Samples: 148988. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:27:34,825][14933] Avg episode reward: [(0, '4.680')] -[2025-09-03 03:27:39,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 999424. Throughput: 0: 172.4. Samples: 149512. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:27:39,832][14933] Avg episode reward: [(0, '4.634')] -[2025-09-03 03:27:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1003520. Throughput: 0: 183.4. Samples: 151038. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:27:44,830][14933] Avg episode reward: [(0, '4.657')] -[2025-09-03 03:27:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1007616. Throughput: 0: 178.0. Samples: 151760. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:27:49,824][14933] Avg episode reward: [(0, '4.710')] -[2025-09-03 03:27:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1011712. Throughput: 0: 174.9. Samples: 152144. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:27:54,825][14933] Avg episode reward: [(0, '4.627')] -[2025-09-03 03:27:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1015808. 
Throughput: 0: 177.5. Samples: 153452. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:27:59,824][14933] Avg episode reward: [(0, '4.580')] -[2025-09-03 03:28:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1019904. Throughput: 0: 181.3. Samples: 154430. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:28:04,824][14933] Avg episode reward: [(0, '4.616')] -[2025-09-03 03:28:09,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1019904. Throughput: 0: 174.5. Samples: 154808. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:28:09,826][14933] Avg episode reward: [(0, '4.619')] -[2025-09-03 03:28:14,268][15670] Updated weights for policy 0, policy_version 250 (0.3934) -[2025-09-03 03:28:14,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1024000. Throughput: 0: 159.9. Samples: 155178. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:28:14,827][14933] Avg episode reward: [(0, '4.603')] -[2025-09-03 03:28:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 1024000. Throughput: 0: 155.5. Samples: 155984. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:28:19,825][14933] Avg episode reward: [(0, '4.572')] -[2025-09-03 03:28:20,162][15657] Signal inference workers to stop experience collection... (150 times) -[2025-09-03 03:28:20,219][15670] InferenceWorker_p0-w0: stopping experience collection (150 times) -[2025-09-03 03:28:21,562][15657] Signal inference workers to resume experience collection... (150 times) -[2025-09-03 03:28:21,563][15670] InferenceWorker_p0-w0: resuming experience collection (150 times) -[2025-09-03 03:28:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 1028096. Throughput: 0: 154.7. Samples: 156472. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:28:24,828][14933] Avg episode reward: [(0, '4.576')] -[2025-09-03 03:28:29,824][14933] Fps is (10 sec: 819.1, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 1032192. Throughput: 0: 138.7. Samples: 157278. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:28:29,826][14933] Avg episode reward: [(0, '4.592')] -[2025-09-03 03:28:33,916][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000253_1036288.pth... -[2025-09-03 03:28:34,011][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000213_872448.pth -[2025-09-03 03:28:34,823][14933] Fps is (10 sec: 819.1, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1036288. Throughput: 0: 148.4. Samples: 158438. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:28:34,825][14933] Avg episode reward: [(0, '4.625')] -[2025-09-03 03:28:39,831][14933] Fps is (10 sec: 818.6, 60 sec: 682.6, 300 sec: 694.2). Total num frames: 1040384. Throughput: 0: 159.3. Samples: 159316. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:28:39,840][14933] Avg episode reward: [(0, '4.672')] -[2025-09-03 03:28:44,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 1040384. Throughput: 0: 153.2. Samples: 160344. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:28:44,825][14933] Avg episode reward: [(0, '4.747')] -[2025-09-03 03:28:49,823][14933] Fps is (10 sec: 409.9, 60 sec: 614.4, 300 sec: 680.4). Total num frames: 1044480. Throughput: 0: 153.7. Samples: 161346. 
Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:28:49,829][14933] Avg episode reward: [(0, '4.810')] -[2025-09-03 03:28:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 1048576. Throughput: 0: 153.9. Samples: 161734. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:28:54,825][14933] Avg episode reward: [(0, '4.796')] -[2025-09-03 03:28:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 694.3). Total num frames: 1052672. Throughput: 0: 175.9. Samples: 163092. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:28:59,828][14933] Avg episode reward: [(0, '4.931')] -[2025-09-03 03:29:02,405][15657] Saving new best policy, reward=4.931! -[2025-09-03 03:29:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 614.4, 300 sec: 694.2). Total num frames: 1056768. Throughput: 0: 176.9. Samples: 163946. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:29:04,829][14933] Avg episode reward: [(0, '4.931')] -[2025-09-03 03:29:09,825][14933] Fps is (10 sec: 819.0, 60 sec: 682.6, 300 sec: 694.2). Total num frames: 1060864. Throughput: 0: 177.3. Samples: 164452. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:29:09,829][14933] Avg episode reward: [(0, '4.937')] -[2025-09-03 03:29:12,633][15657] Saving new best policy, reward=4.937! -[2025-09-03 03:29:12,654][15670] Updated weights for policy 0, policy_version 260 (0.1284) -[2025-09-03 03:29:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1064960. Throughput: 0: 187.1. Samples: 165698. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:29:14,825][14933] Avg episode reward: [(0, '4.967')] -[2025-09-03 03:29:19,823][14933] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 1064960. Throughput: 0: 179.3. Samples: 166506. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:29:19,825][14933] Avg episode reward: [(0, '4.899')] -[2025-09-03 03:29:20,046][15657] Saving new best policy, reward=4.967! -[2025-09-03 03:29:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1069056. Throughput: 0: 174.2. Samples: 167154. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:29:24,828][14933] Avg episode reward: [(0, '4.952')] -[2025-09-03 03:29:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1073152. Throughput: 0: 181.4. Samples: 168508. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0) -[2025-09-03 03:29:29,831][14933] Avg episode reward: [(0, '5.001')] -[2025-09-03 03:29:34,826][14933] Fps is (10 sec: 818.9, 60 sec: 682.6, 300 sec: 694.2). Total num frames: 1077248. Throughput: 0: 176.7. Samples: 169300. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:29:34,832][14933] Avg episode reward: [(0, '4.919')] -[2025-09-03 03:29:37,578][15657] Saving new best policy, reward=5.001! -[2025-09-03 03:29:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.8, 300 sec: 694.2). Total num frames: 1081344. Throughput: 0: 175.6. Samples: 169634. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:29:39,825][14933] Avg episode reward: [(0, '4.954')] -[2025-09-03 03:29:44,823][14933] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1085440. Throughput: 0: 175.0. Samples: 170968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) -[2025-09-03 03:29:44,830][14933] Avg episode reward: [(0, '5.086')] -[2025-09-03 03:29:47,120][15657] Saving new best policy, reward=5.086! 
-[2025-09-03 03:29:49,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1089536. Throughput: 0: 180.9. Samples: 172088. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:29:49,829][14933] Avg episode reward: [(0, '4.990')]
-[2025-09-03 03:29:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1093632. Throughput: 0: 181.9. Samples: 172638. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:29:54,828][14933] Avg episode reward: [(0, '5.016')]
-[2025-09-03 03:29:59,823][14933] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1097728. Throughput: 0: 178.0. Samples: 173710. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:29:59,829][14933] Avg episode reward: [(0, '5.131')]
-[2025-09-03 03:30:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 1097728. Throughput: 0: 182.8. Samples: 174734. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:30:04,828][14933] Avg episode reward: [(0, '5.110')]
-[2025-09-03 03:30:04,950][15657] Saving new best policy, reward=5.131!
-[2025-09-03 03:30:09,840][14933] Fps is (10 sec: 408.9, 60 sec: 682.5, 300 sec: 680.3). Total num frames: 1101824. Throughput: 0: 178.0. Samples: 175168. Policy #0 lag: (min: 1.0, avg: 1.6, max: 3.0)
-[2025-09-03 03:30:09,841][14933] Avg episode reward: [(0, '5.189')]
-[2025-09-03 03:30:12,176][15657] Saving new best policy, reward=5.189!
-[2025-09-03 03:30:12,191][15670] Updated weights for policy 0, policy_version 270 (0.0077)
-[2025-09-03 03:30:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1105920. Throughput: 0: 168.0. Samples: 176070. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:14,827][14933] Avg episode reward: [(0, '5.146')]
-[2025-09-03 03:30:19,823][14933] Fps is (10 sec: 820.6, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1110016. Throughput: 0: 180.4. Samples: 177418. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:19,832][14933] Avg episode reward: [(0, '5.081')]
-[2025-09-03 03:30:24,827][14933] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1114112. Throughput: 0: 180.9. Samples: 177774. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:24,835][14933] Avg episode reward: [(0, '5.104')]
-[2025-09-03 03:30:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1118208. Throughput: 0: 173.3. Samples: 178768. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:29,826][14933] Avg episode reward: [(0, '5.043')]
-[2025-09-03 03:30:33,892][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth...
-[2025-09-03 03:30:33,997][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000234_958464.pth
-[2025-09-03 03:30:34,823][14933] Fps is (10 sec: 819.5, 60 sec: 751.0, 300 sec: 694.2). Total num frames: 1122304. Throughput: 0: 175.1. Samples: 179968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:34,832][14933] Avg episode reward: [(0, '5.138')]
-[2025-09-03 03:30:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1126400. Throughput: 0: 182.0. Samples: 180830. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:39,824][14933] Avg episode reward: [(0, '5.060')]
-[2025-09-03 03:30:44,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 680.4). Total num frames: 1126400. Throughput: 0: 174.2. Samples: 181548. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:44,824][14933] Avg episode reward: [(0, '4.945')]
-[2025-09-03 03:30:49,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1130496. Throughput: 0: 181.0. Samples: 182880. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:49,830][14933] Avg episode reward: [(0, '4.887')]
-[2025-09-03 03:30:54,823][14933] Fps is (10 sec: 1228.8, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1138688. Throughput: 0: 189.0. Samples: 183670. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:54,828][14933] Avg episode reward: [(0, '4.700')]
-[2025-09-03 03:30:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1138688. Throughput: 0: 188.7. Samples: 184562. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:30:59,826][14933] Avg episode reward: [(0, '4.653')]
-[2025-09-03 03:31:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1142784. Throughput: 0: 179.5. Samples: 185496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:31:04,824][14933] Avg episode reward: [(0, '4.631')]
-[2025-09-03 03:31:07,104][15670] Updated weights for policy 0, policy_version 280 (0.1760)
-[2025-09-03 03:31:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 751.1, 300 sec: 694.2). Total num frames: 1146880. Throughput: 0: 181.7. Samples: 185948. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:31:09,825][14933] Avg episode reward: [(0, '4.660')]
-[2025-09-03 03:31:14,823][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1150976. Throughput: 0: 185.8. Samples: 187130. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
-[2025-09-03 03:31:14,826][14933] Avg episode reward: [(0, '4.709')]
-[2025-09-03 03:31:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1155072. Throughput: 0: 181.1. Samples: 188116. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:31:19,826][14933] Avg episode reward: [(0, '4.715')]
-[2025-09-03 03:31:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 708.1). Total num frames: 1159168. Throughput: 0: 178.7. Samples: 188870. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:31:24,828][14933] Avg episode reward: [(0, '4.757')]
-[2025-09-03 03:31:29,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1163264. Throughput: 0: 185.2. Samples: 189882. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:31:29,830][14933] Avg episode reward: [(0, '4.697')]
-[2025-09-03 03:31:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1163264. Throughput: 0: 179.4. Samples: 190952. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:31:34,825][14933] Avg episode reward: [(0, '4.795')]
-[2025-09-03 03:31:39,823][14933] Fps is (10 sec: 409.7, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1167360. Throughput: 0: 171.8. Samples: 191402. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:31:39,829][14933] Avg episode reward: [(0, '4.680')]
-[2025-09-03 03:31:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1171456. Throughput: 0: 180.1. Samples: 192666. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:31:44,833][14933] Avg episode reward: [(0, '4.846')]
-[2025-09-03 03:31:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 694.2). Total num frames: 1175552. Throughput: 0: 179.6. Samples: 193580. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:31:49,827][14933] Avg episode reward: [(0, '4.888')]
-[2025-09-03 03:31:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1179648. Throughput: 0: 179.2. Samples: 194012. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:31:54,825][14933] Avg episode reward: [(0, '4.957')]
-[2025-09-03 03:31:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1183744. Throughput: 0: 179.1. Samples: 195190. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:31:59,825][14933] Avg episode reward: [(0, '4.990')]
-[2025-09-03 03:32:03,087][15670] Updated weights for policy 0, policy_version 290 (0.0720)
-[2025-09-03 03:32:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1187840. Throughput: 0: 182.1. Samples: 196312. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:32:04,829][14933] Avg episode reward: [(0, '5.106')]
-[2025-09-03 03:32:09,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1187840. Throughput: 0: 177.2. Samples: 196842. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:32:09,829][14933] Avg episode reward: [(0, '5.043')]
-[2025-09-03 03:32:14,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1191936. Throughput: 0: 182.4. Samples: 198088. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:32:14,830][14933] Avg episode reward: [(0, '5.136')]
-[2025-09-03 03:32:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1196032. Throughput: 0: 180.8. Samples: 199088. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:32:19,825][14933] Avg episode reward: [(0, '5.074')]
-[2025-09-03 03:32:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1200128. Throughput: 0: 180.1. Samples: 199506. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:32:24,828][14933] Avg episode reward: [(0, '5.068')]
-[2025-09-03 03:32:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1204224. Throughput: 0: 176.4. Samples: 200606. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:32:29,830][14933] Avg episode reward: [(0, '5.100')]
-[2025-09-03 03:32:31,976][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000295_1208320.pth...
-[2025-09-03 03:32:32,087][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000253_1036288.pth
-[2025-09-03 03:32:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1208320. Throughput: 0: 185.1. Samples: 201910. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:32:34,831][14933] Avg episode reward: [(0, '5.152')]
-[2025-09-03 03:32:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1212416. Throughput: 0: 182.3. Samples: 202214. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:32:39,829][14933] Avg episode reward: [(0, '5.202')]
-[2025-09-03 03:32:44,420][15657] Saving new best policy, reward=5.202!
-[2025-09-03 03:32:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1216512. Throughput: 0: 178.4. Samples: 203218. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:32:44,824][14933] Avg episode reward: [(0, '5.246')]
-[2025-09-03 03:32:48,943][15657] Saving new best policy, reward=5.246!
-[2025-09-03 03:32:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1220608. Throughput: 0: 180.0. Samples: 204410. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:32:49,825][14933] Avg episode reward: [(0, '5.235')]
-[2025-09-03 03:32:54,823][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1224704. Throughput: 0: 187.8. Samples: 205292. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:32:54,831][14933] Avg episode reward: [(0, '5.405')]
-[2025-09-03 03:32:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1224704. Throughput: 0: 176.9. Samples: 206050. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:32:59,825][14933] Avg episode reward: [(0, '5.468')]
-[2025-09-03 03:33:01,290][15657] Saving new best policy, reward=5.405!
-[2025-09-03 03:33:01,308][15670] Updated weights for policy 0, policy_version 300 (0.0613)
-[2025-09-03 03:33:01,432][15657] Saving new best policy, reward=5.468!
-[2025-09-03 03:33:04,631][15657] Signal inference workers to stop experience collection... (200 times)
-[2025-09-03 03:33:04,681][15670] InferenceWorker_p0-w0: stopping experience collection (200 times)
-[2025-09-03 03:33:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1228800. Throughput: 0: 181.8. Samples: 207270. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:33:04,828][14933] Avg episode reward: [(0, '5.318')]
-[2025-09-03 03:33:06,505][15657] Signal inference workers to resume experience collection... (200 times)
-[2025-09-03 03:33:06,508][15670] InferenceWorker_p0-w0: resuming experience collection (200 times)
-[2025-09-03 03:33:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1232896. Throughput: 0: 176.5. Samples: 207448. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:33:09,831][14933] Avg episode reward: [(0, '5.350')]
-[2025-09-03 03:33:14,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1236992. Throughput: 0: 178.0. Samples: 208616. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:14,829][14933] Avg episode reward: [(0, '5.333')]
-[2025-09-03 03:33:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1241088. Throughput: 0: 160.3. Samples: 209122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:19,831][14933] Avg episode reward: [(0, '5.349')]
-[2025-09-03 03:33:24,823][14933] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1245184. Throughput: 0: 177.4. Samples: 210198. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:24,831][14933] Avg episode reward: [(0, '5.330')]
-[2025-09-03 03:33:29,826][14933] Fps is (10 sec: 409.5, 60 sec: 682.6, 300 sec: 708.1). Total num frames: 1245184. Throughput: 0: 183.7. Samples: 211486. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:29,847][14933] Avg episode reward: [(0, '5.442')]
-[2025-09-03 03:33:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1249280. Throughput: 0: 171.1. Samples: 212108. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:34,828][14933] Avg episode reward: [(0, '5.223')]
-[2025-09-03 03:33:39,823][14933] Fps is (10 sec: 819.4, 60 sec: 682.7, 300 sec: 722.0). Total num frames: 1253376. Throughput: 0: 159.8. Samples: 212484. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:39,825][14933] Avg episode reward: [(0, '5.348')]
-[2025-09-03 03:33:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 722.0). Total num frames: 1257472. Throughput: 0: 168.1. Samples: 213614. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:44,825][14933] Avg episode reward: [(0, '5.290')]
-[2025-09-03 03:33:49,824][14933] Fps is (10 sec: 819.1, 60 sec: 682.7, 300 sec: 722.0). Total num frames: 1261568. Throughput: 0: 166.1. Samples: 214746. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:49,829][14933] Avg episode reward: [(0, '5.272')]
-[2025-09-03 03:33:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 708.1). Total num frames: 1261568. Throughput: 0: 173.2. Samples: 215244. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:54,828][14933] Avg episode reward: [(0, '5.149')]
-[2025-09-03 03:33:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1265664. Throughput: 0: 172.6. Samples: 216382. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:33:59,825][14933] Avg episode reward: [(0, '5.100')]
-[2025-09-03 03:34:00,000][15670] Updated weights for policy 0, policy_version 310 (0.2501)
-[2025-09-03 03:34:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1269760. Throughput: 0: 187.4. Samples: 217554. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:04,825][14933] Avg episode reward: [(0, '5.165')]
-[2025-09-03 03:34:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1273856. Throughput: 0: 166.3. Samples: 217682. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:09,829][14933] Avg episode reward: [(0, '5.217')]
-[2025-09-03 03:34:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 722.0). Total num frames: 1277952. Throughput: 0: 159.2. Samples: 218648. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:14,828][14933] Avg episode reward: [(0, '5.238')]
-[2025-09-03 03:34:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 722.0). Total num frames: 1282048. Throughput: 0: 168.9. Samples: 219710. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:19,835][14933] Avg episode reward: [(0, '5.307')]
-[2025-09-03 03:34:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 614.4, 300 sec: 708.1). Total num frames: 1282048. Throughput: 0: 173.4. Samples: 220288. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:24,825][14933] Avg episode reward: [(0, '5.296')]
-[2025-09-03 03:34:29,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1286144. Throughput: 0: 170.2. Samples: 221272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:29,830][14933] Avg episode reward: [(0, '5.355')]
-[2025-09-03 03:34:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1290240. Throughput: 0: 172.4. Samples: 222502. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:34,824][14933] Avg episode reward: [(0, '5.345')]
-[2025-09-03 03:34:36,148][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_1294336.pth...
-[2025-09-03 03:34:36,264][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000274_1122304.pth
-[2025-09-03 03:34:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1294336. Throughput: 0: 175.1. Samples: 223124. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:39,824][14933] Avg episode reward: [(0, '5.398')]
-[2025-09-03 03:34:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1298432. Throughput: 0: 165.6. Samples: 223836. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:34:44,825][14933] Avg episode reward: [(0, '5.325')]
-[2025-09-03 03:34:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1302528. Throughput: 0: 169.2. Samples: 225166. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:34:49,828][14933] Avg episode reward: [(0, '5.371')]
-[2025-09-03 03:34:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1306624. Throughput: 0: 179.6. Samples: 225764. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:54,828][14933] Avg episode reward: [(0, '5.384')]
-[2025-09-03 03:34:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1306624. Throughput: 0: 180.4. Samples: 226766. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:34:59,830][14933] Avg episode reward: [(0, '5.452')]
-[2025-09-03 03:35:00,132][15670] Updated weights for policy 0, policy_version 320 (0.2027)
-[2025-09-03 03:35:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.2). Total num frames: 1310720. Throughput: 0: 180.0. Samples: 227810. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:35:04,830][14933] Avg episode reward: [(0, '5.445')]
-[2025-09-03 03:35:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1314816. Throughput: 0: 176.6. Samples: 228236. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:35:09,826][14933] Avg episode reward: [(0, '5.343')]
-[2025-09-03 03:35:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1318912. Throughput: 0: 175.8. Samples: 229182. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0)
-[2025-09-03 03:35:14,825][14933] Avg episode reward: [(0, '5.324')]
-[2025-09-03 03:35:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1323008. Throughput: 0: 173.2. Samples: 230298. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:35:19,825][14933] Avg episode reward: [(0, '5.257')]
-[2025-09-03 03:35:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1327104. Throughput: 0: 171.3. Samples: 230834. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
-[2025-09-03 03:35:24,826][14933] Avg episode reward: [(0, '5.349')]
-[2025-09-03 03:35:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1331200. Throughput: 0: 178.2. Samples: 231854. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:35:29,830][14933] Avg episode reward: [(0, '5.302')]
-[2025-09-03 03:35:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1331200. Throughput: 0: 170.8. Samples: 232852. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:35:34,831][14933] Avg episode reward: [(0, '5.177')]
-[2025-09-03 03:35:39,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1335296. Throughput: 0: 166.7. Samples: 233266. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:35:39,833][14933] Avg episode reward: [(0, '5.223')]
-[2025-09-03 03:35:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1339392. Throughput: 0: 172.6. Samples: 234532. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:35:44,825][14933] Avg episode reward: [(0, '5.190')]
-[2025-09-03 03:35:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1343488. Throughput: 0: 168.9. Samples: 235412. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:35:49,829][14933] Avg episode reward: [(0, '5.194')]
-[2025-09-03 03:35:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1347584. Throughput: 0: 170.8. Samples: 235924. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:35:54,829][14933] Avg episode reward: [(0, '5.274')]
-[2025-09-03 03:35:57,179][15670] Updated weights for policy 0, policy_version 330 (0.2353)
-[2025-09-03 03:35:59,825][14933] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1351680. Throughput: 0: 181.6. Samples: 237356. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:35:59,827][14933] Avg episode reward: [(0, '5.165')]
-[2025-09-03 03:36:04,826][14933] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1355776. Throughput: 0: 170.4. Samples: 237966. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:36:04,831][14933] Avg episode reward: [(0, '5.165')]
-[2025-09-03 03:36:09,823][14933] Fps is (10 sec: 819.4, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1359872. Throughput: 0: 177.0. Samples: 238800. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:36:09,832][14933] Avg episode reward: [(0, '5.114')]
-[2025-09-03 03:36:14,823][14933] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1363968. Throughput: 0: 180.7. Samples: 239986. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:36:14,824][14933] Avg episode reward: [(0, '5.073')]
-[2025-09-03 03:36:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1363968. Throughput: 0: 181.0. Samples: 240998. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:36:19,825][14933] Avg episode reward: [(0, '5.090')]
-[2025-09-03 03:36:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1368064. Throughput: 0: 179.5. Samples: 241342. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:36:24,834][14933] Avg episode reward: [(0, '5.116')]
-[2025-09-03 03:36:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1372160. Throughput: 0: 183.4. Samples: 242784. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:36:29,833][14933] Avg episode reward: [(0, '5.156')]
-[2025-09-03 03:36:34,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1376256. Throughput: 0: 182.8. Samples: 243638. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:36:34,829][14933] Avg episode reward: [(0, '5.140')]
-[2025-09-03 03:36:38,032][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000337_1380352.pth...
-[2025-09-03 03:36:38,172][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000295_1208320.pth
-[2025-09-03 03:36:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1380352. Throughput: 0: 181.1. Samples: 244072. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:36:39,830][14933] Avg episode reward: [(0, '5.183')]
-[2025-09-03 03:36:44,823][14933] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1384448. Throughput: 0: 173.3. Samples: 245156. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:36:44,825][14933] Avg episode reward: [(0, '5.164')]
-[2025-09-03 03:36:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1388544. Throughput: 0: 188.8. Samples: 246460. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:36:49,834][14933] Avg episode reward: [(0, '5.255')]
-[2025-09-03 03:36:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1388544. Throughput: 0: 183.1. Samples: 247040. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:36:54,831][14933] Avg episode reward: [(0, '5.317')]
-[2025-09-03 03:36:54,931][15670] Updated weights for policy 0, policy_version 340 (0.2502)
-[2025-09-03 03:36:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1392640. Throughput: 0: 182.0. Samples: 248176. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:36:59,831][14933] Avg episode reward: [(0, '5.346')]
-[2025-09-03 03:37:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1396736. Throughput: 0: 182.5. Samples: 249212. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:04,825][14933] Avg episode reward: [(0, '5.368')]
-[2025-09-03 03:37:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1400832. Throughput: 0: 182.4. Samples: 249550. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:09,832][14933] Avg episode reward: [(0, '5.408')]
-[2025-09-03 03:37:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1404928. Throughput: 0: 169.0. Samples: 250390. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:14,829][14933] Avg episode reward: [(0, '5.411')]
-[2025-09-03 03:37:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1409024. Throughput: 0: 180.4. Samples: 251754. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:19,825][14933] Avg episode reward: [(0, '5.371')]
-[2025-09-03 03:37:24,825][14933] Fps is (10 sec: 819.0, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1413120. Throughput: 0: 182.3. Samples: 252276. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:24,835][14933] Avg episode reward: [(0, '5.319')]
-[2025-09-03 03:37:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1417216. Throughput: 0: 180.3. Samples: 253268. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:29,833][14933] Avg episode reward: [(0, '5.335')]
-[2025-09-03 03:37:34,823][14933] Fps is (10 sec: 819.3, 60 sec: 751.0, 300 sec: 708.1). Total num frames: 1421312. Throughput: 0: 173.3. Samples: 254260. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:34,826][14933] Avg episode reward: [(0, '5.223')]
-[2025-09-03 03:37:39,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1421312. Throughput: 0: 176.0. Samples: 254962. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:39,825][14933] Avg episode reward: [(0, '5.317')]
-[2025-09-03 03:37:44,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1425408. Throughput: 0: 169.3. Samples: 255794. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
-[2025-09-03 03:37:44,825][14933] Avg episode reward: [(0, '5.392')]
-[2025-09-03 03:37:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1429504. Throughput: 0: 177.4. Samples: 257194. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:37:49,829][14933] Avg episode reward: [(0, '5.418')]
-[2025-09-03 03:37:51,756][15670] Updated weights for policy 0, policy_version 350 (0.1901)
-[2025-09-03 03:37:54,827][14933] Fps is (10 sec: 818.9, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1433600. Throughput: 0: 178.3. Samples: 257572. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:37:54,835][14933] Avg episode reward: [(0, '5.425')]
-[2025-09-03 03:37:56,179][15657] Signal inference workers to stop experience collection... (250 times)
-[2025-09-03 03:37:56,299][15670] InferenceWorker_p0-w0: stopping experience collection (250 times)
-[2025-09-03 03:37:58,475][15657] Signal inference workers to resume experience collection... (250 times)
-[2025-09-03 03:37:58,478][15670] InferenceWorker_p0-w0: resuming experience collection (250 times)
-[2025-09-03 03:37:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1437696. Throughput: 0: 177.5. Samples: 258376. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:37:59,825][14933] Avg episode reward: [(0, '5.446')]
-[2025-09-03 03:38:04,823][14933] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1441792. Throughput: 0: 174.2. Samples: 259594. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:38:04,825][14933] Avg episode reward: [(0, '5.658')]
-[2025-09-03 03:38:08,615][15657] Saving new best policy, reward=5.658!
-[2025-09-03 03:38:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1445888. Throughput: 0: 181.8. Samples: 260456. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:38:09,829][14933] Avg episode reward: [(0, '5.702')]
-[2025-09-03 03:38:14,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1445888. Throughput: 0: 181.3. Samples: 261426. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:38:14,828][14933] Avg episode reward: [(0, '5.734')]
-[2025-09-03 03:38:16,043][15657] Saving new best policy, reward=5.702!
-[2025-09-03 03:38:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 694.2). Total num frames: 1449984. Throughput: 0: 183.9. Samples: 262534. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
-[2025-09-03 03:38:19,840][14933] Avg episode reward: [(0, '5.839')]
-[2025-09-03 03:38:20,891][15657] Saving new best policy, reward=5.734!
-[2025-09-03 03:38:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1454080. Throughput: 0: 178.9. Samples: 263012.
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:38:24,827][14933] Avg episode reward: [(0, '5.846')] -[2025-09-03 03:38:25,927][15657] Saving new best policy, reward=5.839! -[2025-09-03 03:38:29,830][14933] Fps is (10 sec: 818.6, 60 sec: 682.6, 300 sec: 708.1). Total num frames: 1458176. Throughput: 0: 182.8. Samples: 264022. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:38:29,837][14933] Avg episode reward: [(0, '5.778')] -[2025-09-03 03:38:33,218][15657] Saving new best policy, reward=5.846! -[2025-09-03 03:38:33,346][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000357_1462272.pth... -[2025-09-03 03:38:33,476][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000316_1294336.pth -[2025-09-03 03:38:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1462272. Throughput: 0: 171.9. Samples: 264928. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:38:34,825][14933] Avg episode reward: [(0, '5.901')] -[2025-09-03 03:38:38,392][15657] Saving new best policy, reward=5.901! -[2025-09-03 03:38:39,823][14933] Fps is (10 sec: 819.8, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1466368. Throughput: 0: 178.0. Samples: 265580. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:38:39,827][14933] Avg episode reward: [(0, '5.936')] -[2025-09-03 03:38:44,004][15657] Saving new best policy, reward=5.936! -[2025-09-03 03:38:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1470464. Throughput: 0: 182.7. Samples: 266598. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:38:44,825][14933] Avg episode reward: [(0, '5.919')] -[2025-09-03 03:38:49,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1470464. Throughput: 0: 179.5. Samples: 267670. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:38:49,841][14933] Avg episode reward: [(0, '5.818')] -[2025-09-03 03:38:50,614][15670] Updated weights for policy 0, policy_version 360 (0.0105) -[2025-09-03 03:38:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1474560. Throughput: 0: 172.8. Samples: 268234. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:38:54,830][14933] Avg episode reward: [(0, '5.922')] -[2025-09-03 03:38:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1478656. Throughput: 0: 182.5. Samples: 269638. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:38:59,825][14933] Avg episode reward: [(0, '5.996')] -[2025-09-03 03:39:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1482752. Throughput: 0: 171.0. Samples: 270228. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:39:04,825][14933] Avg episode reward: [(0, '5.846')] -[2025-09-03 03:39:07,654][15657] Saving new best policy, reward=5.996! -[2025-09-03 03:39:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1486848. Throughput: 0: 171.3. Samples: 270722. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:09,824][14933] Avg episode reward: [(0, '5.675')] -[2025-09-03 03:39:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1490944. Throughput: 0: 178.4. Samples: 272048. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:14,826][14933] Avg episode reward: [(0, '5.807')] -[2025-09-03 03:39:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1495040. Throughput: 0: 175.4. Samples: 272820. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:19,825][14933] Avg episode reward: [(0, '5.823')] -[2025-09-03 03:39:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1495040. Throughput: 0: 176.2. Samples: 273510. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:24,829][14933] Avg episode reward: [(0, '5.718')] -[2025-09-03 03:39:29,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.8, 300 sec: 708.1). Total num frames: 1499136. Throughput: 0: 179.2. Samples: 274662. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:29,833][14933] Avg episode reward: [(0, '5.701')] -[2025-09-03 03:39:34,828][14933] Fps is (10 sec: 818.7, 60 sec: 682.6, 300 sec: 708.1). Total num frames: 1503232. Throughput: 0: 176.4. Samples: 275610. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:34,831][14933] Avg episode reward: [(0, '5.934')] -[2025-09-03 03:39:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1507328. Throughput: 0: 170.0. Samples: 275884. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:39,825][14933] Avg episode reward: [(0, '6.029')] -[2025-09-03 03:39:42,857][15657] Saving new best policy, reward=6.029! -[2025-09-03 03:39:44,823][14933] Fps is (10 sec: 819.7, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1511424. Throughput: 0: 166.0. Samples: 277106. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:44,826][14933] Avg episode reward: [(0, '5.884')] -[2025-09-03 03:39:48,368][15670] Updated weights for policy 0, policy_version 370 (0.0694) -[2025-09-03 03:39:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1515520. Throughput: 0: 175.5. Samples: 278124. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:49,826][14933] Avg episode reward: [(0, '5.947')] -[2025-09-03 03:39:54,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1515520. Throughput: 0: 176.8. Samples: 278678. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:54,825][14933] Avg episode reward: [(0, '5.877')] -[2025-09-03 03:39:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1519616. Throughput: 0: 176.8. Samples: 280006. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:39:59,825][14933] Avg episode reward: [(0, '5.658')] -[2025-09-03 03:40:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1523712. Throughput: 0: 181.8. Samples: 281002. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:04,829][14933] Avg episode reward: [(0, '5.632')] -[2025-09-03 03:40:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1527808. Throughput: 0: 172.3. Samples: 281262. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:09,828][14933] Avg episode reward: [(0, '5.625')] -[2025-09-03 03:40:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1531904. Throughput: 0: 172.5. Samples: 282424. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:14,824][14933] Avg episode reward: [(0, '5.579')] -[2025-09-03 03:40:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1536000. Throughput: 0: 181.4. Samples: 283772. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:19,828][14933] Avg episode reward: [(0, '5.674')] -[2025-09-03 03:40:24,824][14933] Fps is (10 sec: 819.1, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1540096. Throughput: 0: 183.1. Samples: 284126. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:24,826][14933] Avg episode reward: [(0, '5.714')] -[2025-09-03 03:40:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1544192. Throughput: 0: 178.3. Samples: 285130. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:29,825][14933] Avg episode reward: [(0, '5.675')] -[2025-09-03 03:40:33,811][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000378_1548288.pth... -[2025-09-03 03:40:33,925][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000337_1380352.pth -[2025-09-03 03:40:34,830][14933] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1548288. Throughput: 0: 183.1. Samples: 286364. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:34,836][14933] Avg episode reward: [(0, '5.543')] -[2025-09-03 03:40:39,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1548288. Throughput: 0: 184.8. Samples: 286996. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:39,824][14933] Avg episode reward: [(0, '5.442')] -[2025-09-03 03:40:44,823][14933] Fps is (10 sec: 409.9, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1552384. Throughput: 0: 175.7. Samples: 287914. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:44,824][14933] Avg episode reward: [(0, '5.596')] -[2025-09-03 03:40:46,126][15670] Updated weights for policy 0, policy_version 380 (0.0617) -[2025-09-03 03:40:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1556480. Throughput: 0: 182.8. Samples: 289226. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:40:49,831][14933] Avg episode reward: [(0, '5.687')] -[2025-09-03 03:40:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1560576. Throughput: 0: 185.2. Samples: 289594. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:40:54,825][14933] Avg episode reward: [(0, '5.654')] -[2025-09-03 03:40:59,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1564672. Throughput: 0: 178.0. Samples: 290432. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:40:59,831][14933] Avg episode reward: [(0, '5.665')] -[2025-09-03 03:41:04,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1568768. Throughput: 0: 178.3. Samples: 291796. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:41:04,825][14933] Avg episode reward: [(0, '5.784')] -[2025-09-03 03:41:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1572864. Throughput: 0: 182.5. Samples: 292338. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:41:09,829][14933] Avg episode reward: [(0, '5.748')] -[2025-09-03 03:41:14,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1572864. 
Throughput: 0: 179.9. Samples: 293226. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:41:14,829][14933] Avg episode reward: [(0, '5.764')] -[2025-09-03 03:41:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1576960. Throughput: 0: 179.5. Samples: 294442. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:41:19,824][14933] Avg episode reward: [(0, '5.800')] -[2025-09-03 03:41:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1581056. Throughput: 0: 180.4. Samples: 295116. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:41:24,825][14933] Avg episode reward: [(0, '5.771')] -[2025-09-03 03:41:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1585152. Throughput: 0: 178.3. Samples: 295936. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:41:29,827][14933] Avg episode reward: [(0, '6.009')] -[2025-09-03 03:41:34,824][14933] Fps is (10 sec: 819.1, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1589248. Throughput: 0: 175.5. Samples: 297122. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:41:34,826][14933] Avg episode reward: [(0, '5.946')] -[2025-09-03 03:41:39,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1593344. Throughput: 0: 175.1. Samples: 297474. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:41:39,825][14933] Avg episode reward: [(0, '6.013')] -[2025-09-03 03:41:43,285][15670] Updated weights for policy 0, policy_version 390 (0.0672) -[2025-09-03 03:41:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1597440. Throughput: 0: 178.8. Samples: 298480. Policy #0 lag: (min: 1.0, avg: 1.5, max: 3.0) -[2025-09-03 03:41:44,827][14933] Avg episode reward: [(0, '6.052')] -[2025-09-03 03:41:49,091][15657] Saving new best policy, reward=6.052! -[2025-09-03 03:41:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1601536. Throughput: 0: 171.6. Samples: 299518. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:41:49,834][14933] Avg episode reward: [(0, '5.922')] -[2025-09-03 03:41:54,823][14933] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1605632. Throughput: 0: 180.0. Samples: 300440. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:41:54,829][14933] Avg episode reward: [(0, '5.871')] -[2025-09-03 03:41:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1605632. Throughput: 0: 183.1. Samples: 301466. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) -[2025-09-03 03:41:59,825][14933] Avg episode reward: [(0, '6.032')] -[2025-09-03 03:42:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1609728. Throughput: 0: 179.4. Samples: 302516. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:04,830][14933] Avg episode reward: [(0, '6.222')] -[2025-09-03 03:42:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1613824. Throughput: 0: 174.9. Samples: 302986. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:09,832][14933] Avg episode reward: [(0, '6.162')] -[2025-09-03 03:42:10,882][15657] Saving new best policy, reward=6.222! -[2025-09-03 03:42:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1617920. Throughput: 0: 184.1. Samples: 304222. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:14,829][14933] Avg episode reward: [(0, '6.261')] -[2025-09-03 03:42:17,404][15657] Saving new best policy, reward=6.261! -[2025-09-03 03:42:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1622016. Throughput: 0: 174.6. Samples: 304980. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:19,829][14933] Avg episode reward: [(0, '6.324')] -[2025-09-03 03:42:22,837][15657] Saving new best policy, reward=6.324! -[2025-09-03 03:42:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1626112. Throughput: 0: 179.0. Samples: 305528. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:24,825][14933] Avg episode reward: [(0, '6.428')] -[2025-09-03 03:42:27,973][15657] Saving new best policy, reward=6.428! -[2025-09-03 03:42:29,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1630208. Throughput: 0: 184.0. Samples: 306760. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:29,829][14933] Avg episode reward: [(0, '6.624')] -[2025-09-03 03:42:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1630208. Throughput: 0: 179.1. Samples: 307578. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:34,825][14933] Avg episode reward: [(0, '6.638')] -[2025-09-03 03:42:34,976][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000399_1634304.pth... -[2025-09-03 03:42:35,081][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000357_1462272.pth -[2025-09-03 03:42:35,099][15657] Saving new best policy, reward=6.624! -[2025-09-03 03:42:39,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1634304. Throughput: 0: 173.5. Samples: 308248. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:39,831][14933] Avg episode reward: [(0, '6.418')] -[2025-09-03 03:42:40,218][15657] Saving new best policy, reward=6.638! -[2025-09-03 03:42:40,226][15670] Updated weights for policy 0, policy_version 400 (0.0624) -[2025-09-03 03:42:43,272][15657] Signal inference workers to stop experience collection... (300 times) -[2025-09-03 03:42:43,322][15670] InferenceWorker_p0-w0: stopping experience collection (300 times) -[2025-09-03 03:42:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1638400. Throughput: 0: 182.0. Samples: 309658. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:44,830][14933] Avg episode reward: [(0, '6.389')] -[2025-09-03 03:42:45,375][15657] Signal inference workers to resume experience collection... (300 times) -[2025-09-03 03:42:45,377][15670] InferenceWorker_p0-w0: resuming experience collection (300 times) -[2025-09-03 03:42:49,827][14933] Fps is (10 sec: 818.9, 60 sec: 682.6, 300 sec: 708.1). Total num frames: 1642496. Throughput: 0: 174.7. Samples: 310378. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:49,833][14933] Avg episode reward: [(0, '6.330')] -[2025-09-03 03:42:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1646592. Throughput: 0: 171.8. Samples: 310718. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:54,824][14933] Avg episode reward: [(0, '6.184')] -[2025-09-03 03:42:59,823][14933] Fps is (10 sec: 819.5, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1650688. Throughput: 0: 174.3. Samples: 312066. 
Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:42:59,832][14933] Avg episode reward: [(0, '6.128')] -[2025-09-03 03:43:04,827][14933] Fps is (10 sec: 818.8, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1654784. Throughput: 0: 178.0. Samples: 312990. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:04,829][14933] Avg episode reward: [(0, '6.134')] -[2025-09-03 03:43:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1658880. Throughput: 0: 177.5. Samples: 313514. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:09,824][14933] Avg episode reward: [(0, '6.020')] -[2025-09-03 03:43:14,823][14933] Fps is (10 sec: 819.6, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1662976. Throughput: 0: 177.6. Samples: 314752. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:14,825][14933] Avg episode reward: [(0, '6.101')] -[2025-09-03 03:43:19,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1662976. Throughput: 0: 182.3. Samples: 315782. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:19,826][14933] Avg episode reward: [(0, '6.137')] -[2025-09-03 03:43:24,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1667072. Throughput: 0: 176.0. Samples: 316166. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:24,825][14933] Avg episode reward: [(0, '5.856')] -[2025-09-03 03:43:29,823][14933] Fps is (10 sec: 819.1, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1671168. Throughput: 0: 171.0. Samples: 317354. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:29,831][14933] Avg episode reward: [(0, '5.841')] -[2025-09-03 03:43:34,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1675264. Throughput: 0: 181.6. Samples: 318548. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:34,829][14933] Avg episode reward: [(0, '5.788')] -[2025-09-03 03:43:37,618][15670] Updated weights for policy 0, policy_version 410 (0.1883) -[2025-09-03 03:43:39,823][14933] Fps is (10 sec: 819.3, 60 sec: 750.9, 300 sec: 708.1). Total num frames: 1679360. Throughput: 0: 180.8. Samples: 318856. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:39,829][14933] Avg episode reward: [(0, '5.934')] -[2025-09-03 03:43:44,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1683456. Throughput: 0: 173.7. Samples: 319884. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:44,824][14933] Avg episode reward: [(0, '5.903')] -[2025-09-03 03:43:49,823][14933] Fps is (10 sec: 819.2, 60 sec: 751.0, 300 sec: 722.0). Total num frames: 1687552. Throughput: 0: 183.4. Samples: 321244. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:49,825][14933] Avg episode reward: [(0, '6.045')] -[2025-09-03 03:43:54,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1691648. Throughput: 0: 185.5. Samples: 321860. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:54,825][14933] Avg episode reward: [(0, '6.156')] -[2025-09-03 03:43:59,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1691648. Throughput: 0: 180.0. Samples: 322854. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:43:59,831][14933] Avg episode reward: [(0, '6.304')] -[2025-09-03 03:44:04,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). 
Total num frames: 1695744. Throughput: 0: 181.1. Samples: 323930. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:44:04,831][14933] Avg episode reward: [(0, '6.301')] -[2025-09-03 03:44:09,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1699840. Throughput: 0: 180.8. Samples: 324302. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:44:09,824][14933] Avg episode reward: [(0, '6.389')] -[2025-09-03 03:44:14,823][14933] Fps is (10 sec: 819.2, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1703936. Throughput: 0: 175.1. Samples: 325234. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:44:14,824][14933] Avg episode reward: [(0, '6.474')] -[2025-09-03 03:44:19,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1708032. Throughput: 0: 174.3. Samples: 326390. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:44:19,825][14933] Avg episode reward: [(0, '6.435')] -[2025-09-03 03:44:24,823][14933] Fps is (10 sec: 819.2, 60 sec: 750.9, 300 sec: 722.0). Total num frames: 1712128. Throughput: 0: 181.9. Samples: 327040. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:44:24,830][14933] Avg episode reward: [(0, '6.513')] -[2025-09-03 03:44:29,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1712128. Throughput: 0: 181.2. Samples: 328038. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0) -[2025-09-03 03:44:29,830][14933] Avg episode reward: [(0, '6.460')] -[2025-09-03 03:44:34,823][14933] Fps is (10 sec: 409.6, 60 sec: 682.7, 300 sec: 708.1). Total num frames: 1716224. Throughput: 0: 174.1. Samples: 329078. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0) -[2025-09-03 03:44:34,825][14933] Avg episode reward: [(0, '6.636')] -[2025-09-03 03:44:34,899][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000420_1720320.pth... -[2025-09-03 03:44:34,905][15670] Updated weights for policy 0, policy_version 420 (0.1254) -[2025-09-03 03:44:34,995][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000378_1548288.pth -[2025-09-03 03:44:35,916][14933] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 14933], exiting... -[2025-09-03 03:44:35,926][15657] Stopping Batcher_0... -[2025-09-03 03:44:35,927][15657] Loop batcher_evt_loop terminating... -[2025-09-03 03:44:35,922][14933] Runner profile tree view: -main_loop: 1921.9047 -[2025-09-03 03:44:35,929][14933] Collected {0: 1720320}, FPS: 682.0 -[2025-09-03 03:44:36,508][15670] Weights refcount: 2 0 -[2025-09-03 03:44:36,524][15670] Stopping InferenceWorker_p0-w0... -[2025-09-03 03:44:36,531][15670] Loop inference_proc0-0_evt_loop terminating... 
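A note on the runner summary above: the reported 682.0 FPS is not 1720320 / 1921.9 (which would be about 895). The frame counter includes frames already present when the session started, and the numbers are consistent with this run having resumed from a 409600-frame checkpoint, so the rate covers only frames collected during this session. A minimal sketch of that arithmetic, with hypothetical variable names rather than Sample Factory's actual code:

# Sketch: session FPS counts only frames collected in this session.
env_steps_total = 1_720_320      # "Collected {0: 1720320}"
env_steps_at_start = 409_600     # assumed resume point, inferred from the numbers above
main_loop_seconds = 1921.9047    # "main_loop: 1921.9047"
fps = (env_steps_total - env_steps_at_start) / main_loop_seconds
print(f"Collected {{0: {env_steps_total}}}, FPS: {fps:.1f}")  # -> FPS: 682.0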
-[2025-09-03 03:44:36,455][15675] EvtLoop [rollout_proc2_evt_loop, process=rollout_proc2] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance2'), args=(0, 0) -Traceback (most recent call last): - File "/usr/local/lib/python3.12/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal - slot_callable(*args) - File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/sampling/rollout_worker.py", line 241, in advance_rollouts - complete_rollouts, episodic_stats = runner.advance_rollouts(policy_id, self.timing) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/sampling/non_batched_sampling.py", line 634, in advance_rollouts - new_obs, rewards, terminated, truncated, infos = e.step(actions) - ^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/gymnasium/core.py", line 461, in step - return self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/utils/make_env.py", line 129, in step - obs, rew, terminated, truncated, info = self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/sample_factory/algo/utils/make_env.py", line 115, in step - obs, rew, terminated, truncated, info = self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/sf_examples/vizdoom/doom/wrappers/scenario_wrappers/gathering_reward_shaping.py", line 33, in step - observation, reward, terminated, truncated, info = self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/gymnasium/core.py", line 522, in step - observation, reward, terminated, truncated, info = self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/sample_factory/envs/env_wrappers.py", line 86, in step - obs, reward, terminated, truncated, info = self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/gymnasium/core.py", line 461, in step - return self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/sf_examples/vizdoom/doom/wrappers/multiplayer_stats.py", line 54, in step - obs, reward, terminated, truncated, info = self.env.step(action) - ^^^^^^^^^^^^^^^^^^^^^ - File "/usr/local/lib/python3.12/dist-packages/sf_examples/vizdoom/doom/doom_gym.py", line 452, in step - reward = self.game.make_action(actions_flattened, self.skip_frames) - ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. -[2025-09-03 03:44:36,605][15675] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. 
in evt loop rollout_proc2_evt_loop
-[2025-09-03 03:44:36,527][15676] EvtLoop [rollout_proc6_evt_loop, process=rollout_proc6] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance6'), args=(1, 0)
-Traceback (most recent call last): ... vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [identical to the rollout_proc2 traceback above]
-[2025-09-03 03:44:36,544][15678] EvtLoop [rollout_proc4_evt_loop, process=rollout_proc4] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance4'), args=(0, 0)
-Traceback (most recent call last): ... vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [identical to the rollout_proc2 traceback above]
-[2025-09-03 03:44:36,640][15678] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc4_evt_loop
-[2025-09-03 03:44:36,630][15676] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc6_evt_loop
-[2025-09-03 03:44:36,529][15674] EvtLoop [rollout_proc1_evt_loop, process=rollout_proc1] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance1'), args=(0, 0)
-Traceback (most recent call last): ... vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [identical to the rollout_proc2 traceback above]
-[2025-09-03 03:44:36,679][15672] Stopping RolloutWorker_w5...
-[2025-09-03 03:44:36,702][15674] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc1_evt_loop
-[2025-09-03 03:44:36,713][15672] Loop rollout_proc5_evt_loop terminating...
-[2025-09-03 03:44:36,806][15677] EvtLoop [rollout_proc7_evt_loop, process=rollout_proc7] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance7'), args=(1, 0)
-Traceback (most recent call last): ... vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [identical to the rollout_proc2 traceback above]
-[2025-09-03 03:44:36,892][15677] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc7_evt_loop
-[2025-09-03 03:44:36,862][15671] EvtLoop [rollout_proc0_evt_loop, process=rollout_proc0] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance0'), args=(0, 0)
-Traceback (most recent call last): ... vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [identical to the rollout_proc2 traceback above]
-[2025-09-03 03:44:36,969][15671] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc0_evt_loop
-[2025-09-03 03:44:36,926][15673] EvtLoop [rollout_proc3_evt_loop, process=rollout_proc3] unhandled exception in slot='advance_rollouts' connected to emitter=Emitter(object_id='InferenceWorker_p0-w0', signal_name='advance3'), args=(0, 0)
-Traceback (most recent call last): ... vizdoom.vizdoom.SignalException: Signal SIGINT received. ViZDoom instance has been closed. [identical to the rollout_proc2 traceback above]
-[2025-09-03 03:44:37,041][15673] Unhandled exception Signal SIGINT received. ViZDoom instance has been closed. in evt loop rollout_proc3_evt_loop
-[2025-09-03 03:44:43,027][15657] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000421_1724416.pth...
-[2025-09-03 03:44:43,155][15657] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000399_1634304.pth
-[2025-09-03 03:44:43,176][15657] Stopping LearnerWorker_p0...
-[2025-09-03 03:44:43,176][15657] Loop learner_proc0_evt_loop terminating...
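Every rollout worker above dies with the same traceback: the SIGINT raised inside ViZDoom's make_action() unwinds through the whole wrapper chain, where each layer simply forwards the Gymnasium five-tuple from the env it wraps. As an illustration only (a hypothetical wrapper, not the actual sf_examples code), a pass-through wrapper in that style looks like:

import gymnasium as gym

class PassThroughWrapper(gym.Wrapper):
    # Each layer in the traceback delegates step() like this, so an
    # exception raised in the innermost env.step() unwinds through all of them.
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # A reward-shaping wrapper would adjust `reward` here before returning.
        return obs, reward, terminated, truncated, info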
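The final save/remove pair above follows the rotation visible throughout the log: each newly saved checkpoint_*.pth triggers deletion of the oldest periodic checkpoint, so only the two most recent ones (plus any best_* snapshot, which has a different filename prefix) remain on disk. A minimal sketch of such a keep-last-N rotation, as assumed behavior with a hypothetical helper:

from pathlib import Path

def rotate_checkpoints(ckpt_dir: str, keep_last: int = 2) -> None:
    # Version and frame counts are zero-padded, so lexicographic order
    # matches chronological order; drop everything but the newest N.
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for old in ckpts[:-keep_last]:
        print(f"Removing {old}")
        old.unlink()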
-[2025-09-03 03:44:46,257][14933] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json -[2025-09-03 03:44:46,261][14933] Overriding arg 'num_workers' with value 1 passed from command line -[2025-09-03 03:44:46,263][14933] Adding new argument 'no_render'=True that is not in the saved config file! -[2025-09-03 03:44:46,265][14933] Adding new argument 'save_video'=True that is not in the saved config file! -[2025-09-03 03:44:46,267][14933] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! -[2025-09-03 03:44:46,269][14933] Adding new argument 'video_name'=None that is not in the saved config file! -[2025-09-03 03:44:46,270][14933] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! -[2025-09-03 03:44:46,272][14933] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! -[2025-09-03 03:44:46,273][14933] Adding new argument 'push_to_hub'=True that is not in the saved config file! -[2025-09-03 03:44:46,276][14933] Adding new argument 'hf_repository'='WangChongan/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! -[2025-09-03 03:44:46,278][14933] Adding new argument 'policy_index'=0 that is not in the saved config file! -[2025-09-03 03:44:46,279][14933] Adding new argument 'eval_deterministic'=False that is not in the saved config file! -[2025-09-03 03:44:46,281][14933] Adding new argument 'train_script'=None that is not in the saved config file! -[2025-09-03 03:44:46,282][14933] Adding new argument 'enjoy_script'=None that is not in the saved config file! -[2025-09-03 03:44:46,284][14933] Using frameskip 1 and render_action_repeat=4 for evaluation -[2025-09-03 03:44:46,382][14933] RunningMeanStd input shape: (3, 72, 128) -[2025-09-03 03:44:46,385][14933] RunningMeanStd input shape: (1,) -[2025-09-03 03:44:46,424][14933] ConvEncoder: input_channels=3 -[2025-09-03 03:44:46,547][14933] Conv encoder output size: 512 -[2025-09-03 03:44:46,561][14933] Policy head output size: 512 -[2025-09-03 03:44:46,613][14933] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000421_1724416.pth... -[2025-09-03 03:44:48,138][14933] Num frames 100... -[2025-09-03 03:44:48,664][14933] Num frames 200... -[2025-09-03 03:44:49,258][14933] Num frames 300... -[2025-09-03 03:44:49,654][14933] Num frames 400... -[2025-09-03 03:44:49,956][14933] Num frames 500... -[2025-09-03 03:44:50,146][14933] Avg episode rewards: #0: 7.440, true rewards: #0: 5.440 -[2025-09-03 03:44:50,148][14933] Avg episode reward: 7.440, avg true_objective: 5.440 -[2025-09-03 03:44:50,331][14933] Num frames 600... -[2025-09-03 03:44:50,595][14933] Num frames 700... -[2025-09-03 03:44:50,865][14933] Num frames 800... -[2025-09-03 03:44:50,994][14933] Avg episode rewards: #0: 5.120, true rewards: #0: 4.120 -[2025-09-03 03:44:50,996][14933] Avg episode reward: 5.120, avg true_objective: 4.120 -[2025-09-03 03:44:51,212][14933] Num frames 900... -[2025-09-03 03:44:51,475][14933] Num frames 1000... -[2025-09-03 03:44:51,720][14933] Num frames 1100... -[2025-09-03 03:44:51,962][14933] Num frames 1200... -[2025-09-03 03:44:52,204][14933] Num frames 1300... -[2025-09-03 03:44:52,452][14933] Avg episode rewards: #0: 5.893, true rewards: #0: 4.560 -[2025-09-03 03:44:52,453][14933] Avg episode reward: 5.893, avg true_objective: 4.560 -[2025-09-03 03:44:52,558][14933] Num frames 1400... -[2025-09-03 03:44:52,851][14933] Num frames 1500... 
-[2025-09-03 03:44:53,136][14933] Num frames 1600... -[2025-09-03 03:44:53,444][14933] Num frames 1700... -[2025-09-03 03:44:53,645][14933] Avg episode rewards: #0: 5.618, true rewards: #0: 4.367 -[2025-09-03 03:44:53,647][14933] Avg episode reward: 5.618, avg true_objective: 4.367 -[2025-09-03 03:44:53,797][14933] Num frames 1800... -[2025-09-03 03:44:54,180][14933] Num frames 1900... -[2025-09-03 03:44:54,605][14933] Num frames 2000... -[2025-09-03 03:44:54,993][14933] Num frames 2100... -[2025-09-03 03:44:55,393][14933] Num frames 2200... -[2025-09-03 03:44:55,794][14933] Num frames 2300... -[2025-09-03 03:44:56,079][14933] Avg episode rewards: #0: 6.310, true rewards: #0: 4.710 -[2025-09-03 03:44:56,081][14933] Avg episode reward: 6.310, avg true_objective: 4.710 -[2025-09-03 03:44:56,274][14933] Num frames 2400... -[2025-09-03 03:44:56,715][14933] Num frames 2500... -[2025-09-03 03:44:57,018][14933] Num frames 2600... -[2025-09-03 03:44:57,223][14933] Avg episode rewards: #0: 5.753, true rewards: #0: 4.420 -[2025-09-03 03:44:57,224][14933] Avg episode reward: 5.753, avg true_objective: 4.420 -[2025-09-03 03:44:57,362][14933] Num frames 2700... -[2025-09-03 03:44:57,665][14933] Num frames 2800... -[2025-09-03 03:44:57,938][14933] Num frames 2900... -[2025-09-03 03:44:58,214][14933] Num frames 3000... -[2025-09-03 03:44:58,494][14933] Num frames 3100... -[2025-09-03 03:44:58,798][14933] Num frames 3200... -[2025-09-03 03:44:58,930][14933] Avg episode rewards: #0: 6.326, true rewards: #0: 4.611 -[2025-09-03 03:44:58,932][14933] Avg episode reward: 6.326, avg true_objective: 4.611 -[2025-09-03 03:44:59,158][14933] Num frames 3300... -[2025-09-03 03:44:59,442][14933] Num frames 3400... -[2025-09-03 03:44:59,734][14933] Num frames 3500... -[2025-09-03 03:45:00,010][14933] Num frames 3600... -[2025-09-03 03:45:00,316][14933] Num frames 3700... -[2025-09-03 03:45:00,637][14933] Num frames 3800... -[2025-09-03 03:45:00,709][14933] Avg episode rewards: #0: 6.630, true rewards: #0: 4.755 -[2025-09-03 03:45:00,710][14933] Avg episode reward: 6.630, avg true_objective: 4.755 -[2025-09-03 03:45:00,952][14933] Num frames 3900... -[2025-09-03 03:45:01,222][14933] Num frames 4000... -[2025-09-03 03:45:01,495][14933] Num frames 4100... -[2025-09-03 03:45:01,753][14933] Num frames 4200... -[2025-09-03 03:45:01,890][14933] Avg episode rewards: #0: 6.471, true rewards: #0: 4.693 -[2025-09-03 03:45:01,892][14933] Avg episode reward: 6.471, avg true_objective: 4.693 -[2025-09-03 03:45:02,087][14933] Num frames 4300... -[2025-09-03 03:45:02,397][14933] Num frames 4400... -[2025-09-03 03:45:02,651][14933] Num frames 4500... -[2025-09-03 03:45:02,911][14933] Num frames 4600... -[2025-09-03 03:45:03,177][14933] Num frames 4700... -[2025-09-03 03:45:03,402][14933] Avg episode rewards: #0: 6.568, true rewards: #0: 4.768 -[2025-09-03 03:45:03,404][14933] Avg episode reward: 6.568, avg true_objective: 4.768 -[2025-09-03 03:45:39,922][14933] Replay video saved to /content/train_dir/default_experiment/replay.mp4! +[2025-09-03 03:57:18,148][08383] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-09-03 03:57:18,149][08383] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-09-03 03:57:18,278][08383] Num visible devices: 1 +[2025-09-03 03:57:18,369][08370] Using optimizer +[2025-09-03 03:57:19,642][08370] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth... 
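In the evaluation pass above, "Avg episode rewards" is a running mean over the episodes finished so far, and "true rewards" averages the unshaped objective. For example, the first three episodes return 7.440, 2.800, and 7.439, which yields the logged means 7.440, 5.120, and 5.893. A minimal sketch of that bookkeeping, with hypothetical names rather than the actual enjoy script:

episode_rewards: list[float] = []
true_objectives: list[float] = []

def on_episode_end(reward: float, true_objective: float) -> None:
    # Accumulate per-episode results and report the running means.
    episode_rewards.append(reward)
    true_objectives.append(true_objective)
    avg = sum(episode_rewards) / len(episode_rewards)
    avg_true = sum(true_objectives) / len(true_objectives)
    print(f"Avg episode rewards: #0: {avg:.3f}, true rewards: #0: {avg_true:.3f}")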
+[2025-09-03 03:57:19,674][08370] Loading model from checkpoint
+[2025-09-03 03:57:19,676][08370] Loaded experiment state at self.train_step=2, self.env_steps=8192
+[2025-09-03 03:57:19,677][08370] Initialized policy 0 weights for model version 2
+[2025-09-03 03:57:19,679][08370] LearnerWorker_p0 finished initialization!
+[2025-09-03 03:57:19,680][08370] Using GPUs [0] for process 0 (actually maps to GPUs [0])
+[2025-09-03 03:57:19,891][08383] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-03 03:57:19,892][08383] RunningMeanStd input shape: (1,)
+[2025-09-03 03:57:19,902][08383] ConvEncoder: input_channels=3
+[2025-09-03 03:57:20,002][08383] Conv encoder output size: 512
+[2025-09-03 03:57:20,003][08383] Policy head output size: 512
+[2025-09-03 03:57:20,048][08012] Inference worker 0-0 is ready!
+[2025-09-03 03:57:20,050][08012] All inference workers are ready! Signal rollout workers to start!
+[2025-09-03 03:57:20,578][08389] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:20,582][08391] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:20,583][08387] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:20,586][08384] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:20,598][08390] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:20,602][08388] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:20,603][08385] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:20,601][08386] Doom resolution: 160x120, resize resolution: (128, 72)
+[2025-09-03 03:57:21,070][08012] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8192. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-03 03:57:21,513][08012] Heartbeat connected on Batcher_0
+[2025-09-03 03:57:21,514][08012] Heartbeat connected on LearnerWorker_p0
+[2025-09-03 03:57:21,571][08012] Heartbeat connected on InferenceWorker_p0-w0
+[2025-09-03 03:57:22,320][08386] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:22,332][08388] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:22,343][08385] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:22,721][08384] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:22,718][08389] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:22,724][08387] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:22,733][08391] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:23,935][08388] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:23,954][08386] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:24,583][08385] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:24,590][08390] Decorrelating experience for 0 frames...
+[2025-09-03 03:57:24,986][08384] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:25,008][08391] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:25,012][08389] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:25,898][08387] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:26,070][08012] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8192. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-03 03:57:26,404][08386] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:26,518][08390] Decorrelating experience for 32 frames...
+[2025-09-03 03:57:26,919][08388] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:26,922][08385] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:27,141][08391] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:27,530][08387] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:27,649][08386] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:27,820][08384] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:27,848][08012] Heartbeat connected on RolloutWorker_w2
+[2025-09-03 03:57:28,128][08390] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:28,699][08389] Decorrelating experience for 64 frames...
+[2025-09-03 03:57:28,994][08388] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:29,100][08391] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:29,333][08012] Heartbeat connected on RolloutWorker_w7
+[2025-09-03 03:57:29,356][08012] Heartbeat connected on RolloutWorker_w4
+[2025-09-03 03:57:29,510][08387] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:29,755][08012] Heartbeat connected on RolloutWorker_w3
+[2025-09-03 03:57:29,870][08384] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:30,055][08390] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:30,267][08012] Heartbeat connected on RolloutWorker_w1
+[2025-09-03 03:57:30,433][08012] Heartbeat connected on RolloutWorker_w6
+[2025-09-03 03:57:30,936][08389] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:31,070][08012] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 8192. Throughput: 0: 12.8. Samples: 128. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+[2025-09-03 03:57:31,076][08012] Avg episode reward: [(0, '1.657')]
+[2025-09-03 03:57:31,552][08012] Heartbeat connected on RolloutWorker_w5
+[2025-09-03 03:57:31,749][08385] Decorrelating experience for 96 frames...
+[2025-09-03 03:57:32,157][08012] Heartbeat connected on RolloutWorker_w0
+[2025-09-03 03:57:33,112][08370] Signal inference workers to stop experience collection...
+[2025-09-03 03:57:33,125][08383] InferenceWorker_p0-w0: stopping experience collection
+[2025-09-03 03:57:34,206][08370] Signal inference workers to resume experience collection...
+[2025-09-03 03:57:34,208][08383] InferenceWorker_p0-w0: resuming experience collection
+[2025-09-03 03:57:36,070][08012] Fps is (10 sec: 1228.8, 60 sec: 819.2, 300 sec: 819.2). Total num frames: 20480. Throughput: 0: 230.3. Samples: 3454. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
+[2025-09-03 03:57:36,074][08012] Avg episode reward: [(0, '2.779')]
+[2025-09-03 03:57:41,070][08012] Fps is (10 sec: 2867.1, 60 sec: 1433.6, 300 sec: 1433.6). Total num frames: 36864. Throughput: 0: 413.7. Samples: 8274. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 03:57:41,071][08012] Avg episode reward: [(0, '3.488')]
+[2025-09-03 03:57:43,364][08383] Updated weights for policy 0, policy_version 12 (0.0105)
+[2025-09-03 03:57:46,070][08012] Fps is (10 sec: 3686.4, 60 sec: 1966.1, 300 sec: 1966.1). Total num frames: 57344. Throughput: 0: 465.8. Samples: 11646. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:57:46,073][08012] Avg episode reward: [(0, '4.364')]
+[2025-09-03 03:57:51,069][08012] Fps is (10 sec: 4505.8, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 81920. Throughput: 0: 594.1. Samples: 17824. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 03:57:51,071][08012] Avg episode reward: [(0, '4.570')]
+[2025-09-03 03:57:53,867][08383] Updated weights for policy 0, policy_version 22 (0.0018)
+[2025-09-03 03:57:56,071][08012] Fps is (10 sec: 3685.9, 60 sec: 2457.5, 300 sec: 2457.5). Total num frames: 94208. Throughput: 0: 585.9. Samples: 20506. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:57:56,076][08012] Avg episode reward: [(0, '4.445')]
+[2025-09-03 03:58:01,070][08012] Fps is (10 sec: 3686.4, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 118784. Throughput: 0: 651.1. Samples: 26044. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:58:01,071][08012] Avg episode reward: [(0, '4.397')]
+[2025-09-03 03:58:01,072][08370] Saving new best policy, reward=4.397!
+[2025-09-03 03:58:03,679][08383] Updated weights for policy 0, policy_version 32 (0.0022)
+[2025-09-03 03:58:06,070][08012] Fps is (10 sec: 4506.0, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 139264. Throughput: 0: 733.4. Samples: 33004. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 03:58:06,074][08012] Avg episode reward: [(0, '4.496')]
+[2025-09-03 03:58:06,081][08370] Saving new best policy, reward=4.496!
+[2025-09-03 03:58:11,070][08012] Fps is (10 sec: 3686.4, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 155648. Throughput: 0: 837.5. Samples: 37686. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 03:58:11,071][08012] Avg episode reward: [(0, '4.397')]
+[2025-09-03 03:58:14,462][08383] Updated weights for policy 0, policy_version 42 (0.0014)
+[2025-09-03 03:58:16,070][08012] Fps is (10 sec: 4096.2, 60 sec: 3127.9, 300 sec: 3127.9). Total num frames: 180224. Throughput: 0: 912.8. Samples: 41202. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:58:16,073][08012] Avg episode reward: [(0, '4.383')]
+[2025-09-03 03:58:21,069][08012] Fps is (10 sec: 4505.6, 60 sec: 3208.5, 300 sec: 3208.5). Total num frames: 200704. Throughput: 0: 992.8. Samples: 48130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 03:58:21,074][08012] Avg episode reward: [(0, '4.338')]
+[2025-09-03 03:58:24,692][08383] Updated weights for policy 0, policy_version 52 (0.0019)
+[2025-09-03 03:58:26,069][08012] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3150.8). Total num frames: 212992. Throughput: 0: 988.5. Samples: 52754. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:58:26,071][08012] Avg episode reward: [(0, '4.453')]
+[2025-09-03 03:58:31,069][08012] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3276.8). Total num frames: 237568. Throughput: 0: 991.2. Samples: 56250. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:58:31,071][08012] Avg episode reward: [(0, '4.425')]
+[2025-09-03 03:58:34,163][08383] Updated weights for policy 0, policy_version 62 (0.0017)
+[2025-09-03 03:58:36,070][08012] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3386.0). Total num frames: 262144. Throughput: 0: 1007.8. Samples: 63176. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 03:58:36,077][08012] Avg episode reward: [(0, '4.284')]
+[2025-09-03 03:58:41,069][08012] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3328.0). Total num frames: 274432. Throughput: 0: 1004.1. Samples: 65690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 03:58:41,071][08012] Avg episode reward: [(0, '4.278')]
+[2025-09-03 03:58:44,808][08383] Updated weights for policy 0, policy_version 72 (0.0015)
+[2025-09-03 03:58:46,070][08012] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 3421.4). Total num frames: 299008. Throughput: 0: 1008.7. Samples: 71436. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-09-03 03:58:46,071][08012] Avg episode reward: [(0, '4.384')]
+[2025-09-03 03:58:51,074][08012] Fps is (10 sec: 4503.6, 60 sec: 3959.2, 300 sec: 3458.7). Total num frames: 319488. Throughput: 0: 1009.2. Samples: 78420. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:58:51,075][08012] Avg episode reward: [(0, '4.648')]
+[2025-09-03 03:58:51,079][08370] Saving new best policy, reward=4.648!
+[2025-09-03 03:58:55,589][08383] Updated weights for policy 0, policy_version 82 (0.0012)
+[2025-09-03 03:58:56,070][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.8, 300 sec: 3449.3). Total num frames: 335872. Throughput: 0: 1006.8. Samples: 82990. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:58:56,073][08012] Avg episode reward: [(0, '4.574')]
+[2025-09-03 03:58:56,082][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000082_335872.pth...
+[2025-09-03 03:59:01,070][08012] Fps is (10 sec: 4097.7, 60 sec: 4027.7, 300 sec: 3522.5). Total num frames: 360448. Throughput: 0: 1006.0. Samples: 86472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 03:59:01,072][08012] Avg episode reward: [(0, '4.416')]
+[2025-09-03 03:59:04,529][08383] Updated weights for policy 0, policy_version 92 (0.0027)
+[2025-09-03 03:59:06,070][08012] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 3549.9). Total num frames: 380928. Throughput: 0: 1006.4. Samples: 93418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 03:59:06,071][08012] Avg episode reward: [(0, '4.269')]
+[2025-09-03 03:59:11,069][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3537.5). Total num frames: 397312. Throughput: 0: 1007.6. Samples: 98094. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 03:59:11,073][08012] Avg episode reward: [(0, '4.321')]
+[2025-09-03 03:59:15,184][08383] Updated weights for policy 0, policy_version 102 (0.0017)
+[2025-09-03 03:59:16,070][08012] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3561.7). Total num frames: 417792. Throughput: 0: 1007.3. Samples: 101578. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 03:59:16,071][08012] Avg episode reward: [(0, '4.472')]
+[2025-09-03 03:59:21,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3618.1). Total num frames: 442368. Throughput: 0: 1006.5. Samples: 108470. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 03:59:21,073][08012] Avg episode reward: [(0, '4.660')]
+[2025-09-03 03:59:21,081][08370] Saving new best policy, reward=4.660!
+[2025-09-03 03:59:26,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3571.7). Total num frames: 454656. Throughput: 0: 1054.2. Samples: 113128. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:59:26,074][08012] Avg episode reward: [(0, '4.674')]
+[2025-09-03 03:59:26,089][08370] Saving new best policy, reward=4.674!
+[2025-09-03 03:59:26,347][08383] Updated weights for policy 0, policy_version 112 (0.0017)
+[2025-09-03 03:59:31,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3623.4). Total num frames: 479232. Throughput: 0: 1001.9. Samples: 116522. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 03:59:31,073][08012] Avg episode reward: [(0, '4.557')]
+[2025-09-03 03:59:34,996][08383] Updated weights for policy 0, policy_version 122 (0.0012)
+[2025-09-03 03:59:36,070][08012] Fps is (10 sec: 4915.2, 60 sec: 4027.7, 300 sec: 3671.2). Total num frames: 503808. Throughput: 0: 1001.4. Samples: 123480. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 03:59:36,073][08012] Avg episode reward: [(0, '4.549')]
+[2025-09-03 03:59:41,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3627.9). Total num frames: 516096. Throughput: 0: 1006.0. Samples: 128262. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-09-03 03:59:41,073][08012] Avg episode reward: [(0, '4.602')]
+[2025-09-03 03:59:45,780][08383] Updated weights for policy 0, policy_version 132 (0.0017)
+[2025-09-03 03:59:46,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3672.3). Total num frames: 540672. Throughput: 0: 1005.6. Samples: 131722. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 03:59:46,073][08012] Avg episode reward: [(0, '4.822')]
+[2025-09-03 03:59:46,079][08370] Saving new best policy, reward=4.822!
+[2025-09-03 03:59:51,073][08012] Fps is (10 sec: 4503.8, 60 sec: 4027.8, 300 sec: 3686.3). Total num frames: 561152. Throughput: 0: 1005.7. Samples: 138680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 03:59:51,075][08012] Avg episode reward: [(0, '4.970')]
+[2025-09-03 03:59:51,078][08370] Saving new best policy, reward=4.970!
+[2025-09-03 03:59:56,070][08012] Fps is (10 sec: 3686.1, 60 sec: 4027.7, 300 sec: 3673.2). Total num frames: 577536. Throughput: 0: 1002.4. Samples: 143204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 03:59:56,073][08012] Avg episode reward: [(0, '4.866')]
+[2025-09-03 03:59:56,710][08383] Updated weights for policy 0, policy_version 142 (0.0013)
+[2025-09-03 04:00:01,069][08012] Fps is (10 sec: 4097.6, 60 sec: 4027.8, 300 sec: 3712.0). Total num frames: 602112. Throughput: 0: 1001.6. Samples: 146652. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:00:01,075][08012] Avg episode reward: [(0, '4.510')]
+[2025-09-03 04:00:05,610][08383] Updated weights for policy 0, policy_version 152 (0.0016)
+[2025-09-03 04:00:06,070][08012] Fps is (10 sec: 4505.8, 60 sec: 4027.7, 300 sec: 3723.6). Total num frames: 622592. Throughput: 0: 1003.9. Samples: 153648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:00:06,074][08012] Avg episode reward: [(0, '4.576')]
+[2025-09-03 04:00:11,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3710.5). Total num frames: 638976. Throughput: 0: 1006.0. Samples: 158400. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:00:11,071][08012] Avg episode reward: [(0, '4.714')]
+[2025-09-03 04:00:16,070][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3721.5). Total num frames: 659456. Throughput: 0: 1008.2. Samples: 161890. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-09-03 04:00:16,071][08012] Avg episode reward: [(0, '4.872')]
+[2025-09-03 04:00:16,314][08383] Updated weights for policy 0, policy_version 162 (0.0013)
+[2025-09-03 04:00:21,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3754.7). Total num frames: 684032. Throughput: 0: 1008.7. Samples: 168872. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:00:21,074][08012] Avg episode reward: [(0, '4.910')]
+[2025-09-03 04:00:26,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3741.8). Total num frames: 700416. Throughput: 0: 1008.5. Samples: 173644. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:00:26,071][08012] Avg episode reward: [(0, '4.903')]
+[2025-09-03 04:00:26,827][08383] Updated weights for policy 0, policy_version 172 (0.0016)
+[2025-09-03 04:00:31,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3751.1). Total num frames: 720896. Throughput: 0: 1009.6. Samples: 177154. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:00:31,074][08012] Avg episode reward: [(0, '5.110')]
+[2025-09-03 04:00:31,078][08370] Saving new best policy, reward=5.110!
+[2025-09-03 04:00:36,003][08383] Updated weights for policy 0, policy_version 182 (0.0018)
+[2025-09-03 04:00:36,071][08012] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 3780.9). Total num frames: 745472. Throughput: 0: 1009.6. Samples: 184110. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:00:36,077][08012] Avg episode reward: [(0, '5.229')]
+[2025-09-03 04:00:36,095][08370] Saving new best policy, reward=5.229!
+[2025-09-03 04:00:41,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3747.8). Total num frames: 757760. Throughput: 0: 1016.7. Samples: 188956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:00:41,071][08012] Avg episode reward: [(0, '5.488')]
+[2025-09-03 04:00:41,076][08370] Saving new best policy, reward=5.488!
+[2025-09-03 04:00:46,070][08012] Fps is (10 sec: 3687.0, 60 sec: 4027.7, 300 sec: 3776.3). Total num frames: 782336. Throughput: 0: 1016.8. Samples: 192406. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:00:46,071][08012] Avg episode reward: [(0, '5.714')]
+[2025-09-03 04:00:46,076][08370] Saving new best policy, reward=5.714!
+[2025-09-03 04:00:46,435][08383] Updated weights for policy 0, policy_version 192 (0.0012)
+[2025-09-03 04:00:51,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4028.0, 300 sec: 3783.9). Total num frames: 802816. Throughput: 0: 1014.9. Samples: 199316. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:00:51,072][08012] Avg episode reward: [(0, '5.668')]
+[2025-09-03 04:00:56,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3772.1). Total num frames: 819200. Throughput: 0: 1015.9. Samples: 204114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:00:56,071][08012] Avg episode reward: [(0, '5.577')]
+[2025-09-03 04:00:56,081][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000200_819200.pth...
+[2025-09-03 04:00:56,197][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth
+[2025-09-03 04:00:57,140][08383] Updated weights for policy 0, policy_version 202 (0.0017)
+[2025-09-03 04:01:01,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3798.1). Total num frames: 843776. Throughput: 0: 1014.0. Samples: 207520. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:01:01,071][08012] Avg episode reward: [(0, '5.318')]
+[2025-09-03 04:01:06,070][08012] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3804.7). Total num frames: 864256. Throughput: 0: 1009.5. Samples: 214298. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:01:06,073][08012] Avg episode reward: [(0, '5.183')]
+[2025-09-03 04:01:07,078][08383] Updated weights for policy 0, policy_version 212 (0.0012)
+[2025-09-03 04:01:11,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3793.3). Total num frames: 880640. Throughput: 0: 1010.4. Samples: 219112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-09-03 04:01:11,075][08012] Avg episode reward: [(0, '5.107')]
+[2025-09-03 04:01:16,070][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 3799.7). Total num frames: 901120. Throughput: 0: 1008.7. Samples: 222544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:01:16,071][08012] Avg episode reward: [(0, '5.063')]
+[2025-09-03 04:01:17,142][08383] Updated weights for policy 0, policy_version 222 (0.0022)
+[2025-09-03 04:01:21,069][08012] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3805.9). Total num frames: 921600. Throughput: 0: 1002.7. Samples: 229228. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:01:21,073][08012] Avg episode reward: [(0, '5.211')]
+[2025-09-03 04:01:26,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3811.8). Total num frames: 942080. Throughput: 0: 1001.6. Samples: 234028. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:01:26,071][08012] Avg episode reward: [(0, '5.109')]
+[2025-09-03 04:01:27,777][08383] Updated weights for policy 0, policy_version 232 (0.0029)
+[2025-09-03 04:01:31,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3817.5). Total num frames: 962560. Throughput: 0: 1003.3. Samples: 237554. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 04:01:31,071][08012] Avg episode reward: [(0, '4.864')]
+[2025-09-03 04:01:36,070][08012] Fps is (10 sec: 4095.9, 60 sec: 3959.6, 300 sec: 3822.9). Total num frames: 983040. Throughput: 0: 998.7. Samples: 244256. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:01:36,073][08012] Avg episode reward: [(0, '5.160')]
+[2025-09-03 04:01:37,981][08383] Updated weights for policy 0, policy_version 242 (0.0016)
+[2025-09-03 04:01:41,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3812.4). Total num frames: 999424. Throughput: 0: 938.5. Samples: 246346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:01:41,071][08012] Avg episode reward: [(0, '5.118')]
+[2025-09-03 04:01:46,070][08012] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 3833.2). Total num frames: 1024000. Throughput: 0: 1003.5. Samples: 252676. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:01:46,076][08012] Avg episode reward: [(0, '5.114')]
+[2025-09-03 04:01:47,529][08383] Updated weights for policy 0, policy_version 252 (0.0013)
+[2025-09-03 04:01:51,069][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3838.1). Total num frames: 1044480. Throughput: 0: 1004.7. Samples: 259510. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:01:51,075][08012] Avg episode reward: [(0, '5.391')]
+[2025-09-03 04:01:56,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3827.9). Total num frames: 1060864. Throughput: 0: 1007.9. Samples: 264468. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:01:56,071][08012] Avg episode reward: [(0, '5.834')]
+[2025-09-03 04:01:56,081][08370] Saving new best policy, reward=5.834!
+[2025-09-03 04:01:58,079][08383] Updated weights for policy 0, policy_version 262 (0.0016)
+[2025-09-03 04:02:01,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3847.3). Total num frames: 1085440. Throughput: 0: 1005.9. Samples: 267808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:02:01,074][08012] Avg episode reward: [(0, '5.824')]
+[2025-09-03 04:02:06,070][08012] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 3851.7). Total num frames: 1105920. Throughput: 0: 1006.7. Samples: 274530. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:02:06,073][08012] Avg episode reward: [(0, '5.642')]
+[2025-09-03 04:02:08,607][08383] Updated weights for policy 0, policy_version 272 (0.0044)
+[2025-09-03 04:02:11,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3841.8). Total num frames: 1122304. Throughput: 0: 1012.8. Samples: 279604. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:02:11,074][08012] Avg episode reward: [(0, '5.686')]
+[2025-09-03 04:02:16,070][08012] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3860.0). Total num frames: 1146880. Throughput: 0: 1012.3. Samples: 283108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:02:16,074][08012] Avg episode reward: [(0, '6.341')]
+[2025-09-03 04:02:16,080][08370] Saving new best policy, reward=6.341!
+[2025-09-03 04:02:17,660][08383] Updated weights for policy 0, policy_version 282 (0.0019)
+[2025-09-03 04:02:21,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3929.4). Total num frames: 1167360. Throughput: 0: 1011.9. Samples: 289790. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:02:21,072][08012] Avg episode reward: [(0, '6.193')]
+[2025-09-03 04:02:26,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3984.9). Total num frames: 1183744. Throughput: 0: 1076.9. Samples: 294808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:02:26,074][08012] Avg episode reward: [(0, '6.193')]
+[2025-09-03 04:02:28,521][08383] Updated weights for policy 0, policy_version 292 (0.0015)
+[2025-09-03 04:02:31,069][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1204224. Throughput: 0: 1014.1. Samples: 298312. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:02:31,074][08012] Avg episode reward: [(0, '6.524')]
+[2025-09-03 04:02:31,078][08370] Saving new best policy, reward=6.524!
+[2025-09-03 04:02:36,069][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4026.6). Total num frames: 1224704. Throughput: 0: 1008.7. Samples: 304902. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:02:36,073][08012] Avg episode reward: [(0, '6.992')]
+[2025-09-03 04:02:36,078][08370] Saving new best policy, reward=6.992!
+[2025-09-03 04:02:39,358][08383] Updated weights for policy 0, policy_version 302 (0.0014)
+[2025-09-03 04:02:41,069][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1245184. Throughput: 0: 1009.4. Samples: 309890. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:02:41,073][08012] Avg episode reward: [(0, '7.141')]
+[2025-09-03 04:02:41,081][08370] Saving new best policy, reward=7.141!
+[2025-09-03 04:02:46,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1265664. Throughput: 0: 1010.6. Samples: 313284. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:02:46,071][08012] Avg episode reward: [(0, '6.985')]
+[2025-09-03 04:02:48,116][08383] Updated weights for policy 0, policy_version 312 (0.0012)
+[2025-09-03 04:02:51,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1286144. Throughput: 0: 1008.1. Samples: 319894. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:02:51,073][08012] Avg episode reward: [(0, '7.610')]
+[2025-09-03 04:02:51,075][08370] Saving new best policy, reward=7.610!
+[2025-09-03 04:02:56,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1302528. Throughput: 0: 1003.5. Samples: 324762. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 04:02:56,072][08012] Avg episode reward: [(0, '7.415')]
+[2025-09-03 04:02:56,080][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000318_1302528.pth...
+[2025-09-03 04:02:56,189][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000082_335872.pth
+[2025-09-03 04:02:59,195][08383] Updated weights for policy 0, policy_version 322 (0.0019)
+[2025-09-03 04:03:01,069][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1327104. Throughput: 0: 1001.6. Samples: 328178. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:03:01,072][08012] Avg episode reward: [(0, '8.369')]
+[2025-09-03 04:03:01,074][08370] Saving new best policy, reward=8.369!
+[2025-09-03 04:03:06,070][08012] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1347584. Throughput: 0: 999.4. Samples: 334762. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:03:06,073][08012] Avg episode reward: [(0, '8.582')]
+[2025-09-03 04:03:06,081][08370] Saving new best policy, reward=8.582!
+[2025-09-03 04:03:10,092][08383] Updated weights for policy 0, policy_version 332 (0.0017)
+[2025-09-03 04:03:11,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1363968. Throughput: 0: 997.9. Samples: 339714. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:03:11,074][08012] Avg episode reward: [(0, '8.622')]
+[2025-09-03 04:03:11,083][08370] Saving new best policy, reward=8.622!
+[2025-09-03 04:03:16,070][08012] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1384448. Throughput: 0: 995.7. Samples: 343120. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:03:16,071][08012] Avg episode reward: [(0, '7.633')]
+[2025-09-03 04:03:18,943][08383] Updated weights for policy 0, policy_version 342 (0.0016)
+[2025-09-03 04:03:21,069][08012] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1404928. Throughput: 0: 997.3. Samples: 349780. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:03:21,076][08012] Avg episode reward: [(0, '7.903')]
+[2025-09-03 04:03:26,069][08012] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1421312. Throughput: 0: 995.4. Samples: 354682. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:03:26,074][08012] Avg episode reward: [(0, '7.817')]
+[2025-09-03 04:03:29,827][08383] Updated weights for policy 0, policy_version 352 (0.0021)
+[2025-09-03 04:03:31,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1445888. Throughput: 0: 996.6. Samples: 358130. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-09-03 04:03:31,072][08012] Avg episode reward: [(0, '8.400')]
+[2025-09-03 04:03:36,075][08012] Fps is (10 sec: 4503.1, 60 sec: 4027.4, 300 sec: 4040.4). Total num frames: 1466368. Throughput: 0: 998.1. Samples: 364812. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:03:36,077][08012] Avg episode reward: [(0, '8.014')]
+[2025-09-03 04:03:40,545][08383] Updated weights for policy 0, policy_version 362 (0.0017)
+[2025-09-03 04:03:41,070][08012] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4012.7). Total num frames: 1482752. Throughput: 0: 1003.8. Samples: 369932. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:03:41,074][08012] Avg episode reward: [(0, '8.534')]
+[2025-09-03 04:03:46,070][08012] Fps is (10 sec: 4098.2, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1507328. Throughput: 0: 1005.1. Samples: 373408. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:03:46,071][08012] Avg episode reward: [(0, '8.499')]
+[2025-09-03 04:03:49,376][08383] Updated weights for policy 0, policy_version 372 (0.0012)
+[2025-09-03 04:03:51,071][08012] Fps is (10 sec: 4505.0, 60 sec: 4027.6, 300 sec: 4040.4). Total num frames: 1527808. Throughput: 0: 1004.3. Samples: 379956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:03:51,072][08012] Avg episode reward: [(0, '9.577')]
+[2025-09-03 04:03:51,077][08370] Saving new best policy, reward=9.577!
+[2025-09-03 04:03:56,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1544192. Throughput: 0: 1004.8. Samples: 384932. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:03:56,071][08012] Avg episode reward: [(0, '9.711')]
+[2025-09-03 04:03:56,078][08370] Saving new best policy, reward=9.711!
+[2025-09-03 04:04:00,208][08383] Updated weights for policy 0, policy_version 382 (0.0033)
+[2025-09-03 04:04:01,069][08012] Fps is (10 sec: 4096.6, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1568768. Throughput: 0: 1005.2. Samples: 388354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:04:01,071][08012] Avg episode reward: [(0, '10.830')]
+[2025-09-03 04:04:01,073][08370] Saving new best policy, reward=10.830!
+[2025-09-03 04:04:06,070][08012] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 1585152. Throughput: 0: 999.7. Samples: 394766. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:04:06,082][08012] Avg episode reward: [(0, '10.813')]
+[2025-09-03 04:04:11,033][08383] Updated weights for policy 0, policy_version 392 (0.0013)
+[2025-09-03 04:04:11,069][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1605632. Throughput: 0: 1006.0. Samples: 399950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:04:11,071][08012] Avg episode reward: [(0, '10.892')]
+[2025-09-03 04:04:11,075][08370] Saving new best policy, reward=10.892!
+[2025-09-03 04:04:16,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1626112. Throughput: 0: 1004.7. Samples: 403340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:04:16,071][08012] Avg episode reward: [(0, '10.847')]
+[2025-09-03 04:04:20,596][08383] Updated weights for policy 0, policy_version 402 (0.0013)
+[2025-09-03 04:04:21,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 1646592. Throughput: 0: 1000.5. Samples: 409828. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:04:21,073][08012] Avg episode reward: [(0, '10.475')]
+[2025-09-03 04:04:26,070][08012] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1662976. Throughput: 0: 1001.2. Samples: 414988. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:04:26,074][08012] Avg episode reward: [(0, '10.361')]
+[2025-09-03 04:04:30,713][08383] Updated weights for policy 0, policy_version 412 (0.0016)
+[2025-09-03 04:04:31,070][08012] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1687552. Throughput: 0: 1001.5. Samples: 418478. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:04:31,076][08012] Avg episode reward: [(0, '9.136')]
+[2025-09-03 04:04:36,070][08012] Fps is (10 sec: 4505.7, 60 sec: 4028.1, 300 sec: 4040.5). Total num frames: 1708032. Throughput: 0: 1000.5. Samples: 424976. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:04:36,075][08012] Avg episode reward: [(0, '9.929')]
+[2025-09-03 04:04:41,070][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1724416. Throughput: 0: 1009.2. Samples: 430346. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:04:41,071][08012] Avg episode reward: [(0, '9.694')]
+[2025-09-03 04:04:41,399][08383] Updated weights for policy 0, policy_version 422 (0.0021)
+[2025-09-03 04:04:46,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1748992. Throughput: 0: 1010.4. Samples: 433820. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:04:46,071][08012] Avg episode reward: [(0, '9.352')]
+[2025-09-03 04:04:50,956][08383] Updated weights for policy 0, policy_version 432 (0.0012)
+[2025-09-03 04:04:51,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 1769472. Throughput: 0: 1011.3. Samples: 440276. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:04:51,071][08012] Avg episode reward: [(0, '8.816')]
+[2025-09-03 04:04:56,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 1785856. Throughput: 0: 1015.6. Samples: 445654. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:04:56,074][08012] Avg episode reward: [(0, '10.319')]
+[2025-09-03 04:04:56,083][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000436_1785856.pth...
+[2025-09-03 04:04:56,192][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000200_819200.pth
+[2025-09-03 04:05:00,670][08383] Updated weights for policy 0, policy_version 442 (0.0017)
+[2025-09-03 04:05:01,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1810432. Throughput: 0: 1016.4. Samples: 449080. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:05:01,074][08012] Avg episode reward: [(0, '11.418')]
+[2025-09-03 04:05:01,078][08370] Saving new best policy, reward=11.418!
+[2025-09-03 04:05:06,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1826816. Throughput: 0: 1011.4. Samples: 455340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:05:06,084][08012] Avg episode reward: [(0, '12.691')]
+[2025-09-03 04:05:06,093][08370] Saving new best policy, reward=12.691!
+[2025-09-03 04:05:11,069][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1847296. Throughput: 0: 1017.9. Samples: 460792. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:05:11,071][08012] Avg episode reward: [(0, '12.252')]
+[2025-09-03 04:05:11,651][08383] Updated weights for policy 0, policy_version 452 (0.0014)
+[2025-09-03 04:05:16,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1871872. Throughput: 0: 1016.4. Samples: 464216. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:05:16,071][08012] Avg episode reward: [(0, '11.517')]
+[2025-09-03 04:05:21,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1888256. Throughput: 0: 1012.7. Samples: 470546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:05:21,071][08012] Avg episode reward: [(0, '11.167')]
+[2025-09-03 04:05:21,446][08383] Updated weights for policy 0, policy_version 462 (0.0014)
+[2025-09-03 04:05:26,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1908736. Throughput: 0: 1014.1. Samples: 475982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:05:26,071][08012] Avg episode reward: [(0, '12.298')]
+[2025-09-03 04:05:31,045][08383] Updated weights for policy 0, policy_version 472 (0.0018)
+[2025-09-03 04:05:31,069][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1933312. Throughput: 0: 1015.6. Samples: 479524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:05:31,073][08012] Avg episode reward: [(0, '13.869')]
+[2025-09-03 04:05:31,076][08370] Saving new best policy, reward=13.869!
+[2025-09-03 04:05:36,070][08012] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 4040.4). Total num frames: 1949696. Throughput: 0: 1010.1. Samples: 485730. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:05:36,076][08012] Avg episode reward: [(0, '15.139')]
+[2025-09-03 04:05:36,087][08370] Saving new best policy, reward=15.139!
+[2025-09-03 04:05:41,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 1970176. Throughput: 0: 1013.0. Samples: 491238. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:05:41,072][08012] Avg episode reward: [(0, '15.862')]
+[2025-09-03 04:05:41,075][08370] Saving new best policy, reward=15.862!
+[2025-09-03 04:05:41,698][08383] Updated weights for policy 0, policy_version 482 (0.0014)
+[2025-09-03 04:05:46,070][08012] Fps is (10 sec: 4096.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 1990656. Throughput: 0: 1013.4. Samples: 494682. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:05:46,071][08012] Avg episode reward: [(0, '17.164')]
+[2025-09-03 04:05:46,081][08370] Saving new best policy, reward=17.164!
+[2025-09-03 04:05:51,072][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2011136. Throughput: 0: 1011.6. Samples: 500862. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:05:51,078][08012] Avg episode reward: [(0, '17.587')]
+[2025-09-03 04:05:51,080][08370] Saving new best policy, reward=17.587!
+[2025-09-03 04:05:52,444][08383] Updated weights for policy 0, policy_version 492 (0.0020)
+[2025-09-03 04:05:56,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2027520. Throughput: 0: 1009.2. Samples: 506208. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:05:56,071][08012] Avg episode reward: [(0, '17.475')]
+[2025-09-03 04:06:01,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2052096. Throughput: 0: 1009.8. Samples: 509658. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:06:01,071][08012] Avg episode reward: [(0, '19.824')]
+[2025-09-03 04:06:01,072][08370] Saving new best policy, reward=19.824!
+[2025-09-03 04:06:01,636][08383] Updated weights for policy 0, policy_version 502 (0.0018)
+[2025-09-03 04:06:06,071][08012] Fps is (10 sec: 4095.4, 60 sec: 4027.6, 300 sec: 4026.6). Total num frames: 2068480. Throughput: 0: 1004.4. Samples: 515744. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:06:06,072][08012] Avg episode reward: [(0, '19.915')]
+[2025-09-03 04:06:06,082][08370] Saving new best policy, reward=19.915!
+[2025-09-03 04:06:11,070][08012] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2088960. Throughput: 0: 1008.5. Samples: 521366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:06:11,071][08012] Avg episode reward: [(0, '19.394')]
+[2025-09-03 04:06:12,346][08383] Updated weights for policy 0, policy_version 512 (0.0012)
+[2025-09-03 04:06:16,070][08012] Fps is (10 sec: 4506.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2113536. Throughput: 0: 1006.7. Samples: 524824. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:06:16,071][08012] Avg episode reward: [(0, '20.199')]
+[2025-09-03 04:06:16,079][08370] Saving new best policy, reward=20.199!
+[2025-09-03 04:06:21,070][08012] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2129920. Throughput: 0: 1000.6. Samples: 530756. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:06:21,073][08012] Avg episode reward: [(0, '19.493')]
+[2025-09-03 04:06:23,188][08383] Updated weights for policy 0, policy_version 522 (0.0013)
+[2025-09-03 04:06:26,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2150400. Throughput: 0: 1003.7. Samples: 536404. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:06:26,071][08012] Avg episode reward: [(0, '19.153')]
+[2025-09-03 04:06:31,070][08012] Fps is (10 sec: 4095.9, 60 sec: 3959.4, 300 sec: 4026.6). Total num frames: 2170880. Throughput: 0: 1005.3. Samples: 539922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:06:31,071][08012] Avg episode reward: [(0, '21.181')]
+[2025-09-03 04:06:31,117][08370] Saving new best policy, reward=21.181!
+[2025-09-03 04:06:32,106][08383] Updated weights for policy 0, policy_version 532 (0.0017)
+[2025-09-03 04:06:36,072][08012] Fps is (10 sec: 4095.1, 60 sec: 4027.6, 300 sec: 4040.4). Total num frames: 2191360. Throughput: 0: 1000.0. Samples: 545864. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:06:36,073][08012] Avg episode reward: [(0, '21.039')]
+[2025-09-03 04:06:41,070][08012] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2211840. Throughput: 0: 1010.1. Samples: 551662. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:06:41,073][08012] Avg episode reward: [(0, '20.136')]
+[2025-09-03 04:06:42,498][08383] Updated weights for policy 0, policy_version 542 (0.0021)
+[2025-09-03 04:06:46,070][08012] Fps is (10 sec: 4096.9, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2232320. Throughput: 0: 1012.5. Samples: 555222. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
+[2025-09-03 04:06:46,074][08012] Avg episode reward: [(0, '20.819')]
+[2025-09-03 04:06:51,071][08012] Fps is (10 sec: 4095.5, 60 sec: 4027.7, 300 sec: 4040.4). Total num frames: 2252800. Throughput: 0: 1009.1. Samples: 561152. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:06:51,072][08012] Avg episode reward: [(0, '19.705')]
+[2025-09-03 04:06:53,290][08383] Updated weights for policy 0, policy_version 552 (0.0019)
+[2025-09-03 04:06:56,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4026.6). Total num frames: 2273280. Throughput: 0: 1012.7. Samples: 566938. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:06:56,071][08012] Avg episode reward: [(0, '19.123')]
+[2025-09-03 04:06:56,076][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000555_2273280.pth...
+[2025-09-03 04:06:56,189][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000318_1302528.pth
+[2025-09-03 04:07:01,070][08012] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2293760. Throughput: 0: 1012.1. Samples: 570370. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:07:01,071][08012] Avg episode reward: [(0, '18.863')]
+[2025-09-03 04:07:02,120][08383] Updated weights for policy 0, policy_version 562 (0.0012)
+[2025-09-03 04:07:06,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4040.5). Total num frames: 2314240. Throughput: 0: 1012.0. Samples: 576298. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 04:07:06,071][08012] Avg episode reward: [(0, '18.773')]
+[2025-09-03 04:07:11,069][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4012.7). Total num frames: 2330624. Throughput: 0: 1019.2. Samples: 582268. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:07:11,071][08012] Avg episode reward: [(0, '17.765')]
+[2025-09-03 04:07:12,668][08383] Updated weights for policy 0, policy_version 572 (0.0024)
+[2025-09-03 04:07:16,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2355200. Throughput: 0: 1019.7. Samples: 585810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:07:16,071][08012] Avg episode reward: [(0, '17.788')]
+[2025-09-03 04:07:21,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2375680. Throughput: 0: 1017.5. Samples: 591650. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:07:21,075][08012] Avg episode reward: [(0, '19.045')]
+[2025-09-03 04:07:23,379][08383] Updated weights for policy 0, policy_version 582 (0.0011)
+[2025-09-03 04:07:26,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2392064. Throughput: 0: 1020.6. Samples: 597590. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:07:26,071][08012] Avg episode reward: [(0, '18.604')]
+[2025-09-03 04:07:31,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2416640. Throughput: 0: 1018.7. Samples: 601064. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 04:07:31,072][08012] Avg episode reward: [(0, '19.754')]
+[2025-09-03 04:07:32,155][08383] Updated weights for policy 0, policy_version 592 (0.0018)
+[2025-09-03 04:07:36,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4026.6). Total num frames: 2433024. Throughput: 0: 1013.8. Samples: 606772. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 04:07:36,072][08012] Avg episode reward: [(0, '20.030')]
+[2025-09-03 04:07:41,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2457600. Throughput: 0: 1019.9. Samples: 612834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 04:07:41,073][08012] Avg episode reward: [(0, '19.948')]
+[2025-09-03 04:07:42,801][08383] Updated weights for policy 0, policy_version 602 (0.0012)
+[2025-09-03 04:07:46,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2478080. Throughput: 0: 1022.3. Samples: 616372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
+[2025-09-03 04:07:46,073][08012] Avg episode reward: [(0, '20.361')]
+[2025-09-03 04:07:51,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 2494464. Throughput: 0: 1016.3. Samples: 622030. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:07:51,073][08012] Avg episode reward: [(0, '20.919')]
+[2025-09-03 04:07:53,368][08383] Updated weights for policy 0, policy_version 612 (0.0017)
+[2025-09-03 04:07:56,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2514944. Throughput: 0: 1018.9. Samples: 628120. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:07:56,074][08012] Avg episode reward: [(0, '21.021')]
+[2025-09-03 04:08:01,070][08012] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2539520. Throughput: 0: 1016.3. Samples: 631546. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:01,074][08012] Avg episode reward: [(0, '21.712')]
+[2025-09-03 04:08:01,078][08370] Saving new best policy, reward=21.712!
+[2025-09-03 04:08:02,798][08383] Updated weights for policy 0, policy_version 622 (0.0016)
+[2025-09-03 04:08:06,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2555904. Throughput: 0: 1007.9. Samples: 637004. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:06,071][08012] Avg episode reward: [(0, '22.916')]
+[2025-09-03 04:08:06,082][08370] Saving new best policy, reward=22.916!
+[2025-09-03 04:08:11,069][08012] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2576384. Throughput: 0: 1011.9. Samples: 643124. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:11,079][08012] Avg episode reward: [(0, '24.028')]
+[2025-09-03 04:08:11,085][08370] Saving new best policy, reward=24.028!
+[2025-09-03 04:08:13,338][08383] Updated weights for policy 0, policy_version 632 (0.0021)
+[2025-09-03 04:08:16,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 2600960. Throughput: 0: 1010.3. Samples: 646526. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:16,075][08012] Avg episode reward: [(0, '23.285')]
+[2025-09-03 04:08:21,070][08012] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 2613248. Throughput: 0: 1003.4. Samples: 651926. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:21,072][08012] Avg episode reward: [(0, '24.480')]
+[2025-09-03 04:08:21,073][08370] Saving new best policy, reward=24.480!
+[2025-09-03 04:08:24,147][08383] Updated weights for policy 0, policy_version 642 (0.0019)
+[2025-09-03 04:08:26,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2637824. Throughput: 0: 1006.7. Samples: 658134. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:26,071][08012] Avg episode reward: [(0, '24.073')]
+[2025-09-03 04:08:31,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2658304. Throughput: 0: 1002.9. Samples: 661504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:08:31,074][08012] Avg episode reward: [(0, '22.984')]
+[2025-09-03 04:08:33,892][08383] Updated weights for policy 0, policy_version 652 (0.0017)
+[2025-09-03 04:08:36,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2674688. Throughput: 0: 998.4. Samples: 666960. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:36,073][08012] Avg episode reward: [(0, '22.078')]
+[2025-09-03 04:08:41,070][08012] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4026.6). Total num frames: 2695168. Throughput: 0: 1002.5. Samples: 673232. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:41,074][08012] Avg episode reward: [(0, '21.756')]
+[2025-09-03 04:08:43,751][08383] Updated weights for policy 0, policy_version 662 (0.0012)
+[2025-09-03 04:08:46,070][08012] Fps is (10 sec: 4505.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2719744. Throughput: 0: 1003.5. Samples: 676702. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:08:46,075][08012] Avg episode reward: [(0, '22.046')]
+[2025-09-03 04:08:51,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2736128. Throughput: 0: 1001.6. Samples: 682076. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:08:51,073][08012] Avg episode reward: [(0, '21.526')]
+[2025-09-03 04:08:54,672][08383] Updated weights for policy 0, policy_version 672 (0.0027)
+[2025-09-03 04:08:56,070][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 2756608. Throughput: 0: 1008.2. Samples: 688494. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:08:56,074][08012] Avg episode reward: [(0, '21.901')]
+[2025-09-03 04:08:56,081][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000673_2756608.pth...
+[2025-09-03 04:08:56,202][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000436_1785856.pth
+[2025-09-03 04:09:01,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 2781184. Throughput: 0: 1006.0. Samples: 691794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:09:01,074][08012] Avg episode reward: [(0, '21.390')]
+[2025-09-03 04:09:04,617][08383] Updated weights for policy 0, policy_version 682 (0.0033)
+[2025-09-03 04:09:06,075][08012] Fps is (10 sec: 3684.4, 60 sec: 3959.1, 300 sec: 4026.5). Total num frames: 2793472. Throughput: 0: 1005.3. Samples: 697170. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+[2025-09-03 04:09:06,076][08012] Avg episode reward: [(0, '21.414')]
+[2025-09-03 04:09:11,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2818048. Throughput: 0: 1012.1. Samples: 703680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:09:11,071][08012] Avg episode reward: [(0, '20.467')]
+[2025-09-03 04:09:14,021][08383] Updated weights for policy 0, policy_version 692 (0.0012)
+[2025-09-03 04:09:16,070][08012] Fps is (10 sec: 4917.8, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 2842624. Throughput: 0: 1013.5. Samples: 707114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:09:16,075][08012] Avg episode reward: [(0, '20.108')]
+[2025-09-03 04:09:21,070][08012] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2854912. Throughput: 0: 1008.4. Samples: 712340. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:09:21,074][08012] Avg episode reward: [(0, '18.646')]
+[2025-09-03 04:09:24,815][08383] Updated weights for policy 0, policy_version 702 (0.0020)
+[2025-09-03 04:09:26,070][08012] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2879488. Throughput: 0: 1013.9. Samples: 718856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:09:26,076][08012] Avg episode reward: [(0, '19.922')]
+[2025-09-03 04:09:31,070][08012] Fps is (10 sec: 4505.7, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2899968. Throughput: 0: 1009.9. Samples: 722148. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:09:31,075][08012] Avg episode reward: [(0, '18.935')]
+[2025-09-03 04:09:35,365][08383] Updated weights for policy 0, policy_version 712 (0.0014)
+[2025-09-03 04:09:36,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2916352. Throughput: 0: 1007.1. Samples: 727394. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:09:36,074][08012] Avg episode reward: [(0, '19.484')]
+[2025-09-03 04:09:41,069][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 2940928. Throughput: 0: 1008.7. Samples: 733884. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:09:41,075][08012] Avg episode reward: [(0, '20.308')]
+[2025-09-03 04:09:44,483][08383] Updated weights for policy 0, policy_version 722 (0.0017)
+[2025-09-03 04:09:46,070][08012] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2961408. Throughput: 0: 1012.8. Samples: 737372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:09:46,074][08012] Avg episode reward: [(0, '21.129')]
+[2025-09-03 04:09:51,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 2977792. Throughput: 0: 1010.1. Samples: 742618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+[2025-09-03 04:09:51,074][08012] Avg episode reward: [(0, '20.183')]
+[2025-09-03 04:09:55,081][08383] Updated weights for policy 0, policy_version 732 (0.0016)
+[2025-09-03 04:09:56,070][08012] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3002368. Throughput: 0: 1011.8. Samples: 749210. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-09-03 04:09:56,071][08012] Avg episode reward: [(0, '20.977')]
+[2025-09-03 04:10:01,071][08012] Fps is (10 sec: 4505.1, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 3022848. Throughput: 0: 1010.3. Samples: 752580. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+[2025-09-03 04:10:01,072][08012] Avg episode reward: [(0, '21.366')]
+[2025-09-03 04:10:05,975][08383] Updated weights for policy 0, policy_version 742 (0.0011)
+[2025-09-03 04:10:06,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4096.4, 300 sec: 4040.5). Total num frames: 3039232. Throughput: 0: 1007.2. Samples: 757664. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+[2025-09-03 04:10:06,071][08012] Avg episode reward: [(0, '22.096')]
+[2025-09-03 04:10:11,070][08012] Fps is (10 sec: 3686.8, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3059712. Throughput: 0: 1012.5. Samples: 764418. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+[2025-09-03 04:10:11,071][08012] Avg episode reward: [(0, '23.588')]
+[2025-09-03 04:10:14,661][08383] Updated weights for policy 0, policy_version 752 (0.0013)
+[2025-09-03 04:10:16,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4054.3). Total num frames: 3084288. Throughput: 0: 1015.7. Samples: 767856. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+[2025-09-03 04:10:16,071][08012] Avg episode reward: [(0, '24.796')]
+[2025-09-03 04:10:16,079][08370] Saving new best policy, reward=24.796!
+[2025-09-03 04:10:21,069][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3100672. Throughput: 0: 1009.8. Samples: 772834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
Samples: 772834. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:10:21,074][08012] Avg episode reward: [(0, '26.179')] +[2025-09-03 04:10:21,079][08370] Saving new best policy, reward=26.179! +[2025-09-03 04:10:25,571][08383] Updated weights for policy 0, policy_version 762 (0.0016) +[2025-09-03 04:10:26,069][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3121152. Throughput: 0: 1012.2. Samples: 779434. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-09-03 04:10:26,071][08012] Avg episode reward: [(0, '27.323')] +[2025-09-03 04:10:26,076][08370] Saving new best policy, reward=27.323! +[2025-09-03 04:10:31,072][08012] Fps is (10 sec: 4095.0, 60 sec: 4027.6, 300 sec: 4040.4). Total num frames: 3141632. Throughput: 0: 1008.7. Samples: 782766. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0) +[2025-09-03 04:10:31,082][08012] Avg episode reward: [(0, '27.332')] +[2025-09-03 04:10:31,085][08370] Saving new best policy, reward=27.332! +[2025-09-03 04:10:36,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3158016. Throughput: 0: 1001.6. Samples: 787688. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-09-03 04:10:36,074][08012] Avg episode reward: [(0, '26.676')] +[2025-09-03 04:10:36,513][08383] Updated weights for policy 0, policy_version 772 (0.0017) +[2025-09-03 04:10:41,069][08012] Fps is (10 sec: 4097.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3182592. Throughput: 0: 1007.5. Samples: 794546. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-09-03 04:10:41,075][08012] Avg episode reward: [(0, '27.142')] +[2025-09-03 04:10:45,179][08383] Updated weights for policy 0, policy_version 782 (0.0012) +[2025-09-03 04:10:46,070][08012] Fps is (10 sec: 4505.5, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3203072. Throughput: 0: 1011.0. Samples: 798076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:10:46,078][08012] Avg episode reward: [(0, '25.216')] +[2025-09-03 04:10:51,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3219456. Throughput: 0: 1003.3. Samples: 802814. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-09-03 04:10:51,072][08012] Avg episode reward: [(0, '23.652')] +[2025-09-03 04:10:55,784][08383] Updated weights for policy 0, policy_version 792 (0.0019) +[2025-09-03 04:10:56,070][08012] Fps is (10 sec: 4096.1, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3244032. Throughput: 0: 1008.4. Samples: 809796. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:10:56,072][08012] Avg episode reward: [(0, '21.309')] +[2025-09-03 04:10:56,087][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000792_3244032.pth... +[2025-09-03 04:10:56,215][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000555_2273280.pth +[2025-09-03 04:11:01,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4054.4). Total num frames: 3264512. Throughput: 0: 1004.7. Samples: 813068. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:11:01,072][08012] Avg episode reward: [(0, '21.366')] +[2025-09-03 04:11:06,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3280896. Throughput: 0: 1002.4. Samples: 817940. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:11:06,075][08012] Avg episode reward: [(0, '22.308')] +[2025-09-03 04:11:06,557][08383] Updated weights for policy 0, policy_version 802 (0.0029) +[2025-09-03 04:11:11,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3305472. Throughput: 0: 1011.5. Samples: 824950. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:11:11,071][08012] Avg episode reward: [(0, '23.254')] +[2025-09-03 04:11:15,864][08383] Updated weights for policy 0, policy_version 812 (0.0022) +[2025-09-03 04:11:16,071][08012] Fps is (10 sec: 4504.8, 60 sec: 4027.6, 300 sec: 4054.3). Total num frames: 3325952. Throughput: 0: 1016.4. Samples: 828504. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:11:16,079][08012] Avg episode reward: [(0, '24.191')] +[2025-09-03 04:11:21,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3342336. Throughput: 0: 1014.8. Samples: 833356. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:11:21,071][08012] Avg episode reward: [(0, '25.082')] +[2025-09-03 04:11:26,011][08383] Updated weights for policy 0, policy_version 822 (0.0021) +[2025-09-03 04:11:26,070][08012] Fps is (10 sec: 4096.7, 60 sec: 4096.0, 300 sec: 4054.4). Total num frames: 3366912. Throughput: 0: 1015.6. Samples: 840248. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:11:26,071][08012] Avg episode reward: [(0, '26.075')] +[2025-09-03 04:11:31,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4040.5). Total num frames: 3383296. Throughput: 0: 1012.2. Samples: 843626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:11:31,071][08012] Avg episode reward: [(0, '25.644')] +[2025-09-03 04:11:36,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3403776. Throughput: 0: 1014.4. Samples: 848464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:11:36,071][08012] Avg episode reward: [(0, '24.963')] +[2025-09-03 04:11:36,829][08383] Updated weights for policy 0, policy_version 832 (0.0029) +[2025-09-03 04:11:41,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3424256. Throughput: 0: 1017.0. Samples: 855560. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-09-03 04:11:41,071][08012] Avg episode reward: [(0, '24.689')] +[2025-09-03 04:11:46,070][08012] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3444736. Throughput: 0: 1018.8. Samples: 858914. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-09-03 04:11:46,078][08012] Avg episode reward: [(0, '25.190')] +[2025-09-03 04:11:46,614][08383] Updated weights for policy 0, policy_version 842 (0.0013) +[2025-09-03 04:11:51,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3465216. Throughput: 0: 1023.1. Samples: 863980. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-09-03 04:11:51,073][08012] Avg episode reward: [(0, '24.866')] +[2025-09-03 04:11:56,017][08383] Updated weights for policy 0, policy_version 852 (0.0013) +[2025-09-03 04:11:56,070][08012] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3489792. Throughput: 0: 1023.9. Samples: 871026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:11:56,074][08012] Avg episode reward: [(0, '24.028')] +[2025-09-03 04:12:01,070][08012] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3506176. 
Throughput: 0: 1012.3. Samples: 874056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:12:01,071][08012] Avg episode reward: [(0, '24.319')] +[2025-09-03 04:12:06,070][08012] Fps is (10 sec: 3686.5, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3526656. Throughput: 0: 1019.4. Samples: 879228. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:12:06,074][08012] Avg episode reward: [(0, '25.139')] +[2025-09-03 04:12:06,855][08383] Updated weights for policy 0, policy_version 862 (0.0022) +[2025-09-03 04:12:11,069][08012] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3547136. Throughput: 0: 1022.3. Samples: 886250. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:12:11,074][08012] Avg episode reward: [(0, '22.239')] +[2025-09-03 04:12:16,072][08012] Fps is (10 sec: 4095.0, 60 sec: 4027.7, 300 sec: 4040.4). Total num frames: 3567616. Throughput: 0: 1012.6. Samples: 889196. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:12:16,073][08012] Avg episode reward: [(0, '22.774')] +[2025-09-03 04:12:17,279][08383] Updated weights for policy 0, policy_version 872 (0.0019) +[2025-09-03 04:12:21,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3588096. Throughput: 0: 1024.0. Samples: 894546. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:12:21,075][08012] Avg episode reward: [(0, '25.331')] +[2025-09-03 04:12:26,070][08012] Fps is (10 sec: 4097.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3608576. Throughput: 0: 1021.9. Samples: 901544. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:12:26,074][08012] Avg episode reward: [(0, '27.267')] +[2025-09-03 04:12:26,206][08383] Updated weights for policy 0, policy_version 882 (0.0016) +[2025-09-03 04:12:31,072][08012] Fps is (10 sec: 4095.0, 60 sec: 4095.8, 300 sec: 4054.3). Total num frames: 3629056. Throughput: 0: 1010.1. Samples: 904370. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-09-03 04:12:31,073][08012] Avg episode reward: [(0, '27.768')] +[2025-09-03 04:12:31,074][08370] Saving new best policy, reward=27.768! +[2025-09-03 04:12:36,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4026.6). Total num frames: 3645440. Throughput: 0: 1016.0. Samples: 909700. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) +[2025-09-03 04:12:36,079][08012] Avg episode reward: [(0, '29.189')] +[2025-09-03 04:12:36,162][08370] Saving new best policy, reward=29.189! +[2025-09-03 04:12:37,112][08383] Updated weights for policy 0, policy_version 892 (0.0023) +[2025-09-03 04:12:41,069][08012] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3670016. Throughput: 0: 1012.1. Samples: 916570. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-09-03 04:12:41,071][08012] Avg episode reward: [(0, '28.383')] +[2025-09-03 04:12:46,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3686400. Throughput: 0: 1006.2. Samples: 919334. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:12:46,073][08012] Avg episode reward: [(0, '26.137')] +[2025-09-03 04:12:47,741][08383] Updated weights for policy 0, policy_version 902 (0.0030) +[2025-09-03 04:12:51,070][08012] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3706880. Throughput: 0: 1016.5. Samples: 924972. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:12:51,073][08012] Avg episode reward: [(0, '26.076')] +[2025-09-03 04:12:56,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 3731456. Throughput: 0: 1014.2. Samples: 931888. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-09-03 04:12:56,071][08012] Avg episode reward: [(0, '25.412')] +[2025-09-03 04:12:56,080][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000911_3731456.pth... +[2025-09-03 04:12:56,200][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000673_2756608.pth +[2025-09-03 04:12:56,603][08383] Updated weights for policy 0, policy_version 912 (0.0015) +[2025-09-03 04:13:01,070][08012] Fps is (10 sec: 4096.1, 60 sec: 4027.8, 300 sec: 4040.5). Total num frames: 3747840. Throughput: 0: 1004.6. Samples: 934402. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-09-03 04:13:01,077][08012] Avg episode reward: [(0, '25.667')] +[2025-09-03 04:13:06,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3768320. Throughput: 0: 1011.6. Samples: 940070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:13:06,071][08012] Avg episode reward: [(0, '25.869')] +[2025-09-03 04:13:07,336][08383] Updated weights for policy 0, policy_version 922 (0.0013) +[2025-09-03 04:13:11,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4040.5). Total num frames: 3792896. Throughput: 0: 1010.4. Samples: 947010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-09-03 04:13:11,074][08012] Avg episode reward: [(0, '28.623')] +[2025-09-03 04:13:16,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4027.9, 300 sec: 4054.3). Total num frames: 3809280. Throughput: 0: 1003.8. Samples: 949538. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) +[2025-09-03 04:13:16,071][08012] Avg episode reward: [(0, '28.061')] +[2025-09-03 04:13:17,935][08383] Updated weights for policy 0, policy_version 932 (0.0013) +[2025-09-03 04:13:21,072][08012] Fps is (10 sec: 3685.7, 60 sec: 4027.6, 300 sec: 4040.4). Total num frames: 3829760. Throughput: 0: 1012.3. Samples: 955256. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) +[2025-09-03 04:13:21,077][08012] Avg episode reward: [(0, '27.655')] +[2025-09-03 04:13:26,070][08012] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3854336. Throughput: 0: 1014.5. Samples: 962224. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:13:26,071][08012] Avg episode reward: [(0, '27.584')] +[2025-09-03 04:13:26,841][08383] Updated weights for policy 0, policy_version 942 (0.0012) +[2025-09-03 04:13:31,070][08012] Fps is (10 sec: 3687.1, 60 sec: 3959.6, 300 sec: 4040.5). Total num frames: 3866624. Throughput: 0: 1006.1. Samples: 964610. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) +[2025-09-03 04:13:31,077][08012] Avg episode reward: [(0, '26.689')] +[2025-09-03 04:13:36,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3891200. Throughput: 0: 1012.3. Samples: 970524. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) +[2025-09-03 04:13:36,072][08012] Avg episode reward: [(0, '24.730')] +[2025-09-03 04:13:37,524][08383] Updated weights for policy 0, policy_version 952 (0.0013) +[2025-09-03 04:13:41,069][08012] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3911680. Throughput: 0: 1012.2. Samples: 977438. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:13:41,071][08012] Avg episode reward: [(0, '25.412')] +[2025-09-03 04:13:46,070][08012] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4040.5). Total num frames: 3928064. Throughput: 0: 1007.2. Samples: 979728. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:13:46,071][08012] Avg episode reward: [(0, '25.922')] +[2025-09-03 04:13:48,057][08383] Updated weights for policy 0, policy_version 962 (0.0012) +[2025-09-03 04:13:51,070][08012] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4054.3). Total num frames: 3952640. Throughput: 0: 1018.7. Samples: 985912. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:13:51,071][08012] Avg episode reward: [(0, '24.725')] +[2025-09-03 04:13:56,072][08012] Fps is (10 sec: 4914.1, 60 sec: 4095.8, 300 sec: 4054.3). Total num frames: 3977216. Throughput: 0: 1022.5. Samples: 993026. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) +[2025-09-03 04:13:56,073][08012] Avg episode reward: [(0, '25.564')] +[2025-09-03 04:13:57,215][08383] Updated weights for policy 0, policy_version 972 (0.0020) +[2025-09-03 04:14:01,070][08012] Fps is (10 sec: 3686.3, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 3989504. Throughput: 0: 1012.8. Samples: 995114. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) +[2025-09-03 04:14:01,075][08012] Avg episode reward: [(0, '27.061')] +[2025-09-03 04:14:04,075][08370] Stopping Batcher_0... +[2025-09-03 04:14:04,076][08370] Loop batcher_evt_loop terminating... +[2025-09-03 04:14:04,076][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-09-03 04:14:04,075][08012] Component Batcher_0 stopped! +[2025-09-03 04:14:04,136][08383] Weights refcount: 2 0 +[2025-09-03 04:14:04,140][08012] Component InferenceWorker_p0-w0 stopped! +[2025-09-03 04:14:04,142][08383] Stopping InferenceWorker_p0-w0... +[2025-09-03 04:14:04,143][08383] Loop inference_proc0-0_evt_loop terminating... +[2025-09-03 04:14:04,193][08370] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000792_3244032.pth +[2025-09-03 04:14:04,202][08370] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-09-03 04:14:04,374][08370] Stopping LearnerWorker_p0... +[2025-09-03 04:14:04,376][08370] Loop learner_proc0_evt_loop terminating... +[2025-09-03 04:14:04,374][08012] Component LearnerWorker_p0 stopped! +[2025-09-03 04:14:04,484][08012] Component RolloutWorker_w6 stopped! +[2025-09-03 04:14:04,488][08390] Stopping RolloutWorker_w6... +[2025-09-03 04:14:04,493][08390] Loop rollout_proc6_evt_loop terminating... +[2025-09-03 04:14:04,495][08012] Component RolloutWorker_w2 stopped! +[2025-09-03 04:14:04,498][08386] Stopping RolloutWorker_w2... +[2025-09-03 04:14:04,501][08386] Loop rollout_proc2_evt_loop terminating... +[2025-09-03 04:14:04,511][08012] Component RolloutWorker_w4 stopped! +[2025-09-03 04:14:04,516][08388] Stopping RolloutWorker_w4... +[2025-09-03 04:14:04,519][08388] Loop rollout_proc4_evt_loop terminating... +[2025-09-03 04:14:04,533][08012] Component RolloutWorker_w0 stopped! +[2025-09-03 04:14:04,536][08385] Stopping RolloutWorker_w0... +[2025-09-03 04:14:04,546][08385] Loop rollout_proc0_evt_loop terminating... +[2025-09-03 04:14:04,594][08391] Stopping RolloutWorker_w7... +[2025-09-03 04:14:04,594][08012] Component RolloutWorker_w7 stopped! +[2025-09-03 04:14:04,609][08384] Stopping RolloutWorker_w1... 
+[2025-09-03 04:14:04,611][08387] Stopping RolloutWorker_w3...
+[2025-09-03 04:14:04,609][08012] Component RolloutWorker_w1 stopped!
+[2025-09-03 04:14:04,596][08391] Loop rollout_proc7_evt_loop terminating...
+[2025-09-03 04:14:04,613][08012] Component RolloutWorker_w3 stopped!
+[2025-09-03 04:14:04,610][08384] Loop rollout_proc1_evt_loop terminating...
+[2025-09-03 04:14:04,614][08387] Loop rollout_proc3_evt_loop terminating...
+[2025-09-03 04:14:04,642][08389] Stopping RolloutWorker_w5...
+[2025-09-03 04:14:04,642][08012] Component RolloutWorker_w5 stopped!
+[2025-09-03 04:14:04,644][08389] Loop rollout_proc5_evt_loop terminating...
+[2025-09-03 04:14:04,645][08012] Waiting for process learner_proc0 to stop...
+[2025-09-03 04:14:05,935][08012] Waiting for process inference_proc0-0 to join...
+[2025-09-03 04:14:05,936][08012] Waiting for process rollout_proc0 to join...
+[2025-09-03 04:14:08,106][08012] Waiting for process rollout_proc1 to join...
+[2025-09-03 04:14:08,107][08012] Waiting for process rollout_proc2 to join...
+[2025-09-03 04:14:08,110][08012] Waiting for process rollout_proc3 to join...
+[2025-09-03 04:14:08,111][08012] Waiting for process rollout_proc4 to join...
+[2025-09-03 04:14:08,113][08012] Waiting for process rollout_proc5 to join...
+[2025-09-03 04:14:08,115][08012] Waiting for process rollout_proc6 to join...
+[2025-09-03 04:14:08,116][08012] Waiting for process rollout_proc7 to join...
+[2025-09-03 04:14:08,117][08012] Batcher 0 profile tree view:
+batching: 25.2744, releasing_batches: 0.0246
+[2025-09-03 04:14:08,118][08012] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+  wait_policy_total: 428.3680
+update_model: 7.6109
+  weight_update: 0.0013
+one_step: 0.0025
+  handle_policy_step: 530.9077
+    deserialize: 14.1769, stack: 2.9271, obs_to_device_normalize: 113.2590, forward: 270.5264, send_messages: 27.4868
+    prepare_outputs: 79.3072
+      to_cpu: 48.6280
+[2025-09-03 04:14:08,121][08012] Learner 0 profile tree view:
+misc: 0.0045, prepare_batch: 12.2754
+train: 72.3191
+  epoch_init: 0.0123, minibatch_init: 0.0062, losses_postprocess: 0.6619, kl_divergence: 0.6244, after_optimizer: 3.4840
+  calculate_losses: 24.8392
+    losses_init: 0.0031, forward_head: 1.4472, bptt_initial: 16.5348, tail: 0.9599, advantages_returns: 0.3223, losses: 3.4398
+    bptt: 1.8667
+      bptt_forward_core: 1.7779
+  update: 42.1555
+    clip: 0.8646
+[2025-09-03 04:14:08,122][08012] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.3318, enqueue_policy_requests: 110.3242, env_step: 783.7517, overhead: 12.6053, complete_rollouts: 7.2796
+save_policy_outputs: 17.9042
+  split_output_tensors: 7.2233
+[2025-09-03 04:14:08,123][08012] RolloutWorker_w7 profile tree view:
+wait_for_trajectories: 0.2872, enqueue_policy_requests: 115.5911, env_step: 777.2872, overhead: 12.7933, complete_rollouts: 6.6093
+save_policy_outputs: 18.3735
+  split_output_tensors: 7.4541
+[2025-09-03 04:14:08,124][08012] Loop Runner_EvtLoop terminating...
+[2025-09-03 04:14:08,126][08012] Runner profile tree view:
+main_loop: 1026.5682
+[2025-09-03 04:14:08,129][08012] Collected {0: 4005888}, FPS: 3894.2
+[2025-09-03 04:30:41,423][08012] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+[2025-09-03 04:30:41,424][08012] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-09-03 04:30:41,425][08012] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-09-03 04:30:41,426][08012] Adding new argument 'save_video'=True that is not in the saved config file!
+[2025-09-03 04:30:41,427][08012] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+[2025-09-03 04:30:41,428][08012] Adding new argument 'video_name'=None that is not in the saved config file!
+[2025-09-03 04:30:41,428][08012] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+[2025-09-03 04:30:41,429][08012] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+[2025-09-03 04:30:41,430][08012] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+[2025-09-03 04:30:41,431][08012] Adding new argument 'hf_repository'='WangChongan/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+[2025-09-03 04:30:41,431][08012] Adding new argument 'policy_index'=0 that is not in the saved config file!
+[2025-09-03 04:30:41,434][08012] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+[2025-09-03 04:30:41,435][08012] Adding new argument 'train_script'=None that is not in the saved config file!
+[2025-09-03 04:30:41,436][08012] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+[2025-09-03 04:30:41,437][08012] Using frameskip 1 and render_action_repeat=4 for evaluation
+[2025-09-03 04:30:41,464][08012] RunningMeanStd input shape: (3, 72, 128)
+[2025-09-03 04:30:41,465][08012] RunningMeanStd input shape: (1,)
+[2025-09-03 04:30:41,474][08012] ConvEncoder: input_channels=3
+[2025-09-03 04:30:41,505][08012] Conv encoder output size: 512
+[2025-09-03 04:30:41,505][08012] Policy head output size: 512
+[2025-09-03 04:30:41,525][08012] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+[2025-09-03 04:30:41,995][08012] Num frames 100...
+[2025-09-03 04:30:42,116][08012] Num frames 200...
+[2025-09-03 04:30:42,237][08012] Num frames 300...
+[2025-09-03 04:30:42,357][08012] Num frames 400...
+[2025-09-03 04:30:42,482][08012] Num frames 500...
+[2025-09-03 04:30:42,604][08012] Num frames 600...
+[2025-09-03 04:30:42,755][08012] Avg episode rewards: #0: 11.720, true rewards: #0: 6.720
+[2025-09-03 04:30:42,757][08012] Avg episode reward: 11.720, avg true_objective: 6.720
+[2025-09-03 04:30:42,792][08012] Num frames 700...
+[2025-09-03 04:30:42,914][08012] Num frames 800...
+[2025-09-03 04:30:43,035][08012] Num frames 900...
+[2025-09-03 04:30:43,162][08012] Num frames 1000...
+[2025-09-03 04:30:43,287][08012] Num frames 1100...
+[2025-09-03 04:30:43,413][08012] Num frames 1200...
+[2025-09-03 04:30:43,536][08012] Num frames 1300...
+[2025-09-03 04:30:43,671][08012] Num frames 1400...
+[2025-09-03 04:30:43,831][08012] Avg episode rewards: #0: 13.940, true rewards: #0: 7.440
+[2025-09-03 04:30:43,832][08012] Avg episode reward: 13.940, avg true_objective: 7.440
+[2025-09-03 04:30:43,849][08012] Num frames 1500...
+[2025-09-03 04:30:43,969][08012] Num frames 1600...
+[2025-09-03 04:30:44,092][08012] Num frames 1700...
+[2025-09-03 04:30:44,213][08012] Num frames 1800...
+[2025-09-03 04:30:44,336][08012] Num frames 1900...
+[2025-09-03 04:30:44,514][08012] Num frames 2000...
+[2025-09-03 04:30:44,695][08012] Num frames 2100...
+[2025-09-03 04:30:44,873][08012] Num frames 2200...
+[2025-09-03 04:30:45,045][08012] Num frames 2300...
+[2025-09-03 04:30:45,220][08012] Num frames 2400...
+[2025-09-03 04:30:45,411][08012] Avg episode rewards: #0: 15.933, true rewards: #0: 8.267
+[2025-09-03 04:30:45,412][08012] Avg episode reward: 15.933, avg true_objective: 8.267
+[2025-09-03 04:30:45,449][08012] Num frames 2500...
+[2025-09-03 04:30:45,615][08012] Num frames 2600...
+[2025-09-03 04:30:45,794][08012] Num frames 2700...
+[2025-09-03 04:30:45,965][08012] Num frames 2800...
+[2025-09-03 04:30:46,192][08012] Avg episode rewards: #0: 13.490, true rewards: #0: 7.240
+[2025-09-03 04:30:46,193][08012] Avg episode reward: 13.490, avg true_objective: 7.240
+[2025-09-03 04:30:46,203][08012] Num frames 2900...
+[2025-09-03 04:30:46,401][08012] Num frames 3000...
+[2025-09-03 04:30:46,543][08012] Num frames 3100...
+[2025-09-03 04:30:46,664][08012] Num frames 3200...
+[2025-09-03 04:30:46,791][08012] Num frames 3300...
+[2025-09-03 04:30:46,918][08012] Num frames 3400...
+[2025-09-03 04:30:47,042][08012] Num frames 3500...
+[2025-09-03 04:30:47,163][08012] Num frames 3600...
+[2025-09-03 04:30:47,289][08012] Num frames 3700...
+[2025-09-03 04:30:47,409][08012] Num frames 3800...
+[2025-09-03 04:30:47,533][08012] Num frames 3900...
+[2025-09-03 04:30:47,655][08012] Num frames 4000...
+[2025-09-03 04:30:47,778][08012] Num frames 4100...
+[2025-09-03 04:30:47,949][08012] Avg episode rewards: #0: 17.374, true rewards: #0: 8.374
+[2025-09-03 04:30:47,950][08012] Avg episode reward: 17.374, avg true_objective: 8.374
+[2025-09-03 04:30:47,968][08012] Num frames 4200...
+[2025-09-03 04:30:48,089][08012] Num frames 4300...
+[2025-09-03 04:30:48,213][08012] Num frames 4400...
+[2025-09-03 04:30:48,338][08012] Num frames 4500...
+[2025-09-03 04:30:48,464][08012] Num frames 4600...
+[2025-09-03 04:30:48,586][08012] Num frames 4700...
+[2025-09-03 04:30:48,709][08012] Num frames 4800...
+[2025-09-03 04:30:48,833][08012] Num frames 4900...
+[2025-09-03 04:30:48,965][08012] Num frames 5000...
+[2025-09-03 04:30:49,087][08012] Num frames 5100...
+[2025-09-03 04:30:49,213][08012] Num frames 5200...
+[2025-09-03 04:30:49,365][08012] Avg episode rewards: #0: 18.625, true rewards: #0: 8.792
+[2025-09-03 04:30:49,366][08012] Avg episode reward: 18.625, avg true_objective: 8.792
+[2025-09-03 04:30:49,401][08012] Num frames 5300...
+[2025-09-03 04:30:49,528][08012] Num frames 5400...
+[2025-09-03 04:30:49,650][08012] Num frames 5500...
+[2025-09-03 04:30:49,773][08012] Num frames 5600...
+[2025-09-03 04:30:49,906][08012] Num frames 5700...
+[2025-09-03 04:30:50,028][08012] Num frames 5800...
+[2025-09-03 04:30:50,147][08012] Num frames 5900...
+[2025-09-03 04:30:50,219][08012] Avg episode rewards: #0: 17.592, true rewards: #0: 8.449
+[2025-09-03 04:30:50,220][08012] Avg episode reward: 17.592, avg true_objective: 8.449
+[2025-09-03 04:30:50,324][08012] Num frames 6000...
+[2025-09-03 04:30:50,443][08012] Num frames 6100...
+[2025-09-03 04:30:50,565][08012] Num frames 6200...
+[2025-09-03 04:30:50,687][08012] Num frames 6300...
+[2025-09-03 04:30:50,809][08012] Num frames 6400...
+[2025-09-03 04:30:50,943][08012] Num frames 6500...
+[2025-09-03 04:30:51,067][08012] Num frames 6600...
+[2025-09-03 04:30:51,190][08012] Num frames 6700...
+[2025-09-03 04:30:51,316][08012] Num frames 6800...
+[2025-09-03 04:30:51,440][08012] Num frames 6900...
+[2025-09-03 04:30:51,565][08012] Num frames 7000...
+[2025-09-03 04:30:51,690][08012] Num frames 7100...
+[2025-09-03 04:30:51,813][08012] Num frames 7200...
+[2025-09-03 04:30:51,940][08012] Num frames 7300...
+[2025-09-03 04:30:52,072][08012] Num frames 7400...
+[2025-09-03 04:30:52,193][08012] Num frames 7500...
+[2025-09-03 04:30:52,317][08012] Num frames 7600...
+[2025-09-03 04:30:52,441][08012] Num frames 7700...
+[2025-09-03 04:30:52,569][08012] Num frames 7800...
+[2025-09-03 04:30:52,655][08012] Avg episode rewards: #0: 21.281, true rewards: #0: 9.781
+[2025-09-03 04:30:52,656][08012] Avg episode reward: 21.281, avg true_objective: 9.781
+[2025-09-03 04:30:52,748][08012] Num frames 7900...
+[2025-09-03 04:30:52,868][08012] Num frames 8000...
+[2025-09-03 04:30:52,999][08012] Num frames 8100...
+[2025-09-03 04:30:53,124][08012] Num frames 8200...
+[2025-09-03 04:30:53,247][08012] Num frames 8300...
+[2025-09-03 04:30:53,369][08012] Num frames 8400...
+[2025-09-03 04:30:53,494][08012] Num frames 8500...
+[2025-09-03 04:30:53,617][08012] Num frames 8600...
+[2025-09-03 04:30:53,741][08012] Num frames 8700...
+[2025-09-03 04:30:53,863][08012] Num frames 8800...
+[2025-09-03 04:30:53,979][08012] Avg episode rewards: #0: 21.610, true rewards: #0: 9.832
+[2025-09-03 04:30:53,979][08012] Avg episode reward: 21.610, avg true_objective: 9.832
+[2025-09-03 04:30:54,051][08012] Num frames 8900...
+[2025-09-03 04:30:54,174][08012] Num frames 9000...
+[2025-09-03 04:30:54,299][08012] Num frames 9100...
+[2025-09-03 04:30:54,419][08012] Num frames 9200...
+[2025-09-03 04:30:54,543][08012] Num frames 9300...
+[2025-09-03 04:30:54,665][08012] Num frames 9400...
+[2025-09-03 04:30:54,786][08012] Num frames 9500...
+[2025-09-03 04:30:54,907][08012] Num frames 9600...
+[2025-09-03 04:30:55,028][08012] Num frames 9700...
+[2025-09-03 04:30:55,160][08012] Num frames 9800...
+[2025-09-03 04:30:55,286][08012] Avg episode rewards: #0: 21.653, true rewards: #0: 9.853
+[2025-09-03 04:30:55,287][08012] Avg episode reward: 21.653, avg true_objective: 9.853
+[2025-09-03 04:31:51,345][08012] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
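
Editor's note on reproducing the tail of this log: the entries from "Loading existing experiment configuration" through "Replay video saved" correspond to Sample Factory's enjoy entry point evaluating the final checkpoint, recording replay.mp4, and pushing the model to the Hugging Face Hub. A minimal sketch of that invocation follows, assuming the setup of the HF Deep RL course notebook, where register_vizdoom_components() and parse_vizdoom_cfg() are notebook-defined thin wrappers around parse_sf_args/parse_full_cfg; the env name is inferred from the repository id in the log and is an assumption, while the CLI flags and repository name are taken from the log itself.

# Minimal sketch (assumptions as stated above; not the verbatim script that produced this log).
from sample_factory.enjoy import enjoy

register_vizdoom_components()  # assumed notebook helper: registers Doom envs and the custom encoder

env = "doom_health_gathering_supreme"  # assumption, inferred from the hf_repository name
cfg = parse_vizdoom_cfg(  # assumed notebook helper
    argv=[
        f"--env={env}",
        "--num_workers=1",   # the log shows 'num_workers' overridden to 1
        "--save_video",
        "--no_render",
        "--max_num_frames=100000",
        "--max_num_episodes=10",
        "--push_to_hub",
        "--hf_repository=WangChongan/rl_course_vizdoom_health_gathering_supreme",
    ],
    evaluation=True,
)
# Loads the newest checkpoint in train_dir (here checkpoint_000000978_4005888.pth),
# rolls out up to 10 episodes, writes replay.mp4, and uploads to the Hub.
status = enjoy(cfg)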