[2025-02-24 13:43:25,943][00421] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-24 13:43:25,944][00421] Rollout worker 0 uses device cpu
[2025-02-24 13:43:25,945][00421] Rollout worker 1 uses device cpu
[2025-02-24 13:43:25,946][00421] Rollout worker 2 uses device cpu
[2025-02-24 13:43:25,947][00421] Rollout worker 3 uses device cpu
[2025-02-24 13:43:25,948][00421] Rollout worker 4 uses device cpu
[2025-02-24 13:43:25,949][00421] Rollout worker 5 uses device cpu
[2025-02-24 13:43:25,950][00421] Rollout worker 6 uses device cpu
[2025-02-24 13:43:25,951][00421] Rollout worker 7 uses device cpu
[2025-02-24 13:43:26,099][00421] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-24 13:43:26,099][00421] InferenceWorker_p0-w0: min num requests: 2
[2025-02-24 13:43:26,131][00421] Starting all processes...
[2025-02-24 13:43:26,131][00421] Starting process learner_proc0
[2025-02-24 13:43:26,187][00421] Starting all processes...
[2025-02-24 13:43:26,200][00421] Starting process inference_proc0-0
[2025-02-24 13:43:26,201][00421] Starting process rollout_proc0
[2025-02-24 13:43:26,201][00421] Starting process rollout_proc1
[2025-02-24 13:43:26,202][00421] Starting process rollout_proc2
[2025-02-24 13:43:26,202][00421] Starting process rollout_proc3
[2025-02-24 13:43:26,202][00421] Starting process rollout_proc4
[2025-02-24 13:43:26,202][00421] Starting process rollout_proc5
[2025-02-24 13:43:26,202][00421] Starting process rollout_proc6
[2025-02-24 13:43:26,202][00421] Starting process rollout_proc7
[2025-02-24 13:43:40,861][02540] Worker 3 uses CPU cores [1]
[2025-02-24 13:43:41,309][02537] Worker 0 uses CPU cores [0]
[2025-02-24 13:43:41,331][02523] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-24 13:43:41,331][02523] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-24 13:43:41,396][02523] Num visible devices: 1
[2025-02-24 13:43:41,393][02536] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-24 13:43:41,398][02536] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-24 13:43:41,434][02523] Starting seed is not provided
[2025-02-24 13:43:41,435][02523] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-24 13:43:41,435][02523] Initializing actor-critic model on device cuda:0
[2025-02-24 13:43:41,436][02523] RunningMeanStd input shape: (3, 72, 128)
[2025-02-24 13:43:41,439][02523] RunningMeanStd input shape: (1,)
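The two RunningMeanStd lines above set up streaming normalizers: one for image observations of shape (3, 72, 128) and one for scalar returns. A self-contained sketch of the standard running mean/std update (the parallel-variance merge popularized by OpenAI Baselines; an illustration only, not the in-place TorchScript modules the log refers to):

import numpy as np

class RunningMeanStd:
    def __init__(self, shape, eps=1e-4):
        self.mean = np.zeros(shape, np.float64)
        self.var = np.ones(shape, np.float64)
        self.count = eps

    def update(self, x):
        # x: batch of shape (N, *shape); merge batch moments into running stats
        b_mean, b_var, b_n = x.mean(axis=0), x.var(axis=0), x.shape[0]
        delta = b_mean - self.mean
        tot = self.count + b_n
        self.mean = self.mean + delta * b_n / tot
        m2 = self.var * self.count + b_var * b_n + delta ** 2 * self.count * b_n / tot
        self.var = m2 / tot
        self.count = tot

    def normalize(self, x):
        return (x - self.mean) / np.sqrt(self.var + 1e-8)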
[2025-02-24 13:43:41,490][02541] Worker 4 uses CPU cores [0]
[2025-02-24 13:43:41,499][02536] Num visible devices: 1
[2025-02-24 13:43:41,502][02523] ConvEncoder: input_channels=3
[2025-02-24 13:43:41,574][02542] Worker 5 uses CPU cores [1]
[2025-02-24 13:43:41,568][02543] Worker 6 uses CPU cores [0]
[2025-02-24 13:43:41,612][02544] Worker 7 uses CPU cores [1]
[2025-02-24 13:43:41,664][02539] Worker 2 uses CPU cores [0]
[2025-02-24 13:43:41,709][02538] Worker 1 uses CPU cores [1]
[2025-02-24 13:43:41,825][02523] Conv encoder output size: 512
[2025-02-24 13:43:41,825][02523] Policy head output size: 512
[2025-02-24 13:43:41,879][02523] Created Actor Critic model with architecture:
[2025-02-24 13:43:41,879][02523] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
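A minimal PyTorch sketch approximating the architecture printed above, for reference. This is an illustration inferred from the log, not Sample Factory's actual implementation: the 3x72x128 input, 512-d encoder/core widths, 1-d value head, and 5 action logits come from the log, while the conv kernel sizes and strides are assumptions (standard Atari-style values).

import torch
import torch.nn as nn

class SharedWeightsActorCritic(nn.Module):
    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv head: three Conv2d+ELU stages, as in the logged ConvEncoderImpl
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer flattened size for a 72x128 observation
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).flatten(1).shape[1]
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                       # recurrent core
        self.critic_linear = nn.Linear(512, 1)             # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # action logits

    def forward(self, obs, rnn_state=None):
        # obs: (T, B, 3, 72, 128) -> logits (T, B, 5), value (T, B, 1), state
        t, b = obs.shape[:2]
        x = self.conv_head(obs.flatten(0, 1)).flatten(1)
        x = self.mlp_layers(x).view(t, b, -1)
        x, rnn_state = self.core(x, rnn_state)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state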
[2025-02-24 13:43:42,112][02523] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-02-24 13:43:46,093][00421] Heartbeat connected on Batcher_0
[2025-02-24 13:43:46,099][00421] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-24 13:43:46,109][00421] Heartbeat connected on RolloutWorker_w0
[2025-02-24 13:43:46,111][00421] Heartbeat connected on RolloutWorker_w1
[2025-02-24 13:43:46,113][00421] Heartbeat connected on RolloutWorker_w2
[2025-02-24 13:43:46,116][00421] Heartbeat connected on RolloutWorker_w3
[2025-02-24 13:43:46,123][00421] Heartbeat connected on RolloutWorker_w5
[2025-02-24 13:43:46,124][00421] Heartbeat connected on RolloutWorker_w4
[2025-02-24 13:43:46,130][00421] Heartbeat connected on RolloutWorker_w6
[2025-02-24 13:43:46,132][00421] Heartbeat connected on RolloutWorker_w7
[2025-02-24 13:43:46,652][02523] No checkpoints found
[2025-02-24 13:43:46,652][02523] Did not load from checkpoint, starting from scratch!
[2025-02-24 13:43:46,652][02523] Initialized policy 0 weights for model version 0
[2025-02-24 13:43:46,655][02523] LearnerWorker_p0 finished initialization!
[2025-02-24 13:43:46,657][02523] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-24 13:43:46,655][00421] Heartbeat connected on LearnerWorker_p0
[2025-02-24 13:43:46,829][02536] RunningMeanStd input shape: (3, 72, 128)
[2025-02-24 13:43:46,830][02536] RunningMeanStd input shape: (1,)
[2025-02-24 13:43:46,842][02536] ConvEncoder: input_channels=3
[2025-02-24 13:43:46,943][02536] Conv encoder output size: 512
[2025-02-24 13:43:46,943][02536] Policy head output size: 512
[2025-02-24 13:43:46,980][00421] Inference worker 0-0 is ready!
[2025-02-24 13:43:46,984][00421] All inference workers are ready! Signal rollout workers to start!
[2025-02-24 13:43:47,216][02537] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:47,239][02542] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:47,265][02539] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:47,275][02544] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:47,282][02538] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:47,288][02543] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:47,323][02540] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:47,348][02541] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 13:43:48,841][02540] Decorrelating experience for 0 frames...
[2025-02-24 13:43:48,840][02543] Decorrelating experience for 0 frames...
[2025-02-24 13:43:48,846][02544] Decorrelating experience for 0 frames...
[2025-02-24 13:43:48,844][02537] Decorrelating experience for 0 frames...
[2025-02-24 13:43:49,568][02542] Decorrelating experience for 0 frames...
[2025-02-24 13:43:49,591][02544] Decorrelating experience for 32 frames...
[2025-02-24 13:43:49,610][02543] Decorrelating experience for 32 frames...
[2025-02-24 13:43:49,662][02541] Decorrelating experience for 0 frames...
[2025-02-24 13:43:50,461][02542] Decorrelating experience for 32 frames...
[2025-02-24 13:43:50,513][02540] Decorrelating experience for 32 frames...
[2025-02-24 13:43:50,813][02537] Decorrelating experience for 32 frames...
[2025-02-24 13:43:50,847][02541] Decorrelating experience for 32 frames...
[2025-02-24 13:43:51,018][00421] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-24 13:43:51,167][02543] Decorrelating experience for 64 frames...
[2025-02-24 13:43:51,633][02541] Decorrelating experience for 64 frames...
[2025-02-24 13:43:51,790][02542] Decorrelating experience for 64 frames...
[2025-02-24 13:43:51,863][02540] Decorrelating experience for 64 frames...
[2025-02-24 13:43:52,103][02544] Decorrelating experience for 64 frames...
[2025-02-24 13:43:52,478][02541] Decorrelating experience for 96 frames...
[2025-02-24 13:43:52,798][02542] Decorrelating experience for 96 frames...
[2025-02-24 13:43:52,865][02540] Decorrelating experience for 96 frames...
[2025-02-24 13:43:52,980][02543] Decorrelating experience for 96 frames...
[2025-02-24 13:43:53,231][02537] Decorrelating experience for 64 frames...
[2025-02-24 13:43:53,644][02537] Decorrelating experience for 96 frames...
[2025-02-24 13:43:53,705][02544] Decorrelating experience for 96 frames...
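The "Decorrelating experience for N frames" messages above show each rollout worker stepping its environments a staggered number of frames (0/32/64/96) before collection starts, so trajectories across workers begin out of phase. A rough sketch of the idea, assuming a Gymnasium-style env API (hypothetical helper, not Sample Factory's code):

import random

def decorrelate(env, max_frames=96, step_block=32):
    # advance this worker's env a random multiple of step_block frames,
    # mirroring the 0/32/64/96-frame messages in the log above
    n = random.randrange(0, max_frames + step_block, step_block)
    obs, info = env.reset()
    for _ in range(n):
        obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
        if terminated or truncated:
            obs, info = env.reset()
    return obs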
[2025-02-24 13:43:56,019][00421] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 391.9. Samples: 1960. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-24 13:43:56,024][00421] Avg episode reward: [(0, '1.923')]
[2025-02-24 13:43:56,931][02523] Signal inference workers to stop experience collection...
[2025-02-24 13:43:56,942][02536] InferenceWorker_p0-w0: stopping experience collection
[2025-02-24 13:43:59,025][02523] Signal inference workers to resume experience collection...
[2025-02-24 13:43:59,026][02536] InferenceWorker_p0-w0: resuming experience collection
[2025-02-24 13:44:01,018][00421] Fps is (10 sec: 1638.4, 60 sec: 1638.4, 300 sec: 1638.4). Total num frames: 16384. Throughput: 0: 244.0. Samples: 2440. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-02-24 13:44:01,022][00421] Avg episode reward: [(0, '3.368')]
[2025-02-24 13:44:06,018][00421] Fps is (10 sec: 3686.7, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 36864. Throughput: 0: 608.3. Samples: 9124. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:44:06,020][00421] Avg episode reward: [(0, '3.806')]
[2025-02-24 13:44:06,357][02536] Updated weights for policy 0, policy_version 10 (0.0016)
[2025-02-24 13:44:11,019][00421] Fps is (10 sec: 4095.7, 60 sec: 2867.1, 300 sec: 2867.1). Total num frames: 57344. Throughput: 0: 755.7. Samples: 15114. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:44:11,022][00421] Avg episode reward: [(0, '4.292')]
[2025-02-24 13:44:16,018][00421] Fps is (10 sec: 4096.0, 60 sec: 3113.0, 300 sec: 3113.0). Total num frames: 77824. Throughput: 0: 710.4. Samples: 17760. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-24 13:44:16,022][00421] Avg episode reward: [(0, '4.506')]
[2025-02-24 13:44:16,828][02536] Updated weights for policy 0, policy_version 20 (0.0018)
[2025-02-24 13:44:21,018][00421] Fps is (10 sec: 4096.3, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 98304. Throughput: 0: 799.9. Samples: 23998. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:44:21,020][00421] Avg episode reward: [(0, '4.599')]
[2025-02-24 13:44:26,024][00421] Fps is (10 sec: 3684.3, 60 sec: 3276.3, 300 sec: 3276.3). Total num frames: 114688. Throughput: 0: 842.9. Samples: 29508. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:44:26,026][00421] Avg episode reward: [(0, '4.429')]
[2025-02-24 13:44:26,032][02523] Saving new best policy, reward=4.429!
[2025-02-24 13:44:27,701][02536] Updated weights for policy 0, policy_version 30 (0.0019)
[2025-02-24 13:44:31,018][00421] Fps is (10 sec: 3686.4, 60 sec: 3379.2, 300 sec: 3379.2). Total num frames: 135168. Throughput: 0: 814.6. Samples: 32584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:44:31,023][00421] Avg episode reward: [(0, '4.442')]
[2025-02-24 13:44:31,032][02523] Saving new best policy, reward=4.442!
[2025-02-24 13:44:36,018][00421] Fps is (10 sec: 4508.2, 60 sec: 3549.9, 300 sec: 3549.9). Total num frames: 159744. Throughput: 0: 875.9. Samples: 39414. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:44:36,023][00421] Avg episode reward: [(0, '4.552')]
[2025-02-24 13:44:36,027][02523] Saving new best policy, reward=4.552!
[2025-02-24 13:44:36,694][02536] Updated weights for policy 0, policy_version 40 (0.0013)
[2025-02-24 13:44:41,021][00421] Fps is (10 sec: 3685.4, 60 sec: 3440.5, 300 sec: 3440.5). Total num frames: 172032. Throughput: 0: 944.7. Samples: 44472. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:44:41,025][00421] Avg episode reward: [(0, '4.426')]
[2025-02-24 13:44:46,018][00421] Fps is (10 sec: 3686.4, 60 sec: 3574.7, 300 sec: 3574.7). Total num frames: 196608. Throughput: 0: 1011.7. Samples: 47968. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:44:46,022][00421] Avg episode reward: [(0, '4.482')]
[2025-02-24 13:44:47,220][02536] Updated weights for policy 0, policy_version 50 (0.0027)
[2025-02-24 13:44:51,018][00421] Fps is (10 sec: 4916.5, 60 sec: 3686.4, 300 sec: 3686.4). Total num frames: 221184. Throughput: 0: 1015.0. Samples: 54798. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:44:51,020][00421] Avg episode reward: [(0, '4.630')]
[2025-02-24 13:44:51,027][02523] Saving new best policy, reward=4.630!
[2025-02-24 13:44:56,018][00421] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3654.9). Total num frames: 237568. Throughput: 0: 994.4. Samples: 59860. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:44:56,020][00421] Avg episode reward: [(0, '4.484')]
[2025-02-24 13:44:57,748][02536] Updated weights for policy 0, policy_version 60 (0.0013)
[2025-02-24 13:45:01,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 258048. Throughput: 0: 1012.3. Samples: 63314. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:45:01,023][00421] Avg episode reward: [(0, '4.342')]
[2025-02-24 13:45:06,019][00421] Fps is (10 sec: 4095.8, 60 sec: 4027.7, 300 sec: 3713.7). Total num frames: 278528. Throughput: 0: 1022.2. Samples: 69996. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:45:06,021][00421] Avg episode reward: [(0, '4.395')]
[2025-02-24 13:45:08,109][02536] Updated weights for policy 0, policy_version 70 (0.0019)
[2025-02-24 13:45:11,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3737.6). Total num frames: 299008. Throughput: 0: 1018.6. Samples: 75340. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:45:11,020][00421] Avg episode reward: [(0, '4.570')]
[2025-02-24 13:45:16,020][00421] Fps is (10 sec: 4095.5, 60 sec: 4027.6, 300 sec: 3758.6). Total num frames: 319488. Throughput: 0: 1025.3. Samples: 78726. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:45:16,021][00421] Avg episode reward: [(0, '4.555')]
[2025-02-24 13:45:17,100][02536] Updated weights for policy 0, policy_version 80 (0.0024)
[2025-02-24 13:45:21,019][00421] Fps is (10 sec: 4095.9, 60 sec: 4027.7, 300 sec: 3777.4). Total num frames: 339968. Throughput: 0: 1009.1. Samples: 84826. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:45:21,022][00421] Avg episode reward: [(0, '4.432')]
[2025-02-24 13:45:21,030][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000083_339968.pth...
[2025-02-24 13:45:26,018][00421] Fps is (10 sec: 4096.7, 60 sec: 4096.4, 300 sec: 3794.2). Total num frames: 360448. Throughput: 0: 1025.0. Samples: 90596. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:45:26,022][00421] Avg episode reward: [(0, '4.382')]
[2025-02-24 13:45:27,821][02536] Updated weights for policy 0, policy_version 90 (0.0018)
[2025-02-24 13:45:31,018][00421] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3809.3). Total num frames: 380928. Throughput: 0: 1022.4. Samples: 93976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:45:31,022][00421] Avg episode reward: [(0, '4.278')]
[2025-02-24 13:45:36,019][00421] Fps is (10 sec: 3686.3, 60 sec: 3959.5, 300 sec: 3783.9). Total num frames: 397312. Throughput: 0: 997.6. Samples: 99690. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:45:36,024][00421] Avg episode reward: [(0, '4.379')]
[2025-02-24 13:45:38,076][02536] Updated weights for policy 0, policy_version 100 (0.0017)
[2025-02-24 13:45:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.5, 300 sec: 3835.3). Total num frames: 421888. Throughput: 0: 1028.3. Samples: 106134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:45:41,023][00421] Avg episode reward: [(0, '4.549')]
[2025-02-24 13:45:46,018][00421] Fps is (10 sec: 4915.3, 60 sec: 4164.3, 300 sec: 3882.3). Total num frames: 446464. Throughput: 0: 1029.0. Samples: 109618. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:45:46,023][00421] Avg episode reward: [(0, '4.443')]
[2025-02-24 13:45:47,479][02536] Updated weights for policy 0, policy_version 110 (0.0018)
[2025-02-24 13:45:51,018][00421] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3822.9). Total num frames: 458752. Throughput: 0: 995.8. Samples: 114808. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:45:51,022][00421] Avg episode reward: [(0, '4.491')]
[2025-02-24 13:45:56,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3866.6). Total num frames: 483328. Throughput: 0: 1025.5. Samples: 121488. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:45:56,022][00421] Avg episode reward: [(0, '4.738')]
[2025-02-24 13:45:56,025][02523] Saving new best policy, reward=4.738!
[2025-02-24 13:45:57,782][02536] Updated weights for policy 0, policy_version 120 (0.0017)
[2025-02-24 13:46:01,020][00421] Fps is (10 sec: 4504.8, 60 sec: 4095.9, 300 sec: 3875.4). Total num frames: 503808. Throughput: 0: 1026.6. Samples: 124922. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-24 13:46:01,022][00421] Avg episode reward: [(0, '4.733')]
[2025-02-24 13:46:06,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3853.3). Total num frames: 520192. Throughput: 0: 1006.5. Samples: 130120. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:46:06,021][00421] Avg episode reward: [(0, '4.800')]
[2025-02-24 13:46:06,023][02523] Saving new best policy, reward=4.800!
[2025-02-24 13:46:07,909][02536] Updated weights for policy 0, policy_version 130 (0.0026)
[2025-02-24 13:46:11,018][00421] Fps is (10 sec: 4096.7, 60 sec: 4096.0, 300 sec: 3891.2). Total num frames: 544768. Throughput: 0: 1031.8. Samples: 137026. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:46:11,022][00421] Avg episode reward: [(0, '4.877')]
[2025-02-24 13:46:11,027][02523] Saving new best policy, reward=4.877!
[2025-02-24 13:46:16,020][00421] Fps is (10 sec: 4505.0, 60 sec: 4096.0, 300 sec: 3898.2). Total num frames: 565248. Throughput: 0: 1029.8. Samples: 140318. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:46:16,021][00421] Avg episode reward: [(0, '4.727')]
[2025-02-24 13:46:18,541][02536] Updated weights for policy 0, policy_version 140 (0.0017)
[2025-02-24 13:46:21,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3877.5). Total num frames: 581632. Throughput: 0: 1017.9. Samples: 145494. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:46:21,019][00421] Avg episode reward: [(0, '5.015')]
[2025-02-24 13:46:21,028][02523] Saving new best policy, reward=5.015!
[2025-02-24 13:46:26,018][00421] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 3911.0). Total num frames: 606208. Throughput: 0: 1026.3. Samples: 152316. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:46:26,023][00421] Avg episode reward: [(0, '5.132')]
[2025-02-24 13:46:26,026][02523] Saving new best policy, reward=5.132!
[2025-02-24 13:46:27,563][02536] Updated weights for policy 0, policy_version 150 (0.0019)
[2025-02-24 13:46:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3891.2). Total num frames: 622592. Throughput: 0: 1009.5. Samples: 155044. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:46:31,020][00421] Avg episode reward: [(0, '5.152')]
[2025-02-24 13:46:31,025][02523] Saving new best policy, reward=5.152!
[2025-02-24 13:46:36,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 3897.4). Total num frames: 643072. Throughput: 0: 1020.4. Samples: 160728. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:46:36,020][00421] Avg episode reward: [(0, '5.135')]
[2025-02-24 13:46:37,977][02536] Updated weights for policy 0, policy_version 160 (0.0014)
[2025-02-24 13:46:41,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3927.3). Total num frames: 667648. Throughput: 0: 1027.4. Samples: 167720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:46:41,023][00421] Avg episode reward: [(0, '5.200')]
[2025-02-24 13:46:41,033][02523] Saving new best policy, reward=5.200!
[2025-02-24 13:46:46,018][00421] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3908.8). Total num frames: 684032. Throughput: 0: 1000.8. Samples: 169956. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:46:46,020][00421] Avg episode reward: [(0, '5.310')]
[2025-02-24 13:46:46,024][02523] Saving new best policy, reward=5.310!
[2025-02-24 13:46:48,462][02536] Updated weights for policy 0, policy_version 170 (0.0025)
[2025-02-24 13:46:51,023][00421] Fps is (10 sec: 3684.6, 60 sec: 4095.7, 300 sec: 3913.8). Total num frames: 704512. Throughput: 0: 1022.4. Samples: 176134. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:46:51,025][00421] Avg episode reward: [(0, '5.415')]
[2025-02-24 13:46:51,036][02523] Saving new best policy, reward=5.415!
[2025-02-24 13:46:56,020][00421] Fps is (10 sec: 4095.3, 60 sec: 4027.6, 300 sec: 3918.8). Total num frames: 724992. Throughput: 0: 1010.1. Samples: 182480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:46:56,022][00421] Avg episode reward: [(0, '5.329')]
[2025-02-24 13:46:59,273][02536] Updated weights for policy 0, policy_version 180 (0.0012)
[2025-02-24 13:47:01,018][00421] Fps is (10 sec: 3688.2, 60 sec: 3959.6, 300 sec: 3902.0). Total num frames: 741376. Throughput: 0: 983.6. Samples: 184580. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:47:01,022][00421] Avg episode reward: [(0, '5.002')]
[2025-02-24 13:47:06,018][00421] Fps is (10 sec: 4096.7, 60 sec: 4096.0, 300 sec: 3928.0). Total num frames: 765952. Throughput: 0: 1023.2. Samples: 191540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-24 13:47:06,019][00421] Avg episode reward: [(0, '4.970')]
[2025-02-24 13:47:07,993][02536] Updated weights for policy 0, policy_version 190 (0.0019)
[2025-02-24 13:47:11,020][00421] Fps is (10 sec: 4505.0, 60 sec: 4027.6, 300 sec: 3932.1). Total num frames: 786432. Throughput: 0: 1005.2. Samples: 197552. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:47:11,023][00421] Avg episode reward: [(0, '5.204')]
[2025-02-24 13:47:16,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3936.2). Total num frames: 806912. Throughput: 0: 1000.7. Samples: 200076. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:47:16,022][00421] Avg episode reward: [(0, '5.542')]
[2025-02-24 13:47:16,028][02523] Saving new best policy, reward=5.542!
[2025-02-24 13:47:18,493][02536] Updated weights for policy 0, policy_version 200 (0.0014)
[2025-02-24 13:47:21,021][00421] Fps is (10 sec: 4095.4, 60 sec: 4095.8, 300 sec: 3939.9). Total num frames: 827392. Throughput: 0: 1025.7. Samples: 206886. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:47:21,022][00421] Avg episode reward: [(0, '5.758')]
[2025-02-24 13:47:21,035][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000202_827392.pth...
[2025-02-24 13:47:21,169][02523] Saving new best policy, reward=5.758!
[2025-02-24 13:47:26,019][00421] Fps is (10 sec: 3686.0, 60 sec: 3959.4, 300 sec: 3924.5). Total num frames: 843776. Throughput: 0: 989.7. Samples: 212258. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:47:26,021][00421] Avg episode reward: [(0, '5.882')]
[2025-02-24 13:47:26,026][02523] Saving new best policy, reward=5.882!
[2025-02-24 13:47:29,260][02536] Updated weights for policy 0, policy_version 210 (0.0028)
[2025-02-24 13:47:31,019][00421] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 3947.1). Total num frames: 868352. Throughput: 0: 1006.3. Samples: 215240. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:47:31,023][00421] Avg episode reward: [(0, '6.281')]
[2025-02-24 13:47:31,030][02523] Saving new best policy, reward=6.281!
[2025-02-24 13:47:36,019][00421] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 3950.4). Total num frames: 888832. Throughput: 0: 1022.7. Samples: 222150. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:47:36,023][00421] Avg episode reward: [(0, '6.162')]
[2025-02-24 13:47:38,611][02536] Updated weights for policy 0, policy_version 220 (0.0026)
[2025-02-24 13:47:41,018][00421] Fps is (10 sec: 3686.5, 60 sec: 3959.5, 300 sec: 3935.7). Total num frames: 905216. Throughput: 0: 997.0. Samples: 227342. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:47:41,019][00421] Avg episode reward: [(0, '5.858')]
[2025-02-24 13:47:46,018][00421] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 3956.6). Total num frames: 929792. Throughput: 0: 1027.8. Samples: 230830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:47:46,019][00421] Avg episode reward: [(0, '5.986')]
[2025-02-24 13:47:48,446][02536] Updated weights for policy 0, policy_version 230 (0.0015)
[2025-02-24 13:47:51,024][00421] Fps is (10 sec: 4503.0, 60 sec: 4095.9, 300 sec: 3959.4). Total num frames: 950272. Throughput: 0: 1027.3. Samples: 237776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:47:51,025][00421] Avg episode reward: [(0, '6.463')]
[2025-02-24 13:47:51,033][02523] Saving new best policy, reward=6.463!
[2025-02-24 13:47:56,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3945.5). Total num frames: 966656. Throughput: 0: 1004.7. Samples: 242760. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:47:56,023][00421] Avg episode reward: [(0, '6.701')]
[2025-02-24 13:47:56,024][02523] Saving new best policy, reward=6.701!
[2025-02-24 13:47:58,993][02536] Updated weights for policy 0, policy_version 240 (0.0018)
[2025-02-24 13:48:01,018][00421] Fps is (10 sec: 4098.3, 60 sec: 4164.3, 300 sec: 3964.9). Total num frames: 991232. Throughput: 0: 1023.6. Samples: 246138. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:48:01,023][00421] Avg episode reward: [(0, '6.829')]
[2025-02-24 13:48:01,029][02523] Saving new best policy, reward=6.829!
[2025-02-24 13:48:06,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 3967.5). Total num frames: 1011712. Throughput: 0: 1026.1. Samples: 253056. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:48:06,020][00421] Avg episode reward: [(0, '7.084')]
[2025-02-24 13:48:06,023][02523] Saving new best policy, reward=7.084!
[2025-02-24 13:48:09,334][02536] Updated weights for policy 0, policy_version 250 (0.0019)
[2025-02-24 13:48:11,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 3954.2). Total num frames: 1028096. Throughput: 0: 1024.9. Samples: 258376. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:48:11,023][00421] Avg episode reward: [(0, '7.372')]
[2025-02-24 13:48:11,070][02523] Saving new best policy, reward=7.372!
[2025-02-24 13:48:16,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3972.3). Total num frames: 1052672. Throughput: 0: 1033.4. Samples: 261744. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-24 13:48:16,020][00421] Avg episode reward: [(0, '7.619')]
[2025-02-24 13:48:16,024][02523] Saving new best policy, reward=7.619!
[2025-02-24 13:48:18,315][02536] Updated weights for policy 0, policy_version 260 (0.0023)
[2025-02-24 13:48:21,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.2, 300 sec: 3974.6). Total num frames: 1073152. Throughput: 0: 1019.7. Samples: 268036. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:48:21,020][00421] Avg episode reward: [(0, '8.298')]
[2025-02-24 13:48:21,027][02523] Saving new best policy, reward=8.298!
[2025-02-24 13:48:26,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 3962.0). Total num frames: 1089536. Throughput: 0: 1029.8. Samples: 273684. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:48:26,020][00421] Avg episode reward: [(0, '7.720')]
[2025-02-24 13:48:28,935][02536] Updated weights for policy 0, policy_version 270 (0.0023)
[2025-02-24 13:48:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3979.0). Total num frames: 1114112. Throughput: 0: 1026.4. Samples: 277016. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:48:31,023][00421] Avg episode reward: [(0, '7.102')]
[2025-02-24 13:48:36,019][00421] Fps is (10 sec: 4095.6, 60 sec: 4027.7, 300 sec: 3966.6). Total num frames: 1130496. Throughput: 0: 1003.1. Samples: 282912. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:48:36,021][00421] Avg episode reward: [(0, '7.047')]
[2025-02-24 13:48:39,285][02536] Updated weights for policy 0, policy_version 280 (0.0019)
[2025-02-24 13:48:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3983.0). Total num frames: 1155072. Throughput: 0: 1030.5. Samples: 289132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:48:41,023][00421] Avg episode reward: [(0, '6.685')]
[2025-02-24 13:48:46,018][00421] Fps is (10 sec: 4506.0, 60 sec: 4096.0, 300 sec: 3984.9). Total num frames: 1175552. Throughput: 0: 1035.6. Samples: 292738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:48:46,020][00421] Avg episode reward: [(0, '7.319')]
[2025-02-24 13:48:48,832][02536] Updated weights for policy 0, policy_version 290 (0.0020)
[2025-02-24 13:48:51,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4028.1, 300 sec: 4040.5). Total num frames: 1191936. Throughput: 0: 1000.7. Samples: 298086. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:48:51,021][00421] Avg episode reward: [(0, '7.616')]
[2025-02-24 13:48:56,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4068.2). Total num frames: 1216512. Throughput: 0: 1028.4. Samples: 304654. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:48:56,023][00421] Avg episode reward: [(0, '7.827')]
[2025-02-24 13:48:58,657][02536] Updated weights for policy 0, policy_version 300 (0.0026)
[2025-02-24 13:49:01,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1236992. Throughput: 0: 1031.8. Samples: 308176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:49:01,021][00421] Avg episode reward: [(0, '8.059')]
[2025-02-24 13:49:06,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4054.4). Total num frames: 1253376. Throughput: 0: 1007.0. Samples: 313352. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:49:06,023][00421] Avg episode reward: [(0, '8.302')]
[2025-02-24 13:49:06,087][02523] Saving new best policy, reward=8.302!
[2025-02-24 13:49:08,940][02536] Updated weights for policy 0, policy_version 310 (0.0017)
[2025-02-24 13:49:11,019][00421] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 4068.2). Total num frames: 1277952. Throughput: 0: 1034.9. Samples: 320256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:49:11,020][00421] Avg episode reward: [(0, '9.088')]
[2025-02-24 13:49:11,033][02523] Saving new best policy, reward=9.088!
[2025-02-24 13:49:16,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1298432. Throughput: 0: 1038.1. Samples: 323730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:49:16,022][00421] Avg episode reward: [(0, '8.673')]
[2025-02-24 13:49:19,416][02536] Updated weights for policy 0, policy_version 320 (0.0026)
[2025-02-24 13:49:21,018][00421] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4068.3). Total num frames: 1314816. Throughput: 0: 1021.3. Samples: 328868. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:49:21,020][00421] Avg episode reward: [(0, '9.532')]
[2025-02-24 13:49:21,094][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000322_1318912.pth...
[2025-02-24 13:49:21,207][02523] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000083_339968.pth
[2025-02-24 13:49:21,230][02523] Saving new best policy, reward=9.532!
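The Saving/Removing pairs above follow a keep-last-N checkpoint rotation, with best-policy saves tracked separately. A minimal sketch of that pattern, assuming the naming scheme visible in the log (hypothetical helper, not Sample Factory's code):

import torch
from pathlib import Path

def save_and_rotate(model, ckpt_dir, policy_version, env_frames, keep_last=2):
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    # naming mirrors the log, e.g. checkpoint_000000083_339968.pth
    path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    torch.save(model.state_dict(), path)
    # zero-padded names sort chronologically; drop all but the newest keep_last
    for old in sorted(ckpt_dir.glob("checkpoint_*.pth"))[:-keep_last]:
        old.unlink()
    return path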
[2025-02-24 13:49:26,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1339392. Throughput: 0: 1032.6. Samples: 335600. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-24 13:49:26,019][00421] Avg episode reward: [(0, '9.649')]
[2025-02-24 13:49:26,025][02523] Saving new best policy, reward=9.649!
[2025-02-24 13:49:28,583][02536] Updated weights for policy 0, policy_version 330 (0.0019)
[2025-02-24 13:49:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1355776. Throughput: 0: 1016.8. Samples: 338496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:49:31,022][00421] Avg episode reward: [(0, '10.707')]
[2025-02-24 13:49:31,041][02523] Saving new best policy, reward=10.707!
[2025-02-24 13:49:36,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4082.2). Total num frames: 1376256. Throughput: 0: 1019.3. Samples: 343956. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:49:36,022][00421] Avg episode reward: [(0, '10.349')]
[2025-02-24 13:49:38,909][02536] Updated weights for policy 0, policy_version 340 (0.0013)
[2025-02-24 13:49:41,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1400832. Throughput: 0: 1028.7. Samples: 350944. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:49:41,021][00421] Avg episode reward: [(0, '10.337')]
[2025-02-24 13:49:46,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4054.3). Total num frames: 1417216. Throughput: 0: 1005.4. Samples: 353418. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:49:46,020][00421] Avg episode reward: [(0, '9.609')]
[2025-02-24 13:49:49,351][02536] Updated weights for policy 0, policy_version 350 (0.0013)
[2025-02-24 13:49:51,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4068.2). Total num frames: 1437696. Throughput: 0: 1028.9. Samples: 359654. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:49:51,022][00421] Avg episode reward: [(0, '9.458')]
[2025-02-24 13:49:56,018][00421] Fps is (10 sec: 4505.5, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1462272. Throughput: 0: 1023.6. Samples: 366320. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:49:56,024][00421] Avg episode reward: [(0, '10.154')]
[2025-02-24 13:49:59,702][02536] Updated weights for policy 0, policy_version 360 (0.0018)
[2025-02-24 13:50:01,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1478656. Throughput: 0: 994.0. Samples: 368458. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:50:01,020][00421] Avg episode reward: [(0, '10.994')]
[2025-02-24 13:50:01,028][02523] Saving new best policy, reward=10.994!
[2025-02-24 13:50:06,018][00421] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1503232. Throughput: 0: 1029.9. Samples: 375214. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:50:06,022][00421] Avg episode reward: [(0, '11.436')]
[2025-02-24 13:50:06,024][02523] Saving new best policy, reward=11.436!
[2025-02-24 13:50:08,576][02536] Updated weights for policy 0, policy_version 370 (0.0015)
[2025-02-24 13:50:11,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1523712. Throughput: 0: 1015.8. Samples: 381312. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:50:11,019][00421] Avg episode reward: [(0, '12.131')]
[2025-02-24 13:50:11,025][02523] Saving new best policy, reward=12.131!
[2025-02-24 13:50:16,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1540096. Throughput: 0: 1003.2. Samples: 383638. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:50:16,024][00421] Avg episode reward: [(0, '12.456')]
[2025-02-24 13:50:16,028][02523] Saving new best policy, reward=12.456!
[2025-02-24 13:50:19,228][02536] Updated weights for policy 0, policy_version 380 (0.0015)
[2025-02-24 13:50:21,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1564672. Throughput: 0: 1034.0. Samples: 390484. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:50:21,023][00421] Avg episode reward: [(0, '13.211')]
[2025-02-24 13:50:21,030][02523] Saving new best policy, reward=13.211!
[2025-02-24 13:50:26,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1581056. Throughput: 0: 1001.0. Samples: 395990. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:50:26,022][00421] Avg episode reward: [(0, '13.042')]
[2025-02-24 13:50:29,788][02536] Updated weights for policy 0, policy_version 390 (0.0013)
[2025-02-24 13:50:31,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1601536. Throughput: 0: 1010.6. Samples: 398896. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:50:31,020][00421] Avg episode reward: [(0, '12.709')]
[2025-02-24 13:50:36,022][00421] Fps is (10 sec: 4503.7, 60 sec: 4164.0, 300 sec: 4082.1). Total num frames: 1626112. Throughput: 0: 1026.1. Samples: 405834. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:50:36,029][00421] Avg episode reward: [(0, '11.738')]
[2025-02-24 13:50:39,818][02536] Updated weights for policy 0, policy_version 400 (0.0029)
[2025-02-24 13:50:41,018][00421] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4040.5). Total num frames: 1638400. Throughput: 0: 944.6. Samples: 408828. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:50:41,022][00421] Avg episode reward: [(0, '11.416')]
[2025-02-24 13:50:46,018][00421] Fps is (10 sec: 3687.9, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1662976. Throughput: 0: 1019.0. Samples: 414312. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:50:46,023][00421] Avg episode reward: [(0, '10.844')]
[2025-02-24 13:50:49,086][02536] Updated weights for policy 0, policy_version 410 (0.0013)
[2025-02-24 13:50:51,018][00421] Fps is (10 sec: 4915.2, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1687552. Throughput: 0: 1023.8. Samples: 421286. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:50:51,020][00421] Avg episode reward: [(0, '11.133')]
[2025-02-24 13:50:56,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 4068.3). Total num frames: 1703936. Throughput: 0: 1002.1. Samples: 426406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:50:56,019][00421] Avg episode reward: [(0, '11.990')]
[2025-02-24 13:50:59,469][02536] Updated weights for policy 0, policy_version 420 (0.0028)
[2025-02-24 13:51:01,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1724416. Throughput: 0: 1027.7. Samples: 429886. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:51:01,023][00421] Avg episode reward: [(0, '12.424')]
[2025-02-24 13:51:06,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1744896. Throughput: 0: 1030.6. Samples: 436862. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:51:06,022][00421] Avg episode reward: [(0, '13.263')]
[2025-02-24 13:51:06,093][02523] Saving new best policy, reward=13.263!
[2025-02-24 13:51:09,906][02536] Updated weights for policy 0, policy_version 430 (0.0015)
[2025-02-24 13:51:11,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1765376. Throughput: 0: 1023.3. Samples: 442038. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:51:11,024][00421] Avg episode reward: [(0, '14.805')]
[2025-02-24 13:51:11,034][02523] Saving new best policy, reward=14.805!
[2025-02-24 13:51:16,019][00421] Fps is (10 sec: 4095.9, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1785856. Throughput: 0: 1033.9. Samples: 445422. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:51:16,021][00421] Avg episode reward: [(0, '15.751')]
[2025-02-24 13:51:16,072][02523] Saving new best policy, reward=15.751!
[2025-02-24 13:51:18,878][02536] Updated weights for policy 0, policy_version 440 (0.0017)
[2025-02-24 13:51:21,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1806336. Throughput: 0: 1020.0. Samples: 451730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:51:21,022][00421] Avg episode reward: [(0, '17.079')]
[2025-02-24 13:51:21,031][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000441_1806336.pth...
[2025-02-24 13:51:21,186][02523] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000202_827392.pth
[2025-02-24 13:51:21,205][02523] Saving new best policy, reward=17.079!
[2025-02-24 13:51:26,018][00421] Fps is (10 sec: 3686.5, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1822720. Throughput: 0: 1071.4. Samples: 457042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:51:26,020][00421] Avg episode reward: [(0, '17.279')]
[2025-02-24 13:51:26,027][02523] Saving new best policy, reward=17.279!
[2025-02-24 13:51:29,693][02536] Updated weights for policy 0, policy_version 450 (0.0016)
[2025-02-24 13:51:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1847296. Throughput: 0: 1024.0. Samples: 460390. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:51:31,020][00421] Avg episode reward: [(0, '16.356')]
[2025-02-24 13:51:36,018][00421] Fps is (10 sec: 4096.0, 60 sec: 3959.7, 300 sec: 4054.3). Total num frames: 1863680. Throughput: 0: 1003.2. Samples: 466428. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:51:36,020][00421] Avg episode reward: [(0, '16.195')]
[2025-02-24 13:51:39,945][02536] Updated weights for policy 0, policy_version 460 (0.0022)
[2025-02-24 13:51:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4082.1). Total num frames: 1888256. Throughput: 0: 1028.4. Samples: 472684. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:51:41,024][00421] Avg episode reward: [(0, '15.706')]
[2025-02-24 13:51:46,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 1908736. Throughput: 0: 1028.0. Samples: 476146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:51:46,020][00421] Avg episode reward: [(0, '17.032')]
[2025-02-24 13:51:50,326][02536] Updated weights for policy 0, policy_version 470 (0.0012)
[2025-02-24 13:51:51,018][00421] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4068.3). Total num frames: 1925120. Throughput: 0: 991.5. Samples: 481478. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:51:51,020][00421] Avg episode reward: [(0, '17.627')]
[2025-02-24 13:51:51,025][02523] Saving new best policy, reward=17.627!
[2025-02-24 13:51:56,020][00421] Fps is (10 sec: 4095.3, 60 sec: 4095.9, 300 sec: 4096.0). Total num frames: 1949696. Throughput: 0: 1023.7. Samples: 488108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:51:56,021][00421] Avg episode reward: [(0, '18.691')]
[2025-02-24 13:51:56,026][02523] Saving new best policy, reward=18.691!
[2025-02-24 13:51:59,483][02536] Updated weights for policy 0, policy_version 480 (0.0013)
[2025-02-24 13:52:01,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 1970176. Throughput: 0: 1023.6. Samples: 491486. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:52:01,022][00421] Avg episode reward: [(0, '18.056')]
[2025-02-24 13:52:06,018][00421] Fps is (10 sec: 3687.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 1986560. Throughput: 0: 999.5. Samples: 496706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:52:06,022][00421] Avg episode reward: [(0, '17.121')]
[2025-02-24 13:52:09,703][02536] Updated weights for policy 0, policy_version 490 (0.0014)
[2025-02-24 13:52:11,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2011136. Throughput: 0: 1038.7. Samples: 503784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:52:11,021][00421] Avg episode reward: [(0, '16.994')]
[2025-02-24 13:52:16,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 2031616. Throughput: 0: 1042.5. Samples: 507304. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:52:16,020][00421] Avg episode reward: [(0, '16.177')]
[2025-02-24 13:52:20,002][02536] Updated weights for policy 0, policy_version 500 (0.0014)
[2025-02-24 13:52:21,019][00421] Fps is (10 sec: 4095.8, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2052096. Throughput: 0: 1022.4. Samples: 512436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:52:21,023][00421] Avg episode reward: [(0, '16.025')]
[2025-02-24 13:52:26,019][00421] Fps is (10 sec: 4095.9, 60 sec: 4164.2, 300 sec: 4082.1). Total num frames: 2072576. Throughput: 0: 1036.3. Samples: 519318. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:52:26,024][00421] Avg episode reward: [(0, '17.128')]
[2025-02-24 13:52:29,778][02536] Updated weights for policy 0, policy_version 510 (0.0018)
[2025-02-24 13:52:31,022][00421] Fps is (10 sec: 4094.7, 60 sec: 4095.7, 300 sec: 4082.1). Total num frames: 2093056. Throughput: 0: 1025.3. Samples: 522288. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:52:31,028][00421] Avg episode reward: [(0, '18.080')]
[2025-02-24 13:52:36,018][00421] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2113536. Throughput: 0: 1031.6. Samples: 527900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:52:36,022][00421] Avg episode reward: [(0, '18.235')]
[2025-02-24 13:52:39,232][02536] Updated weights for policy 0, policy_version 520 (0.0021)
[2025-02-24 13:52:41,018][00421] Fps is (10 sec: 4097.5, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2134016. Throughput: 0: 1038.5. Samples: 534838. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:52:41,023][00421] Avg episode reward: [(0, '19.406')]
[2025-02-24 13:52:41,087][02523] Saving new best policy, reward=19.406!
[2025-02-24 13:52:46,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.3). Total num frames: 2150400. Throughput: 0: 1017.1. Samples: 537256. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:52:46,023][00421] Avg episode reward: [(0, '19.238')]
[2025-02-24 13:52:49,845][02536] Updated weights for policy 0, policy_version 530 (0.0021)
[2025-02-24 13:52:51,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2174976. Throughput: 0: 1037.8. Samples: 543406. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-24 13:52:51,023][00421] Avg episode reward: [(0, '20.907')]
[2025-02-24 13:52:51,030][02523] Saving new best policy, reward=20.907!
[2025-02-24 13:52:56,019][00421] Fps is (10 sec: 4505.5, 60 sec: 4096.1, 300 sec: 4082.1). Total num frames: 2195456. Throughput: 0: 1027.9. Samples: 550040. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:52:56,020][00421] Avg episode reward: [(0, '21.710')]
[2025-02-24 13:52:56,026][02523] Saving new best policy, reward=21.710!
[2025-02-24 13:53:00,471][02536] Updated weights for policy 0, policy_version 540 (0.0024)
[2025-02-24 13:53:01,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2211840. Throughput: 0: 994.8. Samples: 552072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:53:01,024][00421] Avg episode reward: [(0, '22.505')]
[2025-02-24 13:53:01,037][02523] Saving new best policy, reward=22.505!
[2025-02-24 13:53:06,018][00421] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2236416. Throughput: 0: 1025.5. Samples: 558582. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:53:06,023][00421] Avg episode reward: [(0, '23.067')]
[2025-02-24 13:53:06,027][02523] Saving new best policy, reward=23.067!
[2025-02-24 13:53:09,368][02536] Updated weights for policy 0, policy_version 550 (0.0014)
[2025-02-24 13:53:11,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2256896. Throughput: 0: 1016.0. Samples: 565036. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:53:11,024][00421] Avg episode reward: [(0, '22.869')]
[2025-02-24 13:53:16,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2273280. Throughput: 0: 1000.3. Samples: 567296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:53:16,019][00421] Avg episode reward: [(0, '22.699')]
[2025-02-24 13:53:19,770][02536] Updated weights for policy 0, policy_version 560 (0.0019)
[2025-02-24 13:53:21,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2297856. Throughput: 0: 1029.2. Samples: 574212. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:53:21,020][00421] Avg episode reward: [(0, '20.610')]
[2025-02-24 13:53:21,028][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000561_2297856.pth...
[2025-02-24 13:53:21,140][02523] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000322_1318912.pth
[2025-02-24 13:53:26,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2318336. Throughput: 0: 1003.8. Samples: 580010. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:53:26,023][00421] Avg episode reward: [(0, '20.418')]
[2025-02-24 13:53:30,135][02536] Updated weights for policy 0, policy_version 570 (0.0013)
[2025-02-24 13:53:31,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4028.0, 300 sec: 4082.1). Total num frames: 2334720. Throughput: 0: 1009.5. Samples: 582682. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:53:31,022][00421] Avg episode reward: [(0, '21.417')]
[2025-02-24 13:53:36,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2359296. Throughput: 0: 1026.5. Samples: 589600. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:53:36,022][00421] Avg episode reward: [(0, '21.018')]
[2025-02-24 13:53:40,082][02536] Updated weights for policy 0, policy_version 580 (0.0014)
[2025-02-24 13:53:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2375680. Throughput: 0: 1000.1. Samples: 595042. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-24 13:53:41,022][00421] Avg episode reward: [(0, '21.118')]
[2025-02-24 13:53:46,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2400256. Throughput: 0: 1026.1. Samples: 598246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:53:46,023][00421] Avg episode reward: [(0, '21.895')]
[2025-02-24 13:53:49,407][02536] Updated weights for policy 0, policy_version 590 (0.0013)
[2025-02-24 13:53:51,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2420736. Throughput: 0: 1036.2. Samples: 605210. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:53:51,020][00421] Avg episode reward: [(0, '20.800')]
[2025-02-24 13:53:56,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2437120. Throughput: 0: 1003.5. Samples: 610194. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:53:56,019][00421] Avg episode reward: [(0, '20.337')]
[2025-02-24 13:54:00,062][02536] Updated weights for policy 0, policy_version 600 (0.0013)
[2025-02-24 13:54:01,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2461696. Throughput: 0: 1030.4. Samples: 613662. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:54:01,020][00421] Avg episode reward: [(0, '19.942')]
[2025-02-24 13:54:06,021][00421] Fps is (10 sec: 4504.4, 60 sec: 4095.8, 300 sec: 4082.1). Total num frames: 2482176. Throughput: 0: 1034.9. Samples: 620784. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:54:06,022][00421] Avg episode reward: [(0, '19.247')]
[2025-02-24 13:54:10,304][02536] Updated weights for policy 0, policy_version 610 (0.0014)
[2025-02-24 13:54:11,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2498560. Throughput: 0: 1020.6. Samples: 625938. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:54:11,020][00421] Avg episode reward: [(0, '19.001')]
[2025-02-24 13:54:16,019][00421] Fps is (10 sec: 4097.0, 60 sec: 4164.2, 300 sec: 4096.0). Total num frames: 2523136. Throughput: 0: 1037.6. Samples: 629372. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0)
[2025-02-24 13:54:16,023][00421] Avg episode reward: [(0, '19.177')]
[2025-02-24 13:54:19,141][02536] Updated weights for policy 0, policy_version 620 (0.0016)
[2025-02-24 13:54:21,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2543616. Throughput: 0: 1035.4. Samples: 636194. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:54:21,022][00421] Avg episode reward: [(0, '18.968')]
[2025-02-24 13:54:26,018][00421] Fps is (10 sec: 4096.1, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2564096. Throughput: 0: 1032.1. Samples: 641486. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-24 13:54:26,021][00421] Avg episode reward: [(0, '19.173')]
[2025-02-24 13:54:29,436][02536] Updated weights for policy 0, policy_version 630 (0.0023)
[2025-02-24 13:54:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2584576. Throughput: 0: 1039.2. Samples: 645008. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:54:31,023][00421] Avg episode reward: [(0, '21.163')]
[2025-02-24 13:54:36,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2605056. Throughput: 0: 1019.2. Samples: 651072. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:54:36,020][00421] Avg episode reward: [(0, '21.485')]
[2025-02-24 13:54:39,860][02536] Updated weights for policy 0, policy_version 640 (0.0013)
[2025-02-24 13:54:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2625536. Throughput: 0: 1042.6. Samples: 657112. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:54:41,023][00421] Avg episode reward: [(0, '21.928')]
[2025-02-24 13:54:46,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2646016. Throughput: 0: 1042.6. Samples: 660580. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:54:46,033][00421] Avg episode reward: [(0, '23.666')]
[2025-02-24 13:54:46,083][02523] Saving new best policy, reward=23.666!
[2025-02-24 13:54:50,079][02536] Updated weights for policy 0, policy_version 650 (0.0024)
[2025-02-24 13:54:51,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4068.2). Total num frames: 2662400. Throughput: 0: 1008.7. Samples: 666174. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:54:51,021][00421] Avg episode reward: [(0, '23.217')]
[2025-02-24 13:54:56,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2686976. Throughput: 0: 1035.2. Samples: 672520. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:54:56,020][00421] Avg episode reward: [(0, '22.099')]
[2025-02-24 13:54:59,117][02536] Updated weights for policy 0, policy_version 660 (0.0020)
[2025-02-24 13:55:01,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2707456. Throughput: 0: 1036.6. Samples: 676018. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:55:01,025][00421] Avg episode reward: [(0, '21.815')]
[2025-02-24 13:55:06,020][00421] Fps is (10 sec: 3685.9, 60 sec: 4027.8, 300 sec: 4068.2). Total num frames: 2723840. Throughput: 0: 999.2. Samples: 681160. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:55:06,024][00421] Avg episode reward: [(0, '23.151')]
[2025-02-24 13:55:09,708][02536] Updated weights for policy 0, policy_version 670 (0.0024)
[2025-02-24 13:55:11,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2748416. Throughput: 0: 1035.2. Samples: 688072. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:55:11,020][00421] Avg episode reward: [(0, '22.816')]
[2025-02-24 13:55:16,018][00421] Fps is (10 sec: 4506.2, 60 sec: 4096.0, 300 sec: 4082.1). Total num frames: 2768896. Throughput: 0: 1033.1. Samples: 691498. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:55:16,022][00421] Avg episode reward: [(0, '23.138')]
[2025-02-24 13:55:20,127][02536] Updated weights for policy 0, policy_version 680 (0.0016)
[2025-02-24 13:55:21,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2789376. Throughput: 0: 1014.1. Samples: 696706. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:55:21,019][00421] Avg episode reward: [(0, '22.895')]
[2025-02-24 13:55:21,028][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000681_2789376.pth...
[2025-02-24 13:55:21,132][02523] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000441_1806336.pth
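Each periodic save is paired with deletion of the oldest rolling checkpoint; here checkpoint_000000441_… is removed as checkpoint_000000681_… arrives, so only the most recent few (apparently two) rolling checkpoints stay on disk, while "best" snapshots are managed separately. A rough sketch of that rotation (assumed logic, not the library's exact implementation):

```python
import os
from pathlib import Path

def rotate_checkpoints(ckpt_dir, keep_count=2):
    """Remove the oldest rolling checkpoints so at most keep_count remain."""
    # zero-padded names sort lexicographically in version order
    ckpts = sorted(Path(ckpt_dir).glob("checkpoint_*.pth"))
    for stale in ckpts[:-keep_count]:
        print(f"Removing {stale}")
        os.remove(stale)
```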
[2025-02-24 13:55:26,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2809856. Throughput: 0: 1029.3. Samples: 703432. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:55:26,022][00421] Avg episode reward: [(0, '21.918')]
[2025-02-24 13:55:28,999][02536] Updated weights for policy 0, policy_version 690 (0.0012)
[2025-02-24 13:55:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4082.2). Total num frames: 2830336. Throughput: 0: 1027.5. Samples: 706818. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:55:31,021][00421] Avg episode reward: [(0, '19.296')]
[2025-02-24 13:55:36,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 2850816. Throughput: 0: 1020.4. Samples: 712090. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:55:36,023][00421] Avg episode reward: [(0, '20.074')]
[2025-02-24 13:55:39,481][02536] Updated weights for policy 0, policy_version 700 (0.0013)
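The "Updated weights for policy 0, policy_version N (…)" lines mark the inference worker pulling fresh parameters from the learner, with the swap time in parentheses. The "Policy #0 lag" triple in the status lines then measures how many versions behind the learner the sampled transitions were collected. A toy illustration of that version arithmetic (assumed semantics, deliberately simplified):

```python
def policy_lag(learner_version, sample_versions):
    """Lag = learner's current version minus the version that produced each sample."""
    lags = [learner_version - v for v in sample_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

# e.g. learner at version 700, samples collected with versions 699-700
print(policy_lag(700, [700, 699, 700, 700]))  # -> (0, 0.25, 1)
```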
[2025-02-24 13:55:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 2871296. Throughput: 0: 1033.2. Samples: 719012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:55:41,024][00421] Avg episode reward: [(0, '20.280')]
[2025-02-24 13:55:46,024][00421] Fps is (10 sec: 3684.3, 60 sec: 4027.3, 300 sec: 4068.2). Total num frames: 2887680. Throughput: 0: 1017.4. Samples: 721808. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:55:46,027][00421] Avg episode reward: [(0, '21.682')]
[2025-02-24 13:55:49,780][02536] Updated weights for policy 0, policy_version 710 (0.0021)
[2025-02-24 13:55:51,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4096.0). Total num frames: 2912256. Throughput: 0: 1033.4. Samples: 727662. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:55:51,025][00421] Avg episode reward: [(0, '22.360')]
[2025-02-24 13:55:56,018][00421] Fps is (10 sec: 4918.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 2936832. Throughput: 0: 1032.6. Samples: 734540. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:55:56,020][00421] Avg episode reward: [(0, '24.804')]
[2025-02-24 13:55:56,024][02523] Saving new best policy, reward=24.804!
[2025-02-24 13:56:00,167][02536] Updated weights for policy 0, policy_version 720 (0.0020)
[2025-02-24 13:56:01,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4082.1). Total num frames: 2949120. Throughput: 0: 1006.2. Samples: 736776. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:56:01,020][00421] Avg episode reward: [(0, '24.697')]
[2025-02-24 13:56:06,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4164.4, 300 sec: 4096.0). Total num frames: 2973696. Throughput: 0: 1033.4. Samples: 743208. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:56:06,020][00421] Avg episode reward: [(0, '25.104')]
[2025-02-24 13:56:06,026][02523] Saving new best policy, reward=25.104!
[2025-02-24 13:56:09,211][02536] Updated weights for policy 0, policy_version 730 (0.0013)
[2025-02-24 13:56:11,020][00421] Fps is (10 sec: 4504.9, 60 sec: 4095.9, 300 sec: 4096.0). Total num frames: 2994176. Throughput: 0: 1029.2. Samples: 749750. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:56:11,021][00421] Avg episode reward: [(0, '24.169')]
[2025-02-24 13:56:16,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3014656. Throughput: 0: 998.8. Samples: 751766. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:56:16,019][00421] Avg episode reward: [(0, '22.965')]
[2025-02-24 13:56:19,462][02536] Updated weights for policy 0, policy_version 740 (0.0015)
[2025-02-24 13:56:21,018][00421] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3035136. Throughput: 0: 1038.2. Samples: 758810. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:56:21,022][00421] Avg episode reward: [(0, '22.080')]
[2025-02-24 13:56:26,020][00421] Fps is (10 sec: 4095.4, 60 sec: 4095.9, 300 sec: 4096.0). Total num frames: 3055616. Throughput: 0: 1016.4. Samples: 764752. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:56:26,021][00421] Avg episode reward: [(0, '21.922')]
[2025-02-24 13:56:29,843][02536] Updated weights for policy 0, policy_version 750 (0.0019)
[2025-02-24 13:56:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3076096. Throughput: 0: 1010.9. Samples: 767292. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:56:31,020][00421] Avg episode reward: [(0, '20.083')]
[2025-02-24 13:56:36,018][00421] Fps is (10 sec: 4096.6, 60 sec: 4096.0, 300 sec: 4096.0). Total num frames: 3096576. Throughput: 0: 1036.6. Samples: 774310. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:56:36,020][00421] Avg episode reward: [(0, '20.651')]
[2025-02-24 13:56:39,101][02536] Updated weights for policy 0, policy_version 760 (0.0014)
[2025-02-24 13:56:41,020][00421] Fps is (10 sec: 4095.3, 60 sec: 4095.9, 300 sec: 4096.0). Total num frames: 3117056. Throughput: 0: 1007.8. Samples: 779892. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:56:41,024][00421] Avg episode reward: [(0, '21.812')]
[2025-02-24 13:56:46,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.7, 300 sec: 4109.9). Total num frames: 3137536. Throughput: 0: 1026.6. Samples: 782974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:56:46,020][00421] Avg episode reward: [(0, '20.967')]
[2025-02-24 13:56:49,147][02536] Updated weights for policy 0, policy_version 770 (0.0026)
[2025-02-24 13:56:51,018][00421] Fps is (10 sec: 4506.4, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3162112. Throughput: 0: 1037.2. Samples: 789880. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:56:51,020][00421] Avg episode reward: [(0, '20.851')]
[2025-02-24 13:56:56,018][00421] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 4082.1). Total num frames: 3174400. Throughput: 0: 1005.6. Samples: 795000. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:56:56,020][00421] Avg episode reward: [(0, '20.993')]
[2025-02-24 13:56:59,630][02536] Updated weights for policy 0, policy_version 780 (0.0020)
[2025-02-24 13:57:01,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3198976. Throughput: 0: 1034.6. Samples: 798324. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:57:01,020][00421] Avg episode reward: [(0, '20.346')]
[2025-02-24 13:57:06,019][00421] Fps is (10 sec: 4914.9, 60 sec: 4164.2, 300 sec: 4109.9). Total num frames: 3223552. Throughput: 0: 1036.7. Samples: 805460. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:57:06,020][00421] Avg episode reward: [(0, '19.980')]
[2025-02-24 13:57:09,841][02536] Updated weights for policy 0, policy_version 790 (0.0020)
[2025-02-24 13:57:11,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4096.0). Total num frames: 3239936. Throughput: 0: 1022.7. Samples: 810774. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:57:11,020][00421] Avg episode reward: [(0, '21.657')]
[2025-02-24 13:57:16,018][00421] Fps is (10 sec: 4096.3, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3264512. Throughput: 0: 1044.5. Samples: 814294. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:57:16,020][00421] Avg episode reward: [(0, '22.929')]
[2025-02-24 13:57:18,541][02536] Updated weights for policy 0, policy_version 800 (0.0019)
[2025-02-24 13:57:21,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3284992. Throughput: 0: 1040.0. Samples: 821110. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:57:21,021][00421] Avg episode reward: [(0, '23.472')]
[2025-02-24 13:57:21,032][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000802_3284992.pth...
[2025-02-24 13:57:21,169][02523] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000561_2297856.pth
[2025-02-24 13:57:26,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4096.1). Total num frames: 3301376. Throughput: 0: 1036.0. Samples: 826508. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 13:57:26,021][00421] Avg episode reward: [(0, '24.174')]
[2025-02-24 13:57:28,976][02536] Updated weights for policy 0, policy_version 810 (0.0025)
[2025-02-24 13:57:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3325952. Throughput: 0: 1042.4. Samples: 829882. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:57:31,024][00421] Avg episode reward: [(0, '25.054')]
[2025-02-24 13:57:36,020][00421] Fps is (10 sec: 4095.4, 60 sec: 4095.9, 300 sec: 4096.0). Total num frames: 3342336. Throughput: 0: 1026.9. Samples: 836090. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:57:36,021][00421] Avg episode reward: [(0, '25.502')]
[2025-02-24 13:57:36,027][02523] Saving new best policy, reward=25.502!
[2025-02-24 13:57:39,246][02536] Updated weights for policy 0, policy_version 820 (0.0026)
[2025-02-24 13:57:41,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.1, 300 sec: 4109.9). Total num frames: 3362816. Throughput: 0: 1044.7. Samples: 842012. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:57:41,022][00421] Avg episode reward: [(0, '25.102')]
[2025-02-24 13:57:46,018][00421] Fps is (10 sec: 4506.2, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3387392. Throughput: 0: 1046.0. Samples: 845392. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:57:46,022][00421] Avg episode reward: [(0, '24.379')]
[2025-02-24 13:57:48,915][02536] Updated weights for policy 0, policy_version 830 (0.0017)
[2025-02-24 13:57:51,023][00421] Fps is (10 sec: 4094.2, 60 sec: 4027.4, 300 sec: 4095.9). Total num frames: 3403776. Throughput: 0: 1011.8. Samples: 850994. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:57:51,024][00421] Avg episode reward: [(0, '24.886')]
[2025-02-24 13:57:56,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4232.5, 300 sec: 4123.8). Total num frames: 3428352. Throughput: 0: 1036.1. Samples: 857400. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:57:56,022][00421] Avg episode reward: [(0, '23.577')]
[2025-02-24 13:57:58,756][02536] Updated weights for policy 0, policy_version 840 (0.0016)
[2025-02-24 13:58:01,018][00421] Fps is (10 sec: 4507.5, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3448832. Throughput: 0: 1033.7. Samples: 860810. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:58:01,025][00421] Avg episode reward: [(0, '24.831')]
[2025-02-24 13:58:06,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.8, 300 sec: 4096.0). Total num frames: 3465216. Throughput: 0: 998.8. Samples: 866058. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:58:06,022][00421] Avg episode reward: [(0, '24.948')]
[2025-02-24 13:58:08,857][02536] Updated weights for policy 0, policy_version 850 (0.0023)
[2025-02-24 13:58:11,018][00421] Fps is (10 sec: 4095.9, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3489792. Throughput: 0: 1036.6. Samples: 873156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:58:11,027][00421] Avg episode reward: [(0, '25.857')]
[2025-02-24 13:58:11,035][02523] Saving new best policy, reward=25.857!
[2025-02-24 13:58:16,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3510272. Throughput: 0: 1039.4. Samples: 876654. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-02-24 13:58:16,022][00421] Avg episode reward: [(0, '26.758')]
[2025-02-24 13:58:16,027][02523] Saving new best policy, reward=26.758!
[2025-02-24 13:58:19,548][02536] Updated weights for policy 0, policy_version 860 (0.0017)
[2025-02-24 13:58:21,019][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4096.0). Total num frames: 3526656. Throughput: 0: 1014.5. Samples: 881742. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-02-24 13:58:21,022][00421] Avg episode reward: [(0, '27.448')]
[2025-02-24 13:58:21,075][02523] Saving new best policy, reward=27.448!
[2025-02-24 13:58:26,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3551232. Throughput: 0: 1039.2. Samples: 888776. Policy #0 lag: (min: 0.0, avg: 0.2, max: 2.0)
[2025-02-24 13:58:26,019][00421] Avg episode reward: [(0, '26.120')]
[2025-02-24 13:58:28,289][02536] Updated weights for policy 0, policy_version 870 (0.0017)
[2025-02-24 13:58:31,018][00421] Fps is (10 sec: 4505.8, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3571712. Throughput: 0: 1036.8. Samples: 892050. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:58:31,021][00421] Avg episode reward: [(0, '23.734')]
[2025-02-24 13:58:36,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.4, 300 sec: 4123.8). Total num frames: 3592192. Throughput: 0: 1029.9. Samples: 897334. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-02-24 13:58:36,022][00421] Avg episode reward: [(0, '21.828')]
[2025-02-24 13:58:38,568][02536] Updated weights for policy 0, policy_version 880 (0.0019)
[2025-02-24 13:58:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4109.9). Total num frames: 3612672. Throughput: 0: 1045.2. Samples: 904436. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:58:41,019][00421] Avg episode reward: [(0, '21.341')]
[2025-02-24 13:58:46,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3633152. Throughput: 0: 1033.7. Samples: 907328. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:58:46,024][00421] Avg episode reward: [(0, '22.394')]
[2025-02-24 13:58:48,709][02536] Updated weights for policy 0, policy_version 890 (0.0016)
[2025-02-24 13:58:51,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.6, 300 sec: 4123.8). Total num frames: 3653632. Throughput: 0: 1049.1. Samples: 913266. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:58:51,023][00421] Avg episode reward: [(0, '21.887')]
[2025-02-24 13:58:56,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3678208. Throughput: 0: 1045.9. Samples: 920220. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:58:56,023][00421] Avg episode reward: [(0, '23.822')]
[2025-02-24 13:58:58,157][02536] Updated weights for policy 0, policy_version 900 (0.0017)
[2025-02-24 13:59:01,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3694592. Throughput: 0: 1021.9. Samples: 922640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:01,023][00421] Avg episode reward: [(0, '24.008')]
[2025-02-24 13:59:06,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3715072. Throughput: 0: 1049.7. Samples: 928976. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:06,020][00421] Avg episode reward: [(0, '23.797')]
[2025-02-24 13:59:07,677][02536] Updated weights for policy 0, policy_version 910 (0.0022)
[2025-02-24 13:59:11,018][00421] Fps is (10 sec: 4505.6, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3739648. Throughput: 0: 1046.0. Samples: 935844. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:59:11,021][00421] Avg episode reward: [(0, '22.587')]
[2025-02-24 13:59:16,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3756032. Throughput: 0: 1019.2. Samples: 937912. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:16,020][00421] Avg episode reward: [(0, '24.158')]
[2025-02-24 13:59:18,197][02536] Updated weights for policy 0, policy_version 920 (0.0012)
[2025-02-24 13:59:21,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4232.6, 300 sec: 4123.8). Total num frames: 3780608. Throughput: 0: 1051.3. Samples: 944642. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:59:21,021][00421] Avg episode reward: [(0, '23.392')]
[2025-02-24 13:59:21,028][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000923_3780608.pth...
[2025-02-24 13:59:21,149][02523] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000681_2789376.pth
[2025-02-24 13:59:26,020][00421] Fps is (10 sec: 4504.9, 60 sec: 4164.2, 300 sec: 4123.7). Total num frames: 3801088. Throughput: 0: 1033.1. Samples: 950926. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:26,022][00421] Avg episode reward: [(0, '24.088')]
[2025-02-24 13:59:28,606][02536] Updated weights for policy 0, policy_version 930 (0.0013)
[2025-02-24 13:59:31,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3817472. Throughput: 0: 1019.7. Samples: 953216. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 13:59:31,020][00421] Avg episode reward: [(0, '25.042')]
[2025-02-24 13:59:36,018][00421] Fps is (10 sec: 4096.7, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3842048. Throughput: 0: 1042.7. Samples: 960186. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:36,020][00421] Avg episode reward: [(0, '26.261')]
[2025-02-24 13:59:37,423][02536] Updated weights for policy 0, policy_version 940 (0.0023)
[2025-02-24 13:59:41,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4109.9). Total num frames: 3858432. Throughput: 0: 1017.2. Samples: 965994. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:41,022][00421] Avg episode reward: [(0, '26.022')]
[2025-02-24 13:59:46,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3883008. Throughput: 0: 1026.8. Samples: 968848. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:46,023][00421] Avg episode reward: [(0, '26.317')]
[2025-02-24 13:59:47,746][02536] Updated weights for policy 0, policy_version 950 (0.0018)
[2025-02-24 13:59:51,019][00421] Fps is (10 sec: 4505.5, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 3903488. Throughput: 0: 1041.4. Samples: 975840. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:51,024][00421] Avg episode reward: [(0, '26.686')]
[2025-02-24 13:59:56,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 4109.9). Total num frames: 3919872. Throughput: 0: 1006.7. Samples: 981146. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 13:59:56,021][00421] Avg episode reward: [(0, '25.987')]
[2025-02-24 13:59:58,226][02536] Updated weights for policy 0, policy_version 960 (0.0022)
[2025-02-24 14:00:01,018][00421] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 4137.7). Total num frames: 3944448. Throughput: 0: 1032.7. Samples: 984384. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 14:00:01,022][00421] Avg episode reward: [(0, '24.817')]
[2025-02-24 14:00:06,018][00421] Fps is (10 sec: 4915.2, 60 sec: 4232.5, 300 sec: 4137.7). Total num frames: 3969024. Throughput: 0: 1040.4. Samples: 991458. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 14:00:06,024][00421] Avg episode reward: [(0, '23.692')]
[2025-02-24 14:00:07,498][02536] Updated weights for policy 0, policy_version 970 (0.0021)
[2025-02-24 14:00:11,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 3985408. Throughput: 0: 1017.1. Samples: 996696. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 14:00:11,024][00421] Avg episode reward: [(0, '23.781')]
[2025-02-24 14:00:16,018][00421] Fps is (10 sec: 3686.4, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4005888. Throughput: 0: 1044.0. Samples: 1000196. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 14:00:16,023][00421] Avg episode reward: [(0, '22.684')]
[2025-02-24 14:00:17,146][02536] Updated weights for policy 0, policy_version 980 (0.0017)
[2025-02-24 14:00:21,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 4026368. Throughput: 0: 1045.6. Samples: 1007236. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-02-24 14:00:21,022][00421] Avg episode reward: [(0, '21.374')]
[2025-02-24 14:00:26,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.1, 300 sec: 4123.8). Total num frames: 4046848. Throughput: 0: 1032.7. Samples: 1012464. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-02-24 14:00:26,024][00421] Avg episode reward: [(0, '21.139')]
[2025-02-24 14:00:27,499][02536] Updated weights for policy 0, policy_version 990 (0.0016)
[2025-02-24 14:00:31,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 4123.8). Total num frames: 4067328. Throughput: 0: 1044.4. Samples: 1015844. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 14:00:31,023][00421] Avg episode reward: [(0, '20.946')]
[2025-02-24 14:00:36,018][00421] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 4123.8). Total num frames: 4087808. Throughput: 0: 1031.9. Samples: 1022274. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-24 14:00:36,020][00421] Avg episode reward: [(0, '21.150')]
[2025-02-24 14:00:37,943][02536] Updated weights for policy 0, policy_version 1000 (0.0018)
[2025-02-24 14:00:39,571][02523] Stopping Batcher_0...
[2025-02-24 14:00:39,572][02523] Loop batcher_evt_loop terminating...
[2025-02-24 14:00:39,573][00421] Component Batcher_0 stopped!
[2025-02-24 14:00:39,576][00421] Component RolloutWorker_w1 process died already! Don't wait for it.
[2025-02-24 14:00:39,578][00421] Component RolloutWorker_w2 process died already! Don't wait for it.
[2025-02-24 14:00:39,580][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001002_4104192.pth...
[2025-02-24 14:00:39,634][02536] Weights refcount: 2 0
[2025-02-24 14:00:39,636][02536] Stopping InferenceWorker_p0-w0...
[2025-02-24 14:00:39,637][00421] Component InferenceWorker_p0-w0 stopped!
[2025-02-24 14:00:39,637][02536] Loop inference_proc0-0_evt_loop terminating...
[2025-02-24 14:00:39,710][02523] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000802_3284992.pth
[2025-02-24 14:00:39,730][02523] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001002_4104192.pth...
[2025-02-24 14:00:39,780][00421] Component RolloutWorker_w5 stopped!
[2025-02-24 14:00:39,783][02542] Stopping RolloutWorker_w5...
[2025-02-24 14:00:39,786][00421] Component RolloutWorker_w3 stopped!
[2025-02-24 14:00:39,789][02540] Stopping RolloutWorker_w3...
[2025-02-24 14:00:39,790][02542] Loop rollout_proc5_evt_loop terminating...
[2025-02-24 14:00:39,792][00421] Component RolloutWorker_w7 stopped!
[2025-02-24 14:00:39,794][02544] Stopping RolloutWorker_w7...
[2025-02-24 14:00:39,798][02544] Loop rollout_proc7_evt_loop terminating...
[2025-02-24 14:00:39,789][02540] Loop rollout_proc3_evt_loop terminating...
[2025-02-24 14:00:39,930][00421] Component LearnerWorker_p0 stopped!
[2025-02-24 14:00:39,937][02523] Stopping LearnerWorker_p0...
[2025-02-24 14:00:39,938][02523] Loop learner_proc0_evt_loop terminating...
[2025-02-24 14:00:40,037][02541] Stopping RolloutWorker_w4...
[2025-02-24 14:00:40,042][02541] Loop rollout_proc4_evt_loop terminating...
[2025-02-24 14:00:40,037][00421] Component RolloutWorker_w4 stopped!
[2025-02-24 14:00:40,049][02537] Stopping RolloutWorker_w0...
[2025-02-24 14:00:40,049][00421] Component RolloutWorker_w0 stopped!
[2025-02-24 14:00:40,053][02537] Loop rollout_proc0_evt_loop terminating...
[2025-02-24 14:00:40,063][02543] Stopping RolloutWorker_w6...
[2025-02-24 14:00:40,063][00421] Component RolloutWorker_w6 stopped!
[2025-02-24 14:00:40,065][02543] Loop rollout_proc6_evt_loop terminating...
[2025-02-24 14:00:40,067][00421] Waiting for process learner_proc0 to stop...
[2025-02-24 14:00:41,537][00421] Waiting for process inference_proc0-0 to join...
[2025-02-24 14:00:41,543][00421] Waiting for process rollout_proc0 to join...
[2025-02-24 14:00:42,983][00421] Waiting for process rollout_proc1 to join...
[2025-02-24 14:00:42,984][00421] Waiting for process rollout_proc2 to join...
[2025-02-24 14:00:42,985][00421] Waiting for process rollout_proc3 to join...
[2025-02-24 14:00:42,989][00421] Waiting for process rollout_proc4 to join...
[2025-02-24 14:00:42,991][00421] Waiting for process rollout_proc5 to join...
[2025-02-24 14:00:42,993][00421] Waiting for process rollout_proc6 to join...
[2025-02-24 14:00:42,994][00421] Waiting for process rollout_proc7 to join...
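Shutdown proceeds in two phases: components are stopped and their event loops terminate, then the runner joins each child process, explicitly skipping rollout workers 1 and 2, whose processes had already died earlier in the run. A minimal sketch of joining with such a liveness guard (multiprocessing-style, illustrative only):

```python
import multiprocessing as mp

def join_all(processes: "dict[str, mp.Process]", timeout=5.0):
    """Join child processes, skipping any that have already died."""
    for name, proc in processes.items():
        if not proc.is_alive() and proc.exitcode is not None:
            print(f"Component {name} process died already! Don't wait for it.")
            continue
        print(f"Waiting for process {name} to join...")
        proc.join(timeout)
```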
[2025-02-24 14:00:42,996][00421] Batcher 0 profile tree view:
batching: 23.9179, releasing_batches: 0.0258
[2025-02-24 14:00:42,999][00421] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 414.1620
update_model: 8.2340
  weight_update: 0.0034
one_step: 0.0023
  handle_policy_step: 555.8728
    deserialize: 13.3794, stack: 3.0695, obs_to_device_normalize: 122.1087, forward: 290.8678, send_messages: 23.0411
    prepare_outputs: 80.1391
      to_cpu: 50.6956
[2025-02-24 14:00:43,002][00421] Learner 0 profile tree view:
misc: 0.0047, prepare_batch: 12.9735
train: 70.5652
  epoch_init: 0.0046, minibatch_init: 0.0057, losses_postprocess: 0.6048, kl_divergence: 0.5619, after_optimizer: 34.5746
  calculate_losses: 23.3766
    losses_init: 0.0033, forward_head: 1.1695, bptt_initial: 15.8369, tail: 0.9677, advantages_returns: 0.2632, losses: 3.1271
    bptt: 1.7559
      bptt_forward_core: 1.6674
  update: 10.8575
    clip: 0.8442
[2025-02-24 14:00:43,003][00421] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3136, enqueue_policy_requests: 114.3962, env_step: 773.2600, overhead: 13.0113, complete_rollouts: 7.1322
save_policy_outputs: 19.4649
  split_output_tensors: 7.4367
[2025-02-24 14:00:43,004][00421] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3050, enqueue_policy_requests: 116.9404, env_step: 771.1547, overhead: 12.9429, complete_rollouts: 6.1954
save_policy_outputs: 19.6356
  split_output_tensors: 7.5472
[2025-02-24 14:00:43,006][00421] Loop Runner_EvtLoop terminating...
[2025-02-24 14:00:43,007][00421] Runner profile tree view:
main_loop: 1036.8769
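The profile trees above are hierarchical wall-clock totals; reading them, env_step (~773 s per rollout worker) dominates the 1036.9 s main loop, so environment stepping, not learning or inference, is the bottleneck in this run. A compact sketch of how such nested timings can be gathered (an illustrative reimplementation, not Sample Factory's actual timing utilities):

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class TimingTree:
    """Accumulate nested wall-clock timings, like the profile tree views above."""

    def __init__(self):
        self.totals = defaultdict(float)
        self._stack = []

    @contextmanager
    def timeit(self, name):
        self._stack.append(name)
        key = "/".join(self._stack)  # nesting encoded in the key
        start = time.perf_counter()
        try:
            yield
        finally:
            self.totals[key] += time.perf_counter() - start
            self._stack.pop()

timing = TimingTree()
with timing.timeit("train"):
    with timing.timeit("calculate_losses"):
        time.sleep(0.01)
print(dict(timing.totals))  # {'train/calculate_losses': ~0.01, 'train': ~0.01}
```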
[2025-02-24 14:00:43,008][00421] Collected {0: 4104192}, FPS: 3958.2
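The closing summary is internally consistent: 4,104,192 collected frames over the 1036.8769 s main loop gives the reported average throughput. The check:

```python
total_frames = 4_104_192
main_loop_seconds = 1036.8769
print(round(total_frames / main_loop_seconds, 1))  # 3958.2, as logged
```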
[2025-02-24 14:00:43,048][00421] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-24 14:00:43,048][00421] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-24 14:00:43,050][00421] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-24 14:00:43,051][00421] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-24 14:00:43,052][00421] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-24 14:00:43,053][00421] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-24 14:00:43,053][00421] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-24 14:00:43,054][00421] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-24 14:00:43,055][00421] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-24 14:00:43,056][00421] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-24 14:00:43,056][00421] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-24 14:00:43,057][00421] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-24 14:00:43,057][00421] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-24 14:00:43,058][00421] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-24 14:00:43,059][00421] Using frameskip 1 and render_action_repeat=4 for evaluation
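For evaluation, the saved training config is reloaded and selectively patched: num_workers is overridden from the command line, and enjoy-only arguments (no_render, save_video, max_num_episodes=10, …) are added because they never existed in the training config. A hedged sketch of that merge, assuming a plain-JSON config file (not the library's exact implementation):

```python
import json

def load_eval_config(config_path, cli_overrides):
    """Merge a saved training config with evaluation-time CLI arguments."""
    with open(config_path) as f:
        cfg = json.load(f)
    for key, value in cli_overrides.items():
        if key in cfg:
            print(f"Overriding arg {key!r} with value {value!r} passed from command line")
        else:
            print(f"Adding new argument {key!r}={value!r} that is not in the saved config file!")
        cfg[key] = value
    return cfg
```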
[2025-02-24 14:00:43,090][00421] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-24 14:00:43,093][00421] RunningMeanStd input shape: (3, 72, 128)
[2025-02-24 14:00:43,095][00421] RunningMeanStd input shape: (1,)
[2025-02-24 14:00:43,108][00421] ConvEncoder: input_channels=3
[2025-02-24 14:00:43,206][00421] Conv encoder output size: 512
[2025-02-24 14:00:43,207][00421] Policy head output size: 512
[2025-02-24 14:00:43,376][00421] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001002_4104192.pth...
[2025-02-24 14:00:44,126][00421] Num frames 100...
[2025-02-24 14:00:44,254][00421] Num frames 200...
[2025-02-24 14:00:44,381][00421] Num frames 300...
[2025-02-24 14:00:44,510][00421] Num frames 400...
[2025-02-24 14:00:44,638][00421] Num frames 500...
[2025-02-24 14:00:44,767][00421] Num frames 600...
[2025-02-24 14:00:44,897][00421] Num frames 700...
[2025-02-24 14:00:45,032][00421] Num frames 800...
[2025-02-24 14:00:45,160][00421] Num frames 900...
[2025-02-24 14:00:45,288][00421] Num frames 1000...
[2025-02-24 14:00:45,434][00421] Avg episode rewards: #0: 23.720, true rewards: #0: 10.720
[2025-02-24 14:00:45,435][00421] Avg episode reward: 23.720, avg true_objective: 10.720
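The "Avg episode rewards" printed after each episode is a running mean over the episodes completed so far: episode 1 scored 23.720, and the next average of 16.955 implies episode 2 scored 2 * 16.955 - 23.720 = 10.190. The "true rewards" column appears to be the same running mean computed on the raw objective rather than the shaped reward. A quick check of that arithmetic:

```python
rewards = [23.720]                    # episode 1 (shaped reward)
rewards.append(2 * 16.955 - 23.720)   # episode 2, backed out of the next average
print(sum(rewards) / len(rewards))    # 16.955, matching the log
```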
[2025-02-24 14:00:45,474][00421] Num frames 1100...
[2025-02-24 14:00:45,603][00421] Num frames 1200...
[2025-02-24 14:00:45,732][00421] Num frames 1300...
[2025-02-24 14:00:45,859][00421] Num frames 1400...
[2025-02-24 14:00:46,037][00421] Avg episode rewards: #0: 16.955, true rewards: #0: 7.455
[2025-02-24 14:00:46,038][00421] Avg episode reward: 16.955, avg true_objective: 7.455
[2025-02-24 14:00:46,053][00421] Num frames 1500...
[2025-02-24 14:00:46,181][00421] Num frames 1600...
[2025-02-24 14:00:46,307][00421] Num frames 1700...
[2025-02-24 14:00:46,437][00421] Num frames 1800...
[2025-02-24 14:00:46,592][00421] Avg episode rewards: #0: 13.257, true rewards: #0: 6.257
[2025-02-24 14:00:46,593][00421] Avg episode reward: 13.257, avg true_objective: 6.257
[2025-02-24 14:00:46,624][00421] Num frames 1900...
[2025-02-24 14:00:46,755][00421] Num frames 2000...
[2025-02-24 14:00:46,881][00421] Num frames 2100...
[2025-02-24 14:00:46,940][00421] Avg episode rewards: #0: 11.003, true rewards: #0: 5.252
[2025-02-24 14:00:46,940][00421] Avg episode reward: 11.003, avg true_objective: 5.252
[2025-02-24 14:00:47,074][00421] Num frames 2200...
[2025-02-24 14:00:47,200][00421] Num frames 2300...
[2025-02-24 14:00:47,326][00421] Num frames 2400...
[2025-02-24 14:00:47,452][00421] Num frames 2500...
[2025-02-24 14:00:47,600][00421] Num frames 2600...
[2025-02-24 14:00:47,777][00421] Num frames 2700...
[2025-02-24 14:00:47,958][00421] Num frames 2800...
[2025-02-24 14:00:48,126][00421] Num frames 2900...
[2025-02-24 14:00:48,292][00421] Num frames 3000...
[2025-02-24 14:00:48,464][00421] Num frames 3100...
[2025-02-24 14:00:48,630][00421] Num frames 3200...
[2025-02-24 14:00:48,801][00421] Num frames 3300...
[2025-02-24 14:00:48,973][00421] Num frames 3400...
[2025-02-24 14:00:49,164][00421] Num frames 3500...
[2025-02-24 14:00:49,342][00421] Num frames 3600...
[2025-02-24 14:00:49,497][00421] Num frames 3700...
[2025-02-24 14:00:49,632][00421] Num frames 3800...
[2025-02-24 14:00:49,762][00421] Num frames 3900...
[2025-02-24 14:00:49,893][00421] Num frames 4000...
[2025-02-24 14:00:50,022][00421] Num frames 4100...
[2025-02-24 14:00:50,157][00421] Num frames 4200...
[2025-02-24 14:00:50,214][00421] Avg episode rewards: #0: 19.802, true rewards: #0: 8.402
[2025-02-24 14:00:50,215][00421] Avg episode reward: 19.802, avg true_objective: 8.402
[2025-02-24 14:00:50,341][00421] Num frames 4300...
[2025-02-24 14:00:50,466][00421] Num frames 4400...
[2025-02-24 14:00:50,592][00421] Num frames 4500...
[2025-02-24 14:00:50,724][00421] Num frames 4600...
[2025-02-24 14:00:50,850][00421] Num frames 4700...
[2025-02-24 14:00:50,930][00421] Avg episode rewards: #0: 18.028, true rewards: #0: 7.862
[2025-02-24 14:00:50,931][00421] Avg episode reward: 18.028, avg true_objective: 7.862
[2025-02-24 14:00:51,037][00421] Num frames 4800...
[2025-02-24 14:00:51,170][00421] Num frames 4900...
[2025-02-24 14:00:51,306][00421] Num frames 5000...
[2025-02-24 14:00:51,437][00421] Num frames 5100...
[2025-02-24 14:00:51,495][00421] Avg episode rewards: #0: 16.001, true rewards: #0: 7.287
[2025-02-24 14:00:51,495][00421] Avg episode reward: 16.001, avg true_objective: 7.287
[2025-02-24 14:00:51,622][00421] Num frames 5200...
[2025-02-24 14:00:51,754][00421] Num frames 5300...
[2025-02-24 14:00:51,883][00421] Num frames 5400...
[2025-02-24 14:00:52,014][00421] Num frames 5500...
[2025-02-24 14:00:52,147][00421] Num frames 5600...
[2025-02-24 14:00:52,276][00421] Num frames 5700...
[2025-02-24 14:00:52,403][00421] Num frames 5800...
[2025-02-24 14:00:52,530][00421] Num frames 5900...
[2025-02-24 14:00:52,657][00421] Num frames 6000...
[2025-02-24 14:00:52,836][00421] Avg episode rewards: #0: 16.991, true rewards: #0: 7.616
[2025-02-24 14:00:52,837][00421] Avg episode reward: 16.991, avg true_objective: 7.616
[2025-02-24 14:00:52,848][00421] Num frames 6100...
[2025-02-24 14:00:52,974][00421] Num frames 6200...
[2025-02-24 14:00:53,100][00421] Num frames 6300...
[2025-02-24 14:00:53,232][00421] Num frames 6400...
[2025-02-24 14:00:53,362][00421] Num frames 6500...
[2025-02-24 14:00:53,492][00421] Num frames 6600...
[2025-02-24 14:00:53,619][00421] Num frames 6700...
[2025-02-24 14:00:53,746][00421] Num frames 6800...
[2025-02-24 14:00:53,871][00421] Num frames 6900...
[2025-02-24 14:00:53,999][00421] Num frames 7000...
[2025-02-24 14:00:54,127][00421] Num frames 7100...
[2025-02-24 14:00:54,250][00421] Avg episode rewards: #0: 17.388, true rewards: #0: 7.943
[2025-02-24 14:00:54,251][00421] Avg episode reward: 17.388, avg true_objective: 7.943
[2025-02-24 14:00:54,316][00421] Num frames 7200...
[2025-02-24 14:00:54,442][00421] Num frames 7300...
[2025-02-24 14:00:54,569][00421] Num frames 7400...
[2025-02-24 14:00:54,697][00421] Avg episode rewards: #0: 16.358, true rewards: #0: 7.458
[2025-02-24 14:00:54,698][00421] Avg episode reward: 16.358, avg true_objective: 7.458
[2025-02-24 14:01:39,443][00421] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-24 14:01:39,770][00421] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-24 14:01:39,774][00421] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-24 14:01:39,776][00421] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-24 14:01:39,778][00421] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-24 14:01:39,780][00421] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-24 14:01:39,783][00421] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-24 14:01:39,784][00421] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-24 14:01:39,786][00421] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-24 14:01:39,787][00421] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-24 14:01:39,788][00421] Adding new argument 'hf_repository'='AriYusa/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-02-24 14:01:39,789][00421] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-24 14:01:39,790][00421] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-24 14:01:39,792][00421] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-24 14:01:39,793][00421] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-24 14:01:39,793][00421] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-24 14:01:39,840][00421] RunningMeanStd input shape: (3, 72, 128)
[2025-02-24 14:01:39,841][00421] RunningMeanStd input shape: (1,)
[2025-02-24 14:01:39,856][00421] ConvEncoder: input_channels=3
[2025-02-24 14:01:39,905][00421] Conv encoder output size: 512
[2025-02-24 14:01:39,907][00421] Policy head output size: 512
[2025-02-24 14:01:39,933][00421] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001002_4104192.pth...
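The second evaluation run (push_to_hub=True, hf_repository='AriYusa/rl_course_vizdoom_health_gathering_supreme') rebuilds the same model and restores the final checkpoint before recording the replay to upload. A minimal restore sketch in PyTorch, assuming the checkpoint stores the network under a "model" key (the real key layout may differ):

```python
import torch

def load_policy_weights(model, checkpoint_path, device="cpu"):
    """Restore model parameters from a .pth checkpoint for evaluation."""
    print(f"Loading state from checkpoint {checkpoint_path}...")
    checkpoint = torch.load(checkpoint_path, map_location=device)
    model.load_state_dict(checkpoint["model"])  # assumed key, see note above
    model.eval()  # inference mode for replay recording
    return model
```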
[2025-02-24 14:01:40,570][00421] Num frames 100...
[2025-02-24 14:01:40,732][00421] Num frames 200...
[2025-02-24 14:01:40,897][00421] Num frames 300...
[2025-02-24 14:01:41,058][00421] Num frames 400...
[2025-02-24 14:01:41,214][00421] Num frames 500...
[2025-02-24 14:01:41,375][00421] Num frames 600...
[2025-02-24 14:01:41,533][00421] Num frames 700...
[2025-02-24 14:01:41,704][00421] Num frames 800...
[2025-02-24 14:01:41,870][00421] Num frames 900...
[2025-02-24 14:01:42,050][00421] Num frames 1000...
[2025-02-24 14:01:42,237][00421] Num frames 1100...
[2025-02-24 14:01:42,371][00421] Num frames 1200...
[2025-02-24 14:01:42,496][00421] Num frames 1300...
[2025-02-24 14:01:42,566][00421] Avg episode rewards: #0: 30.120, true rewards: #0: 13.120
[2025-02-24 14:01:42,567][00421] Avg episode reward: 30.120, avg true_objective: 13.120
[2025-02-24 14:01:42,699][00421] Num frames 1400...
[2025-02-24 14:01:42,832][00421] Num frames 1500...
[2025-02-24 14:01:42,968][00421] Num frames 1600...
[2025-02-24 14:01:43,098][00421] Num frames 1700...
[2025-02-24 14:01:43,223][00421] Num frames 1800...
[2025-02-24 14:01:43,349][00421] Num frames 1900...
[2025-02-24 14:01:43,476][00421] Num frames 2000...
[2025-02-24 14:01:43,602][00421] Num frames 2100...
[2025-02-24 14:01:43,733][00421] Num frames 2200...
[2025-02-24 14:01:43,861][00421] Num frames 2300...
[2025-02-24 14:01:43,995][00421] Num frames 2400...
[2025-02-24 14:01:44,121][00421] Num frames 2500...
[2025-02-24 14:01:44,249][00421] Num frames 2600...
[2025-02-24 14:01:44,375][00421] Num frames 2700...
[2025-02-24 14:01:44,502][00421] Num frames 2800...
[2025-02-24 14:01:44,627][00421] Num frames 2900...
[2025-02-24 14:01:44,755][00421] Num frames 3000...
[2025-02-24 14:01:44,885][00421] Num frames 3100...
[2025-02-24 14:01:45,019][00421] Num frames 3200...
[2025-02-24 14:01:45,145][00421] Num frames 3300...
[2025-02-24 14:01:45,270][00421] Num frames 3400...
[2025-02-24 14:01:45,340][00421] Avg episode rewards: #0: 44.060, true rewards: #0: 17.060
[2025-02-24 14:01:45,341][00421] Avg episode reward: 44.060, avg true_objective: 17.060
[2025-02-24 14:01:45,451][00421] Num frames 3500...
[2025-02-24 14:01:45,575][00421] Num frames 3600...
[2025-02-24 14:01:45,701][00421] Num frames 3700...
[2025-02-24 14:01:45,829][00421] Num frames 3800...
[2025-02-24 14:01:45,951][00421] Num frames 3900...
[2025-02-24 14:01:46,089][00421] Num frames 4000...
[2025-02-24 14:01:46,216][00421] Num frames 4100...
[2025-02-24 14:01:46,345][00421] Num frames 4200...
[2025-02-24 14:01:46,469][00421] Num frames 4300...
[2025-02-24 14:01:46,591][00421] Num frames 4400...
[2025-02-24 14:01:46,722][00421] Num frames 4500...
[2025-02-24 14:01:46,849][00421] Num frames 4600...
[2025-02-24 14:01:46,975][00421] Num frames 4700...
[2025-02-24 14:01:47,109][00421] Num frames 4800...
[2025-02-24 14:01:47,233][00421] Num frames 4900...
[2025-02-24 14:01:47,360][00421] Num frames 5000...
[2025-02-24 14:01:47,488][00421] Num frames 5100...
[2025-02-24 14:01:47,612][00421] Num frames 5200...
[2025-02-24 14:01:47,743][00421] Num frames 5300...
[2025-02-24 14:01:47,873][00421] Num frames 5400...
[2025-02-24 14:01:47,999][00421] Num frames 5500...
[2025-02-24 14:01:48,069][00421] Avg episode rewards: #0: 50.039, true rewards: #0: 18.373
[2025-02-24 14:01:48,070][00421] Avg episode reward: 50.039, avg true_objective: 18.373
[2025-02-24 14:01:48,181][00421] Num frames 5600...
[2025-02-24 14:01:48,307][00421] Num frames 5700...
[2025-02-24 14:01:48,431][00421] Num frames 5800...
[2025-02-24 14:01:48,557][00421] Num frames 5900...
[2025-02-24 14:01:48,683][00421] Num frames 6000...
[2025-02-24 14:01:48,815][00421] Num frames 6100...
[2025-02-24 14:01:48,940][00421] Num frames 6200...
[2025-02-24 14:01:49,120][00421] Num frames 6300...
[2025-02-24 14:01:49,302][00421] Num frames 6400...
[2025-02-24 14:01:49,475][00421] Num frames 6500...
[2025-02-24 14:01:49,641][00421] Avg episode rewards: #0: 44.670, true rewards: #0: 16.420
[2025-02-24 14:01:49,644][00421] Avg episode reward: 44.670, avg true_objective: 16.420
[2025-02-24 14:01:49,702][00421] Num frames 6600...
[2025-02-24 14:01:49,866][00421] Num frames 6700...
[2025-02-24 14:01:50,027][00421] Num frames 6800...
[2025-02-24 14:01:50,194][00421] Num frames 6900...
[2025-02-24 14:01:50,373][00421] Num frames 7000...
[2025-02-24 14:01:50,546][00421] Num frames 7100...
[2025-02-24 14:01:50,727][00421] Num frames 7200...
[2025-02-24 14:01:50,893][00421] Num frames 7300...
[2025-02-24 14:01:51,021][00421] Num frames 7400...
[2025-02-24 14:01:51,153][00421] Num frames 7500...
[2025-02-24 14:01:51,283][00421] Num frames 7600...
[2025-02-24 14:01:51,409][00421] Num frames 7700...
[2025-02-24 14:01:51,534][00421] Num frames 7800...
[2025-02-24 14:01:51,661][00421] Num frames 7900...
[2025-02-24 14:01:51,793][00421] Num frames 8000...
[2025-02-24 14:01:51,893][00421] Avg episode rewards: #0: 43.270, true rewards: #0: 16.070
[2025-02-24 14:01:51,894][00421] Avg episode reward: 43.270, avg true_objective: 16.070
[2025-02-24 14:01:51,974][00421] Num frames 8100...
[2025-02-24 14:01:52,099][00421] Num frames 8200...
[2025-02-24 14:01:52,239][00421] Num frames 8300...
[2025-02-24 14:01:52,382][00421] Num frames 8400...
[2025-02-24 14:01:52,512][00421] Num frames 8500...
[2025-02-24 14:01:52,641][00421] Num frames 8600...
[2025-02-24 14:01:52,773][00421] Avg episode rewards: #0: 38.263, true rewards: #0: 14.430
[2025-02-24 14:01:52,774][00421] Avg episode reward: 38.263, avg true_objective: 14.430
[2025-02-24 14:01:52,828][00421] Num frames 8700...
[2025-02-24 14:01:52,951][00421] Num frames 8800...
[2025-02-24 14:01:53,081][00421] Num frames 8900...
[2025-02-24 14:01:53,216][00421] Num frames 9000...
[2025-02-24 14:01:53,343][00421] Num frames 9100...
[2025-02-24 14:01:53,470][00421] Num frames 9200...
[2025-02-24 14:01:53,594][00421] Num frames 9300...
[2025-02-24 14:01:53,727][00421] Num frames 9400...
[2025-02-24 14:01:53,855][00421] Num frames 9500...
[2025-02-24 14:01:53,939][00421] Avg episode rewards: #0: 35.460, true rewards: #0: 13.603
[2025-02-24 14:01:53,939][00421] Avg episode reward: 35.460, avg true_objective: 13.603
[2025-02-24 14:01:54,037][00421] Num frames 9600...
[2025-02-24 14:01:54,162][00421] Num frames 9700...
[2025-02-24 14:01:54,299][00421] Num frames 9800...
[2025-02-24 14:01:54,424][00421] Num frames 9900...
[2025-02-24 14:01:54,550][00421] Num frames 10000...
[2025-02-24 14:01:54,677][00421] Num frames 10100...
[2025-02-24 14:01:54,854][00421] Avg episode rewards: #0: 32.742, true rewards: #0: 12.742
[2025-02-24 14:01:54,854][00421] Avg episode reward: 32.742, avg true_objective: 12.742
[2025-02-24 14:01:54,863][00421] Num frames 10200...
[2025-02-24 14:01:54,987][00421] Num frames 10300...
[2025-02-24 14:01:55,111][00421] Num frames 10400...
[2025-02-24 14:01:55,245][00421] Num frames 10500...
[2025-02-24 14:01:55,373][00421] Num frames 10600...
[2025-02-24 14:01:55,500][00421] Num frames 10700...
[2025-02-24 14:01:55,628][00421] Num frames 10800...
[2025-02-24 14:01:55,761][00421] Num frames 10900...
[2025-02-24 14:01:55,888][00421] Num frames 11000...
[2025-02-24 14:01:56,017][00421] Num frames 11100...
[2025-02-24 14:01:56,145][00421] Num frames 11200...
[2025-02-24 14:01:56,224][00421] Avg episode rewards: #0: 31.687, true rewards: #0: 12.464
[2025-02-24 14:01:56,225][00421] Avg episode reward: 31.687, avg true_objective: 12.464
[2025-02-24 14:01:56,339][00421] Num frames 11300...
[2025-02-24 14:01:56,464][00421] Num frames 11400...
[2025-02-24 14:01:56,590][00421] Num frames 11500...
[2025-02-24 14:01:56,719][00421] Num frames 11600...
[2025-02-24 14:01:56,847][00421] Num frames 11700...
[2025-02-24 14:01:56,976][00421] Num frames 11800...
[2025-02-24 14:01:57,101][00421] Num frames 11900...
[2025-02-24 14:01:57,228][00421] Num frames 12000...
[2025-02-24 14:01:57,362][00421] Num frames 12100...
[2025-02-24 14:01:57,488][00421] Num frames 12200...
[2025-02-24 14:01:57,615][00421] Num frames 12300...
[2025-02-24 14:01:57,744][00421] Num frames 12400...
[2025-02-24 14:01:57,843][00421] Avg episode rewards: #0: 31.034, true rewards: #0: 12.434
[2025-02-24 14:01:57,844][00421] Avg episode reward: 31.034, avg true_objective: 12.434
[2025-02-24 14:03:10,135][00421] Replay video saved to /content/train_dir/default_experiment/replay.mp4!