[2025-08-18 09:33:37,452][910102] Saving configuration to /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/config.json...
[2025-08-18 09:33:37,453][910102] Rollout worker 0 uses device cpu
[2025-08-18 09:33:37,453][910102] Rollout worker 1 uses device cpu
[2025-08-18 09:33:37,453][910102] Rollout worker 2 uses device cpu
[2025-08-18 09:33:37,453][910102] Rollout worker 3 uses device cpu
[2025-08-18 09:33:37,478][910102] InferenceWorker_p0-w0: min num requests: 1
[2025-08-18 09:33:37,486][910102] Starting all processes...
[2025-08-18 09:33:37,486][910102] Starting process learner_proc0
[2025-08-18 09:33:38,432][910102] Starting all processes...
[2025-08-18 09:33:38,435][910102] Starting process inference_proc0-0
[2025-08-18 09:33:38,435][910102] Starting process rollout_proc0
[2025-08-18 09:33:38,435][910239] Starting seed is not provided
[2025-08-18 09:33:38,436][910239] Initializing actor-critic model on device cpu
[2025-08-18 09:33:38,436][910239] RunningMeanStd input shape: (3, 72, 128)
[2025-08-18 09:33:38,436][910239] RunningMeanStd input shape: (1,)
[2025-08-18 09:33:38,435][910102] Starting process rollout_proc1
[2025-08-18 09:33:38,435][910102] Starting process rollout_proc2
[2025-08-18 09:33:38,437][910102] Starting process rollout_proc3
[2025-08-18 09:33:38,443][910239] ConvEncoder: input_channels=3
[2025-08-18 09:33:38,502][910239] Conv encoder output size: 512
[2025-08-18 09:33:38,502][910239] Policy head output size: 512
[2025-08-18 09:33:38,510][910239] Created Actor Critic model with architecture:
[2025-08-18 09:33:38,510][910239] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-08-18 09:33:38,661][910239] Using optimizer
[2025-08-18 09:33:39,368][910239] No checkpoints found
[2025-08-18 09:33:39,368][910239] Did not load from checkpoint, starting from scratch!
[2025-08-18 09:33:39,368][910239] Initialized policy 0 weights for model version 0
[2025-08-18 09:33:39,370][910239] LearnerWorker_p0 finished initialization!
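
The architecture dump above pins down the shapes but not the conv kernels. As a sanity check, the 512-dim encoder output can be reproduced with a minimal PyTorch sketch, assuming Sample Factory's default "convnet_simple" filters [[32, 8, 4], [64, 4, 2], [128, 3, 2]] (the kernel/stride values are not printed in the log):

    import torch
    from torch import nn

    # Assumed conv_head: three Conv2d+ELU pairs, mirroring the printed module tree.
    conv_head = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
        nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
        nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
    )
    with torch.no_grad():
        obs = torch.zeros(1, 3, 72, 128)      # RunningMeanStd input shape from the log
        feat = conv_head(obs).flatten(1)      # (1, 2304) under the assumed filters
    mlp_layers = nn.Sequential(nn.Linear(feat.shape[1], 512), nn.ELU())
    print(mlp_layers(feat).shape)             # torch.Size([1, 512])

Under these assumptions the flattened conv features (128 x 3 x 6 = 2304) pass through a single Linear+ELU down to 512, matching both "Conv encoder output size: 512" and the GRU(512, 512) core that follows.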
[2025-08-18 09:33:39,445][910313] Worker 0 uses CPU cores [0, 1, 2, 3]
[2025-08-18 09:33:39,508][910315] Worker 3 uses CPU cores [12, 13, 14, 15]
[2025-08-18 09:33:39,535][910312] RunningMeanStd input shape: (3, 72, 128)
[2025-08-18 09:33:39,536][910312] RunningMeanStd input shape: (1,)
[2025-08-18 09:33:39,539][910314] Worker 2 uses CPU cores [8, 9, 10, 11]
[2025-08-18 09:33:39,543][910312] ConvEncoder: input_channels=3
[2025-08-18 09:33:39,545][910316] Worker 1 uses CPU cores [4, 5, 6, 7]
[2025-08-18 09:33:39,598][910312] Conv encoder output size: 512
[2025-08-18 09:33:39,598][910312] Policy head output size: 512
[2025-08-18 09:33:39,609][910102] Inference worker 0-0 is ready!
[2025-08-18 09:33:39,609][910102] All inference workers are ready! Signal rollout workers to start!
[2025-08-18 09:33:39,626][910316] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 09:33:39,626][910314] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 09:33:39,626][910313] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 09:33:39,627][910315] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 09:33:39,813][910315] Decorrelating experience for 0 frames...
[2025-08-18 09:33:39,813][910316] Decorrelating experience for 0 frames...
[2025-08-18 09:33:39,960][910315] Decorrelating experience for 32 frames...
[2025-08-18 09:33:39,960][910316] Decorrelating experience for 32 frames...
[2025-08-18 09:33:40,117][910316] Decorrelating experience for 64 frames...
[2025-08-18 09:33:40,117][910315] Decorrelating experience for 64 frames...
[2025-08-18 09:33:40,117][910314] Decorrelating experience for 0 frames...
[2025-08-18 09:33:40,273][910314] Decorrelating experience for 32 frames...
[2025-08-18 09:33:40,275][910313] Decorrelating experience for 0 frames...
[2025-08-18 09:33:40,286][910316] Decorrelating experience for 96 frames...
[2025-08-18 09:33:40,291][910315] Decorrelating experience for 96 frames...
[2025-08-18 09:33:40,430][910314] Decorrelating experience for 64 frames...
[2025-08-18 09:33:40,477][910313] Decorrelating experience for 32 frames...
[2025-08-18 09:33:40,514][910316] Decorrelating experience for 128 frames...
[2025-08-18 09:33:40,517][910315] Decorrelating experience for 128 frames...
[2025-08-18 09:33:40,635][910313] Decorrelating experience for 64 frames...
[2025-08-18 09:33:40,700][910316] Decorrelating experience for 160 frames...
[2025-08-18 09:33:40,705][910314] Decorrelating experience for 96 frames...
[2025-08-18 09:33:40,805][910313] Decorrelating experience for 96 frames...
[2025-08-18 09:33:40,854][910315] Decorrelating experience for 160 frames...
[2025-08-18 09:33:40,934][910314] Decorrelating experience for 128 frames...
[2025-08-18 09:33:40,992][910316] Decorrelating experience for 192 frames...
[2025-08-18 09:33:41,038][910313] Decorrelating experience for 128 frames...
[2025-08-18 09:33:41,045][910315] Decorrelating experience for 192 frames...
[2025-08-18 09:33:41,124][910314] Decorrelating experience for 160 frames...
[2025-08-18 09:33:41,205][910316] Decorrelating experience for 224 frames...
[2025-08-18 09:33:41,237][910313] Decorrelating experience for 160 frames...
[2025-08-18 09:33:41,288][910102] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-08-18 09:33:41,389][910315] Decorrelating experience for 224 frames...
[2025-08-18 09:33:41,420][910314] Decorrelating experience for 192 frames...
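
For context, a run with this configuration (4 CPU rollout workers, CPU learner/inference, 160x120 Doom frames resized to 128x72) can be launched roughly as follows. This is a sketch that assumes the stock sf_examples VizDoom entry point of Sample Factory 2.x and a hypothetical env name; the actual env is not recoverable from this log, which only shows a 5-action policy head:

    import sys
    from sf_examples.vizdoom.train_vizdoom import main  # assumed entry point

    sys.argv[1:] = [
        "--env=doom_health_gathering_supreme",  # hypothetical env name
        "--num_workers=4",                      # matches rollout_proc0..3 above
        "--device=cpu",
        "--train_dir=train_dir",
        "--experiment=default_experiment",
    ]
    main()

The staggered "Decorrelating experience for N frames..." entries that follow are the rollout workers warming up for different numbers of frames so their episodes don't stay in lockstep.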
[2025-08-18 09:33:41,449][910313] Decorrelating experience for 192 frames...
[2025-08-18 09:33:41,632][910314] Decorrelating experience for 224 frames...
[2025-08-18 09:33:41,663][910313] Decorrelating experience for 224 frames...
[2025-08-18 09:33:42,649][910239] Signal inference workers to stop experience collection...
[2025-08-18 09:33:42,657][910312] InferenceWorker_p0-w0: stopping experience collection
[2025-08-18 09:33:43,430][910239] Signal inference workers to resume experience collection...
[2025-08-18 09:33:43,430][910312] InferenceWorker_p0-w0: resuming experience collection
[2025-08-18 09:33:46,288][910102] Fps is (10 sec: 2457.6, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 12288. Throughput: 0: 966.4. Samples: 4832. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
[2025-08-18 09:33:46,288][910102] Avg episode reward: [(0, '3.358')]
[2025-08-18 09:33:51,288][910102] Fps is (10 sec: 2867.2, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 28672. Throughput: 0: 744.8. Samples: 7448. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:33:51,288][910102] Avg episode reward: [(0, '3.768')]
[2025-08-18 09:33:54,252][910312] Updated weights for policy 0, policy_version 10 (0.2412)
[2025-08-18 09:33:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3003.7, 300 sec: 3003.7). Total num frames: 45056. Throughput: 0: 860.5. Samples: 12908. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:33:56,288][910102] Avg episode reward: [(0, '4.393')]
[2025-08-18 09:33:57,473][910102] Heartbeat connected on Batcher_0
[2025-08-18 09:33:57,479][910102] Heartbeat connected on InferenceWorker_p0-w0
[2025-08-18 09:33:57,481][910102] Heartbeat connected on RolloutWorker_w0
[2025-08-18 09:33:57,483][910102] Heartbeat connected on RolloutWorker_w1
[2025-08-18 09:33:57,484][910102] Heartbeat connected on RolloutWorker_w2
[2025-08-18 09:33:57,486][910102] Heartbeat connected on RolloutWorker_w3
[2025-08-18 09:33:58,730][910102] Heartbeat connected on LearnerWorker_p0
[2025-08-18 09:34:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 65536. Throughput: 0: 905.2. Samples: 18104. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:01,288][910102] Avg episode reward: [(0, '4.505')]
[2025-08-18 09:34:06,032][910312] Updated weights for policy 0, policy_version 20 (0.2409)
[2025-08-18 09:34:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 81920. Throughput: 0: 807.8. Samples: 20196. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:06,288][910102] Avg episode reward: [(0, '4.444')]
[2025-08-18 09:34:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 98304. Throughput: 0: 844.1. Samples: 25324. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:11,288][910102] Avg episode reward: [(0, '4.358')]
[2025-08-18 09:34:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 114688. Throughput: 0: 873.6. Samples: 30576. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:16,288][910102] Avg episode reward: [(0, '4.493')]
[2025-08-18 09:34:17,697][910239] Saving new best policy, reward=4.493!
[2025-08-18 09:34:17,700][910312] Updated weights for policy 0, policy_version 30 (0.1939)
[2025-08-18 09:34:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 131072. Throughput: 0: 836.9. Samples: 33476. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:21,288][910102] Avg episode reward: [(0, '4.561')]
[2025-08-18 09:34:22,495][910239] Saving new best policy, reward=4.561!
[2025-08-18 09:34:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3367.8, 300 sec: 3367.8). Total num frames: 151552. Throughput: 0: 857.3. Samples: 38580. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:26,288][910102] Avg episode reward: [(0, '4.596')]
[2025-08-18 09:34:27,282][910239] Saving new best policy, reward=4.596!
[2025-08-18 09:34:29,928][910312] Updated weights for policy 0, policy_version 40 (0.1938)
[2025-08-18 09:34:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3358.7, 300 sec: 3358.7). Total num frames: 167936. Throughput: 0: 864.4. Samples: 43728. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:31,288][910102] Avg episode reward: [(0, '4.591')]
[2025-08-18 09:34:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3351.3, 300 sec: 3351.3). Total num frames: 184320. Throughput: 0: 864.5. Samples: 46352. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:36,288][910102] Avg episode reward: [(0, '4.469')]
[2025-08-18 09:34:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3345.1). Total num frames: 200704. Throughput: 0: 864.7. Samples: 51820. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:34:41,288][910102] Avg episode reward: [(0, '4.357')]
[2025-08-18 09:34:41,626][910312] Updated weights for policy 0, policy_version 50 (0.1907)
[2025-08-18 09:34:42,046][910239] Signal inference workers to stop experience collection... (50 times)
[2025-08-18 09:34:42,060][910312] InferenceWorker_p0-w0: stopping experience collection (50 times)
[2025-08-18 09:34:42,581][910239] Signal inference workers to resume experience collection... (50 times)
[2025-08-18 09:34:42,582][910312] InferenceWorker_p0-w0: resuming experience collection (50 times)
[2025-08-18 09:34:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3402.8). Total num frames: 221184. Throughput: 0: 864.5. Samples: 57008. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:34:46,288][910102] Avg episode reward: [(0, '4.131')]
[2025-08-18 09:34:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3393.8). Total num frames: 237568. Throughput: 0: 865.2. Samples: 59132. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2025-08-18 09:34:51,288][910102] Avg episode reward: [(0, '4.379')]
[2025-08-18 09:34:53,431][910312] Updated weights for policy 0, policy_version 60 (0.2157)
[2025-08-18 09:34:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3386.0). Total num frames: 253952. Throughput: 0: 866.6. Samples: 64320. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2025-08-18 09:34:56,288][910102] Avg episode reward: [(0, '4.464')]
[2025-08-18 09:35:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3430.4). Total num frames: 274432. Throughput: 0: 878.0. Samples: 70088. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2025-08-18 09:35:01,288][910102] Avg episode reward: [(0, '4.625')]
[2025-08-18 09:35:02,317][910239] Saving new best policy, reward=4.625!
[2025-08-18 09:35:04,766][910312] Updated weights for policy 0, policy_version 70 (0.2113)
[2025-08-18 09:35:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3421.4). Total num frames: 290816. Throughput: 0: 866.4. Samples: 72464. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2025-08-18 09:35:06,288][910102] Avg episode reward: [(0, '4.414')]
[2025-08-18 09:35:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3458.8). Total num frames: 311296. Throughput: 0: 888.4. Samples: 78560. Policy #0 lag: (min: 1.0, avg: 1.7, max: 3.0)
[2025-08-18 09:35:11,288][910102] Avg episode reward: [(0, '4.439')]
[2025-08-18 09:35:15,959][910312] Updated weights for policy 0, policy_version 80 (0.2099)
[2025-08-18 09:35:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3449.3). Total num frames: 327680. Throughput: 0: 887.6. Samples: 83668. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:16,288][910102] Avg episode reward: [(0, '4.434')]
[2025-08-18 09:35:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3440.6). Total num frames: 344064. Throughput: 0: 892.9. Samples: 86532. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:21,288][910102] Avg episode reward: [(0, '4.668')]
[2025-08-18 09:35:22,638][910239] Saving new best policy, reward=4.668!
[2025-08-18 09:35:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.8). Total num frames: 364544. Throughput: 0: 890.0. Samples: 91868. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:26,288][910102] Avg episode reward: [(0, '4.497')]
[2025-08-18 09:35:27,520][910312] Updated weights for policy 0, policy_version 90 (0.2124)
[2025-08-18 09:35:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3463.0). Total num frames: 380928. Throughput: 0: 888.7. Samples: 97000. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:31,288][910102] Avg episode reward: [(0, '4.570')]
[2025-08-18 09:35:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3454.9). Total num frames: 397312. Throughput: 0: 903.5. Samples: 99788. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:36,288][910102] Avg episode reward: [(0, '4.728')]
[2025-08-18 09:35:37,760][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000099_405504.pth...
[2025-08-18 09:35:37,780][910239] Saving new best policy, reward=4.728!
[2025-08-18 09:35:39,210][910312] Updated weights for policy 0, policy_version 100 (0.2173)
[2025-08-18 09:35:39,620][910239] Signal inference workers to stop experience collection... (100 times)
[2025-08-18 09:35:39,628][910312] InferenceWorker_p0-w0: stopping experience collection (100 times)
[2025-08-18 09:35:40,125][910239] Signal inference workers to resume experience collection... (100 times)
[2025-08-18 09:35:40,126][910312] InferenceWorker_p0-w0: resuming experience collection (100 times)
[2025-08-18 09:35:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3447.5). Total num frames: 413696. Throughput: 0: 907.4. Samples: 105152. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:41,288][910102] Avg episode reward: [(0, '4.568')]
[2025-08-18 09:35:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3473.4). Total num frames: 434176. Throughput: 0: 893.2. Samples: 110280. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:46,288][910102] Avg episode reward: [(0, '4.571')]
[2025-08-18 09:35:50,843][910312] Updated weights for policy 0, policy_version 110 (0.2405)
[2025-08-18 09:35:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 891.2. Samples: 112568. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:35:51,288][910102] Avg episode reward: [(0, '4.676')]
[2025-08-18 09:35:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 878.0. Samples: 118072. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:35:56,288][910102] Avg episode reward: [(0, '4.736')]
[2025-08-18 09:35:57,658][910239] Saving new best policy, reward=4.736!
[2025-08-18 09:36:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3481.6). Total num frames: 487424. Throughput: 0: 882.8. Samples: 123396. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:01,288][910102] Avg episode reward: [(0, '4.814')]
[2025-08-18 09:36:02,349][910239] Saving new best policy, reward=4.814!
[2025-08-18 09:36:02,352][910312] Updated weights for policy 0, policy_version 120 (0.1696)
[2025-08-18 09:36:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3474.5). Total num frames: 503808. Throughput: 0: 868.3. Samples: 125604. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:06,288][910102] Avg episode reward: [(0, '4.835')]
[2025-08-18 09:36:07,176][910239] Saving new best policy, reward=4.835!
[2025-08-18 09:36:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3467.9). Total num frames: 520192. Throughput: 0: 863.1. Samples: 130708. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:11,288][910102] Avg episode reward: [(0, '4.957')]
[2025-08-18 09:36:11,933][910239] Saving new best policy, reward=4.957!
[2025-08-18 09:36:14,528][910312] Updated weights for policy 0, policy_version 130 (0.1960)
[2025-08-18 09:36:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3461.8). Total num frames: 536576. Throughput: 0: 862.9. Samples: 135832. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:16,288][910102] Avg episode reward: [(0, '4.749')]
[2025-08-18 09:36:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3456.0). Total num frames: 552960. Throughput: 0: 868.5. Samples: 138872. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:21,288][910102] Avg episode reward: [(0, '4.649')]
[2025-08-18 09:36:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3475.4). Total num frames: 573440. Throughput: 0: 863.3. Samples: 144000. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:26,288][910102] Avg episode reward: [(0, '4.627')]
[2025-08-18 09:36:26,345][910312] Updated weights for policy 0, policy_version 140 (0.2402)
[2025-08-18 09:36:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3469.6). Total num frames: 589824. Throughput: 0: 863.4. Samples: 149132. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:31,288][910102] Avg episode reward: [(0, '4.811')]
[2025-08-18 09:36:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3464.0). Total num frames: 606208. Throughput: 0: 872.2. Samples: 151816. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:36,288][910102] Avg episode reward: [(0, '5.123')]
[2025-08-18 09:36:37,875][910239] Saving new best policy, reward=5.123!
[2025-08-18 09:36:37,877][910312] Updated weights for policy 0, policy_version 150 (0.1907)
[2025-08-18 09:36:38,283][910239] Signal inference workers to stop experience collection... (150 times)
[2025-08-18 09:36:38,295][910312] InferenceWorker_p0-w0: stopping experience collection (150 times)
[2025-08-18 09:36:39,099][910239] Signal inference workers to resume experience collection... (150 times)
[2025-08-18 09:36:39,100][910312] InferenceWorker_p0-w0: resuming experience collection (150 times)
[2025-08-18 09:36:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3458.8). Total num frames: 622592. Throughput: 0: 865.4. Samples: 157016. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:41,288][910102] Avg episode reward: [(0, '5.151')]
[2025-08-18 09:36:42,659][910239] Saving new best policy, reward=5.151!
[2025-08-18 09:36:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3476.1). Total num frames: 643072. Throughput: 0: 863.5. Samples: 162252. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:36:46,288][910102] Avg episode reward: [(0, '4.991')]
[2025-08-18 09:36:50,058][910312] Updated weights for policy 0, policy_version 160 (0.1934)
[2025-08-18 09:36:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3470.8). Total num frames: 659456. Throughput: 0: 865.2. Samples: 164536. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:36:51,288][910102] Avg episode reward: [(0, '4.614')]
[2025-08-18 09:36:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3465.8). Total num frames: 675840. Throughput: 0: 866.2. Samples: 169688. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:36:56,288][910102] Avg episode reward: [(0, '4.667')]
[2025-08-18 09:37:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3461.1). Total num frames: 692224. Throughput: 0: 865.6. Samples: 174784. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:37:01,288][910102] Avg episode reward: [(0, '4.822')]
[2025-08-18 09:37:01,798][910312] Updated weights for policy 0, policy_version 170 (0.2388)
[2025-08-18 09:37:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3476.6). Total num frames: 712704. Throughput: 0: 865.0. Samples: 177796. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:37:06,288][910102] Avg episode reward: [(0, '4.838')]
[2025-08-18 09:37:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.8). Total num frames: 729088. Throughput: 0: 865.0. Samples: 182924. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:37:11,288][910102] Avg episode reward: [(0, '4.708')]
[2025-08-18 09:37:13,628][910312] Updated weights for policy 0, policy_version 180 (0.2161)
[2025-08-18 09:37:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3467.3). Total num frames: 745472. Throughput: 0: 864.6. Samples: 188040. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:37:16,288][910102] Avg episode reward: [(0, '4.647')]
[2025-08-18 09:37:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3463.0). Total num frames: 761856. Throughput: 0: 865.1. Samples: 190744. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:37:21,288][910102] Avg episode reward: [(0, '4.954')]
[2025-08-18 09:37:25,441][910312] Updated weights for policy 0, policy_version 190 (0.2176)
[2025-08-18 09:37:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3458.8). Total num frames: 778240. Throughput: 0: 870.2. Samples: 196176. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:37:26,288][910102] Avg episode reward: [(0, '5.228')]
[2025-08-18 09:37:27,566][910239] Saving new best policy, reward=5.228!
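
The periodic "Fps is (...)" entries are windowed averages over the "Total num frames" counter, reported every five seconds. For example, the 10-second figure in the 09:37:26 entry can be recomputed from two frame counts ten seconds apart:

    # Recomputing the 10-second FPS window from two entries above:
    frames_093726, frames_093716 = 778240, 745472  # "Total num frames" at 09:37:26 and 09:37:16
    print((frames_093726 - frames_093716) / 10)    # 3276.8, the "(10 sec: 3276.8, ...)" figure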
[2025-08-18 09:37:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3472.7). Total num frames: 798720. Throughput: 0: 869.0. Samples: 201356. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:37:31,288][910102] Avg episode reward: [(0, '5.356')]
[2025-08-18 09:37:32,366][910239] Saving new best policy, reward=5.356!
[2025-08-18 09:37:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3468.5). Total num frames: 815104. Throughput: 0: 865.0. Samples: 203460. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:37:36,288][910102] Avg episode reward: [(0, '5.211')]
[2025-08-18 09:37:37,160][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000200_819200.pth...
[2025-08-18 09:37:37,162][910312] Updated weights for policy 0, policy_version 200 (0.1701)
[2025-08-18 09:37:37,543][910239] Signal inference workers to stop experience collection... (200 times)
[2025-08-18 09:37:37,556][910312] InferenceWorker_p0-w0: stopping experience collection (200 times)
[2025-08-18 09:37:38,380][910239] Signal inference workers to resume experience collection... (200 times)
[2025-08-18 09:37:38,381][910312] InferenceWorker_p0-w0: resuming experience collection (200 times)
[2025-08-18 09:37:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3464.5). Total num frames: 831488. Throughput: 0: 863.5. Samples: 208544. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:37:41,288][910102] Avg episode reward: [(0, '4.965')]
[2025-08-18 09:37:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3460.7). Total num frames: 847872. Throughput: 0: 863.8. Samples: 213656. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:37:46,288][910102] Avg episode reward: [(0, '5.016')]
[2025-08-18 09:37:49,347][910312] Updated weights for policy 0, policy_version 210 (0.2196)
[2025-08-18 09:37:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.0). Total num frames: 864256. Throughput: 0: 863.8. Samples: 216668. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:37:51,288][910102] Avg episode reward: [(0, '5.280')]
[2025-08-18 09:37:56,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3469.6). Total num frames: 884736. Throughput: 0: 863.4. Samples: 221776. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:37:56,288][910102] Avg episode reward: [(0, '5.230')]
[2025-08-18 09:38:01,088][910312] Updated weights for policy 0, policy_version 220 (0.2117)
[2025-08-18 09:38:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3465.8). Total num frames: 901120. Throughput: 0: 863.2. Samples: 226884. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:38:01,288][910102] Avg episode reward: [(0, '4.967')]
[2025-08-18 09:38:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3462.3). Total num frames: 917504. Throughput: 0: 858.4. Samples: 229372. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:38:06,288][910102] Avg episode reward: [(0, '5.019')]
[2025-08-18 09:38:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3458.8). Total num frames: 933888. Throughput: 0: 850.0. Samples: 234428. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:38:11,288][910102] Avg episode reward: [(0, '5.097')]
[2025-08-18 09:38:13,007][910312] Updated weights for policy 0, policy_version 230 (0.2190)
[2025-08-18 09:38:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3470.4). Total num frames: 954368. Throughput: 0: 855.1. Samples: 239836. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:38:16,288][910102] Avg episode reward: [(0, '5.108')]
[2025-08-18 09:38:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3467.0). Total num frames: 970752. Throughput: 0: 862.7. Samples: 242280. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:38:21,289][910102] Avg episode reward: [(0, '5.098')]
[2025-08-18 09:38:24,709][910312] Updated weights for policy 0, policy_version 240 (0.2411)
[2025-08-18 09:38:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3463.6). Total num frames: 987136. Throughput: 0: 863.6. Samples: 247404. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:38:26,288][910102] Avg episode reward: [(0, '5.032')]
[2025-08-18 09:38:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3460.4). Total num frames: 1003520. Throughput: 0: 863.2. Samples: 252500. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:38:31,288][910102] Avg episode reward: [(0, '4.876')]
[2025-08-18 09:38:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 1019904. Throughput: 0: 863.7. Samples: 255536. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:38:36,288][910102] Avg episode reward: [(0, '5.049')]
[2025-08-18 09:38:36,630][910312] Updated weights for policy 0, policy_version 250 (0.2414)
[2025-08-18 09:38:37,019][910239] Signal inference workers to stop experience collection... (250 times)
[2025-08-18 09:38:37,033][910312] InferenceWorker_p0-w0: stopping experience collection (250 times)
[2025-08-18 09:38:37,591][910239] Signal inference workers to resume experience collection... (250 times)
[2025-08-18 09:38:37,591][910312] InferenceWorker_p0-w0: resuming experience collection (250 times)
[2025-08-18 09:38:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1040384. Throughput: 0: 862.7. Samples: 260596. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:38:41,288][910102] Avg episode reward: [(0, '5.239')]
[2025-08-18 09:38:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1056768. Throughput: 0: 862.8. Samples: 265708. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:38:46,288][910102] Avg episode reward: [(0, '5.073')]
[2025-08-18 09:38:48,532][910312] Updated weights for policy 0, policy_version 260 (0.2439)
[2025-08-18 09:38:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1073152. Throughput: 0: 858.0. Samples: 267980. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:38:51,288][910102] Avg episode reward: [(0, '4.931')]
[2025-08-18 09:38:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1089536. Throughput: 0: 855.6. Samples: 272928. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:38:56,288][910102] Avg episode reward: [(0, '5.072')]
[2025-08-18 09:39:00,558][910312] Updated weights for policy 0, policy_version 270 (0.2410)
[2025-08-18 09:39:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1105920. Throughput: 0: 851.8. Samples: 278168. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:01,288][910102] Avg episode reward: [(0, '5.433')]
[2025-08-18 09:39:02,594][910239] Saving new best policy, reward=5.433!
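
Checkpoint filenames encode the policy version and the total environment frames at save time, and in this run the two are locked in a fixed ratio of 4096 frames per policy version (e.g. 99 x 4096 = 405504 and 200 x 4096 = 819200), consistent with each learner update consuming a 4096-frame batch (an inference from the numbers, not something the log states directly):

    # Filename pattern: checkpoint_{policy_version:09d}_{env_frames}.pth
    for version, frames in [(99, 405504), (200, 819200)]:
        assert version * 4096 == frames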
[2025-08-18 09:39:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1126400. Throughput: 0: 862.4. Samples: 281088. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:06,288][910102] Avg episode reward: [(0, '5.301')]
[2025-08-18 09:39:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1142784. Throughput: 0: 864.2. Samples: 286292. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:11,288][910102] Avg episode reward: [(0, '5.623')]
[2025-08-18 09:39:12,227][910312] Updated weights for policy 0, policy_version 280 (0.1651)
[2025-08-18 09:39:13,141][910239] Saving new best policy, reward=5.623!
[2025-08-18 09:39:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 1159168. Throughput: 0: 863.9. Samples: 291376. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:16,288][910102] Avg episode reward: [(0, '5.388')]
[2025-08-18 09:39:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1179648. Throughput: 0: 863.5. Samples: 294392. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:21,288][910102] Avg episode reward: [(0, '5.503')]
[2025-08-18 09:39:23,654][910312] Updated weights for policy 0, policy_version 290 (0.2072)
[2025-08-18 09:39:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1196032. Throughput: 0: 866.0. Samples: 299564. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:26,288][910102] Avg episode reward: [(0, '5.144')]
[2025-08-18 09:39:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1212416. Throughput: 0: 871.4. Samples: 304920. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:31,288][910102] Avg episode reward: [(0, '5.423')]
[2025-08-18 09:39:35,057][910312] Updated weights for policy 0, policy_version 300 (0.2285)
[2025-08-18 09:39:35,447][910239] Signal inference workers to stop experience collection... (300 times)
[2025-08-18 09:39:35,454][910312] InferenceWorker_p0-w0: stopping experience collection (300 times)
[2025-08-18 09:39:36,003][910239] Signal inference workers to resume experience collection... (300 times)
[2025-08-18 09:39:36,003][910312] InferenceWorker_p0-w0: resuming experience collection (300 times)
[2025-08-18 09:39:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 1232896. Throughput: 0: 883.1. Samples: 307720. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:36,288][910102] Avg episode reward: [(0, '5.754')]
[2025-08-18 09:39:37,182][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth...
[2025-08-18 09:39:37,200][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000099_405504.pth
[2025-08-18 09:39:37,203][910239] Saving new best policy, reward=5.754!
[2025-08-18 09:39:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1249280. Throughput: 0: 887.4. Samples: 312860. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:41,288][910102] Avg episode reward: [(0, '5.776')]
[2025-08-18 09:39:43,140][910239] Saving new best policy, reward=5.776!
[2025-08-18 09:39:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1265664. Throughput: 0: 883.6. Samples: 317932. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:46,288][910102] Avg episode reward: [(0, '5.901')]
[2025-08-18 09:39:46,842][910312] Updated weights for policy 0, policy_version 310 (0.1639)
[2025-08-18 09:39:47,796][910239] Saving new best policy, reward=5.901!
[2025-08-18 09:39:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1282048. Throughput: 0: 887.1. Samples: 321008. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:51,288][910102] Avg episode reward: [(0, '5.626')]
[2025-08-18 09:39:56,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1302528. Throughput: 0: 885.4. Samples: 326136. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:39:56,288][910102] Avg episode reward: [(0, '5.593')]
[2025-08-18 09:39:58,766][910312] Updated weights for policy 0, policy_version 320 (0.1891)
[2025-08-18 09:40:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1318912. Throughput: 0: 886.1. Samples: 331252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:01,288][910102] Avg episode reward: [(0, '5.580')]
[2025-08-18 09:40:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1335296. Throughput: 0: 871.7. Samples: 333620. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:06,288][910102] Avg episode reward: [(0, '5.942')]
[2025-08-18 09:40:07,959][910239] Saving new best policy, reward=5.942!
[2025-08-18 09:40:10,571][910312] Updated weights for policy 0, policy_version 330 (0.1882)
[2025-08-18 09:40:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1351680. Throughput: 0: 875.8. Samples: 338976. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:11,288][910102] Avg episode reward: [(0, '6.026')]
[2025-08-18 09:40:12,675][910239] Saving new best policy, reward=6.026!
[2025-08-18 09:40:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1372160. Throughput: 0: 879.4. Samples: 344492. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:16,288][910102] Avg episode reward: [(0, '6.173')]
[2025-08-18 09:40:17,394][910239] Saving new best policy, reward=6.173!
[2025-08-18 09:40:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1388544. Throughput: 0: 866.1. Samples: 346696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:21,288][910102] Avg episode reward: [(0, '6.145')]
[2025-08-18 09:40:22,393][910312] Updated weights for policy 0, policy_version 340 (0.1889)
[2025-08-18 09:40:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1404928. Throughput: 0: 866.3. Samples: 351844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:26,288][910102] Avg episode reward: [(0, '6.486')]
[2025-08-18 09:40:27,891][910239] Saving new best policy, reward=6.486!
[2025-08-18 09:40:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1421312. Throughput: 0: 870.8. Samples: 357120. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:31,288][910102] Avg episode reward: [(0, '6.510')]
[2025-08-18 09:40:32,647][910239] Saving new best policy, reward=6.510!
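
The .pth files written above are ordinary PyTorch checkpoint files, so they can be inspected offline. A minimal sketch (the key layout inside the file is not shown in this log, so it only lists whatever is there):

    import torch

    # Load a checkpoint saved above onto the CPU and inspect its top-level structure.
    ckpt = torch.load(
        "train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth",
        map_location="cpu",
    )
    print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))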
[2025-08-18 09:40:34,098][910312] Updated weights for policy 0, policy_version 350 (0.1845)
[2025-08-18 09:40:34,494][910239] Signal inference workers to stop experience collection... (350 times)
[2025-08-18 09:40:34,508][910312] InferenceWorker_p0-w0: stopping experience collection (350 times)
[2025-08-18 09:40:35,045][910239] Signal inference workers to resume experience collection... (350 times)
[2025-08-18 09:40:35,046][910312] InferenceWorker_p0-w0: resuming experience collection (350 times)
[2025-08-18 09:40:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1441792. Throughput: 0: 866.1. Samples: 359984. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:36,288][910102] Avg episode reward: [(0, '6.457')]
[2025-08-18 09:40:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1458176. Throughput: 0: 866.8. Samples: 365140. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:40:41,288][910102] Avg episode reward: [(0, '6.226')]
[2025-08-18 09:40:45,709][910312] Updated weights for policy 0, policy_version 360 (0.2356)
[2025-08-18 09:40:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1474560. Throughput: 0: 866.5. Samples: 370244. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:40:46,288][910102] Avg episode reward: [(0, '6.406')]
[2025-08-18 09:40:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1495040. Throughput: 0: 881.3. Samples: 373280. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:51,288][910102] Avg episode reward: [(0, '6.428')]
[2025-08-18 09:40:56,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1511424. Throughput: 0: 875.9. Samples: 378392. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:40:56,288][910102] Avg episode reward: [(0, '6.116')]
[2025-08-18 09:40:57,279][910312] Updated weights for policy 0, policy_version 370 (0.2345)
[2025-08-18 09:41:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1527808. Throughput: 0: 867.2. Samples: 383516. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:01,288][910102] Avg episode reward: [(0, '6.174')]
[2025-08-18 09:41:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1544192. Throughput: 0: 884.1. Samples: 386480. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:06,288][910102] Avg episode reward: [(0, '6.318')]
[2025-08-18 09:41:09,117][910312] Updated weights for policy 0, policy_version 380 (0.2371)
[2025-08-18 09:41:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1564672. Throughput: 0: 885.7. Samples: 391700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:11,288][910102] Avg episode reward: [(0, '5.942')]
[2025-08-18 09:41:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1581056. Throughput: 0: 882.3. Samples: 396824. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:16,288][910102] Avg episode reward: [(0, '6.170')]
[2025-08-18 09:41:20,995][910312] Updated weights for policy 0, policy_version 390 (0.2358)
[2025-08-18 09:41:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1597440. Throughput: 0: 865.5. Samples: 398932. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:21,288][910102] Avg episode reward: [(0, '6.104')]
[2025-08-18 09:41:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1613824. Throughput: 0: 865.7. Samples: 404096. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:41:26,288][910102] Avg episode reward: [(0, '6.440')]
[2025-08-18 09:41:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1630208. Throughput: 0: 872.3. Samples: 409496. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:41:31,288][910102] Avg episode reward: [(0, '6.699')]
[2025-08-18 09:41:32,584][910239] Saving new best policy, reward=6.699!
[2025-08-18 09:41:32,586][910312] Updated weights for policy 0, policy_version 400 (0.2126)
[2025-08-18 09:41:32,954][910239] Signal inference workers to stop experience collection... (400 times)
[2025-08-18 09:41:32,966][910312] InferenceWorker_p0-w0: stopping experience collection (400 times)
[2025-08-18 09:41:33,779][910239] Signal inference workers to resume experience collection... (400 times)
[2025-08-18 09:41:33,779][910312] InferenceWorker_p0-w0: resuming experience collection (400 times)
[2025-08-18 09:41:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1650688. Throughput: 0: 864.6. Samples: 412188. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:36,288][910102] Avg episode reward: [(0, '6.634')]
[2025-08-18 09:41:37,218][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_1654784.pth...
[2025-08-18 09:41:37,236][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000200_819200.pth
[2025-08-18 09:41:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1667072. Throughput: 0: 865.2. Samples: 417324. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:41,288][910102] Avg episode reward: [(0, '7.006')]
[2025-08-18 09:41:42,953][910239] Saving new best policy, reward=7.006!
[2025-08-18 09:41:44,391][910312] Updated weights for policy 0, policy_version 410 (0.1661)
[2025-08-18 09:41:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 1683456. Throughput: 0: 867.3. Samples: 422544. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:46,288][910102] Avg episode reward: [(0, '7.103')]
[2025-08-18 09:41:47,603][910239] Saving new best policy, reward=7.103!
[2025-08-18 09:41:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1703936. Throughput: 0: 867.1. Samples: 425500. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:51,288][910102] Avg episode reward: [(0, '7.322')]
[2025-08-18 09:41:52,190][910239] Saving new best policy, reward=7.322!
[2025-08-18 09:41:55,772][910312] Updated weights for policy 0, policy_version 420 (0.1875)
[2025-08-18 09:41:56,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1720320. Throughput: 0: 864.5. Samples: 430604. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:41:56,288][910102] Avg episode reward: [(0, '7.439')]
[2025-08-18 09:41:57,764][910239] Saving new best policy, reward=7.439!
[2025-08-18 09:42:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1740800. Throughput: 0: 885.4. Samples: 436668. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:01,288][910102] Avg episode reward: [(0, '7.511')]
[2025-08-18 09:42:02,239][910239] Saving new best policy, reward=7.511!
[2025-08-18 09:42:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1757184. Throughput: 0: 886.8. Samples: 438836. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:06,288][910102] Avg episode reward: [(0, '7.666')]
[2025-08-18 09:42:06,862][910239] Saving new best policy, reward=7.666!
[2025-08-18 09:42:06,864][910312] Updated weights for policy 0, policy_version 430 (0.1351)
[2025-08-18 09:42:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1773568. Throughput: 0: 892.0. Samples: 444236. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:11,288][910102] Avg episode reward: [(0, '8.049')]
[2025-08-18 09:42:12,887][910239] Saving new best policy, reward=8.049!
[2025-08-18 09:42:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1789952. Throughput: 0: 879.3. Samples: 449064. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:16,288][910102] Avg episode reward: [(0, '8.620')]
[2025-08-18 09:42:17,729][910239] Saving new best policy, reward=8.620!
[2025-08-18 09:42:19,139][910312] Updated weights for policy 0, policy_version 440 (0.1869)
[2025-08-18 09:42:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1806336. Throughput: 0: 887.0. Samples: 452104. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:21,288][910102] Avg episode reward: [(0, '8.607')]
[2025-08-18 09:42:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1826816. Throughput: 0: 887.3. Samples: 457252. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:26,288][910102] Avg episode reward: [(0, '8.611')]
[2025-08-18 09:42:31,089][910312] Updated weights for policy 0, policy_version 450 (0.2376)
[2025-08-18 09:42:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 1843200. Throughput: 0: 885.0. Samples: 462368. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:31,288][910102] Avg episode reward: [(0, '8.313')]
[2025-08-18 09:42:31,470][910239] Signal inference workers to stop experience collection... (450 times)
[2025-08-18 09:42:31,482][910312] InferenceWorker_p0-w0: stopping experience collection (450 times)
[2025-08-18 09:42:32,046][910239] Signal inference workers to resume experience collection... (450 times)
[2025-08-18 09:42:32,046][910312] InferenceWorker_p0-w0: resuming experience collection (450 times)
[2025-08-18 09:42:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1859584. Throughput: 0: 871.9. Samples: 464736. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:42:36,288][910102] Avg episode reward: [(0, '8.654')]
[2025-08-18 09:42:38,008][910239] Saving new best policy, reward=8.654!
[2025-08-18 09:42:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1875968. Throughput: 0: 872.0. Samples: 469844. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:42:41,288][910102] Avg episode reward: [(0, '8.721')]
[2025-08-18 09:42:42,911][910239] Saving new best policy, reward=8.721!
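
The interleaved "Saving new best policy, reward=...!" entries are simply a running maximum over the reported average episode reward. A toy illustration of the bookkeeping, using reward values from the entries just above:

    # Mirrors which of the rewards above trigger a "new best policy" save.
    best = float("-inf")
    for reward in [8.049, 8.620, 8.607, 8.611, 8.654, 8.721]:
        if reward > best:
            best = reward
            print(f"Saving new best policy, reward={best}!")  # fires for 8.049, 8.620, 8.654, 8.721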
[2025-08-18 09:42:42,913][910312] Updated weights for policy 0, policy_version 460 (0.1657) [2025-08-18 09:42:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 1892352. Throughput: 0: 845.1. Samples: 474696. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-08-18 09:42:46,288][910102] Avg episode reward: [(0, '9.457')] [2025-08-18 09:42:47,828][910239] Saving new best policy, reward=9.457! [2025-08-18 09:42:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1908736. Throughput: 0: 864.8. Samples: 477752. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-08-18 09:42:51,288][910102] Avg episode reward: [(0, '9.439')] [2025-08-18 09:42:55,378][910312] Updated weights for policy 0, policy_version 470 (0.2138) [2025-08-18 09:42:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1925120. Throughput: 0: 859.1. Samples: 482896. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-08-18 09:42:56,288][910102] Avg episode reward: [(0, '9.634')] [2025-08-18 09:42:57,598][910239] Saving new best policy, reward=9.634! [2025-08-18 09:43:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 1941504. Throughput: 0: 858.8. Samples: 487708. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-08-18 09:43:01,288][910102] Avg episode reward: [(0, '9.173')] [2025-08-18 09:43:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 1961984. Throughput: 0: 842.6. Samples: 490020. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:06,288][910102] Avg episode reward: [(0, '9.410')] [2025-08-18 09:43:07,536][910312] Updated weights for policy 0, policy_version 480 (0.2151) [2025-08-18 09:43:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1978368. Throughput: 0: 841.7. Samples: 495128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:11,288][910102] Avg episode reward: [(0, '9.133')] [2025-08-18 09:43:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 1994752. Throughput: 0: 841.6. Samples: 500240. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:16,288][910102] Avg episode reward: [(0, '9.486')] [2025-08-18 09:43:19,633][910312] Updated weights for policy 0, policy_version 490 (0.2375) [2025-08-18 09:43:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2011136. Throughput: 0: 846.8. Samples: 502844. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:21,288][910102] Avg episode reward: [(0, '9.899')] [2025-08-18 09:43:23,013][910239] Saving new best policy, reward=9.899! [2025-08-18 09:43:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 2027520. Throughput: 0: 840.7. Samples: 507676. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:26,288][910102] Avg episode reward: [(0, '10.147')] [2025-08-18 09:43:27,822][910239] Saving new best policy, reward=10.147! [2025-08-18 09:43:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3471.2). Total num frames: 2043904. Throughput: 0: 848.4. Samples: 512876. 
Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:31,288][910102] Avg episode reward: [(0, '9.966')] [2025-08-18 09:43:31,675][910312] Updated weights for policy 0, policy_version 500 (0.1897) [2025-08-18 09:43:32,076][910239] Signal inference workers to stop experience collection... (500 times) [2025-08-18 09:43:32,087][910312] InferenceWorker_p0-w0: stopping experience collection (500 times) [2025-08-18 09:43:32,663][910239] Signal inference workers to resume experience collection... (500 times) [2025-08-18 09:43:32,663][910312] InferenceWorker_p0-w0: resuming experience collection (500 times) [2025-08-18 09:43:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2064384. Throughput: 0: 840.4. Samples: 515572. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:36,288][910102] Avg episode reward: [(0, '9.619')] [2025-08-18 09:43:37,479][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000505_2068480.pth... [2025-08-18 09:43:37,499][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000302_1236992.pth [2025-08-18 09:43:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2080768. Throughput: 0: 840.1. Samples: 520700. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:41,288][910102] Avg episode reward: [(0, '9.759')] [2025-08-18 09:43:43,826][910312] Updated weights for policy 0, policy_version 510 (0.2103) [2025-08-18 09:43:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2097152. Throughput: 0: 847.7. Samples: 525856. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:46,288][910102] Avg episode reward: [(0, '9.219')] [2025-08-18 09:43:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2113536. Throughput: 0: 847.8. Samples: 528172. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:51,288][910102] Avg episode reward: [(0, '10.249')] [2025-08-18 09:43:52,888][910239] Saving new best policy, reward=10.249! [2025-08-18 09:43:55,481][910312] Updated weights for policy 0, policy_version 520 (0.2122) [2025-08-18 09:43:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2129920. Throughput: 0: 860.4. Samples: 533848. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:43:56,288][910102] Avg episode reward: [(0, '9.890')] [2025-08-18 09:44:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2150400. Throughput: 0: 865.3. Samples: 539180. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0) [2025-08-18 09:44:01,288][910102] Avg episode reward: [(0, '10.552')] [2025-08-18 09:44:02,214][910239] Saving new best policy, reward=10.552! [2025-08-18 09:44:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2166784. Throughput: 0: 854.4. Samples: 541292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0) [2025-08-18 09:44:06,288][910102] Avg episode reward: [(0, '11.069')] [2025-08-18 09:44:06,886][910239] Saving new best policy, reward=11.069! [2025-08-18 09:44:06,887][910312] Updated weights for policy 0, policy_version 530 (0.1890) [2025-08-18 09:44:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2183168. Throughput: 0: 868.7. Samples: 546768. 
[2025-08-18 09:44:11,288][910102] Avg episode reward: [(0, '10.027')]
[2025-08-18 09:44:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2203648. Throughput: 0: 881.3. Samples: 552536. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:16,288][910102] Avg episode reward: [(0, '10.352')]
[2025-08-18 09:44:18,376][910312] Updated weights for policy 0, policy_version 540 (0.2070)
[2025-08-18 09:44:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2220032. Throughput: 0: 880.1. Samples: 555176. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:21,288][910102] Avg episode reward: [(0, '10.216')]
[2025-08-18 09:44:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2240512. Throughput: 0: 890.0. Samples: 560752. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:26,288][910102] Avg episode reward: [(0, '10.328')]
[2025-08-18 09:44:29,901][910312] Updated weights for policy 0, policy_version 550 (0.2263)
[2025-08-18 09:44:30,315][910239] Signal inference workers to stop experience collection... (550 times)
[2025-08-18 09:44:30,322][910312] InferenceWorker_p0-w0: stopping experience collection (550 times)
[2025-08-18 09:44:30,839][910239] Signal inference workers to resume experience collection... (550 times)
[2025-08-18 09:44:30,839][910312] InferenceWorker_p0-w0: resuming experience collection (550 times)
[2025-08-18 09:44:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2256896. Throughput: 0: 888.6. Samples: 565844. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:31,288][910102] Avg episode reward: [(0, '10.621')]
[2025-08-18 09:44:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2273280. Throughput: 0: 889.6. Samples: 568204. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:36,288][910102] Avg episode reward: [(0, '9.465')]
[2025-08-18 09:44:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2289664. Throughput: 0: 878.5. Samples: 573380. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:41,288][910102] Avg episode reward: [(0, '9.180')]
[2025-08-18 09:44:41,865][910312] Updated weights for policy 0, policy_version 560 (0.2379)
[2025-08-18 09:44:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2310144. Throughput: 0: 882.6. Samples: 578896. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:46,288][910102] Avg episode reward: [(0, '9.622')]
[2025-08-18 09:44:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2326528. Throughput: 0: 888.2. Samples: 581260. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:51,288][910102] Avg episode reward: [(0, '10.699')]
[2025-08-18 09:44:53,244][910312] Updated weights for policy 0, policy_version 570 (0.2360)
[2025-08-18 09:44:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3471.2). Total num frames: 2342912. Throughput: 0: 893.3. Samples: 586968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:44:56,288][910102] Avg episode reward: [(0, '10.976')]
[2025-08-18 09:45:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2363392. Throughput: 0: 888.4. Samples: 592516. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:45:01,288][910102] Avg episode reward: [(0, '10.847')]
[2025-08-18 09:45:04,459][910312] Updated weights for policy 0, policy_version 580 (0.2332)
[2025-08-18 09:45:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2379776. Throughput: 0: 893.0. Samples: 595360. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:45:06,288][910102] Avg episode reward: [(0, '11.215')]
[2025-08-18 09:45:07,576][910239] Saving new best policy, reward=11.215!
[2025-08-18 09:45:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3618.1, 300 sec: 3485.1). Total num frames: 2400256. Throughput: 0: 887.4. Samples: 600684. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:45:11,288][910102] Avg episode reward: [(0, '11.540')]
[2025-08-18 09:45:12,225][910239] Saving new best policy, reward=11.540!
[2025-08-18 09:45:16,025][910312] Updated weights for policy 0, policy_version 590 (0.1888)
[2025-08-18 09:45:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2416640. Throughput: 0: 888.4. Samples: 605824. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:45:16,288][910102] Avg episode reward: [(0, '12.353')]
[2025-08-18 09:45:16,965][910239] Saving new best policy, reward=12.353!
[2025-08-18 09:45:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2433024. Throughput: 0: 893.2. Samples: 608400. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:45:21,288][910102] Avg episode reward: [(0, '13.062')]
[2025-08-18 09:45:22,899][910239] Saving new best policy, reward=13.062!
[2025-08-18 09:45:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2449408. Throughput: 0: 896.4. Samples: 613720. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:45:26,288][910102] Avg episode reward: [(0, '12.441')]
[2025-08-18 09:45:27,862][910312] Updated weights for policy 0, policy_version 600 (0.1896)
[2025-08-18 09:45:28,222][910239] Signal inference workers to stop experience collection... (600 times)
[2025-08-18 09:45:28,234][910312] InferenceWorker_p0-w0: stopping experience collection (600 times)
[2025-08-18 09:45:28,807][910239] Signal inference workers to resume experience collection... (600 times)
[2025-08-18 09:45:28,807][910312] InferenceWorker_p0-w0: resuming experience collection (600 times)
[2025-08-18 09:45:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2469888. Throughput: 0: 894.8. Samples: 619160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:45:31,288][910102] Avg episode reward: [(0, '12.223')]
[2025-08-18 09:45:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2486272. Throughput: 0: 888.7. Samples: 621252. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:45:36,288][910102] Avg episode reward: [(0, '11.584')]
[2025-08-18 09:45:36,914][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000608_2490368.pth...
[2025-08-18 09:45:36,932][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000404_1654784.pth
[2025-08-18 09:45:39,474][910312] Updated weights for policy 0, policy_version 610 (0.2120)
[2025-08-18 09:45:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2502656. Throughput: 0: 883.3. Samples: 626716. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
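The save/remove pairs above show the checkpoint rotation: filenames encode the policy version and the env-frame count (`checkpoint_000000608_2490368.pth`, where 608 × 4096 = 2,490,368), a new checkpoint is written roughly every two minutes in this run, and the oldest one is deleted so only the newest few survive. A hypothetical sketch of that pattern; the function and the `keep_latest` parameter are illustrative, not Sample Factory's API:

```python
from pathlib import Path

import torch

def save_and_rotate(train_dir: Path, version: int, env_steps: int,
                    state: dict, keep_latest: int = 2):
    """Sketch of the 'Saving .../checkpoint_{version}_{steps}.pth ...
    Removing ...' pattern seen in the log (hypothetical helper)."""
    ckpt_dir = train_dir / "checkpoint_p0"
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    # e.g. version 608 corresponds to 608 * 4096 = 2,490,368 env frames
    path = ckpt_dir / f"checkpoint_{version:09d}_{env_steps}.pth"
    torch.save(state, path)
    # drop everything but the newest few periodic checkpoints
    ckpts = sorted(ckpt_dir.glob("checkpoint_*.pth"))
    for old in ckpts[:-keep_latest]:
        old.unlink()
```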
[2025-08-18 09:45:41,288][910102] Avg episode reward: [(0, '11.408')]
[2025-08-18 09:45:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2523136. Throughput: 0: 888.2. Samples: 632484. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:45:46,288][910102] Avg episode reward: [(0, '11.246')]
[2025-08-18 09:45:51,028][910312] Updated weights for policy 0, policy_version 620 (0.2330)
[2025-08-18 09:45:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2539520. Throughput: 0: 870.8. Samples: 634548. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:45:51,288][910102] Avg episode reward: [(0, '11.193')]
[2025-08-18 09:45:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2555904. Throughput: 0: 868.7. Samples: 639776. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:45:56,288][910102] Avg episode reward: [(0, '11.937')]
[2025-08-18 09:46:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2572288. Throughput: 0: 878.5. Samples: 645356. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:01,288][910102] Avg episode reward: [(0, '13.076')]
[2025-08-18 09:46:02,516][910239] Saving new best policy, reward=13.076!
[2025-08-18 09:46:02,518][910312] Updated weights for policy 0, policy_version 630 (0.2126)
[2025-08-18 09:46:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3485.1). Total num frames: 2592768. Throughput: 0: 875.8. Samples: 647812. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:06,288][910102] Avg episode reward: [(0, '13.659')]
[2025-08-18 09:46:07,244][910239] Saving new best policy, reward=13.659!
[2025-08-18 09:46:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2609152. Throughput: 0: 871.3. Samples: 652928. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:11,288][910102] Avg episode reward: [(0, '13.734')]
[2025-08-18 09:46:11,968][910239] Saving new best policy, reward=13.734!
[2025-08-18 09:46:14,463][910312] Updated weights for policy 0, policy_version 640 (0.1891)
[2025-08-18 09:46:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2625536. Throughput: 0: 864.1. Samples: 658044. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:16,288][910102] Avg episode reward: [(0, '13.303')]
[2025-08-18 09:46:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2646016. Throughput: 0: 885.2. Samples: 661084. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:21,288][910102] Avg episode reward: [(0, '12.883')]
[2025-08-18 09:46:26,202][910312] Updated weights for policy 0, policy_version 650 (0.2357)
[2025-08-18 09:46:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2662400. Throughput: 0: 877.0. Samples: 666180. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:26,288][910102] Avg episode reward: [(0, '13.129')]
[2025-08-18 09:46:26,608][910239] Signal inference workers to stop experience collection... (650 times)
[2025-08-18 09:46:26,618][910312] InferenceWorker_p0-w0: stopping experience collection (650 times)
[2025-08-18 09:46:27,131][910239] Signal inference workers to resume experience collection... (650 times)
[2025-08-18 09:46:27,131][910312] InferenceWorker_p0-w0: resuming experience collection (650 times)
[2025-08-18 09:46:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2678784. Throughput: 0: 863.2. Samples: 671328. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:31,288][910102] Avg episode reward: [(0, '14.035')]
[2025-08-18 09:46:31,963][910239] Saving new best policy, reward=14.035!
[2025-08-18 09:46:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2695168. Throughput: 0: 872.3. Samples: 673800. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:36,288][910102] Avg episode reward: [(0, '13.686')]
[2025-08-18 09:46:38,221][910312] Updated weights for policy 0, policy_version 660 (0.2123)
[2025-08-18 09:46:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2711552. Throughput: 0: 870.9. Samples: 678968. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:41,288][910102] Avg episode reward: [(0, '13.962')]
[2025-08-18 09:46:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 2727936. Throughput: 0: 858.5. Samples: 683988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:46,288][910102] Avg episode reward: [(0, '14.212')]
[2025-08-18 09:46:47,600][910239] Saving new best policy, reward=14.212!
[2025-08-18 09:46:50,159][910312] Updated weights for policy 0, policy_version 670 (0.2076)
[2025-08-18 09:46:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2748416. Throughput: 0: 863.8. Samples: 686684. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:51,288][910102] Avg episode reward: [(0, '13.044')]
[2025-08-18 09:46:56,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2764800. Throughput: 0: 864.5. Samples: 691832. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:46:56,288][910102] Avg episode reward: [(0, '13.605')]
[2025-08-18 09:47:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 2781184. Throughput: 0: 865.2. Samples: 696980. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:01,288][910102] Avg episode reward: [(0, '14.398')]
[2025-08-18 09:47:01,616][910312] Updated weights for policy 0, policy_version 680 (0.2125)
[2025-08-18 09:47:02,793][910239] Saving new best policy, reward=14.398!
[2025-08-18 09:47:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2801664. Throughput: 0: 864.5. Samples: 699988. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:06,288][910102] Avg episode reward: [(0, '13.954')]
[2025-08-18 09:47:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2818048. Throughput: 0: 865.6. Samples: 705132. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:11,288][910102] Avg episode reward: [(0, '14.090')]
[2025-08-18 09:47:13,319][910312] Updated weights for policy 0, policy_version 690 (0.2108)
[2025-08-18 09:47:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2834432. Throughput: 0: 865.4. Samples: 710272. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:16,288][910102] Avg episode reward: [(0, '14.450')]
[2025-08-18 09:47:17,692][910239] Saving new best policy, reward=14.450!
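The stop/resume pairs carry counters that advance in steps of 50 (`(500 times)`, `(550 times)`, `(600 times)`, `(650 times)` above), which suggests the signal actually fires on every training iteration and only every 50th occurrence reaches the log; that reading is an inference from the counters, not something the log states. A sketch of that kind of rate-limited logging, with hypothetical helper names:

```python
import logging
from collections import Counter

log = logging.getLogger("sf")
_counts = Counter()

def log_every_n(key: str, n: int, msg: str):
    """Hypothetical sketch: count every occurrence, but only emit a log line
    on each n-th one, appending the running count like '(500 times)'."""
    _counts[key] += 1
    if _counts[key] % n == 0:
        log.debug("%s (%d times)", msg, _counts[key])

# called once per iteration; prints at occurrences 50, 100, ..., 500, 550, ...
log_every_n("stop_collect", 50,
            "Signal inference workers to stop experience collection...")
```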
[2025-08-18 09:47:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2854912. Throughput: 0: 878.0. Samples: 713308. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:21,288][910102] Avg episode reward: [(0, '14.771')]
[2025-08-18 09:47:22,374][910239] Saving new best policy, reward=14.771!
[2025-08-18 09:47:24,991][910312] Updated weights for policy 0, policy_version 700 (0.1874)
[2025-08-18 09:47:25,379][910239] Signal inference workers to stop experience collection... (700 times)
[2025-08-18 09:47:25,386][910312] InferenceWorker_p0-w0: stopping experience collection (700 times)
[2025-08-18 09:47:25,915][910239] Signal inference workers to resume experience collection... (700 times)
[2025-08-18 09:47:25,916][910312] InferenceWorker_p0-w0: resuming experience collection (700 times)
[2025-08-18 09:47:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2871296. Throughput: 0: 877.3. Samples: 718448. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:26,288][910102] Avg episode reward: [(0, '14.209')]
[2025-08-18 09:47:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2887680. Throughput: 0: 880.6. Samples: 723616. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:31,288][910102] Avg episode reward: [(0, '15.381')]
[2025-08-18 09:47:32,903][910239] Saving new best policy, reward=15.381!
[2025-08-18 09:47:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3485.1). Total num frames: 2904064. Throughput: 0: 888.0. Samples: 726644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:36,288][910102] Avg episode reward: [(0, '15.199')]
[2025-08-18 09:47:36,615][910312] Updated weights for policy 0, policy_version 710 (0.2087)
[2025-08-18 09:47:37,518][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000711_2912256.pth...
[2025-08-18 09:47:37,536][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000505_2068480.pth
[2025-08-18 09:47:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2924544. Throughput: 0: 888.0. Samples: 731792. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:41,288][910102] Avg episode reward: [(0, '14.955')]
[2025-08-18 09:47:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3499.0). Total num frames: 2940928. Throughput: 0: 886.8. Samples: 736888. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:46,288][910102] Avg episode reward: [(0, '15.744')]
[2025-08-18 09:47:46,854][910239] Saving new best policy, reward=15.744!
[2025-08-18 09:47:48,293][910312] Updated weights for policy 0, policy_version 720 (0.1875)
[2025-08-18 09:47:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 2957312. Throughput: 0: 884.0. Samples: 739768. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:51,288][910102] Avg episode reward: [(0, '16.954')]
[2025-08-18 09:47:52,664][910239] Saving new best policy, reward=16.954!
[2025-08-18 09:47:56,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3549.9, 300 sec: 3512.8). Total num frames: 2977792. Throughput: 0: 887.4. Samples: 745064. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:47:56,288][910102] Avg episode reward: [(0, '17.951')]
[2025-08-18 09:47:57,159][910239] Saving new best policy, reward=17.951!
[2025-08-18 09:47:59,828][910312] Updated weights for policy 0, policy_version 730 (0.1609)
[2025-08-18 09:48:01,289][910102] Fps is (10 sec: 3686.1, 60 sec: 3549.8, 300 sec: 3498.9). Total num frames: 2994176. Throughput: 0: 886.4. Samples: 750160. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:01,290][910102] Avg episode reward: [(0, '18.528')]
[2025-08-18 09:48:02,044][910239] Saving new best policy, reward=18.528!
[2025-08-18 09:48:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 3010560. Throughput: 0: 868.8. Samples: 752404. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:06,288][910102] Avg episode reward: [(0, '18.277')]
[2025-08-18 09:48:11,288][910102] Fps is (10 sec: 3277.0, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 3026944. Throughput: 0: 866.4. Samples: 757436. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:11,288][910102] Avg episode reward: [(0, '18.376')]
[2025-08-18 09:48:12,030][910312] Updated weights for policy 0, policy_version 740 (0.2210)
[2025-08-18 09:48:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3499.0). Total num frames: 3043328. Throughput: 0: 864.3. Samples: 762508. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:16,288][910102] Avg episode reward: [(0, '18.960')]
[2025-08-18 09:48:17,865][910239] Saving new best policy, reward=18.960!
[2025-08-18 09:48:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 3059712. Throughput: 0: 864.4. Samples: 765540. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:21,288][910102] Avg episode reward: [(0, '17.771')]
[2025-08-18 09:48:24,309][910312] Updated weights for policy 0, policy_version 750 (0.1949)
[2025-08-18 09:48:24,705][910239] Signal inference workers to stop experience collection... (750 times)
[2025-08-18 09:48:24,714][910312] InferenceWorker_p0-w0: stopping experience collection (750 times)
[2025-08-18 09:48:25,299][910239] Signal inference workers to resume experience collection... (750 times)
[2025-08-18 09:48:25,299][910312] InferenceWorker_p0-w0: resuming experience collection (750 times)
[2025-08-18 09:48:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 3076096. Throughput: 0: 858.9. Samples: 770444. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:26,288][910102] Avg episode reward: [(0, '17.097')]
[2025-08-18 09:48:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 3092480. Throughput: 0: 851.6. Samples: 775208. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:31,288][910102] Avg episode reward: [(0, '16.342')]
[2025-08-18 09:48:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 3108864. Throughput: 0: 847.2. Samples: 777892. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:36,288][910102] Avg episode reward: [(0, '15.611')]
[2025-08-18 09:48:36,713][910312] Updated weights for policy 0, policy_version 760 (0.2415)
[2025-08-18 09:48:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 3125248. Throughput: 0: 841.9. Samples: 782948. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
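The `Saving new best policy, reward=X!` lines are driven purely by the reward statistic: whenever the reported average episode reward beats the previous best (10.249 up to 14.450 over the stretch above), a separate best-policy checkpoint is written in addition to the periodic ones. A minimal sketch of that bookkeeping, with hypothetical names:

```python
class BestPolicyTracker:
    """Hypothetical sketch of the 'Saving new best policy, reward=X!' logic."""

    def __init__(self):
        self.best_reward = float("-inf")

    def maybe_save(self, avg_reward: float, save_fn) -> bool:
        # strictly-better comparison: an equal reward does not re-save
        if avg_reward > self.best_reward:
            self.best_reward = avg_reward
            save_fn()  # e.g. write a separate best_*.pth snapshot
            return True
        return False
```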
[2025-08-18 09:48:41,288][910102] Avg episode reward: [(0, '14.627')]
[2025-08-18 09:48:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 3145728. Throughput: 0: 837.5. Samples: 787848. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:48:46,288][910102] Avg episode reward: [(0, '14.712')]
[2025-08-18 09:48:49,016][910312] Updated weights for policy 0, policy_version 770 (0.2410)
[2025-08-18 09:48:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3499.0). Total num frames: 3162112. Throughput: 0: 839.4. Samples: 790176. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:48:51,288][910102] Avg episode reward: [(0, '15.183')]
[2025-08-18 09:48:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 3178496. Throughput: 0: 841.1. Samples: 795284. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:48:56,288][910102] Avg episode reward: [(0, '16.855')]
[2025-08-18 09:49:00,932][910312] Updated weights for policy 0, policy_version 780 (0.2407)
[2025-08-18 09:49:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 3194880. Throughput: 0: 842.3. Samples: 800412. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:49:01,288][910102] Avg episode reward: [(0, '17.842')]
[2025-08-18 09:49:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3485.1). Total num frames: 3211264. Throughput: 0: 837.9. Samples: 803244. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:49:06,288][910102] Avg episode reward: [(0, '18.369')]
[2025-08-18 09:49:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 3231744. Throughput: 0: 848.5. Samples: 808628. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:49:11,288][910102] Avg episode reward: [(0, '18.283')]
[2025-08-18 09:49:12,653][910312] Updated weights for policy 0, policy_version 790 (0.2155)
[2025-08-18 09:49:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3485.1). Total num frames: 3248128. Throughput: 0: 855.8. Samples: 813720. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:49:16,288][910102] Avg episode reward: [(0, '17.410')]
[2025-08-18 09:49:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3264512. Throughput: 0: 844.3. Samples: 815884. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:49:21,288][910102] Avg episode reward: [(0, '16.863')]
[2025-08-18 09:49:24,584][910312] Updated weights for policy 0, policy_version 800 (0.2403)
[2025-08-18 09:49:24,982][910239] Signal inference workers to stop experience collection... (800 times)
[2025-08-18 09:49:24,996][910312] InferenceWorker_p0-w0: stopping experience collection (800 times)
[2025-08-18 09:49:25,542][910239] Signal inference workers to resume experience collection... (800 times)
[2025-08-18 09:49:25,543][910312] InferenceWorker_p0-w0: resuming experience collection (800 times)
[2025-08-18 09:49:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3280896. Throughput: 0: 844.5. Samples: 820952. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:49:26,288][910102] Avg episode reward: [(0, '15.326')]
[2025-08-18 09:49:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3297280. Throughput: 0: 850.0. Samples: 826100. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:49:31,288][910102] Avg episode reward: [(0, '15.256')]
[2025-08-18 09:49:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3313664. Throughput: 0: 864.7. Samples: 829088. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:49:36,288][910102] Avg episode reward: [(0, '16.770')]
[2025-08-18 09:49:36,585][910312] Updated weights for policy 0, policy_version 810 (0.2387)
[2025-08-18 09:49:37,579][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000811_3321856.pth...
[2025-08-18 09:49:37,600][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000608_2490368.pth
[2025-08-18 09:49:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3471.2). Total num frames: 3334144. Throughput: 0: 865.2. Samples: 834216. Policy #0 lag: (min: 1.0, avg: 1.7, max: 2.0)
[2025-08-18 09:49:41,290][910102] Avg episode reward: [(0, '16.440')]
[2025-08-18 09:49:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3350528. Throughput: 0: 864.4. Samples: 839308. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:49:46,288][910102] Avg episode reward: [(0, '17.462')]
[2025-08-18 09:49:48,881][910312] Updated weights for policy 0, policy_version 820 (0.1898)
[2025-08-18 09:49:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3471.2). Total num frames: 3366912. Throughput: 0: 847.1. Samples: 841364. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:49:51,288][910102] Avg episode reward: [(0, '17.777')]
[2025-08-18 09:49:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3383296. Throughput: 0: 840.3. Samples: 846440. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:49:56,288][910102] Avg episode reward: [(0, '18.375')]
[2025-08-18 09:50:01,276][910312] Updated weights for policy 0, policy_version 830 (0.2418)
[2025-08-18 09:50:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3457.3). Total num frames: 3399680. Throughput: 0: 841.3. Samples: 851580. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:01,288][910102] Avg episode reward: [(0, '18.169')]
[2025-08-18 09:50:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3416064. Throughput: 0: 839.1. Samples: 853644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:06,288][910102] Avg episode reward: [(0, '19.143')]
[2025-08-18 09:50:07,104][910239] Saving new best policy, reward=19.143!
[2025-08-18 09:50:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 3432448. Throughput: 0: 839.6. Samples: 858732. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:11,288][910102] Avg episode reward: [(0, '19.534')]
[2025-08-18 09:50:12,087][910239] Saving new best policy, reward=19.534!
[2025-08-18 09:50:13,515][910312] Updated weights for policy 0, policy_version 840 (0.1922)
[2025-08-18 09:50:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 3448832. Throughput: 0: 839.4. Samples: 863872. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:16,289][910102] Avg episode reward: [(0, '18.310')]
[2025-08-18 09:50:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3443.4). Total num frames: 3465216. Throughput: 0: 833.0. Samples: 866572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:21,288][910102] Avg episode reward: [(0, '18.889')]
[2025-08-18 09:50:25,644][910312] Updated weights for policy 0, policy_version 850 (0.2417)
[2025-08-18 09:50:26,044][910239] Signal inference workers to stop experience collection... (850 times)
[2025-08-18 09:50:26,052][910312] InferenceWorker_p0-w0: stopping experience collection (850 times)
[2025-08-18 09:50:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3481600. Throughput: 0: 828.6. Samples: 871504. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:26,288][910102] Avg episode reward: [(0, '19.677')]
[2025-08-18 09:50:26,627][910239] Signal inference workers to resume experience collection... (850 times)
[2025-08-18 09:50:26,627][910312] InferenceWorker_p0-w0: resuming experience collection (850 times)
[2025-08-18 09:50:27,846][910239] Saving new best policy, reward=19.677!
[2025-08-18 09:50:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3497984. Throughput: 0: 822.1. Samples: 876304. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:31,288][910102] Avg episode reward: [(0, '19.915')]
[2025-08-18 09:50:32,720][910239] Saving new best policy, reward=19.915!
[2025-08-18 09:50:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 3518464. Throughput: 0: 841.0. Samples: 879208. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:36,288][910102] Avg episode reward: [(0, '19.394')]
[2025-08-18 09:50:37,660][910312] Updated weights for policy 0, policy_version 860 (0.1924)
[2025-08-18 09:50:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3534848. Throughput: 0: 841.8. Samples: 884320. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:41,289][910102] Avg episode reward: [(0, '20.373')]
[2025-08-18 09:50:42,257][910239] Saving new best policy, reward=20.373!
[2025-08-18 09:50:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3551232. Throughput: 0: 841.9. Samples: 889464. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:46,288][910102] Avg episode reward: [(0, '20.949')]
[2025-08-18 09:50:47,170][910239] Saving new best policy, reward=20.949!
[2025-08-18 09:50:49,653][910312] Updated weights for policy 0, policy_version 870 (0.1708)
[2025-08-18 09:50:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3567616. Throughput: 0: 841.6. Samples: 891516. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:51,288][910102] Avg episode reward: [(0, '20.621')]
[2025-08-18 09:50:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3429.5). Total num frames: 3584000. Throughput: 0: 842.5. Samples: 896644. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:50:56,288][910102] Avg episode reward: [(0, '20.306')]
[2025-08-18 09:51:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3600384. Throughput: 0: 841.2. Samples: 901724. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:01,288][910102] Avg episode reward: [(0, '19.747')]
[2025-08-18 09:51:01,892][910312] Updated weights for policy 0, policy_version 880 (0.2409)
[2025-08-18 09:51:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3616768. Throughput: 0: 848.6. Samples: 904760. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:06,288][910102] Avg episode reward: [(0, '20.027')]
[2025-08-18 09:51:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3633152. Throughput: 0: 853.7. Samples: 909920. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:11,288][910102] Avg episode reward: [(0, '19.513')]
[2025-08-18 09:51:13,984][910312] Updated weights for policy 0, policy_version 890 (0.2417)
[2025-08-18 09:51:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3653632. Throughput: 0: 859.6. Samples: 914984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:16,288][910102] Avg episode reward: [(0, '18.263')]
[2025-08-18 09:51:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3670016. Throughput: 0: 841.8. Samples: 917088. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:21,288][910102] Avg episode reward: [(0, '17.861')]
[2025-08-18 09:51:26,040][910312] Updated weights for policy 0, policy_version 900 (0.2191)
[2025-08-18 09:51:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3686400. Throughput: 0: 842.3. Samples: 922224. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:26,288][910102] Avg episode reward: [(0, '18.916')]
[2025-08-18 09:51:26,429][910239] Signal inference workers to stop experience collection... (900 times)
[2025-08-18 09:51:26,436][910312] InferenceWorker_p0-w0: stopping experience collection (900 times)
[2025-08-18 09:51:27,011][910239] Signal inference workers to resume experience collection... (900 times)
[2025-08-18 09:51:27,011][910312] InferenceWorker_p0-w0: resuming experience collection (900 times)
[2025-08-18 09:51:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3702784. Throughput: 0: 841.6. Samples: 927336. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:31,288][910102] Avg episode reward: [(0, '18.802')]
[2025-08-18 09:51:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3719168. Throughput: 0: 860.4. Samples: 930236. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:36,288][910102] Avg episode reward: [(0, '18.000')]
[2025-08-18 09:51:37,789][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000910_3727360.pth...
[2025-08-18 09:51:37,790][910312] Updated weights for policy 0, policy_version 910 (0.2178)
[2025-08-18 09:51:37,810][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000711_2912256.pth
[2025-08-18 09:51:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3415.6). Total num frames: 3735552. Throughput: 0: 861.6. Samples: 935416. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:41,288][910102] Avg episode reward: [(0, '18.387')]
[2025-08-18 09:51:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3756032. Throughput: 0: 865.2. Samples: 940660. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:46,288][910102] Avg episode reward: [(0, '18.632')]
[2025-08-18 09:51:50,065][910312] Updated weights for policy 0, policy_version 920 (0.2400)
[2025-08-18 09:51:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3772416. Throughput: 0: 843.8. Samples: 942732. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:51,288][910102] Avg episode reward: [(0, '18.566')]
[2025-08-18 09:51:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 3788800. Throughput: 0: 843.2. Samples: 947864. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:51:56,288][910102] Avg episode reward: [(0, '17.059')]
[2025-08-18 09:52:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 3809280. Throughput: 0: 858.3. Samples: 953608. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:01,289][910102] Avg episode reward: [(0, '18.218')]
[2025-08-18 09:52:01,509][910312] Updated weights for policy 0, policy_version 930 (0.2273)
[2025-08-18 09:52:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 3825664. Throughput: 0: 864.4. Samples: 955984. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:06,288][910102] Avg episode reward: [(0, '18.771')]
[2025-08-18 09:52:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 3842048. Throughput: 0: 865.1. Samples: 961152. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:11,288][910102] Avg episode reward: [(0, '19.056')]
[2025-08-18 09:52:13,272][910312] Updated weights for policy 0, policy_version 940 (0.2023)
[2025-08-18 09:52:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 3858432. Throughput: 0: 865.8. Samples: 966296. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:16,288][910102] Avg episode reward: [(0, '20.124')]
[2025-08-18 09:52:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 3878912. Throughput: 0: 867.9. Samples: 969292. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:21,288][910102] Avg episode reward: [(0, '19.585')]
[2025-08-18 09:52:24,931][910312] Updated weights for policy 0, policy_version 950 (0.2278)
[2025-08-18 09:52:25,357][910239] Signal inference workers to stop experience collection... (950 times)
[2025-08-18 09:52:25,372][910312] InferenceWorker_p0-w0: stopping experience collection (950 times)
[2025-08-18 09:52:25,862][910239] Signal inference workers to resume experience collection... (950 times)
[2025-08-18 09:52:25,862][910312] InferenceWorker_p0-w0: resuming experience collection (950 times)
[2025-08-18 09:52:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 3895296. Throughput: 0: 866.8. Samples: 974420. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:26,288][910102] Avg episode reward: [(0, '20.633')]
[2025-08-18 09:52:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 3911680. Throughput: 0: 864.4. Samples: 979556. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:31,288][910102] Avg episode reward: [(0, '20.506')]
[2025-08-18 09:52:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 3928064. Throughput: 0: 885.3. Samples: 982572. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:36,288][910102] Avg episode reward: [(0, '19.932')]
[2025-08-18 09:52:36,753][910312] Updated weights for policy 0, policy_version 960 (0.2341)
[2025-08-18 09:52:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 3944448. Throughput: 0: 884.9. Samples: 987684. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
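`Policy #0 lag: (min: …, avg: …, max: …)` summarizes, for the data behind each report, how many policy versions behind the learner's newest weights the collecting policy was; with asynchronous collection a lag of one to two versions, as seen throughout this run, is the expected steady state. A worked example of the computation; the function is illustrative, not Sample Factory's implementation:

```python
def lag_stats(current_version: int, rollout_versions: list[int]):
    """Sketch: lag of each rollout batch relative to the learner's newest
    policy version. E.g. current_version=690 with rollouts collected under
    versions 688-689 gives (1, 1.5, 2), matching the log's typical values."""
    lags = [current_version - v for v in rollout_versions]
    return min(lags), sum(lags) / len(lags), max(lags)

print(lag_stats(690, [689, 688, 689, 688]))  # (1, 1.5, 2)
```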
[2025-08-18 09:52:41,288][910102] Avg episode reward: [(0, '19.850')]
[2025-08-18 09:52:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 3964928. Throughput: 0: 870.8. Samples: 992792. Policy #0 lag: (min: 1.0, avg: 1.6, max: 2.0)
[2025-08-18 09:52:46,288][910102] Avg episode reward: [(0, '20.292')]
[2025-08-18 09:52:48,887][910312] Updated weights for policy 0, policy_version 970 (0.2428)
[2025-08-18 09:52:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 3981312. Throughput: 0: 863.4. Samples: 994836. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:52:51,288][910102] Avg episode reward: [(0, '20.459')]
[2025-08-18 09:52:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3401.8). Total num frames: 3997696. Throughput: 0: 862.4. Samples: 999960. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:52:56,288][910102] Avg episode reward: [(0, '20.371')]
[2025-08-18 09:53:00,993][910312] Updated weights for policy 0, policy_version 980 (0.2184)
[2025-08-18 09:53:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4014080. Throughput: 0: 861.6. Samples: 1005068. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:01,288][910102] Avg episode reward: [(0, '20.949')]
[2025-08-18 09:53:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4030464. Throughput: 0: 849.3. Samples: 1007512. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:06,288][910102] Avg episode reward: [(0, '21.614')]
[2025-08-18 09:53:06,932][910239] Saving new best policy, reward=21.614!
[2025-08-18 09:53:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4046848. Throughput: 0: 840.3. Samples: 1012232. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:11,288][910102] Avg episode reward: [(0, '21.119')]
[2025-08-18 09:53:13,362][910312] Updated weights for policy 0, policy_version 990 (0.2194)
[2025-08-18 09:53:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4063232. Throughput: 0: 840.2. Samples: 1017364. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:16,288][910102] Avg episode reward: [(0, '21.363')]
[2025-08-18 09:53:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 4079616. Throughput: 0: 835.0. Samples: 1020148. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:21,288][910102] Avg episode reward: [(0, '21.418')]
[2025-08-18 09:53:25,467][910312] Updated weights for policy 0, policy_version 1000 (0.2343)
[2025-08-18 09:53:25,883][910239] Signal inference workers to stop experience collection... (1000 times)
[2025-08-18 09:53:25,892][910312] InferenceWorker_p0-w0: stopping experience collection (1000 times)
[2025-08-18 09:53:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 4096000. Throughput: 0: 835.3. Samples: 1025272. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:26,288][910102] Avg episode reward: [(0, '20.856')]
[2025-08-18 09:53:26,401][910239] Signal inference workers to resume experience collection... (1000 times)
[2025-08-18 09:53:26,401][910312] InferenceWorker_p0-w0: resuming experience collection (1000 times)
[2025-08-18 09:53:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4116480. Throughput: 0: 840.9. Samples: 1030632. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:31,288][910102] Avg episode reward: [(0, '20.101')]
[2025-08-18 09:53:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4132864. Throughput: 0: 860.1. Samples: 1033540. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:36,288][910102] Avg episode reward: [(0, '20.200')]
[2025-08-18 09:53:36,865][910312] Updated weights for policy 0, policy_version 1010 (0.2032)
[2025-08-18 09:53:37,740][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001011_4141056.pth...
[2025-08-18 09:53:37,760][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000811_3321856.pth
[2025-08-18 09:53:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 4153344. Throughput: 0: 863.6. Samples: 1038820. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:41,288][910102] Avg episode reward: [(0, '20.241')]
[2025-08-18 09:53:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4169728. Throughput: 0: 863.9. Samples: 1043944. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:46,288][910102] Avg episode reward: [(0, '20.913')]
[2025-08-18 09:53:48,477][910312] Updated weights for policy 0, policy_version 1020 (0.1583)
[2025-08-18 09:53:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4186112. Throughput: 0: 863.0. Samples: 1046348. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:51,288][910102] Avg episode reward: [(0, '20.142')]
[2025-08-18 09:53:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4202496. Throughput: 0: 864.9. Samples: 1051152. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:53:56,288][910102] Avg episode reward: [(0, '19.663')]
[2025-08-18 09:54:00,561][910312] Updated weights for policy 0, policy_version 1030 (0.2106)
[2025-08-18 09:54:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4218880. Throughput: 0: 866.4. Samples: 1056352. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:01,288][910102] Avg episode reward: [(0, '19.312')]
[2025-08-18 09:54:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4235264. Throughput: 0: 870.0. Samples: 1059296. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:06,288][910102] Avg episode reward: [(0, '20.454')]
[2025-08-18 09:54:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4251648. Throughput: 0: 870.2. Samples: 1064432. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:11,288][910102] Avg episode reward: [(0, '22.250')]
[2025-08-18 09:54:12,576][910239] Saving new best policy, reward=22.250!
[2025-08-18 09:54:12,579][910312] Updated weights for policy 0, policy_version 1040 (0.2134)
[2025-08-18 09:54:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4268032. Throughput: 0: 857.6. Samples: 1069224. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:16,288][910102] Avg episode reward: [(0, '21.447')]
[2025-08-18 09:54:21,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 4288512. Throughput: 0: 845.1. Samples: 1071568. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:21,288][910102] Avg episode reward: [(0, '20.965')]
[2025-08-18 09:54:25,019][910312] Updated weights for policy 0, policy_version 1050 (0.2385)
[2025-08-18 09:54:25,411][910239] Signal inference workers to stop experience collection... (1050 times)
[2025-08-18 09:54:25,418][910312] InferenceWorker_p0-w0: stopping experience collection (1050 times)
[2025-08-18 09:54:26,007][910239] Signal inference workers to resume experience collection... (1050 times)
[2025-08-18 09:54:26,007][910312] InferenceWorker_p0-w0: resuming experience collection (1050 times)
[2025-08-18 09:54:26,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3415.6). Total num frames: 4304896. Throughput: 0: 842.3. Samples: 1076724. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:26,288][910102] Avg episode reward: [(0, '21.646')]
[2025-08-18 09:54:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4321280. Throughput: 0: 842.1. Samples: 1081840. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:31,288][910102] Avg episode reward: [(0, '21.644')]
[2025-08-18 09:54:36,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3401.8). Total num frames: 4337664. Throughput: 0: 840.4. Samples: 1084168. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:36,288][910102] Avg episode reward: [(0, '21.126')]
[2025-08-18 09:54:37,001][910312] Updated weights for policy 0, policy_version 1060 (0.2131)
[2025-08-18 09:54:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3345.1, 300 sec: 3401.8). Total num frames: 4354048. Throughput: 0: 856.2. Samples: 1089680. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:41,288][910102] Avg episode reward: [(0, '20.095')]
[2025-08-18 09:54:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4374528. Throughput: 0: 861.7. Samples: 1095128. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:46,288][910102] Avg episode reward: [(0, '20.588')]
[2025-08-18 09:54:48,755][910312] Updated weights for policy 0, policy_version 1070 (0.2130)
[2025-08-18 09:54:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4390912. Throughput: 0: 841.4. Samples: 1097160. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:54:51,288][910102] Avg episode reward: [(0, '20.839')]
[2025-08-18 09:54:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4407296. Throughput: 0: 841.5. Samples: 1102300. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:54:56,288][910102] Avg episode reward: [(0, '20.314')]
[2025-08-18 09:55:00,635][910312] Updated weights for policy 0, policy_version 1080 (0.2380)
[2025-08-18 09:55:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4423680. Throughput: 0: 848.4. Samples: 1107400. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:01,288][910102] Avg episode reward: [(0, '19.535')]
[2025-08-18 09:55:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3415.6). Total num frames: 4440064. Throughput: 0: 863.6. Samples: 1110432. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:06,288][910102] Avg episode reward: [(0, '19.566')]
[2025-08-18 09:55:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 4460544. Throughput: 0: 862.7. Samples: 1115544. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:11,288][910102] Avg episode reward: [(0, '18.983')]
[2025-08-18 09:55:12,464][910312] Updated weights for policy 0, policy_version 1090 (0.2361)
[2025-08-18 09:55:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 4476928. Throughput: 0: 862.6. Samples: 1120656. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:16,288][910102] Avg episode reward: [(0, '18.261')]
[2025-08-18 09:55:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4493312. Throughput: 0: 869.8. Samples: 1123308. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:21,288][910102] Avg episode reward: [(0, '19.928')]
[2025-08-18 09:55:24,200][910312] Updated weights for policy 0, policy_version 1100 (0.2116)
[2025-08-18 09:55:24,587][910239] Signal inference workers to stop experience collection... (1100 times)
[2025-08-18 09:55:24,599][910312] InferenceWorker_p0-w0: stopping experience collection (1100 times)
[2025-08-18 09:55:25,147][910239] Signal inference workers to resume experience collection... (1100 times)
[2025-08-18 09:55:25,147][910312] InferenceWorker_p0-w0: resuming experience collection (1100 times)
[2025-08-18 09:55:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4509696. Throughput: 0: 870.1. Samples: 1128836. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:26,288][910102] Avg episode reward: [(0, '19.990')]
[2025-08-18 09:55:31,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 4530176. Throughput: 0: 862.5. Samples: 1133940. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:31,288][910102] Avg episode reward: [(0, '21.418')]
[2025-08-18 09:55:35,885][910312] Updated weights for policy 0, policy_version 1110 (0.2347)
[2025-08-18 09:55:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 4546560. Throughput: 0: 867.4. Samples: 1136192. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:36,288][910102] Avg episode reward: [(0, '21.630')]
[2025-08-18 09:55:37,987][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001112_4554752.pth...
[2025-08-18 09:55:38,006][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000000910_3727360.pth
[2025-08-18 09:55:41,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3429.5). Total num frames: 4562944. Throughput: 0: 872.9. Samples: 1141580. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:41,288][910102] Avg episode reward: [(0, '22.350')]
[2025-08-18 09:55:42,731][910239] Saving new best policy, reward=22.350!
[2025-08-18 09:55:46,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4583424. Throughput: 0: 880.3. Samples: 1147012. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:46,288][910102] Avg episode reward: [(0, '22.882')]
[2025-08-18 09:55:47,387][910239] Saving new best policy, reward=22.882!
[2025-08-18 09:55:47,389][910312] Updated weights for policy 0, policy_version 1120 (0.1658)
[2025-08-18 09:55:51,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4599808. Throughput: 0: 864.0. Samples: 1149312. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
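The checkpoint names make the batch accounting visible: env frames are exactly policy_version × 4096 in this run (1011 × 4096 = 4,141,056; 1112 × 4096 = 4,554,752), so one policy version corresponds to one 4096-frame training batch. The 4096 figure is inferred from those numbers, not stated in the log; the trailing number in parentheses on the `Updated weights` lines is left uninterpreted here. At the reported ~3,450 FPS that is one new version every ~1.2 s, which matches the `Updated weights ... policy_version N` lines arriving every 10 versions, roughly 12 s apart:

```python
# Inferred from the log, not stated in it: env frames = policy_version * 4096.
BATCH_FRAMES = 4096

for version, frames in [(1011, 4141056), (1112, 4554752), (1223, 5009408)]:
    assert version * BATCH_FRAMES == frames

fps = 3447.9                    # run-wide average (see the runner summary below)
print(BATCH_FRAMES / fps)       # ~1.19 s per policy version
print(10 * BATCH_FRAMES / fps)  # ~11.9 s between logged 'Updated weights' lines
```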
[2025-08-18 09:55:51,288][910102] Avg episode reward: [(0, '21.869')]
[2025-08-18 09:55:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4616192. Throughput: 0: 865.2. Samples: 1154476. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:55:56,288][910102] Avg episode reward: [(0, '21.201')]
[2025-08-18 09:55:59,536][910312] Updated weights for policy 0, policy_version 1130 (0.1666)
[2025-08-18 09:56:01,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4632576. Throughput: 0: 864.8. Samples: 1159572. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:01,288][910102] Avg episode reward: [(0, '21.382')]
[2025-08-18 09:56:06,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4648960. Throughput: 0: 873.1. Samples: 1162596. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:06,288][910102] Avg episode reward: [(0, '20.499')]
[2025-08-18 09:56:11,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4669440. Throughput: 0: 864.0. Samples: 1167716. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:11,288][910102] Avg episode reward: [(0, '19.756')]
[2025-08-18 09:56:11,444][910312] Updated weights for policy 0, policy_version 1140 (0.2427)
[2025-08-18 09:56:16,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4685824. Throughput: 0: 864.8. Samples: 1172856. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-08-18 09:56:16,288][910102] Avg episode reward: [(0, '20.761')]
[2025-08-18 09:56:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4702208. Throughput: 0: 862.7. Samples: 1175012. Policy #0 lag: (min: 1.0, avg: 1.3, max: 2.0)
[2025-08-18 09:56:21,288][910102] Avg episode reward: [(0, '19.557')]
[2025-08-18 09:56:23,443][910312] Updated weights for policy 0, policy_version 1150 (0.2416)
[2025-08-18 09:56:23,826][910239] Signal inference workers to stop experience collection... (1150 times)
[2025-08-18 09:56:23,838][910312] InferenceWorker_p0-w0: stopping experience collection (1150 times)
[2025-08-18 09:56:24,409][910239] Signal inference workers to resume experience collection... (1150 times)
[2025-08-18 09:56:24,409][910312] InferenceWorker_p0-w0: resuming experience collection (1150 times)
[2025-08-18 09:56:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4718592. Throughput: 0: 855.0. Samples: 1180056. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:26,288][910102] Avg episode reward: [(0, '19.401')]
[2025-08-18 09:56:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 4734976. Throughput: 0: 850.0. Samples: 1185260. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:31,288][910102] Avg episode reward: [(0, '19.960')]
[2025-08-18 09:56:35,321][910312] Updated weights for policy 0, policy_version 1160 (0.2388)
[2025-08-18 09:56:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3457.3). Total num frames: 4755456. Throughput: 0: 864.4. Samples: 1188212. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:36,288][910102] Avg episode reward: [(0, '19.949')]
[2025-08-18 09:56:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4771840. Throughput: 0: 863.7. Samples: 1193344. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:41,288][910102] Avg episode reward: [(0, '19.392')]
[2025-08-18 09:56:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 4788224. Throughput: 0: 864.6. Samples: 1198480. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:46,288][910102] Avg episode reward: [(0, '19.560')]
[2025-08-18 09:56:47,159][910312] Updated weights for policy 0, policy_version 1170 (0.2402)
[2025-08-18 09:56:51,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 4804608. Throughput: 0: 858.6. Samples: 1201232. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:51,288][910102] Avg episode reward: [(0, '19.338')]
[2025-08-18 09:56:56,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4820992. Throughput: 0: 862.8. Samples: 1206544. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:56:56,288][910102] Avg episode reward: [(0, '20.102')]
[2025-08-18 09:56:59,002][910312] Updated weights for policy 0, policy_version 1180 (0.2411)
[2025-08-18 09:57:01,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4841472. Throughput: 0: 865.9. Samples: 1211820. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:57:01,288][910102] Avg episode reward: [(0, '20.411')]
[2025-08-18 09:57:06,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3481.6, 300 sec: 3443.4). Total num frames: 4857856. Throughput: 0: 864.4. Samples: 1213908. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:57:06,288][910102] Avg episode reward: [(0, '20.303')]
[2025-08-18 09:57:11,036][910312] Updated weights for policy 0, policy_version 1190 (0.2371)
[2025-08-18 09:57:11,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 4874240. Throughput: 0: 866.6. Samples: 1219052. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:57:11,288][910102] Avg episode reward: [(0, '21.208')]
[2025-08-18 09:57:16,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4890624. Throughput: 0: 865.5. Samples: 1224208. Policy #0 lag: (min: 1.0, avg: 1.4, max: 2.0)
[2025-08-18 09:57:16,288][910102] Avg episode reward: [(0, '22.292')]
[2025-08-18 09:57:21,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4907008. Throughput: 0: 865.5. Samples: 1227160. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:57:21,288][910102] Avg episode reward: [(0, '22.455')]
[2025-08-18 09:57:23,212][910312] Updated weights for policy 0, policy_version 1200 (0.2357)
[2025-08-18 09:57:23,605][910239] Signal inference workers to stop experience collection... (1200 times)
[2025-08-18 09:57:23,620][910312] InferenceWorker_p0-w0: stopping experience collection (1200 times)
[2025-08-18 09:57:24,229][910239] Signal inference workers to resume experience collection... (1200 times)
[2025-08-18 09:57:24,230][910312] InferenceWorker_p0-w0: resuming experience collection (1200 times)
[2025-08-18 09:57:26,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4923392. Throughput: 0: 852.2. Samples: 1231692. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:57:26,288][910102] Avg episode reward: [(0, '21.256')]
[2025-08-18 09:57:31,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4939776. Throughput: 0: 854.0. Samples: 1236908. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:57:31,288][910102] Avg episode reward: [(0, '21.290')]
[2025-08-18 09:57:35,040][910312] Updated weights for policy 0, policy_version 1210 (0.2370)
[2025-08-18 09:57:36,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3443.4). Total num frames: 4960256. Throughput: 0: 850.9. Samples: 1239524. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:57:36,288][910102] Avg episode reward: [(0, '22.303')]
[2025-08-18 09:57:37,119][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001212_4964352.pth...
[2025-08-18 09:57:37,138][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001011_4141056.pth
[2025-08-18 09:57:41,288][910102] Fps is (10 sec: 3686.4, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4976640. Throughput: 0: 847.8. Samples: 1244696. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:57:41,288][910102] Avg episode reward: [(0, '22.560')]
[2025-08-18 09:57:46,288][910102] Fps is (10 sec: 3276.8, 60 sec: 3413.3, 300 sec: 3429.5). Total num frames: 4993024. Throughput: 0: 851.7. Samples: 1250148. Policy #0 lag: (min: 1.0, avg: 1.5, max: 2.0)
[2025-08-18 09:57:46,288][910102] Avg episode reward: [(0, '21.263')]
[2025-08-18 09:57:46,603][910312] Updated weights for policy 0, policy_version 1220 (0.2118)
[2025-08-18 09:57:48,704][910239] Stopping Batcher_0...
[2025-08-18 09:57:48,704][910102] Component Batcher_0 stopped!
[2025-08-18 09:57:48,704][910239] Loop batcher_evt_loop terminating...
[2025-08-18 09:57:48,751][910315] Stopping RolloutWorker_w3...
[2025-08-18 09:57:48,751][910314] Stopping RolloutWorker_w2...
[2025-08-18 09:57:48,752][910315] Loop rollout_proc3_evt_loop terminating...
[2025-08-18 09:57:48,752][910102] Component RolloutWorker_w3 stopped!
[2025-08-18 09:57:48,752][910314] Loop rollout_proc2_evt_loop terminating...
[2025-08-18 09:57:48,752][910102] Component RolloutWorker_w2 stopped!
[2025-08-18 09:57:48,752][910313] Stopping RolloutWorker_w0...
[2025-08-18 09:57:48,752][910102] Component RolloutWorker_w0 stopped!
[2025-08-18 09:57:48,753][910313] Loop rollout_proc0_evt_loop terminating...
[2025-08-18 09:57:48,755][910316] Stopping RolloutWorker_w1...
[2025-08-18 09:57:48,755][910102] Component RolloutWorker_w1 stopped!
[2025-08-18 09:57:48,755][910316] Loop rollout_proc1_evt_loop terminating...
[2025-08-18 09:57:48,761][910312] Weights refcount: 2 0
[2025-08-18 09:57:48,762][910312] Stopping InferenceWorker_p0-w0...
[2025-08-18 09:57:48,762][910312] Loop inference_proc0-0_evt_loop terminating...
[2025-08-18 09:57:48,762][910102] Component InferenceWorker_p0-w0 stopped!
[2025-08-18 09:57:49,926][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001223_5009408.pth...
[2025-08-18 09:57:49,944][910239] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001112_4554752.pth
[2025-08-18 09:57:49,947][910239] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001223_5009408.pth...
[2025-08-18 09:57:49,978][910239] Stopping LearnerWorker_p0...
[2025-08-18 09:57:49,978][910102] Component LearnerWorker_p0 stopped!
[2025-08-18 09:57:49,978][910239] Loop learner_proc0_evt_loop terminating...
[2025-08-18 09:57:49,979][910102] Waiting for process learner_proc0 to stop...
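Shutdown proceeds in dependency order: the batcher stops first, then the rollout workers, the inference worker, and finally the learner, which writes `checkpoint_000001223_5009408.pth` twice (once as a periodic save, once as the final snapshot) before its loop terminates. The frame count is consistent with a 5,000,000-frame training budget; that target is an assumption, as the log never prints `train_for_env_steps`:

```python
BATCH_FRAMES = 4096           # inferred batch size (see note above)
TARGET = 5_000_000            # assumed train_for_env_steps; not in the log

print(1223 * BATCH_FRAMES)    # 5009408 -- matches checkpoint_000001223_5009408.pth
print(1220 * BATCH_FRAMES)    # 4997120 -- still below the assumed 5M target
print(1221 * BATCH_FRAMES)    # 5001216 -- first version past it; the run stops
                              # two versions later, plausibly while in-flight
                              # batches drain
```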
[2025-08-18 09:57:50,375][910102] Waiting for process inference_proc0-0 to join...
[2025-08-18 09:57:50,375][910102] Waiting for process rollout_proc0 to join...
[2025-08-18 09:57:50,375][910102] Waiting for process rollout_proc1 to join...
[2025-08-18 09:57:50,375][910102] Waiting for process rollout_proc2 to join...
[2025-08-18 09:57:50,375][910102] Waiting for process rollout_proc3 to join...
[2025-08-18 09:57:50,376][910102] Batcher 0 profile tree view:
batching: 6.4178, releasing_batches: 0.0735
[2025-08-18 09:57:50,376][910102] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0051
  wait_policy_total: 5.8688
update_model: 262.6806
  weight_update: 0.2111
one_step: 0.0147
  handle_policy_step: 491.0925
    deserialize: 6.0094, stack: 0.8847, obs_to_device_normalize: 48.9942, forward: 415.2245, send_messages: 6.0729
    prepare_outputs: 7.8385
      to_cpu: 0.7203
[2025-08-18 09:57:50,376][910102] Learner 0 profile tree view:
misc: 0.0026, prepare_batch: 350.6982
train: 1088.8358
  epoch_init: 0.0030, minibatch_init: 0.0057, losses_postprocess: 0.0468, kl_divergence: 0.1612, after_optimizer: 0.8721
  calculate_losses: 407.9004
    losses_init: 0.0020, forward_head: 357.2708, bptt_initial: 1.5250, tail: 1.0184, advantages_returns: 0.0971, losses: 0.5085
    bptt: 47.3165
      bptt_forward_core: 46.9644
  update: 679.4580
    clip: 2.0378
[2025-08-18 09:57:50,376][910102] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1408, enqueue_policy_requests: 8.8424, env_step: 141.1268, overhead: 7.9617, complete_rollouts: 0.1852
save_policy_outputs: 12.7979
  split_output_tensors: 4.2109
[2025-08-18 09:57:50,376][910102] RolloutWorker_w3 profile tree view:
wait_for_trajectories: 0.1447, enqueue_policy_requests: 8.9843, env_step: 143.3399, overhead: 8.0621, complete_rollouts: 0.1831
save_policy_outputs: 13.0935
  split_output_tensors: 4.3238
[2025-08-18 09:57:50,376][910102] Loop Runner_EvtLoop terminating...
[2025-08-18 09:57:50,376][910102] Runner profile tree view:
main_loop: 1452.8900
[2025-08-18 09:57:50,376][910102] Collected {0: 5009408}, FPS: 3447.9
[2025-08-18 09:57:50,460][910102] Loading existing experiment configuration from /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/config.json
[2025-08-18 09:57:50,460][910102] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-18 09:57:50,460][910102] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'hf_repository'='ArunKr/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-18 09:57:50,460][910102] Using frameskip 1 and render_action_repeat=4 for evaluation
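The "Overriding arg ..." / "Adding new argument ..." lines show the enjoy script layering command-line flags on top of the saved config.json. The evaluation-and-upload run that follows was likely launched along these lines; the flags are taken from the logged arguments, but the entry-point module and the env name (inferred from the Hub repository name) are assumptions, not a verbatim copy of the command used:

python -m sf_examples.vizdoom.enjoy_vizdoom --env=doom_health_gathering_supreme \
    --train_dir=train_dir --experiment=default_experiment --num_workers=1 \
    --no_render --save_video --max_num_frames=100000 --max_num_episodes=10 \
    --push_to_hub --hf_repository=ArunKr/rl_course_vizdoom_health_gathering_supreme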
[2025-08-18 09:57:50,474][910102] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 09:57:50,475][910102] RunningMeanStd input shape: (3, 72, 128)
[2025-08-18 09:57:50,475][910102] RunningMeanStd input shape: (1,)
[2025-08-18 09:57:50,482][910102] ConvEncoder: input_channels=3
[2025-08-18 09:57:50,537][910102] Conv encoder output size: 512
[2025-08-18 09:57:50,537][910102] Policy head output size: 512
[2025-08-18 09:57:50,547][910102] Loading state from checkpoint /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001223_5009408.pth...
[2025-08-18 09:57:50,792][910102] Num frames 100...
[2025-08-18 09:57:50,863][910102] Num frames 200...
[2025-08-18 09:57:50,935][910102] Num frames 300...
[2025-08-18 09:57:51,011][910102] Num frames 400...
[2025-08-18 09:57:51,079][910102] Num frames 500...
[2025-08-18 09:57:51,166][910102] Avg episode rewards: #0: 9.440, true rewards: #0: 5.440
[2025-08-18 09:57:51,167][910102] Avg episode reward: 9.440, avg true_objective: 5.440
[2025-08-18 09:57:51,251][910102] Num frames 600...
[2025-08-18 09:57:51,319][910102] Num frames 700...
[2025-08-18 09:57:51,390][910102] Num frames 800...
[2025-08-18 09:57:51,469][910102] Num frames 900...
[2025-08-18 09:57:51,546][910102] Num frames 1000...
[2025-08-18 09:57:51,614][910102] Num frames 1100...
[2025-08-18 09:57:51,682][910102] Num frames 1200...
[2025-08-18 09:57:51,750][910102] Num frames 1300...
[2025-08-18 09:57:51,812][910102] Avg episode rewards: #0: 12.560, true rewards: #0: 6.560
[2025-08-18 09:57:51,812][910102] Avg episode reward: 12.560, avg true_objective: 6.560
[2025-08-18 09:57:51,912][910102] Num frames 1400...
[2025-08-18 09:57:51,987][910102] Num frames 1500...
[2025-08-18 09:57:52,060][910102] Num frames 1600...
[2025-08-18 09:57:52,175][910102] Avg episode rewards: #0: 9.653, true rewards: #0: 5.653
[2025-08-18 09:57:52,175][910102] Avg episode reward: 9.653, avg true_objective: 5.653
[2025-08-18 09:57:52,187][910102] Num frames 1700...
[2025-08-18 09:57:52,283][910102] Num frames 1800...
[2025-08-18 09:57:52,360][910102] Num frames 1900...
[2025-08-18 09:57:52,430][910102] Num frames 2000...
[2025-08-18 09:57:52,499][910102] Num frames 2100...
[2025-08-18 09:57:52,569][910102] Num frames 2200...
[2025-08-18 09:57:52,657][910102] Avg episode rewards: #0: 9.370, true rewards: #0: 5.620
[2025-08-18 09:57:52,658][910102] Avg episode reward: 9.370, avg true_objective: 5.620
[2025-08-18 09:57:52,734][910102] Num frames 2300...
[2025-08-18 09:57:52,808][910102] Num frames 2400...
[2025-08-18 09:57:52,874][910102] Num frames 2500...
[2025-08-18 09:57:52,948][910102] Num frames 2600...
[2025-08-18 09:57:53,078][910102] Avg episode rewards: #0: 8.992, true rewards: #0: 5.392
[2025-08-18 09:57:53,078][910102] Avg episode reward: 8.992, avg true_objective: 5.392
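The "Avg episode rewards" lines above are running means over the episodes completed so far: 9.440 after one episode, then (9.440 + 15.680) / 2 = 12.560 after two, and so on. A tiny sketch of that cumulative averaging (the per-episode value 15.680 is recovered from the two logged averages):

def running_means(episode_rewards):
    """Cumulative mean after each episode, as printed by the eval loop."""
    means, total = [], 0.0
    for i, r in enumerate(episode_rewards, start=1):
        total += r
        means.append(total / i)
    return means

# first two episodes of the eval run above
rewards = [9.440, 15.680]
assert [round(m, 3) for m in running_means(rewards)] == [9.440, 12.560]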
[2025-08-18 09:57:53,083][910102] Num frames 2700...
[2025-08-18 09:57:53,170][910102] Num frames 2800...
[2025-08-18 09:57:53,244][910102] Num frames 2900...
[2025-08-18 09:57:53,318][910102] Num frames 3000...
[2025-08-18 09:57:53,395][910102] Num frames 3100...
[2025-08-18 09:57:53,471][910102] Num frames 3200...
[2025-08-18 09:57:53,553][910102] Avg episode rewards: #0: 8.900, true rewards: #0: 5.400
[2025-08-18 09:57:53,554][910102] Avg episode reward: 8.900, avg true_objective: 5.400
[2025-08-18 09:57:53,630][910102] Num frames 3300...
[2025-08-18 09:57:53,706][910102] Num frames 3400...
[2025-08-18 09:57:53,779][910102] Num frames 3500...
[2025-08-18 09:57:53,846][910102] Num frames 3600...
[2025-08-18 09:57:53,914][910102] Num frames 3700...
[2025-08-18 09:57:53,980][910102] Num frames 3800...
[2025-08-18 09:57:54,046][910102] Num frames 3900...
[2025-08-18 09:57:54,113][910102] Num frames 4000...
[2025-08-18 09:57:54,204][910102] Avg episode rewards: #0: 10.227, true rewards: #0: 5.799
[2025-08-18 09:57:54,205][910102] Avg episode reward: 10.227, avg true_objective: 5.799
[2025-08-18 09:57:54,274][910102] Num frames 4100...
[2025-08-18 09:57:54,341][910102] Num frames 4200...
[2025-08-18 09:57:54,417][910102] Num frames 4300...
[2025-08-18 09:57:54,491][910102] Num frames 4400...
[2025-08-18 09:57:54,567][910102] Num frames 4500...
[2025-08-18 09:57:54,643][910102] Num frames 4600...
[2025-08-18 09:57:54,769][910102] Avg episode rewards: #0: 10.374, true rewards: #0: 5.874
[2025-08-18 09:57:54,769][910102] Avg episode reward: 10.374, avg true_objective: 5.874
[2025-08-18 09:57:54,771][910102] Num frames 4700...
[2025-08-18 09:57:54,880][910102] Num frames 4800...
[2025-08-18 09:57:54,954][910102] Num frames 4900...
[2025-08-18 09:57:55,023][910102] Num frames 5000...
[2025-08-18 09:57:55,096][910102] Num frames 5100...
[2025-08-18 09:57:55,162][910102] Num frames 5200...
[2025-08-18 09:57:55,229][910102] Num frames 5300...
[2025-08-18 09:57:55,303][910102] Num frames 5400...
[2025-08-18 09:57:55,377][910102] Num frames 5500...
[2025-08-18 09:57:55,453][910102] Num frames 5600...
[2025-08-18 09:57:55,534][910102] Num frames 5700...
[2025-08-18 09:57:55,589][910102] Avg episode rewards: #0: 11.892, true rewards: #0: 6.337
[2025-08-18 09:57:55,590][910102] Avg episode reward: 11.892, avg true_objective: 6.337
[2025-08-18 09:57:55,702][910102] Num frames 5800...
[2025-08-18 09:57:55,775][910102] Num frames 5900...
[2025-08-18 09:57:55,851][910102] Num frames 6000...
[2025-08-18 09:57:55,915][910102] Num frames 6100...
[2025-08-18 09:57:55,981][910102] Num frames 6200...
[2025-08-18 09:57:56,047][910102] Num frames 6300...
[2025-08-18 09:57:56,113][910102] Num frames 6400...
[2025-08-18 09:57:56,188][910102] Num frames 6500...
[2025-08-18 09:57:56,262][910102] Num frames 6600...
[2025-08-18 09:57:56,339][910102] Avg episode rewards: #0: 12.331, true rewards: #0: 6.631
[2025-08-18 09:57:56,339][910102] Avg episode reward: 12.331, avg true_objective: 6.631
[2025-08-18 09:58:04,696][910102] Replay video saved to /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/replay.mp4!
[2025-08-18 09:58:34,312][910102] The model has been pushed to https://huggingface.co/ArunKr/rl_course_vizdoom_health_gathering_supreme
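After the upload, training was relaunched later the same day (the 21:56 entries below). Because train_dir still contains the experiment, Sample Factory resumes from the newest checkpoint instead of starting from scratch, as the subsequent "Loaded experiment state at self.train_step=1223, self.env_steps=5009408" line confirms. A plausible shape for that relaunch, assuming the course's sf_examples entry point; the exact hyperparameter values used here are not recorded in the log:

python -m sf_examples.vizdoom.train_vizdoom --env=doom_health_gathering_supreme \
    --num_workers=4 --train_dir=train_dir --experiment=default_experiment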
[2025-08-18 21:56:42,067][13694] Saving configuration to /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/config.json...
[2025-08-18 21:56:42,068][13694] Rollout worker 0 uses device cpu
[2025-08-18 21:56:42,068][13694] Rollout worker 1 uses device cpu
[2025-08-18 21:56:42,068][13694] Rollout worker 2 uses device cpu
[2025-08-18 21:56:42,068][13694] Rollout worker 3 uses device cpu
[2025-08-18 21:56:42,115][13694] InferenceWorker_p0-w0: min num requests: 1
[2025-08-18 21:56:42,122][13694] Starting all processes...
[2025-08-18 21:56:42,122][13694] Starting process learner_proc0
[2025-08-18 21:56:43,034][13694] Starting all processes...
[2025-08-18 21:56:43,039][13694] Starting process inference_proc0-0
[2025-08-18 21:56:43,039][13694] Starting process rollout_proc0
[2025-08-18 21:56:43,039][14079] Starting seed is not provided
[2025-08-18 21:56:43,039][14079] Initializing actor-critic model on device cpu
[2025-08-18 21:56:43,039][14079] RunningMeanStd input shape: (3, 72, 128)
[2025-08-18 21:56:43,041][14079] RunningMeanStd input shape: (1,)
[2025-08-18 21:56:43,039][13694] Starting process rollout_proc1
[2025-08-18 21:56:43,039][13694] Starting process rollout_proc2
[2025-08-18 21:56:43,041][13694] Starting process rollout_proc3
[2025-08-18 21:56:43,047][14079] ConvEncoder: input_channels=3
[2025-08-18 21:56:43,196][14079] Conv encoder output size: 512
[2025-08-18 21:56:43,196][14079] Policy head output size: 512
[2025-08-18 21:56:43,224][14079] Created Actor Critic model with architecture:
[2025-08-18 21:56:43,225][14079] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-08-18 21:56:43,504][14079] Using optimizer
[2025-08-18 21:56:44,060][14151] Worker 1 uses CPU cores [4, 5, 6, 7]
[2025-08-18 21:56:44,074][14149] Worker 0 uses CPU cores [0, 1, 2, 3]
[2025-08-18 21:56:44,100][14153] Worker 3 uses CPU cores [12, 13, 14, 15]
[2025-08-18 21:56:44,100][14152] Worker 2 uses CPU cores [8, 9, 10, 11]
[2025-08-18 21:56:44,615][14079] Loading state from checkpoint /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001223_5009408.pth...
[2025-08-18 21:56:44,649][14079] Loading model from checkpoint
[2025-08-18 21:56:44,651][14079] Loaded experiment state at self.train_step=1223, self.env_steps=5009408
[2025-08-18 21:56:44,651][14079] Initialized policy 0 weights for model version 1223
[2025-08-18 21:56:44,652][14079] LearnerWorker_p0 finished initialization!
[2025-08-18 21:56:44,653][14150] RunningMeanStd input shape: (3, 72, 128)
[2025-08-18 21:56:44,654][14150] RunningMeanStd input shape: (1,)
[2025-08-18 21:56:44,661][14150] ConvEncoder: input_channels=3
[2025-08-18 21:56:44,713][14150] Conv encoder output size: 512
[2025-08-18 21:56:44,714][14150] Policy head output size: 512
[2025-08-18 21:56:44,723][13694] Inference worker 0-0 is ready!
[2025-08-18 21:56:44,723][13694] All inference workers are ready! Signal rollout workers to start!
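"Loaded experiment state at self.train_step=1223, self.env_steps=5009408" shows the checkpoint carries training progress alongside the model weights, so the counters continue rather than reset. A generic sketch of that pattern; the dict keys below are illustrative, not Sample Factory's exact checkpoint schema:

import torch

def save_training_state(path, model, optimizer, train_step, env_steps):
    torch.save({
        "model": model.state_dict(),
        "optimizer": optimizer.state_dict(),
        "train_step": train_step,  # learner updates so far
        "env_steps": env_steps,    # environment frames consumed so far
    }, path)

def load_training_state(path, model, optimizer):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    optimizer.load_state_dict(ckpt["optimizer"])
    # resume counters, e.g. 1223 / 5009408 in the run above
    return ckpt["train_step"], ckpt["env_steps"]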
[2025-08-18 21:56:44,750][14151] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 21:56:44,750][14153] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 21:56:44,750][14152] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 21:56:44,750][14149] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 21:56:44,924][13694] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 5009408. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-08-18 21:56:45,100][14152] Decorrelating experience for 0 frames...
[2025-08-18 21:56:45,100][14149] Decorrelating experience for 0 frames...
[2025-08-18 21:56:45,100][14151] Decorrelating experience for 0 frames...
[2025-08-18 21:56:45,100][14153] Decorrelating experience for 0 frames...
[2025-08-18 21:56:45,244][14152] Decorrelating experience for 32 frames...
[2025-08-18 21:56:45,246][14153] Decorrelating experience for 32 frames...
[2025-08-18 21:56:45,249][14151] Decorrelating experience for 32 frames...
[2025-08-18 21:56:45,392][14149] Decorrelating experience for 32 frames...
[2025-08-18 21:56:45,396][14152] Decorrelating experience for 64 frames...
[2025-08-18 21:56:45,397][14153] Decorrelating experience for 64 frames...
[2025-08-18 21:56:45,543][14149] Decorrelating experience for 64 frames...
[2025-08-18 21:56:45,550][14151] Decorrelating experience for 64 frames...
[2025-08-18 21:56:45,553][14153] Decorrelating experience for 96 frames...
[2025-08-18 21:56:45,705][14149] Decorrelating experience for 96 frames...
[2025-08-18 21:56:45,710][14152] Decorrelating experience for 96 frames...
[2025-08-18 21:56:45,729][14151] Decorrelating experience for 96 frames...
[2025-08-18 21:56:45,794][14153] Decorrelating experience for 128 frames...
[2025-08-18 21:56:45,931][14152] Decorrelating experience for 128 frames...
[2025-08-18 21:56:45,956][14151] Decorrelating experience for 128 frames...
[2025-08-18 21:56:45,978][14153] Decorrelating experience for 160 frames...
[2025-08-18 21:56:45,987][14149] Decorrelating experience for 128 frames...
[2025-08-18 21:56:46,144][14151] Decorrelating experience for 160 frames...
[2025-08-18 21:56:46,166][14149] Decorrelating experience for 160 frames...
[2025-08-18 21:56:46,170][14152] Decorrelating experience for 160 frames...
[2025-08-18 21:56:46,179][14153] Decorrelating experience for 192 frames...
[2025-08-18 21:56:46,355][14151] Decorrelating experience for 192 frames...
[2025-08-18 21:56:46,356][14152] Decorrelating experience for 192 frames...
[2025-08-18 21:56:46,357][14149] Decorrelating experience for 192 frames...
[2025-08-18 21:56:46,372][14153] Decorrelating experience for 224 frames...
[2025-08-18 21:56:46,568][14152] Decorrelating experience for 224 frames...
[2025-08-18 21:56:46,569][14149] Decorrelating experience for 224 frames...
[2025-08-18 21:56:46,574][14151] Decorrelating experience for 224 frames...
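The "Decorrelating experience for N frames..." entries show each rollout worker stepping its environments through a warm-up phase (0, 32, 64, ... frames) before real collection starts, so the parallel workers do not produce episodes in lock-step. A toy sketch of the idea, not Sample Factory's implementation; make_env() is a hypothetical environment factory:

def decorrelate(env, num_frames):
    """Advance the env `num_frames` steps with random actions before
    collection starts, so parallel workers end up out of phase."""
    env.reset()
    for _ in range(num_frames):
        _, _, terminated, truncated, _ = env.step(env.action_space.sample())
        if terminated or truncated:
            env.reset()

# e.g. worker i of 4 warms up for i * 56 frames (an arbitrary stagger, for illustration)
# for i in range(4):
#     decorrelate(make_env(), i * 56)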
[2025-08-18 21:56:47,577][14079] Signal inference workers to stop experience collection...
[2025-08-18 21:56:47,587][14150] InferenceWorker_p0-w0: stopping experience collection
[2025-08-18 21:56:48,375][14079] Signal inference workers to resume experience collection...
[2025-08-18 21:56:48,375][14079] Stopping Batcher_0...
[2025-08-18 21:56:48,375][14079] Loop batcher_evt_loop terminating...
[2025-08-18 21:56:48,378][13694] Component Batcher_0 stopped!
[2025-08-18 21:56:48,386][14150] Weights refcount: 2 0
[2025-08-18 21:56:48,387][14150] Stopping InferenceWorker_p0-w0...
[2025-08-18 21:56:48,387][14150] Loop inference_proc0-0_evt_loop terminating...
[2025-08-18 21:56:48,387][13694] Component InferenceWorker_p0-w0 stopped!
[2025-08-18 21:56:48,418][14152] Stopping RolloutWorker_w2...
[2025-08-18 21:56:48,418][13694] Component RolloutWorker_w2 stopped!
[2025-08-18 21:56:48,418][14152] Loop rollout_proc2_evt_loop terminating...
[2025-08-18 21:56:48,419][14151] Stopping RolloutWorker_w1...
[2025-08-18 21:56:48,419][14153] Stopping RolloutWorker_w3...
[2025-08-18 21:56:48,419][13694] Component RolloutWorker_w1 stopped!
[2025-08-18 21:56:48,419][14153] Loop rollout_proc3_evt_loop terminating...
[2025-08-18 21:56:48,419][14151] Loop rollout_proc1_evt_loop terminating...
[2025-08-18 21:56:48,419][13694] Component RolloutWorker_w3 stopped!
[2025-08-18 21:56:48,421][14149] Stopping RolloutWorker_w0...
[2025-08-18 21:56:48,421][13694] Component RolloutWorker_w0 stopped!
[2025-08-18 21:56:48,421][14149] Loop rollout_proc0_evt_loop terminating...
[2025-08-18 21:56:49,536][14079] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001225_5017600.pth...
[2025-08-18 21:56:49,555][14079] Removing /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001212_4964352.pth
[2025-08-18 21:56:49,556][14079] Saving /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001225_5017600.pth...
[2025-08-18 21:56:49,583][14079] Stopping LearnerWorker_p0...
[2025-08-18 21:56:49,583][13694] Component LearnerWorker_p0 stopped!
[2025-08-18 21:56:49,583][14079] Loop learner_proc0_evt_loop terminating...
[2025-08-18 21:56:49,583][13694] Waiting for process learner_proc0 to stop...
[2025-08-18 21:56:49,938][13694] Waiting for process inference_proc0-0 to join...
[2025-08-18 21:56:49,938][13694] Waiting for process rollout_proc0 to join...
[2025-08-18 21:56:49,938][13694] Waiting for process rollout_proc1 to join...
[2025-08-18 21:56:49,938][13694] Waiting for process rollout_proc2 to join...
[2025-08-18 21:56:49,938][13694] Waiting for process rollout_proc3 to join...
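The shutdown follows a consistent pattern: each component is signalled to stop, its event loop terminates, the runner logs "Component ... stopped!", and the parent finally waits for every child process to join. A bare-bones sketch of the same signal-then-join pattern with Python multiprocessing (not Sample Factory's event-loop machinery):

import multiprocessing as mp

def worker(stop_event, name):
    # stand-in for a rollout/inference event loop
    while not stop_event.wait(timeout=0.01):
        pass  # do work between stop checks
    print(f"Component {name} stopped!")

if __name__ == "__main__":
    stop = mp.Event()
    procs = [mp.Process(target=worker, args=(stop, f"RolloutWorker_w{i}"))
             for i in range(4)]
    for p in procs:
        p.start()
    stop.set()        # signal every component to stop...
    for p in procs:
        p.join()      # ...then wait for each process to join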
[2025-08-18 21:56:49,938][13694] Batcher 0 profile tree view:
batching: 0.0102, releasing_batches: 0.0004
[2025-08-18 21:56:49,938][13694] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0035
wait_policy: 0.0000
  wait_policy_total: 1.7177
one_step: 0.0056
  handle_policy_step: 1.1209
    deserialize: 0.0205, stack: 0.0019, obs_to_device_normalize: 0.1733, forward: 0.8678, send_messages: 0.0114
    prepare_outputs: 0.0171
      to_cpu: 0.0016
[2025-08-18 21:56:49,938][13694] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 0.6942
train: 1.7142
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0006, kl_divergence: 0.0003, after_optimizer: 0.0029
  calculate_losses: 0.6491
    losses_init: 0.0000, forward_head: 0.5651, bptt_initial: 0.0093, tail: 0.0031, advantages_returns: 0.0010, losses: 0.0016
    bptt: 0.0688
      bptt_forward_core: 0.0683
  update: 1.0608
    clip: 0.0068
[2025-08-18 21:56:49,939][13694] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0004, enqueue_policy_requests: 0.0149, env_step: 0.1893, overhead: 0.0098, complete_rollouts: 0.0003
save_policy_outputs: 0.0162
  split_output_tensors: 0.0053
[2025-08-18 21:56:49,939][13694] RolloutWorker_w3 profile tree view:
wait_for_trajectories: 0.0005, enqueue_policy_requests: 0.0230, env_step: 0.3083, overhead: 0.0152, complete_rollouts: 0.0004
save_policy_outputs: 0.0238
  split_output_tensors: 0.0077
[2025-08-18 21:56:49,939][13694] Loop Runner_EvtLoop terminating...
[2025-08-18 21:56:49,939][13694] Runner profile tree view:
main_loop: 7.8168
[2025-08-18 21:56:49,939][13694] Collected {0: 5017600}, FPS: 1048.0
[2025-08-18 21:56:50,090][13694] Loading existing experiment configuration from /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/config.json
[2025-08-18 21:56:50,090][13694] Overriding arg 'num_workers' with value 1 passed from command line
[2025-08-18 21:56:50,090][13694] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'hf_repository'='ArunKr/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-08-18 21:56:50,090][13694] Using frameskip 1 and render_action_repeat=4 for evaluation
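"Using frameskip 1 and render_action_repeat=4 for evaluation" means the env is stepped frame-by-frame so every frame is available for the replay video, while each policy action is still repeated 4 times to match the frameskip the policy was trained with. A minimal action-repeat wrapper in the spirit of that setup (illustrative, not Sample Factory's wrapper; uses the Gymnasium API):

import gymnasium as gym

class ActionRepeat(gym.Wrapper):
    """Repeat each policy action `repeat` times, accumulating reward,
    while the underlying env still advances one frame at a time."""

    def __init__(self, env, repeat=4):
        super().__init__(env)
        self.repeat = repeat

    def step(self, action):
        total_reward, terminated, truncated = 0.0, False, False
        for _ in range(self.repeat):
            obs, reward, terminated, truncated, info = self.env.step(action)
            total_reward += reward
            if terminated or truncated:
                break
        return obs, total_reward, terminated, truncated, info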
[2025-08-18 21:56:50,103][13694] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-08-18 21:56:50,104][13694] RunningMeanStd input shape: (3, 72, 128)
[2025-08-18 21:56:50,104][13694] RunningMeanStd input shape: (1,)
[2025-08-18 21:56:50,111][13694] ConvEncoder: input_channels=3
[2025-08-18 21:56:50,164][13694] Conv encoder output size: 512
[2025-08-18 21:56:50,164][13694] Policy head output size: 512
[2025-08-18 21:56:50,172][13694] Loading state from checkpoint /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/checkpoint_p0/checkpoint_000001225_5017600.pth...
[2025-08-18 21:56:50,404][13694] Num frames 100...
[2025-08-18 21:56:50,475][13694] Num frames 200...
[2025-08-18 21:56:50,546][13694] Num frames 300...
[2025-08-18 21:56:50,617][13694] Num frames 400...
[2025-08-18 21:56:50,684][13694] Num frames 500...
[2025-08-18 21:56:50,754][13694] Num frames 600...
[2025-08-18 21:56:50,826][13694] Num frames 700...
[2025-08-18 21:56:50,900][13694] Num frames 800...
[2025-08-18 21:56:50,977][13694] Num frames 900...
[2025-08-18 21:56:51,051][13694] Num frames 1000...
[2025-08-18 21:56:51,125][13694] Num frames 1100...
[2025-08-18 21:56:51,193][13694] Num frames 1200...
[2025-08-18 21:56:51,259][13694] Num frames 1300...
[2025-08-18 21:56:51,332][13694] Num frames 1400...
[2025-08-18 21:56:51,405][13694] Num frames 1500...
[2025-08-18 21:56:51,479][13694] Num frames 1600...
[2025-08-18 21:56:51,598][13694] Avg episode rewards: #0: 39.959, true rewards: #0: 16.960
[2025-08-18 21:56:51,599][13694] Avg episode reward: 39.959, avg true_objective: 16.960
[2025-08-18 21:56:51,612][13694] Num frames 1700...
[2025-08-18 21:56:51,709][13694] Num frames 1800...
[2025-08-18 21:56:51,774][13694] Num frames 1900...
[2025-08-18 21:56:51,841][13694] Num frames 2000...
[2025-08-18 21:56:51,920][13694] Num frames 2100...
[2025-08-18 21:56:51,993][13694] Num frames 2200...
[2025-08-18 21:56:52,066][13694] Num frames 2300...
[2025-08-18 21:56:52,133][13694] Num frames 2400...
[2025-08-18 21:56:52,197][13694] Num frames 2500...
[2025-08-18 21:56:52,265][13694] Num frames 2600...
[2025-08-18 21:56:52,337][13694] Num frames 2700...
[2025-08-18 21:56:52,425][13694] Avg episode rewards: #0: 34.275, true rewards: #0: 13.775
[2025-08-18 21:56:52,426][13694] Avg episode reward: 34.275, avg true_objective: 13.775
[2025-08-18 21:56:52,497][13694] Num frames 2800...
[2025-08-18 21:56:52,569][13694] Num frames 2900...
[2025-08-18 21:56:52,637][13694] Num frames 3000...
[2025-08-18 21:56:52,703][13694] Num frames 3100...
[2025-08-18 21:56:52,771][13694] Num frames 3200...
[2025-08-18 21:56:52,837][13694] Num frames 3300...
[2025-08-18 21:56:52,901][13694] Num frames 3400...
[2025-08-18 21:56:52,976][13694] Num frames 3500...
[2025-08-18 21:56:53,071][13694] Avg episode rewards: #0: 28.850, true rewards: #0: 11.850
[2025-08-18 21:56:53,071][13694] Avg episode reward: 28.850, avg true_objective: 11.850
[2025-08-18 21:56:53,143][13694] Num frames 3600...
[2025-08-18 21:56:53,215][13694] Num frames 3700...
[2025-08-18 21:56:53,278][13694] Num frames 3800...
[2025-08-18 21:56:53,354][13694] Num frames 3900...
[2025-08-18 21:56:53,430][13694] Num frames 4000...
[2025-08-18 21:56:53,504][13694] Num frames 4100...
[2025-08-18 21:56:53,574][13694] Num frames 4200...
[2025-08-18 21:56:53,642][13694] Num frames 4300...
[2025-08-18 21:56:53,711][13694] Num frames 4400...
[2025-08-18 21:56:53,826][13694] Avg episode rewards: #0: 26.457, true rewards: #0: 11.207
[2025-08-18 21:56:53,827][13694] Avg episode reward: 26.457, avg true_objective: 11.207
[2025-08-18 21:56:53,869][13694] Num frames 4500...
[2025-08-18 21:56:53,950][13694] Num frames 4600...
[2025-08-18 21:56:54,022][13694] Num frames 4700...
[2025-08-18 21:56:54,101][13694] Num frames 4800...
[2025-08-18 21:56:54,173][13694] Num frames 4900...
[2025-08-18 21:56:54,241][13694] Num frames 5000...
[2025-08-18 21:56:54,352][13694] Avg episode rewards: #0: 24.182, true rewards: #0: 10.182
[2025-08-18 21:56:54,352][13694] Avg episode reward: 24.182, avg true_objective: 10.182
[2025-08-18 21:56:54,378][13694] Num frames 5100...
[2025-08-18 21:56:54,475][13694] Num frames 5200...
[2025-08-18 21:56:54,548][13694] Num frames 5300...
[2025-08-18 21:56:54,620][13694] Num frames 5400...
[2025-08-18 21:56:54,685][13694] Num frames 5500...
[2025-08-18 21:56:54,756][13694] Num frames 5600...
[2025-08-18 21:56:54,841][13694] Avg episode rewards: #0: 21.915, true rewards: #0: 9.415
[2025-08-18 21:56:54,842][13694] Avg episode reward: 21.915, avg true_objective: 9.415
[2025-08-18 21:56:54,921][13694] Num frames 5700...
[2025-08-18 21:56:54,994][13694] Num frames 5800...
[2025-08-18 21:56:55,068][13694] Num frames 5900...
[2025-08-18 21:56:55,139][13694] Num frames 6000...
[2025-08-18 21:56:55,208][13694] Num frames 6100...
[2025-08-18 21:56:55,270][13694] Num frames 6200...
[2025-08-18 21:56:55,337][13694] Num frames 6300...
[2025-08-18 21:56:55,404][13694] Num frames 6400...
[2025-08-18 21:56:55,470][13694] Num frames 6500...
[2025-08-18 21:56:55,534][13694] Num frames 6600...
[2025-08-18 21:56:55,600][13694] Num frames 6700...
[2025-08-18 21:56:55,670][13694] Num frames 6800...
[2025-08-18 21:56:55,738][13694] Num frames 6900...
[2025-08-18 21:56:55,804][13694] Num frames 7000...
[2025-08-18 21:56:55,877][13694] Num frames 7100...
[2025-08-18 21:56:55,949][13694] Num frames 7200...
[2025-08-18 21:56:56,015][13694] Num frames 7300...
[2025-08-18 21:56:56,104][13694] Avg episode rewards: #0: 25.636, true rewards: #0: 10.493
[2025-08-18 21:56:56,104][13694] Avg episode reward: 25.636, avg true_objective: 10.493
[2025-08-18 21:56:56,145][13694] Num frames 7400...
[2025-08-18 21:56:56,212][13694] Num frames 7500...
[2025-08-18 21:56:56,278][13694] Num frames 7600...
[2025-08-18 21:56:56,340][13694] Num frames 7700...
[2025-08-18 21:56:56,404][13694] Num frames 7800...
[2025-08-18 21:56:56,519][13694] Avg episode rewards: #0: 23.361, true rewards: #0: 9.861
[2025-08-18 21:56:56,520][13694] Avg episode reward: 23.361, avg true_objective: 9.861
[2025-08-18 21:56:56,531][13694] Num frames 7900...
[2025-08-18 21:56:56,597][13694] Num frames 8000...
[2025-08-18 21:56:56,664][13694] Num frames 8100...
[2025-08-18 21:56:56,734][13694] Num frames 8200...
[2025-08-18 21:56:56,797][13694] Num frames 8300...
[2025-08-18 21:56:56,863][13694] Num frames 8400...
[2025-08-18 21:56:56,936][13694] Num frames 8500...
[2025-08-18 21:56:57,008][13694] Num frames 8600...
[2025-08-18 21:56:57,073][13694] Num frames 8700...
[2025-08-18 21:56:57,147][13694] Num frames 8800...
[2025-08-18 21:56:57,218][13694] Num frames 8900...
[2025-08-18 21:56:57,282][13694] Num frames 9000...
[2025-08-18 21:56:57,357][13694] Num frames 9100...
[2025-08-18 21:56:57,431][13694] Num frames 9200...
[2025-08-18 21:56:57,504][13694] Num frames 9300...
[2025-08-18 21:56:57,578][13694] Num frames 9400...
[2025-08-18 21:56:57,646][13694] Num frames 9500...
[2025-08-18 21:56:57,710][13694] Num frames 9600...
[2025-08-18 21:56:57,806][13694] Avg episode rewards: #0: 25.739, true rewards: #0: 10.739
[2025-08-18 21:56:57,807][13694] Avg episode reward: 25.739, avg true_objective: 10.739
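Throughout the eval output, "Avg episode rewards" (the shaped reward the agent was trained on) is reported next to "true rewards" / "true_objective" (the raw environment score). A wrapper can track both side by side, along the lines of the sketch below; it is illustrative, and the assumption that the true objective is read from the env's info dict under a "true_objective" key is this example's convention, not a documented contract:

import gymnasium as gym

class TrueObjectiveTracker(gym.Wrapper):
    """Accumulate shaped reward and the raw 'true objective' separately,
    mirroring the paired 'Avg episode rewards' / 'true rewards' log lines."""

    def reset(self, **kwargs):
        self.shaped_return = 0.0
        self.true_return = 0.0
        return self.env.reset(**kwargs)

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        self.shaped_return += reward
        # fall back to the shaped reward if the env exposes no true objective
        self.true_return += info.get("true_objective", reward)
        return obs, reward, terminated, truncated, info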
[2025-08-18 21:56:57,874][13694] Num frames 9700...
[2025-08-18 21:56:57,942][13694] Num frames 9800...
[2025-08-18 21:56:58,018][13694] Num frames 9900...
[2025-08-18 21:56:58,086][13694] Num frames 10000...
[2025-08-18 21:56:58,193][13694] Avg episode rewards: #0: 23.781, true rewards: #0: 10.081
[2025-08-18 21:56:58,193][13694] Avg episode reward: 23.781, avg true_objective: 10.081
[2025-08-18 21:57:10,464][13694] Replay video saved to /home/arun/workspace-2/hf_drl_course/train_dir/default_experiment/replay.mp4!