[2025-03-11 09:05:39,117][01034] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-03-11 09:05:39,120][01034] Rollout worker 0 uses device cpu
[2025-03-11 09:05:39,121][01034] Rollout worker 1 uses device cpu
[2025-03-11 09:05:39,122][01034] Rollout worker 2 uses device cpu
[2025-03-11 09:05:39,124][01034] Rollout worker 3 uses device cpu
[2025-03-11 09:05:39,124][01034] Rollout worker 4 uses device cpu
[2025-03-11 09:05:39,125][01034] Rollout worker 5 uses device cpu
[2025-03-11 09:05:39,126][01034] Rollout worker 6 uses device cpu
[2025-03-11 09:05:39,127][01034] Rollout worker 7 uses device cpu
[2025-03-11 09:05:39,278][01034] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-11 09:05:39,279][01034] InferenceWorker_p0-w0: min num requests: 2
[2025-03-11 09:05:39,309][01034] Starting all processes...
[2025-03-11 09:05:39,310][01034] Starting process learner_proc0
[2025-03-11 09:05:39,367][01034] Starting all processes...
[2025-03-11 09:05:39,375][01034] Starting process inference_proc0-0
[2025-03-11 09:05:39,375][01034] Starting process rollout_proc0
[2025-03-11 09:05:39,375][01034] Starting process rollout_proc1
[2025-03-11 09:05:39,376][01034] Starting process rollout_proc2
[2025-03-11 09:05:39,376][01034] Starting process rollout_proc3
[2025-03-11 09:05:39,376][01034] Starting process rollout_proc4
[2025-03-11 09:05:39,376][01034] Starting process rollout_proc5
[2025-03-11 09:05:39,376][01034] Starting process rollout_proc6
[2025-03-11 09:05:39,376][01034] Starting process rollout_proc7
[2025-03-11 09:05:55,901][03389] Worker 5 uses CPU cores [1]
[2025-03-11 09:05:55,904][03372] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-11 09:05:55,909][03372] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-03-11 09:05:55,969][03372] Num visible devices: 1
[2025-03-11 09:05:56,015][03372] Starting seed is not provided
[2025-03-11 09:05:56,016][03372] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-11 09:05:56,017][03372] Initializing actor-critic model on device cuda:0
[2025-03-11 09:05:56,018][03372] RunningMeanStd input shape: (3, 72, 128)
[2025-03-11 09:05:56,021][03372] RunningMeanStd input shape: (1,)
[2025-03-11 09:05:56,046][03393] Worker 6 uses CPU cores [0]
[2025-03-11 09:05:56,073][03392] Worker 7 uses CPU cores [1]
[2025-03-11 09:05:56,077][03388] Worker 4 uses CPU cores [0]
[2025-03-11 09:05:56,104][03372] ConvEncoder: input_channels=3
[2025-03-11 09:05:56,126][03385] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-11 09:05:56,127][03385] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-03-11 09:05:56,139][03386] Worker 1 uses CPU cores [1]
[2025-03-11 09:05:56,169][03390] Worker 0 uses CPU cores [0]
[2025-03-11 09:05:56,199][03385] Num visible devices: 1
[2025-03-11 09:05:56,220][03391] Worker 3 uses CPU cores [1]
[2025-03-11 09:05:56,238][03387] Worker 2 uses CPU cores [0]
[2025-03-11 09:05:56,405][03372] Conv encoder output size: 512
[2025-03-11 09:05:56,405][03372] Policy head output size: 512
[2025-03-11 09:05:56,460][03372] Created Actor Critic model with architecture:
[2025-03-11 09:05:56,461][03372] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-03-11 09:05:56,797][03372] Using optimizer
[2025-03-11 09:05:59,279][01034] Heartbeat connected on InferenceWorker_p0-w0
[2025-03-11 09:05:59,285][01034] Heartbeat connected on RolloutWorker_w0
[2025-03-11 09:05:59,288][01034] Heartbeat connected on RolloutWorker_w1
[2025-03-11 09:05:59,292][01034] Heartbeat connected on RolloutWorker_w2
[2025-03-11 09:05:59,295][01034] Heartbeat connected on RolloutWorker_w3
[2025-03-11 09:05:59,299][01034] Heartbeat connected on RolloutWorker_w4
[2025-03-11 09:05:59,302][01034] Heartbeat connected on RolloutWorker_w5
[2025-03-11 09:05:59,305][01034] Heartbeat connected on RolloutWorker_w6
[2025-03-11 09:05:59,309][01034] Heartbeat connected on RolloutWorker_w7
[2025-03-11 09:05:59,528][01034] Heartbeat connected on Batcher_0
[2025-03-11 09:06:00,934][03372] No checkpoints found
[2025-03-11 09:06:00,934][03372] Did not load from checkpoint, starting from scratch!
[2025-03-11 09:06:00,934][03372] Initialized policy 0 weights for model version 0
[2025-03-11 09:06:00,937][03372] LearnerWorker_p0 finished initialization!
[2025-03-11 09:06:00,939][03372] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-03-11 09:06:00,940][01034] Heartbeat connected on LearnerWorker_p0
[2025-03-11 09:06:01,172][03385] RunningMeanStd input shape: (3, 72, 128)
[2025-03-11 09:06:01,174][03385] RunningMeanStd input shape: (1,)
[2025-03-11 09:06:01,185][03385] ConvEncoder: input_channels=3
[2025-03-11 09:06:01,288][03385] Conv encoder output size: 512
[2025-03-11 09:06:01,289][03385] Policy head output size: 512
[2025-03-11 09:06:01,325][01034] Inference worker 0-0 is ready!
[2025-03-11 09:06:01,326][01034] All inference workers are ready! Signal rollout workers to start!
[2025-03-11 09:06:01,598][03386] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:01,604][03390] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:01,609][03391] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:01,625][03388] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:01,656][03387] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:01,672][03389] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:01,705][03392] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:01,739][03393] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-03-11 09:06:02,359][01034] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
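The module tree printed above is a standard shared-weights actor-critic: a three-layer conv encoder with ELU activations feeds a 512-unit MLP, a GRU(512, 512) core, and two linear heads (a scalar value and 5 action logits). Below is a minimal PyTorch sketch of an equivalent network; the conv kernel sizes and strides are assumptions (the RecursiveScriptModule repr hides them), while the (3, 72, 128) input shape, the 512-wide encoder/core, and the 5-action head come straight from the log.

    import torch
    from torch import nn

    class SharedWeightsActorCritic(nn.Module):
        """Rough equivalent of the printed ActorCriticSharedWeights tree (a sketch, not the framework's code)."""

        def __init__(self, num_actions: int = 5, hidden: int = 512):
            super().__init__()
            # (conv_head): three Conv2d+ELU pairs; kernel/stride values below are assumed
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            # (mlp_layers): Linear+ELU projecting to the logged "Conv encoder output size: 512";
            # LazyLinear infers the flattened conv size on the first forward pass
            self.mlp_layers = nn.Sequential(nn.Flatten(), nn.LazyLinear(hidden), nn.ELU())
            self.core = nn.GRU(hidden, hidden)                          # (core): GRU(512, 512)
            self.critic_linear = nn.Linear(hidden, 1)                   # value head
            self.distribution_linear = nn.Linear(hidden, num_actions)   # action logits

        def forward(self, obs, rnn_state=None):
            # obs: (B, 3, 72, 128) normalized frames -> logits (B, 5), value (B, 1), new rnn state
            x = self.mlp_layers(self.conv_head(obs))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)  # treat the batch as one time step
            x = x.squeeze(0)
            return self.distribution_linear(x), self.critic_linear(x), rnn_state

    model = SharedWeightsActorCritic()
    logits, value, h = model(torch.zeros(8, 3, 72, 128))  # matches the logged obs shape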
[2025-03-11 09:06:02,851][03392] Decorrelating experience for 0 frames...
[2025-03-11 09:06:02,876][03390] Decorrelating experience for 0 frames...
[2025-03-11 09:06:02,992][03387] Decorrelating experience for 0 frames...
[2025-03-11 09:06:03,194][03393] Decorrelating experience for 0 frames...
[2025-03-11 09:06:03,580][03392] Decorrelating experience for 32 frames...
[2025-03-11 09:06:03,751][03389] Decorrelating experience for 0 frames...
[2025-03-11 09:06:03,947][03387] Decorrelating experience for 32 frames...
[2025-03-11 09:06:04,146][03393] Decorrelating experience for 32 frames...
[2025-03-11 09:06:04,205][03390] Decorrelating experience for 32 frames...
[2025-03-11 09:06:04,554][03389] Decorrelating experience for 32 frames...
[2025-03-11 09:06:05,043][03392] Decorrelating experience for 64 frames...
[2025-03-11 09:06:05,222][03387] Decorrelating experience for 64 frames...
[2025-03-11 09:06:05,547][03393] Decorrelating experience for 64 frames...
[2025-03-11 09:06:05,629][03390] Decorrelating experience for 64 frames...
[2025-03-11 09:06:05,821][03392] Decorrelating experience for 96 frames...
[2025-03-11 09:06:06,929][03387] Decorrelating experience for 96 frames...
[2025-03-11 09:06:06,936][03389] Decorrelating experience for 64 frames...
[2025-03-11 09:06:07,359][01034] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-11 09:06:07,587][03393] Decorrelating experience for 96 frames...
[2025-03-11 09:06:08,859][03389] Decorrelating experience for 96 frames...
[2025-03-11 09:06:09,396][03390] Decorrelating experience for 96 frames...
[2025-03-11 09:06:11,912][03372] Signal inference workers to stop experience collection...
[2025-03-11 09:06:11,928][03385] InferenceWorker_p0-w0: stopping experience collection
[2025-03-11 09:06:12,359][01034] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 132.0. Samples: 1320. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-03-11 09:06:12,363][01034] Avg episode reward: [(0, '2.785')]
[2025-03-11 09:06:13,800][03372] Signal inference workers to resume experience collection...
[2025-03-11 09:06:13,801][03385] InferenceWorker_p0-w0: resuming experience collection
[2025-03-11 09:06:17,359][01034] Fps is (10 sec: 2048.0, 60 sec: 1365.3, 300 sec: 1365.3). Total num frames: 20480. Throughput: 0: 326.7. Samples: 4900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:06:17,361][01034] Avg episode reward: [(0, '3.691')]
[2025-03-11 09:06:22,095][03385] Updated weights for policy 0, policy_version 10 (0.0018)
[2025-03-11 09:06:22,361][01034] Fps is (10 sec: 4095.1, 60 sec: 2047.8, 300 sec: 2047.8). Total num frames: 40960. Throughput: 0: 533.3. Samples: 10668. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:06:22,364][01034] Avg episode reward: [(0, '4.200')]
[2025-03-11 09:06:27,359][01034] Fps is (10 sec: 3276.8, 60 sec: 2129.9, 300 sec: 2129.9). Total num frames: 53248. Throughput: 0: 505.4. Samples: 12636. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:06:27,363][01034] Avg episode reward: [(0, '4.359')]
[2025-03-11 09:06:32,359][01034] Fps is (10 sec: 3277.5, 60 sec: 2457.6, 300 sec: 2457.6). Total num frames: 73728. Throughput: 0: 611.6. Samples: 18348. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
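The "Decorrelating experience for N frames..." entries above show each rollout worker stepping its environments for a warm-up period before regular collection starts, so the eight workers do not all hit episode boundaries in lockstep. A rough sketch of the idea, assuming a classic Gym-style env API; this is an illustration, not Sample Factory's actual implementation:

    def decorrelate_experience(env, chunk_size: int = 32, num_chunks: int = 4):
        """Warm up an env with random actions so workers start out of phase."""
        env.reset()
        for i in range(num_chunks):
            # mirrors the logged 0/32/64/96-frame progression
            print(f"Decorrelating experience for {i * chunk_size} frames...")
            for _ in range(chunk_size):
                obs, reward, done, info = env.step(env.action_space.sample())
                if done:
                    env.reset()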
[2025-03-11 09:06:32,361][01034] Avg episode reward: [(0, '4.227')]
[2025-03-11 09:06:33,674][03385] Updated weights for policy 0, policy_version 20 (0.0029)
[2025-03-11 09:06:37,360][01034] Fps is (10 sec: 4095.6, 60 sec: 2691.6, 300 sec: 2691.6). Total num frames: 94208. Throughput: 0: 682.7. Samples: 23894. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:06:37,362][01034] Avg episode reward: [(0, '4.162')]
[2025-03-11 09:06:42,359][01034] Fps is (10 sec: 3686.4, 60 sec: 2764.8, 300 sec: 2764.8). Total num frames: 110592. Throughput: 0: 658.0. Samples: 26320. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:06:42,362][01034] Avg episode reward: [(0, '4.094')]
[2025-03-11 09:06:42,368][03372] Saving new best policy, reward=4.094!
[2025-03-11 09:06:44,955][03385] Updated weights for policy 0, policy_version 30 (0.0025)
[2025-03-11 09:06:47,360][01034] Fps is (10 sec: 3686.4, 60 sec: 2912.7, 300 sec: 2912.7). Total num frames: 131072. Throughput: 0: 727.6. Samples: 32744. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:06:47,361][01034] Avg episode reward: [(0, '4.309')]
[2025-03-11 09:06:47,369][03372] Saving new best policy, reward=4.309!
[2025-03-11 09:06:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 2949.1, 300 sec: 2949.1). Total num frames: 147456. Throughput: 0: 841.8. Samples: 37880. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-03-11 09:06:52,363][01034] Avg episode reward: [(0, '4.323')]
[2025-03-11 09:06:52,372][03372] Saving new best policy, reward=4.323!
[2025-03-11 09:06:55,901][03385] Updated weights for policy 0, policy_version 40 (0.0021)
[2025-03-11 09:06:57,359][01034] Fps is (10 sec: 3686.7, 60 sec: 3053.4, 300 sec: 3053.4). Total num frames: 167936. Throughput: 0: 873.4. Samples: 40624. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:06:57,362][01034] Avg episode reward: [(0, '4.283')]
[2025-03-11 09:07:02,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3140.3, 300 sec: 3140.3). Total num frames: 188416. Throughput: 0: 936.4. Samples: 47040. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:02,361][01034] Avg episode reward: [(0, '4.522')]
[2025-03-11 09:07:02,368][03372] Saving new best policy, reward=4.522!
[2025-03-11 09:07:06,749][03385] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-03-11 09:07:07,362][01034] Fps is (10 sec: 3685.3, 60 sec: 3413.2, 300 sec: 3150.6). Total num frames: 204800. Throughput: 0: 916.7. Samples: 51918. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:07:07,363][01034] Avg episode reward: [(0, '4.481')]
[2025-03-11 09:07:12,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3218.3). Total num frames: 225280. Throughput: 0: 942.4. Samples: 55044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:12,368][01034] Avg episode reward: [(0, '4.364')]
[2025-03-11 09:07:16,454][03385] Updated weights for policy 0, policy_version 60 (0.0018)
[2025-03-11 09:07:17,359][01034] Fps is (10 sec: 4097.2, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 245760. Throughput: 0: 963.0. Samples: 61682. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:17,361][01034] Avg episode reward: [(0, '4.374')]
[2025-03-11 09:07:22,362][01034] Fps is (10 sec: 3685.4, 60 sec: 3686.4, 300 sec: 3276.7). Total num frames: 262144. Throughput: 0: 935.3. Samples: 65986. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:22,364][01034] Avg episode reward: [(0, '4.485')]
[2025-03-11 09:07:27,359][01034] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3276.8). Total num frames: 278528. Throughput: 0: 928.6. Samples: 68108. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:27,361][01034] Avg episode reward: [(0, '4.645')]
[2025-03-11 09:07:27,366][03372] Saving new best policy, reward=4.645!
[2025-03-11 09:07:28,944][03385] Updated weights for policy 0, policy_version 70 (0.0014)
[2025-03-11 09:07:32,359][01034] Fps is (10 sec: 3687.3, 60 sec: 3754.7, 300 sec: 3322.3). Total num frames: 299008. Throughput: 0: 930.1. Samples: 74600. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:32,361][01034] Avg episode reward: [(0, '4.820')]
[2025-03-11 09:07:32,374][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth...
[2025-03-11 09:07:32,479][03372] Saving new best policy, reward=4.820!
[2025-03-11 09:07:37,361][01034] Fps is (10 sec: 3685.9, 60 sec: 3686.4, 300 sec: 3319.9). Total num frames: 315392. Throughput: 0: 922.5. Samples: 79394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:37,362][01034] Avg episode reward: [(0, '4.731')]
[2025-03-11 09:07:40,055][03385] Updated weights for policy 0, policy_version 80 (0.0022)
[2025-03-11 09:07:42,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3358.7). Total num frames: 335872. Throughput: 0: 929.8. Samples: 82464. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:07:42,361][01034] Avg episode reward: [(0, '4.644')]
[2025-03-11 09:07:47,360][01034] Fps is (10 sec: 4096.5, 60 sec: 3754.7, 300 sec: 3393.8). Total num frames: 356352. Throughput: 0: 928.8. Samples: 88834. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:47,361][01034] Avg episode reward: [(0, '4.521')]
[2025-03-11 09:07:51,367][03385] Updated weights for policy 0, policy_version 90 (0.0025)
[2025-03-11 09:07:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3388.5). Total num frames: 372736. Throughput: 0: 925.5. Samples: 93562. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:52,360][01034] Avg episode reward: [(0, '4.477')]
[2025-03-11 09:07:57,360][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.6, 300 sec: 3419.3). Total num frames: 393216. Throughput: 0: 928.5. Samples: 96826. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:07:57,361][01034] Avg episode reward: [(0, '4.395')]
[2025-03-11 09:08:00,816][03385] Updated weights for policy 0, policy_version 100 (0.0012)
[2025-03-11 09:08:02,362][01034] Fps is (10 sec: 4094.8, 60 sec: 3754.5, 300 sec: 3447.4). Total num frames: 413696. Throughput: 0: 921.4. Samples: 103146. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:08:02,367][01034] Avg episode reward: [(0, '4.511')]
[2025-03-11 09:08:07,359][01034] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3440.6). Total num frames: 430080. Throughput: 0: 932.7. Samples: 107954. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:08:07,361][01034] Avg episode reward: [(0, '4.443')]
[2025-03-11 09:08:11,967][03385] Updated weights for policy 0, policy_version 110 (0.0014)
[2025-03-11 09:08:12,359][01034] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3465.8). Total num frames: 450560. Throughput: 0: 957.0. Samples: 111174. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
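Every five seconds the monitor prints throughput over 10-, 60-, and 300-second trailing windows, computed from a history of (wall-clock time, total frames) samples; the "nan" values in the very first report fall out naturally because there is no older sample to diff against. A small sketch of that bookkeeping, under the assumption that the reporter keeps a short sample history:

    from collections import deque

    samples = deque(maxlen=100)  # (wall_time, total_frames), appended every ~5 s

    def windowed_fps(now: float, total_frames: int) -> dict:
        samples.append((now, total_frames))
        fps = {}
        for window in (10, 60, 300):
            # oldest sample still inside the trailing window (deque iterates oldest-first)
            t0, f0 = next((t, f) for t, f in samples if now - t <= window)
            fps[window] = (total_frames - f0) / (now - t0) if now > t0 else float("nan")
        return fps

    # At 09:06:17 the history held (09:06:07, 0) and (09:06:02, 0), giving
    # 20480/10 = 2048.0 and 20480/15 = 1365.3 -- the values logged above.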
[2025-03-11 09:08:12,361][01034] Avg episode reward: [(0, '4.562')]
[2025-03-11 09:08:17,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3458.8). Total num frames: 466944. Throughput: 0: 944.5. Samples: 117104. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:08:17,360][01034] Avg episode reward: [(0, '4.711')]
[2025-03-11 09:08:22,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3481.6). Total num frames: 487424. Throughput: 0: 955.6. Samples: 122396. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:08:22,366][01034] Avg episode reward: [(0, '4.529')]
[2025-03-11 09:08:23,106][03385] Updated weights for policy 0, policy_version 120 (0.0012)
[2025-03-11 09:08:27,361][01034] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3502.8). Total num frames: 507904. Throughput: 0: 958.2. Samples: 125584. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:08:27,362][01034] Avg episode reward: [(0, '4.419')]
[2025-03-11 09:08:32,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3495.3). Total num frames: 524288. Throughput: 0: 942.2. Samples: 131232. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-03-11 09:08:32,363][01034] Avg episode reward: [(0, '4.470')]
[2025-03-11 09:08:34,292][03385] Updated weights for policy 0, policy_version 130 (0.0016)
[2025-03-11 09:08:37,359][01034] Fps is (10 sec: 3686.9, 60 sec: 3823.0, 300 sec: 3514.6). Total num frames: 544768. Throughput: 0: 962.0. Samples: 136850. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:08:37,365][01034] Avg episode reward: [(0, '4.563')]
[2025-03-11 09:08:42,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3532.8). Total num frames: 565248. Throughput: 0: 959.9. Samples: 140020. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-03-11 09:08:42,362][01034] Avg episode reward: [(0, '4.562')]
[2025-03-11 09:08:44,094][03385] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-03-11 09:08:47,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3525.0). Total num frames: 581632. Throughput: 0: 934.7. Samples: 145204. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-11 09:08:47,365][01034] Avg episode reward: [(0, '4.586')]
[2025-03-11 09:08:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3541.8). Total num frames: 602112. Throughput: 0: 961.6. Samples: 151226. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:08:52,369][01034] Avg episode reward: [(0, '4.761')]
[2025-03-11 09:08:55,002][03385] Updated weights for policy 0, policy_version 150 (0.0016)
[2025-03-11 09:08:57,361][01034] Fps is (10 sec: 4095.4, 60 sec: 3822.9, 300 sec: 3557.6). Total num frames: 622592. Throughput: 0: 960.4. Samples: 154392. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:08:57,362][01034] Avg episode reward: [(0, '4.721')]
[2025-03-11 09:09:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3549.9). Total num frames: 638976. Throughput: 0: 936.7. Samples: 159256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:09:02,361][01034] Avg episode reward: [(0, '4.843')]
[2025-03-11 09:09:02,370][03372] Saving new best policy, reward=4.843!
[2025-03-11 09:09:06,010][03385] Updated weights for policy 0, policy_version 160 (0.0026)
[2025-03-11 09:09:07,359][01034] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3564.6). Total num frames: 659456. Throughput: 0: 959.6. Samples: 165578. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:09:07,365][01034] Avg episode reward: [(0, '4.801')]
[2025-03-11 09:09:12,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3578.6). Total num frames: 679936. Throughput: 0: 958.5. Samples: 168716. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:09:12,362][01034] Avg episode reward: [(0, '4.676')]
[2025-03-11 09:09:17,274][03385] Updated weights for policy 0, policy_version 170 (0.0023)
[2025-03-11 09:09:17,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3570.9). Total num frames: 696320. Throughput: 0: 938.1. Samples: 173446. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:09:17,365][01034] Avg episode reward: [(0, '4.622')]
[2025-03-11 09:09:22,359][01034] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3584.0). Total num frames: 716800. Throughput: 0: 959.0. Samples: 180004. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:09:22,365][01034] Avg episode reward: [(0, '4.611')]
[2025-03-11 09:09:27,147][03385] Updated weights for policy 0, policy_version 180 (0.0013)
[2025-03-11 09:09:27,361][01034] Fps is (10 sec: 4095.4, 60 sec: 3822.9, 300 sec: 3596.5). Total num frames: 737280. Throughput: 0: 958.9. Samples: 183174. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:09:27,362][01034] Avg episode reward: [(0, '4.771')]
[2025-03-11 09:09:32,359][01034] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3588.9). Total num frames: 753664. Throughput: 0: 950.8. Samples: 187992. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:09:32,362][01034] Avg episode reward: [(0, '5.024')]
[2025-03-11 09:09:32,369][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth...
[2025-03-11 09:09:32,470][03372] Saving new best policy, reward=5.024!
[2025-03-11 09:09:37,361][01034] Fps is (10 sec: 3686.3, 60 sec: 3822.8, 300 sec: 3600.6). Total num frames: 774144. Throughput: 0: 956.4. Samples: 194268. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:09:37,363][01034] Avg episode reward: [(0, '5.242')]
[2025-03-11 09:09:37,371][03372] Saving new best policy, reward=5.242!
[2025-03-11 09:09:38,090][03385] Updated weights for policy 0, policy_version 190 (0.0018)
[2025-03-11 09:09:42,362][01034] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3593.3). Total num frames: 790528. Throughput: 0: 949.3. Samples: 197110. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:09:42,365][01034] Avg episode reward: [(0, '5.295')]
[2025-03-11 09:09:42,374][03372] Saving new best policy, reward=5.295!
[2025-03-11 09:09:47,359][01034] Fps is (10 sec: 3277.4, 60 sec: 3754.7, 300 sec: 3586.3). Total num frames: 806912. Throughput: 0: 950.3. Samples: 202018. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:09:47,360][01034] Avg episode reward: [(0, '5.345')]
[2025-03-11 09:09:47,365][03372] Saving new best policy, reward=5.345!
[2025-03-11 09:09:49,235][03385] Updated weights for policy 0, policy_version 200 (0.0021)
[2025-03-11 09:09:52,359][01034] Fps is (10 sec: 4097.2, 60 sec: 3822.9, 300 sec: 3615.2). Total num frames: 831488. Throughput: 0: 953.2. Samples: 208474. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:09:52,366][01034] Avg episode reward: [(0, '5.659')]
[2025-03-11 09:09:52,374][03372] Saving new best policy, reward=5.659!
[2025-03-11 09:09:57,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3608.0). Total num frames: 847872. Throughput: 0: 940.8. Samples: 211050. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:09:57,365][01034] Avg episode reward: [(0, '5.570')]
[2025-03-11 09:10:00,312][03385] Updated weights for policy 0, policy_version 210 (0.0014)
[2025-03-11 09:10:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3618.1). Total num frames: 868352. Throughput: 0: 954.9. Samples: 216418. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:10:02,366][01034] Avg episode reward: [(0, '5.494')]
[2025-03-11 09:10:07,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3627.9). Total num frames: 888832. Throughput: 0: 952.7. Samples: 222874. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:10:07,363][01034] Avg episode reward: [(0, '5.472')]
[2025-03-11 09:10:11,169][03385] Updated weights for policy 0, policy_version 220 (0.0013)
[2025-03-11 09:10:12,359][01034] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3604.5). Total num frames: 901120. Throughput: 0: 932.9. Samples: 225152. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:10:12,362][01034] Avg episode reward: [(0, '5.617')]
[2025-03-11 09:10:17,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3630.2). Total num frames: 925696. Throughput: 0: 951.7. Samples: 230820. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:10:17,361][01034] Avg episode reward: [(0, '5.955')]
[2025-03-11 09:10:17,365][03372] Saving new best policy, reward=5.955!
[2025-03-11 09:10:20,980][03385] Updated weights for policy 0, policy_version 230 (0.0013)
[2025-03-11 09:10:22,361][01034] Fps is (10 sec: 4504.9, 60 sec: 3822.8, 300 sec: 3639.1). Total num frames: 946176. Throughput: 0: 954.2. Samples: 237208. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:10:22,362][01034] Avg episode reward: [(0, '6.406')]
[2025-03-11 09:10:22,371][03372] Saving new best policy, reward=6.406!
[2025-03-11 09:10:27,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3632.3). Total num frames: 962560. Throughput: 0: 935.4. Samples: 239200. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:10:27,361][01034] Avg episode reward: [(0, '6.360')]
[2025-03-11 09:10:32,138][03385] Updated weights for policy 0, policy_version 240 (0.0014)
[2025-03-11 09:10:32,359][01034] Fps is (10 sec: 3687.0, 60 sec: 3822.9, 300 sec: 3640.9). Total num frames: 983040. Throughput: 0: 960.8. Samples: 245256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:10:32,365][01034] Avg episode reward: [(0, '6.163')]
[2025-03-11 09:10:37,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3823.1, 300 sec: 3649.2). Total num frames: 1003520. Throughput: 0: 954.4. Samples: 251424. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:10:37,361][01034] Avg episode reward: [(0, '6.436')]
[2025-03-11 09:10:37,362][03372] Saving new best policy, reward=6.436!
[2025-03-11 09:10:42,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3642.5). Total num frames: 1019904. Throughput: 0: 941.1. Samples: 253398. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:10:42,364][01034] Avg episode reward: [(0, '6.280')]
[2025-03-11 09:10:43,086][03385] Updated weights for policy 0, policy_version 250 (0.0015)
[2025-03-11 09:10:47,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3650.5). Total num frames: 1040384. Throughput: 0: 964.4. Samples: 259818. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:10:47,362][01034] Avg episode reward: [(0, '6.537')]
[2025-03-11 09:10:47,365][03372] Saving new best policy, reward=6.537!
[2025-03-11 09:10:52,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3658.2). Total num frames: 1060864. Throughput: 0: 947.2. Samples: 265496. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:10:52,362][01034] Avg episode reward: [(0, '6.477')]
[2025-03-11 09:10:53,883][03385] Updated weights for policy 0, policy_version 260 (0.0017)
[2025-03-11 09:10:57,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3651.7). Total num frames: 1077248. Throughput: 0: 948.1. Samples: 267816. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:10:57,361][01034] Avg episode reward: [(0, '6.522')]
[2025-03-11 09:11:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3721.1). Total num frames: 1097728. Throughput: 0: 968.1. Samples: 274386. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:02,362][01034] Avg episode reward: [(0, '6.733')]
[2025-03-11 09:11:02,369][03372] Saving new best policy, reward=6.733!
[2025-03-11 09:11:03,583][03385] Updated weights for policy 0, policy_version 270 (0.0012)
[2025-03-11 09:11:07,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3776.7). Total num frames: 1114112. Throughput: 0: 945.3. Samples: 279744. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:07,365][01034] Avg episode reward: [(0, '6.678')]
[2025-03-11 09:11:12,361][01034] Fps is (10 sec: 3685.9, 60 sec: 3891.1, 300 sec: 3776.6). Total num frames: 1134592. Throughput: 0: 959.5. Samples: 282378. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:12,367][01034] Avg episode reward: [(0, '6.676')]
[2025-03-11 09:11:14,762][03385] Updated weights for policy 0, policy_version 280 (0.0023)
[2025-03-11 09:11:17,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3776.7). Total num frames: 1155072. Throughput: 0: 968.3. Samples: 288830. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:17,360][01034] Avg episode reward: [(0, '7.361')]
[2025-03-11 09:11:17,417][03372] Saving new best policy, reward=7.361!
[2025-03-11 09:11:22,359][01034] Fps is (10 sec: 3686.9, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 1171456. Throughput: 0: 940.3. Samples: 293738. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:22,361][01034] Avg episode reward: [(0, '7.629')]
[2025-03-11 09:11:22,379][03372] Saving new best policy, reward=7.629!
[2025-03-11 09:11:25,955][03385] Updated weights for policy 0, policy_version 290 (0.0013)
[2025-03-11 09:11:27,359][01034] Fps is (10 sec: 3686.3, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1191936. Throughput: 0: 961.8. Samples: 296680. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-11 09:11:27,362][01034] Avg episode reward: [(0, '7.852')]
[2025-03-11 09:11:27,365][03372] Saving new best policy, reward=7.852!
[2025-03-11 09:11:32,359][01034] Fps is (10 sec: 4096.1, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1212416. Throughput: 0: 961.3. Samples: 303078. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:32,363][01034] Avg episode reward: [(0, '8.267')]
[2025-03-11 09:11:32,369][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth...
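Two separate pieces of learner bookkeeping interleave in the entries above and below: periodic checkpoints named checkpoint_<policy_version>_<env_steps>.pth, with older ones rotated out (the "Removing ..." entry that follows), and a best-so-far policy saved whenever the average episode reward improves. A hedged sketch of that logic, illustrative rather than the framework's actual code:

    import os
    import torch

    def save_checkpoint(model, ckpt_dir, policy_version, env_steps, keep_last=2):
        # naming follows the log, e.g. checkpoint_000000073_299008.pth
        name = f"checkpoint_{policy_version:09d}_{env_steps}.pth"
        torch.save(model.state_dict(), os.path.join(ckpt_dir, name))
        # rotate: drop the oldest checkpoints, as in the "Removing ..." entries
        ckpts = sorted(f for f in os.listdir(ckpt_dir) if f.startswith("checkpoint_"))
        for old in ckpts[:-keep_last]:
            os.remove(os.path.join(ckpt_dir, old))

    best_reward = float("-inf")

    def maybe_save_best(model, ckpt_dir, avg_episode_reward):
        global best_reward
        if avg_episode_reward > best_reward:
            best_reward = avg_episode_reward
            # the best-policy file name is hypothetical; the log does not show it
            torch.save(model.state_dict(), os.path.join(ckpt_dir, "best.pth"))
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")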
[2025-03-11 09:11:32,489][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000073_299008.pth
[2025-03-11 09:11:32,502][03372] Saving new best policy, reward=8.267!
[2025-03-11 09:11:37,157][03385] Updated weights for policy 0, policy_version 300 (0.0013)
[2025-03-11 09:11:37,359][01034] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1228800. Throughput: 0: 940.0. Samples: 307796. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:37,364][01034] Avg episode reward: [(0, '8.543')]
[2025-03-11 09:11:37,368][03372] Saving new best policy, reward=8.543!
[2025-03-11 09:11:42,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1249280. Throughput: 0: 959.2. Samples: 310982. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:42,364][01034] Avg episode reward: [(0, '9.592')]
[2025-03-11 09:11:42,376][03372] Saving new best policy, reward=9.592!
[2025-03-11 09:11:47,360][01034] Fps is (10 sec: 3686.0, 60 sec: 3754.6, 300 sec: 3790.5). Total num frames: 1265664. Throughput: 0: 944.3. Samples: 316882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:47,362][01034] Avg episode reward: [(0, '9.803')]
[2025-03-11 09:11:47,364][03372] Saving new best policy, reward=9.803!
[2025-03-11 09:11:47,957][03385] Updated weights for policy 0, policy_version 310 (0.0016)
[2025-03-11 09:11:52,359][01034] Fps is (10 sec: 2867.2, 60 sec: 3618.1, 300 sec: 3762.8). Total num frames: 1277952. Throughput: 0: 902.4. Samples: 320354. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:11:52,361][01034] Avg episode reward: [(0, '10.024')]
[2025-03-11 09:11:52,368][03372] Saving new best policy, reward=10.024!
[2025-03-11 09:11:57,359][01034] Fps is (10 sec: 3277.1, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1298432. Throughput: 0: 913.2. Samples: 323472. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:11:57,364][01034] Avg episode reward: [(0, '9.555')]
[2025-03-11 09:11:59,614][03385] Updated weights for policy 0, policy_version 320 (0.0012)
[2025-03-11 09:12:02,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1318912. Throughput: 0: 913.2. Samples: 329922. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-11 09:12:02,361][01034] Avg episode reward: [(0, '10.290')]
[2025-03-11 09:12:02,376][03372] Saving new best policy, reward=10.290!
[2025-03-11 09:12:07,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3762.8). Total num frames: 1335296. Throughput: 0: 910.2. Samples: 334698. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:12:07,365][01034] Avg episode reward: [(0, '10.730')]
[2025-03-11 09:12:07,369][03372] Saving new best policy, reward=10.730!
[2025-03-11 09:12:10,709][03385] Updated weights for policy 0, policy_version 330 (0.0018)
[2025-03-11 09:12:12,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3762.8). Total num frames: 1355776. Throughput: 0: 915.5. Samples: 337876. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:12:12,365][01034] Avg episode reward: [(0, '10.666')]
[2025-03-11 09:12:17,366][01034] Fps is (10 sec: 4093.1, 60 sec: 3686.0, 300 sec: 3776.6). Total num frames: 1376256. Throughput: 0: 916.1. Samples: 344308. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:12:17,371][01034] Avg episode reward: [(0, '10.650')]
[2025-03-11 09:12:21,680][03385] Updated weights for policy 0, policy_version 340 (0.0012)
[2025-03-11 09:12:22,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1392640. Throughput: 0: 918.8. Samples: 349140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:12:22,361][01034] Avg episode reward: [(0, '10.440')]
[2025-03-11 09:12:27,359][01034] Fps is (10 sec: 4098.9, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1417216. Throughput: 0: 920.5. Samples: 352406. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:12:27,360][01034] Avg episode reward: [(0, '11.243')]
[2025-03-11 09:12:27,362][03372] Saving new best policy, reward=11.243!
[2025-03-11 09:12:31,728][03385] Updated weights for policy 0, policy_version 350 (0.0016)
[2025-03-11 09:12:32,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3686.4, 300 sec: 3790.6). Total num frames: 1433600. Throughput: 0: 923.3. Samples: 358428. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:12:32,363][01034] Avg episode reward: [(0, '11.489')]
[2025-03-11 09:12:32,368][03372] Saving new best policy, reward=11.489!
[2025-03-11 09:12:37,359][01034] Fps is (10 sec: 3276.8, 60 sec: 3686.4, 300 sec: 3776.7). Total num frames: 1449984. Throughput: 0: 961.9. Samples: 363640. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:12:37,360][01034] Avg episode reward: [(0, '11.694')]
[2025-03-11 09:12:37,369][03372] Saving new best policy, reward=11.694!
[2025-03-11 09:12:42,226][03385] Updated weights for policy 0, policy_version 360 (0.0024)
[2025-03-11 09:12:42,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1474560. Throughput: 0: 965.0. Samples: 366898. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:12:42,361][01034] Avg episode reward: [(0, '10.357')]
[2025-03-11 09:12:47,363][01034] Fps is (10 sec: 4094.4, 60 sec: 3754.5, 300 sec: 3790.5). Total num frames: 1490944. Throughput: 0: 948.9. Samples: 372626. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:12:47,365][01034] Avg episode reward: [(0, '10.766')]
[2025-03-11 09:12:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1511424. Throughput: 0: 966.6. Samples: 378196. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:12:52,362][01034] Avg episode reward: [(0, '10.749')]
[2025-03-11 09:12:53,320][03385] Updated weights for policy 0, policy_version 370 (0.0014)
[2025-03-11 09:12:57,359][01034] Fps is (10 sec: 4097.6, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 1531904. Throughput: 0: 967.5. Samples: 381414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:12:57,364][01034] Avg episode reward: [(0, '12.120')]
[2025-03-11 09:12:57,368][03372] Saving new best policy, reward=12.120!
[2025-03-11 09:13:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1548288. Throughput: 0: 943.6. Samples: 386764. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:02,361][01034] Avg episode reward: [(0, '13.756')]
[2025-03-11 09:13:02,365][03372] Saving new best policy, reward=13.756!
[2025-03-11 09:13:04,351][03385] Updated weights for policy 0, policy_version 380 (0.0017)
[2025-03-11 09:13:07,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1568768. Throughput: 0: 969.0. Samples: 392746. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:13:07,361][01034] Avg episode reward: [(0, '15.026')]
[2025-03-11 09:13:07,367][03372] Saving new best policy, reward=15.026!
[2025-03-11 09:13:12,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1589248. Throughput: 0: 967.3. Samples: 395936. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:12,362][01034] Avg episode reward: [(0, '15.978')]
[2025-03-11 09:13:12,367][03372] Saving new best policy, reward=15.978!
[2025-03-11 09:13:14,745][03385] Updated weights for policy 0, policy_version 390 (0.0023)
[2025-03-11 09:13:17,359][01034] Fps is (10 sec: 3276.8, 60 sec: 3755.1, 300 sec: 3776.7). Total num frames: 1601536. Throughput: 0: 941.6. Samples: 400798. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:17,365][01034] Avg episode reward: [(0, '16.265')]
[2025-03-11 09:13:17,375][03372] Saving new best policy, reward=16.265!
[2025-03-11 09:13:22,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.6). Total num frames: 1626112. Throughput: 0: 965.1. Samples: 407070. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:13:22,361][01034] Avg episode reward: [(0, '15.835')]
[2025-03-11 09:13:25,089][03385] Updated weights for policy 0, policy_version 400 (0.0017)
[2025-03-11 09:13:27,360][01034] Fps is (10 sec: 4505.2, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1646592. Throughput: 0: 963.4. Samples: 410252. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:13:27,361][01034] Avg episode reward: [(0, '14.308')]
[2025-03-11 09:13:32,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1662976. Throughput: 0: 944.9. Samples: 415144. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:32,361][01034] Avg episode reward: [(0, '13.932')]
[2025-03-11 09:13:32,369][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth...
[2025-03-11 09:13:32,479][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000184_753664.pth
[2025-03-11 09:13:36,032][03385] Updated weights for policy 0, policy_version 410 (0.0014)
[2025-03-11 09:13:37,359][01034] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1683456. Throughput: 0: 964.0. Samples: 421574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:37,360][01034] Avg episode reward: [(0, '13.575')]
[2025-03-11 09:13:42,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 1703936. Throughput: 0: 965.7. Samples: 424870. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:42,361][01034] Avg episode reward: [(0, '13.383')]
[2025-03-11 09:13:47,104][03385] Updated weights for policy 0, policy_version 420 (0.0012)
[2025-03-11 09:13:47,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3823.2, 300 sec: 3790.5). Total num frames: 1720320. Throughput: 0: 951.9. Samples: 429598. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:47,361][01034] Avg episode reward: [(0, '14.590')]
[2025-03-11 09:13:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.6). Total num frames: 1740800. Throughput: 0: 965.4. Samples: 436190. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:13:52,361][01034] Avg episode reward: [(0, '15.265')]
[2025-03-11 09:13:57,293][03385] Updated weights for policy 0, policy_version 430 (0.0013)
[2025-03-11 09:13:57,361][01034] Fps is (10 sec: 4095.4, 60 sec: 3822.8, 300 sec: 3804.4). Total num frames: 1761280. Throughput: 0: 964.0. Samples: 439318. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:13:57,362][01034] Avg episode reward: [(0, '14.982')]
[2025-03-11 09:14:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1777664. Throughput: 0: 963.5. Samples: 444154. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:14:02,365][01034] Avg episode reward: [(0, '15.482')]
[2025-03-11 09:14:07,359][01034] Fps is (10 sec: 3686.9, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1798144. Throughput: 0: 968.6. Samples: 450658. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:14:07,365][01034] Avg episode reward: [(0, '14.243')]
[2025-03-11 09:14:07,591][03385] Updated weights for policy 0, policy_version 440 (0.0014)
[2025-03-11 09:14:12,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1814528. Throughput: 0: 960.7. Samples: 453484. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:14:12,362][01034] Avg episode reward: [(0, '14.199')]
[2025-03-11 09:14:17,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3790.5). Total num frames: 1835008. Throughput: 0: 968.1. Samples: 458710. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:14:17,361][01034] Avg episode reward: [(0, '15.463')]
[2025-03-11 09:14:18,522][03385] Updated weights for policy 0, policy_version 450 (0.0020)
[2025-03-11 09:14:22,359][01034] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1859584. Throughput: 0: 971.8. Samples: 465304. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:14:22,360][01034] Avg episode reward: [(0, '15.657')]
[2025-03-11 09:14:27,359][01034] Fps is (10 sec: 3686.5, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 1871872. Throughput: 0: 954.2. Samples: 467808. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:14:27,361][01034] Avg episode reward: [(0, '16.076')]
[2025-03-11 09:14:29,466][03385] Updated weights for policy 0, policy_version 460 (0.0013)
[2025-03-11 09:14:32,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 1896448. Throughput: 0: 973.3. Samples: 473398. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:14:32,360][01034] Avg episode reward: [(0, '16.462')]
[2025-03-11 09:14:32,374][03372] Saving new best policy, reward=16.462!
[2025-03-11 09:14:37,359][01034] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 1916928. Throughput: 0: 970.3. Samples: 479852. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:14:37,361][01034] Avg episode reward: [(0, '15.923')]
[2025-03-11 09:14:39,843][03385] Updated weights for policy 0, policy_version 470 (0.0015)
[2025-03-11 09:14:42,359][01034] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3804.4). Total num frames: 1929216. Throughput: 0: 946.0. Samples: 481888. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:14:42,366][01034] Avg episode reward: [(0, '16.374')]
[2025-03-11 09:14:47,359][01034] Fps is (10 sec: 3276.7, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 1949696. Throughput: 0: 967.8. Samples: 487706. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:14:47,361][01034] Avg episode reward: [(0, '17.715')]
[2025-03-11 09:14:47,368][03372] Saving new best policy, reward=17.715!
[2025-03-11 09:14:50,330][03385] Updated weights for policy 0, policy_version 480 (0.0014)
[2025-03-11 09:14:52,365][01034] Fps is (10 sec: 4093.8, 60 sec: 3822.6, 300 sec: 3804.3). Total num frames: 1970176. Throughput: 0: 956.7. Samples: 493714. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:14:52,366][01034] Avg episode reward: [(0, '18.346')]
[2025-03-11 09:14:52,386][03372] Saving new best policy, reward=18.346!
[2025-03-11 09:14:57,359][01034] Fps is (10 sec: 3686.5, 60 sec: 3754.8, 300 sec: 3790.5). Total num frames: 1986560. Throughput: 0: 936.9. Samples: 495644. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:14:57,365][01034] Avg episode reward: [(0, '18.799')]
[2025-03-11 09:14:57,368][03372] Saving new best policy, reward=18.799!
[2025-03-11 09:15:01,385][03385] Updated weights for policy 0, policy_version 490 (0.0022)
[2025-03-11 09:15:02,359][01034] Fps is (10 sec: 3688.5, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2007040. Throughput: 0: 961.5. Samples: 501978. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:15:02,365][01034] Avg episode reward: [(0, '19.315')]
[2025-03-11 09:15:02,397][03372] Saving new best policy, reward=19.315!
[2025-03-11 09:15:07,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3823.0, 300 sec: 3818.3). Total num frames: 2027520. Throughput: 0: 942.8. Samples: 507730. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:15:07,365][01034] Avg episode reward: [(0, '18.661')]
[2025-03-11 09:15:12,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3790.5). Total num frames: 2043904. Throughput: 0: 937.7. Samples: 510006. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:15:12,362][01034] Avg episode reward: [(0, '17.348')]
[2025-03-11 09:15:12,512][03385] Updated weights for policy 0, policy_version 500 (0.0017)
[2025-03-11 09:15:17,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2068480. Throughput: 0: 957.4. Samples: 516480. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:15:17,362][01034] Avg episode reward: [(0, '15.913')]
[2025-03-11 09:15:22,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2084864. Throughput: 0: 935.5. Samples: 521948. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:15:22,362][01034] Avg episode reward: [(0, '16.111')]
[2025-03-11 09:15:23,151][03385] Updated weights for policy 0, policy_version 510 (0.0017)
[2025-03-11 09:15:27,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2105344. Throughput: 0: 948.6. Samples: 524574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:15:27,360][01034] Avg episode reward: [(0, '15.468')]
[2025-03-11 09:15:32,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2125824. Throughput: 0: 963.6. Samples: 531070. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:15:32,365][01034] Avg episode reward: [(0, '16.833')]
[2025-03-11 09:15:32,375][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth...
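The checkpoint filenames make the training arithmetic visible: the environment step count is always exactly 4096 times the policy version, i.e. each policy update consumes one 4096-frame batch (the factor is inferred from the filenames; the batch configuration itself is not printed in this excerpt). A quick consistency check over the checkpoints saved so far:

    # (policy_version, env_steps) pairs read off the checkpoint names in this log
    checkpoints = [(73, 299008), (184, 753664), (296, 1212416), (406, 1662976), (519, 2125824)]
    for version, steps in checkpoints:
        assert steps == version * 4096, (version, steps)
    print(2125824 // 519)  # -> 4096 env frames per policy version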
[2025-03-11 09:15:32,486][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000296_1212416.pth
[2025-03-11 09:15:32,905][03385] Updated weights for policy 0, policy_version 520 (0.0018)
[2025-03-11 09:15:37,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2142208. Throughput: 0: 940.2. Samples: 536016. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:15:37,366][01034] Avg episode reward: [(0, '17.627')]
[2025-03-11 09:15:42,359][01034] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2162688. Throughput: 0: 965.2. Samples: 539078. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:15:42,361][01034] Avg episode reward: [(0, '18.789')]
[2025-03-11 09:15:44,193][03385] Updated weights for policy 0, policy_version 530 (0.0025)
[2025-03-11 09:15:47,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2183168. Throughput: 0: 968.9. Samples: 545578. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:15:47,361][01034] Avg episode reward: [(0, '19.135')]
[2025-03-11 09:15:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3823.3, 300 sec: 3804.4). Total num frames: 2199552. Throughput: 0: 948.1. Samples: 550394. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:15:52,361][01034] Avg episode reward: [(0, '18.879')]
[2025-03-11 09:15:54,914][03385] Updated weights for policy 0, policy_version 540 (0.0014)
[2025-03-11 09:15:57,359][01034] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2220032. Throughput: 0: 970.4. Samples: 553674. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:15:57,361][01034] Avg episode reward: [(0, '16.782')]
[2025-03-11 09:16:02,361][01034] Fps is (10 sec: 4095.4, 60 sec: 3891.1, 300 sec: 3818.3). Total num frames: 2240512. Throughput: 0: 972.9. Samples: 560264. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:16:02,367][01034] Avg episode reward: [(0, '16.037')]
[2025-03-11 09:16:05,739][03385] Updated weights for policy 0, policy_version 550 (0.0017)
[2025-03-11 09:16:07,359][01034] Fps is (10 sec: 3686.5, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2256896. Throughput: 0: 959.2. Samples: 565112. Policy #0 lag: (min: 0.0, avg: 0.1, max: 1.0)
[2025-03-11 09:16:07,360][01034] Avg episode reward: [(0, '16.959')]
[2025-03-11 09:16:12,359][01034] Fps is (10 sec: 3687.0, 60 sec: 3891.2, 300 sec: 3804.4). Total num frames: 2277376. Throughput: 0: 972.1. Samples: 568318. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:16:12,361][01034] Avg episode reward: [(0, '17.453')]
[2025-03-11 09:16:15,202][03385] Updated weights for policy 0, policy_version 560 (0.0024)
[2025-03-11 09:16:17,365][01034] Fps is (10 sec: 4093.7, 60 sec: 3822.6, 300 sec: 3818.2). Total num frames: 2297856. Throughput: 0: 970.5. Samples: 574750. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:16:17,366][01034] Avg episode reward: [(0, '18.817')]
[2025-03-11 09:16:22,359][01034] Fps is (10 sec: 3276.8, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2310144. Throughput: 0: 935.3. Samples: 578106. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:16:22,361][01034] Avg episode reward: [(0, '19.190')]
[2025-03-11 09:16:27,359][01034] Fps is (10 sec: 3278.6, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2330624. Throughput: 0: 937.8. Samples: 581280. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-03-11 09:16:27,361][01034] Avg episode reward: [(0, '20.071')]
[2025-03-11 09:16:27,363][03372] Saving new best policy, reward=20.071!
[2025-03-11 09:16:27,885][03385] Updated weights for policy 0, policy_version 570 (0.0017)
[2025-03-11 09:16:32,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2351104. Throughput: 0: 937.1. Samples: 587746. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:16:32,363][01034] Avg episode reward: [(0, '19.111')]
[2025-03-11 09:16:37,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3790.5). Total num frames: 2367488. Throughput: 0: 937.3. Samples: 592574. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:16:37,360][01034] Avg episode reward: [(0, '20.482')]
[2025-03-11 09:16:37,367][03372] Saving new best policy, reward=20.482!
[2025-03-11 09:16:38,869][03385] Updated weights for policy 0, policy_version 580 (0.0014)
[2025-03-11 09:16:42,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3804.4). Total num frames: 2387968. Throughput: 0: 935.8. Samples: 595784. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-03-11 09:16:42,361][01034] Avg episode reward: [(0, '21.199')]
[2025-03-11 09:16:42,371][03372] Saving new best policy, reward=21.199!
[2025-03-11 09:16:47,362][01034] Fps is (10 sec: 4094.8, 60 sec: 3754.5, 300 sec: 3832.2). Total num frames: 2408448. Throughput: 0: 930.9. Samples: 602156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 2.0)
[2025-03-11 09:16:47,364][01034] Avg episode reward: [(0, '21.683')]
[2025-03-11 09:16:47,367][03372] Saving new best policy, reward=21.683!
[2025-03-11 09:16:49,752][03385] Updated weights for policy 0, policy_version 590 (0.0013)
[2025-03-11 09:16:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2424832. Throughput: 0: 926.9. Samples: 606824. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:16:52,361][01034] Avg episode reward: [(0, '22.328')]
[2025-03-11 09:16:52,370][03372] Saving new best policy, reward=22.328!
[2025-03-11 09:16:57,359][01034] Fps is (10 sec: 3687.5, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2445312. Throughput: 0: 924.2. Samples: 609908. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:16:57,360][01034] Avg episode reward: [(0, '23.029')]
[2025-03-11 09:16:57,364][03372] Saving new best policy, reward=23.029!
[2025-03-11 09:16:59,944][03385] Updated weights for policy 0, policy_version 600 (0.0014)
[2025-03-11 09:17:02,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3754.8, 300 sec: 3832.2). Total num frames: 2465792. Throughput: 0: 919.9. Samples: 616140. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:17:02,364][01034] Avg episode reward: [(0, '22.505')]
[2025-03-11 09:17:07,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2482176. Throughput: 0: 958.1. Samples: 621220. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:17:07,360][01034] Avg episode reward: [(0, '20.777')]
[2025-03-11 09:17:10,869][03385] Updated weights for policy 0, policy_version 610 (0.0014)
[2025-03-11 09:17:12,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.4). Total num frames: 2502656. Throughput: 0: 961.0. Samples: 624526. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:17:12,361][01034] Avg episode reward: [(0, '20.146')]
[2025-03-11 09:17:17,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3686.7, 300 sec: 3818.3). Total num frames: 2519040. Throughput: 0: 946.8. Samples: 630350. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:17:17,367][01034] Avg episode reward: [(0, '17.313')]
[2025-03-11 09:17:21,723][03385] Updated weights for policy 0, policy_version 620 (0.0016)
[2025-03-11 09:17:22,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3804.4). Total num frames: 2539520. Throughput: 0: 962.8. Samples: 635900. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:17:22,361][01034] Avg episode reward: [(0, '16.762')]
[2025-03-11 09:17:27,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2560000. Throughput: 0: 963.8. Samples: 639156. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:17:27,361][01034] Avg episode reward: [(0, '16.792')]
[2025-03-11 09:17:32,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 2576384. Throughput: 0: 945.3. Samples: 644690. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:17:32,362][01034] Avg episode reward: [(0, '17.315')]
[2025-03-11 09:17:32,381][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000630_2580480.pth...
[2025-03-11 09:17:32,386][03385] Updated weights for policy 0, policy_version 630 (0.0014)
[2025-03-11 09:17:32,524][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000406_1662976.pth
[2025-03-11 09:17:37,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2600960. Throughput: 0: 971.8. Samples: 650556. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:17:37,360][01034] Avg episode reward: [(0, '18.044')]
[2025-03-11 09:17:42,018][03385] Updated weights for policy 0, policy_version 640 (0.0025)
[2025-03-11 09:17:42,359][01034] Fps is (10 sec: 4505.6, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2621440. Throughput: 0: 975.1. Samples: 653786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:17:42,360][01034] Avg episode reward: [(0, '18.962')]
[2025-03-11 09:17:47,359][01034] Fps is (10 sec: 3276.8, 60 sec: 3754.9, 300 sec: 3804.4). Total num frames: 2633728. Throughput: 0: 953.2. Samples: 659034. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:17:47,366][01034] Avg episode reward: [(0, '18.861')]
[2025-03-11 09:17:52,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2658304. Throughput: 0: 979.5. Samples: 665298. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:17:52,361][01034] Avg episode reward: [(0, '19.597')]
[2025-03-11 09:17:52,970][03385] Updated weights for policy 0, policy_version 650 (0.0015)
[2025-03-11 09:17:57,359][01034] Fps is (10 sec: 4505.5, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2678784. Throughput: 0: 978.8. Samples: 668570. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:17:57,361][01034] Avg episode reward: [(0, '19.687')]
[2025-03-11 09:18:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2695168. Throughput: 0: 959.7. Samples: 673538. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:02,365][01034] Avg episode reward: [(0, '20.363')]
[2025-03-11 09:18:03,831][03385] Updated weights for policy 0, policy_version 660 (0.0013)
[2025-03-11 09:18:07,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2715648. Throughput: 0: 981.0. Samples: 680046. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:07,361][01034] Avg episode reward: [(0, '20.688')]
[2025-03-11 09:18:12,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2736128. Throughput: 0: 980.2. Samples: 683266. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:12,364][01034] Avg episode reward: [(0, '20.937')]
[2025-03-11 09:18:14,526][03385] Updated weights for policy 0, policy_version 670 (0.0013)
[2025-03-11 09:18:17,359][01034] Fps is (10 sec: 3686.5, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2752512. Throughput: 0: 959.4. Samples: 687864. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:17,364][01034] Avg episode reward: [(0, '20.082')]
[2025-03-11 09:18:22,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2772992. Throughput: 0: 972.8. Samples: 694334. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:22,365][01034] Avg episode reward: [(0, '21.141')]
[2025-03-11 09:18:24,528][03385] Updated weights for policy 0, policy_version 680 (0.0016)
[2025-03-11 09:18:27,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2793472. Throughput: 0: 971.9. Samples: 697520. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:27,365][01034] Avg episode reward: [(0, '22.219')]
[2025-03-11 09:18:32,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2809856. Throughput: 0: 961.9. Samples: 702320. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:32,364][01034] Avg episode reward: [(0, '23.309')]
[2025-03-11 09:18:32,374][03372] Saving new best policy, reward=23.309!
[2025-03-11 09:18:35,677][03385] Updated weights for policy 0, policy_version 690 (0.0014)
[2025-03-11 09:18:37,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 2830336. Throughput: 0: 967.4. Samples: 708832. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:37,363][01034] Avg episode reward: [(0, '25.037')]
[2025-03-11 09:18:37,428][03372] Saving new best policy, reward=25.037!
[2025-03-11 09:18:42,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2850816. Throughput: 0: 957.8. Samples: 711672. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:42,364][01034] Avg episode reward: [(0, '24.633')]
[2025-03-11 09:18:46,855][03385] Updated weights for policy 0, policy_version 700 (0.0013)
[2025-03-11 09:18:47,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2867200. Throughput: 0: 960.0. Samples: 716738. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:18:47,365][01034] Avg episode reward: [(0, '24.956')]
[2025-03-11 09:18:52,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2891776. Throughput: 0: 960.5. Samples: 723268. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:52,360][01034] Avg episode reward: [(0, '24.991')]
[2025-03-11 09:18:57,362][01034] Fps is (10 sec: 3685.3, 60 sec: 3754.5, 300 sec: 3818.3). Total num frames: 2904064. Throughput: 0: 945.3. Samples: 725808. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:18:57,363][01034] Avg episode reward: [(0, '23.612')]
[2025-03-11 09:18:57,620][03385] Updated weights for policy 0, policy_version 710 (0.0014)
[2025-03-11 09:19:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 2928640. Throughput: 0: 967.4. Samples: 731396. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:19:02,360][01034] Avg episode reward: [(0, '23.018')]
[2025-03-11 09:19:07,147][03385] Updated weights for policy 0, policy_version 720 (0.0019)
[2025-03-11 09:19:07,359][01034] Fps is (10 sec: 4506.9, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 2949120. Throughput: 0: 969.9. Samples: 737980. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:19:07,360][01034] Avg episode reward: [(0, '23.072')]
[2025-03-11 09:19:12,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 2965504. Throughput: 0: 946.2. Samples: 740100. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:19:12,361][01034] Avg episode reward: [(0, '23.809')]
[2025-03-11 09:19:17,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 2985984. Throughput: 0: 972.1. Samples: 746064. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:19:17,360][01034] Avg episode reward: [(0, '23.134')]
[2025-03-11 09:19:17,985][03385] Updated weights for policy 0, policy_version 730 (0.0018)
[2025-03-11 09:19:22,361][01034] Fps is (10 sec: 4095.2, 60 sec: 3891.1, 300 sec: 3846.0). Total num frames: 3006464. Throughput: 0: 966.7. Samples: 752334. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:19:22,365][01034] Avg episode reward: [(0, '22.464')]
[2025-03-11 09:19:27,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3022848. Throughput: 0: 948.5. Samples: 754354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:19:27,369][01034] Avg episode reward: [(0, '22.534')]
[2025-03-11 09:19:28,971][03385] Updated weights for policy 0, policy_version 740 (0.0023)
[2025-03-11 09:19:32,359][01034] Fps is (10 sec: 3687.1, 60 sec: 3891.2, 300 sec: 3818.3). Total num frames: 3043328. Throughput: 0: 978.8. Samples: 760786. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-03-11 09:19:32,361][01034] Avg episode reward: [(0, '22.742')]
[2025-03-11 09:19:32,372][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000743_3043328.pth...
[2025-03-11 09:19:32,481][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000519_2125824.pth
[2025-03-11 09:19:37,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3063808. Throughput: 0: 963.9. Samples: 766642. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:19:37,362][01034] Avg episode reward: [(0, '21.805')]
[2025-03-11 09:19:39,753][03385] Updated weights for policy 0, policy_version 750 (0.0015)
[2025-03-11 09:19:42,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3080192. Throughput: 0: 957.6. Samples: 768898. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
[2025-03-11 09:19:42,360][01034] Avg episode reward: [(0, '22.283')]
[2025-03-11 09:19:47,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.3). Total num frames: 3100672. Throughput: 0: 977.3. Samples: 775376. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0)
[2025-03-11 09:19:47,369][01034] Avg episode reward: [(0, '22.004')]
[2025-03-11 09:19:49,345][03385] Updated weights for policy 0, policy_version 760 (0.0016)
[2025-03-11 09:19:52,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3121152. Throughput: 0: 952.1. Samples: 780826.
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:19:52,363][01034] Avg episode reward: [(0, '22.919')] [2025-03-11 09:19:57,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.4, 300 sec: 3832.2). Total num frames: 3137536. Throughput: 0: 959.0. Samples: 783254. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:19:57,361][01034] Avg episode reward: [(0, '22.393')] [2025-03-11 09:20:00,369][03385] Updated weights for policy 0, policy_version 770 (0.0015) [2025-03-11 09:20:02,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3162112. Throughput: 0: 971.7. Samples: 789790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:20:02,366][01034] Avg episode reward: [(0, '22.309')] [2025-03-11 09:20:07,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3822.9, 300 sec: 3846.1). Total num frames: 3178496. Throughput: 0: 948.7. Samples: 795024. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:20:07,363][01034] Avg episode reward: [(0, '21.732')] [2025-03-11 09:20:11,351][03385] Updated weights for policy 0, policy_version 780 (0.0021) [2025-03-11 09:20:12,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3198976. Throughput: 0: 969.6. Samples: 797986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:20:12,368][01034] Avg episode reward: [(0, '21.573')] [2025-03-11 09:20:17,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3219456. Throughput: 0: 969.6. Samples: 804420. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-03-11 09:20:17,365][01034] Avg episode reward: [(0, '20.326')] [2025-03-11 09:20:22,363][01034] Fps is (10 sec: 3275.5, 60 sec: 3754.5, 300 sec: 3818.3). Total num frames: 3231744. Throughput: 0: 948.8. Samples: 809340. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:20:22,364][01034] Avg episode reward: [(0, '21.796')] [2025-03-11 09:20:22,385][03385] Updated weights for policy 0, policy_version 790 (0.0015) [2025-03-11 09:20:27,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3256320. Throughput: 0: 970.9. Samples: 812588. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:20:27,366][01034] Avg episode reward: [(0, '22.442')] [2025-03-11 09:20:31,688][03385] Updated weights for policy 0, policy_version 800 (0.0019) [2025-03-11 09:20:32,362][01034] Fps is (10 sec: 4506.3, 60 sec: 3891.0, 300 sec: 3846.0). Total num frames: 3276800. Throughput: 0: 972.7. Samples: 819150. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:20:32,363][01034] Avg episode reward: [(0, '22.560')] [2025-03-11 09:20:37,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3822.9, 300 sec: 3832.2). Total num frames: 3293184. Throughput: 0: 960.4. Samples: 824044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:20:37,365][01034] Avg episode reward: [(0, '22.820')] [2025-03-11 09:20:42,359][01034] Fps is (10 sec: 3687.3, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3313664. Throughput: 0: 977.6. Samples: 827246. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:20:42,364][01034] Avg episode reward: [(0, '22.393')] [2025-03-11 09:20:42,711][03385] Updated weights for policy 0, policy_version 810 (0.0014) [2025-03-11 09:20:47,360][01034] Fps is (10 sec: 4095.6, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 3334144. Throughput: 0: 975.3. Samples: 833678. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-03-11 09:20:47,363][01034] Avg episode reward: [(0, '22.270')] [2025-03-11 09:20:52,360][01034] Fps is (10 sec: 3276.7, 60 sec: 3754.6, 300 sec: 3818.3). Total num frames: 3346432. Throughput: 0: 946.8. Samples: 837632. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-03-11 09:20:52,361][01034] Avg episode reward: [(0, '22.006')] [2025-03-11 09:20:55,266][03385] Updated weights for policy 0, policy_version 820 (0.0020) [2025-03-11 09:20:57,359][01034] Fps is (10 sec: 3277.1, 60 sec: 3822.9, 300 sec: 3818.3). Total num frames: 3366912. Throughput: 0: 934.6. Samples: 840044. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:20:57,365][01034] Avg episode reward: [(0, '21.347')] [2025-03-11 09:21:02,360][01034] Fps is (10 sec: 3686.2, 60 sec: 3686.3, 300 sec: 3818.3). Total num frames: 3383296. Throughput: 0: 929.7. Samples: 846256. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:21:02,361][01034] Avg episode reward: [(0, '21.657')] [2025-03-11 09:21:06,211][03385] Updated weights for policy 0, policy_version 830 (0.0016) [2025-03-11 09:21:07,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3818.3). Total num frames: 3403776. Throughput: 0: 936.4. Samples: 851476. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:21:07,360][01034] Avg episode reward: [(0, '23.577')] [2025-03-11 09:21:12,361][01034] Fps is (10 sec: 4095.6, 60 sec: 3754.5, 300 sec: 3818.4). Total num frames: 3424256. Throughput: 0: 935.9. Samples: 854704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:21:12,362][01034] Avg episode reward: [(0, '24.676')] [2025-03-11 09:21:16,425][03385] Updated weights for policy 0, policy_version 840 (0.0012) [2025-03-11 09:21:17,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3686.4, 300 sec: 3832.2). Total num frames: 3440640. Throughput: 0: 920.2. Samples: 860558. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-03-11 09:21:17,361][01034] Avg episode reward: [(0, '24.076')] [2025-03-11 09:21:22,359][01034] Fps is (10 sec: 3687.1, 60 sec: 3823.2, 300 sec: 3832.2). Total num frames: 3461120. Throughput: 0: 933.8. Samples: 866066. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:21:22,360][01034] Avg episode reward: [(0, '24.708')] [2025-03-11 09:21:26,553][03385] Updated weights for policy 0, policy_version 850 (0.0013) [2025-03-11 09:21:27,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3481600. Throughput: 0: 935.6. Samples: 869346. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:21:27,360][01034] Avg episode reward: [(0, '24.663')] [2025-03-11 09:21:32,360][01034] Fps is (10 sec: 3686.2, 60 sec: 3686.5, 300 sec: 3832.2). Total num frames: 3497984. Throughput: 0: 913.6. Samples: 874790. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:21:32,365][01034] Avg episode reward: [(0, '25.309')] [2025-03-11 09:21:32,374][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth... [2025-03-11 09:21:32,520][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000630_2580480.pth [2025-03-11 09:21:32,539][03372] Saving new best policy, reward=25.309! [2025-03-11 09:21:37,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3518464. Throughput: 0: 955.2. Samples: 880616. 
Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-03-11 09:21:37,361][01034] Avg episode reward: [(0, '23.279')] [2025-03-11 09:21:37,606][03385] Updated weights for policy 0, policy_version 860 (0.0018) [2025-03-11 09:21:42,359][01034] Fps is (10 sec: 4096.2, 60 sec: 3754.7, 300 sec: 3832.2). Total num frames: 3538944. Throughput: 0: 974.2. Samples: 883882. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:21:42,366][01034] Avg episode reward: [(0, '22.805')] [2025-03-11 09:21:47,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3686.5, 300 sec: 3832.2). Total num frames: 3555328. Throughput: 0: 948.3. Samples: 888930. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-03-11 09:21:47,361][01034] Avg episode reward: [(0, '21.107')] [2025-03-11 09:21:48,613][03385] Updated weights for policy 0, policy_version 870 (0.0017) [2025-03-11 09:21:52,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3579904. Throughput: 0: 975.1. Samples: 895354. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:21:52,365][01034] Avg episode reward: [(0, '20.436')] [2025-03-11 09:21:57,362][01034] Fps is (10 sec: 4504.3, 60 sec: 3891.0, 300 sec: 3846.0). Total num frames: 3600384. Throughput: 0: 975.3. Samples: 898592. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:21:57,364][01034] Avg episode reward: [(0, '20.275')] [2025-03-11 09:21:58,550][03385] Updated weights for policy 0, policy_version 880 (0.0015) [2025-03-11 09:22:02,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.3, 300 sec: 3846.1). Total num frames: 3616768. Throughput: 0: 955.6. Samples: 903560. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:22:02,361][01034] Avg episode reward: [(0, '19.930')] [2025-03-11 09:22:07,360][01034] Fps is (10 sec: 3687.1, 60 sec: 3891.1, 300 sec: 3846.1). Total num frames: 3637248. Throughput: 0: 980.2. Samples: 910176. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:07,361][01034] Avg episode reward: [(0, '21.139')] [2025-03-11 09:22:08,686][03385] Updated weights for policy 0, policy_version 890 (0.0017) [2025-03-11 09:22:12,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.3, 300 sec: 3860.0). Total num frames: 3657728. Throughput: 0: 979.3. Samples: 913414. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:22:12,367][01034] Avg episode reward: [(0, '21.358')] [2025-03-11 09:22:17,359][01034] Fps is (10 sec: 3686.8, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3674112. Throughput: 0: 966.9. Samples: 918300. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:17,361][01034] Avg episode reward: [(0, '23.270')] [2025-03-11 09:22:19,682][03385] Updated weights for policy 0, policy_version 900 (0.0020) [2025-03-11 09:22:22,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3694592. Throughput: 0: 980.8. Samples: 924752. Policy #0 lag: (min: 0.0, avg: 0.2, max: 1.0) [2025-03-11 09:22:22,361][01034] Avg episode reward: [(0, '24.580')] [2025-03-11 09:22:27,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3715072. Throughput: 0: 978.7. Samples: 927922. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:27,363][01034] Avg episode reward: [(0, '24.025')] [2025-03-11 09:22:30,552][03385] Updated weights for policy 0, policy_version 910 (0.0013) [2025-03-11 09:22:32,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3832.2). Total num frames: 3731456. Throughput: 0: 979.9. Samples: 933026. 
Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:32,367][01034] Avg episode reward: [(0, '23.612')] [2025-03-11 09:22:37,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3756032. Throughput: 0: 984.6. Samples: 939660. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:37,361][01034] Avg episode reward: [(0, '24.600')] [2025-03-11 09:22:40,360][03385] Updated weights for policy 0, policy_version 920 (0.0016) [2025-03-11 09:22:42,360][01034] Fps is (10 sec: 4095.8, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3772416. Throughput: 0: 973.1. Samples: 942380. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:42,363][01034] Avg episode reward: [(0, '24.428')] [2025-03-11 09:22:47,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3846.1). Total num frames: 3792896. Throughput: 0: 985.7. Samples: 947918. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:47,361][01034] Avg episode reward: [(0, '23.642')] [2025-03-11 09:22:50,755][03385] Updated weights for policy 0, policy_version 930 (0.0023) [2025-03-11 09:22:52,359][01034] Fps is (10 sec: 4096.2, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3813376. Throughput: 0: 985.4. Samples: 954518. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:22:52,360][01034] Avg episode reward: [(0, '23.639')] [2025-03-11 09:22:57,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3823.1, 300 sec: 3846.1). Total num frames: 3829760. Throughput: 0: 965.1. Samples: 956842. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:22:57,361][01034] Avg episode reward: [(0, '23.655')] [2025-03-11 09:23:01,515][03385] Updated weights for policy 0, policy_version 940 (0.0013) [2025-03-11 09:23:02,359][01034] Fps is (10 sec: 3686.3, 60 sec: 3891.2, 300 sec: 3846.1). Total num frames: 3850240. Throughput: 0: 988.4. Samples: 962780. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:23:02,362][01034] Avg episode reward: [(0, '23.083')] [2025-03-11 09:23:07,360][01034] Fps is (10 sec: 4505.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3874816. Throughput: 0: 989.1. Samples: 969264. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:23:07,361][01034] Avg episode reward: [(0, '22.205')] [2025-03-11 09:23:12,210][03385] Updated weights for policy 0, policy_version 950 (0.0018) [2025-03-11 09:23:12,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3891200. Throughput: 0: 965.1. Samples: 971352. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:23:12,360][01034] Avg episode reward: [(0, '21.617')] [2025-03-11 09:23:17,359][01034] Fps is (10 sec: 3686.6, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3911680. Throughput: 0: 992.8. Samples: 977704. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:23:17,363][01034] Avg episode reward: [(0, '21.506')] [2025-03-11 09:23:21,809][03385] Updated weights for policy 0, policy_version 960 (0.0023) [2025-03-11 09:23:22,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3932160. Throughput: 0: 980.4. Samples: 983776. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:23:22,360][01034] Avg episode reward: [(0, '20.765')] [2025-03-11 09:23:27,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3948544. Throughput: 0: 966.7. Samples: 985882. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:23:27,361][01034] Avg episode reward: [(0, '21.047')] [2025-03-11 09:23:32,359][01034] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3860.0). Total num frames: 3969024. Throughput: 0: 988.9. Samples: 992420. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:23:32,361][01034] Avg episode reward: [(0, '23.154')] [2025-03-11 09:23:32,381][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000970_3973120.pth... [2025-03-11 09:23:32,382][03385] Updated weights for policy 0, policy_version 970 (0.0025) [2025-03-11 09:23:32,510][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000743_3043328.pth [2025-03-11 09:23:37,359][01034] Fps is (10 sec: 4096.0, 60 sec: 3891.2, 300 sec: 3860.0). Total num frames: 3989504. Throughput: 0: 963.3. Samples: 997866. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:23:37,364][01034] Avg episode reward: [(0, '22.861')] [2025-03-11 09:23:41,646][03372] Stopping Batcher_0... [2025-03-11 09:23:41,646][03372] Loop batcher_evt_loop terminating... [2025-03-11 09:23:41,646][01034] Component Batcher_0 stopped! [2025-03-11 09:23:41,652][01034] Component RolloutWorker_w1 process died already! Don't wait for it. [2025-03-11 09:23:41,656][01034] Component RolloutWorker_w3 process died already! Don't wait for it. [2025-03-11 09:23:41,659][01034] Component RolloutWorker_w4 process died already! Don't wait for it. [2025-03-11 09:23:41,674][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-03-11 09:23:41,725][03385] Weights refcount: 2 0 [2025-03-11 09:23:41,728][03385] Stopping InferenceWorker_p0-w0... [2025-03-11 09:23:41,730][03385] Loop inference_proc0-0_evt_loop terminating... [2025-03-11 09:23:41,729][01034] Component InferenceWorker_p0-w0 stopped! [2025-03-11 09:23:41,795][03372] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000854_3497984.pth [2025-03-11 09:23:41,807][03372] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-03-11 09:23:41,869][01034] Component RolloutWorker_w0 stopped! [2025-03-11 09:23:41,874][03390] Stopping RolloutWorker_w0... [2025-03-11 09:23:41,877][03390] Loop rollout_proc0_evt_loop terminating... [2025-03-11 09:23:41,887][01034] Component RolloutWorker_w2 stopped! [2025-03-11 09:23:41,891][03387] Stopping RolloutWorker_w2... [2025-03-11 09:23:41,892][03387] Loop rollout_proc2_evt_loop terminating... [2025-03-11 09:23:41,913][01034] Component RolloutWorker_w6 stopped! [2025-03-11 09:23:41,912][03393] Stopping RolloutWorker_w6... [2025-03-11 09:23:41,927][03393] Loop rollout_proc6_evt_loop terminating... [2025-03-11 09:23:42,004][01034] Component LearnerWorker_p0 stopped! [2025-03-11 09:23:42,009][03372] Stopping LearnerWorker_p0... [2025-03-11 09:23:42,009][03372] Loop learner_proc0_evt_loop terminating... [2025-03-11 09:23:42,074][01034] Component RolloutWorker_w5 stopped! [2025-03-11 09:23:42,079][03389] Stopping RolloutWorker_w5... [2025-03-11 09:23:42,084][03389] Loop rollout_proc5_evt_loop terminating... [2025-03-11 09:23:42,161][01034] Component RolloutWorker_w7 stopped! [2025-03-11 09:23:42,166][01034] Waiting for process learner_proc0 to stop... [2025-03-11 09:23:42,170][03392] Stopping RolloutWorker_w7... [2025-03-11 09:23:42,170][03392] Loop rollout_proc7_evt_loop terminating... 
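Editor's note: three rollout workers (w1, w3, w4) had died earlier in the run; Sample Factory logs a warning for each and completes the shutdown with the remaining components, which is why only five rollout workers report a clean stop above. A quick way to audit a saved log for such silent worker crashes is a scan like the following sketch (the helper function and log path are illustrative, not part of Sample Factory):

import re

# Hypothetical helper: find components that crashed before shutdown,
# i.e. lines like "Component RolloutWorker_w1 process died already!".
DIED = re.compile(r"Component (\w+) process died already")

def dead_components(log_path: str) -> list[str]:
    with open(log_path) as f:
        return sorted({m.group(1) for line in f if (m := DIED.search(line))})

# Expected on this run: ['RolloutWorker_w1', 'RolloutWorker_w3', 'RolloutWorker_w4']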
[2025-03-11 09:23:43,713][01034] Waiting for process inference_proc0-0 to join... [2025-03-11 09:23:43,718][01034] Waiting for process rollout_proc0 to join... [2025-03-11 09:23:44,851][01034] Waiting for process rollout_proc1 to join... [2025-03-11 09:23:44,852][01034] Waiting for process rollout_proc2 to join... [2025-03-11 09:23:44,863][01034] Waiting for process rollout_proc3 to join... [2025-03-11 09:23:44,863][01034] Waiting for process rollout_proc4 to join... [2025-03-11 09:23:44,865][01034] Waiting for process rollout_proc5 to join... [2025-03-11 09:23:44,865][01034] Waiting for process rollout_proc6 to join... [2025-03-11 09:23:44,866][01034] Waiting for process rollout_proc7 to join...
[2025-03-11 09:23:44,867][01034] Batcher 0 profile tree view:
batching: 22.5871, releasing_batches: 0.0285
[2025-03-11 09:23:44,868][01034] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0035
  wait_policy_total: 416.0306
update_model: 9.1642
  weight_update: 0.0015
one_step: 0.0042
  handle_policy_step: 598.2390
    deserialize: 14.1729, stack: 3.5278, obs_to_device_normalize: 133.0906, forward: 315.3128, send_messages: 22.1451
    prepare_outputs: 84.0394
      to_cpu: 52.6837
[2025-03-11 09:23:44,869][01034] Learner 0 profile tree view:
misc: 0.0043, prepare_batch: 12.5155
train: 67.4967
  epoch_init: 0.0050, minibatch_init: 0.0071, losses_postprocess: 0.6212, kl_divergence: 0.5492, after_optimizer: 32.4452
  calculate_losses: 22.8513
    losses_init: 0.0035, forward_head: 1.1245, bptt_initial: 15.5519, tail: 0.8729, advantages_returns: 0.2691, losses: 3.0482
    bptt: 1.7739
      bptt_forward_core: 1.6912
  update: 10.4898
    clip: 0.8301
[2025-03-11 09:23:44,870][01034] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.3636, enqueue_policy_requests: 103.6549, env_step: 829.7321, overhead: 15.3435, complete_rollouts: 7.9711
save_policy_outputs: 23.8309
  split_output_tensors: 9.3333
[2025-03-11 09:23:44,872][01034] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.3100, enqueue_policy_requests: 193.4696, env_step: 758.0324, overhead: 14.6178, complete_rollouts: 5.7012
save_policy_outputs: 20.6164
  split_output_tensors: 8.0247
[2025-03-11 09:23:44,873][01034] Loop Runner_EvtLoop terminating...
[2025-03-11 09:23:44,874][01034] Runner profile tree view:
main_loop: 1085.5647
[2025-03-11 09:23:44,875][01034] Collected {0: 4005888}, FPS: 3690.1 [2025-03-11 09:23:50,377][01034] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-03-11 09:23:50,378][01034] Overriding arg 'num_workers' with value 1 passed from command line [2025-03-11 09:23:50,379][01034] Adding new argument 'no_render'=True that is not in the saved config file! [2025-03-11 09:23:50,380][01034] Adding new argument 'save_video'=True that is not in the saved config file! [2025-03-11 09:23:50,380][01034] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-03-11 09:23:50,381][01034] Adding new argument 'video_name'=None that is not in the saved config file! [2025-03-11 09:23:50,382][01034] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-03-11 09:23:50,382][01034] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-03-11 09:23:50,383][01034] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-03-11 09:23:50,384][01034] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-03-11 09:23:50,384][01034] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-03-11 09:23:50,385][01034] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-03-11 09:23:50,386][01034] Adding new argument 'train_script'=None that is not in the saved config file! [2025-03-11 09:23:50,387][01034] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-03-11 09:23:50,388][01034] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-03-11 09:23:50,440][01034] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:23:50,445][01034] RunningMeanStd input shape: (3, 72, 128) [2025-03-11 09:23:50,448][01034] RunningMeanStd input shape: (1,) [2025-03-11 09:23:50,470][01034] ConvEncoder: input_channels=3 [2025-03-11 09:23:50,629][01034] Conv encoder output size: 512 [2025-03-11 09:23:50,631][01034] Policy head output size: 512 [2025-03-11 09:23:50,956][01034] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-03-11 09:23:51,727][01034] Num frames 100... [2025-03-11 09:23:51,856][01034] Num frames 200... [2025-03-11 09:23:51,983][01034] Num frames 300... [2025-03-11 09:23:52,115][01034] Num frames 400... [2025-03-11 09:23:52,251][01034] Num frames 500... [2025-03-11 09:23:52,379][01034] Num frames 600... [2025-03-11 09:23:52,518][01034] Num frames 700... [2025-03-11 09:23:52,649][01034] Num frames 800... [2025-03-11 09:23:52,775][01034] Num frames 900... [2025-03-11 09:23:52,944][01034] Avg episode rewards: #0: 20.920, true rewards: #0: 9.920 [2025-03-11 09:23:52,945][01034] Avg episode reward: 20.920, avg true_objective: 9.920 [2025-03-11 09:23:52,960][01034] Num frames 1000... [2025-03-11 09:23:53,091][01034] Num frames 1100... [2025-03-11 09:23:53,232][01034] Num frames 1200... [2025-03-11 09:23:53,360][01034] Num frames 1300... [2025-03-11 09:23:53,496][01034] Num frames 1400... [2025-03-11 09:23:53,623][01034] Num frames 1500... [2025-03-11 09:23:53,683][01034] Avg episode rewards: #0: 16.020, true rewards: #0: 7.520 [2025-03-11 09:23:53,684][01034] Avg episode reward: 16.020, avg true_objective: 7.520 [2025-03-11 09:23:53,809][01034] Num frames 1600... [2025-03-11 09:23:53,940][01034] Num frames 1700... [2025-03-11 09:23:54,068][01034] Num frames 1800... [2025-03-11 09:23:54,204][01034] Num frames 1900... [2025-03-11 09:23:54,334][01034] Num frames 2000... [2025-03-11 09:23:54,468][01034] Num frames 2100... [2025-03-11 09:23:54,594][01034] Num frames 2200... [2025-03-11 09:23:54,719][01034] Num frames 2300... [2025-03-11 09:23:54,854][01034] Num frames 2400... [2025-03-11 09:23:54,983][01034] Num frames 2500... [2025-03-11 09:23:55,122][01034] Num frames 2600... [2025-03-11 09:23:55,259][01034] Num frames 2700... [2025-03-11 09:23:55,388][01034] Num frames 2800... [2025-03-11 09:23:55,559][01034] Avg episode rewards: #0: 20.267, true rewards: #0: 9.600 [2025-03-11 09:23:55,560][01034] Avg episode reward: 20.267, avg true_objective: 9.600 [2025-03-11 09:23:55,589][01034] Num frames 2900... [2025-03-11 09:23:55,727][01034] Num frames 3000... [2025-03-11 09:23:55,864][01034] Num frames 3100... [2025-03-11 09:23:56,013][01034] Avg episode rewards: #0: 16.433, true rewards: #0: 7.932 [2025-03-11 09:23:56,014][01034] Avg episode reward: 16.433, avg true_objective: 7.932 [2025-03-11 09:23:56,052][01034] Num frames 3200... [2025-03-11 09:23:56,178][01034] Num frames 3300... 
[2025-03-11 09:23:56,315][01034] Num frames 3400... [2025-03-11 09:23:56,458][01034] Num frames 3500... [2025-03-11 09:23:56,586][01034] Num frames 3600... [2025-03-11 09:23:56,716][01034] Num frames 3700... [2025-03-11 09:23:56,848][01034] Num frames 3800... [2025-03-11 09:23:56,977][01034] Num frames 3900... [2025-03-11 09:23:57,108][01034] Num frames 4000... [2025-03-11 09:23:57,235][01034] Num frames 4100... [2025-03-11 09:23:57,369][01034] Num frames 4200... [2025-03-11 09:23:57,505][01034] Num frames 4300... [2025-03-11 09:23:57,632][01034] Num frames 4400... [2025-03-11 09:23:57,762][01034] Num frames 4500... [2025-03-11 09:23:57,892][01034] Num frames 4600... [2025-03-11 09:23:58,023][01034] Num frames 4700... [2025-03-11 09:23:58,155][01034] Num frames 4800... [2025-03-11 09:23:58,283][01034] Num frames 4900... [2025-03-11 09:23:58,433][01034] Num frames 5000... [2025-03-11 09:23:58,565][01034] Num frames 5100... [2025-03-11 09:23:58,696][01034] Num frames 5200... [2025-03-11 09:23:58,845][01034] Avg episode rewards: #0: 24.146, true rewards: #0: 10.546 [2025-03-11 09:23:58,846][01034] Avg episode reward: 24.146, avg true_objective: 10.546 [2025-03-11 09:23:58,884][01034] Num frames 5300... [2025-03-11 09:23:59,012][01034] Num frames 5400... [2025-03-11 09:23:59,140][01034] Num frames 5500... [2025-03-11 09:23:59,270][01034] Num frames 5600... [2025-03-11 09:23:59,411][01034] Num frames 5700... [2025-03-11 09:23:59,562][01034] Num frames 5800... [2025-03-11 09:23:59,705][01034] Num frames 5900... [2025-03-11 09:23:59,844][01034] Num frames 6000... [2025-03-11 09:23:59,971][01034] Num frames 6100... [2025-03-11 09:24:00,103][01034] Num frames 6200... [2025-03-11 09:24:00,283][01034] Avg episode rewards: #0: 23.495, true rewards: #0: 10.495 [2025-03-11 09:24:00,284][01034] Avg episode reward: 23.495, avg true_objective: 10.495 [2025-03-11 09:24:00,290][01034] Num frames 6300... [2025-03-11 09:24:00,435][01034] Num frames 6400... [2025-03-11 09:24:00,562][01034] Num frames 6500... [2025-03-11 09:24:00,691][01034] Num frames 6600... [2025-03-11 09:24:00,822][01034] Num frames 6700... [2025-03-11 09:24:00,953][01034] Num frames 6800... [2025-03-11 09:24:01,124][01034] Num frames 6900... [2025-03-11 09:24:01,301][01034] Num frames 7000... [2025-03-11 09:24:01,483][01034] Num frames 7100... [2025-03-11 09:24:01,717][01034] Avg episode rewards: #0: 22.990, true rewards: #0: 10.276 [2025-03-11 09:24:01,720][01034] Avg episode reward: 22.990, avg true_objective: 10.276 [2025-03-11 09:24:01,735][01034] Num frames 7200... [2025-03-11 09:24:01,911][01034] Num frames 7300... [2025-03-11 09:24:02,090][01034] Num frames 7400... [2025-03-11 09:24:02,256][01034] Num frames 7500... [2025-03-11 09:24:02,438][01034] Num frames 7600... [2025-03-11 09:24:02,622][01034] Num frames 7700... [2025-03-11 09:24:02,796][01034] Num frames 7800... [2025-03-11 09:24:03,036][01034] Avg episode rewards: #0: 21.746, true rewards: #0: 9.871 [2025-03-11 09:24:03,037][01034] Avg episode reward: 21.746, avg true_objective: 9.871 [2025-03-11 09:24:03,042][01034] Num frames 7900... [2025-03-11 09:24:03,219][01034] Num frames 8000... [2025-03-11 09:24:03,347][01034] Num frames 8100... [2025-03-11 09:24:03,489][01034] Num frames 8200... [2025-03-11 09:24:03,567][01034] Avg episode rewards: #0: 20.019, true rewards: #0: 9.130 [2025-03-11 09:24:03,568][01034] Avg episode reward: 20.019, avg true_objective: 9.130 [2025-03-11 09:24:03,671][01034] Num frames 8300... [2025-03-11 09:24:03,803][01034] Num frames 8400... 
[2025-03-11 09:24:03,932][01034] Num frames 8500... [2025-03-11 09:24:04,063][01034] Num frames 8600... [2025-03-11 09:24:04,191][01034] Num frames 8700... [2025-03-11 09:24:04,319][01034] Num frames 8800... [2025-03-11 09:24:04,454][01034] Num frames 8900... [2025-03-11 09:24:04,589][01034] Num frames 9000... [2025-03-11 09:24:04,749][01034] Avg episode rewards: #0: 19.581, true rewards: #0: 9.081 [2025-03-11 09:24:04,750][01034] Avg episode reward: 19.581, avg true_objective: 9.081 [2025-03-11 09:24:55,832][01034] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-03-11 09:30:12,073][01034] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-03-11 09:30:12,074][01034] Overriding arg 'num_workers' with value 1 passed from command line [2025-03-11 09:30:12,075][01034] Adding new argument 'no_render'=True that is not in the saved config file! [2025-03-11 09:30:12,076][01034] Adding new argument 'save_video'=True that is not in the saved config file! [2025-03-11 09:30:12,076][01034] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-03-11 09:30:12,077][01034] Adding new argument 'video_name'=None that is not in the saved config file! [2025-03-11 09:30:12,078][01034] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-03-11 09:30:12,079][01034] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-03-11 09:30:12,080][01034] Adding new argument 'push_to_hub'=True that is not in the saved config file! [2025-03-11 09:30:12,080][01034] Adding new argument 'hf_repository'='so7en/Doom_unit8_2' that is not in the saved config file! [2025-03-11 09:30:12,081][01034] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-03-11 09:30:12,082][01034] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-03-11 09:30:12,083][01034] Adding new argument 'train_script'=None that is not in the saved config file! [2025-03-11 09:30:12,084][01034] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-03-11 09:30:12,085][01034] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-03-11 09:30:12,112][01034] RunningMeanStd input shape: (3, 72, 128) [2025-03-11 09:30:12,113][01034] RunningMeanStd input shape: (1,) [2025-03-11 09:30:12,124][01034] ConvEncoder: input_channels=3 [2025-03-11 09:30:12,159][01034] Conv encoder output size: 512 [2025-03-11 09:30:12,159][01034] Policy head output size: 512 [2025-03-11 09:30:12,178][01034] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-03-11 09:30:12,638][01034] Num frames 100... [2025-03-11 09:30:12,767][01034] Num frames 200... [2025-03-11 09:30:12,904][01034] Num frames 300... [2025-03-11 09:30:13,043][01034] Num frames 400... [2025-03-11 09:30:13,175][01034] Num frames 500... [2025-03-11 09:30:13,304][01034] Num frames 600... [2025-03-11 09:30:13,445][01034] Num frames 700... [2025-03-11 09:30:13,577][01034] Num frames 800... [2025-03-11 09:30:13,715][01034] Num frames 900... [2025-03-11 09:30:13,887][01034] Avg episode rewards: #0: 21.920, true rewards: #0: 9.920 [2025-03-11 09:30:13,889][01034] Avg episode reward: 21.920, avg true_objective: 9.920 [2025-03-11 09:30:13,901][01034] Num frames 1000... [2025-03-11 09:30:14,034][01034] Num frames 1100... [2025-03-11 09:30:14,167][01034] Num frames 1200... 
[2025-03-11 09:30:14,296][01034] Num frames 1300... [2025-03-11 09:30:14,437][01034] Num frames 1400... [2025-03-11 09:30:14,577][01034] Num frames 1500... [2025-03-11 09:30:14,743][01034] Avg episode rewards: #0: 18.890, true rewards: #0: 7.890 [2025-03-11 09:30:14,744][01034] Avg episode reward: 18.890, avg true_objective: 7.890 [2025-03-11 09:30:14,774][01034] Num frames 1600... [2025-03-11 09:30:14,901][01034] Num frames 1700... [2025-03-11 09:30:15,031][01034] Num frames 1800... [2025-03-11 09:30:15,167][01034] Num frames 1900... [2025-03-11 09:30:15,298][01034] Num frames 2000... [2025-03-11 09:30:15,409][01034] Avg episode rewards: #0: 14.807, true rewards: #0: 6.807 [2025-03-11 09:30:15,410][01034] Avg episode reward: 14.807, avg true_objective: 6.807 [2025-03-11 09:30:15,491][01034] Num frames 2100... [2025-03-11 09:30:15,620][01034] Num frames 2200... [2025-03-11 09:30:15,754][01034] Num frames 2300... [2025-03-11 09:30:15,884][01034] Num frames 2400... [2025-03-11 09:30:16,012][01034] Num frames 2500... [2025-03-11 09:30:16,145][01034] Num frames 2600... [2025-03-11 09:30:16,272][01034] Num frames 2700... [2025-03-11 09:30:16,406][01034] Num frames 2800... [2025-03-11 09:30:16,541][01034] Num frames 2900... [2025-03-11 09:30:16,682][01034] Num frames 3000... [2025-03-11 09:30:16,828][01034] Num frames 3100... [2025-03-11 09:30:16,973][01034] Avg episode rewards: #0: 18.405, true rewards: #0: 7.905 [2025-03-11 09:30:16,975][01034] Avg episode reward: 18.405, avg true_objective: 7.905 [2025-03-11 09:30:17,036][01034] Num frames 3200... [2025-03-11 09:30:17,171][01034] Num frames 3300... [2025-03-11 09:30:17,315][01034] Num frames 3400... [2025-03-11 09:30:17,451][01034] Num frames 3500... [2025-03-11 09:30:17,579][01034] Num frames 3600... [2025-03-11 09:30:17,704][01034] Num frames 3700... [2025-03-11 09:30:17,843][01034] Num frames 3800... [2025-03-11 09:30:18,022][01034] Avg episode rewards: #0: 17.396, true rewards: #0: 7.796 [2025-03-11 09:30:18,022][01034] Avg episode reward: 17.396, avg true_objective: 7.796 [2025-03-11 09:30:18,027][01034] Num frames 3900... [2025-03-11 09:30:18,155][01034] Num frames 4000... [2025-03-11 09:30:18,284][01034] Num frames 4100... [2025-03-11 09:30:18,424][01034] Num frames 4200... [2025-03-11 09:30:18,574][01034] Num frames 4300... [2025-03-11 09:30:18,747][01034] Num frames 4400... [2025-03-11 09:30:18,939][01034] Num frames 4500... [2025-03-11 09:30:19,129][01034] Num frames 4600... [2025-03-11 09:30:19,343][01034] Num frames 4700... [2025-03-11 09:30:19,526][01034] Num frames 4800... [2025-03-11 09:30:19,699][01034] Num frames 4900... [2025-03-11 09:30:19,888][01034] Num frames 5000... [2025-03-11 09:30:19,977][01034] Avg episode rewards: #0: 18.530, true rewards: #0: 8.363 [2025-03-11 09:30:19,978][01034] Avg episode reward: 18.530, avg true_objective: 8.363 [2025-03-11 09:30:20,113][01034] Num frames 5100... [2025-03-11 09:30:20,276][01034] Num frames 5200... [2025-03-11 09:30:20,443][01034] Num frames 5300... [2025-03-11 09:30:20,620][01034] Num frames 5400... [2025-03-11 09:30:20,796][01034] Num frames 5500... [2025-03-11 09:30:20,982][01034] Num frames 5600... [2025-03-11 09:30:21,162][01034] Num frames 5700... [2025-03-11 09:30:21,332][01034] Num frames 5800... [2025-03-11 09:30:21,469][01034] Num frames 5900... [2025-03-11 09:30:21,605][01034] Num frames 6000... [2025-03-11 09:30:21,733][01034] Num frames 6100... [2025-03-11 09:30:21,875][01034] Num frames 6200... [2025-03-11 09:30:22,006][01034] Num frames 6300... 
[2025-03-11 09:30:22,142][01034] Num frames 6400... [2025-03-11 09:30:22,293][01034] Num frames 6500... [2025-03-11 09:30:22,432][01034] Num frames 6600... [2025-03-11 09:30:22,562][01034] Num frames 6700... [2025-03-11 09:30:22,691][01034] Num frames 6800... [2025-03-11 09:30:22,822][01034] Num frames 6900... [2025-03-11 09:30:22,963][01034] Num frames 7000... [2025-03-11 09:30:23,094][01034] Num frames 7100... [2025-03-11 09:30:23,174][01034] Avg episode rewards: #0: 24.026, true rewards: #0: 10.169 [2025-03-11 09:30:23,175][01034] Avg episode reward: 24.026, avg true_objective: 10.169 [2025-03-11 09:30:23,291][01034] Num frames 7200... [2025-03-11 09:30:23,431][01034] Num frames 7300... [2025-03-11 09:30:23,562][01034] Num frames 7400... [2025-03-11 09:30:23,691][01034] Num frames 7500... [2025-03-11 09:30:23,820][01034] Num frames 7600... [2025-03-11 09:30:23,960][01034] Num frames 7700... [2025-03-11 09:30:24,091][01034] Num frames 7800... [2025-03-11 09:30:24,224][01034] Num frames 7900... [2025-03-11 09:30:24,351][01034] Num frames 8000... [2025-03-11 09:30:24,485][01034] Num frames 8100... [2025-03-11 09:30:24,594][01034] Avg episode rewards: #0: 24.177, true rewards: #0: 10.177 [2025-03-11 09:30:24,595][01034] Avg episode reward: 24.177, avg true_objective: 10.177 [2025-03-11 09:30:24,670][01034] Num frames 8200... [2025-03-11 09:30:24,800][01034] Num frames 8300... [2025-03-11 09:30:24,934][01034] Num frames 8400... [2025-03-11 09:30:25,066][01034] Num frames 8500... [2025-03-11 09:30:25,195][01034] Num frames 8600... [2025-03-11 09:30:25,321][01034] Num frames 8700... [2025-03-11 09:30:25,461][01034] Num frames 8800... [2025-03-11 09:30:25,586][01034] Num frames 8900... [2025-03-11 09:30:25,719][01034] Num frames 9000... [2025-03-11 09:30:25,846][01034] Num frames 9100... [2025-03-11 09:30:25,905][01034] Avg episode rewards: #0: 23.891, true rewards: #0: 10.113 [2025-03-11 09:30:25,906][01034] Avg episode reward: 23.891, avg true_objective: 10.113 [2025-03-11 09:30:26,036][01034] Num frames 9200... [2025-03-11 09:30:26,166][01034] Num frames 9300... [2025-03-11 09:30:26,296][01034] Num frames 9400... [2025-03-11 09:30:26,421][01034] Avg episode rewards: #0: 21.954, true rewards: #0: 9.454 [2025-03-11 09:30:26,422][01034] Avg episode reward: 21.954, avg true_objective: 9.454 [2025-03-11 09:31:19,097][01034] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-03-11 09:31:25,046][01034] The model has been pushed to https://huggingface.co/so7en/Doom_unit8_2 [2025-03-11 09:32:55,246][01034] Environment doom_basic already registered, overwriting... [2025-03-11 09:32:55,247][01034] Environment doom_two_colors_easy already registered, overwriting... [2025-03-11 09:32:55,253][01034] Environment doom_two_colors_hard already registered, overwriting... [2025-03-11 09:32:55,254][01034] Environment doom_dm already registered, overwriting... [2025-03-11 09:32:55,255][01034] Environment doom_dwango5 already registered, overwriting... [2025-03-11 09:32:55,256][01034] Environment doom_my_way_home_flat_actions already registered, overwriting... [2025-03-11 09:32:55,260][01034] Environment doom_defend_the_center_flat_actions already registered, overwriting... [2025-03-11 09:32:55,261][01034] Environment doom_my_way_home already registered, overwriting... [2025-03-11 09:32:55,264][01034] Environment doom_deadly_corridor already registered, overwriting... [2025-03-11 09:32:55,268][01034] Environment doom_defend_the_center already registered, overwriting... 
[2025-03-11 09:32:55,271][01034] Environment doom_defend_the_line already registered, overwriting... [2025-03-11 09:32:55,272][01034] Environment doom_health_gathering already registered, overwriting... [2025-03-11 09:32:55,274][01034] Environment doom_health_gathering_supreme already registered, overwriting... [2025-03-11 09:32:55,278][01034] Environment doom_battle already registered, overwriting... [2025-03-11 09:32:55,279][01034] Environment doom_battle2 already registered, overwriting... [2025-03-11 09:32:55,283][01034] Environment doom_duel_bots already registered, overwriting... [2025-03-11 09:32:55,284][01034] Environment doom_deathmatch_bots already registered, overwriting... [2025-03-11 09:32:55,289][01034] Environment doom_duel already registered, overwriting... [2025-03-11 09:32:55,290][01034] Environment doom_deathmatch_full already registered, overwriting... [2025-03-11 09:32:55,290][01034] Environment doom_benchmark already registered, overwriting... [2025-03-11 09:32:55,296][01034] register_encoder_factory: [2025-03-11 09:32:55,330][01034] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-03-11 09:32:55,338][01034] Overriding arg 'train_for_env_steps' with value 5000000 passed from command line [2025-03-11 09:32:55,350][01034] Experiment dir /content/train_dir/default_experiment already exists! [2025-03-11 09:32:55,353][01034] Resuming existing experiment from /content/train_dir/default_experiment... [2025-03-11 09:32:55,357][01034] Weights and Biases integration disabled [2025-03-11 09:32:55,361][01034] Environment var CUDA_VISIBLE_DEVICES is 0
[2025-03-11 09:32:59,335][01034] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=5000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-03-11 09:32:59,337][01034] Saving configuration to /content/train_dir/default_experiment/config.json... [2025-03-11 09:32:59,339][01034] Rollout worker 0 uses device cpu [2025-03-11 09:32:59,339][01034] Rollout worker 1 uses device cpu [2025-03-11 09:32:59,342][01034] Rollout worker 2 uses device cpu [2025-03-11 09:32:59,345][01034] Rollout worker 3 uses device cpu [2025-03-11 09:32:59,345][01034] Rollout worker 4 uses device cpu [2025-03-11 09:32:59,346][01034] Rollout worker 5 uses device cpu [2025-03-11 09:32:59,347][01034] Rollout worker 6 uses device cpu [2025-03-11 09:32:59,348][01034] Rollout worker 7 uses device cpu [2025-03-11 09:32:59,427][01034] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-03-11 09:32:59,427][01034] InferenceWorker_p0-w0: min num requests: 2 [2025-03-11 09:32:59,462][01034] Starting all processes... [2025-03-11 09:32:59,463][01034] Starting process learner_proc0 [2025-03-11 09:32:59,513][01034] Starting all processes...
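Editor's note: the resume above is driven entirely by the saved config.json. The file is reloaded, command-line values override individual keys (train_for_env_steps goes from 4000000 to 5000000), restart_behavior=resume reuses the existing experiment dir, and load_checkpoint_kind=latest makes the learner restore the newest checkpoint so the frame counter continues from 4,005,888. A minimal sketch of the override-and-save step, assuming only the path and values shown in the log (this is illustrative, not Sample Factory's actual implementation):

import json

cfg_path = "/content/train_dir/default_experiment/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

# Command-line override logged above: extend training from 4M to 5M env steps.
cfg["train_for_env_steps"] = 5_000_000

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=4)  # corresponds to "Saving configuration to ... config.json"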
[2025-03-11 09:32:59,519][01034] Starting process inference_proc0-0 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc0 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc1 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc2 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc3 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc4 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc5 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc6 [2025-03-11 09:32:59,524][01034] Starting process rollout_proc7 [2025-03-11 09:33:14,431][11994] Worker 5 uses CPU cores [1] [2025-03-11 09:33:14,697][11975] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-03-11 09:33:14,698][11975] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 [2025-03-11 09:33:14,762][11975] Num visible devices: 1 [2025-03-11 09:33:14,804][11988] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-03-11 09:33:14,805][11988] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 [2025-03-11 09:33:14,808][11975] Starting seed is not provided [2025-03-11 09:33:14,809][11975] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-03-11 09:33:14,810][11975] Initializing actor-critic model on device cuda:0 [2025-03-11 09:33:14,811][11975] RunningMeanStd input shape: (3, 72, 128) [2025-03-11 09:33:14,813][11975] RunningMeanStd input shape: (1,) [2025-03-11 09:33:14,839][11993] Worker 4 uses CPU cores [0] [2025-03-11 09:33:14,860][11975] ConvEncoder: input_channels=3 [2025-03-11 09:33:14,904][11988] Num visible devices: 1 [2025-03-11 09:33:14,965][11992] Worker 3 uses CPU cores [1] [2025-03-11 09:33:14,966][11995] Worker 7 uses CPU cores [1] [2025-03-11 09:33:15,098][11991] Worker 2 uses CPU cores [0] [2025-03-11 09:33:15,104][11989] Worker 0 uses CPU cores [0] [2025-03-11 09:33:15,127][11990] Worker 1 uses CPU cores [1] [2025-03-11 09:33:15,175][11996] Worker 6 uses CPU cores [0] [2025-03-11 09:33:15,204][11975] Conv encoder output size: 512 [2025-03-11 09:33:15,204][11975] Policy head output size: 512 [2025-03-11 09:33:15,221][11975] Created Actor Critic model with architecture:
[2025-03-11 09:33:15,221][11975] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-03-11 09:33:15,464][11975] Using optimizer
[2025-03-11 09:33:16,398][11975]
Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... [2025-03-11 09:33:16,441][11975] Loading model from checkpoint [2025-03-11 09:33:16,444][11975] Loaded experiment state at self.train_step=978, self.env_steps=4005888 [2025-03-11 09:33:16,444][11975] Initialized policy 0 weights for model version 978 [2025-03-11 09:33:16,447][11975] Using GPUs [0] for process 0 (actually maps to GPUs [0]) [2025-03-11 09:33:16,450][11975] LearnerWorker_p0 finished initialization! [2025-03-11 09:33:16,707][11988] RunningMeanStd input shape: (3, 72, 128) [2025-03-11 09:33:16,708][11988] RunningMeanStd input shape: (1,) [2025-03-11 09:33:16,720][11988] ConvEncoder: input_channels=3 [2025-03-11 09:33:16,820][11988] Conv encoder output size: 512 [2025-03-11 09:33:16,820][11988] Policy head output size: 512 [2025-03-11 09:33:16,855][01034] Inference worker 0-0 is ready! [2025-03-11 09:33:16,858][01034] All inference workers are ready! Signal rollout workers to start! [2025-03-11 09:33:17,136][11993] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:17,152][11995] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:17,153][11990] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:17,161][11991] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:17,170][11989] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:17,168][11996] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:17,180][11994] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:17,244][11992] Doom resolution: 160x120, resize resolution: (128, 72) [2025-03-11 09:33:18,192][11993] Decorrelating experience for 0 frames... [2025-03-11 09:33:18,200][11991] Decorrelating experience for 0 frames... [2025-03-11 09:33:18,492][11990] Decorrelating experience for 0 frames... [2025-03-11 09:33:18,491][11995] Decorrelating experience for 0 frames... [2025-03-11 09:33:18,523][11994] Decorrelating experience for 0 frames... [2025-03-11 09:33:19,418][01034] Heartbeat connected on Batcher_0 [2025-03-11 09:33:19,424][01034] Heartbeat connected on LearnerWorker_p0 [2025-03-11 09:33:19,466][01034] Heartbeat connected on InferenceWorker_p0-w0 [2025-03-11 09:33:19,525][11993] Decorrelating experience for 32 frames... [2025-03-11 09:33:19,563][11991] Decorrelating experience for 32 frames... [2025-03-11 09:33:19,607][11989] Decorrelating experience for 0 frames... [2025-03-11 09:33:20,362][01034] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-03-11 09:33:20,559][11995] Decorrelating experience for 32 frames... [2025-03-11 09:33:20,571][11990] Decorrelating experience for 32 frames... [2025-03-11 09:33:20,589][11992] Decorrelating experience for 0 frames... [2025-03-11 09:33:20,643][11994] Decorrelating experience for 32 frames... [2025-03-11 09:33:22,090][11996] Decorrelating experience for 0 frames... [2025-03-11 09:33:22,135][11989] Decorrelating experience for 32 frames... [2025-03-11 09:33:22,684][11993] Decorrelating experience for 64 frames... [2025-03-11 09:33:22,728][11991] Decorrelating experience for 64 frames... [2025-03-11 09:33:22,903][11990] Decorrelating experience for 64 frames... [2025-03-11 09:33:22,951][11994] Decorrelating experience for 64 frames... [2025-03-11 09:33:23,783][11995] Decorrelating experience for 64 frames... 
[2025-03-11 09:33:24,073][11996] Decorrelating experience for 32 frames... [2025-03-11 09:33:24,580][11990] Decorrelating experience for 96 frames... [2025-03-11 09:33:24,617][11989] Decorrelating experience for 64 frames... [2025-03-11 09:33:24,669][11993] Decorrelating experience for 96 frames... [2025-03-11 09:33:24,706][11991] Decorrelating experience for 96 frames... [2025-03-11 09:33:24,731][01034] Heartbeat connected on RolloutWorker_w1 [2025-03-11 09:33:24,934][01034] Heartbeat connected on RolloutWorker_w4 [2025-03-11 09:33:24,959][01034] Heartbeat connected on RolloutWorker_w2 [2025-03-11 09:33:25,178][11992] Decorrelating experience for 32 frames... [2025-03-11 09:33:25,363][01034] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-03-11 09:33:26,132][11995] Decorrelating experience for 96 frames... [2025-03-11 09:33:26,385][01034] Heartbeat connected on RolloutWorker_w7 [2025-03-11 09:33:27,002][11996] Decorrelating experience for 64 frames... [2025-03-11 09:33:27,316][11992] Decorrelating experience for 64 frames... [2025-03-11 09:33:28,643][11994] Decorrelating experience for 96 frames... [2025-03-11 09:33:29,174][01034] Heartbeat connected on RolloutWorker_w5 [2025-03-11 09:33:29,331][11975] Signal inference workers to stop experience collection... [2025-03-11 09:33:29,348][11988] InferenceWorker_p0-w0: stopping experience collection [2025-03-11 09:33:29,617][11992] Decorrelating experience for 96 frames... [2025-03-11 09:33:29,725][01034] Heartbeat connected on RolloutWorker_w3 [2025-03-11 09:33:29,861][11996] Decorrelating experience for 96 frames... [2025-03-11 09:33:29,982][01034] Heartbeat connected on RolloutWorker_w6 [2025-03-11 09:33:30,098][11989] Decorrelating experience for 96 frames... [2025-03-11 09:33:30,179][01034] Heartbeat connected on RolloutWorker_w0 [2025-03-11 09:33:30,362][01034] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 4005888. Throughput: 0: 197.8. Samples: 1978. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) [2025-03-11 09:33:30,363][01034] Avg episode reward: [(0, '3.446')] [2025-03-11 09:33:30,528][11975] Signal inference workers to resume experience collection... [2025-03-11 09:33:30,529][11988] InferenceWorker_p0-w0: resuming experience collection [2025-03-11 09:33:35,367][01034] Fps is (10 sec: 2456.7, 60 sec: 1637.8, 300 sec: 1637.8). Total num frames: 4030464. Throughput: 0: 465.6. Samples: 6986. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-03-11 09:33:35,370][01034] Avg episode reward: [(0, '9.019')] [2025-03-11 09:33:40,068][11988] Updated weights for policy 0, policy_version 988 (0.0013) [2025-03-11 09:33:40,362][01034] Fps is (10 sec: 4095.9, 60 sec: 2048.0, 300 sec: 2048.0). Total num frames: 4046848. Throughput: 0: 447.7. Samples: 8954. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:33:40,367][01034] Avg episode reward: [(0, '12.408')] [2025-03-11 09:33:45,362][01034] Fps is (10 sec: 4098.2, 60 sec: 2621.4, 300 sec: 2621.4). Total num frames: 4071424. Throughput: 0: 620.9. Samples: 15522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:33:45,366][01034] Avg episode reward: [(0, '15.496')] [2025-03-11 09:33:48,882][11988] Updated weights for policy 0, policy_version 998 (0.0021) [2025-03-11 09:33:50,362][01034] Fps is (10 sec: 4505.7, 60 sec: 2867.2, 300 sec: 2867.2). Total num frames: 4091904. Throughput: 0: 733.2. Samples: 21996. 
Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:33:50,363][01034] Avg episode reward: [(0, '16.786')] [2025-03-11 09:33:55,362][01034] Fps is (10 sec: 3276.8, 60 sec: 2808.7, 300 sec: 2808.7). Total num frames: 4104192. Throughput: 0: 688.9. Samples: 24112. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:33:55,363][01034] Avg episode reward: [(0, '18.035')] [2025-03-11 09:33:59,613][11988] Updated weights for policy 0, policy_version 1008 (0.0014) [2025-03-11 09:34:00,362][01034] Fps is (10 sec: 3686.4, 60 sec: 3072.0, 300 sec: 3072.0). Total num frames: 4128768. Throughput: 0: 756.0. Samples: 30240. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:34:00,363][01034] Avg episode reward: [(0, '21.810')] [2025-03-11 09:34:05,362][01034] Fps is (10 sec: 4915.2, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 4153344. Throughput: 0: 832.1. Samples: 37446. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:34:05,365][01034] Avg episode reward: [(0, '22.686')] [2025-03-11 09:34:10,125][11988] Updated weights for policy 0, policy_version 1018 (0.0017) [2025-03-11 09:34:10,362][01034] Fps is (10 sec: 4096.0, 60 sec: 3276.8, 300 sec: 3276.8). Total num frames: 4169728. Throughput: 0: 882.4. Samples: 39708. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:34:10,363][01034] Avg episode reward: [(0, '23.657')] [2025-03-11 09:34:15,362][01034] Fps is (10 sec: 4096.0, 60 sec: 3425.7, 300 sec: 3425.7). Total num frames: 4194304. Throughput: 0: 982.0. Samples: 46168. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-03-11 09:34:15,363][01034] Avg episode reward: [(0, '25.134')] [2025-03-11 09:34:18,601][11988] Updated weights for policy 0, policy_version 1028 (0.0017) [2025-03-11 09:34:20,362][01034] Fps is (10 sec: 4505.5, 60 sec: 3481.6, 300 sec: 3481.6). Total num frames: 4214784. Throughput: 0: 1029.5. Samples: 53308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:34:20,363][01034] Avg episode reward: [(0, '25.168')] [2025-03-11 09:34:25,362][01034] Fps is (10 sec: 3686.4, 60 sec: 3754.8, 300 sec: 3465.8). Total num frames: 4231168. Throughput: 0: 1034.4. Samples: 55500. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:34:25,363][01034] Avg episode reward: [(0, '25.489')] [2025-03-11 09:34:25,371][11975] Saving new best policy, reward=25.489! [2025-03-11 09:34:28,907][11988] Updated weights for policy 0, policy_version 1038 (0.0029) [2025-03-11 09:34:30,362][01034] Fps is (10 sec: 4096.1, 60 sec: 4164.3, 300 sec: 3569.4). Total num frames: 4255744. Throughput: 0: 1029.4. Samples: 61844. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:34:30,368][01034] Avg episode reward: [(0, '25.401')] [2025-03-11 09:34:35,362][01034] Fps is (10 sec: 3686.4, 60 sec: 3959.8, 300 sec: 3495.3). Total num frames: 4268032. Throughput: 0: 992.1. Samples: 66642. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:34:35,366][01034] Avg episode reward: [(0, '25.305')] [2025-03-11 09:34:40,362][01034] Fps is (10 sec: 2867.2, 60 sec: 3959.5, 300 sec: 3481.6). Total num frames: 4284416. Throughput: 0: 992.8. Samples: 68786. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-03-11 09:34:40,365][01034] Avg episode reward: [(0, '25.804')] [2025-03-11 09:34:40,369][11975] Saving new best policy, reward=25.804! [2025-03-11 09:34:41,484][11988] Updated weights for policy 0, policy_version 1048 (0.0018) [2025-03-11 09:34:45,362][01034] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3565.9). Total num frames: 4308992. 
Throughput: 0: 1001.5. Samples: 75306. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:34:45,365][01034] Avg episode reward: [(0, '26.943')] [2025-03-11 09:34:45,371][11975] Saving new best policy, reward=26.943! [2025-03-11 09:34:50,362][01034] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3595.4). Total num frames: 4329472. Throughput: 0: 993.6. Samples: 82158. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:34:50,367][01034] Avg episode reward: [(0, '26.996')] [2025-03-11 09:34:50,385][11975] Saving new best policy, reward=26.996! [2025-03-11 09:34:50,393][11988] Updated weights for policy 0, policy_version 1058 (0.0023) [2025-03-11 09:34:55,362][01034] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3578.6). Total num frames: 4345856. Throughput: 0: 991.2. Samples: 84314. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:34:55,364][01034] Avg episode reward: [(0, '27.098')] [2025-03-11 09:34:55,444][11975] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001062_4349952.pth... [2025-03-11 09:34:55,589][11975] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000970_3973120.pth [2025-03-11 09:34:55,603][11975] Saving new best policy, reward=27.098! [2025-03-11 09:35:00,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3645.4). Total num frames: 4370432. Throughput: 0: 986.2. Samples: 90548. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:35:00,367][01034] Avg episode reward: [(0, '27.904')] [2025-03-11 09:35:00,372][11975] Saving new best policy, reward=27.904! [2025-03-11 09:35:00,612][11988] Updated weights for policy 0, policy_version 1068 (0.0048) [2025-03-11 09:35:05,362][01034] Fps is (10 sec: 4505.6, 60 sec: 3959.5, 300 sec: 3666.9). Total num frames: 4390912. Throughput: 0: 980.5. Samples: 97430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:35:05,366][01034] Avg episode reward: [(0, '29.205')] [2025-03-11 09:35:05,372][11975] Saving new best policy, reward=29.205! [2025-03-11 09:35:10,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3686.4). Total num frames: 4411392. Throughput: 0: 978.3. Samples: 99522. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:35:10,367][01034] Avg episode reward: [(0, '29.362')] [2025-03-11 09:35:10,370][11975] Saving new best policy, reward=29.362! [2025-03-11 09:35:11,248][11988] Updated weights for policy 0, policy_version 1078 (0.0022) [2025-03-11 09:35:15,362][01034] Fps is (10 sec: 4096.0, 60 sec: 3959.5, 300 sec: 3704.2). Total num frames: 4431872. Throughput: 0: 985.6. Samples: 106194. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:35:15,366][01034] Avg episode reward: [(0, '27.649')] [2025-03-11 09:35:20,292][11988] Updated weights for policy 0, policy_version 1088 (0.0020) [2025-03-11 09:35:20,364][01034] Fps is (10 sec: 4504.4, 60 sec: 4027.6, 300 sec: 3754.6). Total num frames: 4456448. Throughput: 0: 1024.1. Samples: 112730. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-03-11 09:35:20,367][01034] Avg episode reward: [(0, '26.766')] [2025-03-11 09:35:25,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4027.7, 300 sec: 3735.6). Total num frames: 4472832. Throughput: 0: 1024.6. Samples: 114892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:35:25,366][01034] Avg episode reward: [(0, '26.297')] [2025-03-11 09:35:30,362][01034] Fps is (10 sec: 3687.3, 60 sec: 3959.5, 300 sec: 3749.4). Total num frames: 4493312. Throughput: 0: 1024.0. Samples: 121386. 
Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-03-11 09:35:30,363][01034] Avg episode reward: [(0, '24.421')] [2025-03-11 09:35:30,760][11988] Updated weights for policy 0, policy_version 1098 (0.0034) [2025-03-11 09:35:35,368][01034] Fps is (10 sec: 4093.3, 60 sec: 4095.6, 300 sec: 3762.1). Total num frames: 4513792. Throughput: 0: 1015.5. Samples: 127864. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:35:35,370][01034] Avg episode reward: [(0, '24.652')] [2025-03-11 09:35:40,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3774.2). Total num frames: 4534272. Throughput: 0: 1016.6. Samples: 130060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-03-11 09:35:40,365][01034] Avg episode reward: [(0, '23.710')] [2025-03-11 09:35:41,360][11988] Updated weights for policy 0, policy_version 1108 (0.0024) [2025-03-11 09:35:45,362][01034] Fps is (10 sec: 4098.7, 60 sec: 4096.0, 300 sec: 3785.3). Total num frames: 4554752. Throughput: 0: 1028.6. Samples: 136834. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:35:45,366][01034] Avg episode reward: [(0, '24.926')] [2025-03-11 09:35:50,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3795.6). Total num frames: 4575232. Throughput: 0: 1021.3. Samples: 143388. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:35:50,366][01034] Avg episode reward: [(0, '24.932')] [2025-03-11 09:35:50,596][11988] Updated weights for policy 0, policy_version 1118 (0.0018) [2025-03-11 09:35:55,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4164.3, 300 sec: 3805.3). Total num frames: 4595712. Throughput: 0: 1021.7. Samples: 145500. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-03-11 09:35:55,368][01034] Avg episode reward: [(0, '24.212')] [2025-03-11 09:36:00,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4096.0, 300 sec: 3814.4). Total num frames: 4616192. Throughput: 0: 1022.8. Samples: 152220. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:36:00,367][01034] Avg episode reward: [(0, '24.105')] [2025-03-11 09:36:00,587][11988] Updated weights for policy 0, policy_version 1128 (0.0021) [2025-03-11 09:36:05,364][01034] Fps is (10 sec: 4095.2, 60 sec: 4095.9, 300 sec: 3822.9). Total num frames: 4636672. Throughput: 0: 1020.2. Samples: 158640. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-03-11 09:36:05,365][01034] Avg episode reward: [(0, '24.591')] [2025-03-11 09:36:10,362][01034] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3806.9). Total num frames: 4653056. Throughput: 0: 1018.2. Samples: 160712. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:36:10,364][01034] Avg episode reward: [(0, '25.626')] [2025-03-11 09:36:11,436][11988] Updated weights for policy 0, policy_version 1138 (0.0016) [2025-03-11 09:36:15,362][01034] Fps is (10 sec: 4096.8, 60 sec: 4096.0, 300 sec: 3838.5). Total num frames: 4677632. Throughput: 0: 1020.0. Samples: 167286. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:36:15,365][01034] Avg episode reward: [(0, '26.171')] [2025-03-11 09:36:20,364][01034] Fps is (10 sec: 4504.6, 60 sec: 4027.8, 300 sec: 3845.6). Total num frames: 4698112. Throughput: 0: 1011.0. Samples: 173354. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:36:20,365][01034] Avg episode reward: [(0, '27.801')] [2025-03-11 09:36:21,961][11988] Updated weights for policy 0, policy_version 1148 (0.0024) [2025-03-11 09:36:25,362][01034] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3830.3). Total num frames: 4714496. Throughput: 0: 1007.4. Samples: 175392. 
Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:36:25,365][01034] Avg episode reward: [(0, '29.018')] [2025-03-11 09:36:30,362][01034] Fps is (10 sec: 4097.0, 60 sec: 4096.0, 300 sec: 3858.9). Total num frames: 4739072. Throughput: 0: 1007.9. Samples: 182188. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:36:30,363][01034] Avg episode reward: [(0, '27.383')] [2025-03-11 09:36:31,344][11988] Updated weights for policy 0, policy_version 1158 (0.0017) [2025-03-11 09:36:35,362][01034] Fps is (10 sec: 4095.8, 60 sec: 4028.1, 300 sec: 3843.9). Total num frames: 4755456. Throughput: 0: 997.2. Samples: 188262. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0) [2025-03-11 09:36:35,364][01034] Avg episode reward: [(0, '26.201')] [2025-03-11 09:36:40,362][01034] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3850.2). Total num frames: 4775936. Throughput: 0: 995.5. Samples: 190296. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0) [2025-03-11 09:36:40,363][01034] Avg episode reward: [(0, '25.372')] [2025-03-11 09:36:42,078][11988] Updated weights for policy 0, policy_version 1168 (0.0029) [2025-03-11 09:36:45,362][01034] Fps is (10 sec: 4096.2, 60 sec: 4027.7, 300 sec: 3856.2). Total num frames: 4796416. Throughput: 0: 1000.5. Samples: 197244. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0) [2025-03-11 09:36:45,363][01034] Avg episode reward: [(0, '24.422')] [2025-03-11 09:36:50,363][01034] Fps is (10 sec: 4095.7, 60 sec: 4027.7, 300 sec: 3861.9). Total num frames: 4816896. Throughput: 0: 992.2. Samples: 203286. Policy #0 lag: (min: 0.0, avg: 0.4, max: 2.0) [2025-03-11 09:36:50,364][01034] Avg episode reward: [(0, '24.667')] [2025-03-11 09:36:52,769][11988] Updated weights for policy 0, policy_version 1178 (0.0017) [2025-03-11 09:36:55,362][01034] Fps is (10 sec: 3686.4, 60 sec: 3959.5, 300 sec: 3848.3). Total num frames: 4833280. Throughput: 0: 992.8. Samples: 205390. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0) [2025-03-11 09:36:55,365][01034] Avg episode reward: [(0, '25.056')] [2025-03-11 09:36:55,375][11975] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001180_4833280.pth... [2025-03-11 09:36:55,572][11975] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth [2025-03-11 09:37:00,362][01034] Fps is (10 sec: 4096.4, 60 sec: 4027.7, 300 sec: 3872.6). Total num frames: 4857856. Throughput: 0: 996.9. Samples: 212146. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-03-11 09:37:00,367][01034] Avg episode reward: [(0, '26.580')] [2025-03-11 09:37:01,909][11988] Updated weights for policy 0, policy_version 1188 (0.0013) [2025-03-11 09:37:05,362][01034] Fps is (10 sec: 4096.0, 60 sec: 3959.6, 300 sec: 3859.3). Total num frames: 4874240. Throughput: 0: 993.3. Samples: 218050. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-03-11 09:37:05,369][01034] Avg episode reward: [(0, '26.225')] [2025-03-11 09:37:10,362][01034] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3864.5). Total num frames: 4894720. Throughput: 0: 997.3. Samples: 220272. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:37:10,365][01034] Avg episode reward: [(0, '25.605')] [2025-03-11 09:37:12,703][11988] Updated weights for policy 0, policy_version 1198 (0.0013) [2025-03-11 09:37:15,362][01034] Fps is (10 sec: 4505.6, 60 sec: 4027.7, 300 sec: 3886.8). Total num frames: 4919296. Throughput: 0: 1000.5. Samples: 227212. 
Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-03-11 09:37:15,365][01034] Avg episode reward: [(0, '26.486')] [2025-03-11 09:37:20,364][01034] Fps is (10 sec: 4095.0, 60 sec: 3959.5, 300 sec: 3874.1). Total num frames: 4935680. Throughput: 0: 996.2. Samples: 233092. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-03-11 09:37:20,365][01034] Avg episode reward: [(0, '25.940')] [2025-03-11 09:37:23,538][11988] Updated weights for policy 0, policy_version 1208 (0.0034) [2025-03-11 09:37:25,362][01034] Fps is (10 sec: 3686.4, 60 sec: 4027.7, 300 sec: 3878.7). Total num frames: 4956160. Throughput: 0: 1000.9. Samples: 235336. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0) [2025-03-11 09:37:25,365][01034] Avg episode reward: [(0, '26.098')] [2025-03-11 09:37:30,362][01034] Fps is (10 sec: 4097.0, 60 sec: 3959.5, 300 sec: 3883.0). Total num frames: 4976640. Throughput: 0: 1004.2. Samples: 242432. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0) [2025-03-11 09:37:30,363][01034] Avg episode reward: [(0, '26.919')] [2025-03-11 09:37:32,059][11988] Updated weights for policy 0, policy_version 1218 (0.0021) [2025-03-11 09:37:35,362][01034] Fps is (10 sec: 4096.0, 60 sec: 4027.8, 300 sec: 3887.2). Total num frames: 4997120. Throughput: 0: 1004.7. Samples: 248496. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0) [2025-03-11 09:37:35,365][01034] Avg episode reward: [(0, '27.347')] [2025-03-11 09:37:37,397][11975] Stopping Batcher_0... [2025-03-11 09:37:37,397][11975] Loop batcher_evt_loop terminating... [2025-03-11 09:37:37,397][01034] Component Batcher_0 stopped! [2025-03-11 09:37:37,399][11975] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2025-03-11 09:37:37,458][11988] Weights refcount: 2 0 [2025-03-11 09:37:37,463][01034] Component InferenceWorker_p0-w0 stopped! [2025-03-11 09:37:37,465][11988] Stopping InferenceWorker_p0-w0... [2025-03-11 09:37:37,467][11988] Loop inference_proc0-0_evt_loop terminating... [2025-03-11 09:37:37,521][11975] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001062_4349952.pth [2025-03-11 09:37:37,530][11975] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2025-03-11 09:37:37,711][11975] Stopping LearnerWorker_p0... [2025-03-11 09:37:37,712][01034] Component LearnerWorker_p0 stopped! [2025-03-11 09:37:37,715][11975] Loop learner_proc0_evt_loop terminating... [2025-03-11 09:37:37,767][01034] Component RolloutWorker_w6 stopped! [2025-03-11 09:37:37,768][11996] Stopping RolloutWorker_w6... [2025-03-11 09:37:37,770][01034] Component RolloutWorker_w4 stopped! [2025-03-11 09:37:37,772][11993] Stopping RolloutWorker_w4... [2025-03-11 09:37:37,768][11996] Loop rollout_proc6_evt_loop terminating... [2025-03-11 09:37:37,772][11993] Loop rollout_proc4_evt_loop terminating... [2025-03-11 09:37:37,781][01034] Component RolloutWorker_w2 stopped! [2025-03-11 09:37:37,782][11991] Stopping RolloutWorker_w2... [2025-03-11 09:37:37,788][11991] Loop rollout_proc2_evt_loop terminating... [2025-03-11 09:37:37,808][01034] Component RolloutWorker_w0 stopped! [2025-03-11 09:37:37,808][11989] Stopping RolloutWorker_w0... [2025-03-11 09:37:37,811][11989] Loop rollout_proc0_evt_loop terminating... [2025-03-11 09:37:37,955][11994] Stopping RolloutWorker_w5... [2025-03-11 09:37:37,955][11994] Loop rollout_proc5_evt_loop terminating... [2025-03-11 09:37:37,963][11995] Stopping RolloutWorker_w7... [2025-03-11 09:37:37,963][11995] Loop rollout_proc7_evt_loop terminating... 
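As the components shut down above, the learner writes checkpoint_000001222_5005312.pth (policy version 1222 at 5,005,312 env steps) and prunes the older checkpoint_000001062_4349952.pth, keeping only a few recent files in the directory. For a quick sanity check, such a file can be opened with plain PyTorch; the key names below are assumptions inferred from the "Loaded experiment state at self.train_step=978, self.env_steps=4005888" entry earlier:

import torch

# Hedged sketch: key names assumed from the "Loaded experiment state" log entry.
ckpt = torch.load(
    "/content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth",
    map_location="cpu",
)
print(ckpt.get("train_step"), ckpt.get("env_steps"))  # expected: 1222 5005312
print(list(ckpt.get("model", {}).keys())[:3])         # first few policy tensors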
[2025-03-11 09:37:37,963][01034] Component RolloutWorker_w5 stopped! [2025-03-11 09:37:37,966][01034] Component RolloutWorker_w7 stopped! [2025-03-11 09:37:37,982][11992] Stopping RolloutWorker_w3... [2025-03-11 09:37:37,982][01034] Component RolloutWorker_w3 stopped! [2025-03-11 09:37:37,993][11992] Loop rollout_proc3_evt_loop terminating... [2025-03-11 09:37:38,023][11990] Stopping RolloutWorker_w1... [2025-03-11 09:37:38,026][11990] Loop rollout_proc1_evt_loop terminating... [2025-03-11 09:37:38,023][01034] Component RolloutWorker_w1 stopped! [2025-03-11 09:37:38,027][01034] Waiting for process learner_proc0 to stop... [2025-03-11 09:37:39,434][01034] Waiting for process inference_proc0-0 to join... [2025-03-11 09:37:39,435][01034] Waiting for process rollout_proc0 to join... [2025-03-11 09:37:41,580][01034] Waiting for process rollout_proc1 to join... [2025-03-11 09:37:41,604][01034] Waiting for process rollout_proc2 to join... [2025-03-11 09:37:41,608][01034] Waiting for process rollout_proc3 to join... [2025-03-11 09:37:41,610][01034] Waiting for process rollout_proc4 to join... [2025-03-11 09:37:41,612][01034] Waiting for process rollout_proc5 to join... [2025-03-11 09:37:41,613][01034] Waiting for process rollout_proc6 to join... [2025-03-11 09:37:41,614][01034] Waiting for process rollout_proc7 to join... [2025-03-11 09:37:41,616][01034] Batcher 0 profile tree view: batching: 6.6567, releasing_batches: 0.0064 [2025-03-11 09:37:41,617][01034] InferenceWorker_p0-w0 profile tree view: wait_policy: 0.0000 wait_policy_total: 106.0202 update_model: 2.0983 weight_update: 0.0023 one_step: 0.0027 handle_policy_step: 142.7545 deserialize: 3.5692, stack: 0.7917, obs_to_device_normalize: 30.2719, forward: 73.4046, send_messages: 6.8078 prepare_outputs: 21.8253 to_cpu: 13.4822 [2025-03-11 09:37:41,618][01034] Learner 0 profile tree view: misc: 0.0010, prepare_batch: 4.1704 train: 20.0778 epoch_init: 0.0011, minibatch_init: 0.0014, losses_postprocess: 0.1680, kl_divergence: 0.1749, after_optimizer: 0.7446 calculate_losses: 6.6746 losses_init: 0.0008, forward_head: 0.6318, bptt_initial: 4.0933, tail: 0.3026, advantages_returns: 0.0758, losses: 0.9827 bptt: 0.5234 bptt_forward_core: 0.5085 update: 12.1652 clip: 0.2320 [2025-03-11 09:37:41,619][01034] RolloutWorker_w0 profile tree view: wait_for_trajectories: 0.0540, enqueue_policy_requests: 23.6359, env_step: 197.9996, overhead: 2.9385, complete_rollouts: 1.6932 save_policy_outputs: 4.2113 split_output_tensors: 1.5586 [2025-03-11 09:37:41,621][01034] RolloutWorker_w7 profile tree view: wait_for_trajectories: 0.0829, enqueue_policy_requests: 25.4075, env_step: 197.3198, overhead: 3.0436, complete_rollouts: 1.7271 save_policy_outputs: 4.7781 split_output_tensors: 1.9377 [2025-03-11 09:37:41,623][01034] Loop Runner_EvtLoop terminating... [2025-03-11 09:37:41,624][01034] Runner profile tree view: main_loop: 282.1627 [2025-03-11 09:37:41,625][01034] Collected {0: 5005312}, FPS: 3542.0 [2025-03-11 09:38:17,617][01034] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-03-11 09:38:17,619][01034] Overriding arg 'num_workers' with value 1 passed from command line [2025-03-11 09:38:17,620][01034] Adding new argument 'no_render'=True that is not in the saved config file! [2025-03-11 09:38:17,621][01034] Adding new argument 'save_video'=True that is not in the saved config file! [2025-03-11 09:38:17,622][01034] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! 
[2025-03-11 09:38:17,623][01034] Adding new argument 'video_name'=None that is not in the saved config file! [2025-03-11 09:38:17,624][01034] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! [2025-03-11 09:38:17,625][01034] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-03-11 09:38:17,626][01034] Adding new argument 'push_to_hub'=False that is not in the saved config file! [2025-03-11 09:38:17,626][01034] Adding new argument 'hf_repository'=None that is not in the saved config file! [2025-03-11 09:38:17,627][01034] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-03-11 09:38:17,628][01034] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-03-11 09:38:17,629][01034] Adding new argument 'train_script'=None that is not in the saved config file! [2025-03-11 09:38:17,629][01034] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-03-11 09:38:17,630][01034] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-03-11 09:38:17,669][01034] RunningMeanStd input shape: (3, 72, 128) [2025-03-11 09:38:17,671][01034] RunningMeanStd input shape: (1,) [2025-03-11 09:38:17,690][01034] ConvEncoder: input_channels=3 [2025-03-11 09:38:17,728][01034] Conv encoder output size: 512 [2025-03-11 09:38:17,729][01034] Policy head output size: 512 [2025-03-11 09:38:17,748][01034] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2025-03-11 09:38:18,162][01034] Num frames 100... [2025-03-11 09:38:18,290][01034] Num frames 200... [2025-03-11 09:38:18,425][01034] Num frames 300... [2025-03-11 09:38:18,570][01034] Num frames 400... [2025-03-11 09:38:18,705][01034] Num frames 500... [2025-03-11 09:38:18,842][01034] Num frames 600... [2025-03-11 09:38:18,973][01034] Num frames 700... [2025-03-11 09:38:19,104][01034] Num frames 800... [2025-03-11 09:38:19,233][01034] Num frames 900... [2025-03-11 09:38:19,362][01034] Num frames 1000... [2025-03-11 09:38:19,500][01034] Num frames 1100... [2025-03-11 09:38:19,636][01034] Num frames 1200... [2025-03-11 09:38:19,780][01034] Num frames 1300... [2025-03-11 09:38:19,912][01034] Num frames 1400... [2025-03-11 09:38:20,045][01034] Num frames 1500... [2025-03-11 09:38:20,175][01034] Num frames 1600... [2025-03-11 09:38:20,309][01034] Num frames 1700... [2025-03-11 09:38:20,453][01034] Num frames 1800... [2025-03-11 09:38:20,582][01034] Num frames 1900... [2025-03-11 09:38:20,716][01034] Num frames 2000... [2025-03-11 09:38:20,861][01034] Num frames 2100... [2025-03-11 09:38:20,913][01034] Avg episode rewards: #0: 63.999, true rewards: #0: 21.000 [2025-03-11 09:38:20,914][01034] Avg episode reward: 63.999, avg true_objective: 21.000 [2025-03-11 09:38:21,044][01034] Num frames 2200... [2025-03-11 09:38:21,174][01034] Num frames 2300... [2025-03-11 09:38:21,305][01034] Num frames 2400... [2025-03-11 09:38:21,490][01034] Num frames 2500... [2025-03-11 09:38:21,663][01034] Num frames 2600... [2025-03-11 09:38:21,767][01034] Avg episode rewards: #0: 37.129, true rewards: #0: 13.130 [2025-03-11 09:38:21,768][01034] Avg episode reward: 37.129, avg true_objective: 13.130 [2025-03-11 09:38:21,907][01034] Num frames 2700... [2025-03-11 09:38:22,076][01034] Num frames 2800... [2025-03-11 09:38:22,247][01034] Num frames 2900... [2025-03-11 09:38:22,421][01034] Num frames 3000... [2025-03-11 09:38:22,594][01034] Num frames 3100... 
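The evaluation pass above (the config reload, the run of "Adding new argument" entries, and the per-episode "Num frames" counters that continue below) corresponds to Sample Factory's enjoy entry point, which layers evaluation-only arguments onto the saved training config. A minimal sketch of the invocation, under the same assumptions as the training sketch earlier:

from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.enjoy import enjoy
from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components  # assumed helper

register_vizdoom_components()
argv = [
    "--env=doom_health_gathering_supreme",  # assumption, as before
    "--num_workers=1",                      # the "Overriding arg 'num_workers'" entry above
    "--no_render",
    "--save_video",
    "--max_num_episodes=10",
]
parser, _ = parse_sf_args(argv=argv, evaluation=True)  # evaluation=True adds the enjoy-only args
cfg = parse_full_cfg(parser, argv=argv)
status = enjoy(cfg)  # loads checkpoint_000001222_5005312.pth and writes replay.mp4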
[2025-03-11 09:38:22,768][01034] Num frames 3200... [2025-03-11 09:38:22,958][01034] Num frames 3300... [2025-03-11 09:38:23,139][01034] Num frames 3400... [2025-03-11 09:38:23,325][01034] Num frames 3500... [2025-03-11 09:38:23,524][01034] Num frames 3600... [2025-03-11 09:38:23,682][01034] Num frames 3700... [2025-03-11 09:38:23,813][01034] Num frames 3800... [2025-03-11 09:38:23,957][01034] Num frames 3900... [2025-03-11 09:38:24,088][01034] Num frames 4000... [2025-03-11 09:38:24,216][01034] Num frames 4100... [2025-03-11 09:38:24,348][01034] Num frames 4200... [2025-03-11 09:38:24,482][01034] Avg episode rewards: #0: 38.843, true rewards: #0: 14.177 [2025-03-11 09:38:24,483][01034] Avg episode reward: 38.843, avg true_objective: 14.177 [2025-03-11 09:38:24,544][01034] Num frames 4300... [2025-03-11 09:38:24,672][01034] Num frames 4400... [2025-03-11 09:38:24,811][01034] Num frames 4500... [2025-03-11 09:38:24,950][01034] Num frames 4600... [2025-03-11 09:38:25,088][01034] Num frames 4700... [2025-03-11 09:38:25,217][01034] Num frames 4800... [2025-03-11 09:38:25,347][01034] Num frames 4900... [2025-03-11 09:38:25,484][01034] Num frames 5000... [2025-03-11 09:38:25,615][01034] Num frames 5100... [2025-03-11 09:38:25,745][01034] Num frames 5200... [2025-03-11 09:38:25,903][01034] Num frames 5300... [2025-03-11 09:38:26,040][01034] Avg episode rewards: #0: 35.372, true rewards: #0: 13.372 [2025-03-11 09:38:26,041][01034] Avg episode reward: 35.372, avg true_objective: 13.372 [2025-03-11 09:38:26,109][01034] Num frames 5400... [2025-03-11 09:38:26,241][01034] Num frames 5500... [2025-03-11 09:38:26,372][01034] Num frames 5600... [2025-03-11 09:38:26,509][01034] Num frames 5700... [2025-03-11 09:38:26,635][01034] Num frames 5800... [2025-03-11 09:38:26,768][01034] Num frames 5900... [2025-03-11 09:38:26,871][01034] Avg episode rewards: #0: 30.472, true rewards: #0: 11.872 [2025-03-11 09:38:26,871][01034] Avg episode reward: 30.472, avg true_objective: 11.872 [2025-03-11 09:38:26,955][01034] Num frames 6000... [2025-03-11 09:38:27,093][01034] Num frames 6100... [2025-03-11 09:38:27,224][01034] Num frames 6200... [2025-03-11 09:38:27,352][01034] Num frames 6300... [2025-03-11 09:38:27,486][01034] Num frames 6400... [2025-03-11 09:38:27,616][01034] Num frames 6500... [2025-03-11 09:38:27,745][01034] Num frames 6600... [2025-03-11 09:38:27,811][01034] Avg episode rewards: #0: 27.513, true rewards: #0: 11.013 [2025-03-11 09:38:27,812][01034] Avg episode reward: 27.513, avg true_objective: 11.013 [2025-03-11 09:38:27,935][01034] Num frames 6700... [2025-03-11 09:38:28,075][01034] Num frames 6800... [2025-03-11 09:38:28,202][01034] Num frames 6900... [2025-03-11 09:38:28,333][01034] Num frames 7000... [2025-03-11 09:38:28,467][01034] Num frames 7100... [2025-03-11 09:38:28,596][01034] Num frames 7200... [2025-03-11 09:38:28,724][01034] Num frames 7300... [2025-03-11 09:38:28,857][01034] Num frames 7400... [2025-03-11 09:38:28,987][01034] Num frames 7500... [2025-03-11 09:38:29,126][01034] Num frames 7600... [2025-03-11 09:38:29,257][01034] Num frames 7700... [2025-03-11 09:38:29,429][01034] Avg episode rewards: #0: 27.417, true rewards: #0: 11.131 [2025-03-11 09:38:29,430][01034] Avg episode reward: 27.417, avg true_objective: 11.131 [2025-03-11 09:38:29,442][01034] Num frames 7800... [2025-03-11 09:38:29,570][01034] Num frames 7900... [2025-03-11 09:38:29,702][01034] Num frames 8000... [2025-03-11 09:38:29,834][01034] Num frames 8100... [2025-03-11 09:38:29,963][01034] Num frames 8200... 
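The "Avg episode rewards" entries in this pass are running means over the episodes completed so far, so individual episode returns can be recovered from consecutive averages. Checking against the first two entries of this evaluation (63.999, then 37.129):

# Episode 1 alone averages 63.999; after episode 2 the running mean is 37.129.
avg1, avg2 = 63.999, 37.129
ep2 = 2 * avg2 - avg1
print(round(ep2, 3))  # 10.259 -> episode 2's individual reward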
[2025-03-11 09:38:30,102][01034] Num frames 8300... [2025-03-11 09:38:30,230][01034] Num frames 8400... [2025-03-11 09:38:30,357][01034] Num frames 8500... [2025-03-11 09:38:30,493][01034] Num frames 8600... [2025-03-11 09:38:30,623][01034] Num frames 8700... [2025-03-11 09:38:30,754][01034] Num frames 8800... [2025-03-11 09:38:30,884][01034] Num frames 8900... [2025-03-11 09:38:31,014][01034] Num frames 9000... [2025-03-11 09:38:31,155][01034] Num frames 9100... [2025-03-11 09:38:31,291][01034] Num frames 9200... [2025-03-11 09:38:31,429][01034] Num frames 9300... [2025-03-11 09:38:31,561][01034] Num frames 9400... [2025-03-11 09:38:31,693][01034] Num frames 9500... [2025-03-11 09:38:31,828][01034] Num frames 9600... [2025-03-11 09:38:31,966][01034] Num frames 9700... [2025-03-11 09:38:32,102][01034] Num frames 9800... [2025-03-11 09:38:32,247][01034] Avg episode rewards: #0: 30.826, true rewards: #0: 12.326 [2025-03-11 09:38:32,248][01034] Avg episode reward: 30.826, avg true_objective: 12.326 [2025-03-11 09:38:32,299][01034] Num frames 9900... [2025-03-11 09:38:32,437][01034] Num frames 10000... [2025-03-11 09:38:32,567][01034] Num frames 10100... [2025-03-11 09:38:32,725][01034] Num frames 10200... [2025-03-11 09:38:32,855][01034] Num frames 10300... [2025-03-11 09:38:32,987][01034] Num frames 10400... [2025-03-11 09:38:33,119][01034] Num frames 10500... [2025-03-11 09:38:33,264][01034] Num frames 10600... [2025-03-11 09:38:33,392][01034] Num frames 10700... [2025-03-11 09:38:33,535][01034] Num frames 10800... [2025-03-11 09:38:33,683][01034] Num frames 10900... [2025-03-11 09:38:33,864][01034] Num frames 11000... [2025-03-11 09:38:33,988][01034] Avg episode rewards: #0: 30.595, true rewards: #0: 12.262 [2025-03-11 09:38:33,989][01034] Avg episode reward: 30.595, avg true_objective: 12.262 [2025-03-11 09:38:34,105][01034] Num frames 11100... [2025-03-11 09:38:34,280][01034] Num frames 11200... [2025-03-11 09:38:34,451][01034] Num frames 11300... [2025-03-11 09:38:34,616][01034] Num frames 11400... [2025-03-11 09:38:34,783][01034] Num frames 11500... [2025-03-11 09:38:34,961][01034] Num frames 11600... [2025-03-11 09:38:35,040][01034] Avg episode rewards: #0: 28.912, true rewards: #0: 11.612 [2025-03-11 09:38:35,041][01034] Avg episode reward: 28.912, avg true_objective: 11.612 [2025-03-11 09:39:42,700][01034] Replay video saved to /content/train_dir/default_experiment/replay.mp4! [2025-03-11 09:40:02,896][01034] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json [2025-03-11 09:40:02,897][01034] Overriding arg 'num_workers' with value 1 passed from command line [2025-03-11 09:40:02,898][01034] Adding new argument 'no_render'=True that is not in the saved config file! [2025-03-11 09:40:02,901][01034] Adding new argument 'save_video'=True that is not in the saved config file! [2025-03-11 09:40:02,902][01034] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! [2025-03-11 09:40:02,903][01034] Adding new argument 'video_name'=None that is not in the saved config file! [2025-03-11 09:40:02,904][01034] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! [2025-03-11 09:40:02,904][01034] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! [2025-03-11 09:40:02,905][01034] Adding new argument 'push_to_hub'=True that is not in the saved config file! 
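With the first evaluation finished and replay.mp4 written above, the config is reloaded for a second pass with upload flags enabled (the "Adding new argument" entries around this point). In a Colab notebook the saved replay is usually previewed inline before uploading; a sketch using standard IPython utilities:

from base64 import b64encode
from IPython.display import HTML

mp4 = open("/content/train_dir/default_experiment/replay.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(f'<video width=640 controls><source src="{data_url}" type="video/mp4"></video>')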
[2025-03-11 09:40:02,909][01034] Adding new argument 'hf_repository'='so7en/Doom_unit8_2' that is not in the saved config file! [2025-03-11 09:40:02,910][01034] Adding new argument 'policy_index'=0 that is not in the saved config file! [2025-03-11 09:40:02,910][01034] Adding new argument 'eval_deterministic'=False that is not in the saved config file! [2025-03-11 09:40:02,911][01034] Adding new argument 'train_script'=None that is not in the saved config file! [2025-03-11 09:40:02,912][01034] Adding new argument 'enjoy_script'=None that is not in the saved config file! [2025-03-11 09:40:02,913][01034] Using frameskip 1 and render_action_repeat=4 for evaluation [2025-03-11 09:40:02,954][01034] RunningMeanStd input shape: (3, 72, 128) [2025-03-11 09:40:02,956][01034] RunningMeanStd input shape: (1,) [2025-03-11 09:40:02,970][01034] ConvEncoder: input_channels=3 [2025-03-11 09:40:03,025][01034] Conv encoder output size: 512 [2025-03-11 09:40:03,026][01034] Policy head output size: 512 [2025-03-11 09:40:03,052][01034] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001222_5005312.pth... [2025-03-11 09:40:03,620][01034] Num frames 100... [2025-03-11 09:40:03,755][01034] Num frames 200... [2025-03-11 09:40:03,887][01034] Num frames 300... [2025-03-11 09:40:04,015][01034] Num frames 400... [2025-03-11 09:40:04,142][01034] Num frames 500... [2025-03-11 09:40:04,277][01034] Num frames 600... [2025-03-11 09:40:04,415][01034] Num frames 700... [2025-03-11 09:40:04,543][01034] Num frames 800... [2025-03-11 09:40:04,673][01034] Num frames 900... [2025-03-11 09:40:04,802][01034] Num frames 1000... [2025-03-11 09:40:04,926][01034] Num frames 1100... [2025-03-11 09:40:05,057][01034] Num frames 1200... [2025-03-11 09:40:05,189][01034] Num frames 1300... [2025-03-11 09:40:05,372][01034] Avg episode rewards: #0: 38.950, true rewards: #0: 13.950 [2025-03-11 09:40:05,374][01034] Avg episode reward: 38.950, avg true_objective: 13.950 [2025-03-11 09:40:05,383][01034] Num frames 1400... [2025-03-11 09:40:05,516][01034] Num frames 1500... [2025-03-11 09:40:05,648][01034] Num frames 1600... [2025-03-11 09:40:05,776][01034] Num frames 1700... [2025-03-11 09:40:05,846][01034] Avg episode rewards: #0: 22.055, true rewards: #0: 8.555 [2025-03-11 09:40:05,846][01034] Avg episode reward: 22.055, avg true_objective: 8.555 [2025-03-11 09:40:05,957][01034] Num frames 1800... [2025-03-11 09:40:06,092][01034] Num frames 1900... [2025-03-11 09:40:06,217][01034] Num frames 2000... [2025-03-11 09:40:06,356][01034] Num frames 2100... [2025-03-11 09:40:06,491][01034] Num frames 2200... [2025-03-11 09:40:06,624][01034] Num frames 2300... [2025-03-11 09:40:06,756][01034] Num frames 2400... [2025-03-11 09:40:06,883][01034] Num frames 2500... [2025-03-11 09:40:07,013][01034] Num frames 2600... [2025-03-11 09:40:07,142][01034] Num frames 2700... [2025-03-11 09:40:07,284][01034] Num frames 2800... [2025-03-11 09:40:07,420][01034] Num frames 2900... [2025-03-11 09:40:07,550][01034] Num frames 3000... [2025-03-11 09:40:07,679][01034] Num frames 3100... [2025-03-11 09:40:07,846][01034] Avg episode rewards: #0: 27.277, true rewards: #0: 10.610 [2025-03-11 09:40:07,847][01034] Avg episode reward: 27.277, avg true_objective: 10.610 [2025-03-11 09:40:07,870][01034] Num frames 3200... [2025-03-11 09:40:07,993][01034] Num frames 3300... [2025-03-11 09:40:08,120][01034] Num frames 3400... [2025-03-11 09:40:08,248][01034] Num frames 3500... [2025-03-11 09:40:08,383][01034] Num frames 3600... 
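This second pass reuses the enjoy invocation sketched earlier plus the upload flags shown above (--push_to_hub, --hf_repository=so7en/Doom_unit8_2, --max_num_frames=100000). The upload requires an authenticated Hugging Face token, which in Colab is typically provided once beforehand:

from huggingface_hub import notebook_login

notebook_login()  # paste a write-scoped HF token; the push at the end of this pass reuses it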
[2025-03-11 09:40:08,523][01034] Num frames 3700... [2025-03-11 09:40:08,650][01034] Num frames 3800... [2025-03-11 09:40:08,778][01034] Num frames 3900... [2025-03-11 09:40:08,907][01034] Num frames 4000... [2025-03-11 09:40:09,035][01034] Num frames 4100... [2025-03-11 09:40:09,197][01034] Avg episode rewards: #0: 26.195, true rewards: #0: 10.445 [2025-03-11 09:40:09,198][01034] Avg episode reward: 26.195, avg true_objective: 10.445 [2025-03-11 09:40:09,234][01034] Num frames 4200... [2025-03-11 09:40:09,378][01034] Num frames 4300... [2025-03-11 09:40:09,516][01034] Num frames 4400... [2025-03-11 09:40:09,641][01034] Num frames 4500... [2025-03-11 09:40:09,768][01034] Num frames 4600... [2025-03-11 09:40:09,897][01034] Num frames 4700... [2025-03-11 09:40:10,022][01034] Num frames 4800... [2025-03-11 09:40:10,150][01034] Num frames 4900... [2025-03-11 09:40:10,282][01034] Num frames 5000... [2025-03-11 09:40:10,430][01034] Num frames 5100... [2025-03-11 09:40:10,559][01034] Num frames 5200... [2025-03-11 09:40:10,684][01034] Num frames 5300... [2025-03-11 09:40:10,813][01034] Num frames 5400... [2025-03-11 09:40:10,941][01034] Num frames 5500... [2025-03-11 09:40:11,070][01034] Num frames 5600... [2025-03-11 09:40:11,195][01034] Num frames 5700... [2025-03-11 09:40:11,323][01034] Num frames 5800... [2025-03-11 09:40:11,483][01034] Num frames 5900... [2025-03-11 09:40:11,647][01034] Avg episode rewards: #0: 30.364, true rewards: #0: 11.964 [2025-03-11 09:40:11,648][01034] Avg episode reward: 30.364, avg true_objective: 11.964 [2025-03-11 09:40:11,674][01034] Num frames 6000... [2025-03-11 09:40:11,810][01034] Num frames 6100... [2025-03-11 09:40:11,938][01034] Num frames 6200... [2025-03-11 09:40:12,066][01034] Num frames 6300... [2025-03-11 09:40:12,246][01034] Avg episode rewards: #0: 26.497, true rewards: #0: 10.663 [2025-03-11 09:40:12,247][01034] Avg episode reward: 26.497, avg true_objective: 10.663 [2025-03-11 09:40:12,251][01034] Num frames 6400... [2025-03-11 09:40:12,378][01034] Num frames 6500... [2025-03-11 09:40:12,519][01034] Num frames 6600... [2025-03-11 09:40:12,649][01034] Num frames 6700... [2025-03-11 09:40:12,778][01034] Num frames 6800... [2025-03-11 09:40:12,906][01034] Num frames 6900... [2025-03-11 09:40:13,015][01034] Avg episode rewards: #0: 23.774, true rewards: #0: 9.917 [2025-03-11 09:40:13,016][01034] Avg episode reward: 23.774, avg true_objective: 9.917 [2025-03-11 09:40:13,092][01034] Num frames 7000... [2025-03-11 09:40:13,222][01034] Num frames 7100... [2025-03-11 09:40:13,368][01034] Num frames 7200... [2025-03-11 09:40:13,562][01034] Num frames 7300... [2025-03-11 09:40:13,733][01034] Num frames 7400... [2025-03-11 09:40:13,904][01034] Num frames 7500... [2025-03-11 09:40:14,074][01034] Num frames 7600... [2025-03-11 09:40:14,240][01034] Num frames 7700... [2025-03-11 09:40:14,410][01034] Num frames 7800... [2025-03-11 09:40:14,589][01034] Num frames 7900... [2025-03-11 09:40:14,705][01034] Avg episode rewards: #0: 23.042, true rewards: #0: 9.917 [2025-03-11 09:40:14,707][01034] Avg episode reward: 23.042, avg true_objective: 9.917 [2025-03-11 09:40:14,825][01034] Num frames 8000... [2025-03-11 09:40:14,999][01034] Num frames 8100... [2025-03-11 09:40:15,180][01034] Num frames 8200... [2025-03-11 09:40:15,362][01034] Num frames 8300... [2025-03-11 09:40:15,554][01034] Num frames 8400... [2025-03-11 09:40:15,698][01034] Num frames 8500... [2025-03-11 09:40:15,824][01034] Num frames 8600... [2025-03-11 09:40:15,956][01034] Num frames 8700... 
[2025-03-11 09:40:16,090][01034] Num frames 8800... [2025-03-11 09:40:16,225][01034] Avg episode rewards: #0: 22.847, true rewards: #0: 9.847 [2025-03-11 09:40:16,225][01034] Avg episode reward: 22.847, avg true_objective: 9.847 [2025-03-11 09:40:16,279][01034] Num frames 8900... [2025-03-11 09:40:16,415][01034] Num frames 9000... [2025-03-11 09:40:16,552][01034] Num frames 9100... [2025-03-11 09:40:16,683][01034] Num frames 9200... [2025-03-11 09:40:16,812][01034] Num frames 9300... [2025-03-11 09:40:16,943][01034] Num frames 9400... [2025-03-11 09:40:17,072][01034] Num frames 9500... [2025-03-11 09:40:17,203][01034] Num frames 9600... [2025-03-11 09:40:17,387][01034] Avg episode rewards: #0: 22.094, true rewards: #0: 9.694 [2025-03-11 09:40:17,388][01034] Avg episode reward: 22.094, avg true_objective: 9.694 [2025-03-11 09:41:11,174][01034] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
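After this final "Replay video saved" entry the run is complete: the experiment (config.json, the checkpoint_p0 checkpoints, and replay.mp4) has been pushed to the so7en/Doom_unit8_2 repo. A minimal sketch of pulling that experiment back down for later reuse:

from huggingface_hub import snapshot_download

# Downloads the uploaded experiment; the snapshot should contain config.json,
# checkpoint_p0/ and replay.mp4, matching the paths seen in this log.
local_dir = snapshot_download(repo_id="so7en/Doom_unit8_2")
print(local_dir)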