[2025-02-11 16:58:19,123][02117] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-11 16:58:19,125][02117] Rollout worker 0 uses device cpu
[2025-02-11 16:58:19,126][02117] Rollout worker 1 uses device cpu
[2025-02-11 16:58:19,128][02117] Rollout worker 2 uses device cpu
[2025-02-11 16:58:19,129][02117] Rollout worker 3 uses device cpu
[2025-02-11 16:58:19,130][02117] Rollout worker 4 uses device cpu
[2025-02-11 16:58:19,131][02117] Rollout worker 5 uses device cpu
[2025-02-11 16:58:19,133][02117] Rollout worker 6 uses device cpu
[2025-02-11 16:58:19,135][02117] Rollout worker 7 uses device cpu
[2025-02-11 16:58:19,247][02117] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 16:58:19,248][02117] InferenceWorker_p0-w0: min num requests: 2
[2025-02-11 16:58:19,281][02117] Starting all processes...
[2025-02-11 16:58:19,282][02117] Starting process learner_proc0
[2025-02-11 16:58:19,341][02117] Starting all processes...
[2025-02-11 16:58:19,346][02117] Starting process inference_proc0-0
[2025-02-11 16:58:19,347][02117] Starting process rollout_proc0
[2025-02-11 16:58:19,348][02117] Starting process rollout_proc1
[2025-02-11 16:58:19,349][02117] Starting process rollout_proc2
[2025-02-11 16:58:19,349][02117] Starting process rollout_proc3
[2025-02-11 16:58:19,349][02117] Starting process rollout_proc4
[2025-02-11 16:58:19,351][02117] Starting process rollout_proc5
[2025-02-11 16:58:19,352][02117] Starting process rollout_proc6
[2025-02-11 16:58:19,356][02117] Starting process rollout_proc7
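The startup above is Sample Factory's asynchronous APPO layout: one learner process and one GPU inference worker are spawned first, then eight CPU rollout workers that only step the environment. A run like this is typically launched through the sf_examples VizDoom entry point; the sketch below is illustrative, and the module path and most flag values are assumptions based on Sample Factory 2.x rather than anything recorded in this log.

    # Hypothetical launch sketch (Sample Factory 2.x, sf_examples VizDoom).
    import sys
    from sf_examples.vizdoom.train_vizdoom import main

    sys.argv = [
        "train_vizdoom",
        "--env=doom_health_gathering_supreme",   # env name matches the hub repo below
        "--num_workers=8",                       # eight rollout workers, as logged above
        "--num_envs_per_worker=4",               # assumed; not visible in this log
        "--train_for_env_steps=4000000",         # the run above stops just past 4M frames
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
    ]
    main()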
[2025-02-11 16:58:22,077][04730] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 16:58:22,077][04730] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-11 16:58:22,099][04730] Num visible devices: 1
[2025-02-11 16:58:22,187][04733] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,239][04734] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,253][04731] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,418][04717] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 16:58:22,418][04717] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-11 16:58:22,438][04717] Num visible devices: 1
[2025-02-11 16:58:22,439][04737] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,450][04732] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,473][04717] Starting seed is not provided
[2025-02-11 16:58:22,474][04717] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 16:58:22,474][04717] Initializing actor-critic model on device cuda:0
[2025-02-11 16:58:22,474][04717] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 16:58:22,478][04717] RunningMeanStd input shape: (1,)
[2025-02-11 16:58:22,488][04736] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,492][04717] ConvEncoder: input_channels=3
[2025-02-11 16:58:22,532][04735] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,538][04738] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 16:58:22,783][04717] Conv encoder output size: 512
[2025-02-11 16:58:22,783][04717] Policy head output size: 512
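The two RunningMeanStd normalizers registered above track running statistics for the (3, 72, 128) image observations and for the scalar (1,) returns. A minimal sketch of such a running normalizer, using the standard parallel mean/variance update (an illustration, not Sample Factory's exact implementation):

    import numpy as np

    class RunningMeanStd:
        """Tracks running mean/variance over batches; normalize() whitens inputs."""
        def __init__(self, shape):
            self.mean = np.zeros(shape)
            self.var = np.ones(shape)
            self.count = 1e-4
        def update(self, x):  # x: (batch, *shape)
            b_mean, b_var, b_n = x.mean(axis=0), x.var(axis=0), x.shape[0]
            delta = b_mean - self.mean
            tot = self.count + b_n
            self.mean = self.mean + delta * b_n / tot
            m2 = self.var * self.count + b_var * b_n + delta**2 * self.count * b_n / tot
            self.var = m2 / tot
            self.count = tot
        def normalize(self, x):
            return (x - self.mean) / np.sqrt(self.var + 1e-8)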
[2025-02-11 16:58:22,843][04717] Created Actor Critic model with architecture:
[2025-02-11 16:58:22,843][04717] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
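In plain PyTorch terms the printed model is a shared-weights actor-critic: a three-layer conv encoder feeding a 512-unit linear layer, a GRU core, and two linear heads, one for the value and one for logits over the 5 discrete actions. A minimal sketch with the sizes taken from the log (the conv kernel sizes and strides are Sample Factory's usual Atari-style defaults and are an assumption here):

    import torch
    import torch.nn as nn

    class SketchActorCritic(nn.Module):
        def __init__(self, num_actions: int = 5):
            super().__init__()
            self.conv_head = nn.Sequential(               # input: (3, 72, 128)
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            with torch.no_grad():
                n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
            self.mlp = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())  # "Conv encoder output size: 512"
            self.core = nn.GRU(512, 512)                      # ModelCoreRNN: GRU(512, 512)
            self.critic_linear = nn.Linear(512, 1)            # value head
            self.action_logits = nn.Linear(512, num_actions)  # distribution_linear: 512 -> 5

        def forward(self, obs, rnn_state=None):
            x = self.mlp(self.conv_head(obs).flatten(1))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)
            x = x.squeeze(0)
            return self.action_logits(x), self.critic_linear(x), rnn_state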
[2025-02-11 16:58:23,065][04717] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-02-11 16:58:27,331][04717] No checkpoints found
[2025-02-11 16:58:27,331][04717] Did not load from checkpoint, starting from scratch!
[2025-02-11 16:58:27,331][04717] Initialized policy 0 weights for model version 0
[2025-02-11 16:58:27,333][04717] LearnerWorker_p0 finished initialization!
[2025-02-11 16:58:27,333][04717] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 16:58:27,415][04730] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 16:58:27,416][04730] RunningMeanStd input shape: (1,)
[2025-02-11 16:58:27,427][04730] ConvEncoder: input_channels=3
[2025-02-11 16:58:27,530][04730] Conv encoder output size: 512
[2025-02-11 16:58:27,531][04730] Policy head output size: 512
[2025-02-11 16:58:27,566][02117] Inference worker 0-0 is ready!
[2025-02-11 16:58:27,567][02117] All inference workers are ready! Signal rollout workers to start!
[2025-02-11 16:58:27,609][04734] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 16:58:27,610][04733] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 16:58:27,620][04736] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 16:58:27,620][04737] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 16:58:27,620][04731] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 16:58:27,621][04735] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 16:58:27,622][04738] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 16:58:27,622][04732] Doom resolution: 160x120, resize resolution: (128, 72)
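Each rollout worker renders Doom at the native 160x120 and resizes frames to 128x72, which is where the (3, 72, 128) observation shape above comes from. Roughly, the resize step looks like this (an OpenCV illustration, not Sample Factory's actual wrapper code):

    import cv2
    import numpy as np

    frame = np.zeros((120, 160, 3), dtype=np.uint8)  # stand-in for a native VizDoom render (HWC, RGB)
    obs = cv2.resize(frame, (128, 72), interpolation=cv2.INTER_AREA)  # cv2 dsize is (width, height)
    obs_chw = obs.transpose(2, 0, 1)  # HWC -> CHW: (3, 72, 128), matching the model input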
[2025-02-11 16:58:27,943][04733] Decorrelating experience for 0 frames...
[2025-02-11 16:58:27,943][04731] Decorrelating experience for 0 frames...
[2025-02-11 16:58:27,943][04736] Decorrelating experience for 0 frames...
[2025-02-11 16:58:27,943][04737] Decorrelating experience for 0 frames...
[2025-02-11 16:58:27,943][04735] Decorrelating experience for 0 frames...
[2025-02-11 16:58:27,943][04734] Decorrelating experience for 0 frames...
[2025-02-11 16:58:28,176][04732] Decorrelating experience for 0 frames...
[2025-02-11 16:58:28,204][04735] Decorrelating experience for 32 frames...
[2025-02-11 16:58:28,211][04736] Decorrelating experience for 32 frames...
[2025-02-11 16:58:28,211][04733] Decorrelating experience for 32 frames...
[2025-02-11 16:58:28,218][04737] Decorrelating experience for 32 frames...
[2025-02-11 16:58:28,219][04734] Decorrelating experience for 32 frames...
[2025-02-11 16:58:28,440][04732] Decorrelating experience for 32 frames...
[2025-02-11 16:58:28,506][04738] Decorrelating experience for 0 frames...
[2025-02-11 16:58:28,560][04733] Decorrelating experience for 64 frames...
[2025-02-11 16:58:28,561][04734] Decorrelating experience for 64 frames...
[2025-02-11 16:58:28,685][04737] Decorrelating experience for 64 frames...
[2025-02-11 16:58:28,756][04738] Decorrelating experience for 32 frames...
[2025-02-11 16:58:28,766][04732] Decorrelating experience for 64 frames...
[2025-02-11 16:58:28,772][04736] Decorrelating experience for 64 frames...
[2025-02-11 16:58:28,889][04733] Decorrelating experience for 96 frames...
[2025-02-11 16:58:28,999][04737] Decorrelating experience for 96 frames...
[2025-02-11 16:58:29,008][04734] Decorrelating experience for 96 frames...
[2025-02-11 16:58:29,047][04735] Decorrelating experience for 64 frames...
[2025-02-11 16:58:29,165][04738] Decorrelating experience for 64 frames...
[2025-02-11 16:58:29,281][04732] Decorrelating experience for 96 frames...
[2025-02-11 16:58:29,294][04731] Decorrelating experience for 32 frames...
[2025-02-11 16:58:29,329][04735] Decorrelating experience for 96 frames...
[2025-02-11 16:58:29,477][04738] Decorrelating experience for 96 frames...
[2025-02-11 16:58:29,570][04736] Decorrelating experience for 96 frames...
[2025-02-11 16:58:29,591][02117] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-11 16:58:29,637][04731] Decorrelating experience for 64 frames...
[2025-02-11 16:58:29,914][04731] Decorrelating experience for 96 frames...
[2025-02-11 16:58:30,515][04717] Signal inference workers to stop experience collection...
[2025-02-11 16:58:30,521][04730] InferenceWorker_p0-w0: stopping experience collection
[2025-02-11 16:58:31,874][04717] Signal inference workers to resume experience collection...
[2025-02-11 16:58:31,874][04730] InferenceWorker_p0-w0: resuming experience collection
[2025-02-11 16:58:33,649][04730] Updated weights for policy 0, policy_version 10 (0.0091)
[2025-02-11 16:58:34,591][02117] Fps is (10 sec: 11468.5, 60 sec: 11468.5, 300 sec: 11468.5). Total num frames: 57344. Throughput: 0: 2059.2. Samples: 10296. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 16:58:34,594][02117] Avg episode reward: [(0, '4.406')]
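The periodic "Fps is ..." lines report windowed throughput in environment frames per second. The first report checks out by hand: 57344 frames were collected in the roughly five seconds since the t=0 report, so the "10 sec" window, which only has about five seconds of data at this point, shows roughly 11468 FPS.

    # Sanity check of the first throughput report.
    frames, elapsed = 57344, 5.0
    print(frames / elapsed)  # 11468.8, matching "10 sec: 11468.5"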
[2025-02-11 16:58:35,801][04730] Updated weights for policy 0, policy_version 20 (0.0011)
[2025-02-11 16:58:37,831][04730] Updated weights for policy 0, policy_version 30 (0.0012)
[2025-02-11 16:58:39,239][02117] Heartbeat connected on Batcher_0
[2025-02-11 16:58:39,253][02117] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-11 16:58:39,255][02117] Heartbeat connected on RolloutWorker_w0
[2025-02-11 16:58:39,258][02117] Heartbeat connected on RolloutWorker_w1
[2025-02-11 16:58:39,265][02117] Heartbeat connected on RolloutWorker_w3
[2025-02-11 16:58:39,267][02117] Heartbeat connected on RolloutWorker_w2
[2025-02-11 16:58:39,270][02117] Heartbeat connected on LearnerWorker_p0
[2025-02-11 16:58:39,272][02117] Heartbeat connected on RolloutWorker_w4
[2025-02-11 16:58:39,274][02117] Heartbeat connected on RolloutWorker_w5
[2025-02-11 16:58:39,277][02117] Heartbeat connected on RolloutWorker_w6
[2025-02-11 16:58:39,281][02117] Heartbeat connected on RolloutWorker_w7
[2025-02-11 16:58:39,591][02117] Fps is (10 sec: 15564.6, 60 sec: 15564.6, 300 sec: 15564.6). Total num frames: 155648. Throughput: 0: 3989.2. Samples: 39892. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-02-11 16:58:39,593][02117] Avg episode reward: [(0, '4.354')]
[2025-02-11 16:58:39,595][04717] Saving new best policy, reward=4.354!
[2025-02-11 16:58:39,882][04730] Updated weights for policy 0, policy_version 40 (0.0011)
[2025-02-11 16:58:41,914][04730] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-02-11 16:58:44,017][04730] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-02-11 16:58:44,591][02117] Fps is (10 sec: 19660.8, 60 sec: 16930.0, 300 sec: 16930.0). Total num frames: 253952. Throughput: 0: 3656.8. Samples: 54852. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-11 16:58:44,594][02117] Avg episode reward: [(0, '4.916')]
[2025-02-11 16:58:44,632][04717] Saving new best policy, reward=4.916!
[2025-02-11 16:58:46,073][04730] Updated weights for policy 0, policy_version 70 (0.0012)
[2025-02-11 16:58:48,187][04730] Updated weights for policy 0, policy_version 80 (0.0012)
[2025-02-11 16:58:49,591][02117] Fps is (10 sec: 19661.0, 60 sec: 17612.8, 300 sec: 17612.8). Total num frames: 352256. Throughput: 0: 4224.2. Samples: 84484. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 16:58:49,593][02117] Avg episode reward: [(0, '4.411')]
[2025-02-11 16:58:50,261][04730] Updated weights for policy 0, policy_version 90 (0.0011)
[2025-02-11 16:58:52,302][04730] Updated weights for policy 0, policy_version 100 (0.0012)
[2025-02-11 16:58:54,339][04730] Updated weights for policy 0, policy_version 110 (0.0012)
[2025-02-11 16:58:54,591][02117] Fps is (10 sec: 20070.4, 60 sec: 18186.2, 300 sec: 18186.2). Total num frames: 454656. Throughput: 0: 4584.7. Samples: 114618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 16:58:54,593][02117] Avg episode reward: [(0, '5.041')]
[2025-02-11 16:58:54,600][04717] Saving new best policy, reward=5.041!
[2025-02-11 16:58:56,397][04730] Updated weights for policy 0, policy_version 120 (0.0012)
[2025-02-11 16:58:58,458][04730] Updated weights for policy 0, policy_version 130 (0.0012)
[2025-02-11 16:58:59,591][02117] Fps is (10 sec: 20070.4, 60 sec: 18432.0, 300 sec: 18432.0). Total num frames: 552960. Throughput: 0: 4319.6. Samples: 129588. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 16:58:59,592][02117] Avg episode reward: [(0, '5.187')]
[2025-02-11 16:58:59,596][04717] Saving new best policy, reward=5.187!
[2025-02-11 16:59:00,573][04730] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-02-11 16:59:02,651][04730] Updated weights for policy 0, policy_version 150 (0.0011)
[2025-02-11 16:59:04,591][02117] Fps is (10 sec: 19660.8, 60 sec: 18607.5, 300 sec: 18607.5). Total num frames: 651264. Throughput: 0: 4546.2. Samples: 159118. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 16:59:04,593][02117] Avg episode reward: [(0, '5.322')]
[2025-02-11 16:59:04,599][04717] Saving new best policy, reward=5.322!
[2025-02-11 16:59:04,725][04730] Updated weights for policy 0, policy_version 160 (0.0012)
[2025-02-11 16:59:06,685][04730] Updated weights for policy 0, policy_version 170 (0.0011)
[2025-02-11 16:59:08,703][04730] Updated weights for policy 0, policy_version 180 (0.0012)
[2025-02-11 16:59:09,591][02117] Fps is (10 sec: 20070.4, 60 sec: 18841.6, 300 sec: 18841.6). Total num frames: 753664. Throughput: 0: 4740.0. Samples: 189600. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 16:59:09,594][02117] Avg episode reward: [(0, '5.649')]
[2025-02-11 16:59:09,596][04717] Saving new best policy, reward=5.649!
[2025-02-11 16:59:10,701][04730] Updated weights for policy 0, policy_version 190 (0.0011)
[2025-02-11 16:59:12,765][04730] Updated weights for policy 0, policy_version 200 (0.0012)
[2025-02-11 16:59:14,591][02117] Fps is (10 sec: 20070.3, 60 sec: 18932.6, 300 sec: 18932.6). Total num frames: 851968. Throughput: 0: 4547.4. Samples: 204632. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-02-11 16:59:14,594][02117] Avg episode reward: [(0, '7.183')]
[2025-02-11 16:59:14,601][04717] Saving new best policy, reward=7.183!
[2025-02-11 16:59:14,895][04730] Updated weights for policy 0, policy_version 210 (0.0011)
[2025-02-11 16:59:16,906][04730] Updated weights for policy 0, policy_version 220 (0.0011)
[2025-02-11 16:59:18,910][04730] Updated weights for policy 0, policy_version 230 (0.0011)
[2025-02-11 16:59:19,591][02117] Fps is (10 sec: 20070.4, 60 sec: 19087.3, 300 sec: 19087.3). Total num frames: 954368. Throughput: 0: 4981.7. Samples: 234470. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-02-11 16:59:19,593][02117] Avg episode reward: [(0, '7.850')]
[2025-02-11 16:59:19,595][04717] Saving new best policy, reward=7.850!
[2025-02-11 16:59:20,937][04730] Updated weights for policy 0, policy_version 240 (0.0012)
[2025-02-11 16:59:22,946][04730] Updated weights for policy 0, policy_version 250 (0.0012)
[2025-02-11 16:59:24,591][02117] Fps is (10 sec: 20480.1, 60 sec: 19213.9, 300 sec: 19213.9). Total num frames: 1056768. Throughput: 0: 4998.8. Samples: 264838. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-02-11 16:59:24,593][02117] Avg episode reward: [(0, '8.006')]
[2025-02-11 16:59:24,600][04717] Saving new best policy, reward=8.006!
[2025-02-11 16:59:25,009][04730] Updated weights for policy 0, policy_version 260 (0.0012)
[2025-02-11 16:59:27,111][04730] Updated weights for policy 0, policy_version 270 (0.0012)
[2025-02-11 16:59:29,119][04730] Updated weights for policy 0, policy_version 280 (0.0011)
[2025-02-11 16:59:29,591][02117] Fps is (10 sec: 20070.4, 60 sec: 19251.2, 300 sec: 19251.2). Total num frames: 1155072. Throughput: 0: 4994.5. Samples: 279602. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-02-11 16:59:29,593][02117] Avg episode reward: [(0, '9.474')]
[2025-02-11 16:59:29,596][04717] Saving new best policy, reward=9.474!
[2025-02-11 16:59:31,146][04730] Updated weights for policy 0, policy_version 290 (0.0012)
[2025-02-11 16:59:33,153][04730] Updated weights for policy 0, policy_version 300 (0.0011)
[2025-02-11 16:59:34,591][02117] Fps is (10 sec: 20070.3, 60 sec: 20002.1, 300 sec: 19345.7). Total num frames: 1257472. Throughput: 0: 5012.0. Samples: 310026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-02-11 16:59:34,593][02117] Avg episode reward: [(0, '10.984')]
[2025-02-11 16:59:34,601][04717] Saving new best policy, reward=10.984!
[2025-02-11 16:59:35,151][04730] Updated weights for policy 0, policy_version 310 (0.0012)
[2025-02-11 16:59:37,170][04730] Updated weights for policy 0, policy_version 320 (0.0012)
[2025-02-11 16:59:39,312][04730] Updated weights for policy 0, policy_version 330 (0.0011)
[2025-02-11 16:59:39,591][02117] Fps is (10 sec: 20070.2, 60 sec: 20002.1, 300 sec: 19368.2). Total num frames: 1355776. Throughput: 0: 5008.7. Samples: 340008. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 16:59:39,593][02117] Avg episode reward: [(0, '12.348')]
[2025-02-11 16:59:39,595][04717] Saving new best policy, reward=12.348!
[2025-02-11 16:59:41,328][04730] Updated weights for policy 0, policy_version 340 (0.0011)
[2025-02-11 16:59:43,318][04730] Updated weights for policy 0, policy_version 350 (0.0012)
[2025-02-11 16:59:44,591][02117] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19442.3). Total num frames: 1458176. Throughput: 0: 5017.1. Samples: 355360. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 16:59:44,593][02117] Avg episode reward: [(0, '16.962')]
[2025-02-11 16:59:44,599][04717] Saving new best policy, reward=16.962!
[2025-02-11 16:59:45,359][04730] Updated weights for policy 0, policy_version 360 (0.0011)
[2025-02-11 16:59:47,346][04730] Updated weights for policy 0, policy_version 370 (0.0011)
[2025-02-11 16:59:49,324][04730] Updated weights for policy 0, policy_version 380 (0.0011)
[2025-02-11 16:59:49,591][02117] Fps is (10 sec: 20479.9, 60 sec: 20138.6, 300 sec: 19507.2). Total num frames: 1560576. Throughput: 0: 5041.5. Samples: 385986. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 16:59:49,593][02117] Avg episode reward: [(0, '17.832')]
[2025-02-11 16:59:49,596][04717] Saving new best policy, reward=17.832!
[2025-02-11 16:59:51,439][04730] Updated weights for policy 0, policy_version 390 (0.0012)
[2025-02-11 16:59:53,484][04730] Updated weights for policy 0, policy_version 400 (0.0012)
[2025-02-11 16:59:54,591][02117] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19516.2). Total num frames: 1658880. Throughput: 0: 5029.1. Samples: 415908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 16:59:54,593][02117] Avg episode reward: [(0, '16.814')]
[2025-02-11 16:59:55,508][04730] Updated weights for policy 0, policy_version 410 (0.0012)
[2025-02-11 16:59:57,516][04730] Updated weights for policy 0, policy_version 420 (0.0012)
[2025-02-11 16:59:59,493][04730] Updated weights for policy 0, policy_version 430 (0.0011)
[2025-02-11 16:59:59,591][02117] Fps is (10 sec: 20070.5, 60 sec: 20138.6, 300 sec: 19569.8). Total num frames: 1761280. Throughput: 0: 5034.8. Samples: 431198. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 16:59:59,592][02117] Avg episode reward: [(0, '20.721')]
[2025-02-11 16:59:59,596][04717] Saving new best policy, reward=20.721!
[2025-02-11 17:00:01,494][04730] Updated weights for policy 0, policy_version 440 (0.0011)
[2025-02-11 17:00:03,555][04730] Updated weights for policy 0, policy_version 450 (0.0012)
[2025-02-11 17:00:04,591][02117] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 19574.5). Total num frames: 1859584. Throughput: 0: 5051.1. Samples: 461768. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:00:04,593][02117] Avg episode reward: [(0, '19.242')]
[2025-02-11 17:00:05,622][04730] Updated weights for policy 0, policy_version 460 (0.0011)
[2025-02-11 17:00:07,592][04730] Updated weights for policy 0, policy_version 470 (0.0012)
[2025-02-11 17:00:09,582][04730] Updated weights for policy 0, policy_version 480 (0.0012)
[2025-02-11 17:00:09,591][02117] Fps is (10 sec: 20480.2, 60 sec: 20206.9, 300 sec: 19660.8). Total num frames: 1966080. Throughput: 0: 5057.7. Samples: 492434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:00:09,592][02117] Avg episode reward: [(0, '19.533')]
[2025-02-11 17:00:11,583][04730] Updated weights for policy 0, policy_version 490 (0.0011)
[2025-02-11 17:00:13,593][04730] Updated weights for policy 0, policy_version 500 (0.0012)
[2025-02-11 17:00:14,591][02117] Fps is (10 sec: 20889.9, 60 sec: 20275.3, 300 sec: 19699.8). Total num frames: 2068480. Throughput: 0: 5071.8. Samples: 507832. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:00:14,592][02117] Avg episode reward: [(0, '17.718')]
[2025-02-11 17:00:14,599][04717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000505_2068480.pth...
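Checkpoint filenames encode the policy version and the total environment frames consumed: checkpoint_000000505_2068480.pth is policy version 505 after 2,068,480 frames. The ratio is exactly 4096 frames per version, suggesting each policy update here trained on one 4096-frame batch (inferred from the numbers; the batch-size setting itself is not printed in this log).

    # Both checkpoints in this log satisfy frames == version * 4096.
    for version, frames in [(505, 2_068_480), (978, 4_005_888)]:
        assert frames == version * 4096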
[2025-02-11 17:00:15,631][04730] Updated weights for policy 0, policy_version 510 (0.0011)
[2025-02-11 17:00:17,716][04730] Updated weights for policy 0, policy_version 520 (0.0011)
[2025-02-11 17:00:19,591][02117] Fps is (10 sec: 20070.2, 60 sec: 20206.9, 300 sec: 19698.0). Total num frames: 2166784. Throughput: 0: 5063.2. Samples: 537868. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:00:19,593][02117] Avg episode reward: [(0, '19.044')]
[2025-02-11 17:00:19,699][04730] Updated weights for policy 0, policy_version 530 (0.0011)
[2025-02-11 17:00:21,691][04730] Updated weights for policy 0, policy_version 540 (0.0012)
[2025-02-11 17:00:23,667][04730] Updated weights for policy 0, policy_version 550 (0.0012)
[2025-02-11 17:00:24,591][02117] Fps is (10 sec: 20070.1, 60 sec: 20206.9, 300 sec: 19732.0). Total num frames: 2269184. Throughput: 0: 5083.7. Samples: 568776. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:00:24,593][02117] Avg episode reward: [(0, '22.192')]
[2025-02-11 17:00:24,600][04717] Saving new best policy, reward=22.192!
[2025-02-11 17:00:25,657][04730] Updated weights for policy 0, policy_version 560 (0.0012)
[2025-02-11 17:00:27,633][04730] Updated weights for policy 0, policy_version 570 (0.0011)
[2025-02-11 17:00:29,591][02117] Fps is (10 sec: 20480.2, 60 sec: 20275.2, 300 sec: 19763.2). Total num frames: 2371584. Throughput: 0: 5085.0. Samples: 584186. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:00:29,593][02117] Avg episode reward: [(0, '20.078')]
[2025-02-11 17:00:29,675][04730] Updated weights for policy 0, policy_version 580 (0.0011)
[2025-02-11 17:00:31,723][04730] Updated weights for policy 0, policy_version 590 (0.0011)
[2025-02-11 17:00:33,742][04730] Updated weights for policy 0, policy_version 600 (0.0011)
[2025-02-11 17:00:34,591][02117] Fps is (10 sec: 20480.3, 60 sec: 20275.2, 300 sec: 19791.9). Total num frames: 2473984. Throughput: 0: 5075.5. Samples: 614384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:00:34,592][02117] Avg episode reward: [(0, '22.330')]
[2025-02-11 17:00:34,601][04717] Saving new best policy, reward=22.330!
[2025-02-11 17:00:35,756][04730] Updated weights for policy 0, policy_version 610 (0.0011)
[2025-02-11 17:00:37,760][04730] Updated weights for policy 0, policy_version 620 (0.0011)
[2025-02-11 17:00:39,591][02117] Fps is (10 sec: 20480.1, 60 sec: 20343.5, 300 sec: 19818.3). Total num frames: 2576384. Throughput: 0: 5092.9. Samples: 645086. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:00:39,593][02117] Avg episode reward: [(0, '23.133')]
[2025-02-11 17:00:39,595][04717] Saving new best policy, reward=23.133!
[2025-02-11 17:00:39,740][04730] Updated weights for policy 0, policy_version 630 (0.0012)
[2025-02-11 17:00:41,780][04730] Updated weights for policy 0, policy_version 640 (0.0012)
[2025-02-11 17:00:43,832][04730] Updated weights for policy 0, policy_version 650 (0.0012)
[2025-02-11 17:00:44,591][02117] Fps is (10 sec: 20070.0, 60 sec: 20275.2, 300 sec: 19812.5). Total num frames: 2674688. Throughput: 0: 5084.3. Samples: 659990. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:00:44,594][02117] Avg episode reward: [(0, '24.806')]
[2025-02-11 17:00:44,600][04717] Saving new best policy, reward=24.806!
[2025-02-11 17:00:45,849][04730] Updated weights for policy 0, policy_version 660 (0.0012)
[2025-02-11 17:00:47,837][04730] Updated weights for policy 0, policy_version 670 (0.0012)
[2025-02-11 17:00:49,591][02117] Fps is (10 sec: 20070.3, 60 sec: 20275.2, 300 sec: 19836.3). Total num frames: 2777088. Throughput: 0: 5086.9. Samples: 690680. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:00:49,592][02117] Avg episode reward: [(0, '22.971')]
[2025-02-11 17:00:49,820][04730] Updated weights for policy 0, policy_version 680 (0.0011)
[2025-02-11 17:00:51,818][04730] Updated weights for policy 0, policy_version 690 (0.0011)
[2025-02-11 17:00:53,843][04730] Updated weights for policy 0, policy_version 700 (0.0011)
[2025-02-11 17:00:54,591][02117] Fps is (10 sec: 20480.2, 60 sec: 20343.5, 300 sec: 19858.5). Total num frames: 2879488. Throughput: 0: 5083.0. Samples: 721168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:00:54,593][02117] Avg episode reward: [(0, '22.836')]
[2025-02-11 17:00:55,903][04730] Updated weights for policy 0, policy_version 710 (0.0011)
[2025-02-11 17:00:57,926][04730] Updated weights for policy 0, policy_version 720 (0.0011)
[2025-02-11 17:00:59,591][02117] Fps is (10 sec: 20479.8, 60 sec: 20343.5, 300 sec: 19879.2). Total num frames: 2981888. Throughput: 0: 5076.0. Samples: 736252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:00:59,593][02117] Avg episode reward: [(0, '25.789')]
[2025-02-11 17:00:59,596][04717] Saving new best policy, reward=25.789!
[2025-02-11 17:00:59,915][04730] Updated weights for policy 0, policy_version 730 (0.0011)
[2025-02-11 17:01:01,897][04730] Updated weights for policy 0, policy_version 740 (0.0012)
[2025-02-11 17:01:03,868][04730] Updated weights for policy 0, policy_version 750 (0.0011)
[2025-02-11 17:01:04,591][02117] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 19898.6). Total num frames: 3084288. Throughput: 0: 5095.6. Samples: 767168. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:01:04,593][02117] Avg episode reward: [(0, '24.300')]
[2025-02-11 17:01:05,874][04730] Updated weights for policy 0, policy_version 760 (0.0011)
[2025-02-11 17:01:08,006][04730] Updated weights for policy 0, policy_version 770 (0.0012)
[2025-02-11 17:01:09,591][02117] Fps is (10 sec: 20070.4, 60 sec: 20275.2, 300 sec: 19891.2). Total num frames: 3182592. Throughput: 0: 5073.1. Samples: 797064. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:01:09,594][02117] Avg episode reward: [(0, '26.269')]
[2025-02-11 17:01:09,596][04717] Saving new best policy, reward=26.269!
[2025-02-11 17:01:10,073][04730] Updated weights for policy 0, policy_version 780 (0.0012)
[2025-02-11 17:01:12,071][04730] Updated weights for policy 0, policy_version 790 (0.0012)
[2025-02-11 17:01:14,072][04730] Updated weights for policy 0, policy_version 800 (0.0011)
[2025-02-11 17:01:14,591][02117] Fps is (10 sec: 20070.5, 60 sec: 20275.2, 300 sec: 19909.0). Total num frames: 3284992. Throughput: 0: 5068.6. Samples: 812272. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:01:14,593][02117] Avg episode reward: [(0, '26.525')]
[2025-02-11 17:01:14,600][04717] Saving new best policy, reward=26.525!
[2025-02-11 17:01:16,076][04730] Updated weights for policy 0, policy_version 810 (0.0011)
[2025-02-11 17:01:18,085][04730] Updated weights for policy 0, policy_version 820 (0.0011)
[2025-02-11 17:01:19,591][02117] Fps is (10 sec: 20480.0, 60 sec: 20343.5, 300 sec: 19925.8). Total num frames: 3387392. Throughput: 0: 5076.7. Samples: 842836. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:01:19,593][02117] Avg episode reward: [(0, '27.708')]
[2025-02-11 17:01:19,595][04717] Saving new best policy, reward=27.708!
[2025-02-11 17:01:20,180][04730] Updated weights for policy 0, policy_version 830 (0.0012)
[2025-02-11 17:01:22,264][04730] Updated weights for policy 0, policy_version 840 (0.0011)
[2025-02-11 17:01:24,266][04730] Updated weights for policy 0, policy_version 850 (0.0012)
[2025-02-11 17:01:24,591][02117] Fps is (10 sec: 20070.0, 60 sec: 20275.1, 300 sec: 19918.2). Total num frames: 3485696. Throughput: 0: 5059.7. Samples: 872776. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:01:24,593][02117] Avg episode reward: [(0, '25.521')]
[2025-02-11 17:01:26,427][04730] Updated weights for policy 0, policy_version 860 (0.0012)
[2025-02-11 17:01:28,447][04730] Updated weights for policy 0, policy_version 870 (0.0011)
[2025-02-11 17:01:29,591][02117] Fps is (10 sec: 19661.0, 60 sec: 20206.9, 300 sec: 19911.1). Total num frames: 3584000. Throughput: 0: 5047.8. Samples: 887138. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:01:29,593][02117] Avg episode reward: [(0, '25.614')]
[2025-02-11 17:01:30,456][04730] Updated weights for policy 0, policy_version 880 (0.0012)
[2025-02-11 17:01:32,555][04730] Updated weights for policy 0, policy_version 890 (0.0012)
[2025-02-11 17:01:34,591][02117] Fps is (10 sec: 19661.1, 60 sec: 20138.6, 300 sec: 19904.3). Total num frames: 3682304. Throughput: 0: 5029.0. Samples: 916984. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:01:34,593][02117] Avg episode reward: [(0, '23.923')]
[2025-02-11 17:01:34,633][04730] Updated weights for policy 0, policy_version 900 (0.0012)
[2025-02-11 17:01:36,609][04730] Updated weights for policy 0, policy_version 910 (0.0011)
[2025-02-11 17:01:38,610][04730] Updated weights for policy 0, policy_version 920 (0.0011)
[2025-02-11 17:01:39,591][02117] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19919.5). Total num frames: 3784704. Throughput: 0: 5037.4. Samples: 947850. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:01:39,593][02117] Avg episode reward: [(0, '26.635')]
[2025-02-11 17:01:40,626][04730] Updated weights for policy 0, policy_version 930 (0.0011)
[2025-02-11 17:01:42,596][04730] Updated weights for policy 0, policy_version 940 (0.0011)
[2025-02-11 17:01:44,591][02117] Fps is (10 sec: 20480.0, 60 sec: 20207.0, 300 sec: 19933.9). Total num frames: 3887104. Throughput: 0: 5045.2. Samples: 963288. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:01:44,593][02117] Avg episode reward: [(0, '27.553')]
[2025-02-11 17:01:44,633][04730] Updated weights for policy 0, policy_version 950 (0.0011)
[2025-02-11 17:01:46,770][04730] Updated weights for policy 0, policy_version 960 (0.0012)
[2025-02-11 17:01:48,767][04730] Updated weights for policy 0, policy_version 970 (0.0011)
[2025-02-11 17:01:49,591][02117] Fps is (10 sec: 20479.8, 60 sec: 20206.9, 300 sec: 19947.5). Total num frames: 3989504. Throughput: 0: 5022.9. Samples: 993200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:01:49,593][02117] Avg episode reward: [(0, '22.304')]
[2025-02-11 17:01:50,333][04717] Stopping Batcher_0...
[2025-02-11 17:01:50,333][04717] Loop batcher_evt_loop terminating...
[2025-02-11 17:01:50,333][04717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 17:01:50,334][02117] Component Batcher_0 stopped!
[2025-02-11 17:01:50,353][04730] Weights refcount: 2 0
[2025-02-11 17:01:50,355][04730] Stopping InferenceWorker_p0-w0...
[2025-02-11 17:01:50,356][04730] Loop inference_proc0-0_evt_loop terminating...
[2025-02-11 17:01:50,355][02117] Component InferenceWorker_p0-w0 stopped!
[2025-02-11 17:01:50,377][04738] Stopping RolloutWorker_w7...
[2025-02-11 17:01:50,378][04738] Loop rollout_proc7_evt_loop terminating...
[2025-02-11 17:01:50,379][04735] Stopping RolloutWorker_w4...
[2025-02-11 17:01:50,377][02117] Component RolloutWorker_w7 stopped!
[2025-02-11 17:01:50,380][04735] Loop rollout_proc4_evt_loop terminating...
[2025-02-11 17:01:50,381][04733] Stopping RolloutWorker_w2...
[2025-02-11 17:01:50,382][04733] Loop rollout_proc2_evt_loop terminating...
[2025-02-11 17:01:50,382][04736] Stopping RolloutWorker_w5...
[2025-02-11 17:01:50,382][04736] Loop rollout_proc5_evt_loop terminating...
[2025-02-11 17:01:50,380][02117] Component RolloutWorker_w4 stopped!
[2025-02-11 17:01:50,384][04731] Stopping RolloutWorker_w1...
[2025-02-11 17:01:50,385][04734] Stopping RolloutWorker_w3...
[2025-02-11 17:01:50,385][04731] Loop rollout_proc1_evt_loop terminating...
[2025-02-11 17:01:50,385][04737] Stopping RolloutWorker_w6...
[2025-02-11 17:01:50,385][04732] Stopping RolloutWorker_w0...
[2025-02-11 17:01:50,385][04737] Loop rollout_proc6_evt_loop terminating...
[2025-02-11 17:01:50,384][02117] Component RolloutWorker_w2 stopped!
[2025-02-11 17:01:50,385][04732] Loop rollout_proc0_evt_loop terminating...
[2025-02-11 17:01:50,385][04734] Loop rollout_proc3_evt_loop terminating...
[2025-02-11 17:01:50,386][02117] Component RolloutWorker_w5 stopped!
[2025-02-11 17:01:50,388][02117] Component RolloutWorker_w1 stopped!
[2025-02-11 17:01:50,390][02117] Component RolloutWorker_w3 stopped!
[2025-02-11 17:01:50,391][02117] Component RolloutWorker_w6 stopped!
[2025-02-11 17:01:50,392][02117] Component RolloutWorker_w0 stopped!
[2025-02-11 17:01:50,408][04717] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 17:01:50,508][04717] Stopping LearnerWorker_p0...
[2025-02-11 17:01:50,509][04717] Loop learner_proc0_evt_loop terminating...
[2025-02-11 17:01:50,510][02117] Component LearnerWorker_p0 stopped!
[2025-02-11 17:01:50,513][02117] Waiting for process learner_proc0 to stop...
[2025-02-11 17:01:51,471][02117] Waiting for process inference_proc0-0 to join...
[2025-02-11 17:01:51,473][02117] Waiting for process rollout_proc0 to join...
[2025-02-11 17:01:51,475][02117] Waiting for process rollout_proc1 to join...
[2025-02-11 17:01:51,476][02117] Waiting for process rollout_proc2 to join...
[2025-02-11 17:01:51,478][02117] Waiting for process rollout_proc3 to join...
[2025-02-11 17:01:51,479][02117] Waiting for process rollout_proc4 to join...
[2025-02-11 17:01:51,480][02117] Waiting for process rollout_proc5 to join...
[2025-02-11 17:01:51,482][02117] Waiting for process rollout_proc6 to join...
[2025-02-11 17:01:51,484][02117] Waiting for process rollout_proc7 to join...
[2025-02-11 17:01:51,485][02117] Batcher 0 profile tree view:
batching: 11.8391, releasing_batches: 0.0239
[2025-02-11 17:01:51,487][02117] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 3.8509
update_model: 3.2577
  weight_update: 0.0012
one_step: 0.0028
  handle_policy_step: 186.2668
    deserialize: 7.5123, stack: 1.2923, obs_to_device_normalize: 46.5476, forward: 87.7695, send_messages: 12.9137
    prepare_outputs: 23.0564
      to_cpu: 14.8538
[2025-02-11 17:01:51,488][02117] Learner 0 profile tree view:
misc: 0.0037, prepare_batch: 9.7542
train: 23.1544
  epoch_init: 0.0043, minibatch_init: 0.0055, losses_postprocess: 0.2719, kl_divergence: 0.3588, after_optimizer: 5.1615
  calculate_losses: 9.5406
    losses_init: 0.0032, forward_head: 0.6877, bptt_initial: 5.7729, tail: 0.5934, advantages_returns: 0.1634, losses: 1.1112
    bptt: 1.0598
      bptt_forward_core: 1.0099
  update: 7.4917
    clip: 0.7795
[2025-02-11 17:01:51,489][02117] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1244, enqueue_policy_requests: 8.8805, env_step: 128.7927, overhead: 5.5858, complete_rollouts: 0.2136
save_policy_outputs: 7.9403
  split_output_tensors: 3.0252
[2025-02-11 17:01:51,490][02117] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1250, enqueue_policy_requests: 8.8758, env_step: 128.8295, overhead: 5.5466, complete_rollouts: 0.2104
save_policy_outputs: 7.9921
  split_output_tensors: 3.0659
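The rollout-worker profiles above show where collection time went: env_step dominates, with policy-request enqueueing and output handling far behind. From RolloutWorker_w0's top-level entries:

    # Fraction of RolloutWorker_w0's profiled time spent stepping the env.
    total = 0.1244 + 8.8805 + 128.7927 + 5.5858 + 0.2136 + 7.9403
    print(128.7927 / total)  # ~0.85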
[2025-02-11 17:01:51,493][02117] Loop Runner_EvtLoop terminating...
[2025-02-11 17:01:51,494][02117] Runner profile tree view:
main_loop: 212.2128
[2025-02-11 17:01:51,495][02117] Collected {0: 4005888}, FPS: 18876.7
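The final summary is internally consistent: dividing the total collected frames by the main-loop wall time reproduces the reported average FPS.

    print(4005888 / 212.2128)  # ~18876.7, matching "FPS: 18876.7"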
[2025-02-11 17:02:12,715][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:02:12,717][02117] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 17:02:12,718][02117] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 17:02:12,720][02117] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 17:02:12,721][02117] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:02:12,722][02117] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 17:02:12,723][02117] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:02:12,725][02117] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 17:02:12,726][02117] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-11 17:02:12,727][02117] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-11 17:02:12,728][02117] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 17:02:12,730][02117] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 17:02:12,731][02117] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 17:02:12,732][02117] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 17:02:12,733][02117] Using frameskip 1 and render_action_repeat=4 for evaluation
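The block above is the evaluation ("enjoy") run: it reloads config.json, overrides it with evaluation-only arguments, and plays 10 episodes while recording a replay video. A sketch of the equivalent invocation (the sf_examples enjoy entry point and flag spellings are assumptions based on Sample Factory 2.x, mirroring the overrides logged above):

    import sys
    from sf_examples.vizdoom.enjoy_vizdoom import main

    sys.argv = [
        "enjoy_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",
        "--no_render",
        "--save_video",
        "--max_num_episodes=10",
    ]
    main()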
[2025-02-11 17:02:12,762][02117] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:02:12,765][02117] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:02:12,767][02117] RunningMeanStd input shape: (1,)
[2025-02-11 17:02:12,781][02117] ConvEncoder: input_channels=3
[2025-02-11 17:02:12,886][02117] Conv encoder output size: 512
[2025-02-11 17:02:12,888][02117] Policy head output size: 512
[2025-02-11 17:02:13,041][02117] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 17:02:13,854][02117] Num frames 100...
[2025-02-11 17:02:13,981][02117] Num frames 200...
[2025-02-11 17:02:14,110][02117] Num frames 300...
[2025-02-11 17:02:14,237][02117] Num frames 400...
[2025-02-11 17:02:14,363][02117] Num frames 500...
[2025-02-11 17:02:14,487][02117] Num frames 600...
[2025-02-11 17:02:14,617][02117] Num frames 700...
[2025-02-11 17:02:14,745][02117] Num frames 800...
[2025-02-11 17:02:14,878][02117] Num frames 900...
[2025-02-11 17:02:15,015][02117] Num frames 1000...
[2025-02-11 17:02:15,149][02117] Num frames 1100...
[2025-02-11 17:02:15,275][02117] Num frames 1200...
[2025-02-11 17:02:15,402][02117] Num frames 1300...
[2025-02-11 17:02:15,530][02117] Num frames 1400...
[2025-02-11 17:02:15,657][02117] Num frames 1500...
[2025-02-11 17:02:15,784][02117] Num frames 1600...
[2025-02-11 17:02:15,913][02117] Num frames 1700...
[2025-02-11 17:02:16,043][02117] Num frames 1800...
[2025-02-11 17:02:16,208][02117] Avg episode rewards: #0: 47.859, true rewards: #0: 18.860
[2025-02-11 17:02:16,209][02117] Avg episode reward: 47.859, avg true_objective: 18.860
[2025-02-11 17:02:16,229][02117] Num frames 1900...
[2025-02-11 17:02:16,353][02117] Num frames 2000...
[2025-02-11 17:02:16,479][02117] Num frames 2100...
[2025-02-11 17:02:16,604][02117] Num frames 2200...
[2025-02-11 17:02:16,730][02117] Num frames 2300...
[2025-02-11 17:02:16,855][02117] Num frames 2400...
[2025-02-11 17:02:16,982][02117] Num frames 2500...
[2025-02-11 17:02:17,109][02117] Num frames 2600...
[2025-02-11 17:02:17,236][02117] Num frames 2700...
[2025-02-11 17:02:17,314][02117] Avg episode rewards: #0: 31.590, true rewards: #0: 13.590
[2025-02-11 17:02:17,315][02117] Avg episode reward: 31.590, avg true_objective: 13.590
[2025-02-11 17:02:17,420][02117] Num frames 2800...
[2025-02-11 17:02:17,547][02117] Num frames 2900...
[2025-02-11 17:02:17,674][02117] Num frames 3000...
[2025-02-11 17:02:17,800][02117] Num frames 3100...
[2025-02-11 17:02:17,928][02117] Num frames 3200...
[2025-02-11 17:02:18,058][02117] Num frames 3300...
[2025-02-11 17:02:18,185][02117] Num frames 3400...
[2025-02-11 17:02:18,311][02117] Num frames 3500...
[2025-02-11 17:02:18,440][02117] Num frames 3600...
[2025-02-11 17:02:18,567][02117] Num frames 3700...
[2025-02-11 17:02:18,697][02117] Num frames 3800...
[2025-02-11 17:02:18,827][02117] Num frames 3900...
[2025-02-11 17:02:18,955][02117] Num frames 4000...
[2025-02-11 17:02:19,090][02117] Num frames 4100...
[2025-02-11 17:02:19,224][02117] Num frames 4200...
[2025-02-11 17:02:19,353][02117] Num frames 4300...
[2025-02-11 17:02:19,479][02117] Num frames 4400...
[2025-02-11 17:02:19,611][02117] Num frames 4500...
[2025-02-11 17:02:19,741][02117] Num frames 4600...
[2025-02-11 17:02:19,872][02117] Num frames 4700...
[2025-02-11 17:02:20,012][02117] Num frames 4800...
[2025-02-11 17:02:20,093][02117] Avg episode rewards: #0: 41.059, true rewards: #0: 16.060
[2025-02-11 17:02:20,094][02117] Avg episode reward: 41.059, avg true_objective: 16.060
[2025-02-11 17:02:20,205][02117] Num frames 4900...
[2025-02-11 17:02:20,338][02117] Num frames 5000...
[2025-02-11 17:02:20,467][02117] Num frames 5100...
[2025-02-11 17:02:20,594][02117] Num frames 5200...
[2025-02-11 17:02:20,719][02117] Num frames 5300...
[2025-02-11 17:02:20,847][02117] Num frames 5400...
[2025-02-11 17:02:20,936][02117] Avg episode rewards: #0: 34.065, true rewards: #0: 13.565
[2025-02-11 17:02:20,937][02117] Avg episode reward: 34.065, avg true_objective: 13.565
[2025-02-11 17:02:21,032][02117] Num frames 5500...
[2025-02-11 17:02:21,160][02117] Num frames 5600...
[2025-02-11 17:02:21,288][02117] Num frames 5700...
[2025-02-11 17:02:21,413][02117] Num frames 5800...
[2025-02-11 17:02:21,539][02117] Num frames 5900...
[2025-02-11 17:02:21,707][02117] Avg episode rewards: #0: 29.584, true rewards: #0: 11.984
[2025-02-11 17:02:21,709][02117] Avg episode reward: 29.584, avg true_objective: 11.984
[2025-02-11 17:02:21,721][02117] Num frames 6000...
[2025-02-11 17:02:21,849][02117] Num frames 6100...
[2025-02-11 17:02:21,983][02117] Num frames 6200...
[2025-02-11 17:02:22,115][02117] Num frames 6300...
[2025-02-11 17:02:22,245][02117] Num frames 6400...
[2025-02-11 17:02:22,373][02117] Num frames 6500...
[2025-02-11 17:02:22,500][02117] Num frames 6600...
[2025-02-11 17:02:22,629][02117] Num frames 6700...
[2025-02-11 17:02:22,756][02117] Num frames 6800...
[2025-02-11 17:02:22,886][02117] Num frames 6900...
[2025-02-11 17:02:23,019][02117] Num frames 7000...
[2025-02-11 17:02:23,148][02117] Num frames 7100...
[2025-02-11 17:02:23,301][02117] Avg episode rewards: #0: 29.460, true rewards: #0: 11.960
[2025-02-11 17:02:23,303][02117] Avg episode reward: 29.460, avg true_objective: 11.960
[2025-02-11 17:02:23,336][02117] Num frames 7200...
[2025-02-11 17:02:23,461][02117] Num frames 7300...
[2025-02-11 17:02:23,588][02117] Num frames 7400...
[2025-02-11 17:02:23,713][02117] Num frames 7500...
[2025-02-11 17:02:23,842][02117] Num frames 7600...
[2025-02-11 17:02:23,973][02117] Num frames 7700...
[2025-02-11 17:02:24,102][02117] Num frames 7800...
[2025-02-11 17:02:24,230][02117] Num frames 7900...
[2025-02-11 17:02:24,355][02117] Num frames 8000...
[2025-02-11 17:02:24,481][02117] Num frames 8100...
[2025-02-11 17:02:24,610][02117] Num frames 8200...
[2025-02-11 17:02:24,738][02117] Num frames 8300...
[2025-02-11 17:02:24,863][02117] Num frames 8400...
[2025-02-11 17:02:24,995][02117] Num frames 8500...
[2025-02-11 17:02:25,124][02117] Num frames 8600...
[2025-02-11 17:02:25,253][02117] Num frames 8700...
[2025-02-11 17:02:25,324][02117] Avg episode rewards: #0: 30.303, true rewards: #0: 12.446
[2025-02-11 17:02:25,325][02117] Avg episode reward: 30.303, avg true_objective: 12.446
[2025-02-11 17:02:25,436][02117] Num frames 8800...
[2025-02-11 17:02:25,562][02117] Num frames 8900...
[2025-02-11 17:02:25,686][02117] Num frames 9000...
[2025-02-11 17:02:25,814][02117] Num frames 9100...
[2025-02-11 17:02:25,941][02117] Num frames 9200...
[2025-02-11 17:02:26,067][02117] Num frames 9300...
[2025-02-11 17:02:26,195][02117] Num frames 9400...
[2025-02-11 17:02:26,324][02117] Num frames 9500...
[2025-02-11 17:02:26,451][02117] Num frames 9600...
[2025-02-11 17:02:26,575][02117] Num frames 9700...
[2025-02-11 17:02:26,705][02117] Num frames 9800...
[2025-02-11 17:02:26,832][02117] Num frames 9900...
[2025-02-11 17:02:26,959][02117] Num frames 10000...
[2025-02-11 17:02:27,089][02117] Num frames 10100...
[2025-02-11 17:02:27,216][02117] Num frames 10200...
[2025-02-11 17:02:27,341][02117] Num frames 10300...
[2025-02-11 17:02:27,470][02117] Num frames 10400...
[2025-02-11 17:02:27,601][02117] Num frames 10500...
[2025-02-11 17:02:27,727][02117] Num frames 10600...
[2025-02-11 17:02:27,856][02117] Num frames 10700...
[2025-02-11 17:02:27,991][02117] Avg episode rewards: #0: 33.453, true rewards: #0: 13.454
[2025-02-11 17:02:27,993][02117] Avg episode reward: 33.453, avg true_objective: 13.454
[2025-02-11 17:02:28,046][02117] Num frames 10800...
[2025-02-11 17:02:28,173][02117] Num frames 10900...
[2025-02-11 17:02:28,299][02117] Num frames 11000...
[2025-02-11 17:02:28,426][02117] Num frames 11100...
[2025-02-11 17:02:28,549][02117] Num frames 11200...
[2025-02-11 17:02:28,676][02117] Num frames 11300...
[2025-02-11 17:02:28,802][02117] Num frames 11400...
[2025-02-11 17:02:28,941][02117] Avg episode rewards: #0: 31.296, true rewards: #0: 12.741
[2025-02-11 17:02:28,942][02117] Avg episode reward: 31.296, avg true_objective: 12.741
[2025-02-11 17:02:28,986][02117] Num frames 11500...
[2025-02-11 17:02:29,117][02117] Num frames 11600...
[2025-02-11 17:02:29,242][02117] Num frames 11700...
[2025-02-11 17:02:29,368][02117] Num frames 11800...
[2025-02-11 17:02:29,494][02117] Num frames 11900...
[2025-02-11 17:02:29,621][02117] Num frames 12000...
[2025-02-11 17:02:29,746][02117] Num frames 12100...
[2025-02-11 17:02:29,874][02117] Num frames 12200...
[2025-02-11 17:02:29,993][02117] Avg episode rewards: #0: 29.850, true rewards: #0: 12.250
[2025-02-11 17:02:29,994][02117] Avg episode reward: 29.850, avg true_objective: 12.250
[2025-02-11 17:02:59,239][02117] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
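"Avg episode rewards" is a running mean over the episodes completed so far, so individual episode returns can be recovered by differencing consecutive reports:

    # Episode 2's raw reward, from the first two averages of the run above.
    avg1, avg2 = 47.859, 31.590
    print(2 * avg2 - avg1)  # ~15.32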
[2025-02-11 17:04:57,900][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:04:57,901][02117] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 17:04:57,902][02117] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 17:04:57,904][02117] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 17:04:57,905][02117] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:04:57,906][02117] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 17:04:57,908][02117] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-11 17:04:57,909][02117] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 17:04:57,910][02117] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-11 17:04:57,912][02117] Adding new argument 'hf_repository'='mjm54/doom_health_gathering_supreme' that is not in the saved config file!
[2025-02-11 17:04:57,913][02117] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 17:04:57,914][02117] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 17:04:57,915][02117] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 17:04:57,917][02117] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 17:04:57,918][02117] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-02-11 17:04:57,942][02117] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:04:57,945][02117] RunningMeanStd input shape: (1,)
[2025-02-11 17:04:57,956][02117] ConvEncoder: input_channels=3
[2025-02-11 17:04:57,993][02117] Conv encoder output size: 512
[2025-02-11 17:04:57,995][02117] Policy head output size: 512
[2025-02-11 17:04:58,017][02117] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 17:04:58,455][02117] Num frames 100...
[2025-02-11 17:04:58,579][02117] Num frames 200...
[2025-02-11 17:04:58,703][02117] Num frames 300...
[2025-02-11 17:04:58,829][02117] Num frames 400...
[2025-02-11 17:04:58,964][02117] Num frames 500...
[2025-02-11 17:04:59,100][02117] Num frames 600...
[2025-02-11 17:04:59,234][02117] Num frames 700...
[2025-02-11 17:04:59,368][02117] Num frames 800...
[2025-02-11 17:04:59,504][02117] Num frames 900...
[2025-02-11 17:04:59,636][02117] Num frames 1000...
[2025-02-11 17:04:59,768][02117] Num frames 1100...
[2025-02-11 17:04:59,906][02117] Num frames 1200...
[2025-02-11 17:05:00,038][02117] Num frames 1300...
[2025-02-11 17:05:00,165][02117] Num frames 1400...
[2025-02-11 17:05:00,296][02117] Num frames 1500...
[2025-02-11 17:05:00,423][02117] Num frames 1600...
[2025-02-11 17:05:00,491][02117] Avg episode rewards: #0: 35.090, true rewards: #0: 16.090
[2025-02-11 17:05:00,492][02117] Avg episode reward: 35.090, avg true_objective: 16.090
[2025-02-11 17:05:00,604][02117] Num frames 1700...
[2025-02-11 17:05:00,730][02117] Num frames 1800...
[2025-02-11 17:05:00,855][02117] Num frames 1900...
[2025-02-11 17:05:00,979][02117] Num frames 2000...
[2025-02-11 17:05:01,106][02117] Num frames 2100...
[2025-02-11 17:05:01,231][02117] Num frames 2200...
[2025-02-11 17:05:01,357][02117] Num frames 2300...
[2025-02-11 17:05:01,507][02117] Avg episode rewards: #0: 25.880, true rewards: #0: 11.880
[2025-02-11 17:05:01,508][02117] Avg episode reward: 25.880, avg true_objective: 11.880
[2025-02-11 17:05:01,539][02117] Num frames 2400...
[2025-02-11 17:05:01,666][02117] Num frames 2500...
[2025-02-11 17:05:01,794][02117] Num frames 2600...
[2025-02-11 17:05:01,921][02117] Num frames 2700...
[2025-02-11 17:05:02,050][02117] Num frames 2800...
[2025-02-11 17:05:02,175][02117] Num frames 2900...
[2025-02-11 17:05:02,301][02117] Num frames 3000...
[2025-02-11 17:05:02,428][02117] Num frames 3100...
[2025-02-11 17:05:02,552][02117] Num frames 3200...
[2025-02-11 17:05:02,618][02117] Avg episode rewards: #0: 23.360, true rewards: #0: 10.693
[2025-02-11 17:05:02,619][02117] Avg episode reward: 23.360, avg true_objective: 10.693
[2025-02-11 17:05:02,734][02117] Num frames 3300...
[2025-02-11 17:05:02,859][02117] Num frames 3400...
[2025-02-11 17:05:02,985][02117] Num frames 3500...
[2025-02-11 17:05:03,110][02117] Num frames 3600...
[2025-02-11 17:05:03,236][02117] Num frames 3700...
[2025-02-11 17:05:03,363][02117] Num frames 3800...
[2025-02-11 17:05:03,488][02117] Num frames 3900...
[2025-02-11 17:05:03,612][02117] Num frames 4000...
[2025-02-11 17:05:03,739][02117] Num frames 4100...
[2025-02-11 17:05:03,865][02117] Num frames 4200...
[2025-02-11 17:05:03,993][02117] Num frames 4300...
[2025-02-11 17:05:04,120][02117] Num frames 4400...
[2025-02-11 17:05:04,247][02117] Num frames 4500...
[2025-02-11 17:05:04,327][02117] Avg episode rewards: #0: 26.300, true rewards: #0: 11.300
[2025-02-11 17:05:04,329][02117] Avg episode reward: 26.300, avg true_objective: 11.300
[2025-02-11 17:05:04,433][02117] Num frames 4600...
[2025-02-11 17:05:04,558][02117] Num frames 4700...
[2025-02-11 17:05:04,684][02117] Num frames 4800...
[2025-02-11 17:05:04,808][02117] Num frames 4900...
[2025-02-11 17:05:04,934][02117] Num frames 5000...
[2025-02-11 17:05:05,064][02117] Num frames 5100...
[2025-02-11 17:05:05,189][02117] Num frames 5200...
[2025-02-11 17:05:05,316][02117] Num frames 5300...
[2025-02-11 17:05:05,445][02117] Num frames 5400...
[2025-02-11 17:05:05,572][02117] Num frames 5500...
[2025-02-11 17:05:05,699][02117] Num frames 5600...
[2025-02-11 17:05:05,765][02117] Avg episode rewards: #0: 26.016, true rewards: #0: 11.216
[2025-02-11 17:05:05,766][02117] Avg episode reward: 26.016, avg true_objective: 11.216
[2025-02-11 17:05:05,881][02117] Num frames 5700...
[2025-02-11 17:05:06,004][02117] Num frames 5800...
[2025-02-11 17:05:06,129][02117] Num frames 5900...
[2025-02-11 17:05:06,255][02117] Num frames 6000...
[2025-02-11 17:05:06,379][02117] Num frames 6100...
[2025-02-11 17:05:06,506][02117] Num frames 6200...
[2025-02-11 17:05:06,633][02117] Num frames 6300...
[2025-02-11 17:05:06,758][02117] Num frames 6400...
[2025-02-11 17:05:06,885][02117] Num frames 6500...
[2025-02-11 17:05:07,017][02117] Num frames 6600...
[2025-02-11 17:05:07,145][02117] Num frames 6700...
[2025-02-11 17:05:07,271][02117] Num frames 6800...
[2025-02-11 17:05:07,401][02117] Num frames 6900...
[2025-02-11 17:05:07,529][02117] Num frames 7000...
[2025-02-11 17:05:07,660][02117] Num frames 7100...
[2025-02-11 17:05:07,787][02117] Num frames 7200...
[2025-02-11 17:05:07,913][02117] Num frames 7300...
[2025-02-11 17:05:08,043][02117] Num frames 7400...
[2025-02-11 17:05:08,174][02117] Num frames 7500...
[2025-02-11 17:05:08,300][02117] Num frames 7600...
[2025-02-11 17:05:08,427][02117] Num frames 7700...
[2025-02-11 17:05:08,492][02117] Avg episode rewards: #0: 31.180, true rewards: #0: 12.847
[2025-02-11 17:05:08,493][02117] Avg episode reward: 31.180, avg true_objective: 12.847
[2025-02-11 17:05:08,610][02117] Num frames 7800...
[2025-02-11 17:05:08,735][02117] Num frames 7900...
[2025-02-11 17:05:08,857][02117] Num frames 8000...
[2025-02-11 17:05:08,983][02117] Num frames 8100...
[2025-02-11 17:05:09,114][02117] Num frames 8200...
[2025-02-11 17:05:09,240][02117] Num frames 8300...
[2025-02-11 17:05:09,367][02117] Num frames 8400...
[2025-02-11 17:05:09,493][02117] Num frames 8500...
[2025-02-11 17:05:09,624][02117] Num frames 8600...
[2025-02-11 17:05:09,750][02117] Num frames 8700...
[2025-02-11 17:05:09,878][02117] Num frames 8800...
[2025-02-11 17:05:10,006][02117] Num frames 8900...
[2025-02-11 17:05:10,133][02117] Num frames 9000...
[2025-02-11 17:05:10,259][02117] Num frames 9100...
[2025-02-11 17:05:10,398][02117] Num frames 9200...
[2025-02-11 17:05:10,526][02117] Num frames 9300...
[2025-02-11 17:05:10,653][02117] Num frames 9400...
[2025-02-11 17:05:10,782][02117] Num frames 9500...
[2025-02-11 17:05:10,911][02117] Num frames 9600...
[2025-02-11 17:05:11,002][02117] Avg episode rewards: #0: 33.326, true rewards: #0: 13.754
[2025-02-11 17:05:11,003][02117] Avg episode reward: 33.326, avg true_objective: 13.754
[2025-02-11 17:05:11,097][02117] Num frames 9700...
[2025-02-11 17:05:11,233][02117] Num frames 9800...
[2025-02-11 17:05:11,369][02117] Num frames 9900...
[2025-02-11 17:05:11,508][02117] Num frames 10000...
[2025-02-11 17:05:11,647][02117] Num frames 10100...
[2025-02-11 17:05:11,779][02117] Num frames 10200...
[2025-02-11 17:05:11,906][02117] Num frames 10300...
[2025-02-11 17:05:12,035][02117] Num frames 10400...
[2025-02-11 17:05:12,160][02117] Num frames 10500...
[2025-02-11 17:05:12,289][02117] Num frames 10600...
[2025-02-11 17:05:12,383][02117] Avg episode rewards: #0: 32.164, true rewards: #0: 13.289
[2025-02-11 17:05:12,385][02117] Avg episode reward: 32.164, avg true_objective: 13.289
[2025-02-11 17:05:12,472][02117] Num frames 10700...
[2025-02-11 17:05:12,601][02117] Num frames 10800...
[2025-02-11 17:05:12,727][02117] Num frames 10900...
[2025-02-11 17:05:12,854][02117] Num frames 11000...
[2025-02-11 17:05:12,981][02117] Num frames 11100...
[2025-02-11 17:05:13,112][02117] Num frames 11200...
[2025-02-11 17:05:13,238][02117] Num frames 11300...
[2025-02-11 17:05:13,364][02117] Num frames 11400...
[2025-02-11 17:05:13,492][02117] Num frames 11500...
[2025-02-11 17:05:13,621][02117] Num frames 11600...
[2025-02-11 17:05:13,747][02117] Num frames 11700...
[2025-02-11 17:05:13,874][02117] Num frames 11800...
[2025-02-11 17:05:14,015][02117] Num frames 11900...
[2025-02-11 17:05:14,149][02117] Num frames 12000...
[2025-02-11 17:05:14,288][02117] Num frames 12100...
[2025-02-11 17:05:14,423][02117] Num frames 12200...
[2025-02-11 17:05:14,524][02117] Avg episode rewards: #0: 32.929, true rewards: #0: 13.596
[2025-02-11 17:05:14,526][02117] Avg episode reward: 32.929, avg true_objective: 13.596
[2025-02-11 17:05:14,607][02117] Num frames 12300...
[2025-02-11 17:05:14,735][02117] Num frames 12400...
[2025-02-11 17:05:14,867][02117] Num frames 12500...
[2025-02-11 17:05:15,000][02117] Num frames 12600...
[2025-02-11 17:05:15,131][02117] Num frames 12700...
[2025-02-11 17:05:15,261][02117] Num frames 12800...
[2025-02-11 17:05:15,390][02117] Num frames 12900...
[2025-02-11 17:05:15,515][02117] Num frames 13000...
[2025-02-11 17:05:15,643][02117] Num frames 13100...
[2025-02-11 17:05:15,776][02117] Num frames 13200...
[2025-02-11 17:05:15,906][02117] Num frames 13300...
[2025-02-11 17:05:16,034][02117] Num frames 13400...
[2025-02-11 17:05:16,163][02117] Num frames 13500...
[2025-02-11 17:05:16,292][02117] Num frames 13600...
[2025-02-11 17:05:16,420][02117] Num frames 13700...
[2025-02-11 17:05:16,549][02117] Num frames 13800...
[2025-02-11 17:05:16,719][02117] Avg episode rewards: #0: 34.187, true rewards: #0: 13.887
[2025-02-11 17:05:16,721][02117] Avg episode reward: 34.187, avg true_objective: 13.887
[2025-02-11 17:05:49,668][02117] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-11 17:06:06,659][02117] The model has been pushed to https://huggingface.co/mjm54/doom_health_gathering_supreme
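The push above uploads the whole experiment folder (checkpoint, config.json, replay.mp4) to the model repo. A minimal sketch of an equivalent upload using huggingface_hub directly; Sample Factory wraps its own logic around this step, so treat the exact call below as an assumption:

import huggingface_hub

# Upload the experiment directory to the Hub repo named in the log line above.
# Assumes prior authentication (e.g. `huggingface-cli login`).
api = huggingface_hub.HfApi()
api.upload_folder(
    folder_path="/content/train_dir/default_experiment",
    repo_id="mjm54/doom_health_gathering_supreme",
    repo_type="model",
)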
[2025-02-11 17:06:20,791][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:06:20,793][02117] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 17:06:20,795][02117] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 17:06:20,797][02117] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 17:06:20,798][02117] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:06:20,799][02117] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 17:06:20,800][02117] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-11 17:06:20,801][02117] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 17:06:20,803][02117] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-11 17:06:20,803][02117] Adding new argument 'hf_repository'='mjm54/doom_health_gathering_supreme' that is not in the saved config file!
[2025-02-11 17:06:20,805][02117] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 17:06:20,807][02117] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 17:06:20,808][02117] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 17:06:20,809][02117] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 17:06:20,811][02117] Using frameskip 1 and render_action_repeat=4 for evaluation
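The block above shows how evaluation merges the saved experiment config with command-line values: existing keys are overridden, unknown keys are added. A minimal sketch of that merge, assuming plain JSON handling (Sample Factory's actual loader is more involved):

import json

# Path taken from the log lines above.
with open("/content/train_dir/default_experiment/config.json") as f:
    cfg = json.load(f)

# "Overriding arg ... passed from command line": CLI values replace saved ones.
cfg["num_workers"] = 1

# "Adding new argument ... not in the saved config file": eval-only keys are
# appended if absent; values mirror the log lines above.
eval_args = {
    "no_render": True,
    "save_video": True,
    "max_num_episodes": 10,
    "push_to_hub": True,
    "hf_repository": "mjm54/doom_health_gathering_supreme",
}
for key, value in eval_args.items():
    cfg.setdefault(key, value)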
[2025-02-11 17:06:20,835][02117] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:06:20,837][02117] RunningMeanStd input shape: (1,)
[2025-02-11 17:06:20,848][02117] ConvEncoder: input_channels=3
[2025-02-11 17:06:20,883][02117] Conv encoder output size: 512
[2025-02-11 17:06:20,885][02117] Policy head output size: 512
[2025-02-11 17:06:20,905][02117] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 17:06:21,358][02117] Num frames 100...
[2025-02-11 17:06:21,492][02117] Num frames 200...
[2025-02-11 17:06:21,628][02117] Num frames 300...
[2025-02-11 17:06:21,757][02117] Num frames 400...
[2025-02-11 17:06:21,894][02117] Num frames 500...
[2025-02-11 17:06:22,032][02117] Num frames 600...
[2025-02-11 17:06:22,162][02117] Num frames 700...
[2025-02-11 17:06:22,274][02117] Avg episode rewards: #0: 20.420, true rewards: #0: 7.420
[2025-02-11 17:06:22,276][02117] Avg episode reward: 20.420, avg true_objective: 7.420
[2025-02-11 17:06:22,353][02117] Num frames 800...
[2025-02-11 17:06:22,480][02117] Num frames 900...
[2025-02-11 17:06:22,607][02117] Num frames 1000...
[2025-02-11 17:06:22,735][02117] Num frames 1100...
[2025-02-11 17:06:22,865][02117] Num frames 1200...
[2025-02-11 17:06:22,992][02117] Avg episode rewards: #0: 14.270, true rewards: #0: 6.270
[2025-02-11 17:06:22,994][02117] Avg episode reward: 14.270, avg true_objective: 6.270
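Each "Avg episode rewards" line is a running mean over the episodes finished so far, so per-episode scores can be recovered from consecutive reports: episode 1 scored 20.420 on its own, and 2 * 14.270 - 20.420 = 8.120 for episode 2. A sketch of that bookkeeping (illustrative arithmetic, not library code):

# Running mean over finished evaluation episodes.
episode_rewards = []

def report(reward: float) -> float:
    episode_rewards.append(reward)
    return sum(episode_rewards) / len(episode_rewards)

print(report(20.420))  # 20.420, first report above
print(report(8.120))   # 14.270, second report above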
[2025-02-11 17:06:23,058][02117] Num frames 1300...
[2025-02-11 17:06:23,189][02117] Num frames 1400...
[2025-02-11 17:06:23,322][02117] Num frames 1500...
[2025-02-11 17:06:23,452][02117] Num frames 1600...
[2025-02-11 17:06:23,584][02117] Num frames 1700...
[2025-02-11 17:06:23,720][02117] Num frames 1800...
[2025-02-11 17:06:23,849][02117] Num frames 1900...
[2025-02-11 17:06:23,978][02117] Num frames 2000...
[2025-02-11 17:06:24,160][02117] Avg episode rewards: #0: 16.980, true rewards: #0: 6.980
[2025-02-11 17:06:24,162][02117] Avg episode reward: 16.980, avg true_objective: 6.980
[2025-02-11 17:06:24,172][02117] Num frames 2100...
[2025-02-11 17:06:24,300][02117] Num frames 2200...
[2025-02-11 17:06:24,429][02117] Num frames 2300...
[2025-02-11 17:06:24,553][02117] Num frames 2400...
[2025-02-11 17:06:24,683][02117] Num frames 2500...
[2025-02-11 17:06:24,810][02117] Num frames 2600...
[2025-02-11 17:06:24,940][02117] Num frames 2700...
[2025-02-11 17:06:25,123][02117] Avg episode rewards: #0: 16.245, true rewards: #0: 6.995
[2025-02-11 17:06:25,125][02117] Avg episode reward: 16.245, avg true_objective: 6.995
[2025-02-11 17:06:25,128][02117] Num frames 2800...
[2025-02-11 17:06:25,258][02117] Num frames 2900...
[2025-02-11 17:06:25,392][02117] Num frames 3000...
[2025-02-11 17:06:25,520][02117] Num frames 3100...
[2025-02-11 17:06:25,646][02117] Num frames 3200...
[2025-02-11 17:06:25,776][02117] Num frames 3300...
[2025-02-11 17:06:25,904][02117] Num frames 3400...
[2025-02-11 17:06:26,032][02117] Num frames 3500...
[2025-02-11 17:06:26,159][02117] Num frames 3600...
[2025-02-11 17:06:26,285][02117] Num frames 3700...
[2025-02-11 17:06:26,412][02117] Num frames 3800...
[2025-02-11 17:06:26,538][02117] Num frames 3900...
[2025-02-11 17:06:26,668][02117] Num frames 4000...
[2025-02-11 17:06:26,796][02117] Num frames 4100...
[2025-02-11 17:06:26,923][02117] Num frames 4200...
[2025-02-11 17:06:27,050][02117] Num frames 4300...
[2025-02-11 17:06:27,178][02117] Num frames 4400...
[2025-02-11 17:06:27,306][02117] Num frames 4500...
[2025-02-11 17:06:27,433][02117] Num frames 4600...
[2025-02-11 17:06:27,559][02117] Num frames 4700...
[2025-02-11 17:06:27,688][02117] Num frames 4800...
[2025-02-11 17:06:27,864][02117] Avg episode rewards: #0: 24.996, true rewards: #0: 9.796
[2025-02-11 17:06:27,865][02117] Avg episode reward: 24.996, avg true_objective: 9.796
[2025-02-11 17:06:27,870][02117] Num frames 4900...
[2025-02-11 17:06:27,997][02117] Num frames 5000...
[2025-02-11 17:06:28,123][02117] Num frames 5100...
[2025-02-11 17:06:28,249][02117] Num frames 5200...
[2025-02-11 17:06:28,373][02117] Num frames 5300...
[2025-02-11 17:06:28,503][02117] Num frames 5400...
[2025-02-11 17:06:28,633][02117] Num frames 5500...
[2025-02-11 17:06:28,762][02117] Num frames 5600...
[2025-02-11 17:06:28,892][02117] Num frames 5700...
[2025-02-11 17:06:29,019][02117] Num frames 5800...
[2025-02-11 17:06:29,147][02117] Num frames 5900...
[2025-02-11 17:06:29,273][02117] Num frames 6000...
[2025-02-11 17:06:29,398][02117] Num frames 6100...
[2025-02-11 17:06:29,525][02117] Num frames 6200...
[2025-02-11 17:06:29,634][02117] Avg episode rewards: #0: 26.570, true rewards: #0: 10.403
[2025-02-11 17:06:29,636][02117] Avg episode reward: 26.570, avg true_objective: 10.403
[2025-02-11 17:06:29,709][02117] Num frames 6300...
[2025-02-11 17:06:29,837][02117] Num frames 6400...
[2025-02-11 17:06:29,963][02117] Num frames 6500...
[2025-02-11 17:06:30,090][02117] Num frames 6600...
[2025-02-11 17:06:30,178][02117] Avg episode rewards: #0: 23.608, true rewards: #0: 9.466
[2025-02-11 17:06:30,180][02117] Avg episode reward: 23.608, avg true_objective: 9.466
[2025-02-11 17:06:30,273][02117] Num frames 6700...
[2025-02-11 17:06:30,401][02117] Num frames 6800...
[2025-02-11 17:06:30,528][02117] Num frames 6900...
[2025-02-11 17:06:30,656][02117] Num frames 7000...
[2025-02-11 17:06:30,784][02117] Num frames 7100...
[2025-02-11 17:06:30,914][02117] Num frames 7200...
[2025-02-11 17:06:31,046][02117] Num frames 7300...
[2025-02-11 17:06:31,172][02117] Num frames 7400...
[2025-02-11 17:06:31,302][02117] Num frames 7500...
[2025-02-11 17:06:31,434][02117] Num frames 7600...
[2025-02-11 17:06:31,560][02117] Num frames 7700...
[2025-02-11 17:06:31,690][02117] Num frames 7800...
[2025-02-11 17:06:31,822][02117] Num frames 7900...
[2025-02-11 17:06:31,954][02117] Num frames 8000...
[2025-02-11 17:06:32,090][02117] Num frames 8100...
[2025-02-11 17:06:32,240][02117] Num frames 8200...
[2025-02-11 17:06:32,374][02117] Num frames 8300...
[2025-02-11 17:06:32,504][02117] Num frames 8400...
[2025-02-11 17:06:32,636][02117] Num frames 8500...
[2025-02-11 17:06:32,766][02117] Num frames 8600...
[2025-02-11 17:06:32,894][02117] Num frames 8700...
[2025-02-11 17:06:32,983][02117] Avg episode rewards: #0: 28.032, true rewards: #0: 10.908
[2025-02-11 17:06:32,985][02117] Avg episode reward: 28.032, avg true_objective: 10.908
[2025-02-11 17:06:33,080][02117] Num frames 8800...
[2025-02-11 17:06:33,205][02117] Num frames 8900...
[2025-02-11 17:06:33,332][02117] Num frames 9000...
[2025-02-11 17:06:33,458][02117] Num frames 9100...
[2025-02-11 17:06:33,585][02117] Num frames 9200...
[2025-02-11 17:06:33,715][02117] Num frames 9300...
[2025-02-11 17:06:33,854][02117] Avg episode rewards: #0: 26.073, true rewards: #0: 10.407
[2025-02-11 17:06:33,855][02117] Avg episode reward: 26.073, avg true_objective: 10.407
[2025-02-11 17:06:33,900][02117] Num frames 9400...
[2025-02-11 17:06:34,025][02117] Num frames 9500...
[2025-02-11 17:06:34,153][02117] Num frames 9600...
[2025-02-11 17:06:34,280][02117] Num frames 9700...
[2025-02-11 17:06:34,404][02117] Num frames 9800...
[2025-02-11 17:06:34,531][02117] Num frames 9900...
[2025-02-11 17:06:34,658][02117] Num frames 10000...
[2025-02-11 17:06:34,786][02117] Num frames 10100...
[2025-02-11 17:06:34,913][02117] Num frames 10200...
[2025-02-11 17:06:35,099][02117] Avg episode rewards: #0: 25.899, true rewards: #0: 10.299
[2025-02-11 17:06:35,100][02117] Avg episode reward: 25.899, avg true_objective: 10.299
[2025-02-11 17:06:35,103][02117] Num frames 10300...
[2025-02-11 17:06:59,510][02117] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-11 17:07:09,332][02117] The model has been pushed to https://huggingface.co/mjm54/doom_health_gathering_supreme
[2025-02-11 17:12:45,863][02117] Environment doom_basic already registered, overwriting...
[2025-02-11 17:12:45,865][02117] Environment doom_two_colors_easy already registered, overwriting...
[2025-02-11 17:12:45,866][02117] Environment doom_two_colors_hard already registered, overwriting...
[2025-02-11 17:12:45,869][02117] Environment doom_dm already registered, overwriting...
[2025-02-11 17:12:45,869][02117] Environment doom_dwango5 already registered, overwriting...
[2025-02-11 17:12:45,871][02117] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-02-11 17:12:45,873][02117] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-02-11 17:12:45,874][02117] Environment doom_my_way_home already registered, overwriting...
[2025-02-11 17:12:45,876][02117] Environment doom_deadly_corridor already registered, overwriting...
[2025-02-11 17:12:45,878][02117] Environment doom_defend_the_center already registered, overwriting...
[2025-02-11 17:12:45,879][02117] Environment doom_defend_the_line already registered, overwriting...
[2025-02-11 17:12:45,881][02117] Environment doom_health_gathering already registered, overwriting...
[2025-02-11 17:12:45,882][02117] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-02-11 17:12:45,883][02117] Environment doom_battle already registered, overwriting...
[2025-02-11 17:12:45,884][02117] Environment doom_battle2 already registered, overwriting...
[2025-02-11 17:12:45,886][02117] Environment doom_duel_bots already registered, overwriting...
[2025-02-11 17:12:45,888][02117] Environment doom_deathmatch_bots already registered, overwriting...
[2025-02-11 17:12:45,889][02117] Environment doom_duel already registered, overwriting...
[2025-02-11 17:12:45,891][02117] Environment doom_deathmatch_full already registered, overwriting...
[2025-02-11 17:12:45,892][02117] Environment doom_benchmark already registered, overwriting...
[2025-02-11 17:12:45,894][02117] register_encoder_factory: <function make_vizdoom_encoder at 0x7da2c5ac6660>
[2025-02-11 17:12:45,902][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:12:45,904][02117] Overriding arg 'num_workers' with value 10 passed from command line
[2025-02-11 17:12:45,904][02117] Overriding arg 'num_envs_per_worker' with value 5 passed from command line
[2025-02-11 17:12:45,906][02117] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line
[2025-02-11 17:12:45,912][02117] Experiment dir /content/train_dir/default_experiment already exists!
[2025-02-11 17:12:45,913][02117] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-02-11 17:12:45,914][02117] Weights and Biases integration disabled
[2025-02-11 17:12:45,917][02117] Environment var CUDA_VISIBLE_DEVICES is 0
[2025-02-11 17:12:48,151][02117] cfg.num_envs_per_worker=5 must be a multiple of cfg.worker_num_splits=2 (for double-buffered sampling you need to use even number of envs per worker)
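This launch aborts on a config validation: with double-buffered sampling, each worker splits its envs into worker_num_splits alternating groups, so the env count per worker must divide evenly, and 5 % 2 != 0. The relaunch below omits --num_envs_per_worker, falling back to the saved value of 4. The failing check amounts to:

# Sketch of the validation that failed above.
num_envs_per_worker = 5   # passed on the command line
worker_num_splits = 2     # default for double-buffered sampling

if num_envs_per_worker % worker_num_splits != 0:
    raise ValueError(
        f"cfg.num_envs_per_worker={num_envs_per_worker} must be a multiple "
        f"of cfg.worker_num_splits={worker_num_splits}"
    )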
[2025-02-11 17:13:52,567][02117] Environment doom_basic already registered, overwriting...
[2025-02-11 17:13:52,569][02117] Environment doom_two_colors_easy already registered, overwriting...
[2025-02-11 17:13:52,570][02117] Environment doom_two_colors_hard already registered, overwriting...
[2025-02-11 17:13:52,571][02117] Environment doom_dm already registered, overwriting...
[2025-02-11 17:13:52,574][02117] Environment doom_dwango5 already registered, overwriting...
[2025-02-11 17:13:52,575][02117] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-02-11 17:13:52,576][02117] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-02-11 17:13:52,578][02117] Environment doom_my_way_home already registered, overwriting...
[2025-02-11 17:13:52,578][02117] Environment doom_deadly_corridor already registered, overwriting...
[2025-02-11 17:13:52,579][02117] Environment doom_defend_the_center already registered, overwriting...
[2025-02-11 17:13:52,580][02117] Environment doom_defend_the_line already registered, overwriting...
[2025-02-11 17:13:52,581][02117] Environment doom_health_gathering already registered, overwriting...
[2025-02-11 17:13:52,584][02117] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-02-11 17:13:52,585][02117] Environment doom_battle already registered, overwriting...
[2025-02-11 17:13:52,586][02117] Environment doom_battle2 already registered, overwriting...
[2025-02-11 17:13:52,588][02117] Environment doom_duel_bots already registered, overwriting...
[2025-02-11 17:13:52,589][02117] Environment doom_deathmatch_bots already registered, overwriting...
[2025-02-11 17:13:52,590][02117] Environment doom_duel already registered, overwriting...
[2025-02-11 17:13:52,591][02117] Environment doom_deathmatch_full already registered, overwriting...
[2025-02-11 17:13:52,593][02117] Environment doom_benchmark already registered, overwriting...
[2025-02-11 17:13:52,594][02117] register_encoder_factory: <function make_vizdoom_encoder at 0x7da2c5ac6660>
[2025-02-11 17:13:52,602][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:13:52,604][02117] Overriding arg 'num_workers' with value 10 passed from command line
[2025-02-11 17:13:52,604][02117] Overriding arg 'train_for_env_steps' with value 8000000 passed from command line
[2025-02-11 17:13:52,610][02117] Experiment dir /content/train_dir/default_experiment already exists!
[2025-02-11 17:13:52,611][02117] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-02-11 17:13:52,613][02117] Weights and Biases integration disabled
[2025-02-11 17:13:52,615][02117] Environment var CUDA_VISIBLE_DEVICES is 0
[2025-02-11 17:13:54,826][02117] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=10
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=8000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-02-11 17:13:54,827][02117] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-11 17:13:54,830][02117] Rollout worker 0 uses device cpu
[2025-02-11 17:13:54,831][02117] Rollout worker 1 uses device cpu
[2025-02-11 17:13:54,833][02117] Rollout worker 2 uses device cpu
[2025-02-11 17:13:54,835][02117] Rollout worker 3 uses device cpu
[2025-02-11 17:13:54,835][02117] Rollout worker 4 uses device cpu
[2025-02-11 17:13:54,836][02117] Rollout worker 5 uses device cpu
[2025-02-11 17:13:54,837][02117] Rollout worker 6 uses device cpu
[2025-02-11 17:13:54,840][02117] Rollout worker 7 uses device cpu
[2025-02-11 17:13:54,841][02117] Rollout worker 8 uses device cpu
[2025-02-11 17:13:54,842][02117] Rollout worker 9 uses device cpu
[2025-02-11 17:13:54,882][02117] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:13:54,884][02117] InferenceWorker_p0-w0: min num requests: 3
[2025-02-11 17:13:54,922][02117] Starting all processes...
[2025-02-11 17:13:54,924][02117] Starting process learner_proc0
[2025-02-11 17:13:54,978][02117] Starting all processes...
[2025-02-11 17:13:54,982][02117] Starting process inference_proc0-0
[2025-02-11 17:13:54,982][02117] Starting process rollout_proc0
[2025-02-11 17:13:54,983][02117] Starting process rollout_proc1
[2025-02-11 17:13:54,983][02117] Starting process rollout_proc2
[2025-02-11 17:13:54,984][02117] Starting process rollout_proc3
[2025-02-11 17:13:54,985][02117] Starting process rollout_proc4
[2025-02-11 17:13:54,988][02117] Starting process rollout_proc5
[2025-02-11 17:13:54,992][02117] Starting process rollout_proc6
[2025-02-11 17:13:54,994][02117] Starting process rollout_proc7
[2025-02-11 17:13:54,995][02117] Starting process rollout_proc8
[2025-02-11 17:13:54,995][02117] Starting process rollout_proc9
[2025-02-11 17:13:58,235][10024] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:13:58,235][10024] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-11 17:13:58,255][10024] Num visible devices: 1
[2025-02-11 17:13:58,266][10049] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,296][10024] Starting seed is not provided
[2025-02-11 17:13:58,297][10024] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:13:58,297][10024] Initializing actor-critic model on device cuda:0
[2025-02-11 17:13:58,297][10024] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:13:58,296][10045] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,298][10024] RunningMeanStd input shape: (1,)
[2025-02-11 17:13:58,303][10042] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,312][10024] ConvEncoder: input_channels=3
[2025-02-11 17:13:58,360][10048] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,363][10046] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,378][10039] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:13:58,378][10039] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-11 17:13:58,388][10041] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,405][10039] Num visible devices: 1
[2025-02-11 17:13:58,444][10047] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,448][10040] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,470][10024] Conv encoder output size: 512
[2025-02-11 17:13:58,470][10024] Policy head output size: 512
[2025-02-11 17:13:58,487][10024] Created Actor Critic model with architecture:
[2025-02-11 17:13:58,487][10024] ActorCriticSharedWeights(
(obs_normalizer): ObservationNormalizer(
(running_mean_std): RunningMeanStdDictInPlace(
(running_mean_std): ModuleDict(
(obs): RunningMeanStdInPlace()
)
)
)
(returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
(encoder): VizdoomEncoder(
(basic_encoder): ConvEncoder(
(enc): RecursiveScriptModule(
original_name=ConvEncoderImpl
(conv_head): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Conv2d)
(1): RecursiveScriptModule(original_name=ELU)
(2): RecursiveScriptModule(original_name=Conv2d)
(3): RecursiveScriptModule(original_name=ELU)
(4): RecursiveScriptModule(original_name=Conv2d)
(5): RecursiveScriptModule(original_name=ELU)
)
(mlp_layers): RecursiveScriptModule(
original_name=Sequential
(0): RecursiveScriptModule(original_name=Linear)
(1): RecursiveScriptModule(original_name=ELU)
)
)
)
)
(core): ModelCoreRNN(
(core): GRU(512, 512)
)
(decoder): MlpDecoder(
(mlp): Identity()
)
(critic_linear): Linear(in_features=512, out_features=1, bias=True)
(action_parameterization): ActionParameterizationDefault(
(distribution_linear): Linear(in_features=512, out_features=5, bias=True)
)
)
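For orientation, a shape-compatible PyTorch re-sketch of the module tree printed above. The conv channels and kernels are assumptions in the style of convnet_simple, since the dump shows module types but not shapes; this is not Sample Factory's actual class:

import torch
from torch import nn

class SharedActorCriticSketch(nn.Module):
    def __init__(self, num_actions: int = 5):
        super().__init__()
        # conv_head: three Conv2d+ELU stages (channels/kernels assumed).
        self.conv_head = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
            nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
        )
        with torch.no_grad():  # infer flattened size for a (3, 72, 128) input
            n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
        self.mlp_layers = nn.Sequential(nn.Linear(n_flat, 512), nn.ELU())
        self.core = nn.GRU(512, 512)                            # ModelCoreRNN
        self.critic_linear = nn.Linear(512, 1)                  # value head
        self.distribution_linear = nn.Linear(512, num_actions)  # 5 actions

    def forward(self, obs, rnn_state=None):
        x = self.mlp_layers(self.conv_head(obs).flatten(1))
        x, rnn_state = self.core(x.unsqueeze(0), rnn_state)     # seq len 1
        x = x.squeeze(0)
        return self.distribution_linear(x), self.critic_linear(x), rnn_state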
[2025-02-11 17:13:58,559][10044] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:58,589][10024] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-02-11 17:13:58,602][10043] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:13:59,525][10024] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-02-11 17:13:59,560][10024] Loading model from checkpoint
[2025-02-11 17:13:59,562][10024] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2025-02-11 17:13:59,562][10024] Initialized policy 0 weights for model version 978
[2025-02-11 17:13:59,564][10024] LearnerWorker_p0 finished initialization!
[2025-02-11 17:13:59,564][10024] Using GPUs [0] for process 0 (actually maps to GPUs [0])
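The checkpoint name encodes policy version and env steps (checkpoint_000000978_4005888.pth), and the two are tied by a fixed frames-per-train-step factor. Reading that factor as batch_size * env_frameskip is an interpretation, but it is exactly consistent with both checkpoints in this log:

# 1024 samples per train step * frameskip 4 = 4096 env frames per policy version.
frames_per_train_step = 1024 * 4

assert 978 * frames_per_train_step == 4005888    # checkpoint loaded here
assert 1955 * frames_per_train_step == 8007680   # final checkpoint of this run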
[2025-02-11 17:13:59,643][10039] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:13:59,644][10039] RunningMeanStd input shape: (1,)
[2025-02-11 17:13:59,656][10039] ConvEncoder: input_channels=3
[2025-02-11 17:13:59,760][10039] Conv encoder output size: 512
[2025-02-11 17:13:59,760][10039] Policy head output size: 512
[2025-02-11 17:13:59,796][02117] Inference worker 0-0 is ready!
[2025-02-11 17:13:59,797][02117] All inference workers are ready! Signal rollout workers to start!
[2025-02-11 17:13:59,831][10044] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,831][10047] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,850][10048] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,851][10040] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,852][10042] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,852][10045] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,853][10046] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,853][10049] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,853][10043] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:13:59,853][10041] Doom resolution: 160x120, resize resolution: (128, 72)
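Native 160x120 VizDoom frames are resized to the (128, 72) network input, matching the RunningMeanStd input shape (3, 72, 128) and pixel_format=CHW above. A sketch of that preprocessing; whether the resize crops or stretches the 4:3 frame is not shown in the log, and cv2.resize below simply stretches:

import numpy as np
import cv2  # opencv-python

frame = np.zeros((120, 160, 3), dtype=np.uint8)  # H x W x C as rendered
obs = cv2.resize(frame, (128, 72))               # cv2 dsize is (W, H)
obs = np.transpose(obs, (2, 0, 1))               # HWC -> CHW
assert obs.shape == (3, 72, 128)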
[2025-02-11 17:14:00,111][10044] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,126][10047] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,151][10041] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,151][10042] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,154][10048] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,372][10047] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,410][10041] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,411][10049] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,415][10040] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,418][10045] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,533][10042] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,544][10044] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,705][10045] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,707][10048] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,708][10049] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,727][10040] Decorrelating experience for 32 frames...
[2025-02-11 17:14:00,828][10046] Decorrelating experience for 0 frames...
[2025-02-11 17:14:00,833][10047] Decorrelating experience for 64 frames...
[2025-02-11 17:14:00,940][10042] Decorrelating experience for 64 frames...
[2025-02-11 17:14:00,995][10044] Decorrelating experience for 64 frames...
[2025-02-11 17:14:01,051][10048] Decorrelating experience for 64 frames...
[2025-02-11 17:14:01,100][10046] Decorrelating experience for 32 frames...
[2025-02-11 17:14:01,113][10040] Decorrelating experience for 64 frames...
[2025-02-11 17:14:01,149][10047] Decorrelating experience for 96 frames...
[2025-02-11 17:14:01,259][10042] Decorrelating experience for 96 frames...
[2025-02-11 17:14:01,279][10045] Decorrelating experience for 64 frames...
[2025-02-11 17:14:01,350][10044] Decorrelating experience for 96 frames...
[2025-02-11 17:14:01,386][10048] Decorrelating experience for 96 frames...
[2025-02-11 17:14:01,416][10041] Decorrelating experience for 64 frames...
[2025-02-11 17:14:01,421][10043] Decorrelating experience for 0 frames...
[2025-02-11 17:14:01,641][10045] Decorrelating experience for 96 frames...
[2025-02-11 17:14:01,665][10046] Decorrelating experience for 64 frames...
[2025-02-11 17:14:01,697][10049] Decorrelating experience for 64 frames...
[2025-02-11 17:14:01,732][10041] Decorrelating experience for 96 frames...
[2025-02-11 17:14:01,926][10043] Decorrelating experience for 32 frames...
[2025-02-11 17:14:02,048][10049] Decorrelating experience for 96 frames...
[2025-02-11 17:14:02,049][10046] Decorrelating experience for 96 frames...
[2025-02-11 17:14:02,266][10040] Decorrelating experience for 96 frames...
[2025-02-11 17:14:02,384][10043] Decorrelating experience for 64 frames...
[2025-02-11 17:14:02,580][10024] Signal inference workers to stop experience collection...
[2025-02-11 17:14:02,585][10039] InferenceWorker_p0-w0: stopping experience collection
[2025-02-11 17:14:02,615][02117] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-11 17:14:02,617][02117] Avg episode reward: [(0, '3.937')]
[2025-02-11 17:14:02,730][10043] Decorrelating experience for 96 frames...
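The decorrelation pass above warms each worker's envs up by 0, 32, 64, then 96 frames, i.e. multiples of rollout=32, so rollout boundaries across envs are staggered rather than synchronized (decorrelate_envs_on_one_worker=True in the config). The schedule, as a sketch only:

rollout = 32  # from the configuration dump above
for stage in range(4):
    print(f"Decorrelating experience for {stage * rollout} frames...")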
[2025-02-11 17:14:03,610][10024] Signal inference workers to resume experience collection...
[2025-02-11 17:14:03,610][10039] InferenceWorker_p0-w0: resuming experience collection
[2025-02-11 17:14:05,147][10039] Updated weights for policy 0, policy_version 988 (0.0090)
[2025-02-11 17:14:06,918][10039] Updated weights for policy 0, policy_version 998 (0.0012)
[2025-02-11 17:14:07,615][02117] Fps is (10 sec: 19660.9, 60 sec: 19660.9, 300 sec: 19660.9). Total num frames: 4104192. Throughput: 0: 2941.2. Samples: 14706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 3.0)
[2025-02-11 17:14:07,617][02117] Avg episode reward: [(0, '20.606')]
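These status lines can be cross-checked against the previous report: five seconds earlier the totals were 4,005,888 frames and 0 samples, and the deltas reproduce the printed rates up to the reporter's internal timer:

print((4104192 - 4005888) / 5.0)  # 19660.8 frames/s vs "10 sec: 19660.9"
print((14706 - 0) / 5.0)          # 2941.2 samples/s, matching Throughput: 0: 2941.2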
[2025-02-11 17:14:08,692][10039] Updated weights for policy 0, policy_version 1008 (0.0012)
[2025-02-11 17:14:10,550][10039] Updated weights for policy 0, policy_version 1018 (0.0012)
[2025-02-11 17:14:12,481][10039] Updated weights for policy 0, policy_version 1028 (0.0013)
[2025-02-11 17:14:12,615][02117] Fps is (10 sec: 20480.1, 60 sec: 20480.1, 300 sec: 20480.1). Total num frames: 4210688. Throughput: 0: 4806.0. Samples: 48060. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:14:12,618][02117] Avg episode reward: [(0, '23.828')]
[2025-02-11 17:14:14,269][10039] Updated weights for policy 0, policy_version 1038 (0.0012)
[2025-02-11 17:14:14,874][02117] Heartbeat connected on Batcher_0
[2025-02-11 17:14:14,878][02117] Heartbeat connected on LearnerWorker_p0
[2025-02-11 17:14:14,889][02117] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-11 17:14:14,891][02117] Heartbeat connected on RolloutWorker_w0
[2025-02-11 17:14:14,898][02117] Heartbeat connected on RolloutWorker_w2
[2025-02-11 17:14:14,900][02117] Heartbeat connected on RolloutWorker_w1
[2025-02-11 17:14:14,902][02117] Heartbeat connected on RolloutWorker_w3
[2025-02-11 17:14:14,906][02117] Heartbeat connected on RolloutWorker_w4
[2025-02-11 17:14:14,910][02117] Heartbeat connected on RolloutWorker_w5
[2025-02-11 17:14:14,914][02117] Heartbeat connected on RolloutWorker_w6
[2025-02-11 17:14:14,918][02117] Heartbeat connected on RolloutWorker_w7
[2025-02-11 17:14:14,920][02117] Heartbeat connected on RolloutWorker_w8
[2025-02-11 17:14:14,926][02117] Heartbeat connected on RolloutWorker_w9
[2025-02-11 17:14:16,073][10039] Updated weights for policy 0, policy_version 1048 (0.0012)
[2025-02-11 17:14:17,615][02117] Fps is (10 sec: 22118.4, 60 sec: 21299.2, 300 sec: 21299.2). Total num frames: 4325376. Throughput: 0: 4358.9. Samples: 65384. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:14:17,617][02117] Avg episode reward: [(0, '26.297')]
[2025-02-11 17:14:17,819][10039] Updated weights for policy 0, policy_version 1058 (0.0013)
[2025-02-11 17:14:19,571][10039] Updated weights for policy 0, policy_version 1068 (0.0013)
[2025-02-11 17:14:21,318][10039] Updated weights for policy 0, policy_version 1078 (0.0011)
[2025-02-11 17:14:22,616][02117] Fps is (10 sec: 22937.4, 60 sec: 21708.8, 300 sec: 21708.8). Total num frames: 4440064. Throughput: 0: 5012.3. Samples: 100246. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:14:22,619][02117] Avg episode reward: [(0, '27.202')]
[2025-02-11 17:14:23,256][10039] Updated weights for policy 0, policy_version 1088 (0.0012)
[2025-02-11 17:14:25,252][10039] Updated weights for policy 0, policy_version 1098 (0.0012)
[2025-02-11 17:14:27,028][10039] Updated weights for policy 0, policy_version 1108 (0.0012)
[2025-02-11 17:14:27,616][02117] Fps is (10 sec: 22527.9, 60 sec: 21790.7, 300 sec: 21790.7). Total num frames: 4550656. Throughput: 0: 5306.6. Samples: 132664. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:14:27,618][02117] Avg episode reward: [(0, '27.020')]
[2025-02-11 17:14:28,792][10039] Updated weights for policy 0, policy_version 1118 (0.0012)
[2025-02-11 17:14:30,539][10039] Updated weights for policy 0, policy_version 1128 (0.0012)
[2025-02-11 17:14:32,300][10039] Updated weights for policy 0, policy_version 1138 (0.0012)
[2025-02-11 17:14:32,616][02117] Fps is (10 sec: 22528.1, 60 sec: 21981.9, 300 sec: 21981.9). Total num frames: 4665344. Throughput: 0: 5004.8. Samples: 150144. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:14:32,618][02117] Avg episode reward: [(0, '26.149')]
[2025-02-11 17:14:34,045][10039] Updated weights for policy 0, policy_version 1148 (0.0012)
[2025-02-11 17:14:35,871][10039] Updated weights for policy 0, policy_version 1158 (0.0012)
[2025-02-11 17:14:37,615][02117] Fps is (10 sec: 22937.7, 60 sec: 22118.4, 300 sec: 22118.4). Total num frames: 4780032. Throughput: 0: 5280.3. Samples: 184810. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:14:37,617][02117] Avg episode reward: [(0, '24.301')]
[2025-02-11 17:14:37,737][10039] Updated weights for policy 0, policy_version 1168 (0.0012)
[2025-02-11 17:14:39,526][10039] Updated weights for policy 0, policy_version 1178 (0.0013)
[2025-02-11 17:14:41,280][10039] Updated weights for policy 0, policy_version 1188 (0.0012)
[2025-02-11 17:14:42,616][02117] Fps is (10 sec: 22937.5, 60 sec: 22220.8, 300 sec: 22220.8). Total num frames: 4894720. Throughput: 0: 5475.8. Samples: 219034. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:14:42,618][02117] Avg episode reward: [(0, '24.426')]
[2025-02-11 17:14:43,049][10039] Updated weights for policy 0, policy_version 1198 (0.0013)
[2025-02-11 17:14:44,806][10039] Updated weights for policy 0, policy_version 1208 (0.0012)
[2025-02-11 17:14:46,532][10039] Updated weights for policy 0, policy_version 1218 (0.0012)
[2025-02-11 17:14:47,616][02117] Fps is (10 sec: 23346.3, 60 sec: 22391.3, 300 sec: 22391.3). Total num frames: 5013504. Throughput: 0: 5256.5. Samples: 236544. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:14:47,618][02117] Avg episode reward: [(0, '23.581')]
[2025-02-11 17:14:48,261][10039] Updated weights for policy 0, policy_version 1228 (0.0011)
[2025-02-11 17:14:50,124][10039] Updated weights for policy 0, policy_version 1238 (0.0012)
[2025-02-11 17:14:51,920][10039] Updated weights for policy 0, policy_version 1248 (0.0012)
[2025-02-11 17:14:52,615][02117] Fps is (10 sec: 22937.7, 60 sec: 22364.2, 300 sec: 22364.2). Total num frames: 5124096. Throughput: 0: 5691.3. Samples: 270814. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:14:52,618][02117] Avg episode reward: [(0, '25.236')]
[2025-02-11 17:14:53,682][10039] Updated weights for policy 0, policy_version 1258 (0.0012)
[2025-02-11 17:14:55,440][10039] Updated weights for policy 0, policy_version 1268 (0.0012)
[2025-02-11 17:14:57,200][10039] Updated weights for policy 0, policy_version 1278 (0.0012)
[2025-02-11 17:14:57,616][02117] Fps is (10 sec: 22938.4, 60 sec: 22490.8, 300 sec: 22490.8). Total num frames: 5242880. Throughput: 0: 5728.6. Samples: 305846. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:14:57,618][02117] Avg episode reward: [(0, '26.685')]
[2025-02-11 17:14:58,939][10039] Updated weights for policy 0, policy_version 1288 (0.0013)
[2025-02-11 17:15:00,673][10039] Updated weights for policy 0, policy_version 1298 (0.0012)
[2025-02-11 17:15:02,480][10039] Updated weights for policy 0, policy_version 1308 (0.0012)
[2025-02-11 17:15:02,615][02117] Fps is (10 sec: 23347.3, 60 sec: 22528.0, 300 sec: 22528.0). Total num frames: 5357568. Throughput: 0: 5734.9. Samples: 323456. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:15:02,618][02117] Avg episode reward: [(0, '27.484')]
[2025-02-11 17:15:04,383][10039] Updated weights for policy 0, policy_version 1318 (0.0012)
[2025-02-11 17:15:06,130][10039] Updated weights for policy 0, policy_version 1328 (0.0012)
[2025-02-11 17:15:07,615][02117] Fps is (10 sec: 22937.7, 60 sec: 22801.1, 300 sec: 22559.5). Total num frames: 5472256. Throughput: 0: 5715.3. Samples: 357432. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:15:07,618][02117] Avg episode reward: [(0, '25.903')]
[2025-02-11 17:15:07,854][10039] Updated weights for policy 0, policy_version 1338 (0.0012)
[2025-02-11 17:15:09,607][10039] Updated weights for policy 0, policy_version 1348 (0.0012)
[2025-02-11 17:15:11,354][10039] Updated weights for policy 0, policy_version 1358 (0.0012)
[2025-02-11 17:15:12,615][02117] Fps is (10 sec: 23347.2, 60 sec: 23005.9, 300 sec: 22645.0). Total num frames: 5591040. Throughput: 0: 5779.2. Samples: 392730. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:15:12,617][02117] Avg episode reward: [(0, '24.576')]
[2025-02-11 17:15:13,078][10039] Updated weights for policy 0, policy_version 1368 (0.0012)
[2025-02-11 17:15:14,863][10039] Updated weights for policy 0, policy_version 1378 (0.0012)
[2025-02-11 17:15:16,759][10039] Updated weights for policy 0, policy_version 1388 (0.0013)
[2025-02-11 17:15:17,615][02117] Fps is (10 sec: 22937.6, 60 sec: 22937.6, 300 sec: 22609.9). Total num frames: 5701632. Throughput: 0: 5776.1. Samples: 410068. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:15:17,617][02117] Avg episode reward: [(0, '23.824')]
[2025-02-11 17:15:18,504][10039] Updated weights for policy 0, policy_version 1398 (0.0012)
[2025-02-11 17:15:20,247][10039] Updated weights for policy 0, policy_version 1408 (0.0012)
[2025-02-11 17:15:21,972][10039] Updated weights for policy 0, policy_version 1418 (0.0012)
[2025-02-11 17:15:22,615][02117] Fps is (10 sec: 22937.7, 60 sec: 23005.9, 300 sec: 22681.6). Total num frames: 5820416. Throughput: 0: 5770.8. Samples: 444496. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:15:22,617][02117] Avg episode reward: [(0, '26.195')]
[2025-02-11 17:15:23,703][10039] Updated weights for policy 0, policy_version 1428 (0.0012)
[2025-02-11 17:15:25,470][10039] Updated weights for policy 0, policy_version 1438 (0.0012)
[2025-02-11 17:15:27,224][10039] Updated weights for policy 0, policy_version 1448 (0.0012)
[2025-02-11 17:15:27,615][02117] Fps is (10 sec: 23756.8, 60 sec: 23142.4, 300 sec: 22744.9). Total num frames: 5939200. Throughput: 0: 5792.8. Samples: 479710. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:15:27,618][02117] Avg episode reward: [(0, '29.534')]
[2025-02-11 17:15:27,620][10024] Saving new best policy, reward=29.534!
[2025-02-11 17:15:29,122][10039] Updated weights for policy 0, policy_version 1458 (0.0012)
[2025-02-11 17:15:30,934][10039] Updated weights for policy 0, policy_version 1468 (0.0012)
[2025-02-11 17:15:32,616][02117] Fps is (10 sec: 22937.3, 60 sec: 23074.1, 300 sec: 22710.0). Total num frames: 6049792. Throughput: 0: 5770.2. Samples: 496200. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:15:32,618][02117] Avg episode reward: [(0, '27.920')]
[2025-02-11 17:15:32,701][10039] Updated weights for policy 0, policy_version 1478 (0.0013)
[2025-02-11 17:15:34,461][10039] Updated weights for policy 0, policy_version 1488 (0.0012)
[2025-02-11 17:15:36,201][10039] Updated weights for policy 0, policy_version 1498 (0.0012)
[2025-02-11 17:15:37,615][02117] Fps is (10 sec: 22937.7, 60 sec: 23142.4, 300 sec: 22765.2). Total num frames: 6168576. Throughput: 0: 5784.8. Samples: 531128. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:15:37,618][02117] Avg episode reward: [(0, '30.531')]
[2025-02-11 17:15:37,620][10024] Saving new best policy, reward=30.531!
[2025-02-11 17:15:37,915][10039] Updated weights for policy 0, policy_version 1508 (0.0012)
[2025-02-11 17:15:39,639][10039] Updated weights for policy 0, policy_version 1518 (0.0012)
[2025-02-11 17:15:41,445][10039] Updated weights for policy 0, policy_version 1528 (0.0013)
[2025-02-11 17:15:42,615][02117] Fps is (10 sec: 23347.4, 60 sec: 23142.4, 300 sec: 22773.8). Total num frames: 6283264. Throughput: 0: 5784.2. Samples: 566134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:15:42,617][02117] Avg episode reward: [(0, '27.030')]
[2025-02-11 17:15:43,300][10039] Updated weights for policy 0, policy_version 1538 (0.0013)
[2025-02-11 17:15:45,060][10039] Updated weights for policy 0, policy_version 1548 (0.0012)
[2025-02-11 17:15:46,788][10039] Updated weights for policy 0, policy_version 1558 (0.0012)
[2025-02-11 17:15:47,615][02117] Fps is (10 sec: 22937.5, 60 sec: 23074.3, 300 sec: 22781.6). Total num frames: 6397952. Throughput: 0: 5770.3. Samples: 583118. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:15:47,618][02117] Avg episode reward: [(0, '28.183')]
[2025-02-11 17:15:48,547][10039] Updated weights for policy 0, policy_version 1568 (0.0012)
[2025-02-11 17:15:50,273][10039] Updated weights for policy 0, policy_version 1578 (0.0012)
[2025-02-11 17:15:52,045][10039] Updated weights for policy 0, policy_version 1588 (0.0012)
[2025-02-11 17:15:52,616][02117] Fps is (10 sec: 23347.1, 60 sec: 23210.6, 300 sec: 22825.9). Total num frames: 6516736. Throughput: 0: 5798.9. Samples: 618384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:15:52,618][02117] Avg episode reward: [(0, '30.221')]
[2025-02-11 17:15:52,624][10024] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001591_6516736.pth...
[2025-02-11 17:15:52,696][10024] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000505_2068480.pth
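keep_checkpoints=2 explains the save/remove pairing above: writing checkpoint_000001591_6516736.pth retires the oldest remaining file, checkpoint_000000505_2068480.pth. A stand-alone sketch of such rotation (illustration, not Sample Factory's code):

import glob
import os

def rotate_checkpoints(ckpt_dir: str, keep: int = 2) -> None:
    # Zero-padded version numbers make lexicographic order chronological.
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, "checkpoint_*.pth")))
    for stale in ckpts[:-keep]:
        os.remove(stale)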
[2025-02-11 17:15:53,822][10039] Updated weights for policy 0, policy_version 1598 (0.0012)
[2025-02-11 17:15:55,747][10039] Updated weights for policy 0, policy_version 1608 (0.0012)
[2025-02-11 17:15:57,518][10039] Updated weights for policy 0, policy_version 1618 (0.0012)
[2025-02-11 17:15:57,616][02117] Fps is (10 sec: 22937.4, 60 sec: 23074.1, 300 sec: 22795.1). Total num frames: 6627328. Throughput: 0: 5764.0. Samples: 652112. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:15:57,618][02117] Avg episode reward: [(0, '22.512')]
[2025-02-11 17:15:59,272][10039] Updated weights for policy 0, policy_version 1628 (0.0012)
[2025-02-11 17:16:01,015][10039] Updated weights for policy 0, policy_version 1638 (0.0012)
[2025-02-11 17:16:02,615][02117] Fps is (10 sec: 22937.8, 60 sec: 23142.4, 300 sec: 22835.2). Total num frames: 6746112. Throughput: 0: 5767.6. Samples: 669612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:16:02,618][02117] Avg episode reward: [(0, '27.309')]
[2025-02-11 17:16:02,743][10039] Updated weights for policy 0, policy_version 1648 (0.0012)
[2025-02-11 17:16:04,494][10039] Updated weights for policy 0, policy_version 1658 (0.0012)
[2025-02-11 17:16:06,274][10039] Updated weights for policy 0, policy_version 1668 (0.0012)
[2025-02-11 17:16:07,615][02117] Fps is (10 sec: 23347.4, 60 sec: 23142.4, 300 sec: 22839.3). Total num frames: 6860800. Throughput: 0: 5784.0. Samples: 704774. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:16:07,617][02117] Avg episode reward: [(0, '28.578')]
[2025-02-11 17:16:08,151][10039] Updated weights for policy 0, policy_version 1678 (0.0012)
[2025-02-11 17:16:09,943][10039] Updated weights for policy 0, policy_version 1688 (0.0012)
[2025-02-11 17:16:11,682][10039] Updated weights for policy 0, policy_version 1698 (0.0012)
[2025-02-11 17:16:12,615][02117] Fps is (10 sec: 22937.5, 60 sec: 23074.1, 300 sec: 22843.1). Total num frames: 6975488. Throughput: 0: 5757.6. Samples: 738804. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:16:12,618][02117] Avg episode reward: [(0, '28.419')]
[2025-02-11 17:16:13,465][10039] Updated weights for policy 0, policy_version 1708 (0.0012)
[2025-02-11 17:16:15,232][10039] Updated weights for policy 0, policy_version 1718 (0.0012)
[2025-02-11 17:16:17,001][10039] Updated weights for policy 0, policy_version 1728 (0.0012)
[2025-02-11 17:16:17,615][02117] Fps is (10 sec: 22937.6, 60 sec: 23142.4, 300 sec: 22846.6). Total num frames: 7090176. Throughput: 0: 5775.7. Samples: 756106. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:16:17,618][02117] Avg episode reward: [(0, '29.284')]
[2025-02-11 17:16:18,762][10039] Updated weights for policy 0, policy_version 1738 (0.0012)
[2025-02-11 17:16:20,604][10039] Updated weights for policy 0, policy_version 1748 (0.0012)
[2025-02-11 17:16:22,480][10039] Updated weights for policy 0, policy_version 1758 (0.0012)
[2025-02-11 17:16:22,615][02117] Fps is (10 sec: 22528.0, 60 sec: 23005.9, 300 sec: 22820.6). Total num frames: 7200768. Throughput: 0: 5761.2. Samples: 790384. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:16:22,618][02117] Avg episode reward: [(0, '31.747')]
[2025-02-11 17:16:22,625][10024] Saving new best policy, reward=31.747!
[2025-02-11 17:16:24,252][10039] Updated weights for policy 0, policy_version 1768 (0.0012)
[2025-02-11 17:16:26,047][10039] Updated weights for policy 0, policy_version 1778 (0.0012)
[2025-02-11 17:16:27,615][02117] Fps is (10 sec: 22937.6, 60 sec: 23005.9, 300 sec: 22852.9). Total num frames: 7319552. Throughput: 0: 5743.5. Samples: 824592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-02-11 17:16:27,617][02117] Avg episode reward: [(0, '29.525')]
[2025-02-11 17:16:27,810][10039] Updated weights for policy 0, policy_version 1788 (0.0012)
[2025-02-11 17:16:29,585][10039] Updated weights for policy 0, policy_version 1798 (0.0012)
[2025-02-11 17:16:31,340][10039] Updated weights for policy 0, policy_version 1808 (0.0012)
[2025-02-11 17:16:32,615][02117] Fps is (10 sec: 23347.2, 60 sec: 23074.2, 300 sec: 22855.7). Total num frames: 7434240. Throughput: 0: 5751.0. Samples: 841914. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:16:32,618][02117] Avg episode reward: [(0, '26.626')]
[2025-02-11 17:16:33,131][10039] Updated weights for policy 0, policy_version 1818 (0.0012)
[2025-02-11 17:16:35,034][10039] Updated weights for policy 0, policy_version 1828 (0.0012)
[2025-02-11 17:16:36,806][10039] Updated weights for policy 0, policy_version 1838 (0.0012)
[2025-02-11 17:16:37,615][02117] Fps is (10 sec: 22528.1, 60 sec: 22937.6, 300 sec: 22831.9). Total num frames: 7544832. Throughput: 0: 5715.8. Samples: 875596. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:16:37,618][02117] Avg episode reward: [(0, '29.534')]
[2025-02-11 17:16:38,551][10039] Updated weights for policy 0, policy_version 1848 (0.0012)
[2025-02-11 17:16:40,329][10039] Updated weights for policy 0, policy_version 1858 (0.0012)
[2025-02-11 17:16:42,066][10039] Updated weights for policy 0, policy_version 1868 (0.0012)
[2025-02-11 17:16:42,615][02117] Fps is (10 sec: 22937.6, 60 sec: 23005.9, 300 sec: 22860.8). Total num frames: 7663616. Throughput: 0: 5745.3. Samples: 910648. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:16:42,618][02117] Avg episode reward: [(0, '26.627')]
[2025-02-11 17:16:43,819][10039] Updated weights for policy 0, policy_version 1878 (0.0012)
[2025-02-11 17:16:45,594][10039] Updated weights for policy 0, policy_version 1888 (0.0012)
[2025-02-11 17:16:47,470][10039] Updated weights for policy 0, policy_version 1898 (0.0013)
[2025-02-11 17:16:47,615][02117] Fps is (10 sec: 22937.4, 60 sec: 22937.6, 300 sec: 22838.3). Total num frames: 7774208. Throughput: 0: 5747.2. Samples: 928238. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:16:47,618][02117] Avg episode reward: [(0, '29.294')]
[2025-02-11 17:16:49,296][10039] Updated weights for policy 0, policy_version 1908 (0.0012)
[2025-02-11 17:16:51,056][10039] Updated weights for policy 0, policy_version 1918 (0.0012)
[2025-02-11 17:16:52,615][02117] Fps is (10 sec: 22528.0, 60 sec: 22869.4, 300 sec: 22841.2). Total num frames: 7888896. Throughput: 0: 5714.6. Samples: 961932. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:16:52,618][02117] Avg episode reward: [(0, '26.959')]
[2025-02-11 17:16:52,825][10039] Updated weights for policy 0, policy_version 1928 (0.0012)
[2025-02-11 17:16:54,591][10039] Updated weights for policy 0, policy_version 1938 (0.0012)
[2025-02-11 17:16:56,352][10039] Updated weights for policy 0, policy_version 1948 (0.0013)
[2025-02-11 17:16:57,593][10024] Stopping Batcher_0...
[2025-02-11 17:16:57,593][10024] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-02-11 17:16:57,593][02117] Component Batcher_0 stopped!
[2025-02-11 17:16:57,594][10024] Loop batcher_evt_loop terminating...
[2025-02-11 17:16:57,616][10039] Weights refcount: 2 0
[2025-02-11 17:16:57,618][10039] Stopping InferenceWorker_p0-w0...
[2025-02-11 17:16:57,619][10039] Loop inference_proc0-0_evt_loop terminating...
[2025-02-11 17:16:57,619][02117] Component InferenceWorker_p0-w0 stopped!
[2025-02-11 17:16:57,644][10041] Stopping RolloutWorker_w2...
[2025-02-11 17:16:57,645][10041] Loop rollout_proc2_evt_loop terminating...
[2025-02-11 17:16:57,644][02117] Component RolloutWorker_w2 stopped!
[2025-02-11 17:16:57,651][10043] Stopping RolloutWorker_w4...
[2025-02-11 17:16:57,651][10049] Stopping RolloutWorker_w8...
[2025-02-11 17:16:57,651][10043] Loop rollout_proc4_evt_loop terminating...
[2025-02-11 17:16:57,652][10049] Loop rollout_proc8_evt_loop terminating...
[2025-02-11 17:16:57,652][10045] Stopping RolloutWorker_w6...
[2025-02-11 17:16:57,653][10045] Loop rollout_proc6_evt_loop terminating...
[2025-02-11 17:16:57,651][02117] Component RolloutWorker_w4 stopped!
[2025-02-11 17:16:57,655][10048] Stopping RolloutWorker_w9...
[2025-02-11 17:16:57,653][02117] Component RolloutWorker_w8 stopped!
[2025-02-11 17:16:57,655][10048] Loop rollout_proc9_evt_loop terminating...
[2025-02-11 17:16:57,656][10042] Stopping RolloutWorker_w1...
[2025-02-11 17:16:57,656][10046] Stopping RolloutWorker_w5...
[2025-02-11 17:16:57,656][10042] Loop rollout_proc1_evt_loop terminating...
[2025-02-11 17:16:57,657][10046] Loop rollout_proc5_evt_loop terminating...
[2025-02-11 17:16:57,655][02117] Component RolloutWorker_w6 stopped!
[2025-02-11 17:16:57,657][10044] Stopping RolloutWorker_w3...
[2025-02-11 17:16:57,657][10040] Stopping RolloutWorker_w0...
[2025-02-11 17:16:57,657][10044] Loop rollout_proc3_evt_loop terminating...
[2025-02-11 17:16:57,657][10040] Loop rollout_proc0_evt_loop terminating...
[2025-02-11 17:16:57,657][02117] Component RolloutWorker_w9 stopped!
[2025-02-11 17:16:57,658][02117] Component RolloutWorker_w1 stopped!
[2025-02-11 17:16:57,660][10047] Stopping RolloutWorker_w7...
[2025-02-11 17:16:57,661][10047] Loop rollout_proc7_evt_loop terminating...
[2025-02-11 17:16:57,659][02117] Component RolloutWorker_w5 stopped!
[2025-02-11 17:16:57,662][02117] Component RolloutWorker_w0 stopped!
[2025-02-11 17:16:57,663][02117] Component RolloutWorker_w3 stopped!
[2025-02-11 17:16:57,664][02117] Component RolloutWorker_w7 stopped!
[2025-02-11 17:16:57,676][10024] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
[2025-02-11 17:16:57,689][10024] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-02-11 17:16:57,813][10024] Stopping LearnerWorker_p0...
[2025-02-11 17:16:57,813][10024] Loop learner_proc0_evt_loop terminating...
[2025-02-11 17:16:57,813][02117] Component LearnerWorker_p0 stopped!
[2025-02-11 17:16:57,817][02117] Waiting for process learner_proc0 to stop...
[2025-02-11 17:16:58,781][02117] Waiting for process inference_proc0-0 to join...
[2025-02-11 17:16:58,782][02117] Waiting for process rollout_proc0 to join...
[2025-02-11 17:16:58,784][02117] Waiting for process rollout_proc1 to join...
[2025-02-11 17:16:58,785][02117] Waiting for process rollout_proc2 to join...
[2025-02-11 17:16:58,786][02117] Waiting for process rollout_proc3 to join...
[2025-02-11 17:16:58,788][02117] Waiting for process rollout_proc4 to join...
[2025-02-11 17:16:58,790][02117] Waiting for process rollout_proc5 to join...
[2025-02-11 17:16:58,791][02117] Waiting for process rollout_proc6 to join...
[2025-02-11 17:16:58,792][02117] Waiting for process rollout_proc7 to join...
[2025-02-11 17:16:58,794][02117] Waiting for process rollout_proc8 to join...
[2025-02-11 17:16:58,795][02117] Waiting for process rollout_proc9 to join...
[2025-02-11 17:16:58,796][02117] Batcher 0 profile tree view:
batching: 15.6213, releasing_batches: 0.0232
[2025-02-11 17:16:58,798][02117] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 3.8488
update_model: 3.0511
  weight_update: 0.0012
one_step: 0.0028
  handle_policy_step: 162.9518
    deserialize: 7.0491, stack: 1.0774, obs_to_device_normalize: 40.5926, forward: 74.6922, send_messages: 12.8839
    prepare_outputs: 20.3975
      to_cpu: 13.1915
[2025-02-11 17:16:58,799][02117] Learner 0 profile tree view:
misc: 0.0044, prepare_batch: 9.7605
train: 23.2426
  epoch_init: 0.0047, minibatch_init: 0.0052, losses_postprocess: 0.2465, kl_divergence: 0.3402, after_optimizer: 0.5242
  calculate_losses: 9.1531
    losses_init: 0.0034, forward_head: 0.6992, bptt_initial: 5.4273, tail: 0.6030, advantages_returns: 0.1588, losses: 1.0276
    bptt: 1.0847
      bptt_forward_core: 1.0347
  update: 12.6363
    clip: 0.7410
[2025-02-11 17:16:58,800][02117] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1051, enqueue_policy_requests: 7.2853, env_step: 110.2883, overhead: 4.5671, complete_rollouts: 0.1765
save_policy_outputs: 6.6263
  split_output_tensors: 2.5375
[2025-02-11 17:16:58,802][02117] RolloutWorker_w9 profile tree view:
wait_for_trajectories: 0.1058, enqueue_policy_requests: 7.3334, env_step: 110.5633, overhead: 4.5568, complete_rollouts: 0.1765
save_policy_outputs: 6.6708
  split_output_tensors: 2.5414
[2025-02-11 17:16:58,804][02117] Loop Runner_EvtLoop terminating...
[2025-02-11 17:16:58,805][02117] Runner profile tree view:
main_loop: 183.8825
[2025-02-11 17:16:58,807][02117] Collected {0: 8007680}, FPS: 21762.8
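The closing figure is the per-session rate: this resumed run collected 8,007,680 - 4,005,888 = 4,001,792 frames in a 183.8825 s main loop:

print((8007680 - 4005888) / 183.8825)  # ~21762.8, matching the line above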
[2025-02-11 17:17:45,982][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:17:45,984][02117] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 17:17:45,985][02117] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 17:17:45,986][02117] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 17:17:45,987][02117] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:17:45,989][02117] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 17:17:45,989][02117] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:17:45,991][02117] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 17:17:45,992][02117] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-11 17:17:45,993][02117] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-11 17:17:45,994][02117] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 17:17:45,995][02117] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 17:17:45,996][02117] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 17:17:45,998][02117] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 17:17:45,999][02117] Using frameskip 1 and render_action_repeat=4 for evaluation
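
These overrides are what sample-factory's enjoy script layers on top of a saved training config. A plausible reconstruction of the invocation that produced them is sketched below; the sf_examples module path is an assumption based on sample-factory's usual layout, not something recorded in this log.

    # Hypothetical reconstruction of the evaluation run above.
    import subprocess

    subprocess.run([
        "python", "-m", "sf_examples.vizdoom.enjoy_vizdoom",  # assumed entry point
        "--env=doom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1",               # matches the 'num_workers' override above
        "--no_render", "--save_video",   # matches the added args above
        "--max_num_episodes=10",
    ], check=True)
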
[2025-02-11 17:17:46,033][02117] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:17:46,035][02117] RunningMeanStd input shape: (1,)
[2025-02-11 17:17:46,046][02117] ConvEncoder: input_channels=3
[2025-02-11 17:17:46,083][02117] Conv encoder output size: 512
[2025-02-11 17:17:46,084][02117] Policy head output size: 512
[2025-02-11 17:17:46,105][02117] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-02-11 17:17:46,523][02117] Num frames 100...
[2025-02-11 17:17:46,649][02117] Num frames 200...
[2025-02-11 17:17:46,775][02117] Num frames 300...
[2025-02-11 17:17:46,900][02117] Num frames 400...
[2025-02-11 17:17:47,025][02117] Num frames 500...
[2025-02-11 17:17:47,149][02117] Num frames 600...
[2025-02-11 17:17:47,274][02117] Num frames 700...
[2025-02-11 17:17:47,398][02117] Num frames 800...
[2025-02-11 17:17:47,525][02117] Num frames 900...
[2025-02-11 17:17:47,651][02117] Num frames 1000...
[2025-02-11 17:17:47,779][02117] Num frames 1100...
[2025-02-11 17:17:47,905][02117] Num frames 1200...
[2025-02-11 17:17:48,033][02117] Num frames 1300...
[2025-02-11 17:17:48,161][02117] Num frames 1400...
[2025-02-11 17:17:48,289][02117] Num frames 1500...
[2025-02-11 17:17:48,417][02117] Num frames 1600...
[2025-02-11 17:17:48,545][02117] Num frames 1700...
[2025-02-11 17:17:48,670][02117] Num frames 1800...
[2025-02-11 17:17:48,799][02117] Num frames 1900...
[2025-02-11 17:17:48,928][02117] Num frames 2000...
[2025-02-11 17:17:49,058][02117] Num frames 2100...
[2025-02-11 17:17:49,110][02117] Avg episode rewards: #0: 60.999, true rewards: #0: 21.000
[2025-02-11 17:17:49,112][02117] Avg episode reward: 60.999, avg true_objective: 21.000
[2025-02-11 17:17:49,237][02117] Num frames 2200...
[2025-02-11 17:17:49,363][02117] Num frames 2300...
[2025-02-11 17:17:49,488][02117] Num frames 2400...
[2025-02-11 17:17:49,615][02117] Num frames 2500...
[2025-02-11 17:17:49,740][02117] Num frames 2600...
[2025-02-11 17:17:49,868][02117] Num frames 2700...
[2025-02-11 17:17:49,993][02117] Num frames 2800...
[2025-02-11 17:17:50,119][02117] Num frames 2900...
[2025-02-11 17:17:50,249][02117] Num frames 3000...
[2025-02-11 17:17:50,377][02117] Num frames 3100...
[2025-02-11 17:17:50,504][02117] Num frames 3200...
[2025-02-11 17:17:50,629][02117] Num frames 3300...
[2025-02-11 17:17:50,755][02117] Num frames 3400...
[2025-02-11 17:17:50,886][02117] Num frames 3500...
[2025-02-11 17:17:51,014][02117] Num frames 3600...
[2025-02-11 17:17:51,142][02117] Num frames 3700...
[2025-02-11 17:17:51,272][02117] Num frames 3800...
[2025-02-11 17:17:51,401][02117] Num frames 3900...
[2025-02-11 17:17:51,527][02117] Num frames 4000...
[2025-02-11 17:17:51,655][02117] Num frames 4100...
[2025-02-11 17:17:51,785][02117] Num frames 4200...
[2025-02-11 17:17:51,837][02117] Avg episode rewards: #0: 62.499, true rewards: #0: 21.000
[2025-02-11 17:17:51,839][02117] Avg episode reward: 62.499, avg true_objective: 21.000
[2025-02-11 17:17:51,966][02117] Num frames 4300...
[2025-02-11 17:17:52,094][02117] Num frames 4400...
[2025-02-11 17:17:52,221][02117] Num frames 4500...
[2025-02-11 17:17:52,348][02117] Num frames 4600...
[2025-02-11 17:17:52,474][02117] Num frames 4700...
[2025-02-11 17:17:52,600][02117] Num frames 4800...
[2025-02-11 17:17:52,728][02117] Num frames 4900...
[2025-02-11 17:17:52,855][02117] Num frames 5000...
[2025-02-11 17:17:52,980][02117] Num frames 5100...
[2025-02-11 17:17:53,109][02117] Num frames 5200...
[2025-02-11 17:17:53,236][02117] Num frames 5300...
[2025-02-11 17:17:53,317][02117] Avg episode rewards: #0: 50.399, true rewards: #0: 17.733
[2025-02-11 17:17:53,319][02117] Avg episode reward: 50.399, avg true_objective: 17.733
[2025-02-11 17:17:53,418][02117] Num frames 5400...
[2025-02-11 17:17:53,541][02117] Num frames 5500...
[2025-02-11 17:17:53,669][02117] Num frames 5600...
[2025-02-11 17:17:53,796][02117] Num frames 5700...
[2025-02-11 17:17:53,924][02117] Num frames 5800...
[2025-02-11 17:17:54,050][02117] Num frames 5900...
[2025-02-11 17:17:54,182][02117] Num frames 6000...
[2025-02-11 17:17:54,309][02117] Num frames 6100...
[2025-02-11 17:17:54,433][02117] Num frames 6200...
[2025-02-11 17:17:54,562][02117] Num frames 6300...
[2025-02-11 17:17:54,690][02117] Num frames 6400...
[2025-02-11 17:17:54,817][02117] Num frames 6500...
[2025-02-11 17:17:54,946][02117] Num frames 6600...
[2025-02-11 17:17:55,083][02117] Num frames 6700...
[2025-02-11 17:17:55,198][02117] Avg episode rewards: #0: 47.117, true rewards: #0: 16.868
[2025-02-11 17:17:55,200][02117] Avg episode reward: 47.117, avg true_objective: 16.868
[2025-02-11 17:17:55,270][02117] Num frames 6800...
[2025-02-11 17:17:55,404][02117] Num frames 6900...
[2025-02-11 17:17:55,531][02117] Num frames 7000...
[2025-02-11 17:17:55,662][02117] Num frames 7100...
[2025-02-11 17:17:55,801][02117] Num frames 7200...
[2025-02-11 17:17:55,928][02117] Num frames 7300...
[2025-02-11 17:17:56,061][02117] Num frames 7400...
[2025-02-11 17:17:56,192][02117] Num frames 7500...
[2025-02-11 17:17:56,324][02117] Num frames 7600...
[2025-02-11 17:17:56,454][02117] Num frames 7700...
[2025-02-11 17:17:56,583][02117] Num frames 7800...
[2025-02-11 17:17:56,718][02117] Num frames 7900...
[2025-02-11 17:17:56,846][02117] Num frames 8000...
[2025-02-11 17:17:56,976][02117] Num frames 8100...
[2025-02-11 17:17:57,104][02117] Num frames 8200...
[2025-02-11 17:17:57,233][02117] Num frames 8300...
[2025-02-11 17:17:57,361][02117] Num frames 8400...
[2025-02-11 17:17:57,488][02117] Num frames 8500...
[2025-02-11 17:17:57,616][02117] Num frames 8600...
[2025-02-11 17:17:57,755][02117] Avg episode rewards: #0: 48.531, true rewards: #0: 17.332
[2025-02-11 17:17:57,757][02117] Avg episode reward: 48.531, avg true_objective: 17.332
[2025-02-11 17:17:57,800][02117] Num frames 8700...
[2025-02-11 17:17:57,924][02117] Num frames 8800...
[2025-02-11 17:17:58,054][02117] Num frames 8900...
[2025-02-11 17:17:58,179][02117] Num frames 9000...
[2025-02-11 17:17:58,306][02117] Num frames 9100...
[2025-02-11 17:17:58,431][02117] Num frames 9200...
[2025-02-11 17:17:58,557][02117] Num frames 9300...
[2025-02-11 17:17:58,684][02117] Num frames 9400...
[2025-02-11 17:17:58,807][02117] Num frames 9500...
[2025-02-11 17:17:58,980][02117] Avg episode rewards: #0: 43.823, true rewards: #0: 15.990
[2025-02-11 17:17:58,981][02117] Avg episode reward: 43.823, avg true_objective: 15.990
[2025-02-11 17:17:58,990][02117] Num frames 9600...
[2025-02-11 17:17:59,121][02117] Num frames 9700...
[2025-02-11 17:17:59,247][02117] Num frames 9800...
[2025-02-11 17:17:59,374][02117] Num frames 9900...
[2025-02-11 17:17:59,499][02117] Num frames 10000...
[2025-02-11 17:17:59,626][02117] Num frames 10100...
[2025-02-11 17:17:59,756][02117] Num frames 10200...
[2025-02-11 17:17:59,889][02117] Num frames 10300...
[2025-02-11 17:18:00,020][02117] Num frames 10400...
[2025-02-11 17:18:00,146][02117] Num frames 10500...
[2025-02-11 17:18:00,270][02117] Avg episode rewards: #0: 40.362, true rewards: #0: 15.077
[2025-02-11 17:18:00,271][02117] Avg episode reward: 40.362, avg true_objective: 15.077
[2025-02-11 17:18:00,328][02117] Num frames 10600...
[2025-02-11 17:18:00,451][02117] Num frames 10700...
[2025-02-11 17:18:00,578][02117] Num frames 10800...
[2025-02-11 17:18:00,704][02117] Num frames 10900...
[2025-02-11 17:18:00,829][02117] Num frames 11000...
[2025-02-11 17:18:00,958][02117] Num frames 11100...
[2025-02-11 17:18:01,086][02117] Num frames 11200...
[2025-02-11 17:18:01,213][02117] Num frames 11300...
[2025-02-11 17:18:01,337][02117] Num frames 11400...
[2025-02-11 17:18:01,464][02117] Num frames 11500...
[2025-02-11 17:18:01,591][02117] Num frames 11600...
[2025-02-11 17:18:01,719][02117] Num frames 11700...
[2025-02-11 17:18:01,846][02117] Num frames 11800...
[2025-02-11 17:18:01,996][02117] Avg episode rewards: #0: 39.345, true rewards: #0: 14.845
[2025-02-11 17:18:01,998][02117] Avg episode reward: 39.345, avg true_objective: 14.845
[2025-02-11 17:18:02,032][02117] Num frames 11900...
[2025-02-11 17:18:02,157][02117] Num frames 12000...
[2025-02-11 17:18:02,283][02117] Num frames 12100...
[2025-02-11 17:18:02,408][02117] Num frames 12200...
[2025-02-11 17:18:02,534][02117] Num frames 12300...
[2025-02-11 17:18:02,660][02117] Num frames 12400...
[2025-02-11 17:18:02,786][02117] Num frames 12500...
[2025-02-11 17:18:02,916][02117] Num frames 12600...
[2025-02-11 17:18:03,041][02117] Num frames 12700...
[2025-02-11 17:18:03,169][02117] Num frames 12800...
[2025-02-11 17:18:03,296][02117] Num frames 12900...
[2025-02-11 17:18:03,423][02117] Num frames 13000...
[2025-02-11 17:18:03,549][02117] Num frames 13100...
[2025-02-11 17:18:03,677][02117] Num frames 13200...
[2025-02-11 17:18:03,805][02117] Num frames 13300...
[2025-02-11 17:18:03,931][02117] Num frames 13400...
[2025-02-11 17:18:04,056][02117] Num frames 13500...
[2025-02-11 17:18:04,183][02117] Num frames 13600...
[2025-02-11 17:18:04,310][02117] Num frames 13700...
[2025-02-11 17:18:04,441][02117] Num frames 13800...
[2025-02-11 17:18:04,570][02117] Num frames 13900...
[2025-02-11 17:18:04,721][02117] Avg episode rewards: #0: 40.973, true rewards: #0: 15.529
[2025-02-11 17:18:04,723][02117] Avg episode reward: 40.973, avg true_objective: 15.529
[2025-02-11 17:18:04,756][02117] Num frames 14000...
[2025-02-11 17:18:04,881][02117] Num frames 14100...
[2025-02-11 17:18:05,006][02117] Num frames 14200...
[2025-02-11 17:18:05,134][02117] Num frames 14300...
[2025-02-11 17:18:05,259][02117] Num frames 14400...
[2025-02-11 17:18:05,422][02117] Avg episode rewards: #0: 37.688, true rewards: #0: 14.488
[2025-02-11 17:18:05,423][02117] Avg episode reward: 37.688, avg true_objective: 14.488
[2025-02-11 17:18:39,819][02117] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-11 17:19:15,119][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:19:15,120][02117] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 17:19:15,122][02117] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 17:19:15,123][02117] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 17:19:15,124][02117] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:19:15,125][02117] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 17:19:15,126][02117] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-11 17:19:15,128][02117] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 17:19:15,129][02117] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-11 17:19:15,130][02117] Adding new argument 'hf_repository'='mjm54/doom_health_gathering_supreme' that is not in the saved config file!
[2025-02-11 17:19:15,131][02117] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 17:19:15,133][02117] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 17:19:15,134][02117] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 17:19:15,136][02117] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 17:19:15,137][02117] Using frameskip 1 and render_action_repeat=4 for evaluation
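
This second evaluation pass differs from the first only in the hub-related overrides (push_to_hub=True, hf_repository, and max_num_frames=100000). Under the same entry-point assumption as before, it likely looked like:

    import subprocess

    subprocess.run([
        "python", "-m", "sf_examples.vizdoom.enjoy_vizdoom",  # assumed entry point, as above
        "--env=doom_health_gathering_supreme",
        "--train_dir=/content/train_dir",
        "--experiment=default_experiment",
        "--num_workers=1", "--no_render", "--save_video",
        "--max_num_episodes=10", "--max_num_frames=100000",
        "--push_to_hub",                                      # upload after evaluation
        "--hf_repository=mjm54/doom_health_gathering_supreme",
    ], check=True)
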
[2025-02-11 17:19:15,161][02117] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:19:15,164][02117] RunningMeanStd input shape: (1,)
[2025-02-11 17:19:15,175][02117] ConvEncoder: input_channels=3
[2025-02-11 17:19:15,211][02117] Conv encoder output size: 512
[2025-02-11 17:19:15,212][02117] Policy head output size: 512
[2025-02-11 17:19:15,232][02117] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-02-11 17:19:15,672][02117] Num frames 100...
[2025-02-11 17:19:15,795][02117] Num frames 200...
[2025-02-11 17:19:15,929][02117] Num frames 300...
[2025-02-11 17:19:16,063][02117] Num frames 400...
[2025-02-11 17:19:16,197][02117] Num frames 500...
[2025-02-11 17:19:16,333][02117] Num frames 600...
[2025-02-11 17:19:16,468][02117] Num frames 700...
[2025-02-11 17:19:16,604][02117] Num frames 800...
[2025-02-11 17:19:16,747][02117] Avg episode rewards: #0: 20.640, true rewards: #0: 8.640
[2025-02-11 17:19:16,748][02117] Avg episode reward: 20.640, avg true_objective: 8.640
[2025-02-11 17:19:16,801][02117] Num frames 900...
[2025-02-11 17:19:16,934][02117] Num frames 1000...
[2025-02-11 17:19:17,077][02117] Num frames 1100...
[2025-02-11 17:19:17,212][02117] Num frames 1200...
[2025-02-11 17:19:17,350][02117] Num frames 1300...
[2025-02-11 17:19:17,487][02117] Num frames 1400...
[2025-02-11 17:19:17,622][02117] Num frames 1500...
[2025-02-11 17:19:17,764][02117] Num frames 1600...
[2025-02-11 17:19:17,898][02117] Num frames 1700...
[2025-02-11 17:19:18,033][02117] Num frames 1800...
[2025-02-11 17:19:18,169][02117] Num frames 1900...
[2025-02-11 17:19:18,307][02117] Num frames 2000...
[2025-02-11 17:19:18,444][02117] Num frames 2100...
[2025-02-11 17:19:18,515][02117] Avg episode rewards: #0: 26.560, true rewards: #0: 10.560
[2025-02-11 17:19:18,516][02117] Avg episode reward: 26.560, avg true_objective: 10.560
[2025-02-11 17:19:18,631][02117] Num frames 2200...
[2025-02-11 17:19:18,758][02117] Num frames 2300...
[2025-02-11 17:19:18,885][02117] Num frames 2400...
[2025-02-11 17:19:19,014][02117] Num frames 2500...
[2025-02-11 17:19:19,140][02117] Num frames 2600...
[2025-02-11 17:19:19,272][02117] Num frames 2700...
[2025-02-11 17:19:19,398][02117] Num frames 2800...
[2025-02-11 17:19:19,523][02117] Num frames 2900...
[2025-02-11 17:19:19,648][02117] Num frames 3000...
[2025-02-11 17:19:19,771][02117] Num frames 3100...
[2025-02-11 17:19:19,896][02117] Num frames 3200...
[2025-02-11 17:19:20,070][02117] Avg episode rewards: #0: 26.654, true rewards: #0: 10.987
[2025-02-11 17:19:20,071][02117] Avg episode reward: 26.654, avg true_objective: 10.987
[2025-02-11 17:19:20,078][02117] Num frames 3300...
[2025-02-11 17:19:20,202][02117] Num frames 3400...
[2025-02-11 17:19:20,326][02117] Num frames 3500...
[2025-02-11 17:19:20,467][02117] Avg episode rewards: #0: 20.663, true rewards: #0: 8.912
[2025-02-11 17:19:20,468][02117] Avg episode reward: 20.663, avg true_objective: 8.912
[2025-02-11 17:19:20,515][02117] Num frames 3600...
[2025-02-11 17:19:20,647][02117] Num frames 3700...
[2025-02-11 17:19:20,786][02117] Num frames 3800...
[2025-02-11 17:19:20,870][02117] Avg episode rewards: #0: 17.442, true rewards: #0: 7.642
[2025-02-11 17:19:20,872][02117] Avg episode reward: 17.442, avg true_objective: 7.642
[2025-02-11 17:19:20,971][02117] Num frames 3900...
[2025-02-11 17:19:21,095][02117] Num frames 4000...
[2025-02-11 17:19:21,220][02117] Num frames 4100...
[2025-02-11 17:19:21,344][02117] Num frames 4200...
[2025-02-11 17:19:21,472][02117] Num frames 4300...
[2025-02-11 17:19:21,597][02117] Num frames 4400...
[2025-02-11 17:19:21,731][02117] Num frames 4500...
[2025-02-11 17:19:21,881][02117] Avg episode rewards: #0: 17.118, true rewards: #0: 7.618
[2025-02-11 17:19:21,883][02117] Avg episode reward: 17.118, avg true_objective: 7.618
[2025-02-11 17:19:21,923][02117] Num frames 4600...
[2025-02-11 17:19:22,056][02117] Num frames 4700...
[2025-02-11 17:19:22,187][02117] Num frames 4800...
[2025-02-11 17:19:22,322][02117] Num frames 4900...
[2025-02-11 17:19:22,455][02117] Num frames 5000...
[2025-02-11 17:19:22,584][02117] Num frames 5100...
[2025-02-11 17:19:22,710][02117] Num frames 5200...
[2025-02-11 17:19:22,834][02117] Num frames 5300...
[2025-02-11 17:19:22,957][02117] Num frames 5400...
[2025-02-11 17:19:23,085][02117] Num frames 5500...
[2025-02-11 17:19:23,219][02117] Avg episode rewards: #0: 17.519, true rewards: #0: 7.947
[2025-02-11 17:19:23,221][02117] Avg episode reward: 17.519, avg true_objective: 7.947
[2025-02-11 17:19:23,269][02117] Num frames 5600...
[2025-02-11 17:19:23,397][02117] Num frames 5700...
[2025-02-11 17:19:23,525][02117] Num frames 5800...
[2025-02-11 17:19:23,653][02117] Num frames 5900...
[2025-02-11 17:19:23,777][02117] Num frames 6000...
[2025-02-11 17:19:23,906][02117] Num frames 6100...
[2025-02-11 17:19:24,034][02117] Num frames 6200...
[2025-02-11 17:19:24,161][02117] Num frames 6300...
[2025-02-11 17:19:24,286][02117] Num frames 6400...
[2025-02-11 17:19:24,414][02117] Num frames 6500...
[2025-02-11 17:19:24,538][02117] Num frames 6600...
[2025-02-11 17:19:24,664][02117] Num frames 6700...
[2025-02-11 17:19:24,792][02117] Num frames 6800...
[2025-02-11 17:19:24,885][02117] Avg episode rewards: #0: 19.413, true rewards: #0: 8.537
[2025-02-11 17:19:24,887][02117] Avg episode reward: 19.413, avg true_objective: 8.537
[2025-02-11 17:19:24,974][02117] Num frames 6900...
[2025-02-11 17:19:25,099][02117] Num frames 7000...
[2025-02-11 17:19:25,224][02117] Num frames 7100...
[2025-02-11 17:19:25,348][02117] Num frames 7200...
[2025-02-11 17:19:25,472][02117] Num frames 7300...
[2025-02-11 17:19:25,601][02117] Num frames 7400...
[2025-02-11 17:19:25,727][02117] Num frames 7500...
[2025-02-11 17:19:25,853][02117] Num frames 7600...
[2025-02-11 17:19:25,980][02117] Num frames 7700...
[2025-02-11 17:19:26,108][02117] Num frames 7800...
[2025-02-11 17:19:26,236][02117] Num frames 7900...
[2025-02-11 17:19:26,361][02117] Num frames 8000...
[2025-02-11 17:19:26,483][02117] Num frames 8100...
[2025-02-11 17:19:26,621][02117] Num frames 8200...
[2025-02-11 17:19:26,757][02117] Num frames 8300...
[2025-02-11 17:19:26,893][02117] Num frames 8400...
[2025-02-11 17:19:27,029][02117] Num frames 8500...
[2025-02-11 17:19:27,165][02117] Num frames 8600...
[2025-02-11 17:19:27,298][02117] Num frames 8700...
[2025-02-11 17:19:27,432][02117] Num frames 8800...
[2025-02-11 17:19:27,563][02117] Num frames 8900...
[2025-02-11 17:19:27,658][02117] Avg episode rewards: #0: 23.700, true rewards: #0: 9.922
[2025-02-11 17:19:27,660][02117] Avg episode reward: 23.700, avg true_objective: 9.922
[2025-02-11 17:19:27,751][02117] Num frames 9000...
[2025-02-11 17:19:27,884][02117] Num frames 9100...
[2025-02-11 17:19:28,017][02117] Num frames 9200...
[2025-02-11 17:19:28,145][02117] Num frames 9300...
[2025-02-11 17:19:28,280][02117] Num frames 9400...
[2025-02-11 17:19:28,415][02117] Num frames 9500...
[2025-02-11 17:19:28,550][02117] Num frames 9600...
[2025-02-11 17:19:28,688][02117] Num frames 9700...
[2025-02-11 17:19:28,823][02117] Num frames 9800...
[2025-02-11 17:19:28,959][02117] Num frames 9900...
[2025-02-11 17:19:29,046][02117] Avg episode rewards: #0: 23.822, true rewards: #0: 9.922
[2025-02-11 17:19:29,047][02117] Avg episode reward: 23.822, avg true_objective: 9.922
[2025-02-11 17:19:52,814][02117] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-02-11 17:20:02,963][02117] The model has been pushed to https://huggingface.co/mjm54/doom_health_gathering_supreme
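
Once pushed, the model can be pulled back down with sample-factory's hub helper. This is the documented module for it, though the short flags here are quoted from the Hugging Face integration docs and worth double-checking against the installed version.

    import subprocess

    # Download the pushed checkpoint into a local train_dir.
    subprocess.run([
        "python", "-m", "sample_factory.huggingface.load_from_hub",
        "-r", "mjm54/doom_health_gathering_supreme",
        "-d", "train_dir",
    ], check=True)
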
[2025-02-11 17:22:00,187][02117] Environment doom_basic already registered, overwriting...
[2025-02-11 17:22:00,191][02117] Environment doom_two_colors_easy already registered, overwriting...
[2025-02-11 17:22:00,191][02117] Environment doom_two_colors_hard already registered, overwriting...
[2025-02-11 17:22:00,194][02117] Environment doom_dm already registered, overwriting...
[2025-02-11 17:22:00,195][02117] Environment doom_dwango5 already registered, overwriting...
[2025-02-11 17:22:00,197][02117] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-02-11 17:22:00,198][02117] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-02-11 17:22:00,199][02117] Environment doom_my_way_home already registered, overwriting...
[2025-02-11 17:22:00,200][02117] Environment doom_deadly_corridor already registered, overwriting...
[2025-02-11 17:22:00,201][02117] Environment doom_defend_the_center already registered, overwriting...
[2025-02-11 17:22:00,202][02117] Environment doom_defend_the_line already registered, overwriting...
[2025-02-11 17:22:00,206][02117] Environment doom_health_gathering already registered, overwriting...
[2025-02-11 17:22:00,207][02117] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-02-11 17:22:00,209][02117] Environment doom_battle already registered, overwriting...
[2025-02-11 17:22:00,210][02117] Environment doom_battle2 already registered, overwriting...
[2025-02-11 17:22:00,212][02117] Environment doom_duel_bots already registered, overwriting...
[2025-02-11 17:22:00,213][02117] Environment doom_deathmatch_bots already registered, overwriting...
[2025-02-11 17:22:00,215][02117] Environment doom_duel already registered, overwriting...
[2025-02-11 17:22:00,216][02117] Environment doom_deathmatch_full already registered, overwriting...
[2025-02-11 17:22:00,218][02117] Environment doom_benchmark already registered, overwriting...
[2025-02-11 17:22:00,219][02117] register_encoder_factory: <function make_vizdoom_encoder at 0x7da2c5ac6660>
[2025-02-11 17:22:00,227][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:22:00,229][02117] Overriding arg 'train_for_env_steps' with value 16000000 passed from command line
[2025-02-11 17:22:00,234][02117] Experiment dir /content/train_dir/default_experiment already exists!
[2025-02-11 17:22:00,235][02117] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-02-11 17:22:00,237][02117] Weights and Biases integration disabled
[2025-02-11 17:22:00,240][02117] Environment var CUDA_VISIBLE_DEVICES is 0
[2025-02-11 17:22:02,431][02117] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=10
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=16000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
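
Note that command_line and cli_args above are the values preserved in config.json from the original 8-worker run; the resumed session itself was launched with train_for_env_steps=16000000 (per the override logged above) and, judging by the worker lines below, num_workers=10. A hedged reconstruction of that resume command:

    import subprocess

    # Assumed resume invocation; restart_behavior=resume reuses the existing experiment dir.
    subprocess.run([
        "python", "-m", "sf_examples.vizdoom.train_vizdoom",  # assumed entry point
        "--env=doom_health_gathering_supreme",
        "--num_workers=10",
        "--num_envs_per_worker=4",
        "--train_for_env_steps=16000000",
    ], check=True)
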
[2025-02-11 17:22:02,433][02117] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-02-11 17:22:02,436][02117] Rollout worker 0 uses device cpu
[2025-02-11 17:22:02,436][02117] Rollout worker 1 uses device cpu
[2025-02-11 17:22:02,437][02117] Rollout worker 2 uses device cpu
[2025-02-11 17:22:02,438][02117] Rollout worker 3 uses device cpu
[2025-02-11 17:22:02,440][02117] Rollout worker 4 uses device cpu
[2025-02-11 17:22:02,441][02117] Rollout worker 5 uses device cpu
[2025-02-11 17:22:02,443][02117] Rollout worker 6 uses device cpu
[2025-02-11 17:22:02,444][02117] Rollout worker 7 uses device cpu
[2025-02-11 17:22:02,445][02117] Rollout worker 8 uses device cpu
[2025-02-11 17:22:02,446][02117] Rollout worker 9 uses device cpu
[2025-02-11 17:22:02,496][02117] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:22:02,497][02117] InferenceWorker_p0-w0: min num requests: 3
[2025-02-11 17:22:02,536][02117] Starting all processes...
[2025-02-11 17:22:02,537][02117] Starting process learner_proc0
[2025-02-11 17:22:02,590][02117] Starting all processes...
[2025-02-11 17:22:02,594][02117] Starting process inference_proc0-0
[2025-02-11 17:22:02,595][02117] Starting process rollout_proc0
[2025-02-11 17:22:02,595][02117] Starting process rollout_proc1
[2025-02-11 17:22:02,595][02117] Starting process rollout_proc2
[2025-02-11 17:22:02,598][02117] Starting process rollout_proc3
[2025-02-11 17:22:02,601][02117] Starting process rollout_proc4
[2025-02-11 17:22:02,602][02117] Starting process rollout_proc5
[2025-02-11 17:22:02,602][02117] Starting process rollout_proc6
[2025-02-11 17:22:02,603][02117] Starting process rollout_proc7
[2025-02-11 17:22:02,604][02117] Starting process rollout_proc8
[2025-02-11 17:22:02,608][02117] Starting process rollout_proc9
[2025-02-11 17:22:05,761][12745] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,787][12741] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,792][12725] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:22:05,792][12725] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-02-11 17:22:05,812][12725] Num visible devices: 1
[2025-02-11 17:22:05,831][12744] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,855][12746] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,860][12743] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,873][12725] Starting seed is not provided
[2025-02-11 17:22:05,873][12725] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:22:05,873][12725] Initializing actor-critic model on device cuda:0
[2025-02-11 17:22:05,874][12725] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:22:05,875][12725] RunningMeanStd input shape: (1,)
[2025-02-11 17:22:05,894][12725] ConvEncoder: input_channels=3
[2025-02-11 17:22:05,896][12750] Worker 9 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,900][12742] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,906][12748] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,916][12749] Worker 8 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,923][12740] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:22:05,924][12740] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-02-11 17:22:05,924][12747] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-02-11 17:22:05,945][12740] Num visible devices: 1
[2025-02-11 17:22:06,045][12725] Conv encoder output size: 512
[2025-02-11 17:22:06,046][12725] Policy head output size: 512
[2025-02-11 17:22:06,062][12725] Created Actor Critic model with architecture:
[2025-02-11 17:22:06,062][12725] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
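
For readers who want the shape of this network without sample-factory in the loop, here is a minimal PyTorch sketch of the same topology. The conv filter spec is an assumption (sample-factory's convnet_simple is commonly three Conv2d layers of 32/64/128 channels with kernels 8/4/3 and strides 4/2/2); everything else (ELU activations, 512-unit encoder MLP, GRU(512, 512), 1-unit value head, 5 action logits) is read directly off the dump above. The RunningMeanStd observation/returns normalizers are omitted.

    import torch
    from torch import nn

    class DoomActorCriticSketch(nn.Module):
        """Rough stand-in for the ActorCriticSharedWeights printed above."""

        def __init__(self, num_actions: int = 5, rnn_size: int = 512):
            super().__init__()
            # Assumed convnet_simple spec: (channels, kernel, stride) = (32,8,4), (64,4,2), (128,3,2).
            self.conv_head = nn.Sequential(
                nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ELU(),
                nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ELU(),
                nn.Conv2d(64, 128, kernel_size=3, stride=2), nn.ELU(),
            )
            with torch.no_grad():  # infer the flattened size for a 3x72x128 observation
                n_flat = self.conv_head(torch.zeros(1, 3, 72, 128)).numel()
            self.mlp_layers = nn.Sequential(nn.Linear(n_flat, rnn_size), nn.ELU())
            self.core = nn.GRU(rnn_size, rnn_size)                       # GRU(512, 512)
            self.critic_linear = nn.Linear(rnn_size, 1)                  # value head
            self.distribution_linear = nn.Linear(rnn_size, num_actions)  # action logits

        def forward(self, obs, rnn_state=None):
            x = self.mlp_layers(self.conv_head(obs).flatten(1))
            x, rnn_state = self.core(x.unsqueeze(0), rnn_state)          # seq_len = 1
            return self.distribution_linear(x[0]), self.critic_linear(x[0]), rnn_state
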
[2025-02-11 17:22:06,161][12725] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-02-11 17:22:07,069][12725] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth...
[2025-02-11 17:22:07,100][12725] Loading model from checkpoint
[2025-02-11 17:22:07,101][12725] Loaded experiment state at self.train_step=1955, self.env_steps=8007680
[2025-02-11 17:22:07,101][12725] Initialized policy 0 weights for model version 1955
[2025-02-11 17:22:07,103][12725] LearnerWorker_p0 finished initialization!
[2025-02-11 17:22:07,103][12725] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-02-11 17:22:07,180][12740] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:22:07,181][12740] RunningMeanStd input shape: (1,)
[2025-02-11 17:22:07,193][12740] ConvEncoder: input_channels=3
[2025-02-11 17:22:07,297][12740] Conv encoder output size: 512
[2025-02-11 17:22:07,297][12740] Policy head output size: 512
[2025-02-11 17:22:07,332][02117] Inference worker 0-0 is ready!
[2025-02-11 17:22:07,334][02117] All inference workers are ready! Signal rollout workers to start!
[2025-02-11 17:22:07,368][12745] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,369][12743] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,388][12750] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,389][12749] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,389][12741] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,390][12748] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,390][12746] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,390][12747] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,390][12744] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,391][12742] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-02-11 17:22:07,665][12743] Decorrelating experience for 0 frames...
[2025-02-11 17:22:07,665][12745] Decorrelating experience for 0 frames...
[2025-02-11 17:22:07,678][12749] Decorrelating experience for 0 frames...
[2025-02-11 17:22:07,687][12748] Decorrelating experience for 0 frames...
[2025-02-11 17:22:07,688][12741] Decorrelating experience for 0 frames...
[2025-02-11 17:22:07,931][12745] Decorrelating experience for 32 frames...
[2025-02-11 17:22:07,934][12743] Decorrelating experience for 32 frames...
[2025-02-11 17:22:07,950][12749] Decorrelating experience for 32 frames...
[2025-02-11 17:22:07,960][12748] Decorrelating experience for 32 frames...
[2025-02-11 17:22:07,963][12744] Decorrelating experience for 0 frames...
[2025-02-11 17:22:07,964][12741] Decorrelating experience for 32 frames...
[2025-02-11 17:22:08,232][12744] Decorrelating experience for 32 frames...
[2025-02-11 17:22:08,234][12747] Decorrelating experience for 0 frames...
[2025-02-11 17:22:08,262][12750] Decorrelating experience for 0 frames...
[2025-02-11 17:22:08,306][12745] Decorrelating experience for 64 frames...
[2025-02-11 17:22:08,326][12748] Decorrelating experience for 64 frames...
[2025-02-11 17:22:08,502][12746] Decorrelating experience for 0 frames...
[2025-02-11 17:22:08,522][12741] Decorrelating experience for 64 frames...
[2025-02-11 17:22:08,561][12744] Decorrelating experience for 64 frames...
[2025-02-11 17:22:08,580][12750] Decorrelating experience for 32 frames...
[2025-02-11 17:22:08,628][12748] Decorrelating experience for 96 frames...
[2025-02-11 17:22:08,845][12746] Decorrelating experience for 32 frames...
[2025-02-11 17:22:08,846][12742] Decorrelating experience for 0 frames...
[2025-02-11 17:22:08,846][12749] Decorrelating experience for 64 frames...
[2025-02-11 17:22:08,956][12741] Decorrelating experience for 96 frames...
[2025-02-11 17:22:08,969][12747] Decorrelating experience for 32 frames...
[2025-02-11 17:22:09,118][12742] Decorrelating experience for 32 frames...
[2025-02-11 17:22:09,180][12750] Decorrelating experience for 64 frames...
[2025-02-11 17:22:09,281][12744] Decorrelating experience for 96 frames...
[2025-02-11 17:22:09,444][12745] Decorrelating experience for 96 frames...
[2025-02-11 17:22:09,450][12747] Decorrelating experience for 64 frames...
[2025-02-11 17:22:09,557][12742] Decorrelating experience for 64 frames...
[2025-02-11 17:22:09,589][12750] Decorrelating experience for 96 frames...
[2025-02-11 17:22:09,676][12749] Decorrelating experience for 96 frames...
[2025-02-11 17:22:09,776][12743] Decorrelating experience for 64 frames...
[2025-02-11 17:22:09,825][12747] Decorrelating experience for 96 frames...
[2025-02-11 17:22:10,042][12746] Decorrelating experience for 64 frames...
[2025-02-11 17:22:10,160][12743] Decorrelating experience for 96 frames...
[2025-02-11 17:22:10,240][02117] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 8007680. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-02-11 17:22:10,245][02117] Avg episode reward: [(0, '4.653')]
[2025-02-11 17:22:10,256][12725] Signal inference workers to stop experience collection...
[2025-02-11 17:22:10,261][12740] InferenceWorker_p0-w0: stopping experience collection
[2025-02-11 17:22:10,367][12742] Decorrelating experience for 96 frames...
[2025-02-11 17:22:10,403][12746] Decorrelating experience for 96 frames...
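
The staggered 0/32/64/96 frame counts above follow from the config: each worker runs num_envs_per_worker=4 environments with rollout=32, and each successive env on a worker appears to be warmed up for one extra rollout so trajectories don't line up. A sketch of that schedule as read from this log (not taken from sample-factory source):

    # Decorrelation offsets implied by the log: env i on a worker warms up for i * rollout frames.
    num_envs_per_worker = 4
    rollout = 32
    print([env_idx * rollout for env_idx in range(num_envs_per_worker)])  # [0, 32, 64, 96]
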
[2025-02-11 17:22:11,293][12725] Signal inference workers to resume experience collection...
[2025-02-11 17:22:11,294][12740] InferenceWorker_p0-w0: resuming experience collection
[2025-02-11 17:22:12,982][12740] Updated weights for policy 0, policy_version 1965 (0.0089)
[2025-02-11 17:22:14,747][12740] Updated weights for policy 0, policy_version 1975 (0.0013)
[2025-02-11 17:22:15,240][02117] Fps is (10 sec: 18022.6, 60 sec: 18022.6, 300 sec: 18022.6). Total num frames: 8097792. Throughput: 0: 2612.4. Samples: 13062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:22:15,242][02117] Avg episode reward: [(0, '24.351')]
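
This first throughput report is internally consistent with the resume point: it lands about 5 s after the 17:22:10 start and shows 8,097,792 total frames, i.e. 90,112 new frames on top of the 8,007,680 carried over from the checkpoint.

    # First Fps report, checked against the surrounding lines.
    frames_new = 8_097_792 - 8_007_680   # 90112 frames since the resume
    elapsed = 5.0                        # 17:22:15 report minus 17:22:10 start
    print(frames_new / elapsed)          # ~18022.4, in line with "10 sec: 18022.6"
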
[2025-02-11 17:22:16,544][12740] Updated weights for policy 0, policy_version 1985 (0.0012)
[2025-02-11 17:22:18,328][12740] Updated weights for policy 0, policy_version 1995 (0.0012)
[2025-02-11 17:22:20,120][12740] Updated weights for policy 0, policy_version 2005 (0.0013)
[2025-02-11 17:22:20,240][02117] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20480.0). Total num frames: 8212480. Throughput: 0: 4771.8. Samples: 47718. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:22:20,243][02117] Avg episode reward: [(0, '34.533')]
[2025-02-11 17:22:20,264][12725] Saving new best policy, reward=34.533!
[2025-02-11 17:22:21,982][12740] Updated weights for policy 0, policy_version 2015 (0.0013)
[2025-02-11 17:22:22,488][02117] Heartbeat connected on Batcher_0
[2025-02-11 17:22:22,492][02117] Heartbeat connected on LearnerWorker_p0
[2025-02-11 17:22:22,502][02117] Heartbeat connected on InferenceWorker_p0-w0
[2025-02-11 17:22:22,505][02117] Heartbeat connected on RolloutWorker_w0
[2025-02-11 17:22:22,510][02117] Heartbeat connected on RolloutWorker_w2
[2025-02-11 17:22:22,513][02117] Heartbeat connected on RolloutWorker_w1
[2025-02-11 17:22:22,515][02117] Heartbeat connected on RolloutWorker_w3
[2025-02-11 17:22:22,518][02117] Heartbeat connected on RolloutWorker_w4
[2025-02-11 17:22:22,524][02117] Heartbeat connected on RolloutWorker_w5
[2025-02-11 17:22:22,526][02117] Heartbeat connected on RolloutWorker_w6
[2025-02-11 17:22:22,531][02117] Heartbeat connected on RolloutWorker_w7
[2025-02-11 17:22:22,534][02117] Heartbeat connected on RolloutWorker_w8
[2025-02-11 17:22:22,536][02117] Heartbeat connected on RolloutWorker_w9
[2025-02-11 17:22:23,888][12740] Updated weights for policy 0, policy_version 2025 (0.0012)
[2025-02-11 17:22:25,240][02117] Fps is (10 sec: 22527.4, 60 sec: 21025.8, 300 sec: 21025.8). Total num frames: 8323072. Throughput: 0: 4280.7. Samples: 64212. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:22:25,243][02117] Avg episode reward: [(0, '25.262')]
[2025-02-11 17:22:25,646][12740] Updated weights for policy 0, policy_version 2035 (0.0012)
[2025-02-11 17:22:27,367][12740] Updated weights for policy 0, policy_version 2045 (0.0012)
[2025-02-11 17:22:29,259][12740] Updated weights for policy 0, policy_version 2055 (0.0012)
[2025-02-11 17:22:30,240][02117] Fps is (10 sec: 22528.1, 60 sec: 21504.0, 300 sec: 21504.0). Total num frames: 8437760. Throughput: 0: 4926.0. Samples: 98520. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:22:30,242][02117] Avg episode reward: [(0, '30.272')]
[2025-02-11 17:22:31,060][12740] Updated weights for policy 0, policy_version 2065 (0.0012)
[2025-02-11 17:22:32,816][12740] Updated weights for policy 0, policy_version 2075 (0.0012)
[2025-02-11 17:22:34,670][12740] Updated weights for policy 0, policy_version 2085 (0.0012)
[2025-02-11 17:22:35,240][02117] Fps is (10 sec: 22528.5, 60 sec: 21626.9, 300 sec: 21626.9). Total num frames: 8548352. Throughput: 0: 5287.4. Samples: 132184. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:22:35,243][02117] Avg episode reward: [(0, '32.111')]
[2025-02-11 17:22:36,522][12740] Updated weights for policy 0, policy_version 2095 (0.0012)
[2025-02-11 17:22:38,243][12740] Updated weights for policy 0, policy_version 2105 (0.0013)
[2025-02-11 17:22:40,005][12740] Updated weights for policy 0, policy_version 2115 (0.0012)
[2025-02-11 17:22:40,240][02117] Fps is (10 sec: 22937.6, 60 sec: 21981.9, 300 sec: 21981.9). Total num frames: 8667136. Throughput: 0: 4968.4. Samples: 149052. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:22:40,242][02117] Avg episode reward: [(0, '28.859')]
[2025-02-11 17:22:41,777][12740] Updated weights for policy 0, policy_version 2125 (0.0012)
[2025-02-11 17:22:43,523][12740] Updated weights for policy 0, policy_version 2135 (0.0012)
[2025-02-11 17:22:45,240][02117] Fps is (10 sec: 23347.3, 60 sec: 22118.4, 300 sec: 22118.4). Total num frames: 8781824. Throughput: 0: 5256.2. Samples: 183966. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:22:45,242][02117] Avg episode reward: [(0, '29.565')]
[2025-02-11 17:22:45,300][12740] Updated weights for policy 0, policy_version 2145 (0.0012)
[2025-02-11 17:22:47,067][12740] Updated weights for policy 0, policy_version 2155 (0.0012)
[2025-02-11 17:22:48,968][12740] Updated weights for policy 0, policy_version 2165 (0.0012)
[2025-02-11 17:22:50,240][02117] Fps is (10 sec: 22937.6, 60 sec: 22220.8, 300 sec: 22220.8). Total num frames: 8896512. Throughput: 0: 5445.4. Samples: 217818. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:22:50,242][02117] Avg episode reward: [(0, '26.417')]
[2025-02-11 17:22:50,734][12740] Updated weights for policy 0, policy_version 2175 (0.0012)
[2025-02-11 17:22:52,483][12740] Updated weights for policy 0, policy_version 2185 (0.0012)
[2025-02-11 17:22:54,226][12740] Updated weights for policy 0, policy_version 2195 (0.0012)
[2025-02-11 17:22:55,240][02117] Fps is (10 sec: 22937.6, 60 sec: 22300.5, 300 sec: 22300.5). Total num frames: 9011200. Throughput: 0: 5230.8. Samples: 235386. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:22:55,243][02117] Avg episode reward: [(0, '29.656')]
[2025-02-11 17:22:55,982][12740] Updated weights for policy 0, policy_version 2205 (0.0012)
[2025-02-11 17:22:57,759][12740] Updated weights for policy 0, policy_version 2215 (0.0012)
[2025-02-11 17:22:59,524][12740] Updated weights for policy 0, policy_version 2225 (0.0013)
[2025-02-11 17:23:00,240][02117] Fps is (10 sec: 22937.5, 60 sec: 22364.1, 300 sec: 22364.1). Total num frames: 9125888. Throughput: 0: 5722.5. Samples: 270576. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:23:00,243][02117] Avg episode reward: [(0, '29.065')]
[2025-02-11 17:23:01,394][12740] Updated weights for policy 0, policy_version 2235 (0.0012)
[2025-02-11 17:23:03,194][12740] Updated weights for policy 0, policy_version 2245 (0.0012)
[2025-02-11 17:23:04,946][12740] Updated weights for policy 0, policy_version 2255 (0.0012)
[2025-02-11 17:23:05,240][02117] Fps is (10 sec: 22937.6, 60 sec: 22416.3, 300 sec: 22416.3). Total num frames: 9240576. Throughput: 0: 5701.6. Samples: 304290. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:23:05,242][02117] Avg episode reward: [(0, '26.903')]
[2025-02-11 17:23:06,717][12740] Updated weights for policy 0, policy_version 2265 (0.0012)
[2025-02-11 17:23:08,490][12740] Updated weights for policy 0, policy_version 2275 (0.0012)
[2025-02-11 17:23:10,236][12740] Updated weights for policy 0, policy_version 2285 (0.0012)
[2025-02-11 17:23:10,240][02117] Fps is (10 sec: 23347.2, 60 sec: 22528.0, 300 sec: 22528.0). Total num frames: 9359360. Throughput: 0: 5720.6. Samples: 321640. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:23:10,242][02117] Avg episode reward: [(0, '30.326')]
[2025-02-11 17:23:11,999][12740] Updated weights for policy 0, policy_version 2295 (0.0012)
[2025-02-11 17:23:13,780][12740] Updated weights for policy 0, policy_version 2305 (0.0012)
[2025-02-11 17:23:15,240][02117] Fps is (10 sec: 22937.5, 60 sec: 22869.3, 300 sec: 22496.5). Total num frames: 9469952. Throughput: 0: 5737.8. Samples: 356722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:23:15,243][02117] Avg episode reward: [(0, '30.082')]
[2025-02-11 17:23:15,651][12740] Updated weights for policy 0, policy_version 2315 (0.0012)
[2025-02-11 17:23:17,395][12740] Updated weights for policy 0, policy_version 2325 (0.0013)
[2025-02-11 17:23:19,180][12740] Updated weights for policy 0, policy_version 2335 (0.0012)
[2025-02-11 17:23:20,240][02117] Fps is (10 sec: 22937.7, 60 sec: 22937.6, 300 sec: 22586.5). Total num frames: 9588736. Throughput: 0: 5748.8. Samples: 390878. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:23:20,242][02117] Avg episode reward: [(0, '27.262')]
[2025-02-11 17:23:20,925][12740] Updated weights for policy 0, policy_version 2345 (0.0013)
[2025-02-11 17:23:22,635][12740] Updated weights for policy 0, policy_version 2355 (0.0012)
[2025-02-11 17:23:24,389][12740] Updated weights for policy 0, policy_version 2365 (0.0013)
[2025-02-11 17:23:25,240][02117] Fps is (10 sec: 23347.3, 60 sec: 23006.0, 300 sec: 22609.9). Total num frames: 9703424. Throughput: 0: 5768.0. Samples: 408614. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:23:25,243][02117] Avg episode reward: [(0, '26.114')]
[2025-02-11 17:23:26,214][12740] Updated weights for policy 0, policy_version 2375 (0.0012)
[2025-02-11 17:23:28,133][12740] Updated weights for policy 0, policy_version 2385 (0.0012)
[2025-02-11 17:23:29,929][12740] Updated weights for policy 0, policy_version 2395 (0.0012)
[2025-02-11 17:23:30,240][02117] Fps is (10 sec: 22527.9, 60 sec: 22937.6, 300 sec: 22579.2). Total num frames: 9814016. Throughput: 0: 5744.3. Samples: 442460. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:23:30,243][02117] Avg episode reward: [(0, '26.835')]
[2025-02-11 17:23:31,721][12740] Updated weights for policy 0, policy_version 2405 (0.0012)
[2025-02-11 17:23:33,459][12740] Updated weights for policy 0, policy_version 2415 (0.0012)
[2025-02-11 17:23:35,215][12740] Updated weights for policy 0, policy_version 2425 (0.0012)
[2025-02-11 17:23:35,240][02117] Fps is (10 sec: 22937.6, 60 sec: 23074.1, 300 sec: 22648.5). Total num frames: 9932800. Throughput: 0: 5761.3. Samples: 477078. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:23:35,242][02117] Avg episode reward: [(0, '29.854')]
[2025-02-11 17:23:36,977][12740] Updated weights for policy 0, policy_version 2435 (0.0012)
[2025-02-11 17:23:38,755][12740] Updated weights for policy 0, policy_version 2445 (0.0012)
[2025-02-11 17:23:40,240][02117] Fps is (10 sec: 23347.2, 60 sec: 23005.9, 300 sec: 22664.5). Total num frames: 10047488. Throughput: 0: 5759.3. Samples: 494554. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:23:40,243][02117] Avg episode reward: [(0, '30.698')]
[2025-02-11 17:23:40,640][12740] Updated weights for policy 0, policy_version 2455 (0.0013)
[2025-02-11 17:23:42,455][12740] Updated weights for policy 0, policy_version 2465 (0.0012)
[2025-02-11 17:23:44,200][12740] Updated weights for policy 0, policy_version 2475 (0.0012)
[2025-02-11 17:23:45,240][02117] Fps is (10 sec: 22937.7, 60 sec: 23005.9, 300 sec: 22678.9). Total num frames: 10162176. Throughput: 0: 5730.2. Samples: 528434. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:23:45,243][02117] Avg episode reward: [(0, '28.478')]
[2025-02-11 17:23:45,948][12740] Updated weights for policy 0, policy_version 2485 (0.0012)
[2025-02-11 17:23:47,702][12740] Updated weights for policy 0, policy_version 2495 (0.0012)
[2025-02-11 17:23:49,472][12740] Updated weights for policy 0, policy_version 2505 (0.0013)
[2025-02-11 17:23:50,240][02117] Fps is (10 sec: 22937.8, 60 sec: 23005.9, 300 sec: 22691.9). Total num frames: 10276864. Throughput: 0: 5759.1. Samples: 563450. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:23:50,243][02117] Avg episode reward: [(0, '28.311')]
[2025-02-11 17:23:51,217][12740] Updated weights for policy 0, policy_version 2515 (0.0012)
[2025-02-11 17:23:53,041][12740] Updated weights for policy 0, policy_version 2525 (0.0012)
[2025-02-11 17:23:54,944][12740] Updated weights for policy 0, policy_version 2535 (0.0012)
[2025-02-11 17:23:55,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22937.6, 300 sec: 22664.5). Total num frames: 10387456. Throughput: 0: 5757.4. Samples: 580722. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:23:55,243][02117] Avg episode reward: [(0, '26.855')]
[2025-02-11 17:23:56,680][12740] Updated weights for policy 0, policy_version 2545 (0.0012)
[2025-02-11 17:23:58,455][12740] Updated weights for policy 0, policy_version 2555 (0.0013)
[2025-02-11 17:24:00,183][12740] Updated weights for policy 0, policy_version 2565 (0.0012)
[2025-02-11 17:24:00,240][02117] Fps is (10 sec: 22937.4, 60 sec: 23005.9, 300 sec: 22714.2). Total num frames: 10506240. Throughput: 0: 5729.5. Samples: 614550. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:24:00,242][02117] Avg episode reward: [(0, '30.195')]
[2025-02-11 17:24:00,249][12725] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002565_10506240.pth...
[2025-02-11 17:24:00,326][12725] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001591_6516736.pth
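
The checkpoint filenames encode both counters: checkpoint_000002565_10506240.pth is train_step=2565 and env_steps=10,506,240, and with batch_size=1024 and env_frameskip=4 the two stay locked together (each train step consumes 1024 samples, i.e. 4096 skipped-frame env steps). keep_checkpoints=2 is also why the older 6,516,736-step checkpoint is deleted here. A quick check of that relationship:

    # env_steps per train step = batch_size * env_frameskip = 1024 * 4 = 4096 (inferred from this log).
    batch_size, frameskip = 1024, 4
    for train_step in (978, 1955, 2565):
        print(train_step, train_step * batch_size * frameskip)
    # -> 4005888, 8007680, 10506240: exactly the env-step counts in the checkpoint names in this log.
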
[2025-02-11 17:24:01,964][12740] Updated weights for policy 0, policy_version 2575 (0.0012)
[2025-02-11 17:24:03,709][12740] Updated weights for policy 0, policy_version 2585 (0.0011)
[2025-02-11 17:24:05,240][02117] Fps is (10 sec: 23347.3, 60 sec: 23005.9, 300 sec: 22723.9). Total num frames: 10620928. Throughput: 0: 5746.7. Samples: 649480. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:24:05,243][02117] Avg episode reward: [(0, '29.347')]
[2025-02-11 17:24:05,505][12740] Updated weights for policy 0, policy_version 2595 (0.0012)
[2025-02-11 17:24:07,417][12740] Updated weights for policy 0, policy_version 2605 (0.0012)
[2025-02-11 17:24:09,201][12740] Updated weights for policy 0, policy_version 2615 (0.0012)
[2025-02-11 17:24:10,241][02117] Fps is (10 sec: 22527.5, 60 sec: 22869.2, 300 sec: 22698.6). Total num frames: 10731520. Throughput: 0: 5714.2. Samples: 665754. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:24:10,243][02117] Avg episode reward: [(0, '28.283')]
[2025-02-11 17:24:10,977][12740] Updated weights for policy 0, policy_version 2625 (0.0012)
[2025-02-11 17:24:12,737][12740] Updated weights for policy 0, policy_version 2635 (0.0012)
[2025-02-11 17:24:14,481][12740] Updated weights for policy 0, policy_version 2645 (0.0012)
[2025-02-11 17:24:15,240][02117] Fps is (10 sec: 22528.1, 60 sec: 22937.6, 300 sec: 22708.2). Total num frames: 10846208. Throughput: 0: 5735.3. Samples: 700548. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:24:15,242][02117] Avg episode reward: [(0, '29.653')]
[2025-02-11 17:24:16,258][12740] Updated weights for policy 0, policy_version 2655 (0.0013)
[2025-02-11 17:24:18,108][12740] Updated weights for policy 0, policy_version 2665 (0.0012)
[2025-02-11 17:24:20,041][12740] Updated weights for policy 0, policy_version 2675 (0.0012)
[2025-02-11 17:24:20,240][02117] Fps is (10 sec: 22938.3, 60 sec: 22869.3, 300 sec: 22717.1). Total num frames: 10960896. Throughput: 0: 5719.5. Samples: 734454. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:24:20,242][02117] Avg episode reward: [(0, '32.766')]
[2025-02-11 17:24:21,850][12740] Updated weights for policy 0, policy_version 2685 (0.0012)
[2025-02-11 17:24:23,623][12740] Updated weights for policy 0, policy_version 2695 (0.0012)
[2025-02-11 17:24:25,241][02117] Fps is (10 sec: 22936.3, 60 sec: 22869.1, 300 sec: 22725.1). Total num frames: 11075584. Throughput: 0: 5701.7. Samples: 751134. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:24:25,243][02117] Avg episode reward: [(0, '32.836')]
[2025-02-11 17:24:25,419][12740] Updated weights for policy 0, policy_version 2705 (0.0012)
[2025-02-11 17:24:27,192][12740] Updated weights for policy 0, policy_version 2715 (0.0012)
[2025-02-11 17:24:28,986][12740] Updated weights for policy 0, policy_version 2725 (0.0012)
[2025-02-11 17:24:30,240][02117] Fps is (10 sec: 22937.5, 60 sec: 22937.6, 300 sec: 22732.8). Total num frames: 11190272. Throughput: 0: 5716.2. Samples: 785662. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:24:30,243][02117] Avg episode reward: [(0, '30.424')]
[2025-02-11 17:24:30,791][12740] Updated weights for policy 0, policy_version 2735 (0.0012)
[2025-02-11 17:24:32,773][12740] Updated weights for policy 0, policy_version 2745 (0.0013)
[2025-02-11 17:24:34,625][12740] Updated weights for policy 0, policy_version 2755 (0.0013)
[2025-02-11 17:24:35,240][02117] Fps is (10 sec: 22119.6, 60 sec: 22732.8, 300 sec: 22683.4). Total num frames: 11296768. Throughput: 0: 5665.7. Samples: 818408. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:24:35,243][02117] Avg episode reward: [(0, '29.429')]
[2025-02-11 17:24:36,389][12740] Updated weights for policy 0, policy_version 2765 (0.0013)
[2025-02-11 17:24:38,167][12740] Updated weights for policy 0, policy_version 2775 (0.0012)
[2025-02-11 17:24:39,942][12740] Updated weights for policy 0, policy_version 2785 (0.0013)
[2025-02-11 17:24:40,240][02117] Fps is (10 sec: 22118.3, 60 sec: 22732.8, 300 sec: 22691.8). Total num frames: 11411456. Throughput: 0: 5667.0. Samples: 835738. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:24:40,243][02117] Avg episode reward: [(0, '30.784')]
[2025-02-11 17:24:41,709][12740] Updated weights for policy 0, policy_version 2795 (0.0012)
[2025-02-11 17:24:43,490][12740] Updated weights for policy 0, policy_version 2805 (0.0012)
[2025-02-11 17:24:45,240][02117] Fps is (10 sec: 22937.5, 60 sec: 22732.8, 300 sec: 22699.8). Total num frames: 11526144. Throughput: 0: 5682.3. Samples: 870252. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:24:45,242][02117] Avg episode reward: [(0, '28.860')]
[2025-02-11 17:24:45,359][12740] Updated weights for policy 0, policy_version 2815 (0.0012)
[2025-02-11 17:24:47,256][12740] Updated weights for policy 0, policy_version 2825 (0.0012)
[2025-02-11 17:24:49,059][12740] Updated weights for policy 0, policy_version 2835 (0.0012)
[2025-02-11 17:24:50,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22664.5, 300 sec: 22681.6). Total num frames: 11636736. Throughput: 0: 5651.4. Samples: 903792. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:24:50,243][02117] Avg episode reward: [(0, '29.631')]
[2025-02-11 17:24:50,809][12740] Updated weights for policy 0, policy_version 2845 (0.0012)
[2025-02-11 17:24:52,562][12740] Updated weights for policy 0, policy_version 2855 (0.0013)
[2025-02-11 17:24:54,330][12740] Updated weights for policy 0, policy_version 2865 (0.0013)
[2025-02-11 17:24:55,240][02117] Fps is (10 sec: 22937.8, 60 sec: 22801.1, 300 sec: 22714.2). Total num frames: 11755520. Throughput: 0: 5676.1. Samples: 921176. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:24:55,243][02117] Avg episode reward: [(0, '31.826')]
[2025-02-11 17:24:56,115][12740] Updated weights for policy 0, policy_version 2875 (0.0012)
[2025-02-11 17:24:57,912][12740] Updated weights for policy 0, policy_version 2885 (0.0012)
[2025-02-11 17:24:59,757][12740] Updated weights for policy 0, policy_version 2895 (0.0013)
[2025-02-11 17:25:00,240][02117] Fps is (10 sec: 22937.7, 60 sec: 22664.5, 300 sec: 22696.7). Total num frames: 11866112. Throughput: 0: 5669.4. Samples: 955672. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:25:00,242][02117] Avg episode reward: [(0, '31.478')]
[2025-02-11 17:25:01,558][12740] Updated weights for policy 0, policy_version 2905 (0.0012)
[2025-02-11 17:25:03,323][12740] Updated weights for policy 0, policy_version 2915 (0.0012)
[2025-02-11 17:25:05,083][12740] Updated weights for policy 0, policy_version 2925 (0.0012)
[2025-02-11 17:25:05,240][02117] Fps is (10 sec: 22937.5, 60 sec: 22732.8, 300 sec: 22727.0). Total num frames: 11984896. Throughput: 0: 5675.7. Samples: 989860. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:25:05,242][02117] Avg episode reward: [(0, '32.538')]
[2025-02-11 17:25:06,837][12740] Updated weights for policy 0, policy_version 2935 (0.0012)
[2025-02-11 17:25:08,594][12740] Updated weights for policy 0, policy_version 2945 (0.0012)
[2025-02-11 17:25:10,240][02117] Fps is (10 sec: 23347.2, 60 sec: 22801.2, 300 sec: 22732.8). Total num frames: 12099584. Throughput: 0: 5693.1. Samples: 1007322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:10,242][02117] Avg episode reward: [(0, '29.907')]
[2025-02-11 17:25:10,415][12740] Updated weights for policy 0, policy_version 2955 (0.0013)
[2025-02-11 17:25:12,321][12740] Updated weights for policy 0, policy_version 2965 (0.0013)
[2025-02-11 17:25:14,123][12740] Updated weights for policy 0, policy_version 2975 (0.0012)
[2025-02-11 17:25:15,240][02117] Fps is (10 sec: 22527.6, 60 sec: 22732.7, 300 sec: 22716.2). Total num frames: 12210176. Throughput: 0: 5668.4. Samples: 1040742. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:25:15,242][02117] Avg episode reward: [(0, '31.334')]
[2025-02-11 17:25:15,893][12740] Updated weights for policy 0, policy_version 2985 (0.0012)
[2025-02-11 17:25:17,685][12740] Updated weights for policy 0, policy_version 2995 (0.0012)
[2025-02-11 17:25:19,470][12740] Updated weights for policy 0, policy_version 3005 (0.0012)
[2025-02-11 17:25:20,240][02117] Fps is (10 sec: 22528.1, 60 sec: 22732.8, 300 sec: 22722.0). Total num frames: 12324864. Throughput: 0: 5711.3. Samples: 1075414. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:20,242][02117] Avg episode reward: [(0, '31.723')]
[2025-02-11 17:25:21,241][12740] Updated weights for policy 0, policy_version 3015 (0.0013)
[2025-02-11 17:25:22,996][12740] Updated weights for policy 0, policy_version 3025 (0.0012)
[2025-02-11 17:25:24,833][12740] Updated weights for policy 0, policy_version 3035 (0.0012)
[2025-02-11 17:25:25,241][02117] Fps is (10 sec: 22527.7, 60 sec: 22664.6, 300 sec: 22706.5). Total num frames: 12435456. Throughput: 0: 5712.5. Samples: 1092800. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:25,242][02117] Avg episode reward: [(0, '30.617')]
[2025-02-11 17:25:26,695][12740] Updated weights for policy 0, policy_version 3045 (0.0013)
[2025-02-11 17:25:28,486][12740] Updated weights for policy 0, policy_version 3055 (0.0013)
[2025-02-11 17:25:30,240][02117] Fps is (10 sec: 22527.8, 60 sec: 22664.5, 300 sec: 22712.3). Total num frames: 12550144. Throughput: 0: 5692.4. Samples: 1126408. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:25:30,242][02117] Avg episode reward: [(0, '29.658')]
[2025-02-11 17:25:30,275][12740] Updated weights for policy 0, policy_version 3065 (0.0012)
[2025-02-11 17:25:32,061][12740] Updated weights for policy 0, policy_version 3075 (0.0012)
[2025-02-11 17:25:33,865][12740] Updated weights for policy 0, policy_version 3085 (0.0012)
[2025-02-11 17:25:35,240][02117] Fps is (10 sec: 22938.4, 60 sec: 22801.1, 300 sec: 22717.8). Total num frames: 12664832. Throughput: 0: 5708.3. Samples: 1160666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:35,243][02117] Avg episode reward: [(0, '33.813')]
[2025-02-11 17:25:35,621][12740] Updated weights for policy 0, policy_version 3095 (0.0012)
[2025-02-11 17:25:37,489][12740] Updated weights for policy 0, policy_version 3105 (0.0013)
[2025-02-11 17:25:39,395][12740] Updated weights for policy 0, policy_version 3115 (0.0013)
[2025-02-11 17:25:40,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22732.8, 300 sec: 22703.5). Total num frames: 12775424. Throughput: 0: 5698.6. Samples: 1177612. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:40,243][02117] Avg episode reward: [(0, '29.854')]
[2025-02-11 17:25:41,147][12740] Updated weights for policy 0, policy_version 3125 (0.0012)
[2025-02-11 17:25:42,928][12740] Updated weights for policy 0, policy_version 3135 (0.0013)
[2025-02-11 17:25:44,707][12740] Updated weights for policy 0, policy_version 3145 (0.0012)
[2025-02-11 17:25:45,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22732.8, 300 sec: 22709.0). Total num frames: 12890112. Throughput: 0: 5686.0. Samples: 1211542. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:45,243][02117] Avg episode reward: [(0, '29.512')]
[2025-02-11 17:25:46,475][12740] Updated weights for policy 0, policy_version 3155 (0.0012)
[2025-02-11 17:25:48,247][12740] Updated weights for policy 0, policy_version 3165 (0.0012)
[2025-02-11 17:25:50,065][12740] Updated weights for policy 0, policy_version 3175 (0.0013)
[2025-02-11 17:25:50,240][02117] Fps is (10 sec: 22937.3, 60 sec: 22801.0, 300 sec: 22714.2). Total num frames: 13004800. Throughput: 0: 5692.1. Samples: 1246006. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:50,243][02117] Avg episode reward: [(0, '29.450')]
[2025-02-11 17:25:52,009][12740] Updated weights for policy 0, policy_version 3185 (0.0013)
[2025-02-11 17:25:53,834][12740] Updated weights for policy 0, policy_version 3195 (0.0012)
[2025-02-11 17:25:55,240][02117] Fps is (10 sec: 22527.9, 60 sec: 22664.5, 300 sec: 22700.9). Total num frames: 13115392. Throughput: 0: 5665.2. Samples: 1262258. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:25:55,243][02117] Avg episode reward: [(0, '30.317')]
[2025-02-11 17:25:55,613][12740] Updated weights for policy 0, policy_version 3205 (0.0012)
[2025-02-11 17:25:57,381][12740] Updated weights for policy 0, policy_version 3215 (0.0012)
[2025-02-11 17:25:59,144][12740] Updated weights for policy 0, policy_version 3225 (0.0012)
[2025-02-11 17:26:00,240][02117] Fps is (10 sec: 22937.7, 60 sec: 22801.0, 300 sec: 22723.9). Total num frames: 13234176. Throughput: 0: 5687.3. Samples: 1296670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:26:00,243][02117] Avg episode reward: [(0, '31.051')]
[2025-02-11 17:26:00,250][12725] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003231_13234176.pth...
[2025-02-11 17:26:00,322][12725] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000001955_8007680.pth
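The save/remove pair above is a keep-the-latest-N checkpoint rotation. A sketch under the log's naming scheme (9-digit zero-padded policy version plus env frame count); save_and_rotate and state_dict_saver are hypothetical names:

import glob
import os

def save_and_rotate(ckpt_dir, policy_version, env_frames, state_dict_saver, keep=2):
    path = os.path.join(ckpt_dir, f'checkpoint_{policy_version:09d}_{env_frames}.pth')
    state_dict_saver(path)  # e.g. torch.save(model.state_dict(), path)
    # zero-padding makes lexicographic order match chronological order
    ckpts = sorted(glob.glob(os.path.join(ckpt_dir, 'checkpoint_*.pth')))
    for old in ckpts[:-keep]:
        os.remove(old)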
[2025-02-11 17:26:00,907][12740] Updated weights for policy 0, policy_version 3235 (0.0012)
[2025-02-11 17:26:02,715][12740] Updated weights for policy 0, policy_version 3245 (0.0012)
[2025-02-11 17:26:04,662][12740] Updated weights for policy 0, policy_version 3255 (0.0012)
[2025-02-11 17:26:05,240][02117] Fps is (10 sec: 22937.7, 60 sec: 22664.5, 300 sec: 22711.0). Total num frames: 13344768. Throughput: 0: 5667.4. Samples: 1330446. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:26:05,244][02117] Avg episode reward: [(0, '30.771')]
[2025-02-11 17:26:06,490][12740] Updated weights for policy 0, policy_version 3265 (0.0012)
[2025-02-11 17:26:08,270][12740] Updated weights for policy 0, policy_version 3275 (0.0012)
[2025-02-11 17:26:10,053][12740] Updated weights for policy 0, policy_version 3285 (0.0012)
[2025-02-11 17:26:10,240][02117] Fps is (10 sec: 22118.5, 60 sec: 22596.3, 300 sec: 22698.7). Total num frames: 13455360. Throughput: 0: 5655.2. Samples: 1347282. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:26:10,243][02117] Avg episode reward: [(0, '33.388')]
[2025-02-11 17:26:11,856][12740] Updated weights for policy 0, policy_version 3295 (0.0012)
[2025-02-11 17:26:13,642][12740] Updated weights for policy 0, policy_version 3305 (0.0012)
[2025-02-11 17:26:15,240][02117] Fps is (10 sec: 22937.7, 60 sec: 22732.9, 300 sec: 22720.3). Total num frames: 13574144. Throughput: 0: 5672.9. Samples: 1381688. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:26:15,243][02117] Avg episode reward: [(0, '28.598')]
[2025-02-11 17:26:15,433][12740] Updated weights for policy 0, policy_version 3315 (0.0012)
[2025-02-11 17:26:17,330][12740] Updated weights for policy 0, policy_version 3325 (0.0012)
[2025-02-11 17:26:19,162][12740] Updated weights for policy 0, policy_version 3335 (0.0012)
[2025-02-11 17:26:20,240][02117] Fps is (10 sec: 22937.8, 60 sec: 22664.5, 300 sec: 22708.2). Total num frames: 13684736. Throughput: 0: 5653.9. Samples: 1415090. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:26:20,243][02117] Avg episode reward: [(0, '30.308')]
[2025-02-11 17:26:20,912][12740] Updated weights for policy 0, policy_version 3345 (0.0012)
[2025-02-11 17:26:22,667][12740] Updated weights for policy 0, policy_version 3355 (0.0012)
[2025-02-11 17:26:24,410][12740] Updated weights for policy 0, policy_version 3365 (0.0012)
[2025-02-11 17:26:25,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22732.9, 300 sec: 22712.7). Total num frames: 13799424. Throughput: 0: 5664.1. Samples: 1432498. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:26:25,242][02117] Avg episode reward: [(0, '32.269')]
[2025-02-11 17:26:26,182][12740] Updated weights for policy 0, policy_version 3375 (0.0013)
[2025-02-11 17:26:27,933][12740] Updated weights for policy 0, policy_version 3385 (0.0012)
[2025-02-11 17:26:29,789][12740] Updated weights for policy 0, policy_version 3395 (0.0012)
[2025-02-11 17:26:30,240][02117] Fps is (10 sec: 22937.4, 60 sec: 22732.8, 300 sec: 22717.0). Total num frames: 13914112. Throughput: 0: 5687.1. Samples: 1467462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:26:30,243][02117] Avg episode reward: [(0, '33.848')]
[2025-02-11 17:26:31,678][12740] Updated weights for policy 0, policy_version 3405 (0.0012)
[2025-02-11 17:26:33,456][12740] Updated weights for policy 0, policy_version 3415 (0.0012)
[2025-02-11 17:26:35,236][12740] Updated weights for policy 0, policy_version 3425 (0.0012)
[2025-02-11 17:26:35,240][02117] Fps is (10 sec: 22937.3, 60 sec: 22732.8, 300 sec: 22721.2). Total num frames: 14028800. Throughput: 0: 5664.5. Samples: 1500910. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:26:35,243][02117] Avg episode reward: [(0, '29.624')]
[2025-02-11 17:26:36,996][12740] Updated weights for policy 0, policy_version 3435 (0.0012)
[2025-02-11 17:26:38,774][12740] Updated weights for policy 0, policy_version 3445 (0.0012)
[2025-02-11 17:26:40,240][02117] Fps is (10 sec: 22937.6, 60 sec: 22801.1, 300 sec: 22725.2). Total num frames: 14143488. Throughput: 0: 5690.7. Samples: 1518338. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:26:40,243][02117] Avg episode reward: [(0, '34.344')]
[2025-02-11 17:26:40,551][12740] Updated weights for policy 0, policy_version 3455 (0.0012)
[2025-02-11 17:26:42,362][12740] Updated weights for policy 0, policy_version 3465 (0.0012)
[2025-02-11 17:26:44,232][12740] Updated weights for policy 0, policy_version 3475 (0.0012)
[2025-02-11 17:26:45,240][02117] Fps is (10 sec: 22528.3, 60 sec: 22732.8, 300 sec: 22714.2). Total num frames: 14254080. Throughput: 0: 5681.2. Samples: 1552324. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:26:45,243][02117] Avg episode reward: [(0, '32.705')]
[2025-02-11 17:26:46,061][12740] Updated weights for policy 0, policy_version 3485 (0.0012)
[2025-02-11 17:26:47,815][12740] Updated weights for policy 0, policy_version 3495 (0.0012)
[2025-02-11 17:26:49,581][12740] Updated weights for policy 0, policy_version 3505 (0.0012)
[2025-02-11 17:26:50,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22732.8, 300 sec: 22718.2). Total num frames: 14368768. Throughput: 0: 5692.4. Samples: 1586604. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:26:50,242][02117] Avg episode reward: [(0, '31.404')]
[2025-02-11 17:26:51,325][12740] Updated weights for policy 0, policy_version 3515 (0.0012)
[2025-02-11 17:26:53,111][12740] Updated weights for policy 0, policy_version 3525 (0.0012)
[2025-02-11 17:26:54,879][12740] Updated weights for policy 0, policy_version 3535 (0.0013)
[2025-02-11 17:26:55,240][02117] Fps is (10 sec: 23347.1, 60 sec: 22869.3, 300 sec: 22736.4). Total num frames: 14487552. Throughput: 0: 5706.8. Samples: 1604088. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:26:55,242][02117] Avg episode reward: [(0, '28.501')]
[2025-02-11 17:26:56,754][12740] Updated weights for policy 0, policy_version 3545 (0.0013)
[2025-02-11 17:26:58,574][12740] Updated weights for policy 0, policy_version 3555 (0.0013)
[2025-02-11 17:27:00,240][02117] Fps is (10 sec: 22937.6, 60 sec: 22732.8, 300 sec: 22725.7). Total num frames: 14598144. Throughput: 0: 5691.7. Samples: 1637816. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:27:00,242][02117] Avg episode reward: [(0, '31.883')]
[2025-02-11 17:27:00,319][12740] Updated weights for policy 0, policy_version 3565 (0.0012)
[2025-02-11 17:27:02,068][12740] Updated weights for policy 0, policy_version 3575 (0.0012)
[2025-02-11 17:27:03,830][12740] Updated weights for policy 0, policy_version 3585 (0.0012)
[2025-02-11 17:27:05,240][02117] Fps is (10 sec: 22528.2, 60 sec: 22801.1, 300 sec: 22729.3). Total num frames: 14712832. Throughput: 0: 5726.0. Samples: 1672758. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:27:05,243][02117] Avg episode reward: [(0, '32.349')]
[2025-02-11 17:27:05,614][12740] Updated weights for policy 0, policy_version 3595 (0.0012)
[2025-02-11 17:27:07,385][12740] Updated weights for policy 0, policy_version 3605 (0.0012)
[2025-02-11 17:27:09,256][12740] Updated weights for policy 0, policy_version 3615 (0.0013)
[2025-02-11 17:27:10,240][02117] Fps is (10 sec: 22937.6, 60 sec: 22869.3, 300 sec: 22812.6). Total num frames: 14827520. Throughput: 0: 5725.0. Samples: 1690122. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-02-11 17:27:10,242][02117] Avg episode reward: [(0, '33.127')]
[2025-02-11 17:27:11,117][12740] Updated weights for policy 0, policy_version 3625 (0.0012)
[2025-02-11 17:27:12,890][12740] Updated weights for policy 0, policy_version 3635 (0.0012)
[2025-02-11 17:27:14,646][12740] Updated weights for policy 0, policy_version 3645 (0.0012)
[2025-02-11 17:27:15,240][02117] Fps is (10 sec: 22937.4, 60 sec: 22801.0, 300 sec: 22812.6). Total num frames: 14942208. Throughput: 0: 5693.6. Samples: 1723674. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:27:15,242][02117] Avg episode reward: [(0, '30.411')]
[2025-02-11 17:27:16,412][12740] Updated weights for policy 0, policy_version 3655 (0.0012)
[2025-02-11 17:27:18,146][12740] Updated weights for policy 0, policy_version 3665 (0.0012)
[2025-02-11 17:27:19,912][12740] Updated weights for policy 0, policy_version 3675 (0.0012)
[2025-02-11 17:27:20,241][02117] Fps is (10 sec: 22937.1, 60 sec: 22869.2, 300 sec: 22826.5). Total num frames: 15056896. Throughput: 0: 5729.1. Samples: 1758722. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-02-11 17:27:20,243][02117] Avg episode reward: [(0, '31.541')]
[2025-02-11 17:27:21,741][12740] Updated weights for policy 0, policy_version 3685 (0.0012)
[2025-02-11 17:27:23,664][12740] Updated weights for policy 0, policy_version 3695 (0.0012)
[2025-02-11 17:27:25,240][02117] Fps is (10 sec: 22937.5, 60 sec: 22869.3, 300 sec: 22826.5). Total num frames: 15171584. Throughput: 0: 5715.0. Samples: 1775512. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:27:25,243][02117] Avg episode reward: [(0, '32.396')]
[2025-02-11 17:27:25,416][12740] Updated weights for policy 0, policy_version 3705 (0.0012)
[2025-02-11 17:27:27,153][12740] Updated weights for policy 0, policy_version 3715 (0.0012)
[2025-02-11 17:27:28,915][12740] Updated weights for policy 0, policy_version 3725 (0.0012)
[2025-02-11 17:27:30,240][02117] Fps is (10 sec: 22938.2, 60 sec: 22869.3, 300 sec: 22840.4). Total num frames: 15286272. Throughput: 0: 5725.2. Samples: 1809958. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:27:30,243][02117] Avg episode reward: [(0, '32.554')]
[2025-02-11 17:27:30,661][12740] Updated weights for policy 0, policy_version 3735 (0.0012)
[2025-02-11 17:27:32,452][12740] Updated weights for policy 0, policy_version 3745 (0.0012)
[2025-02-11 17:27:34,308][12740] Updated weights for policy 0, policy_version 3755 (0.0013)
[2025-02-11 17:27:35,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22801.1, 300 sec: 22812.6). Total num frames: 15396864. Throughput: 0: 5724.8. Samples: 1844220. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:27:35,242][02117] Avg episode reward: [(0, '31.010')]
[2025-02-11 17:27:36,253][12740] Updated weights for policy 0, policy_version 3765 (0.0013)
[2025-02-11 17:27:38,062][12740] Updated weights for policy 0, policy_version 3775 (0.0012)
[2025-02-11 17:27:39,873][12740] Updated weights for policy 0, policy_version 3785 (0.0012)
[2025-02-11 17:27:40,240][02117] Fps is (10 sec: 22528.0, 60 sec: 22801.1, 300 sec: 22812.6). Total num frames: 15511552. Throughput: 0: 5696.6. Samples: 1860434. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:27:40,242][02117] Avg episode reward: [(0, '35.659')]
[2025-02-11 17:27:40,251][12725] Saving new best policy, reward=35.659!
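"Saving new best policy" fires whenever the current average episode reward exceeds the best seen so far, keeping a dedicated best-policy checkpoint alongside the rotating ones. A hedged sketch of that check (maybe_save_best and save_fn are hypothetical names):

def maybe_save_best(avg_reward, best_so_far, save_fn):
    if best_so_far is None or avg_reward > best_so_far:
        save_fn()  # persist a separate best-policy checkpoint
        print(f'Saving new best policy, reward={avg_reward:.3f}!')
        return avg_reward
    return best_so_far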
[2025-02-11 17:27:41,648][12740] Updated weights for policy 0, policy_version 3795 (0.0012)
[2025-02-11 17:27:43,401][12740] Updated weights for policy 0, policy_version 3805 (0.0012)
[2025-02-11 17:27:45,191][12740] Updated weights for policy 0, policy_version 3815 (0.0012)
[2025-02-11 17:27:45,240][02117] Fps is (10 sec: 22937.7, 60 sec: 22869.3, 300 sec: 22812.6). Total num frames: 15626240. Throughput: 0: 5713.2. Samples: 1894908. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:27:45,242][02117] Avg episode reward: [(0, '33.112')]
[2025-02-11 17:27:47,000][12740] Updated weights for policy 0, policy_version 3825 (0.0012)
[2025-02-11 17:27:48,914][12740] Updated weights for policy 0, policy_version 3835 (0.0013)
[2025-02-11 17:27:50,240][02117] Fps is (10 sec: 22527.9, 60 sec: 22801.1, 300 sec: 22798.7). Total num frames: 15736832. Throughput: 0: 5680.7. Samples: 1928392. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-02-11 17:27:50,242][02117] Avg episode reward: [(0, '35.327')]
[2025-02-11 17:27:50,726][12740] Updated weights for policy 0, policy_version 3845 (0.0012)
[2025-02-11 17:27:52,500][12740] Updated weights for policy 0, policy_version 3855 (0.0013)
[2025-02-11 17:27:54,263][12740] Updated weights for policy 0, policy_version 3865 (0.0012)
[2025-02-11 17:27:55,240][02117] Fps is (10 sec: 22528.2, 60 sec: 22732.8, 300 sec: 22798.8). Total num frames: 15851520. Throughput: 0: 5676.1. Samples: 1945548. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:27:55,243][02117] Avg episode reward: [(0, '34.351')]
[2025-02-11 17:27:56,028][12740] Updated weights for policy 0, policy_version 3875 (0.0012)
[2025-02-11 17:27:57,815][12740] Updated weights for policy 0, policy_version 3885 (0.0012)
[2025-02-11 17:27:59,620][12740] Updated weights for policy 0, policy_version 3895 (0.0012)
[2025-02-11 17:28:00,240][02117] Fps is (10 sec: 22937.2, 60 sec: 22801.0, 300 sec: 22798.7). Total num frames: 15966208. Throughput: 0: 5700.1. Samples: 1980180. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-02-11 17:28:00,243][02117] Avg episode reward: [(0, '33.750')]
[2025-02-11 17:28:00,250][12725] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003898_15966208.pth...
[2025-02-11 17:28:00,325][12725] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000002565_10506240.pth
[2025-02-11 17:28:01,476][12740] Updated weights for policy 0, policy_version 3905 (0.0012)
[2025-02-11 17:28:02,076][12725] Stopping Batcher_0...
[2025-02-11 17:28:02,077][12725] Loop batcher_evt_loop terminating...
[2025-02-11 17:28:02,077][12725] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
[2025-02-11 17:28:02,077][02117] Component Batcher_0 stopped!
[2025-02-11 17:28:02,100][12740] Weights refcount: 2 0
[2025-02-11 17:28:02,102][12740] Stopping InferenceWorker_p0-w0...
[2025-02-11 17:28:02,102][12740] Loop inference_proc0-0_evt_loop terminating...
[2025-02-11 17:28:02,102][02117] Component InferenceWorker_p0-w0 stopped!
[2025-02-11 17:28:02,125][12750] Stopping RolloutWorker_w9...
[2025-02-11 17:28:02,125][12750] Loop rollout_proc9_evt_loop terminating...
[2025-02-11 17:28:02,125][02117] Component RolloutWorker_w9 stopped!
[2025-02-11 17:28:02,130][12744] Stopping RolloutWorker_w4...
[2025-02-11 17:28:02,131][12744] Loop rollout_proc4_evt_loop terminating...
[2025-02-11 17:28:02,131][12742] Stopping RolloutWorker_w1...
[2025-02-11 17:28:02,132][12742] Loop rollout_proc1_evt_loop terminating...
[2025-02-11 17:28:02,131][02117] Component RolloutWorker_w4 stopped!
[2025-02-11 17:28:02,132][12745] Stopping RolloutWorker_w3...
[2025-02-11 17:28:02,133][12745] Loop rollout_proc3_evt_loop terminating...
[2025-02-11 17:28:02,134][12746] Stopping RolloutWorker_w5...
[2025-02-11 17:28:02,133][02117] Component RolloutWorker_w1 stopped!
[2025-02-11 17:28:02,134][12746] Loop rollout_proc5_evt_loop terminating...
[2025-02-11 17:28:02,135][12748] Stopping RolloutWorker_w7...
[2025-02-11 17:28:02,135][12748] Loop rollout_proc7_evt_loop terminating...
[2025-02-11 17:28:02,136][12743] Stopping RolloutWorker_w2...
[2025-02-11 17:28:02,135][02117] Component RolloutWorker_w3 stopped!
[2025-02-11 17:28:02,137][12743] Loop rollout_proc2_evt_loop terminating...
[2025-02-11 17:28:02,136][02117] Component RolloutWorker_w5 stopped!
[2025-02-11 17:28:02,139][12747] Stopping RolloutWorker_w6...
[2025-02-11 17:28:02,140][12747] Loop rollout_proc6_evt_loop terminating...
[2025-02-11 17:28:02,138][02117] Component RolloutWorker_w7 stopped!
[2025-02-11 17:28:02,143][02117] Component RolloutWorker_w2 stopped!
[2025-02-11 17:28:02,145][02117] Component RolloutWorker_w6 stopped!
[2025-02-11 17:28:02,161][12725] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003231_13234176.pth
[2025-02-11 17:28:02,164][12741] Stopping RolloutWorker_w0...
[2025-02-11 17:28:02,164][12741] Loop rollout_proc0_evt_loop terminating...
[2025-02-11 17:28:02,164][02117] Component RolloutWorker_w0 stopped!
[2025-02-11 17:28:02,166][12749] Stopping RolloutWorker_w8...
[2025-02-11 17:28:02,166][12749] Loop rollout_proc8_evt_loop terminating...
[2025-02-11 17:28:02,166][02117] Component RolloutWorker_w8 stopped!
[2025-02-11 17:28:02,172][12725] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
[2025-02-11 17:28:02,295][12725] Stopping LearnerWorker_p0...
[2025-02-11 17:28:02,296][12725] Loop learner_proc0_evt_loop terminating...
[2025-02-11 17:28:02,296][02117] Component LearnerWorker_p0 stopped!
[2025-02-11 17:28:02,298][02117] Waiting for process learner_proc0 to stop...
[2025-02-11 17:28:03,257][02117] Waiting for process inference_proc0-0 to join...
[2025-02-11 17:28:03,258][02117] Waiting for process rollout_proc0 to join...
[2025-02-11 17:28:03,260][02117] Waiting for process rollout_proc1 to join...
[2025-02-11 17:28:03,261][02117] Waiting for process rollout_proc2 to join...
[2025-02-11 17:28:03,263][02117] Waiting for process rollout_proc3 to join...
[2025-02-11 17:28:03,264][02117] Waiting for process rollout_proc4 to join...
[2025-02-11 17:28:03,266][02117] Waiting for process rollout_proc5 to join...
[2025-02-11 17:28:03,267][02117] Waiting for process rollout_proc6 to join...
[2025-02-11 17:28:03,268][02117] Waiting for process rollout_proc7 to join...
[2025-02-11 17:28:03,269][02117] Waiting for process rollout_proc8 to join...
[2025-02-11 17:28:03,271][02117] Waiting for process rollout_proc9 to join...
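The shutdown above follows a stop-then-join order: each component terminates its own event loop, then the runner waits on every child process. A simplified sketch of that join phase, assuming a dict of multiprocessing.Process handles (shutdown is a hypothetical helper, and the terminate() escalation is an assumption, not something this log shows):

def shutdown(processes):
    # processes: dict mapping a name like 'rollout_proc0' to a multiprocessing.Process
    for name, proc in processes.items():
        print(f'Waiting for process {name} to join...')
        proc.join(timeout=5.0)
        if proc.is_alive():
            proc.terminate()  # escalate only if a worker hangs
            proc.join()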
[2025-02-11 17:28:03,272][02117] Batcher 0 profile tree view:
batching: 33.5245, releasing_batches: 0.0484
[2025-02-11 17:28:03,273][02117] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
wait_policy_total: 6.2807
update_model: 6.0939
weight_update: 0.0012
one_step: 0.0030
handle_policy_step: 327.3869
deserialize: 14.1943, stack: 2.1812, obs_to_device_normalize: 81.2034, forward: 150.1038, send_messages: 25.7323
prepare_outputs: 41.1445
to_cpu: 26.4090
[2025-02-11 17:28:03,274][02117] Learner 0 profile tree view:
misc: 0.0097, prepare_batch: 18.7589
train: 45.7863
epoch_init: 0.0085, minibatch_init: 0.0109, losses_postprocess: 0.4906, kl_divergence: 0.6622, after_optimizer: 0.8413
calculate_losses: 18.1321
losses_init: 0.0059, forward_head: 1.3001, bptt_initial: 10.7448, tail: 1.1877, advantages_returns: 0.3223, losses: 2.0733
bptt: 2.1968
bptt_forward_core: 2.0972
update: 24.9648
clip: 1.4294
[2025-02-11 17:28:03,276][02117] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.2094, enqueue_policy_requests: 14.7807, env_step: 223.1613, overhead: 9.1558, complete_rollouts: 0.3620
save_policy_outputs: 13.4449
split_output_tensors: 5.1515
[2025-02-11 17:28:03,278][02117] RolloutWorker_w9 profile tree view:
wait_for_trajectories: 0.2073, enqueue_policy_requests: 14.6800, env_step: 223.5111, overhead: 8.9706, complete_rollouts: 0.3574
save_policy_outputs: 13.4113
split_output_tensors: 5.1475
[2025-02-11 17:28:03,279][02117] Loop Runner_EvtLoop terminating...
[2025-02-11 17:28:03,280][02117] Runner profile tree view:
main_loop: 360.7447
[2025-02-11 17:28:03,282][02117] Collected {0: 16007168}, FPS: 22174.9
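The final figure is consistent with env frames gathered this session divided by the runner's main loop time: the session appears to have resumed near the 8,007,680-frame checkpoint rotated out earlier in the log, and (16,007,168 - 8,007,680) / 360.7447 ≈ 22,175, matching the reported 22174.9:

frames_this_session = 16_007_168 - 8_007_680   # 7,999,488 env frames
print(frames_this_session / 360.7447)          # ~22175, matching the log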
[2025-02-11 17:28:29,955][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:28:29,957][02117] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 17:28:29,958][02117] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 17:28:29,960][02117] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 17:28:29,961][02117] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:28:29,963][02117] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 17:28:29,964][02117] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:28:29,966][02117] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 17:28:29,967][02117] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-02-11 17:28:29,968][02117] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-02-11 17:28:29,969][02117] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 17:28:29,970][02117] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 17:28:29,971][02117] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 17:28:29,973][02117] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 17:28:29,974][02117] Using frameskip 1 and render_action_repeat=4 for evaluation
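The block above restores the training config and layers evaluation settings on top: saved keys can be overridden from the command line, and eval-only arguments missing from config.json are added with defaults. A sketch of that merge, with load_eval_cfg as a hypothetical name:

import json

def load_eval_cfg(cfg_path, cli_overrides, eval_defaults):
    with open(cfg_path) as f:
        cfg = json.load(f)
    for key, value in cli_overrides.items():
        cfg[key] = value                  # "Overriding arg ... from command line"
    for key, value in eval_defaults.items():
        if key not in cfg:
            cfg[key] = value              # "Adding new argument ... not in the saved config file!"
    return cfg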
[2025-02-11 17:28:30,003][02117] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:28:30,006][02117] RunningMeanStd input shape: (1,)
[2025-02-11 17:28:30,019][02117] ConvEncoder: input_channels=3
[2025-02-11 17:28:30,058][02117] Conv encoder output size: 512
[2025-02-11 17:28:30,059][02117] Policy head output size: 512
[2025-02-11 17:28:30,081][02117] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
[2025-02-11 17:28:30,498][02117] Num frames 100...
[2025-02-11 17:28:30,623][02117] Num frames 200...
[2025-02-11 17:28:30,746][02117] Num frames 300...
[2025-02-11 17:28:30,870][02117] Num frames 400...
[2025-02-11 17:28:30,992][02117] Num frames 500...
[2025-02-11 17:28:31,122][02117] Num frames 600...
[2025-02-11 17:28:31,248][02117] Num frames 700...
[2025-02-11 17:28:31,371][02117] Num frames 800...
[2025-02-11 17:28:31,497][02117] Num frames 900...
[2025-02-11 17:28:31,622][02117] Num frames 1000...
[2025-02-11 17:28:31,753][02117] Num frames 1100...
[2025-02-11 17:28:31,884][02117] Num frames 1200...
[2025-02-11 17:28:32,014][02117] Num frames 1300...
[2025-02-11 17:28:32,142][02117] Num frames 1400...
[2025-02-11 17:28:32,269][02117] Num frames 1500...
[2025-02-11 17:28:32,400][02117] Num frames 1600...
[2025-02-11 17:28:32,530][02117] Num frames 1700...
[2025-02-11 17:28:32,660][02117] Num frames 1800...
[2025-02-11 17:28:32,789][02117] Num frames 1900...
[2025-02-11 17:28:32,919][02117] Num frames 2000...
[2025-02-11 17:28:33,053][02117] Num frames 2100...
[2025-02-11 17:28:33,105][02117] Avg episode rewards: #0: 55.999, true rewards: #0: 21.000
[2025-02-11 17:28:33,107][02117] Avg episode reward: 55.999, avg true_objective: 21.000
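Each "Avg episode rewards" line is a running mean over the episodes finished so far, so per-episode values can be recovered by differencing: the second episode's reward is 2 * 36.785 - 55.999 ≈ 17.571 (see the pair a few lines below). A sketch of the accumulator, with on_episode_end as a hypothetical name:

episode_rewards, true_objectives = [], []

def on_episode_end(reward, true_objective):
    episode_rewards.append(reward)
    true_objectives.append(true_objective)
    n = len(episode_rewards)
    print(f'Avg episode rewards: #0: {sum(episode_rewards) / n:.3f}, '
          f'true rewards: #0: {sum(true_objectives) / n:.3f}')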
[2025-02-11 17:28:33,238][02117] Num frames 2200...
[2025-02-11 17:28:33,363][02117] Num frames 2300...
[2025-02-11 17:28:33,493][02117] Num frames 2400...
[2025-02-11 17:28:33,620][02117] Num frames 2500...
[2025-02-11 17:28:33,750][02117] Num frames 2600...
[2025-02-11 17:28:33,879][02117] Num frames 2700...
[2025-02-11 17:28:34,006][02117] Num frames 2800...
[2025-02-11 17:28:34,136][02117] Avg episode rewards: #0: 36.785, true rewards: #0: 14.285
[2025-02-11 17:28:34,138][02117] Avg episode reward: 36.785, avg true_objective: 14.285
[2025-02-11 17:28:34,194][02117] Num frames 2900...
[2025-02-11 17:28:34,321][02117] Num frames 3000...
[2025-02-11 17:28:34,459][02117] Num frames 3100...
[2025-02-11 17:28:34,587][02117] Num frames 3200...
[2025-02-11 17:28:34,714][02117] Num frames 3300...
[2025-02-11 17:28:34,862][02117] Avg episode rewards: #0: 28.910, true rewards: #0: 11.243
[2025-02-11 17:28:34,863][02117] Avg episode reward: 28.910, avg true_objective: 11.243
[2025-02-11 17:28:34,900][02117] Num frames 3400...
[2025-02-11 17:28:35,026][02117] Num frames 3500...
[2025-02-11 17:28:35,153][02117] Num frames 3600...
[2025-02-11 17:28:35,285][02117] Num frames 3700...
[2025-02-11 17:28:35,414][02117] Num frames 3800...
[2025-02-11 17:28:35,542][02117] Num frames 3900...
[2025-02-11 17:28:35,674][02117] Num frames 4000...
[2025-02-11 17:28:35,803][02117] Num frames 4100...
[2025-02-11 17:28:35,939][02117] Num frames 4200...
[2025-02-11 17:28:36,069][02117] Num frames 4300...
[2025-02-11 17:28:36,198][02117] Num frames 4400...
[2025-02-11 17:28:36,373][02117] Avg episode rewards: #0: 29.232, true rewards: #0: 11.233
[2025-02-11 17:28:36,375][02117] Avg episode reward: 29.232, avg true_objective: 11.233
[2025-02-11 17:28:36,386][02117] Num frames 4500...
[2025-02-11 17:28:36,516][02117] Num frames 4600...
[2025-02-11 17:28:36,642][02117] Num frames 4700...
[2025-02-11 17:28:36,772][02117] Num frames 4800...
[2025-02-11 17:28:36,903][02117] Num frames 4900...
[2025-02-11 17:28:37,037][02117] Num frames 5000...
[2025-02-11 17:28:37,174][02117] Num frames 5100...
[2025-02-11 17:28:37,327][02117] Num frames 5200...
[2025-02-11 17:28:37,457][02117] Num frames 5300...
[2025-02-11 17:28:37,585][02117] Num frames 5400...
[2025-02-11 17:28:37,712][02117] Num frames 5500...
[2025-02-11 17:28:37,835][02117] Num frames 5600...
[2025-02-11 17:28:37,963][02117] Num frames 5700...
[2025-02-11 17:28:38,096][02117] Num frames 5800...
[2025-02-11 17:28:38,223][02117] Num frames 5900...
[2025-02-11 17:28:38,349][02117] Num frames 6000...
[2025-02-11 17:28:38,480][02117] Num frames 6100...
[2025-02-11 17:28:38,608][02117] Num frames 6200...
[2025-02-11 17:28:38,735][02117] Num frames 6300...
[2025-02-11 17:28:38,861][02117] Num frames 6400...
[2025-02-11 17:28:39,044][02117] Avg episode rewards: #0: 34.198, true rewards: #0: 12.998
[2025-02-11 17:28:39,045][02117] Avg episode reward: 34.198, avg true_objective: 12.998
[2025-02-11 17:28:39,049][02117] Num frames 6500...
[2025-02-11 17:28:39,175][02117] Num frames 6600...
[2025-02-11 17:28:39,302][02117] Num frames 6700...
[2025-02-11 17:28:39,425][02117] Num frames 6800...
[2025-02-11 17:28:39,552][02117] Num frames 6900...
[2025-02-11 17:28:39,680][02117] Num frames 7000...
[2025-02-11 17:28:39,804][02117] Num frames 7100...
[2025-02-11 17:28:39,930][02117] Num frames 7200...
[2025-02-11 17:28:40,061][02117] Num frames 7300...
[2025-02-11 17:28:40,190][02117] Num frames 7400...
[2025-02-11 17:28:40,318][02117] Num frames 7500...
[2025-02-11 17:28:40,442][02117] Num frames 7600...
[2025-02-11 17:28:40,571][02117] Num frames 7700...
[2025-02-11 17:28:40,695][02117] Num frames 7800...
[2025-02-11 17:28:40,818][02117] Num frames 7900...
[2025-02-11 17:28:40,945][02117] Num frames 8000...
[2025-02-11 17:28:41,076][02117] Avg episode rewards: #0: 35.095, true rewards: #0: 13.428
[2025-02-11 17:28:41,077][02117] Avg episode reward: 35.095, avg true_objective: 13.428
[2025-02-11 17:28:41,133][02117] Num frames 8100...
[2025-02-11 17:28:41,259][02117] Num frames 8200...
[2025-02-11 17:28:41,386][02117] Num frames 8300...
[2025-02-11 17:28:41,514][02117] Num frames 8400...
[2025-02-11 17:28:41,643][02117] Num frames 8500...
[2025-02-11 17:28:41,768][02117] Num frames 8600...
[2025-02-11 17:28:41,894][02117] Num frames 8700...
[2025-02-11 17:28:42,027][02117] Num frames 8800...
[2025-02-11 17:28:42,152][02117] Num frames 8900...
[2025-02-11 17:28:42,278][02117] Num frames 9000...
[2025-02-11 17:28:42,405][02117] Num frames 9100...
[2025-02-11 17:28:42,559][02117] Avg episode rewards: #0: 34.253, true rewards: #0: 13.110
[2025-02-11 17:28:42,560][02117] Avg episode reward: 34.253, avg true_objective: 13.110
[2025-02-11 17:28:42,591][02117] Num frames 9200...
[2025-02-11 17:28:42,717][02117] Num frames 9300...
[2025-02-11 17:28:42,843][02117] Num frames 9400...
[2025-02-11 17:28:42,968][02117] Num frames 9500...
[2025-02-11 17:28:43,097][02117] Num frames 9600...
[2025-02-11 17:28:43,221][02117] Num frames 9700...
[2025-02-11 17:28:43,350][02117] Num frames 9800...
[2025-02-11 17:28:43,479][02117] Num frames 9900...
[2025-02-11 17:28:43,606][02117] Num frames 10000...
[2025-02-11 17:28:43,734][02117] Num frames 10100...
[2025-02-11 17:28:43,858][02117] Num frames 10200...
[2025-02-11 17:28:43,985][02117] Num frames 10300...
[2025-02-11 17:28:44,113][02117] Num frames 10400...
[2025-02-11 17:28:44,239][02117] Num frames 10500...
[2025-02-11 17:28:44,365][02117] Num frames 10600...
[2025-02-11 17:28:44,492][02117] Num frames 10700...
[2025-02-11 17:28:44,645][02117] Avg episode rewards: #0: 34.971, true rewards: #0: 13.471
[2025-02-11 17:28:44,647][02117] Avg episode reward: 34.971, avg true_objective: 13.471
[2025-02-11 17:28:44,676][02117] Num frames 10800...
[2025-02-11 17:28:44,803][02117] Num frames 10900...
[2025-02-11 17:28:44,929][02117] Num frames 11000...
[2025-02-11 17:28:45,057][02117] Num frames 11100...
[2025-02-11 17:28:45,182][02117] Num frames 11200...
[2025-02-11 17:28:45,312][02117] Num frames 11300...
[2025-02-11 17:28:45,438][02117] Num frames 11400...
[2025-02-11 17:28:45,564][02117] Num frames 11500...
[2025-02-11 17:28:45,692][02117] Num frames 11600...
[2025-02-11 17:28:45,819][02117] Num frames 11700...
[2025-02-11 17:28:45,962][02117] Avg episode rewards: #0: 33.854, true rewards: #0: 13.077
[2025-02-11 17:28:45,963][02117] Avg episode reward: 33.854, avg true_objective: 13.077
[2025-02-11 17:28:46,004][02117] Num frames 11800...
[2025-02-11 17:28:46,132][02117] Num frames 11900...
[2025-02-11 17:28:46,259][02117] Num frames 12000...
[2025-02-11 17:28:46,383][02117] Num frames 12100...
[2025-02-11 17:28:46,507][02117] Num frames 12200...
[2025-02-11 17:28:46,633][02117] Num frames 12300...
[2025-02-11 17:28:46,760][02117] Num frames 12400...
[2025-02-11 17:28:46,889][02117] Num frames 12500...
[2025-02-11 17:28:47,018][02117] Num frames 12600...
[2025-02-11 17:28:47,196][02117] Avg episode rewards: #0: 32.097, true rewards: #0: 12.697
[2025-02-11 17:28:47,198][02117] Avg episode reward: 32.097, avg true_objective: 12.697
[2025-02-11 17:28:47,203][02117] Num frames 12700...
[2025-02-11 17:29:17,288][02117] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
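The replay is written once all evaluation episodes finish, from the RGB frames rendered during the run. A sketch of that final encode step using imageio as an illustrative stand-in for the actual video backend (save_replay is a hypothetical name; .mp4 output needs the imageio-ffmpeg plugin):

import imageio

def save_replay(frames, path, fps=35):
    # frames: list of HxWx3 uint8 arrays collected during evaluation
    imageio.mimwrite(path, frames, fps=fps)
    print(f'Replay video saved to {path}!')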
[2025-02-11 17:30:55,450][02117] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-02-11 17:30:55,451][02117] Overriding arg 'num_workers' with value 1 passed from command line
[2025-02-11 17:30:55,453][02117] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-02-11 17:30:55,454][02117] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-02-11 17:30:55,456][02117] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-02-11 17:30:55,457][02117] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-02-11 17:30:55,459][02117] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-02-11 17:30:55,460][02117] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-02-11 17:30:55,461][02117] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-02-11 17:30:55,463][02117] Adding new argument 'hf_repository'='mjm54/doom_health_gathering_supreme' that is not in the saved config file!
[2025-02-11 17:30:55,464][02117] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-02-11 17:30:55,466][02117] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-02-11 17:30:55,467][02117] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-02-11 17:30:55,469][02117] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-02-11 17:30:55,470][02117] Using frameskip 1 and render_action_repeat=4 for evaluation
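This second pass repeats the evaluation with push_to_hub=True, so after the replay is rendered the experiment directory (checkpoint, config.json, replay.mp4) is uploaded to the named repository. An illustrative upload using the huggingface_hub client directly; the enjoy script performs the equivalent step internally:

from huggingface_hub import HfApi

HfApi().upload_folder(
    folder_path='/content/train_dir/default_experiment',
    repo_id='mjm54/doom_health_gathering_supreme',
    repo_type='model',
)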
[2025-02-11 17:30:55,494][02117] RunningMeanStd input shape: (3, 72, 128)
[2025-02-11 17:30:55,497][02117] RunningMeanStd input shape: (1,)
[2025-02-11 17:30:55,508][02117] ConvEncoder: input_channels=3
[2025-02-11 17:30:55,543][02117] Conv encoder output size: 512
[2025-02-11 17:30:55,545][02117] Policy head output size: 512
[2025-02-11 17:30:55,564][02117] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000003908_16007168.pth...
[2025-02-11 17:30:56,007][02117] Num frames 100...
[2025-02-11 17:30:56,130][02117] Num frames 200...
[2025-02-11 17:30:56,254][02117] Num frames 300...
[2025-02-11 17:30:56,381][02117] Num frames 400...
[2025-02-11 17:30:56,509][02117] Num frames 500...
[2025-02-11 17:30:56,635][02117] Num frames 600...
[2025-02-11 17:30:56,759][02117] Num frames 700...
[2025-02-11 17:30:56,886][02117] Num frames 800...
[2025-02-11 17:30:57,012][02117] Num frames 900...
[2025-02-11 17:30:57,137][02117] Num frames 1000...
[2025-02-11 17:30:57,261][02117] Num frames 1100...
[2025-02-11 17:30:57,386][02117] Num frames 1200...
[2025-02-11 17:30:57,511][02117] Num frames 1300...
[2025-02-11 17:30:57,633][02117] Num frames 1400...
[2025-02-11 17:30:57,760][02117] Num frames 1500...
[2025-02-11 17:30:57,889][02117] Num frames 1600...
[2025-02-11 17:30:58,016][02117] Num frames 1700...
[2025-02-11 17:30:58,142][02117] Num frames 1800...
[2025-02-11 17:30:58,272][02117] Num frames 1900...
[2025-02-11 17:30:58,402][02117] Num frames 2000...
[2025-02-11 17:30:58,530][02117] Num frames 2100...
[2025-02-11 17:30:58,582][02117] Avg episode rewards: #0: 54.999, true rewards: #0: 21.000
[2025-02-11 17:30:58,584][02117] Avg episode reward: 54.999, avg true_objective: 21.000
[2025-02-11 17:30:58,707][02117] Num frames 2200...
[2025-02-11 17:30:58,834][02117] Num frames 2300...
[2025-02-11 17:30:58,960][02117] Num frames 2400...
[2025-02-11 17:30:59,086][02117] Num frames 2500...
[2025-02-11 17:30:59,210][02117] Num frames 2600...
[2025-02-11 17:30:59,344][02117] Num frames 2700...
[2025-02-11 17:30:59,410][02117] Avg episode rewards: #0: 32.539, true rewards: #0: 13.540
[2025-02-11 17:30:59,412][02117] Avg episode reward: 32.539, avg true_objective: 13.540
[2025-02-11 17:30:59,537][02117] Num frames 2800...
[2025-02-11 17:30:59,676][02117] Num frames 2900...
[2025-02-11 17:30:59,811][02117] Num frames 3000...
[2025-02-11 17:30:59,948][02117] Num frames 3100...
[2025-02-11 17:31:00,086][02117] Num frames 3200...
[2025-02-11 17:31:00,228][02117] Num frames 3300...
[2025-02-11 17:31:00,364][02117] Num frames 3400...
[2025-02-11 17:31:00,495][02117] Num frames 3500...
[2025-02-11 17:31:00,627][02117] Num frames 3600...
[2025-02-11 17:31:00,761][02117] Num frames 3700...
[2025-02-11 17:31:00,897][02117] Num frames 3800...
[2025-02-11 17:31:01,022][02117] Num frames 3900...
[2025-02-11 17:31:01,147][02117] Num frames 4000...
[2025-02-11 17:31:01,304][02117] Avg episode rewards: #0: 31.613, true rewards: #0: 13.613
[2025-02-11 17:31:01,306][02117] Avg episode reward: 31.613, avg true_objective: 13.613
[2025-02-11 17:31:01,328][02117] Num frames 4100...
[2025-02-11 17:31:01,450][02117] Num frames 4200...
[2025-02-11 17:31:01,573][02117] Num frames 4300...
[2025-02-11 17:31:01,698][02117] Num frames 4400...
[2025-02-11 17:31:01,825][02117] Num frames 4500...
[2025-02-11 17:31:01,951][02117] Num frames 4600...
[2025-02-11 17:31:02,076][02117] Num frames 4700...
[2025-02-11 17:31:02,201][02117] Num frames 4800...
[2025-02-11 17:31:02,327][02117] Num frames 4900...
[2025-02-11 17:31:02,453][02117] Num frames 5000...
[2025-02-11 17:31:02,576][02117] Num frames 5100...
[2025-02-11 17:31:02,703][02117] Num frames 5200...
[2025-02-11 17:31:02,828][02117] Num frames 5300...
[2025-02-11 17:31:02,954][02117] Num frames 5400...
[2025-02-11 17:31:03,081][02117] Num frames 5500...
[2025-02-11 17:31:03,208][02117] Num frames 5600...
[2025-02-11 17:31:03,334][02117] Num frames 5700...
[2025-02-11 17:31:03,463][02117] Num frames 5800...
[2025-02-11 17:31:03,588][02117] Num frames 5900...
[2025-02-11 17:31:03,715][02117] Num frames 6000...
[2025-02-11 17:31:03,855][02117] Avg episode rewards: #0: 37.420, true rewards: #0: 15.170
[2025-02-11 17:31:03,857][02117] Avg episode reward: 37.420, avg true_objective: 15.170
[2025-02-11 17:31:03,900][02117] Num frames 6100...
[2025-02-11 17:31:04,026][02117] Num frames 6200...
[2025-02-11 17:31:04,152][02117] Num frames 6300...
[2025-02-11 17:31:04,276][02117] Num frames 6400...
[2025-02-11 17:31:04,400][02117] Num frames 6500...
[2025-02-11 17:31:04,525][02117] Num frames 6600...
[2025-02-11 17:31:04,650][02117] Num frames 6700...
[2025-02-11 17:31:04,775][02117] Num frames 6800...
[2025-02-11 17:31:04,900][02117] Num frames 6900...
[2025-02-11 17:31:05,033][02117] Num frames 7000...
[2025-02-11 17:31:05,164][02117] Num frames 7100...
[2025-02-11 17:31:05,291][02117] Num frames 7200...
[2025-02-11 17:31:05,416][02117] Num frames 7300...
[2025-02-11 17:31:05,540][02117] Num frames 7400...
[2025-02-11 17:31:05,650][02117] Avg episode rewards: #0: 37.088, true rewards: #0: 14.888
[2025-02-11 17:31:05,651][02117] Avg episode reward: 37.088, avg true_objective: 14.888
[2025-02-11 17:31:05,734][02117] Num frames 7500...
[2025-02-11 17:31:05,857][02117] Num frames 7600...
[2025-02-11 17:31:05,986][02117] Num frames 7700...
[2025-02-11 17:31:06,117][02117] Num frames 7800...
[2025-02-11 17:31:06,220][02117] Avg episode rewards: #0: 31.723, true rewards: #0: 13.057
[2025-02-11 17:31:06,221][02117] Avg episode reward: 31.723, avg true_objective: 13.057
[2025-02-11 17:31:06,307][02117] Num frames 7900...
[2025-02-11 17:31:06,434][02117] Num frames 8000...
[2025-02-11 17:31:06,556][02117] Num frames 8100...
[2025-02-11 17:31:06,683][02117] Num frames 8200...
[2025-02-11 17:31:06,809][02117] Num frames 8300...
[2025-02-11 17:31:06,935][02117] Num frames 8400...
[2025-02-11 17:31:07,059][02117] Num frames 8500...
[2025-02-11 17:31:07,186][02117] Num frames 8600...
[2025-02-11 17:31:07,312][02117] Num frames 8700...
[2025-02-11 17:31:07,402][02117] Avg episode rewards: #0: 30.325, true rewards: #0: 12.469
[2025-02-11 17:31:07,403][02117] Avg episode reward: 30.325, avg true_objective: 12.469
[2025-02-11 17:31:07,493][02117] Num frames 8800...
[2025-02-11 17:31:07,617][02117] Num frames 8900...
[2025-02-11 17:31:07,742][02117] Num frames 9000...
[2025-02-11 17:31:07,868][02117] Num frames 9100...
[2025-02-11 17:31:07,993][02117] Num frames 9200...
[2025-02-11 17:31:08,119][02117] Num frames 9300...
[2025-02-11 17:31:08,245][02117] Num frames 9400...
[2025-02-11 17:31:08,375][02117] Num frames 9500...
[2025-02-11 17:31:08,499][02117] Num frames 9600...
[2025-02-11 17:31:08,629][02117] Num frames 9700...
[2025-02-11 17:31:08,755][02117] Num frames 9800...
[2025-02-11 17:31:08,882][02117] Num frames 9900...
[2025-02-11 17:31:09,009][02117] Num frames 10000...
[2025-02-11 17:31:09,133][02117] Num frames 10100...
[2025-02-11 17:31:09,255][02117] Num frames 10200...
[2025-02-11 17:31:09,383][02117] Num frames 10300...
[2025-02-11 17:31:09,508][02117] Num frames 10400...
[2025-02-11 17:31:09,637][02117] Num frames 10500...
[2025-02-11 17:31:09,717][02117] Avg episode rewards: #0: 32.397, true rewards: #0: 13.148
[2025-02-11 17:31:09,718][02117] Avg episode reward: 32.397, avg true_objective: 13.148
[2025-02-11 17:31:09,822][02117] Num frames 10600...
[2025-02-11 17:31:09,950][02117] Num frames 10700...
[2025-02-11 17:31:10,078][02117] Num frames 10800...
[2025-02-11 17:31:10,206][02117] Num frames 10900...
[2025-02-11 17:31:10,331][02117] Num frames 11000...
[2025-02-11 17:31:10,459][02117] Num frames 11100...
[2025-02-11 17:31:10,586][02117] Num frames 11200...
[2025-02-11 17:31:10,714][02117] Num frames 11300...
[2025-02-11 17:31:10,838][02117] Num frames 11400...
[2025-02-11 17:31:10,964][02117] Num frames 11500...
[2025-02-11 17:31:11,091][02117] Num frames 11600...
[2025-02-11 17:31:11,219][02117] Num frames 11700...
[2025-02-11 17:31:11,346][02117] Num frames 11800...
[2025-02-11 17:31:11,474][02117] Num frames 11900...
[2025-02-11 17:31:11,600][02117] Num frames 12000...
[2025-02-11 17:31:11,725][02117] Num frames 12100...
[2025-02-11 17:31:11,854][02117] Num frames 12200...
[2025-02-11 17:31:11,983][02117] Num frames 12300...
[2025-02-11 17:31:12,111][02117] Num frames 12400...
[2025-02-11 17:31:12,237][02117] Num frames 12500...
[2025-02-11 17:31:12,365][02117] Num frames 12600...
[2025-02-11 17:31:12,443][02117] Avg episode rewards: #0: 35.686, true rewards: #0: 14.020
[2025-02-11 17:31:12,444][02117] Avg episode reward: 35.686, avg true_objective: 14.020
[2025-02-11 17:31:12,546][02117] Num frames 12700...
[2025-02-11 17:31:12,673][02117] Num frames 12800...
[2025-02-11 17:31:12,798][02117] Num frames 12900...
[2025-02-11 17:31:12,864][02117] Avg episode rewards: #0: 32.408, true rewards: #0: 12.908
[2025-02-11 17:31:12,866][02117] Avg episode reward: 32.408, avg true_objective: 12.908
[2025-02-11 17:31:43,232][02117] Replay video saved to /content/train_dir/default_experiment/replay.mp4!