diff --git "a/sf_log.txt" "b/sf_log.txt" --- "a/sf_log.txt" +++ "b/sf_log.txt" @@ -1562,3 +1562,3106 @@ Check the documentation of torch.load to learn more about types accepted by defa [2025-08-22 19:24:46,564][19241] Avg episode rewards: #0: 4.536, true rewards: #0: 3.936 [2025-08-22 19:24:46,565][19241] Avg episode reward: 4.536, avg true_objective: 3.936 [2025-08-22 19:24:51,963][19241] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-22 19:24:55,751][19241] The model has been pushed to https://huggingface.co/turbo-maikol/rl_course_vizdoom_health_gathering_supreme +[2025-08-22 19:28:43,395][19241] Loading legacy config file train_dir/doom_health_gathering_supreme_2222/cfg.json instead of train_dir/doom_health_gathering_supreme_2222/config.json +[2025-08-22 19:28:43,399][19241] Loading existing experiment configuration from train_dir/doom_health_gathering_supreme_2222/config.json +[2025-08-22 19:28:43,401][19241] Overriding arg 'experiment' with value 'doom_health_gathering_supreme_2222' passed from command line +[2025-08-22 19:28:43,402][19241] Overriding arg 'train_dir' with value 'train_dir' passed from command line +[2025-08-22 19:28:43,402][19241] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-22 19:28:43,404][19241] Adding new argument 'lr_adaptive_min'=1e-06 that is not in the saved config file! +[2025-08-22 19:28:43,406][19241] Adding new argument 'lr_adaptive_max'=0.01 that is not in the saved config file! +[2025-08-22 19:28:43,409][19241] Adding new argument 'env_gpu_observations'=True that is not in the saved config file! +[2025-08-22 19:28:43,410][19241] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-22 19:28:43,411][19241] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-22 19:28:43,412][19241] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:28:43,414][19241] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-22 19:28:43,415][19241] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-22 19:28:43,417][19241] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-22 19:28:43,418][19241] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-22 19:28:43,419][19241] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-22 19:28:43,420][19241] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-22 19:28:43,421][19241] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-22 19:28:43,422][19241] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-22 19:28:43,423][19241] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2025-08-22 19:28:43,423][19241] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-22 19:28:43,487][19241] RunningMeanStd input shape: (3, 72, 128) +[2025-08-22 19:28:43,492][19241] RunningMeanStd input shape: (1,) +[2025-08-22 19:28:43,512][19241] ConvEncoder: input_channels=3 +[2025-08-22 19:28:43,636][19241] Conv encoder output size: 512 +[2025-08-22 19:28:43,638][19241] Policy head output size: 512 +[2025-08-22 19:28:43,679][19241] Loading state from checkpoint train_dir/doom_health_gathering_supreme_2222/checkpoint_p0/checkpoint_000539850_4422451200.pth... +[2025-08-22 19:28:44,434][19241] Num frames 100... +[2025-08-22 19:28:47,541][19241] Num frames 200... +[2025-08-22 19:28:47,722][19241] Num frames 300... +[2025-08-22 19:28:47,905][19241] Num frames 400... +[2025-08-22 19:28:48,069][19241] Num frames 500... +[2025-08-22 19:28:48,335][19241] Num frames 600... +[2025-08-22 19:28:48,597][19241] Num frames 700... +[2025-08-22 19:28:48,814][19241] Num frames 800... +[2025-08-22 19:28:49,006][19241] Num frames 900... +[2025-08-22 19:28:49,161][19241] Num frames 1000... +[2025-08-22 19:28:49,344][19241] Num frames 1100... +[2025-08-22 19:28:49,498][19241] Num frames 1200... +[2025-08-22 19:28:49,686][19241] Num frames 1300... +[2025-08-22 19:28:49,856][19241] Num frames 1400... +[2025-08-22 19:28:50,021][19241] Num frames 1500... +[2025-08-22 19:28:50,213][19241] Num frames 1600... +[2025-08-22 19:28:50,387][19241] Num frames 1700... +[2025-08-22 19:28:50,560][19241] Num frames 1800... +[2025-08-22 19:28:50,740][19241] Num frames 1900... +[2025-08-22 19:28:50,909][19241] Num frames 2000... +[2025-08-22 19:28:51,091][19241] Num frames 2100... +[2025-08-22 19:28:51,143][19241] Avg episode rewards: #0: 68.999, true rewards: #0: 21.000 +[2025-08-22 19:28:51,145][19241] Avg episode reward: 68.999, avg true_objective: 21.000 +[2025-08-22 19:28:51,323][19241] Num frames 2200... +[2025-08-22 19:28:51,495][19241] Num frames 2300... +[2025-08-22 19:28:51,679][19241] Num frames 2400... +[2025-08-22 19:28:51,962][19241] Num frames 2500... +[2025-08-22 19:28:52,278][19241] Num frames 2600... +[2025-08-22 19:28:52,520][19241] Num frames 2700... +[2025-08-22 19:28:52,740][19241] Num frames 2800... +[2025-08-22 19:28:52,962][19241] Num frames 2900... +[2025-08-22 19:28:53,199][19241] Num frames 3000... +[2025-08-22 19:28:53,398][19241] Num frames 3100... +[2025-08-22 19:28:53,593][19241] Num frames 3200... +[2025-08-22 19:28:53,894][19241] Num frames 3300... +[2025-08-22 19:28:54,117][19241] Num frames 3400... +[2025-08-22 19:28:54,320][19241] Num frames 3500... +[2025-08-22 19:28:54,501][19241] Num frames 3600... +[2025-08-22 19:28:54,701][19241] Num frames 3700... +[2025-08-22 19:28:54,982][19241] Num frames 3800... +[2025-08-22 19:28:55,175][19241] Num frames 3900... +[2025-08-22 19:28:55,370][19241] Num frames 4000... +[2025-08-22 19:28:55,573][19241] Num frames 4100... +[2025-08-22 19:28:55,820][19241] Num frames 4200... +[2025-08-22 19:28:55,872][19241] Avg episode rewards: #0: 67.499, true rewards: #0: 21.000 +[2025-08-22 19:28:55,874][19241] Avg episode reward: 67.499, avg true_objective: 21.000 +[2025-08-22 19:28:56,101][19241] Num frames 4300... +[2025-08-22 19:28:56,285][19241] Num frames 4400... +[2025-08-22 19:28:56,468][19241] Num frames 4500... +[2025-08-22 19:28:56,685][19241] Num frames 4600... +[2025-08-22 19:28:56,966][19241] Num frames 4700... +[2025-08-22 19:28:57,192][19241] Num frames 4800... +[2025-08-22 19:28:57,385][19241] Num frames 4900... 
+[2025-08-22 19:28:57,592][19241] Num frames 5000... +[2025-08-22 19:28:57,791][19241] Num frames 5100... +[2025-08-22 19:28:58,010][19241] Num frames 5200... +[2025-08-22 19:28:58,204][19241] Num frames 5300... +[2025-08-22 19:28:58,397][19241] Num frames 5400... +[2025-08-22 19:28:58,646][19241] Num frames 5500... +[2025-08-22 19:28:58,863][19241] Num frames 5600... +[2025-08-22 19:28:59,137][19241] Num frames 5700... +[2025-08-22 19:28:59,349][19241] Num frames 5800... +[2025-08-22 19:28:59,541][19241] Num frames 5900... +[2025-08-22 19:28:59,726][19241] Num frames 6000... +[2025-08-22 19:28:59,948][19241] Num frames 6100... +[2025-08-22 19:29:00,187][19241] Num frames 6200... +[2025-08-22 19:29:00,461][19241] Num frames 6300... +[2025-08-22 19:29:00,514][19241] Avg episode rewards: #0: 66.666, true rewards: #0: 21.000 +[2025-08-22 19:29:00,517][19241] Avg episode reward: 66.666, avg true_objective: 21.000 +[2025-08-22 19:29:00,727][19241] Num frames 6400... +[2025-08-22 19:29:00,969][19241] Num frames 6500... +[2025-08-22 19:29:01,265][19241] Num frames 6600... +[2025-08-22 19:29:01,504][19241] Num frames 6700... +[2025-08-22 19:29:01,691][19241] Num frames 6800... +[2025-08-22 19:29:01,878][19241] Num frames 6900... +[2025-08-22 19:29:02,105][19241] Num frames 7000... +[2025-08-22 19:29:02,330][19241] Num frames 7100... +[2025-08-22 19:29:02,531][19241] Num frames 7200... +[2025-08-22 19:29:02,737][19241] Num frames 7300... +[2025-08-22 19:29:02,905][19241] Num frames 7400... +[2025-08-22 19:29:03,095][19241] Num frames 7500... +[2025-08-22 19:29:03,303][19241] Num frames 7600... +[2025-08-22 19:29:03,493][19241] Num frames 7700... +[2025-08-22 19:29:03,725][19241] Num frames 7800... +[2025-08-22 19:29:03,910][19241] Num frames 7900... +[2025-08-22 19:29:04,115][19241] Num frames 8000... +[2025-08-22 19:29:04,196][19241] Avg episode rewards: #0: 62.027, true rewards: #0: 20.028 +[2025-08-22 19:29:04,197][19241] Avg episode reward: 62.027, avg true_objective: 20.028 +[2025-08-22 19:29:04,376][19241] Num frames 8100... +[2025-08-22 19:29:04,534][19241] Num frames 8200... +[2025-08-22 19:29:04,710][19241] Num frames 8300... +[2025-08-22 19:29:04,878][19241] Num frames 8400... +[2025-08-22 19:29:05,072][19241] Num frames 8500... +[2025-08-22 19:29:05,242][19241] Num frames 8600... +[2025-08-22 19:29:05,416][19241] Num frames 8700... +[2025-08-22 19:29:05,589][19241] Num frames 8800... +[2025-08-22 19:29:05,761][19241] Num frames 8900... +[2025-08-22 19:29:05,972][19241] Num frames 9000... +[2025-08-22 19:29:06,189][19241] Num frames 9100... +[2025-08-22 19:29:06,393][19241] Num frames 9200... +[2025-08-22 19:29:06,561][19241] Num frames 9300... +[2025-08-22 19:29:06,752][19241] Num frames 9400... +[2025-08-22 19:29:06,955][19241] Num frames 9500... +[2025-08-22 19:29:07,180][19241] Avg episode rewards: #0: 58.357, true rewards: #0: 19.158 +[2025-08-22 19:29:07,182][19241] Avg episode reward: 58.357, avg true_objective: 19.158 +[2025-08-22 19:29:07,227][19241] Num frames 9600... +[2025-08-22 19:29:07,440][19241] Num frames 9700... +[2025-08-22 19:29:07,652][19241] Num frames 9800... +[2025-08-22 19:29:08,009][19241] Num frames 9900... +[2025-08-22 19:29:08,255][19241] Num frames 10000... +[2025-08-22 19:29:08,483][19241] Num frames 10100... +[2025-08-22 19:29:08,673][19241] Num frames 10200... +[2025-08-22 19:29:08,892][19241] Num frames 10300... +[2025-08-22 19:29:09,083][19241] Num frames 10400... +[2025-08-22 19:29:09,293][19241] Num frames 10500... 
+[2025-08-22 19:29:09,459][19241] Num frames 10600... +[2025-08-22 19:29:09,646][19241] Num frames 10700... +[2025-08-22 19:29:09,826][19241] Num frames 10800... +[2025-08-22 19:29:09,993][19241] Num frames 10900... +[2025-08-22 19:29:10,195][19241] Num frames 11000... +[2025-08-22 19:29:10,363][19241] Num frames 11100... +[2025-08-22 19:29:10,555][19241] Num frames 11200... +[2025-08-22 19:29:10,730][19241] Num frames 11300... +[2025-08-22 19:29:10,885][19241] Num frames 11400... +[2025-08-22 19:29:11,045][19241] Num frames 11500... +[2025-08-22 19:29:11,204][19241] Num frames 11600... +[2025-08-22 19:29:11,376][19241] Avg episode rewards: #0: 59.297, true rewards: #0: 19.465 +[2025-08-22 19:29:11,377][19241] Avg episode reward: 59.297, avg true_objective: 19.465 +[2025-08-22 19:29:11,413][19241] Num frames 11700... +[2025-08-22 19:29:11,634][19241] Num frames 11800... +[2025-08-22 19:29:11,780][19241] Num frames 11900... +[2025-08-22 19:29:11,927][19241] Num frames 12000... +[2025-08-22 19:29:12,074][19241] Num frames 12100... +[2025-08-22 19:29:12,226][19241] Num frames 12200... +[2025-08-22 19:29:12,405][19241] Num frames 12300... +[2025-08-22 19:29:12,557][19241] Num frames 12400... +[2025-08-22 19:29:12,702][19241] Num frames 12500... +[2025-08-22 19:29:12,840][19241] Num frames 12600... +[2025-08-22 19:29:12,983][19241] Num frames 12700... +[2025-08-22 19:29:13,125][19241] Num frames 12800... +[2025-08-22 19:29:13,278][19241] Num frames 12900... +[2025-08-22 19:29:13,496][19241] Num frames 13000... +[2025-08-22 19:29:13,701][19241] Num frames 13100... +[2025-08-22 19:29:13,853][19241] Num frames 13200... +[2025-08-22 19:29:13,928][19241] Avg episode rewards: #0: 56.449, true rewards: #0: 18.879 +[2025-08-22 19:29:13,931][19241] Avg episode reward: 56.449, avg true_objective: 18.879 +[2025-08-22 19:29:14,063][19241] Num frames 13300... +[2025-08-22 19:29:14,230][19241] Num frames 13400... +[2025-08-22 19:29:14,371][19241] Num frames 13500... +[2025-08-22 19:29:14,514][19241] Num frames 13600... +[2025-08-22 19:29:14,655][19241] Num frames 13700... +[2025-08-22 19:29:14,792][19241] Num frames 13800... +[2025-08-22 19:29:14,961][19241] Num frames 13900... +[2025-08-22 19:29:15,106][19241] Num frames 14000... +[2025-08-22 19:29:15,276][19241] Num frames 14100... +[2025-08-22 19:29:15,511][19241] Num frames 14200... +[2025-08-22 19:29:15,728][19241] Num frames 14300... +[2025-08-22 19:29:15,866][19241] Num frames 14400... +[2025-08-22 19:29:16,023][19241] Num frames 14500... +[2025-08-22 19:29:16,182][19241] Num frames 14600... +[2025-08-22 19:29:16,335][19241] Num frames 14700... +[2025-08-22 19:29:16,480][19241] Num frames 14800... +[2025-08-22 19:29:16,624][19241] Num frames 14900... +[2025-08-22 19:29:16,783][19241] Num frames 15000... +[2025-08-22 19:29:16,930][19241] Num frames 15100... +[2025-08-22 19:29:17,083][19241] Num frames 15200... +[2025-08-22 19:29:17,242][19241] Num frames 15300... +[2025-08-22 19:29:17,322][19241] Avg episode rewards: #0: 56.768, true rewards: #0: 19.144 +[2025-08-22 19:29:17,323][19241] Avg episode reward: 56.768, avg true_objective: 19.144 +[2025-08-22 19:29:17,498][19241] Num frames 15400... +[2025-08-22 19:29:17,721][19241] Num frames 15500... +[2025-08-22 19:29:17,885][19241] Num frames 15600... +[2025-08-22 19:29:18,052][19241] Num frames 15700... +[2025-08-22 19:29:18,212][19241] Num frames 15800... +[2025-08-22 19:29:18,408][19241] Num frames 15900... +[2025-08-22 19:29:18,570][19241] Num frames 16000... 
+[2025-08-22 19:29:18,849][19241] Num frames 16100... +[2025-08-22 19:29:19,101][19241] Num frames 16200... +[2025-08-22 19:29:19,363][19241] Num frames 16300... +[2025-08-22 19:29:19,622][19241] Num frames 16400... +[2025-08-22 19:29:22,775][19241] Num frames 16500... +[2025-08-22 19:29:22,977][19241] Num frames 16600... +[2025-08-22 19:29:23,156][19241] Num frames 16700... +[2025-08-22 19:29:23,364][19241] Num frames 16800... +[2025-08-22 19:29:23,561][19241] Num frames 16900... +[2025-08-22 19:29:23,789][19241] Num frames 17000... +[2025-08-22 19:29:23,977][19241] Num frames 17100... +[2025-08-22 19:29:24,169][19241] Num frames 17200... +[2025-08-22 19:29:24,391][19241] Num frames 17300... +[2025-08-22 19:29:24,590][19241] Num frames 17400... +[2025-08-22 19:29:24,680][19241] Avg episode rewards: #0: 57.238, true rewards: #0: 19.350 +[2025-08-22 19:29:24,682][19241] Avg episode reward: 57.238, avg true_objective: 19.350 +[2025-08-22 19:29:24,847][19241] Num frames 17500... +[2025-08-22 19:29:25,018][19241] Num frames 17600... +[2025-08-22 19:29:25,189][19241] Num frames 17700... +[2025-08-22 19:29:25,405][19241] Num frames 17800... +[2025-08-22 19:29:25,593][19241] Num frames 17900... +[2025-08-22 19:29:25,803][19241] Num frames 18000... +[2025-08-22 19:29:25,989][19241] Num frames 18100... +[2025-08-22 19:29:26,247][19241] Num frames 18200... +[2025-08-22 19:29:26,507][19241] Num frames 18300... +[2025-08-22 19:29:26,708][19241] Num frames 18400... +[2025-08-22 19:29:26,915][19241] Num frames 18500... +[2025-08-22 19:29:27,110][19241] Num frames 18600... +[2025-08-22 19:29:27,333][19241] Num frames 18700... +[2025-08-22 19:29:27,554][19241] Num frames 18800... +[2025-08-22 19:29:27,778][19241] Num frames 18900... +[2025-08-22 19:29:27,990][19241] Num frames 19000... +[2025-08-22 19:29:28,187][19241] Num frames 19100... +[2025-08-22 19:29:28,417][19241] Num frames 19200... +[2025-08-22 19:29:28,610][19241] Num frames 19300... +[2025-08-22 19:29:28,798][19241] Num frames 19400... +[2025-08-22 19:29:29,010][19241] Num frames 19500... +[2025-08-22 19:29:29,098][19241] Avg episode rewards: #0: 57.014, true rewards: #0: 19.515 +[2025-08-22 19:29:29,099][19241] Avg episode reward: 57.014, avg true_objective: 19.515 +[2025-08-22 19:30:01,772][19241] Replay video saved to train_dir/doom_health_gathering_supreme_2222/replay.mp4! +[2025-08-29 18:22:49,095][15827] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... 
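The evaluation above runs the doom_health_gathering_supreme_2222 policy for ten episodes with video recording ('no_render'=True, 'save_video'=True, 'max_num_episodes'=10), and the earlier run pushed the resulting model and replay to https://huggingface.co/turbo-maikol/rl_course_vizdoom_health_gathering_supreme. A minimal sketch of such an evaluation call follows; the sf_examples.vizdoom.enjoy_vizdoom entry point is an assumption (Sample Factory's ViZDoom example script, not something this log names), while the flags mirror the "Overriding arg"/"Adding new argument" lines above.

    import sys

    # Assumed Sample Factory example entry point (not named in this log).
    from sf_examples.vizdoom.enjoy_vizdoom import main

    sys.argv = [
        "enjoy_vizdoom",
        "--env=doom_health_gathering_supreme",
        "--train_dir=train_dir",
        "--experiment=doom_health_gathering_supreme_2222",
        "--num_workers=1",
        "--no_render",            # matches 'no_render'=True above
        "--save_video",           # write replay.mp4 into the experiment dir
        "--max_num_episodes=10",
        # To publish the model as in the earlier run:
        # "--push_to_hub",
        # "--hf_repository=turbo-maikol/rl_course_vizdoom_health_gathering_supreme",
    ]
    main()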
+[2025-08-29 18:22:49,213][15827] Rollout worker 0 uses device cpu +[2025-08-29 18:22:49,215][15827] Rollout worker 1 uses device cpu +[2025-08-29 18:22:49,216][15827] Rollout worker 2 uses device cpu +[2025-08-29 18:22:49,216][15827] Rollout worker 3 uses device cpu +[2025-08-29 18:22:49,217][15827] Rollout worker 4 uses device cpu +[2025-08-29 18:22:49,218][15827] Rollout worker 5 uses device cpu +[2025-08-29 18:22:49,219][15827] Rollout worker 6 uses device cpu +[2025-08-29 18:22:49,220][15827] Rollout worker 7 uses device cpu +[2025-08-29 18:22:49,221][15827] Rollout worker 8 uses device cpu +[2025-08-29 18:22:49,222][15827] Rollout worker 9 uses device cpu +[2025-08-29 18:22:49,222][15827] Rollout worker 10 uses device cpu +[2025-08-29 18:22:49,224][15827] Rollout worker 11 uses device cpu +[2025-08-29 18:22:49,874][15827] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:22:49,876][15827] InferenceWorker_p0-w0: min num requests: 4 +[2025-08-29 18:22:49,950][15827] Starting all processes... +[2025-08-29 18:22:49,952][15827] Starting process learner_proc0 +[2025-08-29 18:22:49,998][15827] Starting all processes... +[2025-08-29 18:22:50,005][15827] Starting process inference_proc0-0 +[2025-08-29 18:22:50,005][15827] Starting process rollout_proc0 +[2025-08-29 18:22:50,006][15827] Starting process rollout_proc1 +[2025-08-29 18:22:50,006][15827] Starting process rollout_proc2 +[2025-08-29 18:22:50,008][15827] Starting process rollout_proc3 +[2025-08-29 18:22:50,008][15827] Starting process rollout_proc4 +[2025-08-29 18:22:50,008][15827] Starting process rollout_proc5 +[2025-08-29 18:22:50,009][15827] Starting process rollout_proc6 +[2025-08-29 18:22:50,017][15827] Starting process rollout_proc7 +[2025-08-29 18:22:50,018][15827] Starting process rollout_proc8 +[2025-08-29 18:22:50,020][15827] Starting process rollout_proc9 +[2025-08-29 18:22:50,021][15827] Starting process rollout_proc10 +[2025-08-29 18:22:50,025][15827] Starting process rollout_proc11 +[2025-08-29 18:23:00,344][17355] Worker 1 uses CPU cores [1] +[2025-08-29 18:23:00,345][17359] Worker 5 uses CPU cores [5] +[2025-08-29 18:23:00,347][17354] Worker 0 uses CPU cores [0] +[2025-08-29 18:23:00,347][17353] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:23:00,347][17353] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-29 18:23:00,347][17358] Worker 4 uses CPU cores [4] +[2025-08-29 18:23:00,351][17360] Worker 6 uses CPU cores [6] +[2025-08-29 18:23:00,352][17373] Worker 10 uses CPU cores [0, 1, 2, 3, 4] +[2025-08-29 18:23:00,352][17336] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:23:00,353][17336] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-29 18:23:00,353][17361] Worker 8 uses CPU cores [8] +[2025-08-29 18:23:00,354][17372] Worker 9 uses CPU cores [9] +[2025-08-29 18:23:00,354][17357] Worker 3 uses CPU cores [3] +[2025-08-29 18:23:00,355][17374] Worker 11 uses CPU cores [5, 6, 7, 8, 9] +[2025-08-29 18:23:00,358][17356] Worker 2 uses CPU cores [2] +[2025-08-29 18:23:00,363][17362] Worker 7 uses CPU cores [7] +[2025-08-29 18:23:00,571][17336] Num visible devices: 1 +[2025-08-29 18:23:00,572][17336] Starting seed is not provided +[2025-08-29 18:23:00,572][17336] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:23:00,572][17336] Initializing actor-critic model on device cuda:0 +[2025-08-29 18:23:00,573][17336] RunningMeanStd 
input shape: (3, 72, 128) +[2025-08-29 18:23:00,575][17353] Num visible devices: 1 +[2025-08-29 18:23:00,588][17336] RunningMeanStd input shape: (1,) +[2025-08-29 18:23:00,608][17336] ConvEncoder: input_channels=3 +[2025-08-29 18:23:01,046][17336] Conv encoder output size: 512 +[2025-08-29 18:23:01,047][17336] Policy head output size: 512 +[2025-08-29 18:23:01,127][17336] Created Actor Critic model with architecture: +[2025-08-29 18:23:01,127][17336] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-08-29 18:23:02,276][17336] Using optimizer +[2025-08-29 18:23:05,013][17336] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:23:05,024][17336] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. 
+ +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:23:05,028][17336] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:23:05,030][17336] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:23:05,030][17336] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:23:05,031][17336] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. 
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:23:05,032][17336] Did not load from checkpoint, starting from scratch! +[2025-08-29 18:23:05,032][17336] Initialized policy 0 weights for model version 0 +[2025-08-29 18:23:05,043][17336] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:23:05,043][17336] LearnerWorker_p0 finished initialization! +[2025-08-29 18:23:05,608][17353] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 18:23:05,613][17353] RunningMeanStd input shape: (1,) +[2025-08-29 18:23:05,642][17353] ConvEncoder: input_channels=3 +[2025-08-29 18:23:05,849][17353] Conv encoder output size: 512 +[2025-08-29 18:23:05,850][17353] Policy head output size: 512 +[2025-08-29 18:23:05,990][15827] Inference worker 0-0 is ready! +[2025-08-29 18:23:05,995][15827] All inference workers are ready! Signal rollout workers to start! +[2025-08-29 18:23:06,203][17356] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,208][17360] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,215][17355] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,223][17358] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,246][17359] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,267][17357] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,293][17372] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,319][17373] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,331][17361] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,357][17362] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,362][17354] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,386][17374] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:23:06,859][17359] Decorrelating experience for 0 frames... +[2025-08-29 18:23:06,859][17358] Decorrelating experience for 0 frames... +[2025-08-29 18:23:06,859][17357] Decorrelating experience for 0 frames... +[2025-08-29 18:23:06,859][17355] Decorrelating experience for 0 frames... +[2025-08-29 18:23:06,859][17354] Decorrelating experience for 0 frames... +[2025-08-29 18:23:06,859][17360] Decorrelating experience for 0 frames... +[2025-08-29 18:23:06,861][17373] Decorrelating experience for 0 frames... +[2025-08-29 18:23:06,864][17356] Decorrelating experience for 0 frames... +[2025-08-29 18:23:10,601][15827] Heartbeat connected on LearnerWorker_p0 +[2025-08-29 18:23:10,604][15827] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:10,606][15827] Heartbeat connected on Batcher_0 +[2025-08-29 18:23:10,629][15827] Heartbeat connected on InferenceWorker_p0-w0 +[2025-08-29 18:23:10,679][17361] Decorrelating experience for 0 frames... 
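The UnpicklingError above (and its two retries) is the PyTorch 2.6 behavior change the message describes: torch.load now defaults to weights_only=True, and this checkpoint pickles a numpy scalar that is not on the default allowlist, so the learner gives up and starts from scratch. A minimal sketch of the two workarounds the error message itself suggests, shown here as a standalone load of the checkpoint and only appropriate if the file is trusted (the path is the one from the log):

    import numpy
    import torch

    ckpt = "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"

    # Option 1: keep weights_only=True and allowlist the global named in the
    # error message (numpy 1.x module layout).
    torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])
    checkpoint = torch.load(ckpt, map_location="cpu")

    # Option 2: restore the pre-2.6 behavior. This can execute arbitrary code
    # from the file, so use it only for checkpoints you created yourself.
    # checkpoint = torch.load(ckpt, map_location="cpu", weights_only=False)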
+[2025-08-29 18:23:10,694][17359] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,695][17357] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,695][17354] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,696][17362] Decorrelating experience for 0 frames... +[2025-08-29 18:23:10,730][17372] Decorrelating experience for 0 frames... +[2025-08-29 18:23:10,742][17355] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,935][17362] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,940][17358] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,971][17357] Decorrelating experience for 128 frames... +[2025-08-29 18:23:10,972][17354] Decorrelating experience for 128 frames... +[2025-08-29 18:23:10,973][17372] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,977][17356] Decorrelating experience for 64 frames... +[2025-08-29 18:23:10,983][17374] Decorrelating experience for 0 frames... +[2025-08-29 18:23:11,001][17373] Decorrelating experience for 64 frames... +[2025-08-29 18:23:11,215][17362] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,273][17374] Decorrelating experience for 64 frames... +[2025-08-29 18:23:11,290][17372] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,292][17356] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,307][17360] Decorrelating experience for 64 frames... +[2025-08-29 18:23:11,316][17358] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,373][17373] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,555][17355] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,558][17359] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,596][17362] Decorrelating experience for 192 frames... +[2025-08-29 18:23:11,626][17361] Decorrelating experience for 64 frames... +[2025-08-29 18:23:11,628][17374] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,653][17354] Decorrelating experience for 192 frames... +[2025-08-29 18:23:11,676][17358] Decorrelating experience for 192 frames... +[2025-08-29 18:23:11,855][17360] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,902][17355] Decorrelating experience for 192 frames... +[2025-08-29 18:23:11,911][17359] Decorrelating experience for 192 frames... +[2025-08-29 18:23:11,917][17361] Decorrelating experience for 128 frames... +[2025-08-29 18:23:11,918][17373] Decorrelating experience for 192 frames... +[2025-08-29 18:23:11,968][17356] Decorrelating experience for 192 frames... +[2025-08-29 18:23:12,030][17357] Decorrelating experience for 192 frames... +[2025-08-29 18:23:12,258][17374] Decorrelating experience for 192 frames... +[2025-08-29 18:23:12,266][17358] Decorrelating experience for 256 frames... +[2025-08-29 18:23:12,768][17354] Decorrelating experience for 256 frames... +[2025-08-29 18:23:12,770][17360] Decorrelating experience for 192 frames... +[2025-08-29 18:23:12,770][17372] Decorrelating experience for 192 frames... +[2025-08-29 18:23:12,928][17361] Decorrelating experience for 192 frames... +[2025-08-29 18:23:12,927][17357] Decorrelating experience for 256 frames... +[2025-08-29 18:23:12,933][17374] Decorrelating experience for 256 frames... +[2025-08-29 18:23:12,984][17358] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,000][17355] Decorrelating experience for 256 frames... +[2025-08-29 18:23:13,067][17362] Decorrelating experience for 256 frames... 
+[2025-08-29 18:23:13,255][17373] Decorrelating experience for 256 frames... +[2025-08-29 18:23:13,279][17354] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,308][17360] Decorrelating experience for 256 frames... +[2025-08-29 18:23:13,310][17356] Decorrelating experience for 256 frames... +[2025-08-29 18:23:13,345][17374] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,435][17358] Decorrelating experience for 384 frames... +[2025-08-29 18:23:13,462][17355] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,494][17362] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,608][17361] Decorrelating experience for 256 frames... +[2025-08-29 18:23:13,662][17372] Decorrelating experience for 256 frames... +[2025-08-29 18:23:13,758][17373] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,791][17356] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,804][17360] Decorrelating experience for 320 frames... +[2025-08-29 18:23:13,929][17374] Decorrelating experience for 384 frames... +[2025-08-29 18:23:14,027][17358] Decorrelating experience for 448 frames... +[2025-08-29 18:23:14,032][17357] Decorrelating experience for 320 frames... +[2025-08-29 18:23:14,087][17362] Decorrelating experience for 384 frames... +[2025-08-29 18:23:14,352][17355] Decorrelating experience for 384 frames... +[2025-08-29 18:23:14,394][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:14,709][17360] Decorrelating experience for 384 frames... +[2025-08-29 18:23:14,714][17359] Decorrelating experience for 256 frames... +[2025-08-29 18:23:14,760][15827] Heartbeat connected on RolloutWorker_w4 +[2025-08-29 18:23:14,790][17356] Decorrelating experience for 384 frames... +[2025-08-29 18:23:14,830][17354] Decorrelating experience for 384 frames... +[2025-08-29 18:23:15,080][17362] Decorrelating experience for 448 frames... +[2025-08-29 18:23:15,099][17372] Decorrelating experience for 320 frames... +[2025-08-29 18:23:15,125][17357] Decorrelating experience for 384 frames... +[2025-08-29 18:23:15,171][17373] Decorrelating experience for 384 frames... +[2025-08-29 18:23:15,223][17361] Decorrelating experience for 320 frames... +[2025-08-29 18:23:15,265][17355] Decorrelating experience for 448 frames... +[2025-08-29 18:23:15,337][17374] Decorrelating experience for 448 frames... +[2025-08-29 18:23:15,376][15827] Heartbeat connected on RolloutWorker_w7 +[2025-08-29 18:23:15,495][17354] Decorrelating experience for 448 frames... +[2025-08-29 18:23:15,523][17359] Decorrelating experience for 320 frames... +[2025-08-29 18:23:15,580][15827] Heartbeat connected on RolloutWorker_w1 +[2025-08-29 18:23:15,608][15827] Heartbeat connected on RolloutWorker_w11 +[2025-08-29 18:23:15,651][17372] Decorrelating experience for 384 frames... +[2025-08-29 18:23:15,663][17357] Decorrelating experience for 448 frames... +[2025-08-29 18:23:15,687][17356] Decorrelating experience for 448 frames... +[2025-08-29 18:23:15,754][15827] Heartbeat connected on RolloutWorker_w0 +[2025-08-29 18:23:15,799][17373] Decorrelating experience for 448 frames... +[2025-08-29 18:23:15,847][17361] Decorrelating experience for 384 frames... +[2025-08-29 18:23:15,908][15827] Heartbeat connected on RolloutWorker_w3 +[2025-08-29 18:23:15,966][15827] Heartbeat connected on RolloutWorker_w2 +[2025-08-29 18:23:16,039][17359] Decorrelating experience for 384 frames... 
+[2025-08-29 18:23:16,247][15827] Heartbeat connected on RolloutWorker_w10 +[2025-08-29 18:23:16,277][17372] Decorrelating experience for 448 frames... +[2025-08-29 18:23:16,277][17360] Decorrelating experience for 448 frames... +[2025-08-29 18:23:16,435][15827] Heartbeat connected on RolloutWorker_w9 +[2025-08-29 18:23:16,437][15827] Heartbeat connected on RolloutWorker_w6 +[2025-08-29 18:23:16,707][17359] Decorrelating experience for 448 frames... +[2025-08-29 18:23:16,884][15827] Heartbeat connected on RolloutWorker_w5 +[2025-08-29 18:23:17,328][17361] Decorrelating experience for 448 frames... +[2025-08-29 18:23:19,406][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:24,424][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:29,412][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:31,107][15827] Heartbeat connected on RolloutWorker_w8 +[2025-08-29 18:23:34,399][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:39,460][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:46,448][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:49,426][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:54,407][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:23:59,609][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:04,419][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:09,507][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:14,460][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:22,313][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:24,455][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:29,508][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:34,819][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). 
Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:39,608][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:44,726][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:49,472][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:24:50,386][17336] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2025-08-29 18:24:54,474][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:25:00,020][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:25:04,543][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:25:08,113][17336] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth +[2025-08-29 18:25:09,465][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:25:12,505][15827] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 15827], exiting... +[2025-08-29 18:25:12,533][17336] Stopping Batcher_0... +[2025-08-29 18:25:12,533][17336] Loop batcher_evt_loop terminating... +[2025-08-29 18:25:12,531][15827] Runner profile tree view: +main_loop: 142.5811 +[2025-08-29 18:25:12,539][15827] Collected {0: 0}, FPS: 0.0 +[2025-08-29 18:25:12,594][17336] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth... +[2025-08-29 18:25:12,752][17360] Stopping RolloutWorker_w6... +[2025-08-29 18:25:12,758][17360] Loop rollout_proc6_evt_loop terminating... +[2025-08-29 18:25:12,756][17359] Stopping RolloutWorker_w5... +[2025-08-29 18:25:12,760][17359] Loop rollout_proc5_evt_loop terminating... +[2025-08-29 18:25:12,766][17374] Stopping RolloutWorker_w11... +[2025-08-29 18:25:12,768][17374] Loop rollout_proc11_evt_loop terminating... +[2025-08-29 18:25:12,790][17362] Stopping RolloutWorker_w7... +[2025-08-29 18:25:12,792][17362] Loop rollout_proc7_evt_loop terminating... +[2025-08-29 18:25:12,809][17336] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000000_0.pth +[2025-08-29 18:25:12,814][17336] Stopping LearnerWorker_p0... +[2025-08-29 18:25:12,816][17336] Loop learner_proc0_evt_loop terminating... +[2025-08-29 18:25:12,821][17356] Stopping RolloutWorker_w2... +[2025-08-29 18:25:12,820][17372] Stopping RolloutWorker_w9... +[2025-08-29 18:25:12,822][17356] Loop rollout_proc2_evt_loop terminating... +[2025-08-29 18:25:12,822][17372] Loop rollout_proc9_evt_loop terminating... +[2025-08-29 18:25:12,823][17357] Stopping RolloutWorker_w3... 
+[2025-08-29 18:25:12,827][17357] Loop rollout_proc3_evt_loop terminating... +[2025-08-29 18:25:12,832][17354] Stopping RolloutWorker_w0... +[2025-08-29 18:25:12,834][17354] Loop rollout_proc0_evt_loop terminating... +[2025-08-29 18:25:12,836][17361] Stopping RolloutWorker_w8... +[2025-08-29 18:25:12,829][17373] Stopping RolloutWorker_w10... +[2025-08-29 18:25:12,837][17373] Loop rollout_proc10_evt_loop terminating... +[2025-08-29 18:25:12,836][17361] Loop rollout_proc8_evt_loop terminating... +[2025-08-29 18:25:12,836][17358] Stopping RolloutWorker_w4... +[2025-08-29 18:25:12,840][17358] Loop rollout_proc4_evt_loop terminating... +[2025-08-29 18:25:13,013][17355] Stopping RolloutWorker_w1... +[2025-08-29 18:25:13,015][17355] Loop rollout_proc1_evt_loop terminating... +[2025-08-29 18:25:14,529][17353] Weights refcount: 2 0 +[2025-08-29 18:25:14,540][17353] Stopping InferenceWorker_p0-w0... +[2025-08-29 18:25:14,541][17353] Loop inference_proc0-0_evt_loop terminating... +[2025-08-29 18:27:07,198][15827] Environment doom_basic already registered, overwriting... +[2025-08-29 18:27:07,204][15827] Environment doom_two_colors_easy already registered, overwriting... +[2025-08-29 18:27:07,207][15827] Environment doom_two_colors_hard already registered, overwriting... +[2025-08-29 18:27:07,210][15827] Environment doom_dm already registered, overwriting... +[2025-08-29 18:27:07,213][15827] Environment doom_dwango5 already registered, overwriting... +[2025-08-29 18:27:07,215][15827] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2025-08-29 18:27:07,220][15827] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2025-08-29 18:27:07,223][15827] Environment doom_my_way_home already registered, overwriting... +[2025-08-29 18:27:07,224][15827] Environment doom_deadly_corridor already registered, overwriting... +[2025-08-29 18:27:07,224][15827] Environment doom_defend_the_center already registered, overwriting... +[2025-08-29 18:27:07,225][15827] Environment doom_defend_the_line already registered, overwriting... +[2025-08-29 18:27:07,226][15827] Environment doom_health_gathering already registered, overwriting... +[2025-08-29 18:27:07,227][15827] Environment doom_health_gathering_supreme already registered, overwriting... +[2025-08-29 18:27:07,228][15827] Environment doom_battle already registered, overwriting... +[2025-08-29 18:27:07,230][15827] Environment doom_battle2 already registered, overwriting... +[2025-08-29 18:27:07,233][15827] Environment doom_duel_bots already registered, overwriting... +[2025-08-29 18:27:07,235][15827] Environment doom_deathmatch_bots already registered, overwriting... +[2025-08-29 18:27:07,237][15827] Environment doom_duel already registered, overwriting... +[2025-08-29 18:27:07,238][15827] Environment doom_deathmatch_full already registered, overwriting... +[2025-08-29 18:27:07,241][15827] Environment doom_benchmark already registered, overwriting... 
+[2025-08-29 18:27:07,243][15827] register_encoder_factory: +[2025-08-29 18:27:07,298][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-29 18:27:07,299][15827] Overriding arg 'num_workers' with value 10 passed from command line +[2025-08-29 18:27:07,300][15827] Overriding arg 'num_envs_per_worker' with value 4 passed from command line +[2025-08-29 18:27:07,313][15827] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists! +[2025-08-29 18:27:07,314][15827] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment... +[2025-08-29 18:27:07,315][15827] Weights and Biases integration disabled +[2025-08-29 18:27:07,327][15827] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2025-08-29 18:27:10,679][15827] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=10 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=64 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=20000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 
+pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042 +git_repo_name=git@github.com:huggingface/deep-rl-class.git +[2025-08-29 18:27:10,681][15827] Saving configuration to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... +[2025-08-29 18:27:10,802][15827] Rollout worker 0 uses device cpu +[2025-08-29 18:27:10,804][15827] Rollout worker 1 uses device cpu +[2025-08-29 18:27:10,806][15827] Rollout worker 2 uses device cpu +[2025-08-29 18:27:10,807][15827] Rollout worker 3 uses device cpu +[2025-08-29 18:27:10,809][15827] Rollout worker 4 uses device cpu +[2025-08-29 18:27:10,811][15827] Rollout worker 5 uses device cpu +[2025-08-29 18:27:10,813][15827] Rollout worker 6 uses device cpu +[2025-08-29 18:27:10,816][15827] Rollout worker 7 uses device cpu +[2025-08-29 18:27:10,818][15827] Rollout worker 8 uses device cpu +[2025-08-29 18:27:10,820][15827] Rollout worker 9 uses device cpu +[2025-08-29 18:27:10,906][15827] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:27:10,907][15827] InferenceWorker_p0-w0: min num requests: 3 +[2025-08-29 18:27:10,943][15827] Starting all processes... +[2025-08-29 18:27:10,944][15827] Starting process learner_proc0 +[2025-08-29 18:27:10,993][15827] Starting all processes... 
+[2025-08-29 18:27:11,001][15827] Starting process inference_proc0-0 +[2025-08-29 18:27:11,003][15827] Starting process rollout_proc0 +[2025-08-29 18:27:11,003][15827] Starting process rollout_proc1 +[2025-08-29 18:27:11,004][15827] Starting process rollout_proc2 +[2025-08-29 18:27:11,004][15827] Starting process rollout_proc3 +[2025-08-29 18:27:11,005][15827] Starting process rollout_proc4 +[2025-08-29 18:27:11,005][15827] Starting process rollout_proc5 +[2025-08-29 18:27:11,005][15827] Starting process rollout_proc6 +[2025-08-29 18:27:11,005][15827] Starting process rollout_proc7 +[2025-08-29 18:27:11,006][15827] Starting process rollout_proc8 +[2025-08-29 18:27:11,006][15827] Starting process rollout_proc9 +[2025-08-29 18:27:15,683][18823] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:27:15,683][18823] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-29 18:27:15,978][18823] Num visible devices: 1 +[2025-08-29 18:27:15,999][18823] Starting seed is not provided +[2025-08-29 18:27:15,999][18823] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:27:15,999][18823] Initializing actor-critic model on device cuda:0 +[2025-08-29 18:27:16,000][18823] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 18:27:16,013][18823] RunningMeanStd input shape: (1,) +[2025-08-29 18:27:16,062][18823] ConvEncoder: input_channels=3 +[2025-08-29 18:27:16,128][18843] Worker 4 uses CPU cores [4] +[2025-08-29 18:27:16,129][18842] Worker 2 uses CPU cores [2] +[2025-08-29 18:27:16,142][18838] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:27:16,143][18838] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-29 18:27:16,152][18841] Worker 3 uses CPU cores [3] +[2025-08-29 18:27:16,183][18845] Worker 6 uses CPU cores [6] +[2025-08-29 18:27:16,196][18839] Worker 0 uses CPU cores [0] +[2025-08-29 18:27:16,197][18844] Worker 5 uses CPU cores [5] +[2025-08-29 18:27:16,226][18840] Worker 1 uses CPU cores [1] +[2025-08-29 18:27:16,244][18855] Worker 9 uses CPU cores [9] +[2025-08-29 18:27:16,271][18838] Num visible devices: 1 +[2025-08-29 18:27:16,408][18823] Conv encoder output size: 512 +[2025-08-29 18:27:16,408][18823] Policy head output size: 512 +[2025-08-29 18:27:16,455][18857] Worker 8 uses CPU cores [8] +[2025-08-29 18:27:16,472][18823] Created Actor Critic model with architecture: +[2025-08-29 18:27:16,472][18823] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + (running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() 
+ ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-08-29 18:27:16,566][18856] Worker 7 uses CPU cores [7] +[2025-08-29 18:27:17,682][18823] Using optimizer +[2025-08-29 18:27:23,850][18823] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:27:23,856][18823] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:27:23,858][18823] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:27:23,859][18823] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. 
Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:27:23,859][18823] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:27:23,860][18823] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:27:23,860][18823] Did not load from checkpoint, starting from scratch! +[2025-08-29 18:27:23,860][18823] Initialized policy 0 weights for model version 0 +[2025-08-29 18:27:23,866][18823] LearnerWorker_p0 finished initialization! +[2025-08-29 18:27:23,867][18823] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:27:24,260][18838] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 18:27:24,263][18838] RunningMeanStd input shape: (1,) +[2025-08-29 18:27:24,274][18838] ConvEncoder: input_channels=3 +[2025-08-29 18:27:24,389][18838] Conv encoder output size: 512 +[2025-08-29 18:27:24,390][18838] Policy head output size: 512 +[2025-08-29 18:27:24,492][15827] Inference worker 0-0 is ready! +[2025-08-29 18:27:24,495][15827] All inference workers are ready! Signal rollout workers to start! 
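The repeated "Weights only load failed" tracebacks above are why the learner ends up "starting from scratch": PyTorch 2.6 switched the default of torch.load to weights_only=True, and this checkpoint's pickle references numpy.core.multiarray.scalar, which the safe unpickler rejects. Below is a minimal sketch of the workaround the error message itself recommends; it assumes the checkpoint file is trusted, uses the path reported above, and loads to CPU purely for illustration (it is not part of the original run).

    import numpy
    import torch

    ckpt = ("/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/"
            "default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth")

    # Option 1: allowlist the single numpy global named in the traceback and keep
    # the PyTorch 2.6 default weights_only=True behaviour.
    torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])
    checkpoint_dict = torch.load(ckpt, map_location="cpu")

    # Option 2: for a fully trusted file only, disable the safe unpickler entirely.
    # checkpoint_dict = torch.load(ckpt, map_location="cpu", weights_only=False)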
+[2025-08-29 18:27:24,653][18840] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,654][18855] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,654][18841] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,654][18845] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,660][18839] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,662][18857] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,662][18842] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,661][18844] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,664][18843] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:24,666][18856] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:27:25,049][18844] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,049][18840] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,049][18843] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,049][18855] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,049][18842] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,049][18839] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,240][18840] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,241][18856] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,253][18855] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,257][18844] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,267][18842] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,271][18839] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,300][18845] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,431][18856] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,441][18841] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,480][18857] Decorrelating experience for 0 frames... +[2025-08-29 18:27:25,488][18845] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,542][18844] Decorrelating experience for 128 frames... +[2025-08-29 18:27:25,623][18843] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,641][18841] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,693][18840] Decorrelating experience for 128 frames... +[2025-08-29 18:27:25,729][18856] Decorrelating experience for 128 frames... +[2025-08-29 18:27:25,774][18857] Decorrelating experience for 64 frames... +[2025-08-29 18:27:25,831][18844] Decorrelating experience for 192 frames... +[2025-08-29 18:27:25,885][18845] Decorrelating experience for 128 frames... +[2025-08-29 18:27:25,949][18855] Decorrelating experience for 128 frames... +[2025-08-29 18:27:25,960][18843] Decorrelating experience for 128 frames... +[2025-08-29 18:27:26,073][18840] Decorrelating experience for 192 frames... +[2025-08-29 18:27:26,101][18841] Decorrelating experience for 128 frames... +[2025-08-29 18:27:26,173][18845] Decorrelating experience for 192 frames... +[2025-08-29 18:27:26,211][18857] Decorrelating experience for 128 frames... +[2025-08-29 18:27:26,225][18839] Decorrelating experience for 128 frames... +[2025-08-29 18:27:26,247][18843] Decorrelating experience for 192 frames... +[2025-08-29 18:27:26,388][18842] Decorrelating experience for 128 frames... +[2025-08-29 18:27:26,388][18841] Decorrelating experience for 192 frames... 
+[2025-08-29 18:27:26,421][18855] Decorrelating experience for 192 frames... +[2025-08-29 18:27:26,459][18856] Decorrelating experience for 192 frames... +[2025-08-29 18:27:26,482][18857] Decorrelating experience for 192 frames... +[2025-08-29 18:27:26,491][18839] Decorrelating experience for 192 frames... +[2025-08-29 18:27:26,628][18842] Decorrelating experience for 192 frames... +[2025-08-29 18:27:27,328][15827] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:27:29,461][18823] Signal inference workers to stop experience collection... +[2025-08-29 18:27:29,477][18838] InferenceWorker_p0-w0: stopping experience collection +[2025-08-29 18:27:30,906][15827] Heartbeat connected on Batcher_0 +[2025-08-29 18:27:30,952][15827] Heartbeat connected on InferenceWorker_p0-w0 +[2025-08-29 18:27:30,963][15827] Heartbeat connected on RolloutWorker_w0 +[2025-08-29 18:27:30,965][15827] Heartbeat connected on RolloutWorker_w1 +[2025-08-29 18:27:30,966][15827] Heartbeat connected on RolloutWorker_w2 +[2025-08-29 18:27:30,968][15827] Heartbeat connected on RolloutWorker_w3 +[2025-08-29 18:27:30,970][15827] Heartbeat connected on RolloutWorker_w4 +[2025-08-29 18:27:30,971][15827] Heartbeat connected on RolloutWorker_w5 +[2025-08-29 18:27:30,972][15827] Heartbeat connected on RolloutWorker_w6 +[2025-08-29 18:27:30,973][15827] Heartbeat connected on RolloutWorker_w8 +[2025-08-29 18:27:30,973][15827] Heartbeat connected on RolloutWorker_w7 +[2025-08-29 18:27:30,975][15827] Heartbeat connected on RolloutWorker_w9 +[2025-08-29 18:27:32,334][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 650.5. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:27:32,369][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:27:37,410][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 323.6. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:27:37,862][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:27:42,413][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 215.9. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:27:42,443][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:27:47,439][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 162.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:27:48,122][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:27:52,356][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 130.1. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:27:52,839][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:27:57,380][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 108.5. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:27:57,833][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:02,476][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 92.7. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:02,912][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:07,464][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. 
Throughput: 0: 81.2. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:07,904][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:12,456][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 72.2. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:12,727][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:17,350][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:17,702][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:22,406][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:22,542][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:27,336][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:27,657][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:33,122][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:33,144][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:37,358][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:37,718][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:42,390][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:42,529][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:47,356][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:47,631][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:52,447][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:52,725][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:28:57,358][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:28:57,548][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:29:02,462][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:29:03,339][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:29:08,946][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:29:10,332][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:29:13,333][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. 
Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:29:15,106][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:29:17,378][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:29:17,760][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:29:22,385][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:29:23,312][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:29:31,368][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 3256. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:29:31,408][15827] Avg episode reward: [(0, '2.022')] +[2025-08-29 18:29:31,944][15827] Keyboard interrupt detected in the event loop EvtLoop [Runner_EvtLoop, process=main process 15827], exiting... +[2025-08-29 18:29:31,948][18823] Stopping Batcher_0... +[2025-08-29 18:29:31,949][18823] Loop batcher_evt_loop terminating... +[2025-08-29 18:29:31,947][15827] Runner profile tree view: +main_loop: 141.0043 +[2025-08-29 18:29:31,953][15827] Collected {0: 0}, FPS: 0.0 +[2025-08-29 18:29:31,995][18838] Weights refcount: 2 0 +[2025-08-29 18:29:32,011][18843] Stopping RolloutWorker_w4... +[2025-08-29 18:29:32,011][18843] Loop rollout_proc4_evt_loop terminating... +[2025-08-29 18:29:32,017][18840] Stopping RolloutWorker_w1... +[2025-08-29 18:29:32,018][18840] Loop rollout_proc1_evt_loop terminating... +[2025-08-29 18:29:32,023][18838] Stopping InferenceWorker_p0-w0... +[2025-08-29 18:29:32,022][18855] Stopping RolloutWorker_w9... +[2025-08-29 18:29:32,023][18838] Loop inference_proc0-0_evt_loop terminating... +[2025-08-29 18:29:32,023][18855] Loop rollout_proc9_evt_loop terminating... +[2025-08-29 18:29:32,024][18845] Stopping RolloutWorker_w6... +[2025-08-29 18:29:32,024][18845] Loop rollout_proc6_evt_loop terminating... +[2025-08-29 18:29:32,026][18839] Stopping RolloutWorker_w0... +[2025-08-29 18:29:32,027][18839] Loop rollout_proc0_evt_loop terminating... +[2025-08-29 18:29:32,034][18856] Stopping RolloutWorker_w7... +[2025-08-29 18:29:32,036][18856] Loop rollout_proc7_evt_loop terminating... +[2025-08-29 18:29:32,036][18841] Stopping RolloutWorker_w3... +[2025-08-29 18:29:32,037][18841] Loop rollout_proc3_evt_loop terminating... +[2025-08-29 18:29:32,039][18842] Stopping RolloutWorker_w2... +[2025-08-29 18:29:32,040][18842] Loop rollout_proc2_evt_loop terminating... +[2025-08-29 18:29:32,043][18844] Stopping RolloutWorker_w5... +[2025-08-29 18:29:32,045][18844] Loop rollout_proc5_evt_loop terminating... +[2025-08-29 18:29:32,049][18857] Stopping RolloutWorker_w8... +[2025-08-29 18:29:32,050][18857] Loop rollout_proc8_evt_loop terminating... +[2025-08-29 18:29:35,446][18823] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth... +[2025-08-29 18:29:35,497][18823] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth +[2025-08-29 18:29:35,500][18823] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth... 
+[2025-08-29 18:29:35,543][18823] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000002_8192.pth +[2025-08-29 18:29:35,547][18823] Stopping LearnerWorker_p0... +[2025-08-29 18:29:35,547][18823] Loop learner_proc0_evt_loop terminating... +[2025-08-29 18:29:38,831][15827] Environment doom_basic already registered, overwriting... +[2025-08-29 18:29:38,835][15827] Environment doom_two_colors_easy already registered, overwriting... +[2025-08-29 18:29:38,837][15827] Environment doom_two_colors_hard already registered, overwriting... +[2025-08-29 18:29:38,838][15827] Environment doom_dm already registered, overwriting... +[2025-08-29 18:29:38,839][15827] Environment doom_dwango5 already registered, overwriting... +[2025-08-29 18:29:38,840][15827] Environment doom_my_way_home_flat_actions already registered, overwriting... +[2025-08-29 18:29:38,841][15827] Environment doom_defend_the_center_flat_actions already registered, overwriting... +[2025-08-29 18:29:38,842][15827] Environment doom_my_way_home already registered, overwriting... +[2025-08-29 18:29:38,844][15827] Environment doom_deadly_corridor already registered, overwriting... +[2025-08-29 18:29:38,845][15827] Environment doom_defend_the_center already registered, overwriting... +[2025-08-29 18:29:38,846][15827] Environment doom_defend_the_line already registered, overwriting... +[2025-08-29 18:29:38,847][15827] Environment doom_health_gathering already registered, overwriting... +[2025-08-29 18:29:38,848][15827] Environment doom_health_gathering_supreme already registered, overwriting... +[2025-08-29 18:29:38,849][15827] Environment doom_battle already registered, overwriting... +[2025-08-29 18:29:38,851][15827] Environment doom_battle2 already registered, overwriting... +[2025-08-29 18:29:38,852][15827] Environment doom_duel_bots already registered, overwriting... +[2025-08-29 18:29:38,853][15827] Environment doom_deathmatch_bots already registered, overwriting... +[2025-08-29 18:29:38,854][15827] Environment doom_duel already registered, overwriting... +[2025-08-29 18:29:38,855][15827] Environment doom_deathmatch_full already registered, overwriting... +[2025-08-29 18:29:38,858][15827] Environment doom_benchmark already registered, overwriting... +[2025-08-29 18:29:38,859][15827] register_encoder_factory: +[2025-08-29 18:29:38,967][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-29 18:29:39,044][15827] Experiment dir /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment already exists! +[2025-08-29 18:29:39,046][15827] Resuming existing experiment from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment... 
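A note on the checkpoint files named above: the two numbers in checkpoint_000000002_8192.pth appear to be the policy version and the corresponding environment-frame count. With batch_size=1024, num_batches_per_epoch=1 and env_frameskip=4 from the configuration dumped below, each policy version accounts for 1024 * 4 = 4096 env frames, which matches version 2 -> 8192 frames here and version 978 -> 4,005,888 frames for the checkpoint the learner tried to load. A tiny hypothetical helper (not Sample Factory's own code) that reproduces the naming under that assumption:

    # Hypothetical helper: rebuilds the checkpoint file names seen in this log,
    # assuming env_frames = policy_version * batch_size * env_frameskip.
    def checkpoint_name(version: int, batch_size: int = 1024, frameskip: int = 4) -> str:
        env_frames = version * batch_size * frameskip
        return f"checkpoint_{version:09d}_{env_frames}.pth"

    assert checkpoint_name(2) == "checkpoint_000000002_8192.pth"
    assert checkpoint_name(978) == "checkpoint_000000978_4005888.pth"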
+[2025-08-29 18:29:39,047][15827] Weights and Biases integration disabled +[2025-08-29 18:29:39,061][15827] Environment var CUDA_VISIBLE_DEVICES is 0 + +[2025-08-29 18:29:47,659][15827] Starting experiment with the following configuration: +help=False +algo=APPO +env=doom_health_gathering_supreme +experiment=default_experiment +train_dir=/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir +restart_behavior=resume +device=gpu +seed=None +num_policies=1 +async_rl=True +serial_mode=False +batched_sampling=False +num_batches_to_accumulate=2 +worker_num_splits=2 +policy_workers_per_policy=1 +max_policy_lag=1000 +num_workers=10 +num_envs_per_worker=4 +batch_size=1024 +num_batches_per_epoch=1 +num_epochs=1 +rollout=64 +recurrence=32 +shuffle_minibatches=False +gamma=0.99 +reward_scale=1.0 +reward_clip=1000.0 +value_bootstrap=False +normalize_returns=True +exploration_loss_coeff=0.001 +value_loss_coeff=0.5 +kl_loss_coeff=0.0 +exploration_loss=symmetric_kl +gae_lambda=0.95 +ppo_clip_ratio=0.1 +ppo_clip_value=0.2 +with_vtrace=False +vtrace_rho=1.0 +vtrace_c=1.0 +optimizer=adam +adam_eps=1e-06 +adam_beta1=0.9 +adam_beta2=0.999 +max_grad_norm=4.0 +learning_rate=0.0001 +lr_schedule=constant +lr_schedule_kl_threshold=0.008 +lr_adaptive_min=1e-06 +lr_adaptive_max=0.01 +obs_subtract_mean=0.0 +obs_scale=255.0 +normalize_input=True +normalize_input_keys=None +decorrelate_experience_max_seconds=0 +decorrelate_envs_on_one_worker=True +actor_worker_gpus=[] +set_workers_cpu_affinity=True +force_envs_single_thread=False +default_niceness=0 +log_to_file=True +experiment_summaries_interval=10 +flush_summaries_interval=30 +stats_avg=100 +summaries_use_frameskip=True +heartbeat_interval=20 +heartbeat_reporting_interval=600 +train_for_env_steps=20000000 +train_for_seconds=10000000000 +save_every_sec=120 +keep_checkpoints=2 +load_checkpoint_kind=latest +save_milestones_sec=-1 +save_best_every_sec=5 +save_best_metric=reward +save_best_after=100000 +benchmark=False +encoder_mlp_layers=[512, 512] +encoder_conv_architecture=convnet_simple +encoder_conv_mlp_layers=[512] +use_rnn=True +rnn_size=512 +rnn_type=gru +rnn_num_layers=1 +decoder_mlp_layers=[] +nonlinearity=elu +policy_initialization=orthogonal +policy_init_gain=1.0 +actor_critic_share_weights=True +adaptive_stddev=True +continuous_tanh_scale=0.0 +initial_stddev=1.0 +use_env_info_cache=False +env_gpu_actions=False +env_gpu_observations=True +env_frameskip=4 +env_framestack=1 +pixel_format=CHW +use_record_episode_statistics=False +with_wandb=False +wandb_user=None +wandb_project=sample_factory +wandb_group=None +wandb_job_type=SF +wandb_tags=[] +with_pbt=False +pbt_mix_policies_in_one_env=True +pbt_period_env_steps=5000000 +pbt_start_mutation=20000000 +pbt_replace_fraction=0.3 +pbt_mutation_rate=0.15 +pbt_replace_reward_gap=0.1 +pbt_replace_reward_gap_absolute=1e-06 +pbt_optimize_gamma=False +pbt_target_objective=true_objective +pbt_perturb_min=1.1 +pbt_perturb_max=1.5 +num_agents=-1 +num_humans=0 +num_bots=-1 +start_bot_difficulty=None +timelimit=None +res_w=128 +res_h=72 +wide_aspect_ratio=False +eval_env_frameskip=1 +fps=35 +command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000 +cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000} +git_hash=f8ed470f837e96d11b86d84cc03d9d0be1dc0042 +git_repo_name=git@github.com:huggingface/deep-rl-class.git +[2025-08-29 18:29:47,663][15827] Saving configuration to 
/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json... +[2025-08-29 18:29:47,785][15827] Rollout worker 0 uses device cpu +[2025-08-29 18:29:47,787][15827] Rollout worker 1 uses device cpu +[2025-08-29 18:29:47,788][15827] Rollout worker 2 uses device cpu +[2025-08-29 18:29:47,789][15827] Rollout worker 3 uses device cpu +[2025-08-29 18:29:47,790][15827] Rollout worker 4 uses device cpu +[2025-08-29 18:29:47,791][15827] Rollout worker 5 uses device cpu +[2025-08-29 18:29:47,793][15827] Rollout worker 6 uses device cpu +[2025-08-29 18:29:47,795][15827] Rollout worker 7 uses device cpu +[2025-08-29 18:29:47,796][15827] Rollout worker 8 uses device cpu +[2025-08-29 18:29:47,797][15827] Rollout worker 9 uses device cpu +[2025-08-29 18:29:47,898][15827] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:29:47,899][15827] InferenceWorker_p0-w0: min num requests: 3 +[2025-08-29 18:29:48,389][15827] Starting all processes... +[2025-08-29 18:29:48,390][15827] Starting process learner_proc0 +[2025-08-29 18:29:48,493][15827] Starting all processes... +[2025-08-29 18:29:48,508][15827] Starting process inference_proc0-0 +[2025-08-29 18:29:48,512][15827] Starting process rollout_proc0 +[2025-08-29 18:29:48,513][15827] Starting process rollout_proc1 +[2025-08-29 18:29:48,513][15827] Starting process rollout_proc2 +[2025-08-29 18:29:48,514][15827] Starting process rollout_proc3 +[2025-08-29 18:29:48,514][15827] Starting process rollout_proc4 +[2025-08-29 18:29:48,514][15827] Starting process rollout_proc5 +[2025-08-29 18:29:48,523][15827] Starting process rollout_proc6 +[2025-08-29 18:29:48,524][15827] Starting process rollout_proc7 +[2025-08-29 18:29:48,525][15827] Starting process rollout_proc8 +[2025-08-29 18:29:48,526][15827] Starting process rollout_proc9 +[2025-08-29 18:29:52,935][19395] Worker 3 uses CPU cores [3] +[2025-08-29 18:29:52,947][19394] Worker 0 uses CPU cores [0] +[2025-08-29 18:29:52,954][19378] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:29:52,955][19378] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0 +[2025-08-29 18:29:53,005][19398] Worker 4 uses CPU cores [4] +[2025-08-29 18:29:53,115][19396] Worker 2 uses CPU cores [2] +[2025-08-29 18:29:53,183][19378] Num visible devices: 1 +[2025-08-29 18:29:53,200][19378] Starting seed is not provided +[2025-08-29 18:29:53,200][19378] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:29:53,201][19378] Initializing actor-critic model on device cuda:0 +[2025-08-29 18:29:53,201][19378] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 18:29:53,205][19400] Worker 7 uses CPU cores [7] +[2025-08-29 18:29:53,209][19378] RunningMeanStd input shape: (1,) +[2025-08-29 18:29:53,227][19378] ConvEncoder: input_channels=3 +[2025-08-29 18:29:53,265][19403] Worker 9 uses CPU cores [9] +[2025-08-29 18:29:53,325][19401] Worker 8 uses CPU cores [8] +[2025-08-29 18:29:53,465][19397] Worker 1 uses CPU cores [1] +[2025-08-29 18:29:53,465][19402] Worker 5 uses CPU cores [5] +[2025-08-29 18:29:53,482][19399] Worker 6 uses CPU cores [6] +[2025-08-29 18:29:53,488][19378] Conv encoder output size: 512 +[2025-08-29 18:29:53,489][19378] Policy head output size: 512 +[2025-08-29 18:29:53,516][19378] Created Actor Critic model with architecture: +[2025-08-29 18:29:53,517][19378] ActorCriticSharedWeights( + (obs_normalizer): ObservationNormalizer( + (running_mean_std): RunningMeanStdDictInPlace( + 
(running_mean_std): ModuleDict( + (obs): RunningMeanStdInPlace() + ) + ) + ) + (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace) + (encoder): VizdoomEncoder( + (basic_encoder): ConvEncoder( + (enc): RecursiveScriptModule( + original_name=ConvEncoderImpl + (conv_head): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Conv2d) + (1): RecursiveScriptModule(original_name=ELU) + (2): RecursiveScriptModule(original_name=Conv2d) + (3): RecursiveScriptModule(original_name=ELU) + (4): RecursiveScriptModule(original_name=Conv2d) + (5): RecursiveScriptModule(original_name=ELU) + ) + (mlp_layers): RecursiveScriptModule( + original_name=Sequential + (0): RecursiveScriptModule(original_name=Linear) + (1): RecursiveScriptModule(original_name=ELU) + ) + ) + ) + ) + (core): ModelCoreRNN( + (core): GRU(512, 512) + ) + (decoder): MlpDecoder( + (mlp): Identity() + ) + (critic_linear): Linear(in_features=512, out_features=1, bias=True) + (action_parameterization): ActionParameterizationDefault( + (distribution_linear): Linear(in_features=512, out_features=5, bias=True) + ) +) +[2025-08-29 18:29:53,597][19393] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:29:53,597][19393] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0 +[2025-08-29 18:29:53,637][19393] Num visible devices: 1 +[2025-08-29 18:29:54,111][19378] Using optimizer +[2025-08-29 18:29:56,115][19378] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:29:56,120][19378] Could not load from checkpoint, attempt 0 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. 
+[2025-08-29 18:29:56,123][19378] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:29:56,124][19378] Could not load from checkpoint, attempt 1 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:29:56,124][19378] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth... +[2025-08-29 18:29:56,125][19378] Could not load from checkpoint, attempt 2 +Traceback (most recent call last): + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint + checkpoint_dict = torch.load(latest_checkpoint, map_location=device) + ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + File "/home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/venv-u82/lib/python3.12/site-packages/torch/serialization.py", line 1529, in load + raise pickle.UnpicklingError(_get_wo_message(str(e))) from None +_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint. + (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source. + (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message. + WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. 
Please use `torch.serialization.add_safe_globals([numpy.core.multiarray.scalar])` or the `torch.serialization.safe_globals([numpy.core.multiarray.scalar])` context manager to allowlist this global if you trust this class/function. + +Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html. +[2025-08-29 18:29:56,125][19378] Did not load from checkpoint, starting from scratch! +[2025-08-29 18:29:56,126][19378] Initialized policy 0 weights for model version 0 +[2025-08-29 18:29:56,134][19378] LearnerWorker_p0 finished initialization! +[2025-08-29 18:29:56,134][19378] Using GPUs [0] for process 0 (actually maps to GPUs [0]) +[2025-08-29 18:29:56,395][19393] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 18:29:56,397][19393] RunningMeanStd input shape: (1,) +[2025-08-29 18:29:56,407][19393] ConvEncoder: input_channels=3 +[2025-08-29 18:29:56,482][19393] Conv encoder output size: 512 +[2025-08-29 18:29:56,482][19393] Policy head output size: 512 +[2025-08-29 18:29:56,519][15827] Inference worker 0-0 is ready! +[2025-08-29 18:29:56,521][15827] All inference workers are ready! Signal rollout workers to start! +[2025-08-29 18:29:56,604][19402] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,614][19398] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,616][19403] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,620][19401] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,623][19394] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,630][19400] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,631][19396] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,640][19395] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,645][19397] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:56,665][19399] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:29:57,062][19398] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,062][19399] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,062][19395] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,078][19402] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,079][19397] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,079][19401] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,080][19403] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,080][19396] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,081][19400] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,284][19397] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,286][19395] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,299][19402] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,303][19403] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,303][19401] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,329][19399] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,340][19398] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,375][19394] Decorrelating experience for 0 frames... +[2025-08-29 18:29:57,510][19400] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,577][19396] Decorrelating experience for 64 frames... 
+[2025-08-29 18:29:57,580][19395] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,603][19397] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,603][19402] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,629][19403] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,752][19400] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,788][19394] Decorrelating experience for 64 frames... +[2025-08-29 18:29:57,790][19399] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,839][19397] Decorrelating experience for 192 frames... +[2025-08-29 18:29:57,848][19396] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,851][19398] Decorrelating experience for 128 frames... +[2025-08-29 18:29:57,898][19403] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,004][19401] Decorrelating experience for 128 frames... +[2025-08-29 18:29:58,047][19395] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,181][19394] Decorrelating experience for 128 frames... +[2025-08-29 18:29:58,257][19400] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,260][19399] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,299][19398] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,353][19396] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,369][19401] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,445][19394] Decorrelating experience for 192 frames... +[2025-08-29 18:29:58,478][19402] Decorrelating experience for 192 frames... +[2025-08-29 18:29:59,061][15827] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:30:00,642][19378] Signal inference workers to stop experience collection... +[2025-08-29 18:30:00,652][19393] InferenceWorker_p0-w0: stopping experience collection +[2025-08-29 18:30:04,069][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 639.5. Samples: 3202. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:30:04,118][15827] Avg episode reward: [(0, '1.964')] +[2025-08-29 18:30:07,895][15827] Heartbeat connected on Batcher_0 +[2025-08-29 18:30:07,909][15827] Heartbeat connected on InferenceWorker_p0-w0 +[2025-08-29 18:30:07,909][15827] Heartbeat connected on RolloutWorker_w0 +[2025-08-29 18:30:07,910][15827] Heartbeat connected on RolloutWorker_w1 +[2025-08-29 18:30:07,911][15827] Heartbeat connected on RolloutWorker_w2 +[2025-08-29 18:30:07,913][15827] Heartbeat connected on RolloutWorker_w3 +[2025-08-29 18:30:07,954][15827] Heartbeat connected on RolloutWorker_w4 +[2025-08-29 18:30:08,038][15827] Heartbeat connected on RolloutWorker_w5 +[2025-08-29 18:30:08,189][15827] Heartbeat connected on RolloutWorker_w6 +[2025-08-29 18:30:08,262][15827] Heartbeat connected on RolloutWorker_w7 +[2025-08-29 18:30:08,362][15827] Heartbeat connected on RolloutWorker_w8 +[2025-08-29 18:30:08,400][15827] Heartbeat connected on RolloutWorker_w9 +[2025-08-29 18:30:09,068][15827] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 320.0. Samples: 3202. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0) +[2025-08-29 18:30:09,116][15827] Avg episode reward: [(0, '1.964')] +[2025-08-29 18:30:13,517][19378] Signal inference workers to resume experience collection... 
+[2025-08-29 18:30:13,525][19393] InferenceWorker_p0-w0: resuming experience collection +[2025-08-29 18:30:14,061][15827] Fps is (10 sec: 409.9, 60 sec: 273.1, 300 sec: 273.1). Total num frames: 4096. Throughput: 0: 213.5. Samples: 3202. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0) +[2025-08-29 18:30:14,064][15827] Avg episode reward: [(0, '2.085')] +[2025-08-29 18:30:14,168][15827] Heartbeat connected on LearnerWorker_p0 +[2025-08-29 18:30:16,484][19393] Updated weights for policy 0, policy_version 10 (0.0336) +[2025-08-29 18:30:20,599][15827] Fps is (10 sec: 4261.8, 60 sec: 2282.0, 300 sec: 2282.0). Total num frames: 49152. Throughput: 0: 231.7. Samples: 4990. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:30:20,601][15827] Avg episode reward: [(0, '4.441')] +[2025-08-29 18:30:23,007][19393] Updated weights for policy 0, policy_version 20 (0.0013) +[2025-08-29 18:30:24,061][15827] Fps is (10 sec: 9420.7, 60 sec: 3932.2, 300 sec: 3932.2). Total num frames: 98304. Throughput: 0: 951.9. Samples: 23798. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:30:24,062][15827] Avg episode reward: [(0, '4.251')] +[2025-08-29 18:30:25,756][19393] Updated weights for policy 0, policy_version 30 (0.0010) +[2025-08-29 18:30:28,414][19393] Updated weights for policy 0, policy_version 40 (0.0014) +[2025-08-29 18:30:29,060][15827] Fps is (10 sec: 14523.2, 60 sec: 5734.4, 300 sec: 5734.4). Total num frames: 172032. Throughput: 0: 1184.1. Samples: 35522. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:30:29,062][15827] Avg episode reward: [(0, '4.330')] +[2025-08-29 18:30:29,068][19378] Saving new best policy, reward=4.330! +[2025-08-29 18:30:30,993][19393] Updated weights for policy 0, policy_version 50 (0.0011) +[2025-08-29 18:30:33,666][19393] Updated weights for policy 0, policy_version 60 (0.0013) +[2025-08-29 18:30:34,060][15827] Fps is (10 sec: 15155.3, 60 sec: 7138.8, 300 sec: 7138.8). Total num frames: 249856. Throughput: 0: 1674.1. Samples: 58592. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:30:34,062][15827] Avg episode reward: [(0, '4.487')] +[2025-08-29 18:30:34,064][19378] Saving new best policy, reward=4.487! +[2025-08-29 18:30:36,408][19393] Updated weights for policy 0, policy_version 70 (0.0015) +[2025-08-29 18:30:38,988][19393] Updated weights for policy 0, policy_version 80 (0.0013) +[2025-08-29 18:30:39,060][15827] Fps is (10 sec: 15564.6, 60 sec: 8192.0, 300 sec: 8192.0). Total num frames: 327680. Throughput: 0: 2036.5. Samples: 81458. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:30:39,063][15827] Avg episode reward: [(0, '4.568')] +[2025-08-29 18:30:39,069][19378] Saving new best policy, reward=4.568! +[2025-08-29 18:30:41,577][19393] Updated weights for policy 0, policy_version 90 (0.0015) +[2025-08-29 18:30:44,061][15827] Fps is (10 sec: 15564.7, 60 sec: 9011.2, 300 sec: 9011.2). Total num frames: 405504. Throughput: 0: 2070.2. Samples: 93158. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:30:44,062][15827] Avg episode reward: [(0, '4.371')] +[2025-08-29 18:30:44,163][19393] Updated weights for policy 0, policy_version 100 (0.0011) +[2025-08-29 18:30:46,845][19393] Updated weights for policy 0, policy_version 110 (0.0014) +[2025-08-29 18:30:49,060][15827] Fps is (10 sec: 15565.0, 60 sec: 9666.6, 300 sec: 9666.6). Total num frames: 483328. Throughput: 0: 2518.9. Samples: 116534. 
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:30:49,061][15827] Avg episode reward: [(0, '4.453')] +[2025-08-29 18:30:49,470][19393] Updated weights for policy 0, policy_version 120 (0.0013) +[2025-08-29 18:30:52,112][19393] Updated weights for policy 0, policy_version 130 (0.0013) +[2025-08-29 18:30:56,427][15827] Fps is (10 sec: 10930.4, 60 sec: 9424.9, 300 sec: 9424.9). Total num frames: 540672. Throughput: 0: 2639.5. Samples: 128212. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:30:56,431][15827] Avg episode reward: [(0, '4.376')] +[2025-08-29 18:30:58,325][19393] Updated weights for policy 0, policy_version 140 (0.0013) +[2025-08-29 18:30:59,060][15827] Fps is (10 sec: 9830.4, 60 sec: 9693.9, 300 sec: 9693.9). Total num frames: 581632. Throughput: 0: 2988.0. Samples: 137660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:30:59,061][15827] Avg episode reward: [(0, '4.393')] +[2025-08-29 18:31:00,969][19393] Updated weights for policy 0, policy_version 150 (0.0014) +[2025-08-29 18:31:03,665][19393] Updated weights for policy 0, policy_version 160 (0.0012) +[2025-08-29 18:31:04,060][15827] Fps is (10 sec: 15023.9, 60 sec: 10923.9, 300 sec: 10082.5). Total num frames: 655360. Throughput: 0: 3571.2. Samples: 160200. Policy #0 lag: (min: 0.0, avg: 1.3, max: 4.0) +[2025-08-29 18:31:04,062][15827] Avg episode reward: [(0, '4.436')] +[2025-08-29 18:31:06,363][19393] Updated weights for policy 0, policy_version 170 (0.0010) +[2025-08-29 18:31:09,061][15827] Fps is (10 sec: 15154.9, 60 sec: 12220.9, 300 sec: 10474.1). Total num frames: 733184. Throughput: 0: 3544.3. Samples: 183292. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:31:09,062][15827] Avg episode reward: [(0, '4.287')] +[2025-08-29 18:31:09,116][19393] Updated weights for policy 0, policy_version 180 (0.0013) +[2025-08-29 18:31:11,767][19393] Updated weights for policy 0, policy_version 190 (0.0011) +[2025-08-29 18:31:14,060][15827] Fps is (10 sec: 15974.6, 60 sec: 13516.8, 300 sec: 10868.1). Total num frames: 815104. Throughput: 0: 3542.7. Samples: 194942. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:31:14,061][15827] Avg episode reward: [(0, '4.362')] +[2025-08-29 18:31:14,230][19393] Updated weights for policy 0, policy_version 200 (0.0014) +[2025-08-29 18:31:16,752][19393] Updated weights for policy 0, policy_version 210 (0.0012) +[2025-08-29 18:31:19,061][15827] Fps is (10 sec: 15155.3, 60 sec: 14293.0, 300 sec: 11059.2). Total num frames: 884736. Throughput: 0: 3553.5. Samples: 218498. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:31:19,062][15827] Avg episode reward: [(0, '4.387')] +[2025-08-29 18:31:20,251][19393] Updated weights for policy 0, policy_version 220 (0.0014) +[2025-08-29 18:31:23,187][19393] Updated weights for policy 0, policy_version 230 (0.0013) +[2025-08-29 18:31:24,061][15827] Fps is (10 sec: 13106.7, 60 sec: 14131.2, 300 sec: 11131.5). Total num frames: 946176. Throughput: 0: 3471.4. Samples: 237672. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:31:24,062][15827] Avg episode reward: [(0, '4.617')] +[2025-08-29 18:31:24,068][19378] Saving new best policy, reward=4.617! +[2025-08-29 18:31:26,406][19393] Updated weights for policy 0, policy_version 240 (0.0017) +[2025-08-29 18:31:32,254][15827] Fps is (10 sec: 9623.9, 60 sec: 13287.3, 300 sec: 10856.0). Total num frames: 1011712. Throughput: 0: 3204.8. Samples: 247608. 
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:31:32,255][15827] Avg episode reward: [(0, '4.342')] +[2025-08-29 18:31:33,147][19393] Updated weights for policy 0, policy_version 250 (0.0017) +[2025-08-29 18:31:34,060][15827] Fps is (10 sec: 9011.6, 60 sec: 13107.2, 300 sec: 10908.3). Total num frames: 1036288. Throughput: 0: 3087.3. Samples: 255462. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:31:34,062][15827] Avg episode reward: [(0, '4.375')] +[2025-08-29 18:31:36,778][19393] Updated weights for policy 0, policy_version 260 (0.0018) +[2025-08-29 18:31:39,061][15827] Fps is (10 sec: 12035.4, 60 sec: 12765.8, 300 sec: 10936.3). Total num frames: 1093632. Throughput: 0: 3397.5. Samples: 273060. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:31:39,063][15827] Avg episode reward: [(0, '4.311')] +[2025-08-29 18:31:39,069][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000267_1093632.pth... +[2025-08-29 18:31:39,146][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000267_1093632.pth +[2025-08-29 18:31:40,018][19393] Updated weights for policy 0, policy_version 270 (0.0015) +[2025-08-29 18:31:43,012][19393] Updated weights for policy 0, policy_version 280 (0.0016) +[2025-08-29 18:31:44,060][15827] Fps is (10 sec: 12287.9, 60 sec: 12561.1, 300 sec: 11039.7). Total num frames: 1159168. Throughput: 0: 3238.6. Samples: 283396. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:31:44,062][15827] Avg episode reward: [(0, '4.458')] +[2025-08-29 18:31:46,227][19393] Updated weights for policy 0, policy_version 290 (0.0015) +[2025-08-29 18:31:49,061][15827] Fps is (10 sec: 12697.4, 60 sec: 12287.9, 300 sec: 11096.4). Total num frames: 1220608. Throughput: 0: 3169.8. Samples: 302844. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:31:49,064][15827] Avg episode reward: [(0, '4.347')] +[2025-08-29 18:31:49,460][19393] Updated weights for policy 0, policy_version 300 (0.0015) +[2025-08-29 18:31:53,455][19393] Updated weights for policy 0, policy_version 310 (0.0025) +[2025-08-29 18:31:54,061][15827] Fps is (10 sec: 11468.7, 60 sec: 12721.4, 300 sec: 11077.0). Total num frames: 1273856. Throughput: 0: 3018.1. Samples: 319106. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:31:54,062][15827] Avg episode reward: [(0, '4.509')] +[2025-08-29 18:31:56,849][19393] Updated weights for policy 0, policy_version 320 (0.0019) +[2025-08-29 18:31:59,060][15827] Fps is (10 sec: 11060.0, 60 sec: 12492.8, 300 sec: 11093.4). Total num frames: 1331200. Throughput: 0: 2958.5. Samples: 328074. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:31:59,061][15827] Avg episode reward: [(0, '4.292')] +[2025-08-29 18:32:00,408][19393] Updated weights for policy 0, policy_version 330 (0.0023) +[2025-08-29 18:32:03,376][19393] Updated weights for policy 0, policy_version 340 (0.0019) +[2025-08-29 18:32:04,060][15827] Fps is (10 sec: 12288.2, 60 sec: 12356.3, 300 sec: 11173.9). Total num frames: 1396736. Throughput: 0: 2839.9. Samples: 346294. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:32:04,062][15827] Avg episode reward: [(0, '4.476')] +[2025-08-29 18:32:09,060][15827] Fps is (10 sec: 8601.6, 60 sec: 11400.6, 300 sec: 10901.7). Total num frames: 1417216. Throughput: 0: 2584.2. Samples: 353962. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:32:09,062][15827] Avg episode reward: [(0, '4.677')] +[2025-08-29 18:32:09,068][19378] Saving new best policy, reward=4.677! +[2025-08-29 18:32:10,004][19393] Updated weights for policy 0, policy_version 350 (0.0017) +[2025-08-29 18:32:13,012][19393] Updated weights for policy 0, policy_version 360 (0.0011) +[2025-08-29 18:32:14,060][15827] Fps is (10 sec: 9011.1, 60 sec: 11195.7, 300 sec: 11013.7). Total num frames: 1486848. Throughput: 0: 2799.5. Samples: 364646. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:32:14,062][15827] Avg episode reward: [(0, '4.525')] +[2025-08-29 18:32:16,067][19393] Updated weights for policy 0, policy_version 370 (0.0015) +[2025-08-29 18:32:19,060][15827] Fps is (10 sec: 13107.1, 60 sec: 11059.2, 300 sec: 11059.2). Total num frames: 1548288. Throughput: 0: 2869.6. Samples: 384592. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:32:19,062][15827] Avg episode reward: [(0, '4.449')] +[2025-08-29 18:32:19,483][19393] Updated weights for policy 0, policy_version 380 (0.0015) +[2025-08-29 18:32:22,801][19393] Updated weights for policy 0, policy_version 390 (0.0013) +[2025-08-29 18:32:24,060][15827] Fps is (10 sec: 12697.8, 60 sec: 11127.6, 300 sec: 11129.8). Total num frames: 1613824. Throughput: 0: 2888.8. Samples: 403056. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:32:24,062][15827] Avg episode reward: [(0, '4.439')] +[2025-08-29 18:32:25,752][19393] Updated weights for policy 0, policy_version 400 (0.0013) +[2025-08-29 18:32:28,603][19393] Updated weights for policy 0, policy_version 410 (0.0014) +[2025-08-29 18:32:29,060][15827] Fps is (10 sec: 13926.4, 60 sec: 11897.3, 300 sec: 11250.4). Total num frames: 1687552. Throughput: 0: 2904.0. Samples: 414078. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:32:29,062][15827] Avg episode reward: [(0, '4.499')] +[2025-08-29 18:32:31,246][19393] Updated weights for policy 0, policy_version 420 (0.0015) +[2025-08-29 18:32:34,029][19393] Updated weights for policy 0, policy_version 430 (0.0012) +[2025-08-29 18:32:34,061][15827] Fps is (10 sec: 14745.2, 60 sec: 12083.2, 300 sec: 11363.1). Total num frames: 1761280. Throughput: 0: 2970.0. Samples: 436492. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:32:34,062][15827] Avg episode reward: [(0, '4.446')] +[2025-08-29 18:32:36,789][19393] Updated weights for policy 0, policy_version 440 (0.0012) +[2025-08-29 18:32:39,060][15827] Fps is (10 sec: 14336.0, 60 sec: 12288.1, 300 sec: 11443.2). Total num frames: 1830912. Throughput: 0: 3096.1. Samples: 458432. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:32:39,062][15827] Avg episode reward: [(0, '4.408')] +[2025-08-29 18:32:39,566][19393] Updated weights for policy 0, policy_version 450 (0.0011) +[2025-08-29 18:32:44,061][15827] Fps is (10 sec: 9420.8, 60 sec: 11605.3, 300 sec: 11245.4). Total num frames: 1855488. Throughput: 0: 3051.8. Samples: 465406. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:32:44,062][15827] Avg episode reward: [(0, '4.473')] +[2025-08-29 18:32:45,723][19393] Updated weights for policy 0, policy_version 460 (0.0014) +[2025-08-29 18:32:48,648][19393] Updated weights for policy 0, policy_version 470 (0.0013) +[2025-08-29 18:32:49,061][15827] Fps is (10 sec: 9830.3, 60 sec: 11810.2, 300 sec: 11348.3). Total num frames: 1929216. Throughput: 0: 2949.0. Samples: 478998.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:32:49,062][15827] Avg episode reward: [(0, '4.338')] +[2025-08-29 18:32:51,478][19393] Updated weights for policy 0, policy_version 480 (0.0011) +[2025-08-29 18:32:54,060][15827] Fps is (10 sec: 14745.9, 60 sec: 12151.5, 300 sec: 11445.4). Total num frames: 2002944. Throughput: 0: 3265.2. Samples: 500894. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:32:54,061][15827] Avg episode reward: [(0, '4.232')] +[2025-08-29 18:32:54,187][19393] Updated weights for policy 0, policy_version 490 (0.0012) +[2025-08-29 18:32:56,797][19393] Updated weights for policy 0, policy_version 500 (0.0012) +[2025-08-29 18:32:59,060][15827] Fps is (10 sec: 14745.7, 60 sec: 12424.5, 300 sec: 11537.1). Total num frames: 2076672. Throughput: 0: 3284.2. Samples: 512436. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:32:59,062][15827] Avg episode reward: [(0, '4.215')] +[2025-08-29 18:32:59,834][19393] Updated weights for policy 0, policy_version 510 (0.0016) +[2025-08-29 18:33:02,533][19393] Updated weights for policy 0, policy_version 520 (0.0013) +[2025-08-29 18:33:04,060][15827] Fps is (10 sec: 14745.3, 60 sec: 12561.0, 300 sec: 11623.8). Total num frames: 2150400. Throughput: 0: 3324.4. Samples: 534192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:33:04,062][15827] Avg episode reward: [(0, '4.277')] +[2025-08-29 18:33:05,157][19393] Updated weights for policy 0, policy_version 530 (0.0012) +[2025-08-29 18:33:07,828][19393] Updated weights for policy 0, policy_version 540 (0.0013) +[2025-08-29 18:33:09,061][15827] Fps is (10 sec: 15155.0, 60 sec: 13516.8, 300 sec: 11727.5). Total num frames: 2228224. Throughput: 0: 3421.3. Samples: 557014. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:33:09,062][15827] Avg episode reward: [(0, '4.432')] +[2025-08-29 18:33:10,641][19393] Updated weights for policy 0, policy_version 550 (0.0014) +[2025-08-29 18:33:13,388][19393] Updated weights for policy 0, policy_version 560 (0.0013) +[2025-08-29 18:33:14,061][15827] Fps is (10 sec: 15155.3, 60 sec: 13585.1, 300 sec: 11804.9). Total num frames: 2301952. Throughput: 0: 3416.9. Samples: 567838. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:33:14,062][15827] Avg episode reward: [(0, '4.490')] +[2025-08-29 18:33:15,964][19393] Updated weights for policy 0, policy_version 570 (0.0012) +[2025-08-29 18:33:19,754][15827] Fps is (10 sec: 9958.6, 60 sec: 12957.3, 300 sec: 11633.2). Total num frames: 2334720. Throughput: 0: 3121.1. Samples: 579108. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:33:19,756][15827] Avg episode reward: [(0, '4.451')] +[2025-08-29 18:33:22,521][19393] Updated weights for policy 0, policy_version 580 (0.0013) +[2025-08-29 18:33:24,060][15827] Fps is (10 sec: 9420.8, 60 sec: 13038.9, 300 sec: 11688.6). Total num frames: 2396160. Throughput: 0: 3122.8. Samples: 598960. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:33:24,062][15827] Avg episode reward: [(0, '4.347')] +[2025-08-29 18:33:25,366][19393] Updated weights for policy 0, policy_version 590 (0.0015) +[2025-08-29 18:33:28,319][19393] Updated weights for policy 0, policy_version 600 (0.0011) +[2025-08-29 18:33:29,060][15827] Fps is (10 sec: 14084.8, 60 sec: 12970.7, 300 sec: 11741.9). Total num frames: 2465792. Throughput: 0: 3208.0. Samples: 609766. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:33:29,062][15827] Avg episode reward: [(0, '4.327')]
+[2025-08-29 18:33:31,220][19393] Updated weights for policy 0, policy_version 610 (0.0013)
+[2025-08-29 18:33:34,061][15827] Fps is (10 sec: 13926.2, 60 sec: 12902.4, 300 sec: 11792.7). Total num frames: 2535424. Throughput: 0: 3376.4. Samples: 630938. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
+[2025-08-29 18:33:34,062][15827] Avg episode reward: [(0, '4.548')]
+[2025-08-29 18:33:34,225][19393] Updated weights for policy 0, policy_version 620 (0.0015)
+[2025-08-29 18:33:37,207][19393] Updated weights for policy 0, policy_version 630 (0.0010)
+[2025-08-29 18:33:39,061][15827] Fps is (10 sec: 13926.2, 60 sec: 12902.4, 300 sec: 11841.2). Total num frames: 2605056. Throughput: 0: 3346.4. Samples: 651482. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:33:39,062][15827] Avg episode reward: [(0, '4.336')]
+[2025-08-29 18:33:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000636_2605056.pth...
+[2025-08-29 18:33:39,158][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000636_2605056.pth
+[2025-08-29 18:33:40,079][19393] Updated weights for policy 0, policy_version 640 (0.0011)
+[2025-08-29 18:33:42,924][19393] Updated weights for policy 0, policy_version 650 (0.0015)
+[2025-08-29 18:33:44,060][15827] Fps is (10 sec: 14336.4, 60 sec: 13721.7, 300 sec: 11905.7). Total num frames: 2678784. Throughput: 0: 3324.5. Samples: 662040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:33:44,062][15827] Avg episode reward: [(0, '4.435')]
+[2025-08-29 18:33:45,827][19393] Updated weights for policy 0, policy_version 660 (0.0012)
+[2025-08-29 18:33:48,557][19393] Updated weights for policy 0, policy_version 670 (0.0012)
+[2025-08-29 18:33:49,061][15827] Fps is (10 sec: 14745.6, 60 sec: 13721.6, 300 sec: 11967.4). Total num frames: 2752512. Throughput: 0: 3326.3. Samples: 683876. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:33:49,062][15827] Avg episode reward: [(0, '4.448')]
+[2025-08-29 18:33:51,476][19393] Updated weights for policy 0, policy_version 680 (0.0013)
+[2025-08-29 18:33:55,593][15827] Fps is (10 sec: 9944.4, 60 sec: 12847.2, 300 sec: 11810.1). Total num frames: 2793472. Throughput: 0: 2955.4. Samples: 694536. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:33:55,595][15827] Avg episode reward: [(0, '4.359')]
+[2025-08-29 18:33:57,929][19393] Updated weights for policy 0, policy_version 690 (0.0013)
+[2025-08-29 18:33:59,061][15827] Fps is (10 sec: 9011.2, 60 sec: 12765.8, 300 sec: 11844.3). Total num frames: 2842624. Throughput: 0: 3012.2. Samples: 703388. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2025-08-29 18:33:59,062][15827] Avg episode reward: [(0, '4.307')]
+[2025-08-29 18:34:00,938][19393] Updated weights for policy 0, policy_version 700 (0.0018)
+[2025-08-29 18:34:03,902][19393] Updated weights for policy 0, policy_version 710 (0.0014)
+[2025-08-29 18:34:04,061][15827] Fps is (10 sec: 13544.8, 60 sec: 12629.3, 300 sec: 11870.0). Total num frames: 2908160. Throughput: 0: 3269.8. Samples: 723980.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:34:04,062][15827] Avg episode reward: [(0, '4.493')]
+[2025-08-29 18:34:06,430][19393] Updated weights for policy 0, policy_version 720 (0.0014)
+[2025-08-29 18:34:09,061][15827] Fps is (10 sec: 14335.4, 60 sec: 12629.2, 300 sec: 11943.9). Total num frames: 2985984. Throughput: 0: 3286.6. Samples: 746858. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:34:09,063][15827] Avg episode reward: [(0, '4.283')]
+[2025-08-29 18:34:09,111][19393] Updated weights for policy 0, policy_version 730 (0.0014)
+[2025-08-29 18:34:11,959][19393] Updated weights for policy 0, policy_version 740 (0.0013)
+[2025-08-29 18:34:14,060][15827] Fps is (10 sec: 15155.5, 60 sec: 12629.4, 300 sec: 11998.9). Total num frames: 3059712. Throughput: 0: 3284.6. Samples: 757574. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:34:14,061][15827] Avg episode reward: [(0, '4.420')]
+[2025-08-29 18:34:14,928][19393] Updated weights for policy 0, policy_version 750 (0.0018)
+[2025-08-29 18:34:17,832][19393] Updated weights for policy 0, policy_version 760 (0.0011)
+[2025-08-29 18:34:19,060][15827] Fps is (10 sec: 14336.8, 60 sec: 13398.7, 300 sec: 12035.9). Total num frames: 3129344. Throughput: 0: 3283.7. Samples: 778706. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:34:19,062][15827] Avg episode reward: [(0, '4.333')]
+[2025-08-29 18:34:20,738][19393] Updated weights for policy 0, policy_version 770 (0.0014)
+[2025-08-29 18:34:23,466][19393] Updated weights for policy 0, policy_version 780 (0.0011)
+[2025-08-29 18:34:24,060][15827] Fps is (10 sec: 13926.3, 60 sec: 13380.3, 300 sec: 12071.6). Total num frames: 3198976. Throughput: 0: 3303.5. Samples: 800140. Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0)
+[2025-08-29 18:34:24,063][15827] Avg episode reward: [(0, '4.588')]
+[2025-08-29 18:34:26,562][19393] Updated weights for policy 0, policy_version 790 (0.0015)
+[2025-08-29 18:34:31,421][15827] Fps is (10 sec: 10272.9, 60 sec: 12676.8, 300 sec: 11955.9). Total num frames: 3256320. Throughput: 0: 3131.4. Samples: 810346. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:34:31,422][15827] Avg episode reward: [(0, '4.347')]
+[2025-08-29 18:34:32,725][19393] Updated weights for policy 0, policy_version 800 (0.0011)
+[2025-08-29 18:34:34,060][15827] Fps is (10 sec: 9830.4, 60 sec: 12697.6, 300 sec: 11990.1). Total num frames: 3297280. Throughput: 0: 3020.0. Samples: 819778. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:34:34,062][15827] Avg episode reward: [(0, '4.659')]
+[2025-08-29 18:34:35,258][19393] Updated weights for policy 0, policy_version 810 (0.0012)
+[2025-08-29 18:34:38,139][19393] Updated weights for policy 0, policy_version 820 (0.0012)
+[2025-08-29 18:34:39,060][15827] Fps is (10 sec: 15012.0, 60 sec: 12765.9, 300 sec: 12039.3). Total num frames: 3371008. Throughput: 0: 3399.0. Samples: 842282. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:34:39,062][15827] Avg episode reward: [(0, '4.296')]
+[2025-08-29 18:34:40,885][19393] Updated weights for policy 0, policy_version 830 (0.0011)
+[2025-08-29 18:34:44,061][15827] Fps is (10 sec: 13926.4, 60 sec: 12629.3, 300 sec: 12058.1). Total num frames: 3436544. Throughput: 0: 3338.2. Samples: 853606.
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:34:44,063][15827] Avg episode reward: [(0, '4.363')]
+[2025-08-29 18:34:44,105][19393] Updated weights for policy 0, policy_version 840 (0.0017)
+[2025-08-29 18:34:47,384][19393] Updated weights for policy 0, policy_version 850 (0.0018)
+[2025-08-29 18:34:49,060][15827] Fps is (10 sec: 13107.3, 60 sec: 12492.8, 300 sec: 12076.1). Total num frames: 3502080. Throughput: 0: 3291.1. Samples: 872080. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:34:49,061][15827] Avg episode reward: [(0, '4.441')]
+[2025-08-29 18:34:50,069][19393] Updated weights for policy 0, policy_version 860 (0.0013)
+[2025-08-29 18:34:52,987][19393] Updated weights for policy 0, policy_version 870 (0.0012)
+[2025-08-29 18:34:54,060][15827] Fps is (10 sec: 13926.6, 60 sec: 13380.8, 300 sec: 12121.4). Total num frames: 3575808. Throughput: 0: 3274.7. Samples: 894218. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:34:54,062][15827] Avg episode reward: [(0, '4.484')]
+[2025-08-29 18:34:55,909][19393] Updated weights for policy 0, policy_version 880 (0.0014)
+[2025-08-29 18:34:58,925][19393] Updated weights for policy 0, policy_version 890 (0.0013)
+[2025-08-29 18:34:59,061][15827] Fps is (10 sec: 14335.5, 60 sec: 13380.2, 300 sec: 12357.7). Total num frames: 3645440. Throughput: 0: 3270.4. Samples: 904744. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:34:59,062][15827] Avg episode reward: [(0, '4.500')]
+[2025-08-29 18:35:01,812][19393] Updated weights for policy 0, policy_version 900 (0.0014)
+[2025-08-29 18:35:07,255][15827] Fps is (10 sec: 10244.2, 60 sec: 12703.9, 300 sec: 12445.1). Total num frames: 3710976. Throughput: 0: 3043.5. Samples: 925384. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:35:07,256][15827] Avg episode reward: [(0, '4.466')]
+[2025-08-29 18:35:08,389][19393] Updated weights for policy 0, policy_version 910 (0.0012)
+[2025-08-29 18:35:09,060][15827] Fps is (10 sec: 9011.4, 60 sec: 12492.9, 300 sec: 12649.0). Total num frames: 3735552. Throughput: 0: 2973.4. Samples: 933944. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:35:09,062][15827] Avg episode reward: [(0, '4.405')]
+[2025-08-29 18:35:11,079][19393] Updated weights for policy 0, policy_version 920 (0.0012)
+[2025-08-29 18:35:13,889][19393] Updated weights for policy 0, policy_version 930 (0.0013)
+[2025-08-29 18:35:14,060][15827] Fps is (10 sec: 14444.8, 60 sec: 12492.8, 300 sec: 12813.0). Total num frames: 3809280. Throughput: 0: 3157.9. Samples: 944996. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:35:14,061][15827] Avg episode reward: [(0, '4.231')]
+[2025-08-29 18:35:16,866][19393] Updated weights for policy 0, policy_version 940 (0.0020)
+[2025-08-29 18:35:19,061][15827] Fps is (10 sec: 14335.6, 60 sec: 12492.7, 300 sec: 12815.6). Total num frames: 3878912. Throughput: 0: 3246.3. Samples: 965862. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:35:19,063][15827] Avg episode reward: [(0, '4.572')]
+[2025-08-29 18:35:20,011][19393] Updated weights for policy 0, policy_version 950 (0.0020)
+[2025-08-29 18:35:23,158][19393] Updated weights for policy 0, policy_version 960 (0.0015)
+[2025-08-29 18:35:24,060][15827] Fps is (10 sec: 13516.7, 60 sec: 12424.5, 300 sec: 12787.8). Total num frames: 3944448. Throughput: 0: 3191.1. Samples: 985882.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:35:24,062][15827] Avg episode reward: [(0, '4.458')]
+[2025-08-29 18:35:26,073][19393] Updated weights for policy 0, policy_version 970 (0.0013)
+[2025-08-29 18:35:29,061][15827] Fps is (10 sec: 13107.2, 60 sec: 13075.4, 300 sec: 12746.2). Total num frames: 4009984. Throughput: 0: 3165.2. Samples: 996040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:35:29,062][15827] Avg episode reward: [(0, '4.519')]
+[2025-08-29 18:35:29,270][19393] Updated weights for policy 0, policy_version 980 (0.0013)
+[2025-08-29 18:35:32,252][19393] Updated weights for policy 0, policy_version 990 (0.0015)
+[2025-08-29 18:35:34,060][15827] Fps is (10 sec: 13516.9, 60 sec: 13039.0, 300 sec: 12718.4). Total num frames: 4079616. Throughput: 0: 3202.5. Samples: 1016192. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:35:34,062][15827] Avg episode reward: [(0, '4.477')]
+[2025-08-29 18:35:35,282][19393] Updated weights for policy 0, policy_version 1000 (0.0013)
+[2025-08-29 18:35:38,308][19393] Updated weights for policy 0, policy_version 1010 (0.0016)
+[2025-08-29 18:35:39,060][15827] Fps is (10 sec: 13517.3, 60 sec: 12902.4, 300 sec: 12676.8). Total num frames: 4145152. Throughput: 0: 3161.4. Samples: 1036482. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:35:39,062][15827] Avg episode reward: [(0, '4.567')]
+[2025-08-29 18:35:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001012_4145152.pth...
+[2025-08-29 18:35:39,156][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000941_3854336.pth
+[2025-08-29 18:35:44,060][15827] Fps is (10 sec: 8601.4, 60 sec: 12151.5, 300 sec: 12482.4). Total num frames: 4165632. Throughput: 0: 3001.5. Samples: 1039810. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:35:44,061][15827] Avg episode reward: [(0, '4.398')]
+[2025-08-29 18:35:44,933][19393] Updated weights for policy 0, policy_version 1020 (0.0012)
+[2025-08-29 18:35:48,016][19393] Updated weights for policy 0, policy_version 1030 (0.0017)
+[2025-08-29 18:35:49,060][15827] Fps is (10 sec: 9011.1, 60 sec: 12219.7, 300 sec: 12625.3). Total num frames: 4235264. Throughput: 0: 3101.0. Samples: 1055024. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:35:49,062][15827] Avg episode reward: [(0, '4.314')]
+[2025-08-29 18:35:51,204][19393] Updated weights for policy 0, policy_version 1040 (0.0019)
+[2025-08-29 18:35:54,061][15827] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 12593.5). Total num frames: 4296704. Throughput: 0: 3134.6. Samples: 1075000. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:35:54,062][15827] Avg episode reward: [(0, '4.534')]
+[2025-08-29 18:35:54,120][19393] Updated weights for policy 0, policy_version 1050 (0.0015)
+[2025-08-29 18:35:56,849][19393] Updated weights for policy 0, policy_version 1060 (0.0011)
+[2025-08-29 18:35:59,060][15827] Fps is (10 sec: 13516.9, 60 sec: 12083.3, 300 sec: 12593.5). Total num frames: 4370432. Throughput: 0: 3128.8. Samples: 1085794.
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:35:59,062][15827] Avg episode reward: [(0, '4.444')]
+[2025-08-29 18:35:59,762][19393] Updated weights for policy 0, policy_version 1070 (0.0012)
+[2025-08-29 18:36:02,615][19393] Updated weights for policy 0, policy_version 1080 (0.0014)
+[2025-08-29 18:36:04,060][15827] Fps is (10 sec: 14336.2, 60 sec: 12834.8, 300 sec: 12565.7). Total num frames: 4440064. Throughput: 0: 3140.2. Samples: 1107172. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:36:04,061][15827] Avg episode reward: [(0, '4.585')]
+[2025-08-29 18:36:05,699][19393] Updated weights for policy 0, policy_version 1090 (0.0012)
+[2025-08-29 18:36:08,570][19393] Updated weights for policy 0, policy_version 1100 (0.0017)
+[2025-08-29 18:36:09,061][15827] Fps is (10 sec: 13926.1, 60 sec: 12902.4, 300 sec: 12524.0). Total num frames: 4509696. Throughput: 0: 3150.2. Samples: 1127640. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:36:09,062][15827] Avg episode reward: [(0, '4.381')]
+[2025-08-29 18:36:11,395][19393] Updated weights for policy 0, policy_version 1110 (0.0012)
+[2025-08-29 18:36:14,060][15827] Fps is (10 sec: 14336.2, 60 sec: 12902.4, 300 sec: 12537.9). Total num frames: 4583424. Throughput: 0: 3165.9. Samples: 1138506. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:36:14,061][15827] Avg episode reward: [(0, '4.365')]
+[2025-08-29 18:36:14,417][19393] Updated weights for policy 0, policy_version 1120 (0.0015)
+[2025-08-29 18:36:19,060][15827] Fps is (10 sec: 9421.0, 60 sec: 12083.3, 300 sec: 12399.1). Total num frames: 4603904. Throughput: 0: 3023.2. Samples: 1152234. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:36:19,062][15827] Avg episode reward: [(0, '4.568')]
+[2025-08-29 18:36:20,747][19393] Updated weights for policy 0, policy_version 1130 (0.0014)
+[2025-08-29 18:36:23,772][19393] Updated weights for policy 0, policy_version 1140 (0.0016)
+[2025-08-29 18:36:24,060][15827] Fps is (10 sec: 8601.6, 60 sec: 12083.2, 300 sec: 12534.8). Total num frames: 4669440. Throughput: 0: 2916.0. Samples: 1167702. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:36:24,062][15827] Avg episode reward: [(0, '4.653')]
+[2025-08-29 18:36:26,867][19393] Updated weights for policy 0, policy_version 1150 (0.0012)
+[2025-08-29 18:36:29,061][15827] Fps is (10 sec: 13516.5, 60 sec: 12151.5, 300 sec: 12551.8). Total num frames: 4739072. Throughput: 0: 3064.8. Samples: 1177728. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:36:29,062][15827] Avg episode reward: [(0, '4.427')]
+[2025-08-29 18:36:29,966][19393] Updated weights for policy 0, policy_version 1160 (0.0013)
+[2025-08-29 18:36:32,991][19393] Updated weights for policy 0, policy_version 1170 (0.0017)
+[2025-08-29 18:36:34,060][15827] Fps is (10 sec: 13926.3, 60 sec: 12151.4, 300 sec: 12593.5). Total num frames: 4808704. Throughput: 0: 3178.8. Samples: 1198070. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0)
+[2025-08-29 18:36:34,062][15827] Avg episode reward: [(0, '4.450')]
+[2025-08-29 18:36:35,678][19393] Updated weights for policy 0, policy_version 1180 (0.0012)
+[2025-08-29 18:36:38,608][19393] Updated weights for policy 0, policy_version 1190 (0.0014)
+[2025-08-29 18:36:39,060][15827] Fps is (10 sec: 13926.7, 60 sec: 12219.7, 300 sec: 12607.4). Total num frames: 4878336. Throughput: 0: 3221.2. Samples: 1219954.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 4.0)
+[2025-08-29 18:36:39,061][15827] Avg episode reward: [(0, '4.421')]
+[2025-08-29 18:36:41,488][19393] Updated weights for policy 0, policy_version 1200 (0.0016)
+[2025-08-29 18:36:44,060][15827] Fps is (10 sec: 13926.5, 60 sec: 13039.0, 300 sec: 12635.1). Total num frames: 4947968. Throughput: 0: 3213.4. Samples: 1230398. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:36:44,062][15827] Avg episode reward: [(0, '4.472')]
+[2025-08-29 18:36:44,599][19393] Updated weights for policy 0, policy_version 1210 (0.0014)
+[2025-08-29 18:36:47,468][19393] Updated weights for policy 0, policy_version 1220 (0.0011)
+[2025-08-29 18:36:49,060][15827] Fps is (10 sec: 13926.3, 60 sec: 13038.9, 300 sec: 12690.7). Total num frames: 5017600. Throughput: 0: 3196.0. Samples: 1250994. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:36:49,061][15827] Avg episode reward: [(0, '4.491')]
+[2025-08-29 18:36:50,308][19393] Updated weights for policy 0, policy_version 1230 (0.0016)
+[2025-08-29 18:36:54,753][15827] Fps is (10 sec: 9576.9, 60 sec: 12676.8, 300 sec: 12577.8). Total num frames: 5050368. Throughput: 0: 2936.8. Samples: 1261830. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:36:54,754][15827] Avg episode reward: [(0, '4.628')]
+[2025-08-29 18:36:57,027][19393] Updated weights for policy 0, policy_version 1240 (0.0015)
+[2025-08-29 18:36:59,060][15827] Fps is (10 sec: 9011.2, 60 sec: 12288.0, 300 sec: 12579.6). Total num frames: 5107712. Throughput: 0: 2906.6. Samples: 1269302. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:36:59,062][15827] Avg episode reward: [(0, '4.577')]
+[2025-08-29 18:36:59,677][19393] Updated weights for policy 0, policy_version 1250 (0.0012)
+[2025-08-29 18:37:02,657][19393] Updated weights for policy 0, policy_version 1260 (0.0017)
+[2025-08-29 18:37:04,060][15827] Fps is (10 sec: 13642.2, 60 sec: 12288.0, 300 sec: 12746.2). Total num frames: 5177344. Throughput: 0: 3081.1. Samples: 1290884. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:37:04,061][15827] Avg episode reward: [(0, '4.664')]
+[2025-08-29 18:37:05,470][19393] Updated weights for policy 0, policy_version 1270 (0.0012)
+[2025-08-29 18:37:08,304][19393] Updated weights for policy 0, policy_version 1280 (0.0013)
+[2025-08-29 18:37:09,060][15827] Fps is (10 sec: 14336.1, 60 sec: 12356.3, 300 sec: 12760.1). Total num frames: 5251072. Throughput: 0: 3234.0. Samples: 1313230. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:37:09,062][15827] Avg episode reward: [(0, '4.490')]
+[2025-08-29 18:37:10,948][19393] Updated weights for policy 0, policy_version 1290 (0.0010)
+[2025-08-29 18:37:13,654][19393] Updated weights for policy 0, policy_version 1300 (0.0010)
+[2025-08-29 18:37:14,061][15827] Fps is (10 sec: 15154.8, 60 sec: 12424.5, 300 sec: 12815.6). Total num frames: 5328896. Throughput: 0: 3258.4. Samples: 1324354. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:37:14,062][15827] Avg episode reward: [(0, '4.394')]
+[2025-08-29 18:37:16,288][19393] Updated weights for policy 0, policy_version 1310 (0.0014)
+[2025-08-29 18:37:19,061][15827] Fps is (10 sec: 15154.9, 60 sec: 13312.0, 300 sec: 12843.4). Total num frames: 5402624. Throughput: 0: 3311.0. Samples: 1347066.
Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
+[2025-08-29 18:37:19,062][15827] Avg episode reward: [(0, '4.494')]
+[2025-08-29 18:37:19,244][19393] Updated weights for policy 0, policy_version 1320 (0.0013)
+[2025-08-29 18:37:22,009][19393] Updated weights for policy 0, policy_version 1330 (0.0013)
+[2025-08-29 18:37:24,060][15827] Fps is (10 sec: 14746.0, 60 sec: 13448.5, 300 sec: 12843.4). Total num frames: 5476352. Throughput: 0: 3312.5. Samples: 1369018. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:37:24,062][15827] Avg episode reward: [(0, '4.358')]
+[2025-08-29 18:37:24,702][19393] Updated weights for policy 0, policy_version 1340 (0.0011)
+[2025-08-29 18:37:30,587][15827] Fps is (10 sec: 10305.1, 60 sec: 12715.4, 300 sec: 12680.6). Total num frames: 5521408. Throughput: 0: 3230.2. Samples: 1380688. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:37:30,589][15827] Avg episode reward: [(0, '4.231')]
+[2025-08-29 18:37:30,972][19393] Updated weights for policy 0, policy_version 1350 (0.0014)
+[2025-08-29 18:37:33,597][19393] Updated weights for policy 0, policy_version 1360 (0.0014)
+[2025-08-29 18:37:34,061][15827] Fps is (10 sec: 9830.3, 60 sec: 12765.9, 300 sec: 12690.7). Total num frames: 5574656. Throughput: 0: 3082.6. Samples: 1389710. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:37:34,062][15827] Avg episode reward: [(0, '4.357')]
+[2025-08-29 18:37:36,344][19393] Updated weights for policy 0, policy_version 1370 (0.0012)
+[2025-08-29 18:37:39,060][15827] Fps is (10 sec: 14985.5, 60 sec: 12834.1, 300 sec: 12857.3). Total num frames: 5648384. Throughput: 0: 3395.0. Samples: 1412256. Policy #0 lag: (min: 0.0, avg: 0.9, max: 3.0)
+[2025-08-29 18:37:39,062][15827] Avg episode reward: [(0, '4.262')]
+[2025-08-29 18:37:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001379_5648384.pth...
+[2025-08-29 18:37:39,151][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth
+[2025-08-29 18:37:39,208][19393] Updated weights for policy 0, policy_version 1380 (0.0015)
+[2025-08-29 18:37:41,950][19393] Updated weights for policy 0, policy_version 1390 (0.0012)
+[2025-08-29 18:37:44,060][15827] Fps is (10 sec: 15155.4, 60 sec: 12970.7, 300 sec: 12871.2). Total num frames: 5726208. Throughput: 0: 3424.2. Samples: 1423390. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:37:44,062][15827] Avg episode reward: [(0, '4.601')]
+[2025-08-29 18:37:44,659][19393] Updated weights for policy 0, policy_version 1400 (0.0010)
+[2025-08-29 18:37:47,416][19393] Updated weights for policy 0, policy_version 1410 (0.0013)
+[2025-08-29 18:37:49,060][15827] Fps is (10 sec: 15155.4, 60 sec: 13039.0, 300 sec: 12871.2). Total num frames: 5799936. Throughput: 0: 3440.7. Samples: 1445716. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:37:49,061][15827] Avg episode reward: [(0, '4.228')]
+[2025-08-29 18:37:50,025][19393] Updated weights for policy 0, policy_version 1420 (0.0010)
+[2025-08-29 18:37:52,751][19393] Updated weights for policy 0, policy_version 1430 (0.0015)
+[2025-08-29 18:37:54,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13881.8, 300 sec: 12871.2). Total num frames: 5873664. Throughput: 0: 3447.0. Samples: 1468346.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:37:54,061][15827] Avg episode reward: [(0, '4.244')]
+[2025-08-29 18:37:55,529][19393] Updated weights for policy 0, policy_version 1440 (0.0013)
+[2025-08-29 18:37:58,103][19393] Updated weights for policy 0, policy_version 1450 (0.0011)
+[2025-08-29 18:37:59,060][15827] Fps is (10 sec: 15155.2, 60 sec: 14062.9, 300 sec: 12885.0). Total num frames: 5951488. Throughput: 0: 3453.8. Samples: 1479774. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:37:59,063][15827] Avg episode reward: [(0, '4.463')]
+[2025-08-29 18:38:00,575][19393] Updated weights for policy 0, policy_version 1460 (0.0011)
+[2025-08-29 18:38:06,419][15827] Fps is (10 sec: 11268.2, 60 sec: 13399.6, 300 sec: 12727.7). Total num frames: 6012928. Throughput: 0: 3304.1. Samples: 1503544. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:38:06,420][15827] Avg episode reward: [(0, '4.535')]
+[2025-08-29 18:38:06,830][19393] Updated weights for policy 0, policy_version 1470 (0.0013)
+[2025-08-29 18:38:09,061][15827] Fps is (10 sec: 9830.3, 60 sec: 13312.0, 300 sec: 12704.5). Total num frames: 6049792. Throughput: 0: 3189.7. Samples: 1512556. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2025-08-29 18:38:09,062][15827] Avg episode reward: [(0, '4.430')]
+[2025-08-29 18:38:09,648][19393] Updated weights for policy 0, policy_version 1480 (0.0011)
+[2025-08-29 18:38:12,429][19393] Updated weights for policy 0, policy_version 1490 (0.0013)
+[2025-08-29 18:38:14,060][15827] Fps is (10 sec: 14473.3, 60 sec: 13243.8, 300 sec: 12873.7). Total num frames: 6123520. Throughput: 0: 3289.1. Samples: 1523676. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:38:14,062][15827] Avg episode reward: [(0, '4.466')]
+[2025-08-29 18:38:15,246][19393] Updated weights for policy 0, policy_version 1500 (0.0011)
+[2025-08-29 18:38:18,018][19393] Updated weights for policy 0, policy_version 1510 (0.0014)
+[2025-08-29 18:38:19,061][15827] Fps is (10 sec: 14745.6, 60 sec: 13243.7, 300 sec: 12885.0). Total num frames: 6197248. Throughput: 0: 3466.3. Samples: 1545692. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:38:19,062][15827] Avg episode reward: [(0, '4.297')]
+[2025-08-29 18:38:21,147][19393] Updated weights for policy 0, policy_version 1520 (0.0015)
+[2025-08-29 18:38:23,892][19393] Updated weights for policy 0, policy_version 1530 (0.0013)
+[2025-08-29 18:38:24,061][15827] Fps is (10 sec: 14335.9, 60 sec: 13175.4, 300 sec: 12885.0). Total num frames: 6266880. Throughput: 0: 3437.6. Samples: 1566948. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:38:24,062][15827] Avg episode reward: [(0, '4.254')]
+[2025-08-29 18:38:26,781][19393] Updated weights for policy 0, policy_version 1540 (0.0012)
+[2025-08-29 18:38:29,060][15827] Fps is (10 sec: 14336.2, 60 sec: 14009.8, 300 sec: 12898.9). Total num frames: 6340608. Throughput: 0: 3422.4. Samples: 1577400. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:38:29,062][15827] Avg episode reward: [(0, '4.214')]
+[2025-08-29 18:38:29,406][19393] Updated weights for policy 0, policy_version 1550 (0.0011)
+[2025-08-29 18:38:31,940][19393] Updated weights for policy 0, policy_version 1560 (0.0011)
+[2025-08-29 18:38:34,060][15827] Fps is (10 sec: 15155.4, 60 sec: 14063.0, 300 sec: 12926.7). Total num frames: 6418432. Throughput: 0: 3447.8. Samples: 1600868.
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:38:34,062][15827] Avg episode reward: [(0, '4.310')]
+[2025-08-29 18:38:34,610][19393] Updated weights for policy 0, policy_version 1570 (0.0011)
+[2025-08-29 18:38:37,374][19393] Updated weights for policy 0, policy_version 1580 (0.0014)
+[2025-08-29 18:38:42,252][15827] Fps is (10 sec: 11178.3, 60 sec: 13287.9, 300 sec: 12774.6). Total num frames: 6488064. Throughput: 0: 3225.3. Samples: 1623778. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:38:42,253][15827] Avg episode reward: [(0, '4.598')]
+[2025-08-29 18:38:43,594][19393] Updated weights for policy 0, policy_version 1590 (0.0014)
+[2025-08-29 18:38:44,061][15827] Fps is (10 sec: 9830.2, 60 sec: 13175.4, 300 sec: 12760.1). Total num frames: 6516736. Throughput: 0: 3213.8. Samples: 1624394. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:38:44,062][15827] Avg episode reward: [(0, '4.438')]
+[2025-08-29 18:38:46,404][19393] Updated weights for policy 0, policy_version 1600 (0.0014)
+[2025-08-29 18:38:49,061][15827] Fps is (10 sec: 15039.2, 60 sec: 13175.4, 300 sec: 12938.4). Total num frames: 6590464. Throughput: 0: 3292.2. Samples: 1643928. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:38:49,062][15827] Avg episode reward: [(0, '4.193')]
+[2025-08-29 18:38:49,205][19393] Updated weights for policy 0, policy_version 1610 (0.0013)
+[2025-08-29 18:38:51,992][19393] Updated weights for policy 0, policy_version 1620 (0.0012)
+[2025-08-29 18:38:54,061][15827] Fps is (10 sec: 14745.4, 60 sec: 13175.4, 300 sec: 12954.5). Total num frames: 6664192. Throughput: 0: 3405.6. Samples: 1665810. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:38:54,062][15827] Avg episode reward: [(0, '4.403')]
+[2025-08-29 18:38:55,249][19393] Updated weights for policy 0, policy_version 1630 (0.0016)
+[2025-08-29 18:38:57,991][19393] Updated weights for policy 0, policy_version 1640 (0.0013)
+[2025-08-29 18:38:59,060][15827] Fps is (10 sec: 13926.5, 60 sec: 12970.7, 300 sec: 12954.5). Total num frames: 6729728. Throughput: 0: 3363.2. Samples: 1675020. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:38:59,062][15827] Avg episode reward: [(0, '4.409')]
+[2025-08-29 18:39:00,588][19393] Updated weights for policy 0, policy_version 1650 (0.0013)
+[2025-08-29 18:39:03,119][19393] Updated weights for policy 0, policy_version 1660 (0.0013)
+[2025-08-29 18:39:04,060][15827] Fps is (10 sec: 14746.0, 60 sec: 13856.8, 300 sec: 12968.4). Total num frames: 6811648. Throughput: 0: 3399.8. Samples: 1698682. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:39:04,062][15827] Avg episode reward: [(0, '4.439')]
+[2025-08-29 18:39:05,876][19393] Updated weights for policy 0, policy_version 1670 (0.0011)
+[2025-08-29 18:39:08,596][19393] Updated weights for policy 0, policy_version 1680 (0.0011)
+[2025-08-29 18:39:09,060][15827] Fps is (10 sec: 15564.7, 60 sec: 13926.4, 300 sec: 12968.3). Total num frames: 6885376. Throughput: 0: 3434.9. Samples: 1721518. Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0)
+[2025-08-29 18:39:09,063][15827] Avg episode reward: [(0, '4.379')]
+[2025-08-29 18:39:11,146][19393] Updated weights for policy 0, policy_version 1690 (0.0016)
+[2025-08-29 18:39:13,739][19393] Updated weights for policy 0, policy_version 1700 (0.0013)
+[2025-08-29 18:39:14,060][15827] Fps is (10 sec: 15564.8, 60 sec: 14062.9, 300 sec: 13010.0). Total num frames: 6967296. Throughput: 0: 3465.6. Samples: 1733354.
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:39:14,061][15827] Avg episode reward: [(0, '4.431')]
+[2025-08-29 18:39:19,060][15827] Fps is (10 sec: 10240.0, 60 sec: 13175.5, 300 sec: 12843.4). Total num frames: 6987776. Throughput: 0: 3204.9. Samples: 1745090. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:39:19,062][15827] Avg episode reward: [(0, '4.286')]
+[2025-08-29 18:39:20,029][19393] Updated weights for policy 0, policy_version 1710 (0.0016)
+[2025-08-29 18:39:22,956][19393] Updated weights for policy 0, policy_version 1720 (0.0011)
+[2025-08-29 18:39:24,060][15827] Fps is (10 sec: 9420.9, 60 sec: 13243.8, 300 sec: 13003.0). Total num frames: 7061504. Throughput: 0: 3378.6. Samples: 1765034. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:39:24,062][15827] Avg episode reward: [(0, '4.454')]
+[2025-08-29 18:39:25,732][19393] Updated weights for policy 0, policy_version 1730 (0.0016)
+[2025-08-29 18:39:28,313][19393] Updated weights for policy 0, policy_version 1740 (0.0010)
+[2025-08-29 18:39:29,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13243.7, 300 sec: 13010.0). Total num frames: 7135232. Throughput: 0: 3373.5. Samples: 1776200. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:39:29,061][15827] Avg episode reward: [(0, '4.445')]
+[2025-08-29 18:39:30,908][19393] Updated weights for policy 0, policy_version 1750 (0.0014)
+[2025-08-29 18:39:33,452][19393] Updated weights for policy 0, policy_version 1760 (0.0013)
+[2025-08-29 18:39:34,060][15827] Fps is (10 sec: 15564.6, 60 sec: 13312.0, 300 sec: 13037.8). Total num frames: 7217152. Throughput: 0: 3464.7. Samples: 1799840. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:39:34,062][15827] Avg episode reward: [(0, '4.437')]
+[2025-08-29 18:39:36,168][19393] Updated weights for policy 0, policy_version 1770 (0.0013)
+[2025-08-29 18:39:38,901][19393] Updated weights for policy 0, policy_version 1780 (0.0013)
+[2025-08-29 18:39:39,060][15827] Fps is (10 sec: 15564.8, 60 sec: 14131.9, 300 sec: 13065.6). Total num frames: 7290880. Throughput: 0: 3480.4. Samples: 1822426. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:39:39,062][15827] Avg episode reward: [(0, '4.503')]
+[2025-08-29 18:39:39,066][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001780_7290880.pth...
+[2025-08-29 18:39:39,145][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001012_4145152.pth
+[2025-08-29 18:39:41,686][19393] Updated weights for policy 0, policy_version 1790 (0.0014)
+[2025-08-29 18:39:44,060][15827] Fps is (10 sec: 15155.2, 60 sec: 14199.5, 300 sec: 13107.2). Total num frames: 7368704. Throughput: 0: 3528.0. Samples: 1833780. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:39:44,062][15827] Avg episode reward: [(0, '4.482')]
+[2025-08-29 18:39:44,365][19393] Updated weights for policy 0, policy_version 1800 (0.0013)
+[2025-08-29 18:39:47,118][19393] Updated weights for policy 0, policy_version 1810 (0.0015)
+[2025-08-29 18:39:49,061][15827] Fps is (10 sec: 15154.5, 60 sec: 14199.4, 300 sec: 13107.2). Total num frames: 7442432. Throughput: 0: 3506.5. Samples: 1856474.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:39:49,062][15827] Avg episode reward: [(0, '4.437')]
+[2025-08-29 18:39:49,878][19393] Updated weights for policy 0, policy_version 1820 (0.0012)
+[2025-08-29 18:39:54,061][15827] Fps is (10 sec: 9420.5, 60 sec: 13312.0, 300 sec: 12940.6). Total num frames: 7462912. Throughput: 0: 3239.0. Samples: 1867276. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2025-08-29 18:39:54,062][15827] Avg episode reward: [(0, '4.319')]
+[2025-08-29 18:39:56,134][19393] Updated weights for policy 0, policy_version 1830 (0.0011)
+[2025-08-29 18:39:58,756][19393] Updated weights for policy 0, policy_version 1840 (0.0009)
+[2025-08-29 18:39:59,060][15827] Fps is (10 sec: 9830.9, 60 sec: 13516.8, 300 sec: 13124.4). Total num frames: 7540736. Throughput: 0: 3197.8. Samples: 1877254. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:39:59,062][15827] Avg episode reward: [(0, '4.545')]
+[2025-08-29 18:40:01,493][19393] Updated weights for policy 0, policy_version 1850 (0.0011)
+[2025-08-29 18:40:04,061][15827] Fps is (10 sec: 15155.4, 60 sec: 13380.2, 300 sec: 13148.8). Total num frames: 7614464. Throughput: 0: 3441.4. Samples: 1899954. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:40:04,062][15827] Avg episode reward: [(0, '4.372')]
+[2025-08-29 18:40:04,128][19393] Updated weights for policy 0, policy_version 1860 (0.0013)
+[2025-08-29 18:40:06,872][19393] Updated weights for policy 0, policy_version 1870 (0.0011)
+[2025-08-29 18:40:09,061][15827] Fps is (10 sec: 14745.4, 60 sec: 13380.3, 300 sec: 13148.8). Total num frames: 7688192. Throughput: 0: 3493.5. Samples: 1922240. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:40:09,062][15827] Avg episode reward: [(0, '4.241')]
+[2025-08-29 18:40:09,616][19393] Updated weights for policy 0, policy_version 1880 (0.0016)
+[2025-08-29 18:40:12,468][19393] Updated weights for policy 0, policy_version 1890 (0.0014)
+[2025-08-29 18:40:14,060][15827] Fps is (10 sec: 14746.0, 60 sec: 13243.8, 300 sec: 13162.8). Total num frames: 7761920. Throughput: 0: 3489.5. Samples: 1933226. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:40:14,061][15827] Avg episode reward: [(0, '4.266')]
+[2025-08-29 18:40:15,129][19393] Updated weights for policy 0, policy_version 1900 (0.0011)
+[2025-08-29 18:40:18,613][19393] Updated weights for policy 0, policy_version 1910 (0.0016)
+[2025-08-29 18:40:19,060][15827] Fps is (10 sec: 13926.5, 60 sec: 13994.7, 300 sec: 13162.7). Total num frames: 7827456. Throughput: 0: 3437.3. Samples: 1954520. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:40:19,062][15827] Avg episode reward: [(0, '4.325')]
+[2025-08-29 18:40:21,932][19393] Updated weights for policy 0, policy_version 1920 (0.0016)
+[2025-08-29 18:40:24,060][15827] Fps is (10 sec: 13107.1, 60 sec: 13858.1, 300 sec: 13162.8). Total num frames: 7892992. Throughput: 0: 3355.0. Samples: 1973400. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:40:24,061][15827] Avg episode reward: [(0, '4.410')]
+[2025-08-29 18:40:24,796][19393] Updated weights for policy 0, policy_version 1930 (0.0012)
+[2025-08-29 18:40:29,753][15827] Fps is (10 sec: 8810.7, 60 sec: 12957.7, 300 sec: 12993.4). Total num frames: 7921664. Throughput: 0: 3055.6. Samples: 1973400.
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:40:29,754][15827] Avg episode reward: [(0, '4.596')]
+[2025-08-29 18:40:31,233][19393] Updated weights for policy 0, policy_version 1940 (0.0013)
+[2025-08-29 18:40:33,952][19393] Updated weights for policy 0, policy_version 1950 (0.0014)
+[2025-08-29 18:40:34,061][15827] Fps is (10 sec: 9420.5, 60 sec: 12834.1, 300 sec: 13023.9). Total num frames: 7987200. Throughput: 0: 3035.4. Samples: 1993068. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:40:34,063][15827] Avg episode reward: [(0, '4.433')]
+[2025-08-29 18:40:36,789][19393] Updated weights for policy 0, policy_version 1960 (0.0013)
+[2025-08-29 18:40:39,060][15827] Fps is (10 sec: 14522.5, 60 sec: 12765.9, 300 sec: 13190.5). Total num frames: 8056832. Throughput: 0: 3267.9. Samples: 2014330. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:40:39,062][15827] Avg episode reward: [(0, '4.378')]
+[2025-08-29 18:40:39,857][19393] Updated weights for policy 0, policy_version 1970 (0.0021)
+[2025-08-29 18:40:43,115][19393] Updated weights for policy 0, policy_version 1980 (0.0016)
+[2025-08-29 18:40:44,060][15827] Fps is (10 sec: 13517.2, 60 sec: 12561.1, 300 sec: 13176.6). Total num frames: 8122368. Throughput: 0: 3267.5. Samples: 2024290. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:40:44,063][15827] Avg episode reward: [(0, '4.350')]
+[2025-08-29 18:40:46,091][19393] Updated weights for policy 0, policy_version 1990 (0.0015)
+[2025-08-29 18:40:48,900][19393] Updated weights for policy 0, policy_version 2000 (0.0013)
+[2025-08-29 18:40:49,060][15827] Fps is (10 sec: 13516.8, 60 sec: 12492.9, 300 sec: 13204.4). Total num frames: 8192000. Throughput: 0: 3212.1. Samples: 2044496. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:40:49,062][15827] Avg episode reward: [(0, '4.421')]
+[2025-08-29 18:40:51,727][19393] Updated weights for policy 0, policy_version 2010 (0.0012)
+[2025-08-29 18:40:54,060][15827] Fps is (10 sec: 13926.6, 60 sec: 13312.1, 300 sec: 13190.5). Total num frames: 8261632. Throughput: 0: 3188.8. Samples: 2065734. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:40:54,061][15827] Avg episode reward: [(0, '4.350')]
+[2025-08-29 18:40:54,708][19393] Updated weights for policy 0, policy_version 2020 (0.0014)
+[2025-08-29 18:40:57,638][19393] Updated weights for policy 0, policy_version 2030 (0.0015)
+[2025-08-29 18:40:59,061][15827] Fps is (10 sec: 14335.8, 60 sec: 13243.7, 300 sec: 13204.4). Total num frames: 8335360. Throughput: 0: 3179.0. Samples: 2076280. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:40:59,062][15827] Avg episode reward: [(0, '4.415')]
+[2025-08-29 18:41:00,285][19393] Updated weights for policy 0, policy_version 2040 (0.0014)
+[2025-08-29 18:41:05,587][15827] Fps is (10 sec: 10305.0, 60 sec: 12449.1, 300 sec: 13053.5). Total num frames: 8380416. Throughput: 0: 2858.7. Samples: 2087528. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2025-08-29 18:41:05,590][15827] Avg episode reward: [(0, '4.290')]
+[2025-08-29 18:41:06,695][19393] Updated weights for policy 0, policy_version 2050 (0.0014)
+[2025-08-29 18:41:09,060][15827] Fps is (10 sec: 9420.9, 60 sec: 12356.3, 300 sec: 13037.8). Total num frames: 8429568. Throughput: 0: 2980.9. Samples: 2107542.
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:41:09,062][15827] Avg episode reward: [(0, '4.317')]
+[2025-08-29 18:41:09,525][19393] Updated weights for policy 0, policy_version 2060 (0.0015)
+[2025-08-29 18:41:12,383][19393] Updated weights for policy 0, policy_version 2070 (0.0011)
+[2025-08-29 18:41:14,061][15827] Fps is (10 sec: 14502.0, 60 sec: 12356.2, 300 sec: 13218.3). Total num frames: 8503296. Throughput: 0: 3261.1. Samples: 2117892. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:41:14,062][15827] Avg episode reward: [(0, '4.445')]
+[2025-08-29 18:41:15,057][19393] Updated weights for policy 0, policy_version 2080 (0.0012)
+[2025-08-29 18:41:17,872][19393] Updated weights for policy 0, policy_version 2090 (0.0014)
+[2025-08-29 18:41:19,060][15827] Fps is (10 sec: 14336.1, 60 sec: 12424.5, 300 sec: 13232.2). Total num frames: 8572928. Throughput: 0: 3274.7. Samples: 2140430. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:41:19,061][15827] Avg episode reward: [(0, '4.245')]
+[2025-08-29 18:41:20,903][19393] Updated weights for policy 0, policy_version 2100 (0.0012)
+[2025-08-29 18:41:23,564][19393] Updated weights for policy 0, policy_version 2110 (0.0014)
+[2025-08-29 18:41:24,060][15827] Fps is (10 sec: 14336.3, 60 sec: 12561.1, 300 sec: 13246.1). Total num frames: 8646656. Throughput: 0: 3282.9. Samples: 2162062. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:41:24,061][15827] Avg episode reward: [(0, '4.502')]
+[2025-08-29 18:41:26,303][19393] Updated weights for policy 0, policy_version 2120 (0.0015)
+[2025-08-29 18:41:29,006][19393] Updated weights for policy 0, policy_version 2130 (0.0013)
+[2025-08-29 18:41:29,061][15827] Fps is (10 sec: 15154.7, 60 sec: 13536.4, 300 sec: 13273.8). Total num frames: 8724480. Throughput: 0: 3306.2. Samples: 2173072. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:41:29,062][15827] Avg episode reward: [(0, '4.272')]
+[2025-08-29 18:41:31,853][19393] Updated weights for policy 0, policy_version 2140 (0.0013)
+[2025-08-29 18:41:34,061][15827] Fps is (10 sec: 14335.3, 60 sec: 13380.2, 300 sec: 13259.9). Total num frames: 8790016. Throughput: 0: 3352.7. Samples: 2195370. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:41:34,064][15827] Avg episode reward: [(0, '4.450')]
+[2025-08-29 18:41:35,030][19393] Updated weights for policy 0, policy_version 2150 (0.0018)
+[2025-08-29 18:41:37,774][19393] Updated weights for policy 0, policy_version 2160 (0.0014)
+[2025-08-29 18:41:41,416][15827] Fps is (10 sec: 9945.3, 60 sec: 12677.7, 300 sec: 13113.6). Total num frames: 8847360. Throughput: 0: 2936.2. Samples: 2204780. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:41:41,417][15827] Avg episode reward: [(0, '4.569')]
+[2025-08-29 18:41:41,424][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002160_8847360.pth...
+[2025-08-29 18:41:41,507][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001379_5648384.pth
+[2025-08-29 18:41:44,061][15827] Fps is (10 sec: 9010.8, 60 sec: 12629.2, 300 sec: 13093.3). Total num frames: 8880128. Throughput: 0: 3054.7. Samples: 2213746.
Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:41:44,064][15827] Avg episode reward: [(0, '4.361')]
+[2025-08-29 18:41:44,779][19393] Updated weights for policy 0, policy_version 2170 (0.0015)
+[2025-08-29 18:41:47,687][19393] Updated weights for policy 0, policy_version 2180 (0.0016)
+[2025-08-29 18:41:49,061][15827] Fps is (10 sec: 12859.2, 60 sec: 12560.9, 300 sec: 13235.4). Total num frames: 8945664. Throughput: 0: 3347.3. Samples: 2233046. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:41:49,063][15827] Avg episode reward: [(0, '4.626')]
+[2025-08-29 18:41:50,830][19393] Updated weights for policy 0, policy_version 2190 (0.0013)
+[2025-08-29 18:41:53,928][19393] Updated weights for policy 0, policy_version 2200 (0.0016)
+[2025-08-29 18:41:54,061][15827] Fps is (10 sec: 13108.2, 60 sec: 12492.8, 300 sec: 13232.2). Total num frames: 9011200. Throughput: 0: 3230.4. Samples: 2252908. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:41:54,062][15827] Avg episode reward: [(0, '4.361')]
+[2025-08-29 18:41:56,687][19393] Updated weights for policy 0, policy_version 2210 (0.0012)
+[2025-08-29 18:41:59,061][15827] Fps is (10 sec: 14336.2, 60 sec: 12561.0, 300 sec: 13259.9). Total num frames: 9089024. Throughput: 0: 3248.8. Samples: 2264090. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2025-08-29 18:41:59,062][15827] Avg episode reward: [(0, '4.144')]
+[2025-08-29 18:41:59,296][19393] Updated weights for policy 0, policy_version 2220 (0.0013)
+[2025-08-29 18:42:02,052][19393] Updated weights for policy 0, policy_version 2230 (0.0009)
+[2025-08-29 18:42:04,060][15827] Fps is (10 sec: 14745.8, 60 sec: 13309.3, 300 sec: 13246.0). Total num frames: 9158656. Throughput: 0: 3251.9. Samples: 2286766. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:42:04,062][15827] Avg episode reward: [(0, '4.461')]
+[2025-08-29 18:42:05,198][19393] Updated weights for policy 0, policy_version 2240 (0.0013)
+[2025-08-29 18:42:08,652][19393] Updated weights for policy 0, policy_version 2250 (0.0015)
+[2025-08-29 18:42:09,061][15827] Fps is (10 sec: 13106.7, 60 sec: 13175.3, 300 sec: 13190.5). Total num frames: 9220096. Throughput: 0: 3178.7. Samples: 2305108. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:42:09,064][15827] Avg episode reward: [(0, '4.559')]
+[2025-08-29 18:42:11,854][19393] Updated weights for policy 0, policy_version 2260 (0.0017)
+[2025-08-29 18:42:17,254][15827] Fps is (10 sec: 9003.3, 60 sec: 12250.4, 300 sec: 12994.3). Total num frames: 9277440. Throughput: 0: 2937.4. Samples: 2314634. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:42:17,255][15827] Avg episode reward: [(0, '4.611')]
+[2025-08-29 18:42:19,060][15827] Fps is (10 sec: 7373.4, 60 sec: 12014.9, 300 sec: 12940.6). Total num frames: 9293824. Throughput: 0: 2805.8. Samples: 2321630. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:42:19,062][15827] Avg episode reward: [(0, '4.455')]
+[2025-08-29 18:42:19,067][19393] Updated weights for policy 0, policy_version 2270 (0.0021)
+[2025-08-29 18:42:22,791][19393] Updated weights for policy 0, policy_version 2280 (0.0018)
+[2025-08-29 18:42:24,061][15827] Fps is (10 sec: 10831.7, 60 sec: 11741.8, 300 sec: 13049.8). Total num frames: 9351168. Throughput: 0: 3128.3. Samples: 2338182.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:42:24,063][15827] Avg episode reward: [(0, '4.419')]
+[2025-08-29 18:42:26,582][19393] Updated weights for policy 0, policy_version 2290 (0.0021)
+[2025-08-29 18:42:29,061][15827] Fps is (10 sec: 11059.0, 60 sec: 11332.3, 300 sec: 12982.2). Total num frames: 9404416. Throughput: 0: 2944.3. Samples: 2346236. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:42:29,062][15827] Avg episode reward: [(0, '4.502')]
+[2025-08-29 18:42:30,318][19393] Updated weights for policy 0, policy_version 2300 (0.0014)
+[2025-08-29 18:42:33,804][19393] Updated weights for policy 0, policy_version 2310 (0.0014)
+[2025-08-29 18:42:34,061][15827] Fps is (10 sec: 11058.6, 60 sec: 11195.7, 300 sec: 12926.7). Total num frames: 9461760. Throughput: 0: 2880.7. Samples: 2362680. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:42:34,063][15827] Avg episode reward: [(0, '4.433')]
+[2025-08-29 18:42:37,110][19393] Updated weights for policy 0, policy_version 2320 (0.0016)
+[2025-08-29 18:42:39,061][15827] Fps is (10 sec: 11878.3, 60 sec: 11724.3, 300 sec: 12871.1). Total num frames: 9523200. Throughput: 0: 2863.5. Samples: 2381766. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:42:39,062][15827] Avg episode reward: [(0, '4.413')]
+[2025-08-29 18:42:40,262][19393] Updated weights for policy 0, policy_version 2330 (0.0016)
+[2025-08-29 18:42:43,382][19393] Updated weights for policy 0, policy_version 2340 (0.0017)
+[2025-08-29 18:42:44,060][15827] Fps is (10 sec: 12698.4, 60 sec: 11810.3, 300 sec: 12843.4). Total num frames: 9588736. Throughput: 0: 2829.7. Samples: 2391426. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0)
+[2025-08-29 18:42:44,062][15827] Avg episode reward: [(0, '4.435')]
+[2025-08-29 18:42:46,719][19393] Updated weights for policy 0, policy_version 2350 (0.0017)
+[2025-08-29 18:42:49,060][15827] Fps is (10 sec: 12697.9, 60 sec: 11742.0, 300 sec: 12801.7). Total num frames: 9650176. Throughput: 0: 2744.0. Samples: 2410246. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:42:49,062][15827] Avg episode reward: [(0, '4.199')]
+[2025-08-29 18:42:53,630][19393] Updated weights for policy 0, policy_version 2360 (0.0014)
+[2025-08-29 18:42:54,061][15827] Fps is (10 sec: 8191.9, 60 sec: 10990.9, 300 sec: 12607.3). Total num frames: 9670656. Throughput: 0: 2504.3. Samples: 2417800. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:42:54,062][15827] Avg episode reward: [(0, '4.225')]
+[2025-08-29 18:42:57,093][19393] Updated weights for policy 0, policy_version 2370 (0.0017)
+[2025-08-29 18:42:59,061][15827] Fps is (10 sec: 7782.4, 60 sec: 10649.7, 300 sec: 12695.0). Total num frames: 9728000. Throughput: 0: 2681.4. Samples: 2426734. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:42:59,062][15827] Avg episode reward: [(0, '4.307')]
+[2025-08-29 18:43:00,958][19393] Updated weights for policy 0, policy_version 2380 (0.0022)
+[2025-08-29 18:43:04,060][15827] Fps is (10 sec: 11059.4, 60 sec: 10376.5, 300 sec: 12649.0). Total num frames: 9781248. Throughput: 0: 2683.9. Samples: 2442406. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:43:04,062][15827] Avg episode reward: [(0, '4.503')]
+[2025-08-29 18:43:04,673][19393] Updated weights for policy 0, policy_version 2390 (0.0020)
+[2025-08-29 18:43:07,847][19393] Updated weights for policy 0, policy_version 2400 (0.0016)
+[2025-08-29 18:43:09,061][15827] Fps is (10 sec: 11468.6, 60 sec: 10376.6, 300 sec: 12607.3).
Total num frames: 9842688. Throughput: 0: 2732.1. Samples: 2461126. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2025-08-29 18:43:09,062][15827] Avg episode reward: [(0, '4.495')]
+[2025-08-29 18:43:11,537][19393] Updated weights for policy 0, policy_version 2410 (0.0018)
+[2025-08-29 18:43:14,060][15827] Fps is (10 sec: 11878.4, 60 sec: 10959.9, 300 sec: 12551.8). Total num frames: 9900032. Throughput: 0: 2732.4. Samples: 2469192. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:43:14,062][15827] Avg episode reward: [(0, '4.496')]
+[2025-08-29 18:43:14,960][19393] Updated weights for policy 0, policy_version 2420 (0.0017)
+[2025-08-29 18:43:17,837][19393] Updated weights for policy 0, policy_version 2430 (0.0010)
+[2025-08-29 18:43:19,061][15827] Fps is (10 sec: 12697.7, 60 sec: 11264.0, 300 sec: 12551.8). Total num frames: 9969664. Throughput: 0: 2801.5. Samples: 2488748. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:43:19,062][15827] Avg episode reward: [(0, '4.315')]
+[2025-08-29 18:43:20,894][19393] Updated weights for policy 0, policy_version 2440 (0.0017)
+[2025-08-29 18:43:23,953][19393] Updated weights for policy 0, policy_version 2450 (0.0011)
+[2025-08-29 18:43:24,061][15827] Fps is (10 sec: 13516.6, 60 sec: 11400.5, 300 sec: 12524.0). Total num frames: 10035200. Throughput: 0: 2829.3. Samples: 2509086. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:43:24,063][15827] Avg episode reward: [(0, '4.404')]
+[2025-08-29 18:43:29,060][15827] Fps is (10 sec: 8192.1, 60 sec: 10786.2, 300 sec: 12315.8). Total num frames: 10051584. Throughput: 0: 2750.3. Samples: 2515188. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:43:29,063][15827] Avg episode reward: [(0, '4.366')]
+[2025-08-29 18:43:30,548][19393] Updated weights for policy 0, policy_version 2460 (0.0015)
+[2025-08-29 18:43:33,335][19393] Updated weights for policy 0, policy_version 2470 (0.0014)
+[2025-08-29 18:43:34,060][15827] Fps is (10 sec: 9011.3, 60 sec: 11059.3, 300 sec: 12464.5). Total num frames: 10125312. Throughput: 0: 2612.4. Samples: 2527802. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0)
+[2025-08-29 18:43:34,061][15827] Avg episode reward: [(0, '4.263')]
+[2025-08-29 18:43:36,279][19393] Updated weights for policy 0, policy_version 2480 (0.0015)
+[2025-08-29 18:43:39,061][15827] Fps is (10 sec: 14335.2, 60 sec: 11195.7, 300 sec: 12468.5). Total num frames: 10194944. Throughput: 0: 2925.7. Samples: 2549456. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:43:39,063][15827] Avg episode reward: [(0, '4.259')]
+[2025-08-29 18:43:39,089][19393] Updated weights for policy 0, policy_version 2490 (0.0013)
+[2025-08-29 18:43:39,090][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002490_10199040.pth...
+[2025-08-29 18:43:39,165][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000001780_7290880.pth
+[2025-08-29 18:43:42,008][19393] Updated weights for policy 0, policy_version 2500 (0.0016)
+[2025-08-29 18:43:44,061][15827] Fps is (10 sec: 13926.3, 60 sec: 11264.0, 300 sec: 12454.6). Total num frames: 10264576. Throughput: 0: 2953.8. Samples: 2559654.
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:43:44,062][15827] Avg episode reward: [(0, '4.462')]
+[2025-08-29 18:43:45,157][19393] Updated weights for policy 0, policy_version 2510 (0.0012)
+[2025-08-29 18:43:48,519][19393] Updated weights for policy 0, policy_version 2520 (0.0019)
+[2025-08-29 18:43:49,061][15827] Fps is (10 sec: 13107.6, 60 sec: 11264.0, 300 sec: 12413.0). Total num frames: 10326016. Throughput: 0: 3020.9. Samples: 2578346. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:43:49,063][15827] Avg episode reward: [(0, '4.308')]
+[2025-08-29 18:43:51,804][19393] Updated weights for policy 0, policy_version 2530 (0.0019)
+[2025-08-29 18:43:54,061][15827] Fps is (10 sec: 12288.0, 60 sec: 11946.7, 300 sec: 12399.1). Total num frames: 10387456. Throughput: 0: 3030.8. Samples: 2597512. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0)
+[2025-08-29 18:43:54,063][15827] Avg episode reward: [(0, '4.352')]
+[2025-08-29 18:43:55,138][19393] Updated weights for policy 0, policy_version 2540 (0.0015)
+[2025-08-29 18:43:58,255][19393] Updated weights for policy 0, policy_version 2550 (0.0018)
+[2025-08-29 18:43:59,061][15827] Fps is (10 sec: 12697.8, 60 sec: 12083.2, 300 sec: 12343.5). Total num frames: 10452992. Throughput: 0: 3047.4. Samples: 2606326. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:43:59,062][15827] Avg episode reward: [(0, '4.502')]
+[2025-08-29 18:44:04,754][15827] Fps is (10 sec: 8809.5, 60 sec: 11540.1, 300 sec: 12162.2). Total num frames: 10481664. Throughput: 0: 2802.4. Samples: 2616800. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:44:04,755][15827] Avg episode reward: [(0, '4.691')]
+[2025-08-29 18:44:04,757][19378] Saving new best policy, reward=4.691!
+[2025-08-29 18:44:04,915][19393] Updated weights for policy 0, policy_version 2560 (0.0012)
+[2025-08-29 18:44:07,901][19393] Updated weights for policy 0, policy_version 2570 (0.0014)
+[2025-08-29 18:44:09,061][15827] Fps is (10 sec: 8601.5, 60 sec: 11605.3, 300 sec: 12107.5). Total num frames: 10539008. Throughput: 0: 2803.6. Samples: 2635250. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:44:09,063][15827] Avg episode reward: [(0, '4.231')]
+[2025-08-29 18:44:11,310][19393] Updated weights for policy 0, policy_version 2580 (0.0022)
+[2025-08-29 18:44:14,060][15827] Fps is (10 sec: 12764.4, 60 sec: 11673.6, 300 sec: 12246.3). Total num frames: 10600448. Throughput: 0: 2860.8. Samples: 2643924. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:44:14,061][15827] Avg episode reward: [(0, '4.388')]
+[2025-08-29 18:44:14,450][19393] Updated weights for policy 0, policy_version 2590 (0.0012)
+[2025-08-29 18:44:17,346][19393] Updated weights for policy 0, policy_version 2600 (0.0012)
+[2025-08-29 18:44:19,061][15827] Fps is (10 sec: 13107.4, 60 sec: 11673.6, 300 sec: 12232.5). Total num frames: 10670080. Throughput: 0: 3037.6. Samples: 2664494. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:44:19,062][15827] Avg episode reward: [(0, '4.446')]
+[2025-08-29 18:44:20,443][19393] Updated weights for policy 0, policy_version 2610 (0.0014)
+[2025-08-29 18:44:23,226][19393] Updated weights for policy 0, policy_version 2620 (0.0012)
+[2025-08-29 18:44:24,061][15827] Fps is (10 sec: 13926.2, 60 sec: 11741.9, 300 sec: 12218.6). Total num frames: 10739712. Throughput: 0: 3017.4. Samples: 2685238.
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:44:24,062][15827] Avg episode reward: [(0, '4.418')] +[2025-08-29 18:44:26,099][19393] Updated weights for policy 0, policy_version 2630 (0.0012) +[2025-08-29 18:44:29,060][15827] Fps is (10 sec: 13926.6, 60 sec: 12629.4, 300 sec: 12176.9). Total num frames: 10809344. Throughput: 0: 3029.3. Samples: 2695974. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:44:29,062][15827] Avg episode reward: [(0, '4.299')] +[2025-08-29 18:44:29,209][19393] Updated weights for policy 0, policy_version 2640 (0.0015) +[2025-08-29 18:44:32,017][19393] Updated weights for policy 0, policy_version 2650 (0.0011) +[2025-08-29 18:44:34,060][15827] Fps is (10 sec: 14336.1, 60 sec: 12629.3, 300 sec: 12176.9). Total num frames: 10883072. Throughput: 0: 3078.6. Samples: 2716884. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:44:34,061][15827] Avg episode reward: [(0, '4.530')] +[2025-08-29 18:44:34,763][19393] Updated weights for policy 0, policy_version 2660 (0.0012) +[2025-08-29 18:44:40,583][15827] Fps is (10 sec: 9953.2, 60 sec: 11850.8, 300 sec: 11990.1). Total num frames: 10924032. Throughput: 0: 2803.5. Samples: 2727940. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:44:40,584][15827] Avg episode reward: [(0, '4.444')] +[2025-08-29 18:44:41,183][19393] Updated weights for policy 0, policy_version 2670 (0.0011) +[2025-08-29 18:44:44,060][15827] Fps is (10 sec: 9421.0, 60 sec: 11878.4, 300 sec: 11982.6). Total num frames: 10977280. Throughput: 0: 2893.7. Samples: 2736540. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:44:44,062][19393] Updated weights for policy 0, policy_version 2680 (0.0012) +[2025-08-29 18:44:44,062][15827] Avg episode reward: [(0, '4.252')] +[2025-08-29 18:44:46,817][19393] Updated weights for policy 0, policy_version 2690 (0.0017) +[2025-08-29 18:44:49,061][15827] Fps is (10 sec: 14494.9, 60 sec: 12015.0, 300 sec: 12149.2). Total num frames: 11046912. Throughput: 0: 3199.3. Samples: 2758550. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:44:49,063][15827] Avg episode reward: [(0, '4.443')] +[2025-08-29 18:44:49,693][19393] Updated weights for policy 0, policy_version 2700 (0.0015) +[2025-08-29 18:44:52,622][19393] Updated weights for policy 0, policy_version 2710 (0.0012) +[2025-08-29 18:44:54,060][15827] Fps is (10 sec: 13926.3, 60 sec: 12151.5, 300 sec: 12121.4). Total num frames: 11116544. Throughput: 0: 3215.1. Samples: 2779928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:44:54,062][15827] Avg episode reward: [(0, '4.306')] +[2025-08-29 18:44:55,579][19393] Updated weights for policy 0, policy_version 2720 (0.0013) +[2025-08-29 18:44:58,498][19393] Updated weights for policy 0, policy_version 2730 (0.0013) +[2025-08-29 18:44:59,061][15827] Fps is (10 sec: 14335.7, 60 sec: 12288.0, 300 sec: 12121.4). Total num frames: 11190272. Throughput: 0: 3248.7. Samples: 2790118. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:44:59,062][15827] Avg episode reward: [(0, '4.287')] +[2025-08-29 18:45:01,198][19393] Updated weights for policy 0, policy_version 2740 (0.0013) +[2025-08-29 18:45:04,029][19393] Updated weights for policy 0, policy_version 2750 (0.0014) +[2025-08-29 18:45:04,060][15827] Fps is (10 sec: 14745.6, 60 sec: 13191.5, 300 sec: 12121.4). Total num frames: 11264000. Throughput: 0: 3283.3. Samples: 2812240. 
Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:45:04,061][15827] Avg episode reward: [(0, '4.231')]
+[2025-08-29 18:45:06,987][19393] Updated weights for policy 0, policy_version 2760 (0.0013)
+[2025-08-29 18:45:09,060][15827] Fps is (10 sec: 13927.0, 60 sec: 13175.5, 300 sec: 12093.6). Total num frames: 11329536. Throughput: 0: 3283.2. Samples: 2832980. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:45:09,061][15827] Avg episode reward: [(0, '4.698')]
+[2025-08-29 18:45:09,066][19378] Saving new best policy, reward=4.698!
+[2025-08-29 18:45:09,869][19393] Updated weights for policy 0, policy_version 2770 (0.0011)
+[2025-08-29 18:45:12,758][19393] Updated weights for policy 0, policy_version 2780 (0.0011)
+[2025-08-29 18:45:16,421][15827] Fps is (10 sec: 9941.0, 60 sec: 12611.0, 300 sec: 11970.0). Total num frames: 11386880. Throughput: 0: 3112.9. Samples: 2843404. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:45:16,423][15827] Avg episode reward: [(0, '4.367')]
+[2025-08-29 18:45:19,061][15827] Fps is (10 sec: 9420.6, 60 sec: 12561.1, 300 sec: 11968.6). Total num frames: 11423744. Throughput: 0: 3005.0. Samples: 2852110. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:45:19,062][15827] Avg episode reward: [(0, '4.320')]
+[2025-08-29 18:45:19,309][19393] Updated weights for policy 0, policy_version 2790 (0.0016)
+[2025-08-29 18:45:22,558][19393] Updated weights for policy 0, policy_version 2800 (0.0016)
+[2025-08-29 18:45:24,060][15827] Fps is (10 sec: 13404.8, 60 sec: 12492.8, 300 sec: 12122.1). Total num frames: 11489280. Throughput: 0: 3309.9. Samples: 2871846. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:45:24,062][15827] Avg episode reward: [(0, '4.421')]
+[2025-08-29 18:45:25,531][19393] Updated weights for policy 0, policy_version 2810 (0.0013)
+[2025-08-29 18:45:28,318][19393] Updated weights for policy 0, policy_version 2820 (0.0012)
+[2025-08-29 18:45:29,061][15827] Fps is (10 sec: 13517.0, 60 sec: 12492.8, 300 sec: 12107.5). Total num frames: 11558912. Throughput: 0: 3240.0. Samples: 2882340. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:45:29,062][15827] Avg episode reward: [(0, '4.640')]
+[2025-08-29 18:45:31,184][19393] Updated weights for policy 0, policy_version 2830 (0.0015)
+[2025-08-29 18:45:33,869][19393] Updated weights for policy 0, policy_version 2840 (0.0014)
+[2025-08-29 18:45:34,060][15827] Fps is (10 sec: 14335.9, 60 sec: 12492.8, 300 sec: 12121.4). Total num frames: 11632640. Throughput: 0: 3240.2. Samples: 2904358. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:45:34,062][15827] Avg episode reward: [(0, '4.403')]
+[2025-08-29 18:45:36,716][19393] Updated weights for policy 0, policy_version 2850 (0.0011)
+[2025-08-29 18:45:39,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13378.5, 300 sec: 12149.2). Total num frames: 11706368. Throughput: 0: 3252.1. Samples: 2926272. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0)
+[2025-08-29 18:45:39,062][15827] Avg episode reward: [(0, '4.435')]
+[2025-08-29 18:45:39,070][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002858_11706368.pth...
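
The save at 18:45:39 writes checkpoint_p0/checkpoint_000002858_11706368.pth, and the removal of the oldest checkpoint (checkpoint_000002160_8847360.pth) follows just below, so the learner keeps only the most recent few files on disk. A minimal sketch of that keep-the-latest-N rotation is given here for reference; the save_with_rotation name, the keep_latest limit and the raw-bytes payload are illustrative assumptions, not Sample Factory's actual implementation.

import re
from pathlib import Path

CKPT_RE = re.compile(r"checkpoint_(\d+)_(\d+)\.pth$")


def save_with_rotation(ckpt_dir, policy_version, env_frames, payload, keep_latest=2):
    """Write checkpoint_<version>_<frames>.pth and prune the oldest files.

    Mirrors the save/remove pairs in the log; keep_latest and the raw-bytes
    payload are assumptions (a real trainer would torch.save a state dict).
    """
    ckpt_dir = Path(ckpt_dir)
    ckpt_dir.mkdir(parents=True, exist_ok=True)

    new_path = ckpt_dir / f"checkpoint_{policy_version:09d}_{env_frames}.pth"
    new_path.write_bytes(payload)

    # Oldest first, ordered by the policy version encoded in the file name.
    existing = sorted(
        (p for p in ckpt_dir.glob("checkpoint_*.pth") if CKPT_RE.search(p.name)),
        key=lambda p: int(CKPT_RE.search(p.name).group(1)),
    )
    for old in existing[:-keep_latest]:
        old.unlink()  # the "Removing ..." lines in the log
    return new_path

With a small keep_latest value each new save is paired with one deletion, which is the pattern visible at every checkpoint interval in this run.
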
+[2025-08-29 18:45:39,136][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002160_8847360.pth +[2025-08-29 18:45:39,524][19393] Updated weights for policy 0, policy_version 2860 (0.0014) +[2025-08-29 18:45:42,358][19393] Updated weights for policy 0, policy_version 2870 (0.0012) +[2025-08-29 18:45:44,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13380.3, 300 sec: 12163.0). Total num frames: 11780096. Throughput: 0: 3262.7. Samples: 2936938. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:45:44,062][15827] Avg episode reward: [(0, '4.445')] +[2025-08-29 18:45:45,184][19393] Updated weights for policy 0, policy_version 2880 (0.0010) +[2025-08-29 18:45:48,091][19393] Updated weights for policy 0, policy_version 2890 (0.0014) +[2025-08-29 18:45:52,251][15827] Fps is (10 sec: 10247.6, 60 sec: 12575.2, 300 sec: 12005.4). Total num frames: 11841536. Throughput: 0: 3036.7. Samples: 2958580. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:45:52,252][15827] Avg episode reward: [(0, '4.426')] +[2025-08-29 18:45:54,061][15827] Fps is (10 sec: 9011.1, 60 sec: 12561.0, 300 sec: 11982.5). Total num frames: 11870208. Throughput: 0: 2989.5. Samples: 2967506. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:45:54,062][15827] Avg episode reward: [(0, '4.455')] +[2025-08-29 18:45:54,524][19393] Updated weights for policy 0, policy_version 2900 (0.0012) +[2025-08-29 18:45:57,361][19393] Updated weights for policy 0, policy_version 2910 (0.0013) +[2025-08-29 18:45:59,060][15827] Fps is (10 sec: 15036.8, 60 sec: 12561.1, 300 sec: 12142.6). Total num frames: 11943936. Throughput: 0: 3161.8. Samples: 2978222. Policy #0 lag: (min: 0.0, avg: 1.6, max: 4.0) +[2025-08-29 18:45:59,062][15827] Avg episode reward: [(0, '4.502')] +[2025-08-29 18:46:00,146][19393] Updated weights for policy 0, policy_version 2920 (0.0014) +[2025-08-29 18:46:02,830][19393] Updated weights for policy 0, policy_version 2930 (0.0010) +[2025-08-29 18:46:04,060][15827] Fps is (10 sec: 14336.3, 60 sec: 12492.8, 300 sec: 12149.2). Total num frames: 12013568. Throughput: 0: 3299.5. Samples: 3000588. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:46:04,061][15827] Avg episode reward: [(0, '4.259')] +[2025-08-29 18:46:05,541][19393] Updated weights for policy 0, policy_version 2940 (0.0013) +[2025-08-29 18:46:08,248][19393] Updated weights for policy 0, policy_version 2950 (0.0009) +[2025-08-29 18:46:09,060][15827] Fps is (10 sec: 14745.8, 60 sec: 12697.6, 300 sec: 12163.0). Total num frames: 12091392. Throughput: 0: 3359.0. Samples: 3023002. Policy #0 lag: (min: 0.0, avg: 1.1, max: 4.0) +[2025-08-29 18:46:09,062][15827] Avg episode reward: [(0, '4.580')] +[2025-08-29 18:46:11,079][19393] Updated weights for policy 0, policy_version 2960 (0.0012) +[2025-08-29 18:46:13,922][19393] Updated weights for policy 0, policy_version 2970 (0.0016) +[2025-08-29 18:46:14,060][15827] Fps is (10 sec: 15155.0, 60 sec: 13501.9, 300 sec: 12176.9). Total num frames: 12165120. Throughput: 0: 3368.8. Samples: 3033938. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:46:14,061][15827] Avg episode reward: [(0, '4.374')] +[2025-08-29 18:46:16,738][19393] Updated weights for policy 0, policy_version 2980 (0.0013) +[2025-08-29 18:46:19,060][15827] Fps is (10 sec: 14335.8, 60 sec: 13516.8, 300 sec: 12163.0). Total num frames: 12234752. Throughput: 0: 3351.3. Samples: 3055166. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:46:19,061][15827] Avg episode reward: [(0, '4.473')] +[2025-08-29 18:46:19,886][19393] Updated weights for policy 0, policy_version 2990 (0.0016) +[2025-08-29 18:46:22,789][19393] Updated weights for policy 0, policy_version 3000 (0.0017) +[2025-08-29 18:46:24,060][15827] Fps is (10 sec: 13516.9, 60 sec: 13516.8, 300 sec: 12121.4). Total num frames: 12300288. Throughput: 0: 3318.9. Samples: 3075622. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:46:24,063][15827] Avg episode reward: [(0, '4.445')] +[2025-08-29 18:46:29,060][15827] Fps is (10 sec: 8601.6, 60 sec: 12697.6, 300 sec: 11968.7). Total num frames: 12320768. Throughput: 0: 3153.8. Samples: 3078860. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:46:29,062][15827] Avg episode reward: [(0, '4.387')] +[2025-08-29 18:46:29,387][19393] Updated weights for policy 0, policy_version 3010 (0.0014) +[2025-08-29 18:46:32,323][19393] Updated weights for policy 0, policy_version 3020 (0.0014) +[2025-08-29 18:46:34,060][15827] Fps is (10 sec: 9011.2, 60 sec: 12629.3, 300 sec: 12107.0). Total num frames: 12390400. Throughput: 0: 3253.4. Samples: 3094604. Policy #0 lag: (min: 0.0, avg: 1.5, max: 4.0) +[2025-08-29 18:46:34,062][15827] Avg episode reward: [(0, '4.182')] +[2025-08-29 18:46:35,133][19393] Updated weights for policy 0, policy_version 3030 (0.0014) +[2025-08-29 18:46:37,998][19393] Updated weights for policy 0, policy_version 3040 (0.0012) +[2025-08-29 18:46:39,061][15827] Fps is (10 sec: 14335.9, 60 sec: 12629.3, 300 sec: 12149.2). Total num frames: 12464128. Throughput: 0: 3307.6. Samples: 3116348. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:46:39,062][15827] Avg episode reward: [(0, '4.507')] +[2025-08-29 18:46:40,836][19393] Updated weights for policy 0, policy_version 3050 (0.0016) +[2025-08-29 18:46:43,695][19393] Updated weights for policy 0, policy_version 3060 (0.0014) +[2025-08-29 18:46:44,061][15827] Fps is (10 sec: 14745.5, 60 sec: 12629.3, 300 sec: 12176.9). Total num frames: 12537856. Throughput: 0: 3310.6. Samples: 3127198. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:46:44,062][15827] Avg episode reward: [(0, '4.395')] +[2025-08-29 18:46:46,574][19393] Updated weights for policy 0, policy_version 3070 (0.0012) +[2025-08-29 18:46:49,060][15827] Fps is (10 sec: 14745.7, 60 sec: 13554.8, 300 sec: 12204.7). Total num frames: 12611584. Throughput: 0: 3291.9. Samples: 3148724. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:46:49,062][15827] Avg episode reward: [(0, '4.618')] +[2025-08-29 18:46:49,391][19393] Updated weights for policy 0, policy_version 3080 (0.0018) +[2025-08-29 18:46:52,204][19393] Updated weights for policy 0, policy_version 3090 (0.0012) +[2025-08-29 18:46:54,060][15827] Fps is (10 sec: 14336.3, 60 sec: 13516.8, 300 sec: 12176.9). Total num frames: 12681216. Throughput: 0: 3269.9. Samples: 3170146. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:46:54,062][15827] Avg episode reward: [(0, '4.170')] +[2025-08-29 18:46:55,198][19393] Updated weights for policy 0, policy_version 3100 (0.0015) +[2025-08-29 18:46:58,019][19393] Updated weights for policy 0, policy_version 3110 (0.0013) +[2025-08-29 18:46:59,061][15827] Fps is (10 sec: 13925.9, 60 sec: 13448.4, 300 sec: 12176.9). Total num frames: 12750848. Throughput: 0: 3265.3. Samples: 3180876. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:46:59,063][15827] Avg episode reward: [(0, '4.565')]
+[2025-08-29 18:47:04,060][15827] Fps is (10 sec: 8601.5, 60 sec: 12561.1, 300 sec: 12024.2). Total num frames: 12767232. Throughput: 0: 3071.2. Samples: 3193370. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:47:04,062][15827] Avg episode reward: [(0, '4.576')]
+[2025-08-29 18:47:04,857][19393] Updated weights for policy 0, policy_version 3120 (0.0014)
+[2025-08-29 18:47:08,218][19393] Updated weights for policy 0, policy_version 3130 (0.0015)
+[2025-08-29 18:47:09,061][15827] Fps is (10 sec: 7782.3, 60 sec: 12287.9, 300 sec: 12169.8). Total num frames: 12828672. Throughput: 0: 2939.6. Samples: 3207906. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:47:09,063][15827] Avg episode reward: [(0, '4.267')]
+[2025-08-29 18:47:11,413][19393] Updated weights for policy 0, policy_version 3140 (0.0015)
+[2025-08-29 18:47:14,060][15827] Fps is (10 sec: 13107.1, 60 sec: 12219.7, 300 sec: 12218.6). Total num frames: 12898304. Throughput: 0: 3081.6. Samples: 3217530. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:47:14,062][15827] Avg episode reward: [(0, '4.751')]
+[2025-08-29 18:47:14,063][19378] Saving new best policy, reward=4.751!
+[2025-08-29 18:47:14,592][19393] Updated weights for policy 0, policy_version 3150 (0.0016)
+[2025-08-29 18:47:17,597][19393] Updated weights for policy 0, policy_version 3160 (0.0013)
+[2025-08-29 18:47:19,060][15827] Fps is (10 sec: 13108.0, 60 sec: 12083.2, 300 sec: 12232.5). Total num frames: 12959744. Throughput: 0: 3166.6. Samples: 3237100. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:47:19,062][15827] Avg episode reward: [(0, '4.426')]
+[2025-08-29 18:47:21,004][19393] Updated weights for policy 0, policy_version 3170 (0.0015)
+[2025-08-29 18:47:24,061][15827] Fps is (10 sec: 12287.9, 60 sec: 12014.9, 300 sec: 12260.2). Total num frames: 13021184. Throughput: 0: 3089.7. Samples: 3255386. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:47:24,062][15827] Avg episode reward: [(0, '4.482')]
+[2025-08-29 18:47:24,423][19393] Updated weights for policy 0, policy_version 3180 (0.0018)
+[2025-08-29 18:47:27,574][19393] Updated weights for policy 0, policy_version 3190 (0.0015)
+[2025-08-29 18:47:29,060][15827] Fps is (10 sec: 12697.5, 60 sec: 12765.9, 300 sec: 12288.0). Total num frames: 13086720. Throughput: 0: 3058.0. Samples: 3264810. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:47:29,068][15827] Avg episode reward: [(0, '4.401')]
+[2025-08-29 18:47:30,708][19393] Updated weights for policy 0, policy_version 3200 (0.0015)
+[2025-08-29 18:47:34,060][15827] Fps is (10 sec: 12288.1, 60 sec: 12561.1, 300 sec: 12274.1). Total num frames: 13144064. Throughput: 0: 2989.0. Samples: 3283230. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:47:34,062][15827] Avg episode reward: [(0, '4.407')]
+[2025-08-29 18:47:34,117][19393] Updated weights for policy 0, policy_version 3210 (0.0018)
+[2025-08-29 18:47:39,752][15827] Fps is (10 sec: 8045.3, 60 sec: 11675.6, 300 sec: 12120.7). Total num frames: 13172736. Throughput: 0: 2689.2. Samples: 3293018. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:47:39,753][15827] Avg episode reward: [(0, '4.305')]
+[2025-08-29 18:47:39,759][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003216_13172736.pth...
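
The "Saving new best policy, reward=4.751!" line above appears whenever the averaged episode reward exceeds the best value recorded so far in this run (4.698 earlier), so a separate best-policy snapshot is written in addition to the rolling checkpoints. Below is a small sketch of that bookkeeping; the save_fn callback, the snapshot file name and the min_delta threshold are illustrative assumptions rather than the trainer's real interface.

from typing import Callable, Optional


class BestPolicyTracker:
    """Write a separate snapshot whenever the average episode reward improves.

    Sketch of the behaviour behind the 'Saving new best policy, reward=X!'
    messages; save_fn and min_delta are illustrative assumptions.
    """

    def __init__(self, save_fn: Callable[[str], None], min_delta: float = 1e-6):
        self.save_fn = save_fn
        self.min_delta = min_delta
        self.best_reward: Optional[float] = None

    def update(self, avg_episode_reward: float) -> bool:
        improved = (
            self.best_reward is None
            or avg_episode_reward > self.best_reward + self.min_delta
        )
        if improved:
            self.best_reward = avg_episode_reward
            # Persist the snapshot, then report it in the same style as the log.
            self.save_fn(f"best_policy_reward_{avg_episode_reward:.3f}.pth")
            print(f"Saving new best policy, reward={avg_episode_reward:.3f}!")
        return improved
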
+[2025-08-29 18:47:39,843][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002490_10199040.pth +[2025-08-29 18:47:40,858][19393] Updated weights for policy 0, policy_version 3220 (0.0013) +[2025-08-29 18:47:43,824][19393] Updated weights for policy 0, policy_version 3230 (0.0014) +[2025-08-29 18:47:44,060][15827] Fps is (10 sec: 8601.6, 60 sec: 11537.1, 300 sec: 12135.3). Total num frames: 13230080. Throughput: 0: 2671.5. Samples: 3301094. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:47:44,062][15827] Avg episode reward: [(0, '4.470')] +[2025-08-29 18:47:46,982][19393] Updated weights for policy 0, policy_version 3240 (0.0015) +[2025-08-29 18:47:49,060][15827] Fps is (10 sec: 13200.9, 60 sec: 11400.6, 300 sec: 12288.0). Total num frames: 13295616. Throughput: 0: 2826.5. Samples: 3320562. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:47:49,061][15827] Avg episode reward: [(0, '4.426')] +[2025-08-29 18:47:50,069][19393] Updated weights for policy 0, policy_version 3250 (0.0013) +[2025-08-29 18:47:53,164][19393] Updated weights for policy 0, policy_version 3260 (0.0014) +[2025-08-29 18:47:54,060][15827] Fps is (10 sec: 13107.3, 60 sec: 11332.3, 300 sec: 12315.8). Total num frames: 13361152. Throughput: 0: 2943.9. Samples: 3340380. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:47:54,062][15827] Avg episode reward: [(0, '4.520')] +[2025-08-29 18:47:56,215][19393] Updated weights for policy 0, policy_version 3270 (0.0016) +[2025-08-29 18:47:59,060][15827] Fps is (10 sec: 13516.7, 60 sec: 11332.4, 300 sec: 12371.3). Total num frames: 13430784. Throughput: 0: 2956.6. Samples: 3350576. Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0) +[2025-08-29 18:47:59,062][15827] Avg episode reward: [(0, '4.450')] +[2025-08-29 18:47:59,231][19393] Updated weights for policy 0, policy_version 3280 (0.0016) +[2025-08-29 18:48:02,283][19393] Updated weights for policy 0, policy_version 3290 (0.0014) +[2025-08-29 18:48:04,061][15827] Fps is (10 sec: 13515.6, 60 sec: 12151.3, 300 sec: 12385.2). Total num frames: 13496320. Throughput: 0: 2976.0. Samples: 3371024. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:48:04,063][15827] Avg episode reward: [(0, '4.382')] +[2025-08-29 18:48:05,502][19393] Updated weights for policy 0, policy_version 3300 (0.0013) +[2025-08-29 18:48:08,561][19393] Updated weights for policy 0, policy_version 3310 (0.0014) +[2025-08-29 18:48:09,061][15827] Fps is (10 sec: 13106.9, 60 sec: 12219.8, 300 sec: 12413.0). Total num frames: 13561856. Throughput: 0: 3007.6. Samples: 3390728. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:48:09,062][15827] Avg episode reward: [(0, '4.440')] +[2025-08-29 18:48:11,611][19393] Updated weights for policy 0, policy_version 3320 (0.0012) +[2025-08-29 18:48:15,582][15827] Fps is (10 sec: 9243.8, 60 sec: 11451.5, 300 sec: 12252.6). Total num frames: 13602816. Throughput: 0: 2924.1. Samples: 3400844. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:48:15,583][15827] Avg episode reward: [(0, '4.511')] +[2025-08-29 18:48:18,250][19393] Updated weights for policy 0, policy_version 3330 (0.0015) +[2025-08-29 18:48:19,060][15827] Fps is (10 sec: 8601.8, 60 sec: 11468.8, 300 sec: 12246.4). Total num frames: 13647872. Throughput: 0: 2799.1. Samples: 3409188. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:48:19,061][15827] Avg episode reward: [(0, '4.412')] +[2025-08-29 18:48:21,387][19393] Updated weights for policy 0, policy_version 3340 (0.0014) +[2025-08-29 18:48:24,061][15827] Fps is (10 sec: 13526.8, 60 sec: 11605.3, 300 sec: 12426.8). Total num frames: 13717504. Throughput: 0: 3061.0. Samples: 3428646. Policy #0 lag: (min: 0.0, avg: 1.8, max: 3.0) +[2025-08-29 18:48:24,063][15827] Avg episode reward: [(0, '4.649')] +[2025-08-29 18:48:24,566][19393] Updated weights for policy 0, policy_version 3350 (0.0012) +[2025-08-29 18:48:27,589][19393] Updated weights for policy 0, policy_version 3360 (0.0014) +[2025-08-29 18:48:29,060][15827] Fps is (10 sec: 13107.1, 60 sec: 11537.1, 300 sec: 12385.2). Total num frames: 13778944. Throughput: 0: 3055.7. Samples: 3438602. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:48:29,062][15827] Avg episode reward: [(0, '4.415')] +[2025-08-29 18:48:30,547][19393] Updated weights for policy 0, policy_version 3370 (0.0011) +[2025-08-29 18:48:33,503][19393] Updated weights for policy 0, policy_version 3380 (0.0014) +[2025-08-29 18:48:34,060][15827] Fps is (10 sec: 13107.5, 60 sec: 11741.9, 300 sec: 12385.2). Total num frames: 13848576. Throughput: 0: 3085.3. Samples: 3459402. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:48:34,062][15827] Avg episode reward: [(0, '4.437')] +[2025-08-29 18:48:36,511][19393] Updated weights for policy 0, policy_version 3390 (0.0012) +[2025-08-29 18:48:39,060][15827] Fps is (10 sec: 13926.4, 60 sec: 12569.4, 300 sec: 12385.2). Total num frames: 13918208. Throughput: 0: 3092.9. Samples: 3479560. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:48:39,062][15827] Avg episode reward: [(0, '4.313')] +[2025-08-29 18:48:39,556][19393] Updated weights for policy 0, policy_version 3400 (0.0014) +[2025-08-29 18:48:42,431][19393] Updated weights for policy 0, policy_version 3410 (0.0013) +[2025-08-29 18:48:44,060][15827] Fps is (10 sec: 13926.2, 60 sec: 12629.3, 300 sec: 12413.0). Total num frames: 13987840. Throughput: 0: 3097.5. Samples: 3489964. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:48:44,063][15827] Avg episode reward: [(0, '4.660')] +[2025-08-29 18:48:45,499][19393] Updated weights for policy 0, policy_version 3420 (0.0015) +[2025-08-29 18:48:51,420][15827] Fps is (10 sec: 9610.8, 60 sec: 11888.7, 300 sec: 12273.2). Total num frames: 14036992. Throughput: 0: 2940.6. Samples: 3510288. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:48:51,423][15827] Avg episode reward: [(0, '4.379')] +[2025-08-29 18:48:52,043][19393] Updated weights for policy 0, policy_version 3430 (0.0013) +[2025-08-29 18:48:54,060][15827] Fps is (10 sec: 8192.0, 60 sec: 11810.1, 300 sec: 12260.2). Total num frames: 14069760. Throughput: 0: 2835.2. Samples: 3518312. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:48:54,062][15827] Avg episode reward: [(0, '4.461')] +[2025-08-29 18:48:55,206][19393] Updated weights for policy 0, policy_version 3440 (0.0015) +[2025-08-29 18:48:58,313][19393] Updated weights for policy 0, policy_version 3450 (0.0014) +[2025-08-29 18:48:59,060][15827] Fps is (10 sec: 13402.1, 60 sec: 11810.1, 300 sec: 12428.3). Total num frames: 14139392. Throughput: 0: 2924.1. Samples: 3527980. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:48:59,062][15827] Avg episode reward: [(0, '4.392')]
+[2025-08-29 18:49:01,345][19393] Updated weights for policy 0, policy_version 3460 (0.0014)
+[2025-08-29 18:49:04,060][15827] Fps is (10 sec: 13516.8, 60 sec: 11810.3, 300 sec: 12426.9). Total num frames: 14204928. Throughput: 0: 3094.3. Samples: 3548430. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:49:04,062][15827] Avg episode reward: [(0, '4.620')]
+[2025-08-29 18:49:04,411][19393] Updated weights for policy 0, policy_version 3470 (0.0013)
+[2025-08-29 18:49:07,540][19393] Updated weights for policy 0, policy_version 3480 (0.0015)
+[2025-08-29 18:49:09,060][15827] Fps is (10 sec: 13517.0, 60 sec: 11878.5, 300 sec: 12454.6). Total num frames: 14274560. Throughput: 0: 3101.6. Samples: 3568218. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:49:09,061][15827] Avg episode reward: [(0, '4.597')]
+[2025-08-29 18:49:10,607][19393] Updated weights for policy 0, policy_version 3490 (0.0011)
+[2025-08-29 18:49:13,749][19393] Updated weights for policy 0, policy_version 3500 (0.0018)
+[2025-08-29 18:49:14,060][15827] Fps is (10 sec: 13107.2, 60 sec: 12537.7, 300 sec: 12426.9). Total num frames: 14336000. Throughput: 0: 3096.6. Samples: 3577948. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0)
+[2025-08-29 18:49:14,062][15827] Avg episode reward: [(0, '4.463')]
+[2025-08-29 18:49:16,645][19393] Updated weights for policy 0, policy_version 3510 (0.0013)
+[2025-08-29 18:49:19,061][15827] Fps is (10 sec: 12287.7, 60 sec: 12492.8, 300 sec: 12399.1). Total num frames: 14397440. Throughput: 0: 3074.6. Samples: 3597760. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:49:19,062][15827] Avg episode reward: [(0, '4.504')]
+[2025-08-29 18:49:20,366][19393] Updated weights for policy 0, policy_version 3520 (0.0018)
+[2025-08-29 18:49:27,254][15827] Fps is (10 sec: 9002.9, 60 sec: 11666.9, 300 sec: 12225.1). Total num frames: 14454784. Throughput: 0: 2804.0. Samples: 3614694. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0)
+[2025-08-29 18:49:27,256][15827] Avg episode reward: [(0, '4.645')]
+[2025-08-29 18:49:27,458][19393] Updated weights for policy 0, policy_version 3530 (0.0012)
+[2025-08-29 18:49:29,061][15827] Fps is (10 sec: 7782.4, 60 sec: 11605.3, 300 sec: 12176.9). Total num frames: 14475264. Throughput: 0: 2782.3. Samples: 3615168. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:49:29,062][15827] Avg episode reward: [(0, '4.469')]
+[2025-08-29 18:49:30,849][19393] Updated weights for policy 0, policy_version 3540 (0.0017)
+[2025-08-29 18:49:34,060][15827] Fps is (10 sec: 12036.7, 60 sec: 11468.8, 300 sec: 12309.9). Total num frames: 14536704. Throughput: 0: 2834.6. Samples: 3631156. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0)
+[2025-08-29 18:49:34,062][15827] Avg episode reward: [(0, '4.297')]
+[2025-08-29 18:49:34,384][19393] Updated weights for policy 0, policy_version 3550 (0.0017)
+[2025-08-29 18:49:37,678][19393] Updated weights for policy 0, policy_version 3560 (0.0016)
+[2025-08-29 18:49:39,060][15827] Fps is (10 sec: 12288.0, 60 sec: 11332.3, 300 sec: 12274.1). Total num frames: 14598144. Throughput: 0: 2917.1. Samples: 3649582. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0)
+[2025-08-29 18:49:39,062][15827] Avg episode reward: [(0, '4.168')]
+[2025-08-29 18:49:39,069][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003564_14598144.pth...
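
Each "Fps is (10 sec: ..., 60 sec: ..., 300 sec: ...)" entry reports environment throughput over three trailing wall-clock windows, which can be derived from a history of (wall-clock time, total num frames) samples like the ones printed in these lines. The helper below is an illustrative sketch of that calculation under those assumptions; the class name, sampling cadence and max_age pruning are not the logger actually used here.

import time
from collections import deque


class WindowedFps:
    """Report frames-per-second over several trailing time windows.

    Illustrative only: the 10/60/300 second windows match the log output,
    but the class name and pruning policy are assumptions.
    """

    def __init__(self, windows=(10, 60, 300), max_age=600.0):
        self.windows = windows
        self.max_age = max_age
        self.samples = deque()  # (wall_time, total_env_frames)

    def record(self, total_frames, now=None):
        now = time.time() if now is None else now
        self.samples.append((now, total_frames))
        # Drop samples that can no longer influence any window.
        while self.samples and now - self.samples[0][0] > self.max_age:
            self.samples.popleft()

    def rates(self):
        if not self.samples:
            return {w: 0.0 for w in self.windows}
        now, latest = self.samples[-1]
        out = {}
        for w in self.windows:
            # Oldest sample still inside the window (fall back to the first one).
            past_t, past_f = next(
                ((t, f) for t, f in self.samples if now - t <= w),
                self.samples[0],
            )
            out[w] = (latest - past_f) / max(now - past_t, 1e-9)
        return out

Calling record(total_num_frames) at each report interval and then rates() would yield the three throughput figures in the same shape as the log lines above.
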
+[2025-08-29 18:49:39,149][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000002858_11706368.pth +[2025-08-29 18:49:40,981][19393] Updated weights for policy 0, policy_version 3570 (0.0013) +[2025-08-29 18:49:43,986][19393] Updated weights for policy 0, policy_version 3580 (0.0016) +[2025-08-29 18:49:44,060][15827] Fps is (10 sec: 12697.5, 60 sec: 11264.0, 300 sec: 12260.2). Total num frames: 14663680. Throughput: 0: 2915.4. Samples: 3659172. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:49:44,062][15827] Avg episode reward: [(0, '4.408')] +[2025-08-29 18:49:47,214][19393] Updated weights for policy 0, policy_version 3590 (0.0019) +[2025-08-29 18:49:49,060][15827] Fps is (10 sec: 12697.7, 60 sec: 11938.3, 300 sec: 12232.5). Total num frames: 14725120. Throughput: 0: 2891.9. Samples: 3678566. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:49:49,062][15827] Avg episode reward: [(0, '4.397')] +[2025-08-29 18:49:50,716][19393] Updated weights for policy 0, policy_version 3600 (0.0017) +[2025-08-29 18:49:53,965][19393] Updated weights for policy 0, policy_version 3610 (0.0016) +[2025-08-29 18:49:54,061][15827] Fps is (10 sec: 12287.9, 60 sec: 11946.7, 300 sec: 12190.8). Total num frames: 14786560. Throughput: 0: 2849.1. Samples: 3696428. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:49:54,070][15827] Avg episode reward: [(0, '4.462')] +[2025-08-29 18:49:57,230][19393] Updated weights for policy 0, policy_version 3620 (0.0014) +[2025-08-29 18:49:59,060][15827] Fps is (10 sec: 12288.0, 60 sec: 11810.1, 300 sec: 12149.2). Total num frames: 14848000. Throughput: 0: 2849.8. Samples: 3706190. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:49:59,062][15827] Avg episode reward: [(0, '4.356')] +[2025-08-29 18:50:04,061][15827] Fps is (10 sec: 7782.0, 60 sec: 10990.8, 300 sec: 11982.5). Total num frames: 14864384. Throughput: 0: 2608.8. Samples: 3715156. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:50:04,064][15827] Avg episode reward: [(0, '4.396')] +[2025-08-29 18:50:04,322][19393] Updated weights for policy 0, policy_version 3630 (0.0013) +[2025-08-29 18:50:08,247][19393] Updated weights for policy 0, policy_version 3640 (0.0018) +[2025-08-29 18:50:09,061][15827] Fps is (10 sec: 6963.0, 60 sec: 10717.8, 300 sec: 12065.2). Total num frames: 14917632. Throughput: 0: 2753.0. Samples: 3729788. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:50:09,064][15827] Avg episode reward: [(0, '4.451')] +[2025-08-29 18:50:12,125][19393] Updated weights for policy 0, policy_version 3650 (0.0025) +[2025-08-29 18:50:14,061][15827] Fps is (10 sec: 10240.4, 60 sec: 10513.0, 300 sec: 12010.3). Total num frames: 14966784. Throughput: 0: 2716.5. Samples: 3737412. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:50:14,063][15827] Avg episode reward: [(0, '4.359')] +[2025-08-29 18:50:16,435][19393] Updated weights for policy 0, policy_version 3660 (0.0027) +[2025-08-29 18:50:19,061][15827] Fps is (10 sec: 10240.1, 60 sec: 10376.5, 300 sec: 11968.6). Total num frames: 15020032. Throughput: 0: 2700.7. Samples: 3752686. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:50:19,065][15827] Avg episode reward: [(0, '4.528')] +[2025-08-29 18:50:20,166][19393] Updated weights for policy 0, policy_version 3670 (0.0026) +[2025-08-29 18:50:22,988][19393] Updated weights for policy 0, policy_version 3680 (0.0015) +[2025-08-29 18:50:24,060][15827] Fps is (10 sec: 12288.1, 60 sec: 11176.3, 300 sec: 11968.6). Total num frames: 15089664. Throughput: 0: 2717.9. Samples: 3771886. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:50:24,062][15827] Avg episode reward: [(0, '4.440')] +[2025-08-29 18:50:25,808][19393] Updated weights for policy 0, policy_version 3690 (0.0012) +[2025-08-29 18:50:29,061][15827] Fps is (10 sec: 13107.2, 60 sec: 11264.0, 300 sec: 11927.0). Total num frames: 15151104. Throughput: 0: 2737.4. Samples: 3782356. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:50:29,062][15827] Avg episode reward: [(0, '4.444')] +[2025-08-29 18:50:29,377][19393] Updated weights for policy 0, policy_version 3700 (0.0018) +[2025-08-29 18:50:33,244][19393] Updated weights for policy 0, policy_version 3710 (0.0022) +[2025-08-29 18:50:34,061][15827] Fps is (10 sec: 11058.9, 60 sec: 11059.1, 300 sec: 11843.7). Total num frames: 15200256. Throughput: 0: 2670.3. Samples: 3798732. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:50:34,064][15827] Avg episode reward: [(0, '4.367')] +[2025-08-29 18:50:39,062][15827] Fps is (10 sec: 6143.5, 60 sec: 10239.8, 300 sec: 11635.4). Total num frames: 15212544. Throughput: 0: 2418.6. Samples: 3805268. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:50:39,070][15827] Avg episode reward: [(0, '4.363')] +[2025-08-29 18:50:41,940][19393] Updated weights for policy 0, policy_version 3720 (0.0028) +[2025-08-29 18:50:44,061][15827] Fps is (10 sec: 5324.9, 60 sec: 9830.4, 300 sec: 11692.4). Total num frames: 15253504. Throughput: 0: 2305.2. Samples: 3809924. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:50:44,063][15827] Avg episode reward: [(0, '4.269')] +[2025-08-29 18:50:46,611][19393] Updated weights for policy 0, policy_version 3730 (0.0025) +[2025-08-29 18:50:49,061][15827] Fps is (10 sec: 8602.3, 60 sec: 9557.3, 300 sec: 11621.5). Total num frames: 15298560. Throughput: 0: 2399.1. Samples: 3823116. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:50:49,063][15827] Avg episode reward: [(0, '4.348')] +[2025-08-29 18:50:50,433][19393] Updated weights for policy 0, policy_version 3740 (0.0017) +[2025-08-29 18:50:53,943][19393] Updated weights for policy 0, policy_version 3750 (0.0019) +[2025-08-29 18:50:54,061][15827] Fps is (10 sec: 10649.6, 60 sec: 9557.3, 300 sec: 11579.9). Total num frames: 15360000. Throughput: 0: 2449.4. Samples: 3840012. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:50:54,063][15827] Avg episode reward: [(0, '4.309')] +[2025-08-29 18:50:57,404][19393] Updated weights for policy 0, policy_version 3760 (0.0017) +[2025-08-29 18:50:59,061][15827] Fps is (10 sec: 11468.8, 60 sec: 9420.8, 300 sec: 11524.3). Total num frames: 15413248. Throughput: 0: 2478.7. Samples: 3848952. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:50:59,063][15827] Avg episode reward: [(0, '4.472')] +[2025-08-29 18:51:00,931][19393] Updated weights for policy 0, policy_version 3770 (0.0014) +[2025-08-29 18:51:04,060][15827] Fps is (10 sec: 11468.9, 60 sec: 10171.8, 300 sec: 11468.8). Total num frames: 15474688. Throughput: 0: 2529.2. Samples: 3866500. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:51:04,062][15827] Avg episode reward: [(0, '4.568')] +[2025-08-29 18:51:04,375][19393] Updated weights for policy 0, policy_version 3780 (0.0014) +[2025-08-29 18:51:07,584][19393] Updated weights for policy 0, policy_version 3790 (0.0014) +[2025-08-29 18:51:09,061][15827] Fps is (10 sec: 12288.0, 60 sec: 10308.3, 300 sec: 11427.1). Total num frames: 15536128. Throughput: 0: 2512.0. Samples: 3884928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:51:09,063][15827] Avg episode reward: [(0, '4.354')] +[2025-08-29 18:51:14,749][15827] Fps is (10 sec: 8047.4, 60 sec: 9786.4, 300 sec: 11248.2). Total num frames: 15560704. Throughput: 0: 2245.0. Samples: 3884928. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:51:14,751][15827] Avg episode reward: [(0, '4.332')] +[2025-08-29 18:51:15,017][19393] Updated weights for policy 0, policy_version 3800 (0.0019) +[2025-08-29 18:51:18,275][19393] Updated weights for policy 0, policy_version 3810 (0.0013) +[2025-08-29 18:51:19,060][15827] Fps is (10 sec: 7782.5, 60 sec: 9898.7, 300 sec: 11232.8). Total num frames: 15613952. Throughput: 0: 2259.3. Samples: 3900398. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:51:19,062][15827] Avg episode reward: [(0, '4.571')] +[2025-08-29 18:51:21,995][19393] Updated weights for policy 0, policy_version 3820 (0.0017) +[2025-08-29 18:51:24,060][15827] Fps is (10 sec: 11437.3, 60 sec: 9625.6, 300 sec: 11343.8). Total num frames: 15667200. Throughput: 0: 2482.0. Samples: 3916956. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:51:24,063][15827] Avg episode reward: [(0, '4.306')] +[2025-08-29 18:51:25,630][19393] Updated weights for policy 0, policy_version 3830 (0.0016) +[2025-08-29 18:51:28,923][19393] Updated weights for policy 0, policy_version 3840 (0.0015) +[2025-08-29 18:51:29,061][15827] Fps is (10 sec: 11468.1, 60 sec: 9625.5, 300 sec: 11316.0). Total num frames: 15728640. Throughput: 0: 2586.4. Samples: 3926314. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:51:29,063][15827] Avg episode reward: [(0, '4.300')] +[2025-08-29 18:51:32,796][19393] Updated weights for policy 0, policy_version 3850 (0.0017) +[2025-08-29 18:51:34,061][15827] Fps is (10 sec: 11468.7, 60 sec: 9693.9, 300 sec: 11246.6). Total num frames: 15781888. Throughput: 0: 2671.7. Samples: 3943342. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:51:34,062][15827] Avg episode reward: [(0, '4.462')] +[2025-08-29 18:51:36,027][19393] Updated weights for policy 0, policy_version 3860 (0.0015) +[2025-08-29 18:51:38,931][19393] Updated weights for policy 0, policy_version 3870 (0.0012) +[2025-08-29 18:51:39,060][15827] Fps is (10 sec: 12288.9, 60 sec: 10649.8, 300 sec: 11232.8). Total num frames: 15851520. Throughput: 0: 2720.2. Samples: 3962418. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:51:39,061][15827] Avg episode reward: [(0, '4.436')] +[2025-08-29 18:51:39,066][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003870_15851520.pth... +[2025-08-29 18:51:39,176][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003216_13172736.pth +[2025-08-29 18:51:41,644][19393] Updated weights for policy 0, policy_version 3880 (0.0011) +[2025-08-29 18:51:44,060][15827] Fps is (10 sec: 13926.5, 60 sec: 11127.5, 300 sec: 11218.9). 
Total num frames: 15921152. Throughput: 0: 2780.1. Samples: 3974058. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:51:44,062][15827] Avg episode reward: [(0, '4.390')] +[2025-08-29 18:51:44,906][19393] Updated weights for policy 0, policy_version 3890 (0.0016) +[2025-08-29 18:51:50,584][15827] Fps is (10 sec: 8530.5, 60 sec: 10585.6, 300 sec: 11023.1). Total num frames: 15949824. Throughput: 0: 2518.1. Samples: 3983652. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:51:50,586][15827] Avg episode reward: [(0, '4.694')] +[2025-08-29 18:51:52,172][19393] Updated weights for policy 0, policy_version 3900 (0.0014) +[2025-08-29 18:51:54,061][15827] Fps is (10 sec: 7372.8, 60 sec: 10581.3, 300 sec: 10996.7). Total num frames: 15994880. Throughput: 0: 2531.5. Samples: 3998846. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:51:54,062][15827] Avg episode reward: [(0, '4.501')] +[2025-08-29 18:51:55,449][19393] Updated weights for policy 0, policy_version 3910 (0.0014) +[2025-08-29 18:51:58,820][19393] Updated weights for policy 0, policy_version 3920 (0.0011) +[2025-08-29 18:51:59,061][15827] Fps is (10 sec: 12562.7, 60 sec: 10717.8, 300 sec: 11149.4). Total num frames: 16056320. Throughput: 0: 2789.1. Samples: 4008518. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:51:59,065][15827] Avg episode reward: [(0, '4.402')] +[2025-08-29 18:52:02,471][19393] Updated weights for policy 0, policy_version 3930 (0.0013) +[2025-08-29 18:52:04,060][15827] Fps is (10 sec: 11878.5, 60 sec: 10649.6, 300 sec: 11135.6). Total num frames: 16113664. Throughput: 0: 2784.8. Samples: 4025716. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:52:04,062][15827] Avg episode reward: [(0, '4.423')] +[2025-08-29 18:52:05,810][19393] Updated weights for policy 0, policy_version 3940 (0.0021) +[2025-08-29 18:52:09,061][15827] Fps is (10 sec: 11469.3, 60 sec: 10581.3, 300 sec: 11093.9). Total num frames: 16171008. Throughput: 0: 2809.1. Samples: 4043366. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:52:09,063][15827] Avg episode reward: [(0, '4.553')] +[2025-08-29 18:52:09,547][19393] Updated weights for policy 0, policy_version 3950 (0.0017) +[2025-08-29 18:52:13,361][19393] Updated weights for policy 0, policy_version 3960 (0.0018) +[2025-08-29 18:52:14,060][15827] Fps is (10 sec: 11059.2, 60 sec: 11187.6, 300 sec: 11066.1). Total num frames: 16224256. Throughput: 0: 2786.3. Samples: 4051694. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:52:14,062][15827] Avg episode reward: [(0, '4.276')] +[2025-08-29 18:52:17,061][19393] Updated weights for policy 0, policy_version 3970 (0.0018) +[2025-08-29 18:52:19,060][15827] Fps is (10 sec: 11059.7, 60 sec: 11127.5, 300 sec: 11052.3). Total num frames: 16281600. Throughput: 0: 2771.6. Samples: 4068064. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:52:19,061][15827] Avg episode reward: [(0, '4.533')] +[2025-08-29 18:52:20,575][19393] Updated weights for policy 0, policy_version 3980 (0.0017) +[2025-08-29 18:52:26,426][15827] Fps is (10 sec: 8612.0, 60 sec: 10639.6, 300 sec: 10909.2). Total num frames: 16330752. Throughput: 0: 2415.1. Samples: 4076814. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:52:26,428][15827] Avg episode reward: [(0, '4.270')] +[2025-08-29 18:52:27,201][19393] Updated weights for policy 0, policy_version 3990 (0.0011) +[2025-08-29 18:52:29,061][15827] Fps is (10 sec: 8191.8, 60 sec: 10581.4, 300 sec: 10913.4). 
Total num frames: 16363520. Throughput: 0: 2462.5. Samples: 4084872. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:52:29,063][15827] Avg episode reward: [(0, '4.417')] +[2025-08-29 18:52:30,437][19393] Updated weights for policy 0, policy_version 4000 (0.0014) +[2025-08-29 18:52:33,476][19393] Updated weights for policy 0, policy_version 4010 (0.0016) +[2025-08-29 18:52:34,060][15827] Fps is (10 sec: 12877.1, 60 sec: 10786.1, 300 sec: 11064.3). Total num frames: 16429056. Throughput: 0: 2767.7. Samples: 4103980. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:52:34,062][15827] Avg episode reward: [(0, '4.551')] +[2025-08-29 18:52:36,707][19393] Updated weights for policy 0, policy_version 4020 (0.0015) +[2025-08-29 18:52:39,060][15827] Fps is (10 sec: 13107.5, 60 sec: 10717.8, 300 sec: 11066.1). Total num frames: 16494592. Throughput: 0: 2767.5. Samples: 4123384. Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0) +[2025-08-29 18:52:39,062][15827] Avg episode reward: [(0, '4.204')] +[2025-08-29 18:52:39,893][19393] Updated weights for policy 0, policy_version 4030 (0.0014) +[2025-08-29 18:52:43,160][19393] Updated weights for policy 0, policy_version 4040 (0.0012) +[2025-08-29 18:52:44,060][15827] Fps is (10 sec: 13107.3, 60 sec: 10649.6, 300 sec: 11066.1). Total num frames: 16560128. Throughput: 0: 2772.0. Samples: 4133258. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:52:44,061][15827] Avg episode reward: [(0, '4.444')] +[2025-08-29 18:52:46,408][19393] Updated weights for policy 0, policy_version 4050 (0.0014) +[2025-08-29 18:52:49,061][15827] Fps is (10 sec: 12287.5, 60 sec: 11417.3, 300 sec: 11038.4). Total num frames: 16617472. Throughput: 0: 2814.8. Samples: 4152382. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:52:49,063][15827] Avg episode reward: [(0, '4.343')] +[2025-08-29 18:52:49,937][19393] Updated weights for policy 0, policy_version 4060 (0.0017) +[2025-08-29 18:52:53,778][19393] Updated weights for policy 0, policy_version 4070 (0.0023) +[2025-08-29 18:52:54,060][15827] Fps is (10 sec: 11468.9, 60 sec: 11332.3, 300 sec: 10996.7). Total num frames: 16674816. Throughput: 0: 2782.0. Samples: 4168554. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:52:54,062][15827] Avg episode reward: [(0, '4.238')] +[2025-08-29 18:52:56,986][19393] Updated weights for policy 0, policy_version 4080 (0.0015) +[2025-08-29 18:53:02,255][15827] Fps is (10 sec: 9002.5, 60 sec: 10759.5, 300 sec: 10865.2). Total num frames: 16736256. Throughput: 0: 2620.3. Samples: 4177978. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:53:02,257][15827] Avg episode reward: [(0, '4.580')] +[2025-08-29 18:53:03,481][19393] Updated weights for policy 0, policy_version 4090 (0.0011) +[2025-08-29 18:53:04,061][15827] Fps is (10 sec: 8191.8, 60 sec: 10717.8, 300 sec: 10830.1). Total num frames: 16756736. Throughput: 0: 2628.3. Samples: 4186338. Policy #0 lag: (min: 0.0, avg: 1.2, max: 4.0) +[2025-08-29 18:53:04,062][15827] Avg episode reward: [(0, '4.269')] +[2025-08-29 18:53:06,200][19393] Updated weights for policy 0, policy_version 4100 (0.0014) +[2025-08-29 18:53:09,061][15827] Fps is (10 sec: 13241.4, 60 sec: 10922.7, 300 sec: 10983.9). Total num frames: 16826368. Throughput: 0: 3057.6. Samples: 4207174. 
Policy #0 lag: (min: 0.0, avg: 1.7, max: 3.0) +[2025-08-29 18:53:09,062][15827] Avg episode reward: [(0, '4.358')] +[2025-08-29 18:53:09,527][19393] Updated weights for policy 0, policy_version 4110 (0.0011) +[2025-08-29 18:53:12,941][19393] Updated weights for policy 0, policy_version 4120 (0.0017) +[2025-08-29 18:53:14,060][15827] Fps is (10 sec: 12697.9, 60 sec: 10991.0, 300 sec: 10969.0). Total num frames: 16883712. Throughput: 0: 2906.8. Samples: 4215678. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:53:14,062][15827] Avg episode reward: [(0, '4.316')] +[2025-08-29 18:53:16,061][19393] Updated weights for policy 0, policy_version 4130 (0.0016) +[2025-08-29 18:53:19,061][15827] Fps is (10 sec: 12288.1, 60 sec: 11127.4, 300 sec: 10955.1). Total num frames: 16949248. Throughput: 0: 2904.0. Samples: 4234660. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:53:19,063][15827] Avg episode reward: [(0, '4.329')] +[2025-08-29 18:53:19,406][19393] Updated weights for policy 0, policy_version 4140 (0.0015) +[2025-08-29 18:53:22,802][19393] Updated weights for policy 0, policy_version 4150 (0.0018) +[2025-08-29 18:53:24,061][15827] Fps is (10 sec: 12287.8, 60 sec: 11726.4, 300 sec: 10941.2). Total num frames: 17006592. Throughput: 0: 2873.2. Samples: 4252678. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:53:24,062][15827] Avg episode reward: [(0, '4.336')] +[2025-08-29 18:53:26,333][19393] Updated weights for policy 0, policy_version 4160 (0.0016) +[2025-08-29 18:53:29,060][15827] Fps is (10 sec: 12288.2, 60 sec: 11810.2, 300 sec: 10927.3). Total num frames: 17072128. Throughput: 0: 2850.8. Samples: 4261544. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:53:29,062][15827] Avg episode reward: [(0, '4.376')] +[2025-08-29 18:53:29,540][19393] Updated weights for policy 0, policy_version 4170 (0.0020) +[2025-08-29 18:53:32,712][19393] Updated weights for policy 0, policy_version 4180 (0.0014) +[2025-08-29 18:53:34,060][15827] Fps is (10 sec: 13107.3, 60 sec: 11810.1, 300 sec: 10913.4). Total num frames: 17137664. Throughput: 0: 2861.8. Samples: 4281162. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:53:34,062][15827] Avg episode reward: [(0, '4.387')] +[2025-08-29 18:53:39,060][15827] Fps is (10 sec: 8191.9, 60 sec: 10990.9, 300 sec: 10732.9). Total num frames: 17154048. Throughput: 0: 2671.9. Samples: 4288788. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:53:39,062][15827] Avg episode reward: [(0, '4.496')] +[2025-08-29 18:53:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004188_17154048.pth... +[2025-08-29 18:53:39,192][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003564_14598144.pth +[2025-08-29 18:53:39,528][19393] Updated weights for policy 0, policy_version 4190 (0.0011) +[2025-08-29 18:53:42,777][19393] Updated weights for policy 0, policy_version 4200 (0.0019) +[2025-08-29 18:53:44,060][15827] Fps is (10 sec: 8192.0, 60 sec: 10990.9, 300 sec: 10875.4). Total num frames: 17219584. Throughput: 0: 2883.8. Samples: 4298536. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:53:44,062][15827] Avg episode reward: [(0, '4.423')] +[2025-08-29 18:53:45,930][19393] Updated weights for policy 0, policy_version 4210 (0.0014) +[2025-08-29 18:53:49,039][19393] Updated weights for policy 0, policy_version 4220 (0.0012) +[2025-08-29 18:53:49,061][15827] Fps is (10 sec: 13107.0, 60 sec: 11127.5, 300 sec: 10899.5). Total num frames: 17285120. Throughput: 0: 2922.5. Samples: 4317852. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:53:49,064][15827] Avg episode reward: [(0, '4.442')] +[2025-08-29 18:53:52,304][19393] Updated weights for policy 0, policy_version 4230 (0.0018) +[2025-08-29 18:53:54,061][15827] Fps is (10 sec: 12697.5, 60 sec: 11195.7, 300 sec: 10871.8). Total num frames: 17346560. Throughput: 0: 2886.4. Samples: 4337062. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:53:54,062][15827] Avg episode reward: [(0, '4.417')] +[2025-08-29 18:53:55,329][19393] Updated weights for policy 0, policy_version 4240 (0.0016) +[2025-08-29 18:53:58,559][19393] Updated weights for policy 0, policy_version 4250 (0.0015) +[2025-08-29 18:53:59,060][15827] Fps is (10 sec: 12697.8, 60 sec: 11897.5, 300 sec: 10871.8). Total num frames: 17412096. Throughput: 0: 2914.7. Samples: 4346840. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:53:59,062][15827] Avg episode reward: [(0, '4.444')] +[2025-08-29 18:54:01,873][19393] Updated weights for policy 0, policy_version 4260 (0.0016) +[2025-08-29 18:54:04,061][15827] Fps is (10 sec: 13107.1, 60 sec: 12014.9, 300 sec: 10857.9). Total num frames: 17477632. Throughput: 0: 2909.7. Samples: 4365598. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:54:04,063][15827] Avg episode reward: [(0, '4.350')] +[2025-08-29 18:54:04,970][19393] Updated weights for policy 0, policy_version 4270 (0.0014) +[2025-08-29 18:54:08,300][19393] Updated weights for policy 0, policy_version 4280 (0.0016) +[2025-08-29 18:54:09,060][15827] Fps is (10 sec: 12697.6, 60 sec: 11878.4, 300 sec: 10857.9). Total num frames: 17539072. Throughput: 0: 2939.1. Samples: 4384936. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:54:09,062][15827] Avg episode reward: [(0, '4.394')] +[2025-08-29 18:54:14,060][15827] Fps is (10 sec: 8192.1, 60 sec: 11264.0, 300 sec: 10719.0). Total num frames: 17559552. Throughput: 0: 2866.7. Samples: 4390544. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:54:14,062][15827] Avg episode reward: [(0, '4.301')] +[2025-08-29 18:54:15,212][19393] Updated weights for policy 0, policy_version 4290 (0.0014) +[2025-08-29 18:54:18,429][19393] Updated weights for policy 0, policy_version 4300 (0.0017) +[2025-08-29 18:54:19,060][15827] Fps is (10 sec: 8192.1, 60 sec: 11195.8, 300 sec: 10850.4). Total num frames: 17620992. Throughput: 0: 2685.5. Samples: 4402008. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:54:19,062][15827] Avg episode reward: [(0, '4.280')] +[2025-08-29 18:54:21,708][19393] Updated weights for policy 0, policy_version 4310 (0.0015) +[2025-08-29 18:54:24,060][15827] Fps is (10 sec: 12288.1, 60 sec: 11264.0, 300 sec: 10871.8). Total num frames: 17682432. Throughput: 0: 2935.3. Samples: 4420878. 
Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:54:24,062][15827] Avg episode reward: [(0, '4.485')] +[2025-08-29 18:54:24,851][19393] Updated weights for policy 0, policy_version 4320 (0.0015) +[2025-08-29 18:54:27,964][19393] Updated weights for policy 0, policy_version 4330 (0.0016) +[2025-08-29 18:54:29,061][15827] Fps is (10 sec: 12696.9, 60 sec: 11263.9, 300 sec: 10885.6). Total num frames: 17747968. Throughput: 0: 2936.5. Samples: 4430682. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:54:29,063][15827] Avg episode reward: [(0, '4.383')] +[2025-08-29 18:54:31,209][19393] Updated weights for policy 0, policy_version 4340 (0.0014) +[2025-08-29 18:54:34,060][15827] Fps is (10 sec: 13107.3, 60 sec: 11264.0, 300 sec: 10899.5). Total num frames: 17813504. Throughput: 0: 2934.6. Samples: 4449910. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:54:34,062][15827] Avg episode reward: [(0, '4.478')] +[2025-08-29 18:54:34,480][19393] Updated weights for policy 0, policy_version 4350 (0.0020) +[2025-08-29 18:54:37,716][19393] Updated weights for policy 0, policy_version 4360 (0.0016) +[2025-08-29 18:54:39,060][15827] Fps is (10 sec: 12698.3, 60 sec: 12014.9, 300 sec: 10885.6). Total num frames: 17874944. Throughput: 0: 2927.4. Samples: 4468794. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:54:39,062][15827] Avg episode reward: [(0, '4.622')] +[2025-08-29 18:54:40,989][19393] Updated weights for policy 0, policy_version 4370 (0.0016) +[2025-08-29 18:54:44,061][15827] Fps is (10 sec: 12287.6, 60 sec: 11946.6, 300 sec: 10885.6). Total num frames: 17936384. Throughput: 0: 2921.8. Samples: 4478322. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:54:44,062][15827] Avg episode reward: [(0, '4.503')] +[2025-08-29 18:54:44,207][19393] Updated weights for policy 0, policy_version 4380 (0.0015) +[2025-08-29 18:54:49,747][15827] Fps is (10 sec: 8432.6, 60 sec: 11204.2, 300 sec: 10749.6). Total num frames: 17965056. Throughput: 0: 2671.1. Samples: 4487632. Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:54:49,749][15827] Avg episode reward: [(0, '4.626')] +[2025-08-29 18:54:50,867][19393] Updated weights for policy 0, policy_version 4390 (0.0014) +[2025-08-29 18:54:54,061][15827] Fps is (10 sec: 8192.2, 60 sec: 11195.7, 300 sec: 10746.8). Total num frames: 18018304. Throughput: 0: 2663.0. Samples: 4504772. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:54:54,062][15827] Avg episode reward: [(0, '4.377')] +[2025-08-29 18:54:54,130][19393] Updated weights for policy 0, policy_version 4400 (0.0013) +[2025-08-29 18:54:57,586][19393] Updated weights for policy 0, policy_version 4410 (0.0018) +[2025-08-29 18:54:59,060][15827] Fps is (10 sec: 12313.6, 60 sec: 11127.5, 300 sec: 10899.5). Total num frames: 18079744. Throughput: 0: 2744.4. Samples: 4514040. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:54:59,062][15827] Avg episode reward: [(0, '4.470')] +[2025-08-29 18:55:00,949][19393] Updated weights for policy 0, policy_version 4420 (0.0015) +[2025-08-29 18:55:04,061][15827] Fps is (10 sec: 11878.3, 60 sec: 10990.9, 300 sec: 10913.4). Total num frames: 18137088. Throughput: 0: 2891.4. Samples: 4532122. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:55:04,062][15827] Avg episode reward: [(0, '4.506')] +[2025-08-29 18:55:04,425][19393] Updated weights for policy 0, policy_version 4430 (0.0016) +[2025-08-29 18:55:07,704][19393] Updated weights for policy 0, policy_version 4440 (0.0016) +[2025-08-29 18:55:09,060][15827] Fps is (10 sec: 12288.1, 60 sec: 11059.2, 300 sec: 10969.0). Total num frames: 18202624. Throughput: 0: 2887.4. Samples: 4550812. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:55:09,062][15827] Avg episode reward: [(0, '4.333')] +[2025-08-29 18:55:10,939][19393] Updated weights for policy 0, policy_version 4450 (0.0019) +[2025-08-29 18:55:14,060][15827] Fps is (10 sec: 12697.9, 60 sec: 11741.9, 300 sec: 10996.7). Total num frames: 18264064. Throughput: 0: 2877.1. Samples: 4560152. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:55:14,062][15827] Avg episode reward: [(0, '4.228')] +[2025-08-29 18:55:14,146][19393] Updated weights for policy 0, policy_version 4460 (0.0015) +[2025-08-29 18:55:17,364][19393] Updated weights for policy 0, policy_version 4470 (0.0015) +[2025-08-29 18:55:19,060][15827] Fps is (10 sec: 12288.0, 60 sec: 11741.9, 300 sec: 10969.0). Total num frames: 18325504. Throughput: 0: 2866.0. Samples: 4578880. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:55:19,062][15827] Avg episode reward: [(0, '4.319')] +[2025-08-29 18:55:21,036][19393] Updated weights for policy 0, policy_version 4480 (0.0018) +[2025-08-29 18:55:25,582][15827] Fps is (10 sec: 8176.5, 60 sec: 10985.4, 300 sec: 10816.0). Total num frames: 18358272. Throughput: 0: 2545.5. Samples: 4587214. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:55:25,584][15827] Avg episode reward: [(0, '4.504')] +[2025-08-29 18:55:27,875][19393] Updated weights for policy 0, policy_version 4490 (0.0017) +[2025-08-29 18:55:29,061][15827] Fps is (10 sec: 7781.9, 60 sec: 10922.6, 300 sec: 10857.9). Total num frames: 18403328. Throughput: 0: 2575.4. Samples: 4594218. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:55:29,063][15827] Avg episode reward: [(0, '4.278')] +[2025-08-29 18:55:31,110][19393] Updated weights for policy 0, policy_version 4500 (0.0013) +[2025-08-29 18:55:34,060][15827] Fps is (10 sec: 12561.1, 60 sec: 10854.4, 300 sec: 11024.5). Total num frames: 18464768. Throughput: 0: 2841.7. Samples: 4613560. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:55:34,062][15827] Avg episode reward: [(0, '4.289')] +[2025-08-29 18:55:34,305][19393] Updated weights for policy 0, policy_version 4510 (0.0016) +[2025-08-29 18:55:37,806][19393] Updated weights for policy 0, policy_version 4520 (0.0017) +[2025-08-29 18:55:39,061][15827] Fps is (10 sec: 12288.6, 60 sec: 10854.4, 300 sec: 11093.9). Total num frames: 18526208. Throughput: 0: 2820.7. Samples: 4631706. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:55:39,062][15827] Avg episode reward: [(0, '4.602')] +[2025-08-29 18:55:39,068][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004523_18526208.pth... +[2025-08-29 18:55:39,175][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000003870_15851520.pth +[2025-08-29 18:55:41,247][19393] Updated weights for policy 0, policy_version 4530 (0.0014) +[2025-08-29 18:55:44,061][15827] Fps is (10 sec: 12287.9, 60 sec: 10854.4, 300 sec: 11149.5). 
Total num frames: 18587648. Throughput: 0: 2823.0. Samples: 4641076. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:55:44,062][15827] Avg episode reward: [(0, '4.691')] +[2025-08-29 18:55:44,375][19393] Updated weights for policy 0, policy_version 4540 (0.0011) +[2025-08-29 18:55:47,592][19393] Updated weights for policy 0, policy_version 4550 (0.0016) +[2025-08-29 18:55:49,061][15827] Fps is (10 sec: 12287.8, 60 sec: 11532.4, 300 sec: 11149.4). Total num frames: 18649088. Throughput: 0: 2849.9. Samples: 4660366. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:55:49,062][15827] Avg episode reward: [(0, '4.329')] +[2025-08-29 18:55:50,891][19393] Updated weights for policy 0, policy_version 4560 (0.0017) +[2025-08-29 18:55:54,061][15827] Fps is (10 sec: 12697.5, 60 sec: 11605.3, 300 sec: 11191.1). Total num frames: 18714624. Throughput: 0: 2848.1. Samples: 4678976. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:55:54,064][15827] Avg episode reward: [(0, '4.301')] +[2025-08-29 18:55:54,276][19393] Updated weights for policy 0, policy_version 4570 (0.0018) +[2025-08-29 18:55:57,529][19393] Updated weights for policy 0, policy_version 4580 (0.0016) +[2025-08-29 18:56:01,419][15827] Fps is (10 sec: 9280.1, 60 sec: 10969.3, 300 sec: 11061.0). Total num frames: 18763776. Throughput: 0: 2706.2. Samples: 4688316. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:56:01,421][15827] Avg episode reward: [(0, '4.334')] +[2025-08-29 18:56:04,061][15827] Fps is (10 sec: 8192.1, 60 sec: 10990.9, 300 sec: 11052.3). Total num frames: 18796544. Throughput: 0: 2595.7. Samples: 4695686. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:56:04,063][15827] Avg episode reward: [(0, '4.438')] +[2025-08-29 18:56:04,395][19393] Updated weights for policy 0, policy_version 4590 (0.0020) +[2025-08-29 18:56:08,339][19393] Updated weights for policy 0, policy_version 4600 (0.0016) +[2025-08-29 18:56:09,061][15827] Fps is (10 sec: 11256.4, 60 sec: 10786.1, 300 sec: 11175.5). Total num frames: 18849792. Throughput: 0: 2882.8. Samples: 4712554. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:56:09,065][15827] Avg episode reward: [(0, '4.339')] +[2025-08-29 18:56:12,259][19393] Updated weights for policy 0, policy_version 4610 (0.0016) +[2025-08-29 18:56:14,061][15827] Fps is (10 sec: 10240.0, 60 sec: 10581.3, 300 sec: 11135.6). Total num frames: 18898944. Throughput: 0: 2801.7. Samples: 4720294. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:56:14,063][15827] Avg episode reward: [(0, '4.443')] +[2025-08-29 18:56:15,923][19393] Updated weights for policy 0, policy_version 4620 (0.0021) +[2025-08-29 18:56:19,060][15827] Fps is (10 sec: 11059.6, 60 sec: 10581.3, 300 sec: 11163.3). Total num frames: 18960384. Throughput: 0: 2759.0. Samples: 4737716. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:56:19,062][15827] Avg episode reward: [(0, '4.249')] +[2025-08-29 18:56:19,114][19393] Updated weights for policy 0, policy_version 4630 (0.0013) +[2025-08-29 18:56:23,041][19393] Updated weights for policy 0, policy_version 4640 (0.0016) +[2025-08-29 18:56:24,061][15827] Fps is (10 sec: 11468.8, 60 sec: 11206.9, 300 sec: 11135.6). Total num frames: 19013632. Throughput: 0: 2725.2. Samples: 4754342. 
Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:56:24,062][15827] Avg episode reward: [(0, '4.492')] +[2025-08-29 18:56:26,487][19393] Updated weights for policy 0, policy_version 4650 (0.0018) +[2025-08-29 18:56:29,060][15827] Fps is (10 sec: 11468.8, 60 sec: 11195.8, 300 sec: 11163.3). Total num frames: 19075072. Throughput: 0: 2714.0. Samples: 4763208. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:56:29,062][15827] Avg episode reward: [(0, '4.501')] +[2025-08-29 18:56:29,836][19393] Updated weights for policy 0, policy_version 4660 (0.0012) +[2025-08-29 18:56:33,212][19393] Updated weights for policy 0, policy_version 4670 (0.0018) +[2025-08-29 18:56:37,249][15827] Fps is (10 sec: 9006.4, 60 sec: 10565.9, 300 sec: 11002.7). Total num frames: 19132416. Throughput: 0: 2512.1. Samples: 4781420. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:56:37,250][15827] Avg episode reward: [(0, '4.494')] +[2025-08-29 18:56:39,061][15827] Fps is (10 sec: 7782.2, 60 sec: 10444.8, 300 sec: 10955.1). Total num frames: 19152896. Throughput: 0: 2439.8. Samples: 4788768. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:56:39,064][15827] Avg episode reward: [(0, '4.466')] +[2025-08-29 18:56:40,452][19393] Updated weights for policy 0, policy_version 4680 (0.0013) +[2025-08-29 18:56:44,060][15827] Fps is (10 sec: 10824.8, 60 sec: 10308.3, 300 sec: 11095.7). Total num frames: 19206144. Throughput: 0: 2532.5. Samples: 4796306. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:56:44,062][15827] Avg episode reward: [(0, '4.523')] +[2025-08-29 18:56:44,612][19393] Updated weights for policy 0, policy_version 4690 (0.0019) +[2025-08-29 18:56:47,933][19393] Updated weights for policy 0, policy_version 4700 (0.0016) +[2025-08-29 18:56:49,061][15827] Fps is (10 sec: 11468.9, 60 sec: 10308.3, 300 sec: 11093.9). Total num frames: 19267584. Throughput: 0: 2609.3. Samples: 4813104. Policy #0 lag: (min: 0.0, avg: 1.5, max: 3.0) +[2025-08-29 18:56:49,062][15827] Avg episode reward: [(0, '4.397')] +[2025-08-29 18:56:50,953][19393] Updated weights for policy 0, policy_version 4710 (0.0012) +[2025-08-29 18:56:54,060][15827] Fps is (10 sec: 11878.3, 60 sec: 10171.8, 300 sec: 11080.1). Total num frames: 19324928. Throughput: 0: 2652.6. Samples: 4831922. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:56:54,062][15827] Avg episode reward: [(0, '4.393')] +[2025-08-29 18:56:54,637][19393] Updated weights for policy 0, policy_version 4720 (0.0016) +[2025-08-29 18:56:58,134][19393] Updated weights for policy 0, policy_version 4730 (0.0017) +[2025-08-29 18:56:59,061][15827] Fps is (10 sec: 11878.6, 60 sec: 10801.2, 300 sec: 11093.9). Total num frames: 19386368. Throughput: 0: 2657.2. Samples: 4839868. Policy #0 lag: (min: 0.0, avg: 1.6, max: 3.0) +[2025-08-29 18:56:59,062][15827] Avg episode reward: [(0, '4.449')] +[2025-08-29 18:57:01,496][19393] Updated weights for policy 0, policy_version 4740 (0.0016) +[2025-08-29 18:57:04,061][15827] Fps is (10 sec: 12287.4, 60 sec: 10854.3, 300 sec: 11107.8). Total num frames: 19447808. Throughput: 0: 2679.1. Samples: 4858278. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:57:04,065][15827] Avg episode reward: [(0, '4.389')] +[2025-08-29 18:57:05,043][19393] Updated weights for policy 0, policy_version 4750 (0.0015) +[2025-08-29 18:57:09,061][15827] Fps is (10 sec: 10649.5, 60 sec: 10717.9, 300 sec: 11080.0). Total num frames: 19492864. Throughput: 0: 2647.8. Samples: 4873494. 
Policy #0 lag: (min: 0.0, avg: 1.1, max: 3.0) +[2025-08-29 18:57:09,062][15827] Avg episode reward: [(0, '4.320')] +[2025-08-29 18:57:09,195][19393] Updated weights for policy 0, policy_version 4760 (0.0016) +[2025-08-29 18:57:14,060][15827] Fps is (10 sec: 6554.0, 60 sec: 10240.0, 300 sec: 10955.1). Total num frames: 19513344. Throughput: 0: 2522.0. Samples: 4876700. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:57:14,062][15827] Avg episode reward: [(0, '4.520')] +[2025-08-29 18:57:15,580][19393] Updated weights for policy 0, policy_version 4770 (0.0012) +[2025-08-29 18:57:18,405][19393] Updated weights for policy 0, policy_version 4780 (0.0012) +[2025-08-29 18:57:19,061][15827] Fps is (10 sec: 9420.7, 60 sec: 10444.8, 300 sec: 11127.6). Total num frames: 19587072. Throughput: 0: 2663.0. Samples: 4892764. Policy #0 lag: (min: 0.0, avg: 1.0, max: 3.0) +[2025-08-29 18:57:19,063][15827] Avg episode reward: [(0, '4.555')] +[2025-08-29 18:57:21,525][19393] Updated weights for policy 0, policy_version 4790 (0.0019) +[2025-08-29 18:57:24,060][15827] Fps is (10 sec: 12697.6, 60 sec: 10444.8, 300 sec: 11107.8). Total num frames: 19640320. Throughput: 0: 2704.9. Samples: 4910486. Policy #0 lag: (min: 0.0, avg: 1.3, max: 3.0) +[2025-08-29 18:57:24,062][15827] Avg episode reward: [(0, '4.335')] +[2025-08-29 18:57:25,678][19393] Updated weights for policy 0, policy_version 4800 (0.0017) +[2025-08-29 18:57:29,060][15827] Fps is (10 sec: 11059.5, 60 sec: 10376.5, 300 sec: 11080.0). Total num frames: 19697664. Throughput: 0: 2715.7. Samples: 4918512. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:57:29,062][15827] Avg episode reward: [(0, '4.454')] +[2025-08-29 18:57:29,103][19393] Updated weights for policy 0, policy_version 4810 (0.0016) +[2025-08-29 18:57:32,512][19393] Updated weights for policy 0, policy_version 4820 (0.0015) +[2025-08-29 18:57:34,061][15827] Fps is (10 sec: 11878.3, 60 sec: 11031.1, 300 sec: 11066.1). Total num frames: 19759104. Throughput: 0: 2754.0. Samples: 4937032. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:57:34,062][15827] Avg episode reward: [(0, '4.382')] +[2025-08-29 18:57:35,908][19393] Updated weights for policy 0, policy_version 4830 (0.0013) +[2025-08-29 18:57:39,061][15827] Fps is (10 sec: 12287.8, 60 sec: 11127.5, 300 sec: 11052.3). Total num frames: 19820544. Throughput: 0: 2741.8. Samples: 4955304. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:57:39,062][15827] Avg episode reward: [(0, '4.406')] +[2025-08-29 18:57:39,069][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004839_19820544.pth... +[2025-08-29 18:57:39,165][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004188_17154048.pth +[2025-08-29 18:57:39,298][19393] Updated weights for policy 0, policy_version 4840 (0.0013) +[2025-08-29 18:57:42,799][19393] Updated weights for policy 0, policy_version 4850 (0.0015) +[2025-08-29 18:57:44,061][15827] Fps is (10 sec: 11878.2, 60 sec: 11195.7, 300 sec: 11052.3). Total num frames: 19877888. Throughput: 0: 2761.8. Samples: 4964148. Policy #0 lag: (min: 0.0, avg: 1.4, max: 3.0) +[2025-08-29 18:57:44,062][15827] Avg episode reward: [(0, '4.303')] +[2025-08-29 18:57:49,061][15827] Fps is (10 sec: 7782.3, 60 sec: 10513.1, 300 sec: 10927.3). Total num frames: 19898368. Throughput: 0: 2612.6. Samples: 4975844. 
Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:57:49,062][15827] Avg episode reward: [(0, '4.298')] +[2025-08-29 18:57:49,732][19393] Updated weights for policy 0, policy_version 4860 (0.0014) +[2025-08-29 18:57:52,918][19393] Updated weights for policy 0, policy_version 4870 (0.0016) +[2025-08-29 18:57:54,061][15827] Fps is (10 sec: 7782.5, 60 sec: 10513.0, 300 sec: 11032.9). Total num frames: 19955712. Throughput: 0: 2584.3. Samples: 4989786. Policy #0 lag: (min: 0.0, avg: 1.2, max: 3.0) +[2025-08-29 18:57:54,063][15827] Avg episode reward: [(0, '4.506')] +[2025-08-29 18:57:57,044][19393] Updated weights for policy 0, policy_version 4880 (0.0020) +[2025-08-29 18:57:58,132][19378] Stopping Batcher_0... +[2025-08-29 18:57:58,137][19378] Loop batcher_evt_loop terminating... +[2025-08-29 18:57:58,141][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2025-08-29 18:57:58,141][15827] Component Batcher_0 stopped! +[2025-08-29 18:57:58,237][19378] Removing /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004523_18526208.pth +[2025-08-29 18:57:58,251][19378] Saving /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2025-08-29 18:57:58,403][15827] Component RolloutWorker_w8 stopped! +[2025-08-29 18:57:58,421][15827] Component RolloutWorker_w1 stopped! +[2025-08-29 18:57:58,424][19378] Stopping LearnerWorker_p0... +[2025-08-29 18:57:58,424][19378] Loop learner_proc0_evt_loop terminating... +[2025-08-29 18:57:58,424][15827] Component LearnerWorker_p0 stopped! +[2025-08-29 18:57:58,426][15827] Component RolloutWorker_w7 stopped! +[2025-08-29 18:57:58,432][15827] Component RolloutWorker_w4 stopped! +[2025-08-29 18:57:58,408][19401] Stopping RolloutWorker_w8... +[2025-08-29 18:57:58,437][15827] Component RolloutWorker_w5 stopped! +[2025-08-29 18:57:58,441][19401] Loop rollout_proc8_evt_loop terminating... +[2025-08-29 18:57:58,453][15827] Component RolloutWorker_w9 stopped! +[2025-08-29 18:57:58,433][19400] Stopping RolloutWorker_w7... +[2025-08-29 18:57:58,432][19397] Stopping RolloutWorker_w1... +[2025-08-29 18:57:58,462][19400] Loop rollout_proc7_evt_loop terminating... +[2025-08-29 18:57:58,462][19397] Loop rollout_proc1_evt_loop terminating... +[2025-08-29 18:57:58,460][15827] Component RolloutWorker_w3 stopped! +[2025-08-29 18:57:58,444][19398] Stopping RolloutWorker_w4... +[2025-08-29 18:57:58,443][19402] Stopping RolloutWorker_w5... +[2025-08-29 18:57:58,458][19403] Stopping RolloutWorker_w9... +[2025-08-29 18:57:58,475][19398] Loop rollout_proc4_evt_loop terminating... +[2025-08-29 18:57:58,476][19402] Loop rollout_proc5_evt_loop terminating... +[2025-08-29 18:57:58,479][19403] Loop rollout_proc9_evt_loop terminating... +[2025-08-29 18:57:58,488][15827] Component RolloutWorker_w0 stopped! +[2025-08-29 18:57:58,527][19393] Weights refcount: 2 0 +[2025-08-29 18:57:58,495][19394] Stopping RolloutWorker_w0... +[2025-08-29 18:57:58,537][19394] Loop rollout_proc0_evt_loop terminating... +[2025-08-29 18:57:58,579][19393] Stopping InferenceWorker_p0-w0... +[2025-08-29 18:57:58,579][19393] Loop inference_proc0-0_evt_loop terminating... +[2025-08-29 18:57:58,580][15827] Component InferenceWorker_p0-w0 stopped! +[2025-08-29 18:57:58,587][15827] Component RolloutWorker_w6 stopped! +[2025-08-29 18:57:58,590][19399] Stopping RolloutWorker_w6... 
+[2025-08-29 18:57:58,601][15827] Component RolloutWorker_w2 stopped!
+[2025-08-29 18:57:58,603][19399] Loop rollout_proc6_evt_loop terminating...
+[2025-08-29 18:57:58,604][15827] Waiting for process learner_proc0 to stop...
+[2025-08-29 18:57:58,608][19396] Stopping RolloutWorker_w2...
+[2025-08-29 18:57:58,629][19396] Loop rollout_proc2_evt_loop terminating...
+[2025-08-29 18:57:58,485][19395] Stopping RolloutWorker_w3...
+[2025-08-29 18:57:58,639][19395] Loop rollout_proc3_evt_loop terminating...
+[2025-08-29 18:58:10,820][15827] Waiting for process inference_proc0-0 to join...
+[2025-08-29 18:58:10,822][15827] Waiting for process rollout_proc0 to join...
+[2025-08-29 18:58:10,823][15827] Waiting for process rollout_proc1 to join...
+[2025-08-29 18:58:10,824][15827] Waiting for process rollout_proc2 to join...
+[2025-08-29 18:58:10,825][15827] Waiting for process rollout_proc3 to join...
+[2025-08-29 18:58:10,826][15827] Waiting for process rollout_proc4 to join...
+[2025-08-29 18:58:10,827][15827] Waiting for process rollout_proc5 to join...
+[2025-08-29 18:58:10,828][15827] Waiting for process rollout_proc6 to join...
+[2025-08-29 18:58:10,829][15827] Waiting for process rollout_proc7 to join...
+[2025-08-29 18:58:10,830][15827] Waiting for process rollout_proc8 to join...
+[2025-08-29 18:58:10,831][15827] Waiting for process rollout_proc9 to join...
+[2025-08-29 18:58:10,833][15827] Batcher 0 profile tree view:
+batching: 96.9923, releasing_batches: 0.2179
+[2025-08-29 18:58:10,835][15827] InferenceWorker_p0-w0 profile tree view:
+wait_policy: 0.0000
+ wait_policy_total: 17.7003
+update_model: 26.5437
+ weight_update: 0.0018
+one_step: 0.0047
+ handle_policy_step: 1571.9646
+ deserialize: 60.5948, stack: 8.7009, obs_to_device_normalize: 436.3278, forward: 658.4755, send_messages: 111.6453
+ prepare_outputs: 239.6883
+ to_cpu: 179.7071
+[2025-08-29 18:58:10,839][15827] Learner 0 profile tree view:
+misc: 0.0255, prepare_batch: 66.1001
+train: 223.9064
+ epoch_init: 0.0215, minibatch_init: 0.0330, losses_postprocess: 2.7224, kl_divergence: 3.1769, after_optimizer: 77.3884
+ calculate_losses: 82.4285
+ losses_init: 0.0122, forward_head: 5.8426, bptt_initial: 53.8262, tail: 3.9175, advantages_returns: 1.2866, losses: 8.5309
+ bptt: 7.9398
+ bptt_forward_core: 7.5254
+ update: 55.5816
+ clip: 5.1826
+[2025-08-29 18:58:10,841][15827] RolloutWorker_w0 profile tree view:
+wait_for_trajectories: 0.2804, enqueue_policy_requests: 43.9031, env_step: 387.9207, overhead: 32.5997, complete_rollouts: 0.8219
+save_policy_outputs: 45.3418
+ split_output_tensors: 14.6080
+[2025-08-29 18:58:10,844][15827] RolloutWorker_w9 profile tree view:
+wait_for_trajectories: 0.2721, enqueue_policy_requests: 36.4583, env_step: 371.5879, overhead: 28.0753, complete_rollouts: 0.8656
+save_policy_outputs: 43.8599
+ split_output_tensors: 17.6693
+[2025-08-29 18:58:10,847][15827] Loop Runner_EvtLoop terminating...
+[2025-08-29 18:58:10,850][15827] Runner profile tree view:
+main_loop: 1702.4607
+[2025-08-29 18:58:10,851][15827] Collected {0: 20004864}, FPS: 11750.6
+[2025-08-29 18:58:26,518][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json
+[2025-08-29 18:58:26,520][15827] Overriding arg 'num_workers' with value 1 passed from command line
+[2025-08-29 18:58:26,520][15827] Adding new argument 'no_render'=True that is not in the saved config file!
+[2025-08-29 18:58:26,521][15827] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-29 18:58:26,521][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-29 18:58:26,522][15827] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-29 18:58:26,524][15827] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file! +[2025-08-29 18:58:26,524][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-29 18:58:26,525][15827] Adding new argument 'push_to_hub'=False that is not in the saved config file! +[2025-08-29 18:58:26,526][15827] Adding new argument 'hf_repository'=None that is not in the saved config file! +[2025-08-29 18:58:26,527][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-29 18:58:26,527][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-29 18:58:26,528][15827] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-29 18:58:26,528][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-29 18:58:26,529][15827] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-29 18:58:26,933][15827] Doom resolution: 160x120, resize resolution: (128, 72) +[2025-08-29 18:58:26,960][15827] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 18:58:27,019][15827] RunningMeanStd input shape: (1,) +[2025-08-29 18:58:27,266][15827] ConvEncoder: input_channels=3 +[2025-08-29 18:58:28,061][15827] Conv encoder output size: 512 +[2025-08-29 18:58:28,064][15827] Policy head output size: 512 +[2025-08-29 18:58:29,963][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2025-08-29 18:58:34,432][15827] Num frames 100... +[2025-08-29 18:58:34,626][15827] Num frames 200... +[2025-08-29 18:58:34,961][15827] Num frames 300... +[2025-08-29 18:58:35,261][15827] Num frames 400... +[2025-08-29 18:58:35,542][15827] Num frames 500... +[2025-08-29 18:58:35,789][15827] Avg episode rewards: #0: 8.760, true rewards: #0: 5.760 +[2025-08-29 18:58:35,790][15827] Avg episode reward: 8.760, avg true_objective: 5.760 +[2025-08-29 18:58:35,846][15827] Num frames 600... +[2025-08-29 18:58:36,074][15827] Num frames 700... +[2025-08-29 18:58:36,308][15827] Num frames 800... +[2025-08-29 18:58:36,555][15827] Avg episode rewards: #0: 6.320, true rewards: #0: 4.320 +[2025-08-29 18:58:36,556][15827] Avg episode reward: 6.320, avg true_objective: 4.320 +[2025-08-29 18:58:36,683][15827] Num frames 900... +[2025-08-29 18:58:37,000][15827] Num frames 1000... +[2025-08-29 18:58:37,343][15827] Num frames 1100... +[2025-08-29 18:58:37,647][15827] Num frames 1200... +[2025-08-29 18:58:37,826][15827] Avg episode rewards: #0: 5.493, true rewards: #0: 4.160 +[2025-08-29 18:58:37,828][15827] Avg episode reward: 5.493, avg true_objective: 4.160 +[2025-08-29 18:58:37,954][15827] Num frames 1300... +[2025-08-29 18:58:38,236][15827] Num frames 1400... +[2025-08-29 18:58:38,573][15827] Num frames 1500... +[2025-08-29 18:58:38,908][15827] Num frames 1600... 
+[2025-08-29 18:58:39,306][15827] Avg episode rewards: #0: 5.490, true rewards: #0: 4.240 +[2025-08-29 18:58:39,308][15827] Avg episode reward: 5.490, avg true_objective: 4.240 +[2025-08-29 18:58:39,322][15827] Num frames 1700... +[2025-08-29 18:58:39,769][15827] Num frames 1800... +[2025-08-29 18:58:40,129][15827] Num frames 1900... +[2025-08-29 18:58:40,501][15827] Num frames 2000... +[2025-08-29 18:58:40,845][15827] Avg episode rewards: #0: 5.160, true rewards: #0: 4.160 +[2025-08-29 18:58:40,848][15827] Avg episode reward: 5.160, avg true_objective: 4.160 +[2025-08-29 18:58:40,922][15827] Num frames 2100... +[2025-08-29 18:58:41,159][15827] Num frames 2200... +[2025-08-29 18:58:41,421][15827] Num frames 2300... +[2025-08-29 18:58:41,701][15827] Num frames 2400... +[2025-08-29 18:58:41,971][15827] Avg episode rewards: #0: 4.940, true rewards: #0: 4.107 +[2025-08-29 18:58:41,972][15827] Avg episode reward: 4.940, avg true_objective: 4.107 +[2025-08-29 18:58:42,061][15827] Num frames 2500... +[2025-08-29 18:58:42,353][15827] Num frames 2600... +[2025-08-29 18:58:42,639][15827] Num frames 2700... +[2025-08-29 18:58:42,880][15827] Num frames 2800... +[2025-08-29 18:58:43,072][15827] Avg episode rewards: #0: 4.783, true rewards: #0: 4.069 +[2025-08-29 18:58:43,073][15827] Avg episode reward: 4.783, avg true_objective: 4.069 +[2025-08-29 18:58:43,219][15827] Num frames 2900... +[2025-08-29 18:58:43,468][15827] Num frames 3000... +[2025-08-29 18:58:43,725][15827] Num frames 3100... +[2025-08-29 18:58:43,973][15827] Num frames 3200... +[2025-08-29 18:58:44,026][15827] Avg episode rewards: #0: 4.750, true rewards: #0: 4.000 +[2025-08-29 18:58:44,028][15827] Avg episode reward: 4.750, avg true_objective: 4.000 +[2025-08-29 18:58:44,298][15827] Num frames 3300... +[2025-08-29 18:58:44,561][15827] Num frames 3400... +[2025-08-29 18:58:44,816][15827] Num frames 3500... +[2025-08-29 18:58:45,133][15827] Avg episode rewards: #0: 4.649, true rewards: #0: 3.982 +[2025-08-29 18:58:45,134][15827] Avg episode reward: 4.649, avg true_objective: 3.982 +[2025-08-29 18:58:45,181][15827] Num frames 3600... +[2025-08-29 18:58:45,399][15827] Num frames 3700... +[2025-08-29 18:58:45,637][15827] Num frames 3800... +[2025-08-29 18:58:45,900][15827] Num frames 3900... +[2025-08-29 18:58:46,160][15827] Num frames 4000... +[2025-08-29 18:58:46,294][15827] Avg episode rewards: #0: 4.732, true rewards: #0: 4.032 +[2025-08-29 18:58:46,295][15827] Avg episode reward: 4.732, avg true_objective: 4.032 +[2025-08-29 18:58:53,528][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-29 18:58:53,559][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-29 18:58:53,560][15827] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-29 18:58:53,561][15827] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-29 18:58:53,562][15827] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-29 18:58:53,563][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-29 18:58:53,564][15827] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-29 18:58:53,565][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! 
+[2025-08-29 18:58:53,566][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-29 18:58:53,567][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-29 18:58:53,568][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-29 18:58:53,569][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-29 18:58:53,570][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-29 18:58:53,571][15827] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-29 18:58:53,572][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-29 18:58:53,573][15827] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-29 18:58:53,594][15827] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 18:58:53,596][15827] RunningMeanStd input shape: (1,) +[2025-08-29 18:58:53,629][15827] ConvEncoder: input_channels=3 +[2025-08-29 18:58:53,687][15827] Conv encoder output size: 512 +[2025-08-29 18:58:53,688][15827] Policy head output size: 512 +[2025-08-29 18:58:53,726][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2025-08-29 18:58:54,271][15827] Num frames 100... +[2025-08-29 18:58:54,485][15827] Num frames 200... +[2025-08-29 18:58:54,638][15827] Avg episode rewards: #0: 2.560, true rewards: #0: 2.560 +[2025-08-29 18:58:54,639][15827] Avg episode reward: 2.560, avg true_objective: 2.560 +[2025-08-29 18:58:54,716][15827] Num frames 300... +[2025-08-29 18:58:54,899][15827] Num frames 400... +[2025-08-29 18:58:55,080][15827] Num frames 500... +[2025-08-29 18:58:55,292][15827] Num frames 600... +[2025-08-29 18:58:55,463][15827] Avg episode rewards: #0: 3.200, true rewards: #0: 3.200 +[2025-08-29 18:58:55,464][15827] Avg episode reward: 3.200, avg true_objective: 3.200 +[2025-08-29 18:58:55,597][15827] Num frames 700... +[2025-08-29 18:58:55,842][15827] Num frames 800... +[2025-08-29 18:58:56,068][15827] Num frames 900... +[2025-08-29 18:58:56,242][15827] Num frames 1000... +[2025-08-29 18:58:56,338][15827] Avg episode rewards: #0: 3.413, true rewards: #0: 3.413 +[2025-08-29 18:58:56,340][15827] Avg episode reward: 3.413, avg true_objective: 3.413 +[2025-08-29 18:58:56,481][15827] Num frames 1100... +[2025-08-29 18:58:56,650][15827] Num frames 1200... +[2025-08-29 18:58:56,815][15827] Num frames 1300... +[2025-08-29 18:58:56,974][15827] Num frames 1400... +[2025-08-29 18:59:00,683][15827] Avg episode rewards: #0: 3.850, true rewards: #0: 3.600 +[2025-08-29 18:59:00,685][15827] Avg episode reward: 3.850, avg true_objective: 3.600 +[2025-08-29 18:59:00,793][15827] Num frames 1500... +[2025-08-29 18:59:00,973][15827] Num frames 1600... +[2025-08-29 18:59:01,188][15827] Avg episode rewards: #0: 3.592, true rewards: #0: 3.392 +[2025-08-29 18:59:01,189][15827] Avg episode reward: 3.592, avg true_objective: 3.392 +[2025-08-29 18:59:01,198][15827] Num frames 1700... +[2025-08-29 18:59:01,373][15827] Num frames 1800... +[2025-08-29 18:59:01,558][15827] Num frames 1900... +[2025-08-29 18:59:01,798][15827] Num frames 2000... 
+[2025-08-29 18:59:02,067][15827] Avg episode rewards: #0: 3.633, true rewards: #0: 3.467 +[2025-08-29 18:59:02,069][15827] Avg episode reward: 3.633, avg true_objective: 3.467 +[2025-08-29 18:59:02,118][15827] Num frames 2100... +[2025-08-29 18:59:02,355][15827] Num frames 2200... +[2025-08-29 18:59:02,549][15827] Num frames 2300... +[2025-08-29 18:59:02,764][15827] Num frames 2400... +[2025-08-29 18:59:02,956][15827] Num frames 2500... +[2025-08-29 18:59:03,068][15827] Avg episode rewards: #0: 3.897, true rewards: #0: 3.611 +[2025-08-29 18:59:03,070][15827] Avg episode reward: 3.897, avg true_objective: 3.611 +[2025-08-29 18:59:03,308][15827] Num frames 2600... +[2025-08-29 18:59:03,502][15827] Num frames 2700... +[2025-08-29 18:59:03,702][15827] Num frames 2800... +[2025-08-29 18:59:03,933][15827] Num frames 2900... +[2025-08-29 18:59:04,014][15827] Avg episode rewards: #0: 3.890, true rewards: #0: 3.640 +[2025-08-29 18:59:04,017][15827] Avg episode reward: 3.890, avg true_objective: 3.640 +[2025-08-29 18:59:04,198][15827] Num frames 3000... +[2025-08-29 18:59:04,386][15827] Num frames 3100... +[2025-08-29 18:59:04,597][15827] Num frames 3200... +[2025-08-29 18:59:04,866][15827] Avg episode rewards: #0: 3.884, true rewards: #0: 3.662 +[2025-08-29 18:59:04,868][15827] Avg episode reward: 3.884, avg true_objective: 3.662 +[2025-08-29 18:59:04,885][15827] Num frames 3300... +[2025-08-29 18:59:05,157][15827] Num frames 3400... +[2025-08-29 18:59:05,351][15827] Num frames 3500... +[2025-08-29 18:59:05,535][15827] Num frames 3600... +[2025-08-29 18:59:05,710][15827] Num frames 3700... +[2025-08-29 18:59:05,837][15827] Avg episode rewards: #0: 4.044, true rewards: #0: 3.744 +[2025-08-29 18:59:05,838][15827] Avg episode reward: 4.044, avg true_objective: 3.744 +[2025-08-29 18:59:10,753][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-29 19:00:41,771][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-29 19:00:41,772][15827] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-29 19:00:41,773][15827] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-29 19:00:41,774][15827] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-29 19:00:41,774][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-29 19:00:41,775][15827] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-29 19:00:41,775][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-29 19:00:41,776][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-29 19:00:41,777][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-29 19:00:41,777][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-29 19:00:41,778][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-29 19:00:41,779][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-29 19:00:41,780][15827] Adding new argument 'train_script'=None that is not in the saved config file! 
+[2025-08-29 19:00:41,780][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-29 19:00:41,782][15827] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-29 19:00:41,825][15827] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 19:00:41,827][15827] RunningMeanStd input shape: (1,) +[2025-08-29 19:00:41,839][15827] ConvEncoder: input_channels=3 +[2025-08-29 19:00:41,870][15827] Conv encoder output size: 512 +[2025-08-29 19:00:41,871][15827] Policy head output size: 512 +[2025-08-29 19:00:41,891][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2025-08-29 19:00:42,317][15827] Num frames 100... +[2025-08-29 19:00:42,514][15827] Num frames 200... +[2025-08-29 19:00:42,711][15827] Num frames 300... +[2025-08-29 19:00:42,888][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-29 19:00:42,889][15827] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-29 19:00:42,925][15827] Num frames 400... +[2025-08-29 19:00:43,102][15827] Num frames 500... +[2025-08-29 19:00:43,317][15827] Num frames 600... +[2025-08-29 19:00:43,510][15827] Num frames 700... +[2025-08-29 19:00:43,697][15827] Num frames 800... +[2025-08-29 19:00:43,750][15827] Avg episode rewards: #0: 4.500, true rewards: #0: 4.000 +[2025-08-29 19:00:43,751][15827] Avg episode reward: 4.500, avg true_objective: 4.000 +[2025-08-29 19:00:43,990][15827] Num frames 900... +[2025-08-29 19:00:44,196][15827] Num frames 1000... +[2025-08-29 19:00:44,392][15827] Num frames 1100... +[2025-08-29 19:00:48,193][15827] Avg episode rewards: #0: 4.280, true rewards: #0: 3.947 +[2025-08-29 19:00:48,194][15827] Avg episode reward: 4.280, avg true_objective: 3.947 +[2025-08-29 19:00:48,231][15827] Num frames 1200... +[2025-08-29 19:00:48,436][15827] Num frames 1300... +[2025-08-29 19:00:48,618][15827] Num frames 1400... +[2025-08-29 19:00:48,808][15827] Num frames 1500... +[2025-08-29 19:00:48,984][15827] Num frames 1600... +[2025-08-29 19:00:49,096][15827] Avg episode rewards: #0: 4.580, true rewards: #0: 4.080 +[2025-08-29 19:00:49,097][15827] Avg episode reward: 4.580, avg true_objective: 4.080 +[2025-08-29 19:00:49,223][15827] Num frames 1700... +[2025-08-29 19:00:49,443][15827] Num frames 1800... +[2025-08-29 19:00:49,699][15827] Num frames 1900... +[2025-08-29 19:00:49,977][15827] Num frames 2000... +[2025-08-29 19:00:50,084][15827] Avg episode rewards: #0: 4.432, true rewards: #0: 4.032 +[2025-08-29 19:00:50,085][15827] Avg episode reward: 4.432, avg true_objective: 4.032 +[2025-08-29 19:00:50,302][15827] Num frames 2100... +[2025-08-29 19:00:50,569][15827] Num frames 2200... +[2025-08-29 19:00:50,777][15827] Num frames 2300... +[2025-08-29 19:00:50,964][15827] Num frames 2400... +[2025-08-29 19:00:51,075][15827] Avg episode rewards: #0: 4.553, true rewards: #0: 4.053 +[2025-08-29 19:00:51,076][15827] Avg episode reward: 4.553, avg true_objective: 4.053 +[2025-08-29 19:00:51,213][15827] Num frames 2500... +[2025-08-29 19:00:51,430][15827] Num frames 2600... +[2025-08-29 19:00:51,706][15827] Num frames 2700... +[2025-08-29 19:00:51,958][15827] Num frames 2800... +[2025-08-29 19:00:52,062][15827] Avg episode rewards: #0: 4.451, true rewards: #0: 4.023 +[2025-08-29 19:00:52,063][15827] Avg episode reward: 4.451, avg true_objective: 4.023 +[2025-08-29 19:00:52,335][15827] Num frames 2900... 
+[2025-08-29 19:00:52,623][15827] Num frames 3000... +[2025-08-29 19:00:52,851][15827] Num frames 3100... +[2025-08-29 19:00:53,055][15827] Num frames 3200... +[2025-08-29 19:00:53,108][15827] Avg episode rewards: #0: 4.375, true rewards: #0: 4.000 +[2025-08-29 19:00:53,109][15827] Avg episode reward: 4.375, avg true_objective: 4.000 +[2025-08-29 19:00:53,327][15827] Num frames 3300... +[2025-08-29 19:00:53,537][15827] Num frames 3400... +[2025-08-29 19:00:53,767][15827] Num frames 3500... +[2025-08-29 19:00:53,984][15827] Avg episode rewards: #0: 4.316, true rewards: #0: 3.982 +[2025-08-29 19:00:53,986][15827] Avg episode reward: 4.316, avg true_objective: 3.982 +[2025-08-29 19:00:54,026][15827] Num frames 3600... +[2025-08-29 19:00:54,276][15827] Num frames 3700... +[2025-08-29 19:00:54,534][15827] Num frames 3800... +[2025-08-29 19:00:54,815][15827] Num frames 3900... +[2025-08-29 19:00:55,101][15827] Num frames 4000... +[2025-08-29 19:00:55,153][15827] Avg episode rewards: #0: 4.300, true rewards: #0: 4.000 +[2025-08-29 19:00:55,155][15827] Avg episode reward: 4.300, avg true_objective: 4.000 +[2025-08-29 19:01:01,058][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-29 19:03:50,532][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-29 19:03:50,534][15827] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-29 19:03:50,535][15827] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-29 19:03:50,536][15827] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-29 19:03:50,536][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-29 19:03:50,537][15827] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-29 19:03:50,538][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-29 19:03:50,538][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-29 19:03:50,539][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-29 19:03:50,540][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-29 19:03:50,541][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-29 19:03:50,542][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-29 19:03:50,542][15827] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-29 19:03:50,543][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! 
+[2025-08-29 19:03:50,545][15827] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-29 19:03:50,580][15827] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 19:03:50,581][15827] RunningMeanStd input shape: (1,) +[2025-08-29 19:03:50,598][15827] ConvEncoder: input_channels=3 +[2025-08-29 19:03:50,631][15827] Conv encoder output size: 512 +[2025-08-29 19:03:50,633][15827] Policy head output size: 512 +[2025-08-29 19:03:50,668][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2025-08-29 19:03:51,206][15827] Num frames 100... +[2025-08-29 19:03:51,458][15827] Num frames 200... +[2025-08-29 19:03:51,669][15827] Num frames 300... +[2025-08-29 19:03:51,877][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-29 19:03:51,878][15827] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-29 19:03:51,913][15827] Num frames 400... +[2025-08-29 19:03:52,094][15827] Num frames 500... +[2025-08-29 19:03:52,281][15827] Num frames 600... +[2025-08-29 19:03:52,475][15827] Num frames 700... +[2025-08-29 19:03:52,659][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-29 19:03:52,660][15827] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-29 19:03:52,730][15827] Num frames 800... +[2025-08-29 19:03:52,915][15827] Num frames 900... +[2025-08-29 19:03:53,117][15827] Num frames 1000... +[2025-08-29 19:03:53,320][15827] Num frames 1100... +[2025-08-29 19:03:53,512][15827] Num frames 1200... +[2025-08-29 19:03:53,605][15827] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2025-08-29 19:03:53,607][15827] Avg episode reward: 4.387, avg true_objective: 4.053 +[2025-08-29 19:03:53,780][15827] Num frames 1300... +[2025-08-29 19:03:53,963][15827] Num frames 1400... +[2025-08-29 19:03:54,157][15827] Num frames 1500... +[2025-08-29 19:03:54,360][15827] Num frames 1600... +[2025-08-29 19:03:54,413][15827] Avg episode rewards: #0: 4.250, true rewards: #0: 4.000 +[2025-08-29 19:03:54,414][15827] Avg episode reward: 4.250, avg true_objective: 4.000 +[2025-08-29 19:03:54,630][15827] Num frames 1700... +[2025-08-29 19:03:54,865][15827] Num frames 1800... +[2025-08-29 19:03:55,049][15827] Num frames 1900... +[2025-08-29 19:03:55,269][15827] Avg episode rewards: #0: 4.168, true rewards: #0: 3.968 +[2025-08-29 19:03:55,270][15827] Avg episode reward: 4.168, avg true_objective: 3.968 +[2025-08-29 19:03:55,308][15827] Num frames 2000... +[2025-08-29 19:03:55,509][15827] Num frames 2100... +[2025-08-29 19:03:55,701][15827] Num frames 2200... +[2025-08-29 19:03:55,882][15827] Num frames 2300... +[2025-08-29 19:03:56,084][15827] Num frames 2400... +[2025-08-29 19:03:56,217][15827] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2025-08-29 19:03:56,219][15827] Avg episode reward: 4.387, avg true_objective: 4.053 +[2025-08-29 19:03:56,364][15827] Num frames 2500... +[2025-08-29 19:03:56,573][15827] Num frames 2600... +[2025-08-29 19:03:56,768][15827] Num frames 2700... +[2025-08-29 19:03:56,968][15827] Num frames 2800... +[2025-08-29 19:03:57,059][15827] Avg episode rewards: #0: 4.309, true rewards: #0: 4.023 +[2025-08-29 19:03:57,061][15827] Avg episode reward: 4.309, avg true_objective: 4.023 +[2025-08-29 19:03:57,226][15827] Num frames 2900... +[2025-08-29 19:03:57,433][15827] Num frames 3000... +[2025-08-29 19:03:57,631][15827] Num frames 3100... +[2025-08-29 19:03:57,824][15827] Num frames 3200... 
+[2025-08-29 19:03:57,943][15827] Avg episode rewards: #0: 4.415, true rewards: #0: 4.040 +[2025-08-29 19:03:57,944][15827] Avg episode reward: 4.415, avg true_objective: 4.040 +[2025-08-29 19:03:58,104][15827] Num frames 3300... +[2025-08-29 19:03:58,348][15827] Num frames 3400... +[2025-08-29 19:03:58,555][15827] Num frames 3500... +[2025-08-29 19:03:58,761][15827] Num frames 3600... +[2025-08-29 19:03:58,970][15827] Avg episode rewards: #0: 4.533, true rewards: #0: 4.089 +[2025-08-29 19:03:58,971][15827] Avg episode reward: 4.533, avg true_objective: 4.089 +[2025-08-29 19:03:59,017][15827] Num frames 3700... +[2025-08-29 19:03:59,202][15827] Num frames 3800... +[2025-08-29 19:03:59,394][15827] Num frames 3900... +[2025-08-29 19:03:59,590][15827] Num frames 4000... +[2025-08-29 19:03:59,798][15827] Avg episode rewards: #0: 4.596, true rewards: #0: 4.096 +[2025-08-29 19:03:59,799][15827] Avg episode reward: 4.596, avg true_objective: 4.096 +[2025-08-29 19:04:05,558][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4! +[2025-08-29 19:06:00,917][15827] Loading existing experiment configuration from /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/config.json +[2025-08-29 19:06:00,918][15827] Overriding arg 'num_workers' with value 1 passed from command line +[2025-08-29 19:06:00,919][15827] Adding new argument 'no_render'=True that is not in the saved config file! +[2025-08-29 19:06:00,920][15827] Adding new argument 'save_video'=True that is not in the saved config file! +[2025-08-29 19:06:00,922][15827] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file! +[2025-08-29 19:06:00,924][15827] Adding new argument 'video_name'=None that is not in the saved config file! +[2025-08-29 19:06:00,925][15827] Adding new argument 'max_num_frames'=100000 that is not in the saved config file! +[2025-08-29 19:06:00,926][15827] Adding new argument 'max_num_episodes'=10 that is not in the saved config file! +[2025-08-29 19:06:00,927][15827] Adding new argument 'push_to_hub'=True that is not in the saved config file! +[2025-08-29 19:06:00,927][15827] Adding new argument 'hf_repository'='turbo-maikol/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file! +[2025-08-29 19:06:00,928][15827] Adding new argument 'policy_index'=0 that is not in the saved config file! +[2025-08-29 19:06:00,929][15827] Adding new argument 'eval_deterministic'=False that is not in the saved config file! +[2025-08-29 19:06:00,929][15827] Adding new argument 'train_script'=None that is not in the saved config file! +[2025-08-29 19:06:00,930][15827] Adding new argument 'enjoy_script'=None that is not in the saved config file! +[2025-08-29 19:06:00,931][15827] Using frameskip 1 and render_action_repeat=4 for evaluation +[2025-08-29 19:06:00,957][15827] RunningMeanStd input shape: (3, 72, 128) +[2025-08-29 19:06:00,959][15827] RunningMeanStd input shape: (1,) +[2025-08-29 19:06:00,971][15827] ConvEncoder: input_channels=3 +[2025-08-29 19:06:01,002][15827] Conv encoder output size: 512 +[2025-08-29 19:06:01,003][15827] Policy head output size: 512 +[2025-08-29 19:06:01,045][15827] Loading state from checkpoint /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/checkpoint_p0/checkpoint_000004884_20004864.pth... +[2025-08-29 19:06:01,719][15827] Num frames 100... +[2025-08-29 19:06:01,917][15827] Num frames 200... 
+[2025-08-29 19:06:02,156][15827] Num frames 300... +[2025-08-29 19:06:02,427][15827] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840 +[2025-08-29 19:06:02,428][15827] Avg episode reward: 3.840, avg true_objective: 3.840 +[2025-08-29 19:06:02,474][15827] Num frames 400... +[2025-08-29 19:06:02,683][15827] Num frames 500... +[2025-08-29 19:06:02,881][15827] Num frames 600... +[2025-08-29 19:06:03,003][15827] Num frames 700... +[2025-08-29 19:06:03,153][15827] Num frames 800... +[2025-08-29 19:06:03,253][15827] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2025-08-29 19:06:03,254][15827] Avg episode reward: 4.660, avg true_objective: 4.160 +[2025-08-29 19:06:03,358][15827] Num frames 900... +[2025-08-29 19:06:03,509][15827] Num frames 1000... +[2025-08-29 19:06:03,679][15827] Num frames 1100... +[2025-08-29 19:06:03,794][15827] Num frames 1200... +[2025-08-29 19:06:03,867][15827] Avg episode rewards: #0: 4.387, true rewards: #0: 4.053 +[2025-08-29 19:06:03,868][15827] Avg episode reward: 4.387, avg true_objective: 4.053 +[2025-08-29 19:06:04,053][15827] Num frames 1300... +[2025-08-29 19:06:04,204][15827] Num frames 1400... +[2025-08-29 19:06:04,377][15827] Num frames 1500... +[2025-08-29 19:06:04,537][15827] Num frames 1600... +[2025-08-29 19:06:04,685][15827] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2025-08-29 19:06:04,686][15827] Avg episode reward: 4.660, avg true_objective: 4.160 +[2025-08-29 19:06:04,750][15827] Num frames 1700... +[2025-08-29 19:06:04,890][15827] Num frames 1800... +[2025-08-29 19:06:05,043][15827] Num frames 1900... +[2025-08-29 19:06:05,178][15827] Num frames 2000... +[2025-08-29 19:06:05,329][15827] Num frames 2100... +[2025-08-29 19:06:05,400][15827] Avg episode rewards: #0: 4.824, true rewards: #0: 4.224 +[2025-08-29 19:06:05,401][15827] Avg episode reward: 4.824, avg true_objective: 4.224 +[2025-08-29 19:06:05,581][15827] Num frames 2200... +[2025-08-29 19:06:05,717][15827] Num frames 2300... +[2025-08-29 19:06:05,860][15827] Num frames 2400... +[2025-08-29 19:06:06,052][15827] Avg episode rewards: #0: 4.660, true rewards: #0: 4.160 +[2025-08-29 19:06:06,053][15827] Avg episode reward: 4.660, avg true_objective: 4.160 +[2025-08-29 19:06:06,068][15827] Num frames 2500... +[2025-08-29 19:06:06,206][15827] Num frames 2600... +[2025-08-29 19:06:06,417][15827] Num frames 2700... +[2025-08-29 19:06:06,581][15827] Num frames 2800... +[2025-08-29 19:06:06,746][15827] Avg episode rewards: #0: 4.543, true rewards: #0: 4.114 +[2025-08-29 19:06:06,747][15827] Avg episode reward: 4.543, avg true_objective: 4.114 +[2025-08-29 19:06:06,795][15827] Num frames 2900... +[2025-08-29 19:06:10,630][15827] Num frames 3000... +[2025-08-29 19:06:10,769][15827] Num frames 3100... +[2025-08-29 19:06:10,957][15827] Num frames 3200... +[2025-08-29 19:06:11,107][15827] Avg episode rewards: #0: 4.455, true rewards: #0: 4.080 +[2025-08-29 19:06:11,108][15827] Avg episode reward: 4.455, avg true_objective: 4.080 +[2025-08-29 19:06:11,177][15827] Num frames 3300... +[2025-08-29 19:06:11,317][15827] Num frames 3400... +[2025-08-29 19:06:11,483][15827] Num frames 3500... +[2025-08-29 19:06:11,681][15827] Num frames 3600... +[2025-08-29 19:06:11,852][15827] Avg episode rewards: #0: 4.533, true rewards: #0: 4.089 +[2025-08-29 19:06:11,853][15827] Avg episode reward: 4.533, avg true_objective: 4.089 +[2025-08-29 19:06:11,920][15827] Num frames 3700... +[2025-08-29 19:06:12,073][15827] Num frames 3800... +[2025-08-29 19:06:12,202][15827] Num frames 3900... 
+[2025-08-29 19:06:12,483][15827] Num frames 4000...
+[2025-08-29 19:06:12,731][15827] Avg episode rewards: #0: 4.464, true rewards: #0: 4.064
+[2025-08-29 19:06:12,733][15827] Avg episode reward: 4.464, avg true_objective: 4.064
+[2025-08-29 19:06:18,499][15827] Replay video saved to /home/mique/Desktop/Code/deep-rl-class/notebooks/unit8/train_dir/default_experiment/replay.mp4!
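The evaluation runs recorded above all follow the same recipe: reload config.json from train_dir/default_experiment, override num_workers=1, add the evaluation-only arguments (no_render, save_video, max_num_episodes=10, max_num_frames=100000 and, for the upload runs, push_to_hub with hf_repository='turbo-maikol/rl_course_vizdoom_health_gathering_supreme'), restore checkpoint_000004884_20004864.pth, roll ten episodes, and write replay.mp4. The sketch below is a minimal reconstruction of that step, assuming Sample Factory 2.x with the VizDoom helpers shipped in sf_examples; the exact import paths and the parse_vizdoom_cfg wrapper are assumptions (they do not appear in this log), so treat it as an illustration rather than the exact command that produced these lines.

    # Minimal sketch of the evaluate-and-push step (assumed API, Sample Factory 2.x).
    from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
    from sample_factory.enjoy import enjoy
    from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
    from sf_examples.vizdoom.train_vizdoom import register_vizdoom_components

    def parse_vizdoom_cfg(argv=None, evaluation=False):
        # Rebuild the config object the log reconstructs from config.json plus CLI overrides.
        parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
        add_doom_env_args(parser)
        doom_override_defaults(parser)
        return parse_full_cfg(parser, argv)

    register_vizdoom_components()  # register the doom_* envs and models before loading the policy
    cfg = parse_vizdoom_cfg(
        argv=[
            "--env=doom_health_gathering_supreme",
            "--train_dir=train_dir",
            "--experiment=default_experiment",
            "--num_workers=1",
            "--no_render",
            "--save_video",
            "--max_num_episodes=10",
            "--max_num_frames=100000",
            "--push_to_hub",
            "--hf_repository=turbo-maikol/rl_course_vizdoom_health_gathering_supreme",
        ],
        evaluation=True,
    )
    # enjoy() restores the newest checkpoint, runs the requested episodes, saves replay.mp4
    # and, because --push_to_hub is set, uploads the model and video to the Hugging Face repo.
    status = enjoy(cfg)

The checkpoint name itself encodes the training progress: checkpoint_000004884_20004864.pth is policy version 4884 after 20,004,864 environment frames, matching the final "Collected {0: 20004864}" line of the training run above.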