togu6669 committed (verified)
Commit 6e2bd09 · 1 Parent(s): 9878be4

Upload folder using huggingface_hub

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ replay.mp4 filter=lfs diff=lfs merge=lfs -text
.summary/0/events.out.tfevents.1743885919.tguz-ASUS ADDED
File without changes
.summary/0/events.out.tfevents.1743885927.tguz-ASUS ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:65ee1c34cc0c05587f7fcb1f39bf15ee8c9f58d529e17914973f2a9ca5db57fb
+ size 40
.summary/0/events.out.tfevents.1743886069.tguz-ASUS ADDED
File without changes
.summary/0/events.out.tfevents.1743886088.tguz-ASUS ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7ee1fda7daa65ca7b0e8273c8150b7dde7d39e8218fd1f1288c4c4f0e40cd0bc
+ size 40
.summary/0/events.out.tfevents.1743966703.tguz-ASUS ADDED
File without changes
.summary/0/events.out.tfevents.1743966725.tguz-ASUS ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d2f6b6868c238950ee32dc71f37a9d9006ec49a76f40e08b713c4e161c994fae
+ size 40
.summary/0/events.out.tfevents.1743967250.tguz-ASUS ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:649a664593a7681e3007efbed7ba706689d0b9d01473f5645b6a92f9ab1e9083
+ size 466054
README.md ADDED
@@ -0,0 +1,56 @@
+ ---
+ library_name: sample-factory
+ tags:
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - sample-factory
+ model-index:
+ - name: APPO
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: doom_health_gathering_supreme
+       type: doom_health_gathering_supreme
+     metrics:
+     - type: mean_reward
+       value: 12.38 +/- 4.75
+       name: mean_reward
+       verified: false
+ ---
+
+ An **APPO** model trained on the **doom_health_gathering_supreme** environment.
+
+ This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory.
+ Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/
+
+
+ ## Downloading the model
+
+ After installing Sample-Factory, download the model with:
+ ```
+ python -m sample_factory.huggingface.load_from_hub -r togu6669/rl_course_vizdoom_health_gathering_supreme
+ ```
+
+
+ ## Using the model
+
+ To run the model after download, use the `enjoy` script corresponding to this environment:
+ ```
+ python -m <path.to.enjoy.module> --algo=APPO --env=doom_health_gathering_supreme --train_dir=./train_dir --experiment=rl_course_vizdoom_health_gathering_supreme
+ ```
+
+
+ You can also upload models to the Hugging Face Hub using the same script with the `--push_to_hub` flag.
+ See https://www.samplefactory.dev/10-huggingface/huggingface/ for more details.
+
+ ## Training with this model
+
+ To continue training with this model, use the `train` script corresponding to this environment:
+ ```
+ python -m <path.to.train.module> --algo=APPO --env=doom_health_gathering_supreme --train_dir=./train_dir --experiment=rl_course_vizdoom_health_gathering_supreme --restart_behavior=resume --train_for_env_steps=10000000000
+ ```
+
+ Note: you may have to adjust `--train_for_env_steps` to a suitably high number, as the experiment will resume from the step count at which it previously concluded.
+
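The card above relies on the Sample-Factory CLI helper. As a minimal alternative sketch (not part of the committed README; the `local_dir` layout is an assumption), the same repository can also be fetched with the `huggingface_hub` Python API:

```python
# Minimal sketch: download this repository with huggingface_hub instead of the
# sample_factory.huggingface.load_from_hub helper shown in the card above.
# Assumption: the files should end up under <train_dir>/<experiment> so the
# enjoy/train scripts can find them.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="togu6669/rl_course_vizdoom_health_gathering_supreme",
    local_dir="./train_dir/rl_course_vizdoom_health_gathering_supreme",
)
```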
checkpoint_p0/best_000000951_3895296_reward_26.537.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:eaec76dea9c52ebc495d5371c2fa9445ee9acd768eabeed59a6a83fe6ded4a6b
+ size 34929051
checkpoint_p0/checkpoint_000000741_3035136.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:302c35205f5097fc1a4c70b5dad5335570768a60d626dcc762db3cbf804b3dfc
+ size 34929541
checkpoint_p0/checkpoint_000000978_4005888.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0e969c26b45be6027caa26141c5d7189cc7ce8ba8a637577e76ff95757b08afd
+ size 34929541
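The three `.pth` files above are Git LFS pointers to PyTorch checkpoints. A small sketch (an assumption, not part of the commit) of how a downloaded checkpoint could be inspected; the exact dictionary keys depend on the Sample-Factory version:

```python
# Sketch: peek inside one of the checkpoints listed above after downloading it.
# weights_only=False is needed on recent PyTorch because the checkpoint stores
# plain Python values (step counters, best reward) alongside the state dicts.
import torch

ckpt = torch.load(
    "checkpoint_p0/best_000000951_3895296_reward_26.537.pth",
    map_location="cpu",
    weights_only=False,
)
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))
```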
config.json ADDED
@@ -0,0 +1,142 @@
+ {
+     "help": false,
+     "algo": "APPO",
+     "env": "doom_health_gathering_supreme",
+     "experiment": "default_experiment",
+     "train_dir": "/home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir",
+     "restart_behavior": "resume",
+     "device": "gpu",
+     "seed": null,
+     "num_policies": 1,
+     "async_rl": true,
+     "serial_mode": false,
+     "batched_sampling": false,
+     "num_batches_to_accumulate": 2,
+     "worker_num_splits": 2,
+     "policy_workers_per_policy": 1,
+     "max_policy_lag": 1000,
+     "num_workers": 8,
+     "num_envs_per_worker": 4,
+     "batch_size": 1024,
+     "num_batches_per_epoch": 1,
+     "num_epochs": 1,
+     "rollout": 32,
+     "recurrence": 32,
+     "shuffle_minibatches": false,
+     "gamma": 0.99,
+     "reward_scale": 1.0,
+     "reward_clip": 1000.0,
+     "value_bootstrap": false,
+     "normalize_returns": true,
+     "exploration_loss_coeff": 0.001,
+     "value_loss_coeff": 0.5,
+     "kl_loss_coeff": 0.0,
+     "exploration_loss": "symmetric_kl",
+     "gae_lambda": 0.95,
+     "ppo_clip_ratio": 0.1,
+     "ppo_clip_value": 0.2,
+     "with_vtrace": false,
+     "vtrace_rho": 1.0,
+     "vtrace_c": 1.0,
+     "optimizer": "adam",
+     "adam_eps": 1e-06,
+     "adam_beta1": 0.9,
+     "adam_beta2": 0.999,
+     "max_grad_norm": 4.0,
+     "learning_rate": 0.0001,
+     "lr_schedule": "constant",
+     "lr_schedule_kl_threshold": 0.008,
+     "lr_adaptive_min": 1e-06,
+     "lr_adaptive_max": 0.01,
+     "obs_subtract_mean": 0.0,
+     "obs_scale": 255.0,
+     "normalize_input": true,
+     "normalize_input_keys": null,
+     "decorrelate_experience_max_seconds": 0,
+     "decorrelate_envs_on_one_worker": true,
+     "actor_worker_gpus": [],
+     "set_workers_cpu_affinity": true,
+     "force_envs_single_thread": false,
+     "default_niceness": 0,
+     "log_to_file": true,
+     "experiment_summaries_interval": 10,
+     "flush_summaries_interval": 30,
+     "stats_avg": 100,
+     "summaries_use_frameskip": true,
+     "heartbeat_interval": 20,
+     "heartbeat_reporting_interval": 600,
+     "train_for_env_steps": 4000000,
+     "train_for_seconds": 10000000000,
+     "save_every_sec": 120,
+     "keep_checkpoints": 2,
+     "load_checkpoint_kind": "latest",
+     "save_milestones_sec": -1,
+     "save_best_every_sec": 5,
+     "save_best_metric": "reward",
+     "save_best_after": 100000,
+     "benchmark": false,
+     "encoder_mlp_layers": [
+         512,
+         512
+     ],
+     "encoder_conv_architecture": "convnet_simple",
+     "encoder_conv_mlp_layers": [
+         512
+     ],
+     "use_rnn": true,
+     "rnn_size": 512,
+     "rnn_type": "gru",
+     "rnn_num_layers": 1,
+     "decoder_mlp_layers": [],
+     "nonlinearity": "elu",
+     "policy_initialization": "orthogonal",
+     "policy_init_gain": 1.0,
+     "actor_critic_share_weights": true,
+     "adaptive_stddev": true,
+     "continuous_tanh_scale": 0.0,
+     "initial_stddev": 1.0,
+     "use_env_info_cache": false,
+     "env_gpu_actions": false,
+     "env_gpu_observations": true,
+     "env_frameskip": 4,
+     "env_framestack": 1,
+     "pixel_format": "CHW",
+     "use_record_episode_statistics": false,
+     "with_wandb": false,
+     "wandb_user": null,
+     "wandb_project": "sample_factory",
+     "wandb_group": null,
+     "wandb_job_type": "SF",
+     "wandb_tags": [],
+     "with_pbt": false,
+     "pbt_mix_policies_in_one_env": true,
+     "pbt_period_env_steps": 5000000,
+     "pbt_start_mutation": 20000000,
+     "pbt_replace_fraction": 0.3,
+     "pbt_mutation_rate": 0.15,
+     "pbt_replace_reward_gap": 0.1,
+     "pbt_replace_reward_gap_absolute": 1e-06,
+     "pbt_optimize_gamma": false,
+     "pbt_target_objective": "true_objective",
+     "pbt_perturb_min": 1.1,
+     "pbt_perturb_max": 1.5,
+     "num_agents": -1,
+     "num_humans": 0,
+     "num_bots": -1,
+     "start_bot_difficulty": null,
+     "timelimit": null,
+     "res_w": 128,
+     "res_h": 72,
+     "wide_aspect_ratio": false,
+     "eval_env_frameskip": 1,
+     "fps": 35,
+     "command_line": "--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000",
+     "cli_args": {
+         "env": "doom_health_gathering_supreme",
+         "num_workers": 8,
+         "num_envs_per_worker": 4,
+         "train_for_env_steps": 4000000
+     },
+     "git_hash": "9b997b38e3980d2faacd862b544705c166fa246f",
+     "git_repo_name": "https://github.com/togu6669/Hugging-Face-RL.git"
+ }
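A short sketch (assuming `config.json` has been downloaded next to the script; not part of the commit) of how the configuration above can be inspected programmatically; the keys come directly from the JSON itself:

```python
# Sketch: load the training configuration recorded above and print a few of
# the hyperparameters that defined this run.
import json

with open("config.json") as f:
    cfg = json.load(f)

for key in ("algo", "env", "batch_size", "rollout", "learning_rate", "train_for_env_steps"):
    print(f"{key}: {cfg[key]}")
```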
git.diff ADDED
@@ -0,0 +1,125 @@
+ diff --git a/Python/PPOVizDoom.py b/Python/PPOVizDoom.py
+ index 29087d6..6a61571 100644
+ --- a/Python/PPOVizDoom.py
+ +++ b/Python/PPOVizDoom.py
+ @@ -40,62 +40,62 @@ def parse_vizdoom_cfg(argv=None, evaluation=False):
+     return final_cfg
+
+
+ -## Start the training, this should take around 15 minutes
+ -register_vizdoom_components()
+ -
+ -# The scenario we train on today is health gathering
+ -# other scenarios include "doom_basic", "doom_two_colors_easy", "doom_dm", "doom_dwango5", "doom_my_way_home", "doom_deadly_corridor", "doom_defend_the_center", "doom_defend_the_line"
+ -env = "doom_health_gathering_supreme"
+ -cfg = parse_vizdoom_cfg(
+ -    argv=[f"--env={env}", "--num_workers=8", "--num_envs_per_worker=4", "--train_for_env_steps=4000000"]
+ -)
+ -
+ -status = run_rl(cfg)
+ -
+ -
+ -from sample_factory.enjoy import enjoy
+ -
+ -cfg = parse_vizdoom_cfg(
+ -    argv=[f"--env={env}", "--num_workers=1", "--save_video", "--no_render", "--max_num_episodes=10"], evaluation=True
+ -)
+ -status = enjoy(cfg)
+ -
+ -
+ -# from base64 import b64encode
+ -# from IPython.display import HTML
+ -
+ -# mp4 = open("/content/train_dir/default_experiment/replay.mp4", "rb").read()
+ -# data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
+ -# HTML(
+ -# """
+ -# <video width=640 controls>
+ -# <source src="%s" type="video/mp4">
+ -# </video>
+ -# """
+ -# % data_url
+ -# )
+ -
+ -from huggingface_hub import notebook_login
+ -notebook_login()
+ -
+ -# !git config --global credential.helper store
+ -
+ -from sample_factory.enjoy import enjoy
+ -
+ -hf_username = "ThomasSimonini" # insert your HuggingFace username here
+ -
+ -cfg = parse_vizdoom_cfg(
+ -    argv=[
+ -        f"--env={env}",
+ -        "--num_workers=1",
+ -        "--save_video",
+ -        "--no_render",
+ -        "--max_num_episodes=10",
+ -        "--max_num_frames=100000",
+ -        "--push_to_hub",
+ -        f"--hf_repository={hf_username}/rl_course_vizdoom_health_gathering_supreme",
+ -    ],
+ -    evaluation=True,
+ -)
+ -status = enjoy(cfg)
+ +if __name__ == '__main__':
+ +    ## Start the training, this should take around 15 minutes
+ +    register_vizdoom_components()
+ +
+ +    # The scenario we train on today is health gathering
+ +    # other scenarios include "doom_basic", "doom_two_colors_easy", "doom_dm", "doom_dwango5", "doom_my_way_home", "doom_deadly_corridor", "doom_defend_the_center", "doom_defend_the_line"
+ +    env = "doom_health_gathering_supreme"
+ +    cfg = parse_vizdoom_cfg(
+ +        argv=[f"--env={env}", "--num_workers=8", "--num_envs_per_worker=4", "--train_for_env_steps=4000000"]
+ +    )
+ +    status = run_rl(cfg)
+ +
+ +
+ +    from sample_factory.enjoy import enjoy
+ +
+ +    cfg = parse_vizdoom_cfg(
+ +        argv=[f"--env={env}", "--num_workers=1", "--save_video", "--no_render", "--max_num_episodes=10"], evaluation=True
+ +    )
+ +    status = enjoy(cfg)
+ +
+ +
+ +    # from base64 import b64encode
+ +    # from IPython.display import HTML
+ +
+ +    # mp4 = open("/content/train_dir/default_experiment/replay.mp4", "rb").read()
+ +    # data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
+ +    # HTML(
+ +    # """
+ +    # <video width=640 controls>
+ +    # <source src="%s" type="video/mp4">
+ +    # </video>
+ +    # """
+ +    # % data_url
+ +    # )
+ +
+ +    from huggingface_hub import notebook_login
+ +    notebook_login()
+ +
+ +    # !git config --global credential.helper store
+ +
+ +    from sample_factory.enjoy import enjoy
+ +
+ +    hf_username = "togu6669" # insert your HuggingFace username here
+ +
+ +    cfg = parse_vizdoom_cfg(
+ +        argv=[
+ +            f"--env={env}",
+ +            "--num_workers=1",
+ +            "--save_video",
+ +            "--no_render",
+ +            "--max_num_episodes=10",
+ +            "--max_num_frames=100000",
+ +            "--push_to_hub",
+ +            f"--hf_repository={hf_username}/rl_course_vizdoom_health_gathering_supreme",
+ +        ],
+ +        evaluation=True,
+ +    )
+ +    status = enjoy(cfg)
+
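The diff above wraps the training and evaluation driver in an `if __name__ == '__main__':` guard. A likely motivation (an assumption; the commit does not state it) is that Sample-Factory launches rollout, inference, and learner workers as separate processes, and with spawn-based multiprocessing each child re-imports the main module, so unguarded top-level code would run again in every worker. A minimal illustration with a hypothetical worker function:

```python
# Minimal illustration (hypothetical, not from the repo): why the __main__
# guard matters when child processes re-import the main module.
import multiprocessing as mp

def rollout_worker(worker_id: int) -> None:
    print(f"worker {worker_id} collecting experience")

if __name__ == "__main__":
    # Without this guard, spawn-based start methods would re-execute the
    # process-launching code in every child, spawning workers recursively.
    procs = [mp.Process(target=rollout_worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```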
replay.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c6654a5248c11972a00859c9e17904a952899c319b0049cd593909923780071c
3
+ size 23803452
sf_log.txt ADDED
@@ -0,0 +1,654 @@
1
+ [2025-04-06 21:21:03,390][29458] Saving configuration to /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/config.json...
2
+ [2025-04-06 21:21:03,429][29458] Rollout worker 0 uses device cpu
3
+ [2025-04-06 21:21:03,429][29458] Rollout worker 1 uses device cpu
4
+ [2025-04-06 21:21:03,429][29458] Rollout worker 2 uses device cpu
5
+ [2025-04-06 21:21:03,430][29458] Rollout worker 3 uses device cpu
6
+ [2025-04-06 21:21:03,430][29458] Rollout worker 4 uses device cpu
7
+ [2025-04-06 21:21:03,430][29458] Rollout worker 5 uses device cpu
8
+ [2025-04-06 21:21:03,430][29458] Rollout worker 6 uses device cpu
9
+ [2025-04-06 21:21:03,431][29458] Rollout worker 7 uses device cpu
10
+ [2025-04-06 21:21:03,542][29458] Using GPUs [0] for process 0 (actually maps to GPUs [0])
11
+ [2025-04-06 21:21:03,542][29458] InferenceWorker_p0-w0: min num requests: 2
12
+ [2025-04-06 21:21:03,595][29458] Starting all processes...
13
+ [2025-04-06 21:21:03,596][29458] Starting process learner_proc0
14
+ [2025-04-06 21:21:12,041][29458] Starting all processes...
15
+ [2025-04-06 21:21:12,055][29697] Using GPUs [0] for process 0 (actually maps to GPUs [0])
16
+ [2025-04-06 21:21:12,056][29697] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
17
+ [2025-04-06 21:21:12,068][29458] Starting process inference_proc0-0
18
+ [2025-04-06 21:21:12,068][29458] Starting process rollout_proc0
19
+ [2025-04-06 21:21:12,068][29458] Starting process rollout_proc1
20
+ [2025-04-06 21:21:12,071][29458] Starting process rollout_proc6
21
+ [2025-04-06 21:21:12,078][29697] Num visible devices: 1
22
+ [2025-04-06 21:21:12,086][29697] Starting seed is not provided
23
+ [2025-04-06 21:21:12,087][29697] Using GPUs [0] for process 0 (actually maps to GPUs [0])
24
+ [2025-04-06 21:21:12,088][29697] Initializing actor-critic model on device cuda:0
25
+ [2025-04-06 21:21:12,090][29697] RunningMeanStd input shape: (3, 72, 128)
26
+ [2025-04-06 21:21:12,095][29697] RunningMeanStd input shape: (1,)
27
+ [2025-04-06 21:21:12,069][29458] Starting process rollout_proc3
28
+ [2025-04-06 21:21:12,069][29458] Starting process rollout_proc4
29
+ [2025-04-06 21:21:12,070][29458] Starting process rollout_proc5
30
+ [2025-04-06 21:21:12,068][29458] Starting process rollout_proc2
31
+ [2025-04-06 21:21:12,071][29458] Starting process rollout_proc7
32
+ [2025-04-06 21:21:12,144][29697] ConvEncoder: input_channels=3
33
+ [2025-04-06 21:21:12,576][29697] Conv encoder output size: 512
34
+ [2025-04-06 21:21:12,577][29697] Policy head output size: 512
35
+ [2025-04-06 21:21:12,610][29697] Created Actor Critic model with architecture:
36
+ [2025-04-06 21:21:12,611][29697] ActorCriticSharedWeights(
37
+ (obs_normalizer): ObservationNormalizer(
38
+ (running_mean_std): RunningMeanStdDictInPlace(
39
+ (running_mean_std): ModuleDict(
40
+ (obs): RunningMeanStdInPlace()
41
+ )
42
+ )
43
+ )
44
+ (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
45
+ (encoder): VizdoomEncoder(
46
+ (basic_encoder): ConvEncoder(
47
+ (enc): RecursiveScriptModule(
48
+ original_name=ConvEncoderImpl
49
+ (conv_head): RecursiveScriptModule(
50
+ original_name=Sequential
51
+ (0): RecursiveScriptModule(original_name=Conv2d)
52
+ (1): RecursiveScriptModule(original_name=ELU)
53
+ (2): RecursiveScriptModule(original_name=Conv2d)
54
+ (3): RecursiveScriptModule(original_name=ELU)
55
+ (4): RecursiveScriptModule(original_name=Conv2d)
56
+ (5): RecursiveScriptModule(original_name=ELU)
57
+ )
58
+ (mlp_layers): RecursiveScriptModule(
59
+ original_name=Sequential
60
+ (0): RecursiveScriptModule(original_name=Linear)
61
+ (1): RecursiveScriptModule(original_name=ELU)
62
+ )
63
+ )
64
+ )
65
+ )
66
+ (core): ModelCoreRNN(
67
+ (core): GRU(512, 512)
68
+ )
69
+ (decoder): MlpDecoder(
70
+ (mlp): Identity()
71
+ )
72
+ (critic_linear): Linear(in_features=512, out_features=1, bias=True)
73
+ (action_parameterization): ActionParameterizationDefault(
74
+ (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
75
+ )
76
+ )
77
+ [2025-04-06 21:21:12,813][29697] Using optimizer <class 'torch.optim.adam.Adam'>
78
+ [2025-04-06 21:21:18,468][29697] No checkpoints found
79
+ [2025-04-06 21:21:18,471][29697] Did not load from checkpoint, starting from scratch!
80
+ [2025-04-06 21:21:18,474][29697] Initialized policy 0 weights for model version 0
81
+ [2025-04-06 21:21:18,488][29697] LearnerWorker_p0 finished initialization!
82
+ [2025-04-06 21:21:18,488][29697] Using GPUs [0] for process 0 (actually maps to GPUs [0])
83
+ [2025-04-06 21:21:27,570][29817] Worker 6 uses CPU cores [6]
84
+ [2025-04-06 21:21:32,697][29818] Worker 0 uses CPU cores [0]
85
+ [2025-04-06 21:21:34,112][29822] Worker 5 uses CPU cores [5]
86
+ [2025-04-06 21:21:40,610][29819] Worker 3 uses CPU cores [3]
87
+ [2025-04-06 21:21:50,398][29816] Using GPUs [0] for process 0 (actually maps to GPUs [0])
88
+ [2025-04-06 21:21:50,399][29816] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
89
+ [2025-04-06 21:21:50,418][29816] Num visible devices: 1
90
+ [2025-04-06 21:21:50,551][29816] RunningMeanStd input shape: (3, 72, 128)
91
+ [2025-04-06 21:21:50,553][29816] RunningMeanStd input shape: (1,)
92
+ [2025-04-06 21:21:50,583][29816] ConvEncoder: input_channels=3
93
+ [2025-04-06 21:21:50,754][29816] Conv encoder output size: 512
94
+ [2025-04-06 21:21:50,755][29816] Policy head output size: 512
95
+ [2025-04-06 21:21:56,879][29823] Worker 7 uses CPU cores [7]
96
+ [2025-04-06 21:22:03,459][29815] Worker 1 uses CPU cores [1]
97
+ [2025-04-06 21:22:25,110][29820] Worker 4 uses CPU cores [4]
98
+ [2025-04-06 21:22:27,189][29458] Heartbeat connected on Batcher_0
99
+ [2025-04-06 21:22:27,191][29458] Heartbeat connected on LearnerWorker_p0
100
+ [2025-04-06 21:22:27,191][29458] Heartbeat connected on RolloutWorker_w6
101
+ [2025-04-06 21:22:27,192][29458] Heartbeat connected on RolloutWorker_w0
102
+ [2025-04-06 21:22:27,192][29458] Heartbeat connected on RolloutWorker_w5
103
+ [2025-04-06 21:22:27,193][29458] Heartbeat connected on RolloutWorker_w3
104
+ [2025-04-06 21:22:27,193][29458] Inference worker 0-0 is ready!
105
+ [2025-04-06 21:22:27,194][29458] All inference workers are ready! Signal rollout workers to start!
106
+ [2025-04-06 21:22:27,195][29458] Heartbeat connected on InferenceWorker_p0-w0
107
+ [2025-04-06 21:22:27,195][29458] Heartbeat connected on RolloutWorker_w7
108
+ [2025-04-06 21:22:27,196][29821] Worker 2 uses CPU cores [2]
109
+ [2025-04-06 21:22:27,198][29458] Heartbeat connected on RolloutWorker_w1
110
+ [2025-04-06 21:22:27,200][29458] Heartbeat connected on RolloutWorker_w4
111
+ [2025-04-06 21:22:27,202][29458] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
112
+ [2025-04-06 21:22:27,224][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
113
+ [2025-04-06 21:22:27,231][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
114
+ [2025-04-06 21:22:27,236][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
115
+ [2025-04-06 21:22:27,239][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
116
+ [2025-04-06 21:22:27,248][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
117
+ [2025-04-06 21:22:27,252][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
118
+ [2025-04-06 21:22:27,255][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
119
+ [2025-04-06 21:22:27,257][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
120
+ [2025-04-06 21:22:27,260][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
121
+ [2025-04-06 21:22:27,261][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
122
+ [2025-04-06 21:22:27,263][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
123
+ [2025-04-06 21:22:27,264][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
124
+ [2025-04-06 21:22:27,265][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
125
+ [2025-04-06 21:22:27,267][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
126
+ [2025-04-06 21:22:27,270][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
127
+ [2025-04-06 21:22:27,271][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
128
+ [2025-04-06 21:22:27,276][29820] Doom resolution: 160x120, resize resolution: (128, 72)
129
+ [2025-04-06 21:22:27,276][29818] Doom resolution: 160x120, resize resolution: (128, 72)
130
+ [2025-04-06 21:22:27,295][29823] Doom resolution: 160x120, resize resolution: (128, 72)
131
+ [2025-04-06 21:22:27,317][29819] Doom resolution: 160x120, resize resolution: (128, 72)
132
+ [2025-04-06 21:22:27,319][29815] Doom resolution: 160x120, resize resolution: (128, 72)
133
+ [2025-04-06 21:22:27,343][29821] Doom resolution: 160x120, resize resolution: (128, 72)
134
+ [2025-04-06 21:22:27,380][29817] Doom resolution: 160x120, resize resolution: (128, 72)
135
+ [2025-04-06 21:22:27,394][29822] Doom resolution: 160x120, resize resolution: (128, 72)
136
+ [2025-04-06 21:22:27,868][29815] Decorrelating experience for 0 frames...
137
+ [2025-04-06 21:22:27,868][29820] Decorrelating experience for 0 frames...
138
+ [2025-04-06 21:22:27,869][29823] Decorrelating experience for 0 frames...
139
+ [2025-04-06 21:22:27,871][29819] Decorrelating experience for 0 frames...
140
+ [2025-04-06 21:22:27,882][29818] Decorrelating experience for 0 frames...
141
+ [2025-04-06 21:22:27,902][29817] Decorrelating experience for 0 frames...
142
+ [2025-04-06 21:22:28,222][29818] Decorrelating experience for 32 frames...
143
+ [2025-04-06 21:22:28,231][29819] Decorrelating experience for 32 frames...
144
+ [2025-04-06 21:22:28,240][29817] Decorrelating experience for 32 frames...
145
+ [2025-04-06 21:22:28,256][29821] Decorrelating experience for 0 frames...
146
+ [2025-04-06 21:22:28,260][29820] Decorrelating experience for 32 frames...
147
+ [2025-04-06 21:22:28,275][29823] Decorrelating experience for 32 frames...
148
+ [2025-04-06 21:22:28,582][29822] Decorrelating experience for 0 frames...
149
+ [2025-04-06 21:22:28,598][29821] Decorrelating experience for 32 frames...
150
+ [2025-04-06 21:22:28,707][29819] Decorrelating experience for 64 frames...
151
+ [2025-04-06 21:22:28,748][29817] Decorrelating experience for 64 frames...
152
+ [2025-04-06 21:22:28,760][29823] Decorrelating experience for 64 frames...
153
+ [2025-04-06 21:22:28,783][29818] Decorrelating experience for 64 frames...
154
+ [2025-04-06 21:22:28,980][29822] Decorrelating experience for 32 frames...
155
+ [2025-04-06 21:22:28,985][29820] Decorrelating experience for 64 frames...
156
+ [2025-04-06 21:22:29,096][29819] Decorrelating experience for 96 frames...
157
+ [2025-04-06 21:22:29,163][29817] Decorrelating experience for 96 frames...
158
+ [2025-04-06 21:22:29,237][29818] Decorrelating experience for 96 frames...
159
+ [2025-04-06 21:22:29,391][29815] Decorrelating experience for 32 frames...
160
+ [2025-04-06 21:22:29,393][29820] Decorrelating experience for 96 frames...
161
+ [2025-04-06 21:22:29,460][29822] Decorrelating experience for 64 frames...
162
+ [2025-04-06 21:22:29,589][29823] Decorrelating experience for 96 frames...
163
+ [2025-04-06 21:22:29,790][29821] Decorrelating experience for 64 frames...
164
+ [2025-04-06 21:22:29,836][29822] Decorrelating experience for 96 frames...
165
+ [2025-04-06 21:22:29,856][29815] Decorrelating experience for 64 frames...
166
+ [2025-04-06 21:22:30,133][29821] Decorrelating experience for 96 frames...
167
+ [2025-04-06 21:22:30,204][29815] Decorrelating experience for 96 frames...
168
+ [2025-04-06 21:22:30,234][29458] Heartbeat connected on RolloutWorker_w2
169
+ [2025-04-06 21:22:30,818][29458] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 0. Throughput: 0: 9.0. Samples: 32. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
170
+ [2025-04-06 21:22:30,819][29458] Avg episode reward: [(0, '0.480')]
171
+ [2025-04-06 21:22:31,550][29697] Signal inference workers to stop experience collection...
172
+ [2025-04-06 21:22:31,561][29816] InferenceWorker_p0-w0: stopping experience collection
173
+ [2025-04-06 21:22:34,456][29697] Signal inference workers to resume experience collection...
174
+ [2025-04-06 21:22:34,458][29816] InferenceWorker_p0-w0: resuming experience collection
175
+ [2025-04-06 21:22:35,818][29458] Fps is (10 sec: 1917.0, 60 sec: 1912.7, 300 sec: 1901.5). Total num frames: 16384. Throughput: 0: 571.1. Samples: 4888. Policy #0 lag: (min: 0.0, avg: 0.0, max: 0.0)
176
+ [2025-04-06 21:22:35,819][29458] Avg episode reward: [(0, '3.035')]
177
+ [2025-04-06 21:22:38,690][29816] Updated weights for policy 0, policy_version 10 (0.0118)
178
+ [2025-04-06 21:22:40,819][29458] Fps is (10 sec: 5734.4, 60 sec: 4227.9, 300 sec: 4211.3). Total num frames: 57344. Throughput: 0: 800.0. Samples: 10846. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
179
+ [2025-04-06 21:22:40,820][29458] Avg episode reward: [(0, '4.419')]
180
+ [2025-04-06 21:22:43,570][29816] Updated weights for policy 0, policy_version 20 (0.0035)
181
+ [2025-04-06 21:22:45,818][29458] Fps is (10 sec: 8191.9, 60 sec: 5296.3, 300 sec: 5280.4). Total num frames: 98304. Throughput: 0: 1273.0. Samples: 23622. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
182
+ [2025-04-06 21:22:45,819][29458] Avg episode reward: [(0, '4.432')]
183
+ [2025-04-06 21:22:48,379][29816] Updated weights for policy 0, policy_version 30 (0.0034)
184
+ [2025-04-06 21:22:50,818][29458] Fps is (10 sec: 8601.9, 60 sec: 6085.3, 300 sec: 6070.3). Total num frames: 143360. Throughput: 0: 1556.9. Samples: 36672. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
185
+ [2025-04-06 21:22:50,819][29458] Avg episode reward: [(0, '4.405')]
186
+ [2025-04-06 21:22:50,836][29697] Saving /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000035_143360.pth...
187
+ [2025-04-06 21:22:50,967][29697] Saving new best policy, reward=4.405!
188
+ [2025-04-06 21:22:53,235][29816] Updated weights for policy 0, policy_version 40 (0.0038)
189
+ [2025-04-06 21:22:55,818][29458] Fps is (10 sec: 8601.6, 60 sec: 6454.5, 300 sec: 6441.0). Total num frames: 184320. Throughput: 0: 1497.7. Samples: 42764. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
190
+ [2025-04-06 21:22:55,819][29458] Avg episode reward: [(0, '4.547')]
191
+ [2025-04-06 21:22:55,820][29697] Saving new best policy, reward=4.547!
192
+ [2025-04-06 21:22:57,999][29816] Updated weights for policy 0, policy_version 50 (0.0035)
193
+ [2025-04-06 21:23:00,818][29458] Fps is (10 sec: 8191.9, 60 sec: 6713.7, 300 sec: 6701.5). Total num frames: 225280. Throughput: 0: 1658.8. Samples: 55656. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
194
+ [2025-04-06 21:23:00,819][29458] Avg episode reward: [(0, '4.311')]
195
+ [2025-04-06 21:23:02,747][29816] Updated weights for policy 0, policy_version 60 (0.0040)
196
+ [2025-04-06 21:23:05,818][29458] Fps is (10 sec: 8601.7, 60 sec: 7011.9, 300 sec: 7000.5). Total num frames: 270336. Throughput: 0: 1783.5. Samples: 68752. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
197
+ [2025-04-06 21:23:05,819][29458] Avg episode reward: [(0, '4.297')]
198
+ [2025-04-06 21:23:07,502][29816] Updated weights for policy 0, policy_version 70 (0.0031)
199
+ [2025-04-06 21:23:10,819][29458] Fps is (10 sec: 8601.2, 60 sec: 7147.5, 300 sec: 7137.0). Total num frames: 311296. Throughput: 0: 1724.7. Samples: 75108. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
200
+ [2025-04-06 21:23:10,819][29458] Avg episode reward: [(0, '4.408')]
201
+ [2025-04-06 21:23:12,285][29816] Updated weights for policy 0, policy_version 80 (0.0035)
202
+ [2025-04-06 21:23:15,819][29458] Fps is (10 sec: 8600.7, 60 sec: 7339.6, 300 sec: 7329.7). Total num frames: 356352. Throughput: 0: 1957.3. Samples: 88112. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
203
+ [2025-04-06 21:23:15,820][29458] Avg episode reward: [(0, '4.271')]
204
+ [2025-04-06 21:23:17,010][29816] Updated weights for policy 0, policy_version 90 (0.0031)
205
+ [2025-04-06 21:23:20,818][29458] Fps is (10 sec: 9011.5, 60 sec: 7496.2, 300 sec: 7486.6). Total num frames: 401408. Throughput: 0: 1990.8. Samples: 94474. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
206
+ [2025-04-06 21:23:20,819][29458] Avg episode reward: [(0, '4.516')]
207
+ [2025-04-06 21:23:21,817][29816] Updated weights for policy 0, policy_version 100 (0.0038)
208
+ [2025-04-06 21:23:25,818][29458] Fps is (10 sec: 8602.4, 60 sec: 7555.8, 300 sec: 7546.8). Total num frames: 442368. Throughput: 0: 2147.5. Samples: 107482. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
209
+ [2025-04-06 21:23:25,819][29458] Avg episode reward: [(0, '4.406')]
210
+ [2025-04-06 21:23:26,458][29816] Updated weights for policy 0, policy_version 110 (0.0034)
211
+ [2025-04-06 21:23:30,818][29458] Fps is (10 sec: 8601.7, 60 sec: 8123.8, 300 sec: 7661.9). Total num frames: 487424. Throughput: 0: 2158.1. Samples: 120734. Policy #0 lag: (min: 0.0, avg: 0.3, max: 1.0)
212
+ [2025-04-06 21:23:30,819][29458] Avg episode reward: [(0, '4.321')]
213
+ [2025-04-06 21:23:30,997][29816] Updated weights for policy 0, policy_version 120 (0.0036)
214
+ [2025-04-06 21:23:35,819][29458] Fps is (10 sec: 8601.2, 60 sec: 8533.3, 300 sec: 7700.5). Total num frames: 528384. Throughput: 0: 2155.0. Samples: 133648. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
215
+ [2025-04-06 21:23:35,819][29458] Avg episode reward: [(0, '4.625')]
216
+ [2025-04-06 21:23:35,863][29697] Saving new best policy, reward=4.625!
217
+ [2025-04-06 21:23:35,869][29816] Updated weights for policy 0, policy_version 130 (0.0033)
218
+ [2025-04-06 21:23:40,637][29816] Updated weights for policy 0, policy_version 140 (0.0035)
219
+ [2025-04-06 21:23:40,819][29458] Fps is (10 sec: 8601.1, 60 sec: 8601.6, 300 sec: 7789.5). Total num frames: 573440. Throughput: 0: 2162.5. Samples: 140078. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
220
+ [2025-04-06 21:23:40,820][29458] Avg episode reward: [(0, '4.268')]
221
+ [2025-04-06 21:23:45,417][29816] Updated weights for policy 0, policy_version 150 (0.0037)
222
+ [2025-04-06 21:23:45,818][29458] Fps is (10 sec: 8602.0, 60 sec: 8601.6, 300 sec: 7815.1). Total num frames: 614400. Throughput: 0: 2163.8. Samples: 153026. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
223
+ [2025-04-06 21:23:45,819][29458] Avg episode reward: [(0, '4.387')]
224
+ [2025-04-06 21:23:50,154][29816] Updated weights for policy 0, policy_version 160 (0.0039)
225
+ [2025-04-06 21:23:50,819][29458] Fps is (10 sec: 8601.6, 60 sec: 8601.5, 300 sec: 7886.6). Total num frames: 659456. Throughput: 0: 2159.5. Samples: 165932. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
226
+ [2025-04-06 21:23:50,819][29458] Avg episode reward: [(0, '4.397')]
227
+ [2025-04-06 21:23:54,952][29816] Updated weights for policy 0, policy_version 170 (0.0036)
228
+ [2025-04-06 21:23:55,818][29458] Fps is (10 sec: 8601.5, 60 sec: 8601.6, 300 sec: 7903.9). Total num frames: 700416. Throughput: 0: 2161.7. Samples: 172382. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
229
+ [2025-04-06 21:23:55,819][29458] Avg episode reward: [(0, '4.640')]
230
+ [2025-04-06 21:23:55,866][29697] Saving new best policy, reward=4.640!
231
+ [2025-04-06 21:23:59,759][29816] Updated weights for policy 0, policy_version 180 (0.0037)
232
+ [2025-04-06 21:24:00,818][29458] Fps is (10 sec: 8602.0, 60 sec: 8669.8, 300 sec: 7963.0). Total num frames: 745472. Throughput: 0: 2154.8. Samples: 185076. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
233
+ [2025-04-06 21:24:00,819][29458] Avg episode reward: [(0, '4.844')]
234
+ [2025-04-06 21:24:00,830][29697] Saving new best policy, reward=4.844!
235
+ [2025-04-06 21:24:04,949][29816] Updated weights for policy 0, policy_version 190 (0.0039)
236
+ [2025-04-06 21:24:05,819][29458] Fps is (10 sec: 8191.9, 60 sec: 8533.3, 300 sec: 7933.1). Total num frames: 782336. Throughput: 0: 2273.2. Samples: 196770. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
237
+ [2025-04-06 21:24:05,820][29458] Avg episode reward: [(0, '4.955')]
238
+ [2025-04-06 21:24:05,821][29697] Saving new best policy, reward=4.955!
239
+ [2025-04-06 21:24:10,673][29816] Updated weights for policy 0, policy_version 200 (0.0037)
240
+ [2025-04-06 21:24:10,819][29458] Fps is (10 sec: 7372.7, 60 sec: 8465.1, 300 sec: 7906.1). Total num frames: 819200. Throughput: 0: 2103.5. Samples: 202142. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
241
+ [2025-04-06 21:24:10,819][29458] Avg episode reward: [(0, '4.772')]
242
+ [2025-04-06 21:24:15,818][29458] Fps is (10 sec: 7373.1, 60 sec: 8328.7, 300 sec: 7881.5). Total num frames: 856064. Throughput: 0: 2042.1. Samples: 212630. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
243
+ [2025-04-06 21:24:15,819][29458] Avg episode reward: [(0, '4.667')]
244
+ [2025-04-06 21:24:16,335][29816] Updated weights for policy 0, policy_version 210 (0.0039)
245
+ [2025-04-06 21:24:20,818][29458] Fps is (10 sec: 7782.5, 60 sec: 8260.3, 300 sec: 7895.2). Total num frames: 897024. Throughput: 0: 2038.9. Samples: 225400. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
246
+ [2025-04-06 21:24:20,819][29458] Avg episode reward: [(0, '4.723')]
247
+ [2025-04-06 21:24:21,104][29816] Updated weights for policy 0, policy_version 220 (0.0041)
248
+ [2025-04-06 21:24:25,818][29458] Fps is (10 sec: 8191.8, 60 sec: 8260.3, 300 sec: 7907.7). Total num frames: 937984. Throughput: 0: 2025.0. Samples: 231204. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
249
+ [2025-04-06 21:24:25,819][29458] Avg episode reward: [(0, '4.701')]
250
+ [2025-04-06 21:24:26,084][29816] Updated weights for policy 0, policy_version 230 (0.0039)
251
+ [2025-04-06 21:24:30,818][29458] Fps is (10 sec: 8192.0, 60 sec: 8192.0, 300 sec: 7919.2). Total num frames: 978944. Throughput: 0: 2021.2. Samples: 243982. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
252
+ [2025-04-06 21:24:30,819][29458] Avg episode reward: [(0, '5.029')]
253
+ [2025-04-06 21:24:30,877][29697] Saving new best policy, reward=5.029!
254
+ [2025-04-06 21:24:30,885][29816] Updated weights for policy 0, policy_version 240 (0.0038)
255
+ [2025-04-06 21:24:35,684][29816] Updated weights for policy 0, policy_version 250 (0.0036)
256
+ [2025-04-06 21:24:35,819][29458] Fps is (10 sec: 8601.5, 60 sec: 8260.3, 300 sec: 7961.6). Total num frames: 1024000. Throughput: 0: 1876.3. Samples: 250366. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
257
+ [2025-04-06 21:24:35,819][29458] Avg episode reward: [(0, '5.175')]
258
+ [2025-04-06 21:24:35,821][29697] Saving new best policy, reward=5.175!
259
+ [2025-04-06 21:24:40,475][29816] Updated weights for policy 0, policy_version 260 (0.0035)
260
+ [2025-04-06 21:24:40,818][29458] Fps is (10 sec: 8601.7, 60 sec: 8192.1, 300 sec: 7970.3). Total num frames: 1064960. Throughput: 0: 2019.1. Samples: 263240. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
261
+ [2025-04-06 21:24:40,819][29458] Avg episode reward: [(0, '5.215')]
262
+ [2025-04-06 21:24:40,831][29697] Saving new best policy, reward=5.215!
263
+ [2025-04-06 21:24:45,776][29816] Updated weights for policy 0, policy_version 270 (0.0034)
264
+ [2025-04-06 21:24:45,819][29458] Fps is (10 sec: 8192.0, 60 sec: 8192.0, 300 sec: 7978.3). Total num frames: 1105920. Throughput: 0: 2003.9. Samples: 275254. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
265
+ [2025-04-06 21:24:45,819][29458] Avg episode reward: [(0, '5.764')]
266
+ [2025-04-06 21:24:45,821][29697] Saving new best policy, reward=5.764!
267
+ [2025-04-06 21:24:50,818][29458] Fps is (10 sec: 7782.3, 60 sec: 8055.5, 300 sec: 7957.2). Total num frames: 1142784. Throughput: 0: 2011.9. Samples: 287306. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
268
+ [2025-04-06 21:24:50,819][29458] Avg episode reward: [(0, '5.685')]
269
+ [2025-04-06 21:24:50,827][29697] Saving /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000279_1142784.pth...
270
+ [2025-04-06 21:24:51,023][29816] Updated weights for policy 0, policy_version 280 (0.0037)
271
+ [2025-04-06 21:24:55,726][29816] Updated weights for policy 0, policy_version 290 (0.0038)
272
+ [2025-04-06 21:24:55,818][29458] Fps is (10 sec: 8192.2, 60 sec: 8123.7, 300 sec: 7992.6). Total num frames: 1187840. Throughput: 0: 2025.6. Samples: 293296. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
273
+ [2025-04-06 21:24:55,819][29458] Avg episode reward: [(0, '5.926')]
274
+ [2025-04-06 21:24:55,820][29697] Saving new best policy, reward=5.926!
275
+ [2025-04-06 21:25:00,818][29458] Fps is (10 sec: 8601.6, 60 sec: 8055.5, 300 sec: 7999.1). Total num frames: 1228800. Throughput: 0: 2065.0. Samples: 305554. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
276
+ [2025-04-06 21:25:00,818][29816] Updated weights for policy 0, policy_version 300 (0.0041)
277
+ [2025-04-06 21:25:00,819][29458] Avg episode reward: [(0, '6.334')]
278
+ [2025-04-06 21:25:00,829][29697] Saving new best policy, reward=6.334!
279
+ [2025-04-06 21:25:05,818][29816] Updated weights for policy 0, policy_version 310 (0.0033)
280
+ [2025-04-06 21:25:05,818][29458] Fps is (10 sec: 8192.2, 60 sec: 8123.8, 300 sec: 8005.2). Total num frames: 1269760. Throughput: 0: 2058.7. Samples: 318040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
281
+ [2025-04-06 21:25:05,819][29458] Avg episode reward: [(0, '7.399')]
282
+ [2025-04-06 21:25:05,820][29697] Saving new best policy, reward=7.399!
283
+ [2025-04-06 21:25:10,767][29816] Updated weights for policy 0, policy_version 320 (0.0035)
284
+ [2025-04-06 21:25:10,818][29458] Fps is (10 sec: 8191.9, 60 sec: 8192.0, 300 sec: 8010.9). Total num frames: 1310720. Throughput: 0: 2065.3. Samples: 324144. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
285
+ [2025-04-06 21:25:10,819][29458] Avg episode reward: [(0, '7.160')]
286
+ [2025-04-06 21:25:15,614][29816] Updated weights for policy 0, policy_version 330 (0.0037)
287
+ [2025-04-06 21:25:15,819][29458] Fps is (10 sec: 8191.6, 60 sec: 8260.2, 300 sec: 8016.3). Total num frames: 1351680. Throughput: 0: 2061.4. Samples: 336744. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
288
+ [2025-04-06 21:25:15,820][29458] Avg episode reward: [(0, '7.225')]
289
+ [2025-04-06 21:25:20,818][29458] Fps is (10 sec: 7782.5, 60 sec: 8192.0, 300 sec: 7997.8). Total num frames: 1388544. Throughput: 0: 2182.8. Samples: 348592. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
290
+ [2025-04-06 21:25:20,819][29458] Avg episode reward: [(0, '7.720')]
291
+ [2025-04-06 21:25:20,833][29697] Saving new best policy, reward=7.720!
292
+ [2025-04-06 21:25:21,072][29816] Updated weights for policy 0, policy_version 340 (0.0041)
293
+ [2025-04-06 21:25:25,818][29458] Fps is (10 sec: 7782.6, 60 sec: 8192.0, 300 sec: 8003.2). Total num frames: 1429504. Throughput: 0: 2022.7. Samples: 354260. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
294
+ [2025-04-06 21:25:25,819][29458] Avg episode reward: [(0, '8.824')]
295
+ [2025-04-06 21:25:25,821][29697] Saving new best policy, reward=8.824!
296
+ [2025-04-06 21:25:26,135][29816] Updated weights for policy 0, policy_version 350 (0.0035)
297
+ [2025-04-06 21:25:30,818][29458] Fps is (10 sec: 8192.0, 60 sec: 8192.0, 300 sec: 8008.3). Total num frames: 1470464. Throughput: 0: 2030.5. Samples: 366626. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
298
+ [2025-04-06 21:25:30,819][29458] Avg episode reward: [(0, '9.869')]
299
+ [2025-04-06 21:25:30,833][29697] Saving new best policy, reward=9.869!
300
+ [2025-04-06 21:25:31,162][29816] Updated weights for policy 0, policy_version 360 (0.0034)
301
+ [2025-04-06 21:25:35,818][29458] Fps is (10 sec: 8192.0, 60 sec: 8123.8, 300 sec: 8013.2). Total num frames: 1511424. Throughput: 0: 2039.2. Samples: 379070. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
302
+ [2025-04-06 21:25:35,819][29458] Avg episode reward: [(0, '9.580')]
303
+ [2025-04-06 21:25:36,085][29816] Updated weights for policy 0, policy_version 370 (0.0032)
304
+ [2025-04-06 21:25:40,820][29458] Fps is (10 sec: 8191.0, 60 sec: 8123.6, 300 sec: 8017.8). Total num frames: 1552384. Throughput: 0: 2039.1. Samples: 385060. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
305
+ [2025-04-06 21:25:40,821][29458] Avg episode reward: [(0, '9.251')]
306
+ [2025-04-06 21:25:41,112][29816] Updated weights for policy 0, policy_version 380 (0.0035)
307
+ [2025-04-06 21:25:45,819][29458] Fps is (10 sec: 7782.3, 60 sec: 8055.5, 300 sec: 8001.6). Total num frames: 1589248. Throughput: 0: 2029.6. Samples: 396886. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
308
+ [2025-04-06 21:25:45,819][29458] Avg episode reward: [(0, '10.675')]
309
+ [2025-04-06 21:25:45,822][29697] Saving new best policy, reward=10.675!
310
+ [2025-04-06 21:25:46,567][29816] Updated weights for policy 0, policy_version 390 (0.0037)
311
+ [2025-04-06 21:25:50,819][29458] Fps is (10 sec: 7373.1, 60 sec: 8055.4, 300 sec: 7986.1). Total num frames: 1626112. Throughput: 0: 1989.9. Samples: 407588. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
312
+ [2025-04-06 21:25:50,825][29458] Avg episode reward: [(0, '12.079')]
313
+ [2025-04-06 21:25:50,840][29697] Saving new best policy, reward=12.079!
314
+ [2025-04-06 21:25:52,204][29816] Updated weights for policy 0, policy_version 400 (0.0036)
315
+ [2025-04-06 21:25:55,819][29458] Fps is (10 sec: 7372.8, 60 sec: 7918.9, 300 sec: 7971.4). Total num frames: 1662976. Throughput: 0: 1977.1. Samples: 413114. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
316
+ [2025-04-06 21:25:55,819][29458] Avg episode reward: [(0, '11.502')]
317
+ [2025-04-06 21:25:57,689][29816] Updated weights for policy 0, policy_version 410 (0.0042)
318
+ [2025-04-06 21:26:00,819][29458] Fps is (10 sec: 7373.3, 60 sec: 7850.6, 300 sec: 7957.4). Total num frames: 1699840. Throughput: 0: 1938.5. Samples: 423974. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
319
+ [2025-04-06 21:26:00,820][29458] Avg episode reward: [(0, '11.994')]
320
+ [2025-04-06 21:26:03,596][29816] Updated weights for policy 0, policy_version 420 (0.0038)
321
+ [2025-04-06 21:26:05,818][29458] Fps is (10 sec: 7373.0, 60 sec: 7782.4, 300 sec: 7944.9). Total num frames: 1736704. Throughput: 0: 1792.6. Samples: 429258. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
322
+ [2025-04-06 21:26:05,819][29458] Avg episode reward: [(0, '13.856')]
323
+ [2025-04-06 21:26:05,820][29697] Saving new best policy, reward=13.856!
324
+ [2025-04-06 21:26:09,764][29816] Updated weights for policy 0, policy_version 430 (0.0048)
325
+ [2025-04-06 21:26:10,818][29458] Fps is (10 sec: 6553.7, 60 sec: 7577.6, 300 sec: 7895.7). Total num frames: 1765376. Throughput: 0: 1894.1. Samples: 439496. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
326
+ [2025-04-06 21:26:10,819][29458] Avg episode reward: [(0, '14.682')]
327
+ [2025-04-06 21:26:10,837][29697] Saving new best policy, reward=14.682!
328
+ [2025-04-06 21:26:15,819][29458] Fps is (10 sec: 6143.8, 60 sec: 7441.1, 300 sec: 7866.5). Total num frames: 1798144. Throughput: 0: 1833.9. Samples: 449150. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
329
+ [2025-04-06 21:26:15,819][29458] Avg episode reward: [(0, '13.383')]
330
+ [2025-04-06 21:26:15,829][29816] Updated weights for policy 0, policy_version 440 (0.0045)
331
+ [2025-04-06 21:26:20,818][29458] Fps is (10 sec: 7372.9, 60 sec: 7509.3, 300 sec: 7873.6). Total num frames: 1839104. Throughput: 0: 1821.6. Samples: 461042. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
332
+ [2025-04-06 21:26:20,819][29458] Avg episode reward: [(0, '12.544')]
333
+ [2025-04-06 21:26:21,138][29816] Updated weights for policy 0, policy_version 450 (0.0035)
334
+ [2025-04-06 21:26:25,818][29458] Fps is (10 sec: 8192.4, 60 sec: 7509.4, 300 sec: 7880.6). Total num frames: 1880064. Throughput: 0: 1814.0. Samples: 466688. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
335
+ [2025-04-06 21:26:25,820][29458] Avg episode reward: [(0, '14.011')]
336
+ [2025-04-06 21:26:26,363][29816] Updated weights for policy 0, policy_version 460 (0.0037)
337
+ [2025-04-06 21:26:30,820][29458] Fps is (10 sec: 7781.1, 60 sec: 7440.9, 300 sec: 7870.2). Total num frames: 1916928. Throughput: 0: 1812.5. Samples: 478450. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
338
+ [2025-04-06 21:26:30,821][29458] Avg episode reward: [(0, '14.741')]
339
+ [2025-04-06 21:26:30,836][29697] Saving new best policy, reward=14.741!
340
+ [2025-04-06 21:26:31,636][29816] Updated weights for policy 0, policy_version 470 (0.0037)
341
+ [2025-04-06 21:26:35,818][29458] Fps is (10 sec: 7782.4, 60 sec: 7441.1, 300 sec: 7876.8). Total num frames: 1957888. Throughput: 0: 1701.0. Samples: 484130. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
342
+ [2025-04-06 21:26:35,819][29458] Avg episode reward: [(0, '14.143')]
343
+ [2025-04-06 21:26:36,932][29816] Updated weights for policy 0, policy_version 480 (0.0038)
344
+ [2025-04-06 21:26:40,819][29458] Fps is (10 sec: 7783.3, 60 sec: 7372.9, 300 sec: 7866.9). Total num frames: 1994752. Throughput: 0: 1839.4. Samples: 495888. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
345
+ [2025-04-06 21:26:40,819][29458] Avg episode reward: [(0, '16.219')]
346
+ [2025-04-06 21:26:40,831][29697] Saving new best policy, reward=16.219!
347
+ [2025-04-06 21:26:42,188][29816] Updated weights for policy 0, policy_version 490 (0.0044)
348
+ [2025-04-06 21:26:45,819][29458] Fps is (10 sec: 7372.5, 60 sec: 7372.8, 300 sec: 7857.5). Total num frames: 2031616. Throughput: 0: 1852.7. Samples: 507346. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
349
+ [2025-04-06 21:26:45,820][29458] Avg episode reward: [(0, '16.662')]
350
+ [2025-04-06 21:26:45,822][29697] Saving new best policy, reward=16.662!
351
+ [2025-04-06 21:26:47,546][29816] Updated weights for policy 0, policy_version 500 (0.0040)
352
+ [2025-04-06 21:26:50,818][29458] Fps is (10 sec: 7782.7, 60 sec: 7441.2, 300 sec: 7863.9). Total num frames: 2072576. Throughput: 0: 1861.4. Samples: 513022. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
353
+ [2025-04-06 21:26:50,819][29458] Avg episode reward: [(0, '16.306')]
354
+ [2025-04-06 21:26:50,834][29697] Saving /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000506_2072576.pth...
355
+ [2025-04-06 21:26:50,935][29697] Removing /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000035_143360.pth
356
+ [2025-04-06 21:26:52,874][29816] Updated weights for policy 0, policy_version 510 (0.0048)
357
+ [2025-04-06 21:26:55,819][29458] Fps is (10 sec: 7782.1, 60 sec: 7441.0, 300 sec: 7854.7). Total num frames: 2109440. Throughput: 0: 1893.9. Samples: 524724. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
358
+ [2025-04-06 21:26:55,820][29458] Avg episode reward: [(0, '16.752')]
359
+ [2025-04-06 21:26:55,822][29697] Saving new best policy, reward=16.752!
360
+ [2025-04-06 21:26:58,215][29816] Updated weights for policy 0, policy_version 520 (0.0033)
361
+ [2025-04-06 21:27:00,819][29458] Fps is (10 sec: 7372.7, 60 sec: 7441.1, 300 sec: 7846.0). Total num frames: 2146304. Throughput: 0: 1937.2. Samples: 536322. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
362
+ [2025-04-06 21:27:00,820][29458] Avg episode reward: [(0, '16.356')]
363
+ [2025-04-06 21:27:03,417][29816] Updated weights for policy 0, policy_version 530 (0.0040)
364
+ [2025-04-06 21:27:05,818][29458] Fps is (10 sec: 7782.9, 60 sec: 7509.3, 300 sec: 7852.2). Total num frames: 2187264. Throughput: 0: 1932.6. Samples: 548010. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
365
+ [2025-04-06 21:27:05,820][29458] Avg episode reward: [(0, '16.246')]
366
+ [2025-04-06 21:27:08,808][29816] Updated weights for policy 0, policy_version 540 (0.0049)
367
+ [2025-04-06 21:27:10,819][29458] Fps is (10 sec: 8191.7, 60 sec: 7714.1, 300 sec: 7858.3). Total num frames: 2228224. Throughput: 0: 1934.2. Samples: 553728. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
368
+ [2025-04-06 21:27:10,819][29458] Avg episode reward: [(0, '16.217')]
369
+ [2025-04-06 21:27:14,157][29816] Updated weights for policy 0, policy_version 550 (0.0039)
370
+ [2025-04-06 21:27:15,818][29458] Fps is (10 sec: 7782.4, 60 sec: 7782.4, 300 sec: 7849.9). Total num frames: 2265088. Throughput: 0: 1924.3. Samples: 565040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
371
+ [2025-04-06 21:27:15,819][29458] Avg episode reward: [(0, '15.185')]
372
+ [2025-04-06 21:27:19,607][29816] Updated weights for policy 0, policy_version 560 (0.0038)
373
+ [2025-04-06 21:27:20,819][29458] Fps is (10 sec: 7373.0, 60 sec: 7714.1, 300 sec: 7841.8). Total num frames: 2301952. Throughput: 0: 2051.4. Samples: 576444. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
374
+ [2025-04-06 21:27:20,820][29458] Avg episode reward: [(0, '18.130')]
375
+ [2025-04-06 21:27:20,837][29697] Saving new best policy, reward=18.130!
376
+ [2025-04-06 21:27:25,069][29816] Updated weights for policy 0, policy_version 570 (0.0041)
377
+ [2025-04-06 21:27:25,819][29458] Fps is (10 sec: 7372.7, 60 sec: 7645.8, 300 sec: 7928.2). Total num frames: 2338816. Throughput: 0: 1910.8. Samples: 581872. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
378
+ [2025-04-06 21:27:25,819][29458] Avg episode reward: [(0, '17.818')]
379
+ [2025-04-06 21:27:30,819][29458] Fps is (10 sec: 6963.2, 60 sec: 7577.8, 300 sec: 7983.7). Total num frames: 2371584. Throughput: 0: 1894.0. Samples: 592576. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
380
+ [2025-04-06 21:27:30,820][29458] Avg episode reward: [(0, '19.286')]
381
+ [2025-04-06 21:27:30,854][29697] Saving new best policy, reward=19.286!
382
+ [2025-04-06 21:27:30,862][29816] Updated weights for policy 0, policy_version 580 (0.0041)
383
+ [2025-04-06 21:27:35,818][29458] Fps is (10 sec: 7372.8, 60 sec: 7577.6, 300 sec: 7983.7). Total num frames: 2412544. Throughput: 0: 2027.3. Samples: 604252. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
384
+ [2025-04-06 21:27:35,819][29458] Avg episode reward: [(0, '18.906')]
385
+ [2025-04-06 21:27:36,156][29816] Updated weights for policy 0, policy_version 590 (0.0034)
386
+ [2025-04-06 21:27:40,818][29458] Fps is (10 sec: 8192.3, 60 sec: 7645.9, 300 sec: 7983.7). Total num frames: 2453504. Throughput: 0: 1903.0. Samples: 610358. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
387
+ [2025-04-06 21:27:40,819][29458] Avg episode reward: [(0, '19.766')]
388
+ [2025-04-06 21:27:40,831][29697] Saving new best policy, reward=19.766!
389
+ [2025-04-06 21:27:41,189][29816] Updated weights for policy 0, policy_version 600 (0.0036)
390
+ [2025-04-06 21:27:45,818][29458] Fps is (10 sec: 8192.1, 60 sec: 7714.2, 300 sec: 7969.8). Total num frames: 2494464. Throughput: 0: 1922.3. Samples: 622826. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
391
+ [2025-04-06 21:27:45,819][29458] Avg episode reward: [(0, '19.846')]
392
+ [2025-04-06 21:27:45,821][29697] Saving new best policy, reward=19.846!
393
+ [2025-04-06 21:27:46,042][29816] Updated weights for policy 0, policy_version 610 (0.0037)
394
+ [2025-04-06 21:27:50,818][29458] Fps is (10 sec: 8192.0, 60 sec: 7714.1, 300 sec: 7969.8). Total num frames: 2535424. Throughput: 0: 1945.3. Samples: 635548. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
395
+ [2025-04-06 21:27:50,819][29458] Avg episode reward: [(0, '20.144')]
396
+ [2025-04-06 21:27:50,847][29697] Saving new best policy, reward=20.144!
397
+ [2025-04-06 21:27:50,856][29816] Updated weights for policy 0, policy_version 620 (0.0037)
398
+ [2025-04-06 21:27:55,689][29816] Updated weights for policy 0, policy_version 630 (0.0032)
399
+ [2025-04-06 21:27:55,819][29458] Fps is (10 sec: 8601.3, 60 sec: 7850.7, 300 sec: 7983.7). Total num frames: 2580480. Throughput: 0: 1955.3. Samples: 641716. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
400
+ [2025-04-06 21:27:55,820][29458] Avg episode reward: [(0, '21.052')]
401
+ [2025-04-06 21:27:55,822][29697] Saving new best policy, reward=21.052!
+ [2025-04-06 21:28:00,625][29816] Updated weights for policy 0, policy_version 640 (0.0037)
+ [2025-04-06 21:28:00,819][29458] Fps is (10 sec: 8601.4, 60 sec: 7918.9, 300 sec: 7969.8). Total num frames: 2621440. Throughput: 0: 1981.0. Samples: 654186. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:28:00,819][29458] Avg episode reward: [(0, '19.409')]
+ [2025-04-06 21:28:05,536][29816] Updated weights for policy 0, policy_version 650 (0.0039)
+ [2025-04-06 21:28:05,818][29458] Fps is (10 sec: 8192.3, 60 sec: 7918.9, 300 sec: 7969.9). Total num frames: 2662400. Throughput: 0: 2005.8. Samples: 666706. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:28:05,819][29458] Avg episode reward: [(0, '19.812')]
+ [2025-04-06 21:28:10,511][29816] Updated weights for policy 0, policy_version 660 (0.0040)
+ [2025-04-06 21:28:10,819][29458] Fps is (10 sec: 8192.0, 60 sec: 7919.0, 300 sec: 7956.0). Total num frames: 2703360. Throughput: 0: 2023.9. Samples: 672948. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:28:10,820][29458] Avg episode reward: [(0, '21.072')]
+ [2025-04-06 21:28:10,835][29697] Saving new best policy, reward=21.072!
+ [2025-04-06 21:28:15,671][29816] Updated weights for policy 0, policy_version 670 (0.0036)
+ [2025-04-06 21:28:15,819][29458] Fps is (10 sec: 8191.9, 60 sec: 7987.2, 300 sec: 7942.1). Total num frames: 2744320. Throughput: 0: 2050.1. Samples: 684832. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:28:15,819][29458] Avg episode reward: [(0, '21.161')]
+ [2025-04-06 21:28:15,821][29697] Saving new best policy, reward=21.161!
+ [2025-04-06 21:28:20,590][29816] Updated weights for policy 0, policy_version 680 (0.0038)
+ [2025-04-06 21:28:20,818][29458] Fps is (10 sec: 8192.1, 60 sec: 8055.5, 300 sec: 7942.1). Total num frames: 2785280. Throughput: 0: 1928.6. Samples: 691040. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:28:20,819][29458] Avg episode reward: [(0, '20.833')]
+ [2025-04-06 21:28:25,516][29816] Updated weights for policy 0, policy_version 690 (0.0037)
+ [2025-04-06 21:28:25,819][29458] Fps is (10 sec: 8192.1, 60 sec: 8123.7, 300 sec: 7928.2). Total num frames: 2826240. Throughput: 0: 2073.1. Samples: 703646. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:28:25,820][29458] Avg episode reward: [(0, '21.685')]
+ [2025-04-06 21:28:25,821][29697] Saving new best policy, reward=21.685!
+ [2025-04-06 21:28:30,510][29816] Updated weights for policy 0, policy_version 700 (0.0037)
+ [2025-04-06 21:28:30,818][29458] Fps is (10 sec: 8192.0, 60 sec: 8260.3, 300 sec: 7928.2). Total num frames: 2867200. Throughput: 0: 2069.4. Samples: 715948. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2025-04-06 21:28:30,819][29458] Avg episode reward: [(0, '21.848')]
+ [2025-04-06 21:28:30,831][29697] Saving new best policy, reward=21.848!
+ [2025-04-06 21:28:35,417][29816] Updated weights for policy 0, policy_version 710 (0.0031)
+ [2025-04-06 21:28:35,818][29458] Fps is (10 sec: 8192.1, 60 sec: 8260.3, 300 sec: 7914.3). Total num frames: 2908160. Throughput: 0: 2064.6. Samples: 728454. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2025-04-06 21:28:35,819][29458] Avg episode reward: [(0, '21.461')]
+ [2025-04-06 21:28:40,339][29816] Updated weights for policy 0, policy_version 720 (0.0037)
+ [2025-04-06 21:28:40,818][29458] Fps is (10 sec: 8601.5, 60 sec: 8328.5, 300 sec: 7928.2). Total num frames: 2953216. Throughput: 0: 2066.6. Samples: 734712. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2025-04-06 21:28:40,819][29458] Avg episode reward: [(0, '21.480')]
+ [2025-04-06 21:28:45,138][29816] Updated weights for policy 0, policy_version 730 (0.0035)
+ [2025-04-06 21:28:45,818][29458] Fps is (10 sec: 8601.5, 60 sec: 8328.5, 300 sec: 7914.3). Total num frames: 2994176. Throughput: 0: 2069.0. Samples: 747290. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2025-04-06 21:28:45,819][29458] Avg episode reward: [(0, '18.329')]
+ [2025-04-06 21:28:50,138][29816] Updated weights for policy 0, policy_version 740 (0.0042)
+ [2025-04-06 21:28:50,818][29458] Fps is (10 sec: 8192.1, 60 sec: 8328.5, 300 sec: 7914.3). Total num frames: 3035136. Throughput: 0: 2067.3. Samples: 759734. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2025-04-06 21:28:50,819][29458] Avg episode reward: [(0, '18.835')]
+ [2025-04-06 21:28:50,834][29697] Saving /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000741_3035136.pth...
+ [2025-04-06 21:28:50,964][29697] Removing /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000279_1142784.pth
+ [2025-04-06 21:28:55,225][29816] Updated weights for policy 0, policy_version 750 (0.0041)
+ [2025-04-06 21:28:55,818][29458] Fps is (10 sec: 8192.0, 60 sec: 8260.3, 300 sec: 7900.4). Total num frames: 3076096. Throughput: 0: 2063.3. Samples: 765794. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2025-04-06 21:28:55,820][29458] Avg episode reward: [(0, '19.178')]
+ [2025-04-06 21:29:00,235][29816] Updated weights for policy 0, policy_version 760 (0.0037)
+ [2025-04-06 21:29:00,820][29458] Fps is (10 sec: 8191.0, 60 sec: 8260.1, 300 sec: 7914.3). Total num frames: 3117056. Throughput: 0: 2072.2. Samples: 778084. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:29:00,820][29458] Avg episode reward: [(0, '19.598')]
+ [2025-04-06 21:29:05,337][29816] Updated weights for policy 0, policy_version 770 (0.0039)
+ [2025-04-06 21:29:05,818][29458] Fps is (10 sec: 7782.5, 60 sec: 8192.0, 300 sec: 7914.3). Total num frames: 3153920. Throughput: 0: 2203.1. Samples: 790178. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:29:05,819][29458] Avg episode reward: [(0, '21.833')]
+ [2025-04-06 21:29:10,278][29816] Updated weights for policy 0, policy_version 780 (0.0039)
+ [2025-04-06 21:29:10,819][29458] Fps is (10 sec: 8192.6, 60 sec: 8260.2, 300 sec: 7942.1). Total num frames: 3198976. Throughput: 0: 2061.4. Samples: 796412. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:29:10,820][29458] Avg episode reward: [(0, '22.750')]
+ [2025-04-06 21:29:10,838][29697] Saving new best policy, reward=22.750!
+ [2025-04-06 21:29:15,200][29816] Updated weights for policy 0, policy_version 790 (0.0034)
+ [2025-04-06 21:29:15,818][29458] Fps is (10 sec: 8601.4, 60 sec: 8260.3, 300 sec: 7942.1). Total num frames: 3239936. Throughput: 0: 2060.7. Samples: 808680. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:29:15,819][29458] Avg episode reward: [(0, '24.524')]
+ [2025-04-06 21:29:15,821][29697] Saving new best policy, reward=24.524!
+ [2025-04-06 21:29:20,195][29816] Updated weights for policy 0, policy_version 800 (0.0039)
+ [2025-04-06 21:29:20,819][29458] Fps is (10 sec: 8192.2, 60 sec: 8260.2, 300 sec: 7942.1). Total num frames: 3280896. Throughput: 0: 2057.5. Samples: 821042. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2025-04-06 21:29:20,820][29458] Avg episode reward: [(0, '25.083')]
+ [2025-04-06 21:29:20,830][29697] Saving new best policy, reward=25.083!
+ [2025-04-06 21:29:25,181][29816] Updated weights for policy 0, policy_version 810 (0.0037)
+ [2025-04-06 21:29:25,818][29458] Fps is (10 sec: 8192.1, 60 sec: 8260.3, 300 sec: 7942.1). Total num frames: 3321856. Throughput: 0: 2053.8. Samples: 827132. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2025-04-06 21:29:25,819][29458] Avg episode reward: [(0, '23.715')]
+ [2025-04-06 21:29:30,162][29816] Updated weights for policy 0, policy_version 820 (0.0041)
+ [2025-04-06 21:29:30,818][29458] Fps is (10 sec: 8192.3, 60 sec: 8260.3, 300 sec: 7928.2). Total num frames: 3362816. Throughput: 0: 2050.0. Samples: 839542. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:29:30,820][29458] Avg episode reward: [(0, '22.336')]
+ [2025-04-06 21:29:35,117][29816] Updated weights for policy 0, policy_version 830 (0.0038)
+ [2025-04-06 21:29:35,818][29458] Fps is (10 sec: 8192.1, 60 sec: 8260.3, 300 sec: 7928.2). Total num frames: 3403776. Throughput: 0: 2050.0. Samples: 851982. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:29:35,819][29458] Avg episode reward: [(0, '23.183')]
+ [2025-04-06 21:29:40,031][29816] Updated weights for policy 0, policy_version 840 (0.0037)
+ [2025-04-06 21:29:40,818][29458] Fps is (10 sec: 8192.1, 60 sec: 8192.0, 300 sec: 7928.2). Total num frames: 3444736. Throughput: 0: 2052.9. Samples: 858176. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:29:40,820][29458] Avg episode reward: [(0, '21.936')]
+ [2025-04-06 21:29:44,985][29816] Updated weights for policy 0, policy_version 850 (0.0035)
+ [2025-04-06 21:29:45,819][29458] Fps is (10 sec: 8191.7, 60 sec: 8192.0, 300 sec: 7942.1). Total num frames: 3485696. Throughput: 0: 2055.8. Samples: 870594. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
+ [2025-04-06 21:29:45,821][29458] Avg episode reward: [(0, '22.511')]
+ [2025-04-06 21:29:50,137][29816] Updated weights for policy 0, policy_version 860 (0.0032)
+ [2025-04-06 21:29:50,818][29458] Fps is (10 sec: 8192.0, 60 sec: 8192.0, 300 sec: 7928.2). Total num frames: 3526656. Throughput: 0: 2053.7. Samples: 882596. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:29:50,819][29458] Avg episode reward: [(0, '22.732')]
+ [2025-04-06 21:29:55,161][29816] Updated weights for policy 0, policy_version 870 (0.0035)
+ [2025-04-06 21:29:55,818][29458] Fps is (10 sec: 8192.3, 60 sec: 8192.0, 300 sec: 7928.2). Total num frames: 3567616. Throughput: 0: 2051.9. Samples: 888746. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:29:55,819][29458] Avg episode reward: [(0, '23.068')]
+ [2025-04-06 21:30:00,144][29816] Updated weights for policy 0, policy_version 880 (0.0037)
+ [2025-04-06 21:30:00,819][29458] Fps is (10 sec: 8191.6, 60 sec: 8192.1, 300 sec: 7928.2). Total num frames: 3608576. Throughput: 0: 2055.1. Samples: 901160. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:30:00,820][29458] Avg episode reward: [(0, '23.628')]
+ [2025-04-06 21:30:05,140][29816] Updated weights for policy 0, policy_version 890 (0.0037)
+ [2025-04-06 21:30:05,818][29458] Fps is (10 sec: 8192.0, 60 sec: 8260.3, 300 sec: 7928.2). Total num frames: 3649536. Throughput: 0: 2053.1. Samples: 913430. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:30:05,819][29458] Avg episode reward: [(0, '24.405')]
+ [2025-04-06 21:30:09,984][29816] Updated weights for policy 0, policy_version 900 (0.0036)
+ [2025-04-06 21:30:10,818][29458] Fps is (10 sec: 8192.3, 60 sec: 8192.1, 300 sec: 7928.2). Total num frames: 3690496. Throughput: 0: 2055.4. Samples: 919626. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
+ [2025-04-06 21:30:10,820][29458] Avg episode reward: [(0, '22.835')]
+ [2025-04-06 21:30:15,093][29816] Updated weights for policy 0, policy_version 910 (0.0035)
+ [2025-04-06 21:30:15,818][29458] Fps is (10 sec: 8191.9, 60 sec: 8192.0, 300 sec: 7942.1). Total num frames: 3731456. Throughput: 0: 2050.0. Samples: 931792. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
+ [2025-04-06 21:30:15,819][29458] Avg episode reward: [(0, '21.146')]
+ [2025-04-06 21:30:20,125][29816] Updated weights for policy 0, policy_version 920 (0.0033)
+ [2025-04-06 21:30:20,819][29458] Fps is (10 sec: 8191.9, 60 sec: 8192.0, 300 sec: 7942.1). Total num frames: 3772416. Throughput: 0: 2047.9. Samples: 944140. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2025-04-06 21:30:20,819][29458] Avg episode reward: [(0, '22.432')]
+ [2025-04-06 21:30:25,108][29816] Updated weights for policy 0, policy_version 930 (0.0038)
+ [2025-04-06 21:30:25,819][29458] Fps is (10 sec: 8191.8, 60 sec: 8192.0, 300 sec: 7942.1). Total num frames: 3813376. Throughput: 0: 2047.9. Samples: 950334. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:30:25,819][29458] Avg episode reward: [(0, '25.029')]
+ [2025-04-06 21:30:30,143][29816] Updated weights for policy 0, policy_version 940 (0.0034)
+ [2025-04-06 21:30:30,818][29458] Fps is (10 sec: 8192.2, 60 sec: 8192.0, 300 sec: 7942.1). Total num frames: 3854336. Throughput: 0: 2041.6. Samples: 962464. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
+ [2025-04-06 21:30:30,819][29458] Avg episode reward: [(0, '24.218')]
+ [2025-04-06 21:30:35,169][29816] Updated weights for policy 0, policy_version 950 (0.0038)
+ [2025-04-06 21:30:35,819][29458] Fps is (10 sec: 8191.8, 60 sec: 8191.9, 300 sec: 7942.1). Total num frames: 3895296. Throughput: 0: 2046.6. Samples: 974694. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
+ [2025-04-06 21:30:35,820][29458] Avg episode reward: [(0, '26.537')]
+ [2025-04-06 21:30:35,821][29697] Saving new best policy, reward=26.537!
+ [2025-04-06 21:30:40,154][29816] Updated weights for policy 0, policy_version 960 (0.0037)
+ [2025-04-06 21:30:40,819][29458] Fps is (10 sec: 8191.8, 60 sec: 8192.0, 300 sec: 7956.0). Total num frames: 3936256. Throughput: 0: 2046.8. Samples: 980854. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:30:40,819][29458] Avg episode reward: [(0, '24.753')]
+ [2025-04-06 21:30:45,077][29816] Updated weights for policy 0, policy_version 970 (0.0034)
+ [2025-04-06 21:30:45,818][29458] Fps is (10 sec: 8192.4, 60 sec: 8192.0, 300 sec: 7969.9). Total num frames: 3977216. Throughput: 0: 2048.8. Samples: 993354. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
+ [2025-04-06 21:30:45,819][29458] Avg episode reward: [(0, '23.041')]
+ [2025-04-06 21:30:49,011][29697] Saving /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+ [2025-04-06 21:30:49,019][29458] Component Batcher_0 stopped!
+ [2025-04-06 21:30:49,025][29697] Stopping Batcher_0...
+ [2025-04-06 21:30:49,035][29697] Loop batcher_evt_loop terminating...
+ [2025-04-06 21:30:49,050][29816] Weights refcount: 2 0
+ [2025-04-06 21:30:49,055][29458] Component InferenceWorker_p0-w0 stopped!
+ [2025-04-06 21:30:49,054][29816] Stopping InferenceWorker_p0-w0...
+ [2025-04-06 21:30:49,058][29816] Loop inference_proc0-0_evt_loop terminating...
+ [2025-04-06 21:30:49,088][29458] Component RolloutWorker_w6 stopped!
+ [2025-04-06 21:30:49,088][29817] Stopping RolloutWorker_w6...
+ [2025-04-06 21:30:49,093][29817] Loop rollout_proc6_evt_loop terminating...
+ [2025-04-06 21:30:49,128][29458] Component RolloutWorker_w3 stopped!
+ [2025-04-06 21:30:49,129][29458] Component RolloutWorker_w7 stopped!
+ [2025-04-06 21:30:49,128][29823] Stopping RolloutWorker_w7...
+ [2025-04-06 21:30:49,131][29458] Component RolloutWorker_w1 stopped!
+ [2025-04-06 21:30:49,128][29819] Stopping RolloutWorker_w3...
+ [2025-04-06 21:30:49,131][29823] Loop rollout_proc7_evt_loop terminating...
+ [2025-04-06 21:30:49,133][29815] Stopping RolloutWorker_w1...
+ [2025-04-06 21:30:49,138][29815] Loop rollout_proc1_evt_loop terminating...
+ [2025-04-06 21:30:49,134][29819] Loop rollout_proc3_evt_loop terminating...
+ [2025-04-06 21:30:49,140][29458] Component RolloutWorker_w2 stopped!
+ [2025-04-06 21:30:49,142][29821] Stopping RolloutWorker_w2...
+ [2025-04-06 21:30:49,145][29458] Component RolloutWorker_w4 stopped!
+ [2025-04-06 21:30:49,147][29697] Removing /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000506_2072576.pth
+ [2025-04-06 21:30:49,148][29458] Component RolloutWorker_w0 stopped!
+ [2025-04-06 21:30:49,144][29821] Loop rollout_proc2_evt_loop terminating...
+ [2025-04-06 21:30:49,147][29818] Stopping RolloutWorker_w0...
+ [2025-04-06 21:30:49,150][29818] Loop rollout_proc0_evt_loop terminating...
+ [2025-04-06 21:30:49,145][29820] Stopping RolloutWorker_w4...
+ [2025-04-06 21:30:49,154][29820] Loop rollout_proc4_evt_loop terminating...
+ [2025-04-06 21:30:49,164][29697] Saving /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+ [2025-04-06 21:30:49,175][29458] Component RolloutWorker_w5 stopped!
+ [2025-04-06 21:30:49,175][29822] Stopping RolloutWorker_w5...
+ [2025-04-06 21:30:49,177][29822] Loop rollout_proc5_evt_loop terminating...
+ [2025-04-06 21:30:49,340][29697] Stopping LearnerWorker_p0...
+ [2025-04-06 21:30:49,341][29458] Component LearnerWorker_p0 stopped!
+ [2025-04-06 21:30:49,342][29458] Waiting for process learner_proc0 to stop...
+ [2025-04-06 21:30:49,343][29697] Loop learner_proc0_evt_loop terminating...
+ [2025-04-06 21:30:51,627][29458] Waiting for process inference_proc0-0 to join...
+ [2025-04-06 21:30:51,628][29458] Waiting for process rollout_proc0 to join...
+ [2025-04-06 21:30:51,629][29458] Waiting for process rollout_proc1 to join...
+ [2025-04-06 21:30:51,630][29458] Waiting for process rollout_proc2 to join...
+ [2025-04-06 21:30:51,631][29458] Waiting for process rollout_proc3 to join...
+ [2025-04-06 21:30:51,632][29458] Waiting for process rollout_proc4 to join...
+ [2025-04-06 21:30:51,633][29458] Waiting for process rollout_proc5 to join...
+ [2025-04-06 21:30:51,634][29458] Waiting for process rollout_proc6 to join...
+ [2025-04-06 21:30:51,634][29458] Waiting for process rollout_proc7 to join...
+ [2025-04-06 21:30:51,635][29458] Batcher 0 profile tree view:
+ batching: 17.8770, releasing_batches: 0.0695
+ [2025-04-06 21:30:51,636][29458] InferenceWorker_p0-w0 profile tree view:
+ wait_policy: 0.0001
+ wait_policy_total: 42.1090
+ update_model: 10.7099
+ weight_update: 0.0035
+ one_step: 0.0087
+ handle_policy_step: 451.9330
+ deserialize: 16.9719, stack: 3.9921, obs_to_device_normalize: 136.9680, forward: 196.5648, send_messages: 25.9704
+ prepare_outputs: 43.6267
+ to_cpu: 23.3038
+ [2025-04-06 21:30:51,637][29458] Learner 0 profile tree view:
+ misc: 0.0148, prepare_batch: 11.7053
+ train: 50.0566
+ epoch_init: 0.0177, minibatch_init: 0.0135, losses_postprocess: 0.3649, kl_divergence: 0.5450, after_optimizer: 16.5465
+ calculate_losses: 17.8696
+ losses_init: 0.0104, forward_head: 1.2988, bptt_initial: 11.1267, tail: 1.2122, advantages_returns: 0.2812, losses: 1.7146
+ bptt: 1.7721
+ bptt_forward_core: 1.6535
+ update: 13.9069
+ clip: 1.7712
+ [2025-04-06 21:30:51,637][29458] RolloutWorker_w0 profile tree view:
+ wait_for_trajectories: 0.3932, enqueue_policy_requests: 14.9665, env_step: 208.0064, overhead: 24.6133, complete_rollouts: 0.6180
+ save_policy_outputs: 26.4317
+ split_output_tensors: 13.2113
+ [2025-04-06 21:30:51,638][29458] RolloutWorker_w7 profile tree view:
+ wait_for_trajectories: 0.4155, enqueue_policy_requests: 16.2129, env_step: 211.3192, overhead: 26.4352, complete_rollouts: 0.6806
+ save_policy_outputs: 28.5057
+ split_output_tensors: 14.0651
+ [2025-04-06 21:30:51,639][29458] Loop Runner_EvtLoop terminating...
+ [2025-04-06 21:30:51,640][29458] Runner profile tree view:
+ main_loop: 588.0457
+ [2025-04-06 21:30:51,641][29458] Collected {0: 4005888}, FPS: 6812.2
+ [2025-04-06 21:45:24,892][29458] Loading existing experiment configuration from /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/config.json
+ [2025-04-06 21:45:24,892][29458] Overriding arg 'num_workers' with value 1 passed from command line
+ [2025-04-06 21:45:24,892][29458] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2025-04-06 21:45:24,893][29458] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2025-04-06 21:45:24,894][29458] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2025-04-06 21:45:24,894][29458] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2025-04-06 21:45:24,895][29458] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
+ [2025-04-06 21:45:24,895][29458] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2025-04-06 21:45:24,896][29458] Adding new argument 'push_to_hub'=False that is not in the saved config file!
+ [2025-04-06 21:45:24,896][29458] Adding new argument 'hf_repository'=None that is not in the saved config file!
+ [2025-04-06 21:45:24,897][29458] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2025-04-06 21:45:24,898][29458] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2025-04-06 21:45:24,899][29458] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2025-04-06 21:45:24,899][29458] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2025-04-06 21:45:24,900][29458] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2025-04-06 21:45:24,939][29458] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-04-06 21:45:24,942][29458] RunningMeanStd input shape: (3, 72, 128)
+ [2025-04-06 21:45:24,944][29458] RunningMeanStd input shape: (1,)
+ [2025-04-06 21:45:24,978][29458] ConvEncoder: input_channels=3
+ [2025-04-06 21:45:25,206][29458] Conv encoder output size: 512
+ [2025-04-06 21:45:25,206][29458] Policy head output size: 512
+ [2025-04-06 21:45:25,370][29458] Loading state from checkpoint /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+ [2025-04-06 21:45:25,373][29458] Could not load from checkpoint, attempt 0
+ Traceback (most recent call last):
+ File "/home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/venv/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+ checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+ File "/home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/venv/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+ _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+ Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+ [2025-04-06 21:45:25,379][29458] Loading state from checkpoint /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+ [2025-04-06 21:45:25,380][29458] Could not load from checkpoint, attempt 1
+ Traceback (most recent call last):
+ File "/home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/venv/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+ checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+ File "/home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/venv/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+ _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+ Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
+ [2025-04-06 21:45:25,380][29458] Loading state from checkpoint /home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
+ [2025-04-06 21:45:25,381][29458] Could not load from checkpoint, attempt 2
+ Traceback (most recent call last):
+ File "/home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/venv/lib/python3.10/site-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
+ checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
+ File "/home/tguz/Proj/PhD/RL/RL_courses/Hugging-Face-RL/venv/lib/python3.10/site-packages/torch/serialization.py", line 1470, in load
+ raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
+ _pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
+ (1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
+ (2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
+ WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.
+
+ Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
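The three failed load attempts above all trace back to the same change: PyTorch 2.6 switched the default of `torch.load` to `weights_only=True`, and this Sample-Factory checkpoint contains a `numpy.core.multiarray.scalar` object that is not on the default allowlist. Below is a minimal sketch of the two workarounds the error message itself suggests, assuming the checkpoint path shown in the log, a CPU `map_location`, and a checkpoint you trust; it is illustrative only, not Sample-Factory's own loading code.

```
import torch
import numpy as np

# Path taken from the log above; adjust to your own train_dir.
ckpt_path = "train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth"

# Option 1 (error message, point 1): opt out of weights-only loading.
# Only acceptable because the checkpoint was produced locally / is trusted.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=False)

# Option 2 (error message, point 2): keep weights_only=True, but allowlist the
# numpy scalar global that the WeightsUnpickler rejected. The attribute path
# below matches the global named in the traceback; on numpy 2.x it may live
# under numpy._core instead.
torch.serialization.add_safe_globals([np.core.multiarray.scalar])
checkpoint = torch.load(ckpt_path, map_location="cpu")
```

Either option (or pinning `torch<2.6`) avoids the repeated retry tracebacks logged above.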