WangChongan committed (verified)
Commit 4e59508 · Parent: a8017d4

Upload folder using huggingface_hub
.summary/0/events.out.tfevents.1756806802.a1e0233c2656 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:0df08ea8be895cd332bddead4c58dfda6ed0c895f60433ac1828c55c6a298f4d
+ size 2497
README.md CHANGED
@@ -15,7 +15,7 @@ model-index:
  type: doom_health_gathering_supreme
  metrics:
  - type: mean_reward
- value: 4.06 +/- 0.70
+ value: 3.95 +/- 0.57
  name: mean_reward
  verified: false
  ---
checkpoint_p0/checkpoint_000000005_20480.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f3578342954be56affd16ea702feccdba842a0bc2b16223716e02d6786fe9b7f
+ size 34929457
config.json CHANGED
@@ -65,7 +65,7 @@
  "summaries_use_frameskip": true,
  "heartbeat_interval": 20,
  "heartbeat_reporting_interval": 600,
- "train_for_env_steps": 4000,
+ "train_for_env_steps": 10000,
  "train_for_seconds": 10000000000,
  "save_every_sec": 120,
  "keep_checkpoints": 2,
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:a04e9665fb99c18667cadeb0eff2e27a518aac8d67611b0b9cfadca13422fd1c
- size 6063304
+ oid sha256:b60dd4467cfe59ca184996dba397af9bb352f8426644f69ad1955c5c195afe5a
+ size 5808317
sf_log.txt CHANGED
@@ -708,3 +708,474 @@ main_loop: 55.3351
  [2025-09-02 09:52:13,559][02807] Avg episode rewards: #0: 4.564, true rewards: #0: 4.064
  [2025-09-02 09:52:13,560][02807] Avg episode reward: 4.564, avg true_objective: 4.064
  [2025-09-02 09:52:40,090][02807] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
+ [2025-09-02 09:52:44,244][02807] The model has been pushed to https://huggingface.co/WangChongan/rl_course_vizdoom_health_gathering_supreme
+ [2025-09-02 09:53:22,722][02807] Environment doom_basic already registered, overwriting...
+ [2025-09-02 09:53:22,724][02807] Environment doom_two_colors_easy already registered, overwriting...
+ [2025-09-02 09:53:22,725][02807] Environment doom_two_colors_hard already registered, overwriting...
+ [2025-09-02 09:53:22,726][02807] Environment doom_dm already registered, overwriting...
+ [2025-09-02 09:53:22,727][02807] Environment doom_dwango5 already registered, overwriting...
+ [2025-09-02 09:53:22,728][02807] Environment doom_my_way_home_flat_actions already registered, overwriting...
+ [2025-09-02 09:53:22,729][02807] Environment doom_defend_the_center_flat_actions already registered, overwriting...
+ [2025-09-02 09:53:22,730][02807] Environment doom_my_way_home already registered, overwriting...
+ [2025-09-02 09:53:22,731][02807] Environment doom_deadly_corridor already registered, overwriting...
+ [2025-09-02 09:53:22,735][02807] Environment doom_defend_the_center already registered, overwriting...
+ [2025-09-02 09:53:22,736][02807] Environment doom_defend_the_line already registered, overwriting...
+ [2025-09-02 09:53:22,737][02807] Environment doom_health_gathering already registered, overwriting...
+ [2025-09-02 09:53:22,738][02807] Environment doom_health_gathering_supreme already registered, overwriting...
+ [2025-09-02 09:53:22,739][02807] Environment doom_battle already registered, overwriting...
+ [2025-09-02 09:53:22,740][02807] Environment doom_battle2 already registered, overwriting...
+ [2025-09-02 09:53:22,743][02807] Environment doom_duel_bots already registered, overwriting...
+ [2025-09-02 09:53:22,744][02807] Environment doom_deathmatch_bots already registered, overwriting...
+ [2025-09-02 09:53:22,745][02807] Environment doom_duel already registered, overwriting...
+ [2025-09-02 09:53:22,746][02807] Environment doom_deathmatch_full already registered, overwriting...
+ [2025-09-02 09:53:22,746][02807] Environment doom_benchmark already registered, overwriting...
+ [2025-09-02 09:53:22,747][02807] register_encoder_factory: <function make_vizdoom_encoder at 0x7f1b7c8de020>
+ [2025-09-02 09:53:22,771][02807] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+ [2025-09-02 09:53:22,773][02807] Overriding arg 'train_for_env_steps' with value 10000 passed from command line
+ [2025-09-02 09:53:22,776][02807] Experiment dir /content/train_dir/default_experiment already exists!
+ [2025-09-02 09:53:22,779][02807] Resuming existing experiment from /content/train_dir/default_experiment...
+ [2025-09-02 09:53:22,780][02807] Weights and Biases integration disabled
+ [2025-09-02 09:53:22,783][02807] Environment var CUDA_VISIBLE_DEVICES is
+
+ [2025-09-02 09:53:25,606][02807] Starting experiment with the following configuration:
+ help=False
+ algo=APPO
+ env=doom_health_gathering_supreme
+ experiment=default_experiment
+ train_dir=/content/train_dir
+ restart_behavior=resume
+ device=cpu
+ seed=None
+ num_policies=1
+ async_rl=True
+ serial_mode=False
+ batched_sampling=False
+ num_batches_to_accumulate=2
+ worker_num_splits=2
+ policy_workers_per_policy=1
+ max_policy_lag=1000
+ num_workers=8
+ num_envs_per_worker=4
+ batch_size=1024
+ num_batches_per_epoch=1
+ num_epochs=1
+ rollout=32
+ recurrence=32
+ shuffle_minibatches=False
+ gamma=0.99
+ reward_scale=1.0
+ reward_clip=1000.0
+ value_bootstrap=False
+ normalize_returns=True
+ exploration_loss_coeff=0.001
+ value_loss_coeff=0.5
+ kl_loss_coeff=0.0
+ exploration_loss=symmetric_kl
+ gae_lambda=0.95
+ ppo_clip_ratio=0.1
+ ppo_clip_value=0.2
+ with_vtrace=False
+ vtrace_rho=1.0
+ vtrace_c=1.0
+ optimizer=adam
+ adam_eps=1e-06
+ adam_beta1=0.9
+ adam_beta2=0.999
+ max_grad_norm=4.0
+ learning_rate=0.0001
+ lr_schedule=constant
+ lr_schedule_kl_threshold=0.008
+ lr_adaptive_min=1e-06
+ lr_adaptive_max=0.01
+ obs_subtract_mean=0.0
+ obs_scale=255.0
+ normalize_input=True
+ normalize_input_keys=None
+ decorrelate_experience_max_seconds=0
+ decorrelate_envs_on_one_worker=True
+ actor_worker_gpus=[]
+ set_workers_cpu_affinity=True
+ force_envs_single_thread=False
+ default_niceness=0
+ log_to_file=True
+ experiment_summaries_interval=10
+ flush_summaries_interval=30
+ stats_avg=100
+ summaries_use_frameskip=True
+ heartbeat_interval=20
+ heartbeat_reporting_interval=600
+ train_for_env_steps=10000
+ train_for_seconds=10000000000
+ save_every_sec=120
+ keep_checkpoints=2
+ load_checkpoint_kind=latest
+ save_milestones_sec=-1
+ save_best_every_sec=5
+ save_best_metric=reward
+ save_best_after=100000
+ benchmark=False
+ encoder_mlp_layers=[512, 512]
+ encoder_conv_architecture=convnet_simple
+ encoder_conv_mlp_layers=[512]
+ use_rnn=True
+ rnn_size=512
+ rnn_type=gru
+ rnn_num_layers=1
+ decoder_mlp_layers=[]
+ nonlinearity=elu
+ policy_initialization=orthogonal
+ policy_init_gain=1.0
+ actor_critic_share_weights=True
+ adaptive_stddev=True
+ continuous_tanh_scale=0.0
+ initial_stddev=1.0
+ use_env_info_cache=False
+ env_gpu_actions=False
+ env_gpu_observations=True
+ env_frameskip=4
+ env_framestack=1
+ pixel_format=CHW
+ use_record_episode_statistics=False
+ with_wandb=False
+ wandb_user=None
+ wandb_project=sample_factory
+ wandb_group=None
+ wandb_job_type=SF
+ wandb_tags=[]
+ with_pbt=False
+ pbt_mix_policies_in_one_env=True
+ pbt_period_env_steps=5000000
+ pbt_start_mutation=20000000
+ pbt_replace_fraction=0.3
+ pbt_mutation_rate=0.15
+ pbt_replace_reward_gap=0.1
+ pbt_replace_reward_gap_absolute=1e-06
+ pbt_optimize_gamma=False
+ pbt_target_objective=true_objective
+ pbt_perturb_min=1.1
+ pbt_perturb_max=1.5
+ num_agents=-1
+ num_humans=0
+ num_bots=-1
+ start_bot_difficulty=None
+ timelimit=None
+ res_w=128
+ res_h=72
+ wide_aspect_ratio=False
+ eval_env_frameskip=1
+ fps=35
+ command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
+ cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
+ git_hash=unknown
+ git_repo_name=not a git repository
+ [2025-09-02 09:53:25,608][02807] Saving configuration to /content/train_dir/default_experiment/config.json...
+ [2025-09-02 09:53:25,610][02807] Rollout worker 0 uses device cpu
+ [2025-09-02 09:53:25,611][02807] Rollout worker 1 uses device cpu
+ [2025-09-02 09:53:25,612][02807] Rollout worker 2 uses device cpu
+ [2025-09-02 09:53:25,613][02807] Rollout worker 3 uses device cpu
+ [2025-09-02 09:53:25,614][02807] Rollout worker 4 uses device cpu
+ [2025-09-02 09:53:25,615][02807] Rollout worker 5 uses device cpu
+ [2025-09-02 09:53:25,616][02807] Rollout worker 6 uses device cpu
+ [2025-09-02 09:53:25,617][02807] Rollout worker 7 uses device cpu
+ [2025-09-02 09:53:25,690][02807] InferenceWorker_p0-w0: min num requests: 2
+ [2025-09-02 09:53:25,722][02807] Starting all processes...
+ [2025-09-02 09:53:25,723][02807] Starting process learner_proc0
+ [2025-09-02 09:53:25,798][02807] Starting all processes...
+ [2025-09-02 09:53:25,804][02807] Starting process inference_proc0-0
+ [2025-09-02 09:53:25,804][02807] Starting process rollout_proc0
+ [2025-09-02 09:53:25,805][02807] Starting process rollout_proc1
+ [2025-09-02 09:53:25,805][02807] Starting process rollout_proc2
+ [2025-09-02 09:53:25,805][02807] Starting process rollout_proc3
+ [2025-09-02 09:53:25,805][02807] Starting process rollout_proc4
+ [2025-09-02 09:53:25,805][02807] Starting process rollout_proc5
+ [2025-09-02 09:53:25,805][02807] Starting process rollout_proc6
+ [2025-09-02 09:53:25,805][02807] Starting process rollout_proc7
+ [2025-09-02 09:53:45,617][06913] Starting seed is not provided
+ [2025-09-02 09:53:45,618][06913] Initializing actor-critic model on device cpu
+ [2025-09-02 09:53:45,619][06913] RunningMeanStd input shape: (3, 72, 128)
+ [2025-09-02 09:53:45,621][06913] RunningMeanStd input shape: (1,)
+ [2025-09-02 09:53:45,683][02807] Heartbeat connected on Batcher_0
+ [2025-09-02 09:53:45,719][06913] ConvEncoder: input_channels=3
+ [2025-09-02 09:53:45,846][06933] Worker 7 uses CPU cores [1]
+ [2025-09-02 09:53:45,881][02807] Heartbeat connected on RolloutWorker_w7
+ [2025-09-02 09:53:45,899][06929] Worker 3 uses CPU cores [1]
+ [2025-09-02 09:53:45,923][02807] Heartbeat connected on RolloutWorker_w3
+ [2025-09-02 09:53:45,957][06932] Worker 6 uses CPU cores [0]
+ [2025-09-02 09:53:45,965][02807] Heartbeat connected on RolloutWorker_w6
+ [2025-09-02 09:53:46,157][06934] Worker 5 uses CPU cores [1]
+ [2025-09-02 09:53:46,167][06927] Worker 0 uses CPU cores [0]
+ [2025-09-02 09:53:46,171][02807] Heartbeat connected on RolloutWorker_w5
+ [2025-09-02 09:53:46,184][06926] Worker 1 uses CPU cores [1]
+ [2025-09-02 09:53:46,196][02807] Heartbeat connected on RolloutWorker_w1
+ [2025-09-02 09:53:46,197][02807] Heartbeat connected on RolloutWorker_w0
+ [2025-09-02 09:53:46,218][06928] Worker 2 uses CPU cores [0]
+ [2025-09-02 09:53:46,224][02807] Heartbeat connected on RolloutWorker_w2
+ [2025-09-02 09:53:46,269][02807] Heartbeat connected on InferenceWorker_p0-w0
+ [2025-09-02 09:53:46,297][06931] Worker 4 uses CPU cores [0]
+ [2025-09-02 09:53:46,299][02807] Heartbeat connected on RolloutWorker_w4
+ [2025-09-02 09:53:46,325][06913] Conv encoder output size: 512
+ [2025-09-02 09:53:46,325][06913] Policy head output size: 512
+ [2025-09-02 09:53:46,342][06913] Created Actor Critic model with architecture:
+ [2025-09-02 09:53:46,342][06913] ActorCriticSharedWeights(
+ (obs_normalizer): ObservationNormalizer(
+ (running_mean_std): RunningMeanStdDictInPlace(
+ (running_mean_std): ModuleDict(
+ (obs): RunningMeanStdInPlace()
+ )
+ )
+ )
+ (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
+ (encoder): VizdoomEncoder(
+ (basic_encoder): ConvEncoder(
+ (enc): RecursiveScriptModule(
+ original_name=ConvEncoderImpl
+ (conv_head): RecursiveScriptModule(
+ original_name=Sequential
+ (0): RecursiveScriptModule(original_name=Conv2d)
+ (1): RecursiveScriptModule(original_name=ELU)
+ (2): RecursiveScriptModule(original_name=Conv2d)
+ (3): RecursiveScriptModule(original_name=ELU)
+ (4): RecursiveScriptModule(original_name=Conv2d)
+ (5): RecursiveScriptModule(original_name=ELU)
+ )
+ (mlp_layers): RecursiveScriptModule(
+ original_name=Sequential
+ (0): RecursiveScriptModule(original_name=Linear)
+ (1): RecursiveScriptModule(original_name=ELU)
+ )
+ )
+ )
+ )
+ (core): ModelCoreRNN(
+ (core): GRU(512, 512)
+ )
+ (decoder): MlpDecoder(
+ (mlp): Identity()
+ )
+ (critic_linear): Linear(in_features=512, out_features=1, bias=True)
+ (action_parameterization): ActionParameterizationDefault(
+ (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
+ )
+ )
+ [2025-09-02 09:53:46,691][06913] Using optimizer <class 'torch.optim.adam.Adam'>
+ [2025-09-02 09:53:48,476][06913] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000003_12288.pth...
+ [2025-09-02 09:53:48,535][06913] Loading model from checkpoint
+ [2025-09-02 09:53:48,541][06913] Loaded experiment state at self.train_step=3, self.env_steps=12288
+ [2025-09-02 09:53:48,541][06913] Initialized policy 0 weights for model version 3
+ [2025-09-02 09:53:48,547][06930] RunningMeanStd input shape: (3, 72, 128)
+ [2025-09-02 09:53:48,548][06930] RunningMeanStd input shape: (1,)
+ [2025-09-02 09:53:48,556][06913] LearnerWorker_p0 finished initialization!
+ [2025-09-02 09:53:48,561][02807] Heartbeat connected on LearnerWorker_p0
+ [2025-09-02 09:53:48,599][06930] ConvEncoder: input_channels=3
+ [2025-09-02 09:53:48,812][06930] Conv encoder output size: 512
+ [2025-09-02 09:53:48,815][06930] Policy head output size: 512
+ [2025-09-02 09:53:48,852][02807] Inference worker 0-0 is ready!
+ [2025-09-02 09:53:48,853][02807] All inference workers are ready! Signal rollout workers to start!
+ [2025-09-02 09:53:49,154][06929] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:49,156][06933] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:49,162][06926] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:49,163][06934] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:49,264][06931] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:49,270][06932] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:49,275][06927] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:49,270][06928] Doom resolution: 160x120, resize resolution: (128, 72)
+ [2025-09-02 09:53:50,355][06933] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:50,840][06928] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:50,848][06927] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:50,855][06931] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:51,241][06933] Decorrelating experience for 32 frames...
+ [2025-09-02 09:53:51,324][06929] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:51,738][06931] Decorrelating experience for 32 frames...
+ [2025-09-02 09:53:51,741][06927] Decorrelating experience for 32 frames...
+ [2025-09-02 09:53:52,340][06929] Decorrelating experience for 32 frames...
+ [2025-09-02 09:53:52,374][06926] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:52,783][02807] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 12288. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2025-09-02 09:53:53,520][06932] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:53,539][06929] Decorrelating experience for 64 frames...
+ [2025-09-02 09:53:54,094][06927] Decorrelating experience for 64 frames...
+ [2025-09-02 09:53:54,096][06931] Decorrelating experience for 64 frames...
+ [2025-09-02 09:53:54,589][06928] Decorrelating experience for 32 frames...
+ [2025-09-02 09:53:55,140][06933] Decorrelating experience for 64 frames...
+ [2025-09-02 09:53:55,319][06932] Decorrelating experience for 32 frames...
+ [2025-09-02 09:53:55,346][06929] Decorrelating experience for 96 frames...
+ [2025-09-02 09:53:55,920][06931] Decorrelating experience for 96 frames...
+ [2025-09-02 09:53:56,155][06926] Decorrelating experience for 32 frames...
+ [2025-09-02 09:53:56,743][06928] Decorrelating experience for 64 frames...
+ [2025-09-02 09:53:57,345][06934] Decorrelating experience for 0 frames...
+ [2025-09-02 09:53:57,380][06933] Decorrelating experience for 96 frames...
+ [2025-09-02 09:53:57,783][02807] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 12288. Throughput: 0: 0.0. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2025-09-02 09:53:57,785][02807] Avg episode reward: [(0, '1.440')]
+ [2025-09-02 09:53:57,906][06927] Decorrelating experience for 96 frames...
+ [2025-09-02 09:53:58,494][06932] Decorrelating experience for 64 frames...
+ [2025-09-02 09:53:58,797][06926] Decorrelating experience for 64 frames...
+ [2025-09-02 09:53:59,833][06926] Decorrelating experience for 96 frames...
+ [2025-09-02 09:53:59,863][06928] Decorrelating experience for 96 frames...
+ [2025-09-02 09:54:02,783][02807] Fps is (10 sec: 0.0, 60 sec: 0.0, 300 sec: 0.0). Total num frames: 12288. Throughput: 0: 133.0. Samples: 1330. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
+ [2025-09-02 09:54:02,785][02807] Avg episode reward: [(0, '2.453')]
+ [2025-09-02 09:54:03,868][06932] Decorrelating experience for 96 frames...
+ [2025-09-02 09:54:04,999][06934] Decorrelating experience for 32 frames...
+ [2025-09-02 09:54:06,481][06913] Signal inference workers to stop experience collection...
+ [2025-09-02 09:54:06,526][06930] InferenceWorker_p0-w0: stopping experience collection
+ [2025-09-02 09:54:07,030][06934] Decorrelating experience for 64 frames...
+ [2025-09-02 09:54:07,600][06934] Decorrelating experience for 96 frames...
+ [2025-09-02 09:54:07,754][06913] Signal inference workers to resume experience collection...
+ [2025-09-02 09:54:07,761][06913] Stopping Batcher_0...
+ [2025-09-02 09:54:07,762][02807] Component Batcher_0 stopped!
+ [2025-09-02 09:54:07,768][06913] Loop batcher_evt_loop terminating...
+ [2025-09-02 09:54:07,804][06930] Weights refcount: 2 0
+ [2025-09-02 09:54:07,810][02807] Component InferenceWorker_p0-w0 stopped!
+ [2025-09-02 09:54:07,818][06930] Stopping InferenceWorker_p0-w0...
+ [2025-09-02 09:54:07,819][06930] Loop inference_proc0-0_evt_loop terminating...
+ [2025-09-02 09:54:08,117][06926] Stopping RolloutWorker_w1...
+ [2025-09-02 09:54:08,122][06926] Loop rollout_proc1_evt_loop terminating...
+ [2025-09-02 09:54:08,117][02807] Component RolloutWorker_w1 stopped!
+ [2025-09-02 09:54:08,144][06933] Stopping RolloutWorker_w7...
+ [2025-09-02 09:54:08,147][06933] Loop rollout_proc7_evt_loop terminating...
+ [2025-09-02 09:54:08,144][02807] Component RolloutWorker_w7 stopped!
+ [2025-09-02 09:54:08,168][06934] Stopping RolloutWorker_w5...
+ [2025-09-02 09:54:08,168][02807] Component RolloutWorker_w5 stopped!
+ [2025-09-02 09:54:08,169][06934] Loop rollout_proc5_evt_loop terminating...
+ [2025-09-02 09:54:08,180][06929] Stopping RolloutWorker_w3...
+ [2025-09-02 09:54:08,190][06929] Loop rollout_proc3_evt_loop terminating...
+ [2025-09-02 09:54:08,180][02807] Component RolloutWorker_w3 stopped!
+ [2025-09-02 09:54:08,303][02807] Component RolloutWorker_w2 stopped!
+ [2025-09-02 09:54:08,308][06928] Stopping RolloutWorker_w2...
+ [2025-09-02 09:54:08,317][02807] Component RolloutWorker_w4 stopped!
+ [2025-09-02 09:54:08,321][06931] Stopping RolloutWorker_w4...
+ [2025-09-02 09:54:08,322][06931] Loop rollout_proc4_evt_loop terminating...
+ [2025-09-02 09:54:08,326][02807] Component RolloutWorker_w6 stopped!
+ [2025-09-02 09:54:08,329][06932] Stopping RolloutWorker_w6...
+ [2025-09-02 09:54:08,330][06932] Loop rollout_proc6_evt_loop terminating...
+ [2025-09-02 09:54:08,314][06928] Loop rollout_proc2_evt_loop terminating...
+ [2025-09-02 09:54:08,347][02807] Component RolloutWorker_w0 stopped!
+ [2025-09-02 09:54:08,351][06927] Stopping RolloutWorker_w0...
+ [2025-09-02 09:54:08,352][06927] Loop rollout_proc0_evt_loop terminating...
+ [2025-09-02 09:54:13,196][06913] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000005_20480.pth...
+ [2025-09-02 09:54:13,259][06913] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000005_20480.pth...
+ [2025-09-02 09:54:13,366][06913] Stopping LearnerWorker_p0...
+ [2025-09-02 09:54:13,367][06913] Loop learner_proc0_evt_loop terminating...
+ [2025-09-02 09:54:13,366][02807] Component LearnerWorker_p0 stopped!
+ [2025-09-02 09:54:13,371][02807] Waiting for process learner_proc0 to stop...
+ [2025-09-02 09:54:14,291][02807] Waiting for process inference_proc0-0 to join...
+ [2025-09-02 09:54:14,293][02807] Waiting for process rollout_proc0 to join...
+ [2025-09-02 09:54:14,296][02807] Waiting for process rollout_proc1 to join...
+ [2025-09-02 09:54:14,297][02807] Waiting for process rollout_proc2 to join...
+ [2025-09-02 09:54:14,299][02807] Waiting for process rollout_proc3 to join...
+ [2025-09-02 09:54:14,300][02807] Waiting for process rollout_proc4 to join...
+ [2025-09-02 09:54:14,301][02807] Waiting for process rollout_proc5 to join...
+ [2025-09-02 09:54:14,302][02807] Waiting for process rollout_proc6 to join...
+ [2025-09-02 09:54:14,305][02807] Waiting for process rollout_proc7 to join...
+ [2025-09-02 09:54:14,306][02807] Batcher 0 profile tree view:
+ batching: 0.0391, releasing_batches: 0.0067
+ [2025-09-02 09:54:14,308][02807] InferenceWorker_p0-w0 profile tree view:
+ update_model: 0.0657
+ wait_policy: 0.0001
+ wait_policy_total: 7.4236
+ one_step: 0.0267
+ handle_policy_step: 9.7252
+ deserialize: 0.3177, stack: 0.0320, obs_to_device_normalize: 1.0536, forward: 7.7469, send_messages: 0.2031
+ prepare_outputs: 0.1186
+ to_cpu: 0.0155
+ [2025-09-02 09:54:14,309][02807] Learner 0 profile tree view:
+ misc: 0.0000, prepare_batch: 2.8606
+ train: 9.2517
+ epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0027, kl_divergence: 0.0009, after_optimizer: 0.0044
+ calculate_losses: 4.0182
+ losses_init: 0.0000, forward_head: 3.5911, bptt_initial: 0.0130, tail: 0.0037, advantages_returns: 0.0010, losses: 0.0029
+ bptt: 0.4059
+ bptt_forward_core: 0.4034
+ update: 5.2216
+ clip: 0.0072
+ [2025-09-02 09:54:14,310][02807] RolloutWorker_w0 profile tree view:
+ wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.1876, env_step: 3.5443, overhead: 0.0776, complete_rollouts: 0.0482
+ save_policy_outputs: 0.0791
+ split_output_tensors: 0.0251
+ [2025-09-02 09:54:14,311][02807] RolloutWorker_w7 profile tree view:
+ wait_for_trajectories: 0.0010, enqueue_policy_requests: 0.2246, env_step: 4.1101, overhead: 0.0910, complete_rollouts: 0.0875
+ save_policy_outputs: 0.0999
+ split_output_tensors: 0.0628
+ [2025-09-02 09:54:14,312][02807] Loop Runner_EvtLoop terminating...
+ [2025-09-02 09:54:14,313][02807] Runner profile tree view:
+ main_loop: 48.5912
+ [2025-09-02 09:54:14,314][02807] Collected {0: 20480}, FPS: 168.6
+ [2025-09-02 10:01:30,977][02807] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
+ [2025-09-02 10:01:30,979][02807] Overriding arg 'num_workers' with value 1 passed from command line
+ [2025-09-02 10:01:30,981][02807] Adding new argument 'no_render'=True that is not in the saved config file!
+ [2025-09-02 10:01:30,982][02807] Adding new argument 'save_video'=True that is not in the saved config file!
+ [2025-09-02 10:01:30,983][02807] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
+ [2025-09-02 10:01:30,985][02807] Adding new argument 'video_name'=None that is not in the saved config file!
+ [2025-09-02 10:01:30,986][02807] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
+ [2025-09-02 10:01:30,987][02807] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
+ [2025-09-02 10:01:30,988][02807] Adding new argument 'push_to_hub'=True that is not in the saved config file!
+ [2025-09-02 10:01:30,989][02807] Adding new argument 'hf_repository'='WangChongan/rl_course_vizdoom_health_gathering_supreme' that is not in the saved config file!
+ [2025-09-02 10:01:30,990][02807] Adding new argument 'policy_index'=0 that is not in the saved config file!
+ [2025-09-02 10:01:30,991][02807] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
+ [2025-09-02 10:01:30,992][02807] Adding new argument 'train_script'=None that is not in the saved config file!
+ [2025-09-02 10:01:30,993][02807] Adding new argument 'enjoy_script'=None that is not in the saved config file!
+ [2025-09-02 10:01:30,994][02807] Using frameskip 1 and render_action_repeat=4 for evaluation
+ [2025-09-02 10:01:31,023][02807] RunningMeanStd input shape: (3, 72, 128)
+ [2025-09-02 10:01:31,025][02807] RunningMeanStd input shape: (1,)
+ [2025-09-02 10:01:31,036][02807] ConvEncoder: input_channels=3
+ [2025-09-02 10:01:31,074][02807] Conv encoder output size: 512
+ [2025-09-02 10:01:31,076][02807] Policy head output size: 512
+ [2025-09-02 10:01:31,092][02807] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000005_20480.pth...
+ [2025-09-02 10:01:31,614][02807] Num frames 100...
+ [2025-09-02 10:01:31,834][02807] Num frames 200...
+ [2025-09-02 10:01:32,036][02807] Num frames 300...
+ [2025-09-02 10:01:32,261][02807] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+ [2025-09-02 10:01:32,263][02807] Avg episode reward: 3.840, avg true_objective: 3.840
+ [2025-09-02 10:01:32,297][02807] Num frames 400...
+ [2025-09-02 10:01:32,497][02807] Num frames 500...
+ [2025-09-02 10:01:32,706][02807] Num frames 600...
+ [2025-09-02 10:01:32,905][02807] Num frames 700...
+ [2025-09-02 10:01:33,096][02807] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+ [2025-09-02 10:01:33,098][02807] Avg episode reward: 3.840, avg true_objective: 3.840
+ [2025-09-02 10:01:33,167][02807] Num frames 800...
+ [2025-09-02 10:01:33,364][02807] Num frames 900...
+ [2025-09-02 10:01:33,564][02807] Num frames 1000...
+ [2025-09-02 10:01:33,770][02807] Num frames 1100...
+ [2025-09-02 10:01:33,928][02807] Avg episode rewards: #0: 3.840, true rewards: #0: 3.840
+ [2025-09-02 10:01:33,929][02807] Avg episode reward: 3.840, avg true_objective: 3.840
+ [2025-09-02 10:01:34,029][02807] Num frames 1200...
+ [2025-09-02 10:01:34,236][02807] Num frames 1300...
+ [2025-09-02 10:01:34,435][02807] Num frames 1400...
+ [2025-09-02 10:01:34,633][02807] Num frames 1500...
+ [2025-09-02 10:01:34,839][02807] Avg episode rewards: #0: 4.170, true rewards: #0: 3.920
+ [2025-09-02 10:01:34,840][02807] Avg episode reward: 4.170, avg true_objective: 3.920
+ [2025-09-02 10:01:34,907][02807] Num frames 1600...
+ [2025-09-02 10:01:35,105][02807] Num frames 1700...
+ [2025-09-02 10:01:35,307][02807] Num frames 1800...
+ [2025-09-02 10:01:35,504][02807] Avg episode rewards: #0: 3.942, true rewards: #0: 3.742
+ [2025-09-02 10:01:35,505][02807] Avg episode reward: 3.942, avg true_objective: 3.742
+ [2025-09-02 10:01:35,567][02807] Num frames 1900...
+ [2025-09-02 10:01:35,768][02807] Num frames 2000...
+ [2025-09-02 10:01:35,985][02807] Num frames 2100...
+ [2025-09-02 10:01:36,193][02807] Num frames 2200...
+ [2025-09-02 10:01:36,359][02807] Avg episode rewards: #0: 3.925, true rewards: #0: 3.758
+ [2025-09-02 10:01:36,360][02807] Avg episode reward: 3.925, avg true_objective: 3.758
+ [2025-09-02 10:01:36,448][02807] Num frames 2300...
+ [2025-09-02 10:01:36,645][02807] Num frames 2400...
+ [2025-09-02 10:01:36,862][02807] Num frames 2500...
+ [2025-09-02 10:01:37,148][02807] Num frames 2600...
+ [2025-09-02 10:01:37,319][02807] Avg episode rewards: #0: 3.913, true rewards: #0: 3.770
+ [2025-09-02 10:01:37,322][02807] Avg episode reward: 3.913, avg true_objective: 3.770
+ [2025-09-02 10:01:37,489][02807] Num frames 2700...
+ [2025-09-02 10:01:37,755][02807] Num frames 2800...
+ [2025-09-02 10:01:38,039][02807] Num frames 2900...
+ [2025-09-02 10:01:38,320][02807] Num frames 3000...
+ [2025-09-02 10:01:38,442][02807] Avg episode rewards: #0: 3.904, true rewards: #0: 3.779
+ [2025-09-02 10:01:38,449][02807] Avg episode reward: 3.904, avg true_objective: 3.779
+ [2025-09-02 10:01:38,673][02807] Num frames 3100...
+ [2025-09-02 10:01:38,964][02807] Num frames 3200...
+ [2025-09-02 10:01:39,274][02807] Num frames 3300...
+ [2025-09-02 10:01:39,555][02807] Num frames 3400...
+ [2025-09-02 10:01:39,753][02807] Num frames 3500...
+ [2025-09-02 10:01:39,943][02807] Avg episode rewards: #0: 4.297, true rewards: #0: 3.963
+ [2025-09-02 10:01:39,944][02807] Avg episode reward: 4.297, avg true_objective: 3.963
+ [2025-09-02 10:01:40,012][02807] Num frames 3600...
+ [2025-09-02 10:01:40,230][02807] Num frames 3700...
+ [2025-09-02 10:01:40,434][02807] Num frames 3800...
+ [2025-09-02 10:01:40,638][02807] Num frames 3900...
+ [2025-09-02 10:01:40,798][02807] Avg episode rewards: #0: 4.251, true rewards: #0: 3.951
+ [2025-09-02 10:01:40,799][02807] Avg episode reward: 4.251, avg true_objective: 3.951
+ [2025-09-02 10:02:06,456][02807] Replay video saved to /content/train_dir/default_experiment/replay.mp4!