[2025-07-29 10:53:09,328][05283] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-07-29 10:53:09,330][05283] Rollout worker 0 uses device cpu
[2025-07-29 10:53:09,330][05283] Rollout worker 1 uses device cpu
[2025-07-29 10:53:09,331][05283] Rollout worker 2 uses device cpu
[2025-07-29 10:53:09,332][05283] Rollout worker 3 uses device cpu
[2025-07-29 10:53:09,332][05283] Rollout worker 4 uses device cpu
[2025-07-29 10:53:09,334][05283] Rollout worker 5 uses device cpu
[2025-07-29 10:53:09,334][05283] Rollout worker 6 uses device cpu
[2025-07-29 10:53:09,335][05283] Rollout worker 7 uses device cpu
[2025-07-29 10:53:09,429][05283] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 10:53:09,430][05283] InferenceWorker_p0-w0: min num requests: 2
[2025-07-29 10:53:09,459][05283] Starting all processes...
[2025-07-29 10:53:09,460][05283] Starting process learner_proc0
[2025-07-29 10:53:09,462][05283] EvtLoop [Runner_EvtLoop, process=main process 5283] unhandled exception in slot='_on_start' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2025-07-29 10:53:09,467][05283] Unhandled exception cannot pickle 'TLSBuffer' object in evt loop Runner_EvtLoop
[2025-07-29 10:53:09,468][05283] Uncaught exception in Runner evt loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner.py", line 770, in run
    evt_loop_status = self.event_loop.exec()
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 403, in exec
    raise exc
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 399, in exec
    while self._loop_iteration():
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 383, in _loop_iteration
    self._process_signal(s)
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 358, in _process_signal
    raise exc
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2025-07-29 10:53:09,470][05283] Runner profile tree view:
main_loop: 0.0113
[2025-07-29 10:53:09,471][05283] Collected {}, FPS: 0.0
[2025-07-29 10:53:31,275][05283] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 10:53:31,275][05283] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 10:53:31,276][05283] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 10:53:31,277][05283] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 10:53:31,277][05283] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:53:31,278][05283] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 10:53:31,279][05283] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:53:31,280][05283] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 10:53:31,280][05283] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 10:53:31,281][05283] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 10:53:31,281][05283] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 10:53:31,282][05283] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 10:53:31,282][05283] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 10:53:31,283][05283] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 10:53:31,284][05283] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 10:53:31,311][05283] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:53:31,313][05283] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 10:53:31,316][05283] RunningMeanStd input shape: (1,)
[2025-07-29 10:53:31,329][05283] ConvEncoder: input_channels=3
[2025-07-29 10:53:31,591][05283] Conv encoder output size: 512
[2025-07-29 10:53:31,592][05283] Policy head output size: 512
[2025-07-29 10:53:31,915][05283] No checkpoints found
[2025-07-29 10:53:45,412][05283] Environment doom_basic already registered, overwriting...
[2025-07-29 10:53:45,413][05283] Environment doom_two_colors_easy already registered, overwriting...
[2025-07-29 10:53:45,414][05283] Environment doom_two_colors_hard already registered, overwriting...
[2025-07-29 10:53:45,415][05283] Environment doom_dm already registered, overwriting...
[2025-07-29 10:53:45,415][05283] Environment doom_dwango5 already registered, overwriting...
[2025-07-29 10:53:45,416][05283] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-07-29 10:53:45,416][05283] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-07-29 10:53:45,417][05283] Environment doom_my_way_home already registered, overwriting...
[2025-07-29 10:53:45,417][05283] Environment doom_deadly_corridor already registered, overwriting...
[2025-07-29 10:53:45,418][05283] Environment doom_defend_the_center already registered, overwriting...
[2025-07-29 10:53:45,419][05283] Environment doom_defend_the_line already registered, overwriting...
[2025-07-29 10:53:45,419][05283] Environment doom_health_gathering already registered, overwriting...
[2025-07-29 10:53:45,420][05283] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-07-29 10:53:45,420][05283] Environment doom_battle already registered, overwriting...
[2025-07-29 10:53:45,421][05283] Environment doom_battle2 already registered, overwriting...
[2025-07-29 10:53:45,422][05283] Environment doom_duel_bots already registered, overwriting...
[2025-07-29 10:53:45,422][05283] Environment doom_deathmatch_bots already registered, overwriting...
[2025-07-29 10:53:45,423][05283] Environment doom_duel already registered, overwriting...
[2025-07-29 10:53:45,423][05283] Environment doom_deathmatch_full already registered, overwriting...
[2025-07-29 10:53:45,424][05283] Environment doom_benchmark already registered, overwriting...
[2025-07-29 10:53:45,425][05283] register_encoder_factory: <function make_vizdoom_encoder at 0x7bcb34d2b1a0>
[2025-07-29 10:53:45,434][05283] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 10:53:45,438][05283] Experiment dir /content/train_dir/default_experiment already exists!
[2025-07-29 10:53:45,439][05283] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-07-29 10:53:45,439][05283] Weights and Biases integration disabled
[2025-07-29 10:53:45,441][05283] Environment var CUDA_VISIBLE_DEVICES is 0

[2025-07-29 10:53:47,725][05283] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-07-29 10:53:47,726][05283] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-07-29 10:53:47,728][05283] Rollout worker 0 uses device cpu
[2025-07-29 10:53:47,728][05283] Rollout worker 1 uses device cpu
[2025-07-29 10:53:47,729][05283] Rollout worker 2 uses device cpu
[2025-07-29 10:53:47,730][05283] Rollout worker 3 uses device cpu
[2025-07-29 10:53:47,730][05283] Rollout worker 4 uses device cpu
[2025-07-29 10:53:47,731][05283] Rollout worker 5 uses device cpu
[2025-07-29 10:53:47,732][05283] Rollout worker 6 uses device cpu
[2025-07-29 10:53:47,733][05283] Rollout worker 7 uses device cpu
[2025-07-29 10:53:47,768][05283] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 10:53:47,769][05283] InferenceWorker_p0-w0: min num requests: 2
[2025-07-29 10:53:47,797][05283] Starting all processes...
[2025-07-29 10:53:47,797][05283] Starting process learner_proc0
[2025-07-29 10:53:47,800][05283] EvtLoop [Runner_EvtLoop, process=main process 5283] unhandled exception in slot='_on_start' connected to emitter=Emitter(object_id='Runner_EvtLoop', signal_name='start'), args=()
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2025-07-29 10:53:47,801][05283] Unhandled exception cannot pickle 'TLSBuffer' object in evt loop Runner_EvtLoop
[2025-07-29 10:53:47,801][05283] Uncaught exception in Runner evt loop
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner.py", line 770, in run
    evt_loop_status = self.event_loop.exec()
                      ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 403, in exec
    raise exc
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 399, in exec
    while self._loop_iteration():
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 383, in _loop_iteration
    self._process_signal(s)
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 358, in _process_signal
    raise exc
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 355, in _process_signal
    slot_callable(*args)
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 49, in _on_start
    self._start_processes()
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/runners/runner_parallel.py", line 56, in _start_processes
    p.start()
  File "/usr/local/lib/python3.11/dist-packages/signal_slot/signal_slot.py", line 515, in start
    self._process.start()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
                  ^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 288, in _Popen
    return Popen(process_obj)
           ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.11/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/usr/lib/python3.11/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
TypeError: cannot pickle 'TLSBuffer' object
[2025-07-29 10:53:47,803][05283] Runner profile tree view:
main_loop: 0.0063
[2025-07-29 10:53:47,804][05283] Collected {}, FPS: 0.0
[2025-07-29 10:55:39,589][08356] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-07-29 10:55:39,591][08356] Rollout worker 0 uses device cpu
[2025-07-29 10:55:39,592][08356] Rollout worker 1 uses device cpu
[2025-07-29 10:55:39,593][08356] Rollout worker 2 uses device cpu
[2025-07-29 10:55:39,594][08356] Rollout worker 3 uses device cpu
[2025-07-29 10:55:39,594][08356] Rollout worker 4 uses device cpu
[2025-07-29 10:55:39,595][08356] Rollout worker 5 uses device cpu
[2025-07-29 10:55:39,596][08356] Rollout worker 6 uses device cpu
[2025-07-29 10:55:39,597][08356] Rollout worker 7 uses device cpu
[2025-07-29 10:55:39,635][08356] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 10:55:39,636][08356] InferenceWorker_p0-w0: min num requests: 2
[2025-07-29 10:55:39,669][08356] Starting all processes...
[2025-07-29 10:55:39,669][08356] Starting process learner_proc0
[2025-07-29 10:55:39,723][08356] Starting all processes...
[2025-07-29 10:55:39,727][08356] Starting process inference_proc0-0
[2025-07-29 10:55:39,728][08356] Starting process rollout_proc0
[2025-07-29 10:55:39,728][08356] Starting process rollout_proc1
[2025-07-29 10:55:39,728][08356] Starting process rollout_proc2
[2025-07-29 10:55:39,730][08356] Starting process rollout_proc3
[2025-07-29 10:55:39,730][08356] Starting process rollout_proc4
[2025-07-29 10:55:39,733][08356] Starting process rollout_proc5
[2025-07-29 10:55:39,734][08356] Starting process rollout_proc6
[2025-07-29 10:55:39,734][08356] Starting process rollout_proc7
[2025-07-29 10:55:42,283][08564] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:42,307][08550] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 10:55:42,308][08550] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-07-29 10:55:42,323][08550] Num visible devices: 1
[2025-07-29 10:55:42,324][08550] Starting seed is not provided
[2025-07-29 10:55:42,324][08550] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 10:55:42,324][08550] Initializing actor-critic model on device cuda:0
[2025-07-29 10:55:42,324][08550] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 10:55:42,326][08550] RunningMeanStd input shape: (1,)
[2025-07-29 10:55:42,337][08550] ConvEncoder: input_channels=3
[2025-07-29 10:55:42,443][08550] Conv encoder output size: 512
[2025-07-29 10:55:42,443][08550] Policy head output size: 512
[2025-07-29 10:55:42,459][08550] Created Actor Critic model with architecture:
[2025-07-29 10:55:42,459][08550] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-07-29 10:55:42,613][08550] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-07-29 10:55:42,814][08565] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:42,838][08566] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:42,840][08569] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:42,864][08568] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:42,975][08563] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 10:55:42,975][08563] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-07-29 10:55:42,991][08563] Num visible devices: 1
[2025-07-29 10:55:43,006][08567] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:43,024][08570] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:43,065][08571] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 10:55:47,009][08550] No checkpoints found
[2025-07-29 10:55:47,009][08550] Did not load from checkpoint, starting from scratch!
[2025-07-29 10:55:47,009][08550] Initialized policy 0 weights for model version 0
[2025-07-29 10:55:47,011][08550] LearnerWorker_p0 finished initialization!
[2025-07-29 10:55:47,011][08550] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 10:55:47,121][08563] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 10:55:47,123][08563] RunningMeanStd input shape: (1,)
[2025-07-29 10:55:47,135][08563] ConvEncoder: input_channels=3
[2025-07-29 10:55:47,238][08563] Conv encoder output size: 512
[2025-07-29 10:55:47,238][08563] Policy head output size: 512
[2025-07-29 10:55:47,271][08356] Inference worker 0-0 is ready!
[2025-07-29 10:55:47,272][08356] All inference workers are ready! Signal rollout workers to start!
[2025-07-29 10:55:47,323][08568] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,323][08569] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,323][08566] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,323][08564] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,323][08567] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,324][08570] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,324][08571] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,324][08565] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:55:47,662][08567] Decorrelating experience for 0 frames...
[2025-07-29 10:55:47,662][08568] Decorrelating experience for 0 frames...
[2025-07-29 10:55:47,662][08566] Decorrelating experience for 0 frames...
[2025-07-29 10:55:47,662][08570] Decorrelating experience for 0 frames...
[2025-07-29 10:55:47,662][08564] Decorrelating experience for 0 frames...
[2025-07-29 10:55:47,915][08570] Decorrelating experience for 32 frames...
[2025-07-29 10:55:47,918][08567] Decorrelating experience for 32 frames...
[2025-07-29 10:55:47,919][08566] Decorrelating experience for 32 frames...
[2025-07-29 10:55:47,931][08565] Decorrelating experience for 0 frames...
[2025-07-29 10:55:47,932][08569] Decorrelating experience for 0 frames...
[2025-07-29 10:55:48,037][08564] Decorrelating experience for 32 frames...
[2025-07-29 10:55:48,055][08571] Decorrelating experience for 0 frames...
[2025-07-29 10:55:48,212][08568] Decorrelating experience for 32 frames...
[2025-07-29 10:55:48,212][08565] Decorrelating experience for 32 frames...
[2025-07-29 10:55:48,215][08569] Decorrelating experience for 32 frames...
[2025-07-29 10:55:48,244][08567] Decorrelating experience for 64 frames...
[2025-07-29 10:55:48,279][08570] Decorrelating experience for 64 frames...
[2025-07-29 10:55:48,305][08571] Decorrelating experience for 32 frames...
[2025-07-29 10:55:48,524][08566] Decorrelating experience for 64 frames...
[2025-07-29 10:55:48,564][08568] Decorrelating experience for 64 frames...
[2025-07-29 10:55:48,572][08564] Decorrelating experience for 64 frames...
[2025-07-29 10:55:48,798][08566] Decorrelating experience for 96 frames...
[2025-07-29 10:55:48,803][08567] Decorrelating experience for 96 frames...
[2025-07-29 10:55:48,840][08570] Decorrelating experience for 96 frames...
[2025-07-29 10:55:48,842][08568] Decorrelating experience for 96 frames...
[2025-07-29 10:55:49,076][08564] Decorrelating experience for 96 frames...
[2025-07-29 10:55:49,148][08571] Decorrelating experience for 64 frames...
[2025-07-29 10:55:49,357][08569] Decorrelating experience for 64 frames...
[2025-07-29 10:55:49,431][08571] Decorrelating experience for 96 frames...
[2025-07-29 10:55:49,642][08569] Decorrelating experience for 96 frames...
[2025-07-29 10:55:49,648][08565] Decorrelating experience for 64 frames...
[2025-07-29 10:55:50,005][08565] Decorrelating experience for 96 frames...
[2025-07-29 10:55:50,274][08356] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-07-29 10:55:50,275][08356] Avg episode reward: [(0, '1.797')]
[2025-07-29 10:55:50,491][08550] Signal inference workers to stop experience collection...
[2025-07-29 10:55:50,511][08563] InferenceWorker_p0-w0: stopping experience collection
[2025-07-29 10:55:51,622][08550] Signal inference workers to resume experience collection...
[2025-07-29 10:55:51,623][08563] InferenceWorker_p0-w0: resuming experience collection
[2025-07-29 10:55:53,372][08563] Updated weights for policy 0, policy_version 10 (0.0089)
[2025-07-29 10:55:55,274][08356] Fps is (10 sec: 15564.7, 60 sec: 15564.7, 300 sec: 15564.7). Total num frames: 77824. Throughput: 0: 2153.2. Samples: 10766. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 10:55:55,275][08356] Avg episode reward: [(0, '4.516')]
[2025-07-29 10:55:55,404][08563] Updated weights for policy 0, policy_version 20 (0.0012)
[2025-07-29 10:55:57,391][08563] Updated weights for policy 0, policy_version 30 (0.0012)
[2025-07-29 10:55:59,416][08563] Updated weights for policy 0, policy_version 40 (0.0011)
[2025-07-29 10:55:59,626][08356] Heartbeat connected on Batcher_0
[2025-07-29 10:55:59,629][08356] Heartbeat connected on LearnerWorker_p0
[2025-07-29 10:55:59,638][08356] Heartbeat connected on InferenceWorker_p0-w0
[2025-07-29 10:55:59,646][08356] Heartbeat connected on RolloutWorker_w0
[2025-07-29 10:55:59,648][08356] Heartbeat connected on RolloutWorker_w1
[2025-07-29 10:55:59,652][08356] Heartbeat connected on RolloutWorker_w2
[2025-07-29 10:55:59,658][08356] Heartbeat connected on RolloutWorker_w4
[2025-07-29 10:55:59,661][08356] Heartbeat connected on RolloutWorker_w3
[2025-07-29 10:55:59,664][08356] Heartbeat connected on RolloutWorker_w5
[2025-07-29 10:55:59,665][08356] Heartbeat connected on RolloutWorker_w6
[2025-07-29 10:55:59,674][08356] Heartbeat connected on RolloutWorker_w7
[2025-07-29 10:56:00,274][08356] Fps is (10 sec: 18022.5, 60 sec: 18022.5, 300 sec: 18022.5). Total num frames: 180224. Throughput: 0: 4124.6. Samples: 41246. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-07-29 10:56:00,275][08356] Avg episode reward: [(0, '4.550')]
[2025-07-29 10:56:00,276][08550] Saving new best policy, reward=4.550!
[2025-07-29 10:56:01,506][08563] Updated weights for policy 0, policy_version 50 (0.0011)
[2025-07-29 10:56:03,634][08563] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-07-29 10:56:05,274][08356] Fps is (10 sec: 20070.3, 60 sec: 18568.5, 300 sec: 18568.5). Total num frames: 278528. Throughput: 0: 3729.1. Samples: 55936. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 10:56:05,275][08356] Avg episode reward: [(0, '4.571')]
[2025-07-29 10:56:05,280][08550] Saving new best policy, reward=4.571!
[2025-07-29 10:56:05,659][08563] Updated weights for policy 0, policy_version 70 (0.0011)
[2025-07-29 10:56:07,647][08563] Updated weights for policy 0, policy_version 80 (0.0012)
[2025-07-29 10:56:09,674][08563] Updated weights for policy 0, policy_version 90 (0.0011)
[2025-07-29 10:56:10,274][08356] Fps is (10 sec: 20070.4, 60 sec: 19046.4, 300 sec: 19046.4). Total num frames: 380928. Throughput: 0: 4303.7. Samples: 86074. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 10:56:10,275][08356] Avg episode reward: [(0, '4.404')]
[2025-07-29 10:56:11,679][08563] Updated weights for policy 0, policy_version 100 (0.0011)
[2025-07-29 10:56:13,663][08563] Updated weights for policy 0, policy_version 110 (0.0011)
[2025-07-29 10:56:15,274][08356] Fps is (10 sec: 20070.6, 60 sec: 19169.3, 300 sec: 19169.3). Total num frames: 479232. Throughput: 0: 4664.7. Samples: 116618. Policy #0 lag: (min: 0.0, avg: 1.0, max: 2.0)
[2025-07-29 10:56:15,275][08356] Avg episode reward: [(0, '4.467')]
[2025-07-29 10:56:15,750][08563] Updated weights for policy 0, policy_version 120 (0.0012)
[2025-07-29 10:56:17,778][08563] Updated weights for policy 0, policy_version 130 (0.0011)
[2025-07-29 10:56:19,789][08563] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-07-29 10:56:20,274][08356] Fps is (10 sec: 20070.5, 60 sec: 19387.8, 300 sec: 19387.8). Total num frames: 581632. Throughput: 0: 4379.7. Samples: 131390. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 10:56:20,275][08356] Avg episode reward: [(0, '4.520')]
[2025-07-29 10:56:21,795][08563] Updated weights for policy 0, policy_version 150 (0.0011)
[2025-07-29 10:56:23,802][08563] Updated weights for policy 0, policy_version 160 (0.0011)
[2025-07-29 10:56:25,274][08356] Fps is (10 sec: 20480.0, 60 sec: 19543.8, 300 sec: 19543.8). Total num frames: 684032. Throughput: 0: 4629.0. Samples: 162016. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 10:56:25,275][08356] Avg episode reward: [(0, '4.459')]
[2025-07-29 10:56:25,804][08563] Updated weights for policy 0, policy_version 170 (0.0011)
[2025-07-29 10:56:27,847][08563] Updated weights for policy 0, policy_version 180 (0.0011)
[2025-07-29 10:56:29,893][08563] Updated weights for policy 0, policy_version 190 (0.0011)
[2025-07-29 10:56:30,274][08356] Fps is (10 sec: 20070.4, 60 sec: 19558.4, 300 sec: 19558.4). Total num frames: 782336. Throughput: 0: 4804.8. Samples: 192190. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 10:56:30,275][08356] Avg episode reward: [(0, '4.757')]
[2025-07-29 10:56:30,293][08550] Saving new best policy, reward=4.757!
[2025-07-29 10:56:31,910][08563] Updated weights for policy 0, policy_version 200 (0.0012)
[2025-07-29 10:56:33,906][08563] Updated weights for policy 0, policy_version 210 (0.0011)
[2025-07-29 10:56:35,274][08356] Fps is (10 sec: 20070.5, 60 sec: 19660.8, 300 sec: 19660.8). Total num frames: 884736. Throughput: 0: 4610.9. Samples: 207490. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:56:35,275][08356] Avg episode reward: [(0, '4.508')]
[2025-07-29 10:56:35,931][08563] Updated weights for policy 0, policy_version 220 (0.0011)
[2025-07-29 10:56:37,943][08563] Updated weights for policy 0, policy_version 230 (0.0011)
[2025-07-29 10:56:39,991][08563] Updated weights for policy 0, policy_version 240 (0.0011)
[2025-07-29 10:56:40,274][08356] Fps is (10 sec: 20480.0, 60 sec: 19742.8, 300 sec: 19742.8). Total num frames: 987136. Throughput: 0: 5051.0. Samples: 238062. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 10:56:40,275][08356] Avg episode reward: [(0, '4.751')]
[2025-07-29 10:56:42,103][08563] Updated weights for policy 0, policy_version 250 (0.0012)
[2025-07-29 10:56:44,094][08563] Updated weights for policy 0, policy_version 260 (0.0012)
[2025-07-29 10:56:45,273][08356] Fps is (10 sec: 20070.5, 60 sec: 19735.3, 300 sec: 19735.3). Total num frames: 1085440. Throughput: 0: 5037.5. Samples: 267934. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:56:45,275][08356] Avg episode reward: [(0, '4.370')]
[2025-07-29 10:56:46,099][08563] Updated weights for policy 0, policy_version 270 (0.0011)
[2025-07-29 10:56:48,100][08563] Updated weights for policy 0, policy_version 280 (0.0011)
[2025-07-29 10:56:50,133][08563] Updated weights for policy 0, policy_version 290 (0.0011)
[2025-07-29 10:56:50,274][08356] Fps is (10 sec: 20070.3, 60 sec: 19797.3, 300 sec: 19797.3). Total num frames: 1187840. Throughput: 0: 5052.9. Samples: 283316. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:56:50,275][08356] Avg episode reward: [(0, '4.721')]
[2025-07-29 10:56:52,161][08563] Updated weights for policy 0, policy_version 300 (0.0011)
[2025-07-29 10:56:54,240][08563] Updated weights for policy 0, policy_version 310 (0.0012)
[2025-07-29 10:56:55,274][08356] Fps is (10 sec: 20479.8, 60 sec: 20206.9, 300 sec: 19849.9). Total num frames: 1290240. Throughput: 0: 5056.0. Samples: 313594. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 10:56:55,275][08356] Avg episode reward: [(0, '4.902')]
[2025-07-29 10:56:55,280][08550] Saving new best policy, reward=4.902!
[2025-07-29 10:56:56,282][08563] Updated weights for policy 0, policy_version 320 (0.0011)
[2025-07-29 10:56:58,277][08563] Updated weights for policy 0, policy_version 330 (0.0011)
[2025-07-29 10:57:00,274][08356] Fps is (10 sec: 20070.6, 60 sec: 20138.7, 300 sec: 19836.4). Total num frames: 1388544. Throughput: 0: 5048.8. Samples: 343812. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-07-29 10:57:00,275][08356] Avg episode reward: [(0, '5.191')]
[2025-07-29 10:57:00,284][08550] Saving new best policy, reward=5.191!
[2025-07-29 10:57:00,287][08563] Updated weights for policy 0, policy_version 340 (0.0012)
[2025-07-29 10:57:02,313][08563] Updated weights for policy 0, policy_version 350 (0.0011)
[2025-07-29 10:57:04,335][08563] Updated weights for policy 0, policy_version 360 (0.0012)
[2025-07-29 10:57:05,274][08356] Fps is (10 sec: 20070.5, 60 sec: 20207.0, 300 sec: 19879.3). Total num frames: 1490944. Throughput: 0: 5060.0. Samples: 359092. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 10:57:05,275][08356] Avg episode reward: [(0, '5.012')]
[2025-07-29 10:57:06,453][08563] Updated weights for policy 0, policy_version 370 (0.0012)
[2025-07-29 10:57:08,514][08563] Updated weights for policy 0, policy_version 380 (0.0011)
[2025-07-29 10:57:10,274][08356] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 19865.6). Total num frames: 1589248. Throughput: 0: 5038.0. Samples: 388726. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:57:10,275][08356] Avg episode reward: [(0, '5.173')]
[2025-07-29 10:57:10,534][08563] Updated weights for policy 0, policy_version 390 (0.0011)
[2025-07-29 10:57:12,578][08563] Updated weights for policy 0, policy_version 400 (0.0011)
[2025-07-29 10:57:14,609][08563] Updated weights for policy 0, policy_version 410 (0.0011)
[2025-07-29 10:57:15,274][08356] Fps is (10 sec: 20070.5, 60 sec: 20207.0, 300 sec: 19901.8). Total num frames: 1691648. Throughput: 0: 5038.2. Samples: 418908. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:57:15,275][08356] Avg episode reward: [(0, '5.349')]
[2025-07-29 10:57:15,280][08550] Saving new best policy, reward=5.349!
[2025-07-29 10:57:16,630][08563] Updated weights for policy 0, policy_version 420 (0.0011)
[2025-07-29 10:57:18,679][08563] Updated weights for policy 0, policy_version 430 (0.0011)
[2025-07-29 10:57:20,274][08356] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19888.4). Total num frames: 1789952. Throughput: 0: 5034.3. Samples: 434034. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 10:57:20,275][08356] Avg episode reward: [(0, '6.162')]
[2025-07-29 10:57:20,276][08550] Saving new best policy, reward=6.162!
[2025-07-29 10:57:20,752][08563] Updated weights for policy 0, policy_version 440 (0.0012)
[2025-07-29 10:57:22,752][08563] Updated weights for policy 0, policy_version 450 (0.0012)
[2025-07-29 10:57:24,748][08563] Updated weights for policy 0, policy_version 460 (0.0011)
[2025-07-29 10:57:25,274][08356] Fps is (10 sec: 20070.2, 60 sec: 20138.6, 300 sec: 19919.5). Total num frames: 1892352. Throughput: 0: 5024.7. Samples: 464174. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 10:57:25,275][08356] Avg episode reward: [(0, '6.499')]
[2025-07-29 10:57:25,280][08550] Saving new best policy, reward=6.499!
[2025-07-29 10:57:26,729][08563] Updated weights for policy 0, policy_version 470 (0.0012)
[2025-07-29 10:57:28,736][08563] Updated weights for policy 0, policy_version 480 (0.0012)
[2025-07-29 10:57:30,274][08356] Fps is (10 sec: 20479.2, 60 sec: 20206.8, 300 sec: 19947.5). Total num frames: 1994752. Throughput: 0: 5044.4. Samples: 494934. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-07-29 10:57:30,275][08356] Avg episode reward: [(0, '6.789')]
[2025-07-29 10:57:30,277][08550] Saving new best policy, reward=6.789!
[2025-07-29 10:57:30,726][08563] Updated weights for policy 0, policy_version 490 (0.0011)
[2025-07-29 10:57:32,843][08563] Updated weights for policy 0, policy_version 500 (0.0012)
[2025-07-29 10:57:34,824][08563] Updated weights for policy 0, policy_version 510 (0.0012)
[2025-07-29 10:57:35,274][08356] Fps is (10 sec: 20480.2, 60 sec: 20206.9, 300 sec: 19972.9). Total num frames: 2097152. Throughput: 0: 5032.1. Samples: 509760. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-07-29 10:57:35,275][08356] Avg episode reward: [(0, '6.559')]
[2025-07-29 10:57:35,281][08550] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000512_2097152.pth...
[2025-07-29 10:57:36,832][08563] Updated weights for policy 0, policy_version 520 (0.0012)
[2025-07-29 10:57:38,851][08563] Updated weights for policy 0, policy_version 530 (0.0012)
[2025-07-29 10:57:40,274][08356] Fps is (10 sec: 20480.9, 60 sec: 20206.9, 300 sec: 19995.9). Total num frames: 2199552. Throughput: 0: 5039.7. Samples: 540382. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-07-29 10:57:40,275][08356] Avg episode reward: [(0, '6.527')]
[2025-07-29 10:57:40,873][08563] Updated weights for policy 0, policy_version 540 (0.0011)
[2025-07-29 10:57:42,855][08563] Updated weights for policy 0, policy_version 550 (0.0012)
[2025-07-29 10:57:44,901][08563] Updated weights for policy 0, policy_version 560 (0.0011)
[2025-07-29 10:57:45,274][08356] Fps is (10 sec: 20070.2, 60 sec: 20206.9, 300 sec: 19981.4). Total num frames: 2297856. Throughput: 0: 5047.5. Samples: 570952. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 10:57:45,275][08356] Avg episode reward: [(0, '8.490')]
[2025-07-29 10:57:45,280][08550] Saving new best policy, reward=8.490!
[2025-07-29 10:57:46,945][08563] Updated weights for policy 0, policy_version 570 (0.0011)
[2025-07-29 10:57:48,931][08563] Updated weights for policy 0, policy_version 580 (0.0012)
[2025-07-29 10:57:50,274][08356] Fps is (10 sec: 20070.3, 60 sec: 20206.9, 300 sec: 20002.1). Total num frames: 2400256. Throughput: 0: 5041.8. Samples: 585974. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-07-29 10:57:50,275][08356] Avg episode reward: [(0, '8.160')]
[2025-07-29 10:57:50,924][08563] Updated weights for policy 0, policy_version 590 (0.0011)
[2025-07-29 10:57:52,900][08563] Updated weights for policy 0, policy_version 600 (0.0012)
[2025-07-29 10:57:54,902][08563] Updated weights for policy 0, policy_version 610 (0.0011)
[2025-07-29 10:57:55,274][08356] Fps is (10 sec: 20480.2, 60 sec: 20207.0, 300 sec: 20021.3). Total num frames: 2502656. Throughput: 0: 5068.9. Samples: 616826. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-07-29 10:57:55,275][08356] Avg episode reward: [(0, '8.444')]
[2025-07-29 10:57:56,949][08563] Updated weights for policy 0, policy_version 620 (0.0012)
[2025-07-29 10:57:59,016][08563] Updated weights for policy 0, policy_version 630 (0.0011)
[2025-07-29 10:58:00,274][08356] Fps is (10 sec: 20479.9, 60 sec: 20275.2, 300 sec: 20038.9). Total num frames: 2605056. Throughput: 0: 5073.1. Samples: 647196. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:58:00,275][08356] Avg episode reward: [(0, '10.207')]
[2025-07-29 10:58:00,276][08550] Saving new best policy, reward=10.207!
[2025-07-29 10:58:01,028][08563] Updated weights for policy 0, policy_version 640 (0.0011)
[2025-07-29 10:58:03,022][08563] Updated weights for policy 0, policy_version 650 (0.0011)
[2025-07-29 10:58:04,990][08563] Updated weights for policy 0, policy_version 660 (0.0011)
[2025-07-29 10:58:05,274][08356] Fps is (10 sec: 20479.9, 60 sec: 20275.2, 300 sec: 20055.2). Total num frames: 2707456. Throughput: 0: 5076.4. Samples: 662472. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 10:58:05,275][08356] Avg episode reward: [(0, '11.219')]
[2025-07-29 10:58:05,279][08550] Saving new best policy, reward=11.219!
[2025-07-29 10:58:06,990][08563] Updated weights for policy 0, policy_version 670 (0.0011)
[2025-07-29 10:58:09,008][08563] Updated weights for policy 0, policy_version 680 (0.0012)
[2025-07-29 10:58:10,274][08356] Fps is (10 sec: 20480.0, 60 sec: 20343.5, 300 sec: 20070.4). Total num frames: 2809856. Throughput: 0: 5094.7. Samples: 693436. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-07-29 10:58:10,275][08356] Avg episode reward: [(0, '11.927')]
[2025-07-29 10:58:10,277][08550] Saving new best policy, reward=11.927!
[2025-07-29 10:58:11,089][08563] Updated weights for policy 0, policy_version 690 (0.0012)
[2025-07-29 10:58:13,076][08563] Updated weights for policy 0, policy_version 700 (0.0011)
[2025-07-29 10:58:15,040][08563] Updated weights for policy 0, policy_version 710 (0.0011)
[2025-07-29 10:58:15,274][08356] Fps is (10 sec: 20479.9, 60 sec: 20343.4, 300 sec: 20084.5). Total num frames: 2912256. Throughput: 0: 5086.2. Samples: 723810. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:58:15,275][08356] Avg episode reward: [(0, '13.472')]
[2025-07-29 10:58:15,280][08550] Saving new best policy, reward=13.472!
[2025-07-29 10:58:17,025][08563] Updated weights for policy 0, policy_version 720 (0.0011)
[2025-07-29 10:58:19,021][08563] Updated weights for policy 0, policy_version 730 (0.0012)
[2025-07-29 10:58:20,274][08356] Fps is (10 sec: 20480.1, 60 sec: 20411.7, 300 sec: 20097.7). Total num frames: 3014656. Throughput: 0: 5098.8. Samples: 739208. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:58:20,275][08356] Avg episode reward: [(0, '14.176')]
[2025-07-29 10:58:20,276][08550] Saving new best policy, reward=14.176!
[2025-07-29 10:58:20,985][08563] Updated weights for policy 0, policy_version 740 (0.0011)
[2025-07-29 10:58:23,036][08563] Updated weights for policy 0, policy_version 750 (0.0012)
[2025-07-29 10:58:25,091][08563] Updated weights for policy 0, policy_version 760 (0.0012)
[2025-07-29 10:58:25,274][08356] Fps is (10 sec: 20070.5, 60 sec: 20343.5, 300 sec: 20083.6). Total num frames: 3112960. Throughput: 0: 5097.9. Samples: 769790. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 10:58:25,275][08356] Avg episode reward: [(0, '15.850')]
[2025-07-29 10:58:25,283][08550] Saving new best policy, reward=15.850!
[2025-07-29 10:58:27,069][08563] Updated weights for policy 0, policy_version 770 (0.0011)
[2025-07-29 10:58:29,052][08563] Updated weights for policy 0, policy_version 780 (0.0011)
[2025-07-29 10:58:30,274][08356] Fps is (10 sec: 20480.1, 60 sec: 20411.9, 300 sec: 20121.6). Total num frames: 3219456. Throughput: 0: 5104.4. Samples: 800650. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-07-29 10:58:30,275][08356] Avg episode reward: [(0, '15.208')]
[2025-07-29 10:58:31,009][08563] Updated weights for policy 0, policy_version 790 (0.0011)
[2025-07-29 10:58:32,979][08563] Updated weights for policy 0, policy_version 800 (0.0011)
[2025-07-29 10:58:35,005][08563] Updated weights for policy 0, policy_version 810 (0.0012)
[2025-07-29 10:58:35,274][08356] Fps is (10 sec: 20889.5, 60 sec: 20411.7, 300 sec: 20132.5). Total num frames: 3321856. Throughput: 0: 5118.5. Samples: 816308. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-07-29 10:58:35,276][08356] Avg episode reward: [(0, '17.220')]
[2025-07-29 10:58:35,282][08550] Saving new best policy, reward=17.220!
[2025-07-29 10:58:37,067][08563] Updated weights for policy 0, policy_version 820 (0.0012)
[2025-07-29 10:58:39,042][08563] Updated weights for policy 0, policy_version 830 (0.0012)
[2025-07-29 10:58:40,274][08356] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20142.7). Total num frames: 3424256. Throughput: 0: 5102.4. Samples: 846436. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-07-29 10:58:40,275][08356] Avg episode reward: [(0, '17.562')]
[2025-07-29 10:58:40,276][08550] Saving new best policy, reward=17.562!
[2025-07-29 10:58:41,034][08563] Updated weights for policy 0, policy_version 840 (0.0011)
[2025-07-29 10:58:43,024][08563] Updated weights for policy 0, policy_version 850 (0.0011)
[2025-07-29 10:58:44,978][08563] Updated weights for policy 0, policy_version 860 (0.0011)
[2025-07-29 10:58:45,274][08356] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20152.3). Total num frames: 3526656. Throughput: 0: 5116.0. Samples: 877416. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-07-29 10:58:45,275][08356] Avg episode reward: [(0, '19.748')]
[2025-07-29 10:58:45,281][08550] Saving new best policy, reward=19.748!
[2025-07-29 10:58:46,986][08563] Updated weights for policy 0, policy_version 870 (0.0011)
[2025-07-29 10:58:49,061][08563] Updated weights for policy 0, policy_version 880 (0.0012)
[2025-07-29 10:58:50,274][08356] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20161.4). Total num frames: 3629056. Throughput: 0: 5121.9. Samples: 892956. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 10:58:50,275][08356] Avg episode reward: [(0, '20.681')]
[2025-07-29 10:58:50,276][08550] Saving new best policy, reward=20.681!
[2025-07-29 10:58:51,043][08563] Updated weights for policy 0, policy_version 890 (0.0011)
[2025-07-29 10:58:53,004][08563] Updated weights for policy 0, policy_version 900 (0.0011)
[2025-07-29 10:58:54,984][08563] Updated weights for policy 0, policy_version 910 (0.0011)
[2025-07-29 10:58:55,274][08356] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20170.0). Total num frames: 3731456. Throughput: 0: 5114.8. Samples: 923600. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 10:58:55,275][08356] Avg episode reward: [(0, '22.012')]
[2025-07-29 10:58:55,281][08550] Saving new best policy, reward=22.012!
[2025-07-29 10:58:56,978][08563] Updated weights for policy 0, policy_version 920 (0.0012)
[2025-07-29 10:58:58,950][08563] Updated weights for policy 0, policy_version 930 (0.0011)
[2025-07-29 10:59:00,274][08356] Fps is (10 sec: 20480.0, 60 sec: 20480.0, 300 sec: 20178.2). Total num frames: 3833856. Throughput: 0: 5128.0. Samples: 954570. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 10:59:00,275][08356] Avg episode reward: [(0, '21.860')]
[2025-07-29 10:59:01,008][08563] Updated weights for policy 0, policy_version 940 (0.0011)
[2025-07-29 10:59:03,040][08563] Updated weights for policy 0, policy_version 950 (0.0012)
[2025-07-29 10:59:05,015][08563] Updated weights for policy 0, policy_version 960 (0.0011)
[2025-07-29 10:59:05,274][08356] Fps is (10 sec: 20480.1, 60 sec: 20480.0, 300 sec: 20185.9). Total num frames: 3936256. Throughput: 0: 5116.3. Samples: 969442. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 10:59:05,275][08356] Avg episode reward: [(0, '24.220')]
[2025-07-29 10:59:05,281][08550] Saving new best policy, reward=24.220!
[2025-07-29 10:59:06,977][08563] Updated weights for policy 0, policy_version 970 (0.0011)
[2025-07-29 10:59:08,542][08550] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:08,543][08356] Component Batcher_0 stopped!
[2025-07-29 10:59:08,543][08550] Stopping Batcher_0...
[2025-07-29 10:59:08,548][08550] Loop batcher_evt_loop terminating...
[2025-07-29 10:59:08,565][08563] Weights refcount: 2 0
[2025-07-29 10:59:08,567][08563] Stopping InferenceWorker_p0-w0...
[2025-07-29 10:59:08,567][08563] Loop inference_proc0-0_evt_loop terminating...
[2025-07-29 10:59:08,567][08356] Component InferenceWorker_p0-w0 stopped!
[2025-07-29 10:59:08,584][08569] Stopping RolloutWorker_w5...
[2025-07-29 10:59:08,584][08571] Stopping RolloutWorker_w6...
[2025-07-29 10:59:08,585][08571] Loop rollout_proc6_evt_loop terminating...
[2025-07-29 10:59:08,584][08356] Component RolloutWorker_w5 stopped!
[2025-07-29 10:59:08,585][08569] Loop rollout_proc5_evt_loop terminating...
[2025-07-29 10:59:08,586][08566] Stopping RolloutWorker_w2...
[2025-07-29 10:59:08,586][08568] Stopping RolloutWorker_w4...
[2025-07-29 10:59:08,586][08566] Loop rollout_proc2_evt_loop terminating...
[2025-07-29 10:59:08,587][08568] Loop rollout_proc4_evt_loop terminating...
[2025-07-29 10:59:08,587][08567] Stopping RolloutWorker_w3...
[2025-07-29 10:59:08,586][08356] Component RolloutWorker_w6 stopped!
[2025-07-29 10:59:08,588][08567] Loop rollout_proc3_evt_loop terminating...
[2025-07-29 10:59:08,588][08570] Stopping RolloutWorker_w7...
[2025-07-29 10:59:08,588][08356] Component RolloutWorker_w2 stopped!
[2025-07-29 10:59:08,589][08570] Loop rollout_proc7_evt_loop terminating...
[2025-07-29 10:59:08,589][08564] Stopping RolloutWorker_w0...
[2025-07-29 10:59:08,590][08564] Loop rollout_proc0_evt_loop terminating...
[2025-07-29 10:59:08,589][08356] Component RolloutWorker_w4 stopped!
[2025-07-29 10:59:08,590][08565] Stopping RolloutWorker_w1...
[2025-07-29 10:59:08,591][08565] Loop rollout_proc1_evt_loop terminating...
[2025-07-29 10:59:08,590][08356] Component RolloutWorker_w3 stopped!
[2025-07-29 10:59:08,591][08356] Component RolloutWorker_w7 stopped!
[2025-07-29 10:59:08,592][08356] Component RolloutWorker_w0 stopped!
[2025-07-29 10:59:08,593][08356] Component RolloutWorker_w1 stopped!
[2025-07-29 10:59:08,616][08550] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:08,720][08550] Stopping LearnerWorker_p0...
[2025-07-29 10:59:08,721][08550] Loop learner_proc0_evt_loop terminating...
[2025-07-29 10:59:08,720][08356] Component LearnerWorker_p0 stopped!
[2025-07-29 10:59:08,721][08356] Waiting for process learner_proc0 to stop...
[2025-07-29 10:59:09,675][08356] Waiting for process inference_proc0-0 to join...
[2025-07-29 10:59:09,676][08356] Waiting for process rollout_proc0 to join...
[2025-07-29 10:59:09,677][08356] Waiting for process rollout_proc1 to join...
[2025-07-29 10:59:09,678][08356] Waiting for process rollout_proc2 to join...
[2025-07-29 10:59:09,678][08356] Waiting for process rollout_proc3 to join...
[2025-07-29 10:59:09,679][08356] Waiting for process rollout_proc4 to join...
[2025-07-29 10:59:09,680][08356] Waiting for process rollout_proc5 to join...
[2025-07-29 10:59:09,680][08356] Waiting for process rollout_proc6 to join...
[2025-07-29 10:59:09,681][08356] Waiting for process rollout_proc7 to join...
[2025-07-29 10:59:09,682][08356] Batcher 0 profile tree view:
batching: 12.3062, releasing_batches: 0.0226
[2025-07-29 10:59:09,683][08356] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0001
  wait_policy_total: 3.8979
update_model: 3.2072
  weight_update: 0.0011
one_step: 0.0028
  handle_policy_step: 185.2466
    deserialize: 7.5720, stack: 1.2933, obs_to_device_normalize: 45.3519, forward: 88.9795, send_messages: 12.7549
    prepare_outputs: 22.0606
      to_cpu: 14.0958
[2025-07-29 10:59:09,684][08356] Learner 0 profile tree view:
misc: 0.0036, prepare_batch: 6.5850
train: 18.8969
  epoch_init: 0.0041, minibatch_init: 0.0056, losses_postprocess: 0.3338, kl_divergence: 0.3881, after_optimizer: 1.9327
  calculate_losses: 8.5966
    losses_init: 0.0031, forward_head: 0.6340, bptt_initial: 4.6393, tail: 0.6094, advantages_returns: 0.1555, losses: 1.2006
    bptt: 1.2132
      bptt_forward_core: 1.1620
  update: 7.3269
    clip: 0.7956
[2025-07-29 10:59:09,684][08356] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1235, enqueue_policy_requests: 8.8549, env_step: 127.0001, overhead: 5.4236, complete_rollouts: 0.2088
save_policy_outputs: 7.9631
  split_output_tensors: 3.0557
[2025-07-29 10:59:09,685][08356] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1222, enqueue_policy_requests: 8.8302, env_step: 127.1710, overhead: 5.5206, complete_rollouts: 0.2087
save_policy_outputs: 7.9508
  split_output_tensors: 3.0303
[2025-07-29 10:59:09,686][08356] Loop Runner_EvtLoop terminating...
[2025-07-29 10:59:09,687][08356] Runner profile tree view:
main_loop: 210.0184
[2025-07-29 10:59:09,687][08356] Collected {0: 4005888}, FPS: 19074.0
[2025-07-29 10:59:21,569][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 10:59:21,570][08356] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 10:59:21,570][08356] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 10:59:21,571][08356] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 10:59:21,572][08356] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:59:21,572][08356] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 10:59:21,573][08356] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:59:21,574][08356] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 10:59:21,575][08356] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 10:59:21,575][08356] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 10:59:21,576][08356] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 10:59:21,576][08356] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 10:59:21,577][08356] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 10:59:21,578][08356] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 10:59:21,578][08356] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 10:59:21,608][08356] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 10:59:21,611][08356] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 10:59:21,612][08356] RunningMeanStd input shape: (1,)
[2025-07-29 10:59:21,625][08356] ConvEncoder: input_channels=3
[2025-07-29 10:59:21,733][08356] Conv encoder output size: 512
[2025-07-29 10:59:21,734][08356] Policy head output size: 512
[2025-07-29 10:59:21,918][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:21,920][08356] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:21,922][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:21,923][08356] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:21,924][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:21,925][08356] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:43,523][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 10:59:43,523][08356] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 10:59:43,524][08356] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 10:59:43,525][08356] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 10:59:43,526][08356] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:59:43,527][08356] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 10:59:43,527][08356] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:59:43,528][08356] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 10:59:43,529][08356] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 10:59:43,530][08356] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 10:59:43,531][08356] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 10:59:43,532][08356] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 10:59:43,532][08356] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 10:59:43,533][08356] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 10:59:43,533][08356] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 10:59:43,562][08356] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 10:59:43,563][08356] RunningMeanStd input shape: (1,)
[2025-07-29 10:59:43,572][08356] ConvEncoder: input_channels=3
[2025-07-29 10:59:43,608][08356] Conv encoder output size: 512
[2025-07-29 10:59:43,609][08356] Policy head output size: 512
[2025-07-29 10:59:43,627][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:43,628][08356] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:43,629][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:43,630][08356] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:43,631][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:43,633][08356] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:49,480][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 10:59:49,481][08356] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 10:59:49,481][08356] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 10:59:49,482][08356] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 10:59:49,483][08356] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:59:49,483][08356] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 10:59:49,484][08356] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 10:59:49,485][08356] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 10:59:49,485][08356] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 10:59:49,486][08356] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 10:59:49,487][08356] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 10:59:49,487][08356] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 10:59:49,488][08356] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 10:59:49,488][08356] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 10:59:49,490][08356] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 10:59:49,513][08356] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 10:59:49,514][08356] RunningMeanStd input shape: (1,)
[2025-07-29 10:59:49,523][08356] ConvEncoder: input_channels=3
[2025-07-29 10:59:49,556][08356] Conv encoder output size: 512
[2025-07-29 10:59:49,557][08356] Policy head output size: 512
[2025-07-29 10:59:49,574][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:49,577][08356] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:49,578][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:49,579][08356] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 10:59:49,580][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 10:59:49,581][08356] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:00:31,147][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 11:00:31,148][08356] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 11:00:31,149][08356] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 11:00:31,149][08356] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 11:00:31,150][08356] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:00:31,151][08356] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 11:00:31,151][08356] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:00:31,152][08356] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 11:00:31,152][08356] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 11:00:31,154][08356] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 11:00:31,154][08356] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 11:00:31,155][08356] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 11:00:31,155][08356] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 11:00:31,156][08356] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 11:00:31,157][08356] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 11:00:31,182][08356] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:00:31,183][08356] RunningMeanStd input shape: (1,)
[2025-07-29 11:00:31,192][08356] ConvEncoder: input_channels=3
[2025-07-29 11:00:31,227][08356] Conv encoder output size: 512
[2025-07-29 11:00:31,228][08356] Policy head output size: 512
[2025-07-29 11:00:31,247][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:00:31,249][08356] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:00:31,250][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:00:31,251][08356] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:00:31,251][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:00:31,252][08356] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:00:50,477][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 11:00:50,478][08356] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 11:00:50,478][08356] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 11:00:50,479][08356] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 11:00:50,480][08356] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:00:50,481][08356] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 11:00:50,481][08356] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:00:50,482][08356] Adding new argument 'max_num_episodes'=5 that is not in the saved config file!
[2025-07-29 11:00:50,482][08356] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 11:00:50,483][08356] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 11:00:50,484][08356] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 11:00:50,484][08356] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 11:00:50,485][08356] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 11:00:50,486][08356] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 11:00:50,487][08356] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 11:00:50,512][08356] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:00:50,514][08356] RunningMeanStd input shape: (1,)
[2025-07-29 11:00:50,523][08356] ConvEncoder: input_channels=3
[2025-07-29 11:00:50,558][08356] Conv encoder output size: 512
[2025-07-29 11:00:50,559][08356] Policy head output size: 512
[2025-07-29 11:00:50,579][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:00:50,580][08356] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:00:50,581][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:00:50,582][08356] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:00:50,583][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:00:50,584][08356] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:05:34,005][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 11:05:34,006][08356] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 11:05:34,007][08356] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 11:05:34,008][08356] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 11:05:34,008][08356] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:05:34,009][08356] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 11:05:34,009][08356] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:05:34,010][08356] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 11:05:34,010][08356] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 11:05:34,012][08356] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 11:05:34,012][08356] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 11:05:34,012][08356] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 11:05:34,013][08356] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 11:05:34,014][08356] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 11:05:34,015][08356] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 11:05:34,041][08356] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:05:34,042][08356] RunningMeanStd input shape: (1,)
[2025-07-29 11:05:34,052][08356] ConvEncoder: input_channels=3
[2025-07-29 11:05:34,087][08356] Conv encoder output size: 512
[2025-07-29 11:05:34,088][08356] Policy head output size: 512
[2025-07-29 11:05:34,108][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:05:34,109][08356] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:05:34,110][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:05:34,111][08356] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:05:34,112][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:05:34,113][08356] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:06:40,036][08356] Environment doom_basic already registered, overwriting...
[2025-07-29 11:06:40,037][08356] Environment doom_two_colors_easy already registered, overwriting...
[2025-07-29 11:06:40,037][08356] Environment doom_two_colors_hard already registered, overwriting...
[2025-07-29 11:06:40,038][08356] Environment doom_dm already registered, overwriting...
[2025-07-29 11:06:40,039][08356] Environment doom_dwango5 already registered, overwriting...
[2025-07-29 11:06:40,039][08356] Environment doom_my_way_home_flat_actions already registered, overwriting...
[2025-07-29 11:06:40,040][08356] Environment doom_defend_the_center_flat_actions already registered, overwriting...
[2025-07-29 11:06:40,041][08356] Environment doom_my_way_home already registered, overwriting...
[2025-07-29 11:06:40,041][08356] Environment doom_deadly_corridor already registered, overwriting...
[2025-07-29 11:06:40,042][08356] Environment doom_defend_the_center already registered, overwriting...
[2025-07-29 11:06:40,043][08356] Environment doom_defend_the_line already registered, overwriting...
[2025-07-29 11:06:40,043][08356] Environment doom_health_gathering already registered, overwriting...
[2025-07-29 11:06:40,044][08356] Environment doom_health_gathering_supreme already registered, overwriting...
[2025-07-29 11:06:40,045][08356] Environment doom_battle already registered, overwriting...
[2025-07-29 11:06:40,045][08356] Environment doom_battle2 already registered, overwriting...
[2025-07-29 11:06:40,046][08356] Environment doom_duel_bots already registered, overwriting...
[2025-07-29 11:06:40,046][08356] Environment doom_deathmatch_bots already registered, overwriting...
[2025-07-29 11:06:40,047][08356] Environment doom_duel already registered, overwriting...
[2025-07-29 11:06:40,048][08356] Environment doom_deathmatch_full already registered, overwriting...
[2025-07-29 11:06:40,049][08356] Environment doom_benchmark already registered, overwriting...
[2025-07-29 11:06:40,049][08356] register_encoder_factory: <function make_vizdoom_encoder at 0x7b3a471e7880>
[2025-07-29 11:06:40,057][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 11:06:40,061][08356] Experiment dir /content/train_dir/default_experiment already exists!
[2025-07-29 11:06:40,062][08356] Resuming existing experiment from /content/train_dir/default_experiment...
[2025-07-29 11:06:40,062][08356] Weights and Biases integration disabled
[2025-07-29 11:06:40,064][08356] Environment var CUDA_VISIBLE_DEVICES is 0

[2025-07-29 11:06:42,354][08356] Starting experiment with the following configuration:
help=False
algo=APPO
env=doom_health_gathering_supreme
experiment=default_experiment
train_dir=/content/train_dir
restart_behavior=resume
device=gpu
seed=None
num_policies=1
async_rl=True
serial_mode=False
batched_sampling=False
num_batches_to_accumulate=2
worker_num_splits=2
policy_workers_per_policy=1
max_policy_lag=1000
num_workers=8
num_envs_per_worker=4
batch_size=1024
num_batches_per_epoch=1
num_epochs=1
rollout=32
recurrence=32
shuffle_minibatches=False
gamma=0.99
reward_scale=1.0
reward_clip=1000.0
value_bootstrap=False
normalize_returns=True
exploration_loss_coeff=0.001
value_loss_coeff=0.5
kl_loss_coeff=0.0
exploration_loss=symmetric_kl
gae_lambda=0.95
ppo_clip_ratio=0.1
ppo_clip_value=0.2
with_vtrace=False
vtrace_rho=1.0
vtrace_c=1.0
optimizer=adam
adam_eps=1e-06
adam_beta1=0.9
adam_beta2=0.999
max_grad_norm=4.0
learning_rate=0.0001
lr_schedule=constant
lr_schedule_kl_threshold=0.008
lr_adaptive_min=1e-06
lr_adaptive_max=0.01
obs_subtract_mean=0.0
obs_scale=255.0
normalize_input=True
normalize_input_keys=None
decorrelate_experience_max_seconds=0
decorrelate_envs_on_one_worker=True
actor_worker_gpus=[]
set_workers_cpu_affinity=True
force_envs_single_thread=False
default_niceness=0
log_to_file=True
experiment_summaries_interval=10
flush_summaries_interval=30
stats_avg=100
summaries_use_frameskip=True
heartbeat_interval=20
heartbeat_reporting_interval=600
train_for_env_steps=4000000
train_for_seconds=10000000000
save_every_sec=120
keep_checkpoints=2
load_checkpoint_kind=latest
save_milestones_sec=-1
save_best_every_sec=5
save_best_metric=reward
save_best_after=100000
benchmark=False
encoder_mlp_layers=[512, 512]
encoder_conv_architecture=convnet_simple
encoder_conv_mlp_layers=[512]
use_rnn=True
rnn_size=512
rnn_type=gru
rnn_num_layers=1
decoder_mlp_layers=[]
nonlinearity=elu
policy_initialization=orthogonal
policy_init_gain=1.0
actor_critic_share_weights=True
adaptive_stddev=True
continuous_tanh_scale=0.0
initial_stddev=1.0
use_env_info_cache=False
env_gpu_actions=False
env_gpu_observations=True
env_frameskip=4
env_framestack=1
pixel_format=CHW
use_record_episode_statistics=False
with_wandb=False
wandb_user=None
wandb_project=sample_factory
wandb_group=None
wandb_job_type=SF
wandb_tags=[]
with_pbt=False
pbt_mix_policies_in_one_env=True
pbt_period_env_steps=5000000
pbt_start_mutation=20000000
pbt_replace_fraction=0.3
pbt_mutation_rate=0.15
pbt_replace_reward_gap=0.1
pbt_replace_reward_gap_absolute=1e-06
pbt_optimize_gamma=False
pbt_target_objective=true_objective
pbt_perturb_min=1.1
pbt_perturb_max=1.5
num_agents=-1
num_humans=0
num_bots=-1
start_bot_difficulty=None
timelimit=None
res_w=128
res_h=72
wide_aspect_ratio=False
eval_env_frameskip=1
fps=35
command_line=--env=doom_health_gathering_supreme --num_workers=8 --num_envs_per_worker=4 --train_for_env_steps=4000000
cli_args={'env': 'doom_health_gathering_supreme', 'num_workers': 8, 'num_envs_per_worker': 4, 'train_for_env_steps': 4000000}
git_hash=unknown
git_repo_name=not a git repository
[2025-07-29 11:06:42,355][08356] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-07-29 11:06:42,357][08356] Rollout worker 0 uses device cpu
[2025-07-29 11:06:42,357][08356] Rollout worker 1 uses device cpu
[2025-07-29 11:06:42,358][08356] Rollout worker 2 uses device cpu
[2025-07-29 11:06:42,359][08356] Rollout worker 3 uses device cpu
[2025-07-29 11:06:42,360][08356] Rollout worker 4 uses device cpu
[2025-07-29 11:06:42,360][08356] Rollout worker 5 uses device cpu
[2025-07-29 11:06:42,361][08356] Rollout worker 6 uses device cpu
[2025-07-29 11:06:42,362][08356] Rollout worker 7 uses device cpu
[2025-07-29 11:06:42,401][08356] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:06:42,402][08356] InferenceWorker_p0-w0: min num requests: 2
[2025-07-29 11:06:42,431][08356] Starting all processes...
[2025-07-29 11:06:42,431][08356] Starting process learner_proc0
[2025-07-29 11:06:42,483][08356] Starting all processes...
[2025-07-29 11:06:42,488][08356] Starting process inference_proc0-0
[2025-07-29 11:06:42,488][08356] Starting process rollout_proc0
[2025-07-29 11:06:42,489][08356] Starting process rollout_proc1
[2025-07-29 11:06:42,490][08356] Starting process rollout_proc2
[2025-07-29 11:06:42,491][08356] Starting process rollout_proc3
[2025-07-29 11:06:42,491][08356] Starting process rollout_proc4
[2025-07-29 11:06:42,492][08356] Starting process rollout_proc5
[2025-07-29 11:06:42,492][08356] Starting process rollout_proc6
[2025-07-29 11:06:42,499][08356] Starting process rollout_proc7
[2025-07-29 11:06:45,331][12434] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,410][12436] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,435][12430] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,475][12432] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,575][12431] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,590][12415] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:06:45,591][12415] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-07-29 11:06:45,606][12415] Num visible devices: 1
[2025-07-29 11:06:45,606][12415] Starting seed is not provided
[2025-07-29 11:06:45,607][12415] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:06:45,607][12415] Initializing actor-critic model on device cuda:0
[2025-07-29 11:06:45,607][12415] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:06:45,608][12415] RunningMeanStd input shape: (1,)
[2025-07-29 11:06:45,620][12415] ConvEncoder: input_channels=3
[2025-07-29 11:06:45,684][12435] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,704][12429] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:06:45,704][12429] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-07-29 11:06:45,709][12428] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,719][12429] Num visible devices: 1
[2025-07-29 11:06:45,727][12415] Conv encoder output size: 512
[2025-07-29 11:06:45,727][12415] Policy head output size: 512
[2025-07-29 11:06:45,736][12433] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:06:45,742][12415] Created Actor Critic model with architecture:
[2025-07-29 11:06:45,742][12415] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-07-29 11:06:45,869][12415] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-07-29 11:06:46,791][12415] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:06:46,792][12415] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:06:46,793][12415] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:06:46,794][12415] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:06:46,794][12415] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:06:46,795][12415] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:06:46,795][12415] Did not load from checkpoint, starting from scratch!
[2025-07-29 11:06:46,796][12415] Initialized policy 0 weights for model version 0
[2025-07-29 11:06:46,798][12415] LearnerWorker_p0 finished initialization!
[2025-07-29 11:06:46,798][12415] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:06:46,910][12429] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:06:46,911][12429] RunningMeanStd input shape: (1,)
[2025-07-29 11:06:46,923][12429] ConvEncoder: input_channels=3
[2025-07-29 11:06:47,026][12429] Conv encoder output size: 512
[2025-07-29 11:06:47,026][12429] Policy head output size: 512
[2025-07-29 11:06:47,059][08356] Inference worker 0-0 is ready!
[2025-07-29 11:06:47,060][08356] All inference workers are ready! Signal rollout workers to start!
[2025-07-29 11:06:47,092][12428] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,093][12434] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,094][12433] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,098][12430] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,111][12432] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,111][12431] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,112][12435] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,112][12436] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:06:47,367][12428] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,371][12433] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,375][12430] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,402][12435] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,404][12432] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,606][12428] Decorrelating experience for 32 frames...
[2025-07-29 11:06:47,610][12430] Decorrelating experience for 32 frames...
[2025-07-29 11:06:47,637][12435] Decorrelating experience for 32 frames...
[2025-07-29 11:06:47,851][12436] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,877][12434] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,908][12431] Decorrelating experience for 0 frames...
[2025-07-29 11:06:47,928][12430] Decorrelating experience for 64 frames...
[2025-07-29 11:06:47,963][12435] Decorrelating experience for 64 frames...
[2025-07-29 11:06:48,089][12436] Decorrelating experience for 32 frames...
[2025-07-29 11:06:48,115][12432] Decorrelating experience for 32 frames...
[2025-07-29 11:06:48,131][12434] Decorrelating experience for 32 frames...
[2025-07-29 11:06:48,156][12428] Decorrelating experience for 64 frames...
[2025-07-29 11:06:48,333][12435] Decorrelating experience for 96 frames...
[2025-07-29 11:06:48,368][12433] Decorrelating experience for 32 frames...
[2025-07-29 11:06:48,420][12431] Decorrelating experience for 32 frames...
[2025-07-29 11:06:48,468][12432] Decorrelating experience for 64 frames...
[2025-07-29 11:06:48,469][12428] Decorrelating experience for 96 frames...
[2025-07-29 11:06:48,496][12430] Decorrelating experience for 96 frames...
[2025-07-29 11:06:48,505][12434] Decorrelating experience for 64 frames...
[2025-07-29 11:06:48,646][12436] Decorrelating experience for 64 frames...
[2025-07-29 11:06:48,774][12432] Decorrelating experience for 96 frames...
[2025-07-29 11:06:48,785][12433] Decorrelating experience for 64 frames...
[2025-07-29 11:06:48,792][12431] Decorrelating experience for 64 frames...
[2025-07-29 11:06:48,906][12434] Decorrelating experience for 96 frames...
[2025-07-29 11:06:49,069][12436] Decorrelating experience for 96 frames...
[2025-07-29 11:06:49,102][12433] Decorrelating experience for 96 frames...
[2025-07-29 11:06:49,456][12431] Decorrelating experience for 96 frames...
[2025-07-29 11:06:49,650][12415] Signal inference workers to stop experience collection...
[2025-07-29 11:06:49,668][12429] InferenceWorker_p0-w0: stopping experience collection
[2025-07-29 11:06:50,065][08356] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 2588. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-07-29 11:06:50,066][08356] Avg episode reward: [(0, '2.322')]
[2025-07-29 11:06:50,489][12415] Signal inference workers to resume experience collection...
[2025-07-29 11:06:50,490][12429] InferenceWorker_p0-w0: resuming experience collection
[2025-07-29 11:06:52,276][12429] Updated weights for policy 0, policy_version 10 (0.0090)
[2025-07-29 11:06:54,296][12429] Updated weights for policy 0, policy_version 20 (0.0011)
[2025-07-29 11:06:55,065][08356] Fps is (10 sec: 18841.2, 60 sec: 18841.2, 300 sec: 18841.2). Total num frames: 94208. Throughput: 0: 2463.5. Samples: 14906. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:06:55,066][08356] Avg episode reward: [(0, '4.433')]
[2025-07-29 11:06:56,284][12429] Updated weights for policy 0, policy_version 30 (0.0011)
[2025-07-29 11:06:58,307][12429] Updated weights for policy 0, policy_version 40 (0.0012)
[2025-07-29 11:07:00,065][08356] Fps is (10 sec: 19660.4, 60 sec: 19660.4, 300 sec: 19660.4). Total num frames: 196608. Throughput: 0: 4291.5. Samples: 45504. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:00,066][08356] Avg episode reward: [(0, '4.493')]
[2025-07-29 11:07:00,073][12415] Saving new best policy, reward=4.493!
[2025-07-29 11:07:00,320][12429] Updated weights for policy 0, policy_version 50 (0.0012)
[2025-07-29 11:07:02,364][12429] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-07-29 11:07:02,394][08356] Heartbeat connected on Batcher_0
[2025-07-29 11:07:02,397][08356] Heartbeat connected on LearnerWorker_p0
[2025-07-29 11:07:02,406][08356] Heartbeat connected on InferenceWorker_p0-w0
[2025-07-29 11:07:02,409][08356] Heartbeat connected on RolloutWorker_w0
[2025-07-29 11:07:02,412][08356] Heartbeat connected on RolloutWorker_w1
[2025-07-29 11:07:02,415][08356] Heartbeat connected on RolloutWorker_w2
[2025-07-29 11:07:02,418][08356] Heartbeat connected on RolloutWorker_w3
[2025-07-29 11:07:02,424][08356] Heartbeat connected on RolloutWorker_w5
[2025-07-29 11:07:02,425][08356] Heartbeat connected on RolloutWorker_w4
[2025-07-29 11:07:02,427][08356] Heartbeat connected on RolloutWorker_w6
[2025-07-29 11:07:02,430][08356] Heartbeat connected on RolloutWorker_w7
[2025-07-29 11:07:04,450][12429] Updated weights for policy 0, policy_version 70 (0.0012)
[2025-07-29 11:07:05,065][08356] Fps is (10 sec: 20480.1, 60 sec: 19933.8, 300 sec: 19933.8). Total num frames: 299008. Throughput: 0: 3868.1. Samples: 60610. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:05,066][08356] Avg episode reward: [(0, '4.623')]
[2025-07-29 11:07:05,068][12415] Saving new best policy, reward=4.623!
[2025-07-29 11:07:06,435][12429] Updated weights for policy 0, policy_version 80 (0.0011)
[2025-07-29 11:07:08,483][12429] Updated weights for policy 0, policy_version 90 (0.0012)
[2025-07-29 11:07:10,065][08356] Fps is (10 sec: 20070.5, 60 sec: 19865.5, 300 sec: 19865.5). Total num frames: 397312. Throughput: 0: 4409.4. Samples: 90776. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:10,066][08356] Avg episode reward: [(0, '4.477')]
[2025-07-29 11:07:10,520][12429] Updated weights for policy 0, policy_version 100 (0.0011)
[2025-07-29 11:07:12,509][12429] Updated weights for policy 0, policy_version 110 (0.0012)
[2025-07-29 11:07:14,543][12429] Updated weights for policy 0, policy_version 120 (0.0012)
[2025-07-29 11:07:15,065][08356] Fps is (10 sec: 20070.1, 60 sec: 19988.3, 300 sec: 19988.3). Total num frames: 499712. Throughput: 0: 4745.3. Samples: 121222. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:15,066][08356] Avg episode reward: [(0, '4.683')]
[2025-07-29 11:07:15,067][12415] Saving new best policy, reward=4.683!
[2025-07-29 11:07:16,609][12429] Updated weights for policy 0, policy_version 130 (0.0012)
[2025-07-29 11:07:18,619][12429] Updated weights for policy 0, policy_version 140 (0.0012)
[2025-07-29 11:07:20,065][08356] Fps is (10 sec: 20480.2, 60 sec: 20070.4, 300 sec: 20070.4). Total num frames: 602112. Throughput: 0: 4959.8. Samples: 151382. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:20,066][08356] Avg episode reward: [(0, '4.708')]
[2025-07-29 11:07:20,071][12415] Saving new best policy, reward=4.708!
[2025-07-29 11:07:20,606][12429] Updated weights for policy 0, policy_version 150 (0.0011)
[2025-07-29 11:07:22,607][12429] Updated weights for policy 0, policy_version 160 (0.0011)
[2025-07-29 11:07:24,612][12429] Updated weights for policy 0, policy_version 170 (0.0012)
[2025-07-29 11:07:25,065][08356] Fps is (10 sec: 20480.1, 60 sec: 20128.8, 300 sec: 20128.8). Total num frames: 704512. Throughput: 0: 4688.2. Samples: 166674. Policy #0 lag: (min: 0.0, avg: 0.5, max: 2.0)
[2025-07-29 11:07:25,066][08356] Avg episode reward: [(0, '4.567')]
[2025-07-29 11:07:26,621][12429] Updated weights for policy 0, policy_version 180 (0.0011)
[2025-07-29 11:07:28,746][12429] Updated weights for policy 0, policy_version 190 (0.0012)
[2025-07-29 11:07:30,065][08356] Fps is (10 sec: 20070.3, 60 sec: 20070.3, 300 sec: 20070.3). Total num frames: 802816. Throughput: 0: 4851.6. Samples: 196654. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:30,066][08356] Avg episode reward: [(0, '4.901')]
[2025-07-29 11:07:30,071][12415] Saving new best policy, reward=4.901!
[2025-07-29 11:07:30,841][12429] Updated weights for policy 0, policy_version 200 (0.0012)
[2025-07-29 11:07:32,875][12429] Updated weights for policy 0, policy_version 210 (0.0012)
[2025-07-29 11:07:34,897][12429] Updated weights for policy 0, policy_version 220 (0.0012)
[2025-07-29 11:07:35,065][08356] Fps is (10 sec: 19660.8, 60 sec: 20024.8, 300 sec: 20024.8). Total num frames: 901120. Throughput: 0: 4645.8. Samples: 211648. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:35,066][08356] Avg episode reward: [(0, '5.587')]
[2025-07-29 11:07:35,067][12415] Saving new best policy, reward=5.587!
[2025-07-29 11:07:36,933][12429] Updated weights for policy 0, policy_version 230 (0.0012)
[2025-07-29 11:07:38,927][12429] Updated weights for policy 0, policy_version 240 (0.0011)
[2025-07-29 11:07:40,065][08356] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 20070.4). Total num frames: 1003520. Throughput: 0: 5047.9. Samples: 242060. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:07:40,066][08356] Avg episode reward: [(0, '6.193')]
[2025-07-29 11:07:40,072][12415] Saving new best policy, reward=6.193!
[2025-07-29 11:07:41,001][12429] Updated weights for policy 0, policy_version 250 (0.0012)
[2025-07-29 11:07:43,086][12429] Updated weights for policy 0, policy_version 260 (0.0012)
[2025-07-29 11:07:45,065][08356] Fps is (10 sec: 20070.3, 60 sec: 20033.1, 300 sec: 20033.1). Total num frames: 1101824. Throughput: 0: 5033.3. Samples: 272004. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:07:45,066][08356] Avg episode reward: [(0, '6.077')]
[2025-07-29 11:07:45,087][12429] Updated weights for policy 0, policy_version 270 (0.0012)
[2025-07-29 11:07:47,079][12429] Updated weights for policy 0, policy_version 280 (0.0012)
[2025-07-29 11:07:49,054][12429] Updated weights for policy 0, policy_version 290 (0.0011)
[2025-07-29 11:07:50,065][08356] Fps is (10 sec: 20480.1, 60 sec: 20138.6, 300 sec: 20138.6). Total num frames: 1208320. Throughput: 0: 5382.9. Samples: 302842. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:07:50,066][08356] Avg episode reward: [(0, '8.214')]
[2025-07-29 11:07:50,071][12415] Saving new best policy, reward=8.214!
[2025-07-29 11:07:51,050][12429] Updated weights for policy 0, policy_version 300 (0.0012)
[2025-07-29 11:07:53,055][12429] Updated weights for policy 0, policy_version 310 (0.0012)
[2025-07-29 11:07:55,065][08356] Fps is (10 sec: 20480.0, 60 sec: 20206.9, 300 sec: 20101.9). Total num frames: 1306624. Throughput: 0: 5051.3. Samples: 318084. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:07:55,066][08356] Avg episode reward: [(0, '10.397')]
[2025-07-29 11:07:55,068][12415] Saving new best policy, reward=10.397!
[2025-07-29 11:07:55,172][12429] Updated weights for policy 0, policy_version 320 (0.0012)
[2025-07-29 11:07:57,158][12429] Updated weights for policy 0, policy_version 330 (0.0012)
[2025-07-29 11:07:59,127][12429] Updated weights for policy 0, policy_version 340 (0.0012)
[2025-07-29 11:08:00,065][08356] Fps is (10 sec: 20070.3, 60 sec: 20207.0, 300 sec: 20128.9). Total num frames: 1409024. Throughput: 0: 5050.6. Samples: 348498. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:08:00,066][08356] Avg episode reward: [(0, '12.791')]
[2025-07-29 11:08:00,071][12415] Saving new best policy, reward=12.791!
[2025-07-29 11:08:01,103][12429] Updated weights for policy 0, policy_version 350 (0.0012)
[2025-07-29 11:08:03,116][12429] Updated weights for policy 0, policy_version 360 (0.0011)
[2025-07-29 11:08:05,065][08356] Fps is (10 sec: 20480.3, 60 sec: 20206.9, 300 sec: 20152.3). Total num frames: 1511424. Throughput: 0: 5065.2. Samples: 379314. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:08:05,066][08356] Avg episode reward: [(0, '11.732')]
[2025-07-29 11:08:05,109][12429] Updated weights for policy 0, policy_version 370 (0.0012)
[2025-07-29 11:08:07,201][12429] Updated weights for policy 0, policy_version 380 (0.0012)
[2025-07-29 11:08:09,215][12429] Updated weights for policy 0, policy_version 390 (0.0012)
[2025-07-29 11:08:10,065][08356] Fps is (10 sec: 20479.9, 60 sec: 20275.2, 300 sec: 20172.8). Total num frames: 1613824. Throughput: 0: 5052.9. Samples: 394056. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 11:08:10,066][08356] Avg episode reward: [(0, '12.746')]
[2025-07-29 11:08:11,225][12429] Updated weights for policy 0, policy_version 400 (0.0011)
[2025-07-29 11:08:13,209][12429] Updated weights for policy 0, policy_version 410 (0.0011)
[2025-07-29 11:08:15,065][08356] Fps is (10 sec: 20479.8, 60 sec: 20275.2, 300 sec: 20190.8). Total num frames: 1716224. Throughput: 0: 5072.8. Samples: 424928. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:08:15,066][08356] Avg episode reward: [(0, '13.065')]
[2025-07-29 11:08:15,068][12415] Saving new best policy, reward=13.065!
[2025-07-29 11:08:15,184][12429] Updated weights for policy 0, policy_version 420 (0.0012)
[2025-07-29 11:08:17,153][12429] Updated weights for policy 0, policy_version 430 (0.0012)
[2025-07-29 11:08:19,189][12429] Updated weights for policy 0, policy_version 440 (0.0012)
[2025-07-29 11:08:20,065][08356] Fps is (10 sec: 20480.1, 60 sec: 20275.2, 300 sec: 20206.9). Total num frames: 1818624. Throughput: 0: 5419.5. Samples: 455524. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-07-29 11:08:20,066][08356] Avg episode reward: [(0, '13.716')]
[2025-07-29 11:08:20,070][12415] Saving new best policy, reward=13.716!
[2025-07-29 11:08:21,229][12429] Updated weights for policy 0, policy_version 450 (0.0012)
[2025-07-29 11:08:23,202][12429] Updated weights for policy 0, policy_version 460 (0.0012)
[2025-07-29 11:08:25,065][08356] Fps is (10 sec: 20480.0, 60 sec: 20275.2, 300 sec: 20221.3). Total num frames: 1921024. Throughput: 0: 5085.4. Samples: 470904. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 11:08:25,066][08356] Avg episode reward: [(0, '15.275')]
[2025-07-29 11:08:25,068][12415] Saving new best policy, reward=15.275!
[2025-07-29 11:08:25,208][12429] Updated weights for policy 0, policy_version 470 (0.0012)
[2025-07-29 11:08:27,178][12429] Updated weights for policy 0, policy_version 480 (0.0012)
[2025-07-29 11:08:29,171][12429] Updated weights for policy 0, policy_version 490 (0.0012)
[2025-07-29 11:08:30,065][08356] Fps is (10 sec: 20480.0, 60 sec: 20343.5, 300 sec: 20234.2). Total num frames: 2023424. Throughput: 0: 5107.9. Samples: 501860. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-07-29 11:08:30,066][08356] Avg episode reward: [(0, '14.421')]
[2025-07-29 11:08:31,175][12429] Updated weights for policy 0, policy_version 500 (0.0012)
[2025-07-29 11:08:33,255][12429] Updated weights for policy 0, policy_version 510 (0.0011)
[2025-07-29 11:08:35,065][08356] Fps is (10 sec: 20480.2, 60 sec: 20411.8, 300 sec: 20245.9). Total num frames: 2125824. Throughput: 0: 5096.0. Samples: 532164. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 11:08:35,066][08356] Avg episode reward: [(0, '17.210')]
[2025-07-29 11:08:35,067][12415] Saving new best policy, reward=17.210!
[2025-07-29 11:08:35,241][12429] Updated weights for policy 0, policy_version 520 (0.0012)
[2025-07-29 11:08:37,227][12429] Updated weights for policy 0, policy_version 530 (0.0011)
[2025-07-29 11:08:39,211][12429] Updated weights for policy 0, policy_version 540 (0.0012)
[2025-07-29 11:08:40,065][08356] Fps is (10 sec: 20480.2, 60 sec: 20411.7, 300 sec: 20256.6). Total num frames: 2228224. Throughput: 0: 5100.9. Samples: 547624. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 11:08:40,066][08356] Avg episode reward: [(0, '20.572')]
[2025-07-29 11:08:40,071][12415] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000544_2228224.pth...
[2025-07-29 11:08:40,136][12415] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000512_2097152.pth
[2025-07-29 11:08:40,143][12415] Saving new best policy, reward=20.572!
[2025-07-29 11:08:41,192][12429] Updated weights for policy 0, policy_version 550 (0.0012)
[2025-07-29 11:08:43,170][12429] Updated weights for policy 0, policy_version 560 (0.0011)
[2025-07-29 11:08:45,065][08356] Fps is (10 sec: 20479.9, 60 sec: 20480.0, 300 sec: 20266.3). Total num frames: 2330624. Throughput: 0: 5110.4. Samples: 578466. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:08:45,066][08356] Avg episode reward: [(0, '17.952')]
[2025-07-29 11:08:45,261][12429] Updated weights for policy 0, policy_version 570 (0.0012)
[2025-07-29 11:08:47,293][12429] Updated weights for policy 0, policy_version 580 (0.0012)
[2025-07-29 11:08:49,269][12429] Updated weights for policy 0, policy_version 590 (0.0012)
[2025-07-29 11:08:50,065][08356] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20275.2). Total num frames: 2433024. Throughput: 0: 4759.1. Samples: 593472. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:08:50,066][08356] Avg episode reward: [(0, '19.162')]
[2025-07-29 11:08:51,260][12429] Updated weights for policy 0, policy_version 600 (0.0012)
[2025-07-29 11:08:53,247][12429] Updated weights for policy 0, policy_version 610 (0.0012)
[2025-07-29 11:08:55,065][08356] Fps is (10 sec: 20479.9, 60 sec: 20480.0, 300 sec: 20283.4). Total num frames: 2535424. Throughput: 0: 5117.5. Samples: 624342. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:08:55,066][08356] Avg episode reward: [(0, '19.000')]
[2025-07-29 11:08:55,238][12429] Updated weights for policy 0, policy_version 620 (0.0011)
[2025-07-29 11:08:57,290][12429] Updated weights for policy 0, policy_version 630 (0.0012)
[2025-07-29 11:08:59,355][12429] Updated weights for policy 0, policy_version 640 (0.0012)
[2025-07-29 11:09:00,065][08356] Fps is (10 sec: 20070.5, 60 sec: 20411.7, 300 sec: 20259.4). Total num frames: 2633728. Throughput: 0: 5105.3. Samples: 654666. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:09:00,066][08356] Avg episode reward: [(0, '23.100')]
[2025-07-29 11:09:00,071][12415] Saving new best policy, reward=23.100!
[2025-07-29 11:09:01,334][12429] Updated weights for policy 0, policy_version 650 (0.0011)
[2025-07-29 11:09:03,336][12429] Updated weights for policy 0, policy_version 660 (0.0012)
[2025-07-29 11:09:05,065][08356] Fps is (10 sec: 20070.4, 60 sec: 20411.7, 300 sec: 20267.6). Total num frames: 2736128. Throughput: 0: 4771.2. Samples: 670230. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:09:05,066][08356] Avg episode reward: [(0, '24.874')]
[2025-07-29 11:09:05,068][12415] Saving new best policy, reward=24.874!
[2025-07-29 11:09:05,303][12429] Updated weights for policy 0, policy_version 670 (0.0012)
[2025-07-29 11:09:07,288][12429] Updated weights for policy 0, policy_version 680 (0.0011)
[2025-07-29 11:09:09,281][12429] Updated weights for policy 0, policy_version 690 (0.0011)
[2025-07-29 11:09:10,065][08356] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20275.2). Total num frames: 2838528. Throughput: 0: 5118.0. Samples: 701214. Policy #0 lag: (min: 0.0, avg: 0.8, max: 1.0)
[2025-07-29 11:09:10,066][08356] Avg episode reward: [(0, '22.482')]
[2025-07-29 11:09:11,357][12429] Updated weights for policy 0, policy_version 700 (0.0012)
[2025-07-29 11:09:13,361][12429] Updated weights for policy 0, policy_version 710 (0.0012)
[2025-07-29 11:09:15,065][08356] Fps is (10 sec: 20480.0, 60 sec: 20411.7, 300 sec: 20282.2). Total num frames: 2940928. Throughput: 0: 5104.3. Samples: 731552. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:09:15,066][08356] Avg episode reward: [(0, '20.169')]
[2025-07-29 11:09:15,338][12429] Updated weights for policy 0, policy_version 720 (0.0011)
[2025-07-29 11:09:17,326][12429] Updated weights for policy 0, policy_version 730 (0.0012)
[2025-07-29 11:09:19,329][12429] Updated weights for policy 0, policy_version 740 (0.0012)
[2025-07-29 11:09:20,065][08356] Fps is (10 sec: 20480.0, 60 sec: 20411.7, 300 sec: 20288.8). Total num frames: 3043328. Throughput: 0: 4776.2. Samples: 747092. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 11:09:20,066][08356] Avg episode reward: [(0, '22.839')]
[2025-07-29 11:09:21,300][12429] Updated weights for policy 0, policy_version 750 (0.0012)
[2025-07-29 11:09:23,348][12429] Updated weights for policy 0, policy_version 760 (0.0012)
[2025-07-29 11:09:25,065][08356] Fps is (10 sec: 20480.0, 60 sec: 20411.7, 300 sec: 20295.0). Total num frames: 3145728. Throughput: 0: 5113.9. Samples: 777750. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 11:09:25,066][08356] Avg episode reward: [(0, '22.733')]
[2025-07-29 11:09:25,379][12429] Updated weights for policy 0, policy_version 770 (0.0011)
[2025-07-29 11:09:27,352][12429] Updated weights for policy 0, policy_version 780 (0.0011)
[2025-07-29 11:09:29,328][12429] Updated weights for policy 0, policy_version 790 (0.0012)
[2025-07-29 11:09:30,065][08356] Fps is (10 sec: 20480.1, 60 sec: 20411.8, 300 sec: 20300.8). Total num frames: 3248128. Throughput: 0: 5112.9. Samples: 808546. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 11:09:30,066][08356] Avg episode reward: [(0, '19.832')]
[2025-07-29 11:09:31,310][12429] Updated weights for policy 0, policy_version 800 (0.0011)
[2025-07-29 11:09:33,286][12429] Updated weights for policy 0, policy_version 810 (0.0012)
[2025-07-29 11:09:35,065][08356] Fps is (10 sec: 20480.1, 60 sec: 20411.7, 300 sec: 20306.2). Total num frames: 3350528. Throughput: 0: 5466.2. Samples: 839450. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 11:09:35,066][08356] Avg episode reward: [(0, '22.250')]
[2025-07-29 11:09:35,286][12429] Updated weights for policy 0, policy_version 820 (0.0011)
[2025-07-29 11:09:37,361][12429] Updated weights for policy 0, policy_version 830 (0.0012)
[2025-07-29 11:09:39,338][12429] Updated weights for policy 0, policy_version 840 (0.0011)
[2025-07-29 11:09:40,065][08356] Fps is (10 sec: 20479.9, 60 sec: 20411.7, 300 sec: 20311.3). Total num frames: 3452928. Throughput: 0: 5112.9. Samples: 854422. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:09:40,066][08356] Avg episode reward: [(0, '24.782')]
[2025-07-29 11:09:41,308][12429] Updated weights for policy 0, policy_version 850 (0.0011)
[2025-07-29 11:09:43,284][12429] Updated weights for policy 0, policy_version 860 (0.0012)
[2025-07-29 11:09:45,065][08356] Fps is (10 sec: 20889.6, 60 sec: 20480.0, 300 sec: 20339.6). Total num frames: 3559424. Throughput: 0: 5129.9. Samples: 885510. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:09:45,066][08356] Avg episode reward: [(0, '23.621')]
[2025-07-29 11:09:45,276][12429] Updated weights for policy 0, policy_version 870 (0.0011)
[2025-07-29 11:09:47,410][12429] Updated weights for policy 0, policy_version 880 (0.0012)
[2025-07-29 11:09:49,514][12429] Updated weights for policy 0, policy_version 890 (0.0012)
[2025-07-29 11:09:50,065][08356] Fps is (10 sec: 20070.5, 60 sec: 20343.5, 300 sec: 20297.9). Total num frames: 3653632. Throughput: 0: 5433.1. Samples: 914720. Policy #0 lag: (min: 0.0, avg: 0.4, max: 1.0)
[2025-07-29 11:09:50,066][08356] Avg episode reward: [(0, '22.036')]
[2025-07-29 11:09:51,501][12429] Updated weights for policy 0, policy_version 900 (0.0012)
[2025-07-29 11:09:53,482][12429] Updated weights for policy 0, policy_version 910 (0.0012)
[2025-07-29 11:09:55,065][08356] Fps is (10 sec: 20070.4, 60 sec: 20411.7, 300 sec: 20325.0). Total num frames: 3760128. Throughput: 0: 5090.2. Samples: 930274. Policy #0 lag: (min: 0.0, avg: 0.5, max: 1.0)
[2025-07-29 11:09:55,066][08356] Avg episode reward: [(0, '23.609')]
[2025-07-29 11:09:55,455][12429] Updated weights for policy 0, policy_version 920 (0.0012)
[2025-07-29 11:09:57,431][12429] Updated weights for policy 0, policy_version 930 (0.0011)
[2025-07-29 11:09:59,413][12429] Updated weights for policy 0, policy_version 940 (0.0012)
[2025-07-29 11:10:00,065][08356] Fps is (10 sec: 20889.5, 60 sec: 20480.0, 300 sec: 20329.1). Total num frames: 3862528. Throughput: 0: 5105.7. Samples: 961310. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 11:10:00,066][08356] Avg episode reward: [(0, '26.307')]
[2025-07-29 11:10:00,072][12415] Saving new best policy, reward=26.307!
[2025-07-29 11:10:01,481][12429] Updated weights for policy 0, policy_version 950 (0.0012)
[2025-07-29 11:10:03,535][12429] Updated weights for policy 0, policy_version 960 (0.0012)
[2025-07-29 11:10:05,065][08356] Fps is (10 sec: 20070.1, 60 sec: 20411.7, 300 sec: 20311.9). Total num frames: 3960832. Throughput: 0: 5432.7. Samples: 991562. Policy #0 lag: (min: 0.0, avg: 0.6, max: 1.0)
[2025-07-29 11:10:05,066][08356] Avg episode reward: [(0, '24.505')]
[2025-07-29 11:10:05,539][12429] Updated weights for policy 0, policy_version 970 (0.0011)
[2025-07-29 11:10:07,138][12415] Stopping Batcher_0...
[2025-07-29 11:10:07,138][12415] Loop batcher_evt_loop terminating...
[2025-07-29 11:10:07,138][08356] Component Batcher_0 stopped!
[2025-07-29 11:10:07,139][12415] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:10:07,161][12429] Weights refcount: 2 0
[2025-07-29 11:10:07,162][12429] Stopping InferenceWorker_p0-w0...
[2025-07-29 11:10:07,163][12429] Loop inference_proc0-0_evt_loop terminating...
[2025-07-29 11:10:07,163][08356] Component InferenceWorker_p0-w0 stopped!
[2025-07-29 11:10:07,174][12434] Stopping RolloutWorker_w5...
[2025-07-29 11:10:07,175][12434] Loop rollout_proc5_evt_loop terminating...
[2025-07-29 11:10:07,175][12432] Stopping RolloutWorker_w3...
[2025-07-29 11:10:07,174][08356] Component RolloutWorker_w5 stopped!
[2025-07-29 11:10:07,175][12432] Loop rollout_proc3_evt_loop terminating...
[2025-07-29 11:10:07,176][08356] Component RolloutWorker_w3 stopped!
[2025-07-29 11:10:07,177][12430] Stopping RolloutWorker_w2...
[2025-07-29 11:10:07,177][12433] Stopping RolloutWorker_w4...
[2025-07-29 11:10:07,178][12430] Loop rollout_proc2_evt_loop terminating...
[2025-07-29 11:10:07,178][12433] Loop rollout_proc4_evt_loop terminating...
[2025-07-29 11:10:07,177][08356] Component RolloutWorker_w2 stopped!
[2025-07-29 11:10:07,178][08356] Component RolloutWorker_w4 stopped!
[2025-07-29 11:10:07,179][12431] Stopping RolloutWorker_w1...
[2025-07-29 11:10:07,180][08356] Component RolloutWorker_w1 stopped!
[2025-07-29 11:10:07,180][12436] Stopping RolloutWorker_w7...
[2025-07-29 11:10:07,180][12431] Loop rollout_proc1_evt_loop terminating...
[2025-07-29 11:10:07,181][12436] Loop rollout_proc7_evt_loop terminating...
[2025-07-29 11:10:07,181][08356] Component RolloutWorker_w7 stopped!
[2025-07-29 11:10:07,181][12428] Stopping RolloutWorker_w0...
[2025-07-29 11:10:07,182][12435] Stopping RolloutWorker_w6...
[2025-07-29 11:10:07,182][12428] Loop rollout_proc0_evt_loop terminating...
[2025-07-29 11:10:07,181][08356] Component RolloutWorker_w0 stopped!
[2025-07-29 11:10:07,183][12435] Loop rollout_proc6_evt_loop terminating...
[2025-07-29 11:10:07,183][08356] Component RolloutWorker_w6 stopped!
[2025-07-29 11:10:07,230][12415] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:10:07,304][12415] Stopping LearnerWorker_p0...
[2025-07-29 11:10:07,304][12415] Loop learner_proc0_evt_loop terminating...
[2025-07-29 11:10:07,304][08356] Component LearnerWorker_p0 stopped!
[2025-07-29 11:10:07,306][08356] Waiting for process learner_proc0 to stop...
[2025-07-29 11:10:08,149][08356] Waiting for process inference_proc0-0 to join...
[2025-07-29 11:10:08,150][08356] Waiting for process rollout_proc0 to join...
[2025-07-29 11:10:08,151][08356] Waiting for process rollout_proc1 to join...
[2025-07-29 11:10:08,152][08356] Waiting for process rollout_proc2 to join...
[2025-07-29 11:10:08,153][08356] Waiting for process rollout_proc3 to join...
[2025-07-29 11:10:08,153][08356] Waiting for process rollout_proc4 to join...
[2025-07-29 11:10:08,154][08356] Waiting for process rollout_proc5 to join...
[2025-07-29 11:10:08,155][08356] Waiting for process rollout_proc6 to join...
[2025-07-29 11:10:08,155][08356] Waiting for process rollout_proc7 to join...
[2025-07-29 11:10:08,156][08356] Batcher 0 profile tree view:
batching: 16.6304, releasing_batches: 0.0214
[2025-07-29 11:10:08,157][08356] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 3.6828
update_model: 3.2133
  weight_update: 0.0011
one_step: 0.0029
  handle_policy_step: 184.6307
    deserialize: 7.6461, stack: 1.2934, obs_to_device_normalize: 45.3181, forward: 88.3579, send_messages: 12.7485
    prepare_outputs: 22.0962
      to_cpu: 14.2240
[2025-07-29 11:10:08,157][08356] Learner 0 profile tree view:
misc: 0.0036, prepare_batch: 6.5193
train: 18.4938
  epoch_init: 0.0041, minibatch_init: 0.0051, losses_postprocess: 0.3442, kl_divergence: 0.3916, after_optimizer: 1.9269
  calculate_losses: 8.3696
    losses_init: 0.0035, forward_head: 0.6279, bptt_initial: 4.4161, tail: 0.6188, advantages_returns: 0.1577, losses: 1.1792
    bptt: 1.2235
      bptt_forward_core: 1.1725
  update: 7.1419
    clip: 0.7660
[2025-07-29 11:10:08,158][08356] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1227, enqueue_policy_requests: 8.8482, env_step: 126.6760, overhead: 5.4741, complete_rollouts: 0.2066
save_policy_outputs: 7.9668
  split_output_tensors: 3.0468
[2025-07-29 11:10:08,159][08356] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1191, enqueue_policy_requests: 8.8728, env_step: 126.4309, overhead: 5.5495, complete_rollouts: 0.2074
save_policy_outputs: 8.0131
  split_output_tensors: 3.0342
[2025-07-29 11:10:08,160][08356] Loop Runner_EvtLoop terminating...
[2025-07-29 11:10:08,161][08356] Runner profile tree view:
main_loop: 205.7305
[2025-07-29 11:10:08,161][08356] Collected {0: 4005888}, FPS: 19471.5
[2025-07-29 11:10:58,192][08356] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 11:10:58,193][08356] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 11:10:58,194][08356] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 11:10:58,195][08356] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 11:10:58,195][08356] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:10:58,196][08356] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 11:10:58,197][08356] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:10:58,197][08356] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 11:10:58,198][08356] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 11:10:58,199][08356] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 11:10:58,199][08356] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 11:10:58,200][08356] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 11:10:58,200][08356] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 11:10:58,201][08356] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 11:10:58,202][08356] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 11:10:58,230][08356] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:10:58,231][08356] RunningMeanStd input shape: (1,)
[2025-07-29 11:10:58,241][08356] ConvEncoder: input_channels=3
[2025-07-29 11:10:58,277][08356] Conv encoder output size: 512
[2025-07-29 11:10:58,277][08356] Policy head output size: 512
[2025-07-29 11:10:58,298][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:10:58,299][08356] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:10:58,300][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:10:58,301][08356] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:10:58,302][08356] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:10:58,303][08356] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:16:06,923][14877] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-07-29 11:16:06,925][14877] Rollout worker 0 uses device cpu
[2025-07-29 11:16:06,926][14877] Rollout worker 1 uses device cpu
[2025-07-29 11:16:06,926][14877] Rollout worker 2 uses device cpu
[2025-07-29 11:16:06,927][14877] Rollout worker 3 uses device cpu
[2025-07-29 11:16:06,928][14877] Rollout worker 4 uses device cpu
[2025-07-29 11:16:06,928][14877] Rollout worker 5 uses device cpu
[2025-07-29 11:16:06,929][14877] Rollout worker 6 uses device cpu
[2025-07-29 11:16:06,929][14877] Rollout worker 7 uses device cpu
[2025-07-29 11:16:06,967][14877] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:16:06,968][14877] InferenceWorker_p0-w0: min num requests: 2
[2025-07-29 11:16:06,997][14877] Starting all processes...
[2025-07-29 11:16:06,998][14877] Starting process learner_proc0
[2025-07-29 11:16:07,051][14877] Starting all processes...
[2025-07-29 11:16:07,056][14877] Starting process inference_proc0-0
[2025-07-29 11:16:07,056][14877] Starting process rollout_proc0
[2025-07-29 11:16:07,056][14877] Starting process rollout_proc1
[2025-07-29 11:16:07,057][14877] Starting process rollout_proc2
[2025-07-29 11:16:07,057][14877] Starting process rollout_proc3
[2025-07-29 11:16:07,057][14877] Starting process rollout_proc4
[2025-07-29 11:16:07,058][14877] Starting process rollout_proc5
[2025-07-29 11:16:07,058][14877] Starting process rollout_proc6
[2025-07-29 11:16:07,058][14877] Starting process rollout_proc7
[2025-07-29 11:16:09,944][15381] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,092][15382] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,243][15380] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,256][15362] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:16:10,256][15362] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-07-29 11:16:10,274][15362] Num visible devices: 1
[2025-07-29 11:16:10,275][15362] Starting seed is not provided
[2025-07-29 11:16:10,276][15362] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:16:10,276][15362] Initializing actor-critic model on device cuda:0
[2025-07-29 11:16:10,276][15362] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:16:10,278][15362] RunningMeanStd input shape: (1,)
[2025-07-29 11:16:10,293][15375] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:16:10,293][15375] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-07-29 11:16:10,297][15362] ConvEncoder: input_channels=3
[2025-07-29 11:16:10,300][15378] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,309][15375] Num visible devices: 1
[2025-07-29 11:16:10,369][15377] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,413][15376] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,422][15362] Conv encoder output size: 512
[2025-07-29 11:16:10,422][15362] Policy head output size: 512
[2025-07-29 11:16:10,437][15362] Created Actor Critic model with architecture:
[2025-07-29 11:16:10,437][15362] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-07-29 11:16:10,465][15379] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,506][15383] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:16:10,589][15362] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-07-29 11:16:11,512][15362] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:16:11,513][15362] Could not load from checkpoint, attempt 0
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:16:11,514][15362] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:16:11,515][15362] Could not load from checkpoint, attempt 1
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:16:11,515][15362] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:16:11,516][15362] Could not load from checkpoint, attempt 2
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/dist-packages/sample_factory/algo/learning/learner.py", line 281, in load_checkpoint
    checkpoint_dict = torch.load(latest_checkpoint, map_location=device)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/torch/serialization.py", line 1470, in load
    raise pickle.UnpicklingError(_get_wo_message(str(e))) from None
_pickle.UnpicklingError: Weights only load failed. This file can still be loaded, to do so you have two options, do those steps only if you trust the source of the checkpoint.
	(1) In PyTorch 2.6, we changed the default value of the `weights_only` argument in `torch.load` from `False` to `True`. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you got the file from a trusted source.
	(2) Alternatively, to load with `weights_only=True` please check the recommended steps in the following error message.
	WeightsUnpickler error: Unsupported global: GLOBAL numpy.core.multiarray.scalar was not an allowed global by default. Please use `torch.serialization.add_safe_globals([scalar])` or the `torch.serialization.safe_globals([scalar])` context manager to allowlist this global if you trust this class/function.

Check the documentation of torch.load to learn more about types accepted by default with weights_only https://pytorch.org/docs/stable/generated/torch.load.html.
[2025-07-29 11:16:11,516][15362] Did not load from checkpoint, starting from scratch!
[2025-07-29 11:16:11,517][15362] Initialized policy 0 weights for model version 0
[2025-07-29 11:16:11,519][15362] LearnerWorker_p0 finished initialization!
[2025-07-29 11:16:11,519][15362] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:16:11,626][15375] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:16:11,627][15375] RunningMeanStd input shape: (1,)
[2025-07-29 11:16:11,639][15375] ConvEncoder: input_channels=3
[2025-07-29 11:16:11,743][15375] Conv encoder output size: 512
[2025-07-29 11:16:11,744][15375] Policy head output size: 512
[2025-07-29 11:16:11,776][14877] Inference worker 0-0 is ready!
[2025-07-29 11:16:11,778][14877] All inference workers are ready! Signal rollout workers to start!
[2025-07-29 11:16:11,810][15383] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:11,810][15379] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:11,811][15376] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:11,811][15380] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:11,831][15377] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:11,831][15382] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:11,831][15381] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:11,831][15378] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:16:12,102][15376] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,106][15380] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,107][15379] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,114][15383] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,130][15378] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,133][15382] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,360][15376] Decorrelating experience for 32 frames...
[2025-07-29 11:16:12,362][15380] Decorrelating experience for 32 frames...
[2025-07-29 11:16:12,364][15379] Decorrelating experience for 32 frames...
[2025-07-29 11:16:12,369][15383] Decorrelating experience for 32 frames...
[2025-07-29 11:16:12,384][15377] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,388][15378] Decorrelating experience for 32 frames...
[2025-07-29 11:16:12,581][14877] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 0. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-07-29 11:16:12,627][15381] Decorrelating experience for 0 frames...
[2025-07-29 11:16:12,643][15377] Decorrelating experience for 32 frames...
[2025-07-29 11:16:12,684][15376] Decorrelating experience for 64 frames...
[2025-07-29 11:16:12,727][15383] Decorrelating experience for 64 frames...
[2025-07-29 11:16:12,730][15379] Decorrelating experience for 64 frames...
[2025-07-29 11:16:12,875][15381] Decorrelating experience for 32 frames...
[2025-07-29 11:16:12,924][15380] Decorrelating experience for 64 frames...
[2025-07-29 11:16:12,934][15382] Decorrelating experience for 32 frames...
[2025-07-29 11:16:13,023][15379] Decorrelating experience for 96 frames...
[2025-07-29 11:16:13,057][15383] Decorrelating experience for 96 frames...
[2025-07-29 11:16:13,192][15378] Decorrelating experience for 64 frames...
[2025-07-29 11:16:13,201][15381] Decorrelating experience for 64 frames...
[2025-07-29 11:16:13,221][15380] Decorrelating experience for 96 frames...
[2025-07-29 11:16:13,265][15377] Decorrelating experience for 64 frames...
[2025-07-29 11:16:13,436][15376] Decorrelating experience for 96 frames...
[2025-07-29 11:16:13,476][15378] Decorrelating experience for 96 frames...
[2025-07-29 11:16:13,540][15381] Decorrelating experience for 96 frames...
[2025-07-29 11:16:13,704][15377] Decorrelating experience for 96 frames...
[2025-07-29 11:16:14,036][15382] Decorrelating experience for 64 frames...
[2025-07-29 11:16:14,318][15362] Signal inference workers to stop experience collection...
[2025-07-29 11:16:14,323][15375] InferenceWorker_p0-w0: stopping experience collection
[2025-07-29 11:16:14,395][15382] Decorrelating experience for 96 frames...
[2025-07-29 11:16:15,154][15362] Signal inference workers to resume experience collection...
[2025-07-29 11:16:15,155][15375] InferenceWorker_p0-w0: resuming experience collection
[2025-07-29 11:16:16,922][15375] Updated weights for policy 0, policy_version 10 (0.0088)
[2025-07-29 11:16:17,581][14877] Fps is (10 sec: 10649.7, 60 sec: 10649.7, 300 sec: 10649.7). Total num frames: 53248. Throughput: 0: 543.6. Samples: 2718. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:16:17,582][14877] Avg episode reward: [(0, '4.451')]
[2025-07-29 11:16:18,956][15375] Updated weights for policy 0, policy_version 20 (0.0011)
[2025-07-29 11:16:21,073][15375] Updated weights for policy 0, policy_version 30 (0.0012)
[2025-07-29 11:16:22,581][14877] Fps is (10 sec: 15155.2, 60 sec: 15155.2, 300 sec: 15155.2). Total num frames: 151552. Throughput: 0: 3010.4. Samples: 30104. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:16:22,582][14877] Avg episode reward: [(0, '4.403')]
[2025-07-29 11:16:22,587][15362] Saving new best policy, reward=4.403!
[2025-07-29 11:16:23,196][15375] Updated weights for policy 0, policy_version 40 (0.0012)
[2025-07-29 11:16:25,189][15375] Updated weights for policy 0, policy_version 50 (0.0011)
[2025-07-29 11:16:26,959][14877] Heartbeat connected on Batcher_0
[2025-07-29 11:16:26,972][14877] Heartbeat connected on InferenceWorker_p0-w0
[2025-07-29 11:16:26,973][14877] Heartbeat connected on LearnerWorker_p0
[2025-07-29 11:16:26,975][14877] Heartbeat connected on RolloutWorker_w0
[2025-07-29 11:16:26,980][14877] Heartbeat connected on RolloutWorker_w1
[2025-07-29 11:16:26,982][14877] Heartbeat connected on RolloutWorker_w2
[2025-07-29 11:16:26,988][14877] Heartbeat connected on RolloutWorker_w4
[2025-07-29 11:16:26,989][14877] Heartbeat connected on RolloutWorker_w3
[2025-07-29 11:16:26,991][14877] Heartbeat connected on RolloutWorker_w5
[2025-07-29 11:16:26,994][14877] Heartbeat connected on RolloutWorker_w6
[2025-07-29 11:16:26,997][14877] Heartbeat connected on RolloutWorker_w7
[2025-07-29 11:16:27,180][15375] Updated weights for policy 0, policy_version 60 (0.0012)
[2025-07-29 11:16:27,581][14877] Fps is (10 sec: 20070.3, 60 sec: 16930.2, 300 sec: 16930.2). Total num frames: 253952. Throughput: 0: 4012.5. Samples: 60188. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:16:27,582][14877] Avg episode reward: [(0, '4.371')]
[2025-07-29 11:16:29,166][15375] Updated weights for policy 0, policy_version 70 (0.0011)
[2025-07-29 11:16:31,174][15375] Updated weights for policy 0, policy_version 80 (0.0012)
[2025-07-29 11:16:32,581][14877] Fps is (10 sec: 20480.0, 60 sec: 17817.6, 300 sec: 17817.6). Total num frames: 356352. Throughput: 0: 3779.3. Samples: 75586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:16:32,582][14877] Avg episode reward: [(0, '4.305')]
[2025-07-29 11:16:33,197][15375] Updated weights for policy 0, policy_version 90 (0.0011)
[2025-07-29 11:16:35,302][15375] Updated weights for policy 0, policy_version 100 (0.0012)
[2025-07-29 11:16:37,318][15375] Updated weights for policy 0, policy_version 110 (0.0012)
[2025-07-29 11:16:37,581][14877] Fps is (10 sec: 20070.5, 60 sec: 18186.3, 300 sec: 18186.3). Total num frames: 454656. Throughput: 0: 4224.8. Samples: 105620. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:16:37,582][14877] Avg episode reward: [(0, '4.606')]
[2025-07-29 11:16:37,584][15362] Saving new best policy, reward=4.606!
[2025-07-29 11:16:39,326][15375] Updated weights for policy 0, policy_version 120 (0.0012)
[2025-07-29 11:16:41,320][15375] Updated weights for policy 0, policy_version 130 (0.0011)
[2025-07-29 11:16:42,581][14877] Fps is (10 sec: 20070.5, 60 sec: 18568.6, 300 sec: 18568.6). Total num frames: 557056. Throughput: 0: 4540.5. Samples: 136216. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:16:42,582][14877] Avg episode reward: [(0, '4.683')]
[2025-07-29 11:16:42,587][15362] Saving new best policy, reward=4.683!
[2025-07-29 11:16:43,334][15375] Updated weights for policy 0, policy_version 140 (0.0011)
[2025-07-29 11:16:45,354][15375] Updated weights for policy 0, policy_version 150 (0.0012)
[2025-07-29 11:16:47,446][15375] Updated weights for policy 0, policy_version 160 (0.0012)
[2025-07-29 11:16:47,581][14877] Fps is (10 sec: 20070.3, 60 sec: 18724.6, 300 sec: 18724.6). Total num frames: 655360. Throughput: 0: 4331.3. Samples: 151594. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:16:47,582][14877] Avg episode reward: [(0, '4.935')]
[2025-07-29 11:16:47,583][15362] Saving new best policy, reward=4.935!
[2025-07-29 11:16:49,452][15375] Updated weights for policy 0, policy_version 170 (0.0012)
[2025-07-29 11:16:51,459][15375] Updated weights for policy 0, policy_version 180 (0.0012)
[2025-07-29 11:16:52,581][14877] Fps is (10 sec: 20070.3, 60 sec: 18944.0, 300 sec: 18944.0). Total num frames: 757760. Throughput: 0: 4540.3. Samples: 181610. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:16:52,582][14877] Avg episode reward: [(0, '5.278')]
[2025-07-29 11:16:52,588][15362] Saving new best policy, reward=5.278!
[2025-07-29 11:16:53,476][15375] Updated weights for policy 0, policy_version 190 (0.0012)
[2025-07-29 11:16:55,471][15375] Updated weights for policy 0, policy_version 200 (0.0012)
[2025-07-29 11:16:57,448][15375] Updated weights for policy 0, policy_version 210 (0.0012)
[2025-07-29 11:16:57,581][14877] Fps is (10 sec: 20480.1, 60 sec: 19114.7, 300 sec: 19114.7). Total num frames: 860160. Throughput: 0: 4718.5. Samples: 212330. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:16:57,582][14877] Avg episode reward: [(0, '5.428')]
[2025-07-29 11:16:57,584][15362] Saving new best policy, reward=5.428!
[2025-07-29 11:16:59,542][15375] Updated weights for policy 0, policy_version 220 (0.0012)
[2025-07-29 11:17:01,636][15375] Updated weights for policy 0, policy_version 230 (0.0012)
[2025-07-29 11:17:02,581][14877] Fps is (10 sec: 20070.5, 60 sec: 19169.3, 300 sec: 19169.3). Total num frames: 958464. Throughput: 0: 4988.3. Samples: 227192. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:17:02,582][14877] Avg episode reward: [(0, '5.942')]
[2025-07-29 11:17:02,586][15362] Saving new best policy, reward=5.942!
[2025-07-29 11:17:03,662][15375] Updated weights for policy 0, policy_version 240 (0.0012)
[2025-07-29 11:17:05,786][15375] Updated weights for policy 0, policy_version 250 (0.0012)
[2025-07-29 11:17:07,581][14877] Fps is (10 sec: 19660.8, 60 sec: 19214.0, 300 sec: 19214.0). Total num frames: 1056768. Throughput: 0: 5035.9. Samples: 256720. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:17:07,582][14877] Avg episode reward: [(0, '6.500')]
[2025-07-29 11:17:07,584][15362] Saving new best policy, reward=6.500!
[2025-07-29 11:17:07,890][15375] Updated weights for policy 0, policy_version 260 (0.0011)
[2025-07-29 11:17:09,897][15375] Updated weights for policy 0, policy_version 270 (0.0012)
[2025-07-29 11:17:11,987][15375] Updated weights for policy 0, policy_version 280 (0.0012)
[2025-07-29 11:17:12,581][14877] Fps is (10 sec: 19660.8, 60 sec: 19251.2, 300 sec: 19251.2). Total num frames: 1155072. Throughput: 0: 5031.8. Samples: 286618. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 11:17:12,582][14877] Avg episode reward: [(0, '7.833')]
[2025-07-29 11:17:12,587][15362] Saving new best policy, reward=7.833!
[2025-07-29 11:17:14,063][15375] Updated weights for policy 0, policy_version 290 (0.0012)
[2025-07-29 11:17:16,126][15375] Updated weights for policy 0, policy_version 300 (0.0011)
[2025-07-29 11:17:17,581][14877] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19345.7). Total num frames: 1257472. Throughput: 0: 5019.6. Samples: 301468. Policy #0 lag: (min: 0.0, avg: 0.9, max: 2.0)
[2025-07-29 11:17:17,582][14877] Avg episode reward: [(0, '8.027')]
[2025-07-29 11:17:17,584][15362] Saving new best policy, reward=8.027!
[2025-07-29 11:17:18,176][15375] Updated weights for policy 0, policy_version 310 (0.0012)
[2025-07-29 11:17:20,217][15375] Updated weights for policy 0, policy_version 320 (0.0012)
[2025-07-29 11:17:22,213][15375] Updated weights for policy 0, policy_version 330 (0.0012)
[2025-07-29 11:17:22,581][14877] Fps is (10 sec: 20070.2, 60 sec: 20070.4, 300 sec: 19368.2). Total num frames: 1355776. Throughput: 0: 5018.5. Samples: 331452. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:17:22,582][14877] Avg episode reward: [(0, '11.054')]
[2025-07-29 11:17:22,587][15362] Saving new best policy, reward=11.054!
[2025-07-29 11:17:24,265][15375] Updated weights for policy 0, policy_version 340 (0.0012)
[2025-07-29 11:17:26,310][15375] Updated weights for policy 0, policy_version 350 (0.0011)
[2025-07-29 11:17:27,581][14877] Fps is (10 sec: 20070.4, 60 sec: 20070.4, 300 sec: 19442.4). Total num frames: 1458176. Throughput: 0: 5007.7. Samples: 361564. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:17:27,582][14877] Avg episode reward: [(0, '11.577')]
[2025-07-29 11:17:27,584][15362] Saving new best policy, reward=11.577!
[2025-07-29 11:17:28,313][15375] Updated weights for policy 0, policy_version 360 (0.0011)
[2025-07-29 11:17:30,340][15375] Updated weights for policy 0, policy_version 370 (0.0012)
[2025-07-29 11:17:32,340][15375] Updated weights for policy 0, policy_version 380 (0.0011)
[2025-07-29 11:17:32,581][14877] Fps is (10 sec: 20480.2, 60 sec: 20070.4, 300 sec: 19507.2). Total num frames: 1560576. Throughput: 0: 5007.5. Samples: 376932. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:17:32,583][14877] Avg episode reward: [(0, '13.717')]
[2025-07-29 11:17:32,589][15362] Saving new best policy, reward=13.717!
[2025-07-29 11:17:34,324][15375] Updated weights for policy 0, policy_version 390 (0.0012)
[2025-07-29 11:17:36,323][15375] Updated weights for policy 0, policy_version 400 (0.0012)
[2025-07-29 11:17:37,581][14877] Fps is (10 sec: 20480.0, 60 sec: 20138.7, 300 sec: 19564.4). Total num frames: 1662976. Throughput: 0: 5024.2. Samples: 407700. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:17:37,583][14877] Avg episode reward: [(0, '17.373')]
[2025-07-29 11:17:37,584][15362] Saving new best policy, reward=17.373!
[2025-07-29 11:17:38,398][15375] Updated weights for policy 0, policy_version 410 (0.0012)
[2025-07-29 11:17:40,396][15375] Updated weights for policy 0, policy_version 420 (0.0011)
[2025-07-29 11:17:42,384][15375] Updated weights for policy 0, policy_version 430 (0.0012)
[2025-07-29 11:17:42,581][14877] Fps is (10 sec: 20480.0, 60 sec: 20138.7, 300 sec: 19615.3). Total num frames: 1765376. Throughput: 0: 5015.0. Samples: 438006. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:17:42,582][14877] Avg episode reward: [(0, '15.654')]
[2025-07-29 11:17:44,384][15375] Updated weights for policy 0, policy_version 440 (0.0011)
[2025-07-29 11:17:46,366][15375] Updated weights for policy 0, policy_version 450 (0.0012)
[2025-07-29 11:17:47,581][14877] Fps is (10 sec: 20070.5, 60 sec: 20138.7, 300 sec: 19617.7). Total num frames: 1863680. Throughput: 0: 5028.2. Samples: 453462. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:17:47,582][14877] Avg episode reward: [(0, '14.627')]
[2025-07-29 11:17:48,399][15375] Updated weights for policy 0, policy_version 460 (0.0012)
[2025-07-29 11:17:50,552][15375] Updated weights for policy 0, policy_version 470 (0.0013)
[2025-07-29 11:17:52,581][14877] Fps is (10 sec: 19660.8, 60 sec: 20070.4, 300 sec: 19619.8). Total num frames: 1961984. Throughput: 0: 5033.5. Samples: 483226. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:17:52,582][14877] Avg episode reward: [(0, '17.532')]
[2025-07-29 11:17:52,603][15362] Saving new best policy, reward=17.532!
[2025-07-29 11:17:52,606][15375] Updated weights for policy 0, policy_version 480 (0.0012)
[2025-07-29 11:17:54,649][15375] Updated weights for policy 0, policy_version 490 (0.0012)
[2025-07-29 11:17:56,672][15375] Updated weights for policy 0, policy_version 500 (0.0011)
[2025-07-29 11:17:57,581][14877] Fps is (10 sec: 20070.3, 60 sec: 20070.4, 300 sec: 19660.8). Total num frames: 2064384. Throughput: 0: 5037.6. Samples: 513312. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:17:57,582][14877] Avg episode reward: [(0, '16.624')]
[2025-07-29 11:17:58,682][15375] Updated weights for policy 0, policy_version 510 (0.0012)
[2025-07-29 11:18:00,683][15375] Updated weights for policy 0, policy_version 520 (0.0012)
[2025-07-29 11:18:02,581][14877] Fps is (10 sec: 20479.9, 60 sec: 20138.7, 300 sec: 19698.0). Total num frames: 2166784. Throughput: 0: 5048.4. Samples: 528644. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:18:02,582][14877] Avg episode reward: [(0, '16.520')]
[2025-07-29 11:18:02,588][15362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000529_2166784.pth...
[2025-07-29 11:18:02,653][15362] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000529_2166784.pth
[2025-07-29 11:18:02,782][15375] Updated weights for policy 0, policy_version 530 (0.0012)
[2025-07-29 11:18:04,841][15375] Updated weights for policy 0, policy_version 540 (0.0012)
[2025-07-29 11:18:06,830][15375] Updated weights for policy 0, policy_version 550 (0.0011)
[2025-07-29 11:18:07,581][14877] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19696.4). Total num frames: 2265088. Throughput: 0: 5048.1. Samples: 558618. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:18:07,582][14877] Avg episode reward: [(0, '18.642')]
[2025-07-29 11:18:07,583][15362] Saving new best policy, reward=18.642!
[2025-07-29 11:18:08,855][15375] Updated weights for policy 0, policy_version 560 (0.0012)
[2025-07-29 11:18:10,853][15375] Updated weights for policy 0, policy_version 570 (0.0012)
[2025-07-29 11:18:12,581][14877] Fps is (10 sec: 20070.5, 60 sec: 20206.9, 300 sec: 19729.1). Total num frames: 2367488. Throughput: 0: 5061.8. Samples: 589346. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:18:12,582][14877] Avg episode reward: [(0, '20.052')]
[2025-07-29 11:18:12,586][15362] Saving new best policy, reward=20.052!
[2025-07-29 11:18:12,849][15375] Updated weights for policy 0, policy_version 580 (0.0012)
[2025-07-29 11:18:14,846][15375] Updated weights for policy 0, policy_version 590 (0.0012)
[2025-07-29 11:18:17,085][15375] Updated weights for policy 0, policy_version 600 (0.0012)
[2025-07-29 11:18:17,581][14877] Fps is (10 sec: 20070.3, 60 sec: 20138.7, 300 sec: 19726.3). Total num frames: 2465792. Throughput: 0: 5059.0. Samples: 604586. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-07-29 11:18:17,582][14877] Avg episode reward: [(0, '20.239')]
[2025-07-29 11:18:17,584][15362] Saving new best policy, reward=20.239!
[2025-07-29 11:18:19,094][15375] Updated weights for policy 0, policy_version 610 (0.0011)
[2025-07-29 11:18:21,091][15375] Updated weights for policy 0, policy_version 620 (0.0012)
[2025-07-29 11:18:22,581][14877] Fps is (10 sec: 20070.4, 60 sec: 20207.0, 300 sec: 19755.3). Total num frames: 2568192. Throughput: 0: 5032.0. Samples: 634140. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:18:22,582][14877] Avg episode reward: [(0, '19.108')]
[2025-07-29 11:18:23,088][15375] Updated weights for policy 0, policy_version 630 (0.0011)
[2025-07-29 11:18:25,114][15375] Updated weights for policy 0, policy_version 640 (0.0011)
[2025-07-29 11:18:27,122][15375] Updated weights for policy 0, policy_version 650 (0.0012)
[2025-07-29 11:18:27,581][14877] Fps is (10 sec: 20480.1, 60 sec: 20206.9, 300 sec: 19782.2). Total num frames: 2670592. Throughput: 0: 5037.0. Samples: 664670. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:18:27,582][14877] Avg episode reward: [(0, '19.611')]
[2025-07-29 11:18:29,241][15375] Updated weights for policy 0, policy_version 660 (0.0013)
[2025-07-29 11:18:31,258][15375] Updated weights for policy 0, policy_version 670 (0.0012)
[2025-07-29 11:18:32,581][14877] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19777.8). Total num frames: 2768896. Throughput: 0: 5017.4. Samples: 679246. Policy #0 lag: (min: 0.0, avg: 0.6, max: 2.0)
[2025-07-29 11:18:32,582][14877] Avg episode reward: [(0, '18.892')]
[2025-07-29 11:18:33,246][15375] Updated weights for policy 0, policy_version 680 (0.0012)
[2025-07-29 11:18:35,234][15375] Updated weights for policy 0, policy_version 690 (0.0011)
[2025-07-29 11:18:37,236][15375] Updated weights for policy 0, policy_version 700 (0.0012)
[2025-07-29 11:18:37,581][14877] Fps is (10 sec: 20070.4, 60 sec: 20138.7, 300 sec: 19802.0). Total num frames: 2871296. Throughput: 0: 5040.0. Samples: 710026. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:18:37,582][14877] Avg episode reward: [(0, '20.674')]
[2025-07-29 11:18:37,583][15362] Saving new best policy, reward=20.674!
[2025-07-29 11:18:39,242][15375] Updated weights for policy 0, policy_version 710 (0.0012)
[2025-07-29 11:18:41,296][15375] Updated weights for policy 0, policy_version 720 (0.0012)
[2025-07-29 11:18:42,581][14877] Fps is (10 sec: 20479.9, 60 sec: 20138.6, 300 sec: 19824.6). Total num frames: 2973696. Throughput: 0: 5044.2. Samples: 740302. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:18:42,582][14877] Avg episode reward: [(0, '21.549')]
[2025-07-29 11:18:42,588][15362] Saving new best policy, reward=21.549!
[2025-07-29 11:18:43,352][15375] Updated weights for policy 0, policy_version 730 (0.0012)
[2025-07-29 11:18:45,329][15375] Updated weights for policy 0, policy_version 740 (0.0012)
[2025-07-29 11:18:47,327][15375] Updated weights for policy 0, policy_version 750 (0.0012)
[2025-07-29 11:18:47,581][14877] Fps is (10 sec: 20480.1, 60 sec: 20206.9, 300 sec: 19845.8). Total num frames: 3076096. Throughput: 0: 5044.3. Samples: 755636. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:18:47,582][14877] Avg episode reward: [(0, '22.523')]
[2025-07-29 11:18:47,583][15362] Saving new best policy, reward=22.523!
[2025-07-29 11:18:49,323][15375] Updated weights for policy 0, policy_version 760 (0.0012)
[2025-07-29 11:18:51,323][15375] Updated weights for policy 0, policy_version 770 (0.0012)
[2025-07-29 11:18:52,581][14877] Fps is (10 sec: 20480.2, 60 sec: 20275.2, 300 sec: 19865.6). Total num frames: 3178496. Throughput: 0: 5062.2. Samples: 786416. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:18:52,582][14877] Avg episode reward: [(0, '19.452')]
[2025-07-29 11:18:53,362][15375] Updated weights for policy 0, policy_version 780 (0.0012)
[2025-07-29 11:18:55,433][15375] Updated weights for policy 0, policy_version 790 (0.0011)
[2025-07-29 11:18:57,435][15375] Updated weights for policy 0, policy_version 800 (0.0011)
[2025-07-29 11:18:57,581][14877] Fps is (10 sec: 20070.5, 60 sec: 20206.9, 300 sec: 19859.4). Total num frames: 3276800. Throughput: 0: 5050.4. Samples: 816612. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:18:57,582][14877] Avg episode reward: [(0, '21.263')]
[2025-07-29 11:18:59,448][15375] Updated weights for policy 0, policy_version 810 (0.0012)
[2025-07-29 11:19:01,447][15375] Updated weights for policy 0, policy_version 820 (0.0012)
[2025-07-29 11:19:02,581][14877] Fps is (10 sec: 20070.2, 60 sec: 20206.9, 300 sec: 19877.6). Total num frames: 3379200. Throughput: 0: 5051.4. Samples: 831900. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:19:02,582][14877] Avg episode reward: [(0, '19.952')]
[2025-07-29 11:19:03,463][15375] Updated weights for policy 0, policy_version 830 (0.0012)
[2025-07-29 11:19:05,460][15375] Updated weights for policy 0, policy_version 840 (0.0012)
[2025-07-29 11:19:07,514][15375] Updated weights for policy 0, policy_version 850 (0.0011)
[2025-07-29 11:19:07,581][14877] Fps is (10 sec: 20480.0, 60 sec: 20275.2, 300 sec: 19894.9). Total num frames: 3481600. Throughput: 0: 5076.5. Samples: 862584. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:19:07,582][14877] Avg episode reward: [(0, '23.121')]
[2025-07-29 11:19:07,583][15362] Saving new best policy, reward=23.121!
[2025-07-29 11:19:09,550][15375] Updated weights for policy 0, policy_version 860 (0.0012)
[2025-07-29 11:19:11,531][15375] Updated weights for policy 0, policy_version 870 (0.0012)
[2025-07-29 11:19:12,581][14877] Fps is (10 sec: 20480.0, 60 sec: 20275.2, 300 sec: 19911.1). Total num frames: 3584000. Throughput: 0: 5072.0. Samples: 892912. Policy #0 lag: (min: 0.0, avg: 0.7, max: 1.0)
[2025-07-29 11:19:12,582][14877] Avg episode reward: [(0, '20.525')]
[2025-07-29 11:19:13,524][15375] Updated weights for policy 0, policy_version 880 (0.0011)
[2025-07-29 11:19:15,507][15375] Updated weights for policy 0, policy_version 890 (0.0012)
[2025-07-29 11:19:17,486][15375] Updated weights for policy 0, policy_version 900 (0.0012)
[2025-07-29 11:19:17,581][14877] Fps is (10 sec: 20479.9, 60 sec: 20343.5, 300 sec: 19926.5). Total num frames: 3686400. Throughput: 0: 5092.1. Samples: 908392. Policy #0 lag: (min: 0.0, avg: 0.7, max: 2.0)
[2025-07-29 11:19:17,582][14877] Avg episode reward: [(0, '23.864')]
[2025-07-29 11:19:17,583][15362] Saving new best policy, reward=23.864!
[2025-07-29 11:19:19,558][15375] Updated weights for policy 0, policy_version 910 (0.0012)
[2025-07-29 11:19:21,620][15375] Updated weights for policy 0, policy_version 920 (0.0011)
[2025-07-29 11:19:22,581][14877] Fps is (10 sec: 20070.5, 60 sec: 20275.2, 300 sec: 19919.5). Total num frames: 3784704. Throughput: 0: 5080.4. Samples: 938642. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:19:22,582][14877] Avg episode reward: [(0, '22.878')]
[2025-07-29 11:19:23,628][15375] Updated weights for policy 0, policy_version 930 (0.0012)
[2025-07-29 11:19:25,630][15375] Updated weights for policy 0, policy_version 940 (0.0012)
[2025-07-29 11:19:27,581][14877] Fps is (10 sec: 20070.3, 60 sec: 20275.2, 300 sec: 19933.9). Total num frames: 3887104. Throughput: 0: 5087.1. Samples: 969222. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:19:27,582][14877] Avg episode reward: [(0, '22.381')]
[2025-07-29 11:19:27,655][15375] Updated weights for policy 0, policy_version 950 (0.0011)
[2025-07-29 11:19:29,638][15375] Updated weights for policy 0, policy_version 960 (0.0012)
[2025-07-29 11:19:31,652][15375] Updated weights for policy 0, policy_version 970 (0.0011)
[2025-07-29 11:19:32,581][14877] Fps is (10 sec: 20479.9, 60 sec: 20343.4, 300 sec: 19947.5). Total num frames: 3989504. Throughput: 0: 5089.3. Samples: 984654. Policy #0 lag: (min: 0.0, avg: 0.8, max: 2.0)
[2025-07-29 11:19:32,582][14877] Avg episode reward: [(0, '21.898')]
[2025-07-29 11:19:33,283][15362] Stopping Batcher_0...
[2025-07-29 11:19:33,283][14877] Component Batcher_0 stopped!
[2025-07-29 11:19:33,283][15362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:19:33,284][15362] Loop batcher_evt_loop terminating...
[2025-07-29 11:19:33,309][15375] Weights refcount: 2 0
[2025-07-29 11:19:33,311][15375] Stopping InferenceWorker_p0-w0...
[2025-07-29 11:19:33,311][15375] Loop inference_proc0-0_evt_loop terminating...
[2025-07-29 11:19:33,311][14877] Component InferenceWorker_p0-w0 stopped!
[2025-07-29 11:19:33,322][15382] Stopping RolloutWorker_w5...
[2025-07-29 11:19:33,323][14877] Component RolloutWorker_w5 stopped!
[2025-07-29 11:19:33,323][15382] Loop rollout_proc5_evt_loop terminating...
[2025-07-29 11:19:33,325][15379] Stopping RolloutWorker_w4...
[2025-07-29 11:19:33,325][15379] Loop rollout_proc4_evt_loop terminating...
[2025-07-29 11:19:33,326][15383] Stopping RolloutWorker_w7...
[2025-07-29 11:19:33,325][14877] Component RolloutWorker_w4 stopped!
[2025-07-29 11:19:33,326][15378] Stopping RolloutWorker_w2...
[2025-07-29 11:19:33,326][15383] Loop rollout_proc7_evt_loop terminating...
[2025-07-29 11:19:33,327][15378] Loop rollout_proc2_evt_loop terminating...
[2025-07-29 11:19:33,327][14877] Component RolloutWorker_w7 stopped!
[2025-07-29 11:19:33,327][15381] Stopping RolloutWorker_w6...
[2025-07-29 11:19:33,328][14877] Component RolloutWorker_w2 stopped!
[2025-07-29 11:19:33,328][15377] Stopping RolloutWorker_w3...
[2025-07-29 11:19:33,329][15377] Loop rollout_proc3_evt_loop terminating...
[2025-07-29 11:19:33,328][14877] Component RolloutWorker_w6 stopped!
[2025-07-29 11:19:33,329][15381] Loop rollout_proc6_evt_loop terminating...
[2025-07-29 11:19:33,330][15376] Stopping RolloutWorker_w0...
[2025-07-29 11:19:33,329][14877] Component RolloutWorker_w3 stopped!
[2025-07-29 11:19:33,330][15376] Loop rollout_proc0_evt_loop terminating...
[2025-07-29 11:19:33,330][14877] Component RolloutWorker_w0 stopped!
[2025-07-29 11:19:33,331][15380] Stopping RolloutWorker_w1...
[2025-07-29 11:19:33,332][15380] Loop rollout_proc1_evt_loop terminating...
[2025-07-29 11:19:33,331][14877] Component RolloutWorker_w1 stopped!
[2025-07-29 11:19:33,387][15362] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:19:33,497][15362] Stopping LearnerWorker_p0...
[2025-07-29 11:19:33,498][15362] Loop learner_proc0_evt_loop terminating...
[2025-07-29 11:19:33,497][14877] Component LearnerWorker_p0 stopped!
[2025-07-29 11:19:33,499][14877] Waiting for process learner_proc0 to stop...
[2025-07-29 11:19:34,390][14877] Waiting for process inference_proc0-0 to join...
[2025-07-29 11:19:34,391][14877] Waiting for process rollout_proc0 to join...
[2025-07-29 11:19:34,392][14877] Waiting for process rollout_proc1 to join...
[2025-07-29 11:19:34,393][14877] Waiting for process rollout_proc2 to join...
[2025-07-29 11:19:34,393][14877] Waiting for process rollout_proc3 to join...
[2025-07-29 11:19:34,394][14877] Waiting for process rollout_proc4 to join...
[2025-07-29 11:19:34,395][14877] Waiting for process rollout_proc5 to join...
[2025-07-29 11:19:34,395][14877] Waiting for process rollout_proc6 to join...
[2025-07-29 11:19:34,396][14877] Waiting for process rollout_proc7 to join...
[2025-07-29 11:19:34,397][14877] Batcher 0 profile tree view:
batching: 15.9394, releasing_batches: 0.0226
[2025-07-29 11:19:34,397][14877] InferenceWorker_p0-w0 profile tree view:
wait_policy: 0.0000
  wait_policy_total: 3.6368
update_model: 3.2563
  weight_update: 0.0012
one_step: 0.0028
  handle_policy_step: 185.7811
    deserialize: 7.7116, stack: 1.3444, obs_to_device_normalize: 45.7883, forward: 88.8487, send_messages: 12.8386
    prepare_outputs: 22.0814
      to_cpu: 14.1540
[2025-07-29 11:19:34,398][14877] Learner 0 profile tree view:
misc: 0.0036, prepare_batch: 6.5224
train: 18.3334
  epoch_init: 0.0043, minibatch_init: 0.0052, losses_postprocess: 0.3326, kl_divergence: 0.3777, after_optimizer: 1.8839
  calculate_losses: 8.2978
    losses_init: 0.0031, forward_head: 0.6269, bptt_initial: 4.3473, tail: 0.6268, advantages_returns: 0.1559, losses: 1.1816
    bptt: 1.2128
      bptt_forward_core: 1.1619
  update: 7.1203
    clip: 0.7673
[2025-07-29 11:19:34,399][14877] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.1222, enqueue_policy_requests: 8.9219, env_step: 127.3378, overhead: 5.4915, complete_rollouts: 0.2142
save_policy_outputs: 7.9989
  split_output_tensors: 3.0649
[2025-07-29 11:19:34,400][14877] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.1216, enqueue_policy_requests: 8.8727, env_step: 127.3302, overhead: 5.5457, complete_rollouts: 0.2093
save_policy_outputs: 8.0453
  split_output_tensors: 3.0660
[2025-07-29 11:19:34,400][14877] Loop Runner_EvtLoop terminating...
[2025-07-29 11:19:34,401][14877] Runner profile tree view:
main_loop: 207.4044
[2025-07-29 11:19:34,402][14877] Collected {0: 4005888}, FPS: 19314.4
[2025-07-29 11:25:37,292][17818] Saving configuration to /content/train_dir/default_experiment/config.json...
[2025-07-29 11:25:37,294][17818] Rollout worker 0 uses device cpu
[2025-07-29 11:25:37,294][17818] Rollout worker 1 uses device cpu
[2025-07-29 11:25:37,295][17818] Rollout worker 2 uses device cpu
[2025-07-29 11:25:37,296][17818] Rollout worker 3 uses device cpu
[2025-07-29 11:25:37,296][17818] Rollout worker 4 uses device cpu
[2025-07-29 11:25:37,297][17818] Rollout worker 5 uses device cpu
[2025-07-29 11:25:37,298][17818] Rollout worker 6 uses device cpu
[2025-07-29 11:25:37,299][17818] Rollout worker 7 uses device cpu
[2025-07-29 11:25:37,341][17818] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:25:37,342][17818] InferenceWorker_p0-w0: min num requests: 2
[2025-07-29 11:25:37,371][17818] Starting all processes...
[2025-07-29 11:25:37,372][17818] Starting process learner_proc0
[2025-07-29 11:25:37,424][17818] Starting all processes...
[2025-07-29 11:25:37,428][17818] Starting process inference_proc0-0
[2025-07-29 11:25:37,428][17818] Starting process rollout_proc0
[2025-07-29 11:25:37,429][17818] Starting process rollout_proc1
[2025-07-29 11:25:37,429][17818] Starting process rollout_proc2
[2025-07-29 11:25:37,430][17818] Starting process rollout_proc3
[2025-07-29 11:25:37,430][17818] Starting process rollout_proc4
[2025-07-29 11:25:37,430][17818] Starting process rollout_proc5
[2025-07-29 11:25:37,433][17818] Starting process rollout_proc6
[2025-07-29 11:25:37,434][17818] Starting process rollout_proc7
[2025-07-29 11:25:39,421][18484] Worker 3 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,498][18477] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:25:39,499][18477] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for inference process 0
[2025-07-29 11:25:39,517][18477] Num visible devices: 1
[2025-07-29 11:25:39,535][18478] Worker 0 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,578][18480] Worker 2 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,650][18464] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:25:39,651][18464] Set environment var CUDA_VISIBLE_DEVICES to '0' (GPU indices [0]) for learning process 0
[2025-07-29 11:25:39,667][18483] Worker 7 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,668][18464] Num visible devices: 1
[2025-07-29 11:25:39,671][18485] Worker 4 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,708][18481] Worker 6 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,710][18479] Worker 1 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,713][18464] Starting seed is not provided
[2025-07-29 11:25:39,713][18464] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:25:39,713][18464] Initializing actor-critic model on device cuda:0
[2025-07-29 11:25:39,713][18464] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:25:39,714][18464] RunningMeanStd input shape: (1,)
[2025-07-29 11:25:39,728][18464] ConvEncoder: input_channels=3
[2025-07-29 11:25:39,758][18482] Worker 5 uses CPU cores [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
[2025-07-29 11:25:39,858][18464] Conv encoder output size: 512
[2025-07-29 11:25:39,859][18464] Policy head output size: 512
[2025-07-29 11:25:39,874][18464] Created Actor Critic model with architecture:
[2025-07-29 11:25:39,874][18464] ActorCriticSharedWeights(
  (obs_normalizer): ObservationNormalizer(
    (running_mean_std): RunningMeanStdDictInPlace(
      (running_mean_std): ModuleDict(
        (obs): RunningMeanStdInPlace()
      )
    )
  )
  (returns_normalizer): RecursiveScriptModule(original_name=RunningMeanStdInPlace)
  (encoder): VizdoomEncoder(
    (basic_encoder): ConvEncoder(
      (enc): RecursiveScriptModule(
        original_name=ConvEncoderImpl
        (conv_head): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Conv2d)
          (1): RecursiveScriptModule(original_name=ELU)
          (2): RecursiveScriptModule(original_name=Conv2d)
          (3): RecursiveScriptModule(original_name=ELU)
          (4): RecursiveScriptModule(original_name=Conv2d)
          (5): RecursiveScriptModule(original_name=ELU)
        )
        (mlp_layers): RecursiveScriptModule(
          original_name=Sequential
          (0): RecursiveScriptModule(original_name=Linear)
          (1): RecursiveScriptModule(original_name=ELU)
        )
      )
    )
  )
  (core): ModelCoreRNN(
    (core): GRU(512, 512)
  )
  (decoder): MlpDecoder(
    (mlp): Identity()
  )
  (critic_linear): Linear(in_features=512, out_features=1, bias=True)
  (action_parameterization): ActionParameterizationDefault(
    (distribution_linear): Linear(in_features=512, out_features=5, bias=True)
  )
)
[2025-07-29 11:25:41,670][18464] Using optimizer <class 'torch.optim.adam.Adam'>
[2025-07-29 11:25:41,670][18464] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000978_4005888.pth...
[2025-07-29 11:25:41,702][18464] Loading model from checkpoint
[2025-07-29 11:25:41,706][18464] Loaded experiment state at self.train_step=978, self.env_steps=4005888
[2025-07-29 11:25:41,707][18464] Initialized policy 0 weights for model version 978
[2025-07-29 11:25:41,709][18464] LearnerWorker_p0 finished initialization!
[2025-07-29 11:25:41,709][18464] Using GPUs [0] for process 0 (actually maps to GPUs [0])
[2025-07-29 11:25:41,806][18477] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:25:41,807][18477] RunningMeanStd input shape: (1,)
[2025-07-29 11:25:41,818][18477] ConvEncoder: input_channels=3
[2025-07-29 11:25:41,924][18477] Conv encoder output size: 512
[2025-07-29 11:25:41,925][18477] Policy head output size: 512
[2025-07-29 11:25:43,599][17818] Inference worker 0-0 is ready!
[2025-07-29 11:25:43,600][17818] All inference workers are ready! Signal rollout workers to start!
[2025-07-29 11:25:43,613][18480] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,613][18479] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,618][18484] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,618][18478] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,618][18483] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,618][18481] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,619][18482] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,619][18485] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:43,877][18479] Decorrelating experience for 0 frames...
[2025-07-29 11:25:43,896][18485] Decorrelating experience for 0 frames...
[2025-07-29 11:25:43,896][18484] Decorrelating experience for 0 frames...
[2025-07-29 11:25:43,897][18483] Decorrelating experience for 0 frames...
[2025-07-29 11:25:43,897][18478] Decorrelating experience for 0 frames...
[2025-07-29 11:25:44,120][18480] Decorrelating experience for 0 frames...
[2025-07-29 11:25:44,142][18484] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,142][18478] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,142][18483] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,164][18479] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,391][18481] Decorrelating experience for 0 frames...
[2025-07-29 11:25:44,435][18483] Decorrelating experience for 64 frames...
[2025-07-29 11:25:44,435][18482] Decorrelating experience for 0 frames...
[2025-07-29 11:25:44,439][18484] Decorrelating experience for 64 frames...
[2025-07-29 11:25:44,451][18480] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,473][18478] Decorrelating experience for 64 frames...
[2025-07-29 11:25:44,548][17818] Fps is (10 sec: nan, 60 sec: nan, 300 sec: nan). Total num frames: 4005888. Throughput: 0: nan. Samples: 0. Policy #0 lag: (min: -1.0, avg: -1.0, max: -1.0)
[2025-07-29 11:25:44,646][18485] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,687][18482] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,738][18481] Decorrelating experience for 32 frames...
[2025-07-29 11:25:44,744][18484] Decorrelating experience for 96 frames...
[2025-07-29 11:25:44,746][18483] Decorrelating experience for 96 frames...
[2025-07-29 11:25:44,931][18478] Decorrelating experience for 96 frames...
[2025-07-29 11:25:44,948][18485] Decorrelating experience for 64 frames...
[2025-07-29 11:25:44,997][18480] Decorrelating experience for 64 frames...
[2025-07-29 11:25:45,003][18482] Decorrelating experience for 64 frames...
[2025-07-29 11:25:45,007][18479] Decorrelating experience for 64 frames...
[2025-07-29 11:25:45,289][18482] Decorrelating experience for 96 frames...
[2025-07-29 11:25:45,290][18480] Decorrelating experience for 96 frames...
[2025-07-29 11:25:45,297][18485] Decorrelating experience for 96 frames...
[2025-07-29 11:25:45,304][18479] Decorrelating experience for 96 frames...
[2025-07-29 11:25:45,605][18481] Decorrelating experience for 64 frames...
[2025-07-29 11:25:45,983][18464] Signal inference workers to stop experience collection...
[2025-07-29 11:25:45,993][18477] InferenceWorker_p0-w0: stopping experience collection
[2025-07-29 11:25:46,007][18481] Decorrelating experience for 96 frames...
[2025-07-29 11:25:47,020][18464] Signal inference workers to resume experience collection...
[2025-07-29 11:25:47,021][18477] InferenceWorker_p0-w0: resuming experience collection
[2025-07-29 11:25:47,022][18464] Stopping Batcher_0...
[2025-07-29 11:25:47,022][18464] Loop batcher_evt_loop terminating...
[2025-07-29 11:25:47,027][17818] Component Batcher_0 stopped!
[2025-07-29 11:25:47,034][18477] Weights refcount: 2 0
[2025-07-29 11:25:47,035][18477] Stopping InferenceWorker_p0-w0...
[2025-07-29 11:25:47,036][18477] Loop inference_proc0-0_evt_loop terminating...
[2025-07-29 11:25:47,036][17818] Component InferenceWorker_p0-w0 stopped!
[2025-07-29 11:25:47,055][18480] Stopping RolloutWorker_w2...
[2025-07-29 11:25:47,055][18480] Loop rollout_proc2_evt_loop terminating...
[2025-07-29 11:25:47,055][17818] Component RolloutWorker_w2 stopped!
[2025-07-29 11:25:47,056][18484] Stopping RolloutWorker_w3...
[2025-07-29 11:25:47,057][17818] Component RolloutWorker_w3 stopped!
[2025-07-29 11:25:47,057][18482] Stopping RolloutWorker_w5...
[2025-07-29 11:25:47,057][18485] Stopping RolloutWorker_w4...
[2025-07-29 11:25:47,057][18484] Loop rollout_proc3_evt_loop terminating...
[2025-07-29 11:25:47,058][18482] Loop rollout_proc5_evt_loop terminating...
[2025-07-29 11:25:47,057][18479] Stopping RolloutWorker_w1...
[2025-07-29 11:25:47,058][18485] Loop rollout_proc4_evt_loop terminating...
[2025-07-29 11:25:47,057][18481] Stopping RolloutWorker_w6...
[2025-07-29 11:25:47,058][17818] Component RolloutWorker_w6 stopped!
[2025-07-29 11:25:47,058][18479] Loop rollout_proc1_evt_loop terminating...
[2025-07-29 11:25:47,058][17818] Component RolloutWorker_w5 stopped!
[2025-07-29 11:25:47,059][18481] Loop rollout_proc6_evt_loop terminating...
[2025-07-29 11:25:47,059][17818] Component RolloutWorker_w4 stopped!
[2025-07-29 11:25:47,060][18478] Stopping RolloutWorker_w0...
[2025-07-29 11:25:47,060][17818] Component RolloutWorker_w1 stopped!
[2025-07-29 11:25:47,061][18478] Loop rollout_proc0_evt_loop terminating...
[2025-07-29 11:25:47,061][17818] Component RolloutWorker_w0 stopped!
[2025-07-29 11:25:47,061][18483] Stopping RolloutWorker_w7...
[2025-07-29 11:25:47,062][17818] Component RolloutWorker_w7 stopped!
[2025-07-29 11:25:47,062][18483] Loop rollout_proc7_evt_loop terminating...
[2025-07-29 11:25:48,071][18464] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-07-29 11:25:48,116][18464] Removing /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000544_2228224.pth
[2025-07-29 11:25:48,121][18464] Saving /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-07-29 11:25:48,184][18464] Stopping LearnerWorker_p0...
[2025-07-29 11:25:48,185][18464] Loop learner_proc0_evt_loop terminating...
[2025-07-29 11:25:48,185][17818] Component LearnerWorker_p0 stopped!
[2025-07-29 11:25:48,186][17818] Waiting for process learner_proc0 to stop...
[2025-07-29 11:25:48,739][17818] Waiting for process inference_proc0-0 to join...
[2025-07-29 11:25:48,740][17818] Waiting for process rollout_proc0 to join...
[2025-07-29 11:25:48,741][17818] Waiting for process rollout_proc1 to join...
[2025-07-29 11:25:48,742][17818] Waiting for process rollout_proc2 to join...
[2025-07-29 11:25:48,743][17818] Waiting for process rollout_proc3 to join...
[2025-07-29 11:25:48,744][17818] Waiting for process rollout_proc4 to join...
[2025-07-29 11:25:48,744][17818] Waiting for process rollout_proc5 to join...
[2025-07-29 11:25:48,745][17818] Waiting for process rollout_proc6 to join...
[2025-07-29 11:25:48,746][17818] Waiting for process rollout_proc7 to join...
[2025-07-29 11:25:48,747][17818] Batcher 0 profile tree view:
batching: 0.0806, releasing_batches: 0.0004
[2025-07-29 11:25:48,748][17818] InferenceWorker_p0-w0 profile tree view:
update_model: 0.0057
wait_policy: 0.0000
  wait_policy_total: 1.2174
one_step: 0.0029
  handle_policy_step: 1.1303
    deserialize: 0.0287, stack: 0.0038, obs_to_device_normalize: 0.1447, forward: 0.8114, send_messages: 0.0405
    prepare_outputs: 0.0768
      to_cpu: 0.0473
[2025-07-29 11:25:48,748][17818] Learner 0 profile tree view:
misc: 0.0000, prepare_batch: 2.0204
train: 0.3211
  epoch_init: 0.0000, minibatch_init: 0.0000, losses_postprocess: 0.0003, kl_divergence: 0.0004, after_optimizer: 0.0037
  calculate_losses: 0.0610
    losses_init: 0.0000, forward_head: 0.0493, bptt_initial: 0.0044, tail: 0.0013, advantages_returns: 0.0009, losses: 0.0025
    bptt: 0.0022
      bptt_forward_core: 0.0022
  update: 0.2550
    clip: 0.0023
[2025-07-29 11:25:48,749][17818] RolloutWorker_w0 profile tree view:
wait_for_trajectories: 0.0007, enqueue_policy_requests: 0.0368, env_step: 0.4303, overhead: 0.0194, complete_rollouts: 0.0007
save_policy_outputs: 0.0320
  split_output_tensors: 0.0107
[2025-07-29 11:25:48,750][17818] RolloutWorker_w7 profile tree view:
wait_for_trajectories: 0.0007, enqueue_policy_requests: 0.0368, env_step: 0.4288, overhead: 0.0198, complete_rollouts: 0.0007
save_policy_outputs: 0.0326
  split_output_tensors: 0.0108
[2025-07-29 11:25:48,751][17818] Loop Runner_EvtLoop terminating...
[2025-07-29 11:25:48,752][17818] Runner profile tree view:
main_loop: 11.3815
[2025-07-29 11:25:48,753][17818] Collected {0: 4014080}, FPS: 719.8
[2025-07-29 11:25:59,169][17818] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 11:25:59,170][17818] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 11:25:59,170][17818] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 11:25:59,171][17818] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 11:25:59,172][17818] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:25:59,172][17818] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 11:25:59,173][17818] Adding new argument 'max_num_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:25:59,173][17818] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 11:25:59,175][17818] Adding new argument 'push_to_hub'=False that is not in the saved config file!
[2025-07-29 11:25:59,175][17818] Adding new argument 'hf_repository'=None that is not in the saved config file!
[2025-07-29 11:25:59,176][17818] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 11:25:59,176][17818] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 11:25:59,177][17818] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 11:25:59,177][17818] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 11:25:59,178][17818] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 11:25:59,189][17818] Doom resolution: 160x120, resize resolution: (128, 72)
[2025-07-29 11:25:59,190][17818] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:25:59,191][17818] RunningMeanStd input shape: (1,)
[2025-07-29 11:25:59,203][17818] ConvEncoder: input_channels=3
[2025-07-29 11:25:59,335][17818] Conv encoder output size: 512
[2025-07-29 11:25:59,336][17818] Policy head output size: 512
[2025-07-29 11:26:01,153][17818] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-07-29 11:26:02,083][17818] Num frames 100...
[2025-07-29 11:26:02,203][17818] Num frames 200...
[2025-07-29 11:26:02,325][17818] Num frames 300...
[2025-07-29 11:26:02,448][17818] Num frames 400...
[2025-07-29 11:26:02,560][17818] Avg episode rewards: #0: 5.480, true rewards: #0: 4.480
[2025-07-29 11:26:02,561][17818] Avg episode reward: 5.480, avg true_objective: 4.480
[2025-07-29 11:26:02,629][17818] Num frames 500...
[2025-07-29 11:26:02,759][17818] Num frames 600...
[2025-07-29 11:26:02,891][17818] Num frames 700...
[2025-07-29 11:26:03,017][17818] Num frames 800...
[2025-07-29 11:26:03,140][17818] Num frames 900...
[2025-07-29 11:26:03,264][17818] Num frames 1000...
[2025-07-29 11:26:03,386][17818] Num frames 1100...
[2025-07-29 11:26:03,508][17818] Num frames 1200...
[2025-07-29 11:26:03,630][17818] Num frames 1300...
[2025-07-29 11:26:03,753][17818] Num frames 1400...
[2025-07-29 11:26:03,874][17818] Num frames 1500...
[2025-07-29 11:26:03,999][17818] Num frames 1600...
[2025-07-29 11:26:04,119][17818] Num frames 1700...
[2025-07-29 11:26:04,240][17818] Num frames 1800...
[2025-07-29 11:26:04,364][17818] Num frames 1900...
[2025-07-29 11:26:04,489][17818] Num frames 2000...
[2025-07-29 11:26:04,610][17818] Num frames 2100...
[2025-07-29 11:26:04,733][17818] Num frames 2200...
[2025-07-29 11:26:04,857][17818] Num frames 2300...
[2025-07-29 11:26:04,982][17818] Num frames 2400...
[2025-07-29 11:26:05,107][17818] Num frames 2500...
[2025-07-29 11:26:05,218][17818] Avg episode rewards: #0: 29.739, true rewards: #0: 12.740
[2025-07-29 11:26:05,219][17818] Avg episode reward: 29.739, avg true_objective: 12.740
[2025-07-29 11:26:05,284][17818] Num frames 2600...
[2025-07-29 11:26:05,404][17818] Num frames 2700...
[2025-07-29 11:26:05,525][17818] Num frames 2800...
[2025-07-29 11:26:05,646][17818] Num frames 2900...
[2025-07-29 11:26:05,767][17818] Num frames 3000...
[2025-07-29 11:26:05,888][17818] Num frames 3100...
[2025-07-29 11:26:06,010][17818] Avg episode rewards: #0: 23.520, true rewards: #0: 10.520
[2025-07-29 11:26:06,010][17818] Avg episode reward: 23.520, avg true_objective: 10.520
[2025-07-29 11:26:06,064][17818] Num frames 3200...
[2025-07-29 11:26:06,185][17818] Num frames 3300...
[2025-07-29 11:26:06,303][17818] Num frames 3400...
[2025-07-29 11:26:06,422][17818] Num frames 3500...
[2025-07-29 11:26:06,544][17818] Num frames 3600...
[2025-07-29 11:26:06,693][17818] Avg episode rewards: #0: 19.942, true rewards: #0: 9.193
[2025-07-29 11:26:06,694][17818] Avg episode reward: 19.942, avg true_objective: 9.193
[2025-07-29 11:26:06,722][17818] Num frames 3700...
[2025-07-29 11:26:06,841][17818] Num frames 3800...
[2025-07-29 11:26:06,962][17818] Num frames 3900...
[2025-07-29 11:26:07,083][17818] Num frames 4000...
[2025-07-29 11:26:07,201][17818] Num frames 4100...
[2025-07-29 11:26:07,320][17818] Num frames 4200...
[2025-07-29 11:26:07,442][17818] Num frames 4300...
[2025-07-29 11:26:07,595][17818] Avg episode rewards: #0: 18.762, true rewards: #0: 8.762
[2025-07-29 11:26:07,596][17818] Avg episode reward: 18.762, avg true_objective: 8.762
[2025-07-29 11:26:07,619][17818] Num frames 4400...
[2025-07-29 11:26:07,739][17818] Num frames 4500...
[2025-07-29 11:26:07,863][17818] Num frames 4600...
[2025-07-29 11:26:07,985][17818] Num frames 4700...
[2025-07-29 11:26:08,106][17818] Num frames 4800...
[2025-07-29 11:26:08,227][17818] Num frames 4900...
[2025-07-29 11:26:08,351][17818] Num frames 5000...
[2025-07-29 11:26:08,473][17818] Num frames 5100...
[2025-07-29 11:26:08,594][17818] Num frames 5200...
[2025-07-29 11:26:08,716][17818] Num frames 5300...
[2025-07-29 11:26:08,836][17818] Num frames 5400...
[2025-07-29 11:26:08,935][17818] Avg episode rewards: #0: 19.895, true rewards: #0: 9.062
[2025-07-29 11:26:08,936][17818] Avg episode reward: 19.895, avg true_objective: 9.062
[2025-07-29 11:26:09,011][17818] Num frames 5500...
[2025-07-29 11:26:09,134][17818] Num frames 5600...
[2025-07-29 11:26:09,254][17818] Num frames 5700...
[2025-07-29 11:26:09,374][17818] Num frames 5800...
[2025-07-29 11:26:09,496][17818] Num frames 5900...
[2025-07-29 11:26:09,617][17818] Num frames 6000...
[2025-07-29 11:26:09,737][17818] Num frames 6100...
[2025-07-29 11:26:09,859][17818] Num frames 6200...
[2025-07-29 11:26:09,980][17818] Num frames 6300...
[2025-07-29 11:26:10,101][17818] Num frames 6400...
[2025-07-29 11:26:10,230][17818] Avg episode rewards: #0: 20.516, true rewards: #0: 9.230
[2025-07-29 11:26:10,231][17818] Avg episode reward: 20.516, avg true_objective: 9.230
[2025-07-29 11:26:10,280][17818] Num frames 6500...
[2025-07-29 11:26:10,399][17818] Num frames 6600...
[2025-07-29 11:26:10,521][17818] Num frames 6700...
[2025-07-29 11:26:10,644][17818] Num frames 6800...
[2025-07-29 11:26:10,765][17818] Num frames 6900...
[2025-07-29 11:26:10,886][17818] Num frames 7000...
[2025-07-29 11:26:11,007][17818] Num frames 7100...
[2025-07-29 11:26:11,127][17818] Num frames 7200...
[2025-07-29 11:26:11,248][17818] Num frames 7300...
[2025-07-29 11:26:11,370][17818] Num frames 7400...
[2025-07-29 11:26:11,494][17818] Num frames 7500...
[2025-07-29 11:26:11,617][17818] Num frames 7600...
[2025-07-29 11:26:11,741][17818] Num frames 7700...
[2025-07-29 11:26:11,864][17818] Num frames 7800...
[2025-07-29 11:26:11,987][17818] Num frames 7900...
[2025-07-29 11:26:12,111][17818] Num frames 8000...
[2025-07-29 11:26:12,236][17818] Num frames 8100...
[2025-07-29 11:26:12,360][17818] Num frames 8200...
[2025-07-29 11:26:12,485][17818] Num frames 8300...
[2025-07-29 11:26:12,609][17818] Num frames 8400...
[2025-07-29 11:26:12,680][17818] Avg episode rewards: #0: 24.391, true rewards: #0: 10.516
[2025-07-29 11:26:12,681][17818] Avg episode reward: 24.391, avg true_objective: 10.516
[2025-07-29 11:26:12,784][17818] Num frames 8500...
[2025-07-29 11:26:12,907][17818] Num frames 8600...
[2025-07-29 11:26:13,027][17818] Num frames 8700...
[2025-07-29 11:26:13,148][17818] Num frames 8800...
[2025-07-29 11:26:13,273][17818] Num frames 8900...
[2025-07-29 11:26:13,395][17818] Num frames 9000...
[2025-07-29 11:26:13,524][17818] Num frames 9100...
[2025-07-29 11:26:13,638][17818] Avg episode rewards: #0: 23.165, true rewards: #0: 10.166
[2025-07-29 11:26:13,639][17818] Avg episode reward: 23.165, avg true_objective: 10.166
[2025-07-29 11:26:13,701][17818] Num frames 9200...
[2025-07-29 11:26:13,824][17818] Num frames 9300...
[2025-07-29 11:26:13,944][17818] Num frames 9400...
[2025-07-29 11:26:14,067][17818] Num frames 9500...
[2025-07-29 11:26:14,163][17818] Avg episode rewards: #0: 21.733, true rewards: #0: 9.533
[2025-07-29 11:26:14,164][17818] Avg episode reward: 21.733, avg true_objective: 9.533
[2025-07-29 11:26:37,015][17818] Replay video saved to /content/train_dir/default_experiment/replay.mp4!
[2025-07-29 11:30:35,093][17818] Loading existing experiment configuration from /content/train_dir/default_experiment/config.json
[2025-07-29 11:30:35,094][17818] Overriding arg 'num_workers' with value 1 passed from command line
[2025-07-29 11:30:35,095][17818] Adding new argument 'no_render'=True that is not in the saved config file!
[2025-07-29 11:30:35,095][17818] Adding new argument 'save_video'=True that is not in the saved config file!
[2025-07-29 11:30:35,096][17818] Adding new argument 'video_frames'=1000000000.0 that is not in the saved config file!
[2025-07-29 11:30:35,096][17818] Adding new argument 'video_name'=None that is not in the saved config file!
[2025-07-29 11:30:35,097][17818] Adding new argument 'max_num_frames'=100000 that is not in the saved config file!
[2025-07-29 11:30:35,097][17818] Adding new argument 'max_num_episodes'=10 that is not in the saved config file!
[2025-07-29 11:30:35,098][17818] Adding new argument 'push_to_hub'=True that is not in the saved config file!
[2025-07-29 11:30:35,099][17818] Adding new argument 'hf_repository'='Dumoura/rl_vizdoom_health_gathering_supreme' that is not in the saved config file!
[2025-07-29 11:30:35,100][17818] Adding new argument 'policy_index'=0 that is not in the saved config file!
[2025-07-29 11:30:35,100][17818] Adding new argument 'eval_deterministic'=False that is not in the saved config file!
[2025-07-29 11:30:35,101][17818] Adding new argument 'train_script'=None that is not in the saved config file!
[2025-07-29 11:30:35,101][17818] Adding new argument 'enjoy_script'=None that is not in the saved config file!
[2025-07-29 11:30:35,102][17818] Using frameskip 1 and render_action_repeat=4 for evaluation
[2025-07-29 11:30:35,108][17818] RunningMeanStd input shape: (3, 72, 128)
[2025-07-29 11:30:35,109][17818] RunningMeanStd input shape: (1,)
[2025-07-29 11:30:35,118][17818] ConvEncoder: input_channels=3
[2025-07-29 11:30:35,151][17818] Conv encoder output size: 512
[2025-07-29 11:30:35,152][17818] Policy head output size: 512
[2025-07-29 11:30:35,170][17818] Loading state from checkpoint /content/train_dir/default_experiment/checkpoint_p0/checkpoint_000000980_4014080.pth...
[2025-07-29 11:30:35,654][17818] Num frames 100...
[2025-07-29 11:30:35,776][17818] Num frames 200...
[2025-07-29 11:30:35,898][17818] Num frames 300...
[2025-07-29 11:30:36,019][17818] Num frames 400...
[2025-07-29 11:30:36,142][17818] Num frames 500...
[2025-07-29 11:30:36,262][17818] Num frames 600...
[2025-07-29 11:30:36,380][17818] Num frames 700...
[2025-07-29 11:30:36,501][17818] Num frames 800...
[2025-07-29 11:30:36,626][17818] Num frames 900...
[2025-07-29 11:30:36,723][17818] Avg episode rewards: #0: 20.350, true rewards: #0: 9.350
[2025-07-29 11:30:36,724][17818] Avg episode reward: 20.350, avg true_objective: 9.350
[2025-07-29 11:30:36,803][17818] Num frames 1000...
[2025-07-29 11:30:36,922][17818] Num frames 1100...
[2025-07-29 11:30:37,049][17818] Num frames 1200...
[2025-07-29 11:30:37,174][17818] Num frames 1300...
[2025-07-29 11:30:37,301][17818] Num frames 1400...
[2025-07-29 11:30:37,421][17818] Num frames 1500...
[2025-07-29 11:30:37,489][17818] Avg episode rewards: #0: 16.555, true rewards: #0: 7.555
[2025-07-29 11:30:37,490][17818] Avg episode reward: 16.555, avg true_objective: 7.555
[2025-07-29 11:30:37,600][17818] Num frames 1600...
[2025-07-29 11:30:37,719][17818] Num frames 1700...
[2025-07-29 11:30:37,842][17818] Num frames 1800...
[2025-07-29 11:30:37,967][17818] Num frames 1900...
[2025-07-29 11:30:38,094][17818] Num frames 2000...
[2025-07-29 11:30:38,225][17818] Num frames 2100...
[2025-07-29 11:30:38,357][17818] Num frames 2200...
[2025-07-29 11:30:38,488][17818] Num frames 2300...
[2025-07-29 11:30:38,617][17818] Num frames 2400...
[2025-07-29 11:30:38,748][17818] Num frames 2500...
[2025-07-29 11:30:38,872][17818] Num frames 2600...
[2025-07-29 11:30:38,991][17818] Num frames 2700...
[2025-07-29 11:30:39,113][17818] Num frames 2800...
[2025-07-29 11:30:39,235][17818] Num frames 2900...
[2025-07-29 11:30:39,359][17818] Num frames 3000...
[2025-07-29 11:30:39,480][17818] Num frames 3100...
[2025-07-29 11:30:39,606][17818] Num frames 3200...
[2025-07-29 11:30:39,729][17818] Num frames 3300...
[2025-07-29 11:30:39,850][17818] Num frames 3400...
[2025-07-29 11:30:39,972][17818] Num frames 3500...
[2025-07-29 11:30:40,095][17818] Num frames 3600...
[2025-07-29 11:30:40,163][17818] Avg episode rewards: #0: 30.036, true rewards: #0: 12.037
[2025-07-29 11:30:40,164][17818] Avg episode reward: 30.036, avg true_objective: 12.037
[2025-07-29 11:30:40,272][17818] Num frames 3700...
[2025-07-29 11:30:40,392][17818] Num frames 3800...
[2025-07-29 11:30:40,513][17818] Num frames 3900...
[2025-07-29 11:30:40,634][17818] Num frames 4000...
[2025-07-29 11:30:40,754][17818] Num frames 4100...
[2025-07-29 11:30:40,874][17818] Num frames 4200...
[2025-07-29 11:30:40,997][17818] Num frames 4300...
[2025-07-29 11:30:41,118][17818] Num frames 4400...
[2025-07-29 11:30:41,241][17818] Num frames 4500...
[2025-07-29 11:30:41,363][17818] Num frames 4600...
[2025-07-29 11:30:41,484][17818] Num frames 4700...
[2025-07-29 11:30:41,607][17818] Num frames 4800...
[2025-07-29 11:30:41,727][17818] Num frames 4900...
[2025-07-29 11:30:41,849][17818] Num frames 5000...
[2025-07-29 11:30:41,969][17818] Num frames 5100...
[2025-07-29 11:30:42,090][17818] Num frames 5200...
[2025-07-29 11:30:42,213][17818] Num frames 5300...
[2025-07-29 11:30:42,336][17818] Num frames 5400...
[2025-07-29 11:30:42,462][17818] Num frames 5500...
[2025-07-29 11:30:42,585][17818] Num frames 5600...
[2025-07-29 11:30:42,709][17818] Num frames 5700...
[2025-07-29 11:30:42,777][17818] Avg episode rewards: #0: 35.777, true rewards: #0: 14.278
[2025-07-29 11:30:42,778][17818] Avg episode reward: 35.777, avg true_objective: 14.278
[2025-07-29 11:30:42,887][17818] Num frames 5800...
[2025-07-29 11:30:43,008][17818] Num frames 5900...
[2025-07-29 11:30:43,130][17818] Num frames 6000...
[2025-07-29 11:30:43,254][17818] Num frames 6100...
[2025-07-29 11:30:43,375][17818] Num frames 6200...
[2025-07-29 11:30:43,496][17818] Num frames 6300...
[2025-07-29 11:30:43,618][17818] Num frames 6400...
[2025-07-29 11:30:43,737][17818] Num frames 6500...
[2025-07-29 11:30:43,862][17818] Num frames 6600...
[2025-07-29 11:30:43,985][17818] Num frames 6700...
[2025-07-29 11:30:44,107][17818] Num frames 6800...
[2025-07-29 11:30:44,282][17818] Avg episode rewards: #0: 34.196, true rewards: #0: 13.796
[2025-07-29 11:30:44,283][17818] Avg episode reward: 34.196, avg true_objective: 13.796
[2025-07-29 11:30:44,286][17818] Num frames 6900...
[2025-07-29 11:30:44,408][17818] Num frames 7000...
[2025-07-29 11:30:44,530][17818] Num frames 7100...
[2025-07-29 11:30:44,652][17818] Num frames 7200...
[2025-07-29 11:30:44,776][17818] Num frames 7300...
[2025-07-29 11:30:44,898][17818] Num frames 7400...
[2025-07-29 11:30:45,021][17818] Num frames 7500...
[2025-07-29 11:30:45,142][17818] Num frames 7600...
[2025-07-29 11:30:45,262][17818] Num frames 7700...
[2025-07-29 11:30:45,384][17818] Num frames 7800...
[2025-07-29 11:30:45,509][17818] Num frames 7900...
[2025-07-29 11:30:45,631][17818] Num frames 8000...
[2025-07-29 11:30:45,752][17818] Num frames 8100...
[2025-07-29 11:30:45,877][17818] Num frames 8200...
[2025-07-29 11:30:45,996][17818] Num frames 8300...
[2025-07-29 11:30:46,120][17818] Num frames 8400...
[2025-07-29 11:30:46,240][17818] Num frames 8500...
[2025-07-29 11:30:46,364][17818] Num frames 8600...
[2025-07-29 11:30:46,488][17818] Num frames 8700...
[2025-07-29 11:30:46,612][17818] Num frames 8800...
[2025-07-29 11:30:46,736][17818] Num frames 8900...
[2025-07-29 11:30:46,912][17818] Avg episode rewards: #0: 37.829, true rewards: #0: 14.997
[2025-07-29 11:30:46,913][17818] Avg episode reward: 37.829, avg true_objective: 14.997
[2025-07-29 11:30:46,915][17818] Num frames 9000...
[2025-07-29 11:30:47,036][17818] Num frames 9100...
[2025-07-29 11:30:47,157][17818] Num frames 9200...
[2025-07-29 11:30:47,282][17818] Num frames 9300...
[2025-07-29 11:30:47,404][17818] Num frames 9400...
[2025-07-29 11:30:47,554][17818] Avg episode rewards: #0: 33.824, true rewards: #0: 13.539
[2025-07-29 11:30:47,555][17818] Avg episode reward: 33.824, avg true_objective: 13.539
[2025-07-29 11:30:47,583][17818] Num frames 9500...
[2025-07-29 11:30:47,703][17818] Num frames 9600...
[2025-07-29 11:30:47,826][17818] Num frames 9700...
[2025-07-29 11:30:47,949][17818] Num frames 9800...
[2025-07-29 11:30:48,070][17818] Num frames 9900...
[2025-07-29 11:30:48,193][17818] Num frames 10000...
[2025-07-29 11:30:48,316][17818] Num frames 10100...
[2025-07-29 11:30:48,441][17818] Num frames 10200...
[2025-07-29 11:30:48,569][17818] Num frames 10300...
[2025-07-29 11:30:48,697][17818] Num frames 10400...
[2025-07-29 11:30:48,822][17818] Num frames 10500...
[2025-07-29 11:30:48,945][17818] Num frames 10600...
[2025-07-29 11:30:49,118][17818] Avg episode rewards: #0: 32.745, true rewards: #0: 13.370
[2025-07-29 11:30:49,119][17818] Avg episode reward: 32.745, avg true_objective: 13.370
[2025-07-29 11:30:49,126][17818] Num frames 10700...
[2025-07-29 11:30:49,244][17818] Num frames 10800...
[2025-07-29 11:30:49,368][17818] Num frames 10900...
[2025-07-29 11:30:49,494][17818] Num frames 11000...
[2025-07-29 11:30:49,624][17818] Num frames 11100...
[2025-07-29 11:30:49,755][17818] Num frames 11200...
[2025-07-29 11:30:49,859][17818] Avg episode rewards: #0: 30.153, true rewards: #0: 12.487
[2025-07-29 11:30:49,860][17818] Avg episode reward: 30.153, avg true_objective: 12.487
[2025-07-29 11:30:49,941][17818] Num frames 11300...
[2025-07-29 11:30:50,072][17818] Num frames 11400...
[2025-07-29 11:30:50,204][17818] Num frames 11500...
[2025-07-29 11:30:50,335][17818] Num frames 11600...
[2025-07-29 11:30:50,464][17818] Num frames 11700...
[2025-07-29 11:30:50,591][17818] Num frames 11800...
[2025-07-29 11:30:50,721][17818] Num frames 11900...
[2025-07-29 11:30:50,852][17818] Num frames 12000...
[2025-07-29 11:30:50,983][17818] Num frames 12100...
[2025-07-29 11:30:51,114][17818] Num frames 12200...
[2025-07-29 11:30:51,243][17818] Num frames 12300...
[2025-07-29 11:30:51,374][17818] Avg episode rewards: #0: 29.558, true rewards: #0: 12.358
[2025-07-29 11:30:51,375][17818] Avg episode reward: 29.558, avg true_objective: 12.358
[2025-07-29 11:31:20,688][17818] Replay video saved to /content/train_dir/default_experiment/replay.mp4!