[21:22:53] 2025-08-12
[21:22:53] Tesla T4
[21:22:53]
|===========================================================================|
|                PyTorch CUDA memory summary, device ID 0                  |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Active memory         |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Requested memory      |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| GPU reserved memory   |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Non-releasable memory |      0 B   |      0 B   |      0 B   |      0 B   |
|---------------------------------------------------------------------------|
| Allocations           |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Active allocs         |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize allocations  |       0    |       0    |       0    |       0    |
|---------------------------------------------------------------------------|
| Oversize GPU segments |       0    |       0    |       0    |       0    |
|===========================================================================|
[21:22:53] CPU usage: 91.9%, RAM usage: 28.1%
[21:22:53] Running with the following configuration:
[21:22:53] model_name: NousResearch/Hermes-3-Llama-3.1-8B
[21:22:53] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B
[21:22:53] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview
[21:22:53] train_path: /content/drive/MyDrive/data/none.csv
[21:22:53] checkpoint:
[21:22:53] lr: 5e-05
[21:22:53] lr_floor: 1e-05
[21:22:53] epochs: 1
[21:22:53] batch_size: 5
[21:22:53] accum_steps: 7
[21:22:53] val_batch_size: 6
[21:22:53] max_val_size: 100
[21:22:53] max_length: 150
[21:22:53] save_temp_frequency: 33
[21:22:53] save_frequency: 500
[21:22:53] eval_frequency: 500
[21:22:53] save_pattern: y
[21:22:53] quantization: y
[21:22:53] quantization_bits: 4
[21:22:53] lora: y
[21:22:53] frozen_lora_path: None
[21:22:53] lora_rank: 16
[21:22:53] lora_alpha: 32
[21:22:53] lora_dropout: 0.08
[21:22:53] optimizer_weight_decay: 0.0
[21:22:53] warmup_type: cosine
[21:22:53] warmup_ratio: 0.08
[21:22:53] warmup_steps: 439
[21:22:53] shuffle: y
[21:22:53] csv_column: text
[21:22:53] new_run: n
[21:22:53] label_smoothing: 0.05
[21:22:53] SEED: 1
[21:22:53] Using device: cuda
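The banner at the top is the standard output of `torch.cuda.memory_summary()`; all-zero counters confirm nothing had been allocated on the GPU yet. A minimal sketch of how a preamble like this could be produced (the `log` helper and the use of `psutil` are assumptions, not confirmed by the log itself):

```python
from datetime import datetime

import psutil
import torch

def log(msg: str) -> None:
    # hypothetical helper reproducing the "[HH:MM:SS] message" prefix above
    print(f"[{datetime.now():%H:%M:%S}] {msg}")

log(f"{datetime.now():%Y-%m-%d}")
log(torch.cuda.get_device_name(0))        # "Tesla T4" on this run
log(torch.cuda.memory_summary(device=0))  # prints the table shown above
log(f"CPU usage: {psutil.cpu_percent()}%, RAM usage: {psutil.virtual_memory().percent}%")
```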
[21:24:23] LoRA configuration:
[21:24:24] task_type: TaskType.CAUSAL_LM
[21:24:24] peft_type: PeftType.LORA
[21:24:24] auto_mapping: None
[21:24:24] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B
[21:24:24] revision: None
[21:24:24] inference_mode: False
[21:24:24] r: 16
[21:24:24] target_modules: {'o_proj', 'q_proj', 'k_proj', 'v_proj'}
[21:24:24] exclude_modules: None
[21:24:24] lora_alpha: 32
[21:24:24] lora_dropout: 0.08
[21:24:24] fan_in_fan_out: False
[21:24:24] bias: none
[21:24:24] use_rslora: True
[21:24:24] modules_to_save: None
[21:24:24] init_lora_weights: True
[21:24:24] layers_to_transform: None
[21:24:24] layers_pattern: None
[21:24:24] rank_pattern: {}
[21:24:24] alpha_pattern: {}
[21:24:24] megatron_config: None
[21:24:24] megatron_core: megatron.core
[21:24:24] trainable_token_indices: None
[21:24:24] loftq_config: {}
[21:24:24] eva_config: None
[21:24:24] corda_config: None
[21:24:24] use_dora: False
[21:24:24] use_qalora: False
[21:24:24] qalora_group_size: 16
[21:24:24] layer_replication: None
[21:24:24] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False)
[21:24:24] lora_bias: False
[21:24:24] target_parameters: None
[21:24:24] _custom_modules: None
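The dump above is a peft `LoraConfig` printed field by field. A sketch of an equivalent setup, reconstructed from the logged values (only non-default fields are set; the 4-bit load matches "quantization: y / quantization_bits: 4", but compute dtype and quant type are assumptions the log does not record):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model

# 4-bit base-model load; unlisted BitsAndBytesConfig fields keep their defaults
bnb_config = BitsAndBytesConfig(load_in_4bit=True)
base_model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Hermes-3-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",
)

# adapter settings copied from the configuration dump above
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.08,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    bias="none",
    use_rslora=True,  # rank-stabilized scaling: alpha / sqrt(r) instead of alpha / r
)
model = get_peft_model(base_model, lora_config)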
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16])
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16])
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16])
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096])
[21:24:24] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16])
[... identical TRAINING entries for layers 1-31 elided: each layer registers the same eight q/k/v/o lora_A/lora_B weights with the same shapes ...]
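These per-parameter lines look like the output of a plain loop over `named_parameters()`; the exact loop is an assumed reconstruction. Note the k_proj/v_proj lora_B shapes of [1024, 16]: Llama 3.1 8B uses grouped-query attention, so the key/value projections map 4096 down to 1024 dimensions, while q_proj/o_proj stay at 4096.

```python
for name, param in model.named_parameters():
    if param.requires_grad:  # with LoRA, only the adapter A/B matrices are trainable
        print(f"TRAINING: {name} - shape: {param.shape}")
```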
[21:24:25] Total Parameters: 4,554,231,808
[21:24:25] Trainable Parameters: 13,631,488
[21:24:25] Trainable %: 0.2993%
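The counts are internally consistent: 32 layers x (2 x (16x4096 + 4096x16) for q/o plus 2 x (16x4096 + 1024x16) for k/v) = 13,631,488 trainable weights, which is 0.2993% of the reported total. The ~4.55B total (rather than ~8B) is most likely a quantization artifact: bitsandbytes packs two 4-bit weights per byte, so `numel()` on quantized layers reports half the logical weight count. A typical counting sketch:

```python
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Total Parameters: {total:,}")
print(f"Trainable Parameters: {trainable:,}")
print(f"Trainable %: {100 * trainable / total:.4f}%")
```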
[21:24:25] base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[21:24:25] base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[21:24:25] base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[21:24:25] base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[21:24:25] base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[21:24:25] base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[21:24:25] base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[21:24:25] base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[... identical dtype/device entries for layers 1-31 elided: every trainable LoRA weight is torch.float32 on cuda:0 ...]
device=cuda:0 [21:24:26] base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.28.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.28.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.29.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.30.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [21:24:26] Starting from CSV file... [21:24:28] Splitting data into chunks of 11000... [21:24:28] Using 7 processes across 18 chunks [21:24:29] Creating new train/val split. 
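The listing above confirms that only the LoRA adapter matrices (lora_A/lora_B on q/k/v/o for all 32 layers) are trainable, held in float32 on cuda:0 while the 4-bit base weights stay frozen. The preprocessing lines that follow indicate the CSV is tokenized in parallel: rows are split into chunks of 11,000 and mapped across 7 worker processes, 18 chunks in total. A minimal sketch of that chunked fan-out, assuming a plain multiprocessing.Pool; tokenize_chunk, CHUNK_SIZE, and NUM_PROC are illustrative names, not taken from the actual training script:

```python
# Hypothetical sketch of the "Splitting data into chunks of 11000...
# Using 7 processes across 18 chunks" step; the mechanism is an assumption.
from multiprocessing import Pool

CHUNK_SIZE = 11_000  # chunk size as logged
NUM_PROC = 7         # worker processes as logged

def tokenize_chunk(rows):
    # Stand-in for the real tokenizer call (which would truncate to
    # max_length=150); a whitespace split keeps the sketch runnable.
    return [row.split() for row in rows]

def preprocess(rows):
    chunks = [rows[i:i + CHUNK_SIZE] for i in range(0, len(rows), CHUNK_SIZE)]
    with Pool(processes=NUM_PROC) as pool:
        tokenized = pool.map(tokenize_chunk, chunks)
    # Flatten the per-chunk results back into a single dataset.
    return [sample for chunk in tokenized for sample in chunk]

if __name__ == "__main__":
    rows = ["hello world"] * 191_987  # 191,887 train + 100 val, as logged just below
    print(len(preprocess(rows)))     # 18 chunks spread over 7 workers
```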
[21:24:29] Initializing scheduler with cosine schedule with warmup, warmup steps 439, total steps: 5482 [21:24:29] Train/Val split: 191887 train, 100 val samples. [21:24:40] Model: PeftModelForCausalLM [21:24:40] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.0", "use_cache": true, "vocab_size": 128256 } [21:24:41] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [21:24:41] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 5e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [21:24:41] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [21:24:41] Scheduler: [21:24:41] Training on 191887 training samples, 100 validation samples [21:24:41] Average tokens per sample: 141.99 [21:24:41] Estimated epoch time: ~747.92 min [21:24:41] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 8060 MiB | 8994 MiB | 407330 MiB | 399269 MiB | |---------------------------------------------------------------------------| | Active memory | 8060 MiB | 8994 MiB | 407330 MiB | 399269 MiB | |---------------------------------------------------------------------------| | Requested memory | 8057 MiB | 8990 MiB | 407010 MiB | 398953 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 11050 MiB | 11050 MiB | 11050 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 2987 MiB | 5879 MiB | 398807 MiB | 395820 MiB | |---------------------------------------------------------------------------| | Allocations | 1738 | 1816 | 32748 | 31010 | |---------------------------------------------------------------------------| | Active allocs | 1738 | 1816 | 32748 | 31010 | |---------------------------------------------------------------------------| | GPU reserved segments | 84 | 84 | 84 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 87 | 94 | 13335 | 13248 | 
|---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [21:24:41] Shuffling indices for epoch 1 with seed 1 [21:24:41] CPU usage: 56.7%, RAM usage: 41.7% [21:24:42] Epoch 1 learning rate: 0.0 [21:24:42] Starting epoch 1 [21:24:42] Batch 1: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [21:24:45] Epoch: 1 Batch: 1/38378 (0.00%) Loss: 6.741514 LR: 0.00000000 [21:24:48] Epoch: 1 Batch: 2/38378 (0.01%) Loss: 6.606559 LR: 0.00000000 [21:24:51] Epoch: 1 Batch: 3/38378 (0.01%) Loss: 6.529622 LR: 0.00000000 [21:24:54] Epoch: 1 Batch: 4/38378 (0.01%) Loss: 7.056437 LR: 0.00000000 [21:24:57] Epoch: 1 Batch: 5/38378 (0.01%) Loss: 7.443799 LR: 0.00000000 [21:25:00] Epoch: 1 Batch: 6/38378 (0.02%) Loss: 7.252364 LR: 0.00000000 [21:25:04] Epoch: 1 Batch: 7/38378 (0.02%) Loss: 7.118078 LR: 0.00000011 [21:25:07] Epoch: 1 Batch: 8/38378 (0.02%) Loss: 6.958713 LR: 0.00000011 [21:25:10] Epoch: 1 Batch: 9/38378 (0.02%) Loss: 6.378026 LR: 0.00000011 [21:25:13] Epoch: 1 Batch: 10/38378 (0.03%) Loss: 7.290612 LR: 0.00000011 [21:25:16] Epoch: 1 Batch: 11/38378 (0.03%) Loss: 7.146268 LR: 0.00000011 [21:25:19] Epoch: 1 Batch: 12/38378 (0.03%) Loss: 6.903623 LR: 0.00000011 [21:25:22] Epoch: 1 Batch: 13/38378 (0.03%) Loss: 6.562722 LR: 0.00000011 [21:25:26] Epoch: 1 Batch: 14/38378 (0.04%) Loss: 6.781000 LR: 0.00000023 [21:25:29] Epoch: 1 Batch: 15/38378 (0.04%) Loss: 6.777888 LR: 0.00000023 [21:25:32] Epoch: 1 Batch: 16/38378 (0.04%) Loss: 6.792592 LR: 0.00000023 [21:25:35] Epoch: 1 Batch: 17/38378 (0.04%) Loss: 7.356396 LR: 0.00000023 [21:25:38] Epoch: 1 Batch: 18/38378 (0.05%) Loss: 7.225408 LR: 0.00000023 [21:25:41] Epoch: 1 Batch: 19/38378 (0.05%) Loss: 6.864712 LR: 0.00000023 [21:25:44] Epoch: 1 Batch: 20/38378 (0.05%) Loss: 7.012792 LR: 0.00000023 [21:25:47] Epoch: 1 Batch: 21/38378 (0.05%) Loss: 6.787436 LR: 0.00000034 [21:25:50] Epoch: 1 Batch: 22/38378 (0.06%) Loss: 6.767088 LR: 0.00000034 [21:25:53] Epoch: 1 Batch: 23/38378 (0.06%) Loss: 6.798065 LR: 0.00000034 [21:25:56] Epoch: 1 Batch: 24/38378 (0.06%) Loss: 6.851676 LR: 0.00000034 [21:25:59] Epoch: 1 Batch: 25/38378 (0.07%) Loss: 7.143130 LR: 0.00000034 [21:26:02] Epoch: 1 Batch: 26/38378 (0.07%) Loss: 7.074902 LR: 0.00000034 [21:26:05] Epoch: 1 Batch: 27/38378 (0.07%) Loss: 6.774152 LR: 0.00000034 [21:26:08] Epoch: 1 Batch: 28/38378 (0.07%) Loss: 6.752544 LR: 0.00000046 [21:26:11] Epoch: 1 Batch: 29/38378 (0.08%) Loss: 6.840197 LR: 0.00000046 [21:26:14] Epoch: 1 Batch: 30/38378 (0.08%) Loss: 7.134908 LR: 0.00000046 [21:26:17] Epoch: 1 Batch: 31/38378 (0.08%) Loss: 7.050409 LR: 0.00000046 [21:26:20] Epoch: 1 Batch: 32/38378 (0.08%) Loss: 7.194702 LR: 0.00000046 [21:26:28] >> Temp checkpoint saved: epoch1_step33, size: 0.1702 GB [21:26:28] Epoch: 1 Batch: 33/38378 (0.09%) Loss: 6.571790 LR: 0.00000046 [21:26:31] Epoch: 1 Batch: 34/38378 (0.09%) Loss: 6.937932 LR: 0.00000046 [21:26:34] Epoch: 1 Batch: 35/38378 (0.09%) Loss: 6.642770 LR: 0.00000057 [21:26:39] Epoch: 1 Batch: 36/38378 (0.09%) Loss: 6.670043 LR: 0.00000057 [21:26:42] Epoch: 1 Batch: 37/38378 (0.10%) Loss: 6.841078 LR: 0.00000057 [21:26:45] Epoch: 1 Batch: 38/38378 (0.10%) Loss: 6.802460 LR: 0.00000057 [21:26:48] Epoch: 1 Batch: 39/38378 (0.10%) Loss: 7.237385 LR: 0.00000057 [21:26:51] Epoch: 1 Batch: 
40/38378 (0.10%) Loss: 6.988084 LR: 0.00000057 [21:26:54] Epoch: 1 Batch: 41/38378 (0.11%) Loss: 6.709630 LR: 0.00000057 [21:26:57] Epoch: 1 Batch: 42/38378 (0.11%) Loss: 6.944387 LR: 0.00000068 [21:27:00] Epoch: 1 Batch: 43/38378 (0.11%) Loss: 6.641298 LR: 0.00000068 [21:27:04] Epoch: 1 Batch: 44/38378 (0.11%) Loss: 6.724445 LR: 0.00000068 [21:27:07] Epoch: 1 Batch: 45/38378 (0.12%) Loss: 6.464626 LR: 0.00000068 [21:27:10] Epoch: 1 Batch: 46/38378 (0.12%) Loss: 7.176118 LR: 0.00000068 [21:27:13] Epoch: 1 Batch: 47/38378 (0.12%) Loss: 7.052706 LR: 0.00000068 [21:27:16] Epoch: 1 Batch: 48/38378 (0.13%) Loss: 7.198937 LR: 0.00000068 [21:27:19] Epoch: 1 Batch: 49/38378 (0.13%) Loss: 7.113519 LR: 0.00000080 [21:27:22] Epoch: 1 Batch: 50/38378 (0.13%) Loss: 6.696510 LR: 0.00000080 [21:27:25] Epoch: 1 Batch: 51/38378 (0.13%) Loss: 7.416306 LR: 0.00000080 [21:27:28] Epoch: 1 Batch: 52/38378 (0.14%) Loss: 7.002541 LR: 0.00000080 [21:27:31] Epoch: 1 Batch: 53/38378 (0.14%) Loss: 6.769703 LR: 0.00000080 [21:27:34] Epoch: 1 Batch: 54/38378 (0.14%) Loss: 6.174042 LR: 0.00000080 [21:27:37] Epoch: 1 Batch: 55/38378 (0.14%) Loss: 6.545664 LR: 0.00000080 [21:27:40] Epoch: 1 Batch: 56/38378 (0.15%) Loss: 6.964957 LR: 0.00000091 [21:27:43] Epoch: 1 Batch: 57/38378 (0.15%) Loss: 6.704674 LR: 0.00000091 [21:27:46] Epoch: 1 Batch: 58/38378 (0.15%) Loss: 6.660461 LR: 0.00000091 [21:27:49] Epoch: 1 Batch: 59/38378 (0.15%) Loss: 6.331504 LR: 0.00000091 [21:27:52] Epoch: 1 Batch: 60/38378 (0.16%) Loss: 6.882839 LR: 0.00000091 [21:27:55] Epoch: 1 Batch: 61/38378 (0.16%) Loss: 6.556422 LR: 0.00000091 [21:27:58] Epoch: 1 Batch: 62/38378 (0.16%) Loss: 6.913469 LR: 0.00000091 [21:28:01] Epoch: 1 Batch: 63/38378 (0.16%) Loss: 6.637333 LR: 0.00000103 [21:28:04] Epoch: 1 Batch: 64/38378 (0.17%) Loss: 7.068727 LR: 0.00000103 [21:28:07] Epoch: 1 Batch: 65/38378 (0.17%) Loss: 6.691253 LR: 0.00000103 [21:28:15] >> Temp checkpoint saved: epoch1_step66, size: 0.1702 GB [21:28:15] Epoch: 1 Batch: 66/38378 (0.17%) Loss: 6.524147 LR: 0.00000103 [21:28:18] Epoch: 1 Batch: 67/38378 (0.17%) Loss: 6.853878 LR: 0.00000103 [21:28:21] Epoch: 1 Batch: 68/38378 (0.18%) Loss: 6.419892 LR: 0.00000103 [21:28:26] Epoch: 1 Batch: 69/38378 (0.18%) Loss: 6.651607 LR: 0.00000103 [21:28:29] Epoch: 1 Batch: 70/38378 (0.18%) Loss: 6.476128 LR: 0.00000114 [21:28:32] Epoch: 1 Batch: 71/38378 (0.19%) Loss: 7.407096 LR: 0.00000114 [21:28:35] Epoch: 1 Batch: 72/38378 (0.19%) Loss: 6.302770 LR: 0.00000114 [21:28:38] Epoch: 1 Batch: 73/38378 (0.19%) Loss: 6.600459 LR: 0.00000114 [21:28:41] Epoch: 1 Batch: 74/38378 (0.19%) Loss: 6.871166 LR: 0.00000114 [21:28:44] Epoch: 1 Batch: 75/38378 (0.20%) Loss: 6.762555 LR: 0.00000114 [21:28:47] Epoch: 1 Batch: 76/38378 (0.20%) Loss: 6.951845 LR: 0.00000114 [21:28:50] Epoch: 1 Batch: 77/38378 (0.20%) Loss: 6.911203 LR: 0.00000125 [21:28:53] Epoch: 1 Batch: 78/38378 (0.20%) Loss: 6.747880 LR: 0.00000125 [21:28:57] Epoch: 1 Batch: 79/38378 (0.21%) Loss: 7.012897 LR: 0.00000125 [21:29:00] Epoch: 1 Batch: 80/38378 (0.21%) Loss: 6.808088 LR: 0.00000125 [21:29:03] Epoch: 1 Batch: 81/38378 (0.21%) Loss: 6.694718 LR: 0.00000125 [21:29:06] Epoch: 1 Batch: 82/38378 (0.21%) Loss: 6.636609 LR: 0.00000125 [21:29:09] Epoch: 1 Batch: 83/38378 (0.22%) Loss: 6.662189 LR: 0.00000125 [21:29:12] Epoch: 1 Batch: 84/38378 (0.22%) Loss: 6.346995 LR: 0.00000137 [21:29:15] Epoch: 1 Batch: 85/38378 (0.22%) Loss: 6.247926 LR: 0.00000137 [21:29:18] Epoch: 1 Batch: 86/38378 (0.22%) Loss: 6.268597 LR: 0.00000137 [21:29:21] Epoch: 1 Batch: 87/38378 
(0.23%) Loss: 6.864013 LR: 0.00000137 [21:29:24] Epoch: 1 Batch: 88/38378 (0.23%) Loss: 6.181249 LR: 0.00000137 [21:29:27] Epoch: 1 Batch: 89/38378 (0.23%) Loss: 6.552874 LR: 0.00000137 [21:29:30] Epoch: 1 Batch: 90/38378 (0.23%) Loss: 6.732682 LR: 0.00000137 [21:29:33] Epoch: 1 Batch: 91/38378 (0.24%) Loss: 6.773973 LR: 0.00000148 [21:29:36] Epoch: 1 Batch: 92/38378 (0.24%) Loss: 6.420141 LR: 0.00000148 [21:29:39] Epoch: 1 Batch: 93/38378 (0.24%) Loss: 6.269273 LR: 0.00000148 [21:29:42] Epoch: 1 Batch: 94/38378 (0.24%) Loss: 5.768437 LR: 0.00000148 [21:29:45] Epoch: 1 Batch: 95/38378 (0.25%) Loss: 6.363480 LR: 0.00000148 [21:29:48] Epoch: 1 Batch: 96/38378 (0.25%) Loss: 6.216162 LR: 0.00000148 [21:29:51] Epoch: 1 Batch: 97/38378 (0.25%) Loss: 6.479863 LR: 0.00000148 [21:29:55] Epoch: 1 Batch: 98/38378 (0.26%) Loss: 6.726694 LR: 0.00000159 [21:30:02] >> Temp checkpoint saved: epoch1_step99, size: 0.1702 GB [21:30:02] Epoch: 1 Batch: 99/38378 (0.26%) Loss: 6.393977 LR: 0.00000159 [21:30:05] Epoch: 1 Batch: 100/38378 (0.26%) Loss: 6.709968 LR: 0.00000159 [21:30:08] Epoch: 1 Batch: 101/38378 (0.26%) Loss: 6.089592 LR: 0.00000159 [21:30:13] Epoch: 1 Batch: 102/38378 (0.27%) Loss: 6.363952 LR: 0.00000159 [21:30:16] Epoch: 1 Batch: 103/38378 (0.27%) Loss: 6.570012 LR: 0.00000159 [21:30:20] Epoch: 1 Batch: 104/38378 (0.27%) Loss: 6.124007 LR: 0.00000159 [21:30:23] Epoch: 1 Batch: 105/38378 (0.27%) Loss: 6.603824 LR: 0.00000171 [21:30:26] Epoch: 1 Batch: 106/38378 (0.28%) Loss: 6.236663 LR: 0.00000171 [21:30:29] Epoch: 1 Batch: 107/38378 (0.28%) Loss: 6.347280 LR: 0.00000171 [21:30:32] Epoch: 1 Batch: 108/38378 (0.28%) Loss: 6.158096 LR: 0.00000171 [21:30:35] Epoch: 1 Batch: 109/38378 (0.28%) Loss: 6.738978 LR: 0.00000171 [21:30:38] Epoch: 1 Batch: 110/38378 (0.29%) Loss: 6.412094 LR: 0.00000171 [21:30:41] Epoch: 1 Batch: 111/38378 (0.29%) Loss: 6.523893 LR: 0.00000171 [21:30:44] Epoch: 1 Batch: 112/38378 (0.29%) Loss: 6.619224 LR: 0.00000182 [21:30:47] Epoch: 1 Batch: 113/38378 (0.29%) Loss: 6.194023 LR: 0.00000182 [21:30:50] Epoch: 1 Batch: 114/38378 (0.30%) Loss: 6.600773 LR: 0.00000182 [21:30:54] Epoch: 1 Batch: 115/38378 (0.30%) Loss: 6.036370 LR: 0.00000182 [21:30:57] Epoch: 1 Batch: 116/38378 (0.30%) Loss: 6.583125 LR: 0.00000182 [21:31:00] Epoch: 1 Batch: 117/38378 (0.30%) Loss: 6.343484 LR: 0.00000182 [21:31:03] Epoch: 1 Batch: 118/38378 (0.31%) Loss: 6.007777 LR: 0.00000182 [21:31:06] Epoch: 1 Batch: 119/38378 (0.31%) Loss: 6.613847 LR: 0.00000194 [21:31:09] Epoch: 1 Batch: 120/38378 (0.31%) Loss: 6.091333 LR: 0.00000194 [21:31:12] Epoch: 1 Batch: 121/38378 (0.32%) Loss: 5.614332 LR: 0.00000194 [21:31:15] Epoch: 1 Batch: 122/38378 (0.32%) Loss: 6.275421 LR: 0.00000194 [21:31:18] Epoch: 1 Batch: 123/38378 (0.32%) Loss: 6.523528 LR: 0.00000194 [21:31:21] Epoch: 1 Batch: 124/38378 (0.32%) Loss: 6.160956 LR: 0.00000194 [21:31:24] Epoch: 1 Batch: 125/38378 (0.33%) Loss: 6.458415 LR: 0.00000194 [21:31:27] Epoch: 1 Batch: 126/38378 (0.33%) Loss: 6.588439 LR: 0.00000205 [21:31:30] Epoch: 1 Batch: 127/38378 (0.33%) Loss: 6.026058 LR: 0.00000205 [21:31:33] Epoch: 1 Batch: 128/38378 (0.33%) Loss: 5.961139 LR: 0.00000205 [21:31:36] Epoch: 1 Batch: 129/38378 (0.34%) Loss: 6.383879 LR: 0.00000205 [21:31:39] Epoch: 1 Batch: 130/38378 (0.34%) Loss: 6.499477 LR: 0.00000205 [21:31:42] Epoch: 1 Batch: 131/38378 (0.34%) Loss: 6.007458 LR: 0.00000205 [21:31:49] >> Temp checkpoint saved: epoch1_step132, size: 0.1702 GB [21:31:49] Epoch: 1 Batch: 132/38378 (0.34%) Loss: 6.071045 LR: 0.00000205 [21:31:52] 
Epoch: 1 Batch: 133/38378 (0.35%) Loss: 5.990319 LR: 0.00000216 [21:31:55] Epoch: 1 Batch: 134/38378 (0.35%) Loss: 6.141752 LR: 0.00000216 [21:31:58] Epoch: 1 Batch: 135/38378 (0.35%) Loss: 5.848039 LR: 0.00000216 [21:32:01] Epoch: 1 Batch: 136/38378 (0.35%) Loss: 5.866212 LR: 0.00000216 [21:32:05] Epoch: 1 Batch: 137/38378 (0.36%) Loss: 5.965481 LR: 0.00000216 [21:32:08] Epoch: 1 Batch: 138/38378 (0.36%) Loss: 5.915098 LR: 0.00000216 [21:32:11] Epoch: 1 Batch: 139/38378 (0.36%) Loss: 6.214770 LR: 0.00000216 [21:32:14] Epoch: 1 Batch: 140/38378 (0.36%) Loss: 5.967258 LR: 0.00000228 [21:32:17] Epoch: 1 Batch: 141/38378 (0.37%) Loss: 5.840922 LR: 0.00000228 [21:32:20] Epoch: 1 Batch: 142/38378 (0.37%) Loss: 6.121053 LR: 0.00000228 [21:32:23] Epoch: 1 Batch: 143/38378 (0.37%) Loss: 5.756975 LR: 0.00000228 [21:32:26] Epoch: 1 Batch: 144/38378 (0.38%) Loss: 6.142191 LR: 0.00000228 [21:32:29] Epoch: 1 Batch: 145/38378 (0.38%) Loss: 5.963239 LR: 0.00000228 [21:32:32] Epoch: 1 Batch: 146/38378 (0.38%) Loss: 5.973310 LR: 0.00000228 [21:32:35] Epoch: 1 Batch: 147/38378 (0.38%) Loss: 5.792087 LR: 0.00000239 [21:32:38] Epoch: 1 Batch: 148/38378 (0.39%) Loss: 5.831920 LR: 0.00000239 [21:32:41] Epoch: 1 Batch: 149/38378 (0.39%) Loss: 6.156088 LR: 0.00000239 [21:32:45] Epoch: 1 Batch: 150/38378 (0.39%) Loss: 5.679299 LR: 0.00000239 [21:32:48] Epoch: 1 Batch: 151/38378 (0.39%) Loss: 5.430111 LR: 0.00000239 [21:32:51] Epoch: 1 Batch: 152/38378 (0.40%) Loss: 5.669382 LR: 0.00000239 [21:32:54] Epoch: 1 Batch: 153/38378 (0.40%) Loss: 5.960983 LR: 0.00000239 [21:32:57] Epoch: 1 Batch: 154/38378 (0.40%) Loss: 5.929320 LR: 0.00000251 [21:33:00] Epoch: 1 Batch: 155/38378 (0.40%) Loss: 6.027066 LR: 0.00000251 [21:33:03] Epoch: 1 Batch: 156/38378 (0.41%) Loss: 5.518361 LR: 0.00000251 [21:33:06] Epoch: 1 Batch: 157/38378 (0.41%) Loss: 5.781840 LR: 0.00000251 [21:33:09] Epoch: 1 Batch: 158/38378 (0.41%) Loss: 5.386876 LR: 0.00000251 [21:33:12] Epoch: 1 Batch: 159/38378 (0.41%) Loss: 5.509346 LR: 0.00000251 [21:33:15] Epoch: 1 Batch: 160/38378 (0.42%) Loss: 5.613064 LR: 0.00000251 [21:33:18] Epoch: 1 Batch: 161/38378 (0.42%) Loss: 5.440505 LR: 0.00000262 [21:33:21] Epoch: 1 Batch: 162/38378 (0.42%) Loss: 5.659715 LR: 0.00000262 [21:33:24] Epoch: 1 Batch: 163/38378 (0.42%) Loss: 5.999717 LR: 0.00000262 [21:33:27] Epoch: 1 Batch: 164/38378 (0.43%) Loss: 5.813747 LR: 0.00000262 [21:33:35] >> Temp checkpoint saved: epoch1_step165, size: 0.1702 GB [21:33:35] Epoch: 1 Batch: 165/38378 (0.43%) Loss: 5.928907 LR: 0.00000262 [21:33:38] Epoch: 1 Batch: 166/38378 (0.43%) Loss: 5.704259 LR: 0.00000262 [21:33:41] Epoch: 1 Batch: 167/38378 (0.44%) Loss: 5.719970 LR: 0.00000262 [21:33:44] Epoch: 1 Batch: 168/38378 (0.44%) Loss: 5.351349 LR: 0.00000273 [21:33:47] Epoch: 1 Batch: 169/38378 (0.44%) Loss: 5.490031 LR: 0.00000273 [21:33:50] Epoch: 1 Batch: 170/38378 (0.44%) Loss: 5.588965 LR: 0.00000273 [21:33:53] Epoch: 1 Batch: 171/38378 (0.45%) Loss: 5.219666 LR: 0.00000273 [21:33:56] Epoch: 1 Batch: 172/38378 (0.45%) Loss: 5.508370 LR: 0.00000273 [21:33:59] Epoch: 1 Batch: 173/38378 (0.45%) Loss: 5.635222 LR: 0.00000273 [21:34:02] Epoch: 1 Batch: 174/38378 (0.45%) Loss: 5.426221 LR: 0.00000273 [21:34:05] Epoch: 1 Batch: 175/38378 (0.46%) Loss: 6.189226 LR: 0.00000285 [21:34:08] Epoch: 1 Batch: 176/38378 (0.46%) Loss: 5.335996 LR: 0.00000285 [21:34:11] Epoch: 1 Batch: 177/38378 (0.46%) Loss: 5.370844 LR: 0.00000285 [21:34:14] Epoch: 1 Batch: 178/38378 (0.46%) Loss: 5.297530 LR: 0.00000285 [21:34:18] Epoch: 1 Batch: 179/38378 (0.47%) 
Loss: 5.165634 LR: 0.00000285 [21:34:21] Epoch: 1 Batch: 180/38378 (0.47%) Loss: 5.509738 LR: 0.00000285 [21:34:24] Epoch: 1 Batch: 181/38378 (0.47%) Loss: 5.408429 LR: 0.00000285 [21:34:27] Epoch: 1 Batch: 182/38378 (0.47%) Loss: 5.248386 LR: 0.00000296 [21:34:30] Epoch: 1 Batch: 183/38378 (0.48%) Loss: 5.666003 LR: 0.00000296 [21:34:33] Epoch: 1 Batch: 184/38378 (0.48%) Loss: 5.910677 LR: 0.00000296 [21:34:36] Epoch: 1 Batch: 185/38378 (0.48%) Loss: 5.487061 LR: 0.00000296 [21:34:39] Epoch: 1 Batch: 186/38378 (0.48%) Loss: 5.293736 LR: 0.00000296 [21:34:42] Epoch: 1 Batch: 187/38378 (0.49%) Loss: 5.457509 LR: 0.00000296 [21:34:45] Epoch: 1 Batch: 188/38378 (0.49%) Loss: 5.463485 LR: 0.00000296 [21:34:48] Epoch: 1 Batch: 189/38378 (0.49%) Loss: 4.704139 LR: 0.00000308 [21:34:51] Epoch: 1 Batch: 190/38378 (0.50%) Loss: 5.142122 LR: 0.00000308 [21:34:54] Epoch: 1 Batch: 191/38378 (0.50%) Loss: 5.214944 LR: 0.00000308 [21:34:57] Epoch: 1 Batch: 192/38378 (0.50%) Loss: 5.131995 LR: 0.00000308 [21:35:00] Epoch: 1 Batch: 193/38378 (0.50%) Loss: 5.192181 LR: 0.00000308 [21:35:03] Epoch: 1 Batch: 194/38378 (0.51%) Loss: 5.395660 LR: 0.00000308 [21:35:06] Epoch: 1 Batch: 195/38378 (0.51%) Loss: 5.126329 LR: 0.00000308 [21:35:09] Epoch: 1 Batch: 196/38378 (0.51%) Loss: 5.183446 LR: 0.00000319 [21:35:12] Epoch: 1 Batch: 197/38378 (0.51%) Loss: 5.150136 LR: 0.00000319 [21:35:20] >> Temp checkpoint saved: epoch1_step198, size: 0.1702 GB [21:35:20] Epoch: 1 Batch: 198/38378 (0.52%) Loss: 5.046819 LR: 0.00000319 [21:35:23] Epoch: 1 Batch: 199/38378 (0.52%) Loss: 4.983326 LR: 0.00000319 [21:35:26] Epoch: 1 Batch: 200/38378 (0.52%) Loss: 5.005875 LR: 0.00000319 [21:35:29] Epoch: 1 Batch: 201/38378 (0.52%) Loss: 5.409193 LR: 0.00000319 [21:35:32] Epoch: 1 Batch: 202/38378 (0.53%) Loss: 4.982484 LR: 0.00000319 [21:35:35] Epoch: 1 Batch: 203/38378 (0.53%) Loss: 5.054588 LR: 0.00000330 [21:35:39] Epoch: 1 Batch: 204/38378 (0.53%) Loss: 5.241577 LR: 0.00000330 [21:35:42] Epoch: 1 Batch: 205/38378 (0.53%) Loss: 4.873852 LR: 0.00000330 [21:35:45] Epoch: 1 Batch: 206/38378 (0.54%) Loss: 5.131657 LR: 0.00000330 [21:35:48] Epoch: 1 Batch: 207/38378 (0.54%) Loss: 4.959173 LR: 0.00000330 [21:35:51] Epoch: 1 Batch: 208/38378 (0.54%) Loss: 5.299251 LR: 0.00000330 [21:35:54] Epoch: 1 Batch: 209/38378 (0.54%) Loss: 5.219866 LR: 0.00000330 [21:35:57] Epoch: 1 Batch: 210/38378 (0.55%) Loss: 4.868629 LR: 0.00000342 [21:36:00] Epoch: 1 Batch: 211/38378 (0.55%) Loss: 5.047840 LR: 0.00000342 [21:36:03] Epoch: 1 Batch: 212/38378 (0.55%) Loss: 5.258916 LR: 0.00000342 [21:36:06] Epoch: 1 Batch: 213/38378 (0.56%) Loss: 5.118901 LR: 0.00000342 [21:36:09] Epoch: 1 Batch: 214/38378 (0.56%) Loss: 5.140559 LR: 0.00000342 [21:36:12] Epoch: 1 Batch: 215/38378 (0.56%) Loss: 4.625506 LR: 0.00000342 [21:36:15] Epoch: 1 Batch: 216/38378 (0.56%) Loss: 5.161873 LR: 0.00000342 [21:36:19] Epoch: 1 Batch: 217/38378 (0.57%) Loss: 5.308355 LR: 0.00000353 [21:36:22] Epoch: 1 Batch: 218/38378 (0.57%) Loss: 5.247584 LR: 0.00000353 [21:36:25] Epoch: 1 Batch: 219/38378 (0.57%) Loss: 4.923747 LR: 0.00000353 [21:36:28] Epoch: 1 Batch: 220/38378 (0.57%) Loss: 5.282529 LR: 0.00000353 [21:36:31] Epoch: 1 Batch: 221/38378 (0.58%) Loss: 5.311969 LR: 0.00000353 [21:36:34] Epoch: 1 Batch: 222/38378 (0.58%) Loss: 4.951192 LR: 0.00000353 [21:36:37] Epoch: 1 Batch: 223/38378 (0.58%) Loss: 4.732906 LR: 0.00000353 [21:36:40] Epoch: 1 Batch: 224/38378 (0.58%) Loss: 4.440954 LR: 0.00000364 [21:36:43] Epoch: 1 Batch: 225/38378 (0.59%) Loss: 4.723118 LR: 0.00000364 
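Note how the LR column advances only every seventh batch: with accum_steps: 7, the scheduler steps once per optimizer step, not per micro-batch. The warmup is linear, optimizer step k giving roughly k * 5e-05 / 439 (step 1 logs as 0.00000011, step 2 as 0.00000023), after which a cosine phase would decay toward lr_floor: 1e-05. The sketch below reproduces the logged warmup values; the cosine-to-floor tail is an assumption read off the lr_floor setting, since this portion of the run never gets past warmup (439 optimizer steps is about 3,073 batches):

```python
import math

def lr_at(opt_step, base_lr=5e-5, floor=1e-5, warmup=439, total=5482):
    # Linear warmup matching the log (step 1 -> ~1.14e-07, step 2 -> ~2.28e-07),
    # then an assumed cosine decay toward the configured floor.
    if opt_step < warmup:
        return base_lr * opt_step / warmup
    progress = (opt_step - warmup) / max(1, total - warmup)
    return floor + 0.5 * (base_lr - floor) * (1.0 + math.cos(math.pi * progress))

# One optimizer step per 7 micro-batches, hence the plateaus of seven
# identical LR values in the batch lines above.
for step in (1, 2, 96):
    print(f"{lr_at(step):.8f}")  # 0.00000011, 0.00000023, 0.00001093
```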
[21:36:46] Epoch: 1 Batch: 226/38378 (0.59%) Loss: 5.059729 LR: 0.00000364 [21:36:49] Epoch: 1 Batch: 227/38378 (0.59%) Loss: 4.514014 LR: 0.00000364 [21:36:52] Epoch: 1 Batch: 228/38378 (0.59%) Loss: 4.773696 LR: 0.00000364 [21:36:55] Epoch: 1 Batch: 229/38378 (0.60%) Loss: 4.664015 LR: 0.00000364 [21:36:58] Epoch: 1 Batch: 230/38378 (0.60%) Loss: 5.160714 LR: 0.00000364 [21:37:05] >> Temp checkpoint saved: epoch1_step231, size: 0.1702 GB [21:37:05] Epoch: 1 Batch: 231/38378 (0.60%) Loss: 4.752860 LR: 0.00000376 [21:37:08] Epoch: 1 Batch: 232/38378 (0.60%) Loss: 5.118712 LR: 0.00000376 [21:37:11] Epoch: 1 Batch: 233/38378 (0.61%) Loss: 4.589212 LR: 0.00000376 [21:37:15] Epoch: 1 Batch: 234/38378 (0.61%) Loss: 4.753568 LR: 0.00000376 [21:37:18] Epoch: 1 Batch: 235/38378 (0.61%) Loss: 4.648283 LR: 0.00000376 [21:37:21] Epoch: 1 Batch: 236/38378 (0.61%) Loss: 4.581940 LR: 0.00000376 [21:37:24] Epoch: 1 Batch: 237/38378 (0.62%) Loss: 4.800599 LR: 0.00000376 [21:37:27] Epoch: 1 Batch: 238/38378 (0.62%) Loss: 4.601741 LR: 0.00000387 [21:37:30] Epoch: 1 Batch: 239/38378 (0.62%) Loss: 4.519505 LR: 0.00000387 [21:37:33] Epoch: 1 Batch: 240/38378 (0.63%) Loss: 4.838657 LR: 0.00000387 [21:37:36] Epoch: 1 Batch: 241/38378 (0.63%) Loss: 4.310844 LR: 0.00000387 [21:37:39] Epoch: 1 Batch: 242/38378 (0.63%) Loss: 4.484790 LR: 0.00000387 [21:37:42] Epoch: 1 Batch: 243/38378 (0.63%) Loss: 4.875067 LR: 0.00000387 [21:37:45] Epoch: 1 Batch: 244/38378 (0.64%) Loss: 4.375403 LR: 0.00000387 [21:37:48] Epoch: 1 Batch: 245/38378 (0.64%) Loss: 4.132089 LR: 0.00000399 [21:37:51] Epoch: 1 Batch: 246/38378 (0.64%) Loss: 5.012807 LR: 0.00000399 [21:37:54] Epoch: 1 Batch: 247/38378 (0.64%) Loss: 4.604140 LR: 0.00000399 [21:37:57] Epoch: 1 Batch: 248/38378 (0.65%) Loss: 4.561182 LR: 0.00000399 [21:38:01] Epoch: 1 Batch: 249/38378 (0.65%) Loss: 4.240117 LR: 0.00000399 [21:38:04] Epoch: 1 Batch: 250/38378 (0.65%) Loss: 4.945144 LR: 0.00000399 [21:38:07] Epoch: 1 Batch: 251/38378 (0.65%) Loss: 4.650477 LR: 0.00000399 [21:38:10] Epoch: 1 Batch: 252/38378 (0.66%) Loss: 4.371702 LR: 0.00000410 [21:38:13] Epoch: 1 Batch: 253/38378 (0.66%) Loss: 4.280695 LR: 0.00000410 [21:38:16] Epoch: 1 Batch: 254/38378 (0.66%) Loss: 4.751051 LR: 0.00000410 [21:38:19] Epoch: 1 Batch: 255/38378 (0.66%) Loss: 4.612496 LR: 0.00000410 [21:38:22] Epoch: 1 Batch: 256/38378 (0.67%) Loss: 4.923680 LR: 0.00000410 [21:38:25] Epoch: 1 Batch: 257/38378 (0.67%) Loss: 4.203763 LR: 0.00000410 [21:38:28] Epoch: 1 Batch: 258/38378 (0.67%) Loss: 4.765778 LR: 0.00000410 [21:38:31] Epoch: 1 Batch: 259/38378 (0.67%) Loss: 4.836518 LR: 0.00000421 [21:38:34] Epoch: 1 Batch: 260/38378 (0.68%) Loss: 4.602279 LR: 0.00000421 [21:38:37] Epoch: 1 Batch: 261/38378 (0.68%) Loss: 4.513278 LR: 0.00000421 [21:38:40] Epoch: 1 Batch: 262/38378 (0.68%) Loss: 4.846032 LR: 0.00000421 [21:38:43] Epoch: 1 Batch: 263/38378 (0.69%) Loss: 4.143041 LR: 0.00000421 [21:38:51] >> Cleaned up old temp checkpoint: epoch1_step1 [21:38:51] >> Temp checkpoint saved: epoch1_step264, size: 0.1702 GB [21:38:51] Epoch: 1 Batch: 264/38378 (0.69%) Loss: 3.995149 LR: 0.00000421 [21:38:54] Epoch: 1 Batch: 265/38378 (0.69%) Loss: 4.533149 LR: 0.00000421 [21:38:57] Epoch: 1 Batch: 266/38378 (0.69%) Loss: 3.858708 LR: 0.00000433 [21:39:00] Epoch: 1 Batch: 267/38378 (0.70%) Loss: 4.726948 LR: 0.00000433 [21:39:03] Epoch: 1 Batch: 268/38378 (0.70%) Loss: 4.287750 LR: 0.00000433 [21:39:06] Epoch: 1 Batch: 269/38378 (0.70%) Loss: 4.810292 LR: 0.00000433 [21:39:09] Epoch: 1 Batch: 270/38378 (0.70%) Loss: 
4.868884 LR: 0.00000433 [21:39:12] Epoch: 1 Batch: 271/38378 (0.71%) Loss: 4.447680 LR: 0.00000433 [21:39:15] Epoch: 1 Batch: 272/38378 (0.71%) Loss: 4.496168 LR: 0.00000433 [21:39:18] Epoch: 1 Batch: 273/38378 (0.71%) Loss: 4.243864 LR: 0.00000444 [21:39:21] Epoch: 1 Batch: 274/38378 (0.71%) Loss: 4.667808 LR: 0.00000444 [21:39:24] Epoch: 1 Batch: 275/38378 (0.72%) Loss: 4.152484 LR: 0.00000444 [21:39:28] Epoch: 1 Batch: 276/38378 (0.72%) Loss: 4.396418 LR: 0.00000444 [21:39:31] Epoch: 1 Batch: 277/38378 (0.72%) Loss: 4.273847 LR: 0.00000444 [21:39:34] Epoch: 1 Batch: 278/38378 (0.72%) Loss: 4.343357 LR: 0.00000444 [21:39:37] Epoch: 1 Batch: 279/38378 (0.73%) Loss: 4.230586 LR: 0.00000444 [21:39:40] Epoch: 1 Batch: 280/38378 (0.73%) Loss: 4.027423 LR: 0.00000456 [21:39:43] Epoch: 1 Batch: 281/38378 (0.73%) Loss: 4.165125 LR: 0.00000456 [21:39:46] Epoch: 1 Batch: 282/38378 (0.73%) Loss: 4.354095 LR: 0.00000456 [21:39:49] Epoch: 1 Batch: 283/38378 (0.74%) Loss: 4.425945 LR: 0.00000456 [21:39:52] Epoch: 1 Batch: 284/38378 (0.74%) Loss: 4.530136 LR: 0.00000456 [21:39:55] Epoch: 1 Batch: 285/38378 (0.74%) Loss: 4.411188 LR: 0.00000456 [21:39:58] Epoch: 1 Batch: 286/38378 (0.75%) Loss: 4.605838 LR: 0.00000456 [21:40:01] Epoch: 1 Batch: 287/38378 (0.75%) Loss: 3.829754 LR: 0.00000467 [21:40:04] Epoch: 1 Batch: 288/38378 (0.75%) Loss: 4.271561 LR: 0.00000467 [21:40:07] Epoch: 1 Batch: 289/38378 (0.75%) Loss: 4.085157 LR: 0.00000467 [21:40:10] Epoch: 1 Batch: 290/38378 (0.76%) Loss: 3.670547 LR: 0.00000467 [21:40:13] Epoch: 1 Batch: 291/38378 (0.76%) Loss: 4.142964 LR: 0.00000467 [21:40:16] Epoch: 1 Batch: 292/38378 (0.76%) Loss: 4.341801 LR: 0.00000467 [21:40:19] Epoch: 1 Batch: 293/38378 (0.76%) Loss: 4.403275 LR: 0.00000467 [21:40:22] Epoch: 1 Batch: 294/38378 (0.77%) Loss: 3.695142 LR: 0.00000478 [21:40:25] Epoch: 1 Batch: 295/38378 (0.77%) Loss: 4.039203 LR: 0.00000478 [21:40:29] Epoch: 1 Batch: 296/38378 (0.77%) Loss: 3.797617 LR: 0.00000478 [21:40:35] >> Cleaned up old temp checkpoint: epoch1_step2 [21:40:36] >> Temp checkpoint saved: epoch1_step297, size: 0.1702 GB [21:40:36] Epoch: 1 Batch: 297/38378 (0.77%) Loss: 4.257056 LR: 0.00000478 [21:40:39] Epoch: 1 Batch: 298/38378 (0.78%) Loss: 3.992099 LR: 0.00000478 [21:40:42] Epoch: 1 Batch: 299/38378 (0.78%) Loss: 4.646511 LR: 0.00000478 [21:40:45] Epoch: 1 Batch: 300/38378 (0.78%) Loss: 4.065659 LR: 0.00000478 [21:40:48] Epoch: 1 Batch: 301/38378 (0.78%) Loss: 4.328078 LR: 0.00000490 [21:40:51] Epoch: 1 Batch: 302/38378 (0.79%) Loss: 3.999807 LR: 0.00000490 [21:40:54] Epoch: 1 Batch: 303/38378 (0.79%) Loss: 3.907648 LR: 0.00000490 [21:40:57] Epoch: 1 Batch: 304/38378 (0.79%) Loss: 3.721033 LR: 0.00000490 [21:41:00] Epoch: 1 Batch: 305/38378 (0.79%) Loss: 4.031550 LR: 0.00000490 [21:41:03] Epoch: 1 Batch: 306/38378 (0.80%) Loss: 3.947484 LR: 0.00000490 [21:41:06] Epoch: 1 Batch: 307/38378 (0.80%) Loss: 3.988375 LR: 0.00000490 [21:41:09] Epoch: 1 Batch: 308/38378 (0.80%) Loss: 4.265417 LR: 0.00000501 [21:41:12] Epoch: 1 Batch: 309/38378 (0.81%) Loss: 4.065121 LR: 0.00000501 [21:41:15] Epoch: 1 Batch: 310/38378 (0.81%) Loss: 3.974044 LR: 0.00000501 [21:41:19] Epoch: 1 Batch: 311/38378 (0.81%) Loss: 3.683698 LR: 0.00000501 [21:41:22] Epoch: 1 Batch: 312/38378 (0.81%) Loss: 4.450507 LR: 0.00000501 [21:41:25] Epoch: 1 Batch: 313/38378 (0.82%) Loss: 3.729052 LR: 0.00000501 [21:41:28] Epoch: 1 Batch: 314/38378 (0.82%) Loss: 4.016059 LR: 0.00000501 [21:41:31] Epoch: 1 Batch: 315/38378 (0.82%) Loss: 4.014587 LR: 0.00000513 [21:41:34] Epoch: 1 Batch: 
316/38378 (0.82%) Loss: 3.941951 LR: 0.00000513 [21:41:37] Epoch: 1 Batch: 317/38378 (0.83%) Loss: 4.054495 LR: 0.00000513 [21:41:40] Epoch: 1 Batch: 318/38378 (0.83%) Loss: 4.012283 LR: 0.00000513 [21:41:43] Epoch: 1 Batch: 319/38378 (0.83%) Loss: 3.846924 LR: 0.00000513 [21:41:46] Epoch: 1 Batch: 320/38378 (0.83%) Loss: 3.909558 LR: 0.00000513 [21:41:49] Epoch: 1 Batch: 321/38378 (0.84%) Loss: 3.828723 LR: 0.00000513 [21:41:52] Epoch: 1 Batch: 322/38378 (0.84%) Loss: 3.451006 LR: 0.00000524 [21:41:55] Epoch: 1 Batch: 323/38378 (0.84%) Loss: 3.745984 LR: 0.00000524 [21:41:58] Epoch: 1 Batch: 324/38378 (0.84%) Loss: 3.200889 LR: 0.00000524 [21:42:01] Epoch: 1 Batch: 325/38378 (0.85%) Loss: 3.846779 LR: 0.00000524 [21:42:04] Epoch: 1 Batch: 326/38378 (0.85%) Loss: 3.732131 LR: 0.00000524 [21:42:07] Epoch: 1 Batch: 327/38378 (0.85%) Loss: 3.769197 LR: 0.00000524 [21:42:10] Epoch: 1 Batch: 328/38378 (0.85%) Loss: 3.808945 LR: 0.00000524 [21:42:13] Epoch: 1 Batch: 329/38378 (0.86%) Loss: 3.744761 LR: 0.00000535 [21:42:21] >> Cleaned up old temp checkpoint: epoch1_step3 [21:42:21] >> Temp checkpoint saved: epoch1_step330, size: 0.1702 GB [21:42:21] Epoch: 1 Batch: 330/38378 (0.86%) Loss: 3.732106 LR: 0.00000535 [21:42:24] Epoch: 1 Batch: 331/38378 (0.86%) Loss: 3.712530 LR: 0.00000535 [21:42:27] Epoch: 1 Batch: 332/38378 (0.87%) Loss: 4.052700 LR: 0.00000535 [21:42:30] Epoch: 1 Batch: 333/38378 (0.87%) Loss: 3.723883 LR: 0.00000535 [21:42:33] Epoch: 1 Batch: 334/38378 (0.87%) Loss: 3.534116 LR: 0.00000535 [21:42:36] Epoch: 1 Batch: 335/38378 (0.87%) Loss: 3.792468 LR: 0.00000535 [21:42:39] Epoch: 1 Batch: 336/38378 (0.88%) Loss: 4.055508 LR: 0.00000547 [21:42:42] Epoch: 1 Batch: 337/38378 (0.88%) Loss: 3.211589 LR: 0.00000547 [21:42:45] Epoch: 1 Batch: 338/38378 (0.88%) Loss: 4.043658 LR: 0.00000547 [21:42:48] Epoch: 1 Batch: 339/38378 (0.88%) Loss: 3.316242 LR: 0.00000547 [21:42:51] Epoch: 1 Batch: 340/38378 (0.89%) Loss: 3.747635 LR: 0.00000547 [21:42:54] Epoch: 1 Batch: 341/38378 (0.89%) Loss: 3.584497 LR: 0.00000547 [21:42:57] Epoch: 1 Batch: 342/38378 (0.89%) Loss: 3.342387 LR: 0.00000547 [21:43:01] Epoch: 1 Batch: 343/38378 (0.89%) Loss: 3.527352 LR: 0.00000558 [21:43:03] Epoch: 1 Batch: 344/38378 (0.90%) Loss: 3.861019 LR: 0.00000558 [21:43:07] Epoch: 1 Batch: 345/38378 (0.90%) Loss: 3.528163 LR: 0.00000558 [21:43:10] Epoch: 1 Batch: 346/38378 (0.90%) Loss: 3.652140 LR: 0.00000558 [21:43:13] Epoch: 1 Batch: 347/38378 (0.90%) Loss: 3.576094 LR: 0.00000558 [21:43:16] Epoch: 1 Batch: 348/38378 (0.91%) Loss: 3.796640 LR: 0.00000558 [21:43:19] Epoch: 1 Batch: 349/38378 (0.91%) Loss: 3.585316 LR: 0.00000558 [21:43:22] Epoch: 1 Batch: 350/38378 (0.91%) Loss: 3.682586 LR: 0.00000569 [21:43:25] Epoch: 1 Batch: 351/38378 (0.91%) Loss: 3.842823 LR: 0.00000569 [21:43:28] Epoch: 1 Batch: 352/38378 (0.92%) Loss: 3.466629 LR: 0.00000569 [21:43:31] Epoch: 1 Batch: 353/38378 (0.92%) Loss: 3.606604 LR: 0.00000569 [21:43:34] Epoch: 1 Batch: 354/38378 (0.92%) Loss: 3.927129 LR: 0.00000569 [21:43:37] Epoch: 1 Batch: 355/38378 (0.93%) Loss: 3.814574 LR: 0.00000569 [21:43:40] Epoch: 1 Batch: 356/38378 (0.93%) Loss: 3.530164 LR: 0.00000569 [21:43:43] Epoch: 1 Batch: 357/38378 (0.93%) Loss: 3.437786 LR: 0.00000581 [21:43:46] Epoch: 1 Batch: 358/38378 (0.93%) Loss: 3.432374 LR: 0.00000581 [21:43:49] Epoch: 1 Batch: 359/38378 (0.94%) Loss: 3.802280 LR: 0.00000581 [21:43:52] Epoch: 1 Batch: 360/38378 (0.94%) Loss: 3.583182 LR: 0.00000581 [21:43:55] Epoch: 1 Batch: 361/38378 (0.94%) Loss: 3.611546 LR: 0.00000581 
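The temp checkpoints land at a steady 0.1702 GB because only the adapter and its training state are being persisted, not the 8B base model. A back-of-envelope check, assuming fp32 adapter weights plus the two Adam moment buffers; this is an inference from the shapes and optimizer logged earlier, not something the script prints:

```python
# Rough size check for the 0.1702 GB temp checkpoints: rank-16 LoRA on
# q/k/v/o across 32 layers, fp32, plus Adam's exp_avg and exp_avg_sq.
layers, d_model, d_kv, r = 32, 4096, 1024, 16
per_layer = 2 * (r * d_model + d_model * r)   # q_proj and o_proj: A plus B
per_layer += 2 * (r * d_model + d_kv * r)     # k_proj and v_proj: A plus B
params = layers * per_layer
print(f"{params:,}")                # 13,631,488 trainable parameters
print(params * 4 * 3 / 1e9)         # ~0.164 GB for weights + two moment buffers
```

That lands within a few percent of the logged 0.1702 GB; the remainder is plausibly scheduler and RNG state plus serialization overhead. The cleanup lines also show the retention policy: with save_temp_frequency: 33, each new temp save evicts one from roughly ten saves earlier (epoch1_step363 below deletes epoch1_step33), keeping a bounded sliding window in the output directory.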
[21:43:58] Epoch: 1 Batch: 362/38378 (0.94%) Loss: 3.429801 LR: 0.00000581 [21:44:06] >> Cleaned up old temp checkpoint: epoch1_step33 [21:44:06] >> Temp checkpoint saved: epoch1_step363, size: 0.1702 GB [21:44:06] Epoch: 1 Batch: 363/38378 (0.95%) Loss: 3.168781 LR: 0.00000581 [21:44:09] Epoch: 1 Batch: 364/38378 (0.95%) Loss: 3.481647 LR: 0.00000592 [21:44:12] Epoch: 1 Batch: 365/38378 (0.95%) Loss: 3.903280 LR: 0.00000592 [21:44:15] Epoch: 1 Batch: 366/38378 (0.95%) Loss: 3.377783 LR: 0.00000592 [21:44:18] Epoch: 1 Batch: 367/38378 (0.96%) Loss: 3.551972 LR: 0.00000592 [21:44:21] Epoch: 1 Batch: 368/38378 (0.96%) Loss: 3.028324 LR: 0.00000592 [21:44:24] Epoch: 1 Batch: 369/38378 (0.96%) Loss: 3.473576 LR: 0.00000592 [21:44:27] Epoch: 1 Batch: 370/38378 (0.96%) Loss: 4.017837 LR: 0.00000592 [21:44:30] Epoch: 1 Batch: 371/38378 (0.97%) Loss: 3.462052 LR: 0.00000604 [21:44:33] Epoch: 1 Batch: 372/38378 (0.97%) Loss: 3.527038 LR: 0.00000604 [21:44:36] Epoch: 1 Batch: 373/38378 (0.97%) Loss: 3.169239 LR: 0.00000604 [21:44:39] Epoch: 1 Batch: 374/38378 (0.97%) Loss: 3.080874 LR: 0.00000604 [21:44:42] Epoch: 1 Batch: 375/38378 (0.98%) Loss: 3.322771 LR: 0.00000604 [21:44:45] Epoch: 1 Batch: 376/38378 (0.98%) Loss: 3.360427 LR: 0.00000604 [21:44:49] Epoch: 1 Batch: 377/38378 (0.98%) Loss: 3.280625 LR: 0.00000604 [21:44:52] Epoch: 1 Batch: 378/38378 (0.98%) Loss: 3.255173 LR: 0.00000615 [21:44:55] Epoch: 1 Batch: 379/38378 (0.99%) Loss: 3.486617 LR: 0.00000615 [21:44:58] Epoch: 1 Batch: 380/38378 (0.99%) Loss: 3.389522 LR: 0.00000615 [21:45:01] Epoch: 1 Batch: 381/38378 (0.99%) Loss: 3.415988 LR: 0.00000615 [21:45:04] Epoch: 1 Batch: 382/38378 (1.00%) Loss: 3.707755 LR: 0.00000615 [21:45:07] Epoch: 1 Batch: 383/38378 (1.00%) Loss: 3.079115 LR: 0.00000615 [21:45:10] Epoch: 1 Batch: 384/38378 (1.00%) Loss: 3.753197 LR: 0.00000615 [21:45:13] Epoch: 1 Batch: 385/38378 (1.00%) Loss: 3.507763 LR: 0.00000626 [21:45:16] Epoch: 1 Batch: 386/38378 (1.01%) Loss: 3.624879 LR: 0.00000626 [21:45:19] Epoch: 1 Batch: 387/38378 (1.01%) Loss: 3.366000 LR: 0.00000626 [21:45:22] Epoch: 1 Batch: 388/38378 (1.01%) Loss: 3.435059 LR: 0.00000626 [21:45:25] Epoch: 1 Batch: 389/38378 (1.01%) Loss: 3.292537 LR: 0.00000626 [21:45:28] Epoch: 1 Batch: 390/38378 (1.02%) Loss: 3.469031 LR: 0.00000626 [21:45:32] Epoch: 1 Batch: 391/38378 (1.02%) Loss: 3.372516 LR: 0.00000626 [21:45:35] Epoch: 1 Batch: 392/38378 (1.02%) Loss: 3.322258 LR: 0.00000638 [21:45:38] Epoch: 1 Batch: 393/38378 (1.02%) Loss: 3.061369 LR: 0.00000638 [21:45:41] Epoch: 1 Batch: 394/38378 (1.03%) Loss: 3.208388 LR: 0.00000638 [21:45:44] Epoch: 1 Batch: 395/38378 (1.03%) Loss: 3.393885 LR: 0.00000638 [21:45:51] >> Cleaned up old temp checkpoint: epoch1_step66 [21:45:51] >> Temp checkpoint saved: epoch1_step396, size: 0.1702 GB [21:45:51] Epoch: 1 Batch: 396/38378 (1.03%) Loss: 3.099539 LR: 0.00000638 [21:45:54] Epoch: 1 Batch: 397/38378 (1.03%) Loss: 3.461243 LR: 0.00000638 [21:45:57] Epoch: 1 Batch: 398/38378 (1.04%) Loss: 3.263430 LR: 0.00000638 [21:46:00] Epoch: 1 Batch: 399/38378 (1.04%) Loss: 3.426262 LR: 0.00000649 [21:46:03] Epoch: 1 Batch: 400/38378 (1.04%) Loss: 3.459949 LR: 0.00000649 [21:46:06] Epoch: 1 Batch: 401/38378 (1.04%) Loss: 3.252793 LR: 0.00000649 [21:46:09] Epoch: 1 Batch: 402/38378 (1.05%) Loss: 3.168297 LR: 0.00000649 [21:46:12] Epoch: 1 Batch: 403/38378 (1.05%) Loss: 3.251612 LR: 0.00000649 [21:46:15] Epoch: 1 Batch: 404/38378 (1.05%) Loss: 3.385485 LR: 0.00000649 [21:46:19] Epoch: 1 Batch: 405/38378 (1.06%) Loss: 3.636062 LR: 
0.00000649 [21:46:22] Epoch: 1 Batch: 406/38378 (1.06%) Loss: 3.351034 LR: 0.00000661 [21:46:25] Epoch: 1 Batch: 407/38378 (1.06%) Loss: 3.379482 LR: 0.00000661 [21:46:28] Epoch: 1 Batch: 408/38378 (1.06%) Loss: 2.966108 LR: 0.00000661 [21:46:31] Epoch: 1 Batch: 409/38378 (1.07%) Loss: 3.110812 LR: 0.00000661 [21:46:34] Epoch: 1 Batch: 410/38378 (1.07%) Loss: 3.063781 LR: 0.00000661 [21:46:37] Epoch: 1 Batch: 411/38378 (1.07%) Loss: 3.020690 LR: 0.00000661 [21:46:40] Epoch: 1 Batch: 412/38378 (1.07%) Loss: 3.116003 LR: 0.00000661 [21:46:43] Epoch: 1 Batch: 413/38378 (1.08%) Loss: 3.735662 LR: 0.00000672 [21:46:46] Epoch: 1 Batch: 414/38378 (1.08%) Loss: 3.139761 LR: 0.00000672 [21:46:49] Epoch: 1 Batch: 415/38378 (1.08%) Loss: 3.033384 LR: 0.00000672 [21:46:52] Epoch: 1 Batch: 416/38378 (1.08%) Loss: 2.991915 LR: 0.00000672 [21:46:55] Epoch: 1 Batch: 417/38378 (1.09%) Loss: 3.381431 LR: 0.00000672 [21:46:58] Epoch: 1 Batch: 418/38378 (1.09%) Loss: 2.827977 LR: 0.00000672 [21:47:01] Epoch: 1 Batch: 419/38378 (1.09%) Loss: 3.306009 LR: 0.00000672 [21:47:04] Epoch: 1 Batch: 420/38378 (1.09%) Loss: 3.400380 LR: 0.00000683 [21:47:07] Epoch: 1 Batch: 421/38378 (1.10%) Loss: 3.126677 LR: 0.00000683 [21:47:10] Epoch: 1 Batch: 422/38378 (1.10%) Loss: 3.145681 LR: 0.00000683 [21:47:14] Epoch: 1 Batch: 423/38378 (1.10%) Loss: 2.986626 LR: 0.00000683 [21:47:17] Epoch: 1 Batch: 424/38378 (1.10%) Loss: 2.634757 LR: 0.00000683 [21:47:20] Epoch: 1 Batch: 425/38378 (1.11%) Loss: 3.368925 LR: 0.00000683 [21:47:23] Epoch: 1 Batch: 426/38378 (1.11%) Loss: 3.076137 LR: 0.00000683 [21:47:26] Epoch: 1 Batch: 427/38378 (1.11%) Loss: 3.144976 LR: 0.00000695 [21:47:29] Epoch: 1 Batch: 428/38378 (1.12%) Loss: 2.868480 LR: 0.00000695 [21:47:36] >> Cleaned up old temp checkpoint: epoch1_step99 [21:47:36] >> Temp checkpoint saved: epoch1_step429, size: 0.1702 GB [21:47:36] Epoch: 1 Batch: 429/38378 (1.12%) Loss: 3.094857 LR: 0.00000695 [21:47:39] Epoch: 1 Batch: 430/38378 (1.12%) Loss: 3.167593 LR: 0.00000695 [21:47:42] Epoch: 1 Batch: 431/38378 (1.12%) Loss: 3.098971 LR: 0.00000695 [21:47:45] Epoch: 1 Batch: 432/38378 (1.13%) Loss: 3.132343 LR: 0.00000695 [21:47:48] Epoch: 1 Batch: 433/38378 (1.13%) Loss: 3.108425 LR: 0.00000695 [21:47:51] Epoch: 1 Batch: 434/38378 (1.13%) Loss: 2.903331 LR: 0.00000706 [21:47:54] Epoch: 1 Batch: 435/38378 (1.13%) Loss: 3.311420 LR: 0.00000706 [21:47:57] Epoch: 1 Batch: 436/38378 (1.14%) Loss: 2.897499 LR: 0.00000706 [21:48:00] Epoch: 1 Batch: 437/38378 (1.14%) Loss: 2.680161 LR: 0.00000706 [21:48:03] Epoch: 1 Batch: 438/38378 (1.14%) Loss: 2.853214 LR: 0.00000706 [21:48:06] Epoch: 1 Batch: 439/38378 (1.14%) Loss: 3.250393 LR: 0.00000706 [21:48:09] Epoch: 1 Batch: 440/38378 (1.15%) Loss: 3.058198 LR: 0.00000706 [21:48:12] Epoch: 1 Batch: 441/38378 (1.15%) Loss: 2.783056 LR: 0.00000718 [21:48:15] Epoch: 1 Batch: 442/38378 (1.15%) Loss: 2.948597 LR: 0.00000718 [21:48:18] Epoch: 1 Batch: 443/38378 (1.15%) Loss: 3.018823 LR: 0.00000718 [21:48:21] Epoch: 1 Batch: 444/38378 (1.16%) Loss: 2.845767 LR: 0.00000718 [21:48:25] Epoch: 1 Batch: 445/38378 (1.16%) Loss: 3.124448 LR: 0.00000718 [21:48:28] Epoch: 1 Batch: 446/38378 (1.16%) Loss: 2.991099 LR: 0.00000718 [21:48:31] Epoch: 1 Batch: 447/38378 (1.16%) Loss: 3.321373 LR: 0.00000718 [21:48:34] Epoch: 1 Batch: 448/38378 (1.17%) Loss: 2.792124 LR: 0.00000729 [21:48:37] Epoch: 1 Batch: 449/38378 (1.17%) Loss: 3.410154 LR: 0.00000729 [21:48:40] Epoch: 1 Batch: 450/38378 (1.17%) Loss: 3.096085 LR: 0.00000729 [21:48:43] Epoch: 1 Batch: 451/38378 
(1.18%) Loss: 3.012783 LR: 0.00000729 [21:48:46] Epoch: 1 Batch: 452/38378 (1.18%) Loss: 3.086165 LR: 0.00000729 [21:48:49] Epoch: 1 Batch: 453/38378 (1.18%) Loss: 3.056878 LR: 0.00000729 [21:48:52] Epoch: 1 Batch: 454/38378 (1.18%) Loss: 2.842410 LR: 0.00000729 [21:48:55] Epoch: 1 Batch: 455/38378 (1.19%) Loss: 2.850691 LR: 0.00000740 [21:48:58] Epoch: 1 Batch: 456/38378 (1.19%) Loss: 2.372428 LR: 0.00000740 [21:49:01] Epoch: 1 Batch: 457/38378 (1.19%) Loss: 3.031664 LR: 0.00000740 [21:49:04] Epoch: 1 Batch: 458/38378 (1.19%) Loss: 2.613009 LR: 0.00000740 [21:49:07] Epoch: 1 Batch: 459/38378 (1.20%) Loss: 2.556346 LR: 0.00000740 [21:49:10] Epoch: 1 Batch: 460/38378 (1.20%) Loss: 2.717404 LR: 0.00000740 [21:49:13] Epoch: 1 Batch: 461/38378 (1.20%) Loss: 2.937742 LR: 0.00000740 [21:49:21] >> Cleaned up old temp checkpoint: epoch1_step132 [21:49:21] >> Temp checkpoint saved: epoch1_step462, size: 0.1702 GB [21:49:21] Epoch: 1 Batch: 462/38378 (1.20%) Loss: 3.026179 LR: 0.00000752 [21:49:24] Epoch: 1 Batch: 463/38378 (1.21%) Loss: 3.165942 LR: 0.00000752 [21:49:27] Epoch: 1 Batch: 464/38378 (1.21%) Loss: 2.922586 LR: 0.00000752 [21:49:30] Epoch: 1 Batch: 465/38378 (1.21%) Loss: 2.836621 LR: 0.00000752 [21:49:33] Epoch: 1 Batch: 466/38378 (1.21%) Loss: 2.762107 LR: 0.00000752 [21:49:36] Epoch: 1 Batch: 467/38378 (1.22%) Loss: 2.554874 LR: 0.00000752 [21:49:39] Epoch: 1 Batch: 468/38378 (1.22%) Loss: 2.955701 LR: 0.00000752 [21:49:42] Epoch: 1 Batch: 469/38378 (1.22%) Loss: 3.025367 LR: 0.00000763 [21:49:45] Epoch: 1 Batch: 470/38378 (1.22%) Loss: 2.490263 LR: 0.00000763 [21:49:48] Epoch: 1 Batch: 471/38378 (1.23%) Loss: 2.830211 LR: 0.00000763 [21:49:51] Epoch: 1 Batch: 472/38378 (1.23%) Loss: 2.773396 LR: 0.00000763 [21:49:54] Epoch: 1 Batch: 473/38378 (1.23%) Loss: 2.870567 LR: 0.00000763 [21:49:57] Epoch: 1 Batch: 474/38378 (1.24%) Loss: 2.836768 LR: 0.00000763 [21:50:01] Epoch: 1 Batch: 475/38378 (1.24%) Loss: 2.862435 LR: 0.00000763 [21:50:04] Epoch: 1 Batch: 476/38378 (1.24%) Loss: 2.962499 LR: 0.00000774 [21:50:07] Epoch: 1 Batch: 477/38378 (1.24%) Loss: 2.836619 LR: 0.00000774 [21:50:10] Epoch: 1 Batch: 478/38378 (1.25%) Loss: 2.496119 LR: 0.00000774 [21:50:13] Epoch: 1 Batch: 479/38378 (1.25%) Loss: 2.678739 LR: 0.00000774 [21:50:16] Epoch: 1 Batch: 480/38378 (1.25%) Loss: 2.962086 LR: 0.00000774 [21:50:19] Epoch: 1 Batch: 481/38378 (1.25%) Loss: 2.915701 LR: 0.00000774 [21:50:22] Epoch: 1 Batch: 482/38378 (1.26%) Loss: 3.095269 LR: 0.00000774 [21:50:25] Epoch: 1 Batch: 483/38378 (1.26%) Loss: 2.713267 LR: 0.00000786 [21:50:28] Epoch: 1 Batch: 484/38378 (1.26%) Loss: 2.792262 LR: 0.00000786 [21:50:31] Epoch: 1 Batch: 485/38378 (1.26%) Loss: 2.938969 LR: 0.00000786 [21:50:34] Epoch: 1 Batch: 486/38378 (1.27%) Loss: 2.848706 LR: 0.00000786 [21:50:37] Epoch: 1 Batch: 487/38378 (1.27%) Loss: 2.926342 LR: 0.00000786 [21:50:40] Epoch: 1 Batch: 488/38378 (1.27%) Loss: 2.661978 LR: 0.00000786 [21:50:43] Epoch: 1 Batch: 489/38378 (1.27%) Loss: 3.267420 LR: 0.00000786 [21:50:46] Epoch: 1 Batch: 490/38378 (1.28%) Loss: 2.802702 LR: 0.00000797 [21:50:49] Epoch: 1 Batch: 491/38378 (1.28%) Loss: 2.648819 LR: 0.00000797 [21:50:53] Epoch: 1 Batch: 492/38378 (1.28%) Loss: 3.098214 LR: 0.00000797 [21:50:56] Epoch: 1 Batch: 493/38378 (1.28%) Loss: 2.687705 LR: 0.00000797 [21:50:59] Epoch: 1 Batch: 494/38378 (1.29%) Loss: 2.898034 LR: 0.00000797 [21:51:06] >> Cleaned up old temp checkpoint: epoch1_step165 [21:51:06] >> Temp checkpoint saved: epoch1_step495, size: 0.1702 GB [21:51:06] Epoch: 1 Batch: 
495/38378 (1.29%) Loss: 2.745142 LR: 0.00000797 [21:51:09] Epoch: 1 Batch: 496/38378 (1.29%) Loss: 2.624750 LR: 0.00000797 [21:51:12] Epoch: 1 Batch: 497/38378 (1.30%) Loss: 2.819097 LR: 0.00000809 [21:51:15] Epoch: 1 Batch: 498/38378 (1.30%) Loss: 2.899592 LR: 0.00000809 [21:51:18] Epoch: 1 Batch: 499/38378 (1.30%) Loss: 2.751285 LR: 0.00000809 [21:51:21] >> Evaluating batch 0 [21:51:22] >> Evaluating batch 1 [21:51:23] >> Evaluating batch 2 [21:51:25] >> Evaluating batch 3 [21:51:26] >> Evaluating batch 4 [21:51:27] >> Evaluating batch 5 [21:51:28] >> Evaluating batch 6 [21:51:30] >> Evaluating batch 7 [21:51:31] >> Evaluating batch 8 [21:51:32] >> Evaluating batch 9 [21:51:33] >> Evaluating batch 10 [21:51:34] >> Evaluating batch 11 [21:51:36] >> Evaluating batch 12 [21:51:37] >> Evaluating batch 13 [21:51:38] >> Evaluating batch 14 [21:51:39] >> Evaluating batch 15 [21:51:41] >> Evaluating batch 16 [21:51:42] Epoch: 1 Step: 500/38378 Evaluation: [21:51:42] Avg Loss Since Last Eval: 4.7612 Val Loss: 2.9254 Validation loss delta: 2.9254 Perplexity: 18.6425 LR: 0.00000809 [21:51:46] >> Checkpoint saved: epoch1_step500, size: 0.1702 GB [21:51:46] Epoch: 1 Batch: 500/38378 (1.30%) Loss: 2.620382 LR: 0.00000809 [21:51:49] Epoch: 1 Batch: 501/38378 (1.31%) Loss: 2.935156 LR: 0.00000809 [21:51:52] Epoch: 1 Batch: 502/38378 (1.31%) Loss: 2.799627 LR: 0.00000809 [21:51:55] Epoch: 1 Batch: 503/38378 (1.31%) Loss: 2.903092 LR: 0.00000809 [21:51:58] Epoch: 1 Batch: 504/38378 (1.31%) Loss: 2.932722 LR: 0.00000820 [21:52:01] Epoch: 1 Batch: 505/38378 (1.32%) Loss: 2.478773 LR: 0.00000820 [21:52:04] Epoch: 1 Batch: 506/38378 (1.32%) Loss: 2.941505 LR: 0.00000820 [21:52:08] Epoch: 1 Batch: 507/38378 (1.32%) Loss: 2.551590 LR: 0.00000820 [21:52:11] Epoch: 1 Batch: 508/38378 (1.32%) Loss: 2.517526 LR: 0.00000820 [21:52:14] Epoch: 1 Batch: 509/38378 (1.33%) Loss: 2.681382 LR: 0.00000820 [21:52:17] Epoch: 1 Batch: 510/38378 (1.33%) Loss: 3.006157 LR: 0.00000820 [21:52:20] Epoch: 1 Batch: 511/38378 (1.33%) Loss: 2.948110 LR: 0.00000831 [21:52:23] Epoch: 1 Batch: 512/38378 (1.33%) Loss: 2.682347 LR: 0.00000831 [21:52:26] Epoch: 1 Batch: 513/38378 (1.34%) Loss: 2.862048 LR: 0.00000831 [21:52:29] Epoch: 1 Batch: 514/38378 (1.34%) Loss: 2.682139 LR: 0.00000831 [21:52:32] Epoch: 1 Batch: 515/38378 (1.34%) Loss: 3.006808 LR: 0.00000831 [21:52:35] Epoch: 1 Batch: 516/38378 (1.34%) Loss: 2.851148 LR: 0.00000831 [21:52:38] Epoch: 1 Batch: 517/38378 (1.35%) Loss: 2.578413 LR: 0.00000831 [21:52:41] Epoch: 1 Batch: 518/38378 (1.35%) Loss: 2.692137 LR: 0.00000843 [21:52:44] Epoch: 1 Batch: 519/38378 (1.35%) Loss: 2.511456 LR: 0.00000843 [21:52:47] Epoch: 1 Batch: 520/38378 (1.35%) Loss: 2.758758 LR: 0.00000843 [21:52:50] Epoch: 1 Batch: 521/38378 (1.36%) Loss: 2.601042 LR: 0.00000843 [21:52:53] Epoch: 1 Batch: 522/38378 (1.36%) Loss: 2.557038 LR: 0.00000843 [21:52:56] Epoch: 1 Batch: 523/38378 (1.36%) Loss: 2.671759 LR: 0.00000843 [21:52:59] Epoch: 1 Batch: 524/38378 (1.37%) Loss: 3.061105 LR: 0.00000843 [21:53:02] Epoch: 1 Batch: 525/38378 (1.37%) Loss: 2.496414 LR: 0.00000854 [21:53:06] Epoch: 1 Batch: 526/38378 (1.37%) Loss: 2.226687 LR: 0.00000854 [21:53:09] Epoch: 1 Batch: 527/38378 (1.37%) Loss: 2.695514 LR: 0.00000854 [21:53:16] >> Cleaned up old temp checkpoint: epoch1_step198 [21:53:16] >> Temp checkpoint saved: epoch1_step528, size: 0.1702 GB [21:53:16] Epoch: 1 Batch: 528/38378 (1.38%) Loss: 2.470053 LR: 0.00000854 [21:53:19] Epoch: 1 Batch: 529/38378 (1.38%) Loss: 2.810913 LR: 0.00000854
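The evaluation block makes the perplexity bookkeeping explicit: perplexity is the exponential of the validation cross-entropy, exp(2.9254) ≈ 18.64, matching the logged 18.6425 up to rounding, and on this first eval the "Validation loss delta" simply equals the val loss because there is no previous value to difference against. "Avg Loss Since Last Eval" averages the training losses of the 500 batches since the run began. A minimal sketch of that summary, with illustrative names:

```python
import math

def eval_summary(train_losses, val_loss, prev_val_loss=None):
    # Perplexity = exp(mean cross-entropy); on the first eval the delta
    # is the val loss itself, exactly as the log above shows.
    return {
        "avg_train_loss": sum(train_losses) / len(train_losses),
        "val_loss": val_loss,
        "val_delta": val_loss - (prev_val_loss if prev_val_loss is not None else 0.0),
        "perplexity": math.exp(val_loss),
    }

print(eval_summary([4.7612], 2.9254))  # perplexity ~18.64, delta 2.9254
```

[21:53:22]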
Epoch: 1 Batch: 530/38378 (1.38%) Loss: 2.733479 LR: 0.00000854 [21:53:25] Epoch: 1 Batch: 531/38378 (1.38%) Loss: 2.778696 LR: 0.00000854 [21:53:28] Epoch: 1 Batch: 532/38378 (1.39%) Loss: 2.558206 LR: 0.00000866 [21:53:31] Epoch: 1 Batch: 533/38378 (1.39%) Loss: 2.542966 LR: 0.00000866 [21:53:34] Epoch: 1 Batch: 534/38378 (1.39%) Loss: 2.706975 LR: 0.00000866 [21:53:37] Epoch: 1 Batch: 535/38378 (1.39%) Loss: 2.854583 LR: 0.00000866 [21:53:40] Epoch: 1 Batch: 536/38378 (1.40%) Loss: 2.547437 LR: 0.00000866 [21:53:43] Epoch: 1 Batch: 537/38378 (1.40%) Loss: 2.722793 LR: 0.00000866 [21:53:46] Epoch: 1 Batch: 538/38378 (1.40%) Loss: 2.622955 LR: 0.00000866 [21:53:49] Epoch: 1 Batch: 539/38378 (1.40%) Loss: 2.483658 LR: 0.00000877 [21:53:52] Epoch: 1 Batch: 540/38378 (1.41%) Loss: 2.628791 LR: 0.00000877 [21:53:55] Epoch: 1 Batch: 541/38378 (1.41%) Loss: 2.407417 LR: 0.00000877 [21:53:58] Epoch: 1 Batch: 542/38378 (1.41%) Loss: 2.445017 LR: 0.00000877 [21:54:01] Epoch: 1 Batch: 543/38378 (1.41%) Loss: 2.576764 LR: 0.00000877 [21:54:04] Epoch: 1 Batch: 544/38378 (1.42%) Loss: 2.673256 LR: 0.00000877 [21:54:08] Epoch: 1 Batch: 545/38378 (1.42%) Loss: 2.580182 LR: 0.00000877 [21:54:11] Epoch: 1 Batch: 546/38378 (1.42%) Loss: 2.524456 LR: 0.00000888 [21:54:14] Epoch: 1 Batch: 547/38378 (1.43%) Loss: 3.038174 LR: 0.00000888 [21:54:17] Epoch: 1 Batch: 548/38378 (1.43%) Loss: 2.625224 LR: 0.00000888 [21:54:20] Epoch: 1 Batch: 549/38378 (1.43%) Loss: 2.351764 LR: 0.00000888 [21:54:23] Epoch: 1 Batch: 550/38378 (1.43%) Loss: 2.710614 LR: 0.00000888 [21:54:26] Epoch: 1 Batch: 551/38378 (1.44%) Loss: 2.585419 LR: 0.00000888 [21:54:29] Epoch: 1 Batch: 552/38378 (1.44%) Loss: 2.678363 LR: 0.00000888 [21:54:32] Epoch: 1 Batch: 553/38378 (1.44%) Loss: 2.664387 LR: 0.00000900 [21:54:35] Epoch: 1 Batch: 554/38378 (1.44%) Loss: 2.810636 LR: 0.00000900 [21:54:38] Epoch: 1 Batch: 555/38378 (1.45%) Loss: 2.368165 LR: 0.00000900 [21:54:41] Epoch: 1 Batch: 556/38378 (1.45%) Loss: 2.916900 LR: 0.00000900 [21:54:44] Epoch: 1 Batch: 557/38378 (1.45%) Loss: 2.430010 LR: 0.00000900 [21:54:47] Epoch: 1 Batch: 558/38378 (1.45%) Loss: 2.887228 LR: 0.00000900 [21:54:50] Epoch: 1 Batch: 559/38378 (1.46%) Loss: 2.608093 LR: 0.00000900 [21:54:53] Epoch: 1 Batch: 560/38378 (1.46%) Loss: 2.597916 LR: 0.00000911 [21:55:01] >> Cleaned up old temp checkpoint: epoch1_step231 [21:55:01] >> Temp checkpoint saved: epoch1_step561, size: 0.1702 GB [21:55:01] Epoch: 1 Batch: 561/38378 (1.46%) Loss: 2.285917 LR: 0.00000911 [21:55:04] Epoch: 1 Batch: 562/38378 (1.46%) Loss: 2.575504 LR: 0.00000911 [21:55:07] Epoch: 1 Batch: 563/38378 (1.47%) Loss: 2.621603 LR: 0.00000911 [21:55:10] Epoch: 1 Batch: 564/38378 (1.47%) Loss: 2.214562 LR: 0.00000911 [21:55:13] Epoch: 1 Batch: 565/38378 (1.47%) Loss: 2.666754 LR: 0.00000911 [21:55:16] Epoch: 1 Batch: 566/38378 (1.47%) Loss: 2.323968 LR: 0.00000911 [21:55:19] Epoch: 1 Batch: 567/38378 (1.48%) Loss: 2.874478 LR: 0.00000923 [21:55:22] Epoch: 1 Batch: 568/38378 (1.48%) Loss: 2.196221 LR: 0.00000923 [21:55:25] Epoch: 1 Batch: 569/38378 (1.48%) Loss: 2.467180 LR: 0.00000923 [21:55:28] Epoch: 1 Batch: 570/38378 (1.49%) Loss: 2.696464 LR: 0.00000923 [21:55:31] Epoch: 1 Batch: 571/38378 (1.49%) Loss: 2.875287 LR: 0.00000923 [21:55:34] Epoch: 1 Batch: 572/38378 (1.49%) Loss: 2.516870 LR: 0.00000923 [21:55:38] Epoch: 1 Batch: 573/38378 (1.49%) Loss: 2.670900 LR: 0.00000923 [21:55:41] Epoch: 1 Batch: 574/38378 (1.50%) Loss: 2.472196 LR: 0.00000934 [21:55:44] Epoch: 1 Batch: 575/38378 (1.50%) Loss: 2.419791 
LR: 0.00000934 [21:55:47] Epoch: 1 Batch: 576/38378 (1.50%) Loss: 2.565501 LR: 0.00000934 [21:55:50] Epoch: 1 Batch: 577/38378 (1.50%) Loss: 2.160207 LR: 0.00000934 [21:55:53] Epoch: 1 Batch: 578/38378 (1.51%) Loss: 2.589928 LR: 0.00000934 [21:55:56] Epoch: 1 Batch: 579/38378 (1.51%) Loss: 2.320048 LR: 0.00000934 [21:55:59] Epoch: 1 Batch: 580/38378 (1.51%) Loss: 2.580634 LR: 0.00000934 [21:56:02] Epoch: 1 Batch: 581/38378 (1.51%) Loss: 2.462677 LR: 0.00000945 [21:56:05] Epoch: 1 Batch: 582/38378 (1.52%) Loss: 2.725107 LR: 0.00000945 [21:56:08] Epoch: 1 Batch: 583/38378 (1.52%) Loss: 2.783955 LR: 0.00000945 [21:56:11] Epoch: 1 Batch: 584/38378 (1.52%) Loss: 2.508150 LR: 0.00000945 [21:56:14] Epoch: 1 Batch: 585/38378 (1.52%) Loss: 2.693967 LR: 0.00000945 [21:56:17] Epoch: 1 Batch: 586/38378 (1.53%) Loss: 2.383064 LR: 0.00000945 [21:56:20] Epoch: 1 Batch: 587/38378 (1.53%) Loss: 2.592644 LR: 0.00000945 [21:56:24] Epoch: 1 Batch: 588/38378 (1.53%) Loss: 2.646351 LR: 0.00000957 [21:56:27] Epoch: 1 Batch: 589/38378 (1.53%) Loss: 2.338320 LR: 0.00000957 [21:56:30] Epoch: 1 Batch: 590/38378 (1.54%) Loss: 2.291920 LR: 0.00000957 [21:56:33] Epoch: 1 Batch: 591/38378 (1.54%) Loss: 2.521715 LR: 0.00000957 [21:56:36] Epoch: 1 Batch: 592/38378 (1.54%) Loss: 2.673134 LR: 0.00000957 [21:56:39] Epoch: 1 Batch: 593/38378 (1.55%) Loss: 2.800904 LR: 0.00000957 [21:56:46] >> Cleaned up old temp checkpoint: epoch1_step264 [21:56:46] >> Temp checkpoint saved: epoch1_step594, size: 0.1702 GB [21:56:46] Epoch: 1 Batch: 594/38378 (1.55%) Loss: 2.554381 LR: 0.00000957 [21:56:49] Epoch: 1 Batch: 595/38378 (1.55%) Loss: 2.535363 LR: 0.00000968 [21:56:52] Epoch: 1 Batch: 596/38378 (1.55%) Loss: 2.592125 LR: 0.00000968 [21:56:55] Epoch: 1 Batch: 597/38378 (1.56%) Loss: 2.500444 LR: 0.00000968 [21:56:58] Epoch: 1 Batch: 598/38378 (1.56%) Loss: 2.762440 LR: 0.00000968 [21:57:01] Epoch: 1 Batch: 599/38378 (1.56%) Loss: 2.464513 LR: 0.00000968 [21:57:04] Epoch: 1 Batch: 600/38378 (1.56%) Loss: 2.601847 LR: 0.00000968 [21:57:07] Epoch: 1 Batch: 601/38378 (1.57%) Loss: 2.375511 LR: 0.00000968 [21:57:10] Epoch: 1 Batch: 602/38378 (1.57%) Loss: 2.419599 LR: 0.00000979 [21:57:13] Epoch: 1 Batch: 603/38378 (1.57%) Loss: 2.404740 LR: 0.00000979 [21:57:16] Epoch: 1 Batch: 604/38378 (1.57%) Loss: 2.349013 LR: 0.00000979 [21:57:20] Epoch: 1 Batch: 605/38378 (1.58%) Loss: 2.680650 LR: 0.00000979 [21:57:23] Epoch: 1 Batch: 606/38378 (1.58%) Loss: 2.429367 LR: 0.00000979 [21:57:26] Epoch: 1 Batch: 607/38378 (1.58%) Loss: 2.789640 LR: 0.00000979 [21:57:29] Epoch: 1 Batch: 608/38378 (1.58%) Loss: 2.635769 LR: 0.00000979 [21:57:32] Epoch: 1 Batch: 609/38378 (1.59%) Loss: 2.563586 LR: 0.00000991 [21:57:35] Epoch: 1 Batch: 610/38378 (1.59%) Loss: 2.313303 LR: 0.00000991 [21:57:38] Epoch: 1 Batch: 611/38378 (1.59%) Loss: 2.483899 LR: 0.00000991 [21:57:41] Epoch: 1 Batch: 612/38378 (1.59%) Loss: 2.066755 LR: 0.00000991 [21:57:44] Epoch: 1 Batch: 613/38378 (1.60%) Loss: 2.451644 LR: 0.00000991 [21:57:47] Epoch: 1 Batch: 614/38378 (1.60%) Loss: 2.047642 LR: 0.00000991 [21:57:50] Epoch: 1 Batch: 615/38378 (1.60%) Loss: 2.949406 LR: 0.00000991 [21:57:53] Epoch: 1 Batch: 616/38378 (1.61%) Loss: 2.780540 LR: 0.00001002 [21:57:56] Epoch: 1 Batch: 617/38378 (1.61%) Loss: 2.729009 LR: 0.00001002 [21:57:59] Epoch: 1 Batch: 618/38378 (1.61%) Loss: 2.591674 LR: 0.00001002 [21:58:02] Epoch: 1 Batch: 619/38378 (1.61%) Loss: 2.438821 LR: 0.00001002 [21:58:05] Epoch: 1 Batch: 620/38378 (1.62%) Loss: 2.234570 LR: 0.00001002 [21:58:08] Epoch: 1 Batch: 
621/38378 (1.62%) Loss: 2.439206 LR: 0.00001002 [21:58:12] Epoch: 1 Batch: 622/38378 (1.62%) Loss: 2.420854 LR: 0.00001002 [21:58:15] Epoch: 1 Batch: 623/38378 (1.62%) Loss: 2.538013 LR: 0.00001014 [21:58:18] Epoch: 1 Batch: 624/38378 (1.63%) Loss: 2.297755 LR: 0.00001014 [21:58:21] Epoch: 1 Batch: 625/38378 (1.63%) Loss: 2.225003 LR: 0.00001014 [21:58:24] Epoch: 1 Batch: 626/38378 (1.63%) Loss: 2.200465 LR: 0.00001014 [21:58:31] >> Cleaned up old temp checkpoint: epoch1_step297 [21:58:31] >> Temp checkpoint saved: epoch1_step627, size: 0.1702 GB [21:58:31] Epoch: 1 Batch: 627/38378 (1.63%) Loss: 2.436458 LR: 0.00001014 [21:58:34] Epoch: 1 Batch: 628/38378 (1.64%) Loss: 2.806550 LR: 0.00001014 [21:58:37] Epoch: 1 Batch: 629/38378 (1.64%) Loss: 2.597137 LR: 0.00001014 [21:58:40] Epoch: 1 Batch: 630/38378 (1.64%) Loss: 2.420276 LR: 0.00001025 [21:58:43] Epoch: 1 Batch: 631/38378 (1.64%) Loss: 2.272601 LR: 0.00001025 [21:58:46] Epoch: 1 Batch: 632/38378 (1.65%) Loss: 2.359733 LR: 0.00001025 [21:58:50] Epoch: 1 Batch: 633/38378 (1.65%) Loss: 2.414894 LR: 0.00001025 [21:58:53] Epoch: 1 Batch: 634/38378 (1.65%) Loss: 2.479996 LR: 0.00001025 [21:58:56] Epoch: 1 Batch: 635/38378 (1.65%) Loss: 2.333269 LR: 0.00001025 [21:58:59] Epoch: 1 Batch: 636/38378 (1.66%) Loss: 2.301518 LR: 0.00001025 [21:59:02] Epoch: 1 Batch: 637/38378 (1.66%) Loss: 2.633277 LR: 0.00001036 [21:59:05] Epoch: 1 Batch: 638/38378 (1.66%) Loss: 2.417414 LR: 0.00001036 [21:59:08] Epoch: 1 Batch: 639/38378 (1.67%) Loss: 2.694861 LR: 0.00001036 [21:59:11] Epoch: 1 Batch: 640/38378 (1.67%) Loss: 2.535944 LR: 0.00001036 [21:59:14] Epoch: 1 Batch: 641/38378 (1.67%) Loss: 2.343565 LR: 0.00001036 [21:59:17] Epoch: 1 Batch: 642/38378 (1.67%) Loss: 2.484047 LR: 0.00001036 [21:59:20] Epoch: 1 Batch: 643/38378 (1.68%) Loss: 2.461201 LR: 0.00001036 [21:59:24] Epoch: 1 Batch: 644/38378 (1.68%) Loss: 2.700276 LR: 0.00001048 [21:59:26] Epoch: 1 Batch: 645/38378 (1.68%) Loss: 2.314776 LR: 0.00001048 [21:59:29] Epoch: 1 Batch: 646/38378 (1.68%) Loss: 2.685287 LR: 0.00001048 [21:59:33] Epoch: 1 Batch: 647/38378 (1.69%) Loss: 2.658276 LR: 0.00001048 [21:59:35] Epoch: 1 Batch: 648/38378 (1.69%) Loss: 2.676983 LR: 0.00001048 [21:59:39] Epoch: 1 Batch: 649/38378 (1.69%) Loss: 2.466840 LR: 0.00001048 [21:59:42] Epoch: 1 Batch: 650/38378 (1.69%) Loss: 2.464982 LR: 0.00001048 [21:59:45] Epoch: 1 Batch: 651/38378 (1.70%) Loss: 2.259222 LR: 0.00001059 [21:59:48] Epoch: 1 Batch: 652/38378 (1.70%) Loss: 2.392079 LR: 0.00001059 [21:59:51] Epoch: 1 Batch: 653/38378 (1.70%) Loss: 2.538207 LR: 0.00001059 [21:59:54] Epoch: 1 Batch: 654/38378 (1.70%) Loss: 2.413375 LR: 0.00001059 [21:59:57] Epoch: 1 Batch: 655/38378 (1.71%) Loss: 2.542210 LR: 0.00001059 [22:00:00] Epoch: 1 Batch: 656/38378 (1.71%) Loss: 2.461487 LR: 0.00001059 [22:00:03] Epoch: 1 Batch: 657/38378 (1.71%) Loss: 2.338935 LR: 0.00001059 [22:00:06] Epoch: 1 Batch: 658/38378 (1.71%) Loss: 2.524172 LR: 0.00001071 [22:00:09] Epoch: 1 Batch: 659/38378 (1.72%) Loss: 2.688110 LR: 0.00001071 [22:00:16] >> Cleaned up old temp checkpoint: epoch1_step330 [22:00:16] >> Temp checkpoint saved: epoch1_step660, size: 0.1702 GB [22:00:16] Epoch: 1 Batch: 660/38378 (1.72%) Loss: 2.440571 LR: 0.00001071 [22:00:19] Epoch: 1 Batch: 661/38378 (1.72%) Loss: 2.760349 LR: 0.00001071 [22:00:22] Epoch: 1 Batch: 662/38378 (1.72%) Loss: 2.215806 LR: 0.00001071 [22:00:25] Epoch: 1 Batch: 663/38378 (1.73%) Loss: 2.197159 LR: 0.00001071 [22:00:28] Epoch: 1 Batch: 664/38378 (1.73%) Loss: 2.246229 LR: 0.00001071 [22:00:31] Epoch: 1 
Batch: 665/38378 (1.73%) Loss: 2.065708 LR: 0.00001082 [22:00:34] Epoch: 1 Batch: 666/38378 (1.74%) Loss: 2.610571 LR: 0.00001082 [22:00:37] Epoch: 1 Batch: 667/38378 (1.74%) Loss: 2.747888 LR: 0.00001082 [22:00:40] Epoch: 1 Batch: 668/38378 (1.74%) Loss: 2.328878 LR: 0.00001082 [22:00:43] Epoch: 1 Batch: 669/38378 (1.74%) Loss: 2.412668 LR: 0.00001082 [22:00:47] Epoch: 1 Batch: 670/38378 (1.75%) Loss: 2.472459 LR: 0.00001082 [22:00:50] Epoch: 1 Batch: 671/38378 (1.75%) Loss: 2.408286 LR: 0.00001082 [22:00:53] Epoch: 1 Batch: 672/38378 (1.75%) Loss: 2.289797 LR: 0.00001093 [22:21:35] 2025-08-12 [22:21:38] Tesla T4 [22:21:38] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [22:21:38] CPU usage: 95.9%, RAM usage: 25.9% [22:21:38] Running with the following configuration: [22:21:38] model_name: NousResearch/Hermes-3-Llama-3.1-8B [22:21:38] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B [22:21:38] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview [22:21:38] train_path: /content/drive/MyDrive/data/none.csv [22:21:38] checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step627 [22:21:38] lr: 5e-05 [22:21:38] lr_floor: 1e-05 [22:21:38] epochs: 1 [22:21:38] batch_size: 5 [22:21:38] accum_steps: 7 [22:21:38] val_batch_size: 6 [22:21:38] max_val_size: 100 [22:21:38] max_length: 150 [22:21:38] save_temp_frequency: 33 [22:21:38] save_frequency: 500 [22:21:38] eval_frequency: 500 [22:21:38] save_pattern: y [22:21:38] quantization: y [22:21:38] quantization_bits: 4 [22:21:38] lora: y [22:21:38] frozen_lora_path: None [22:21:38] lora_rank: 16 [22:21:38] lora_alpha: 32 [22:21:38] lora_dropout: 0.08 [22:21:38] optimizer_weight_decay: 0.0 [22:21:38] warmup_type: cosine [22:21:38] warmup_ratio: 0.08 [22:21:38] warmup_steps: 439 [22:21:38] shuffle: y [22:21:38] csv_column: 
text [22:21:38] new_run: n [22:21:38] label_smoothing: 0.05 [22:21:38] SEED: 1 [22:21:38] Using device: cuda [22:21:38] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step627 [22:23:15] Embeddings shape after: torch.Size([128256, 4096]) [22:23:20] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step627 [22:23:20] Trainable LoRA 'default': [22:23:20] task_type: CAUSAL_LM [22:23:20] peft_type: PeftType.LORA [22:23:20] auto_mapping: None [22:23:20] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B [22:23:20] revision: None [22:23:20] inference_mode: False [22:23:20] r: 16 [22:23:20] target_modules: {'k_proj', 'o_proj', 'v_proj', 'q_proj'} [22:23:20] exclude_modules: None [22:23:20] lora_alpha: 32 [22:23:20] lora_dropout: 0.08 [22:23:20] fan_in_fan_out: False [22:23:20] bias: none [22:23:20] use_rslora: True [22:23:20] modules_to_save: None [22:23:20] init_lora_weights: True [22:23:20] layers_to_transform: None [22:23:20] layers_pattern: None [22:23:20] rank_pattern: {} [22:23:20] alpha_pattern: {} [22:23:20] megatron_config: None [22:23:20] megatron_core: megatron.core [22:23:20] trainable_token_indices: None [22:23:20] loftq_config: {} [22:23:20] eva_config: None [22:23:20] corda_config: None [22:23:20] use_dora: False [22:23:20] use_qalora: False [22:23:20] qalora_group_size: 16 [22:23:20] layer_replication: None [22:23:20] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [22:23:20] lora_bias: False [22:23:20] target_parameters: None [22:23:20] _custom_modules: None [22:23:20] Embeddings shape after: torch.Size([128256, 4096]) [22:23:31] Resumed from epoch 1, step 628, file 1 [22:23:31] Starting from CSV file... [22:23:35] Splitting data into chunks of 11000... [22:23:35] Using 7 processes across 18 chunks [22:23:35] Using saved train/val split from checkpoint. [22:23:35] Resuming scheduler with warmup steps: 438, total steps: 5482 [22:23:35] Initializing scheduler with cosine schedule with warmup, warmup steps 439, total steps: 5482 [22:23:35] Train/Val split: 191887 train, 100 val samples. 
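Note on the scheduler lines above: the run uses linear warmup into a cosine decay ("cosine schedule with warmup", 439 warmup steps, 5482 total steps), and the scheduler advances once per optimizer step rather than once per batch. Below is a minimal sketch that reproduces the logged learning rates, assuming linear warmup from 0 to lr = 5e-05 and cosine decay down to lr_floor = 1e-05 (the script's exact floor handling is an assumption; the constants come from the configuration dump above):

```python
import math

# Assumed reconstruction of the logged "cosine schedule with warmup".
LR, LR_FLOOR = 5e-5, 1e-5
WARMUP, TOTAL = 439, 5482

def lr_at(step: int) -> float:
    if step < WARMUP:                      # linear warmup from 0
        return LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL - WARMUP)
    # cosine decay from LR down to LR_FLOOR over the remaining steps
    return LR_FLOOR + 0.5 * (LR - LR_FLOOR) * (1.0 + math.cos(math.pi * progress))

# With accum_steps=7, batch 530 corresponds to optimizer step 530 // 7 = 75:
print(f"{lr_at(530 // 7):.8f}")  # 0.00000854, matching the log line for batch 530
```

At step 5482 the expression bottoms out at exactly LR_FLOOR, consistent with the configured lr_floor of 1e-05, and the step-per-7-batches conversion matches the LR moving at batches 532, 539, 546, and so on.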
[22:23:45] Model: PeftModelForCausalLM [22:23:45] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.0", "use_cache": true, "vocab_size": 128256 } [22:23:45] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [22:23:45] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 5e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [22:23:45] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [22:23:45] Scheduler: [22:23:45] Training on 191887 training samples, 100 validation samples [22:23:45] Average tokens per sample: 141.99 [22:23:45] Estimated epoch time: ~598.03 min [22:23:45] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Active memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 332173 MiB | 326190 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7616 MiB | 7616 MiB | 7616 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1259 MiB | 5879 MiB | 333261 MiB | 332002 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | GPU reserved segments | 186 | 186 | 186 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 33 | 37 | 12954 | 12921 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | 
|---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [22:23:45] Restoring shuffle indices from training state for epoch 1 [22:23:45] CPU usage: 58.3%, RAM usage: 40.6% [22:23:45] Epoch 1 learning rate: 0.0 [22:23:45] Starting epoch 1 [22:23:48] Batch 628: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [22:23:49] Epoch: 1 Batch: 628/38378 (1.64%) Loss: 2.806133 LR: 0.00000000 [22:23:51] Epoch: 1 Batch: 629/38378 (1.64%) Loss: 2.596608 LR: 0.00000000 [22:23:53] Epoch: 1 Batch: 630/38378 (1.64%) Loss: 2.420663 LR: 0.00000000 [22:23:54] Epoch: 1 Batch: 631/38378 (1.64%) Loss: 2.280098 LR: 0.00000000 [22:23:56] Epoch: 1 Batch: 632/38378 (1.65%) Loss: 2.368384 LR: 0.00000000 [22:23:57] Epoch: 1 Batch: 633/38378 (1.65%) Loss: 2.424439 LR: 0.00000000 [22:23:59] Epoch: 1 Batch: 634/38378 (1.65%) Loss: 2.491126 LR: 0.00001025 [22:24:00] Epoch: 1 Batch: 635/38378 (1.65%) Loss: 2.343545 LR: 0.00001025 [22:24:02] Epoch: 1 Batch: 636/38378 (1.66%) Loss: 2.309943 LR: 0.00001025 [22:24:04] Epoch: 1 Batch: 637/38378 (1.66%) Loss: 2.638439 LR: 0.00001025 [22:24:05] Epoch: 1 Batch: 638/38378 (1.66%) Loss: 2.428503 LR: 0.00001025 [22:24:07] Epoch: 1 Batch: 639/38378 (1.67%) Loss: 2.713404 LR: 0.00001025 [22:24:08] Epoch: 1 Batch: 640/38378 (1.67%) Loss: 2.550642 LR: 0.00001025 [22:24:10] Epoch: 1 Batch: 641/38378 (1.67%) Loss: 2.362021 LR: 0.00001036 [22:24:11] Epoch: 1 Batch: 642/38378 (1.67%) Loss: 2.491523 LR: 0.00001036 [22:24:13] Epoch: 1 Batch: 643/38378 (1.68%) Loss: 2.467066 LR: 0.00001036 [22:24:15] Epoch: 1 Batch: 644/38378 (1.68%) Loss: 2.708607 LR: 0.00001036 [22:24:16] Epoch: 1 Batch: 645/38378 (1.68%) Loss: 2.333639 LR: 0.00001036 [22:24:18] Epoch: 1 Batch: 646/38378 (1.68%) Loss: 2.702145 LR: 0.00001036 [22:24:19] Epoch: 1 Batch: 647/38378 (1.69%) Loss: 2.674978 LR: 0.00001036 [22:24:21] Epoch: 1 Batch: 648/38378 (1.69%) Loss: 2.695945 LR: 0.00001048 [22:24:23] Epoch: 1 Batch: 649/38378 (1.69%) Loss: 2.475544 LR: 0.00001048 [22:24:24] Epoch: 1 Batch: 650/38378 (1.69%) Loss: 2.475773 LR: 0.00001048 [22:24:26] Epoch: 1 Batch: 651/38378 (1.70%) Loss: 2.269035 LR: 0.00001048 [22:24:28] Epoch: 1 Batch: 652/38378 (1.70%) Loss: 2.407399 LR: 0.00001048 [22:24:29] Epoch: 1 Batch: 653/38378 (1.70%) Loss: 2.554414 LR: 0.00001048 [22:24:31] Epoch: 1 Batch: 654/38378 (1.70%) Loss: 2.437216 LR: 0.00001048 [22:24:33] Epoch: 1 Batch: 655/38378 (1.71%) Loss: 2.556531 LR: 0.00001059 [22:24:34] Epoch: 1 Batch: 656/38378 (1.71%) Loss: 2.467597 LR: 0.00001059 [22:24:36] Epoch: 1 Batch: 657/38378 (1.71%) Loss: 2.345622 LR: 0.00001059 [22:24:38] Epoch: 1 Batch: 658/38378 (1.71%) Loss: 2.533615 LR: 0.00001059 [22:24:39] Epoch: 1 Batch: 659/38378 (1.72%) Loss: 2.705402 LR: 0.00001059 [22:24:45] >> Temp checkpoint saved: epoch1_step660, size: 0.1702 GB [22:24:45] Epoch: 1 Batch: 660/38378 (1.72%) Loss: 2.455079 LR: 0.00001059 [22:24:47] Epoch: 1 Batch: 661/38378 (1.72%) Loss: 2.775611 LR: 0.00001059 [22:24:48] Epoch: 1 Batch: 662/38378 (1.72%) Loss: 2.229932 LR: 0.00001071 [22:24:50] Epoch: 1 Batch: 663/38378 (1.73%) Loss: 2.205683 LR: 0.00001071 [22:24:52] Epoch: 1 Batch: 664/38378 (1.73%) Loss: 2.253190 LR: 0.00001071 [22:24:53] Epoch: 1 Batch: 665/38378 (1.73%) Loss: 2.074625 LR: 0.00001071 [22:24:55] Epoch: 1 Batch: 666/38378 (1.74%) Loss: 2.626468 LR: 0.00001071 [22:24:57] Epoch: 1 Batch: 667/38378 (1.74%) Loss: 2.765006 LR: 0.00001071 
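Note on the resume above: batches 628 through 633 log LR: 0.00000000 because the scheduler only advances when the optimizer steps, and with accum_steps: 7 the first post-resume optimizer step does not land until batch 634 (the effective batch size is batch_size 5 × accum_steps 7 = 35 sequences). A minimal sketch of the accumulation loop implied by the 7-batch LR plateaus, assuming Hugging Face-style `model`, `loader`, `optimizer`, and `scheduler` objects:

```python
# Gradient accumulation as implied by the log's 7-batch LR plateaus.
accum_steps = 7
optimizer.zero_grad()
for i, batch in enumerate(loader, start=1):
    loss = model(**batch).loss / accum_steps  # scale so accumulated grads average
    loss.backward()
    if i % accum_steps == 0:
        optimizer.step()       # weights update every 7 batches
        scheduler.step()       # ...and the LR only moves here
        optimizer.zero_grad()
```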
[22:24:59] Epoch: 1 Batch: 668/38378 (1.74%) Loss: 2.346671 LR: 0.00001071 [22:25:00] Epoch: 1 Batch: 669/38378 (1.74%) Loss: 2.431674 LR: 0.00001082 [22:25:02] Epoch: 1 Batch: 670/38378 (1.75%) Loss: 2.478313 LR: 0.00001082 [22:25:04] Epoch: 1 Batch: 671/38378 (1.75%) Loss: 2.413749 LR: 0.00001082 [22:25:06] Epoch: 1 Batch: 672/38378 (1.75%) Loss: 2.295818 LR: 0.00001082 [22:25:08] Epoch: 1 Batch: 673/38378 (1.75%) Loss: 2.428644 LR: 0.00001082 [22:25:09] Epoch: 1 Batch: 674/38378 (1.76%) Loss: 2.250736 LR: 0.00001082 [22:25:11] Epoch: 1 Batch: 675/38378 (1.76%) Loss: 2.286165 LR: 0.00001082 [22:25:13] Epoch: 1 Batch: 676/38378 (1.76%) Loss: 2.494623 LR: 0.00001093 [22:25:15] Epoch: 1 Batch: 677/38378 (1.76%) Loss: 2.175306 LR: 0.00001093 [22:25:16] Epoch: 1 Batch: 678/38378 (1.77%) Loss: 2.236684 LR: 0.00001093 [22:25:18] Epoch: 1 Batch: 679/38378 (1.77%) Loss: 2.389483 LR: 0.00001093 [22:25:20] Epoch: 1 Batch: 680/38378 (1.77%) Loss: 2.025570 LR: 0.00001093 [22:25:22] Epoch: 1 Batch: 681/38378 (1.77%) Loss: 2.433769 LR: 0.00001093 [22:25:23] Epoch: 1 Batch: 682/38378 (1.78%) Loss: 2.696160 LR: 0.00001093 [22:25:25] Epoch: 1 Batch: 683/38378 (1.78%) Loss: 2.474165 LR: 0.00001105 [22:25:27] Epoch: 1 Batch: 684/38378 (1.78%) Loss: 2.196084 LR: 0.00001105 [22:25:28] Epoch: 1 Batch: 685/38378 (1.78%) Loss: 2.159079 LR: 0.00001105 [22:25:30] Epoch: 1 Batch: 686/38378 (1.79%) Loss: 2.254333 LR: 0.00001105 [22:25:32] Epoch: 1 Batch: 687/38378 (1.79%) Loss: 2.292935 LR: 0.00001105 [22:25:33] Epoch: 1 Batch: 688/38378 (1.79%) Loss: 2.438191 LR: 0.00001105 [22:25:35] Epoch: 1 Batch: 689/38378 (1.80%) Loss: 2.467377 LR: 0.00001105 [22:25:37] Epoch: 1 Batch: 690/38378 (1.80%) Loss: 2.512142 LR: 0.00001116 [22:25:39] Epoch: 1 Batch: 691/38378 (1.80%) Loss: 2.111384 LR: 0.00001116 [22:25:40] Epoch: 1 Batch: 692/38378 (1.80%) Loss: 2.363394 LR: 0.00001116 [22:25:46] >> Cleaned up old temp checkpoint: epoch1_step363 [22:25:46] >> Temp checkpoint saved: epoch1_step693, size: 0.1702 GB [22:25:46] Epoch: 1 Batch: 693/38378 (1.81%) Loss: 2.389520 LR: 0.00001116 [22:25:48] Epoch: 1 Batch: 694/38378 (1.81%) Loss: 2.573147 LR: 0.00001116 [22:25:50] Epoch: 1 Batch: 695/38378 (1.81%) Loss: 2.121024 LR: 0.00001116 [22:25:51] Epoch: 1 Batch: 696/38378 (1.81%) Loss: 2.251171 LR: 0.00001116 [22:25:53] Epoch: 1 Batch: 697/38378 (1.82%) Loss: 2.333383 LR: 0.00001128 [22:25:55] Epoch: 1 Batch: 698/38378 (1.82%) Loss: 2.528267 LR: 0.00001128 [22:25:56] Epoch: 1 Batch: 699/38378 (1.82%) Loss: 2.570486 LR: 0.00001128 [22:25:58] Epoch: 1 Batch: 700/38378 (1.82%) Loss: 2.205407 LR: 0.00001128 [22:26:00] Epoch: 1 Batch: 701/38378 (1.83%) Loss: 2.329703 LR: 0.00001128 [22:26:01] Epoch: 1 Batch: 702/38378 (1.83%) Loss: 2.576238 LR: 0.00001128 [22:26:03] Epoch: 1 Batch: 703/38378 (1.83%) Loss: 2.452754 LR: 0.00001128 [22:26:05] Epoch: 1 Batch: 704/38378 (1.83%) Loss: 2.234351 LR: 0.00001139 [22:26:06] Epoch: 1 Batch: 705/38378 (1.84%) Loss: 2.796530 LR: 0.00001139 [22:26:08] Epoch: 1 Batch: 706/38378 (1.84%) Loss: 2.422507 LR: 0.00001139 [22:26:10] Epoch: 1 Batch: 707/38378 (1.84%) Loss: 2.210773 LR: 0.00001139 [22:26:12] Epoch: 1 Batch: 708/38378 (1.84%) Loss: 2.429106 LR: 0.00001139 [22:26:13] Epoch: 1 Batch: 709/38378 (1.85%) Loss: 2.605608 LR: 0.00001139 [22:26:15] Epoch: 1 Batch: 710/38378 (1.85%) Loss: 2.328905 LR: 0.00001139 [22:26:17] Epoch: 1 Batch: 711/38378 (1.85%) Loss: 2.351148 LR: 0.00001150 [22:26:18] Epoch: 1 Batch: 712/38378 (1.86%) Loss: 2.302187 LR: 0.00001150 [22:26:20] Epoch: 1 Batch: 713/38378 (1.86%) 
Loss: 2.425648 LR: 0.00001150 [22:26:22] Epoch: 1 Batch: 714/38378 (1.86%) Loss: 2.482795 LR: 0.00001150 [22:26:24] Epoch: 1 Batch: 715/38378 (1.86%) Loss: 2.497917 LR: 0.00001150 [22:26:25] Epoch: 1 Batch: 716/38378 (1.87%) Loss: 2.002152 LR: 0.00001150 [22:26:27] Epoch: 1 Batch: 717/38378 (1.87%) Loss: 2.037002 LR: 0.00001150 [22:26:29] Epoch: 1 Batch: 718/38378 (1.87%) Loss: 2.101960 LR: 0.00001162 [22:26:31] Epoch: 1 Batch: 719/38378 (1.87%) Loss: 2.360165 LR: 0.00001162 [22:26:32] Epoch: 1 Batch: 720/38378 (1.88%) Loss: 2.291713 LR: 0.00001162 [22:26:34] Epoch: 1 Batch: 721/38378 (1.88%) Loss: 2.295722 LR: 0.00001162 [22:26:36] Epoch: 1 Batch: 722/38378 (1.88%) Loss: 2.584399 LR: 0.00001162 [22:26:37] Epoch: 1 Batch: 723/38378 (1.88%) Loss: 2.346062 LR: 0.00001162 [22:26:39] Epoch: 1 Batch: 724/38378 (1.89%) Loss: 2.757287 LR: 0.00001162 [22:26:41] Epoch: 1 Batch: 725/38378 (1.89%) Loss: 2.320841 LR: 0.00001173 [22:26:47] >> Cleaned up old temp checkpoint: epoch1_step396 [22:26:47] >> Temp checkpoint saved: epoch1_step726, size: 0.1702 GB [22:26:47] Epoch: 1 Batch: 726/38378 (1.89%) Loss: 2.421071 LR: 0.00001173 [22:26:49] Epoch: 1 Batch: 727/38378 (1.89%) Loss: 2.477959 LR: 0.00001173 [22:26:50] Epoch: 1 Batch: 728/38378 (1.90%) Loss: 2.421084 LR: 0.00001173 [22:26:52] Epoch: 1 Batch: 729/38378 (1.90%) Loss: 2.415014 LR: 0.00001173 [22:26:54] Epoch: 1 Batch: 730/38378 (1.90%) Loss: 2.332962 LR: 0.00001173 [22:26:55] Epoch: 1 Batch: 731/38378 (1.90%) Loss: 2.595359 LR: 0.00001173 [22:26:57] Epoch: 1 Batch: 732/38378 (1.91%) Loss: 2.286149 LR: 0.00001185 [22:26:59] Epoch: 1 Batch: 733/38378 (1.91%) Loss: 2.425635 LR: 0.00001185 [22:27:00] Epoch: 1 Batch: 734/38378 (1.91%) Loss: 2.219296 LR: 0.00001185 [22:27:02] Epoch: 1 Batch: 735/38378 (1.92%) Loss: 2.370471 LR: 0.00001185 [22:27:04] Epoch: 1 Batch: 736/38378 (1.92%) Loss: 2.350822 LR: 0.00001185 [22:27:06] Epoch: 1 Batch: 737/38378 (1.92%) Loss: 2.176737 LR: 0.00001185 [22:27:07] Epoch: 1 Batch: 738/38378 (1.92%) Loss: 2.217588 LR: 0.00001185 [22:27:09] Epoch: 1 Batch: 739/38378 (1.93%) Loss: 2.626431 LR: 0.00001196 [22:27:11] Epoch: 1 Batch: 740/38378 (1.93%) Loss: 2.308899 LR: 0.00001196 [22:27:12] Epoch: 1 Batch: 741/38378 (1.93%) Loss: 2.488565 LR: 0.00001196 [22:27:14] Epoch: 1 Batch: 742/38378 (1.93%) Loss: 2.330527 LR: 0.00001196 [22:27:16] Epoch: 1 Batch: 743/38378 (1.94%) Loss: 2.466304 LR: 0.00001196 [22:27:18] Epoch: 1 Batch: 744/38378 (1.94%) Loss: 2.240545 LR: 0.00001196 [22:27:19] Epoch: 1 Batch: 745/38378 (1.94%) Loss: 2.276461 LR: 0.00001196 [22:27:21] Epoch: 1 Batch: 746/38378 (1.94%) Loss: 2.069174 LR: 0.00001207 [22:27:23] Epoch: 1 Batch: 747/38378 (1.95%) Loss: 2.142117 LR: 0.00001207 [22:27:24] Epoch: 1 Batch: 748/38378 (1.95%) Loss: 2.125295 LR: 0.00001207 [22:27:26] Epoch: 1 Batch: 749/38378 (1.95%) Loss: 2.203306 LR: 0.00001207 [22:27:28] Epoch: 1 Batch: 750/38378 (1.95%) Loss: 2.222285 LR: 0.00001207 [22:27:30] Epoch: 1 Batch: 751/38378 (1.96%) Loss: 2.231354 LR: 0.00001207 [22:27:31] Epoch: 1 Batch: 752/38378 (1.96%) Loss: 2.462741 LR: 0.00001207 [22:27:33] Epoch: 1 Batch: 753/38378 (1.96%) Loss: 2.268144 LR: 0.00001219 [22:27:35] Epoch: 1 Batch: 754/38378 (1.96%) Loss: 2.211864 LR: 0.00001219 [22:27:36] Epoch: 1 Batch: 755/38378 (1.97%) Loss: 2.272110 LR: 0.00001219 [22:27:38] Epoch: 1 Batch: 756/38378 (1.97%) Loss: 2.351582 LR: 0.00001219 [22:27:40] Epoch: 1 Batch: 757/38378 (1.97%) Loss: 2.114206 LR: 0.00001219 [22:27:42] Epoch: 1 Batch: 758/38378 (1.98%) Loss: 2.351835 LR: 0.00001219 [22:27:48] >> 
Cleaned up old temp checkpoint: epoch1_step429 [22:27:48] >> Temp checkpoint saved: epoch1_step759, size: 0.1702 GB [22:27:48] Epoch: 1 Batch: 759/38378 (1.98%) Loss: 2.628616 LR: 0.00001219 [22:27:49] Epoch: 1 Batch: 760/38378 (1.98%) Loss: 2.359353 LR: 0.00001230 [22:27:51] Epoch: 1 Batch: 761/38378 (1.98%) Loss: 2.174208 LR: 0.00001230 [22:27:53] Epoch: 1 Batch: 762/38378 (1.99%) Loss: 2.399855 LR: 0.00001230 [22:27:54] Epoch: 1 Batch: 763/38378 (1.99%) Loss: 2.422174 LR: 0.00001230 [22:27:56] Epoch: 1 Batch: 764/38378 (1.99%) Loss: 2.341859 LR: 0.00001230 [22:27:58] Epoch: 1 Batch: 765/38378 (1.99%) Loss: 2.068551 LR: 0.00001230 [22:28:00] Epoch: 1 Batch: 766/38378 (2.00%) Loss: 2.248235 LR: 0.00001230 [22:28:01] Epoch: 1 Batch: 767/38378 (2.00%) Loss: 2.490723 LR: 0.00001241 [22:28:03] Epoch: 1 Batch: 768/38378 (2.00%) Loss: 2.388455 LR: 0.00001241 [22:28:05] Epoch: 1 Batch: 769/38378 (2.00%) Loss: 2.472299 LR: 0.00001241 [22:28:06] Epoch: 1 Batch: 770/38378 (2.01%) Loss: 2.388600 LR: 0.00001241 [22:28:08] Epoch: 1 Batch: 771/38378 (2.01%) Loss: 2.529160 LR: 0.00001241 [22:28:10] Epoch: 1 Batch: 772/38378 (2.01%) Loss: 2.245700 LR: 0.00001241 [22:28:11] Epoch: 1 Batch: 773/38378 (2.01%) Loss: 2.261742 LR: 0.00001241 [22:28:13] Epoch: 1 Batch: 774/38378 (2.02%) Loss: 2.208684 LR: 0.00001253 [22:28:15] Epoch: 1 Batch: 775/38378 (2.02%) Loss: 2.203992 LR: 0.00001253 [22:28:17] Epoch: 1 Batch: 776/38378 (2.02%) Loss: 2.411349 LR: 0.00001253 [22:28:18] Epoch: 1 Batch: 777/38378 (2.02%) Loss: 2.619542 LR: 0.00001253 [22:28:20] Epoch: 1 Batch: 778/38378 (2.03%) Loss: 2.348228 LR: 0.00001253 [22:28:22] Epoch: 1 Batch: 779/38378 (2.03%) Loss: 2.725043 LR: 0.00001253 [22:28:24] Epoch: 1 Batch: 780/38378 (2.03%) Loss: 2.700975 LR: 0.00001253 [22:28:25] Epoch: 1 Batch: 781/38378 (2.04%) Loss: 2.160642 LR: 0.00001264 [22:28:27] Epoch: 1 Batch: 782/38378 (2.04%) Loss: 2.416787 LR: 0.00001264 [22:28:29] Epoch: 1 Batch: 783/38378 (2.04%) Loss: 2.161905 LR: 0.00001264 [22:28:31] Epoch: 1 Batch: 784/38378 (2.04%) Loss: 2.532455 LR: 0.00001264 [22:28:32] Epoch: 1 Batch: 785/38378 (2.05%) Loss: 2.434195 LR: 0.00001264 [22:28:34] Epoch: 1 Batch: 786/38378 (2.05%) Loss: 2.227629 LR: 0.00001264 [22:28:36] Epoch: 1 Batch: 787/38378 (2.05%) Loss: 2.522963 LR: 0.00001264 [22:28:38] Epoch: 1 Batch: 788/38378 (2.05%) Loss: 2.346722 LR: 0.00001276 [22:28:39] Epoch: 1 Batch: 789/38378 (2.06%) Loss: 2.257012 LR: 0.00001276 [22:28:41] Epoch: 1 Batch: 790/38378 (2.06%) Loss: 2.145150 LR: 0.00001276 [22:28:43] Epoch: 1 Batch: 791/38378 (2.06%) Loss: 2.439706 LR: 0.00001276 [22:28:49] >> Cleaned up old temp checkpoint: epoch1_step462 [22:28:49] >> Temp checkpoint saved: epoch1_step792, size: 0.1702 GB [22:28:49] Epoch: 1 Batch: 792/38378 (2.06%) Loss: 2.329388 LR: 0.00001276 [22:28:51] Epoch: 1 Batch: 793/38378 (2.07%) Loss: 2.369557 LR: 0.00001276 [22:28:52] Epoch: 1 Batch: 794/38378 (2.07%) Loss: 2.120241 LR: 0.00001276 [22:28:54] Epoch: 1 Batch: 795/38378 (2.07%) Loss: 2.534625 LR: 0.00001287 [22:28:56] Epoch: 1 Batch: 796/38378 (2.07%) Loss: 2.184517 LR: 0.00001287 [22:28:58] Epoch: 1 Batch: 797/38378 (2.08%) Loss: 2.139980 LR: 0.00001287 [22:28:59] Epoch: 1 Batch: 798/38378 (2.08%) Loss: 2.372327 LR: 0.00001287 [22:29:01] Epoch: 1 Batch: 799/38378 (2.08%) Loss: 2.378911 LR: 0.00001287 [22:29:03] Epoch: 1 Batch: 800/38378 (2.08%) Loss: 2.259941 LR: 0.00001287 [22:29:04] Epoch: 1 Batch: 801/38378 (2.09%) Loss: 2.310655 LR: 0.00001287 [22:29:06] Epoch: 1 Batch: 802/38378 (2.09%) Loss: 2.134416 LR: 0.00001298 
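Note on the checkpoint lines: every save_temp_frequency = 33 batches a temp checkpoint is written and the one from 330 batches earlier is removed, i.e. a rolling window of ten temp checkpoints (step 759 saved, step 759 − 330 = 429 deleted). A hedged sketch of that rotation; `save_adapter` and the directory layout are illustrative assumptions, not the script's actual API:

```python
import os
import shutil

def save_temp_checkpoint(out_dir, epoch, step, keep=10, every=33):
    temp = os.path.join(out_dir, "temp")
    # Drop the checkpoint saved keep*every batches ago, if it still exists
    # (after a resume it may already be gone, as at step 660 above).
    old = os.path.join(temp, f"epoch{epoch}_step{step - keep * every}")
    if os.path.isdir(old):
        shutil.rmtree(old)
        print(f">> Cleaned up old temp checkpoint: {os.path.basename(old)}")
    path = os.path.join(temp, f"epoch{epoch}_step{step}")
    save_adapter(path)  # hypothetical helper, e.g. PEFT's model.save_pretrained
    print(f">> Temp checkpoint saved: {os.path.basename(path)}")
```

This deletion-by-offset also explains why the first save after the restart (epoch1_step660) prints no cleanup line: epoch1_step330 had already been deleted before the crash.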
[22:29:08] Epoch: 1 Batch: 803/38378 (2.09%) Loss: 2.476684 LR: 0.00001298 [22:29:10] Epoch: 1 Batch: 804/38378 (2.09%) Loss: 2.501624 LR: 0.00001298 [22:29:11] Epoch: 1 Batch: 805/38378 (2.10%) Loss: 2.041840 LR: 0.00001298 [22:29:13] Epoch: 1 Batch: 806/38378 (2.10%) Loss: 2.528334 LR: 0.00001298 [22:29:15] Epoch: 1 Batch: 807/38378 (2.10%) Loss: 2.612580 LR: 0.00001298 [22:29:16] Epoch: 1 Batch: 808/38378 (2.11%) Loss: 2.167219 LR: 0.00001298 [22:29:18] Epoch: 1 Batch: 809/38378 (2.11%) Loss: 2.357918 LR: 0.00001310 [22:29:20] Epoch: 1 Batch: 810/38378 (2.11%) Loss: 2.362829 LR: 0.00001310 [22:29:22] Epoch: 1 Batch: 811/38378 (2.11%) Loss: 2.356280 LR: 0.00001310 [22:29:23] Epoch: 1 Batch: 812/38378 (2.12%) Loss: 2.243010 LR: 0.00001310 [22:29:25] Epoch: 1 Batch: 813/38378 (2.12%) Loss: 2.363374 LR: 0.00001310 [22:29:27] Epoch: 1 Batch: 814/38378 (2.12%) Loss: 2.785685 LR: 0.00001310 [22:29:28] Epoch: 1 Batch: 815/38378 (2.12%) Loss: 2.079819 LR: 0.00001310 [22:29:30] Epoch: 1 Batch: 816/38378 (2.13%) Loss: 2.304609 LR: 0.00001321 [22:29:32] Epoch: 1 Batch: 817/38378 (2.13%) Loss: 2.223092 LR: 0.00001321 [22:29:34] Epoch: 1 Batch: 818/38378 (2.13%) Loss: 2.049437 LR: 0.00001321 [22:29:35] Epoch: 1 Batch: 819/38378 (2.13%) Loss: 2.200816 LR: 0.00001321 [22:29:37] Epoch: 1 Batch: 820/38378 (2.14%) Loss: 2.307378 LR: 0.00001321 [22:29:39] Epoch: 1 Batch: 821/38378 (2.14%) Loss: 2.492128 LR: 0.00001321 [22:29:40] Epoch: 1 Batch: 822/38378 (2.14%) Loss: 2.145402 LR: 0.00001321 [22:29:42] Epoch: 1 Batch: 823/38378 (2.14%) Loss: 2.241672 LR: 0.00001333 [22:29:44] Epoch: 1 Batch: 824/38378 (2.15%) Loss: 2.113649 LR: 0.00001333 [22:29:50] >> Cleaned up old temp checkpoint: epoch1_step495 [22:29:50] >> Temp checkpoint saved: epoch1_step825, size: 0.1702 GB [22:29:50] Epoch: 1 Batch: 825/38378 (2.15%) Loss: 2.330513 LR: 0.00001333 [22:29:52] Epoch: 1 Batch: 826/38378 (2.15%) Loss: 2.628118 LR: 0.00001333 [22:29:54] Epoch: 1 Batch: 827/38378 (2.15%) Loss: 1.997523 LR: 0.00001333 [22:29:55] Epoch: 1 Batch: 828/38378 (2.16%) Loss: 2.241353 LR: 0.00001333 [22:29:57] Epoch: 1 Batch: 829/38378 (2.16%) Loss: 2.398099 LR: 0.00001333 [22:29:59] Epoch: 1 Batch: 830/38378 (2.16%) Loss: 2.357974 LR: 0.00001344 [22:30:00] Epoch: 1 Batch: 831/38378 (2.17%) Loss: 2.167157 LR: 0.00001344 [22:30:02] Epoch: 1 Batch: 832/38378 (2.17%) Loss: 1.947745 LR: 0.00001344 [22:30:04] Epoch: 1 Batch: 833/38378 (2.17%) Loss: 2.151915 LR: 0.00001344 [22:30:05] Epoch: 1 Batch: 834/38378 (2.17%) Loss: 2.095519 LR: 0.00001344 [22:30:07] Epoch: 1 Batch: 835/38378 (2.18%) Loss: 2.759760 LR: 0.00001344 [22:30:09] Epoch: 1 Batch: 836/38378 (2.18%) Loss: 2.300821 LR: 0.00001344 [22:30:10] Epoch: 1 Batch: 837/38378 (2.18%) Loss: 2.454197 LR: 0.00001355 [22:30:12] Epoch: 1 Batch: 838/38378 (2.18%) Loss: 2.134704 LR: 0.00001355 [22:30:14] Epoch: 1 Batch: 839/38378 (2.19%) Loss: 2.402645 LR: 0.00001355 [22:30:15] Epoch: 1 Batch: 840/38378 (2.19%) Loss: 2.123997 LR: 0.00001355 [22:30:17] Epoch: 1 Batch: 841/38378 (2.19%) Loss: 2.726332 LR: 0.00001355 [22:30:18] Epoch: 1 Batch: 842/38378 (2.19%) Loss: 2.321239 LR: 0.00001355 [22:30:20] Epoch: 1 Batch: 843/38378 (2.20%) Loss: 1.984917 LR: 0.00001355 [22:30:22] Epoch: 1 Batch: 844/38378 (2.20%) Loss: 2.286136 LR: 0.00001367 [22:30:24] Epoch: 1 Batch: 845/38378 (2.20%) Loss: 2.324021 LR: 0.00001367 [22:30:25] Epoch: 1 Batch: 846/38378 (2.20%) Loss: 2.488766 LR: 0.00001367 [22:30:27] Epoch: 1 Batch: 847/38378 (2.21%) Loss: 2.351575 LR: 0.00001367 [22:30:29] Epoch: 1 Batch: 848/38378 (2.21%) 
Loss: 2.190423 LR: 0.00001367 [22:30:31] Epoch: 1 Batch: 849/38378 (2.21%) Loss: 2.273066 LR: 0.00001367 [22:30:33] Epoch: 1 Batch: 850/38378 (2.21%) Loss: 2.160201 LR: 0.00001367 [22:30:35] Epoch: 1 Batch: 851/38378 (2.22%) Loss: 2.525270 LR: 0.00001378 [22:30:36] Epoch: 1 Batch: 852/38378 (2.22%) Loss: 2.107598 LR: 0.00001378 [22:30:38] Epoch: 1 Batch: 853/38378 (2.22%) Loss: 2.438993 LR: 0.00001378 [22:30:40] Epoch: 1 Batch: 854/38378 (2.23%) Loss: 2.433100 LR: 0.00001378 [22:30:41] Epoch: 1 Batch: 855/38378 (2.23%) Loss: 2.296438 LR: 0.00001378 [22:30:43] Epoch: 1 Batch: 856/38378 (2.23%) Loss: 2.357097 LR: 0.00001378 [22:30:45] Epoch: 1 Batch: 857/38378 (2.23%) Loss: 2.577758 LR: 0.00001378 [22:30:51] >> Cleaned up old temp checkpoint: epoch1_step528 [22:30:51] >> Temp checkpoint saved: epoch1_step858, size: 0.1702 GB [22:30:51] Epoch: 1 Batch: 858/38378 (2.24%) Loss: 2.645242 LR: 0.00001390 [22:30:53] Epoch: 1 Batch: 859/38378 (2.24%) Loss: 2.198207 LR: 0.00001390 [22:30:54] Epoch: 1 Batch: 860/38378 (2.24%) Loss: 2.330546 LR: 0.00001390 [22:30:56] Epoch: 1 Batch: 861/38378 (2.24%) Loss: 1.965720 LR: 0.00001390 [22:30:58] Epoch: 1 Batch: 862/38378 (2.25%) Loss: 2.226267 LR: 0.00001390 [22:30:59] Epoch: 1 Batch: 863/38378 (2.25%) Loss: 2.282812 LR: 0.00001390 [22:31:01] Epoch: 1 Batch: 864/38378 (2.25%) Loss: 2.510505 LR: 0.00001390 [22:31:03] Epoch: 1 Batch: 865/38378 (2.25%) Loss: 2.194411 LR: 0.00001401 [22:31:04] Epoch: 1 Batch: 866/38378 (2.26%) Loss: 2.246597 LR: 0.00001401 [22:31:06] Epoch: 1 Batch: 867/38378 (2.26%) Loss: 2.595532 LR: 0.00001401 [22:31:08] Epoch: 1 Batch: 868/38378 (2.26%) Loss: 2.212927 LR: 0.00001401 [22:31:10] Epoch: 1 Batch: 869/38378 (2.26%) Loss: 2.261631 LR: 0.00001401 [22:31:11] Epoch: 1 Batch: 870/38378 (2.27%) Loss: 2.228106 LR: 0.00001401 [22:31:13] Epoch: 1 Batch: 871/38378 (2.27%) Loss: 2.235526 LR: 0.00001401 [22:31:15] Epoch: 1 Batch: 872/38378 (2.27%) Loss: 2.459242 LR: 0.00001412 [22:31:16] Epoch: 1 Batch: 873/38378 (2.27%) Loss: 2.328708 LR: 0.00001412 [22:31:18] Epoch: 1 Batch: 874/38378 (2.28%) Loss: 2.360423 LR: 0.00001412 [22:31:20] Epoch: 1 Batch: 875/38378 (2.28%) Loss: 2.332483 LR: 0.00001412 [22:31:22] Epoch: 1 Batch: 876/38378 (2.28%) Loss: 2.499199 LR: 0.00001412 [22:31:23] Epoch: 1 Batch: 877/38378 (2.29%) Loss: 2.150764 LR: 0.00001412 [22:31:25] Epoch: 1 Batch: 878/38378 (2.29%) Loss: 2.378063 LR: 0.00001412 [22:31:27] Epoch: 1 Batch: 879/38378 (2.29%) Loss: 2.231449 LR: 0.00001424 [22:31:29] Epoch: 1 Batch: 880/38378 (2.29%) Loss: 2.508583 LR: 0.00001424 [22:31:30] Epoch: 1 Batch: 881/38378 (2.30%) Loss: 2.185309 LR: 0.00001424 [22:31:32] Epoch: 1 Batch: 882/38378 (2.30%) Loss: 2.380587 LR: 0.00001424 [22:31:34] Epoch: 1 Batch: 883/38378 (2.30%) Loss: 2.225769 LR: 0.00001424 [22:31:36] Epoch: 1 Batch: 884/38378 (2.30%) Loss: 2.646519 LR: 0.00001424 [22:31:37] Epoch: 1 Batch: 885/38378 (2.31%) Loss: 2.370932 LR: 0.00001424 [22:31:39] Epoch: 1 Batch: 886/38378 (2.31%) Loss: 2.514417 LR: 0.00001435 [22:31:41] Epoch: 1 Batch: 887/38378 (2.31%) Loss: 2.403691 LR: 0.00001435 [22:31:43] Epoch: 1 Batch: 888/38378 (2.31%) Loss: 2.581993 LR: 0.00001435 [22:31:44] Epoch: 1 Batch: 889/38378 (2.32%) Loss: 2.234231 LR: 0.00001435 [22:31:46] Epoch: 1 Batch: 890/38378 (2.32%) Loss: 2.406683 LR: 0.00001435 [22:31:52] >> Cleaned up old temp checkpoint: epoch1_step561 [22:31:52] >> Temp checkpoint saved: epoch1_step891, size: 0.1702 GB [22:31:52] Epoch: 1 Batch: 891/38378 (2.32%) Loss: 2.178631 LR: 0.00001435 [22:31:54] Epoch: 1 Batch: 892/38378 
(2.32%) Loss: 2.357566 LR: 0.00001435 [22:31:56] Epoch: 1 Batch: 893/38378 (2.33%) Loss: 2.304163 LR: 0.00001446 [22:31:57] Epoch: 1 Batch: 894/38378 (2.33%) Loss: 2.073378 LR: 0.00001446 [22:31:59] Epoch: 1 Batch: 895/38378 (2.33%) Loss: 2.279188 LR: 0.00001446 [22:32:01] Epoch: 1 Batch: 896/38378 (2.33%) Loss: 2.315275 LR: 0.00001446 [22:32:03] Epoch: 1 Batch: 897/38378 (2.34%) Loss: 2.512477 LR: 0.00001446 [22:32:04] Epoch: 1 Batch: 898/38378 (2.34%) Loss: 2.282399 LR: 0.00001446 [22:32:06] Epoch: 1 Batch: 899/38378 (2.34%) Loss: 2.244254 LR: 0.00001446 [22:32:08] Epoch: 1 Batch: 900/38378 (2.35%) Loss: 2.096140 LR: 0.00001458 [22:32:09] Epoch: 1 Batch: 901/38378 (2.35%) Loss: 2.219126 LR: 0.00001458 [22:32:11] Epoch: 1 Batch: 902/38378 (2.35%) Loss: 2.171244 LR: 0.00001458 [22:32:13] Epoch: 1 Batch: 903/38378 (2.35%) Loss: 2.074000 LR: 0.00001458 [22:32:15] Epoch: 1 Batch: 904/38378 (2.36%) Loss: 2.335721 LR: 0.00001458 [22:32:16] Epoch: 1 Batch: 905/38378 (2.36%) Loss: 2.277428 LR: 0.00001458 [22:32:18] Epoch: 1 Batch: 906/38378 (2.36%) Loss: 2.260314 LR: 0.00001458 [22:32:20] Epoch: 1 Batch: 907/38378 (2.36%) Loss: 2.371263 LR: 0.00001469 [22:32:21] Epoch: 1 Batch: 908/38378 (2.37%) Loss: 2.541821 LR: 0.00001469 [22:32:23] Epoch: 1 Batch: 909/38378 (2.37%) Loss: 2.329827 LR: 0.00001469 [22:32:25] Epoch: 1 Batch: 910/38378 (2.37%) Loss: 2.205117 LR: 0.00001469 [22:32:27] Epoch: 1 Batch: 911/38378 (2.37%) Loss: 2.116410 LR: 0.00001469 [22:32:28] Epoch: 1 Batch: 912/38378 (2.38%) Loss: 2.275937 LR: 0.00001469 [22:32:30] Epoch: 1 Batch: 913/38378 (2.38%) Loss: 2.445049 LR: 0.00001469 [22:32:32] Epoch: 1 Batch: 914/38378 (2.38%) Loss: 2.496572 LR: 0.00001481 [22:32:34] Epoch: 1 Batch: 915/38378 (2.38%) Loss: 2.201085 LR: 0.00001481 [22:32:35] Epoch: 1 Batch: 916/38378 (2.39%) Loss: 1.938788 LR: 0.00001481 [22:32:37] Epoch: 1 Batch: 917/38378 (2.39%) Loss: 2.321253 LR: 0.00001481 [22:32:39] Epoch: 1 Batch: 918/38378 (2.39%) Loss: 2.442510 LR: 0.00001481 [22:32:40] Epoch: 1 Batch: 919/38378 (2.39%) Loss: 2.334032 LR: 0.00001481 [22:32:42] Epoch: 1 Batch: 920/38378 (2.40%) Loss: 2.396160 LR: 0.00001481 [22:32:44] Epoch: 1 Batch: 921/38378 (2.40%) Loss: 2.191311 LR: 0.00001492 [22:32:45] Epoch: 1 Batch: 922/38378 (2.40%) Loss: 2.456035 LR: 0.00001492 [22:32:47] Epoch: 1 Batch: 923/38378 (2.41%) Loss: 2.355627 LR: 0.00001492 [22:32:53] >> Cleaned up old temp checkpoint: epoch1_step594 [22:32:53] >> Temp checkpoint saved: epoch1_step924, size: 0.1702 GB [22:32:53] Epoch: 1 Batch: 924/38378 (2.41%) Loss: 2.201743 LR: 0.00001492 [22:32:55] Epoch: 1 Batch: 925/38378 (2.41%) Loss: 2.233790 LR: 0.00001492 [22:32:56] Epoch: 1 Batch: 926/38378 (2.41%) Loss: 2.250316 LR: 0.00001492 [22:32:58] Epoch: 1 Batch: 927/38378 (2.42%) Loss: 2.085241 LR: 0.00001492 [22:33:00] Epoch: 1 Batch: 928/38378 (2.42%) Loss: 2.311651 LR: 0.00001503 [22:33:02] Epoch: 1 Batch: 929/38378 (2.42%) Loss: 2.147753 LR: 0.00001503 [22:33:03] Epoch: 1 Batch: 930/38378 (2.42%) Loss: 2.114787 LR: 0.00001503 [22:33:05] Epoch: 1 Batch: 931/38378 (2.43%) Loss: 2.023236 LR: 0.00001503 [22:33:07] Epoch: 1 Batch: 932/38378 (2.43%) Loss: 2.296315 LR: 0.00001503 [22:33:08] Epoch: 1 Batch: 933/38378 (2.43%) Loss: 2.204985 LR: 0.00001503 [22:33:10] Epoch: 1 Batch: 934/38378 (2.43%) Loss: 1.852154 LR: 0.00001503 [22:33:12] Epoch: 1 Batch: 935/38378 (2.44%) Loss: 2.165136 LR: 0.00001515 [22:33:13] Epoch: 1 Batch: 936/38378 (2.44%) Loss: 2.043202 LR: 0.00001515 [22:33:15] Epoch: 1 Batch: 937/38378 (2.44%) Loss: 2.161638 LR: 0.00001515 [22:33:17] 
Epoch: 1 Batch: 938/38378 (2.44%) Loss: 2.078285 LR: 0.00001515 [22:33:19] Epoch: 1 Batch: 939/38378 (2.45%) Loss: 2.136014 LR: 0.00001515 [22:33:20] Epoch: 1 Batch: 940/38378 (2.45%) Loss: 2.531197 LR: 0.00001515 [22:33:22] Epoch: 1 Batch: 941/38378 (2.45%) Loss: 2.500448 LR: 0.00001515 [22:33:24] Epoch: 1 Batch: 942/38378 (2.45%) Loss: 2.281656 LR: 0.00001526 [22:33:25] Epoch: 1 Batch: 943/38378 (2.46%) Loss: 2.320766 LR: 0.00001526 [22:33:27] Epoch: 1 Batch: 944/38378 (2.46%) Loss: 2.136386 LR: 0.00001526 [22:33:29] Epoch: 1 Batch: 945/38378 (2.46%) Loss: 2.145178 LR: 0.00001526 [22:33:31] Epoch: 1 Batch: 946/38378 (2.46%) Loss: 2.542266 LR: 0.00001526 [22:33:32] Epoch: 1 Batch: 947/38378 (2.47%) Loss: 2.480909 LR: 0.00001526 [22:33:34] Epoch: 1 Batch: 948/38378 (2.47%) Loss: 2.529258 LR: 0.00001526 [22:33:36] Epoch: 1 Batch: 949/38378 (2.47%) Loss: 2.603205 LR: 0.00001538 [22:33:38] Epoch: 1 Batch: 950/38378 (2.48%) Loss: 2.188004 LR: 0.00001538 [22:33:39] Epoch: 1 Batch: 951/38378 (2.48%) Loss: 1.941269 LR: 0.00001538 [22:33:41] Epoch: 1 Batch: 952/38378 (2.48%) Loss: 2.290674 LR: 0.00001538 [22:33:43] Epoch: 1 Batch: 953/38378 (2.48%) Loss: 2.195029 LR: 0.00001538 [22:33:44] Epoch: 1 Batch: 954/38378 (2.49%) Loss: 2.676865 LR: 0.00001538 [22:33:46] Epoch: 1 Batch: 955/38378 (2.49%) Loss: 2.219567 LR: 0.00001538 [22:33:48] Epoch: 1 Batch: 956/38378 (2.49%) Loss: 2.122832 LR: 0.00001549 [22:33:53] >> Cleaned up old temp checkpoint: epoch1_step627 [22:33:53] >> Temp checkpoint saved: epoch1_step957, size: 0.1702 GB [22:33:53] Epoch: 1 Batch: 957/38378 (2.49%) Loss: 2.104565 LR: 0.00001549 [22:33:55] Epoch: 1 Batch: 958/38378 (2.50%) Loss: 2.232712 LR: 0.00001549 [22:33:57] Epoch: 1 Batch: 959/38378 (2.50%) Loss: 2.274271 LR: 0.00001549 [22:33:59] Epoch: 1 Batch: 960/38378 (2.50%) Loss: 2.140128 LR: 0.00001549 [22:34:00] Epoch: 1 Batch: 961/38378 (2.50%) Loss: 2.243165 LR: 0.00001549 [22:34:02] Epoch: 1 Batch: 962/38378 (2.51%) Loss: 2.133811 LR: 0.00001549 [22:34:04] Epoch: 1 Batch: 963/38378 (2.51%) Loss: 2.315888 LR: 0.00001560 [22:34:05] Epoch: 1 Batch: 964/38378 (2.51%) Loss: 2.262115 LR: 0.00001560 [22:34:07] Epoch: 1 Batch: 965/38378 (2.51%) Loss: 2.065453 LR: 0.00001560 [22:34:09] Epoch: 1 Batch: 966/38378 (2.52%) Loss: 2.212199 LR: 0.00001560 [22:34:10] Epoch: 1 Batch: 967/38378 (2.52%) Loss: 2.535219 LR: 0.00001560 [22:34:12] Epoch: 1 Batch: 968/38378 (2.52%) Loss: 1.633229 LR: 0.00001560 [22:34:14] Epoch: 1 Batch: 969/38378 (2.52%) Loss: 2.362619 LR: 0.00001560 [22:34:16] Epoch: 1 Batch: 970/38378 (2.53%) Loss: 2.400365 LR: 0.00001572 [22:34:17] Epoch: 1 Batch: 971/38378 (2.53%) Loss: 2.401180 LR: 0.00001572 [22:34:19] Epoch: 1 Batch: 972/38378 (2.53%) Loss: 2.279969 LR: 0.00001572 [22:34:21] Epoch: 1 Batch: 973/38378 (2.54%) Loss: 2.656110 LR: 0.00001572 [22:34:22] Epoch: 1 Batch: 974/38378 (2.54%) Loss: 2.336715 LR: 0.00001572 [22:34:24] Epoch: 1 Batch: 975/38378 (2.54%) Loss: 2.105810 LR: 0.00001572 [22:34:26] Epoch: 1 Batch: 976/38378 (2.54%) Loss: 2.285779 LR: 0.00001572 [22:34:28] Epoch: 1 Batch: 977/38378 (2.55%) Loss: 2.026850 LR: 0.00001583 [22:34:29] Epoch: 1 Batch: 978/38378 (2.55%) Loss: 2.283332 LR: 0.00001583 [22:34:31] Epoch: 1 Batch: 979/38378 (2.55%) Loss: 2.348961 LR: 0.00001583 [22:34:33] Epoch: 1 Batch: 980/38378 (2.55%) Loss: 2.127547 LR: 0.00001583 [22:34:34] Epoch: 1 Batch: 981/38378 (2.56%) Loss: 2.050260 LR: 0.00001583 [22:34:36] Epoch: 1 Batch: 982/38378 (2.56%) Loss: 2.103195 LR: 0.00001583 [22:34:38] Epoch: 1 Batch: 983/38378 (2.56%) Loss: 2.291360 
LR: 0.00001583 [22:34:40] Epoch: 1 Batch: 984/38378 (2.56%) Loss: 2.304586 LR: 0.00001595 [22:34:41] Epoch: 1 Batch: 985/38378 (2.57%) Loss: 2.400288 LR: 0.00001595 [22:34:43] Epoch: 1 Batch: 986/38378 (2.57%) Loss: 1.993506 LR: 0.00001595 [22:34:45] Epoch: 1 Batch: 987/38378 (2.57%) Loss: 2.043276 LR: 0.00001595 [22:34:46] Epoch: 1 Batch: 988/38378 (2.57%) Loss: 2.043858 LR: 0.00001595 [22:34:48] Epoch: 1 Batch: 989/38378 (2.58%) Loss: 2.332761 LR: 0.00001595 [22:34:54] >> Cleaned up old temp checkpoint: epoch1_step660 [22:34:54] >> Temp checkpoint saved: epoch1_step990, size: 0.1702 GB [22:34:54] Epoch: 1 Batch: 990/38378 (2.58%) Loss: 2.263372 LR: 0.00001595 [22:34:55] Epoch: 1 Batch: 991/38378 (2.58%) Loss: 2.022240 LR: 0.00001606 [22:34:57] Epoch: 1 Batch: 992/38378 (2.58%) Loss: 2.190782 LR: 0.00001606 [22:34:59] Epoch: 1 Batch: 993/38378 (2.59%) Loss: 2.009415 LR: 0.00001606 [22:35:00] Epoch: 1 Batch: 994/38378 (2.59%) Loss: 2.471318 LR: 0.00001606 [22:35:02] Epoch: 1 Batch: 995/38378 (2.59%) Loss: 2.297415 LR: 0.00001606 [22:35:04] Epoch: 1 Batch: 996/38378 (2.60%) Loss: 2.116226 LR: 0.00001606 [22:35:06] Epoch: 1 Batch: 997/38378 (2.60%) Loss: 2.263527 LR: 0.00001606 [22:35:07] Epoch: 1 Batch: 998/38378 (2.60%) Loss: 2.402356 LR: 0.00001617 [22:35:09] Epoch: 1 Batch: 999/38378 (2.60%) Loss: 2.092600 LR: 0.00001617 [22:35:11] >> Evaluating batch 0 [22:35:12] >> Evaluating batch 1 [22:35:13] >> Evaluating batch 2 [22:35:14] >> Evaluating batch 3 [22:35:15] >> Evaluating batch 4 [22:35:16] >> Evaluating batch 5 [22:35:16] >> Evaluating batch 6 [22:35:17] >> Evaluating batch 7 [22:35:18] >> Evaluating batch 8 [22:35:19] >> Evaluating batch 9 [22:35:20] >> Evaluating batch 10 [22:35:21] >> Evaluating batch 11 [22:35:22] >> Evaluating batch 12 [22:35:23] >> Evaluating batch 13 [22:35:24] >> Evaluating batch 14 [22:35:25] >> Evaluating batch 15 [22:35:26] >> Evaluating batch 16 [22:35:27] Epoch: 1 Step: 1000/38378 Evaluation: [22:35:27] Avg Loss Since Last Eval: 0.8682 Val Loss: 2.3729 Validation loss delta: 2.3729 Perplexity: 10.7282 LR: 0.00001617 [22:35:31] >> Checkpoint saved: epoch1_step1000, size: 0.1702 GB [22:35:31] Epoch: 1 Batch: 1000/38378 (2.61%) Loss: 2.212438 LR: 0.00001617 [22:35:32] Epoch: 1 Batch: 1001/38378 (2.61%) Loss: 2.170512 LR: 0.00001617 [22:35:34] Epoch: 1 Batch: 1002/38378 (2.61%) Loss: 2.310854 LR: 0.00001617 [22:35:36] Epoch: 1 Batch: 1003/38378 (2.61%) Loss: 2.353486 LR: 0.00001617 [22:35:37] Epoch: 1 Batch: 1004/38378 (2.62%) Loss: 2.133377 LR: 0.00001617 [22:35:39] Epoch: 1 Batch: 1005/38378 (2.62%) Loss: 2.092452 LR: 0.00001629 [22:35:41] Epoch: 1 Batch: 1006/38378 (2.62%) Loss: 2.383041 LR: 0.00001629 [22:35:43] Epoch: 1 Batch: 1007/38378 (2.62%) Loss: 2.312902 LR: 0.00001629 [22:35:44] Epoch: 1 Batch: 1008/38378 (2.63%) Loss: 2.360334 LR: 0.00001629 [22:35:46] Epoch: 1 Batch: 1009/38378 (2.63%) Loss: 2.114092 LR: 0.00001629 [22:35:48] Epoch: 1 Batch: 1010/38378 (2.63%) Loss: 2.391002 LR: 0.00001629 [22:35:50] Epoch: 1 Batch: 1011/38378 (2.63%) Loss: 2.394932 LR: 0.00001629 [22:35:51] Epoch: 1 Batch: 1012/38378 (2.64%) Loss: 2.173802 LR: 0.00001640 [22:35:53] Epoch: 1 Batch: 1013/38378 (2.64%) Loss: 2.045889 LR: 0.00001640 [22:35:55] Epoch: 1 Batch: 1014/38378 (2.64%) Loss: 1.941021 LR: 0.00001640 [22:35:56] Epoch: 1 Batch: 1015/38378 (2.64%) Loss: 1.938529 LR: 0.00001640 [22:35:58] Epoch: 1 Batch: 1016/38378 (2.65%) Loss: 2.111669 LR: 0.00001640 [22:36:00] Epoch: 1 Batch: 1017/38378 (2.65%) Loss: 2.139498 LR: 0.00001640 [22:36:02] Epoch: 1 Batch:
1018/38378 (2.65%) Loss: 2.237511 LR: 0.00001640 [22:36:03] Epoch: 1 Batch: 1019/38378 (2.66%) Loss: 2.365351 LR: 0.00001651 [22:36:05] Epoch: 1 Batch: 1020/38378 (2.66%) Loss: 2.244816 LR: 0.00001651 [22:36:07] Epoch: 1 Batch: 1021/38378 (2.66%) Loss: 2.086468 LR: 0.00001651 [22:36:09] Epoch: 1 Batch: 1022/38378 (2.66%) Loss: 2.090502 LR: 0.00001651 [22:36:14] >> Cleaned up old temp checkpoint: epoch1_step693 [22:36:14] >> Temp checkpoint saved: epoch1_step1023, size: 0.1702 GB [22:36:14] Epoch: 1 Batch: 1023/38378 (2.67%) Loss: 2.342029 LR: 0.00001651 [22:36:16] Epoch: 1 Batch: 1024/38378 (2.67%) Loss: 2.131766 LR: 0.00001651 [22:36:17] Epoch: 1 Batch: 1025/38378 (2.67%) Loss: 2.187128 LR: 0.00001651 [22:36:19] Epoch: 1 Batch: 1026/38378 (2.67%) Loss: 2.127114 LR: 0.00001663 [22:36:21] Epoch: 1 Batch: 1027/38378 (2.68%) Loss: 2.172258 LR: 0.00001663 [22:36:23] Epoch: 1 Batch: 1028/38378 (2.68%) Loss: 2.168259 LR: 0.00001663 [22:36:24] Epoch: 1 Batch: 1029/38378 (2.68%) Loss: 2.135739 LR: 0.00001663 [22:36:26] Epoch: 1 Batch: 1030/38378 (2.68%) Loss: 2.397351 LR: 0.00001663 [22:36:28] Epoch: 1 Batch: 1031/38378 (2.69%) Loss: 2.364576 LR: 0.00001663 [22:36:29] Epoch: 1 Batch: 1032/38378 (2.69%) Loss: 2.262063 LR: 0.00001663 [22:36:31] Epoch: 1 Batch: 1033/38378 (2.69%) Loss: 2.499722 LR: 0.00001674 [22:36:33] Epoch: 1 Batch: 1034/38378 (2.69%) Loss: 2.196900 LR: 0.00001674 [22:36:35] Epoch: 1 Batch: 1035/38378 (2.70%) Loss: 2.160228 LR: 0.00001674 [22:36:36] Epoch: 1 Batch: 1036/38378 (2.70%) Loss: 2.518227 LR: 0.00001674 [22:36:38] Epoch: 1 Batch: 1037/38378 (2.70%) Loss: 2.554521 LR: 0.00001674 [22:36:39] Epoch: 1 Batch: 1038/38378 (2.70%) Loss: 2.542899 LR: 0.00001674 [22:36:41] Epoch: 1 Batch: 1039/38378 (2.71%) Loss: 2.438259 LR: 0.00001674 [22:36:43] Epoch: 1 Batch: 1040/38378 (2.71%) Loss: 2.280049 LR: 0.00001686 [22:36:44] Epoch: 1 Batch: 1041/38378 (2.71%) Loss: 2.600825 LR: 0.00001686 [22:36:46] Epoch: 1 Batch: 1042/38378 (2.72%) Loss: 2.469117 LR: 0.00001686 [22:36:48] Epoch: 1 Batch: 1043/38378 (2.72%) Loss: 2.291897 LR: 0.00001686 [22:36:50] Epoch: 1 Batch: 1044/38378 (2.72%) Loss: 2.148425 LR: 0.00001686 [22:36:51] Epoch: 1 Batch: 1045/38378 (2.72%) Loss: 2.045824 LR: 0.00001686 [22:36:53] Epoch: 1 Batch: 1046/38378 (2.73%) Loss: 2.092523 LR: 0.00001686 [22:36:55] Epoch: 1 Batch: 1047/38378 (2.73%) Loss: 2.176796 LR: 0.00001697 [22:36:56] Epoch: 1 Batch: 1048/38378 (2.73%) Loss: 2.390354 LR: 0.00001697 [22:36:58] Epoch: 1 Batch: 1049/38378 (2.73%) Loss: 2.160176 LR: 0.00001697 [22:37:00] Epoch: 1 Batch: 1050/38378 (2.74%) Loss: 2.108541 LR: 0.00001697 [22:37:02] Epoch: 1 Batch: 1051/38378 (2.74%) Loss: 2.311870 LR: 0.00001697 [22:37:03] Epoch: 1 Batch: 1052/38378 (2.74%) Loss: 2.104497 LR: 0.00001697 [22:37:05] Epoch: 1 Batch: 1053/38378 (2.74%) Loss: 2.144133 LR: 0.00001697 [22:37:07] Epoch: 1 Batch: 1054/38378 (2.75%) Loss: 2.012256 LR: 0.00001708 [22:37:09] Epoch: 1 Batch: 1055/38378 (2.75%) Loss: 2.392164 LR: 0.00001708 [22:37:14] >> Cleaned up old temp checkpoint: epoch1_step726 [22:37:14] >> Temp checkpoint saved: epoch1_step1056, size: 0.1702 GB [22:37:14] Epoch: 1 Batch: 1056/38378 (2.75%) Loss: 2.315200 LR: 0.00001708 [22:37:16] Epoch: 1 Batch: 1057/38378 (2.75%) Loss: 1.948185 LR: 0.00001708 [22:37:18] Epoch: 1 Batch: 1058/38378 (2.76%) Loss: 2.447245 LR: 0.00001708 [22:37:19] Epoch: 1 Batch: 1059/38378 (2.76%) Loss: 1.888987 LR: 0.00001708 [22:37:21] Epoch: 1 Batch: 1060/38378 (2.76%) Loss: 2.481701 LR: 0.00001708 [22:37:23] Epoch: 1 Batch: 1061/38378 (2.76%) Loss: 
2.324695 LR: 0.00001720 [22:37:24] Epoch: 1 Batch: 1062/38378 (2.77%) Loss: 1.905198 LR: 0.00001720 [22:37:26] Epoch: 1 Batch: 1063/38378 (2.77%) Loss: 2.486352 LR: 0.00001720 [22:37:28] Epoch: 1 Batch: 1064/38378 (2.77%) Loss: 2.043628 LR: 0.00001720 [22:37:29] Epoch: 1 Batch: 1065/38378 (2.78%) Loss: 1.932171 LR: 0.00001720 [22:37:31] Epoch: 1 Batch: 1066/38378 (2.78%) Loss: 1.989221 LR: 0.00001720 [22:37:33] Epoch: 1 Batch: 1067/38378 (2.78%) Loss: 2.596842 LR: 0.00001720 [22:37:35] Epoch: 1 Batch: 1068/38378 (2.78%) Loss: 1.920845 LR: 0.00001731 [22:37:36] Epoch: 1 Batch: 1069/38378 (2.79%) Loss: 2.401916 LR: 0.00001731 [22:37:38] Epoch: 1 Batch: 1070/38378 (2.79%) Loss: 2.432564 LR: 0.00001731 [22:37:40] Epoch: 1 Batch: 1071/38378 (2.79%) Loss: 2.374006 LR: 0.00001731 [22:37:41] Epoch: 1 Batch: 1072/38378 (2.79%) Loss: 2.269837 LR: 0.00001731 [22:37:43] Epoch: 1 Batch: 1073/38378 (2.80%) Loss: 1.807648 LR: 0.00001731 [22:37:45] Epoch: 1 Batch: 1074/38378 (2.80%) Loss: 2.092491 LR: 0.00001731 [22:37:47] Epoch: 1 Batch: 1075/38378 (2.80%) Loss: 2.326637 LR: 0.00001743 [22:37:48] Epoch: 1 Batch: 1076/38378 (2.80%) Loss: 2.128291 LR: 0.00001743 [22:37:50] Epoch: 1 Batch: 1077/38378 (2.81%) Loss: 2.136518 LR: 0.00001743 [22:37:52] Epoch: 1 Batch: 1078/38378 (2.81%) Loss: 1.959705 LR: 0.00001743 [22:37:53] Epoch: 1 Batch: 1079/38378 (2.81%) Loss: 2.003478 LR: 0.00001743 [22:37:55] Epoch: 1 Batch: 1080/38378 (2.81%) Loss: 2.162687 LR: 0.00001743 [22:37:57] Epoch: 1 Batch: 1081/38378 (2.82%) Loss: 2.275488 LR: 0.00001743 [22:37:59] Epoch: 1 Batch: 1082/38378 (2.82%) Loss: 2.107813 LR: 0.00001754 [22:38:00] Epoch: 1 Batch: 1083/38378 (2.82%) Loss: 2.029014 LR: 0.00001754 [22:38:02] Epoch: 1 Batch: 1084/38378 (2.82%) Loss: 1.932200 LR: 0.00001754 [22:38:04] Epoch: 1 Batch: 1085/38378 (2.83%) Loss: 2.370981 LR: 0.00001754 [22:38:06] Epoch: 1 Batch: 1086/38378 (2.83%) Loss: 1.990462 LR: 0.00001754 [22:38:07] Epoch: 1 Batch: 1087/38378 (2.83%) Loss: 2.225581 LR: 0.00001754 [22:38:09] Epoch: 1 Batch: 1088/38378 (2.83%) Loss: 2.234337 LR: 0.00001754 [22:38:15] >> Cleaned up old temp checkpoint: epoch1_step759 [22:38:15] >> Temp checkpoint saved: epoch1_step1089, size: 0.1702 GB [22:38:15] Epoch: 1 Batch: 1089/38378 (2.84%) Loss: 2.098845 LR: 0.00001765 [22:38:16] Epoch: 1 Batch: 1090/38378 (2.84%) Loss: 2.055129 LR: 0.00001765 [22:38:18] Epoch: 1 Batch: 1091/38378 (2.84%) Loss: 2.437760 LR: 0.00001765 [22:38:20] Epoch: 1 Batch: 1092/38378 (2.85%) Loss: 2.252306 LR: 0.00001765 [22:38:22] Epoch: 1 Batch: 1093/38378 (2.85%) Loss: 2.096856 LR: 0.00001765 [22:38:23] Epoch: 1 Batch: 1094/38378 (2.85%) Loss: 2.401266 LR: 0.00001765 [22:38:25] Epoch: 1 Batch: 1095/38378 (2.85%) Loss: 2.302351 LR: 0.00001765 [22:38:27] Epoch: 1 Batch: 1096/38378 (2.86%) Loss: 2.365348 LR: 0.00001777 [22:38:28] Epoch: 1 Batch: 1097/38378 (2.86%) Loss: 2.068930 LR: 0.00001777 [22:38:30] Epoch: 1 Batch: 1098/38378 (2.86%) Loss: 2.354689 LR: 0.00001777 [22:38:32] Epoch: 1 Batch: 1099/38378 (2.86%) Loss: 2.212543 LR: 0.00001777 [22:38:34] Epoch: 1 Batch: 1100/38378 (2.87%) Loss: 2.259763 LR: 0.00001777 [22:38:35] Epoch: 1 Batch: 1101/38378 (2.87%) Loss: 2.447295 LR: 0.00001777 [22:38:37] Epoch: 1 Batch: 1102/38378 (2.87%) Loss: 1.862782 LR: 0.00001777 [22:38:39] Epoch: 1 Batch: 1103/38378 (2.87%) Loss: 2.082386 LR: 0.00001788 [22:38:40] Epoch: 1 Batch: 1104/38378 (2.88%) Loss: 2.384881 LR: 0.00001788 [22:38:42] Epoch: 1 Batch: 1105/38378 (2.88%) Loss: 2.104220 LR: 0.00001788 [22:38:44] Epoch: 1 Batch: 1106/38378 (2.88%) Loss: 
2.262308 LR: 0.00001788 [22:38:46] Epoch: 1 Batch: 1107/38378 (2.88%) Loss: 2.362705 LR: 0.00001788 [22:38:47] Epoch: 1 Batch: 1108/38378 (2.89%) Loss: 2.179756 LR: 0.00001788 [22:38:49] Epoch: 1 Batch: 1109/38378 (2.89%) Loss: 2.170965 LR: 0.00001788 [22:38:51] Epoch: 1 Batch: 1110/38378 (2.89%) Loss: 2.236340 LR: 0.00001800 [22:38:52] Epoch: 1 Batch: 1111/38378 (2.89%) Loss: 2.104921 LR: 0.00001800 [22:38:54] Epoch: 1 Batch: 1112/38378 (2.90%) Loss: 2.326958 LR: 0.00001800 [22:38:56] Epoch: 1 Batch: 1113/38378 (2.90%) Loss: 2.226389 LR: 0.00001800 [22:38:58] Epoch: 1 Batch: 1114/38378 (2.90%) Loss: 1.981984 LR: 0.00001800 [22:38:59] Epoch: 1 Batch: 1115/38378 (2.91%) Loss: 1.796934 LR: 0.00001800 [22:39:01] Epoch: 1 Batch: 1116/38378 (2.91%) Loss: 2.219715 LR: 0.00001800 [22:39:03] Epoch: 1 Batch: 1117/38378 (2.91%) Loss: 2.450666 LR: 0.00001811 [22:39:04] Epoch: 1 Batch: 1118/38378 (2.91%) Loss: 2.182508 LR: 0.00001811 [22:39:06] Epoch: 1 Batch: 1119/38378 (2.92%) Loss: 2.202199 LR: 0.00001811 [22:39:08] Epoch: 1 Batch: 1120/38378 (2.92%) Loss: 2.219282 LR: 0.00001811 [22:39:10] Epoch: 1 Batch: 1121/38378 (2.92%) Loss: 2.645129 LR: 0.00001811 [22:39:15] >> Cleaned up old temp checkpoint: epoch1_step792 [22:39:15] >> Temp checkpoint saved: epoch1_step1122, size: 0.1702 GB [22:39:15] Epoch: 1 Batch: 1122/38378 (2.92%) Loss: 2.313375 LR: 0.00001811 [22:39:17] Epoch: 1 Batch: 1123/38378 (2.93%) Loss: 2.034597 LR: 0.00001811 [22:39:19] Epoch: 1 Batch: 1124/38378 (2.93%) Loss: 2.444411 LR: 0.00001822 [22:39:21] Epoch: 1 Batch: 1125/38378 (2.93%) Loss: 2.342215 LR: 0.00001822 [22:39:22] Epoch: 1 Batch: 1126/38378 (2.93%) Loss: 2.195772 LR: 0.00001822 [22:39:24] Epoch: 1 Batch: 1127/38378 (2.94%) Loss: 2.267958 LR: 0.00001822 [22:39:26] Epoch: 1 Batch: 1128/38378 (2.94%) Loss: 2.330088 LR: 0.00001822 [22:39:27] Epoch: 1 Batch: 1129/38378 (2.94%) Loss: 2.154109 LR: 0.00001822 [22:39:29] Epoch: 1 Batch: 1130/38378 (2.94%) Loss: 2.349676 LR: 0.00001822 [22:39:31] Epoch: 1 Batch: 1131/38378 (2.95%) Loss: 2.152569 LR: 0.00001834 [22:39:32] Epoch: 1 Batch: 1132/38378 (2.95%) Loss: 2.252145 LR: 0.00001834 [22:39:34] Epoch: 1 Batch: 1133/38378 (2.95%) Loss: 2.142930 LR: 0.00001834 [22:39:36] Epoch: 1 Batch: 1134/38378 (2.95%) Loss: 2.115865 LR: 0.00001834 [22:39:38] Epoch: 1 Batch: 1135/38378 (2.96%) Loss: 2.184317 LR: 0.00001834 [22:39:39] Epoch: 1 Batch: 1136/38378 (2.96%) Loss: 2.079184 LR: 0.00001834 [22:39:41] Epoch: 1 Batch: 1137/38378 (2.96%) Loss: 2.454409 LR: 0.00001834 [22:39:43] Epoch: 1 Batch: 1138/38378 (2.97%) Loss: 1.941707 LR: 0.00001845 [22:39:44] Epoch: 1 Batch: 1139/38378 (2.97%) Loss: 2.390077 LR: 0.00001845 [22:39:46] Epoch: 1 Batch: 1140/38378 (2.97%) Loss: 2.197333 LR: 0.00001845 [22:39:48] Epoch: 1 Batch: 1141/38378 (2.97%) Loss: 2.102028 LR: 0.00001845 [22:39:50] Epoch: 1 Batch: 1142/38378 (2.98%) Loss: 2.249391 LR: 0.00001845 [22:39:51] Epoch: 1 Batch: 1143/38378 (2.98%) Loss: 2.431241 LR: 0.00001845 [22:39:53] Epoch: 1 Batch: 1144/38378 (2.98%) Loss: 2.298937 LR: 0.00001845 [22:39:55] Epoch: 1 Batch: 1145/38378 (2.98%) Loss: 1.923668 LR: 0.00001856 [22:39:56] Epoch: 1 Batch: 1146/38378 (2.99%) Loss: 2.357139 LR: 0.00001856 [22:39:58] Epoch: 1 Batch: 1147/38378 (2.99%) Loss: 2.031273 LR: 0.00001856 [22:40:00] Epoch: 1 Batch: 1148/38378 (2.99%) Loss: 2.040537 LR: 0.00001856 [22:40:01] Epoch: 1 Batch: 1149/38378 (2.99%) Loss: 2.259559 LR: 0.00001856 [22:40:03] Epoch: 1 Batch: 1150/38378 (3.00%) Loss: 2.174663 LR: 0.00001856 [22:40:05] Epoch: 1 Batch: 1151/38378 (3.00%) Loss: 
2.213661 LR: 0.00001856 [22:40:07] Epoch: 1 Batch: 1152/38378 (3.00%) Loss: 2.308430 LR: 0.00001868 [22:40:08] Epoch: 1 Batch: 1153/38378 (3.00%) Loss: 2.032334 LR: 0.00001868 [22:40:10] Epoch: 1 Batch: 1154/38378 (3.01%) Loss: 2.286784 LR: 0.00001868 [22:40:16] >> Cleaned up old temp checkpoint: epoch1_step825 [22:40:16] >> Temp checkpoint saved: epoch1_step1155, size: 0.1702 GB [22:40:16] Epoch: 1 Batch: 1155/38378 (3.01%) Loss: 2.132458 LR: 0.00001868 [22:40:18] Epoch: 1 Batch: 1156/38378 (3.01%) Loss: 2.280944 LR: 0.00001868 [22:40:19] Epoch: 1 Batch: 1157/38378 (3.01%) Loss: 2.149388 LR: 0.00001868 [22:40:21] Epoch: 1 Batch: 1158/38378 (3.02%) Loss: 2.353193 LR: 0.00001868 [22:40:23] Epoch: 1 Batch: 1159/38378 (3.02%) Loss: 2.405811 LR: 0.00001879 [22:40:24] Epoch: 1 Batch: 1160/38378 (3.02%) Loss: 2.204599 LR: 0.00001879 [22:40:26] Epoch: 1 Batch: 1161/38378 (3.03%) Loss: 2.351357 LR: 0.00001879 [22:40:28] Epoch: 1 Batch: 1162/38378 (3.03%) Loss: 1.861008 LR: 0.00001879 [22:40:29] Epoch: 1 Batch: 1163/38378 (3.03%) Loss: 2.204915 LR: 0.00001879 [22:40:31] Epoch: 1 Batch: 1164/38378 (3.03%) Loss: 2.022805 LR: 0.00001879 [22:40:33] Epoch: 1 Batch: 1165/38378 (3.04%) Loss: 2.174303 LR: 0.00001879 [22:40:34] Epoch: 1 Batch: 1166/38378 (3.04%) Loss: 2.272209 LR: 0.00001891 [22:40:36] Epoch: 1 Batch: 1167/38378 (3.04%) Loss: 2.254183 LR: 0.00001891 [22:40:38] Epoch: 1 Batch: 1168/38378 (3.04%) Loss: 2.152935 LR: 0.00001891 [22:40:39] Epoch: 1 Batch: 1169/38378 (3.05%) Loss: 2.420075 LR: 0.00001891 [22:40:41] Epoch: 1 Batch: 1170/38378 (3.05%) Loss: 2.401262 LR: 0.00001891 [22:40:43] Epoch: 1 Batch: 1171/38378 (3.05%) Loss: 2.570300 LR: 0.00001891 [22:40:45] Epoch: 1 Batch: 1172/38378 (3.05%) Loss: 2.076540 LR: 0.00001891 [22:40:46] Epoch: 1 Batch: 1173/38378 (3.06%) Loss: 2.153765 LR: 0.00001902 [22:40:48] Epoch: 1 Batch: 1174/38378 (3.06%) Loss: 2.715366 LR: 0.00001902 [22:40:50] Epoch: 1 Batch: 1175/38378 (3.06%) Loss: 2.049115 LR: 0.00001902 [22:40:51] Epoch: 1 Batch: 1176/38378 (3.06%) Loss: 2.322598 LR: 0.00001902 [22:40:53] Epoch: 1 Batch: 1177/38378 (3.07%) Loss: 2.288172 LR: 0.00001902 [22:40:55] Epoch: 1 Batch: 1178/38378 (3.07%) Loss: 2.104622 LR: 0.00001902 [22:40:56] Epoch: 1 Batch: 1179/38378 (3.07%) Loss: 2.367956 LR: 0.00001902 [22:40:58] Epoch: 1 Batch: 1180/38378 (3.07%) Loss: 2.323787 LR: 0.00001913 [22:41:00] Epoch: 1 Batch: 1181/38378 (3.08%) Loss: 2.473728 LR: 0.00001913 [22:41:01] Epoch: 1 Batch: 1182/38378 (3.08%) Loss: 1.824755 LR: 0.00001913 [22:41:03] Epoch: 1 Batch: 1183/38378 (3.08%) Loss: 2.272438 LR: 0.00001913 [22:41:05] Epoch: 1 Batch: 1184/38378 (3.09%) Loss: 2.396883 LR: 0.00001913 [22:41:07] Epoch: 1 Batch: 1185/38378 (3.09%) Loss: 2.077168 LR: 0.00001913 [22:41:08] Epoch: 1 Batch: 1186/38378 (3.09%) Loss: 2.485021 LR: 0.00001913 [22:41:10] Epoch: 1 Batch: 1187/38378 (3.09%) Loss: 2.329345 LR: 0.00001925 [22:41:16] >> Cleaned up old temp checkpoint: epoch1_step858 [22:41:16] >> Temp checkpoint saved: epoch1_step1188, size: 0.1702 GB [22:41:16] Epoch: 1 Batch: 1188/38378 (3.10%) Loss: 2.475359 LR: 0.00001925 [22:41:17] Epoch: 1 Batch: 1189/38378 (3.10%) Loss: 2.303941 LR: 0.00001925 [22:41:19] Epoch: 1 Batch: 1190/38378 (3.10%) Loss: 2.280469 LR: 0.00001925 [22:41:21] Epoch: 1 Batch: 1191/38378 (3.10%) Loss: 2.253389 LR: 0.00001925 [22:41:23] Epoch: 1 Batch: 1192/38378 (3.11%) Loss: 2.372659 LR: 0.00001925 [22:41:24] Epoch: 1 Batch: 1193/38378 (3.11%) Loss: 2.096410 LR: 0.00001925 [22:41:26] Epoch: 1 Batch: 1194/38378 (3.11%) Loss: 2.068027 LR: 0.00001936 
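Note on the step-1000 evaluation above: Perplexity: 10.7282 is exp of the validation loss (exp(2.3729) ≈ 10.73, matching up to rounding of the printed loss), and the seventeen "Evaluating batch 0..16" lines follow from 100 validation samples at val_batch_size: 6 (ceil(100/6) = 17). A minimal sketch of such an eval pass, assuming Hugging Face-style `model` and `val_loader` objects rather than the script's actual functions:

```python
import math

import torch

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    losses = []
    for i, batch in enumerate(val_loader):
        print(f">> Evaluating batch {i}")
        losses.append(model(**batch).loss.item())
    model.train()
    val_loss = sum(losses) / len(losses)
    return val_loss, math.exp(val_loss)  # perplexity = exp(mean cross-entropy)
```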
[22:41:28] Epoch: 1 Batch: 1195/38378 (3.11%) Loss: 2.122001 LR: 0.00001936 [22:41:29] Epoch: 1 Batch: 1196/38378 (3.12%) Loss: 2.127375 LR: 0.00001936 [22:41:31] Epoch: 1 Batch: 1197/38378 (3.12%) Loss: 2.137191 LR: 0.00001936 [22:41:33] Epoch: 1 Batch: 1198/38378 (3.12%) Loss: 2.088580 LR: 0.00001936 [22:41:34] Epoch: 1 Batch: 1199/38378 (3.12%) Loss: 2.140697 LR: 0.00001936 [22:41:36] Epoch: 1 Batch: 1200/38378 (3.13%) Loss: 2.090564 LR: 0.00001936 [22:41:38] Epoch: 1 Batch: 1201/38378 (3.13%) Loss: 2.204226 LR: 0.00001948 [22:41:40] Epoch: 1 Batch: 1202/38378 (3.13%) Loss: 2.087531 LR: 0.00001948 [22:41:41] Epoch: 1 Batch: 1203/38378 (3.13%) Loss: 2.213611 LR: 0.00001948 [22:41:43] Epoch: 1 Batch: 1204/38378 (3.14%) Loss: 2.253668 LR: 0.00001948 [22:41:45] Epoch: 1 Batch: 1205/38378 (3.14%) Loss: 2.008206 LR: 0.00001948 [22:41:47] Epoch: 1 Batch: 1206/38378 (3.14%) Loss: 2.362717 LR: 0.00001948 [22:41:48] Epoch: 1 Batch: 1207/38378 (3.15%) Loss: 1.986481 LR: 0.00001948 [22:41:50] Epoch: 1 Batch: 1208/38378 (3.15%) Loss: 2.084852 LR: 0.00001959 [22:41:52] Epoch: 1 Batch: 1209/38378 (3.15%) Loss: 2.173921 LR: 0.00001959 [22:41:53] Epoch: 1 Batch: 1210/38378 (3.15%) Loss: 2.186039 LR: 0.00001959 [22:41:55] Epoch: 1 Batch: 1211/38378 (3.16%) Loss: 2.206715 LR: 0.00001959 [22:41:57] Epoch: 1 Batch: 1212/38378 (3.16%) Loss: 1.916041 LR: 0.00001959 [22:41:59] Epoch: 1 Batch: 1213/38378 (3.16%) Loss: 2.097149 LR: 0.00001959 [22:42:00] Epoch: 1 Batch: 1214/38378 (3.16%) Loss: 2.120323 LR: 0.00001959 [22:42:02] Epoch: 1 Batch: 1215/38378 (3.17%) Loss: 2.091460 LR: 0.00001970 [22:42:04] Epoch: 1 Batch: 1216/38378 (3.17%) Loss: 2.298765 LR: 0.00001970 [22:42:06] Epoch: 1 Batch: 1217/38378 (3.17%) Loss: 1.923161 LR: 0.00001970 [22:42:07] Epoch: 1 Batch: 1218/38378 (3.17%) Loss: 2.291162 LR: 0.00001970 [22:42:09] Epoch: 1 Batch: 1219/38378 (3.18%) Loss: 2.118454 LR: 0.00001970 [22:42:11] Epoch: 1 Batch: 1220/38378 (3.18%) Loss: 2.430328 LR: 0.00001970 [22:42:16] >> Cleaned up old temp checkpoint: epoch1_step891 [22:42:16] >> Temp checkpoint saved: epoch1_step1221, size: 0.1702 GB [22:42:16] Epoch: 1 Batch: 1221/38378 (3.18%) Loss: 2.296656 LR: 0.00001970 [22:42:18] Epoch: 1 Batch: 1222/38378 (3.18%) Loss: 2.340523 LR: 0.00001982 [22:42:20] Epoch: 1 Batch: 1223/38378 (3.19%) Loss: 2.163659 LR: 0.00001982 [22:42:21] Epoch: 1 Batch: 1224/38378 (3.19%) Loss: 2.116104 LR: 0.00001982 [22:42:23] Epoch: 1 Batch: 1225/38378 (3.19%) Loss: 2.193758 LR: 0.00001982 [22:42:25] Epoch: 1 Batch: 1226/38378 (3.19%) Loss: 2.381800 LR: 0.00001982 [22:42:26] Epoch: 1 Batch: 1227/38378 (3.20%) Loss: 2.268094 LR: 0.00001982 [22:42:28] Epoch: 1 Batch: 1228/38378 (3.20%) Loss: 2.265924 LR: 0.00001982 [22:42:30] Epoch: 1 Batch: 1229/38378 (3.20%) Loss: 2.352249 LR: 0.00001993 [22:42:31] Epoch: 1 Batch: 1230/38378 (3.20%) Loss: 2.070889 LR: 0.00001993 [22:42:33] Epoch: 1 Batch: 1231/38378 (3.21%) Loss: 2.087032 LR: 0.00001993 [22:42:35] Epoch: 1 Batch: 1232/38378 (3.21%) Loss: 2.071229 LR: 0.00001993 [22:42:37] Epoch: 1 Batch: 1233/38378 (3.21%) Loss: 2.185615 LR: 0.00001993 [22:42:38] Epoch: 1 Batch: 1234/38378 (3.22%) Loss: 2.559912 LR: 0.00001993 [22:42:40] Epoch: 1 Batch: 1235/38378 (3.22%) Loss: 2.223444 LR: 0.00001993 [22:42:42] Epoch: 1 Batch: 1236/38378 (3.22%) Loss: 2.335812 LR: 0.00002005 [22:42:44] Epoch: 1 Batch: 1237/38378 (3.22%) Loss: 2.056628 LR: 0.00002005 [22:42:45] Epoch: 1 Batch: 1238/38378 (3.23%) Loss: 2.463855 LR: 0.00002005 [22:42:47] Epoch: 1 Batch: 1239/38378 (3.23%) Loss: 2.177207 LR: 0.00002005 
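The checkpoint lines above follow a fixed rotation: a temp checkpoint is written every 33 batches, and each save evicts the temp checkpoint from 330 batches earlier (step1155 replaced step825, step1188 replaced step858, and so on), so the ten most recent temp checkpoints stay on disk at roughly 0.17 GB each. A sketch of that keep-last-N rotation, assuming a hypothetical directory layout and PEFT's save_pretrained for the adapter weights; save_temp_checkpoint is an illustrative name, not the script's actual function:

import os
import shutil

SAVE_EVERY = 33  # temp-save cadence visible in the log
KEEP_LAST = 10   # step1155 evicting step825 implies 330/33 = 10 kept

def save_temp_checkpoint(model, out_dir, epoch, step):
    # Evict the checkpoint that just left the retention window
    # (the log prints the cleanup before the save).
    old = os.path.join(out_dir, f"epoch{epoch}_step{step - SAVE_EVERY * KEEP_LAST}")
    if os.path.isdir(old):
        shutil.rmtree(old)
    # Write the new temp checkpoint (adapter-only, hence ~0.17 GB per save).
    model.save_pretrained(os.path.join(out_dir, f"epoch{epoch}_step{step}"))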
[22:42:49] Epoch: 1 Batch: 1240/38378 (3.23%) Loss: 2.039003 LR: 0.00002005 [22:42:50] Epoch: 1 Batch: 1241/38378 (3.23%) Loss: 2.065904 LR: 0.00002005 [22:42:52] Epoch: 1 Batch: 1242/38378 (3.24%) Loss: 1.947221 LR: 0.00002005 [22:42:54] Epoch: 1 Batch: 1243/38378 (3.24%) Loss: 2.046678 LR: 0.00002016 [22:42:56] Epoch: 1 Batch: 1244/38378 (3.24%) Loss: 2.159497 LR: 0.00002016 [22:42:57] Epoch: 1 Batch: 1245/38378 (3.24%) Loss: 1.953072 LR: 0.00002016 [22:42:59] Epoch: 1 Batch: 1246/38378 (3.25%) Loss: 2.226450 LR: 0.00002016 [22:43:01] Epoch: 1 Batch: 1247/38378 (3.25%) Loss: 2.107733 LR: 0.00002016 [22:43:03] Epoch: 1 Batch: 1248/38378 (3.25%) Loss: 2.240254 LR: 0.00002016 [22:43:04] Epoch: 1 Batch: 1249/38378 (3.25%) Loss: 2.402659 LR: 0.00002016 [22:43:06] Epoch: 1 Batch: 1250/38378 (3.26%) Loss: 2.050722 LR: 0.00002027 [22:43:08] Epoch: 1 Batch: 1251/38378 (3.26%) Loss: 2.140222 LR: 0.00002027 [22:43:09] Epoch: 1 Batch: 1252/38378 (3.26%) Loss: 2.085062 LR: 0.00002027 [22:43:11] Epoch: 1 Batch: 1253/38378 (3.26%) Loss: 2.459008 LR: 0.00002027 [22:43:17] >> Cleaned up old temp checkpoint: epoch1_step924 [22:43:17] >> Temp checkpoint saved: epoch1_step1254, size: 0.1702 GB [22:43:17] Epoch: 1 Batch: 1254/38378 (3.27%) Loss: 2.378188 LR: 0.00002027 [22:43:18] Epoch: 1 Batch: 1255/38378 (3.27%) Loss: 2.290630 LR: 0.00002027 [22:43:20] Epoch: 1 Batch: 1256/38378 (3.27%) Loss: 2.228057 LR: 0.00002027 [22:43:22] Epoch: 1 Batch: 1257/38378 (3.28%) Loss: 2.130292 LR: 0.00002039 [22:43:23] Epoch: 1 Batch: 1258/38378 (3.28%) Loss: 2.298956 LR: 0.00002039 [22:43:25] Epoch: 1 Batch: 1259/38378 (3.28%) Loss: 2.394722 LR: 0.00002039 [22:43:27] Epoch: 1 Batch: 1260/38378 (3.28%) Loss: 2.347107 LR: 0.00002039 [22:43:28] Epoch: 1 Batch: 1261/38378 (3.29%) Loss: 2.377003 LR: 0.00002039 [22:43:30] Epoch: 1 Batch: 1262/38378 (3.29%) Loss: 2.086052 LR: 0.00002039 [22:43:32] Epoch: 1 Batch: 1263/38378 (3.29%) Loss: 2.120633 LR: 0.00002039 [22:43:34] Epoch: 1 Batch: 1264/38378 (3.29%) Loss: 2.519931 LR: 0.00002050 [22:43:35] Epoch: 1 Batch: 1265/38378 (3.30%) Loss: 2.286087 LR: 0.00002050 [22:43:37] Epoch: 1 Batch: 1266/38378 (3.30%) Loss: 2.152620 LR: 0.00002050 [22:43:39] Epoch: 1 Batch: 1267/38378 (3.30%) Loss: 2.251059 LR: 0.00002050 [22:43:40] Epoch: 1 Batch: 1268/38378 (3.30%) Loss: 2.070854 LR: 0.00002050 [22:43:42] Epoch: 1 Batch: 1269/38378 (3.31%) Loss: 2.271436 LR: 0.00002050 [22:43:44] Epoch: 1 Batch: 1270/38378 (3.31%) Loss: 2.152428 LR: 0.00002050 [22:43:46] Epoch: 1 Batch: 1271/38378 (3.31%) Loss: 2.109315 LR: 0.00002062 [22:43:47] Epoch: 1 Batch: 1272/38378 (3.31%) Loss: 2.205926 LR: 0.00002062 [22:43:49] Epoch: 1 Batch: 1273/38378 (3.32%) Loss: 2.436824 LR: 0.00002062 [22:43:51] Epoch: 1 Batch: 1274/38378 (3.32%) Loss: 2.033654 LR: 0.00002062 [22:43:53] Epoch: 1 Batch: 1275/38378 (3.32%) Loss: 2.262977 LR: 0.00002062 [22:43:54] Epoch: 1 Batch: 1276/38378 (3.32%) Loss: 2.134705 LR: 0.00002062 [22:43:56] Epoch: 1 Batch: 1277/38378 (3.33%) Loss: 2.178966 LR: 0.00002062 [22:43:58] Epoch: 1 Batch: 1278/38378 (3.33%) Loss: 2.050443 LR: 0.00002073 [22:43:59] Epoch: 1 Batch: 1279/38378 (3.33%) Loss: 2.053335 LR: 0.00002073 [22:44:01] Epoch: 1 Batch: 1280/38378 (3.34%) Loss: 2.322117 LR: 0.00002073 [22:44:03] Epoch: 1 Batch: 1281/38378 (3.34%) Loss: 2.436787 LR: 0.00002073 [22:44:05] Epoch: 1 Batch: 1282/38378 (3.34%) Loss: 2.407937 LR: 0.00002073 [22:44:06] Epoch: 1 Batch: 1283/38378 (3.34%) Loss: 2.495062 LR: 0.00002073 [22:44:08] Epoch: 1 Batch: 1284/38378 (3.35%) Loss: 2.154701 LR: 0.00002073 
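Across this span the LR rises nearly linearly, by about 1.1e-7 to 1.2e-7 per optimizer step (0.00001856 before batch 1152 up to 0.00002073 by batch 1284), so the run is still inside its warmup ramp. A linear ramp reproduces the local behavior, as sketched below; peak_lr and warmup_steps are placeholders rather than the run's actual settings, and the true schedule may differ globally (a cosine-shaped warmup also looks locally linear near mid-ramp):

def warmup_lr(opt_step, peak_lr, warmup_steps):
    # Linear warmup: LR grows by peak_lr / warmup_steps per optimizer step.
    # The log shows increments of ~1.1e-7 per step, so whatever the run's
    # real peak_lr and warmup_steps are, their ratio is about 1.14e-7.
    if opt_step < warmup_steps:
        return peak_lr * opt_step / warmup_steps
    return peak_lr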
[22:44:09] Epoch: 1 Batch: 1285/38378 (3.35%) Loss: 2.173219 LR: 0.00002084 [22:44:11] Epoch: 1 Batch: 1286/38378 (3.35%) Loss: 1.895882 LR: 0.00002084 [22:44:17] >> Cleaned up old temp checkpoint: epoch1_step957 [22:44:17] >> Temp checkpoint saved: epoch1_step1287, size: 0.1702 GB [22:44:17] Epoch: 1 Batch: 1287/38378 (3.35%) Loss: 2.043006 LR: 0.00002084 [22:44:18] Epoch: 1 Batch: 1288/38378 (3.36%) Loss: 2.254220 LR: 0.00002084 [22:44:20] Epoch: 1 Batch: 1289/38378 (3.36%) Loss: 2.093658 LR: 0.00002084 [22:44:22] Epoch: 1 Batch: 1290/38378 (3.36%) Loss: 2.197687 LR: 0.00002084 [22:44:23] Epoch: 1 Batch: 1291/38378 (3.36%) Loss: 2.474113 LR: 0.00002084 [22:44:25] Epoch: 1 Batch: 1292/38378 (3.37%) Loss: 2.125831 LR: 0.00002096 [22:44:27] Epoch: 1 Batch: 1293/38378 (3.37%) Loss: 2.361526 LR: 0.00002096 [22:44:28] Epoch: 1 Batch: 1294/38378 (3.37%) Loss: 2.238201 LR: 0.00002096 [22:44:30] Epoch: 1 Batch: 1295/38378 (3.37%) Loss: 2.031679 LR: 0.00002096 [22:44:32] Epoch: 1 Batch: 1296/38378 (3.38%) Loss: 2.343372 LR: 0.00002096 [22:44:34] Epoch: 1 Batch: 1297/38378 (3.38%) Loss: 2.017302 LR: 0.00002096 [22:44:35] Epoch: 1 Batch: 1298/38378 (3.38%) Loss: 2.269171 LR: 0.00002096 [22:44:37] Epoch: 1 Batch: 1299/38378 (3.38%) Loss: 2.163590 LR: 0.00002107 [22:44:39] Epoch: 1 Batch: 1300/38378 (3.39%) Loss: 2.409576 LR: 0.00002107 [22:44:40] Epoch: 1 Batch: 1301/38378 (3.39%) Loss: 2.034244 LR: 0.00002107 [22:44:42] Epoch: 1 Batch: 1302/38378 (3.39%) Loss: 2.231017 LR: 0.00002107 [22:44:44] Epoch: 1 Batch: 1303/38378 (3.40%) Loss: 2.359407 LR: 0.00002107 [22:44:46] Epoch: 1 Batch: 1304/38378 (3.40%) Loss: 2.153118 LR: 0.00002107 [22:44:47] Epoch: 1 Batch: 1305/38378 (3.40%) Loss: 2.354819 LR: 0.00002107 [22:44:49] Epoch: 1 Batch: 1306/38378 (3.40%) Loss: 2.065668 LR: 0.00002118 [22:44:51] Epoch: 1 Batch: 1307/38378 (3.41%) Loss: 1.981843 LR: 0.00002118 [22:44:52] Epoch: 1 Batch: 1308/38378 (3.41%) Loss: 2.179824 LR: 0.00002118 [22:44:54] Epoch: 1 Batch: 1309/38378 (3.41%) Loss: 2.140324 LR: 0.00002118 [22:44:56] Epoch: 1 Batch: 1310/38378 (3.41%) Loss: 2.432803 LR: 0.00002118 [22:44:58] Epoch: 1 Batch: 1311/38378 (3.42%) Loss: 1.972342 LR: 0.00002118 [22:44:59] Epoch: 1 Batch: 1312/38378 (3.42%) Loss: 2.100742 LR: 0.00002118 [22:45:01] Epoch: 1 Batch: 1313/38378 (3.42%) Loss: 2.294926 LR: 0.00002130 [22:45:03] Epoch: 1 Batch: 1314/38378 (3.42%) Loss: 2.106036 LR: 0.00002130 [22:45:04] Epoch: 1 Batch: 1315/38378 (3.43%) Loss: 2.055069 LR: 0.00002130 [22:45:06] Epoch: 1 Batch: 1316/38378 (3.43%) Loss: 2.191335 LR: 0.00002130 [22:45:08] Epoch: 1 Batch: 1317/38378 (3.43%) Loss: 2.016557 LR: 0.00002130 [22:45:10] Epoch: 1 Batch: 1318/38378 (3.43%) Loss: 2.186176 LR: 0.00002130 [22:45:11] Epoch: 1 Batch: 1319/38378 (3.44%) Loss: 2.080774 LR: 0.00002130 [22:45:17] >> Cleaned up old temp checkpoint: epoch1_step990 [22:45:17] >> Temp checkpoint saved: epoch1_step1320, size: 0.1702 GB [22:45:17] Epoch: 1 Batch: 1320/38378 (3.44%) Loss: 2.145474 LR: 0.00002141 [22:45:19] Epoch: 1 Batch: 1321/38378 (3.44%) Loss: 2.222032 LR: 0.00002141 [22:45:20] Epoch: 1 Batch: 1322/38378 (3.44%) Loss: 2.075372 LR: 0.00002141 [22:45:22] Epoch: 1 Batch: 1323/38378 (3.45%) Loss: 2.199796 LR: 0.00002141 [22:45:24] Epoch: 1 Batch: 1324/38378 (3.45%) Loss: 1.947041 LR: 0.00002141 [22:45:25] Epoch: 1 Batch: 1325/38378 (3.45%) Loss: 2.417706 LR: 0.00002141 [22:45:27] Epoch: 1 Batch: 1326/38378 (3.46%) Loss: 2.459873 LR: 0.00002141 [22:45:29] Epoch: 1 Batch: 1327/38378 (3.46%) Loss: 1.964853 LR: 0.00002153 [22:45:31] Epoch: 1 
Batch: 1328/38378 (3.46%) Loss: 1.810157 LR: 0.00002153 [22:45:32] Epoch: 1 Batch: 1329/38378 (3.46%) Loss: 2.263704 LR: 0.00002153 [22:45:34] Epoch: 1 Batch: 1330/38378 (3.47%) Loss: 2.262097 LR: 0.00002153 [22:45:36] Epoch: 1 Batch: 1331/38378 (3.47%) Loss: 2.224197 LR: 0.00002153 [22:45:37] Epoch: 1 Batch: 1332/38378 (3.47%) Loss: 2.175322 LR: 0.00002153 [22:45:39] Epoch: 1 Batch: 1333/38378 (3.47%) Loss: 2.382909 LR: 0.00002153 [22:45:41] Epoch: 1 Batch: 1334/38378 (3.48%) Loss: 2.079008 LR: 0.00002164 [22:45:42] Epoch: 1 Batch: 1335/38378 (3.48%) Loss: 2.273781 LR: 0.00002164 [22:45:44] Epoch: 1 Batch: 1336/38378 (3.48%) Loss: 2.038116 LR: 0.00002164 [22:45:46] Epoch: 1 Batch: 1337/38378 (3.48%) Loss: 2.402793 LR: 0.00002164 [22:45:47] Epoch: 1 Batch: 1338/38378 (3.49%) Loss: 2.144711 LR: 0.00002164 [22:45:49] Epoch: 1 Batch: 1339/38378 (3.49%) Loss: 2.179494 LR: 0.00002164 [22:45:51] Epoch: 1 Batch: 1340/38378 (3.49%) Loss: 2.216154 LR: 0.00002164 [22:45:53] Epoch: 1 Batch: 1341/38378 (3.49%) Loss: 2.444417 LR: 0.00002175 [22:45:54] Epoch: 1 Batch: 1342/38378 (3.50%) Loss: 2.079677 LR: 0.00002175 [22:45:56] Epoch: 1 Batch: 1343/38378 (3.50%) Loss: 2.113133 LR: 0.00002175 [22:45:58] Epoch: 1 Batch: 1344/38378 (3.50%) Loss: 2.253562 LR: 0.00002175 [22:45:59] Epoch: 1 Batch: 1345/38378 (3.50%) Loss: 2.382008 LR: 0.00002175 [22:46:01] Epoch: 1 Batch: 1346/38378 (3.51%) Loss: 2.079880 LR: 0.00002175 [22:46:03] Epoch: 1 Batch: 1347/38378 (3.51%) Loss: 2.321656 LR: 0.00002175 [22:46:05] Epoch: 1 Batch: 1348/38378 (3.51%) Loss: 2.175375 LR: 0.00002187 [22:46:06] Epoch: 1 Batch: 1349/38378 (3.52%) Loss: 2.030902 LR: 0.00002187 [22:46:08] Epoch: 1 Batch: 1350/38378 (3.52%) Loss: 1.912674 LR: 0.00002187 [22:46:10] Epoch: 1 Batch: 1351/38378 (3.52%) Loss: 2.611680 LR: 0.00002187 [22:46:12] Epoch: 1 Batch: 1352/38378 (3.52%) Loss: 2.177451 LR: 0.00002187 [22:46:17] >> Cleaned up old temp checkpoint: epoch1_step1023 [22:46:17] >> Temp checkpoint saved: epoch1_step1353, size: 0.1702 GB [22:46:17] Epoch: 1 Batch: 1353/38378 (3.53%) Loss: 2.206979 LR: 0.00002187 [22:46:19] Epoch: 1 Batch: 1354/38378 (3.53%) Loss: 2.006598 LR: 0.00002187 [22:46:21] Epoch: 1 Batch: 1355/38378 (3.53%) Loss: 1.879839 LR: 0.00002198 [22:46:22] Epoch: 1 Batch: 1356/38378 (3.53%) Loss: 2.086125 LR: 0.00002198 [22:46:24] Epoch: 1 Batch: 1357/38378 (3.54%) Loss: 2.313700 LR: 0.00002198 [22:46:26] Epoch: 1 Batch: 1358/38378 (3.54%) Loss: 2.166161 LR: 0.00002198 [22:46:27] Epoch: 1 Batch: 1359/38378 (3.54%) Loss: 2.205592 LR: 0.00002198 [22:46:29] Epoch: 1 Batch: 1360/38378 (3.54%) Loss: 2.413560 LR: 0.00002198 [22:46:31] Epoch: 1 Batch: 1361/38378 (3.55%) Loss: 2.192335 LR: 0.00002198 [22:46:32] Epoch: 1 Batch: 1362/38378 (3.55%) Loss: 1.791641 LR: 0.00002210 [22:46:34] Epoch: 1 Batch: 1363/38378 (3.55%) Loss: 2.226608 LR: 0.00002210 [22:46:36] Epoch: 1 Batch: 1364/38378 (3.55%) Loss: 2.230684 LR: 0.00002210 [22:46:38] Epoch: 1 Batch: 1365/38378 (3.56%) Loss: 1.954568 LR: 0.00002210 [22:46:39] Epoch: 1 Batch: 1366/38378 (3.56%) Loss: 1.899064 LR: 0.00002210 [22:46:41] Epoch: 1 Batch: 1367/38378 (3.56%) Loss: 2.434282 LR: 0.00002210 [22:46:43] Epoch: 1 Batch: 1368/38378 (3.56%) Loss: 2.126715 LR: 0.00002210 [22:46:44] Epoch: 1 Batch: 1369/38378 (3.57%) Loss: 2.005384 LR: 0.00002221 [22:46:46] Epoch: 1 Batch: 1370/38378 (3.57%) Loss: 2.033666 LR: 0.00002221 [22:46:48] Epoch: 1 Batch: 1371/38378 (3.57%) Loss: 2.243837 LR: 0.00002221 [22:46:50] Epoch: 1 Batch: 1372/38378 (3.57%) Loss: 2.051771 LR: 0.00002221 [22:46:51] Epoch: 1 
Batch: 1373/38378 (3.58%) Loss: 1.951337 LR: 0.00002221 [22:46:53] Epoch: 1 Batch: 1374/38378 (3.58%) Loss: 2.461016 LR: 0.00002221 [22:46:54] Epoch: 1 Batch: 1375/38378 (3.58%) Loss: 2.270405 LR: 0.00002221 [22:46:56] Epoch: 1 Batch: 1376/38378 (3.59%) Loss: 2.616555 LR: 0.00002232 [22:46:58] Epoch: 1 Batch: 1377/38378 (3.59%) Loss: 2.060162 LR: 0.00002232 [22:47:00] Epoch: 1 Batch: 1378/38378 (3.59%) Loss: 1.972206 LR: 0.00002232 [22:47:01] Epoch: 1 Batch: 1379/38378 (3.59%) Loss: 2.209433 LR: 0.00002232 [22:47:03] Epoch: 1 Batch: 1380/38378 (3.60%) Loss: 2.156860 LR: 0.00002232 [22:47:05] Epoch: 1 Batch: 1381/38378 (3.60%) Loss: 2.144341 LR: 0.00002232 [22:47:06] Epoch: 1 Batch: 1382/38378 (3.60%) Loss: 2.119875 LR: 0.00002232 [22:47:08] Epoch: 1 Batch: 1383/38378 (3.60%) Loss: 2.219742 LR: 0.00002244 [22:47:10] Epoch: 1 Batch: 1384/38378 (3.61%) Loss: 1.702882 LR: 0.00002244 [22:47:12] Epoch: 1 Batch: 1385/38378 (3.61%) Loss: 1.965248 LR: 0.00002244 [22:47:17] >> Cleaned up old temp checkpoint: epoch1_step1056 [22:47:17] >> Temp checkpoint saved: epoch1_step1386, size: 0.1702 GB [22:47:17] Epoch: 1 Batch: 1386/38378 (3.61%) Loss: 2.058729 LR: 0.00002244 [22:47:19] Epoch: 1 Batch: 1387/38378 (3.61%) Loss: 2.328615 LR: 0.00002244 [22:47:21] Epoch: 1 Batch: 1388/38378 (3.62%) Loss: 1.975141 LR: 0.00002244 [22:47:23] Epoch: 1 Batch: 1389/38378 (3.62%) Loss: 2.507613 LR: 0.00002244 [22:47:24] Epoch: 1 Batch: 1390/38378 (3.62%) Loss: 2.180846 LR: 0.00002255 [22:47:26] Epoch: 1 Batch: 1391/38378 (3.62%) Loss: 1.975332 LR: 0.00002255 [22:47:28] Epoch: 1 Batch: 1392/38378 (3.63%) Loss: 2.212183 LR: 0.00002255 [22:47:29] Epoch: 1 Batch: 1393/38378 (3.63%) Loss: 1.974326 LR: 0.00002255 [22:47:31] Epoch: 1 Batch: 1394/38378 (3.63%) Loss: 2.353870 LR: 0.00002255 [22:47:33] Epoch: 1 Batch: 1395/38378 (3.63%) Loss: 2.386841 LR: 0.00002255 [22:47:34] Epoch: 1 Batch: 1396/38378 (3.64%) Loss: 2.480095 LR: 0.00002255 [22:47:36] Epoch: 1 Batch: 1397/38378 (3.64%) Loss: 2.159362 LR: 0.00002267 [22:47:38] Epoch: 1 Batch: 1398/38378 (3.64%) Loss: 2.448546 LR: 0.00002267 [22:47:40] Epoch: 1 Batch: 1399/38378 (3.65%) Loss: 2.203306 LR: 0.00002267 [22:47:41] Epoch: 1 Batch: 1400/38378 (3.65%) Loss: 2.228647 LR: 0.00002267 [22:47:43] Epoch: 1 Batch: 1401/38378 (3.65%) Loss: 2.001092 LR: 0.00002267 [22:47:45] Epoch: 1 Batch: 1402/38378 (3.65%) Loss: 2.096603 LR: 0.00002267 [22:47:46] Epoch: 1 Batch: 1403/38378 (3.66%) Loss: 2.024660 LR: 0.00002267 [22:47:48] Epoch: 1 Batch: 1404/38378 (3.66%) Loss: 2.231689 LR: 0.00002278 [22:47:50] Epoch: 1 Batch: 1405/38378 (3.66%) Loss: 1.861011 LR: 0.00002278 [22:47:52] Epoch: 1 Batch: 1406/38378 (3.66%) Loss: 2.305319 LR: 0.00002278 [22:47:53] Epoch: 1 Batch: 1407/38378 (3.67%) Loss: 1.969128 LR: 0.00002278 [22:47:55] Epoch: 1 Batch: 1408/38378 (3.67%) Loss: 2.332360 LR: 0.00002278 [22:47:57] Epoch: 1 Batch: 1409/38378 (3.67%) Loss: 2.011484 LR: 0.00002278 [22:47:58] Epoch: 1 Batch: 1410/38378 (3.67%) Loss: 2.426255 LR: 0.00002278 [22:48:00] Epoch: 1 Batch: 1411/38378 (3.68%) Loss: 2.251364 LR: 0.00002289 [22:48:02] Epoch: 1 Batch: 1412/38378 (3.68%) Loss: 2.211132 LR: 0.00002289 [22:48:04] Epoch: 1 Batch: 1413/38378 (3.68%) Loss: 2.161128 LR: 0.00002289 [22:48:05] Epoch: 1 Batch: 1414/38378 (3.68%) Loss: 2.297013 LR: 0.00002289 [22:48:07] Epoch: 1 Batch: 1415/38378 (3.69%) Loss: 2.301659 LR: 0.00002289 [22:48:09] Epoch: 1 Batch: 1416/38378 (3.69%) Loss: 2.066479 LR: 0.00002289 [22:48:10] Epoch: 1 Batch: 1417/38378 (3.69%) Loss: 2.262302 LR: 0.00002289 [22:48:12] Epoch: 1 
Batch: 1418/38378 (3.69%) Loss: 1.939276 LR: 0.00002301 [22:48:18] >> Cleaned up old temp checkpoint: epoch1_step1089 [22:48:18] >> Temp checkpoint saved: epoch1_step1419, size: 0.1702 GB [22:48:18] Epoch: 1 Batch: 1419/38378 (3.70%) Loss: 1.903418 LR: 0.00002301 [22:48:19] Epoch: 1 Batch: 1420/38378 (3.70%) Loss: 2.454389 LR: 0.00002301 [22:48:21] Epoch: 1 Batch: 1421/38378 (3.70%) Loss: 2.349417 LR: 0.00002301 [22:48:23] Epoch: 1 Batch: 1422/38378 (3.71%) Loss: 2.393277 LR: 0.00002301 [22:48:24] Epoch: 1 Batch: 1423/38378 (3.71%) Loss: 2.088578 LR: 0.00002301 [22:48:26] Epoch: 1 Batch: 1424/38378 (3.71%) Loss: 2.167870 LR: 0.00002301 [22:48:28] Epoch: 1 Batch: 1425/38378 (3.71%) Loss: 2.055640 LR: 0.00002312 [22:48:30] Epoch: 1 Batch: 1426/38378 (3.72%) Loss: 2.157231 LR: 0.00002312 [22:48:31] Epoch: 1 Batch: 1427/38378 (3.72%) Loss: 2.179020 LR: 0.00002312 [22:48:33] Epoch: 1 Batch: 1428/38378 (3.72%) Loss: 2.142435 LR: 0.00002312 [22:48:35] Epoch: 1 Batch: 1429/38378 (3.72%) Loss: 2.040403 LR: 0.00002312 [22:48:36] Epoch: 1 Batch: 1430/38378 (3.73%) Loss: 2.252838 LR: 0.00002312 [22:48:38] Epoch: 1 Batch: 1431/38378 (3.73%) Loss: 1.958289 LR: 0.00002312 [22:48:40] Epoch: 1 Batch: 1432/38378 (3.73%) Loss: 1.950084 LR: 0.00002323 [22:48:42] Epoch: 1 Batch: 1433/38378 (3.73%) Loss: 1.947991 LR: 0.00002323 [22:48:43] Epoch: 1 Batch: 1434/38378 (3.74%) Loss: 2.156399 LR: 0.00002323 [22:48:45] Epoch: 1 Batch: 1435/38378 (3.74%) Loss: 2.091145 LR: 0.00002323 [22:48:47] Epoch: 1 Batch: 1436/38378 (3.74%) Loss: 2.298905 LR: 0.00002323 [22:48:49] Epoch: 1 Batch: 1437/38378 (3.74%) Loss: 2.009035 LR: 0.00002323 [22:48:50] Epoch: 1 Batch: 1438/38378 (3.75%) Loss: 2.308878 LR: 0.00002323 [22:48:52] Epoch: 1 Batch: 1439/38378 (3.75%) Loss: 1.847616 LR: 0.00002335 [22:48:54] Epoch: 1 Batch: 1440/38378 (3.75%) Loss: 2.131662 LR: 0.00002335 [22:48:55] Epoch: 1 Batch: 1441/38378 (3.75%) Loss: 2.273590 LR: 0.00002335 [22:48:57] Epoch: 1 Batch: 1442/38378 (3.76%) Loss: 2.335884 LR: 0.00002335 [22:48:59] Epoch: 1 Batch: 1443/38378 (3.76%) Loss: 2.451887 LR: 0.00002335 [22:49:01] Epoch: 1 Batch: 1444/38378 (3.76%) Loss: 2.242651 LR: 0.00002335 [22:49:02] Epoch: 1 Batch: 1445/38378 (3.77%) Loss: 2.042318 LR: 0.00002335 [22:49:04] Epoch: 1 Batch: 1446/38378 (3.77%) Loss: 2.134161 LR: 0.00002346 [22:49:06] Epoch: 1 Batch: 1447/38378 (3.77%) Loss: 2.259896 LR: 0.00002346 [22:49:08] Epoch: 1 Batch: 1448/38378 (3.77%) Loss: 2.128562 LR: 0.00002346 [22:49:09] Epoch: 1 Batch: 1449/38378 (3.78%) Loss: 2.527581 LR: 0.00002346 [22:49:11] Epoch: 1 Batch: 1450/38378 (3.78%) Loss: 2.295426 LR: 0.00002346 [22:49:13] Epoch: 1 Batch: 1451/38378 (3.78%) Loss: 1.847679 LR: 0.00002346 [22:49:18] >> Cleaned up old temp checkpoint: epoch1_step1122 [22:49:18] >> Temp checkpoint saved: epoch1_step1452, size: 0.1702 GB [22:49:18] Epoch: 1 Batch: 1452/38378 (3.78%) Loss: 2.163445 LR: 0.00002346 [22:49:20] Epoch: 1 Batch: 1453/38378 (3.79%) Loss: 2.051880 LR: 0.00002358 [22:49:22] Epoch: 1 Batch: 1454/38378 (3.79%) Loss: 2.183933 LR: 0.00002358 [22:49:23] Epoch: 1 Batch: 1455/38378 (3.79%) Loss: 2.140020 LR: 0.00002358 [22:49:25] Epoch: 1 Batch: 1456/38378 (3.79%) Loss: 2.157481 LR: 0.00002358 [22:49:27] Epoch: 1 Batch: 1457/38378 (3.80%) Loss: 2.116763 LR: 0.00002358 [22:49:29] Epoch: 1 Batch: 1458/38378 (3.80%) Loss: 2.091273 LR: 0.00002358 [22:49:30] Epoch: 1 Batch: 1459/38378 (3.80%) Loss: 2.122425 LR: 0.00002358 [22:49:32] Epoch: 1 Batch: 1460/38378 (3.80%) Loss: 2.151889 LR: 0.00002369 [22:49:34] Epoch: 1 Batch: 1461/38378 
(3.81%) Loss: 2.061569 LR: 0.00002369 [22:49:35] Epoch: 1 Batch: 1462/38378 (3.81%) Loss: 2.143836 LR: 0.00002369 [22:49:37] Epoch: 1 Batch: 1463/38378 (3.81%) Loss: 2.251385 LR: 0.00002369 [22:49:39] Epoch: 1 Batch: 1464/38378 (3.81%) Loss: 2.164541 LR: 0.00002369 [22:49:41] Epoch: 1 Batch: 1465/38378 (3.82%) Loss: 2.332103 LR: 0.00002369 [22:49:42] Epoch: 1 Batch: 1466/38378 (3.82%) Loss: 2.203967 LR: 0.00002369 [22:49:44] Epoch: 1 Batch: 1467/38378 (3.82%) Loss: 2.193803 LR: 0.00002380 [22:49:46] Epoch: 1 Batch: 1468/38378 (3.83%) Loss: 2.321503 LR: 0.00002380 [22:49:47] Epoch: 1 Batch: 1469/38378 (3.83%) Loss: 2.249250 LR: 0.00002380 [22:49:49] Epoch: 1 Batch: 1470/38378 (3.83%) Loss: 2.196230 LR: 0.00002380 [22:49:51] Epoch: 1 Batch: 1471/38378 (3.83%) Loss: 2.556034 LR: 0.00002380 [22:49:53] Epoch: 1 Batch: 1472/38378 (3.84%) Loss: 1.989165 LR: 0.00002380 [22:49:54] Epoch: 1 Batch: 1473/38378 (3.84%) Loss: 1.862217 LR: 0.00002380 [22:49:56] Epoch: 1 Batch: 1474/38378 (3.84%) Loss: 2.211442 LR: 0.00002392 [22:49:58] Epoch: 1 Batch: 1475/38378 (3.84%) Loss: 2.221335 LR: 0.00002392 [22:49:59] Epoch: 1 Batch: 1476/38378 (3.85%) Loss: 2.124908 LR: 0.00002392 [22:50:01] Epoch: 1 Batch: 1477/38378 (3.85%) Loss: 2.010646 LR: 0.00002392 [22:50:03] Epoch: 1 Batch: 1478/38378 (3.85%) Loss: 2.270565 LR: 0.00002392 [22:50:05] Epoch: 1 Batch: 1479/38378 (3.85%) Loss: 2.048434 LR: 0.00002392 [22:50:06] Epoch: 1 Batch: 1480/38378 (3.86%) Loss: 2.133584 LR: 0.00002392 [22:50:08] Epoch: 1 Batch: 1481/38378 (3.86%) Loss: 1.955977 LR: 0.00002403 [22:50:10] Epoch: 1 Batch: 1482/38378 (3.86%) Loss: 2.314338 LR: 0.00002403 [22:50:12] Epoch: 1 Batch: 1483/38378 (3.86%) Loss: 2.114835 LR: 0.00002403 [22:50:13] Epoch: 1 Batch: 1484/38378 (3.87%) Loss: 2.087934 LR: 0.00002403 [22:50:19] >> Cleaned up old temp checkpoint: epoch1_step1155 [22:50:19] >> Temp checkpoint saved: epoch1_step1485, size: 0.1702 GB [22:50:19] Epoch: 1 Batch: 1485/38378 (3.87%) Loss: 2.061754 LR: 0.00002403 [22:50:20] Epoch: 1 Batch: 1486/38378 (3.87%) Loss: 1.970918 LR: 0.00002403 [22:50:22] Epoch: 1 Batch: 1487/38378 (3.87%) Loss: 2.203554 LR: 0.00002403 [22:50:24] Epoch: 1 Batch: 1488/38378 (3.88%) Loss: 2.127073 LR: 0.00002415 [22:50:26] Epoch: 1 Batch: 1489/38378 (3.88%) Loss: 2.333715 LR: 0.00002415 [22:50:27] Epoch: 1 Batch: 1490/38378 (3.88%) Loss: 2.129774 LR: 0.00002415 [22:50:29] Epoch: 1 Batch: 1491/38378 (3.89%) Loss: 2.044424 LR: 0.00002415 [22:50:31] Epoch: 1 Batch: 1492/38378 (3.89%) Loss: 2.082837 LR: 0.00002415 [22:50:32] Epoch: 1 Batch: 1493/38378 (3.89%) Loss: 2.183986 LR: 0.00002415 [22:50:34] Epoch: 1 Batch: 1494/38378 (3.89%) Loss: 2.077697 LR: 0.00002415 [22:50:36] Epoch: 1 Batch: 1495/38378 (3.90%) Loss: 2.342376 LR: 0.00002426 [22:50:37] Epoch: 1 Batch: 1496/38378 (3.90%) Loss: 2.272570 LR: 0.00002426 [22:50:39] Epoch: 1 Batch: 1497/38378 (3.90%) Loss: 1.914185 LR: 0.00002426 [22:50:41] Epoch: 1 Batch: 1498/38378 (3.90%) Loss: 1.923901 LR: 0.00002426 [22:50:43] Epoch: 1 Batch: 1499/38378 (3.91%) Loss: 2.159022 LR: 0.00002426 [22:50:44] >> Evaluating batch 0 [22:50:45] >> Evaluating batch 1 [22:50:46] >> Evaluating batch 2 [22:50:47] >> Evaluating batch 3 [22:50:48] >> Evaluating batch 4 [22:50:49] >> Evaluating batch 5 [22:50:50] >> Evaluating batch 6 [22:50:51] >> Evaluating batch 7 [22:50:52] >> Evaluating batch 8 [22:50:53] >> Evaluating batch 9 [22:50:54] >> Evaluating batch 10 [22:50:55] >> Evaluating batch 11 [22:50:56] >> Evaluating batch 12 [22:50:57] >> Evaluating batch 13 [22:50:57] >> Evaluating batch 
14 [22:50:58] >> Evaluating batch 15 [22:50:59] >> Evaluating batch 16 [22:51:00] Epoch: 1 Step: 1500/38378 Evaluation: [22:51:00] Avg Loss Since Last Eval: 2.1939 Val Loss: 2.2948 Validation loss delta: -0.0781 Perplexity: 9.9221 LR: 0.00002426 [22:51:04] >> Checkpoint saved: epoch1_step1500, size: 0.1702 GB [22:51:04] Epoch: 1 Batch: 1500/38378 (3.91%) Loss: 1.951164 LR: 0.00002426 [22:51:06] Epoch: 1 Batch: 1501/38378 (3.91%) Loss: 2.253273 LR: 0.00002426 [22:51:07] Epoch: 1 Batch: 1502/38378 (3.91%) Loss: 2.216931 LR: 0.00002437 [22:51:09] Epoch: 1 Batch: 1503/38378 (3.92%) Loss: 2.087189 LR: 0.00002437 [22:51:11] Epoch: 1 Batch: 1504/38378 (3.92%) Loss: 2.228819 LR: 0.00002437 [22:51:12] Epoch: 1 Batch: 1505/38378 (3.92%) Loss: 2.435357 LR: 0.00002437 [22:51:14] Epoch: 1 Batch: 1506/38378 (3.92%) Loss: 2.130095 LR: 0.00002437 [22:51:16] Epoch: 1 Batch: 1507/38378 (3.93%) Loss: 2.037754 LR: 0.00002437 [22:51:18] Epoch: 1 Batch: 1508/38378 (3.93%) Loss: 2.222289 LR: 0.00002437 [22:51:19] Epoch: 1 Batch: 1509/38378 (3.93%) Loss: 2.104354 LR: 0.00002449 [22:51:21] Epoch: 1 Batch: 1510/38378 (3.93%) Loss: 2.341756 LR: 0.00002449 [22:51:23] Epoch: 1 Batch: 1511/38378 (3.94%) Loss: 2.049899 LR: 0.00002449 [22:51:24] Epoch: 1 Batch: 1512/38378 (3.94%) Loss: 2.343318 LR: 0.00002449 [22:51:26] Epoch: 1 Batch: 1513/38378 (3.94%) Loss: 1.938720 LR: 0.00002449 [22:51:28] Epoch: 1 Batch: 1514/38378 (3.94%) Loss: 1.999284 LR: 0.00002449 [22:51:30] Epoch: 1 Batch: 1515/38378 (3.95%) Loss: 2.267390 LR: 0.00002449 [22:51:31] Epoch: 1 Batch: 1516/38378 (3.95%) Loss: 2.160032 LR: 0.00002460 [22:51:33] Epoch: 1 Batch: 1517/38378 (3.95%) Loss: 1.850871 LR: 0.00002460 [22:51:39] >> Cleaned up old temp checkpoint: epoch1_step1188 [22:51:39] >> Temp checkpoint saved: epoch1_step1518, size: 0.1702 GB [22:51:39] Epoch: 1 Batch: 1518/38378 (3.96%) Loss: 2.075061 LR: 0.00002460 [22:51:40] Epoch: 1 Batch: 1519/38378 (3.96%) Loss: 2.297321 LR: 0.00002460 [22:51:42] Epoch: 1 Batch: 1520/38378 (3.96%) Loss: 2.090566 LR: 0.00002460 [22:51:44] Epoch: 1 Batch: 1521/38378 (3.96%) Loss: 2.032634 LR: 0.00002460 [22:51:45] Epoch: 1 Batch: 1522/38378 (3.97%) Loss: 2.289391 LR: 0.00002460 [22:51:47] Epoch: 1 Batch: 1523/38378 (3.97%) Loss: 2.257866 LR: 0.00002472 [22:51:49] Epoch: 1 Batch: 1524/38378 (3.97%) Loss: 2.298833 LR: 0.00002472 [22:51:51] Epoch: 1 Batch: 1525/38378 (3.97%) Loss: 1.855131 LR: 0.00002472 [22:51:52] Epoch: 1 Batch: 1526/38378 (3.98%) Loss: 2.019964 LR: 0.00002472 [22:51:54] Epoch: 1 Batch: 1527/38378 (3.98%) Loss: 1.883322 LR: 0.00002472 [22:51:56] Epoch: 1 Batch: 1528/38378 (3.98%) Loss: 2.136859 LR: 0.00002472 [22:51:57] Epoch: 1 Batch: 1529/38378 (3.98%) Loss: 2.150979 LR: 0.00002472 [22:51:59] Epoch: 1 Batch: 1530/38378 (3.99%) Loss: 2.151801 LR: 0.00002483 [22:52:01] Epoch: 1 Batch: 1531/38378 (3.99%) Loss: 2.165458 LR: 0.00002483 [22:52:03] Epoch: 1 Batch: 1532/38378 (3.99%) Loss: 2.071557 LR: 0.00002483 [22:52:04] Epoch: 1 Batch: 1533/38378 (3.99%) Loss: 1.823841 LR: 0.00002483 [22:52:06] Epoch: 1 Batch: 1534/38378 (4.00%) Loss: 2.279119 LR: 0.00002483 [22:52:08] Epoch: 1 Batch: 1535/38378 (4.00%) Loss: 2.071026 LR: 0.00002483 [22:52:09] Epoch: 1 Batch: 1536/38378 (4.00%) Loss: 2.179393 LR: 0.00002483 [22:52:11] Epoch: 1 Batch: 1537/38378 (4.00%) Loss: 1.972529 LR: 0.00002494 [22:52:13] Epoch: 1 Batch: 1538/38378 (4.01%) Loss: 2.251748 LR: 0.00002494 [22:52:15] Epoch: 1 Batch: 1539/38378 (4.01%) Loss: 1.940086 LR: 0.00002494 [22:52:16] Epoch: 1 Batch: 1540/38378 (4.01%) Loss: 2.105968 LR:
0.00002494 [22:52:18] Epoch: 1 Batch: 1541/38378 (4.02%) Loss: 2.376522 LR: 0.00002494 [22:52:20] Epoch: 1 Batch: 1542/38378 (4.02%) Loss: 2.090080 LR: 0.00002494 [22:52:22] Epoch: 1 Batch: 1543/38378 (4.02%) Loss: 2.131909 LR: 0.00002494 [22:52:23] Epoch: 1 Batch: 1544/38378 (4.02%) Loss: 2.242106 LR: 0.00002506 [22:52:25] Epoch: 1 Batch: 1545/38378 (4.03%) Loss: 2.063104 LR: 0.00002506 [22:52:27] Epoch: 1 Batch: 1546/38378 (4.03%) Loss: 2.299373 LR: 0.00002506 [22:52:28] Epoch: 1 Batch: 1547/38378 (4.03%) Loss: 2.244961 LR: 0.00002506 [22:52:30] Epoch: 1 Batch: 1548/38378 (4.03%) Loss: 2.143691 LR: 0.00002506 [22:52:32] Epoch: 1 Batch: 1549/38378 (4.04%) Loss: 2.394026 LR: 0.00002506 [22:52:33] Epoch: 1 Batch: 1550/38378 (4.04%) Loss: 2.258300 LR: 0.00002506 [22:52:40] >> Cleaned up old temp checkpoint: epoch1_step1221 [22:52:40] >> Temp checkpoint saved: epoch1_step1551, size: 0.1702 GB [22:52:40] Epoch: 1 Batch: 1551/38378 (4.04%) Loss: 2.200159 LR: 0.00002517 [22:52:41] Epoch: 1 Batch: 1552/38378 (4.04%) Loss: 2.403032 LR: 0.00002517 [22:52:43] Epoch: 1 Batch: 1553/38378 (4.05%) Loss: 2.083825 LR: 0.00002517 [22:52:45] Epoch: 1 Batch: 1554/38378 (4.05%) Loss: 2.070457 LR: 0.00002517 [22:52:46] Epoch: 1 Batch: 1555/38378 (4.05%) Loss: 2.141441 LR: 0.00002517 [22:52:48] Epoch: 1 Batch: 1556/38378 (4.05%) Loss: 1.967154 LR: 0.00002517 [22:52:50] Epoch: 1 Batch: 1557/38378 (4.06%) Loss: 1.888095 LR: 0.00002517 [22:52:52] Epoch: 1 Batch: 1558/38378 (4.06%) Loss: 2.299089 LR: 0.00002528 [22:52:53] Epoch: 1 Batch: 1559/38378 (4.06%) Loss: 2.295424 LR: 0.00002528 [22:52:55] Epoch: 1 Batch: 1560/38378 (4.06%) Loss: 2.275674 LR: 0.00002528 [22:52:57] Epoch: 1 Batch: 1561/38378 (4.07%) Loss: 2.049292 LR: 0.00002528 [22:52:58] Epoch: 1 Batch: 1562/38378 (4.07%) Loss: 2.280156 LR: 0.00002528 [22:53:00] Epoch: 1 Batch: 1563/38378 (4.07%) Loss: 1.948663 LR: 0.00002528 [22:53:02] Epoch: 1 Batch: 1564/38378 (4.08%) Loss: 2.234488 LR: 0.00002528 [22:53:04] Epoch: 1 Batch: 1565/38378 (4.08%) Loss: 2.255351 LR: 0.00002540 [22:53:05] Epoch: 1 Batch: 1566/38378 (4.08%) Loss: 2.192027 LR: 0.00002540 [22:53:07] Epoch: 1 Batch: 1567/38378 (4.08%) Loss: 2.257930 LR: 0.00002540 [22:53:09] Epoch: 1 Batch: 1568/38378 (4.09%) Loss: 2.006736 LR: 0.00002540 [22:53:10] Epoch: 1 Batch: 1569/38378 (4.09%) Loss: 2.281848 LR: 0.00002540 [22:53:12] Epoch: 1 Batch: 1570/38378 (4.09%) Loss: 2.253753 LR: 0.00002540 [22:53:14] Epoch: 1 Batch: 1571/38378 (4.09%) Loss: 2.168597 LR: 0.00002540 [22:53:16] Epoch: 1 Batch: 1572/38378 (4.10%) Loss: 2.094562 LR: 0.00002551 [22:53:17] Epoch: 1 Batch: 1573/38378 (4.10%) Loss: 2.201246 LR: 0.00002551 [22:53:19] Epoch: 1 Batch: 1574/38378 (4.10%) Loss: 1.895353 LR: 0.00002551 [22:53:21] Epoch: 1 Batch: 1575/38378 (4.10%) Loss: 2.040542 LR: 0.00002551 [22:53:22] Epoch: 1 Batch: 1576/38378 (4.11%) Loss: 2.219421 LR: 0.00002551 [22:53:24] Epoch: 1 Batch: 1577/38378 (4.11%) Loss: 2.330731 LR: 0.00002551 [22:53:26] Epoch: 1 Batch: 1578/38378 (4.11%) Loss: 2.160148 LR: 0.00002551 [22:53:28] Epoch: 1 Batch: 1579/38378 (4.11%) Loss: 2.340636 LR: 0.00002563 [22:53:29] Epoch: 1 Batch: 1580/38378 (4.12%) Loss: 2.057997 LR: 0.00002563 [22:53:31] Epoch: 1 Batch: 1581/38378 (4.12%) Loss: 2.133434 LR: 0.00002563 [22:53:33] Epoch: 1 Batch: 1582/38378 (4.12%) Loss: 2.474434 LR: 0.00002563 [22:53:34] Epoch: 1 Batch: 1583/38378 (4.12%) Loss: 2.310620 LR: 0.00002563 [22:53:40] >> Cleaned up old temp checkpoint: epoch1_step1254 [22:53:40] >> Temp checkpoint saved: epoch1_step1584, size: 0.1702 GB [22:53:40] 
Epoch: 1 Batch: 1584/38378 (4.13%) Loss: 1.959900 LR: 0.00002563 [22:53:42] Epoch: 1 Batch: 1585/38378 (4.13%) Loss: 2.114975 LR: 0.00002563 [22:53:44] Epoch: 1 Batch: 1586/38378 (4.13%) Loss: 2.268670 LR: 0.00002574 [22:53:45] Epoch: 1 Batch: 1587/38378 (4.14%) Loss: 1.912564 LR: 0.00002574 [22:53:47] Epoch: 1 Batch: 1588/38378 (4.14%) Loss: 2.144301 LR: 0.00002574 [22:53:49] Epoch: 1 Batch: 1589/38378 (4.14%) Loss: 2.099195 LR: 0.00002574 [22:53:50] Epoch: 1 Batch: 1590/38378 (4.14%) Loss: 2.304856 LR: 0.00002574 [22:53:52] Epoch: 1 Batch: 1591/38378 (4.15%) Loss: 2.193111 LR: 0.00002574 [22:53:54] Epoch: 1 Batch: 1592/38378 (4.15%) Loss: 2.130089 LR: 0.00002574 [22:53:56] Epoch: 1 Batch: 1593/38378 (4.15%) Loss: 1.927632 LR: 0.00002585 [22:53:57] Epoch: 1 Batch: 1594/38378 (4.15%) Loss: 2.243536 LR: 0.00002585 [22:53:59] Epoch: 1 Batch: 1595/38378 (4.16%) Loss: 2.046861 LR: 0.00002585 [22:54:01] Epoch: 1 Batch: 1596/38378 (4.16%) Loss: 2.111112 LR: 0.00002585 [22:54:02] Epoch: 1 Batch: 1597/38378 (4.16%) Loss: 1.848633 LR: 0.00002585 [22:54:04] Epoch: 1 Batch: 1598/38378 (4.16%) Loss: 2.333470 LR: 0.00002585 [22:54:06] Epoch: 1 Batch: 1599/38378 (4.17%) Loss: 2.023428 LR: 0.00002585 [22:54:08] Epoch: 1 Batch: 1600/38378 (4.17%) Loss: 2.037003 LR: 0.00002597 [22:54:09] Epoch: 1 Batch: 1601/38378 (4.17%) Loss: 2.022699 LR: 0.00002597 [22:54:11] Epoch: 1 Batch: 1602/38378 (4.17%) Loss: 2.238757 LR: 0.00002597 [22:54:13] Epoch: 1 Batch: 1603/38378 (4.18%) Loss: 2.087529 LR: 0.00002597 [22:54:14] Epoch: 1 Batch: 1604/38378 (4.18%) Loss: 1.871752 LR: 0.00002597 [22:54:16] Epoch: 1 Batch: 1605/38378 (4.18%) Loss: 2.417693 LR: 0.00002597 [22:54:18] Epoch: 1 Batch: 1606/38378 (4.18%) Loss: 2.115495 LR: 0.00002597 [22:54:19] Epoch: 1 Batch: 1607/38378 (4.19%) Loss: 2.222329 LR: 0.00002608 [22:54:21] Epoch: 1 Batch: 1608/38378 (4.19%) Loss: 2.111047 LR: 0.00002608 [22:54:23] Epoch: 1 Batch: 1609/38378 (4.19%) Loss: 2.030957 LR: 0.00002608 [22:54:24] Epoch: 1 Batch: 1610/38378 (4.20%) Loss: 2.006907 LR: 0.00002608 [22:54:26] Epoch: 1 Batch: 1611/38378 (4.20%) Loss: 2.085886 LR: 0.00002608 [22:54:28] Epoch: 1 Batch: 1612/38378 (4.20%) Loss: 2.308063 LR: 0.00002608 [22:54:30] Epoch: 1 Batch: 1613/38378 (4.20%) Loss: 2.314592 LR: 0.00002608 [22:54:31] Epoch: 1 Batch: 1614/38378 (4.21%) Loss: 2.227323 LR: 0.00002620 [22:54:33] Epoch: 1 Batch: 1615/38378 (4.21%) Loss: 2.023457 LR: 0.00002620 [22:54:35] Epoch: 1 Batch: 1616/38378 (4.21%) Loss: 2.061758 LR: 0.00002620 [22:54:40] >> Cleaned up old temp checkpoint: epoch1_step1287 [22:54:40] >> Temp checkpoint saved: epoch1_step1617, size: 0.1702 GB [22:54:40] Epoch: 1 Batch: 1617/38378 (4.21%) Loss: 2.151324 LR: 0.00002620 [22:54:42] Epoch: 1 Batch: 1618/38378 (4.22%) Loss: 2.096740 LR: 0.00002620 [22:54:44] Epoch: 1 Batch: 1619/38378 (4.22%) Loss: 2.339346 LR: 0.00002620 [22:54:45] Epoch: 1 Batch: 1620/38378 (4.22%) Loss: 2.340799 LR: 0.00002620 [22:54:47] Epoch: 1 Batch: 1621/38378 (4.22%) Loss: 2.002657 LR: 0.00002631 [22:54:49] Epoch: 1 Batch: 1622/38378 (4.23%) Loss: 1.920086 LR: 0.00002631 [22:54:51] Epoch: 1 Batch: 1623/38378 (4.23%) Loss: 2.449331 LR: 0.00002631 [22:54:52] Epoch: 1 Batch: 1624/38378 (4.23%) Loss: 2.338094 LR: 0.00002631 [22:54:54] Epoch: 1 Batch: 1625/38378 (4.23%) Loss: 1.903337 LR: 0.00002631 [22:54:56] Epoch: 1 Batch: 1626/38378 (4.24%) Loss: 2.322266 LR: 0.00002631 [22:54:57] Epoch: 1 Batch: 1627/38378 (4.24%) Loss: 2.199806 LR: 0.00002631 [22:54:59] Epoch: 1 Batch: 1628/38378 (4.24%) Loss: 2.285554 LR: 0.00002642 [22:55:01] 
Epoch: 1 Batch: 1629/38378 (4.24%) Loss: 2.176074 LR: 0.00002642 [22:55:02] Epoch: 1 Batch: 1630/38378 (4.25%) Loss: 1.990614 LR: 0.00002642 [22:55:04] Epoch: 1 Batch: 1631/38378 (4.25%) Loss: 2.309839 LR: 0.00002642 [22:55:06] Epoch: 1 Batch: 1632/38378 (4.25%) Loss: 1.969015 LR: 0.00002642 [22:55:08] Epoch: 1 Batch: 1633/38378 (4.26%) Loss: 2.376291 LR: 0.00002642 [22:55:09] Epoch: 1 Batch: 1634/38378 (4.26%) Loss: 2.556727 LR: 0.00002642 [22:55:11] Epoch: 1 Batch: 1635/38378 (4.26%) Loss: 2.464419 LR: 0.00002654 [22:55:13] Epoch: 1 Batch: 1636/38378 (4.26%) Loss: 2.199304 LR: 0.00002654 [22:55:14] Epoch: 1 Batch: 1637/38378 (4.27%) Loss: 2.107065 LR: 0.00002654 [22:55:16] Epoch: 1 Batch: 1638/38378 (4.27%) Loss: 1.925143 LR: 0.00002654 [22:55:18] Epoch: 1 Batch: 1639/38378 (4.27%) Loss: 2.230034 LR: 0.00002654 [22:55:20] Epoch: 1 Batch: 1640/38378 (4.27%) Loss: 2.348477 LR: 0.00002654 [22:55:21] Epoch: 1 Batch: 1641/38378 (4.28%) Loss: 2.236915 LR: 0.00002654 [22:55:23] Epoch: 1 Batch: 1642/38378 (4.28%) Loss: 2.196802 LR: 0.00002665 [22:55:25] Epoch: 1 Batch: 1643/38378 (4.28%) Loss: 2.040906 LR: 0.00002665 [22:55:26] Epoch: 1 Batch: 1644/38378 (4.28%) Loss: 2.177886 LR: 0.00002665 [22:55:28] Epoch: 1 Batch: 1645/38378 (4.29%) Loss: 2.224734 LR: 0.00002665 [22:55:30] Epoch: 1 Batch: 1646/38378 (4.29%) Loss: 2.127580 LR: 0.00002665 [22:55:32] Epoch: 1 Batch: 1647/38378 (4.29%) Loss: 2.277780 LR: 0.00002665 [22:55:33] Epoch: 1 Batch: 1648/38378 (4.29%) Loss: 1.882855 LR: 0.00002665 [22:55:35] Epoch: 1 Batch: 1649/38378 (4.30%) Loss: 2.349675 LR: 0.00002677 [22:55:41] >> Cleaned up old temp checkpoint: epoch1_step1320 [22:55:41] >> Temp checkpoint saved: epoch1_step1650, size: 0.1702 GB [22:55:41] Epoch: 1 Batch: 1650/38378 (4.30%) Loss: 1.866318 LR: 0.00002677 [22:55:42] Epoch: 1 Batch: 1651/38378 (4.30%) Loss: 2.153677 LR: 0.00002677 [22:55:44] Epoch: 1 Batch: 1652/38378 (4.30%) Loss: 2.291927 LR: 0.00002677 [22:55:46] Epoch: 1 Batch: 1653/38378 (4.31%) Loss: 1.938041 LR: 0.00002677 [22:55:47] Epoch: 1 Batch: 1654/38378 (4.31%) Loss: 1.955416 LR: 0.00002677 [22:55:49] Epoch: 1 Batch: 1655/38378 (4.31%) Loss: 2.159003 LR: 0.00002677 [22:55:51] Epoch: 1 Batch: 1656/38378 (4.31%) Loss: 2.153174 LR: 0.00002688 [22:55:53] Epoch: 1 Batch: 1657/38378 (4.32%) Loss: 2.516941 LR: 0.00002688 [22:55:54] Epoch: 1 Batch: 1658/38378 (4.32%) Loss: 2.098830 LR: 0.00002688 [22:55:56] Epoch: 1 Batch: 1659/38378 (4.32%) Loss: 1.905271 LR: 0.00002688 [22:55:58] Epoch: 1 Batch: 1660/38378 (4.33%) Loss: 1.969208 LR: 0.00002688 [22:55:59] Epoch: 1 Batch: 1661/38378 (4.33%) Loss: 2.236362 LR: 0.00002688 [22:56:01] Epoch: 1 Batch: 1662/38378 (4.33%) Loss: 2.033532 LR: 0.00002688 [22:56:03] Epoch: 1 Batch: 1663/38378 (4.33%) Loss: 2.082271 LR: 0.00002699 [22:56:05] Epoch: 1 Batch: 1664/38378 (4.34%) Loss: 2.246315 LR: 0.00002699 [22:56:06] Epoch: 1 Batch: 1665/38378 (4.34%) Loss: 2.121267 LR: 0.00002699 [22:56:08] Epoch: 1 Batch: 1666/38378 (4.34%) Loss: 1.862837 LR: 0.00002699 [22:56:10] Epoch: 1 Batch: 1667/38378 (4.34%) Loss: 2.072571 LR: 0.00002699 [22:56:12] Epoch: 1 Batch: 1668/38378 (4.35%) Loss: 2.114615 LR: 0.00002699 [22:56:13] Epoch: 1 Batch: 1669/38378 (4.35%) Loss: 2.235082 LR: 0.00002699 [22:56:15] Epoch: 1 Batch: 1670/38378 (4.35%) Loss: 2.178928 LR: 0.00002711 [22:56:17] Epoch: 1 Batch: 1671/38378 (4.35%) Loss: 2.050560 LR: 0.00002711 [22:56:18] Epoch: 1 Batch: 1672/38378 (4.36%) Loss: 2.115236 LR: 0.00002711 [22:56:20] Epoch: 1 Batch: 1673/38378 (4.36%) Loss: 2.287935 LR: 0.00002711 [22:56:22] 
Epoch: 1 Batch: 1674/38378 (4.36%) Loss: 2.368252 LR: 0.00002711 [22:56:23] Epoch: 1 Batch: 1675/38378 (4.36%) Loss: 2.376531 LR: 0.00002711 [22:56:25] Epoch: 1 Batch: 1676/38378 (4.37%) Loss: 2.354725 LR: 0.00002711 [22:56:27] Epoch: 1 Batch: 1677/38378 (4.37%) Loss: 2.400668 LR: 0.00002722 [22:56:28] Epoch: 1 Batch: 1678/38378 (4.37%) Loss: 2.312017 LR: 0.00002722 [22:56:30] Epoch: 1 Batch: 1679/38378 (4.37%) Loss: 2.320798 LR: 0.00002722 [22:56:32] Epoch: 1 Batch: 1680/38378 (4.38%) Loss: 2.036642 LR: 0.00002722 [22:56:34] Epoch: 1 Batch: 1681/38378 (4.38%) Loss: 2.053143 LR: 0.00002722 [22:56:35] Epoch: 1 Batch: 1682/38378 (4.38%) Loss: 2.207022 LR: 0.00002722 [22:56:41] >> Cleaned up old temp checkpoint: epoch1_step1353 [22:56:41] >> Temp checkpoint saved: epoch1_step1683, size: 0.1702 GB [22:56:41] Epoch: 1 Batch: 1683/38378 (4.39%) Loss: 2.632588 LR: 0.00002722 [22:56:42] Epoch: 1 Batch: 1684/38378 (4.39%) Loss: 2.178931 LR: 0.00002733 [22:56:44] Epoch: 1 Batch: 1685/38378 (4.39%) Loss: 2.387262 LR: 0.00002733 [22:56:46] Epoch: 1 Batch: 1686/38378 (4.39%) Loss: 2.146719 LR: 0.00002733 [22:56:47] Epoch: 1 Batch: 1687/38378 (4.40%) Loss: 2.199212 LR: 0.00002733 [22:56:49] Epoch: 1 Batch: 1688/38378 (4.40%) Loss: 2.104156 LR: 0.00002733 [22:56:51] Epoch: 1 Batch: 1689/38378 (4.40%) Loss: 2.117516 LR: 0.00002733 [22:56:52] Epoch: 1 Batch: 1690/38378 (4.40%) Loss: 2.201334 LR: 0.00002733 [22:56:54] Epoch: 1 Batch: 1691/38378 (4.41%) Loss: 1.910871 LR: 0.00002745 [22:56:56] Epoch: 1 Batch: 1692/38378 (4.41%) Loss: 2.532411 LR: 0.00002745 [22:56:57] Epoch: 1 Batch: 1693/38378 (4.41%) Loss: 2.326171 LR: 0.00002745 [22:56:59] Epoch: 1 Batch: 1694/38378 (4.41%) Loss: 2.199494 LR: 0.00002745 [22:57:01] Epoch: 1 Batch: 1695/38378 (4.42%) Loss: 2.071105 LR: 0.00002745 [22:57:03] Epoch: 1 Batch: 1696/38378 (4.42%) Loss: 1.944267 LR: 0.00002745 [22:57:04] Epoch: 1 Batch: 1697/38378 (4.42%) Loss: 2.509650 LR: 0.00002745 [22:57:06] Epoch: 1 Batch: 1698/38378 (4.42%) Loss: 2.306263 LR: 0.00002756 [22:57:08] Epoch: 1 Batch: 1699/38378 (4.43%) Loss: 2.123835 LR: 0.00002756 [22:57:09] Epoch: 1 Batch: 1700/38378 (4.43%) Loss: 2.135135 LR: 0.00002756 [22:57:11] Epoch: 1 Batch: 1701/38378 (4.43%) Loss: 2.164179 LR: 0.00002756 [22:57:13] Epoch: 1 Batch: 1702/38378 (4.43%) Loss: 2.203838 LR: 0.00002756 [22:57:15] Epoch: 1 Batch: 1703/38378 (4.44%) Loss: 1.925977 LR: 0.00002756 [22:57:16] Epoch: 1 Batch: 1704/38378 (4.44%) Loss: 2.214497 LR: 0.00002756 [22:57:18] Epoch: 1 Batch: 1705/38378 (4.44%) Loss: 2.116419 LR: 0.00002768 [22:57:20] Epoch: 1 Batch: 1706/38378 (4.45%) Loss: 2.357285 LR: 0.00002768 [22:57:21] Epoch: 1 Batch: 1707/38378 (4.45%) Loss: 1.958508 LR: 0.00002768 [22:57:23] Epoch: 1 Batch: 1708/38378 (4.45%) Loss: 2.255493 LR: 0.00002768 [22:57:25] Epoch: 1 Batch: 1709/38378 (4.45%) Loss: 2.203868 LR: 0.00002768 [22:57:27] Epoch: 1 Batch: 1710/38378 (4.46%) Loss: 2.309745 LR: 0.00002768 [22:57:28] Epoch: 1 Batch: 1711/38378 (4.46%) Loss: 1.981248 LR: 0.00002768 [22:57:30] Epoch: 1 Batch: 1712/38378 (4.46%) Loss: 2.284818 LR: 0.00002779 [22:57:32] Epoch: 1 Batch: 1713/38378 (4.46%) Loss: 2.049701 LR: 0.00002779 [22:57:34] Epoch: 1 Batch: 1714/38378 (4.47%) Loss: 2.187848 LR: 0.00002779 [22:57:35] Epoch: 1 Batch: 1715/38378 (4.47%) Loss: 2.381933 LR: 0.00002779 [22:57:41] >> Cleaned up old temp checkpoint: epoch1_step1386 [22:57:41] >> Temp checkpoint saved: epoch1_step1716, size: 0.1702 GB [22:57:41] Epoch: 1 Batch: 1716/38378 (4.47%) Loss: 2.470870 LR: 0.00002779 [22:57:43] Epoch: 1 Batch: 
1717/38378 (4.47%) Loss: 2.456876 LR: 0.00002779 [22:57:44] Epoch: 1 Batch: 1718/38378 (4.48%) Loss: 1.896476 LR: 0.00002779 [22:57:46] Epoch: 1 Batch: 1719/38378 (4.48%) Loss: 2.131408 LR: 0.00002790 [22:57:48] Epoch: 1 Batch: 1720/38378 (4.48%) Loss: 2.183335 LR: 0.00002790 [22:57:49] Epoch: 1 Batch: 1721/38378 (4.48%) Loss: 2.363118 LR: 0.00002790 [22:57:51] Epoch: 1 Batch: 1722/38378 (4.49%) Loss: 2.159676 LR: 0.00002790 [22:57:53] Epoch: 1 Batch: 1723/38378 (4.49%) Loss: 2.220191 LR: 0.00002790 [22:57:54] Epoch: 1 Batch: 1724/38378 (4.49%) Loss: 1.925780 LR: 0.00002790 [22:57:56] Epoch: 1 Batch: 1725/38378 (4.49%) Loss: 1.870172 LR: 0.00002790 [22:57:58] Epoch: 1 Batch: 1726/38378 (4.50%) Loss: 2.146592 LR: 0.00002802 [22:58:00] Epoch: 1 Batch: 1727/38378 (4.50%) Loss: 2.227577 LR: 0.00002802 [22:58:01] Epoch: 1 Batch: 1728/38378 (4.50%) Loss: 2.106805 LR: 0.00002802 [22:58:03] Epoch: 1 Batch: 1729/38378 (4.51%) Loss: 2.167259 LR: 0.00002802 [22:58:05] Epoch: 1 Batch: 1730/38378 (4.51%) Loss: 2.257608 LR: 0.00002802 [22:58:06] Epoch: 1 Batch: 1731/38378 (4.51%) Loss: 2.266949 LR: 0.00002802 [22:58:08] Epoch: 1 Batch: 1732/38378 (4.51%) Loss: 2.166760 LR: 0.00002802 [22:58:10] Epoch: 1 Batch: 1733/38378 (4.52%) Loss: 1.850667 LR: 0.00002813 [22:58:12] Epoch: 1 Batch: 1734/38378 (4.52%) Loss: 2.144458 LR: 0.00002813 [22:58:13] Epoch: 1 Batch: 1735/38378 (4.52%) Loss: 2.285538 LR: 0.00002813 [22:58:15] Epoch: 1 Batch: 1736/38378 (4.52%) Loss: 2.655144 LR: 0.00002813 [22:58:17] Epoch: 1 Batch: 1737/38378 (4.53%) Loss: 2.195747 LR: 0.00002813 [22:58:19] Epoch: 1 Batch: 1738/38378 (4.53%) Loss: 2.174826 LR: 0.00002813 [22:58:20] Epoch: 1 Batch: 1739/38378 (4.53%) Loss: 2.270308 LR: 0.00002813 [22:58:22] Epoch: 1 Batch: 1740/38378 (4.53%) Loss: 2.358975 LR: 0.00002825 [22:58:24] Epoch: 1 Batch: 1741/38378 (4.54%) Loss: 2.135242 LR: 0.00002825 [22:58:25] Epoch: 1 Batch: 1742/38378 (4.54%) Loss: 2.370115 LR: 0.00002825 [22:58:27] Epoch: 1 Batch: 1743/38378 (4.54%) Loss: 2.086851 LR: 0.00002825 [22:58:29] Epoch: 1 Batch: 1744/38378 (4.54%) Loss: 2.141946 LR: 0.00002825 [22:58:31] Epoch: 1 Batch: 1745/38378 (4.55%) Loss: 2.159686 LR: 0.00002825 [22:58:32] Epoch: 1 Batch: 1746/38378 (4.55%) Loss: 2.159151 LR: 0.00002825 [22:58:34] Epoch: 1 Batch: 1747/38378 (4.55%) Loss: 1.978777 LR: 0.00002836 [22:58:36] Epoch: 1 Batch: 1748/38378 (4.55%) Loss: 1.971067 LR: 0.00002836 [22:58:41] >> Cleaned up old temp checkpoint: epoch1_step1419 [22:58:41] >> Temp checkpoint saved: epoch1_step1749, size: 0.1702 GB [22:58:41] Epoch: 1 Batch: 1749/38378 (4.56%) Loss: 2.116151 LR: 0.00002836 [22:58:43] Epoch: 1 Batch: 1750/38378 (4.56%) Loss: 2.028837 LR: 0.00002836 [22:58:45] Epoch: 1 Batch: 1751/38378 (4.56%) Loss: 1.940651 LR: 0.00002836 [22:58:46] Epoch: 1 Batch: 1752/38378 (4.57%) Loss: 1.836811 LR: 0.00002836 [22:58:48] Epoch: 1 Batch: 1753/38378 (4.57%) Loss: 2.361853 LR: 0.00002836 [22:58:50] Epoch: 1 Batch: 1754/38378 (4.57%) Loss: 2.336723 LR: 0.00002847 [22:58:51] Epoch: 1 Batch: 1755/38378 (4.57%) Loss: 2.248835 LR: 0.00002847 [22:58:53] Epoch: 1 Batch: 1756/38378 (4.58%) Loss: 2.176446 LR: 0.00002847 [22:58:55] Epoch: 1 Batch: 1757/38378 (4.58%) Loss: 2.321644 LR: 0.00002847 [22:58:56] Epoch: 1 Batch: 1758/38378 (4.58%) Loss: 2.453677 LR: 0.00002847 [22:58:58] Epoch: 1 Batch: 1759/38378 (4.58%) Loss: 2.341236 LR: 0.00002847 [22:59:00] Epoch: 1 Batch: 1760/38378 (4.59%) Loss: 2.387456 LR: 0.00002847 [22:59:02] Epoch: 1 Batch: 1761/38378 (4.59%) Loss: 2.090935 LR: 0.00002859 [22:59:03] Epoch: 1 Batch: 
1762/38378 (4.59%) Loss: 1.919222 LR: 0.00002859 [22:59:05] Epoch: 1 Batch: 1763/38378 (4.59%) Loss: 2.034333 LR: 0.00002859 [22:59:07] Epoch: 1 Batch: 1764/38378 (4.60%) Loss: 2.137968 LR: 0.00002859 [22:59:08] Epoch: 1 Batch: 1765/38378 (4.60%) Loss: 2.143405 LR: 0.00002859 [22:59:10] Epoch: 1 Batch: 1766/38378 (4.60%) Loss: 1.895978 LR: 0.00002859 [22:59:12] Epoch: 1 Batch: 1767/38378 (4.60%) Loss: 2.121862 LR: 0.00002859 [22:59:14] Epoch: 1 Batch: 1768/38378 (4.61%) Loss: 2.259369 LR: 0.00002870 [22:59:15] Epoch: 1 Batch: 1769/38378 (4.61%) Loss: 2.399077 LR: 0.00002870 [22:59:17] Epoch: 1 Batch: 1770/38378 (4.61%) Loss: 2.299661 LR: 0.00002870 [22:59:19] Epoch: 1 Batch: 1771/38378 (4.61%) Loss: 2.019463 LR: 0.00002870 [22:59:20] Epoch: 1 Batch: 1772/38378 (4.62%) Loss: 2.453168 LR: 0.00002870 [22:59:22] Epoch: 1 Batch: 1773/38378 (4.62%) Loss: 1.945382 LR: 0.00002870 [22:59:24] Epoch: 1 Batch: 1774/38378 (4.62%) Loss: 2.126784 LR: 0.00002870 [22:59:26] Epoch: 1 Batch: 1775/38378 (4.63%) Loss: 2.471445 LR: 0.00002882 [22:59:27] Epoch: 1 Batch: 1776/38378 (4.63%) Loss: 2.044829 LR: 0.00002882 [22:59:29] Epoch: 1 Batch: 1777/38378 (4.63%) Loss: 2.116757 LR: 0.00002882 [22:59:31] Epoch: 1 Batch: 1778/38378 (4.63%) Loss: 2.138946 LR: 0.00002882 [22:59:33] Epoch: 1 Batch: 1779/38378 (4.64%) Loss: 1.971629 LR: 0.00002882 [22:59:34] Epoch: 1 Batch: 1780/38378 (4.64%) Loss: 2.257641 LR: 0.00002882 [22:59:36] Epoch: 1 Batch: 1781/38378 (4.64%) Loss: 2.249787 LR: 0.00002882 [22:59:41] >> Cleaned up old temp checkpoint: epoch1_step1452 [22:59:41] >> Temp checkpoint saved: epoch1_step1782, size: 0.1702 GB [22:59:41] Epoch: 1 Batch: 1782/38378 (4.64%) Loss: 2.177279 LR: 0.00002893 [22:59:43] Epoch: 1 Batch: 1783/38378 (4.65%) Loss: 2.292101 LR: 0.00002893 [22:59:45] Epoch: 1 Batch: 1784/38378 (4.65%) Loss: 2.314098 LR: 0.00002893 [22:59:47] Epoch: 1 Batch: 1785/38378 (4.65%) Loss: 2.034084 LR: 0.00002893 [22:59:48] Epoch: 1 Batch: 1786/38378 (4.65%) Loss: 2.155256 LR: 0.00002893 [22:59:50] Epoch: 1 Batch: 1787/38378 (4.66%) Loss: 2.051966 LR: 0.00002893 [22:59:52] Epoch: 1 Batch: 1788/38378 (4.66%) Loss: 2.290211 LR: 0.00002893 [22:59:53] Epoch: 1 Batch: 1789/38378 (4.66%) Loss: 2.296626 LR: 0.00002904 [22:59:55] Epoch: 1 Batch: 1790/38378 (4.66%) Loss: 2.161907 LR: 0.00002904 [22:59:57] Epoch: 1 Batch: 1791/38378 (4.67%) Loss: 2.237209 LR: 0.00002904 [22:59:58] Epoch: 1 Batch: 1792/38378 (4.67%) Loss: 2.078632 LR: 0.00002904 [23:00:00] Epoch: 1 Batch: 1793/38378 (4.67%) Loss: 2.177241 LR: 0.00002904 [23:00:02] Epoch: 1 Batch: 1794/38378 (4.67%) Loss: 2.140836 LR: 0.00002904 [23:00:04] Epoch: 1 Batch: 1795/38378 (4.68%) Loss: 2.080113 LR: 0.00002904 [23:00:05] Epoch: 1 Batch: 1796/38378 (4.68%) Loss: 2.421650 LR: 0.00002916 [23:00:07] Epoch: 1 Batch: 1797/38378 (4.68%) Loss: 2.174161 LR: 0.00002916 [23:00:09] Epoch: 1 Batch: 1798/38378 (4.68%) Loss: 2.250116 LR: 0.00002916 [23:00:10] Epoch: 1 Batch: 1799/38378 (4.69%) Loss: 2.198870 LR: 0.00002916 [23:00:12] Epoch: 1 Batch: 1800/38378 (4.69%) Loss: 2.111294 LR: 0.00002916 [23:00:14] Epoch: 1 Batch: 1801/38378 (4.69%) Loss: 2.381663 LR: 0.00002916 [23:00:16] Epoch: 1 Batch: 1802/38378 (4.70%) Loss: 1.915382 LR: 0.00002916 [23:00:17] Epoch: 1 Batch: 1803/38378 (4.70%) Loss: 2.199210 LR: 0.00002927 [23:00:19] Epoch: 1 Batch: 1804/38378 (4.70%) Loss: 2.188769 LR: 0.00002927 [23:00:21] Epoch: 1 Batch: 1805/38378 (4.70%) Loss: 1.627947 LR: 0.00002927 [23:00:23] Epoch: 1 Batch: 1806/38378 (4.71%) Loss: 2.162667 LR: 0.00002927 [23:00:24] Epoch: 1 Batch: 
1807/38378 (4.71%) Loss: 1.761972 LR: 0.00002927 [23:00:26] Epoch: 1 Batch: 1808/38378 (4.71%) Loss: 2.015022 LR: 0.00002927 [23:00:28] Epoch: 1 Batch: 1809/38378 (4.71%) Loss: 1.902522 LR: 0.00002927 [23:00:29] Epoch: 1 Batch: 1810/38378 (4.72%) Loss: 2.237528 LR: 0.00002938 [23:00:31] Epoch: 1 Batch: 1811/38378 (4.72%) Loss: 2.171511 LR: 0.00002938 [23:00:33] Epoch: 1 Batch: 1812/38378 (4.72%) Loss: 2.102392 LR: 0.00002938 [23:00:35] Epoch: 1 Batch: 1813/38378 (4.72%) Loss: 2.143115 LR: 0.00002938 [23:00:36] Epoch: 1 Batch: 1814/38378 (4.73%) Loss: 2.092632 LR: 0.00002938 [23:00:42] >> Cleaned up old temp checkpoint: epoch1_step1485 [23:00:42] >> Temp checkpoint saved: epoch1_step1815, size: 0.1702 GB [23:00:42] Epoch: 1 Batch: 1815/38378 (4.73%) Loss: 2.079161 LR: 0.00002938 [23:00:44] Epoch: 1 Batch: 1816/38378 (4.73%) Loss: 2.394300 LR: 0.00002938 [23:00:45] Epoch: 1 Batch: 1817/38378 (4.73%) Loss: 2.049967 LR: 0.00002950 [23:00:47] Epoch: 1 Batch: 1818/38378 (4.74%) Loss: 2.192763 LR: 0.00002950 [23:00:49] Epoch: 1 Batch: 1819/38378 (4.74%) Loss: 1.958158 LR: 0.00002950 [23:00:51] Epoch: 1 Batch: 1820/38378 (4.74%) Loss: 2.309785 LR: 0.00002950 [23:00:52] Epoch: 1 Batch: 1821/38378 (4.74%) Loss: 2.020407 LR: 0.00002950 [23:00:54] Epoch: 1 Batch: 1822/38378 (4.75%) Loss: 1.915201 LR: 0.00002950 [23:00:56] Epoch: 1 Batch: 1823/38378 (4.75%) Loss: 2.300217 LR: 0.00002950 [23:00:57] Epoch: 1 Batch: 1824/38378 (4.75%) Loss: 1.971002 LR: 0.00002961 [23:00:59] Epoch: 1 Batch: 1825/38378 (4.76%) Loss: 2.136789 LR: 0.00002961 [23:01:01] Epoch: 1 Batch: 1826/38378 (4.76%) Loss: 2.091536 LR: 0.00002961 [23:01:03] Epoch: 1 Batch: 1827/38378 (4.76%) Loss: 2.044902 LR: 0.00002961 [23:01:04] Epoch: 1 Batch: 1828/38378 (4.76%) Loss: 2.217125 LR: 0.00002961 [23:01:06] Epoch: 1 Batch: 1829/38378 (4.77%) Loss: 2.579179 LR: 0.00002961 [23:01:08] Epoch: 1 Batch: 1830/38378 (4.77%) Loss: 1.898491 LR: 0.00002961 [23:01:10] Epoch: 1 Batch: 1831/38378 (4.77%) Loss: 1.944604 LR: 0.00002973 [23:01:11] Epoch: 1 Batch: 1832/38378 (4.77%) Loss: 2.282025 LR: 0.00002973 [23:01:13] Epoch: 1 Batch: 1833/38378 (4.78%) Loss: 2.080129 LR: 0.00002973 [23:01:15] Epoch: 1 Batch: 1834/38378 (4.78%) Loss: 1.970557 LR: 0.00002973 [23:01:16] Epoch: 1 Batch: 1835/38378 (4.78%) Loss: 2.228021 LR: 0.00002973 [23:01:18] Epoch: 1 Batch: 1836/38378 (4.78%) Loss: 2.089762 LR: 0.00002973 [23:01:20] Epoch: 1 Batch: 1837/38378 (4.79%) Loss: 2.074716 LR: 0.00002973 [23:01:22] Epoch: 1 Batch: 1838/38378 (4.79%) Loss: 2.270194 LR: 0.00002984 [23:01:23] Epoch: 1 Batch: 1839/38378 (4.79%) Loss: 2.226758 LR: 0.00002984 [23:01:25] Epoch: 1 Batch: 1840/38378 (4.79%) Loss: 2.113522 LR: 0.00002984 [23:01:27] Epoch: 1 Batch: 1841/38378 (4.80%) Loss: 2.293765 LR: 0.00002984 [23:01:29] Epoch: 1 Batch: 1842/38378 (4.80%) Loss: 2.093931 LR: 0.00002984 [23:01:30] Epoch: 1 Batch: 1843/38378 (4.80%) Loss: 2.082081 LR: 0.00002984 [23:01:32] Epoch: 1 Batch: 1844/38378 (4.80%) Loss: 2.045989 LR: 0.00002984 [23:01:34] Epoch: 1 Batch: 1845/38378 (4.81%) Loss: 1.927086 LR: 0.00002995 [23:01:35] Epoch: 1 Batch: 1846/38378 (4.81%) Loss: 2.211935 LR: 0.00002995 [23:01:37] Epoch: 1 Batch: 1847/38378 (4.81%) Loss: 1.890245 LR: 0.00002995 [23:01:43] >> Cleaned up old temp checkpoint: epoch1_step1518 [23:01:43] >> Temp checkpoint saved: epoch1_step1848, size: 0.1702 GB [23:01:43] Epoch: 1 Batch: 1848/38378 (4.82%) Loss: 2.319668 LR: 0.00002995 [23:01:44] Epoch: 1 Batch: 1849/38378 (4.82%) Loss: 2.121136 LR: 0.00002995 [23:01:46] Epoch: 1 Batch: 1850/38378 (4.82%) 
Loss: 2.398127 LR: 0.00002995 [23:01:48] Epoch: 1 Batch: 1851/38378 (4.82%) Loss: 2.282784 LR: 0.00002995 [23:01:50] Epoch: 1 Batch: 1852/38378 (4.83%) Loss: 2.444046 LR: 0.00003007 [23:01:51] Epoch: 1 Batch: 1853/38378 (4.83%) Loss: 2.348752 LR: 0.00003007 [23:01:53] Epoch: 1 Batch: 1854/38378 (4.83%) Loss: 2.353798 LR: 0.00003007 [23:01:55] Epoch: 1 Batch: 1855/38378 (4.83%) Loss: 2.177972 LR: 0.00003007 [23:01:56] Epoch: 1 Batch: 1856/38378 (4.84%) Loss: 2.229741 LR: 0.00003007 [23:01:58] Epoch: 1 Batch: 1857/38378 (4.84%) Loss: 2.425702 LR: 0.00003007 [23:02:00] Epoch: 1 Batch: 1858/38378 (4.84%) Loss: 2.237645 LR: 0.00003007 [23:02:01] Epoch: 1 Batch: 1859/38378 (4.84%) Loss: 2.247368 LR: 0.00003018 [23:02:03] Epoch: 1 Batch: 1860/38378 (4.85%) Loss: 2.057825 LR: 0.00003018 [23:02:05] Epoch: 1 Batch: 1861/38378 (4.85%) Loss: 2.321633 LR: 0.00003018 [23:02:07] Epoch: 1 Batch: 1862/38378 (4.85%) Loss: 1.987616 LR: 0.00003018 [23:02:08] Epoch: 1 Batch: 1863/38378 (4.85%) Loss: 1.951452 LR: 0.00003018 [23:02:10] Epoch: 1 Batch: 1864/38378 (4.86%) Loss: 2.060427 LR: 0.00003018 [23:02:12] Epoch: 1 Batch: 1865/38378 (4.86%) Loss: 1.936700 LR: 0.00003018 [23:02:13] Epoch: 1 Batch: 1866/38378 (4.86%) Loss: 2.119260 LR: 0.00003030 [23:02:15] Epoch: 1 Batch: 1867/38378 (4.86%) Loss: 2.218834 LR: 0.00003030 [23:02:17] Epoch: 1 Batch: 1868/38378 (4.87%) Loss: 2.319914 LR: 0.00003030 [23:02:19] Epoch: 1 Batch: 1869/38378 (4.87%) Loss: 2.254581 LR: 0.00003030 [23:02:20] Epoch: 1 Batch: 1870/38378 (4.87%) Loss: 2.115586 LR: 0.00003030 [23:02:22] Epoch: 1 Batch: 1871/38378 (4.88%) Loss: 1.968373 LR: 0.00003030 [23:02:24] Epoch: 1 Batch: 1872/38378 (4.88%) Loss: 2.289592 LR: 0.00003030 [23:02:25] Epoch: 1 Batch: 1873/38378 (4.88%) Loss: 1.837303 LR: 0.00003041 [23:02:27] Epoch: 1 Batch: 1874/38378 (4.88%) Loss: 2.403216 LR: 0.00003041 [23:02:29] Epoch: 1 Batch: 1875/38378 (4.89%) Loss: 2.243858 LR: 0.00003041 [23:02:31] Epoch: 1 Batch: 1876/38378 (4.89%) Loss: 2.237090 LR: 0.00003041 [23:02:32] Epoch: 1 Batch: 1877/38378 (4.89%) Loss: 2.082023 LR: 0.00003041 [23:02:34] Epoch: 1 Batch: 1878/38378 (4.89%) Loss: 1.970489 LR: 0.00003041 [23:02:36] Epoch: 1 Batch: 1879/38378 (4.90%) Loss: 2.320048 LR: 0.00003041 [23:02:38] Epoch: 1 Batch: 1880/38378 (4.90%) Loss: 2.164555 LR: 0.00003052 [23:02:43] >> Cleaned up old temp checkpoint: epoch1_step1551 [23:02:43] >> Temp checkpoint saved: epoch1_step1881, size: 0.1702 GB [23:02:43] Epoch: 1 Batch: 1881/38378 (4.90%) Loss: 2.093516 LR: 0.00003052 [23:02:45] Epoch: 1 Batch: 1882/38378 (4.90%) Loss: 1.881988 LR: 0.00003052 [23:02:46] Epoch: 1 Batch: 1883/38378 (4.91%) Loss: 2.125041 LR: 0.00003052 [23:02:48] Epoch: 1 Batch: 1884/38378 (4.91%) Loss: 2.078808 LR: 0.00003052 [23:02:50] Epoch: 1 Batch: 1885/38378 (4.91%) Loss: 2.402085 LR: 0.00003052 [23:02:51] Epoch: 1 Batch: 1886/38378 (4.91%) Loss: 2.122593 LR: 0.00003052 [23:02:53] Epoch: 1 Batch: 1887/38378 (4.92%) Loss: 2.123190 LR: 0.00003064 [23:02:55] Epoch: 1 Batch: 1888/38378 (4.92%) Loss: 2.139832 LR: 0.00003064 [23:02:57] Epoch: 1 Batch: 1889/38378 (4.92%) Loss: 2.161169 LR: 0.00003064 [23:02:58] Epoch: 1 Batch: 1890/38378 (4.92%) Loss: 1.964597 LR: 0.00003064 [23:03:00] Epoch: 1 Batch: 1891/38378 (4.93%) Loss: 2.419918 LR: 0.00003064 [23:03:01] Epoch: 1 Batch: 1892/38378 (4.93%) Loss: 2.240129 LR: 0.00003064 [23:03:03] Epoch: 1 Batch: 1893/38378 (4.93%) Loss: 1.992009 LR: 0.00003064 [23:03:05] Epoch: 1 Batch: 1894/38378 (4.94%) Loss: 2.074735 LR: 0.00003075 [23:03:07] Epoch: 1 Batch: 1895/38378 (4.94%) 
Loss: 1.866576 LR: 0.00003075 [23:03:08] Epoch: 1 Batch: 1896/38378 (4.94%) Loss: 1.893652 LR: 0.00003075 [23:03:10] Epoch: 1 Batch: 1897/38378 (4.94%) Loss: 2.236307 LR: 0.00003075 [23:03:12] Epoch: 1 Batch: 1898/38378 (4.95%) Loss: 2.026220 LR: 0.00003075 [23:03:14] Epoch: 1 Batch: 1899/38378 (4.95%) Loss: 2.150656 LR: 0.00003075 [23:03:15] Epoch: 1 Batch: 1900/38378 (4.95%) Loss: 2.130933 LR: 0.00003075 [23:03:17] Epoch: 1 Batch: 1901/38378 (4.95%) Loss: 2.190584 LR: 0.00003087 [23:03:19] Epoch: 1 Batch: 1902/38378 (4.96%) Loss: 2.085210 LR: 0.00003087 [23:03:20] Epoch: 1 Batch: 1903/38378 (4.96%) Loss: 1.987576 LR: 0.00003087 [23:03:22] Epoch: 1 Batch: 1904/38378 (4.96%) Loss: 1.946704 LR: 0.00003087 [23:03:24] Epoch: 1 Batch: 1905/38378 (4.96%) Loss: 2.276359 LR: 0.00003087 [23:03:26] Epoch: 1 Batch: 1906/38378 (4.97%) Loss: 2.063867 LR: 0.00003087 [23:03:27] Epoch: 1 Batch: 1907/38378 (4.97%) Loss: 1.982338 LR: 0.00003087 [23:03:29] Epoch: 1 Batch: 1908/38378 (4.97%) Loss: 2.059637 LR: 0.00003098 [23:03:31] Epoch: 1 Batch: 1909/38378 (4.97%) Loss: 2.272393 LR: 0.00003098 [23:03:32] Epoch: 1 Batch: 1910/38378 (4.98%) Loss: 2.249704 LR: 0.00003098 [23:03:34] Epoch: 1 Batch: 1911/38378 (4.98%) Loss: 2.277984 LR: 0.00003098 [23:03:36] Epoch: 1 Batch: 1912/38378 (4.98%) Loss: 2.120034 LR: 0.00003098 [23:03:38] Epoch: 1 Batch: 1913/38378 (4.98%) Loss: 2.034819 LR: 0.00003098 [23:03:43] >> Cleaned up old temp checkpoint: epoch1_step1584 [23:03:43] >> Temp checkpoint saved: epoch1_step1914, size: 0.1702 GB [23:03:43] Epoch: 1 Batch: 1914/38378 (4.99%) Loss: 2.140613 LR: 0.00003098 [23:03:45] Epoch: 1 Batch: 1915/38378 (4.99%) Loss: 2.112329 LR: 0.00003109 [23:03:47] Epoch: 1 Batch: 1916/38378 (4.99%) Loss: 1.991107 LR: 0.00003109 [23:03:48] Epoch: 1 Batch: 1917/38378 (5.00%) Loss: 2.271477 LR: 0.00003109 [23:03:50] Epoch: 1 Batch: 1918/38378 (5.00%) Loss: 2.539681 LR: 0.00003109 [23:03:52] Epoch: 1 Batch: 1919/38378 (5.00%) Loss: 2.158386 LR: 0.00003109 [23:03:53] Epoch: 1 Batch: 1920/38378 (5.00%) Loss: 2.181026 LR: 0.00003109 [23:03:55] Epoch: 1 Batch: 1921/38378 (5.01%) Loss: 2.318695 LR: 0.00003109 [23:03:57] Epoch: 1 Batch: 1922/38378 (5.01%) Loss: 2.569548 LR: 0.00003121 [23:03:58] Epoch: 1 Batch: 1923/38378 (5.01%) Loss: 2.187791 LR: 0.00003121 [23:04:00] Epoch: 1 Batch: 1924/38378 (5.01%) Loss: 2.310390 LR: 0.00003121 [23:04:02] Epoch: 1 Batch: 1925/38378 (5.02%) Loss: 1.895934 LR: 0.00003121 [23:04:03] Epoch: 1 Batch: 1926/38378 (5.02%) Loss: 2.213696 LR: 0.00003121 [23:04:05] Epoch: 1 Batch: 1927/38378 (5.02%) Loss: 2.187885 LR: 0.00003121 [23:04:06] Epoch: 1 Batch: 1928/38378 (5.02%) Loss: 2.270372 LR: 0.00003121 [23:04:08] Epoch: 1 Batch: 1929/38378 (5.03%) Loss: 2.371767 LR: 0.00003132 [23:04:10] Epoch: 1 Batch: 1930/38378 (5.03%) Loss: 1.909932 LR: 0.00003132 [23:04:12] Epoch: 1 Batch: 1931/38378 (5.03%) Loss: 2.127637 LR: 0.00003132 [23:04:13] Epoch: 1 Batch: 1932/38378 (5.03%) Loss: 1.977319 LR: 0.00003132 [23:04:15] Epoch: 1 Batch: 1933/38378 (5.04%) Loss: 2.128151 LR: 0.00003132 [23:04:17] Epoch: 1 Batch: 1934/38378 (5.04%) Loss: 1.924514 LR: 0.00003132 [23:04:19] Epoch: 1 Batch: 1935/38378 (5.04%) Loss: 2.170329 LR: 0.00003132 [23:04:20] Epoch: 1 Batch: 1936/38378 (5.04%) Loss: 1.993827 LR: 0.00003144 [23:04:22] Epoch: 1 Batch: 1937/38378 (5.05%) Loss: 1.868351 LR: 0.00003144 [23:04:24] Epoch: 1 Batch: 1938/38378 (5.05%) Loss: 1.834383 LR: 0.00003144 [23:04:25] Epoch: 1 Batch: 1939/38378 (5.05%) Loss: 1.902260 LR: 0.00003144 [23:04:27] Epoch: 1 Batch: 1940/38378 (5.05%) 
Loss: 2.094202 LR: 0.00003144 [23:04:29] Epoch: 1 Batch: 1941/38378 (5.06%) Loss: 2.068084 LR: 0.00003144 [23:04:31] Epoch: 1 Batch: 1942/38378 (5.06%) Loss: 2.048988 LR: 0.00003144 [23:04:32] Epoch: 1 Batch: 1943/38378 (5.06%) Loss: 2.324540 LR: 0.00003155 [23:04:34] Epoch: 1 Batch: 1944/38378 (5.07%) Loss: 1.966457 LR: 0.00003155 [23:04:36] Epoch: 1 Batch: 1945/38378 (5.07%) Loss: 1.939078 LR: 0.00003155 [23:04:37] Epoch: 1 Batch: 1946/38378 (5.07%) Loss: 2.344365 LR: 0.00003155 [23:04:43] >> Cleaned up old temp checkpoint: epoch1_step1617 [23:04:43] >> Temp checkpoint saved: epoch1_step1947, size: 0.1702 GB [23:04:43] Epoch: 1 Batch: 1947/38378 (5.07%) Loss: 2.021998 LR: 0.00003155 [23:04:45] Epoch: 1 Batch: 1948/38378 (5.08%) Loss: 2.325001 LR: 0.00003155 [23:04:46] Epoch: 1 Batch: 1949/38378 (5.08%) Loss: 2.307759 LR: 0.00003155 [23:04:48] Epoch: 1 Batch: 1950/38378 (5.08%) Loss: 1.979140 LR: 0.00003166 [23:04:49] Epoch: 1 Batch: 1951/38378 (5.08%) Loss: 1.872964 LR: 0.00003166 [23:04:51] Epoch: 1 Batch: 1952/38378 (5.09%) Loss: 2.090074 LR: 0.00003166 [23:04:53] Epoch: 1 Batch: 1953/38378 (5.09%) Loss: 2.014417 LR: 0.00003166 [23:04:55] Epoch: 1 Batch: 1954/38378 (5.09%) Loss: 2.071924 LR: 0.00003166 [23:04:56] Epoch: 1 Batch: 1955/38378 (5.09%) Loss: 2.216836 LR: 0.00003166 [23:04:58] Epoch: 1 Batch: 1956/38378 (5.10%) Loss: 1.914975 LR: 0.00003166 [23:05:00] Epoch: 1 Batch: 1957/38378 (5.10%) Loss: 2.124208 LR: 0.00003178 [23:05:01] Epoch: 1 Batch: 1958/38378 (5.10%) Loss: 2.193392 LR: 0.00003178 [23:05:03] Epoch: 1 Batch: 1959/38378 (5.10%) Loss: 2.290875 LR: 0.00003178 [23:05:05] Epoch: 1 Batch: 1960/38378 (5.11%) Loss: 2.404477 LR: 0.00003178 [23:05:07] Epoch: 1 Batch: 1961/38378 (5.11%) Loss: 2.319868 LR: 0.00003178 [23:05:08] Epoch: 1 Batch: 1962/38378 (5.11%) Loss: 2.119728 LR: 0.00003178 [23:05:10] Epoch: 1 Batch: 1963/38378 (5.11%) Loss: 2.007164 LR: 0.00003178 [23:05:12] Epoch: 1 Batch: 1964/38378 (5.12%) Loss: 2.511609 LR: 0.00003189 [23:05:13] Epoch: 1 Batch: 1965/38378 (5.12%) Loss: 2.012425 LR: 0.00003189 [23:05:15] Epoch: 1 Batch: 1966/38378 (5.12%) Loss: 2.341562 LR: 0.00003189 [23:05:17] Epoch: 1 Batch: 1967/38378 (5.13%) Loss: 2.235298 LR: 0.00003189 [23:05:18] Epoch: 1 Batch: 1968/38378 (5.13%) Loss: 2.140602 LR: 0.00003189 [23:05:20] Epoch: 1 Batch: 1969/38378 (5.13%) Loss: 2.304202 LR: 0.00003189 [23:05:21] Epoch: 1 Batch: 1970/38378 (5.13%) Loss: 2.387093 LR: 0.00003189 [23:05:23] Epoch: 1 Batch: 1971/38378 (5.14%) Loss: 1.901002 LR: 0.00003200 [23:05:25] Epoch: 1 Batch: 1972/38378 (5.14%) Loss: 2.149265 LR: 0.00003200 [23:05:27] Epoch: 1 Batch: 1973/38378 (5.14%) Loss: 2.256977 LR: 0.00003200 [23:05:28] Epoch: 1 Batch: 1974/38378 (5.14%) Loss: 1.976371 LR: 0.00003200 [23:05:30] Epoch: 1 Batch: 1975/38378 (5.15%) Loss: 2.244966 LR: 0.00003200 [23:05:32] Epoch: 1 Batch: 1976/38378 (5.15%) Loss: 2.062795 LR: 0.00003200 [23:05:33] Epoch: 1 Batch: 1977/38378 (5.15%) Loss: 2.212317 LR: 0.00003200 [23:05:35] Epoch: 1 Batch: 1978/38378 (5.15%) Loss: 2.043091 LR: 0.00003212 [23:05:37] Epoch: 1 Batch: 1979/38378 (5.16%) Loss: 2.237756 LR: 0.00003212 [23:05:43] >> Cleaned up old temp checkpoint: epoch1_step1650 [23:05:43] >> Temp checkpoint saved: epoch1_step1980, size: 0.1702 GB [23:05:43] Epoch: 1 Batch: 1980/38378 (5.16%) Loss: 1.873359 LR: 0.00003212 [23:05:44] Epoch: 1 Batch: 1981/38378 (5.16%) Loss: 2.360784 LR: 0.00003212 [23:05:46] Epoch: 1 Batch: 1982/38378 (5.16%) Loss: 2.542692 LR: 0.00003212 [23:05:48] Epoch: 1 Batch: 1983/38378 (5.17%) Loss: 2.284298 LR: 
0.00003212 [23:05:49] Epoch: 1 Batch: 1984/38378 (5.17%) Loss: 1.916126 LR: 0.00003212 [23:05:51] Epoch: 1 Batch: 1985/38378 (5.17%) Loss: 1.872502 LR: 0.00003223 [23:05:53] Epoch: 1 Batch: 1986/38378 (5.17%) Loss: 2.079219 LR: 0.00003223 [23:05:54] Epoch: 1 Batch: 1987/38378 (5.18%) Loss: 2.220907 LR: 0.00003223 [23:05:56] Epoch: 1 Batch: 1988/38378 (5.18%) Loss: 1.959510 LR: 0.00003223 [23:05:58] Epoch: 1 Batch: 1989/38378 (5.18%) Loss: 2.278151 LR: 0.00003223 [23:05:59] Epoch: 1 Batch: 1990/38378 (5.19%) Loss: 2.069926 LR: 0.00003223 [23:06:01] Epoch: 1 Batch: 1991/38378 (5.19%) Loss: 2.284002 LR: 0.00003223 [23:06:03] Epoch: 1 Batch: 1992/38378 (5.19%) Loss: 2.117228 LR: 0.00003235 [23:06:05] Epoch: 1 Batch: 1993/38378 (5.19%) Loss: 2.399024 LR: 0.00003235 [23:06:06] Epoch: 1 Batch: 1994/38378 (5.20%) Loss: 2.276935 LR: 0.00003235 [23:06:08] Epoch: 1 Batch: 1995/38378 (5.20%) Loss: 2.226874 LR: 0.00003235 [23:06:10] Epoch: 1 Batch: 1996/38378 (5.20%) Loss: 2.321448 LR: 0.00003235 [23:06:11] Epoch: 1 Batch: 1997/38378 (5.20%) Loss: 2.026446 LR: 0.00003235 [23:06:13] Epoch: 1 Batch: 1998/38378 (5.21%) Loss: 2.273086 LR: 0.00003235 [23:06:15] Epoch: 1 Batch: 1999/38378 (5.21%) Loss: 1.979595 LR: 0.00003246
[23:06:17] >> Evaluating batch 0 [23:06:17] >> Evaluating batch 1 [23:06:18] >> Evaluating batch 2 [23:06:19] >> Evaluating batch 3 [23:06:20] >> Evaluating batch 4 [23:06:21] >> Evaluating batch 5 [23:06:22] >> Evaluating batch 6 [23:06:23] >> Evaluating batch 7 [23:06:24] >> Evaluating batch 8 [23:06:25] >> Evaluating batch 9 [23:06:26] >> Evaluating batch 10 [23:06:27] >> Evaluating batch 11 [23:06:28] >> Evaluating batch 12 [23:06:29] >> Evaluating batch 13 [23:06:30] >> Evaluating batch 14 [23:06:31] >> Evaluating batch 15 [23:06:32] >> Evaluating batch 16
[23:06:32] Epoch: 1 Step: 2000/38378 Evaluation:
[23:06:32] Avg Loss Since Last Eval: 2.1565 Val Loss: 2.2501 Validation loss delta: -0.0447 Perplexity: 9.4887 LR: 0.00003246
[23:06:36] >> Checkpoint saved: epoch1_step2000, size: 0.1702 GB
[23:06:36] Epoch: 1 Batch: 2000/38378 (5.21%) Loss: 1.827058 LR: 0.00003246 [23:06:38] Epoch: 1 Batch: 2001/38378 (5.21%) Loss: 2.319105 LR: 0.00003246 [23:06:40] Epoch: 1 Batch: 2002/38378 (5.22%) Loss: 2.153540 LR: 0.00003246 [23:06:41] Epoch: 1 Batch: 2003/38378 (5.22%) Loss: 2.367514 LR: 0.00003246 [23:06:43] Epoch: 1 Batch: 2004/38378 (5.22%) Loss: 2.211424 LR: 0.00003246 [23:06:45] Epoch: 1 Batch: 2005/38378 (5.22%) Loss: 2.269875 LR: 0.00003246 [23:06:47] Epoch: 1 Batch: 2006/38378 (5.23%) Loss: 2.268192 LR: 0.00003257 [23:06:48] Epoch: 1 Batch: 2007/38378 (5.23%) Loss: 2.288740 LR: 0.00003257 [23:06:50] Epoch: 1 Batch: 2008/38378 (5.23%) Loss: 2.044771 LR: 0.00003257 [23:06:52] Epoch: 1 Batch: 2009/38378 (5.23%) Loss: 2.037724 LR: 0.00003257 [23:06:53] Epoch: 1 Batch: 2010/38378 (5.24%) Loss: 2.445196 LR: 0.00003257 [23:06:55] Epoch: 1 Batch: 2011/38378 (5.24%) Loss: 2.143840 LR: 0.00003257 [23:06:57] Epoch: 1 Batch: 2012/38378 (5.24%) Loss: 2.272731 LR: 0.00003257 [23:07:03] >> Cleaned up old temp checkpoint: epoch1_step1683 [23:07:03] >> Temp checkpoint saved: epoch1_step2013, size: 0.1702 GB [23:07:03] Epoch: 1 Batch: 2013/38378 (5.25%) Loss: 2.243955 LR: 0.00003269 [23:07:04] Epoch: 1 Batch: 2014/38378 (5.25%) Loss: 2.233488 LR: 0.00003269 [23:07:06] Epoch: 1 Batch: 2015/38378 (5.25%) Loss: 2.159758 LR: 0.00003269 [23:07:08] Epoch: 1 Batch: 2016/38378 (5.25%) Loss: 2.437802 LR: 0.00003269 [23:07:09] Epoch: 1 Batch: 2017/38378 (5.26%) Loss: 2.115907 LR: 0.00003269 [23:07:11]
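
The evaluation block above prints four derived numbers: the average training loss since the previous eval (2.1565), the validation loss over the 17 held-out batches (2.2501), the change in validation loss since the previous eval (-0.0447, i.e. validation improved), and perplexity, which is simply the exponential of the validation loss: exp(2.2501) ~= 9.4887. A minimal sketch of how the printed quantities relate (function and variable names are illustrative, not taken from the training script):

import math

def eval_summary(avg_train_loss, val_loss, prev_val_loss):
    """Compute the four quantities printed in the evaluation block."""
    return {
        "avg_loss_since_last_eval": avg_train_loss,
        "val_loss": val_loss,
        # Negative delta means validation loss improved since the last eval.
        "val_loss_delta": val_loss - prev_val_loss,
        # Perplexity is the exponential of the mean token-level val loss.
        "perplexity": math.exp(val_loss),
    }

# The step-2000 eval above: a delta of -0.0447 implies the previous val loss
# was about 2.2948, and exp(2.2501) gives the printed perplexity of 9.4887.
print(eval_summary(2.1565, 2.2501, 2.2948))
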
Epoch: 1 Batch: 2018/38378 (5.26%) Loss: 1.899954 LR: 0.00003269 [23:07:13] Epoch: 1 Batch: 2019/38378 (5.26%) Loss: 2.093131 LR: 0.00003269 [23:07:14] Epoch: 1 Batch: 2020/38378 (5.26%) Loss: 2.006913 LR: 0.00003280 [23:07:16] Epoch: 1 Batch: 2021/38378 (5.27%) Loss: 2.278422 LR: 0.00003280 [23:07:18] Epoch: 1 Batch: 2022/38378 (5.27%) Loss: 2.302345 LR: 0.00003280 [23:07:20] Epoch: 1 Batch: 2023/38378 (5.27%) Loss: 2.255789 LR: 0.00003280 [23:07:21] Epoch: 1 Batch: 2024/38378 (5.27%) Loss: 2.039584 LR: 0.00003280 [23:07:23] Epoch: 1 Batch: 2025/38378 (5.28%) Loss: 2.125172 LR: 0.00003280 [23:07:25] Epoch: 1 Batch: 2026/38378 (5.28%) Loss: 2.426317 LR: 0.00003280 [23:07:27] Epoch: 1 Batch: 2027/38378 (5.28%) Loss: 1.792969 LR: 0.00003292 [23:07:28] Epoch: 1 Batch: 2028/38378 (5.28%) Loss: 2.267350 LR: 0.00003292 [23:07:30] Epoch: 1 Batch: 2029/38378 (5.29%) Loss: 2.053480 LR: 0.00003292 [23:07:32] Epoch: 1 Batch: 2030/38378 (5.29%) Loss: 2.102049 LR: 0.00003292 [23:07:33] Epoch: 1 Batch: 2031/38378 (5.29%) Loss: 2.248984 LR: 0.00003292 [23:07:35] Epoch: 1 Batch: 2032/38378 (5.29%) Loss: 2.109634 LR: 0.00003292 [23:07:37] Epoch: 1 Batch: 2033/38378 (5.30%) Loss: 2.332917 LR: 0.00003292 [23:07:39] Epoch: 1 Batch: 2034/38378 (5.30%) Loss: 2.247876 LR: 0.00003303 [23:07:40] Epoch: 1 Batch: 2035/38378 (5.30%) Loss: 2.217401 LR: 0.00003303 [23:07:42] Epoch: 1 Batch: 2036/38378 (5.31%) Loss: 2.344649 LR: 0.00003303 [23:07:44] Epoch: 1 Batch: 2037/38378 (5.31%) Loss: 1.849438 LR: 0.00003303 [23:07:45] Epoch: 1 Batch: 2038/38378 (5.31%) Loss: 2.069835 LR: 0.00003303 [23:07:47] Epoch: 1 Batch: 2039/38378 (5.31%) Loss: 2.137336 LR: 0.00003303 [23:07:49] Epoch: 1 Batch: 2040/38378 (5.32%) Loss: 2.030919 LR: 0.00003303 [23:07:51] Epoch: 1 Batch: 2041/38378 (5.32%) Loss: 2.541538 LR: 0.00003314 [23:07:52] Epoch: 1 Batch: 2042/38378 (5.32%) Loss: 2.333998 LR: 0.00003314 [23:07:54] Epoch: 1 Batch: 2043/38378 (5.32%) Loss: 2.207413 LR: 0.00003314 [23:07:56] Epoch: 1 Batch: 2044/38378 (5.33%) Loss: 2.172810 LR: 0.00003314 [23:07:57] Epoch: 1 Batch: 2045/38378 (5.33%) Loss: 2.319934 LR: 0.00003314 [23:08:03] >> Cleaned up old temp checkpoint: epoch1_step1716 [23:08:03] >> Temp checkpoint saved: epoch1_step2046, size: 0.1702 GB [23:08:03] Epoch: 1 Batch: 2046/38378 (5.33%) Loss: 2.073963 LR: 0.00003314 [23:08:05] Epoch: 1 Batch: 2047/38378 (5.33%) Loss: 2.029445 LR: 0.00003314 [23:08:06] Epoch: 1 Batch: 2048/38378 (5.34%) Loss: 2.015338 LR: 0.00003326 [23:08:08] Epoch: 1 Batch: 2049/38378 (5.34%) Loss: 2.093911 LR: 0.00003326 [23:08:10] Epoch: 1 Batch: 2050/38378 (5.34%) Loss: 1.990069 LR: 0.00003326 [23:08:11] Epoch: 1 Batch: 2051/38378 (5.34%) Loss: 2.605200 LR: 0.00003326 [23:08:13] Epoch: 1 Batch: 2052/38378 (5.35%) Loss: 1.837211 LR: 0.00003326 [23:08:15] Epoch: 1 Batch: 2053/38378 (5.35%) Loss: 2.384693 LR: 0.00003326 [23:08:16] Epoch: 1 Batch: 2054/38378 (5.35%) Loss: 2.254883 LR: 0.00003326 [23:08:18] Epoch: 1 Batch: 2055/38378 (5.35%) Loss: 2.109786 LR: 0.00003337 [23:08:20] Epoch: 1 Batch: 2056/38378 (5.36%) Loss: 1.945082 LR: 0.00003337 [23:08:22] Epoch: 1 Batch: 2057/38378 (5.36%) Loss: 1.967991 LR: 0.00003337 [23:08:23] Epoch: 1 Batch: 2058/38378 (5.36%) Loss: 1.944884 LR: 0.00003337 [23:08:25] Epoch: 1 Batch: 2059/38378 (5.37%) Loss: 2.337900 LR: 0.00003337 [23:08:27] Epoch: 1 Batch: 2060/38378 (5.37%) Loss: 2.188983 LR: 0.00003337 [23:08:28] Epoch: 1 Batch: 2061/38378 (5.37%) Loss: 2.216815 LR: 0.00003337 [23:08:30] Epoch: 1 Batch: 2062/38378 (5.37%) Loss: 2.146220 LR: 0.00003349 [23:08:32] 
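
A pattern worth noting in the checkpoint messages: temp checkpoints land every 33 batches (epoch1_step2013, 2046, 2079, ...), and each save deletes the temp checkpoint from 330 batches earlier (saving epoch1_step2046 cleans up epoch1_step1716), so the ten most recent temp checkpoints stay on disk, separate from the permanent epoch1_step2000-style saves. A minimal sketch of that rotation, assuming a PEFT-style save_pretrained and illustrative path handling:

import os
import shutil

SAVE_EVERY = 33  # temp-checkpoint cadence visible in the log
KEEP_LAST = 10   # saving step N removes step N - 10 * 33 = N - 330

def save_temp_checkpoint(model, out_dir, epoch, step):
    """Prune the oldest temp checkpoint past the window, then save a new one."""
    old_name = f"epoch{epoch}_step{step - KEEP_LAST * SAVE_EVERY}"
    old_path = os.path.join(out_dir, old_name)
    if os.path.isdir(old_path):
        shutil.rmtree(old_path)
        print(f">> Cleaned up old temp checkpoint: {old_name}")
    new_name = f"epoch{epoch}_step{step}"
    new_path = os.path.join(out_dir, new_name)
    model.save_pretrained(new_path)  # adapter-style save; small on disk
    size_gb = sum(
        os.path.getsize(os.path.join(root, f))
        for root, _, files in os.walk(new_path)
        for f in files
    ) / 1024**3
    print(f">> Temp checkpoint saved: {new_name}, size: {size_gb:.4f} GB")
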
Epoch: 1 Batch: 2063/38378 (5.38%) Loss: 2.241702 LR: 0.00003349 [23:08:34] Epoch: 1 Batch: 2064/38378 (5.38%) Loss: 2.371919 LR: 0.00003349 [23:08:35] Epoch: 1 Batch: 2065/38378 (5.38%) Loss: 2.116829 LR: 0.00003349 [23:08:37] Epoch: 1 Batch: 2066/38378 (5.38%) Loss: 2.140781 LR: 0.00003349 [23:08:39] Epoch: 1 Batch: 2067/38378 (5.39%) Loss: 2.159341 LR: 0.00003349 [23:08:41] Epoch: 1 Batch: 2068/38378 (5.39%) Loss: 2.052112 LR: 0.00003349 [23:08:42] Epoch: 1 Batch: 2069/38378 (5.39%) Loss: 2.095084 LR: 0.00003360 [23:08:44] Epoch: 1 Batch: 2070/38378 (5.39%) Loss: 1.944291 LR: 0.00003360 [23:08:46] Epoch: 1 Batch: 2071/38378 (5.40%) Loss: 2.089246 LR: 0.00003360 [23:08:47] Epoch: 1 Batch: 2072/38378 (5.40%) Loss: 2.264092 LR: 0.00003360 [23:08:49] Epoch: 1 Batch: 2073/38378 (5.40%) Loss: 2.243102 LR: 0.00003360 [23:08:51] Epoch: 1 Batch: 2074/38378 (5.40%) Loss: 2.055227 LR: 0.00003360 [23:08:53] Epoch: 1 Batch: 2075/38378 (5.41%) Loss: 2.161397 LR: 0.00003360 [23:08:54] Epoch: 1 Batch: 2076/38378 (5.41%) Loss: 2.322072 LR: 0.00003371 [23:08:56] Epoch: 1 Batch: 2077/38378 (5.41%) Loss: 1.992255 LR: 0.00003371 [23:08:58] Epoch: 1 Batch: 2078/38378 (5.41%) Loss: 2.318913 LR: 0.00003371 [23:09:04] >> Cleaned up old temp checkpoint: epoch1_step1749 [23:09:04] >> Temp checkpoint saved: epoch1_step2079, size: 0.1702 GB [23:09:04] Epoch: 1 Batch: 2079/38378 (5.42%) Loss: 2.086992 LR: 0.00003371 [23:09:05] Epoch: 1 Batch: 2080/38378 (5.42%) Loss: 2.182339 LR: 0.00003371 [23:09:07] Epoch: 1 Batch: 2081/38378 (5.42%) Loss: 2.108634 LR: 0.00003371 [23:09:09] Epoch: 1 Batch: 2082/38378 (5.42%) Loss: 1.791785 LR: 0.00003371 [23:09:10] Epoch: 1 Batch: 2083/38378 (5.43%) Loss: 2.424848 LR: 0.00003383 [23:09:12] Epoch: 1 Batch: 2084/38378 (5.43%) Loss: 2.216173 LR: 0.00003383 [23:09:14] Epoch: 1 Batch: 2085/38378 (5.43%) Loss: 2.207032 LR: 0.00003383 [23:09:16] Epoch: 1 Batch: 2086/38378 (5.44%) Loss: 2.285439 LR: 0.00003383 [23:09:17] Epoch: 1 Batch: 2087/38378 (5.44%) Loss: 2.244095 LR: 0.00003383 [23:09:19] Epoch: 1 Batch: 2088/38378 (5.44%) Loss: 2.047303 LR: 0.00003383 [23:09:21] Epoch: 1 Batch: 2089/38378 (5.44%) Loss: 2.368817 LR: 0.00003383 [23:09:22] Epoch: 1 Batch: 2090/38378 (5.45%) Loss: 1.921062 LR: 0.00003394 [23:09:24] Epoch: 1 Batch: 2091/38378 (5.45%) Loss: 1.972716 LR: 0.00003394 [23:09:26] Epoch: 1 Batch: 2092/38378 (5.45%) Loss: 2.163215 LR: 0.00003394 [23:09:28] Epoch: 1 Batch: 2093/38378 (5.45%) Loss: 2.418045 LR: 0.00003394 [23:09:29] Epoch: 1 Batch: 2094/38378 (5.46%) Loss: 2.081495 LR: 0.00003394 [23:09:31] Epoch: 1 Batch: 2095/38378 (5.46%) Loss: 2.023246 LR: 0.00003394 [23:09:33] Epoch: 1 Batch: 2096/38378 (5.46%) Loss: 1.945476 LR: 0.00003394 [23:09:34] Epoch: 1 Batch: 2097/38378 (5.46%) Loss: 1.999708 LR: 0.00003405 [23:09:36] Epoch: 1 Batch: 2098/38378 (5.47%) Loss: 2.246475 LR: 0.00003405 [23:09:38] Epoch: 1 Batch: 2099/38378 (5.47%) Loss: 2.351090 LR: 0.00003405 [23:09:40] Epoch: 1 Batch: 2100/38378 (5.47%) Loss: 2.174613 LR: 0.00003405 [23:09:41] Epoch: 1 Batch: 2101/38378 (5.47%) Loss: 1.985012 LR: 0.00003405 [23:09:43] Epoch: 1 Batch: 2102/38378 (5.48%) Loss: 1.829884 LR: 0.00003405 [23:09:45] Epoch: 1 Batch: 2103/38378 (5.48%) Loss: 2.196218 LR: 0.00003405 [23:09:46] Epoch: 1 Batch: 2104/38378 (5.48%) Loss: 2.297529 LR: 0.00003417 [23:09:48] Epoch: 1 Batch: 2105/38378 (5.48%) Loss: 2.275998 LR: 0.00003417 [23:09:50] Epoch: 1 Batch: 2106/38378 (5.49%) Loss: 2.197165 LR: 0.00003417 [23:09:52] Epoch: 1 Batch: 2107/38378 (5.49%) Loss: 2.091456 LR: 0.00003417 [23:09:53] 
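
The LR column only moves every seventh batch here: 0.00003405 holds for batches 2097 through 2103, then steps to 0.00003417 at batch 2104. That cadence indicates gradient accumulation: seven micro-batches share one optimizer step, and the scheduler ticks once per optimizer step while still on the warmup ramp. A linear ramp to a 5e-5 peak over 439 optimizer steps, with the step counter aligned so the LR moves at batches whose index is 4 mod 7, reproduces the printed values; a sketch with all constants inferred from the log rather than read from the script:

ACCUM_STEPS = 7          # the printed LR changes once per 7 batches
PHASE = 4                # LR steps at batch indices of 4 mod 7 in this log
PEAK_LR = 5e-5           # ramp peak implied by the printed values
WARMUP_OPT_STEPS = 439   # ramp length implied by the printed values

def warmup_lr(batch_idx):
    """LR printed for a micro-batch while still on the linear warmup ramp."""
    opt_step = (batch_idx - PHASE) // ACCUM_STEPS  # one step per accumulation cycle
    return PEAK_LR * min(opt_step, WARMUP_OPT_STEPS) / WARMUP_OPT_STEPS

# Reproduces the log: batch 2097 -> 0.00003405, batch 2104 -> 0.00003417
for b in (2097, 2104):
    print(f"LR: {warmup_lr(b):.8f}")

Only the ramp is visible in this span; whatever decay the schedule applies after optimizer step 439 does not show up here.
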
Epoch: 1 Batch: 2108/38378 (5.49%) Loss: 2.089476 LR: 0.00003417 [23:09:55] Epoch: 1 Batch: 2109/38378 (5.50%) Loss: 1.850836 LR: 0.00003417 [23:09:57] Epoch: 1 Batch: 2110/38378 (5.50%) Loss: 1.803863 LR: 0.00003417 [23:09:58] Epoch: 1 Batch: 2111/38378 (5.50%) Loss: 2.214473 LR: 0.00003428 [23:10:04] >> Cleaned up old temp checkpoint: epoch1_step1782 [23:10:04] >> Temp checkpoint saved: epoch1_step2112, size: 0.1702 GB [23:10:04] Epoch: 1 Batch: 2112/38378 (5.50%) Loss: 2.089496 LR: 0.00003428 [23:10:06] Epoch: 1 Batch: 2113/38378 (5.51%) Loss: 2.292536 LR: 0.00003428 [23:10:07] Epoch: 1 Batch: 2114/38378 (5.51%) Loss: 2.226224 LR: 0.00003428 [23:10:09] Epoch: 1 Batch: 2115/38378 (5.51%) Loss: 2.030939 LR: 0.00003428 [23:10:11] Epoch: 1 Batch: 2116/38378 (5.51%) Loss: 2.131847 LR: 0.00003428 [23:10:12] Epoch: 1 Batch: 2117/38378 (5.52%) Loss: 2.000703 LR: 0.00003428 [23:10:14] Epoch: 1 Batch: 2118/38378 (5.52%) Loss: 2.325734 LR: 0.00003440 [23:10:16] Epoch: 1 Batch: 2119/38378 (5.52%) Loss: 1.873312 LR: 0.00003440 [23:10:18] Epoch: 1 Batch: 2120/38378 (5.52%) Loss: 2.024246 LR: 0.00003440 [23:10:19] Epoch: 1 Batch: 2121/38378 (5.53%) Loss: 2.166745 LR: 0.00003440 [23:10:21] Epoch: 1 Batch: 2122/38378 (5.53%) Loss: 2.237033 LR: 0.00003440 [23:10:23] Epoch: 1 Batch: 2123/38378 (5.53%) Loss: 2.045493 LR: 0.00003440 [23:10:24] Epoch: 1 Batch: 2124/38378 (5.53%) Loss: 1.826025 LR: 0.00003440 [23:10:26] Epoch: 1 Batch: 2125/38378 (5.54%) Loss: 2.417444 LR: 0.00003451 [23:10:28] Epoch: 1 Batch: 2126/38378 (5.54%) Loss: 2.130721 LR: 0.00003451 [23:10:29] Epoch: 1 Batch: 2127/38378 (5.54%) Loss: 2.208186 LR: 0.00003451 [23:10:31] Epoch: 1 Batch: 2128/38378 (5.54%) Loss: 1.867453 LR: 0.00003451 [23:10:33] Epoch: 1 Batch: 2129/38378 (5.55%) Loss: 2.155362 LR: 0.00003451 [23:10:35] Epoch: 1 Batch: 2130/38378 (5.55%) Loss: 1.897762 LR: 0.00003451 [23:10:36] Epoch: 1 Batch: 2131/38378 (5.55%) Loss: 2.049350 LR: 0.00003451 [23:10:38] Epoch: 1 Batch: 2132/38378 (5.56%) Loss: 2.069683 LR: 0.00003462 [23:10:40] Epoch: 1 Batch: 2133/38378 (5.56%) Loss: 1.791955 LR: 0.00003462 [23:10:42] Epoch: 1 Batch: 2134/38378 (5.56%) Loss: 2.022280 LR: 0.00003462 [23:10:43] Epoch: 1 Batch: 2135/38378 (5.56%) Loss: 2.089261 LR: 0.00003462 [23:10:45] Epoch: 1 Batch: 2136/38378 (5.57%) Loss: 1.911213 LR: 0.00003462 [23:10:47] Epoch: 1 Batch: 2137/38378 (5.57%) Loss: 1.876766 LR: 0.00003462 [23:10:48] Epoch: 1 Batch: 2138/38378 (5.57%) Loss: 2.088763 LR: 0.00003462 [23:10:50] Epoch: 1 Batch: 2139/38378 (5.57%) Loss: 1.855206 LR: 0.00003474 [23:10:52] Epoch: 1 Batch: 2140/38378 (5.58%) Loss: 2.107416 LR: 0.00003474 [23:10:54] Epoch: 1 Batch: 2141/38378 (5.58%) Loss: 2.289477 LR: 0.00003474 [23:10:55] Epoch: 1 Batch: 2142/38378 (5.58%) Loss: 1.859111 LR: 0.00003474 [23:10:57] Epoch: 1 Batch: 2143/38378 (5.58%) Loss: 1.908353 LR: 0.00003474 [23:10:59] Epoch: 1 Batch: 2144/38378 (5.59%) Loss: 2.175355 LR: 0.00003474 [23:11:05] >> Cleaned up old temp checkpoint: epoch1_step1815 [23:11:05] >> Temp checkpoint saved: epoch1_step2145, size: 0.1702 GB [23:11:05] Epoch: 1 Batch: 2145/38378 (5.59%) Loss: 2.085092 LR: 0.00003474 [23:11:06] Epoch: 1 Batch: 2146/38378 (5.59%) Loss: 2.354432 LR: 0.00003485 [23:11:08] Epoch: 1 Batch: 2147/38378 (5.59%) Loss: 2.112187 LR: 0.00003485 [23:11:10] Epoch: 1 Batch: 2148/38378 (5.60%) Loss: 1.788973 LR: 0.00003485 [23:11:11] Epoch: 1 Batch: 2149/38378 (5.60%) Loss: 2.111824 LR: 0.00003485 [23:11:13] Epoch: 1 Batch: 2150/38378 (5.60%) Loss: 1.891126 LR: 0.00003485 [23:11:15] Epoch: 1 Batch: 
2151/38378 (5.60%) Loss: 2.293873 LR: 0.00003485 [23:11:16] Epoch: 1 Batch: 2152/38378 (5.61%) Loss: 1.920925 LR: 0.00003485 [23:11:18] Epoch: 1 Batch: 2153/38378 (5.61%) Loss: 2.025619 LR: 0.00003497 [23:11:20] Epoch: 1 Batch: 2154/38378 (5.61%) Loss: 2.252161 LR: 0.00003497 [23:11:22] Epoch: 1 Batch: 2155/38378 (5.62%) Loss: 2.046320 LR: 0.00003497 [23:11:23] Epoch: 1 Batch: 2156/38378 (5.62%) Loss: 2.296049 LR: 0.00003497 [23:11:25] Epoch: 1 Batch: 2157/38378 (5.62%) Loss: 2.047529 LR: 0.00003497 [23:11:27] Epoch: 1 Batch: 2158/38378 (5.62%) Loss: 1.808888 LR: 0.00003497 [23:11:28] Epoch: 1 Batch: 2159/38378 (5.63%) Loss: 1.752811 LR: 0.00003497 [23:11:30] Epoch: 1 Batch: 2160/38378 (5.63%) Loss: 2.345513 LR: 0.00003508 [23:11:32] Epoch: 1 Batch: 2161/38378 (5.63%) Loss: 2.276829 LR: 0.00003508 [23:11:34] Epoch: 1 Batch: 2162/38378 (5.63%) Loss: 2.219118 LR: 0.00003508 [23:11:35] Epoch: 1 Batch: 2163/38378 (5.64%) Loss: 2.078047 LR: 0.00003508 [23:11:37] Epoch: 1 Batch: 2164/38378 (5.64%) Loss: 2.036274 LR: 0.00003508 [23:11:39] Epoch: 1 Batch: 2165/38378 (5.64%) Loss: 2.025258 LR: 0.00003508 [23:11:40] Epoch: 1 Batch: 2166/38378 (5.64%) Loss: 2.228215 LR: 0.00003508 [23:11:42] Epoch: 1 Batch: 2167/38378 (5.65%) Loss: 2.317452 LR: 0.00003519 [23:11:44] Epoch: 1 Batch: 2168/38378 (5.65%) Loss: 2.097161 LR: 0.00003519 [23:11:46] Epoch: 1 Batch: 2169/38378 (5.65%) Loss: 2.066113 LR: 0.00003519 [23:11:47] Epoch: 1 Batch: 2170/38378 (5.65%) Loss: 2.061520 LR: 0.00003519 [23:11:49] Epoch: 1 Batch: 2171/38378 (5.66%) Loss: 2.237009 LR: 0.00003519 [23:11:51] Epoch: 1 Batch: 2172/38378 (5.66%) Loss: 2.078718 LR: 0.00003519 [23:11:52] Epoch: 1 Batch: 2173/38378 (5.66%) Loss: 2.149251 LR: 0.00003519 [23:11:54] Epoch: 1 Batch: 2174/38378 (5.66%) Loss: 2.229976 LR: 0.00003531 [23:11:56] Epoch: 1 Batch: 2175/38378 (5.67%) Loss: 2.040440 LR: 0.00003531 [23:11:58] Epoch: 1 Batch: 2176/38378 (5.67%) Loss: 2.087646 LR: 0.00003531 [23:11:59] Epoch: 1 Batch: 2177/38378 (5.67%) Loss: 2.176354 LR: 0.00003531 [23:12:05] >> Cleaned up old temp checkpoint: epoch1_step1848 [23:12:05] >> Temp checkpoint saved: epoch1_step2178, size: 0.1702 GB [23:12:05] Epoch: 1 Batch: 2178/38378 (5.68%) Loss: 2.393014 LR: 0.00003531 [23:12:07] Epoch: 1 Batch: 2179/38378 (5.68%) Loss: 2.285368 LR: 0.00003531 [23:12:08] Epoch: 1 Batch: 2180/38378 (5.68%) Loss: 2.294908 LR: 0.00003531 [23:12:10] Epoch: 1 Batch: 2181/38378 (5.68%) Loss: 2.547067 LR: 0.00003542 [23:12:12] Epoch: 1 Batch: 2182/38378 (5.69%) Loss: 2.193022 LR: 0.00003542 [23:12:13] Epoch: 1 Batch: 2183/38378 (5.69%) Loss: 2.256051 LR: 0.00003542 [23:12:15] Epoch: 1 Batch: 2184/38378 (5.69%) Loss: 2.014912 LR: 0.00003542 [23:12:17] Epoch: 1 Batch: 2185/38378 (5.69%) Loss: 2.018917 LR: 0.00003542 [23:12:18] Epoch: 1 Batch: 2186/38378 (5.70%) Loss: 2.471611 LR: 0.00003542 [23:12:20] Epoch: 1 Batch: 2187/38378 (5.70%) Loss: 2.152067 LR: 0.00003542 [23:12:22] Epoch: 1 Batch: 2188/38378 (5.70%) Loss: 1.906392 LR: 0.00003554 [23:12:24] Epoch: 1 Batch: 2189/38378 (5.70%) Loss: 2.079830 LR: 0.00003554 [23:12:25] Epoch: 1 Batch: 2190/38378 (5.71%) Loss: 2.132177 LR: 0.00003554 [23:12:27] Epoch: 1 Batch: 2191/38378 (5.71%) Loss: 2.090183 LR: 0.00003554 [23:12:29] Epoch: 1 Batch: 2192/38378 (5.71%) Loss: 2.232129 LR: 0.00003554 [23:12:30] Epoch: 1 Batch: 2193/38378 (5.71%) Loss: 2.114703 LR: 0.00003554 [23:12:32] Epoch: 1 Batch: 2194/38378 (5.72%) Loss: 2.147961 LR: 0.00003554 [23:12:34] Epoch: 1 Batch: 2195/38378 (5.72%) Loss: 2.246845 LR: 0.00003565 [23:12:36] Epoch: 1 Batch: 
2196/38378 (5.72%) Loss: 1.961059 LR: 0.00003565 [23:12:37] Epoch: 1 Batch: 2197/38378 (5.72%) Loss: 1.986856 LR: 0.00003565 [23:12:39] Epoch: 1 Batch: 2198/38378 (5.73%) Loss: 2.005838 LR: 0.00003565 [23:12:41] Epoch: 1 Batch: 2199/38378 (5.73%) Loss: 2.080647 LR: 0.00003565 [23:12:43] Epoch: 1 Batch: 2200/38378 (5.73%) Loss: 2.274531 LR: 0.00003565 [23:12:44] Epoch: 1 Batch: 2201/38378 (5.74%) Loss: 2.260367 LR: 0.00003565 [23:12:46] Epoch: 1 Batch: 2202/38378 (5.74%) Loss: 2.127005 LR: 0.00003576 [23:12:48] Epoch: 1 Batch: 2203/38378 (5.74%) Loss: 2.061628 LR: 0.00003576 [23:12:49] Epoch: 1 Batch: 2204/38378 (5.74%) Loss: 2.335432 LR: 0.00003576 [23:12:51] Epoch: 1 Batch: 2205/38378 (5.75%) Loss: 2.663794 LR: 0.00003576 [23:12:53] Epoch: 1 Batch: 2206/38378 (5.75%) Loss: 2.023412 LR: 0.00003576 [23:12:54] Epoch: 1 Batch: 2207/38378 (5.75%) Loss: 1.931554 LR: 0.00003576 [23:12:56] Epoch: 1 Batch: 2208/38378 (5.75%) Loss: 2.060529 LR: 0.00003576 [23:12:58] Epoch: 1 Batch: 2209/38378 (5.76%) Loss: 2.044200 LR: 0.00003588 [23:13:00] Epoch: 1 Batch: 2210/38378 (5.76%) Loss: 2.061972 LR: 0.00003588 [23:13:05] >> Cleaned up old temp checkpoint: epoch1_step1881 [23:13:05] >> Temp checkpoint saved: epoch1_step2211, size: 0.1702 GB [23:13:05] Epoch: 1 Batch: 2211/38378 (5.76%) Loss: 2.029398 LR: 0.00003588 [23:13:07] Epoch: 1 Batch: 2212/38378 (5.76%) Loss: 2.059135 LR: 0.00003588 [23:13:08] Epoch: 1 Batch: 2213/38378 (5.77%) Loss: 2.114437 LR: 0.00003588 [23:13:10] Epoch: 1 Batch: 2214/38378 (5.77%) Loss: 1.953426 LR: 0.00003588 [23:13:12] Epoch: 1 Batch: 2215/38378 (5.77%) Loss: 2.168405 LR: 0.00003588 [23:13:13] Epoch: 1 Batch: 2216/38378 (5.77%) Loss: 2.186932 LR: 0.00003599 [23:13:15] Epoch: 1 Batch: 2217/38378 (5.78%) Loss: 2.130043 LR: 0.00003599 [23:13:17] Epoch: 1 Batch: 2218/38378 (5.78%) Loss: 1.809343 LR: 0.00003599 [23:13:19] Epoch: 1 Batch: 2219/38378 (5.78%) Loss: 2.105830 LR: 0.00003599 [23:13:20] Epoch: 1 Batch: 2220/38378 (5.78%) Loss: 2.384886 LR: 0.00003599 [23:13:22] Epoch: 1 Batch: 2221/38378 (5.79%) Loss: 2.317312 LR: 0.00003599 [23:13:24] Epoch: 1 Batch: 2222/38378 (5.79%) Loss: 1.986991 LR: 0.00003599 [23:13:25] Epoch: 1 Batch: 2223/38378 (5.79%) Loss: 2.278939 LR: 0.00003610 [23:13:27] Epoch: 1 Batch: 2224/38378 (5.79%) Loss: 2.394246 LR: 0.00003610 [23:13:29] Epoch: 1 Batch: 2225/38378 (5.80%) Loss: 1.950590 LR: 0.00003610 [23:13:31] Epoch: 1 Batch: 2226/38378 (5.80%) Loss: 2.234657 LR: 0.00003610 [23:13:32] Epoch: 1 Batch: 2227/38378 (5.80%) Loss: 1.849472 LR: 0.00003610 [23:13:34] Epoch: 1 Batch: 2228/38378 (5.81%) Loss: 2.334851 LR: 0.00003610 [23:13:36] Epoch: 1 Batch: 2229/38378 (5.81%) Loss: 2.045730 LR: 0.00003610 [23:13:37] Epoch: 1 Batch: 2230/38378 (5.81%) Loss: 2.231284 LR: 0.00003622 [23:13:39] Epoch: 1 Batch: 2231/38378 (5.81%) Loss: 2.121236 LR: 0.00003622 [23:13:41] Epoch: 1 Batch: 2232/38378 (5.82%) Loss: 2.085824 LR: 0.00003622 [23:13:43] Epoch: 1 Batch: 2233/38378 (5.82%) Loss: 2.349885 LR: 0.00003622 [23:13:44] Epoch: 1 Batch: 2234/38378 (5.82%) Loss: 1.986570 LR: 0.00003622 [23:13:46] Epoch: 1 Batch: 2235/38378 (5.82%) Loss: 2.226539 LR: 0.00003622 [23:13:48] Epoch: 1 Batch: 2236/38378 (5.83%) Loss: 2.236186 LR: 0.00003622 [23:13:50] Epoch: 1 Batch: 2237/38378 (5.83%) Loss: 1.676826 LR: 0.00003633 [23:13:51] Epoch: 1 Batch: 2238/38378 (5.83%) Loss: 1.939085 LR: 0.00003633 [23:13:53] Epoch: 1 Batch: 2239/38378 (5.83%) Loss: 2.061548 LR: 0.00003633 [23:13:55] Epoch: 1 Batch: 2240/38378 (5.84%) Loss: 2.170246 LR: 0.00003633 [23:13:56] Epoch: 1 Batch: 
2241/38378 (5.84%) Loss: 2.009293 LR: 0.00003633 [23:13:58] Epoch: 1 Batch: 2242/38378 (5.84%) Loss: 2.091273 LR: 0.00003633 [23:14:00] Epoch: 1 Batch: 2243/38378 (5.84%) Loss: 2.452374 LR: 0.00003633 [23:14:05] >> Cleaned up old temp checkpoint: epoch1_step1914 [23:14:05] >> Temp checkpoint saved: epoch1_step2244, size: 0.1702 GB [23:14:05] Epoch: 1 Batch: 2244/38378 (5.85%) Loss: 1.958262 LR: 0.00003645 [23:14:07] Epoch: 1 Batch: 2245/38378 (5.85%) Loss: 2.260102 LR: 0.00003645 [23:14:09] Epoch: 1 Batch: 2246/38378 (5.85%) Loss: 2.039773 LR: 0.00003645 [23:14:10] Epoch: 1 Batch: 2247/38378 (5.85%) Loss: 2.020561 LR: 0.00003645 [23:14:12] Epoch: 1 Batch: 2248/38378 (5.86%) Loss: 1.832426 LR: 0.00003645 [23:14:14] Epoch: 1 Batch: 2249/38378 (5.86%) Loss: 1.939446 LR: 0.00003645 [23:14:15] Epoch: 1 Batch: 2250/38378 (5.86%) Loss: 1.897526 LR: 0.00003645 [23:14:17] Epoch: 1 Batch: 2251/38378 (5.87%) Loss: 2.084981 LR: 0.00003656 [23:14:19] Epoch: 1 Batch: 2252/38378 (5.87%) Loss: 2.167612 LR: 0.00003656 [23:14:20] Epoch: 1 Batch: 2253/38378 (5.87%) Loss: 2.179510 LR: 0.00003656 [23:14:22] Epoch: 1 Batch: 2254/38378 (5.87%) Loss: 2.077131 LR: 0.00003656 [23:14:24] Epoch: 1 Batch: 2255/38378 (5.88%) Loss: 2.071026 LR: 0.00003656 [23:14:26] Epoch: 1 Batch: 2256/38378 (5.88%) Loss: 2.077627 LR: 0.00003656 [23:14:27] Epoch: 1 Batch: 2257/38378 (5.88%) Loss: 2.106719 LR: 0.00003656 [23:14:29] Epoch: 1 Batch: 2258/38378 (5.88%) Loss: 2.080450 LR: 0.00003667 [23:14:31] Epoch: 1 Batch: 2259/38378 (5.89%) Loss: 2.181097 LR: 0.00003667 [23:14:32] Epoch: 1 Batch: 2260/38378 (5.89%) Loss: 2.126694 LR: 0.00003667 [23:14:34] Epoch: 1 Batch: 2261/38378 (5.89%) Loss: 1.929230 LR: 0.00003667 [23:14:36] Epoch: 1 Batch: 2262/38378 (5.89%) Loss: 1.967841 LR: 0.00003667 [23:14:38] Epoch: 1 Batch: 2263/38378 (5.90%) Loss: 1.859368 LR: 0.00003667 [23:14:39] Epoch: 1 Batch: 2264/38378 (5.90%) Loss: 2.062192 LR: 0.00003667 [23:14:41] Epoch: 1 Batch: 2265/38378 (5.90%) Loss: 2.070264 LR: 0.00003679 [23:14:43] Epoch: 1 Batch: 2266/38378 (5.90%) Loss: 1.957506 LR: 0.00003679 [23:14:44] Epoch: 1 Batch: 2267/38378 (5.91%) Loss: 1.850075 LR: 0.00003679 [23:14:46] Epoch: 1 Batch: 2268/38378 (5.91%) Loss: 2.057585 LR: 0.00003679 [23:14:48] Epoch: 1 Batch: 2269/38378 (5.91%) Loss: 2.100298 LR: 0.00003679 [23:14:50] Epoch: 1 Batch: 2270/38378 (5.91%) Loss: 2.073839 LR: 0.00003679 [23:14:51] Epoch: 1 Batch: 2271/38378 (5.92%) Loss: 2.248979 LR: 0.00003679 [23:14:53] Epoch: 1 Batch: 2272/38378 (5.92%) Loss: 2.303744 LR: 0.00003690 [23:14:55] Epoch: 1 Batch: 2273/38378 (5.92%) Loss: 2.189684 LR: 0.00003690 [23:14:56] Epoch: 1 Batch: 2274/38378 (5.93%) Loss: 2.083765 LR: 0.00003690 [23:14:58] Epoch: 1 Batch: 2275/38378 (5.93%) Loss: 2.212639 LR: 0.00003690 [23:15:00] Epoch: 1 Batch: 2276/38378 (5.93%) Loss: 2.198285 LR: 0.00003690 [23:15:06] >> Cleaned up old temp checkpoint: epoch1_step1947 [23:15:06] >> Temp checkpoint saved: epoch1_step2277, size: 0.1702 GB [23:15:06] Epoch: 1 Batch: 2277/38378 (5.93%) Loss: 2.188908 LR: 0.00003690 [23:15:07] Epoch: 1 Batch: 2278/38378 (5.94%) Loss: 2.126569 LR: 0.00003690 [23:15:09] Epoch: 1 Batch: 2279/38378 (5.94%) Loss: 1.935207 LR: 0.00003702 [23:15:11] Epoch: 1 Batch: 2280/38378 (5.94%) Loss: 1.962058 LR: 0.00003702 [23:15:12] Epoch: 1 Batch: 2281/38378 (5.94%) Loss: 1.903647 LR: 0.00003702 [23:15:14] Epoch: 1 Batch: 2282/38378 (5.95%) Loss: 2.434295 LR: 0.00003702 [23:15:16] Epoch: 1 Batch: 2283/38378 (5.95%) Loss: 2.074880 LR: 0.00003702 [23:15:17] Epoch: 1 Batch: 2284/38378 (5.95%) 
Loss: 2.175669 LR: 0.00003702 [23:15:19] Epoch: 1 Batch: 2285/38378 (5.95%) Loss: 2.063831 LR: 0.00003702 [23:15:21] Epoch: 1 Batch: 2286/38378 (5.96%) Loss: 2.150818 LR: 0.00003713 [23:15:23] Epoch: 1 Batch: 2287/38378 (5.96%) Loss: 2.101326 LR: 0.00003713 [23:15:24] Epoch: 1 Batch: 2288/38378 (5.96%) Loss: 1.789366 LR: 0.00003713 [23:15:26] Epoch: 1 Batch: 2289/38378 (5.96%) Loss: 2.374798 LR: 0.00003713 [23:15:28] Epoch: 1 Batch: 2290/38378 (5.97%) Loss: 1.932135 LR: 0.00003713 [23:15:29] Epoch: 1 Batch: 2291/38378 (5.97%) Loss: 2.312285 LR: 0.00003713 [23:15:31] Epoch: 1 Batch: 2292/38378 (5.97%) Loss: 2.125579 LR: 0.00003713 [23:15:33] Epoch: 1 Batch: 2293/38378 (5.97%) Loss: 2.087128 LR: 0.00003724 [23:15:34] Epoch: 1 Batch: 2294/38378 (5.98%) Loss: 2.148393 LR: 0.00003724 [23:15:36] Epoch: 1 Batch: 2295/38378 (5.98%) Loss: 1.999593 LR: 0.00003724 [23:15:38] Epoch: 1 Batch: 2296/38378 (5.98%) Loss: 2.226119 LR: 0.00003724 [23:15:39] Epoch: 1 Batch: 2297/38378 (5.99%) Loss: 2.100899 LR: 0.00003724 [23:15:41] Epoch: 1 Batch: 2298/38378 (5.99%) Loss: 2.326401 LR: 0.00003724 [23:15:43] Epoch: 1 Batch: 2299/38378 (5.99%) Loss: 2.013015 LR: 0.00003724 [23:15:45] Epoch: 1 Batch: 2300/38378 (5.99%) Loss: 1.962710 LR: 0.00003736 [23:15:46] Epoch: 1 Batch: 2301/38378 (6.00%) Loss: 2.111946 LR: 0.00003736 [23:15:48] Epoch: 1 Batch: 2302/38378 (6.00%) Loss: 1.866669 LR: 0.00003736 [23:15:50] Epoch: 1 Batch: 2303/38378 (6.00%) Loss: 2.125520 LR: 0.00003736 [23:15:51] Epoch: 1 Batch: 2304/38378 (6.00%) Loss: 2.351627 LR: 0.00003736 [23:15:53] Epoch: 1 Batch: 2305/38378 (6.01%) Loss: 2.581841 LR: 0.00003736 [23:15:55] Epoch: 1 Batch: 2306/38378 (6.01%) Loss: 2.096756 LR: 0.00003736 [23:15:56] Epoch: 1 Batch: 2307/38378 (6.01%) Loss: 2.042098 LR: 0.00003747 [23:15:58] Epoch: 1 Batch: 2308/38378 (6.01%) Loss: 2.284478 LR: 0.00003747 [23:16:00] Epoch: 1 Batch: 2309/38378 (6.02%) Loss: 2.176433 LR: 0.00003747 [23:16:06] >> Cleaned up old temp checkpoint: epoch1_step1980 [23:16:06] >> Temp checkpoint saved: epoch1_step2310, size: 0.1702 GB [23:16:06] Epoch: 1 Batch: 2310/38378 (6.02%) Loss: 2.088097 LR: 0.00003747 [23:16:07] Epoch: 1 Batch: 2311/38378 (6.02%) Loss: 2.341695 LR: 0.00003747 [23:16:09] Epoch: 1 Batch: 2312/38378 (6.02%) Loss: 2.234544 LR: 0.00003747 [23:16:11] Epoch: 1 Batch: 2313/38378 (6.03%) Loss: 2.320468 LR: 0.00003747 [23:16:12] Epoch: 1 Batch: 2314/38378 (6.03%) Loss: 2.184071 LR: 0.00003759 [23:16:14] Epoch: 1 Batch: 2315/38378 (6.03%) Loss: 2.294053 LR: 0.00003759 [23:16:16] Epoch: 1 Batch: 2316/38378 (6.03%) Loss: 2.098938 LR: 0.00003759 [23:16:17] Epoch: 1 Batch: 2317/38378 (6.04%) Loss: 1.992177 LR: 0.00003759 [23:16:19] Epoch: 1 Batch: 2318/38378 (6.04%) Loss: 1.825943 LR: 0.00003759 [23:16:21] Epoch: 1 Batch: 2319/38378 (6.04%) Loss: 2.327619 LR: 0.00003759 [23:16:22] Epoch: 1 Batch: 2320/38378 (6.05%) Loss: 1.869862 LR: 0.00003759 [23:16:24] Epoch: 1 Batch: 2321/38378 (6.05%) Loss: 2.097546 LR: 0.00003770 [23:16:26] Epoch: 1 Batch: 2322/38378 (6.05%) Loss: 2.337594 LR: 0.00003770 [23:16:28] Epoch: 1 Batch: 2323/38378 (6.05%) Loss: 2.018999 LR: 0.00003770 [23:16:29] Epoch: 1 Batch: 2324/38378 (6.06%) Loss: 2.249818 LR: 0.00003770 [23:16:31] Epoch: 1 Batch: 2325/38378 (6.06%) Loss: 2.040070 LR: 0.00003770 [23:16:33] Epoch: 1 Batch: 2326/38378 (6.06%) Loss: 1.970420 LR: 0.00003770 [23:16:34] Epoch: 1 Batch: 2327/38378 (6.06%) Loss: 1.905590 LR: 0.00003770 [23:16:36] Epoch: 1 Batch: 2328/38378 (6.07%) Loss: 2.139736 LR: 0.00003781 [23:16:38] Epoch: 1 Batch: 2329/38378 (6.07%) 
Loss: 1.941483 LR: 0.00003781 [23:16:40] Epoch: 1 Batch: 2330/38378 (6.07%) Loss: 2.234787 LR: 0.00003781 [23:16:41] Epoch: 1 Batch: 2331/38378 (6.07%) Loss: 2.054417 LR: 0.00003781 [23:16:43] Epoch: 1 Batch: 2332/38378 (6.08%) Loss: 2.164797 LR: 0.00003781 [23:16:45] Epoch: 1 Batch: 2333/38378 (6.08%) Loss: 1.941529 LR: 0.00003781 [23:16:47] Epoch: 1 Batch: 2334/38378 (6.08%) Loss: 2.475159 LR: 0.00003781 [23:16:48] Epoch: 1 Batch: 2335/38378 (6.08%) Loss: 1.955692 LR: 0.00003793 [23:16:50] Epoch: 1 Batch: 2336/38378 (6.09%) Loss: 1.950973 LR: 0.00003793 [23:16:52] Epoch: 1 Batch: 2337/38378 (6.09%) Loss: 1.757442 LR: 0.00003793 [23:16:53] Epoch: 1 Batch: 2338/38378 (6.09%) Loss: 2.271819 LR: 0.00003793 [23:16:55] Epoch: 1 Batch: 2339/38378 (6.09%) Loss: 2.107883 LR: 0.00003793 [23:16:57] Epoch: 1 Batch: 2340/38378 (6.10%) Loss: 2.182807 LR: 0.00003793 [23:16:59] Epoch: 1 Batch: 2341/38378 (6.10%) Loss: 2.416364 LR: 0.00003793 [23:17:00] Epoch: 1 Batch: 2342/38378 (6.10%) Loss: 2.307372 LR: 0.00003804 [23:17:06] >> Cleaned up old temp checkpoint: epoch1_step2013 [23:17:06] >> Temp checkpoint saved: epoch1_step2343, size: 0.1702 GB [23:17:06] Epoch: 1 Batch: 2343/38378 (6.11%) Loss: 2.124366 LR: 0.00003804 [23:17:08] Epoch: 1 Batch: 2344/38378 (6.11%) Loss: 2.229177 LR: 0.00003804 [23:17:10] Epoch: 1 Batch: 2345/38378 (6.11%) Loss: 2.103938 LR: 0.00003804 [23:17:11] Epoch: 1 Batch: 2346/38378 (6.11%) Loss: 2.028983 LR: 0.00003804 [23:17:13] Epoch: 1 Batch: 2347/38378 (6.12%) Loss: 2.200387 LR: 0.00003804 [23:17:15] Epoch: 1 Batch: 2348/38378 (6.12%) Loss: 2.273911 LR: 0.00003804 [23:17:16] Epoch: 1 Batch: 2349/38378 (6.12%) Loss: 2.180514 LR: 0.00003815 [23:17:18] Epoch: 1 Batch: 2350/38378 (6.12%) Loss: 2.271450 LR: 0.00003815 [23:17:20] Epoch: 1 Batch: 2351/38378 (6.13%) Loss: 1.984299 LR: 0.00003815 [23:17:21] Epoch: 1 Batch: 2352/38378 (6.13%) Loss: 2.308062 LR: 0.00003815 [23:17:23] Epoch: 1 Batch: 2353/38378 (6.13%) Loss: 2.358121 LR: 0.00003815 [23:17:25] Epoch: 1 Batch: 2354/38378 (6.13%) Loss: 2.189263 LR: 0.00003815 [23:17:27] Epoch: 1 Batch: 2355/38378 (6.14%) Loss: 2.100304 LR: 0.00003815 [23:17:28] Epoch: 1 Batch: 2356/38378 (6.14%) Loss: 2.271050 LR: 0.00003827 [23:17:30] Epoch: 1 Batch: 2357/38378 (6.14%) Loss: 1.810716 LR: 0.00003827 [23:17:32] Epoch: 1 Batch: 2358/38378 (6.14%) Loss: 2.022973 LR: 0.00003827 [23:17:33] Epoch: 1 Batch: 2359/38378 (6.15%) Loss: 2.197799 LR: 0.00003827 [23:17:35] Epoch: 1 Batch: 2360/38378 (6.15%) Loss: 2.009204 LR: 0.00003827 [23:17:37] Epoch: 1 Batch: 2361/38378 (6.15%) Loss: 1.963114 LR: 0.00003827 [23:17:39] Epoch: 1 Batch: 2362/38378 (6.15%) Loss: 1.884630 LR: 0.00003827 [23:17:40] Epoch: 1 Batch: 2363/38378 (6.16%) Loss: 1.905140 LR: 0.00003838 [23:17:42] Epoch: 1 Batch: 2364/38378 (6.16%) Loss: 1.902967 LR: 0.00003838 [23:17:44] Epoch: 1 Batch: 2365/38378 (6.16%) Loss: 1.955974 LR: 0.00003838 [23:17:46] Epoch: 1 Batch: 2366/38378 (6.16%) Loss: 2.498242 LR: 0.00003838 [23:17:47] Epoch: 1 Batch: 2367/38378 (6.17%) Loss: 2.060431 LR: 0.00003838 [23:17:49] Epoch: 1 Batch: 2368/38378 (6.17%) Loss: 1.892271 LR: 0.00003838 [23:17:51] Epoch: 1 Batch: 2369/38378 (6.17%) Loss: 2.109667 LR: 0.00003838 [23:17:52] Epoch: 1 Batch: 2370/38378 (6.18%) Loss: 1.959116 LR: 0.00003850 [23:17:54] Epoch: 1 Batch: 2371/38378 (6.18%) Loss: 2.370118 LR: 0.00003850 [23:17:56] Epoch: 1 Batch: 2372/38378 (6.18%) Loss: 1.918183 LR: 0.00003850 [23:17:58] Epoch: 1 Batch: 2373/38378 (6.18%) Loss: 2.306204 LR: 0.00003850 [23:17:59] Epoch: 1 Batch: 2374/38378 (6.19%) 
Loss: 2.124177 LR: 0.00003850 [23:18:01] Epoch: 1 Batch: 2375/38378 (6.19%) Loss: 2.117711 LR: 0.00003850 [23:18:07] >> Cleaned up old temp checkpoint: epoch1_step2046 [23:18:07] >> Temp checkpoint saved: epoch1_step2376, size: 0.1702 GB [23:18:07] Epoch: 1 Batch: 2376/38378 (6.19%) Loss: 2.141978 LR: 0.00003850 [23:18:09] Epoch: 1 Batch: 2377/38378 (6.19%) Loss: 2.275975 LR: 0.00003861 [23:18:10] Epoch: 1 Batch: 2378/38378 (6.20%) Loss: 1.982798 LR: 0.00003861 [23:18:12] Epoch: 1 Batch: 2379/38378 (6.20%) Loss: 1.954400 LR: 0.00003861 [23:18:14] Epoch: 1 Batch: 2380/38378 (6.20%) Loss: 1.810699 LR: 0.00003861 [23:18:15] Epoch: 1 Batch: 2381/38378 (6.20%) Loss: 2.139750 LR: 0.00003861 [23:18:17] Epoch: 1 Batch: 2382/38378 (6.21%) Loss: 2.182295 LR: 0.00003861 [23:18:19] Epoch: 1 Batch: 2383/38378 (6.21%) Loss: 2.108366 LR: 0.00003861 [23:18:20] Epoch: 1 Batch: 2384/38378 (6.21%) Loss: 1.891053 LR: 0.00003872 [23:18:22] Epoch: 1 Batch: 2385/38378 (6.21%) Loss: 2.178629 LR: 0.00003872 [23:18:24] Epoch: 1 Batch: 2386/38378 (6.22%) Loss: 1.789107 LR: 0.00003872 [23:18:26] Epoch: 1 Batch: 2387/38378 (6.22%) Loss: 2.299387 LR: 0.00003872 [23:18:27] Epoch: 1 Batch: 2388/38378 (6.22%) Loss: 2.238894 LR: 0.00003872 [23:18:29] Epoch: 1 Batch: 2389/38378 (6.22%) Loss: 1.981949 LR: 0.00003872 [23:18:31] Epoch: 1 Batch: 2390/38378 (6.23%) Loss: 2.110850 LR: 0.00003872 [23:18:32] Epoch: 1 Batch: 2391/38378 (6.23%) Loss: 2.403996 LR: 0.00003884 [23:18:34] Epoch: 1 Batch: 2392/38378 (6.23%) Loss: 2.056256 LR: 0.00003884 [23:18:36] Epoch: 1 Batch: 2393/38378 (6.24%) Loss: 1.890859 LR: 0.00003884 [23:18:38] Epoch: 1 Batch: 2394/38378 (6.24%) Loss: 2.140169 LR: 0.00003884 [23:18:39] Epoch: 1 Batch: 2395/38378 (6.24%) Loss: 2.209010 LR: 0.00003884 [23:18:41] Epoch: 1 Batch: 2396/38378 (6.24%) Loss: 2.208449 LR: 0.00003884 [23:18:43] Epoch: 1 Batch: 2397/38378 (6.25%) Loss: 1.813409 LR: 0.00003884 [23:18:45] Epoch: 1 Batch: 2398/38378 (6.25%) Loss: 1.960398 LR: 0.00003895 [23:18:46] Epoch: 1 Batch: 2399/38378 (6.25%) Loss: 2.168323 LR: 0.00003895 [23:18:48] Epoch: 1 Batch: 2400/38378 (6.25%) Loss: 2.052939 LR: 0.00003895 [23:18:50] Epoch: 1 Batch: 2401/38378 (6.26%) Loss: 2.164382 LR: 0.00003895 [23:18:51] Epoch: 1 Batch: 2402/38378 (6.26%) Loss: 2.281066 LR: 0.00003895 [23:18:53] Epoch: 1 Batch: 2403/38378 (6.26%) Loss: 2.127054 LR: 0.00003895 [23:18:55] Epoch: 1 Batch: 2404/38378 (6.26%) Loss: 2.128357 LR: 0.00003895 [23:18:57] Epoch: 1 Batch: 2405/38378 (6.27%) Loss: 2.020747 LR: 0.00003907 [23:18:58] Epoch: 1 Batch: 2406/38378 (6.27%) Loss: 2.082069 LR: 0.00003907 [23:19:00] Epoch: 1 Batch: 2407/38378 (6.27%) Loss: 2.209919 LR: 0.00003907 [23:19:02] Epoch: 1 Batch: 2408/38378 (6.27%) Loss: 1.887921 LR: 0.00003907 [23:19:07] >> Cleaned up old temp checkpoint: epoch1_step2079 [23:19:07] >> Temp checkpoint saved: epoch1_step2409, size: 0.1702 GB [23:19:07] Epoch: 1 Batch: 2409/38378 (6.28%) Loss: 1.937108 LR: 0.00003907 [23:19:09] Epoch: 1 Batch: 2410/38378 (6.28%) Loss: 2.162473 LR: 0.00003907 [23:19:11] Epoch: 1 Batch: 2411/38378 (6.28%) Loss: 2.315861 LR: 0.00003907 [23:19:12] Epoch: 1 Batch: 2412/38378 (6.28%) Loss: 2.023710 LR: 0.00003918 [23:19:14] Epoch: 1 Batch: 2413/38378 (6.29%) Loss: 2.435188 LR: 0.00003918 [23:19:16] Epoch: 1 Batch: 2414/38378 (6.29%) Loss: 2.124240 LR: 0.00003918 [23:19:17] Epoch: 1 Batch: 2415/38378 (6.29%) Loss: 2.193328 LR: 0.00003918 [23:19:19] Epoch: 1 Batch: 2416/38378 (6.30%) Loss: 2.036882 LR: 0.00003918 [23:19:21] Epoch: 1 Batch: 2417/38378 (6.30%) Loss: 2.044819 LR: 
0.00003918 [23:19:22] Epoch: 1 Batch: 2418/38378 (6.30%) Loss: 2.089395 LR: 0.00003918 [23:19:24] Epoch: 1 Batch: 2419/38378 (6.30%) Loss: 2.133163 LR: 0.00003929 [23:19:26] Epoch: 1 Batch: 2420/38378 (6.31%) Loss: 2.029467 LR: 0.00003929 [23:19:28] Epoch: 1 Batch: 2421/38378 (6.31%) Loss: 2.099090 LR: 0.00003929 [23:19:29] Epoch: 1 Batch: 2422/38378 (6.31%) Loss: 1.984849 LR: 0.00003929 [23:19:31] Epoch: 1 Batch: 2423/38378 (6.31%) Loss: 1.914849 LR: 0.00003929 [23:19:33] Epoch: 1 Batch: 2424/38378 (6.32%) Loss: 2.107404 LR: 0.00003929 [23:19:34] Epoch: 1 Batch: 2425/38378 (6.32%) Loss: 2.049723 LR: 0.00003929 [23:19:36] Epoch: 1 Batch: 2426/38378 (6.32%) Loss: 1.986574 LR: 0.00003941 [23:19:38] Epoch: 1 Batch: 2427/38378 (6.32%) Loss: 2.042789 LR: 0.00003941 [23:19:40] Epoch: 1 Batch: 2428/38378 (6.33%) Loss: 2.058291 LR: 0.00003941 [23:19:41] Epoch: 1 Batch: 2429/38378 (6.33%) Loss: 2.064232 LR: 0.00003941 [23:19:43] Epoch: 1 Batch: 2430/38378 (6.33%) Loss: 2.451993 LR: 0.00003941 [23:19:45] Epoch: 1 Batch: 2431/38378 (6.33%) Loss: 2.026028 LR: 0.00003941 [23:19:47] Epoch: 1 Batch: 2432/38378 (6.34%) Loss: 2.252118 LR: 0.00003941 [23:19:48] Epoch: 1 Batch: 2433/38378 (6.34%) Loss: 2.006195 LR: 0.00003952 [23:19:50] Epoch: 1 Batch: 2434/38378 (6.34%) Loss: 2.018272 LR: 0.00003952 [23:19:52] Epoch: 1 Batch: 2435/38378 (6.34%) Loss: 2.358306 LR: 0.00003952 [23:19:53] Epoch: 1 Batch: 2436/38378 (6.35%) Loss: 2.335391 LR: 0.00003952 [23:19:55] Epoch: 1 Batch: 2437/38378 (6.35%) Loss: 2.256084 LR: 0.00003952 [23:19:57] Epoch: 1 Batch: 2438/38378 (6.35%) Loss: 2.180837 LR: 0.00003952 [23:19:59] Epoch: 1 Batch: 2439/38378 (6.36%) Loss: 2.373590 LR: 0.00003952 [23:20:00] Epoch: 1 Batch: 2440/38378 (6.36%) Loss: 1.899676 LR: 0.00003964 [23:20:02] Epoch: 1 Batch: 2441/38378 (6.36%) Loss: 2.430999 LR: 0.00003964 [23:20:08] >> Cleaned up old temp checkpoint: epoch1_step2112 [23:20:08] >> Temp checkpoint saved: epoch1_step2442, size: 0.1702 GB [23:20:08] Epoch: 1 Batch: 2442/38378 (6.36%) Loss: 2.282934 LR: 0.00003964 [23:20:09] Epoch: 1 Batch: 2443/38378 (6.37%) Loss: 1.997951 LR: 0.00003964 [23:20:11] Epoch: 1 Batch: 2444/38378 (6.37%) Loss: 2.131144 LR: 0.00003964 [23:20:13] Epoch: 1 Batch: 2445/38378 (6.37%) Loss: 2.160072 LR: 0.00003964 [23:20:14] Epoch: 1 Batch: 2446/38378 (6.37%) Loss: 2.632197 LR: 0.00003964 [23:20:16] Epoch: 1 Batch: 2447/38378 (6.38%) Loss: 2.234422 LR: 0.00003975 [23:20:18] Epoch: 1 Batch: 2448/38378 (6.38%) Loss: 2.333499 LR: 0.00003975 [23:20:19] Epoch: 1 Batch: 2449/38378 (6.38%) Loss: 1.908326 LR: 0.00003975 [23:20:21] Epoch: 1 Batch: 2450/38378 (6.38%) Loss: 1.871911 LR: 0.00003975 [23:20:23] Epoch: 1 Batch: 2451/38378 (6.39%) Loss: 2.044747 LR: 0.00003975 [23:20:24] Epoch: 1 Batch: 2452/38378 (6.39%) Loss: 2.041658 LR: 0.00003975 [23:20:26] Epoch: 1 Batch: 2453/38378 (6.39%) Loss: 1.992077 LR: 0.00003975 [23:20:28] Epoch: 1 Batch: 2454/38378 (6.39%) Loss: 2.276096 LR: 0.00003986 [23:20:30] Epoch: 1 Batch: 2455/38378 (6.40%) Loss: 2.123745 LR: 0.00003986 [23:20:31] Epoch: 1 Batch: 2456/38378 (6.40%) Loss: 2.372438 LR: 0.00003986 [23:20:33] Epoch: 1 Batch: 2457/38378 (6.40%) Loss: 1.854488 LR: 0.00003986 [23:20:35] Epoch: 1 Batch: 2458/38378 (6.40%) Loss: 2.005565 LR: 0.00003986 [23:20:37] Epoch: 1 Batch: 2459/38378 (6.41%) Loss: 2.202859 LR: 0.00003986 [23:20:38] Epoch: 1 Batch: 2460/38378 (6.41%) Loss: 2.041803 LR: 0.00003986 [23:20:40] Epoch: 1 Batch: 2461/38378 (6.41%) Loss: 2.105851 LR: 0.00003998 [23:20:42] Epoch: 1 Batch: 2462/38378 (6.42%) Loss: 1.980114 LR: 
0.00003998 [23:20:43] Epoch: 1 Batch: 2463/38378 (6.42%) Loss: 2.071907 LR: 0.00003998 [23:20:45] Epoch: 1 Batch: 2464/38378 (6.42%) Loss: 2.214838 LR: 0.00003998 [23:20:47] Epoch: 1 Batch: 2465/38378 (6.42%) Loss: 2.040856 LR: 0.00003998 [23:20:49] Epoch: 1 Batch: 2466/38378 (6.43%) Loss: 2.060422 LR: 0.00003998 [23:20:50] Epoch: 1 Batch: 2467/38378 (6.43%) Loss: 2.162368 LR: 0.00003998 [23:20:52] Epoch: 1 Batch: 2468/38378 (6.43%) Loss: 2.393690 LR: 0.00004009 [23:20:54] Epoch: 1 Batch: 2469/38378 (6.43%) Loss: 2.408670 LR: 0.00004009 [23:20:55] Epoch: 1 Batch: 2470/38378 (6.44%) Loss: 2.203369 LR: 0.00004009 [23:20:57] Epoch: 1 Batch: 2471/38378 (6.44%) Loss: 2.126533 LR: 0.00004009 [23:20:59] Epoch: 1 Batch: 2472/38378 (6.44%) Loss: 2.125544 LR: 0.00004009 [23:21:01] Epoch: 1 Batch: 2473/38378 (6.44%) Loss: 2.060987 LR: 0.00004009 [23:21:02] Epoch: 1 Batch: 2474/38378 (6.45%) Loss: 2.042781 LR: 0.00004009 [23:21:08] >> Cleaned up old temp checkpoint: epoch1_step2145 [23:21:08] >> Temp checkpoint saved: epoch1_step2475, size: 0.1702 GB [23:21:08] Epoch: 1 Batch: 2475/38378 (6.45%) Loss: 1.875952 LR: 0.00004021 [23:21:09] Epoch: 1 Batch: 2476/38378 (6.45%) Loss: 2.229211 LR: 0.00004021 [23:21:11] Epoch: 1 Batch: 2477/38378 (6.45%) Loss: 1.997225 LR: 0.00004021 [23:21:13] Epoch: 1 Batch: 2478/38378 (6.46%) Loss: 2.163124 LR: 0.00004021 [23:21:14] Epoch: 1 Batch: 2479/38378 (6.46%) Loss: 2.508815 LR: 0.00004021 [23:21:16] Epoch: 1 Batch: 2480/38378 (6.46%) Loss: 1.893675 LR: 0.00004021 [23:21:18] Epoch: 1 Batch: 2481/38378 (6.46%) Loss: 1.971476 LR: 0.00004021 [23:21:20] Epoch: 1 Batch: 2482/38378 (6.47%) Loss: 2.138766 LR: 0.00004032 [23:21:21] Epoch: 1 Batch: 2483/38378 (6.47%) Loss: 2.309413 LR: 0.00004032 [23:21:23] Epoch: 1 Batch: 2484/38378 (6.47%) Loss: 2.017541 LR: 0.00004032 [23:21:25] Epoch: 1 Batch: 2485/38378 (6.48%) Loss: 2.239379 LR: 0.00004032 [23:21:26] Epoch: 1 Batch: 2486/38378 (6.48%) Loss: 2.030702 LR: 0.00004032 [23:21:28] Epoch: 1 Batch: 2487/38378 (6.48%) Loss: 2.141636 LR: 0.00004032 [23:21:30] Epoch: 1 Batch: 2488/38378 (6.48%) Loss: 2.176164 LR: 0.00004032 [23:21:32] Epoch: 1 Batch: 2489/38378 (6.49%) Loss: 2.079869 LR: 0.00004043 [23:21:33] Epoch: 1 Batch: 2490/38378 (6.49%) Loss: 2.241487 LR: 0.00004043 [23:21:35] Epoch: 1 Batch: 2491/38378 (6.49%) Loss: 1.938365 LR: 0.00004043 [23:21:37] Epoch: 1 Batch: 2492/38378 (6.49%) Loss: 2.176395 LR: 0.00004043 [23:21:39] Epoch: 1 Batch: 2493/38378 (6.50%) Loss: 1.876714 LR: 0.00004043 [23:21:40] Epoch: 1 Batch: 2494/38378 (6.50%) Loss: 2.159614 LR: 0.00004043 [23:21:42] Epoch: 1 Batch: 2495/38378 (6.50%) Loss: 2.098466 LR: 0.00004043 [23:21:44] Epoch: 1 Batch: 2496/38378 (6.50%) Loss: 2.223531 LR: 0.00004055 [23:21:45] Epoch: 1 Batch: 2497/38378 (6.51%) Loss: 2.030040 LR: 0.00004055 [23:21:47] Epoch: 1 Batch: 2498/38378 (6.51%) Loss: 1.922955 LR: 0.00004055 [23:21:49] Epoch: 1 Batch: 2499/38378 (6.51%) Loss: 1.909586 LR: 0.00004055 [23:21:51] >> Evaluating batch 0 [23:21:52] >> Evaluating batch 1 [23:21:52] >> Evaluating batch 2 [23:21:53] >> Evaluating batch 3 [23:21:54] >> Evaluating batch 4 [23:21:55] >> Evaluating batch 5 [23:21:56] >> Evaluating batch 6 [23:21:57] >> Evaluating batch 7 [23:21:58] >> Evaluating batch 8 [23:21:59] >> Evaluating batch 9 [23:22:00] >> Evaluating batch 10 [23:22:01] >> Evaluating batch 11 [23:22:02] >> Evaluating batch 12 [23:22:03] >> Evaluating batch 13 [23:22:04] >> Evaluating batch 14 [23:22:05] >> Evaluating batch 15 [23:22:06] >> Evaluating batch 16 [23:22:06] Epoch: 1 Step: 
2500/38378 Evaluation:
[23:22:06] Avg Loss Since Last Eval: 2.1220 Val Loss: 2.2251 Validation loss delta: -0.0250 Perplexity: 9.2544 LR: 0.00004055
[23:22:10] >> Checkpoint saved: epoch1_step2500, size: 0.1702 GB
[23:22:10] Epoch: 1 Batch: 2500/38378 (6.51%) Loss: 2.053781 LR: 0.00004055 [23:22:12] Epoch: 1 Batch: 2501/38378 (6.52%) Loss: 1.850347 LR: 0.00004055 [23:22:14] Epoch: 1 Batch: 2502/38378 (6.52%) Loss: 2.056819 LR: 0.00004055 [23:22:15] Epoch: 1 Batch: 2503/38378 (6.52%) Loss: 2.338325 LR: 0.00004066 [23:22:17] Epoch: 1 Batch: 2504/38378 (6.52%) Loss: 1.950747 LR: 0.00004066 [23:22:19] Epoch: 1 Batch: 2505/38378 (6.53%) Loss: 2.177858 LR: 0.00004066 [23:22:20] Epoch: 1 Batch: 2506/38378 (6.53%) Loss: 2.322347 LR: 0.00004066 [23:22:22] Epoch: 1 Batch: 2507/38378 (6.53%) Loss: 2.260843 LR: 0.00004066 [23:22:28] >> Cleaned up old temp checkpoint: epoch1_step2178 [23:22:28] >> Temp checkpoint saved: epoch1_step2508, size: 0.1702 GB [23:22:28] Epoch: 1 Batch: 2508/38378 (6.53%) Loss: 2.161491 LR: 0.00004066 [23:22:29] Epoch: 1 Batch: 2509/38378 (6.54%) Loss: 2.248005 LR: 0.00004066 [23:22:31] Epoch: 1 Batch: 2510/38378 (6.54%) Loss: 2.108250 LR: 0.00004077 [23:22:33] Epoch: 1 Batch: 2511/38378 (6.54%) Loss: 2.019470 LR: 0.00004077 [23:22:34] Epoch: 1 Batch: 2512/38378 (6.55%) Loss: 1.749485 LR: 0.00004077 [23:22:36] Epoch: 1 Batch: 2513/38378 (6.55%) Loss: 2.623809 LR: 0.00004077 [23:22:38] Epoch: 1 Batch: 2514/38378 (6.55%) Loss: 2.067180 LR: 0.00004077 [23:22:40] Epoch: 1 Batch: 2515/38378 (6.55%) Loss: 1.973288 LR: 0.00004077 [23:22:41] Epoch: 1 Batch: 2516/38378 (6.56%) Loss: 2.144429 LR: 0.00004077 [23:22:43] Epoch: 1 Batch: 2517/38378 (6.56%) Loss: 2.201555 LR: 0.00004089 [23:22:45] Epoch: 1 Batch: 2518/38378 (6.56%) Loss: 1.838156 LR: 0.00004089 [23:22:46] Epoch: 1 Batch: 2519/38378 (6.56%) Loss: 2.518749 LR: 0.00004089 [23:22:48] Epoch: 1 Batch: 2520/38378 (6.57%) Loss: 1.917830 LR: 0.00004089 [23:22:50] Epoch: 1 Batch: 2521/38378 (6.57%) Loss: 1.917283 LR: 0.00004089 [23:22:52] Epoch: 1 Batch: 2522/38378 (6.57%) Loss: 2.335254 LR: 0.00004089 [23:22:53] Epoch: 1 Batch: 2523/38378 (6.57%) Loss: 1.953485 LR: 0.00004089 [23:22:55] Epoch: 1 Batch: 2524/38378 (6.58%) Loss: 2.081092 LR: 0.00004100 [23:22:57] Epoch: 1 Batch: 2525/38378 (6.58%) Loss: 2.188473 LR: 0.00004100 [23:22:59] Epoch: 1 Batch: 2526/38378 (6.58%) Loss: 1.930727 LR: 0.00004100 [23:23:00] Epoch: 1 Batch: 2527/38378 (6.58%) Loss: 1.990883 LR: 0.00004100 [23:23:02] Epoch: 1 Batch: 2528/38378 (6.59%) Loss: 1.976096 LR: 0.00004100 [23:23:04] Epoch: 1 Batch: 2529/38378 (6.59%) Loss: 2.264721 LR: 0.00004100 [23:23:05] Epoch: 1 Batch: 2530/38378 (6.59%) Loss: 2.175728 LR: 0.00004100 [23:23:07] Epoch: 1 Batch: 2531/38378 (6.59%) Loss: 2.292674 LR: 0.00004112 [23:23:09] Epoch: 1 Batch: 2532/38378 (6.60%) Loss: 1.846302 LR: 0.00004112 [23:23:11] Epoch: 1 Batch: 2533/38378 (6.60%) Loss: 2.110858 LR: 0.00004112 [23:23:12] Epoch: 1 Batch: 2534/38378 (6.60%) Loss: 1.941298 LR: 0.00004112 [23:23:14] Epoch: 1 Batch: 2535/38378 (6.61%) Loss: 2.070604 LR: 0.00004112 [23:23:16] Epoch: 1 Batch: 2536/38378 (6.61%) Loss: 2.036760 LR: 0.00004112 [23:23:17] Epoch: 1 Batch: 2537/38378 (6.61%) Loss: 2.079645 LR: 0.00004112 [23:23:19] Epoch: 1 Batch: 2538/38378 (6.61%) Loss: 1.991414 LR: 0.00004123 [23:23:21] Epoch: 1 Batch: 2539/38378 (6.62%) Loss: 2.182666 LR: 0.00004123 [23:23:23] Epoch: 1 Batch: 2540/38378 (6.62%) Loss: 1.765537 LR: 0.00004123 [23:23:28] >> Cleaned up old temp checkpoint: epoch1_step2211 [23:23:28] >> Temp
checkpoint saved: epoch1_step2541, size: 0.1702 GB [23:23:28] Epoch: 1 Batch: 2541/38378 (6.62%) Loss: 2.299293 LR: 0.00004123 [23:23:30] Epoch: 1 Batch: 2542/38378 (6.62%) Loss: 2.224129 LR: 0.00004123 [23:23:31] Epoch: 1 Batch: 2543/38378 (6.63%) Loss: 1.827202 LR: 0.00004123 [23:23:33] Epoch: 1 Batch: 2544/38378 (6.63%) Loss: 2.377915 LR: 0.00004123 [23:23:35] Epoch: 1 Batch: 2545/38378 (6.63%) Loss: 2.078833 LR: 0.00004134 [23:23:37] Epoch: 1 Batch: 2546/38378 (6.63%) Loss: 2.497208 LR: 0.00004134 [23:23:38] Epoch: 1 Batch: 2547/38378 (6.64%) Loss: 2.011196 LR: 0.00004134 [23:23:40] Epoch: 1 Batch: 2548/38378 (6.64%) Loss: 1.403600 LR: 0.00004134 [23:23:42] Epoch: 1 Batch: 2549/38378 (6.64%) Loss: 2.102640 LR: 0.00004134 [23:23:43] Epoch: 1 Batch: 2550/38378 (6.64%) Loss: 2.029038 LR: 0.00004134 [23:23:45] Epoch: 1 Batch: 2551/38378 (6.65%) Loss: 1.950490 LR: 0.00004134 [23:23:47] Epoch: 1 Batch: 2552/38378 (6.65%) Loss: 2.106346 LR: 0.00004146 [23:23:48] Epoch: 1 Batch: 2553/38378 (6.65%) Loss: 2.235656 LR: 0.00004146 [23:23:50] Epoch: 1 Batch: 2554/38378 (6.65%) Loss: 2.128160 LR: 0.00004146 [23:23:52] Epoch: 1 Batch: 2555/38378 (6.66%) Loss: 2.127552 LR: 0.00004146 [23:23:54] Epoch: 1 Batch: 2556/38378 (6.66%) Loss: 2.037681 LR: 0.00004146 [23:23:55] Epoch: 1 Batch: 2557/38378 (6.66%) Loss: 2.095592 LR: 0.00004146 [23:23:57] Epoch: 1 Batch: 2558/38378 (6.67%) Loss: 1.813573 LR: 0.00004146 [23:23:59] Epoch: 1 Batch: 2559/38378 (6.67%) Loss: 2.307922 LR: 0.00004157 [23:24:00] Epoch: 1 Batch: 2560/38378 (6.67%) Loss: 2.005042 LR: 0.00004157 [23:24:02] Epoch: 1 Batch: 2561/38378 (6.67%) Loss: 2.049816 LR: 0.00004157 [23:24:04] Epoch: 1 Batch: 2562/38378 (6.68%) Loss: 2.456255 LR: 0.00004157 [23:24:06] Epoch: 1 Batch: 2563/38378 (6.68%) Loss: 2.240180 LR: 0.00004157 [23:24:07] Epoch: 1 Batch: 2564/38378 (6.68%) Loss: 2.243929 LR: 0.00004157 [23:24:09] Epoch: 1 Batch: 2565/38378 (6.68%) Loss: 1.781509 LR: 0.00004157 [23:24:11] Epoch: 1 Batch: 2566/38378 (6.69%) Loss: 1.979903 LR: 0.00004169 [23:24:13] Epoch: 1 Batch: 2567/38378 (6.69%) Loss: 1.931533 LR: 0.00004169 [23:24:14] Epoch: 1 Batch: 2568/38378 (6.69%) Loss: 2.181523 LR: 0.00004169 [23:24:16] Epoch: 1 Batch: 2569/38378 (6.69%) Loss: 2.211285 LR: 0.00004169 [23:24:18] Epoch: 1 Batch: 2570/38378 (6.70%) Loss: 2.345494 LR: 0.00004169 [23:24:19] Epoch: 1 Batch: 2571/38378 (6.70%) Loss: 2.141566 LR: 0.00004169 [23:24:21] Epoch: 1 Batch: 2572/38378 (6.70%) Loss: 1.815638 LR: 0.00004169 [23:24:23] Epoch: 1 Batch: 2573/38378 (6.70%) Loss: 2.100546 LR: 0.00004180 [23:24:29] >> Cleaned up old temp checkpoint: epoch1_step2244 [23:24:29] >> Temp checkpoint saved: epoch1_step2574, size: 0.1702 GB [23:24:29] Epoch: 1 Batch: 2574/38378 (6.71%) Loss: 1.993306 LR: 0.00004180 [23:24:30] Epoch: 1 Batch: 2575/38378 (6.71%) Loss: 1.867384 LR: 0.00004180 [23:24:32] Epoch: 1 Batch: 2576/38378 (6.71%) Loss: 1.872852 LR: 0.00004180 [23:24:34] Epoch: 1 Batch: 2577/38378 (6.71%) Loss: 2.119528 LR: 0.00004180 [23:24:35] Epoch: 1 Batch: 2578/38378 (6.72%) Loss: 1.959100 LR: 0.00004180 [23:24:37] Epoch: 1 Batch: 2579/38378 (6.72%) Loss: 2.231022 LR: 0.00004180 [23:24:39] Epoch: 1 Batch: 2580/38378 (6.72%) Loss: 2.061972 LR: 0.00004191 [23:24:40] Epoch: 1 Batch: 2581/38378 (6.73%) Loss: 1.750340 LR: 0.00004191 [23:24:42] Epoch: 1 Batch: 2582/38378 (6.73%) Loss: 1.686915 LR: 0.00004191 [23:24:44] Epoch: 1 Batch: 2583/38378 (6.73%) Loss: 2.179421 LR: 0.00004191 [23:24:46] Epoch: 1 Batch: 2584/38378 (6.73%) Loss: 2.190564 LR: 0.00004191 [23:24:47] Epoch: 1 Batch: 
2585/38378 (6.74%) Loss: 2.323244 LR: 0.00004191 [23:24:49] Epoch: 1 Batch: 2586/38378 (6.74%) Loss: 1.733016 LR: 0.00004191 [23:24:51] Epoch: 1 Batch: 2587/38378 (6.74%) Loss: 2.287431 LR: 0.00004203 [23:24:52] Epoch: 1 Batch: 2588/38378 (6.74%) Loss: 2.047410 LR: 0.00004203 [23:24:54] Epoch: 1 Batch: 2589/38378 (6.75%) Loss: 2.313742 LR: 0.00004203 [23:24:56] Epoch: 1 Batch: 2590/38378 (6.75%) Loss: 2.078272 LR: 0.00004203 [23:24:58] Epoch: 1 Batch: 2591/38378 (6.75%) Loss: 1.977305 LR: 0.00004203 [23:24:59] Epoch: 1 Batch: 2592/38378 (6.75%) Loss: 2.078715 LR: 0.00004203 [23:25:01] Epoch: 1 Batch: 2593/38378 (6.76%) Loss: 2.227724 LR: 0.00004203 [23:25:03] Epoch: 1 Batch: 2594/38378 (6.76%) Loss: 2.040906 LR: 0.00004214 [23:25:04] Epoch: 1 Batch: 2595/38378 (6.76%) Loss: 2.087864 LR: 0.00004214 [23:25:06] Epoch: 1 Batch: 2596/38378 (6.76%) Loss: 2.237977 LR: 0.00004214 [23:25:08] Epoch: 1 Batch: 2597/38378 (6.77%) Loss: 2.388822 LR: 0.00004214 [23:25:10] Epoch: 1 Batch: 2598/38378 (6.77%) Loss: 1.948165 LR: 0.00004214 [23:25:11] Epoch: 1 Batch: 2599/38378 (6.77%) Loss: 2.022051 LR: 0.00004214 [23:25:13] Epoch: 1 Batch: 2600/38378 (6.77%) Loss: 2.265757 LR: 0.00004214 [23:25:15] Epoch: 1 Batch: 2601/38378 (6.78%) Loss: 2.587772 LR: 0.00004226 [23:25:17] Epoch: 1 Batch: 2602/38378 (6.78%) Loss: 2.137290 LR: 0.00004226 [23:25:18] Epoch: 1 Batch: 2603/38378 (6.78%) Loss: 2.070702 LR: 0.00004226 [23:25:20] Epoch: 1 Batch: 2604/38378 (6.79%) Loss: 2.082190 LR: 0.00004226 [23:25:22] Epoch: 1 Batch: 2605/38378 (6.79%) Loss: 2.012011 LR: 0.00004226 [23:25:23] Epoch: 1 Batch: 2606/38378 (6.79%) Loss: 2.222436 LR: 0.00004226 [23:25:29] >> Cleaned up old temp checkpoint: epoch1_step2277 [23:25:29] >> Temp checkpoint saved: epoch1_step2607, size: 0.1702 GB [23:25:29] Epoch: 1 Batch: 2607/38378 (6.79%) Loss: 2.256665 LR: 0.00004226 [23:25:31] Epoch: 1 Batch: 2608/38378 (6.80%) Loss: 2.125242 LR: 0.00004237 [23:25:32] Epoch: 1 Batch: 2609/38378 (6.80%) Loss: 2.051250 LR: 0.00004237 [23:25:34] Epoch: 1 Batch: 2610/38378 (6.80%) Loss: 2.250930 LR: 0.00004237 [23:25:36] Epoch: 1 Batch: 2611/38378 (6.80%) Loss: 2.120554 LR: 0.00004237 [23:25:37] Epoch: 1 Batch: 2612/38378 (6.81%) Loss: 2.184187 LR: 0.00004237 [23:25:39] Epoch: 1 Batch: 2613/38378 (6.81%) Loss: 1.815083 LR: 0.00004237 [23:25:41] Epoch: 1 Batch: 2614/38378 (6.81%) Loss: 2.154438 LR: 0.00004237 [23:25:43] Epoch: 1 Batch: 2615/38378 (6.81%) Loss: 2.247254 LR: 0.00004248 [23:25:44] Epoch: 1 Batch: 2616/38378 (6.82%) Loss: 1.856784 LR: 0.00004248 [23:25:46] Epoch: 1 Batch: 2617/38378 (6.82%) Loss: 2.084839 LR: 0.00004248 [23:25:48] Epoch: 1 Batch: 2618/38378 (6.82%) Loss: 2.131512 LR: 0.00004248 [23:25:49] Epoch: 1 Batch: 2619/38378 (6.82%) Loss: 2.306402 LR: 0.00004248 [23:25:51] Epoch: 1 Batch: 2620/38378 (6.83%) Loss: 1.935287 LR: 0.00004248 [23:25:53] Epoch: 1 Batch: 2621/38378 (6.83%) Loss: 1.964777 LR: 0.00004248 [23:25:55] Epoch: 1 Batch: 2622/38378 (6.83%) Loss: 2.089988 LR: 0.00004260 [23:25:56] Epoch: 1 Batch: 2623/38378 (6.83%) Loss: 2.117852 LR: 0.00004260 [23:25:58] Epoch: 1 Batch: 2624/38378 (6.84%) Loss: 2.081146 LR: 0.00004260 [23:26:00] Epoch: 1 Batch: 2625/38378 (6.84%) Loss: 2.123301 LR: 0.00004260 [23:26:01] Epoch: 1 Batch: 2626/38378 (6.84%) Loss: 1.951235 LR: 0.00004260 [23:26:03] Epoch: 1 Batch: 2627/38378 (6.85%) Loss: 2.352582 LR: 0.00004260 [23:26:05] Epoch: 1 Batch: 2628/38378 (6.85%) Loss: 2.075014 LR: 0.00004260 [23:26:07] Epoch: 1 Batch: 2629/38378 (6.85%) Loss: 2.275148 LR: 0.00004271 [23:26:08] Epoch: 1 Batch: 
2630/38378 (6.85%) Loss: 1.951904 LR: 0.00004271 [23:26:10] Epoch: 1 Batch: 2631/38378 (6.86%) Loss: 2.353962 LR: 0.00004271 [23:26:12] Epoch: 1 Batch: 2632/38378 (6.86%) Loss: 2.061930 LR: 0.00004271 [23:26:14] Epoch: 1 Batch: 2633/38378 (6.86%) Loss: 2.119303 LR: 0.00004271 [23:26:15] Epoch: 1 Batch: 2634/38378 (6.86%) Loss: 2.337706 LR: 0.00004271 [23:26:17] Epoch: 1 Batch: 2635/38378 (6.87%) Loss: 2.402781 LR: 0.00004271 [23:26:19] Epoch: 1 Batch: 2636/38378 (6.87%) Loss: 2.103202 LR: 0.00004282 [23:26:20] Epoch: 1 Batch: 2637/38378 (6.87%) Loss: 2.045478 LR: 0.00004282 [23:26:22] Epoch: 1 Batch: 2638/38378 (6.87%) Loss: 2.090189 LR: 0.00004282 [23:26:24] Epoch: 1 Batch: 2639/38378 (6.88%) Loss: 2.035884 LR: 0.00004282 [23:26:29] >> Cleaned up old temp checkpoint: epoch1_step2310 [23:26:29] >> Temp checkpoint saved: epoch1_step2640, size: 0.1702 GB [23:26:29] Epoch: 1 Batch: 2640/38378 (6.88%) Loss: 2.423451 LR: 0.00004282 [23:26:31] Epoch: 1 Batch: 2641/38378 (6.88%) Loss: 2.390584 LR: 0.00004282 [23:26:33] Epoch: 1 Batch: 2642/38378 (6.88%) Loss: 2.328247 LR: 0.00004282 [23:26:35] Epoch: 1 Batch: 2643/38378 (6.89%) Loss: 2.099589 LR: 0.00004294 [23:26:36] Epoch: 1 Batch: 2644/38378 (6.89%) Loss: 2.289317 LR: 0.00004294 [23:26:38] Epoch: 1 Batch: 2645/38378 (6.89%) Loss: 1.983261 LR: 0.00004294 [23:26:40] Epoch: 1 Batch: 2646/38378 (6.89%) Loss: 2.101246 LR: 0.00004294 [23:26:41] Epoch: 1 Batch: 2647/38378 (6.90%) Loss: 2.036022 LR: 0.00004294 [23:26:43] Epoch: 1 Batch: 2648/38378 (6.90%) Loss: 2.083837 LR: 0.00004294 [23:26:45] Epoch: 1 Batch: 2649/38378 (6.90%) Loss: 2.260486 LR: 0.00004294 [23:26:46] Epoch: 1 Batch: 2650/38378 (6.90%) Loss: 2.033533 LR: 0.00004305 [23:26:48] Epoch: 1 Batch: 2651/38378 (6.91%) Loss: 2.032969 LR: 0.00004305 [23:26:50] Epoch: 1 Batch: 2652/38378 (6.91%) Loss: 2.142096 LR: 0.00004305 [23:26:52] Epoch: 1 Batch: 2653/38378 (6.91%) Loss: 2.283736 LR: 0.00004305 [23:26:53] Epoch: 1 Batch: 2654/38378 (6.92%) Loss: 2.292140 LR: 0.00004305 [23:26:55] Epoch: 1 Batch: 2655/38378 (6.92%) Loss: 2.061473 LR: 0.00004305 [23:26:57] Epoch: 1 Batch: 2656/38378 (6.92%) Loss: 2.127393 LR: 0.00004305 [23:26:59] Epoch: 1 Batch: 2657/38378 (6.92%) Loss: 2.276409 LR: 0.00004317 [23:27:00] Epoch: 1 Batch: 2658/38378 (6.93%) Loss: 2.158996 LR: 0.00004317 [23:27:02] Epoch: 1 Batch: 2659/38378 (6.93%) Loss: 2.250258 LR: 0.00004317 [23:27:04] Epoch: 1 Batch: 2660/38378 (6.93%) Loss: 2.177076 LR: 0.00004317 [23:27:05] Epoch: 1 Batch: 2661/38378 (6.93%) Loss: 1.939249 LR: 0.00004317 [23:27:07] Epoch: 1 Batch: 2662/38378 (6.94%) Loss: 1.955695 LR: 0.00004317 [23:27:09] Epoch: 1 Batch: 2663/38378 (6.94%) Loss: 1.963501 LR: 0.00004317 [23:27:11] Epoch: 1 Batch: 2664/38378 (6.94%) Loss: 2.078528 LR: 0.00004328 [23:27:12] Epoch: 1 Batch: 2665/38378 (6.94%) Loss: 2.105029 LR: 0.00004328 [23:27:14] Epoch: 1 Batch: 2666/38378 (6.95%) Loss: 2.217089 LR: 0.00004328 [23:27:16] Epoch: 1 Batch: 2667/38378 (6.95%) Loss: 2.159544 LR: 0.00004328 [23:27:18] Epoch: 1 Batch: 2668/38378 (6.95%) Loss: 2.089706 LR: 0.00004328 [23:27:19] Epoch: 1 Batch: 2669/38378 (6.95%) Loss: 2.166085 LR: 0.00004328 [23:27:21] Epoch: 1 Batch: 2670/38378 (6.96%) Loss: 1.932123 LR: 0.00004328 [23:27:23] Epoch: 1 Batch: 2671/38378 (6.96%) Loss: 2.282445 LR: 0.00004339 [23:27:24] Epoch: 1 Batch: 2672/38378 (6.96%) Loss: 2.124566 LR: 0.00004339 [23:27:30] >> Cleaned up old temp checkpoint: epoch1_step2343 [23:27:30] >> Temp checkpoint saved: epoch1_step2673, size: 0.1702 GB [23:27:30] Epoch: 1 Batch: 2673/38378 (6.96%) 
Loss: 1.995393 LR: 0.00004339 [23:27:32] Epoch: 1 Batch: 2674/38378 (6.97%) Loss: 2.052197 LR: 0.00004339 [23:27:33] Epoch: 1 Batch: 2675/38378 (6.97%) Loss: 1.998315 LR: 0.00004339 [23:27:35] Epoch: 1 Batch: 2676/38378 (6.97%) Loss: 2.351486 LR: 0.00004339 [23:27:37] Epoch: 1 Batch: 2677/38378 (6.98%) Loss: 2.130790 LR: 0.00004339 [23:27:38] Epoch: 1 Batch: 2678/38378 (6.98%) Loss: 2.363429 LR: 0.00004351 [23:27:40] Epoch: 1 Batch: 2679/38378 (6.98%) Loss: 1.880832 LR: 0.00004351 [23:27:42] Epoch: 1 Batch: 2680/38378 (6.98%) Loss: 1.977058 LR: 0.00004351 [23:27:43] Epoch: 1 Batch: 2681/38378 (6.99%) Loss: 2.044522 LR: 0.00004351 [23:27:45] Epoch: 1 Batch: 2682/38378 (6.99%) Loss: 2.099239 LR: 0.00004351 [23:27:47] Epoch: 1 Batch: 2683/38378 (6.99%) Loss: 2.046808 LR: 0.00004351 [23:27:49] Epoch: 1 Batch: 2684/38378 (6.99%) Loss: 1.720592 LR: 0.00004351 [23:27:50] Epoch: 1 Batch: 2685/38378 (7.00%) Loss: 2.354264 LR: 0.00004362 [23:27:52] Epoch: 1 Batch: 2686/38378 (7.00%) Loss: 2.048023 LR: 0.00004362 [23:27:54] Epoch: 1 Batch: 2687/38378 (7.00%) Loss: 2.274082 LR: 0.00004362 [23:27:55] Epoch: 1 Batch: 2688/38378 (7.00%) Loss: 1.741634 LR: 0.00004362 [23:27:57] Epoch: 1 Batch: 2689/38378 (7.01%) Loss: 2.119365 LR: 0.00004362 [23:27:59] Epoch: 1 Batch: 2690/38378 (7.01%) Loss: 1.979411 LR: 0.00004362 [23:28:01] Epoch: 1 Batch: 2691/38378 (7.01%) Loss: 1.943356 LR: 0.00004362 [23:28:02] Epoch: 1 Batch: 2692/38378 (7.01%) Loss: 1.934111 LR: 0.00004374 [23:28:04] Epoch: 1 Batch: 2693/38378 (7.02%) Loss: 2.204762 LR: 0.00004374 [23:28:06] Epoch: 1 Batch: 2694/38378 (7.02%) Loss: 2.399249 LR: 0.00004374 [23:28:08] Epoch: 1 Batch: 2695/38378 (7.02%) Loss: 2.188204 LR: 0.00004374 [23:28:09] Epoch: 1 Batch: 2696/38378 (7.02%) Loss: 2.377270 LR: 0.00004374 [23:28:11] Epoch: 1 Batch: 2697/38378 (7.03%) Loss: 2.023346 LR: 0.00004374 [23:28:13] Epoch: 1 Batch: 2698/38378 (7.03%) Loss: 2.182932 LR: 0.00004374 [23:28:14] Epoch: 1 Batch: 2699/38378 (7.03%) Loss: 2.157082 LR: 0.00004385 [23:28:16] Epoch: 1 Batch: 2700/38378 (7.04%) Loss: 2.050459 LR: 0.00004385 [23:28:18] Epoch: 1 Batch: 2701/38378 (7.04%) Loss: 1.994134 LR: 0.00004385 [23:28:19] Epoch: 1 Batch: 2702/38378 (7.04%) Loss: 2.099288 LR: 0.00004385 [23:28:21] Epoch: 1 Batch: 2703/38378 (7.04%) Loss: 2.054786 LR: 0.00004385 [23:28:23] Epoch: 1 Batch: 2704/38378 (7.05%) Loss: 2.298101 LR: 0.00004385 [23:28:25] Epoch: 1 Batch: 2705/38378 (7.05%) Loss: 2.345712 LR: 0.00004385 [23:28:30] >> Cleaned up old temp checkpoint: epoch1_step2376 [23:28:30] >> Temp checkpoint saved: epoch1_step2706, size: 0.1702 GB [23:28:30] Epoch: 1 Batch: 2706/38378 (7.05%) Loss: 2.201215 LR: 0.00004396 [23:28:32] Epoch: 1 Batch: 2707/38378 (7.05%) Loss: 2.415853 LR: 0.00004396 [23:28:33] Epoch: 1 Batch: 2708/38378 (7.06%) Loss: 2.021423 LR: 0.00004396 [23:28:35] Epoch: 1 Batch: 2709/38378 (7.06%) Loss: 2.067646 LR: 0.00004396 [23:28:37] Epoch: 1 Batch: 2710/38378 (7.06%) Loss: 2.276038 LR: 0.00004396 [23:28:39] Epoch: 1 Batch: 2711/38378 (7.06%) Loss: 2.196244 LR: 0.00004396 [23:28:40] Epoch: 1 Batch: 2712/38378 (7.07%) Loss: 1.846590 LR: 0.00004396 [23:28:42] Epoch: 1 Batch: 2713/38378 (7.07%) Loss: 1.834542 LR: 0.00004408 [23:28:43] Epoch: 1 Batch: 2714/38378 (7.07%) Loss: 2.197329 LR: 0.00004408 [23:28:45] Epoch: 1 Batch: 2715/38378 (7.07%) Loss: 1.831817 LR: 0.00004408 [23:28:47] Epoch: 1 Batch: 2716/38378 (7.08%) Loss: 2.200826 LR: 0.00004408 [23:28:48] Epoch: 1 Batch: 2717/38378 (7.08%) Loss: 2.003673 LR: 0.00004408 [23:28:50] Epoch: 1 Batch: 2718/38378 (7.08%) 
Loss: 2.271400 LR: 0.00004408 [23:28:52] Epoch: 1 Batch: 2719/38378 (7.08%) Loss: 2.170201 LR: 0.00004408 [23:28:54] Epoch: 1 Batch: 2720/38378 (7.09%) Loss: 2.097388 LR: 0.00004419 [23:28:55] Epoch: 1 Batch: 2721/38378 (7.09%) Loss: 1.626284 LR: 0.00004419 [23:28:57] Epoch: 1 Batch: 2722/38378 (7.09%) Loss: 2.454423 LR: 0.00004419 [23:28:59] Epoch: 1 Batch: 2723/38378 (7.10%) Loss: 1.855198 LR: 0.00004419 [23:29:01] Epoch: 1 Batch: 2724/38378 (7.10%) Loss: 2.114324 LR: 0.00004419 [23:29:02] Epoch: 1 Batch: 2725/38378 (7.10%) Loss: 2.081022 LR: 0.00004419 [23:29:04] Epoch: 1 Batch: 2726/38378 (7.10%) Loss: 2.239003 LR: 0.00004419 [23:29:06] Epoch: 1 Batch: 2727/38378 (7.11%) Loss: 2.114091 LR: 0.00004431 [23:29:07] Epoch: 1 Batch: 2728/38378 (7.11%) Loss: 2.008112 LR: 0.00004431 [23:29:09] Epoch: 1 Batch: 2729/38378 (7.11%) Loss: 2.092388 LR: 0.00004431 [23:29:11] Epoch: 1 Batch: 2730/38378 (7.11%) Loss: 2.201090 LR: 0.00004431 [23:29:13] Epoch: 1 Batch: 2731/38378 (7.12%) Loss: 2.171645 LR: 0.00004431 [23:29:14] Epoch: 1 Batch: 2732/38378 (7.12%) Loss: 2.171017 LR: 0.00004431 [23:29:16] Epoch: 1 Batch: 2733/38378 (7.12%) Loss: 2.222183 LR: 0.00004431 [23:29:18] Epoch: 1 Batch: 2734/38378 (7.12%) Loss: 2.034724 LR: 0.00004442 [23:29:20] Epoch: 1 Batch: 2735/38378 (7.13%) Loss: 2.288110 LR: 0.00004442 [23:29:21] Epoch: 1 Batch: 2736/38378 (7.13%) Loss: 2.036421 LR: 0.00004442 [23:29:23] Epoch: 1 Batch: 2737/38378 (7.13%) Loss: 1.893699 LR: 0.00004442 [23:29:25] Epoch: 1 Batch: 2738/38378 (7.13%) Loss: 2.346658 LR: 0.00004442 [23:29:31] >> Cleaned up old temp checkpoint: epoch1_step2409 [23:29:31] >> Temp checkpoint saved: epoch1_step2739, size: 0.1702 GB [23:29:31] Epoch: 1 Batch: 2739/38378 (7.14%) Loss: 2.123895 LR: 0.00004442 [23:29:32] Epoch: 1 Batch: 2740/38378 (7.14%) Loss: 2.222238 LR: 0.00004442 [23:29:34] Epoch: 1 Batch: 2741/38378 (7.14%) Loss: 1.913351 LR: 0.00004453 [23:29:36] Epoch: 1 Batch: 2742/38378 (7.14%) Loss: 2.146994 LR: 0.00004453 [23:29:37] Epoch: 1 Batch: 2743/38378 (7.15%) Loss: 2.086460 LR: 0.00004453 [23:29:39] Epoch: 1 Batch: 2744/38378 (7.15%) Loss: 2.136325 LR: 0.00004453 [23:29:41] Epoch: 1 Batch: 2745/38378 (7.15%) Loss: 2.190209 LR: 0.00004453 [23:29:42] Epoch: 1 Batch: 2746/38378 (7.16%) Loss: 1.889293 LR: 0.00004453 [23:29:44] Epoch: 1 Batch: 2747/38378 (7.16%) Loss: 2.025436 LR: 0.00004453 [23:29:46] Epoch: 1 Batch: 2748/38378 (7.16%) Loss: 1.861417 LR: 0.00004465 [23:29:48] Epoch: 1 Batch: 2749/38378 (7.16%) Loss: 2.117957 LR: 0.00004465 [23:29:49] Epoch: 1 Batch: 2750/38378 (7.17%) Loss: 2.055618 LR: 0.00004465 [23:29:51] Epoch: 1 Batch: 2751/38378 (7.17%) Loss: 2.401686 LR: 0.00004465 [23:29:53] Epoch: 1 Batch: 2752/38378 (7.17%) Loss: 2.193048 LR: 0.00004465 [23:29:54] Epoch: 1 Batch: 2753/38378 (7.17%) Loss: 1.936103 LR: 0.00004465 [23:29:56] Epoch: 1 Batch: 2754/38378 (7.18%) Loss: 2.058488 LR: 0.00004465 [23:29:58] Epoch: 1 Batch: 2755/38378 (7.18%) Loss: 1.926870 LR: 0.00004476 [23:30:00] Epoch: 1 Batch: 2756/38378 (7.18%) Loss: 1.906889 LR: 0.00004476 [23:30:01] Epoch: 1 Batch: 2757/38378 (7.18%) Loss: 2.310630 LR: 0.00004476 [23:30:03] Epoch: 1 Batch: 2758/38378 (7.19%) Loss: 2.070721 LR: 0.00004476 [23:30:05] Epoch: 1 Batch: 2759/38378 (7.19%) Loss: 2.344788 LR: 0.00004476 [23:30:06] Epoch: 1 Batch: 2760/38378 (7.19%) Loss: 1.930302 LR: 0.00004476 [23:30:08] Epoch: 1 Batch: 2761/38378 (7.19%) Loss: 2.322395 LR: 0.00004476 [23:30:10] Epoch: 1 Batch: 2762/38378 (7.20%) Loss: 2.118234 LR: 0.00004487 [23:30:12] Epoch: 1 Batch: 2763/38378 (7.20%) 
Loss: 1.757824 LR: 0.00004487 [23:30:13] Epoch: 1 Batch: 2764/38378 (7.20%) Loss: 1.871361 LR: 0.00004487 [23:30:15] Epoch: 1 Batch: 2765/38378 (7.20%) Loss: 1.772661 LR: 0.00004487 [23:30:17] Epoch: 1 Batch: 2766/38378 (7.21%) Loss: 2.086480 LR: 0.00004487 [23:30:18] Epoch: 1 Batch: 2767/38378 (7.21%) Loss: 2.082543 LR: 0.00004487 [23:30:20] Epoch: 1 Batch: 2768/38378 (7.21%) Loss: 2.279593 LR: 0.00004487 [23:30:22] Epoch: 1 Batch: 2769/38378 (7.22%) Loss: 2.429686 LR: 0.00004499 [23:30:24] Epoch: 1 Batch: 2770/38378 (7.22%) Loss: 2.223535 LR: 0.00004499 [23:30:25] Epoch: 1 Batch: 2771/38378 (7.22%) Loss: 2.127789 LR: 0.00004499 [23:30:31] >> Cleaned up old temp checkpoint: epoch1_step2442 [23:30:31] >> Temp checkpoint saved: epoch1_step2772, size: 0.1702 GB [23:30:31] Epoch: 1 Batch: 2772/38378 (7.22%) Loss: 1.983861 LR: 0.00004499 [23:30:33] Epoch: 1 Batch: 2773/38378 (7.23%) Loss: 1.794764 LR: 0.00004499 [23:30:34] Epoch: 1 Batch: 2774/38378 (7.23%) Loss: 1.989872 LR: 0.00004499 [23:30:36] Epoch: 1 Batch: 2775/38378 (7.23%) Loss: 1.958883 LR: 0.00004499 [23:30:38] Epoch: 1 Batch: 2776/38378 (7.23%) Loss: 2.072640 LR: 0.00004510 [23:30:39] Epoch: 1 Batch: 2777/38378 (7.24%) Loss: 2.107469 LR: 0.00004510 [23:30:41] Epoch: 1 Batch: 2778/38378 (7.24%) Loss: 2.178393 LR: 0.00004510 [23:30:43] Epoch: 1 Batch: 2779/38378 (7.24%) Loss: 1.945143 LR: 0.00004510 [23:30:44] Epoch: 1 Batch: 2780/38378 (7.24%) Loss: 1.937206 LR: 0.00004510 [23:30:46] Epoch: 1 Batch: 2781/38378 (7.25%) Loss: 2.000920 LR: 0.00004510 [23:30:47] Epoch: 1 Batch: 2782/38378 (7.25%) Loss: 2.150360 LR: 0.00004510 [23:30:49] Epoch: 1 Batch: 2783/38378 (7.25%) Loss: 2.077252 LR: 0.00004522 [23:30:51] Epoch: 1 Batch: 2784/38378 (7.25%) Loss: 2.223091 LR: 0.00004522 [23:30:53] Epoch: 1 Batch: 2785/38378 (7.26%) Loss: 2.095732 LR: 0.00004522 [23:30:54] Epoch: 1 Batch: 2786/38378 (7.26%) Loss: 2.032708 LR: 0.00004522 [23:30:56] Epoch: 1 Batch: 2787/38378 (7.26%) Loss: 1.932890 LR: 0.00004522 [23:30:58] Epoch: 1 Batch: 2788/38378 (7.26%) Loss: 2.296624 LR: 0.00004522 [23:30:59] Epoch: 1 Batch: 2789/38378 (7.27%) Loss: 2.033661 LR: 0.00004522 [23:31:01] Epoch: 1 Batch: 2790/38378 (7.27%) Loss: 2.037749 LR: 0.00004533 [23:31:03] Epoch: 1 Batch: 2791/38378 (7.27%) Loss: 2.223700 LR: 0.00004533 [23:31:05] Epoch: 1 Batch: 2792/38378 (7.28%) Loss: 2.482026 LR: 0.00004533 [23:31:06] Epoch: 1 Batch: 2793/38378 (7.28%) Loss: 2.249235 LR: 0.00004533 [23:31:08] Epoch: 1 Batch: 2794/38378 (7.28%) Loss: 2.045522 LR: 0.00004533 [23:31:10] Epoch: 1 Batch: 2795/38378 (7.28%) Loss: 1.830891 LR: 0.00004533 [23:31:11] Epoch: 1 Batch: 2796/38378 (7.29%) Loss: 2.175030 LR: 0.00004533 [23:31:13] Epoch: 1 Batch: 2797/38378 (7.29%) Loss: 2.220475 LR: 0.00004544 [23:31:15] Epoch: 1 Batch: 2798/38378 (7.29%) Loss: 2.372560 LR: 0.00004544 [23:31:17] Epoch: 1 Batch: 2799/38378 (7.29%) Loss: 2.471156 LR: 0.00004544 [23:31:18] Epoch: 1 Batch: 2800/38378 (7.30%) Loss: 2.054988 LR: 0.00004544 [23:31:20] Epoch: 1 Batch: 2801/38378 (7.30%) Loss: 2.093137 LR: 0.00004544 [23:31:22] Epoch: 1 Batch: 2802/38378 (7.30%) Loss: 2.296959 LR: 0.00004544 [23:31:23] Epoch: 1 Batch: 2803/38378 (7.30%) Loss: 1.977358 LR: 0.00004544 [23:31:25] Epoch: 1 Batch: 2804/38378 (7.31%) Loss: 1.990342 LR: 0.00004556 [23:31:31] >> Cleaned up old temp checkpoint: epoch1_step2475 [23:31:31] >> Temp checkpoint saved: epoch1_step2805, size: 0.1702 GB [23:31:31] Epoch: 1 Batch: 2805/38378 (7.31%) Loss: 2.116326 LR: 0.00004556 [23:31:33] Epoch: 1 Batch: 2806/38378 (7.31%) Loss: 2.248602 LR: 
0.00004556 [23:31:34] Epoch: 1 Batch: 2807/38378 (7.31%) Loss: 2.106040 LR: 0.00004556 [23:31:36] Epoch: 1 Batch: 2808/38378 (7.32%) Loss: 2.162217 LR: 0.00004556 [23:31:38] Epoch: 1 Batch: 2809/38378 (7.32%) Loss: 1.969230 LR: 0.00004556 [23:31:39] Epoch: 1 Batch: 2810/38378 (7.32%) Loss: 2.066475 LR: 0.00004556 [23:31:41] Epoch: 1 Batch: 2811/38378 (7.32%) Loss: 2.053093 LR: 0.00004567 [23:31:43] Epoch: 1 Batch: 2812/38378 (7.33%) Loss: 1.965694 LR: 0.00004567 [23:31:45] Epoch: 1 Batch: 2813/38378 (7.33%) Loss: 2.043386 LR: 0.00004567 [23:31:46] Epoch: 1 Batch: 2814/38378 (7.33%) Loss: 2.053178 LR: 0.00004567 [23:31:48] Epoch: 1 Batch: 2815/38378 (7.33%) Loss: 2.198084 LR: 0.00004567 [23:31:50] Epoch: 1 Batch: 2816/38378 (7.34%) Loss: 1.884850 LR: 0.00004567 [23:31:51] Epoch: 1 Batch: 2817/38378 (7.34%) Loss: 2.164388 LR: 0.00004567 [23:31:53] Epoch: 1 Batch: 2818/38378 (7.34%) Loss: 2.183155 LR: 0.00004579 [23:31:55] Epoch: 1 Batch: 2819/38378 (7.35%) Loss: 1.933461 LR: 0.00004579 [23:31:57] Epoch: 1 Batch: 2820/38378 (7.35%) Loss: 2.042251 LR: 0.00004579 [23:31:58] Epoch: 1 Batch: 2821/38378 (7.35%) Loss: 1.999437 LR: 0.00004579 [23:32:00] Epoch: 1 Batch: 2822/38378 (7.35%) Loss: 2.120438 LR: 0.00004579 [23:32:02] Epoch: 1 Batch: 2823/38378 (7.36%) Loss: 2.158878 LR: 0.00004579 [23:32:03] Epoch: 1 Batch: 2824/38378 (7.36%) Loss: 1.862318 LR: 0.00004579 [23:32:05] Epoch: 1 Batch: 2825/38378 (7.36%) Loss: 2.313287 LR: 0.00004590 [23:32:07] Epoch: 1 Batch: 2826/38378 (7.36%) Loss: 2.118741 LR: 0.00004590 [23:32:09] Epoch: 1 Batch: 2827/38378 (7.37%) Loss: 2.090606 LR: 0.00004590 [23:32:10] Epoch: 1 Batch: 2828/38378 (7.37%) Loss: 2.375948 LR: 0.00004590 [23:32:12] Epoch: 1 Batch: 2829/38378 (7.37%) Loss: 2.175883 LR: 0.00004590 [23:32:14] Epoch: 1 Batch: 2830/38378 (7.37%) Loss: 2.346854 LR: 0.00004590 [23:32:15] Epoch: 1 Batch: 2831/38378 (7.38%) Loss: 1.920561 LR: 0.00004590 [23:32:17] Epoch: 1 Batch: 2832/38378 (7.38%) Loss: 2.121428 LR: 0.00004601 [23:32:19] Epoch: 1 Batch: 2833/38378 (7.38%) Loss: 2.084292 LR: 0.00004601 [23:32:21] Epoch: 1 Batch: 2834/38378 (7.38%) Loss: 2.137857 LR: 0.00004601 [23:32:22] Epoch: 1 Batch: 2835/38378 (7.39%) Loss: 2.271318 LR: 0.00004601 [23:32:24] Epoch: 1 Batch: 2836/38378 (7.39%) Loss: 2.233318 LR: 0.00004601 [23:32:26] Epoch: 1 Batch: 2837/38378 (7.39%) Loss: 2.060320 LR: 0.00004601 [23:32:31] >> Cleaned up old temp checkpoint: epoch1_step2508 [23:32:31] >> Temp checkpoint saved: epoch1_step2838, size: 0.1702 GB [23:32:31] Epoch: 1 Batch: 2838/38378 (7.39%) Loss: 2.287112 LR: 0.00004601 [23:32:33] Epoch: 1 Batch: 2839/38378 (7.40%) Loss: 2.108496 LR: 0.00004613 [23:32:35] Epoch: 1 Batch: 2840/38378 (7.40%) Loss: 2.070480 LR: 0.00004613 [23:32:36] Epoch: 1 Batch: 2841/38378 (7.40%) Loss: 2.063335 LR: 0.00004613 [23:32:38] Epoch: 1 Batch: 2842/38378 (7.41%) Loss: 2.141643 LR: 0.00004613 [23:32:40] Epoch: 1 Batch: 2843/38378 (7.41%) Loss: 2.379064 LR: 0.00004613 [23:32:41] Epoch: 1 Batch: 2844/38378 (7.41%) Loss: 2.225612 LR: 0.00004613 [23:32:43] Epoch: 1 Batch: 2845/38378 (7.41%) Loss: 2.261684 LR: 0.00004613 [23:32:45] Epoch: 1 Batch: 2846/38378 (7.42%) Loss: 1.971400 LR: 0.00004624 [23:32:46] Epoch: 1 Batch: 2847/38378 (7.42%) Loss: 1.937964 LR: 0.00004624 [23:32:48] Epoch: 1 Batch: 2848/38378 (7.42%) Loss: 2.140193 LR: 0.00004624 [23:32:50] Epoch: 1 Batch: 2849/38378 (7.42%) Loss: 2.255605 LR: 0.00004624 [23:32:52] Epoch: 1 Batch: 2850/38378 (7.43%) Loss: 2.036391 LR: 0.00004624 [23:32:53] Epoch: 1 Batch: 2851/38378 (7.43%) Loss: 2.084508 LR: 
0.00004624 [23:32:55] Epoch: 1 Batch: 2852/38378 (7.43%) Loss: 2.341056 LR: 0.00004624 [23:32:57] Epoch: 1 Batch: 2853/38378 (7.43%) Loss: 2.215845 LR: 0.00004636 [23:32:58] Epoch: 1 Batch: 2854/38378 (7.44%) Loss: 2.184735 LR: 0.00004636 [23:33:00] Epoch: 1 Batch: 2855/38378 (7.44%) Loss: 2.248746 LR: 0.00004636 [23:33:02] Epoch: 1 Batch: 2856/38378 (7.44%) Loss: 2.100674 LR: 0.00004636 [23:33:04] Epoch: 1 Batch: 2857/38378 (7.44%) Loss: 2.067689 LR: 0.00004636 [23:33:05] Epoch: 1 Batch: 2858/38378 (7.45%) Loss: 2.060108 LR: 0.00004636 [23:33:07] Epoch: 1 Batch: 2859/38378 (7.45%) Loss: 2.107181 LR: 0.00004636 [23:33:09] Epoch: 1 Batch: 2860/38378 (7.45%) Loss: 2.224648 LR: 0.00004647 [23:33:10] Epoch: 1 Batch: 2861/38378 (7.45%) Loss: 2.202618 LR: 0.00004647 [23:33:12] Epoch: 1 Batch: 2862/38378 (7.46%) Loss: 2.007692 LR: 0.00004647 [23:33:14] Epoch: 1 Batch: 2863/38378 (7.46%) Loss: 2.490119 LR: 0.00004647 [23:33:16] Epoch: 1 Batch: 2864/38378 (7.46%) Loss: 1.949979 LR: 0.00004647 [23:33:17] Epoch: 1 Batch: 2865/38378 (7.47%) Loss: 2.381131 LR: 0.00004647 [23:33:19] Epoch: 1 Batch: 2866/38378 (7.47%) Loss: 2.008822 LR: 0.00004647 [23:33:21] Epoch: 1 Batch: 2867/38378 (7.47%) Loss: 2.287667 LR: 0.00004658 [23:33:23] Epoch: 1 Batch: 2868/38378 (7.47%) Loss: 2.348686 LR: 0.00004658 [23:33:24] Epoch: 1 Batch: 2869/38378 (7.48%) Loss: 2.137073 LR: 0.00004658 [23:33:26] Epoch: 1 Batch: 2870/38378 (7.48%) Loss: 2.034303 LR: 0.00004658 [23:33:31] >> Cleaned up old temp checkpoint: epoch1_step2541 [23:33:31] >> Temp checkpoint saved: epoch1_step2871, size: 0.1702 GB [23:33:31] Epoch: 1 Batch: 2871/38378 (7.48%) Loss: 2.242398 LR: 0.00004658 [23:33:33] Epoch: 1 Batch: 2872/38378 (7.48%) Loss: 2.107256 LR: 0.00004658 [23:33:35] Epoch: 1 Batch: 2873/38378 (7.49%) Loss: 2.127366 LR: 0.00004658 [23:33:36] Epoch: 1 Batch: 2874/38378 (7.49%) Loss: 2.076053 LR: 0.00004670 [23:33:38] Epoch: 1 Batch: 2875/38378 (7.49%) Loss: 1.961156 LR: 0.00004670 [23:33:40] Epoch: 1 Batch: 2876/38378 (7.49%) Loss: 1.985126 LR: 0.00004670 [23:33:41] Epoch: 1 Batch: 2877/38378 (7.50%) Loss: 1.946758 LR: 0.00004670 [23:33:43] Epoch: 1 Batch: 2878/38378 (7.50%) Loss: 2.365103 LR: 0.00004670 [23:33:45] Epoch: 1 Batch: 2879/38378 (7.50%) Loss: 2.508772 LR: 0.00004670 [23:33:47] Epoch: 1 Batch: 2880/38378 (7.50%) Loss: 2.205335 LR: 0.00004670 [23:33:48] Epoch: 1 Batch: 2881/38378 (7.51%) Loss: 2.151121 LR: 0.00004681 [23:33:50] Epoch: 1 Batch: 2882/38378 (7.51%) Loss: 2.085858 LR: 0.00004681 [23:33:52] Epoch: 1 Batch: 2883/38378 (7.51%) Loss: 1.792478 LR: 0.00004681 [23:33:53] Epoch: 1 Batch: 2884/38378 (7.51%) Loss: 1.906105 LR: 0.00004681 [23:33:55] Epoch: 1 Batch: 2885/38378 (7.52%) Loss: 1.947727 LR: 0.00004681 [23:33:57] Epoch: 1 Batch: 2886/38378 (7.52%) Loss: 2.023829 LR: 0.00004681 [23:33:58] Epoch: 1 Batch: 2887/38378 (7.52%) Loss: 1.922450 LR: 0.00004681 [23:34:00] Epoch: 1 Batch: 2888/38378 (7.53%) Loss: 1.848824 LR: 0.00004692 [23:34:02] Epoch: 1 Batch: 2889/38378 (7.53%) Loss: 1.938638 LR: 0.00004692 [23:34:04] Epoch: 1 Batch: 2890/38378 (7.53%) Loss: 2.095084 LR: 0.00004692 [23:34:05] Epoch: 1 Batch: 2891/38378 (7.53%) Loss: 2.412039 LR: 0.00004692 [23:34:07] Epoch: 1 Batch: 2892/38378 (7.54%) Loss: 2.335267 LR: 0.00004692 [23:34:09] Epoch: 1 Batch: 2893/38378 (7.54%) Loss: 2.425535 LR: 0.00004692 [23:34:10] Epoch: 1 Batch: 2894/38378 (7.54%) Loss: 1.930607 LR: 0.00004692 [23:34:12] Epoch: 1 Batch: 2895/38378 (7.54%) Loss: 2.131195 LR: 0.00004704 [23:34:14] Epoch: 1 Batch: 2896/38378 (7.55%) Loss: 2.003519 LR: 
0.00004704 [23:34:16] Epoch: 1 Batch: 2897/38378 (7.55%) Loss: 2.191966 LR: 0.00004704 [23:34:17] Epoch: 1 Batch: 2898/38378 (7.55%) Loss: 2.132141 LR: 0.00004704 [23:34:19] Epoch: 1 Batch: 2899/38378 (7.55%) Loss: 1.887691 LR: 0.00004704 [23:34:21] Epoch: 1 Batch: 2900/38378 (7.56%) Loss: 2.312617 LR: 0.00004704 [23:34:22] Epoch: 1 Batch: 2901/38378 (7.56%) Loss: 1.880995 LR: 0.00004704 [23:34:24] Epoch: 1 Batch: 2902/38378 (7.56%) Loss: 2.151476 LR: 0.00004715 [23:34:26] Epoch: 1 Batch: 2903/38378 (7.56%) Loss: 2.442139 LR: 0.00004715 [23:34:31] >> Cleaned up old temp checkpoint: epoch1_step2574 [23:34:31] >> Temp checkpoint saved: epoch1_step2904, size: 0.1702 GB [23:34:31] Epoch: 1 Batch: 2904/38378 (7.57%) Loss: 2.411016 LR: 0.00004715 [23:34:33] Epoch: 1 Batch: 2905/38378 (7.57%) Loss: 2.170861 LR: 0.00004715 [23:34:35] Epoch: 1 Batch: 2906/38378 (7.57%) Loss: 2.047992 LR: 0.00004715 [23:34:36] Epoch: 1 Batch: 2907/38378 (7.57%) Loss: 2.102328 LR: 0.00004715 [23:34:38] Epoch: 1 Batch: 2908/38378 (7.58%) Loss: 2.261783 LR: 0.00004715 [23:34:40] Epoch: 1 Batch: 2909/38378 (7.58%) Loss: 1.827680 LR: 0.00004727 [23:34:41] Epoch: 1 Batch: 2910/38378 (7.58%) Loss: 1.945661 LR: 0.00004727 [23:34:43] Epoch: 1 Batch: 2911/38378 (7.59%) Loss: 2.061609 LR: 0.00004727 [23:34:45] Epoch: 1 Batch: 2912/38378 (7.59%) Loss: 2.341881 LR: 0.00004727 [23:34:47] Epoch: 1 Batch: 2913/38378 (7.59%) Loss: 1.888567 LR: 0.00004727 [23:34:48] Epoch: 1 Batch: 2914/38378 (7.59%) Loss: 2.362970 LR: 0.00004727 [23:34:50] Epoch: 1 Batch: 2915/38378 (7.60%) Loss: 2.324730 LR: 0.00004727 [23:34:52] Epoch: 1 Batch: 2916/38378 (7.60%) Loss: 2.090279 LR: 0.00004738 [23:34:53] Epoch: 1 Batch: 2917/38378 (7.60%) Loss: 2.163359 LR: 0.00004738 [23:34:55] Epoch: 1 Batch: 2918/38378 (7.60%) Loss: 2.242092 LR: 0.00004738 [23:34:57] Epoch: 1 Batch: 2919/38378 (7.61%) Loss: 1.922567 LR: 0.00004738 [23:34:58] Epoch: 1 Batch: 2920/38378 (7.61%) Loss: 2.294524 LR: 0.00004738 [23:35:00] Epoch: 1 Batch: 2921/38378 (7.61%) Loss: 2.040353 LR: 0.00004738 [23:35:02] Epoch: 1 Batch: 2922/38378 (7.61%) Loss: 2.100698 LR: 0.00004738 [23:35:04] Epoch: 1 Batch: 2923/38378 (7.62%) Loss: 1.995748 LR: 0.00004749 [23:35:05] Epoch: 1 Batch: 2924/38378 (7.62%) Loss: 1.759020 LR: 0.00004749 [23:35:07] Epoch: 1 Batch: 2925/38378 (7.62%) Loss: 2.389180 LR: 0.00004749 [23:35:09] Epoch: 1 Batch: 2926/38378 (7.62%) Loss: 2.092687 LR: 0.00004749 [23:35:10] Epoch: 1 Batch: 2927/38378 (7.63%) Loss: 1.944916 LR: 0.00004749 [23:35:12] Epoch: 1 Batch: 2928/38378 (7.63%) Loss: 2.204448 LR: 0.00004749 [23:35:14] Epoch: 1 Batch: 2929/38378 (7.63%) Loss: 2.000234 LR: 0.00004749 [23:35:16] Epoch: 1 Batch: 2930/38378 (7.63%) Loss: 1.999399 LR: 0.00004761 [23:35:17] Epoch: 1 Batch: 2931/38378 (7.64%) Loss: 1.939745 LR: 0.00004761 [23:35:19] Epoch: 1 Batch: 2932/38378 (7.64%) Loss: 2.117894 LR: 0.00004761 [23:35:21] Epoch: 1 Batch: 2933/38378 (7.64%) Loss: 2.175213 LR: 0.00004761 [23:35:23] Epoch: 1 Batch: 2934/38378 (7.65%) Loss: 1.882061 LR: 0.00004761 [23:35:24] Epoch: 1 Batch: 2935/38378 (7.65%) Loss: 1.930691 LR: 0.00004761 [23:35:26] Epoch: 1 Batch: 2936/38378 (7.65%) Loss: 2.007210 LR: 0.00004761 [23:35:31] >> Cleaned up old temp checkpoint: epoch1_step2607 [23:35:31] >> Temp checkpoint saved: epoch1_step2937, size: 0.1702 GB [23:35:31] Epoch: 1 Batch: 2937/38378 (7.65%) Loss: 2.080753 LR: 0.00004772 [23:35:33] Epoch: 1 Batch: 2938/38378 (7.66%) Loss: 2.231810 LR: 0.00004772 [23:35:35] Epoch: 1 Batch: 2939/38378 (7.66%) Loss: 2.168559 LR: 0.00004772 [23:35:37] 
Epoch: 1 Batch: 2940/38378 (7.66%) Loss: 1.940776 LR: 0.00004772 [23:35:38] Epoch: 1 Batch: 2941/38378 (7.66%) Loss: 2.160566 LR: 0.00004772 [23:35:40] Epoch: 1 Batch: 2942/38378 (7.67%) Loss: 2.105378 LR: 0.00004772 [23:35:42] Epoch: 1 Batch: 2943/38378 (7.67%) Loss: 2.207187 LR: 0.00004772 [23:35:43] Epoch: 1 Batch: 2944/38378 (7.67%) Loss: 2.084618 LR: 0.00004784 [23:35:45] Epoch: 1 Batch: 2945/38378 (7.67%) Loss: 2.279881 LR: 0.00004784 [23:35:47] Epoch: 1 Batch: 2946/38378 (7.68%) Loss: 2.177884 LR: 0.00004784 [23:35:49] Epoch: 1 Batch: 2947/38378 (7.68%) Loss: 1.980320 LR: 0.00004784 [23:35:50] Epoch: 1 Batch: 2948/38378 (7.68%) Loss: 2.060420 LR: 0.00004784 [23:35:52] Epoch: 1 Batch: 2949/38378 (7.68%) Loss: 2.356809 LR: 0.00004784 [23:35:54] Epoch: 1 Batch: 2950/38378 (7.69%) Loss: 1.860972 LR: 0.00004784 [23:35:55] Epoch: 1 Batch: 2951/38378 (7.69%) Loss: 2.036846 LR: 0.00004795 [23:35:57] Epoch: 1 Batch: 2952/38378 (7.69%) Loss: 2.157434 LR: 0.00004795 [23:35:59] Epoch: 1 Batch: 2953/38378 (7.69%) Loss: 2.207828 LR: 0.00004795 [23:36:01] Epoch: 1 Batch: 2954/38378 (7.70%) Loss: 2.070905 LR: 0.00004795 [23:36:02] Epoch: 1 Batch: 2955/38378 (7.70%) Loss: 2.068813 LR: 0.00004795 [23:36:04] Epoch: 1 Batch: 2956/38378 (7.70%) Loss: 2.062347 LR: 0.00004795 [23:36:06] Epoch: 1 Batch: 2957/38378 (7.70%) Loss: 2.249961 LR: 0.00004795 [23:36:08] Epoch: 1 Batch: 2958/38378 (7.71%) Loss: 2.094690 LR: 0.00004806 [23:36:09] Epoch: 1 Batch: 2959/38378 (7.71%) Loss: 2.197634 LR: 0.00004806 [23:36:11] Epoch: 1 Batch: 2960/38378 (7.71%) Loss: 2.117057 LR: 0.00004806 [23:36:13] Epoch: 1 Batch: 2961/38378 (7.72%) Loss: 1.886536 LR: 0.00004806 [23:36:14] Epoch: 1 Batch: 2962/38378 (7.72%) Loss: 2.040452 LR: 0.00004806 [23:36:16] Epoch: 1 Batch: 2963/38378 (7.72%) Loss: 2.222139 LR: 0.00004806 [23:36:18] Epoch: 1 Batch: 2964/38378 (7.72%) Loss: 2.144346 LR: 0.00004806 [23:36:20] Epoch: 1 Batch: 2965/38378 (7.73%) Loss: 1.950423 LR: 0.00004818 [23:36:21] Epoch: 1 Batch: 2966/38378 (7.73%) Loss: 2.038025 LR: 0.00004818 [23:36:23] Epoch: 1 Batch: 2967/38378 (7.73%) Loss: 2.094963 LR: 0.00004818 [23:36:25] Epoch: 1 Batch: 2968/38378 (7.73%) Loss: 2.120322 LR: 0.00004818 [23:36:26] Epoch: 1 Batch: 2969/38378 (7.74%) Loss: 1.996073 LR: 0.00004818 [23:36:32] >> Cleaned up old temp checkpoint: epoch1_step2640 [23:36:32] >> Temp checkpoint saved: epoch1_step2970, size: 0.1702 GB [23:36:32] Epoch: 1 Batch: 2970/38378 (7.74%) Loss: 1.963540 LR: 0.00004818 [23:36:34] Epoch: 1 Batch: 2971/38378 (7.74%) Loss: 1.919069 LR: 0.00004818 [23:36:35] Epoch: 1 Batch: 2972/38378 (7.74%) Loss: 2.086324 LR: 0.00004829 [23:36:37] Epoch: 1 Batch: 2973/38378 (7.75%) Loss: 2.223399 LR: 0.00004829 [23:36:39] Epoch: 1 Batch: 2974/38378 (7.75%) Loss: 2.218710 LR: 0.00004829 [23:36:40] Epoch: 1 Batch: 2975/38378 (7.75%) Loss: 2.137966 LR: 0.00004829 [23:36:42] Epoch: 1 Batch: 2976/38378 (7.75%) Loss: 2.039804 LR: 0.00004829 [23:36:44] Epoch: 1 Batch: 2977/38378 (7.76%) Loss: 2.325732 LR: 0.00004829 [23:36:45] Epoch: 1 Batch: 2978/38378 (7.76%) Loss: 2.098118 LR: 0.00004829 [23:36:47] Epoch: 1 Batch: 2979/38378 (7.76%) Loss: 2.087634 LR: 0.00004841 [23:36:49] Epoch: 1 Batch: 2980/38378 (7.76%) Loss: 1.959323 LR: 0.00004841 [23:36:51] Epoch: 1 Batch: 2981/38378 (7.77%) Loss: 2.292839 LR: 0.00004841 [23:36:52] Epoch: 1 Batch: 2982/38378 (7.77%) Loss: 1.973872 LR: 0.00004841 [23:36:54] Epoch: 1 Batch: 2983/38378 (7.77%) Loss: 2.216008 LR: 0.00004841 [23:36:56] Epoch: 1 Batch: 2984/38378 (7.78%) Loss: 2.063330 LR: 0.00004841 [23:36:57] 
Epoch: 1 Batch: 2985/38378 (7.78%) Loss: 2.123426 LR: 0.00004841 [23:36:59] Epoch: 1 Batch: 2986/38378 (7.78%) Loss: 2.200953 LR: 0.00004852 [23:37:01] Epoch: 1 Batch: 2987/38378 (7.78%) Loss: 2.065748 LR: 0.00004852 [23:37:03] Epoch: 1 Batch: 2988/38378 (7.79%) Loss: 1.938581 LR: 0.00004852 [23:37:04] Epoch: 1 Batch: 2989/38378 (7.79%) Loss: 1.944660 LR: 0.00004852 [23:37:06] Epoch: 1 Batch: 2990/38378 (7.79%) Loss: 2.349703 LR: 0.00004852 [23:37:08] Epoch: 1 Batch: 2991/38378 (7.79%) Loss: 1.759314 LR: 0.00004852 [23:37:10] Epoch: 1 Batch: 2992/38378 (7.80%) Loss: 2.172476 LR: 0.00004852 [23:37:11] Epoch: 1 Batch: 2993/38378 (7.80%) Loss: 2.076606 LR: 0.00004863 [23:37:13] Epoch: 1 Batch: 2994/38378 (7.80%) Loss: 1.782976 LR: 0.00004863 [23:37:15] Epoch: 1 Batch: 2995/38378 (7.80%) Loss: 2.037318 LR: 0.00004863 [23:37:16] Epoch: 1 Batch: 2996/38378 (7.81%) Loss: 1.993797 LR: 0.00004863 [23:37:18] Epoch: 1 Batch: 2997/38378 (7.81%) Loss: 1.994588 LR: 0.00004863 [23:37:20] Epoch: 1 Batch: 2998/38378 (7.81%) Loss: 2.079484 LR: 0.00004863 [23:37:22] Epoch: 1 Batch: 2999/38378 (7.81%) Loss: 2.201092 LR: 0.00004863 [23:37:23] >> Evaluating batch 0 [23:37:24] >> Evaluating batch 1 [23:37:25] >> Evaluating batch 2 [23:37:26] >> Evaluating batch 3 [23:37:27] >> Evaluating batch 4 [23:37:28] >> Evaluating batch 5 [23:37:29] >> Evaluating batch 6 [23:37:30] >> Evaluating batch 7 [23:37:31] >> Evaluating batch 8 [23:37:32] >> Evaluating batch 9 [23:37:33] >> Evaluating batch 10 [23:37:34] >> Evaluating batch 11 [23:37:35] >> Evaluating batch 12 [23:37:36] >> Evaluating batch 13 [23:37:36] >> Evaluating batch 14 [23:37:37] >> Evaluating batch 15 [23:37:38] >> Evaluating batch 16 [23:37:39] Epoch: 1 Step: 3000/38378 Evaluation: [23:37:39] Avg Loss Since Last Eval: 2.1075 Val Loss: 2.2121 Validation loss delta: -0.0130 Perplexity: 9.1347 LR: 0.00004875 [23:37:43] >> Checkpoint saved: epoch1_step3000, size: 0.1702 GB [23:37:43] Epoch: 1 Batch: 3000/38378 (7.82%) Loss: 2.443515 LR: 0.00004875 [23:37:45] Epoch: 1 Batch: 3001/38378 (7.82%) Loss: 2.151595 LR: 0.00004875 [23:37:46] Epoch: 1 Batch: 3002/38378 (7.82%) Loss: 1.915406 LR: 0.00004875 [23:37:52] >> Cleaned up old temp checkpoint: epoch1_step2673 [23:37:52] >> Temp checkpoint saved: epoch1_step3003, size: 0.1702 GB [23:37:52] Epoch: 1 Batch: 3003/38378 (7.82%) Loss: 2.056670 LR: 0.00004875 [23:37:54] Epoch: 1 Batch: 3004/38378 (7.83%) Loss: 1.941512 LR: 0.00004875 [23:37:55] Epoch: 1 Batch: 3005/38378 (7.83%) Loss: 2.041295 LR: 0.00004875 [23:37:57] Epoch: 1 Batch: 3006/38378 (7.83%) Loss: 2.181945 LR: 0.00004875 [23:37:59] Epoch: 1 Batch: 3007/38378 (7.84%) Loss: 1.894192 LR: 0.00004886 [23:38:01] Epoch: 1 Batch: 3008/38378 (7.84%) Loss: 2.243123 LR: 0.00004886 [23:38:02] Epoch: 1 Batch: 3009/38378 (7.84%) Loss: 2.171485 LR: 0.00004886 [23:38:04] Epoch: 1 Batch: 3010/38378 (7.84%) Loss: 2.212211 LR: 0.00004886 [23:38:06] Epoch: 1 Batch: 3011/38378 (7.85%) Loss: 2.232422 LR: 0.00004886 [23:38:07] Epoch: 1 Batch: 3012/38378 (7.85%) Loss: 1.913789 LR: 0.00004886 [23:38:09] Epoch: 1 Batch: 3013/38378 (7.85%) Loss: 1.832171 LR: 0.00004886 [23:38:11] Epoch: 1 Batch: 3014/38378 (7.85%) Loss: 2.073514 LR: 0.00004897 [23:38:13] Epoch: 1 Batch: 3015/38378 (7.86%) Loss: 2.211784 LR: 0.00004897 [23:38:14] Epoch: 1 Batch: 3016/38378 (7.86%) Loss: 2.293584 LR: 0.00004897 [23:38:16] Epoch: 1 Batch: 3017/38378 (7.86%) Loss: 2.097262 LR: 0.00004897 [23:38:18] Epoch: 1 Batch: 3018/38378 (7.86%) Loss: 2.440688 LR: 0.00004897 [23:38:20] Epoch: 1 Batch: 3019/38378 
(7.87%) Loss: 1.752024 LR: 0.00004897 [23:38:21] Epoch: 1 Batch: 3020/38378 (7.87%) Loss: 2.170910 LR: 0.00004897 [23:38:23] Epoch: 1 Batch: 3021/38378 (7.87%) Loss: 2.032534 LR: 0.00004909 [23:38:25] Epoch: 1 Batch: 3022/38378 (7.87%) Loss: 1.965255 LR: 0.00004909 [23:38:27] Epoch: 1 Batch: 3023/38378 (7.88%) Loss: 2.090908 LR: 0.00004909 [23:38:28] Epoch: 1 Batch: 3024/38378 (7.88%) Loss: 2.408137 LR: 0.00004909 [23:38:30] Epoch: 1 Batch: 3025/38378 (7.88%) Loss: 2.002776 LR: 0.00004909 [23:38:32] Epoch: 1 Batch: 3026/38378 (7.88%) Loss: 1.941975 LR: 0.00004909 [23:38:34] Epoch: 1 Batch: 3027/38378 (7.89%) Loss: 1.908313 LR: 0.00004909 [23:38:35] Epoch: 1 Batch: 3028/38378 (7.89%) Loss: 2.129214 LR: 0.00004920 [23:38:37] Epoch: 1 Batch: 3029/38378 (7.89%) Loss: 2.294255 LR: 0.00004920 [23:38:39] Epoch: 1 Batch: 3030/38378 (7.90%) Loss: 2.166289 LR: 0.00004920 [23:38:40] Epoch: 1 Batch: 3031/38378 (7.90%) Loss: 2.367098 LR: 0.00004920 [23:38:42] Epoch: 1 Batch: 3032/38378 (7.90%) Loss: 1.596576 LR: 0.00004920 [23:38:44] Epoch: 1 Batch: 3033/38378 (7.90%) Loss: 2.050310 LR: 0.00004920 [23:38:46] Epoch: 1 Batch: 3034/38378 (7.91%) Loss: 2.072070 LR: 0.00004920 [23:38:47] Epoch: 1 Batch: 3035/38378 (7.91%) Loss: 2.112031 LR: 0.00004932 [23:38:53] >> Cleaned up old temp checkpoint: epoch1_step2706 [23:38:53] >> Temp checkpoint saved: epoch1_step3036, size: 0.1702 GB [23:38:53] Epoch: 1 Batch: 3036/38378 (7.91%) Loss: 1.943110 LR: 0.00004932 [23:38:54] Epoch: 1 Batch: 3037/38378 (7.91%) Loss: 2.279308 LR: 0.00004932 [23:38:56] Epoch: 1 Batch: 3038/38378 (7.92%) Loss: 1.896533 LR: 0.00004932 [23:38:58] Epoch: 1 Batch: 3039/38378 (7.92%) Loss: 2.305166 LR: 0.00004932 [23:38:59] Epoch: 1 Batch: 3040/38378 (7.92%) Loss: 2.237829 LR: 0.00004932 [23:39:01] Epoch: 1 Batch: 3041/38378 (7.92%) Loss: 1.850142 LR: 0.00004932 [23:39:03] Epoch: 1 Batch: 3042/38378 (7.93%) Loss: 1.981378 LR: 0.00004943 [23:39:05] Epoch: 1 Batch: 3043/38378 (7.93%) Loss: 1.965345 LR: 0.00004943 [23:39:06] Epoch: 1 Batch: 3044/38378 (7.93%) Loss: 2.390928 LR: 0.00004943 [23:39:08] Epoch: 1 Batch: 3045/38378 (7.93%) Loss: 2.118908 LR: 0.00004943 [23:39:10] Epoch: 1 Batch: 3046/38378 (7.94%) Loss: 1.884858 LR: 0.00004943 [23:39:11] Epoch: 1 Batch: 3047/38378 (7.94%) Loss: 2.211137 LR: 0.00004943 [23:39:13] Epoch: 1 Batch: 3048/38378 (7.94%) Loss: 2.088511 LR: 0.00004943 [23:39:15] Epoch: 1 Batch: 3049/38378 (7.94%) Loss: 1.899283 LR: 0.00004954 [23:39:16] Epoch: 1 Batch: 3050/38378 (7.95%) Loss: 1.943988 LR: 0.00004954 [23:39:18] Epoch: 1 Batch: 3051/38378 (7.95%) Loss: 2.018129 LR: 0.00004954 [23:39:20] Epoch: 1 Batch: 3052/38378 (7.95%) Loss: 1.971276 LR: 0.00004954 [23:39:21] Epoch: 1 Batch: 3053/38378 (7.96%) Loss: 2.183715 LR: 0.00004954 [23:39:23] Epoch: 1 Batch: 3054/38378 (7.96%) Loss: 1.798384 LR: 0.00004954 [23:39:25] Epoch: 1 Batch: 3055/38378 (7.96%) Loss: 1.740947 LR: 0.00004954 [23:39:27] Epoch: 1 Batch: 3056/38378 (7.96%) Loss: 2.068980 LR: 0.00004966 [23:39:28] Epoch: 1 Batch: 3057/38378 (7.97%) Loss: 2.155265 LR: 0.00004966 [23:39:30] Epoch: 1 Batch: 3058/38378 (7.97%) Loss: 1.981422 LR: 0.00004966 [23:39:32] Epoch: 1 Batch: 3059/38378 (7.97%) Loss: 1.778941 LR: 0.00004966 [23:39:33] Epoch: 1 Batch: 3060/38378 (7.97%) Loss: 2.001417 LR: 0.00004966 [23:39:35] Epoch: 1 Batch: 3061/38378 (7.98%) Loss: 2.022961 LR: 0.00004966 [23:39:37] Epoch: 1 Batch: 3062/38378 (7.98%) Loss: 2.327916 LR: 0.00004966 [23:39:39] Epoch: 1 Batch: 3063/38378 (7.98%) Loss: 2.335848 LR: 0.00004977 [23:39:40] Epoch: 1 Batch: 3064/38378 
(7.98%) Loss: 2.397010 LR: 0.00004977 [23:39:42] Epoch: 1 Batch: 3065/38378 (7.99%) Loss: 2.002691 LR: 0.00004977 [23:39:44] Epoch: 1 Batch: 3066/38378 (7.99%) Loss: 1.724927 LR: 0.00004977 [23:39:45] Epoch: 1 Batch: 3067/38378 (7.99%) Loss: 2.168877 LR: 0.00004977 [23:39:47] Epoch: 1 Batch: 3068/38378 (7.99%) Loss: 2.087846 LR: 0.00004977 [23:39:53] >> Cleaned up old temp checkpoint: epoch1_step2739 [23:39:53] >> Temp checkpoint saved: epoch1_step3069, size: 0.1702 GB [23:39:53] Epoch: 1 Batch: 3069/38378 (8.00%) Loss: 2.005832 LR: 0.00004977 [23:39:54] Epoch: 1 Batch: 3070/38378 (8.00%) Loss: 1.775728 LR: 0.00004989 [23:39:56] Epoch: 1 Batch: 3071/38378 (8.00%) Loss: 2.027180 LR: 0.00004989 [23:39:58] Epoch: 1 Batch: 3072/38378 (8.00%) Loss: 1.954885 LR: 0.00004989 [23:39:59] Epoch: 1 Batch: 3073/38378 (8.01%) Loss: 2.237732 LR: 0.00004989 [23:40:01] Epoch: 1 Batch: 3074/38378 (8.01%) Loss: 2.234404 LR: 0.00004989 [23:40:03] Epoch: 1 Batch: 3075/38378 (8.01%) Loss: 2.069006 LR: 0.00004989 [23:40:04] Epoch: 1 Batch: 3076/38378 (8.02%) Loss: 1.874274 LR: 0.00004989 [23:40:06] Epoch: 1 Batch: 3077/38378 (8.02%) Loss: 1.952484 LR: 0.00005000 [23:40:08] Epoch: 1 Batch: 3078/38378 (8.02%) Loss: 2.363489 LR: 0.00005000 [23:40:10] Epoch: 1 Batch: 3079/38378 (8.02%) Loss: 2.153057 LR: 0.00005000 [23:40:11] Epoch: 1 Batch: 3080/38378 (8.03%) Loss: 2.140805 LR: 0.00005000 [23:40:13] Epoch: 1 Batch: 3081/38378 (8.03%) Loss: 2.143164 LR: 0.00005000 [23:40:15] Epoch: 1 Batch: 3082/38378 (8.03%) Loss: 2.056550 LR: 0.00005000 [23:40:16] Epoch: 1 Batch: 3083/38378 (8.03%) Loss: 2.019216 LR: 0.00005000 [23:40:18] Epoch: 1 Batch: 3084/38378 (8.04%) Loss: 2.335091 LR: 0.00005000 [23:40:20] Epoch: 1 Batch: 3085/38378 (8.04%) Loss: 2.054578 LR: 0.00005000 [23:40:21] Epoch: 1 Batch: 3086/38378 (8.04%) Loss: 2.251077 LR: 0.00005000 [23:40:23] Epoch: 1 Batch: 3087/38378 (8.04%) Loss: 2.142817 LR: 0.00005000 [23:40:25] Epoch: 1 Batch: 3088/38378 (8.05%) Loss: 2.207658 LR: 0.00005000 [23:40:27] Epoch: 1 Batch: 3089/38378 (8.05%) Loss: 2.034180 LR: 0.00005000 [23:40:28] Epoch: 1 Batch: 3090/38378 (8.05%) Loss: 1.865422 LR: 0.00005000 [23:40:30] Epoch: 1 Batch: 3091/38378 (8.05%) Loss: 2.251866 LR: 0.00005000 [23:40:32] Epoch: 1 Batch: 3092/38378 (8.06%) Loss: 2.108997 LR: 0.00005000 [23:40:33] Epoch: 1 Batch: 3093/38378 (8.06%) Loss: 2.202415 LR: 0.00005000 [23:40:35] Epoch: 1 Batch: 3094/38378 (8.06%) Loss: 2.027214 LR: 0.00005000 [23:40:37] Epoch: 1 Batch: 3095/38378 (8.06%) Loss: 2.078783 LR: 0.00005000 [23:40:39] Epoch: 1 Batch: 3096/38378 (8.07%) Loss: 1.842753 LR: 0.00005000 [23:40:40] Epoch: 1 Batch: 3097/38378 (8.07%) Loss: 2.368152 LR: 0.00005000 [23:40:42] Epoch: 1 Batch: 3098/38378 (8.07%) Loss: 2.158080 LR: 0.00005000 [23:40:44] Epoch: 1 Batch: 3099/38378 (8.07%) Loss: 1.974026 LR: 0.00005000 [23:40:46] Epoch: 1 Batch: 3100/38378 (8.08%) Loss: 1.985810 LR: 0.00005000 [23:40:47] Epoch: 1 Batch: 3101/38378 (8.08%) Loss: 1.950540 LR: 0.00005000 [23:40:53] >> Cleaned up old temp checkpoint: epoch1_step2772 [23:40:53] >> Temp checkpoint saved: epoch1_step3102, size: 0.1702 GB [23:40:53] Epoch: 1 Batch: 3102/38378 (8.08%) Loss: 2.037406 LR: 0.00005000 [23:40:55] Epoch: 1 Batch: 3103/38378 (8.09%) Loss: 2.047113 LR: 0.00005000 [23:40:56] Epoch: 1 Batch: 3104/38378 (8.09%) Loss: 1.837563 LR: 0.00005000 [23:40:58] Epoch: 1 Batch: 3105/38378 (8.09%) Loss: 1.958385 LR: 0.00005000 [23:41:00] Epoch: 1 Batch: 3106/38378 (8.09%) Loss: 2.295045 LR: 0.00005000 [23:41:01] Epoch: 1 Batch: 3107/38378 (8.10%) Loss: 2.330152 
LR: 0.00005000 [23:41:03] Epoch: 1 Batch: 3108/38378 (8.10%) Loss: 1.853822 LR: 0.00005000 [23:41:05] Epoch: 1 Batch: 3109/38378 (8.10%) Loss: 2.162455 LR: 0.00005000 [23:41:07] Epoch: 1 Batch: 3110/38378 (8.10%) Loss: 2.073895 LR: 0.00005000 [23:41:08] Epoch: 1 Batch: 3111/38378 (8.11%) Loss: 2.091976 LR: 0.00005000 [23:41:10] Epoch: 1 Batch: 3112/38378 (8.11%) Loss: 1.949182 LR: 0.00005000 [23:41:12] Epoch: 1 Batch: 3113/38378 (8.11%) Loss: 2.168867 LR: 0.00005000 [23:41:13] Epoch: 1 Batch: 3114/38378 (8.11%) Loss: 1.803929 LR: 0.00005000 [23:41:15] Epoch: 1 Batch: 3115/38378 (8.12%) Loss: 1.764233 LR: 0.00005000 [23:41:17] Epoch: 1 Batch: 3116/38378 (8.12%) Loss: 2.129968 LR: 0.00005000 [23:41:19] Epoch: 1 Batch: 3117/38378 (8.12%) Loss: 1.727068 LR: 0.00005000 [23:41:20] Epoch: 1 Batch: 3118/38378 (8.12%) Loss: 1.961134 LR: 0.00005000 [23:41:22] Epoch: 1 Batch: 3119/38378 (8.13%) Loss: 2.219750 LR: 0.00005000 [23:41:24] Epoch: 1 Batch: 3120/38378 (8.13%) Loss: 2.112649 LR: 0.00005000 [23:41:25] Epoch: 1 Batch: 3121/38378 (8.13%) Loss: 2.090373 LR: 0.00005000 [23:41:27] Epoch: 1 Batch: 3122/38378 (8.13%) Loss: 1.883310 LR: 0.00005000 [23:41:29] Epoch: 1 Batch: 3123/38378 (8.14%) Loss: 2.061229 LR: 0.00005000 [23:41:31] Epoch: 1 Batch: 3124/38378 (8.14%) Loss: 1.959395 LR: 0.00005000 [23:41:32] Epoch: 1 Batch: 3125/38378 (8.14%) Loss: 1.997177 LR: 0.00005000 [23:41:34] Epoch: 1 Batch: 3126/38378 (8.15%) Loss: 2.486720 LR: 0.00005000 [23:41:36] Epoch: 1 Batch: 3127/38378 (8.15%) Loss: 2.130355 LR: 0.00005000 [23:41:37] Epoch: 1 Batch: 3128/38378 (8.15%) Loss: 1.886829 LR: 0.00005000 [23:41:39] Epoch: 1 Batch: 3129/38378 (8.15%) Loss: 1.765236 LR: 0.00005000 [23:41:41] Epoch: 1 Batch: 3130/38378 (8.16%) Loss: 1.868388 LR: 0.00005000 [23:41:43] Epoch: 1 Batch: 3131/38378 (8.16%) Loss: 2.126932 LR: 0.00005000 [23:41:44] Epoch: 1 Batch: 3132/38378 (8.16%) Loss: 2.317907 LR: 0.00005000 [23:41:46] Epoch: 1 Batch: 3133/38378 (8.16%) Loss: 1.812159 LR: 0.00005000 [23:41:48] Epoch: 1 Batch: 3134/38378 (8.17%) Loss: 2.097744 LR: 0.00005000 [23:41:53] >> Cleaned up old temp checkpoint: epoch1_step2805 [23:41:53] >> Temp checkpoint saved: epoch1_step3135, size: 0.1702 GB [23:41:53] Epoch: 1 Batch: 3135/38378 (8.17%) Loss: 2.264733 LR: 0.00005000 [23:41:55] Epoch: 1 Batch: 3136/38378 (8.17%) Loss: 2.181334 LR: 0.00005000 [23:41:57] Epoch: 1 Batch: 3137/38378 (8.17%) Loss: 2.166896 LR: 0.00005000 [23:41:58] Epoch: 1 Batch: 3138/38378 (8.18%) Loss: 2.022931 LR: 0.00005000 [23:42:00] Epoch: 1 Batch: 3139/38378 (8.18%) Loss: 2.188758 LR: 0.00005000 [23:42:02] Epoch: 1 Batch: 3140/38378 (8.18%) Loss: 1.893725 LR: 0.00005000 [23:42:03] Epoch: 1 Batch: 3141/38378 (8.18%) Loss: 2.128580 LR: 0.00005000 [23:42:05] Epoch: 1 Batch: 3142/38378 (8.19%) Loss: 2.094862 LR: 0.00005000 [23:42:07] Epoch: 1 Batch: 3143/38378 (8.19%) Loss: 2.111049 LR: 0.00005000 [23:42:08] Epoch: 1 Batch: 3144/38378 (8.19%) Loss: 2.055218 LR: 0.00005000 [23:42:10] Epoch: 1 Batch: 3145/38378 (8.19%) Loss: 1.737728 LR: 0.00005000 [23:42:12] Epoch: 1 Batch: 3146/38378 (8.20%) Loss: 1.903777 LR: 0.00005000 [23:42:14] Epoch: 1 Batch: 3147/38378 (8.20%) Loss: 1.902945 LR: 0.00005000 [23:42:15] Epoch: 1 Batch: 3148/38378 (8.20%) Loss: 2.007084 LR: 0.00005000 [23:42:17] Epoch: 1 Batch: 3149/38378 (8.21%) Loss: 2.450264 LR: 0.00005000 [23:42:19] Epoch: 1 Batch: 3150/38378 (8.21%) Loss: 1.794309 LR: 0.00005000 [23:42:20] Epoch: 1 Batch: 3151/38378 (8.21%) Loss: 2.064431 LR: 0.00005000 [23:42:22] Epoch: 1 Batch: 3152/38378 (8.21%) Loss: 2.089052 
LR: 0.00005000 [23:42:24] Epoch: 1 Batch: 3153/38378 (8.22%) Loss: 2.097121 LR: 0.00005000 [23:42:26] Epoch: 1 Batch: 3154/38378 (8.22%) Loss: 2.213175 LR: 0.00005000 [23:42:27] Epoch: 1 Batch: 3155/38378 (8.22%) Loss: 2.010972 LR: 0.00005000 [23:42:29] Epoch: 1 Batch: 3156/38378 (8.22%) Loss: 2.304163 LR: 0.00005000 [23:42:31] Epoch: 1 Batch: 3157/38378 (8.23%) Loss: 2.380075 LR: 0.00005000 [23:42:32] Epoch: 1 Batch: 3158/38378 (8.23%) Loss: 1.970615 LR: 0.00005000 [23:42:34] Epoch: 1 Batch: 3159/38378 (8.23%) Loss: 1.923703 LR: 0.00005000 [23:42:36] Epoch: 1 Batch: 3160/38378 (8.23%) Loss: 2.018802 LR: 0.00005000 [23:42:38] Epoch: 1 Batch: 3161/38378 (8.24%) Loss: 1.783321 LR: 0.00005000 [23:42:39] Epoch: 1 Batch: 3162/38378 (8.24%) Loss: 2.082640 LR: 0.00005000 [23:42:41] Epoch: 1 Batch: 3163/38378 (8.24%) Loss: 1.778107 LR: 0.00005000 [23:42:43] Epoch: 1 Batch: 3164/38378 (8.24%) Loss: 2.336514 LR: 0.00005000 [23:42:44] Epoch: 1 Batch: 3165/38378 (8.25%) Loss: 2.242838 LR: 0.00005000 [23:42:46] Epoch: 1 Batch: 3166/38378 (8.25%) Loss: 2.102545 LR: 0.00005000 [23:42:48] Epoch: 1 Batch: 3167/38378 (8.25%) Loss: 2.186466 LR: 0.00005000 [23:42:53] >> Cleaned up old temp checkpoint: epoch1_step2838 [23:42:53] >> Temp checkpoint saved: epoch1_step3168, size: 0.1702 GB [23:42:54] Epoch: 1 Batch: 3168/38378 (8.25%) Loss: 2.033887 LR: 0.00005000 [23:42:55] Epoch: 1 Batch: 3169/38378 (8.26%) Loss: 2.447875 LR: 0.00005000 [23:42:57] Epoch: 1 Batch: 3170/38378 (8.26%) Loss: 1.930419 LR: 0.00005000 [23:42:59] Epoch: 1 Batch: 3171/38378 (8.26%) Loss: 2.002164 LR: 0.00005000 [23:43:00] Epoch: 1 Batch: 3172/38378 (8.27%) Loss: 2.034069 LR: 0.00005000 [23:43:02] Epoch: 1 Batch: 3173/38378 (8.27%) Loss: 1.890528 LR: 0.00005000 [23:43:04] Epoch: 1 Batch: 3174/38378 (8.27%) Loss: 2.029647 LR: 0.00005000 [23:43:05] Epoch: 1 Batch: 3175/38378 (8.27%) Loss: 2.125901 LR: 0.00005000 [23:43:07] Epoch: 1 Batch: 3176/38378 (8.28%) Loss: 2.087668 LR: 0.00005000 [23:43:08] Epoch: 1 Batch: 3177/38378 (8.28%) Loss: 2.352577 LR: 0.00005000 [23:43:10] Epoch: 1 Batch: 3178/38378 (8.28%) Loss: 2.100745 LR: 0.00005000 [23:43:12] Epoch: 1 Batch: 3179/38378 (8.28%) Loss: 2.101051 LR: 0.00005000 [23:43:13] Epoch: 1 Batch: 3180/38378 (8.29%) Loss: 1.839743 LR: 0.00005000 [23:43:15] Epoch: 1 Batch: 3181/38378 (8.29%) Loss: 2.114846 LR: 0.00005000 [23:43:17] Epoch: 1 Batch: 3182/38378 (8.29%) Loss: 2.062729 LR: 0.00005000 [23:43:19] Epoch: 1 Batch: 3183/38378 (8.29%) Loss: 2.399283 LR: 0.00005000 [23:43:20] Epoch: 1 Batch: 3184/38378 (8.30%) Loss: 1.992321 LR: 0.00005000 [23:43:22] Epoch: 1 Batch: 3185/38378 (8.30%) Loss: 2.309400 LR: 0.00005000 [23:43:24] Epoch: 1 Batch: 3186/38378 (8.30%) Loss: 1.948413 LR: 0.00005000 [23:43:25] Epoch: 1 Batch: 3187/38378 (8.30%) Loss: 2.259509 LR: 0.00005000 [23:43:27] Epoch: 1 Batch: 3188/38378 (8.31%) Loss: 2.049429 LR: 0.00005000 [23:43:29] Epoch: 1 Batch: 3189/38378 (8.31%) Loss: 2.311760 LR: 0.00005000 [23:43:31] Epoch: 1 Batch: 3190/38378 (8.31%) Loss: 2.004492 LR: 0.00005000 [23:43:32] Epoch: 1 Batch: 3191/38378 (8.31%) Loss: 1.960523 LR: 0.00005000 [23:43:34] Epoch: 1 Batch: 3192/38378 (8.32%) Loss: 2.282486 LR: 0.00005000 [23:43:36] Epoch: 1 Batch: 3193/38378 (8.32%) Loss: 2.123434 LR: 0.00005000 [23:43:37] Epoch: 1 Batch: 3194/38378 (8.32%) Loss: 2.126499 LR: 0.00005000 [23:43:39] Epoch: 1 Batch: 3195/38378 (8.33%) Loss: 2.080364 LR: 0.00005000 [23:43:41] Epoch: 1 Batch: 3196/38378 (8.33%) Loss: 2.132465 LR: 0.00005000 [23:43:43] Epoch: 1 Batch: 3197/38378 (8.33%) Loss: 2.045470 
LR: 0.00005000 [23:43:44] Epoch: 1 Batch: 3198/38378 (8.33%) Loss: 2.220928 LR: 0.00005000 [23:43:46] Epoch: 1 Batch: 3199/38378 (8.34%) Loss: 2.114979 LR: 0.00005000 [23:43:48] Epoch: 1 Batch: 3200/38378 (8.34%) Loss: 2.081401 LR: 0.00005000 [23:43:53] >> Cleaned up old temp checkpoint: epoch1_step2871 [23:43:53] >> Temp checkpoint saved: epoch1_step3201, size: 0.1702 GB [23:43:53] Epoch: 1 Batch: 3201/38378 (8.34%) Loss: 2.395648 LR: 0.00005000 [23:43:55] Epoch: 1 Batch: 3202/38378 (8.34%) Loss: 2.199365 LR: 0.00005000 [23:43:57] Epoch: 1 Batch: 3203/38378 (8.35%) Loss: 1.989955 LR: 0.00005000 [23:43:58] Epoch: 1 Batch: 3204/38378 (8.35%) Loss: 1.979258 LR: 0.00005000 [23:44:00] Epoch: 1 Batch: 3205/38378 (8.35%) Loss: 1.998362 LR: 0.00005000 [23:44:02] Epoch: 1 Batch: 3206/38378 (8.35%) Loss: 1.905240 LR: 0.00005000 [23:44:03] Epoch: 1 Batch: 3207/38378 (8.36%) Loss: 2.041031 LR: 0.00005000 [23:44:05] Epoch: 1 Batch: 3208/38378 (8.36%) Loss: 1.920792 LR: 0.00005000 [23:44:07] Epoch: 1 Batch: 3209/38378 (8.36%) Loss: 2.254255 LR: 0.00005000 [23:44:08] Epoch: 1 Batch: 3210/38378 (8.36%) Loss: 2.388079 LR: 0.00005000 [23:44:10] Epoch: 1 Batch: 3211/38378 (8.37%) Loss: 2.016944 LR: 0.00005000 [23:44:12] Epoch: 1 Batch: 3212/38378 (8.37%) Loss: 2.282621 LR: 0.00005000 [23:44:14] Epoch: 1 Batch: 3213/38378 (8.37%) Loss: 1.795025 LR: 0.00005000 [23:44:15] Epoch: 1 Batch: 3214/38378 (8.37%) Loss: 2.246538 LR: 0.00005000 [23:44:17] Epoch: 1 Batch: 3215/38378 (8.38%) Loss: 2.142310 LR: 0.00005000 [23:44:19] Epoch: 1 Batch: 3216/38378 (8.38%) Loss: 1.724209 LR: 0.00005000 [23:44:20] Epoch: 1 Batch: 3217/38378 (8.38%) Loss: 1.977082 LR: 0.00005000 [23:44:22] Epoch: 1 Batch: 3218/38378 (8.39%) Loss: 2.136965 LR: 0.00005000 [23:44:24] Epoch: 1 Batch: 3219/38378 (8.39%) Loss: 1.886143 LR: 0.00005000 [23:44:25] Epoch: 1 Batch: 3220/38378 (8.39%) Loss: 2.121952 LR: 0.00005000 [23:44:27] Epoch: 1 Batch: 3221/38378 (8.39%) Loss: 2.061426 LR: 0.00005000 [23:44:29] Epoch: 1 Batch: 3222/38378 (8.40%) Loss: 2.179880 LR: 0.00005000 [23:44:31] Epoch: 1 Batch: 3223/38378 (8.40%) Loss: 2.057418 LR: 0.00005000 [23:44:32] Epoch: 1 Batch: 3224/38378 (8.40%) Loss: 2.103869 LR: 0.00005000 [23:44:34] Epoch: 1 Batch: 3225/38378 (8.40%) Loss: 2.112600 LR: 0.00005000 [23:44:36] Epoch: 1 Batch: 3226/38378 (8.41%) Loss: 2.039486 LR: 0.00005000 [23:44:37] Epoch: 1 Batch: 3227/38378 (8.41%) Loss: 2.190559 LR: 0.00005000 [23:44:39] Epoch: 1 Batch: 3228/38378 (8.41%) Loss: 1.886350 LR: 0.00005000 [23:44:41] Epoch: 1 Batch: 3229/38378 (8.41%) Loss: 2.395877 LR: 0.00005000 [23:44:42] Epoch: 1 Batch: 3230/38378 (8.42%) Loss: 2.248687 LR: 0.00005000 [23:44:44] Epoch: 1 Batch: 3231/38378 (8.42%) Loss: 2.119110 LR: 0.00005000 [23:44:46] Epoch: 1 Batch: 3232/38378 (8.42%) Loss: 2.193904 LR: 0.00005000 [23:44:48] Epoch: 1 Batch: 3233/38378 (8.42%) Loss: 2.028031 LR: 0.00005000 [23:44:53] >> Cleaned up old temp checkpoint: epoch1_step2904 [23:44:53] >> Temp checkpoint saved: epoch1_step3234, size: 0.1702 GB [23:44:53] Epoch: 1 Batch: 3234/38378 (8.43%) Loss: 2.262443 LR: 0.00005000 [23:44:55] Epoch: 1 Batch: 3235/38378 (8.43%) Loss: 1.987246 LR: 0.00005000 [23:44:56] Epoch: 1 Batch: 3236/38378 (8.43%) Loss: 2.287593 LR: 0.00005000 [23:44:58] Epoch: 1 Batch: 3237/38378 (8.43%) Loss: 2.102914 LR: 0.00005000 [23:45:00] Epoch: 1 Batch: 3238/38378 (8.44%) Loss: 2.068733 LR: 0.00005000 [23:45:02] Epoch: 1 Batch: 3239/38378 (8.44%) Loss: 2.018849 LR: 0.00005000 [23:45:03] Epoch: 1 Batch: 3240/38378 (8.44%) Loss: 1.962279 LR: 0.00005000 
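Note on the step-3000 evaluation above: the reported perplexity is the exponential of the mean validation cross-entropy, exp(2.2121) ≈ 9.1347, and the "Validation loss delta" of -0.0130 implies the previous evaluation's val loss was 2.2251. A minimal sketch of that bookkeeping, assuming a Hugging Face causal-LM model whose forward pass returns the mean token cross-entropy when labels are supplied, and batches containing input_ids and attention_mask; the function and variable names here are illustrative, not taken from the training script:

    import math
    import torch

    @torch.no_grad()
    def evaluate(model, val_batches, device="cuda"):
        # Average the per-batch mean cross-entropy over the validation
        # batches, then report perplexity = exp(mean loss).
        model.eval()
        total, n = 0.0, 0
        for batch in val_batches:
            batch = {k: v.to(device) for k, v in batch.items()}
            out = model(**batch, labels=batch["input_ids"])
            total += out.loss.item()
            n += 1
        model.train()
        val_loss = total / n
        return val_loss, math.exp(val_loss)  # e.g. exp(2.2121) ≈ 9.1347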
[23:45:05] Epoch: 1 Batch: 3241/38378 (8.44%) Loss: 1.943993 LR: 0.00005000 [23:45:07] Epoch: 1 Batch: 3242/38378 (8.45%) Loss: 2.125888 LR: 0.00005000 [23:45:08] Epoch: 1 Batch: 3243/38378 (8.45%) Loss: 2.089972 LR: 0.00005000 [23:45:10] Epoch: 1 Batch: 3244/38378 (8.45%) Loss: 2.025963 LR: 0.00005000 [23:45:12] Epoch: 1 Batch: 3245/38378 (8.46%) Loss: 2.045744 LR: 0.00005000 [23:45:14] Epoch: 1 Batch: 3246/38378 (8.46%) Loss: 2.131532 LR: 0.00005000 [23:45:15] Epoch: 1 Batch: 3247/38378 (8.46%) Loss: 1.919605 LR: 0.00005000 [23:45:17] Epoch: 1 Batch: 3248/38378 (8.46%) Loss: 1.835927 LR: 0.00005000 [23:45:19] Epoch: 1 Batch: 3249/38378 (8.47%) Loss: 1.967734 LR: 0.00005000 [23:45:20] Epoch: 1 Batch: 3250/38378 (8.47%) Loss: 1.841988 LR: 0.00005000 [23:45:22] Epoch: 1 Batch: 3251/38378 (8.47%) Loss: 2.002092 LR: 0.00005000 [23:45:24] Epoch: 1 Batch: 3252/38378 (8.47%) Loss: 1.959935 LR: 0.00005000 [23:45:26] Epoch: 1 Batch: 3253/38378 (8.48%) Loss: 2.112662 LR: 0.00005000 [23:45:27] Epoch: 1 Batch: 3254/38378 (8.48%) Loss: 1.941460 LR: 0.00005000 [23:45:29] Epoch: 1 Batch: 3255/38378 (8.48%) Loss: 2.206799 LR: 0.00005000 [23:45:30] Epoch: 1 Batch: 3256/38378 (8.48%) Loss: 2.306637 LR: 0.00005000 [23:45:32] Epoch: 1 Batch: 3257/38378 (8.49%) Loss: 1.982585 LR: 0.00005000 [23:45:34] Epoch: 1 Batch: 3258/38378 (8.49%) Loss: 1.956894 LR: 0.00005000 [23:45:36] Epoch: 1 Batch: 3259/38378 (8.49%) Loss: 2.047347 LR: 0.00005000 [23:45:37] Epoch: 1 Batch: 3260/38378 (8.49%) Loss: 1.968782 LR: 0.00005000 [23:45:39] Epoch: 1 Batch: 3261/38378 (8.50%) Loss: 2.150919 LR: 0.00005000 [23:45:41] Epoch: 1 Batch: 3262/38378 (8.50%) Loss: 2.091075 LR: 0.00005000 [23:45:43] Epoch: 1 Batch: 3263/38378 (8.50%) Loss: 2.011602 LR: 0.00005000 [23:45:44] Epoch: 1 Batch: 3264/38378 (8.50%) Loss: 2.159548 LR: 0.00005000 [23:45:46] Epoch: 1 Batch: 3265/38378 (8.51%) Loss: 2.527142 LR: 0.00005000 [23:45:48] Epoch: 1 Batch: 3266/38378 (8.51%) Loss: 1.889838 LR: 0.00005000 [23:45:53] >> Cleaned up old temp checkpoint: epoch1_step2937 [23:45:53] >> Temp checkpoint saved: epoch1_step3267, size: 0.1702 GB [23:45:53] Epoch: 1 Batch: 3267/38378 (8.51%) Loss: 2.059504 LR: 0.00005000 [23:45:55] Epoch: 1 Batch: 3268/38378 (8.52%) Loss: 2.083131 LR: 0.00005000 [23:45:56] Epoch: 1 Batch: 3269/38378 (8.52%) Loss: 2.123780 LR: 0.00005000 [23:45:58] Epoch: 1 Batch: 3270/38378 (8.52%) Loss: 1.965830 LR: 0.00005000 [23:46:00] Epoch: 1 Batch: 3271/38378 (8.52%) Loss: 1.855442 LR: 0.00005000 [23:46:01] Epoch: 1 Batch: 3272/38378 (8.53%) Loss: 2.002791 LR: 0.00005000 [23:46:03] Epoch: 1 Batch: 3273/38378 (8.53%) Loss: 1.831669 LR: 0.00005000 [23:46:05] Epoch: 1 Batch: 3274/38378 (8.53%) Loss: 2.316512 LR: 0.00005000 [23:46:07] Epoch: 1 Batch: 3275/38378 (8.53%) Loss: 2.031589 LR: 0.00005000 [23:46:08] Epoch: 1 Batch: 3276/38378 (8.54%) Loss: 2.446905 LR: 0.00005000 [23:46:10] Epoch: 1 Batch: 3277/38378 (8.54%) Loss: 1.955082 LR: 0.00005000 [23:46:12] Epoch: 1 Batch: 3278/38378 (8.54%) Loss: 2.052762 LR: 0.00005000 [23:46:13] Epoch: 1 Batch: 3279/38378 (8.54%) Loss: 2.148575 LR: 0.00005000 [23:46:15] Epoch: 1 Batch: 3280/38378 (8.55%) Loss: 2.019421 LR: 0.00005000 [23:46:17] Epoch: 1 Batch: 3281/38378 (8.55%) Loss: 2.212549 LR: 0.00005000 [23:46:19] Epoch: 1 Batch: 3282/38378 (8.55%) Loss: 2.049939 LR: 0.00005000 [23:46:20] Epoch: 1 Batch: 3283/38378 (8.55%) Loss: 2.160249 LR: 0.00005000 [23:46:22] Epoch: 1 Batch: 3284/38378 (8.56%) Loss: 2.193699 LR: 0.00005000 [23:46:24] Epoch: 1 Batch: 3285/38378 (8.56%) Loss: 1.888781 LR: 0.00005000 
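The ">> Cleaned up old temp checkpoint" / ">> Temp checkpoint saved" pairs above follow a fixed pattern: a temp checkpoint lands every 33 batches, and each save removes the checkpoint from 330 batches earlier (e.g. epoch1_step2937 saved, epoch1_step2607 removed), so a rolling window of ten temp checkpoints stays on disk. The constant 0.1702 GB size is adapter-scale, consistent with saving only the LoRA weights rather than the full base model. A sketch of that rotation, assuming a PEFT-wrapped model; the helper below is illustrative, not the script's actual code:

    import os
    import shutil

    SAVE_EVERY = 33   # temp checkpoints appear every 33 batches in the log
    KEEP = 10         # the checkpoint 330 batches back is always the one removed

    def save_temp_checkpoint(model, out_dir, epoch, step):
        name = f"epoch{epoch}_step{step}"
        # save_pretrained on a PEFT model writes only the adapter weights
        model.save_pretrained(os.path.join(out_dir, name))
        old = f"epoch{epoch}_step{step - KEEP * SAVE_EVERY}"
        old_path = os.path.join(out_dir, old)
        if os.path.isdir(old_path):
            shutil.rmtree(old_path)
            print(f">> Cleaned up old temp checkpoint: {old}")
        print(f">> Temp checkpoint saved: {name}")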
[23:46:26] Epoch: 1 Batch: 3286/38378 (8.56%) Loss: 2.155261 LR: 0.00005000 [23:46:27] Epoch: 1 Batch: 3287/38378 (8.56%) Loss: 2.351955 LR: 0.00005000 [23:46:29] Epoch: 1 Batch: 3288/38378 (8.57%) Loss: 2.113322 LR: 0.00005000 [23:46:31] Epoch: 1 Batch: 3289/38378 (8.57%) Loss: 2.015749 LR: 0.00005000 [23:46:33] Epoch: 1 Batch: 3290/38378 (8.57%) Loss: 2.090480 LR: 0.00005000 [23:46:34] Epoch: 1 Batch: 3291/38378 (8.58%) Loss: 1.845173 LR: 0.00005000 [23:46:36] Epoch: 1 Batch: 3292/38378 (8.58%) Loss: 2.142954 LR: 0.00005000 [23:46:38] Epoch: 1 Batch: 3293/38378 (8.58%) Loss: 1.932084 LR: 0.00005000 [23:46:39] Epoch: 1 Batch: 3294/38378 (8.58%) Loss: 2.338162 LR: 0.00005000 [23:46:41] Epoch: 1 Batch: 3295/38378 (8.59%) Loss: 2.108155 LR: 0.00005000 [23:46:43] Epoch: 1 Batch: 3296/38378 (8.59%) Loss: 1.753018 LR: 0.00005000 [23:46:45] Epoch: 1 Batch: 3297/38378 (8.59%) Loss: 2.039233 LR: 0.00005000 [23:46:46] Epoch: 1 Batch: 3298/38378 (8.59%) Loss: 1.786744 LR: 0.00005000 [23:46:48] Epoch: 1 Batch: 3299/38378 (8.60%) Loss: 1.925933 LR: 0.00005000 [23:46:54] >> Cleaned up old temp checkpoint: epoch1_step2970 [23:46:54] >> Temp checkpoint saved: epoch1_step3300, size: 0.1702 GB [23:46:54] Epoch: 1 Batch: 3300/38378 (8.60%) Loss: 2.239806 LR: 0.00005000 [23:46:55] Epoch: 1 Batch: 3301/38378 (8.60%) Loss: 2.072646 LR: 0.00005000 [23:46:57] Epoch: 1 Batch: 3302/38378 (8.60%) Loss: 2.055050 LR: 0.00005000 [23:46:59] Epoch: 1 Batch: 3303/38378 (8.61%) Loss: 1.862209 LR: 0.00005000 [23:47:00] Epoch: 1 Batch: 3304/38378 (8.61%) Loss: 2.067947 LR: 0.00005000 [23:47:02] Epoch: 1 Batch: 3305/38378 (8.61%) Loss: 2.184555 LR: 0.00005000 [23:47:04] Epoch: 1 Batch: 3306/38378 (8.61%) Loss: 2.308869 LR: 0.00005000 [23:47:06] Epoch: 1 Batch: 3307/38378 (8.62%) Loss: 2.011714 LR: 0.00005000 [23:47:07] Epoch: 1 Batch: 3308/38378 (8.62%) Loss: 2.202811 LR: 0.00005000 [23:47:09] Epoch: 1 Batch: 3309/38378 (8.62%) Loss: 1.818110 LR: 0.00005000 [23:47:11] Epoch: 1 Batch: 3310/38378 (8.62%) Loss: 2.049882 LR: 0.00005000 [23:47:12] Epoch: 1 Batch: 3311/38378 (8.63%) Loss: 1.965436 LR: 0.00005000 [23:47:14] Epoch: 1 Batch: 3312/38378 (8.63%) Loss: 1.948051 LR: 0.00005000 [23:47:16] Epoch: 1 Batch: 3313/38378 (8.63%) Loss: 2.054700 LR: 0.00005000 [23:47:18] Epoch: 1 Batch: 3314/38378 (8.64%) Loss: 2.109578 LR: 0.00005000 [23:47:19] Epoch: 1 Batch: 3315/38378 (8.64%) Loss: 2.046165 LR: 0.00005000 [23:47:21] Epoch: 1 Batch: 3316/38378 (8.64%) Loss: 1.986482 LR: 0.00005000 [23:47:23] Epoch: 1 Batch: 3317/38378 (8.64%) Loss: 2.036826 LR: 0.00005000 [23:47:24] Epoch: 1 Batch: 3318/38378 (8.65%) Loss: 2.408960 LR: 0.00005000 [23:47:26] Epoch: 1 Batch: 3319/38378 (8.65%) Loss: 1.893549 LR: 0.00005000 [23:47:28] Epoch: 1 Batch: 3320/38378 (8.65%) Loss: 2.148234 LR: 0.00005000 [23:47:30] Epoch: 1 Batch: 3321/38378 (8.65%) Loss: 1.856960 LR: 0.00005000 [23:47:31] Epoch: 1 Batch: 3322/38378 (8.66%) Loss: 1.961598 LR: 0.00005000 [23:47:33] Epoch: 1 Batch: 3323/38378 (8.66%) Loss: 2.061582 LR: 0.00005000 [23:47:35] Epoch: 1 Batch: 3324/38378 (8.66%) Loss: 2.256004 LR: 0.00005000 [23:47:37] Epoch: 1 Batch: 3325/38378 (8.66%) Loss: 2.047008 LR: 0.00005000 [23:47:38] Epoch: 1 Batch: 3326/38378 (8.67%) Loss: 2.223358 LR: 0.00005000 [23:47:40] Epoch: 1 Batch: 3327/38378 (8.67%) Loss: 2.070934 LR: 0.00005000 [23:47:42] Epoch: 1 Batch: 3328/38378 (8.67%) Loss: 1.997104 LR: 0.00005000 [23:47:43] Epoch: 1 Batch: 3329/38378 (8.67%) Loss: 1.861176 LR: 0.00004999 [23:47:45] Epoch: 1 Batch: 3330/38378 (8.68%) Loss: 2.124682 LR: 0.00004999 
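The LR column traces the schedule directly: it changes once per seven batches (one optimizer step per seven micro-batches), climbs by roughly 1.1e-7 per optimizer step until it reaches the 5.00e-5 peak near batch 3077 (≈ optimizer step 440), holds there, and first ticks down to 4.999e-5 at batch 3329, i.e. a warmup phase followed by a slow decay. A shape-only sketch of such a schedule; the step counts and the cosine decay toward a floor are assumptions read off the log, not the script's exact formula:

    import math

    ACCUM = 7                 # LR changes once per 7 batches in the log
    WARMUP = 440              # peak (5e-5) reached near batch 3077 ≈ step 440
    TOTAL = 38378 // ACCUM    # ≈ 5482 optimizer steps in the epoch

    def lr_at(opt_step, peak=5e-5, floor=1e-5):
        if opt_step < WARMUP:
            return peak * opt_step / WARMUP           # ~1.1e-7 per step, as logged
        t = (opt_step - WARMUP) / max(1, TOTAL - WARMUP)
        return floor + (peak - floor) * 0.5 * (1 + math.cos(math.pi * t))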
[23:47:47] Epoch: 1 Batch: 3331/38378 (8.68%) Loss: 2.164361 LR: 0.00004999 [23:47:49] Epoch: 1 Batch: 3332/38378 (8.68%) Loss: 2.235008 LR: 0.00004999 [23:47:54] >> Cleaned up old temp checkpoint: epoch1_step3003 [23:47:54] >> Temp checkpoint saved: epoch1_step3333, size: 0.1702 GB [23:47:54] Epoch: 1 Batch: 3333/38378 (8.68%) Loss: 2.306006 LR: 0.00004999 [23:47:56] Epoch: 1 Batch: 3334/38378 (8.69%) Loss: 1.935773 LR: 0.00004999 [23:47:57] Epoch: 1 Batch: 3335/38378 (8.69%) Loss: 2.163503 LR: 0.00004999 [23:47:59] Epoch: 1 Batch: 3336/38378 (8.69%) Loss: 2.176050 LR: 0.00004999 [23:48:01] Epoch: 1 Batch: 3337/38378 (8.70%) Loss: 2.155365 LR: 0.00004999 [23:48:02] Epoch: 1 Batch: 3338/38378 (8.70%) Loss: 1.821372 LR: 0.00004999 [23:48:04] Epoch: 1 Batch: 3339/38378 (8.70%) Loss: 2.195975 LR: 0.00004999 [23:48:06] Epoch: 1 Batch: 3340/38378 (8.70%) Loss: 2.191527 LR: 0.00004999 [23:48:07] Epoch: 1 Batch: 3341/38378 (8.71%) Loss: 2.237798 LR: 0.00004999 [23:48:09] Epoch: 1 Batch: 3342/38378 (8.71%) Loss: 2.009923 LR: 0.00004999 [23:48:11] Epoch: 1 Batch: 3343/38378 (8.71%) Loss: 2.044631 LR: 0.00004999 [23:48:13] Epoch: 1 Batch: 3344/38378 (8.71%) Loss: 2.069583 LR: 0.00004999 [23:48:14] Epoch: 1 Batch: 3345/38378 (8.72%) Loss: 2.219760 LR: 0.00004999 [23:48:16] Epoch: 1 Batch: 3346/38378 (8.72%) Loss: 2.042169 LR: 0.00004999 [23:48:18] Epoch: 1 Batch: 3347/38378 (8.72%) Loss: 2.141781 LR: 0.00004999 [23:48:19] Epoch: 1 Batch: 3348/38378 (8.72%) Loss: 2.036277 LR: 0.00004999 [23:48:21] Epoch: 1 Batch: 3349/38378 (8.73%) Loss: 2.071986 LR: 0.00004999 [23:48:23] Epoch: 1 Batch: 3350/38378 (8.73%) Loss: 2.083308 LR: 0.00004999 [23:48:24] Epoch: 1 Batch: 3351/38378 (8.73%) Loss: 2.277636 LR: 0.00004999 [23:48:26] Epoch: 1 Batch: 3352/38378 (8.73%) Loss: 2.151202 LR: 0.00004999 [23:48:28] Epoch: 1 Batch: 3353/38378 (8.74%) Loss: 2.118942 LR: 0.00004999 [23:48:30] Epoch: 1 Batch: 3354/38378 (8.74%) Loss: 2.091953 LR: 0.00004999 [23:48:31] Epoch: 1 Batch: 3355/38378 (8.74%) Loss: 2.281591 LR: 0.00004999 [23:48:33] Epoch: 1 Batch: 3356/38378 (8.74%) Loss: 2.225618 LR: 0.00004999 [23:48:35] Epoch: 1 Batch: 3357/38378 (8.75%) Loss: 1.874985 LR: 0.00004999 [23:48:36] Epoch: 1 Batch: 3358/38378 (8.75%) Loss: 2.086555 LR: 0.00004999 [23:48:38] Epoch: 1 Batch: 3359/38378 (8.75%) Loss: 2.171132 LR: 0.00004999 [23:48:40] Epoch: 1 Batch: 3360/38378 (8.76%) Loss: 2.070422 LR: 0.00004999 [23:48:41] Epoch: 1 Batch: 3361/38378 (8.76%) Loss: 1.992303 LR: 0.00004999 [23:48:43] Epoch: 1 Batch: 3362/38378 (8.76%) Loss: 2.286255 LR: 0.00004999 [23:48:45] Epoch: 1 Batch: 3363/38378 (8.76%) Loss: 1.979217 LR: 0.00004999 [23:48:47] Epoch: 1 Batch: 3364/38378 (8.77%) Loss: 2.178791 LR: 0.00004999 [23:48:48] Epoch: 1 Batch: 3365/38378 (8.77%) Loss: 1.936122 LR: 0.00004999 [23:48:54] >> Cleaned up old temp checkpoint: epoch1_step3036 [23:48:54] >> Temp checkpoint saved: epoch1_step3366, size: 0.1702 GB [23:48:54] Epoch: 1 Batch: 3366/38378 (8.77%) Loss: 1.864617 LR: 0.00004999 [23:48:56] Epoch: 1 Batch: 3367/38378 (8.77%) Loss: 2.389424 LR: 0.00004999 [23:48:57] Epoch: 1 Batch: 3368/38378 (8.78%) Loss: 1.939038 LR: 0.00004999 [23:48:59] Epoch: 1 Batch: 3369/38378 (8.78%) Loss: 2.074332 LR: 0.00004999 [23:49:01] Epoch: 1 Batch: 3370/38378 (8.78%) Loss: 2.413171 LR: 0.00004999 [23:49:02] Epoch: 1 Batch: 3371/38378 (8.78%) Loss: 1.746922 LR: 0.00004999 [23:49:04] Epoch: 1 Batch: 3372/38378 (8.79%) Loss: 2.126738 LR: 0.00004999 [23:49:05] Epoch: 1 Batch: 3373/38378 (8.79%) Loss: 2.532934 LR: 0.00004999 [23:49:07] Epoch: 1 
Batch: 3374/38378 (8.79%) Loss: 2.117320 LR: 0.00004999 [23:49:09] Epoch: 1 Batch: 3375/38378 (8.79%) Loss: 2.298164 LR: 0.00004999 [23:49:10] Epoch: 1 Batch: 3376/38378 (8.80%) Loss: 1.698653 LR: 0.00004999 [23:49:12] Epoch: 1 Batch: 3377/38378 (8.80%) Loss: 1.829896 LR: 0.00004999 [23:49:14] Epoch: 1 Batch: 3378/38378 (8.80%) Loss: 2.126446 LR: 0.00004999 [23:49:16] Epoch: 1 Batch: 3379/38378 (8.80%) Loss: 2.260087 LR: 0.00004999 [23:49:17] Epoch: 1 Batch: 3380/38378 (8.81%) Loss: 2.173146 LR: 0.00004999 [23:49:19] Epoch: 1 Batch: 3381/38378 (8.81%) Loss: 1.919102 LR: 0.00004999 [23:49:21] Epoch: 1 Batch: 3382/38378 (8.81%) Loss: 1.851320 LR: 0.00004999 [23:49:22] Epoch: 1 Batch: 3383/38378 (8.81%) Loss: 2.124999 LR: 0.00004999 [23:49:24] Epoch: 1 Batch: 3384/38378 (8.82%) Loss: 2.178120 LR: 0.00004999 [23:49:26] Epoch: 1 Batch: 3385/38378 (8.82%) Loss: 2.026927 LR: 0.00004999 [23:49:28] Epoch: 1 Batch: 3386/38378 (8.82%) Loss: 2.073703 LR: 0.00004999 [23:49:29] Epoch: 1 Batch: 3387/38378 (8.83%) Loss: 1.886526 LR: 0.00004999 [23:49:31] Epoch: 1 Batch: 3388/38378 (8.83%) Loss: 2.208819 LR: 0.00004999 [23:49:33] Epoch: 1 Batch: 3389/38378 (8.83%) Loss: 2.263751 LR: 0.00004999 [23:49:35] Epoch: 1 Batch: 3390/38378 (8.83%) Loss: 2.042134 LR: 0.00004999 [23:49:36] Epoch: 1 Batch: 3391/38378 (8.84%) Loss: 1.995931 LR: 0.00004999 [23:49:38] Epoch: 1 Batch: 3392/38378 (8.84%) Loss: 2.061989 LR: 0.00004999 [23:49:40] Epoch: 1 Batch: 3393/38378 (8.84%) Loss: 2.125648 LR: 0.00004999 [23:49:41] Epoch: 1 Batch: 3394/38378 (8.84%) Loss: 1.825155 LR: 0.00004999 [23:49:43] Epoch: 1 Batch: 3395/38378 (8.85%) Loss: 2.062926 LR: 0.00004999 [23:49:45] Epoch: 1 Batch: 3396/38378 (8.85%) Loss: 2.025968 LR: 0.00004999 [23:49:47] Epoch: 1 Batch: 3397/38378 (8.85%) Loss: 2.199885 LR: 0.00004999 [23:49:48] Epoch: 1 Batch: 3398/38378 (8.85%) Loss: 1.957250 LR: 0.00004999 [23:49:54] >> Cleaned up old temp checkpoint: epoch1_step3069 [23:49:54] >> Temp checkpoint saved: epoch1_step3399, size: 0.1702 GB [23:49:54] Epoch: 1 Batch: 3399/38378 (8.86%) Loss: 2.105009 LR: 0.00004999 [23:49:56] Epoch: 1 Batch: 3400/38378 (8.86%) Loss: 2.300834 LR: 0.00004999 [23:49:57] Epoch: 1 Batch: 3401/38378 (8.86%) Loss: 1.918401 LR: 0.00004999 [23:49:59] Epoch: 1 Batch: 3402/38378 (8.86%) Loss: 2.105331 LR: 0.00004999 [23:50:01] Epoch: 1 Batch: 3403/38378 (8.87%) Loss: 2.187813 LR: 0.00004999 [23:50:02] Epoch: 1 Batch: 3404/38378 (8.87%) Loss: 1.862060 LR: 0.00004999 [23:50:04] Epoch: 1 Batch: 3405/38378 (8.87%) Loss: 2.214721 LR: 0.00004999 [23:50:06] Epoch: 1 Batch: 3406/38378 (8.87%) Loss: 2.244960 LR: 0.00004999 [23:50:07] Epoch: 1 Batch: 3407/38378 (8.88%) Loss: 2.314594 LR: 0.00004999 [23:50:09] Epoch: 1 Batch: 3408/38378 (8.88%) Loss: 1.804717 LR: 0.00004999 [23:50:11] Epoch: 1 Batch: 3409/38378 (8.88%) Loss: 1.984039 LR: 0.00004999 [23:50:13] Epoch: 1 Batch: 3410/38378 (8.89%) Loss: 1.697590 LR: 0.00004999 [23:50:14] Epoch: 1 Batch: 3411/38378 (8.89%) Loss: 1.971505 LR: 0.00004999 [23:50:16] Epoch: 1 Batch: 3412/38378 (8.89%) Loss: 2.361877 LR: 0.00004999 [23:50:18] Epoch: 1 Batch: 3413/38378 (8.89%) Loss: 2.417743 LR: 0.00004999 [23:50:19] Epoch: 1 Batch: 3414/38378 (8.90%) Loss: 1.974709 LR: 0.00004999 [23:50:21] Epoch: 1 Batch: 3415/38378 (8.90%) Loss: 1.936177 LR: 0.00004999 [23:50:23] Epoch: 1 Batch: 3416/38378 (8.90%) Loss: 2.022241 LR: 0.00004999 [23:50:25] Epoch: 1 Batch: 3417/38378 (8.90%) Loss: 2.049617 LR: 0.00004999 [23:50:26] Epoch: 1 Batch: 3418/38378 (8.91%) Loss: 1.859285 LR: 0.00004999 [23:50:28] Epoch: 1 
Batch: 3419/38378 (8.91%) Loss: 1.940544 LR: 0.00004999 [23:50:30] Epoch: 1 Batch: 3420/38378 (8.91%) Loss: 2.061487 LR: 0.00004999 [23:50:31] Epoch: 1 Batch: 3421/38378 (8.91%) Loss: 2.521519 LR: 0.00004999 [23:50:33] Epoch: 1 Batch: 3422/38378 (8.92%) Loss: 1.875221 LR: 0.00004999 [23:50:35] Epoch: 1 Batch: 3423/38378 (8.92%) Loss: 2.474064 LR: 0.00004999 [23:50:36] Epoch: 1 Batch: 3424/38378 (8.92%) Loss: 2.141832 LR: 0.00004999 [23:50:38] Epoch: 1 Batch: 3425/38378 (8.92%) Loss: 2.077035 LR: 0.00004999 [23:50:40] Epoch: 1 Batch: 3426/38378 (8.93%) Loss: 1.994912 LR: 0.00004999 [23:50:42] Epoch: 1 Batch: 3427/38378 (8.93%) Loss: 2.148021 LR: 0.00004999 [23:50:43] Epoch: 1 Batch: 3428/38378 (8.93%) Loss: 2.026431 LR: 0.00004999 [23:50:45] Epoch: 1 Batch: 3429/38378 (8.93%) Loss: 1.798065 LR: 0.00004999 [23:50:47] Epoch: 1 Batch: 3430/38378 (8.94%) Loss: 2.140855 LR: 0.00004999 [23:50:48] Epoch: 1 Batch: 3431/38378 (8.94%) Loss: 2.390848 LR: 0.00004999 [23:50:54] >> Cleaned up old temp checkpoint: epoch1_step3102 [23:50:54] >> Temp checkpoint saved: epoch1_step3432, size: 0.1702 GB [23:50:54] Epoch: 1 Batch: 3432/38378 (8.94%) Loss: 1.931540 LR: 0.00004999 [23:50:56] Epoch: 1 Batch: 3433/38378 (8.95%) Loss: 1.729645 LR: 0.00004999 [23:50:57] Epoch: 1 Batch: 3434/38378 (8.95%) Loss: 2.160549 LR: 0.00004999 [23:50:59] Epoch: 1 Batch: 3435/38378 (8.95%) Loss: 1.815200 LR: 0.00004999 [23:51:01] Epoch: 1 Batch: 3436/38378 (8.95%) Loss: 1.958438 LR: 0.00004999 [23:51:02] Epoch: 1 Batch: 3437/38378 (8.96%) Loss: 2.073909 LR: 0.00004999 [23:51:04] Epoch: 1 Batch: 3438/38378 (8.96%) Loss: 2.079154 LR: 0.00004999 [23:51:06] Epoch: 1 Batch: 3439/38378 (8.96%) Loss: 1.965780 LR: 0.00004999 [23:51:07] Epoch: 1 Batch: 3440/38378 (8.96%) Loss: 2.341974 LR: 0.00004999 [23:51:09] Epoch: 1 Batch: 3441/38378 (8.97%) Loss: 2.244979 LR: 0.00004999 [23:51:11] Epoch: 1 Batch: 3442/38378 (8.97%) Loss: 2.441448 LR: 0.00004999 [23:51:12] Epoch: 1 Batch: 3443/38378 (8.97%) Loss: 2.222588 LR: 0.00004999 [23:51:14] Epoch: 1 Batch: 3444/38378 (8.97%) Loss: 1.989871 LR: 0.00004999 [23:51:16] Epoch: 1 Batch: 3445/38378 (8.98%) Loss: 2.146479 LR: 0.00004999 [23:51:17] Epoch: 1 Batch: 3446/38378 (8.98%) Loss: 1.998443 LR: 0.00004999 [23:51:19] Epoch: 1 Batch: 3447/38378 (8.98%) Loss: 2.413264 LR: 0.00004999 [23:51:21] Epoch: 1 Batch: 3448/38378 (8.98%) Loss: 2.036279 LR: 0.00004999 [23:51:23] Epoch: 1 Batch: 3449/38378 (8.99%) Loss: 2.157696 LR: 0.00004999 [23:51:24] Epoch: 1 Batch: 3450/38378 (8.99%) Loss: 1.693308 LR: 0.00004999 [23:51:26] Epoch: 1 Batch: 3451/38378 (8.99%) Loss: 2.009447 LR: 0.00004999 [23:51:28] Epoch: 1 Batch: 3452/38378 (8.99%) Loss: 1.879053 LR: 0.00004999 [23:51:29] Epoch: 1 Batch: 3453/38378 (9.00%) Loss: 1.945238 LR: 0.00004999 [23:51:31] Epoch: 1 Batch: 3454/38378 (9.00%) Loss: 2.326785 LR: 0.00004999 [23:51:33] Epoch: 1 Batch: 3455/38378 (9.00%) Loss: 2.004442 LR: 0.00004999 [23:51:35] Epoch: 1 Batch: 3456/38378 (9.01%) Loss: 1.792786 LR: 0.00004999 [23:51:36] Epoch: 1 Batch: 3457/38378 (9.01%) Loss: 2.297714 LR: 0.00004999 [23:51:38] Epoch: 1 Batch: 3458/38378 (9.01%) Loss: 1.742356 LR: 0.00004999 [23:51:40] Epoch: 1 Batch: 3459/38378 (9.01%) Loss: 1.982447 LR: 0.00004999 [23:51:41] Epoch: 1 Batch: 3460/38378 (9.02%) Loss: 1.716650 LR: 0.00004999 [23:51:43] Epoch: 1 Batch: 3461/38378 (9.02%) Loss: 2.341557 LR: 0.00004999 [23:51:45] Epoch: 1 Batch: 3462/38378 (9.02%) Loss: 2.218957 LR: 0.00004999 [23:51:47] Epoch: 1 Batch: 3463/38378 (9.02%) Loss: 2.176510 LR: 0.00004999 [23:51:48] Epoch: 1 
Batch: 3464/38378 (9.03%) Loss: 2.173860 LR: 0.00004999 [23:51:54] >> Cleaned up old temp checkpoint: epoch1_step3135 [23:51:54] >> Temp checkpoint saved: epoch1_step3465, size: 0.1702 GB [23:51:54] Epoch: 1 Batch: 3465/38378 (9.03%) Loss: 1.989928 LR: 0.00004999 [23:51:55] Epoch: 1 Batch: 3466/38378 (9.03%) Loss: 2.131869 LR: 0.00004999 [23:51:57] Epoch: 1 Batch: 3467/38378 (9.03%) Loss: 1.758029 LR: 0.00004999 [23:51:59] Epoch: 1 Batch: 3468/38378 (9.04%) Loss: 2.150002 LR: 0.00004999 [23:52:01] Epoch: 1 Batch: 3469/38378 (9.04%) Loss: 1.920145 LR: 0.00004999 [23:52:02] Epoch: 1 Batch: 3470/38378 (9.04%) Loss: 2.039501 LR: 0.00004999 [23:52:04] Epoch: 1 Batch: 3471/38378 (9.04%) Loss: 1.845652 LR: 0.00004999 [23:52:06] Epoch: 1 Batch: 3472/38378 (9.05%) Loss: 2.218831 LR: 0.00004999 [23:52:07] Epoch: 1 Batch: 3473/38378 (9.05%) Loss: 2.292038 LR: 0.00004999 [23:52:09] Epoch: 1 Batch: 3474/38378 (9.05%) Loss: 1.998443 LR: 0.00004999 [23:52:11] Epoch: 1 Batch: 3475/38378 (9.05%) Loss: 2.062422 LR: 0.00004999 [23:52:12] Epoch: 1 Batch: 3476/38378 (9.06%) Loss: 2.162647 LR: 0.00004999 [23:52:14] Epoch: 1 Batch: 3477/38378 (9.06%) Loss: 2.242777 LR: 0.00004999 [23:52:16] Epoch: 1 Batch: 3478/38378 (9.06%) Loss: 2.220143 LR: 0.00004999 [23:52:17] Epoch: 1 Batch: 3479/38378 (9.07%) Loss: 2.196314 LR: 0.00004999 [23:52:19] Epoch: 1 Batch: 3480/38378 (9.07%) Loss: 1.850958 LR: 0.00004999 [23:52:21] Epoch: 1 Batch: 3481/38378 (9.07%) Loss: 2.200505 LR: 0.00004999 [23:52:23] Epoch: 1 Batch: 3482/38378 (9.07%) Loss: 2.056276 LR: 0.00004999 [23:52:24] Epoch: 1 Batch: 3483/38378 (9.08%) Loss: 2.092594 LR: 0.00004999 [23:52:26] Epoch: 1 Batch: 3484/38378 (9.08%) Loss: 1.933591 LR: 0.00004999 [23:52:28] Epoch: 1 Batch: 3485/38378 (9.08%) Loss: 1.964216 LR: 0.00004999 [23:52:30] Epoch: 1 Batch: 3486/38378 (9.08%) Loss: 2.059646 LR: 0.00004999 [23:52:31] Epoch: 1 Batch: 3487/38378 (9.09%) Loss: 2.329602 LR: 0.00004999 [23:52:33] Epoch: 1 Batch: 3488/38378 (9.09%) Loss: 2.058829 LR: 0.00004999 [23:52:34] Epoch: 1 Batch: 3489/38378 (9.09%) Loss: 2.161940 LR: 0.00004999 [23:52:36] Epoch: 1 Batch: 3490/38378 (9.09%) Loss: 2.097446 LR: 0.00004999 [23:52:38] Epoch: 1 Batch: 3491/38378 (9.10%) Loss: 1.869532 LR: 0.00004999 [23:52:39] Epoch: 1 Batch: 3492/38378 (9.10%) Loss: 2.358768 LR: 0.00004999 [23:52:41] Epoch: 1 Batch: 3493/38378 (9.10%) Loss: 1.914664 LR: 0.00004999 [23:52:43] Epoch: 1 Batch: 3494/38378 (9.10%) Loss: 2.058616 LR: 0.00004999 [23:52:44] Epoch: 1 Batch: 3495/38378 (9.11%) Loss: 2.300160 LR: 0.00004999 [23:52:46] Epoch: 1 Batch: 3496/38378 (9.11%) Loss: 2.063690 LR: 0.00004999 [23:52:48] Epoch: 1 Batch: 3497/38378 (9.11%) Loss: 1.913401 LR: 0.00004999 [23:52:53] >> Cleaned up old temp checkpoint: epoch1_step3168 [23:52:53] >> Temp checkpoint saved: epoch1_step3498, size: 0.1702 GB [23:52:53] Epoch: 1 Batch: 3498/38378 (9.11%) Loss: 2.212664 LR: 0.00004999 [23:52:55] Epoch: 1 Batch: 3499/38378 (9.12%) Loss: 2.094054 LR: 0.00004999 [23:52:57] >> Evaluating batch 0 [23:52:58] >> Evaluating batch 1 [23:52:59] >> Evaluating batch 2 [23:52:59] >> Evaluating batch 3 [23:53:00] >> Evaluating batch 4 [23:53:01] >> Evaluating batch 5 [23:53:02] >> Evaluating batch 6 [23:53:03] >> Evaluating batch 7 [23:53:04] >> Evaluating batch 8 [23:53:05] >> Evaluating batch 9 [23:53:06] >> Evaluating batch 10 [23:53:07] >> Evaluating batch 11 [23:53:08] >> Evaluating batch 12 [23:53:09] >> Evaluating batch 13 [23:53:10] >> Evaluating batch 14 [23:53:11] >> Evaluating batch 15 [23:53:12] >> Evaluating batch 16 
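
Note on the evaluation pass above: seventeen batches (indices 0-16) is exactly what the configured validation set requires, since max_val_size is 100 and val_batch_size is 6, so ceil(100/6) = 17. The Step 3500 summary just below reports perplexity as the exponential of the mean validation loss (exp(2.1707) ≈ 8.7642). A minimal sketch of such an eval pass, assuming a Hugging Face causal LM and a val_loader yielding input_ids/attention_mask batches as in this run; the script's actual eval code is not shown in the log:

import math
import torch

@torch.no_grad()
def evaluate(model, val_loader, device="cuda"):
    # One forward pass per validation batch, as in the ">> Evaluating batch i" lines.
    model.eval()
    total_loss = 0.0
    for i, batch in enumerate(val_loader):
        print(f">> Evaluating batch {i}")
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        labels = input_ids.clone()
        labels[attention_mask == 0] = -100  # don't score padding positions
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=labels)
        total_loss += out.loss.item()
    model.train()
    val_loss = total_loss / (i + 1)
    return val_loss, math.exp(val_loss)  # perplexity = exp(mean val loss)
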
[23:53:12] Epoch: 1 Step: 3500/38378 Evaluation: [23:53:12] Avg Loss Since Last Eval: 2.0759 Val Loss: 2.1707 Validation loss delta: -0.0414 Perplexity: 8.7642 LR: 0.00004999 [23:53:17] >> Checkpoint saved: epoch1_step3500, size: 0.1702 GB [23:53:17] Epoch: 1 Batch: 3500/38378 (9.12%) Loss: 2.191550 LR: 0.00004999 [23:53:18] Epoch: 1 Batch: 3501/38378 (9.12%) Loss: 1.652207 LR: 0.00004999 [23:53:20] Epoch: 1 Batch: 3502/38378 (9.13%) Loss: 2.157607 LR: 0.00004999 [23:53:22] Epoch: 1 Batch: 3503/38378 (9.13%) Loss: 2.051500 LR: 0.00004999 [23:53:23] Epoch: 1 Batch: 3504/38378 (9.13%) Loss: 2.241178 LR: 0.00004999 [23:53:25] Epoch: 1 Batch: 3505/38378 (9.13%) Loss: 1.996584 LR: 0.00004999 [23:53:27] Epoch: 1 Batch: 3506/38378 (9.14%) Loss: 1.837048 LR: 0.00004999 [23:53:28] Epoch: 1 Batch: 3507/38378 (9.14%) Loss: 2.043217 LR: 0.00004999 [23:53:30] Epoch: 1 Batch: 3508/38378 (9.14%) Loss: 2.050857 LR: 0.00004999 [23:53:32] Epoch: 1 Batch: 3509/38378 (9.14%) Loss: 1.931344 LR: 0.00004999 [23:53:34] Epoch: 1 Batch: 3510/38378 (9.15%) Loss: 1.832479 LR: 0.00004999 [23:53:35] Epoch: 1 Batch: 3511/38378 (9.15%) Loss: 2.201020 LR: 0.00004999 [23:53:37] Epoch: 1 Batch: 3512/38378 (9.15%) Loss: 2.218129 LR: 0.00004999 [23:53:39] Epoch: 1 Batch: 3513/38378 (9.15%) Loss: 1.795457 LR: 0.00004999 [23:53:40] Epoch: 1 Batch: 3514/38378 (9.16%) Loss: 1.882312 LR: 0.00004999 [23:53:42] Epoch: 1 Batch: 3515/38378 (9.16%) Loss: 2.014766 LR: 0.00004999 [23:53:44] Epoch: 1 Batch: 3516/38378 (9.16%) Loss: 1.797875 LR: 0.00004999 [23:53:46] Epoch: 1 Batch: 3517/38378 (9.16%) Loss: 2.151524 LR: 0.00004999 [23:53:47] Epoch: 1 Batch: 3518/38378 (9.17%) Loss: 2.285318 LR: 0.00004998 [23:53:49] Epoch: 1 Batch: 3519/38378 (9.17%) Loss: 2.146827 LR: 0.00004998 [23:53:51] Epoch: 1 Batch: 3520/38378 (9.17%) Loss: 2.003139 LR: 0.00004998 [23:53:53] Epoch: 1 Batch: 3521/38378 (9.17%) Loss: 1.882733 LR: 0.00004998 [23:53:54] Epoch: 1 Batch: 3522/38378 (9.18%) Loss: 2.107421 LR: 0.00004998 [23:53:56] Epoch: 1 Batch: 3523/38378 (9.18%) Loss: 1.983973 LR: 0.00004998 [23:53:58] Epoch: 1 Batch: 3524/38378 (9.18%) Loss: 2.024448 LR: 0.00004998 [23:53:59] Epoch: 1 Batch: 3525/38378 (9.18%) Loss: 2.229386 LR: 0.00004998 [23:54:01] Epoch: 1 Batch: 3526/38378 (9.19%) Loss: 1.892024 LR: 0.00004998 [23:54:03] Epoch: 1 Batch: 3527/38378 (9.19%) Loss: 1.852450 LR: 0.00004998 [23:54:05] Epoch: 1 Batch: 3528/38378 (9.19%) Loss: 1.957305 LR: 0.00004998 [23:54:06] Epoch: 1 Batch: 3529/38378 (9.20%) Loss: 2.110895 LR: 0.00004998 [23:54:08] Epoch: 1 Batch: 3530/38378 (9.20%) Loss: 2.300326 LR: 0.00004998 [23:54:13] >> Cleaned up old temp checkpoint: epoch1_step3201 [23:54:13] >> Temp checkpoint saved: epoch1_step3531, size: 0.1702 GB [23:54:14] Epoch: 1 Batch: 3531/38378 (9.20%) Loss: 2.103405 LR: 0.00004998 [23:54:15] Epoch: 1 Batch: 3532/38378 (9.20%) Loss: 1.867092 LR: 0.00004998 [23:54:17] Epoch: 1 Batch: 3533/38378 (9.21%) Loss: 2.159433 LR: 0.00004998 [23:54:19] Epoch: 1 Batch: 3534/38378 (9.21%) Loss: 2.034619 LR: 0.00004998 [23:54:20] Epoch: 1 Batch: 3535/38378 (9.21%) Loss: 2.556628 LR: 0.00004998 [23:54:22] Epoch: 1 Batch: 3536/38378 (9.21%) Loss: 1.965015 LR: 0.00004998 [23:54:23] Epoch: 1 Batch: 3537/38378 (9.22%) Loss: 2.222981 LR: 0.00004998 [23:54:25] Epoch: 1 Batch: 3538/38378 (9.22%) Loss: 2.076320 LR: 0.00004998 [23:54:27] Epoch: 1 Batch: 3539/38378 (9.22%) Loss: 1.862755 LR: 0.00004998 [23:54:28] Epoch: 1 Batch: 3540/38378 (9.22%) Loss: 2.208698 LR: 0.00004998 [23:54:30] Epoch: 1 Batch: 3541/38378 (9.23%) Loss: 2.108908
LR: 0.00004998 [23:54:32] Epoch: 1 Batch: 3542/38378 (9.23%) Loss: 2.048112 LR: 0.00004998 [23:54:34] Epoch: 1 Batch: 3543/38378 (9.23%) Loss: 2.118297 LR: 0.00004998 [23:54:35] Epoch: 1 Batch: 3544/38378 (9.23%) Loss: 2.057481 LR: 0.00004998 [23:54:37] Epoch: 1 Batch: 3545/38378 (9.24%) Loss: 2.123950 LR: 0.00004998 [23:54:39] Epoch: 1 Batch: 3546/38378 (9.24%) Loss: 2.207963 LR: 0.00004998 [23:54:40] Epoch: 1 Batch: 3547/38378 (9.24%) Loss: 2.162302 LR: 0.00004998 [23:54:42] Epoch: 1 Batch: 3548/38378 (9.24%) Loss: 1.861630 LR: 0.00004998 [23:54:44] Epoch: 1 Batch: 3549/38378 (9.25%) Loss: 1.966256 LR: 0.00004998 [23:54:45] Epoch: 1 Batch: 3550/38378 (9.25%) Loss: 2.175669 LR: 0.00004998 [23:54:47] Epoch: 1 Batch: 3551/38378 (9.25%) Loss: 2.074818 LR: 0.00004998 [23:54:49] Epoch: 1 Batch: 3552/38378 (9.26%) Loss: 2.104165 LR: 0.00004998 [23:54:51] Epoch: 1 Batch: 3553/38378 (9.26%) Loss: 1.984847 LR: 0.00004998 [23:54:52] Epoch: 1 Batch: 3554/38378 (9.26%) Loss: 2.079124 LR: 0.00004998 [23:54:54] Epoch: 1 Batch: 3555/38378 (9.26%) Loss: 2.127360 LR: 0.00004998 [23:54:56] Epoch: 1 Batch: 3556/38378 (9.27%) Loss: 2.091468 LR: 0.00004998 [23:54:57] Epoch: 1 Batch: 3557/38378 (9.27%) Loss: 1.988440 LR: 0.00004998 [23:54:59] Epoch: 1 Batch: 3558/38378 (9.27%) Loss: 1.893930 LR: 0.00004998 [23:55:01] Epoch: 1 Batch: 3559/38378 (9.27%) Loss: 2.182957 LR: 0.00004998 [23:55:03] Epoch: 1 Batch: 3560/38378 (9.28%) Loss: 1.904555 LR: 0.00004998 [23:55:04] Epoch: 1 Batch: 3561/38378 (9.28%) Loss: 2.211303 LR: 0.00004998 [23:55:06] Epoch: 1 Batch: 3562/38378 (9.28%) Loss: 2.255754 LR: 0.00004998 [23:55:08] Epoch: 1 Batch: 3563/38378 (9.28%) Loss: 2.073259 LR: 0.00004998 [23:55:14] >> Cleaned up old temp checkpoint: epoch1_step3234 [23:55:14] >> Temp checkpoint saved: epoch1_step3564, size: 0.1702 GB [23:55:14] Epoch: 1 Batch: 3564/38378 (9.29%) Loss: 2.266631 LR: 0.00004998 [23:55:15] Epoch: 1 Batch: 3565/38378 (9.29%) Loss: 1.921927 LR: 0.00004998 [23:55:17] Epoch: 1 Batch: 3566/38378 (9.29%) Loss: 1.883172 LR: 0.00004998 [23:55:19] Epoch: 1 Batch: 3567/38378 (9.29%) Loss: 2.049267 LR: 0.00004998 [23:55:21] Epoch: 1 Batch: 3568/38378 (9.30%) Loss: 2.204152 LR: 0.00004998 [23:55:22] Epoch: 1 Batch: 3569/38378 (9.30%) Loss: 2.009425 LR: 0.00004998 [23:55:24] Epoch: 1 Batch: 3570/38378 (9.30%) Loss: 1.868983 LR: 0.00004998 [23:55:26] Epoch: 1 Batch: 3571/38378 (9.30%) Loss: 1.952603 LR: 0.00004998 [23:55:27] Epoch: 1 Batch: 3572/38378 (9.31%) Loss: 1.950245 LR: 0.00004998 [23:55:29] Epoch: 1 Batch: 3573/38378 (9.31%) Loss: 1.955929 LR: 0.00004998 [23:55:31] Epoch: 1 Batch: 3574/38378 (9.31%) Loss: 1.958323 LR: 0.00004998 [23:55:32] Epoch: 1 Batch: 3575/38378 (9.32%) Loss: 1.924472 LR: 0.00004998 [23:55:34] Epoch: 1 Batch: 3576/38378 (9.32%) Loss: 2.037561 LR: 0.00004998 [23:55:36] Epoch: 1 Batch: 3577/38378 (9.32%) Loss: 1.872620 LR: 0.00004998 [23:55:38] Epoch: 1 Batch: 3578/38378 (9.32%) Loss: 2.400761 LR: 0.00004998 [23:55:39] Epoch: 1 Batch: 3579/38378 (9.33%) Loss: 2.086349 LR: 0.00004998 [23:55:41] Epoch: 1 Batch: 3580/38378 (9.33%) Loss: 2.000920 LR: 0.00004998 [23:55:43] Epoch: 1 Batch: 3581/38378 (9.33%) Loss: 1.965078 LR: 0.00004998 [23:55:44] Epoch: 1 Batch: 3582/38378 (9.33%) Loss: 1.926586 LR: 0.00004998 [23:55:46] Epoch: 1 Batch: 3583/38378 (9.34%) Loss: 2.025816 LR: 0.00004998 [23:55:48] Epoch: 1 Batch: 3584/38378 (9.34%) Loss: 2.199895 LR: 0.00004998 [23:55:50] Epoch: 1 Batch: 3585/38378 (9.34%) Loss: 2.306777 LR: 0.00004998 [23:55:51] Epoch: 1 Batch: 3586/38378 (9.34%) Loss: 2.309087 
LR: 0.00004998 [23:55:53] Epoch: 1 Batch: 3587/38378 (9.35%) Loss: 1.802768 LR: 0.00004998 [23:55:55] Epoch: 1 Batch: 3588/38378 (9.35%) Loss: 1.919645 LR: 0.00004998 [23:55:56] Epoch: 1 Batch: 3589/38378 (9.35%) Loss: 2.190325 LR: 0.00004998 [23:55:58] Epoch: 1 Batch: 3590/38378 (9.35%) Loss: 2.348268 LR: 0.00004998 [23:56:00] Epoch: 1 Batch: 3591/38378 (9.36%) Loss: 1.867360 LR: 0.00004998 [23:56:02] Epoch: 1 Batch: 3592/38378 (9.36%) Loss: 2.007169 LR: 0.00004998 [23:56:03] Epoch: 1 Batch: 3593/38378 (9.36%) Loss: 2.028220 LR: 0.00004998 [23:56:05] Epoch: 1 Batch: 3594/38378 (9.36%) Loss: 1.999787 LR: 0.00004998 [23:56:07] Epoch: 1 Batch: 3595/38378 (9.37%) Loss: 2.094374 LR: 0.00004998 [23:56:08] Epoch: 1 Batch: 3596/38378 (9.37%) Loss: 2.007006 LR: 0.00004998 [23:56:14] >> Cleaned up old temp checkpoint: epoch1_step3267 [23:56:14] >> Temp checkpoint saved: epoch1_step3597, size: 0.1702 GB [23:56:14] Epoch: 1 Batch: 3597/38378 (9.37%) Loss: 2.092566 LR: 0.00004998 [23:56:16] Epoch: 1 Batch: 3598/38378 (9.38%) Loss: 2.229358 LR: 0.00004998 [23:56:17] Epoch: 1 Batch: 3599/38378 (9.38%) Loss: 2.145194 LR: 0.00004998 [23:56:19] Epoch: 1 Batch: 3600/38378 (9.38%) Loss: 1.922928 LR: 0.00004998 [23:56:21] Epoch: 1 Batch: 3601/38378 (9.38%) Loss: 1.869418 LR: 0.00004998 [23:56:22] Epoch: 1 Batch: 3602/38378 (9.39%) Loss: 1.779664 LR: 0.00004998 [23:56:24] Epoch: 1 Batch: 3603/38378 (9.39%) Loss: 2.004366 LR: 0.00004998 [23:56:26] Epoch: 1 Batch: 3604/38378 (9.39%) Loss: 1.903442 LR: 0.00004998 [23:56:28] Epoch: 1 Batch: 3605/38378 (9.39%) Loss: 2.187532 LR: 0.00004998 [23:56:29] Epoch: 1 Batch: 3606/38378 (9.40%) Loss: 2.098557 LR: 0.00004998 [23:56:31] Epoch: 1 Batch: 3607/38378 (9.40%) Loss: 2.050435 LR: 0.00004998 [23:56:33] Epoch: 1 Batch: 3608/38378 (9.40%) Loss: 2.088250 LR: 0.00004998 [23:56:34] Epoch: 1 Batch: 3609/38378 (9.40%) Loss: 1.972046 LR: 0.00004998 [23:56:36] Epoch: 1 Batch: 3610/38378 (9.41%) Loss: 2.374681 LR: 0.00004998 [23:56:38] Epoch: 1 Batch: 3611/38378 (9.41%) Loss: 1.696532 LR: 0.00004998 [23:56:39] Epoch: 1 Batch: 3612/38378 (9.41%) Loss: 2.132875 LR: 0.00004998 [23:56:41] Epoch: 1 Batch: 3613/38378 (9.41%) Loss: 1.934095 LR: 0.00004998 [23:56:43] Epoch: 1 Batch: 3614/38378 (9.42%) Loss: 1.922545 LR: 0.00004998 [23:56:45] Epoch: 1 Batch: 3615/38378 (9.42%) Loss: 1.750140 LR: 0.00004998 [23:56:46] Epoch: 1 Batch: 3616/38378 (9.42%) Loss: 2.288540 LR: 0.00004998 [23:56:48] Epoch: 1 Batch: 3617/38378 (9.42%) Loss: 1.931041 LR: 0.00004998 [23:56:50] Epoch: 1 Batch: 3618/38378 (9.43%) Loss: 1.812920 LR: 0.00004998 [23:56:51] Epoch: 1 Batch: 3619/38378 (9.43%) Loss: 2.092947 LR: 0.00004998 [23:56:53] Epoch: 1 Batch: 3620/38378 (9.43%) Loss: 2.099936 LR: 0.00004998 [23:56:55] Epoch: 1 Batch: 3621/38378 (9.44%) Loss: 2.243516 LR: 0.00004998 [23:56:57] Epoch: 1 Batch: 3622/38378 (9.44%) Loss: 2.198152 LR: 0.00004998 [23:56:58] Epoch: 1 Batch: 3623/38378 (9.44%) Loss: 2.074850 LR: 0.00004998 [23:57:00] Epoch: 1 Batch: 3624/38378 (9.44%) Loss: 2.379201 LR: 0.00004998 [23:57:02] Epoch: 1 Batch: 3625/38378 (9.45%) Loss: 2.034017 LR: 0.00004998 [23:57:04] Epoch: 1 Batch: 3626/38378 (9.45%) Loss: 2.392898 LR: 0.00004998 [23:57:05] Epoch: 1 Batch: 3627/38378 (9.45%) Loss: 1.850118 LR: 0.00004998 [23:57:07] Epoch: 1 Batch: 3628/38378 (9.45%) Loss: 2.039065 LR: 0.00004998 [23:57:09] Epoch: 1 Batch: 3629/38378 (9.46%) Loss: 2.250348 LR: 0.00004998 [23:57:14] >> Cleaned up old temp checkpoint: epoch1_step3300 [23:57:14] >> Temp checkpoint saved: epoch1_step3630, size: 0.1702 GB 
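
A pattern worth noting in the checkpoint lines: temp saves land every 33 batches (save_temp_frequency: 33), and each save removes the temp checkpoint from ten saves, i.e. 330 batches, earlier (step 3630 saved above while step 3300 is cleaned), so at most ten rolling temp checkpoints of ~0.17 GB each sit on Drive at a time. A minimal sketch of that rotation, assuming the checkpoints are PEFT adapter directories; the helper name maybe_save_temp and the keep-10 window are inferred from the log, not taken from the script:

import os
import shutil

SAVE_EVERY = 33  # save_temp_frequency from the config dump
KEEP = 10        # inferred: step 3630 saved while step 3300 (ten saves back) is cleaned

def maybe_save_temp(model, temp_dir, epoch, step):
    if step % SAVE_EVERY != 0:
        return
    old = os.path.join(temp_dir, f"epoch{epoch}_step{step - KEEP * SAVE_EVERY}")
    if os.path.isdir(old):
        shutil.rmtree(old)
        print(f">> Cleaned up old temp checkpoint: {os.path.basename(old)}")
    new = os.path.join(temp_dir, f"epoch{epoch}_step{step}")
    model.save_pretrained(new)  # adapter weights only, ~0.17 GB at rank 16
    print(f">> Temp checkpoint saved: {os.path.basename(new)}")
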
[23:57:14] Epoch: 1 Batch: 3630/38378 (9.46%) Loss: 1.902450 LR: 0.00004998 [23:57:16] Epoch: 1 Batch: 3631/38378 (9.46%) Loss: 1.887701 LR: 0.00004998 [23:57:18] Epoch: 1 Batch: 3632/38378 (9.46%) Loss: 2.061330 LR: 0.00004998 [23:57:19] Epoch: 1 Batch: 3633/38378 (9.47%) Loss: 1.860360 LR: 0.00004998 [23:57:21] Epoch: 1 Batch: 3634/38378 (9.47%) Loss: 2.213076 LR: 0.00004998 [23:57:23] Epoch: 1 Batch: 3635/38378 (9.47%) Loss: 2.069491 LR: 0.00004998 [23:57:24] Epoch: 1 Batch: 3636/38378 (9.47%) Loss: 2.054968 LR: 0.00004998 [23:57:26] Epoch: 1 Batch: 3637/38378 (9.48%) Loss: 1.904636 LR: 0.00004998 [23:57:28] Epoch: 1 Batch: 3638/38378 (9.48%) Loss: 1.898407 LR: 0.00004998 [23:57:29] Epoch: 1 Batch: 3639/38378 (9.48%) Loss: 2.199010 LR: 0.00004998 [23:57:31] Epoch: 1 Batch: 3640/38378 (9.48%) Loss: 2.105518 LR: 0.00004998 [23:57:33] Epoch: 1 Batch: 3641/38378 (9.49%) Loss: 2.213751 LR: 0.00004998 [23:57:35] Epoch: 1 Batch: 3642/38378 (9.49%) Loss: 1.956051 LR: 0.00004998 [23:57:36] Epoch: 1 Batch: 3643/38378 (9.49%) Loss: 2.025024 LR: 0.00004998 [23:57:38] Epoch: 1 Batch: 3644/38378 (9.50%) Loss: 1.891895 LR: 0.00004997 [23:57:40] Epoch: 1 Batch: 3645/38378 (9.50%) Loss: 2.191621 LR: 0.00004997 [23:57:41] Epoch: 1 Batch: 3646/38378 (9.50%) Loss: 2.306533 LR: 0.00004997 [23:57:43] Epoch: 1 Batch: 3647/38378 (9.50%) Loss: 2.186682 LR: 0.00004997 [23:57:45] Epoch: 1 Batch: 3648/38378 (9.51%) Loss: 2.269561 LR: 0.00004997 [23:57:47] Epoch: 1 Batch: 3649/38378 (9.51%) Loss: 2.112397 LR: 0.00004997 [23:57:48] Epoch: 1 Batch: 3650/38378 (9.51%) Loss: 1.911303 LR: 0.00004997 [23:57:50] Epoch: 1 Batch: 3651/38378 (9.51%) Loss: 2.083165 LR: 0.00004997 [23:57:52] Epoch: 1 Batch: 3652/38378 (9.52%) Loss: 2.019231 LR: 0.00004997 [23:57:53] Epoch: 1 Batch: 3653/38378 (9.52%) Loss: 1.955795 LR: 0.00004997 [23:57:55] Epoch: 1 Batch: 3654/38378 (9.52%) Loss: 2.176938 LR: 0.00004997 [23:57:57] Epoch: 1 Batch: 3655/38378 (9.52%) Loss: 2.013093 LR: 0.00004997 [23:57:59] Epoch: 1 Batch: 3656/38378 (9.53%) Loss: 1.955961 LR: 0.00004997 [23:58:00] Epoch: 1 Batch: 3657/38378 (9.53%) Loss: 2.030050 LR: 0.00004997 [23:58:02] Epoch: 1 Batch: 3658/38378 (9.53%) Loss: 2.069354 LR: 0.00004997 [23:58:04] Epoch: 1 Batch: 3659/38378 (9.53%) Loss: 2.015117 LR: 0.00004997 [00:10:34] 2025-08-12 [00:10:34] Tesla T4 [00:10:34] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | 
|---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [00:10:34] CPU usage: 96.6%, RAM usage: 24.2% [00:10:34] Running with the following configuration: [00:10:34] model_name: NousResearch/Hermes-3-Llama-3.1-8B [00:10:34] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B [00:10:34] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview [00:10:34] train_path: /content/drive/MyDrive/data/none.csv [00:10:34] checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step3630 [00:10:34] lr: 5e-05 [00:10:34] lr_floor: 1e-05 [00:10:34] epochs: 1 [00:10:34] batch_size: 5 [00:10:34] accum_steps: 7 [00:10:34] val_batch_size: 6 [00:10:34] max_val_size: 100 [00:10:34] max_length: 150 [00:10:34] save_temp_frequency: 33 [00:10:34] save_frequency: 500 [00:10:34] eval_frequency: 500 [00:10:34] save_pattern: y [00:10:34] quantization: y [00:10:34] quantization_bits: 4 [00:10:34] lora: y [00:10:34] frozen_lora_path: None [00:10:34] lora_rank: 16 [00:10:34] lora_alpha: 32 [00:10:34] lora_dropout: 0.08 [00:10:34] optimizer_weight_decay: 0.0 [00:10:34] warmup_type: cosine [00:10:34] warmup_ratio: 0.08 [00:10:34] warmup_steps: 439 [00:10:34] shuffle: y [00:10:34] csv_column: text [00:10:34] new_run: n [00:10:34] label_smoothing: 0.05 [00:10:34] SEED: 1 [00:10:34] Using device: cuda [00:10:35] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step3630 [00:12:22] Embeddings shape after: torch.Size([128256, 4096]) [00:12:23] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step3630 [00:12:23] Trainable LoRA 'default': [00:12:23] task_type: CAUSAL_LM [00:12:23] peft_type: PeftType.LORA [00:12:23] auto_mapping: None [00:12:23] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B [00:12:23] revision: None [00:12:23] inference_mode: False [00:12:23] r: 16 [00:12:23] target_modules: {'q_proj', 'v_proj', 'o_proj', 'k_proj'} [00:12:23] exclude_modules: None [00:12:23] lora_alpha: 32 [00:12:23] lora_dropout: 0.08 [00:12:23] fan_in_fan_out: False [00:12:23] bias: none [00:12:23] use_rslora: True [00:12:23] modules_to_save: None [00:12:23] init_lora_weights: True [00:12:23] layers_to_transform: None [00:12:23] layers_pattern: None [00:12:23] rank_pattern: {} [00:12:23] alpha_pattern: {} [00:12:23] megatron_config: None [00:12:23] megatron_core: megatron.core [00:12:23] trainable_token_indices: None [00:12:23] loftq_config: {} [00:12:23] eva_config: None [00:12:23] corda_config: None [00:12:23] use_dora: False [00:12:23] use_qalora: False [00:12:23] qalora_group_size: 16 [00:12:23] layer_replication: None [00:12:23] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [00:12:23] lora_bias: False [00:12:23] target_parameters: None [00:12:23] _custom_modules: None [00:12:23] Embeddings shape after: torch.Size([128256, 4096]) [00:12:25] Resumed from epoch 1, step 3631, file 1 [00:12:25] Starting from CSV file... [00:12:26] Splitting data into chunks of 11000... 
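
The resume sequence above (4-bit base model first, then the trainable LoRA adapter from epoch1_step3630) can be reproduced with current transformers/peft APIs. A minimal sketch assuming the quantization settings shown in the model config dump below (nf4, double quantization, float16 compute); the script's actual loading code may differ:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)
base = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Hermes-3-Llama-3.1-8B",
    quantization_config=bnb,
    device_map="auto",
)
ckpt = "/content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step3630"
# is_trainable=True keeps the LoRA weights unfrozen, matching inference_mode: False.
model = PeftModel.from_pretrained(base, ckpt, is_trainable=True)
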
[00:12:26] Using 7 processes across 18 chunks [00:12:26] Using saved train/val split from checkpoint. [00:12:26] Resuming scheduler with warmup steps: 438, total steps: 5482 [00:12:26] Initializing scheduler with cosine schedule with warmup, warmup steps 439, total steps: 5482 [00:12:26] Train/Val split: 191887 train, 100 val samples. [00:12:35] Model: PeftModelForCausalLM [00:12:35] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.0", "use_cache": true, "vocab_size": 128256 } [00:12:35] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [00:12:35] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 5e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [00:12:35] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [00:12:35] Scheduler: [00:12:35] Training on 191887 training samples, 100 validation samples [00:12:35] Average tokens per sample: 141.99 [00:12:35] Estimated epoch time: ~572.74 min [00:12:35] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Active memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 332173 MiB | 326190 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7616 MiB | 7616 MiB | 7616 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1259 MiB | 5879 MiB | 333261 MiB | 332002 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | 
GPU reserved segments | 186 | 186 | 186 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 33 | 37 | 12954 | 12921 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [00:12:35] Restoring shuffle indices from training state for epoch 1 [00:12:35] CPU usage: 58.7%, RAM usage: 40.7% [00:12:36] Epoch 1 learning rate: 0.0 [00:12:36] Starting epoch 1 [00:12:49] Batch 3631: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [00:12:51] Epoch: 1 Batch: 3631/38378 (9.46%) Loss: 1.887466 LR: 0.00000000 [00:12:52] Epoch: 1 Batch: 3632/38378 (9.46%) Loss: 2.061783 LR: 0.00000000 [00:12:54] Epoch: 1 Batch: 3633/38378 (9.47%) Loss: 1.860898 LR: 0.00000000 [00:12:55] Epoch: 1 Batch: 3634/38378 (9.47%) Loss: 2.215345 LR: 0.00000000 [00:12:57] Epoch: 1 Batch: 3635/38378 (9.47%) Loss: 2.069868 LR: 0.00000000 [00:12:59] Epoch: 1 Batch: 3636/38378 (9.47%) Loss: 2.055941 LR: 0.00000000 [00:13:00] Epoch: 1 Batch: 3637/38378 (9.48%) Loss: 1.904017 LR: 0.00004998 [00:13:02] Epoch: 1 Batch: 3638/38378 (9.48%) Loss: 1.901492 LR: 0.00004998 [00:13:03] Epoch: 1 Batch: 3639/38378 (9.48%) Loss: 2.203099 LR: 0.00004998 [00:13:05] Epoch: 1 Batch: 3640/38378 (9.48%) Loss: 2.105629 LR: 0.00004998 [00:13:07] Epoch: 1 Batch: 3641/38378 (9.49%) Loss: 2.216597 LR: 0.00004998 [00:13:08] Epoch: 1 Batch: 3642/38378 (9.49%) Loss: 1.958523 LR: 0.00004998 [00:13:10] Epoch: 1 Batch: 3643/38378 (9.49%) Loss: 2.025889 LR: 0.00004998 [00:13:11] Epoch: 1 Batch: 3644/38378 (9.50%) Loss: 1.896952 LR: 0.00004997 [00:13:13] Epoch: 1 Batch: 3645/38378 (9.50%) Loss: 2.188288 LR: 0.00004997 [00:13:15] Epoch: 1 Batch: 3646/38378 (9.50%) Loss: 2.302963 LR: 0.00004997 [00:13:16] Epoch: 1 Batch: 3647/38378 (9.50%) Loss: 2.183474 LR: 0.00004997 [00:13:18] Epoch: 1 Batch: 3648/38378 (9.51%) Loss: 2.268047 LR: 0.00004997 [00:13:19] Epoch: 1 Batch: 3649/38378 (9.51%) Loss: 2.109174 LR: 0.00004997 [00:13:21] Epoch: 1 Batch: 3650/38378 (9.51%) Loss: 1.907742 LR: 0.00004997 [00:13:23] Epoch: 1 Batch: 3651/38378 (9.51%) Loss: 2.082826 LR: 0.00004997 [00:13:24] Epoch: 1 Batch: 3652/38378 (9.52%) Loss: 2.016663 LR: 0.00004997 [00:13:26] Epoch: 1 Batch: 3653/38378 (9.52%) Loss: 1.953224 LR: 0.00004997 [00:13:28] Epoch: 1 Batch: 3654/38378 (9.52%) Loss: 2.176015 LR: 0.00004997 [00:13:29] Epoch: 1 Batch: 3655/38378 (9.52%) Loss: 2.010934 LR: 0.00004997 [00:13:31] Epoch: 1 Batch: 3656/38378 (9.53%) Loss: 1.952345 LR: 0.00004997 [00:13:33] Epoch: 1 Batch: 3657/38378 (9.53%) Loss: 2.027800 LR: 0.00004997 [00:13:34] Epoch: 1 Batch: 3658/38378 (9.53%) Loss: 2.067652 LR: 0.00004997 [00:13:36] Epoch: 1 Batch: 3659/38378 (9.53%) Loss: 2.010762 LR: 0.00004997 [00:13:38] Epoch: 1 Batch: 3660/38378 (9.54%) Loss: 2.168887 LR: 0.00004997 [00:13:39] Epoch: 1 Batch: 3661/38378 (9.54%) Loss: 2.129639 LR: 0.00004997 [00:13:41] Epoch: 1 Batch: 3662/38378 (9.54%) Loss: 2.144897 LR: 0.00004997 [00:13:47] >> Cleaned up old temp checkpoint: epoch1_step3333 [00:13:47] >> Temp checkpoint saved: epoch1_step3663, size: 0.1702 GB [00:13:47] Epoch: 1 Batch: 3663/38378 (9.54%) Loss: 2.052339 LR: 0.00004997 [00:13:49] Epoch: 1 Batch: 3664/38378 (9.55%) Loss: 2.279216 LR: 0.00004997 [00:13:51] Epoch: 1 Batch: 3665/38378 (9.55%) Loss: 2.075847 
LR: 0.00004997 [00:13:52] Epoch: 1 Batch: 3666/38378 (9.55%) Loss: 2.272588 LR: 0.00004997 [00:13:54] Epoch: 1 Batch: 3667/38378 (9.55%) Loss: 2.181938 LR: 0.00004997 [00:13:56] Epoch: 1 Batch: 3668/38378 (9.56%) Loss: 2.145755 LR: 0.00004997 [00:13:57] Epoch: 1 Batch: 3669/38378 (9.56%) Loss: 1.974259 LR: 0.00004997 [00:13:59] Epoch: 1 Batch: 3670/38378 (9.56%) Loss: 2.028040 LR: 0.00004997 [00:14:01] Epoch: 1 Batch: 3671/38378 (9.57%) Loss: 1.830349 LR: 0.00004997 [00:14:03] Epoch: 1 Batch: 3672/38378 (9.57%) Loss: 1.964195 LR: 0.00004997 [00:14:04] Epoch: 1 Batch: 3673/38378 (9.57%) Loss: 1.925146 LR: 0.00004997 [00:14:06] Epoch: 1 Batch: 3674/38378 (9.57%) Loss: 2.035786 LR: 0.00004997 [00:14:08] Epoch: 1 Batch: 3675/38378 (9.58%) Loss: 2.079034 LR: 0.00004997 [00:14:09] Epoch: 1 Batch: 3676/38378 (9.58%) Loss: 1.863224 LR: 0.00004997 [00:14:11] Epoch: 1 Batch: 3677/38378 (9.58%) Loss: 1.956241 LR: 0.00004997 [00:14:13] Epoch: 1 Batch: 3678/38378 (9.58%) Loss: 2.303366 LR: 0.00004997 [00:14:15] Epoch: 1 Batch: 3679/38378 (9.59%) Loss: 2.271150 LR: 0.00004997 [00:14:16] Epoch: 1 Batch: 3680/38378 (9.59%) Loss: 2.139923 LR: 0.00004997 [00:14:18] Epoch: 1 Batch: 3681/38378 (9.59%) Loss: 1.934603 LR: 0.00004997 [00:14:20] Epoch: 1 Batch: 3682/38378 (9.59%) Loss: 2.080581 LR: 0.00004997 [00:14:21] Epoch: 1 Batch: 3683/38378 (9.60%) Loss: 2.013627 LR: 0.00004997 [00:14:23] Epoch: 1 Batch: 3684/38378 (9.60%) Loss: 2.665467 LR: 0.00004997 [00:14:25] Epoch: 1 Batch: 3685/38378 (9.60%) Loss: 2.149057 LR: 0.00004997 [00:14:27] Epoch: 1 Batch: 3686/38378 (9.60%) Loss: 1.953933 LR: 0.00004997 [00:14:28] Epoch: 1 Batch: 3687/38378 (9.61%) Loss: 2.009007 LR: 0.00004997 [00:14:30] Epoch: 1 Batch: 3688/38378 (9.61%) Loss: 2.094461 LR: 0.00004997 [00:14:32] Epoch: 1 Batch: 3689/38378 (9.61%) Loss: 2.002739 LR: 0.00004997 [00:14:33] Epoch: 1 Batch: 3690/38378 (9.61%) Loss: 1.989227 LR: 0.00004997 [00:14:35] Epoch: 1 Batch: 3691/38378 (9.62%) Loss: 2.282540 LR: 0.00004997 [00:14:37] Epoch: 1 Batch: 3692/38378 (9.62%) Loss: 2.001654 LR: 0.00004997 [00:14:38] Epoch: 1 Batch: 3693/38378 (9.62%) Loss: 2.356913 LR: 0.00004997 [00:14:40] Epoch: 1 Batch: 3694/38378 (9.63%) Loss: 1.948015 LR: 0.00004997 [00:14:42] Epoch: 1 Batch: 3695/38378 (9.63%) Loss: 1.891243 LR: 0.00004997 [00:14:47] >> Cleaned up old temp checkpoint: epoch1_step3366 [00:14:47] >> Temp checkpoint saved: epoch1_step3696, size: 0.1702 GB [00:14:47] Epoch: 1 Batch: 3696/38378 (9.63%) Loss: 1.971847 LR: 0.00004997 [00:14:49] Epoch: 1 Batch: 3697/38378 (9.63%) Loss: 1.978511 LR: 0.00004997 [00:14:51] Epoch: 1 Batch: 3698/38378 (9.64%) Loss: 1.858000 LR: 0.00004997 [00:14:52] Epoch: 1 Batch: 3699/38378 (9.64%) Loss: 2.169906 LR: 0.00004997 [00:14:54] Epoch: 1 Batch: 3700/38378 (9.64%) Loss: 2.086829 LR: 0.00004997 [00:14:56] Epoch: 1 Batch: 3701/38378 (9.64%) Loss: 1.996521 LR: 0.00004997 [00:14:57] Epoch: 1 Batch: 3702/38378 (9.65%) Loss: 2.280440 LR: 0.00004997 [00:14:59] Epoch: 1 Batch: 3703/38378 (9.65%) Loss: 1.811304 LR: 0.00004997 [00:15:01] Epoch: 1 Batch: 3704/38378 (9.65%) Loss: 2.199296 LR: 0.00004997 [00:15:02] Epoch: 1 Batch: 3705/38378 (9.65%) Loss: 2.243775 LR: 0.00004997 [00:15:04] Epoch: 1 Batch: 3706/38378 (9.66%) Loss: 2.168764 LR: 0.00004997 [00:15:06] Epoch: 1 Batch: 3707/38378 (9.66%) Loss: 2.251110 LR: 0.00004997 [00:15:07] Epoch: 1 Batch: 3708/38378 (9.66%) Loss: 2.115487 LR: 0.00004997 [00:15:09] Epoch: 1 Batch: 3709/38378 (9.66%) Loss: 2.232844 LR: 0.00004997 [00:15:11] Epoch: 1 Batch: 3710/38378 (9.67%) Loss: 2.415325 
LR: 0.00004997 [00:15:13] Epoch: 1 Batch: 3711/38378 (9.67%) Loss: 1.997220 LR: 0.00004997 [00:15:14] Epoch: 1 Batch: 3712/38378 (9.67%) Loss: 2.104048 LR: 0.00004997 [00:15:16] Epoch: 1 Batch: 3713/38378 (9.67%) Loss: 2.108422 LR: 0.00004997 [00:15:18] Epoch: 1 Batch: 3714/38378 (9.68%) Loss: 1.864396 LR: 0.00004997 [00:15:20] Epoch: 1 Batch: 3715/38378 (9.68%) Loss: 2.112816 LR: 0.00004997 [00:15:21] Epoch: 1 Batch: 3716/38378 (9.68%) Loss: 2.023434 LR: 0.00004997 [00:15:23] Epoch: 1 Batch: 3717/38378 (9.69%) Loss: 1.828788 LR: 0.00004997 [00:15:25] Epoch: 1 Batch: 3718/38378 (9.69%) Loss: 1.980828 LR: 0.00004997 [00:15:27] Epoch: 1 Batch: 3719/38378 (9.69%) Loss: 2.159196 LR: 0.00004997 [00:15:28] Epoch: 1 Batch: 3720/38378 (9.69%) Loss: 2.322002 LR: 0.00004997 [00:15:30] Epoch: 1 Batch: 3721/38378 (9.70%) Loss: 2.184050 LR: 0.00004997 [00:15:32] Epoch: 1 Batch: 3722/38378 (9.70%) Loss: 2.096860 LR: 0.00004997 [00:15:33] Epoch: 1 Batch: 3723/38378 (9.70%) Loss: 1.945031 LR: 0.00004997 [00:15:35] Epoch: 1 Batch: 3724/38378 (9.70%) Loss: 2.066177 LR: 0.00004997 [00:15:37] Epoch: 1 Batch: 3725/38378 (9.71%) Loss: 2.250332 LR: 0.00004997 [00:15:39] Epoch: 1 Batch: 3726/38378 (9.71%) Loss: 2.000521 LR: 0.00004997 [00:15:40] Epoch: 1 Batch: 3727/38378 (9.71%) Loss: 2.296556 LR: 0.00004997 [00:15:42] Epoch: 1 Batch: 3728/38378 (9.71%) Loss: 2.097992 LR: 0.00004997 [00:15:48] >> Cleaned up old temp checkpoint: epoch1_step3399 [00:15:48] >> Temp checkpoint saved: epoch1_step3729, size: 0.1702 GB [00:15:48] Epoch: 1 Batch: 3729/38378 (9.72%) Loss: 1.701422 LR: 0.00004997 [00:15:49] Epoch: 1 Batch: 3730/38378 (9.72%) Loss: 2.216450 LR: 0.00004997 [00:15:51] Epoch: 1 Batch: 3731/38378 (9.72%) Loss: 1.919648 LR: 0.00004997 [00:15:53] Epoch: 1 Batch: 3732/38378 (9.72%) Loss: 2.271408 LR: 0.00004997 [00:15:54] Epoch: 1 Batch: 3733/38378 (9.73%) Loss: 2.199741 LR: 0.00004997 [00:15:56] Epoch: 1 Batch: 3734/38378 (9.73%) Loss: 2.011250 LR: 0.00004997 [00:15:58] Epoch: 1 Batch: 3735/38378 (9.73%) Loss: 1.846381 LR: 0.00004997 [00:15:59] Epoch: 1 Batch: 3736/38378 (9.73%) Loss: 2.024504 LR: 0.00004997 [00:16:01] Epoch: 1 Batch: 3737/38378 (9.74%) Loss: 2.170674 LR: 0.00004997 [00:16:03] Epoch: 1 Batch: 3738/38378 (9.74%) Loss: 2.077742 LR: 0.00004997 [00:16:05] Epoch: 1 Batch: 3739/38378 (9.74%) Loss: 1.920801 LR: 0.00004997 [00:16:06] Epoch: 1 Batch: 3740/38378 (9.75%) Loss: 1.930700 LR: 0.00004997 [00:16:08] Epoch: 1 Batch: 3741/38378 (9.75%) Loss: 1.904685 LR: 0.00004997 [00:16:10] Epoch: 1 Batch: 3742/38378 (9.75%) Loss: 2.151476 LR: 0.00004996 [00:16:11] Epoch: 1 Batch: 3743/38378 (9.75%) Loss: 2.159115 LR: 0.00004996 [00:16:13] Epoch: 1 Batch: 3744/38378 (9.76%) Loss: 1.948646 LR: 0.00004996 [00:16:15] Epoch: 1 Batch: 3745/38378 (9.76%) Loss: 2.029473 LR: 0.00004996 [00:16:16] Epoch: 1 Batch: 3746/38378 (9.76%) Loss: 2.058630 LR: 0.00004996 [00:16:18] Epoch: 1 Batch: 3747/38378 (9.76%) Loss: 1.946703 LR: 0.00004996 [00:16:20] Epoch: 1 Batch: 3748/38378 (9.77%) Loss: 2.081581 LR: 0.00004996 [00:16:22] Epoch: 1 Batch: 3749/38378 (9.77%) Loss: 1.918032 LR: 0.00004996 [00:16:23] Epoch: 1 Batch: 3750/38378 (9.77%) Loss: 2.431919 LR: 0.00004996 [00:16:25] Epoch: 1 Batch: 3751/38378 (9.77%) Loss: 2.103381 LR: 0.00004996 [00:16:27] Epoch: 1 Batch: 3752/38378 (9.78%) Loss: 2.035605 LR: 0.00004996 [00:16:28] Epoch: 1 Batch: 3753/38378 (9.78%) Loss: 2.116280 LR: 0.00004996 [00:16:30] Epoch: 1 Batch: 3754/38378 (9.78%) Loss: 1.990209 LR: 0.00004996 [00:16:32] Epoch: 1 Batch: 3755/38378 (9.78%) Loss: 1.789263 
LR: 0.00004996 [00:16:33] Epoch: 1 Batch: 3756/38378 (9.79%) Loss: 1.706761 LR: 0.00004996 [00:16:35] Epoch: 1 Batch: 3757/38378 (9.79%) Loss: 2.290510 LR: 0.00004996 [00:16:37] Epoch: 1 Batch: 3758/38378 (9.79%) Loss: 1.988681 LR: 0.00004996 [00:16:39] Epoch: 1 Batch: 3759/38378 (9.79%) Loss: 2.286469 LR: 0.00004996 [00:16:40] Epoch: 1 Batch: 3760/38378 (9.80%) Loss: 1.969938 LR: 0.00004996 [00:16:42] Epoch: 1 Batch: 3761/38378 (9.80%) Loss: 1.854371 LR: 0.00004996 [00:16:48] >> Cleaned up old temp checkpoint: epoch1_step3432 [00:16:48] >> Temp checkpoint saved: epoch1_step3762, size: 0.1702 GB [00:16:48] Epoch: 1 Batch: 3762/38378 (9.80%) Loss: 2.241086 LR: 0.00004996 [00:16:49] Epoch: 1 Batch: 3763/38378 (9.81%) Loss: 1.934888 LR: 0.00004996 [00:16:51] Epoch: 1 Batch: 3764/38378 (9.81%) Loss: 1.922343 LR: 0.00004996 [00:16:53] Epoch: 1 Batch: 3765/38378 (9.81%) Loss: 1.927464 LR: 0.00004996 [00:16:54] Epoch: 1 Batch: 3766/38378 (9.81%) Loss: 2.080988 LR: 0.00004996 [00:16:56] Epoch: 1 Batch: 3767/38378 (9.82%) Loss: 2.141328 LR: 0.00004996 [00:16:58] Epoch: 1 Batch: 3768/38378 (9.82%) Loss: 1.762515 LR: 0.00004996 [00:17:00] Epoch: 1 Batch: 3769/38378 (9.82%) Loss: 2.390140 LR: 0.00004996 [00:17:01] Epoch: 1 Batch: 3770/38378 (9.82%) Loss: 1.770986 LR: 0.00004996 [00:17:03] Epoch: 1 Batch: 3771/38378 (9.83%) Loss: 1.847224 LR: 0.00004996 [00:17:05] Epoch: 1 Batch: 3772/38378 (9.83%) Loss: 1.996838 LR: 0.00004996 [00:17:06] Epoch: 1 Batch: 3773/38378 (9.83%) Loss: 1.926503 LR: 0.00004996 [00:17:08] Epoch: 1 Batch: 3774/38378 (9.83%) Loss: 2.328197 LR: 0.00004996 [00:17:10] Epoch: 1 Batch: 3775/38378 (9.84%) Loss: 2.078576 LR: 0.00004996 [00:17:12] Epoch: 1 Batch: 3776/38378 (9.84%) Loss: 1.823912 LR: 0.00004996 [00:17:13] Epoch: 1 Batch: 3777/38378 (9.84%) Loss: 2.307814 LR: 0.00004996 [00:17:15] Epoch: 1 Batch: 3778/38378 (9.84%) Loss: 2.190810 LR: 0.00004996 [00:17:17] Epoch: 1 Batch: 3779/38378 (9.85%) Loss: 1.889884 LR: 0.00004996 [00:17:18] Epoch: 1 Batch: 3780/38378 (9.85%) Loss: 2.083631 LR: 0.00004996 [00:17:20] Epoch: 1 Batch: 3781/38378 (9.85%) Loss: 2.005529 LR: 0.00004996 [00:17:22] Epoch: 1 Batch: 3782/38378 (9.85%) Loss: 2.266496 LR: 0.00004996 [00:17:24] Epoch: 1 Batch: 3783/38378 (9.86%) Loss: 1.835557 LR: 0.00004996 [00:17:25] Epoch: 1 Batch: 3784/38378 (9.86%) Loss: 2.358848 LR: 0.00004996 [00:17:27] Epoch: 1 Batch: 3785/38378 (9.86%) Loss: 2.013807 LR: 0.00004996 [00:17:29] Epoch: 1 Batch: 3786/38378 (9.87%) Loss: 2.118159 LR: 0.00004996 [00:17:30] Epoch: 1 Batch: 3787/38378 (9.87%) Loss: 2.133390 LR: 0.00004996 [00:17:32] Epoch: 1 Batch: 3788/38378 (9.87%) Loss: 2.180889 LR: 0.00004996 [00:17:34] Epoch: 1 Batch: 3789/38378 (9.87%) Loss: 2.028257 LR: 0.00004996 [00:17:36] Epoch: 1 Batch: 3790/38378 (9.88%) Loss: 2.104965 LR: 0.00004996 [00:17:37] Epoch: 1 Batch: 3791/38378 (9.88%) Loss: 1.939509 LR: 0.00004996 [00:17:39] Epoch: 1 Batch: 3792/38378 (9.88%) Loss: 2.011513 LR: 0.00004996 [00:17:41] Epoch: 1 Batch: 3793/38378 (9.88%) Loss: 2.074126 LR: 0.00004996 [00:17:42] Epoch: 1 Batch: 3794/38378 (9.89%) Loss: 2.078220 LR: 0.00004996 [00:17:48] >> Cleaned up old temp checkpoint: epoch1_step3465 [00:17:48] >> Temp checkpoint saved: epoch1_step3795, size: 0.1702 GB [00:17:48] Epoch: 1 Batch: 3795/38378 (9.89%) Loss: 2.020704 LR: 0.00004996 [00:17:50] Epoch: 1 Batch: 3796/38378 (9.89%) Loss: 2.027433 LR: 0.00004996 [00:17:51] Epoch: 1 Batch: 3797/38378 (9.89%) Loss: 2.275753 LR: 0.00004996 [00:17:53] Epoch: 1 Batch: 3798/38378 (9.90%) Loss: 2.210509 LR: 0.00004996 
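
On the LR column: the scheduler advances once per optimizer step rather than per batch, and with accum_steps 7 the 38378 batches map onto the 5482 total steps logged at resume (38378 / 7 ≈ 5482), with 439 warmup steps (0.08 × 5482 ≈ 438.6, which would also explain the 438-versus-439 wobble in the resume lines). Batch 3798 corresponds to roughly optimizer step 543, just past warmup, where the cosine has barely decayed, hence the near-peak 0.00004996 readings. The same stepping granularity explains the LR: 0.00000000 entries right after the restart: the restored schedule is only applied at the first optimizer step, seven batches in (batch 3637). A sketch of an equivalent schedule using transformers' cosine warmup helper, with torch.optim.AdamW as a stand-in for the PagedAdamW in the log, and not modeling the separate lr_floor: 1e-05 setting, which the script presumably enforces on its own:

import torch
from transformers import get_cosine_schedule_with_warmup

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for the LoRA parameters
optimizer = torch.optim.AdamW(params, lr=5e-05, weight_decay=0.0, betas=(0.9, 0.95))
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=439,     # warmup_ratio 0.08 of ~5482 optimizer steps
    num_training_steps=5482,  # 38378 batches / accum_steps 7
)
for _ in range(543):          # in the real run: one step per accum_steps = 7 batches
    optimizer.step()
    scheduler.step()
print(scheduler.get_last_lr())  # ~5e-05, matching the log at this point in training
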
[00:17:55] Epoch: 1 Batch: 3799/38378 (9.90%) Loss: 2.255682 LR: 0.00004996 [00:17:56] Epoch: 1 Batch: 3800/38378 (9.90%) Loss: 2.054657 LR: 0.00004996 [00:17:58] Epoch: 1 Batch: 3801/38378 (9.90%) Loss: 1.852100 LR: 0.00004996 [00:18:00] Epoch: 1 Batch: 3802/38378 (9.91%) Loss: 2.092054 LR: 0.00004996 [00:18:02] Epoch: 1 Batch: 3803/38378 (9.91%) Loss: 2.149859 LR: 0.00004996 [00:18:03] Epoch: 1 Batch: 3804/38378 (9.91%) Loss: 2.049251 LR: 0.00004996 [00:18:05] Epoch: 1 Batch: 3805/38378 (9.91%) Loss: 1.992502 LR: 0.00004996 [00:18:07] Epoch: 1 Batch: 3806/38378 (9.92%) Loss: 1.946724 LR: 0.00004996 [00:18:08] Epoch: 1 Batch: 3807/38378 (9.92%) Loss: 1.935066 LR: 0.00004996 [00:18:11] Epoch: 1 Batch: 3808/38378 (9.92%) Loss: 2.081296 LR: 0.00004996 [00:18:12] Epoch: 1 Batch: 3809/38378 (9.92%) Loss: 2.275548 LR: 0.00004996 [00:18:14] Epoch: 1 Batch: 3810/38378 (9.93%) Loss: 2.165495 LR: 0.00004996 [00:18:16] Epoch: 1 Batch: 3811/38378 (9.93%) Loss: 2.082064 LR: 0.00004996 [00:18:17] Epoch: 1 Batch: 3812/38378 (9.93%) Loss: 2.125441 LR: 0.00004996 [00:18:19] Epoch: 1 Batch: 3813/38378 (9.94%) Loss: 2.084312 LR: 0.00004996 [00:18:21] Epoch: 1 Batch: 3814/38378 (9.94%) Loss: 2.103895 LR: 0.00004996 [00:18:22] Epoch: 1 Batch: 3815/38378 (9.94%) Loss: 2.211685 LR: 0.00004996 [00:18:24] Epoch: 1 Batch: 3816/38378 (9.94%) Loss: 1.962742 LR: 0.00004996 [00:18:26] Epoch: 1 Batch: 3817/38378 (9.95%) Loss: 2.055504 LR: 0.00004996 [00:18:28] Epoch: 1 Batch: 3818/38378 (9.95%) Loss: 2.235935 LR: 0.00004996 [00:18:29] Epoch: 1 Batch: 3819/38378 (9.95%) Loss: 2.001931 LR: 0.00004996 [00:18:31] Epoch: 1 Batch: 3820/38378 (9.95%) Loss: 2.199240 LR: 0.00004996 [00:18:33] Epoch: 1 Batch: 3821/38378 (9.96%) Loss: 2.023818 LR: 0.00004996 [00:18:34] Epoch: 1 Batch: 3822/38378 (9.96%) Loss: 2.108021 LR: 0.00004996 [00:18:36] Epoch: 1 Batch: 3823/38378 (9.96%) Loss: 2.065312 LR: 0.00004996 [00:18:38] Epoch: 1 Batch: 3824/38378 (9.96%) Loss: 1.872877 LR: 0.00004996 [00:18:40] Epoch: 1 Batch: 3825/38378 (9.97%) Loss: 2.104890 LR: 0.00004996 [00:18:41] Epoch: 1 Batch: 3826/38378 (9.97%) Loss: 1.948580 LR: 0.00004996 [00:18:43] Epoch: 1 Batch: 3827/38378 (9.97%) Loss: 2.189779 LR: 0.00004996 [00:18:49] >> Cleaned up old temp checkpoint: epoch1_step3498 [00:18:49] >> Temp checkpoint saved: epoch1_step3828, size: 0.1702 GB [00:18:49] Epoch: 1 Batch: 3828/38378 (9.97%) Loss: 2.157942 LR: 0.00004996 [00:18:51] Epoch: 1 Batch: 3829/38378 (9.98%) Loss: 2.334561 LR: 0.00004996 [00:18:52] Epoch: 1 Batch: 3830/38378 (9.98%) Loss: 2.122692 LR: 0.00004996 [00:18:54] Epoch: 1 Batch: 3831/38378 (9.98%) Loss: 2.015479 LR: 0.00004996 [00:18:56] Epoch: 1 Batch: 3832/38378 (9.98%) Loss: 2.040537 LR: 0.00004996 [00:18:58] Epoch: 1 Batch: 3833/38378 (9.99%) Loss: 1.856259 LR: 0.00004995 [00:18:59] Epoch: 1 Batch: 3834/38378 (9.99%) Loss: 1.896581 LR: 0.00004995 [00:19:01] Epoch: 1 Batch: 3835/38378 (9.99%) Loss: 2.485229 LR: 0.00004995 [00:19:03] Epoch: 1 Batch: 3836/38378 (10.00%) Loss: 1.956606 LR: 0.00004995 [00:19:04] Epoch: 1 Batch: 3837/38378 (10.00%) Loss: 2.082857 LR: 0.00004995 [00:19:06] Epoch: 1 Batch: 3838/38378 (10.00%) Loss: 2.352217 LR: 0.00004995 [00:19:08] Epoch: 1 Batch: 3839/38378 (10.00%) Loss: 2.067350 LR: 0.00004995 [00:19:10] Epoch: 1 Batch: 3840/38378 (10.01%) Loss: 2.244382 LR: 0.00004995 [00:19:11] Epoch: 1 Batch: 3841/38378 (10.01%) Loss: 2.366728 LR: 0.00004995 [00:19:13] Epoch: 1 Batch: 3842/38378 (10.01%) Loss: 2.008995 LR: 0.00004995 [00:19:15] Epoch: 1 Batch: 3843/38378 (10.01%) Loss: 1.774190 LR: 
0.00004995 [00:19:16] Epoch: 1 Batch: 3844/38378 (10.02%) Loss: 1.724624 LR: 0.00004995 [00:19:18] Epoch: 1 Batch: 3845/38378 (10.02%) Loss: 2.097665 LR: 0.00004995 [00:19:20] Epoch: 1 Batch: 3846/38378 (10.02%) Loss: 2.215500 LR: 0.00004995 [00:19:22] Epoch: 1 Batch: 3847/38378 (10.02%) Loss: 1.930366 LR: 0.00004995 [00:19:23] Epoch: 1 Batch: 3848/38378 (10.03%) Loss: 2.059847 LR: 0.00004995 [00:19:25] Epoch: 1 Batch: 3849/38378 (10.03%) Loss: 2.121897 LR: 0.00004995 [00:19:27] Epoch: 1 Batch: 3850/38378 (10.03%) Loss: 1.928489 LR: 0.00004995 [00:19:28] Epoch: 1 Batch: 3851/38378 (10.03%) Loss: 2.222757 LR: 0.00004995 [00:19:30] Epoch: 1 Batch: 3852/38378 (10.04%) Loss: 1.960623 LR: 0.00004995 [00:19:32] Epoch: 1 Batch: 3853/38378 (10.04%) Loss: 2.009138 LR: 0.00004995 [00:19:34] Epoch: 1 Batch: 3854/38378 (10.04%) Loss: 1.935373 LR: 0.00004995 [00:19:35] Epoch: 1 Batch: 3855/38378 (10.04%) Loss: 2.026858 LR: 0.00004995 [00:19:37] Epoch: 1 Batch: 3856/38378 (10.05%) Loss: 2.362399 LR: 0.00004995 [00:19:39] Epoch: 1 Batch: 3857/38378 (10.05%) Loss: 2.057836 LR: 0.00004995 [00:19:40] Epoch: 1 Batch: 3858/38378 (10.05%) Loss: 1.793245 LR: 0.00004995 [00:19:42] Epoch: 1 Batch: 3859/38378 (10.06%) Loss: 2.077559 LR: 0.00004995 [00:19:44] Epoch: 1 Batch: 3860/38378 (10.06%) Loss: 2.135655 LR: 0.00004995 [00:19:49] >> Cleaned up old temp checkpoint: epoch1_step3531 [00:19:50] >> Temp checkpoint saved: epoch1_step3861, size: 0.1702 GB [00:19:50] Epoch: 1 Batch: 3861/38378 (10.06%) Loss: 2.104693 LR: 0.00004995 [00:19:51] Epoch: 1 Batch: 3862/38378 (10.06%) Loss: 2.050349 LR: 0.00004995 [00:19:53] Epoch: 1 Batch: 3863/38378 (10.07%) Loss: 2.144490 LR: 0.00004995 [00:19:55] Epoch: 1 Batch: 3864/38378 (10.07%) Loss: 2.178722 LR: 0.00004995 [00:19:56] Epoch: 1 Batch: 3865/38378 (10.07%) Loss: 1.920578 LR: 0.00004995 [00:19:58] Epoch: 1 Batch: 3866/38378 (10.07%) Loss: 2.056083 LR: 0.00004995 [00:20:00] Epoch: 1 Batch: 3867/38378 (10.08%) Loss: 1.976148 LR: 0.00004995 [00:20:01] Epoch: 1 Batch: 3868/38378 (10.08%) Loss: 1.989179 LR: 0.00004995 [00:20:03] Epoch: 1 Batch: 3869/38378 (10.08%) Loss: 2.046777 LR: 0.00004995 [00:20:05] Epoch: 1 Batch: 3870/38378 (10.08%) Loss: 2.623630 LR: 0.00004995 [00:20:07] Epoch: 1 Batch: 3871/38378 (10.09%) Loss: 2.141826 LR: 0.00004995 [00:20:08] Epoch: 1 Batch: 3872/38378 (10.09%) Loss: 2.063361 LR: 0.00004995 [00:20:10] Epoch: 1 Batch: 3873/38378 (10.09%) Loss: 2.081761 LR: 0.00004995 [00:20:12] Epoch: 1 Batch: 3874/38378 (10.09%) Loss: 1.897465 LR: 0.00004995 [00:20:13] Epoch: 1 Batch: 3875/38378 (10.10%) Loss: 2.085907 LR: 0.00004995 [00:20:15] Epoch: 1 Batch: 3876/38378 (10.10%) Loss: 2.036105 LR: 0.00004995 [00:20:17] Epoch: 1 Batch: 3877/38378 (10.10%) Loss: 2.102937 LR: 0.00004995 [00:20:19] Epoch: 1 Batch: 3878/38378 (10.10%) Loss: 2.284543 LR: 0.00004995 [00:20:20] Epoch: 1 Batch: 3879/38378 (10.11%) Loss: 2.084446 LR: 0.00004995 [00:20:22] Epoch: 1 Batch: 3880/38378 (10.11%) Loss: 1.972124 LR: 0.00004995 [00:20:24] Epoch: 1 Batch: 3881/38378 (10.11%) Loss: 1.939434 LR: 0.00004995 [00:20:25] Epoch: 1 Batch: 3882/38378 (10.12%) Loss: 2.011525 LR: 0.00004995 [00:20:27] Epoch: 1 Batch: 3883/38378 (10.12%) Loss: 2.008310 LR: 0.00004995 [00:20:29] Epoch: 1 Batch: 3884/38378 (10.12%) Loss: 2.240626 LR: 0.00004995 [00:20:31] Epoch: 1 Batch: 3885/38378 (10.12%) Loss: 1.978885 LR: 0.00004995 [00:20:32] Epoch: 1 Batch: 3886/38378 (10.13%) Loss: 2.090805 LR: 0.00004995 [00:20:34] Epoch: 1 Batch: 3887/38378 (10.13%) Loss: 1.895073 LR: 0.00004995 [00:20:36] Epoch: 1 
Batch: 3888/38378 (10.13%) Loss: 2.048081 LR: 0.00004995 [00:20:37] Epoch: 1 Batch: 3889/38378 (10.13%) Loss: 1.896739 LR: 0.00004995 [00:20:39] Epoch: 1 Batch: 3890/38378 (10.14%) Loss: 2.179632 LR: 0.00004995 [00:20:41] Epoch: 1 Batch: 3891/38378 (10.14%) Loss: 1.997189 LR: 0.00004995 [00:20:43] Epoch: 1 Batch: 3892/38378 (10.14%) Loss: 1.993140 LR: 0.00004995 [00:20:44] Epoch: 1 Batch: 3893/38378 (10.14%) Loss: 1.889229 LR: 0.00004995 [00:20:50] >> Cleaned up old temp checkpoint: epoch1_step3564 [00:20:50] >> Temp checkpoint saved: epoch1_step3894, size: 0.1702 GB [00:20:50] Epoch: 1 Batch: 3894/38378 (10.15%) Loss: 2.150344 LR: 0.00004995 [00:20:52] Epoch: 1 Batch: 3895/38378 (10.15%) Loss: 1.816465 LR: 0.00004995 [00:20:53] Epoch: 1 Batch: 3896/38378 (10.15%) Loss: 2.224157 LR: 0.00004995 [00:20:55] Epoch: 1 Batch: 3897/38378 (10.15%) Loss: 2.139259 LR: 0.00004995 [00:20:57] Epoch: 1 Batch: 3898/38378 (10.16%) Loss: 1.863078 LR: 0.00004995 [00:20:58] Epoch: 1 Batch: 3899/38378 (10.16%) Loss: 2.322302 LR: 0.00004995 [00:21:00] Epoch: 1 Batch: 3900/38378 (10.16%) Loss: 1.874993 LR: 0.00004995 [00:21:02] Epoch: 1 Batch: 3901/38378 (10.16%) Loss: 2.169041 LR: 0.00004995 [00:21:03] Epoch: 1 Batch: 3902/38378 (10.17%) Loss: 1.944709 LR: 0.00004995 [00:21:05] Epoch: 1 Batch: 3903/38378 (10.17%) Loss: 2.057567 LR: 0.00004995 [00:21:07] Epoch: 1 Batch: 3904/38378 (10.17%) Loss: 1.924797 LR: 0.00004995 [00:21:09] Epoch: 1 Batch: 3905/38378 (10.18%) Loss: 2.139266 LR: 0.00004995 [00:21:10] Epoch: 1 Batch: 3906/38378 (10.18%) Loss: 2.095165 LR: 0.00004995 [00:21:12] Epoch: 1 Batch: 3907/38378 (10.18%) Loss: 2.058807 LR: 0.00004995 [00:21:14] Epoch: 1 Batch: 3908/38378 (10.18%) Loss: 2.149660 LR: 0.00004995 [00:21:15] Epoch: 1 Batch: 3909/38378 (10.19%) Loss: 1.928863 LR: 0.00004995 [00:21:17] Epoch: 1 Batch: 3910/38378 (10.19%) Loss: 1.906671 LR: 0.00004995 [00:21:19] Epoch: 1 Batch: 3911/38378 (10.19%) Loss: 1.831699 LR: 0.00004995 [00:21:21] Epoch: 1 Batch: 3912/38378 (10.19%) Loss: 1.999679 LR: 0.00004995 [00:21:22] Epoch: 1 Batch: 3913/38378 (10.20%) Loss: 2.183829 LR: 0.00004995 [00:21:24] Epoch: 1 Batch: 3914/38378 (10.20%) Loss: 2.107630 LR: 0.00004995 [00:21:26] Epoch: 1 Batch: 3915/38378 (10.20%) Loss: 2.146065 LR: 0.00004995 [00:21:27] Epoch: 1 Batch: 3916/38378 (10.20%) Loss: 2.008844 LR: 0.00004995 [00:21:29] Epoch: 1 Batch: 3917/38378 (10.21%) Loss: 2.279676 LR: 0.00004994 [00:21:31] Epoch: 1 Batch: 3918/38378 (10.21%) Loss: 2.194450 LR: 0.00004994 [00:21:33] Epoch: 1 Batch: 3919/38378 (10.21%) Loss: 1.949866 LR: 0.00004994 [00:21:34] Epoch: 1 Batch: 3920/38378 (10.21%) Loss: 2.027902 LR: 0.00004994 [00:21:36] Epoch: 1 Batch: 3921/38378 (10.22%) Loss: 1.925706 LR: 0.00004994 [00:21:38] Epoch: 1 Batch: 3922/38378 (10.22%) Loss: 2.038238 LR: 0.00004994 [00:21:39] Epoch: 1 Batch: 3923/38378 (10.22%) Loss: 2.243536 LR: 0.00004994 [00:21:41] Epoch: 1 Batch: 3924/38378 (10.22%) Loss: 2.074098 LR: 0.00004994 [00:21:43] Epoch: 1 Batch: 3925/38378 (10.23%) Loss: 2.096351 LR: 0.00004994 [00:21:45] Epoch: 1 Batch: 3926/38378 (10.23%) Loss: 1.988287 LR: 0.00004994 [00:21:50] >> Cleaned up old temp checkpoint: epoch1_step3597 [00:21:50] >> Temp checkpoint saved: epoch1_step3927, size: 0.1702 GB [00:21:50] Epoch: 1 Batch: 3927/38378 (10.23%) Loss: 2.562402 LR: 0.00004994 [00:21:52] Epoch: 1 Batch: 3928/38378 (10.24%) Loss: 1.873113 LR: 0.00004994 [00:21:53] Epoch: 1 Batch: 3929/38378 (10.24%) Loss: 2.066786 LR: 0.00004994 [00:21:55] Epoch: 1 Batch: 3930/38378 (10.24%) Loss: 1.909242 LR: 0.00004994 
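
For context on the per-batch Loss figures: they are raw, unscaled batch losses; with batch_size 5 and accum_steps 7 the gradients average over an effective batch of 35 sequences of up to max_length 150 tokens, and label_smoothing: 0.05 keeps the attainable loss floor a bit above what plain cross-entropy would give. A minimal sketch of a matching accumulation step, computing the smoothed loss manually since the stock model forward does not apply label smoothing; model, loader, optimizer, and scheduler are assumed arguments, not taken from the script:

import torch

def train_epoch(model, loader, optimizer, scheduler, device="cuda", accum=7):
    loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.05, ignore_index=-100)
    model.train()
    for i, batch in enumerate(loader, start=1):
        input_ids = batch["input_ids"].to(device)
        attention_mask = batch["attention_mask"].to(device)
        out = model(input_ids=input_ids, attention_mask=attention_mask)
        # Shift logits/labels so each token predicts the next one.
        logits = out.logits[:, :-1, :].contiguous()
        labels = input_ids[:, 1:].contiguous().clone()
        labels[attention_mask[:, 1:] == 0] = -100   # ignore padding positions
        loss = loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))
        (loss / accum).backward()                   # average gradients over 7 batches
        if i % accum == 0:
            optimizer.step()
            scheduler.step()                        # LR advances only here
            optimizer.zero_grad()
        print(f"Loss: {loss.item():.6f}")           # unscaled, as in the log lines
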
[00:21:57] Epoch: 1 Batch: 3931/38378 (10.24%) Loss: 1.881066 LR: 0.00004994 [00:21:58] Epoch: 1 Batch: 3932/38378 (10.25%) Loss: 2.118363 LR: 0.00004994 [00:22:00] Epoch: 1 Batch: 3933/38378 (10.25%) Loss: 2.074937 LR: 0.00004994 [00:22:02] Epoch: 1 Batch: 3934/38378 (10.25%) Loss: 2.079181 LR: 0.00004994 [00:22:04] Epoch: 1 Batch: 3935/38378 (10.25%) Loss: 2.017228 LR: 0.00004994 [00:22:05] Epoch: 1 Batch: 3936/38378 (10.26%) Loss: 2.098662 LR: 0.00004994 [00:22:07] Epoch: 1 Batch: 3937/38378 (10.26%) Loss: 2.224152 LR: 0.00004994 [00:22:09] Epoch: 1 Batch: 3938/38378 (10.26%) Loss: 2.307223 LR: 0.00004994 [00:22:10] Epoch: 1 Batch: 3939/38378 (10.26%) Loss: 1.768886 LR: 0.00004994 [00:22:12] Epoch: 1 Batch: 3940/38378 (10.27%) Loss: 2.060860 LR: 0.00004994 [00:22:14] Epoch: 1 Batch: 3941/38378 (10.27%) Loss: 2.068433 LR: 0.00004994 [00:22:15] Epoch: 1 Batch: 3942/38378 (10.27%) Loss: 2.068700 LR: 0.00004994 [00:22:17] Epoch: 1 Batch: 3943/38378 (10.27%) Loss: 2.137808 LR: 0.00004994 [00:22:19] Epoch: 1 Batch: 3944/38378 (10.28%) Loss: 2.056303 LR: 0.00004994 [00:22:21] Epoch: 1 Batch: 3945/38378 (10.28%) Loss: 1.761443 LR: 0.00004994 [00:22:22] Epoch: 1 Batch: 3946/38378 (10.28%) Loss: 2.099188 LR: 0.00004994 [00:22:24] Epoch: 1 Batch: 3947/38378 (10.28%) Loss: 2.856028 LR: 0.00004994 [00:22:26] Epoch: 1 Batch: 3948/38378 (10.29%) Loss: 1.943787 LR: 0.00004994 [00:22:27] Epoch: 1 Batch: 3949/38378 (10.29%) Loss: 1.982280 LR: 0.00004994 [00:22:29] Epoch: 1 Batch: 3950/38378 (10.29%) Loss: 1.812521 LR: 0.00004994 [00:22:31] Epoch: 1 Batch: 3951/38378 (10.29%) Loss: 2.007506 LR: 0.00004994 [00:22:32] Epoch: 1 Batch: 3952/38378 (10.30%) Loss: 2.491319 LR: 0.00004994 [00:22:34] Epoch: 1 Batch: 3953/38378 (10.30%) Loss: 2.116142 LR: 0.00004994 [00:22:36] Epoch: 1 Batch: 3954/38378 (10.30%) Loss: 1.979151 LR: 0.00004994 [00:22:38] Epoch: 1 Batch: 3955/38378 (10.31%) Loss: 1.816672 LR: 0.00004994 [00:22:39] Epoch: 1 Batch: 3956/38378 (10.31%) Loss: 2.293830 LR: 0.00004994 [00:22:41] Epoch: 1 Batch: 3957/38378 (10.31%) Loss: 2.071482 LR: 0.00004994 [00:22:43] Epoch: 1 Batch: 3958/38378 (10.31%) Loss: 1.727024 LR: 0.00004994 [00:22:45] Epoch: 1 Batch: 3959/38378 (10.32%) Loss: 2.100447 LR: 0.00004994 [00:22:50] >> Cleaned up old temp checkpoint: epoch1_step3630 [00:22:50] >> Deleted old temp checkpoint zip: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step3630.zip [00:22:50] >> Temp checkpoint saved: epoch1_step3960, size: 0.1702 GB [00:22:50] Epoch: 1 Batch: 3960/38378 (10.32%) Loss: 2.111247 LR: 0.00004994 [00:22:52] Epoch: 1 Batch: 3961/38378 (10.32%) Loss: 2.284051 LR: 0.00004994 [00:22:53] Epoch: 1 Batch: 3962/38378 (10.32%) Loss: 2.525779 LR: 0.00004994 [00:22:55] Epoch: 1 Batch: 3963/38378 (10.33%) Loss: 2.052404 LR: 0.00004994 [00:22:57] Epoch: 1 Batch: 3964/38378 (10.33%) Loss: 1.799427 LR: 0.00004994 [00:22:58] Epoch: 1 Batch: 3965/38378 (10.33%) Loss: 2.496456 LR: 0.00004994 [00:23:00] Epoch: 1 Batch: 3966/38378 (10.33%) Loss: 2.058163 LR: 0.00004994 [00:23:02] Epoch: 1 Batch: 3967/38378 (10.34%) Loss: 2.057737 LR: 0.00004994 [00:23:04] Epoch: 1 Batch: 3968/38378 (10.34%) Loss: 1.773966 LR: 0.00004994 [00:23:05] Epoch: 1 Batch: 3969/38378 (10.34%) Loss: 2.026609 LR: 0.00004994 [00:23:07] Epoch: 1 Batch: 3970/38378 (10.34%) Loss: 2.049850 LR: 0.00004994 [00:23:09] Epoch: 1 Batch: 3971/38378 (10.35%) Loss: 1.992007 LR: 0.00004994 [00:23:10] Epoch: 1 Batch: 3972/38378 (10.35%) Loss: 2.196543 LR: 0.00004994 [00:23:12] Epoch: 1 Batch: 3973/38378 (10.35%) Loss: 1.992464 
LR: 0.00004994 [00:23:14] Epoch: 1 Batch: 3974/38378 (10.35%) Loss: 2.179340 LR: 0.00004994 [00:23:16] Epoch: 1 Batch: 3975/38378 (10.36%) Loss: 2.279162 LR: 0.00004994 [00:23:17] Epoch: 1 Batch: 3976/38378 (10.36%) Loss: 1.911385 LR: 0.00004994 [00:23:19] Epoch: 1 Batch: 3977/38378 (10.36%) Loss: 2.069915 LR: 0.00004994 [00:23:21] Epoch: 1 Batch: 3978/38378 (10.37%) Loss: 2.139474 LR: 0.00004994 [00:23:23] Epoch: 1 Batch: 3979/38378 (10.37%) Loss: 1.858873 LR: 0.00004994 [00:23:24] Epoch: 1 Batch: 3980/38378 (10.37%) Loss: 1.782772 LR: 0.00004994 [00:23:26] Epoch: 1 Batch: 3981/38378 (10.37%) Loss: 2.099753 LR: 0.00004994 [00:23:28] Epoch: 1 Batch: 3982/38378 (10.38%) Loss: 2.055882 LR: 0.00004994 [00:23:29] Epoch: 1 Batch: 3983/38378 (10.38%) Loss: 1.930809 LR: 0.00004994 [00:23:31] Epoch: 1 Batch: 3984/38378 (10.38%) Loss: 1.932184 LR: 0.00004994 [00:23:33] Epoch: 1 Batch: 3985/38378 (10.38%) Loss: 1.805502 LR: 0.00004994 [00:23:35] Epoch: 1 Batch: 3986/38378 (10.39%) Loss: 2.140358 LR: 0.00004994 [00:23:36] Epoch: 1 Batch: 3987/38378 (10.39%) Loss: 2.015846 LR: 0.00004993 [00:23:38] Epoch: 1 Batch: 3988/38378 (10.39%) Loss: 2.181654 LR: 0.00004993 [00:23:40] Epoch: 1 Batch: 3989/38378 (10.39%) Loss: 2.493454 LR: 0.00004993 [00:23:42] Epoch: 1 Batch: 3990/38378 (10.40%) Loss: 2.203938 LR: 0.00004993 [00:23:43] Epoch: 1 Batch: 3991/38378 (10.40%) Loss: 1.734715 LR: 0.00004993 [00:23:45] Epoch: 1 Batch: 3992/38378 (10.40%) Loss: 2.071052 LR: 0.00004993 [00:23:50] >> Cleaned up old temp checkpoint: epoch1_step3663 [00:23:50] >> Temp checkpoint saved: epoch1_step3993, size: 0.1702 GB [00:23:50] Epoch: 1 Batch: 3993/38378 (10.40%) Loss: 2.312405 LR: 0.00004993 [00:23:52] Epoch: 1 Batch: 3994/38378 (10.41%) Loss: 1.811478 LR: 0.00004993 [00:23:54] Epoch: 1 Batch: 3995/38378 (10.41%) Loss: 1.747792 LR: 0.00004993 [00:23:55] Epoch: 1 Batch: 3996/38378 (10.41%) Loss: 2.076281 LR: 0.00004993 [00:23:57] Epoch: 1 Batch: 3997/38378 (10.41%) Loss: 1.857183 LR: 0.00004993 [00:23:59] Epoch: 1 Batch: 3998/38378 (10.42%) Loss: 2.019402 LR: 0.00004993 [00:24:00] Epoch: 1 Batch: 3999/38378 (10.42%) Loss: 1.974579 LR: 0.00004993 [00:24:02] >> Evaluating batch 0 [00:24:03] >> Evaluating batch 1 [00:24:04] >> Evaluating batch 2 [00:24:05] >> Evaluating batch 3 [00:24:06] >> Evaluating batch 4 [00:24:07] >> Evaluating batch 5 [00:24:08] >> Evaluating batch 6 [00:24:09] >> Evaluating batch 7 [00:24:10] >> Evaluating batch 8 [00:24:11] >> Evaluating batch 9 [00:24:12] >> Evaluating batch 10 [00:24:12] >> Evaluating batch 11 [00:24:13] >> Evaluating batch 12 [00:24:14] >> Evaluating batch 13 [00:24:15] >> Evaluating batch 14 [00:24:16] >> Evaluating batch 15 [00:24:17] >> Evaluating batch 16 [00:24:18] Epoch: 1 Step: 4000/38378 Evaluation: [00:24:18] Avg Loss Since Last Eval: 0.1911 Val Loss: 2.1567 Validation loss delta: 2.1567 Perplexity: 8.6428 LR: 0.00004993 [00:24:22] >> Checkpoint saved: epoch1_step4000, size: 0.1702 GB [00:24:22] Epoch: 1 Batch: 4000/38378 (10.42%) Loss: 1.957031 LR: 0.00004993 [00:24:24] Epoch: 1 Batch: 4001/38378 (10.43%) Loss: 2.296851 LR: 0.00004993 [00:24:25] Epoch: 1 Batch: 4002/38378 (10.43%) Loss: 1.971190 LR: 0.00004993 [00:24:27] Epoch: 1 Batch: 4003/38378 (10.43%) Loss: 1.905132 LR: 0.00004993 [00:24:29] Epoch: 1 Batch: 4004/38378 (10.43%) Loss: 2.042533 LR: 0.00004993 [00:24:30] Epoch: 1 Batch: 4005/38378 (10.44%) Loss: 1.904363 LR: 0.00004993 [00:24:32] Epoch: 1 Batch: 4006/38378 (10.44%) Loss: 2.157550 LR: 0.00004993 [00:24:34] Epoch: 1 Batch: 4007/38378 (10.44%) Loss: 
2.229871 LR: 0.00004993 [00:24:36] Epoch: 1 Batch: 4008/38378 (10.44%) Loss: 1.834514 LR: 0.00004993 [00:24:37] Epoch: 1 Batch: 4009/38378 (10.45%) Loss: 1.876961 LR: 0.00004993 [00:24:39] Epoch: 1 Batch: 4010/38378 (10.45%) Loss: 2.113167 LR: 0.00004993 [00:24:41] Epoch: 1 Batch: 4011/38378 (10.45%) Loss: 1.880054 LR: 0.00004993 [00:24:43] Epoch: 1 Batch: 4012/38378 (10.45%) Loss: 2.057337 LR: 0.00004993 [00:24:44] Epoch: 1 Batch: 4013/38378 (10.46%) Loss: 2.097212 LR: 0.00004993 [00:24:46] Epoch: 1 Batch: 4014/38378 (10.46%) Loss: 2.166005 LR: 0.00004993 [00:24:48] Epoch: 1 Batch: 4015/38378 (10.46%) Loss: 2.145948 LR: 0.00004993 [00:24:49] Epoch: 1 Batch: 4016/38378 (10.46%) Loss: 1.997242 LR: 0.00004993 [00:24:51] Epoch: 1 Batch: 4017/38378 (10.47%) Loss: 2.019651 LR: 0.00004993 [00:24:53] Epoch: 1 Batch: 4018/38378 (10.47%) Loss: 2.102742 LR: 0.00004993 [00:24:55] Epoch: 1 Batch: 4019/38378 (10.47%) Loss: 1.919119 LR: 0.00004993 [00:24:56] Epoch: 1 Batch: 4020/38378 (10.47%) Loss: 2.266054 LR: 0.00004993 [00:24:58] Epoch: 1 Batch: 4021/38378 (10.48%) Loss: 1.789955 LR: 0.00004993 [00:25:00] Epoch: 1 Batch: 4022/38378 (10.48%) Loss: 1.930170 LR: 0.00004993 [00:25:01] Epoch: 1 Batch: 4023/38378 (10.48%) Loss: 2.186632 LR: 0.00004993 [00:25:03] Epoch: 1 Batch: 4024/38378 (10.49%) Loss: 1.942712 LR: 0.00004993 [00:25:05] Epoch: 1 Batch: 4025/38378 (10.49%) Loss: 2.135246 LR: 0.00004993 [00:25:11] >> Cleaned up old temp checkpoint: epoch1_step3696 [00:25:11] >> Temp checkpoint saved: epoch1_step4026, size: 0.1702 GB [00:25:11] Epoch: 1 Batch: 4026/38378 (10.49%) Loss: 1.938936 LR: 0.00004993 [00:25:13] Epoch: 1 Batch: 4027/38378 (10.49%) Loss: 1.888069 LR: 0.00004993 [00:25:14] Epoch: 1 Batch: 4028/38378 (10.50%) Loss: 2.270702 LR: 0.00004993 [00:25:16] Epoch: 1 Batch: 4029/38378 (10.50%) Loss: 1.966207 LR: 0.00004993 [00:25:18] Epoch: 1 Batch: 4030/38378 (10.50%) Loss: 2.122396 LR: 0.00004993 [00:25:19] Epoch: 1 Batch: 4031/38378 (10.50%) Loss: 2.147965 LR: 0.00004993 [00:25:21] Epoch: 1 Batch: 4032/38378 (10.51%) Loss: 2.100988 LR: 0.00004993 [00:25:23] Epoch: 1 Batch: 4033/38378 (10.51%) Loss: 2.456939 LR: 0.00004993 [00:25:25] Epoch: 1 Batch: 4034/38378 (10.51%) Loss: 2.026231 LR: 0.00004993 [00:25:26] Epoch: 1 Batch: 4035/38378 (10.51%) Loss: 2.021543 LR: 0.00004993 [00:25:28] Epoch: 1 Batch: 4036/38378 (10.52%) Loss: 2.197185 LR: 0.00004993 [00:25:30] Epoch: 1 Batch: 4037/38378 (10.52%) Loss: 1.868367 LR: 0.00004993 [00:25:31] Epoch: 1 Batch: 4038/38378 (10.52%) Loss: 2.385763 LR: 0.00004993 [00:25:33] Epoch: 1 Batch: 4039/38378 (10.52%) Loss: 2.145139 LR: 0.00004993 [00:25:35] Epoch: 1 Batch: 4040/38378 (10.53%) Loss: 1.991080 LR: 0.00004993 [00:25:36] Epoch: 1 Batch: 4041/38378 (10.53%) Loss: 2.179657 LR: 0.00004993 [00:25:38] Epoch: 1 Batch: 4042/38378 (10.53%) Loss: 1.927568 LR: 0.00004993 [00:25:40] Epoch: 1 Batch: 4043/38378 (10.53%) Loss: 2.278923 LR: 0.00004993 [00:25:42] Epoch: 1 Batch: 4044/38378 (10.54%) Loss: 2.446083 LR: 0.00004993 [00:25:43] Epoch: 1 Batch: 4045/38378 (10.54%) Loss: 1.869571 LR: 0.00004993 [00:25:45] Epoch: 1 Batch: 4046/38378 (10.54%) Loss: 1.932889 LR: 0.00004993 [00:25:47] Epoch: 1 Batch: 4047/38378 (10.55%) Loss: 2.020109 LR: 0.00004993 [00:25:49] Epoch: 1 Batch: 4048/38378 (10.55%) Loss: 2.191731 LR: 0.00004993 [00:25:50] Epoch: 1 Batch: 4049/38378 (10.55%) Loss: 1.976874 LR: 0.00004993 [00:25:52] Epoch: 1 Batch: 4050/38378 (10.55%) Loss: 2.233070 LR: 0.00004993 [00:25:54] Epoch: 1 Batch: 4051/38378 (10.56%) Loss: 1.820854 LR: 0.00004993 [00:25:55] 
Epoch: 1 Batch: 4052/38378 (10.56%) Loss: 2.112206 LR: 0.00004993 [00:25:57] Epoch: 1 Batch: 4053/38378 (10.56%) Loss: 1.919260 LR: 0.00004993 [00:25:59] Epoch: 1 Batch: 4054/38378 (10.56%) Loss: 2.038659 LR: 0.00004993 [00:26:01] Epoch: 1 Batch: 4055/38378 (10.57%) Loss: 1.967465 LR: 0.00004993 [00:26:02] Epoch: 1 Batch: 4056/38378 (10.57%) Loss: 2.148939 LR: 0.00004993 [00:26:04] Epoch: 1 Batch: 4057/38378 (10.57%) Loss: 1.968682 LR: 0.00004992 [00:26:06] Epoch: 1 Batch: 4058/38378 (10.57%) Loss: 1.799923 LR: 0.00004992 [00:26:11] >> Cleaned up old temp checkpoint: epoch1_step3729 [00:26:11] >> Temp checkpoint saved: epoch1_step4059, size: 0.1702 GB [00:26:11] Epoch: 1 Batch: 4059/38378 (10.58%) Loss: 1.875215 LR: 0.00004992 [00:26:13] Epoch: 1 Batch: 4060/38378 (10.58%) Loss: 1.992655 LR: 0.00004992 [00:26:15] Epoch: 1 Batch: 4061/38378 (10.58%) Loss: 1.788203 LR: 0.00004992 [00:26:16] Epoch: 1 Batch: 4062/38378 (10.58%) Loss: 1.950572 LR: 0.00004992 [00:26:18] Epoch: 1 Batch: 4063/38378 (10.59%) Loss: 2.041706 LR: 0.00004992 [00:26:20] Epoch: 1 Batch: 4064/38378 (10.59%) Loss: 2.007662 LR: 0.00004992 [00:26:21] Epoch: 1 Batch: 4065/38378 (10.59%) Loss: 1.783556 LR: 0.00004992 [00:26:23] Epoch: 1 Batch: 4066/38378 (10.59%) Loss: 2.120369 LR: 0.00004992 [00:26:25] Epoch: 1 Batch: 4067/38378 (10.60%) Loss: 1.752171 LR: 0.00004992 [00:26:27] Epoch: 1 Batch: 4068/38378 (10.60%) Loss: 1.976731 LR: 0.00004992 [00:26:28] Epoch: 1 Batch: 4069/38378 (10.60%) Loss: 1.936839 LR: 0.00004992 [00:26:30] Epoch: 1 Batch: 4070/38378 (10.61%) Loss: 1.932468 LR: 0.00004992 [00:26:32] Epoch: 1 Batch: 4071/38378 (10.61%) Loss: 1.904854 LR: 0.00004992 [00:26:33] Epoch: 1 Batch: 4072/38378 (10.61%) Loss: 1.653688 LR: 0.00004992 [00:26:35] Epoch: 1 Batch: 4073/38378 (10.61%) Loss: 1.534032 LR: 0.00004992 [00:26:37] Epoch: 1 Batch: 4074/38378 (10.62%) Loss: 2.109621 LR: 0.00004992 [00:26:38] Epoch: 1 Batch: 4075/38378 (10.62%) Loss: 1.787801 LR: 0.00004992 [00:26:40] Epoch: 1 Batch: 4076/38378 (10.62%) Loss: 2.203741 LR: 0.00004992 [00:26:42] Epoch: 1 Batch: 4077/38378 (10.62%) Loss: 2.195061 LR: 0.00004992 [00:26:44] Epoch: 1 Batch: 4078/38378 (10.63%) Loss: 2.162152 LR: 0.00004992 [00:26:45] Epoch: 1 Batch: 4079/38378 (10.63%) Loss: 2.310037 LR: 0.00004992 [00:26:47] Epoch: 1 Batch: 4080/38378 (10.63%) Loss: 1.899925 LR: 0.00004992 [00:26:49] Epoch: 1 Batch: 4081/38378 (10.63%) Loss: 1.993508 LR: 0.00004992 [00:26:51] Epoch: 1 Batch: 4082/38378 (10.64%) Loss: 2.139043 LR: 0.00004992 [00:26:52] Epoch: 1 Batch: 4083/38378 (10.64%) Loss: 1.701891 LR: 0.00004992 [00:26:54] Epoch: 1 Batch: 4084/38378 (10.64%) Loss: 1.847757 LR: 0.00004992 [00:26:56] Epoch: 1 Batch: 4085/38378 (10.64%) Loss: 2.265172 LR: 0.00004992 [00:26:57] Epoch: 1 Batch: 4086/38378 (10.65%) Loss: 1.868558 LR: 0.00004992 [00:26:59] Epoch: 1 Batch: 4087/38378 (10.65%) Loss: 1.909332 LR: 0.00004992 [00:27:01] Epoch: 1 Batch: 4088/38378 (10.65%) Loss: 2.226554 LR: 0.00004992 [00:27:03] Epoch: 1 Batch: 4089/38378 (10.65%) Loss: 1.965011 LR: 0.00004992 [00:27:04] Epoch: 1 Batch: 4090/38378 (10.66%) Loss: 2.153261 LR: 0.00004992 [00:27:06] Epoch: 1 Batch: 4091/38378 (10.66%) Loss: 2.114173 LR: 0.00004992 [00:27:12] >> Cleaned up old temp checkpoint: epoch1_step3762 [00:27:12] >> Temp checkpoint saved: epoch1_step4092, size: 0.1702 GB [00:27:12] Epoch: 1 Batch: 4092/38378 (10.66%) Loss: 2.029572 LR: 0.00004992 [00:27:13] Epoch: 1 Batch: 4093/38378 (10.66%) Loss: 1.882836 LR: 0.00004992 [00:27:15] Epoch: 1 Batch: 4094/38378 (10.67%) Loss: 2.458692 LR: 
0.00004992 [00:27:17] Epoch: 1 Batch: 4095/38378 (10.67%) Loss: 2.292662 LR: 0.00004992 [00:27:18] Epoch: 1 Batch: 4096/38378 (10.67%) Loss: 1.866231 LR: 0.00004992 [00:27:20] Epoch: 1 Batch: 4097/38378 (10.68%) Loss: 2.134983 LR: 0.00004992 [00:27:22] Epoch: 1 Batch: 4098/38378 (10.68%) Loss: 1.810497 LR: 0.00004992 [00:27:23] Epoch: 1 Batch: 4099/38378 (10.68%) Loss: 2.088933 LR: 0.00004992 [00:27:25] Epoch: 1 Batch: 4100/38378 (10.68%) Loss: 2.053565 LR: 0.00004992 [00:27:27] Epoch: 1 Batch: 4101/38378 (10.69%) Loss: 2.120575 LR: 0.00004992 [00:27:29] Epoch: 1 Batch: 4102/38378 (10.69%) Loss: 1.841351 LR: 0.00004992 [00:27:30] Epoch: 1 Batch: 4103/38378 (10.69%) Loss: 1.715830 LR: 0.00004992 [00:27:32] Epoch: 1 Batch: 4104/38378 (10.69%) Loss: 2.057563 LR: 0.00004992 [00:27:34] Epoch: 1 Batch: 4105/38378 (10.70%) Loss: 1.977033 LR: 0.00004992 [00:27:35] Epoch: 1 Batch: 4106/38378 (10.70%) Loss: 2.063789 LR: 0.00004992 [00:27:37] Epoch: 1 Batch: 4107/38378 (10.70%) Loss: 2.014769 LR: 0.00004992 [00:27:39] Epoch: 1 Batch: 4108/38378 (10.70%) Loss: 1.976660 LR: 0.00004992 [00:27:40] Epoch: 1 Batch: 4109/38378 (10.71%) Loss: 2.108324 LR: 0.00004992 [00:27:42] Epoch: 1 Batch: 4110/38378 (10.71%) Loss: 2.110029 LR: 0.00004992 [00:27:44] Epoch: 1 Batch: 4111/38378 (10.71%) Loss: 2.193363 LR: 0.00004992 [00:27:46] Epoch: 1 Batch: 4112/38378 (10.71%) Loss: 2.110614 LR: 0.00004992 [00:27:47] Epoch: 1 Batch: 4113/38378 (10.72%) Loss: 2.148514 LR: 0.00004992 [00:27:49] Epoch: 1 Batch: 4114/38378 (10.72%) Loss: 2.272184 LR: 0.00004992 [00:27:51] Epoch: 1 Batch: 4115/38378 (10.72%) Loss: 2.255415 LR: 0.00004992 [00:27:52] Epoch: 1 Batch: 4116/38378 (10.72%) Loss: 2.265577 LR: 0.00004992 [00:27:54] Epoch: 1 Batch: 4117/38378 (10.73%) Loss: 1.871718 LR: 0.00004992 [00:27:56] Epoch: 1 Batch: 4118/38378 (10.73%) Loss: 2.106356 LR: 0.00004992 [00:27:57] Epoch: 1 Batch: 4119/38378 (10.73%) Loss: 2.097848 LR: 0.00004992 [00:27:59] Epoch: 1 Batch: 4120/38378 (10.74%) Loss: 2.210457 LR: 0.00004991 [00:28:01] Epoch: 1 Batch: 4121/38378 (10.74%) Loss: 2.139550 LR: 0.00004991 [00:28:03] Epoch: 1 Batch: 4122/38378 (10.74%) Loss: 2.115926 LR: 0.00004991 [00:28:04] Epoch: 1 Batch: 4123/38378 (10.74%) Loss: 2.308593 LR: 0.00004991 [00:28:06] Epoch: 1 Batch: 4124/38378 (10.75%) Loss: 1.852711 LR: 0.00004991 [00:28:11] >> Cleaned up old temp checkpoint: epoch1_step3795 [00:28:11] >> Temp checkpoint saved: epoch1_step4125, size: 0.1702 GB [00:28:11] Epoch: 1 Batch: 4125/38378 (10.75%) Loss: 2.129118 LR: 0.00004991 [00:28:13] Epoch: 1 Batch: 4126/38378 (10.75%) Loss: 2.003217 LR: 0.00004991 [00:28:15] Epoch: 1 Batch: 4127/38378 (10.75%) Loss: 2.145838 LR: 0.00004991 [00:28:17] Epoch: 1 Batch: 4128/38378 (10.76%) Loss: 2.246157 LR: 0.00004991 [00:28:18] Epoch: 1 Batch: 4129/38378 (10.76%) Loss: 2.052120 LR: 0.00004991 [00:28:20] Epoch: 1 Batch: 4130/38378 (10.76%) Loss: 2.123782 LR: 0.00004991 [00:28:22] Epoch: 1 Batch: 4131/38378 (10.76%) Loss: 1.878765 LR: 0.00004991 [00:28:23] Epoch: 1 Batch: 4132/38378 (10.77%) Loss: 2.280629 LR: 0.00004991 [00:28:25] Epoch: 1 Batch: 4133/38378 (10.77%) Loss: 2.064098 LR: 0.00004991 [00:28:27] Epoch: 1 Batch: 4134/38378 (10.77%) Loss: 2.008151 LR: 0.00004991 [00:28:28] Epoch: 1 Batch: 4135/38378 (10.77%) Loss: 2.034409 LR: 0.00004991 [00:28:30] Epoch: 1 Batch: 4136/38378 (10.78%) Loss: 1.886915 LR: 0.00004991 [00:28:32] Epoch: 1 Batch: 4137/38378 (10.78%) Loss: 2.003403 LR: 0.00004991 [00:28:34] Epoch: 1 Batch: 4138/38378 (10.78%) Loss: 2.029105 LR: 0.00004991 [00:28:35] Epoch: 1 
Batch: 4139/38378 (10.78%) Loss: 1.835864 LR: 0.00004991 [00:28:37] Epoch: 1 Batch: 4140/38378 (10.79%) Loss: 2.056185 LR: 0.00004991 [00:28:39] Epoch: 1 Batch: 4141/38378 (10.79%) Loss: 2.007665 LR: 0.00004991 [00:28:40] Epoch: 1 Batch: 4142/38378 (10.79%) Loss: 2.040192 LR: 0.00004991 [00:28:42] Epoch: 1 Batch: 4143/38378 (10.80%) Loss: 2.142645 LR: 0.00004991 [00:28:44] Epoch: 1 Batch: 4144/38378 (10.80%) Loss: 1.931860 LR: 0.00004991 [00:28:46] Epoch: 1 Batch: 4145/38378 (10.80%) Loss: 1.761315 LR: 0.00004991 [00:28:47] Epoch: 1 Batch: 4146/38378 (10.80%) Loss: 1.716926 LR: 0.00004991 [00:28:49] Epoch: 1 Batch: 4147/38378 (10.81%) Loss: 2.060702 LR: 0.00004991 [00:28:51] Epoch: 1 Batch: 4148/38378 (10.81%) Loss: 1.889852 LR: 0.00004991 [00:28:52] Epoch: 1 Batch: 4149/38378 (10.81%) Loss: 1.944878 LR: 0.00004991 [00:28:54] Epoch: 1 Batch: 4150/38378 (10.81%) Loss: 2.158502 LR: 0.00004991 [00:28:56] Epoch: 1 Batch: 4151/38378 (10.82%) Loss: 1.844268 LR: 0.00004991 [00:28:58] Epoch: 1 Batch: 4152/38378 (10.82%) Loss: 1.983797 LR: 0.00004991 [00:28:59] Epoch: 1 Batch: 4153/38378 (10.82%) Loss: 2.044981 LR: 0.00004991 [00:29:01] Epoch: 1 Batch: 4154/38378 (10.82%) Loss: 1.880744 LR: 0.00004991 [00:29:03] Epoch: 1 Batch: 4155/38378 (10.83%) Loss: 2.209298 LR: 0.00004991 [00:29:04] Epoch: 1 Batch: 4156/38378 (10.83%) Loss: 1.890877 LR: 0.00004991 [00:29:06] Epoch: 1 Batch: 4157/38378 (10.83%) Loss: 1.794856 LR: 0.00004991 [00:29:12] >> Cleaned up old temp checkpoint: epoch1_step3828 [00:29:12] >> Temp checkpoint saved: epoch1_step4158, size: 0.1702 GB [00:29:12] Epoch: 1 Batch: 4158/38378 (10.83%) Loss: 1.970247 LR: 0.00004991 [00:29:13] Epoch: 1 Batch: 4159/38378 (10.84%) Loss: 2.202449 LR: 0.00004991 [00:29:15] Epoch: 1 Batch: 4160/38378 (10.84%) Loss: 1.927799 LR: 0.00004991 [00:29:17] Epoch: 1 Batch: 4161/38378 (10.84%) Loss: 2.276272 LR: 0.00004991 [00:29:18] Epoch: 1 Batch: 4162/38378 (10.84%) Loss: 2.389697 LR: 0.00004991 [00:29:20] Epoch: 1 Batch: 4163/38378 (10.85%) Loss: 2.007517 LR: 0.00004991 [00:29:22] Epoch: 1 Batch: 4164/38378 (10.85%) Loss: 1.948222 LR: 0.00004991 [00:29:23] Epoch: 1 Batch: 4165/38378 (10.85%) Loss: 2.047504 LR: 0.00004991 [00:29:25] Epoch: 1 Batch: 4166/38378 (10.86%) Loss: 2.012149 LR: 0.00004991 [00:29:27] Epoch: 1 Batch: 4167/38378 (10.86%) Loss: 2.093044 LR: 0.00004991 [00:29:29] Epoch: 1 Batch: 4168/38378 (10.86%) Loss: 2.111143 LR: 0.00004991 [00:29:30] Epoch: 1 Batch: 4169/38378 (10.86%) Loss: 1.747115 LR: 0.00004991 [00:29:32] Epoch: 1 Batch: 4170/38378 (10.87%) Loss: 1.871529 LR: 0.00004991 [00:29:34] Epoch: 1 Batch: 4171/38378 (10.87%) Loss: 2.073632 LR: 0.00004991 [00:29:35] Epoch: 1 Batch: 4172/38378 (10.87%) Loss: 1.869734 LR: 0.00004991 [00:29:37] Epoch: 1 Batch: 4173/38378 (10.87%) Loss: 2.177303 LR: 0.00004991 [00:29:39] Epoch: 1 Batch: 4174/38378 (10.88%) Loss: 1.987960 LR: 0.00004991 [00:29:41] Epoch: 1 Batch: 4175/38378 (10.88%) Loss: 2.078547 LR: 0.00004991 [00:29:42] Epoch: 1 Batch: 4176/38378 (10.88%) Loss: 1.953037 LR: 0.00004990 [00:29:44] Epoch: 1 Batch: 4177/38378 (10.88%) Loss: 1.758524 LR: 0.00004990 [00:29:46] Epoch: 1 Batch: 4178/38378 (10.89%) Loss: 2.179696 LR: 0.00004990 [00:29:47] Epoch: 1 Batch: 4179/38378 (10.89%) Loss: 2.208877 LR: 0.00004990 [00:29:49] Epoch: 1 Batch: 4180/38378 (10.89%) Loss: 1.676603 LR: 0.00004990 [00:29:51] Epoch: 1 Batch: 4181/38378 (10.89%) Loss: 2.022076 LR: 0.00004990 [00:29:52] Epoch: 1 Batch: 4182/38378 (10.90%) Loss: 2.117439 LR: 0.00004990 [00:29:54] Epoch: 1 Batch: 4183/38378 (10.90%) Loss: 
2.330500 LR: 0.00004990 [00:29:56] Epoch: 1 Batch: 4184/38378 (10.90%) Loss: 2.239749 LR: 0.00004990 [00:29:57] Epoch: 1 Batch: 4185/38378 (10.90%) Loss: 1.641382 LR: 0.00004990 [00:29:59] Epoch: 1 Batch: 4186/38378 (10.91%) Loss: 1.897234 LR: 0.00004990 [00:30:01] Epoch: 1 Batch: 4187/38378 (10.91%) Loss: 2.379676 LR: 0.00004990 [00:30:02] Epoch: 1 Batch: 4188/38378 (10.91%) Loss: 2.326206 LR: 0.00004990 [00:30:04] Epoch: 1 Batch: 4189/38378 (10.92%) Loss: 1.973119 LR: 0.00004990 [00:30:06] Epoch: 1 Batch: 4190/38378 (10.92%) Loss: 1.894502 LR: 0.00004990 [00:30:11] >> Cleaned up old temp checkpoint: epoch1_step3861 [00:30:11] >> Temp checkpoint saved: epoch1_step4191, size: 0.1702 GB [00:30:11] Epoch: 1 Batch: 4191/38378 (10.92%) Loss: 2.144800 LR: 0.00004990 [00:30:13] Epoch: 1 Batch: 4192/38378 (10.92%) Loss: 1.749164 LR: 0.00004990 [00:30:15] Epoch: 1 Batch: 4193/38378 (10.93%) Loss: 1.864666 LR: 0.00004990 [00:30:16] Epoch: 1 Batch: 4194/38378 (10.93%) Loss: 2.040620 LR: 0.00004990 [00:30:18] Epoch: 1 Batch: 4195/38378 (10.93%) Loss: 2.035757 LR: 0.00004990 [00:30:20] Epoch: 1 Batch: 4196/38378 (10.93%) Loss: 1.910780 LR: 0.00004990 [00:30:21] Epoch: 1 Batch: 4197/38378 (10.94%) Loss: 2.054040 LR: 0.00004990 [00:30:23] Epoch: 1 Batch: 4198/38378 (10.94%) Loss: 1.940913 LR: 0.00004990 [00:30:25] Epoch: 1 Batch: 4199/38378 (10.94%) Loss: 2.341528 LR: 0.00004990 [00:30:26] Epoch: 1 Batch: 4200/38378 (10.94%) Loss: 2.141082 LR: 0.00004990 [00:30:28] Epoch: 1 Batch: 4201/38378 (10.95%) Loss: 2.079216 LR: 0.00004990 [00:30:30] Epoch: 1 Batch: 4202/38378 (10.95%) Loss: 1.825482 LR: 0.00004990 [00:30:32] Epoch: 1 Batch: 4203/38378 (10.95%) Loss: 1.988120 LR: 0.00004990 [00:30:33] Epoch: 1 Batch: 4204/38378 (10.95%) Loss: 1.849755 LR: 0.00004990 [00:30:35] Epoch: 1 Batch: 4205/38378 (10.96%) Loss: 2.137517 LR: 0.00004990 [00:30:37] Epoch: 1 Batch: 4206/38378 (10.96%) Loss: 1.895941 LR: 0.00004990 [00:30:38] Epoch: 1 Batch: 4207/38378 (10.96%) Loss: 2.003304 LR: 0.00004990 [00:30:40] Epoch: 1 Batch: 4208/38378 (10.96%) Loss: 2.219096 LR: 0.00004990 [00:30:42] Epoch: 1 Batch: 4209/38378 (10.97%) Loss: 1.880147 LR: 0.00004990 [00:30:44] Epoch: 1 Batch: 4210/38378 (10.97%) Loss: 2.194307 LR: 0.00004990 [00:30:45] Epoch: 1 Batch: 4211/38378 (10.97%) Loss: 2.400377 LR: 0.00004990 [00:30:47] Epoch: 1 Batch: 4212/38378 (10.98%) Loss: 1.910117 LR: 0.00004990 [00:30:49] Epoch: 1 Batch: 4213/38378 (10.98%) Loss: 1.862848 LR: 0.00004990 [00:30:51] Epoch: 1 Batch: 4214/38378 (10.98%) Loss: 2.089781 LR: 0.00004990 [00:30:52] Epoch: 1 Batch: 4215/38378 (10.98%) Loss: 1.895362 LR: 0.00004990 [00:30:54] Epoch: 1 Batch: 4216/38378 (10.99%) Loss: 1.621288 LR: 0.00004990 [00:30:56] Epoch: 1 Batch: 4217/38378 (10.99%) Loss: 2.035715 LR: 0.00004990 [00:30:57] Epoch: 1 Batch: 4218/38378 (10.99%) Loss: 2.075269 LR: 0.00004990 [00:30:59] Epoch: 1 Batch: 4219/38378 (10.99%) Loss: 1.816645 LR: 0.00004990 [00:31:01] Epoch: 1 Batch: 4220/38378 (11.00%) Loss: 1.906750 LR: 0.00004990 [00:31:03] Epoch: 1 Batch: 4221/38378 (11.00%) Loss: 1.776284 LR: 0.00004990 [00:31:04] Epoch: 1 Batch: 4222/38378 (11.00%) Loss: 2.112038 LR: 0.00004990 [00:31:06] Epoch: 1 Batch: 4223/38378 (11.00%) Loss: 1.878415 LR: 0.00004990 [00:31:11] >> Cleaned up old temp checkpoint: epoch1_step3894 [00:31:11] >> Temp checkpoint saved: epoch1_step4224, size: 0.1702 GB [00:31:11] Epoch: 1 Batch: 4224/38378 (11.01%) Loss: 2.288046 LR: 0.00004990 [00:31:13] Epoch: 1 Batch: 4225/38378 (11.01%) Loss: 1.989903 LR: 0.00004990 [00:31:15] Epoch: 1 Batch: 
4226/38378 (11.01%) Loss: 2.055176 LR: 0.00004990 [00:31:17] Epoch: 1 Batch: 4227/38378 (11.01%) Loss: 1.977425 LR: 0.00004990 [00:31:18] Epoch: 1 Batch: 4228/38378 (11.02%) Loss: 1.894991 LR: 0.00004990 [00:31:20] Epoch: 1 Batch: 4229/38378 (11.02%) Loss: 2.005564 LR: 0.00004990 [00:31:22] Epoch: 1 Batch: 4230/38378 (11.02%) Loss: 2.026092 LR: 0.00004990 [00:31:23] Epoch: 1 Batch: 4231/38378 (11.02%) Loss: 2.356766 LR: 0.00004990 [00:31:25] Epoch: 1 Batch: 4232/38378 (11.03%) Loss: 1.843311 LR: 0.00004989 [00:31:27] Epoch: 1 Batch: 4233/38378 (11.03%) Loss: 1.922941 LR: 0.00004989 [00:31:28] Epoch: 1 Batch: 4234/38378 (11.03%) Loss: 2.066322 LR: 0.00004989 [00:31:30] Epoch: 1 Batch: 4235/38378 (11.03%) Loss: 1.950856 LR: 0.00004989 [00:31:32] Epoch: 1 Batch: 4236/38378 (11.04%) Loss: 2.434165 LR: 0.00004989 [00:31:34] Epoch: 1 Batch: 4237/38378 (11.04%) Loss: 1.709635 LR: 0.00004989 [00:31:35] Epoch: 1 Batch: 4238/38378 (11.04%) Loss: 1.944524 LR: 0.00004989 [00:31:37] Epoch: 1 Batch: 4239/38378 (11.05%) Loss: 1.864184 LR: 0.00004989 [00:31:39] Epoch: 1 Batch: 4240/38378 (11.05%) Loss: 1.950546 LR: 0.00004989 [00:31:40] Epoch: 1 Batch: 4241/38378 (11.05%) Loss: 1.934196 LR: 0.00004989 [00:31:42] Epoch: 1 Batch: 4242/38378 (11.05%) Loss: 1.975197 LR: 0.00004989 [00:31:44] Epoch: 1 Batch: 4243/38378 (11.06%) Loss: 1.933893 LR: 0.00004989 [00:31:46] Epoch: 1 Batch: 4244/38378 (11.06%) Loss: 2.185662 LR: 0.00004989 [00:31:47] Epoch: 1 Batch: 4245/38378 (11.06%) Loss: 2.216472 LR: 0.00004989 [00:31:49] Epoch: 1 Batch: 4246/38378 (11.06%) Loss: 2.057077 LR: 0.00004989 [00:31:51] Epoch: 1 Batch: 4247/38378 (11.07%) Loss: 2.042129 LR: 0.00004989 [00:31:52] Epoch: 1 Batch: 4248/38378 (11.07%) Loss: 2.274766 LR: 0.00004989 [00:31:54] Epoch: 1 Batch: 4249/38378 (11.07%) Loss: 2.051557 LR: 0.00004989 [00:31:56] Epoch: 1 Batch: 4250/38378 (11.07%) Loss: 2.317633 LR: 0.00004989 [00:31:58] Epoch: 1 Batch: 4251/38378 (11.08%) Loss: 2.083232 LR: 0.00004989 [00:31:59] Epoch: 1 Batch: 4252/38378 (11.08%) Loss: 2.473666 LR: 0.00004989 [00:32:01] Epoch: 1 Batch: 4253/38378 (11.08%) Loss: 2.081514 LR: 0.00004989 [00:32:03] Epoch: 1 Batch: 4254/38378 (11.08%) Loss: 2.046160 LR: 0.00004989 [00:32:04] Epoch: 1 Batch: 4255/38378 (11.09%) Loss: 2.260576 LR: 0.00004989 [00:32:06] Epoch: 1 Batch: 4256/38378 (11.09%) Loss: 2.130973 LR: 0.00004989 [00:32:12] >> Cleaned up old temp checkpoint: epoch1_step3927 [00:32:12] >> Temp checkpoint saved: epoch1_step4257, size: 0.1702 GB [00:32:12] Epoch: 1 Batch: 4257/38378 (11.09%) Loss: 1.920633 LR: 0.00004989 [00:32:13] Epoch: 1 Batch: 4258/38378 (11.09%) Loss: 2.009412 LR: 0.00004989 [00:32:15] Epoch: 1 Batch: 4259/38378 (11.10%) Loss: 2.203702 LR: 0.00004989 [00:32:17] Epoch: 1 Batch: 4260/38378 (11.10%) Loss: 1.996776 LR: 0.00004989 [00:32:19] Epoch: 1 Batch: 4261/38378 (11.10%) Loss: 1.945248 LR: 0.00004989 [00:32:20] Epoch: 1 Batch: 4262/38378 (11.11%) Loss: 2.183851 LR: 0.00004989 [00:32:22] Epoch: 1 Batch: 4263/38378 (11.11%) Loss: 1.972335 LR: 0.00004989 [00:32:24] Epoch: 1 Batch: 4264/38378 (11.11%) Loss: 2.340107 LR: 0.00004989 [00:32:25] Epoch: 1 Batch: 4265/38378 (11.11%) Loss: 2.094063 LR: 0.00004989 [00:32:27] Epoch: 1 Batch: 4266/38378 (11.12%) Loss: 2.005667 LR: 0.00004989 [00:32:29] Epoch: 1 Batch: 4267/38378 (11.12%) Loss: 2.079958 LR: 0.00004989 [00:32:30] Epoch: 1 Batch: 4268/38378 (11.12%) Loss: 2.148177 LR: 0.00004989 [00:32:32] Epoch: 1 Batch: 4269/38378 (11.12%) Loss: 2.103871 LR: 0.00004989 [00:32:34] Epoch: 1 Batch: 4270/38378 (11.13%) Loss: 1.909702 
LR: 0.00004989 [00:32:36] Epoch: 1 Batch: 4271/38378 (11.13%) Loss: 2.013060 LR: 0.00004989 [00:32:37] Epoch: 1 Batch: 4272/38378 (11.13%) Loss: 2.233722 LR: 0.00004989 [00:32:39] Epoch: 1 Batch: 4273/38378 (11.13%) Loss: 2.340864 LR: 0.00004989 [00:32:41] Epoch: 1 Batch: 4274/38378 (11.14%) Loss: 2.212867 LR: 0.00004989 [00:32:42] Epoch: 1 Batch: 4275/38378 (11.14%) Loss: 1.674354 LR: 0.00004989 [00:32:44] Epoch: 1 Batch: 4276/38378 (11.14%) Loss: 1.979382 LR: 0.00004989 [00:32:46] Epoch: 1 Batch: 4277/38378 (11.14%) Loss: 2.213628 LR: 0.00004989 [00:32:48] Epoch: 1 Batch: 4278/38378 (11.15%) Loss: 1.867928 LR: 0.00004989 [00:32:49] Epoch: 1 Batch: 4279/38378 (11.15%) Loss: 1.988628 LR: 0.00004989 [00:32:51] Epoch: 1 Batch: 4280/38378 (11.15%) Loss: 1.961667 LR: 0.00004989 [00:32:53] Epoch: 1 Batch: 4281/38378 (11.15%) Loss: 1.954172 LR: 0.00004989 [00:32:55] Epoch: 1 Batch: 4282/38378 (11.16%) Loss: 2.056198 LR: 0.00004989 [00:32:56] Epoch: 1 Batch: 4283/38378 (11.16%) Loss: 1.795959 LR: 0.00004989 [00:32:58] Epoch: 1 Batch: 4284/38378 (11.16%) Loss: 1.955517 LR: 0.00004989 [00:33:00] Epoch: 1 Batch: 4285/38378 (11.17%) Loss: 2.291558 LR: 0.00004989 [00:33:01] Epoch: 1 Batch: 4286/38378 (11.17%) Loss: 1.908500 LR: 0.00004989 [00:33:03] Epoch: 1 Batch: 4287/38378 (11.17%) Loss: 1.751265 LR: 0.00004989 [00:33:05] Epoch: 1 Batch: 4288/38378 (11.17%) Loss: 1.929457 LR: 0.00004988 [00:33:07] Epoch: 1 Batch: 4289/38378 (11.18%) Loss: 1.950414 LR: 0.00004988 [00:33:12] >> Cleaned up old temp checkpoint: epoch1_step3960 [00:33:12] >> Temp checkpoint saved: epoch1_step4290, size: 0.1702 GB [00:33:12] Epoch: 1 Batch: 4290/38378 (11.18%) Loss: 1.892689 LR: 0.00004988 [00:33:14] Epoch: 1 Batch: 4291/38378 (11.18%) Loss: 1.946755 LR: 0.00004988 [00:33:15] Epoch: 1 Batch: 4292/38378 (11.18%) Loss: 2.149248 LR: 0.00004988 [00:33:17] Epoch: 1 Batch: 4293/38378 (11.19%) Loss: 1.866808 LR: 0.00004988 [00:33:19] Epoch: 1 Batch: 4294/38378 (11.19%) Loss: 2.081128 LR: 0.00004988 [00:33:21] Epoch: 1 Batch: 4295/38378 (11.19%) Loss: 2.420268 LR: 0.00004988 [00:33:22] Epoch: 1 Batch: 4296/38378 (11.19%) Loss: 1.886426 LR: 0.00004988 [00:33:24] Epoch: 1 Batch: 4297/38378 (11.20%) Loss: 2.089846 LR: 0.00004988 [00:33:26] Epoch: 1 Batch: 4298/38378 (11.20%) Loss: 2.058004 LR: 0.00004988 [00:33:27] Epoch: 1 Batch: 4299/38378 (11.20%) Loss: 1.919317 LR: 0.00004988 [00:33:29] Epoch: 1 Batch: 4300/38378 (11.20%) Loss: 2.355330 LR: 0.00004988 [00:33:31] Epoch: 1 Batch: 4301/38378 (11.21%) Loss: 2.313044 LR: 0.00004988 [00:33:32] Epoch: 1 Batch: 4302/38378 (11.21%) Loss: 1.765585 LR: 0.00004988 [00:33:34] Epoch: 1 Batch: 4303/38378 (11.21%) Loss: 1.929733 LR: 0.00004988 [00:33:36] Epoch: 1 Batch: 4304/38378 (11.21%) Loss: 2.280480 LR: 0.00004988 [00:33:37] Epoch: 1 Batch: 4305/38378 (11.22%) Loss: 2.090032 LR: 0.00004988 [00:33:39] Epoch: 1 Batch: 4306/38378 (11.22%) Loss: 1.990724 LR: 0.00004988 [00:33:41] Epoch: 1 Batch: 4307/38378 (11.22%) Loss: 2.193614 LR: 0.00004988 [00:33:43] Epoch: 1 Batch: 4308/38378 (11.23%) Loss: 2.010049 LR: 0.00004988 [00:33:44] Epoch: 1 Batch: 4309/38378 (11.23%) Loss: 2.396734 LR: 0.00004988 [00:33:46] Epoch: 1 Batch: 4310/38378 (11.23%) Loss: 2.254580 LR: 0.00004988 [00:33:48] Epoch: 1 Batch: 4311/38378 (11.23%) Loss: 2.156287 LR: 0.00004988 [00:33:50] Epoch: 1 Batch: 4312/38378 (11.24%) Loss: 1.950957 LR: 0.00004988 [00:33:51] Epoch: 1 Batch: 4313/38378 (11.24%) Loss: 2.000104 LR: 0.00004988 [00:33:53] Epoch: 1 Batch: 4314/38378 (11.24%) Loss: 2.269155 LR: 0.00004988 [00:33:55] Epoch: 1 
Batch: 4315/38378 (11.24%) Loss: 2.411146 LR: 0.00004988 [00:33:56] Epoch: 1 Batch: 4316/38378 (11.25%) Loss: 1.936317 LR: 0.00004988 [00:33:58] Epoch: 1 Batch: 4317/38378 (11.25%) Loss: 1.715981 LR: 0.00004988 [00:34:00] Epoch: 1 Batch: 4318/38378 (11.25%) Loss: 2.221874 LR: 0.00004988 [00:34:01] Epoch: 1 Batch: 4319/38378 (11.25%) Loss: 1.765958 LR: 0.00004988 [00:34:03] Epoch: 1 Batch: 4320/38378 (11.26%) Loss: 2.206268 LR: 0.00004988 [00:34:05] Epoch: 1 Batch: 4321/38378 (11.26%) Loss: 1.756412 LR: 0.00004988 [00:34:07] Epoch: 1 Batch: 4322/38378 (11.26%) Loss: 2.094153 LR: 0.00004988 [00:34:12] >> Cleaned up old temp checkpoint: epoch1_step3993 [00:34:12] >> Temp checkpoint saved: epoch1_step4323, size: 0.1702 GB [00:34:12] Epoch: 1 Batch: 4323/38378 (11.26%) Loss: 1.717367 LR: 0.00004988 [00:34:14] Epoch: 1 Batch: 4324/38378 (11.27%) Loss: 1.844830 LR: 0.00004988 [00:34:16] Epoch: 1 Batch: 4325/38378 (11.27%) Loss: 2.010899 LR: 0.00004988 [00:34:17] Epoch: 1 Batch: 4326/38378 (11.27%) Loss: 2.319438 LR: 0.00004988 [00:34:19] Epoch: 1 Batch: 4327/38378 (11.27%) Loss: 2.016336 LR: 0.00004988 [00:34:21] Epoch: 1 Batch: 4328/38378 (11.28%) Loss: 2.172543 LR: 0.00004988 [00:34:23] Epoch: 1 Batch: 4329/38378 (11.28%) Loss: 1.819550 LR: 0.00004988 [00:34:24] Epoch: 1 Batch: 4330/38378 (11.28%) Loss: 1.898485 LR: 0.00004988 [00:34:26] Epoch: 1 Batch: 4331/38378 (11.29%) Loss: 1.997350 LR: 0.00004988 [00:34:28] Epoch: 1 Batch: 4332/38378 (11.29%) Loss: 2.539787 LR: 0.00004988 [00:34:29] Epoch: 1 Batch: 4333/38378 (11.29%) Loss: 1.959157 LR: 0.00004988 [00:34:31] Epoch: 1 Batch: 4334/38378 (11.29%) Loss: 2.070842 LR: 0.00004988 [00:34:33] Epoch: 1 Batch: 4335/38378 (11.30%) Loss: 1.870636 LR: 0.00004988 [00:34:35] Epoch: 1 Batch: 4336/38378 (11.30%) Loss: 2.044977 LR: 0.00004988 [00:34:36] Epoch: 1 Batch: 4337/38378 (11.30%) Loss: 2.227082 LR: 0.00004987 [00:34:38] Epoch: 1 Batch: 4338/38378 (11.30%) Loss: 2.132361 LR: 0.00004987 [00:34:40] Epoch: 1 Batch: 4339/38378 (11.31%) Loss: 2.049498 LR: 0.00004987 [00:34:41] Epoch: 1 Batch: 4340/38378 (11.31%) Loss: 2.027112 LR: 0.00004987 [00:34:43] Epoch: 1 Batch: 4341/38378 (11.31%) Loss: 1.960952 LR: 0.00004987 [00:34:45] Epoch: 1 Batch: 4342/38378 (11.31%) Loss: 1.988122 LR: 0.00004987 [00:34:47] Epoch: 1 Batch: 4343/38378 (11.32%) Loss: 2.148223 LR: 0.00004987 [00:34:48] Epoch: 1 Batch: 4344/38378 (11.32%) Loss: 2.312505 LR: 0.00004987 [00:34:50] Epoch: 1 Batch: 4345/38378 (11.32%) Loss: 2.331240 LR: 0.00004987 [00:34:52] Epoch: 1 Batch: 4346/38378 (11.32%) Loss: 2.081262 LR: 0.00004987 [00:34:54] Epoch: 1 Batch: 4347/38378 (11.33%) Loss: 1.566525 LR: 0.00004987 [00:34:55] Epoch: 1 Batch: 4348/38378 (11.33%) Loss: 1.929517 LR: 0.00004987 [00:34:57] Epoch: 1 Batch: 4349/38378 (11.33%) Loss: 2.002232 LR: 0.00004987 [00:34:59] Epoch: 1 Batch: 4350/38378 (11.33%) Loss: 2.373092 LR: 0.00004987 [00:35:00] Epoch: 1 Batch: 4351/38378 (11.34%) Loss: 1.948626 LR: 0.00004987 [00:35:02] Epoch: 1 Batch: 4352/38378 (11.34%) Loss: 1.949633 LR: 0.00004987 [00:35:04] Epoch: 1 Batch: 4353/38378 (11.34%) Loss: 2.154383 LR: 0.00004987 [00:35:05] Epoch: 1 Batch: 4354/38378 (11.35%) Loss: 2.180878 LR: 0.00004987 [00:35:07] Epoch: 1 Batch: 4355/38378 (11.35%) Loss: 2.140448 LR: 0.00004987 [00:35:13] >> Cleaned up old temp checkpoint: epoch1_step4026 [00:35:13] >> Temp checkpoint saved: epoch1_step4356, size: 0.1702 GB [00:35:13] Epoch: 1 Batch: 4356/38378 (11.35%) Loss: 2.165409 LR: 0.00004987 [00:35:14] Epoch: 1 Batch: 4357/38378 (11.35%) Loss: 2.350175 LR: 0.00004987 
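The ">> Cleaned up old temp checkpoint" / ">> Temp checkpoint saved" pairs above follow a fixed rotation: a temp checkpoint is written every 33 batches (save_temp_frequency: 33), and each save removes the one from 330 batches earlier (e.g. 4356 - 4026 = 330), so a sliding window of the ten most recent temp checkpoints stays on disk; the one-off ".zip" deletion near batch 3960 looks like leftover cleanup from an earlier zipped-save format. Below is a minimal sketch of that rotation, assuming hypothetical helper and constant names (rotate_temp_checkpoint, KEEP_LAST) rather than anything taken from the actual training script:

```python
import os

# Sketch of the temp-checkpoint rotation visible in the log above. Inferred
# from the messages: a save every SAVE_TEMP_FREQUENCY batches, deleting the
# checkpoint from KEEP_LAST saves earlier (N - 330), i.e. a 10-deep window.
TEMP_DIR = "/content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp"
SAVE_TEMP_FREQUENCY = 33
KEEP_LAST = 10  # inferred: every cleanup targets step N - 10 * 33 = N - 330

def rotate_temp_checkpoint(epoch: int, batch: int) -> None:
    if batch % SAVE_TEMP_FREQUENCY != 0:
        return
    stale = batch - KEEP_LAST * SAVE_TEMP_FREQUENCY
    stale_path = os.path.join(TEMP_DIR, f"epoch{epoch}_step{stale}")
    if stale > 0 and os.path.exists(stale_path):
        # a real implementation would shutil.rmtree(stale_path) here
        print(f">> Cleaned up old temp checkpoint: epoch{epoch}_step{stale}")
    print(f">> Temp checkpoint saved: epoch{epoch}_step{batch}")
```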
[00:35:16] Epoch: 1 Batch: 4358/38378 (11.36%) Loss: 2.180708 LR: 0.00004987 [00:35:18] Epoch: 1 Batch: 4359/38378 (11.36%) Loss: 1.843134 LR: 0.00004987 [00:35:20] Epoch: 1 Batch: 4360/38378 (11.36%) Loss: 2.129569 LR: 0.00004987 [00:35:21] Epoch: 1 Batch: 4361/38378 (11.36%) Loss: 2.089060 LR: 0.00004987 [00:35:23] Epoch: 1 Batch: 4362/38378 (11.37%) Loss: 2.079893 LR: 0.00004987 [00:35:25] Epoch: 1 Batch: 4363/38378 (11.37%) Loss: 2.105174 LR: 0.00004987 [00:35:26] Epoch: 1 Batch: 4364/38378 (11.37%) Loss: 1.997936 LR: 0.00004987 [00:35:28] Epoch: 1 Batch: 4365/38378 (11.37%) Loss: 2.146648 LR: 0.00004987 [00:35:30] Epoch: 1 Batch: 4366/38378 (11.38%) Loss: 2.050090 LR: 0.00004987 [00:35:31] Epoch: 1 Batch: 4367/38378 (11.38%) Loss: 2.131743 LR: 0.00004987 [00:35:33] Epoch: 1 Batch: 4368/38378 (11.38%) Loss: 2.237187 LR: 0.00004987 [00:35:35] Epoch: 1 Batch: 4369/38378 (11.38%) Loss: 2.208874 LR: 0.00004987 [00:35:36] Epoch: 1 Batch: 4370/38378 (11.39%) Loss: 1.937943 LR: 0.00004987 [00:35:38] Epoch: 1 Batch: 4371/38378 (11.39%) Loss: 1.838258 LR: 0.00004987 [00:35:40] Epoch: 1 Batch: 4372/38378 (11.39%) Loss: 2.160030 LR: 0.00004987 [00:35:42] Epoch: 1 Batch: 4373/38378 (11.39%) Loss: 2.327881 LR: 0.00004987 [00:35:43] Epoch: 1 Batch: 4374/38378 (11.40%) Loss: 1.965976 LR: 0.00004987 [00:35:45] Epoch: 1 Batch: 4375/38378 (11.40%) Loss: 2.163260 LR: 0.00004987 [00:35:47] Epoch: 1 Batch: 4376/38378 (11.40%) Loss: 2.065370 LR: 0.00004987 [00:35:49] Epoch: 1 Batch: 4377/38378 (11.40%) Loss: 2.320777 LR: 0.00004987 [00:35:50] Epoch: 1 Batch: 4378/38378 (11.41%) Loss: 2.050135 LR: 0.00004987 [00:35:52] Epoch: 1 Batch: 4379/38378 (11.41%) Loss: 1.814769 LR: 0.00004987 [00:35:54] Epoch: 1 Batch: 4380/38378 (11.41%) Loss: 2.138218 LR: 0.00004987 [00:35:55] Epoch: 1 Batch: 4381/38378 (11.42%) Loss: 2.036593 LR: 0.00004987 [00:35:57] Epoch: 1 Batch: 4382/38378 (11.42%) Loss: 2.044949 LR: 0.00004987 [00:35:59] Epoch: 1 Batch: 4383/38378 (11.42%) Loss: 1.785360 LR: 0.00004987 [00:36:01] Epoch: 1 Batch: 4384/38378 (11.42%) Loss: 1.796649 LR: 0.00004987 [00:36:02] Epoch: 1 Batch: 4385/38378 (11.43%) Loss: 1.731849 LR: 0.00004987 [00:36:04] Epoch: 1 Batch: 4386/38378 (11.43%) Loss: 2.195482 LR: 0.00004986 [00:36:06] Epoch: 1 Batch: 4387/38378 (11.43%) Loss: 2.256860 LR: 0.00004986 [00:36:07] Epoch: 1 Batch: 4388/38378 (11.43%) Loss: 1.869696 LR: 0.00004986 [00:36:13] >> Cleaned up old temp checkpoint: epoch1_step4059 [00:36:13] >> Temp checkpoint saved: epoch1_step4389, size: 0.1702 GB [00:36:13] Epoch: 1 Batch: 4389/38378 (11.44%) Loss: 1.818198 LR: 0.00004986 [00:36:15] Epoch: 1 Batch: 4390/38378 (11.44%) Loss: 2.109636 LR: 0.00004986 [00:36:16] Epoch: 1 Batch: 4391/38378 (11.44%) Loss: 2.203518 LR: 0.00004986 [00:36:18] Epoch: 1 Batch: 4392/38378 (11.44%) Loss: 1.853158 LR: 0.00004986 [00:36:20] Epoch: 1 Batch: 4393/38378 (11.45%) Loss: 1.992029 LR: 0.00004986 [00:36:21] Epoch: 1 Batch: 4394/38378 (11.45%) Loss: 2.118175 LR: 0.00004986 [00:36:23] Epoch: 1 Batch: 4395/38378 (11.45%) Loss: 2.066330 LR: 0.00004986 [00:36:25] Epoch: 1 Batch: 4396/38378 (11.45%) Loss: 1.930812 LR: 0.00004986 [00:36:27] Epoch: 1 Batch: 4397/38378 (11.46%) Loss: 1.960557 LR: 0.00004986 [00:36:28] Epoch: 1 Batch: 4398/38378 (11.46%) Loss: 2.144077 LR: 0.00004986 [00:36:30] Epoch: 1 Batch: 4399/38378 (11.46%) Loss: 2.208865 LR: 0.00004986 [00:36:31] Epoch: 1 Batch: 4400/38378 (11.46%) Loss: 1.893828 LR: 0.00004986 [00:36:33] Epoch: 1 Batch: 4401/38378 (11.47%) Loss: 2.123889 LR: 0.00004986 [00:36:35] Epoch: 1 Batch: 
4402/38378 (11.47%) Loss: 2.026417 LR: 0.00004986 [00:36:36] Epoch: 1 Batch: 4403/38378 (11.47%) Loss: 2.236127 LR: 0.00004986 [00:36:38] Epoch: 1 Batch: 4404/38378 (11.48%) Loss: 2.043963 LR: 0.00004986 [00:36:40] Epoch: 1 Batch: 4405/38378 (11.48%) Loss: 1.856880 LR: 0.00004986 [00:36:42] Epoch: 1 Batch: 4406/38378 (11.48%) Loss: 1.952160 LR: 0.00004986 [00:36:43] Epoch: 1 Batch: 4407/38378 (11.48%) Loss: 1.815247 LR: 0.00004986 [00:36:45] Epoch: 1 Batch: 4408/38378 (11.49%) Loss: 2.041456 LR: 0.00004986 [00:36:47] Epoch: 1 Batch: 4409/38378 (11.49%) Loss: 1.886904 LR: 0.00004986 [00:36:49] Epoch: 1 Batch: 4410/38378 (11.49%) Loss: 2.183876 LR: 0.00004986 [00:36:50] Epoch: 1 Batch: 4411/38378 (11.49%) Loss: 1.710440 LR: 0.00004986 [00:36:52] Epoch: 1 Batch: 4412/38378 (11.50%) Loss: 2.114131 LR: 0.00004986 [00:36:54] Epoch: 1 Batch: 4413/38378 (11.50%) Loss: 2.225226 LR: 0.00004986 [00:36:55] Epoch: 1 Batch: 4414/38378 (11.50%) Loss: 1.693137 LR: 0.00004986 [00:36:57] Epoch: 1 Batch: 4415/38378 (11.50%) Loss: 2.077019 LR: 0.00004986 [00:36:59] Epoch: 1 Batch: 4416/38378 (11.51%) Loss: 2.148792 LR: 0.00004986 [00:37:00] Epoch: 1 Batch: 4417/38378 (11.51%) Loss: 2.488265 LR: 0.00004986 [00:37:02] Epoch: 1 Batch: 4418/38378 (11.51%) Loss: 1.788221 LR: 0.00004986 [00:37:04] Epoch: 1 Batch: 4419/38378 (11.51%) Loss: 1.955293 LR: 0.00004986 [00:37:06] Epoch: 1 Batch: 4420/38378 (11.52%) Loss: 2.137931 LR: 0.00004986 [00:37:07] Epoch: 1 Batch: 4421/38378 (11.52%) Loss: 1.970717 LR: 0.00004986 [00:37:13] >> Cleaned up old temp checkpoint: epoch1_step4092 [00:37:13] >> Temp checkpoint saved: epoch1_step4422, size: 0.1702 GB [00:37:13] Epoch: 1 Batch: 4422/38378 (11.52%) Loss: 2.401851 LR: 0.00004986 [00:37:15] Epoch: 1 Batch: 4423/38378 (11.52%) Loss: 2.222393 LR: 0.00004986 [00:37:16] Epoch: 1 Batch: 4424/38378 (11.53%) Loss: 1.932056 LR: 0.00004986 [00:37:18] Epoch: 1 Batch: 4425/38378 (11.53%) Loss: 2.170494 LR: 0.00004986 [00:37:20] Epoch: 1 Batch: 4426/38378 (11.53%) Loss: 1.959457 LR: 0.00004986 [00:37:21] Epoch: 1 Batch: 4427/38378 (11.54%) Loss: 1.895791 LR: 0.00004986 [00:37:23] Epoch: 1 Batch: 4428/38378 (11.54%) Loss: 2.091208 LR: 0.00004986 [00:37:25] Epoch: 1 Batch: 4429/38378 (11.54%) Loss: 1.564316 LR: 0.00004986 [00:37:27] Epoch: 1 Batch: 4430/38378 (11.54%) Loss: 2.153178 LR: 0.00004986 [00:37:28] Epoch: 1 Batch: 4431/38378 (11.55%) Loss: 1.789698 LR: 0.00004986 [00:37:30] Epoch: 1 Batch: 4432/38378 (11.55%) Loss: 2.061217 LR: 0.00004986 [00:37:31] Epoch: 1 Batch: 4433/38378 (11.55%) Loss: 2.116496 LR: 0.00004986 [00:37:33] Epoch: 1 Batch: 4434/38378 (11.55%) Loss: 2.354796 LR: 0.00004986 [00:37:35] Epoch: 1 Batch: 4435/38378 (11.56%) Loss: 2.111603 LR: 0.00004985 [00:37:36] Epoch: 1 Batch: 4436/38378 (11.56%) Loss: 1.560693 LR: 0.00004985 [00:37:38] Epoch: 1 Batch: 4437/38378 (11.56%) Loss: 1.838644 LR: 0.00004985 [00:37:40] Epoch: 1 Batch: 4438/38378 (11.56%) Loss: 1.849159 LR: 0.00004985 [00:37:42] Epoch: 1 Batch: 4439/38378 (11.57%) Loss: 1.719850 LR: 0.00004985 [00:37:43] Epoch: 1 Batch: 4440/38378 (11.57%) Loss: 2.149719 LR: 0.00004985 [00:37:45] Epoch: 1 Batch: 4441/38378 (11.57%) Loss: 1.652159 LR: 0.00004985 [00:37:47] Epoch: 1 Batch: 4442/38378 (11.57%) Loss: 1.908923 LR: 0.00004985 [00:37:48] Epoch: 1 Batch: 4443/38378 (11.58%) Loss: 2.144212 LR: 0.00004985 [00:37:50] Epoch: 1 Batch: 4444/38378 (11.58%) Loss: 2.060133 LR: 0.00004985 [00:37:52] Epoch: 1 Batch: 4445/38378 (11.58%) Loss: 1.902770 LR: 0.00004985 [00:37:54] Epoch: 1 Batch: 4446/38378 (11.58%) Loss: 2.272214 
LR: 0.00004985 [00:37:55] Epoch: 1 Batch: 4447/38378 (11.59%) Loss: 1.723768 LR: 0.00004985 [00:37:57] Epoch: 1 Batch: 4448/38378 (11.59%) Loss: 1.965383 LR: 0.00004985 [00:37:59] Epoch: 1 Batch: 4449/38378 (11.59%) Loss: 2.343402 LR: 0.00004985 [00:38:00] Epoch: 1 Batch: 4450/38378 (11.60%) Loss: 2.157482 LR: 0.00004985 [00:38:02] Epoch: 1 Batch: 4451/38378 (11.60%) Loss: 2.022085 LR: 0.00004985 [00:38:04] Epoch: 1 Batch: 4452/38378 (11.60%) Loss: 2.074096 LR: 0.00004985 [00:38:06] Epoch: 1 Batch: 4453/38378 (11.60%) Loss: 2.258751 LR: 0.00004985 [00:38:07] Epoch: 1 Batch: 4454/38378 (11.61%) Loss: 2.123975 LR: 0.00004985 [00:38:13] >> Cleaned up old temp checkpoint: epoch1_step4125 [00:38:13] >> Temp checkpoint saved: epoch1_step4455, size: 0.1702 GB [00:38:13] Epoch: 1 Batch: 4455/38378 (11.61%) Loss: 2.164347 LR: 0.00004985 [00:38:14] Epoch: 1 Batch: 4456/38378 (11.61%) Loss: 1.918089 LR: 0.00004985 [00:38:16] Epoch: 1 Batch: 4457/38378 (11.61%) Loss: 2.120847 LR: 0.00004985 [00:38:18] Epoch: 1 Batch: 4458/38378 (11.62%) Loss: 2.059470 LR: 0.00004985 [00:38:19] Epoch: 1 Batch: 4459/38378 (11.62%) Loss: 1.893516 LR: 0.00004985 [00:38:21] Epoch: 1 Batch: 4460/38378 (11.62%) Loss: 2.197999 LR: 0.00004985 [00:38:23] Epoch: 1 Batch: 4461/38378 (11.62%) Loss: 1.768295 LR: 0.00004985 [00:38:25] Epoch: 1 Batch: 4462/38378 (11.63%) Loss: 2.212385 LR: 0.00004985 [00:38:26] Epoch: 1 Batch: 4463/38378 (11.63%) Loss: 2.246852 LR: 0.00004985 [00:38:28] Epoch: 1 Batch: 4464/38378 (11.63%) Loss: 2.078133 LR: 0.00004985 [00:38:30] Epoch: 1 Batch: 4465/38378 (11.63%) Loss: 2.419941 LR: 0.00004985 [00:38:31] Epoch: 1 Batch: 4466/38378 (11.64%) Loss: 1.700842 LR: 0.00004985 [00:38:33] Epoch: 1 Batch: 4467/38378 (11.64%) Loss: 2.026595 LR: 0.00004985 [00:38:35] Epoch: 1 Batch: 4468/38378 (11.64%) Loss: 2.142695 LR: 0.00004985 [00:38:37] Epoch: 1 Batch: 4469/38378 (11.64%) Loss: 2.149890 LR: 0.00004985 [00:38:38] Epoch: 1 Batch: 4470/38378 (11.65%) Loss: 2.112944 LR: 0.00004985 [00:38:40] Epoch: 1 Batch: 4471/38378 (11.65%) Loss: 1.909624 LR: 0.00004985 [00:38:42] Epoch: 1 Batch: 4472/38378 (11.65%) Loss: 2.015801 LR: 0.00004985 [00:38:43] Epoch: 1 Batch: 4473/38378 (11.66%) Loss: 1.873169 LR: 0.00004985 [00:38:45] Epoch: 1 Batch: 4474/38378 (11.66%) Loss: 2.161999 LR: 0.00004985 [00:38:47] Epoch: 1 Batch: 4475/38378 (11.66%) Loss: 2.068761 LR: 0.00004985 [00:38:48] Epoch: 1 Batch: 4476/38378 (11.66%) Loss: 1.891570 LR: 0.00004985 [00:38:50] Epoch: 1 Batch: 4477/38378 (11.67%) Loss: 1.925680 LR: 0.00004984 [00:38:52] Epoch: 1 Batch: 4478/38378 (11.67%) Loss: 1.826525 LR: 0.00004984 [00:38:54] Epoch: 1 Batch: 4479/38378 (11.67%) Loss: 2.002112 LR: 0.00004984 [00:38:55] Epoch: 1 Batch: 4480/38378 (11.67%) Loss: 1.743501 LR: 0.00004984 [00:38:57] Epoch: 1 Batch: 4481/38378 (11.68%) Loss: 1.900473 LR: 0.00004984 [00:38:59] Epoch: 1 Batch: 4482/38378 (11.68%) Loss: 2.191130 LR: 0.00004984 [00:39:00] Epoch: 1 Batch: 4483/38378 (11.68%) Loss: 2.234194 LR: 0.00004984 [00:39:02] Epoch: 1 Batch: 4484/38378 (11.68%) Loss: 2.217563 LR: 0.00004984 [00:39:04] Epoch: 1 Batch: 4485/38378 (11.69%) Loss: 1.979064 LR: 0.00004984 [00:39:06] Epoch: 1 Batch: 4486/38378 (11.69%) Loss: 2.136748 LR: 0.00004984 [00:39:07] Epoch: 1 Batch: 4487/38378 (11.69%) Loss: 1.799819 LR: 0.00004984 [00:39:13] >> Cleaned up old temp checkpoint: epoch1_step4158 [00:39:13] >> Temp checkpoint saved: epoch1_step4488, size: 0.1702 GB [00:39:13] Epoch: 1 Batch: 4488/38378 (11.69%) Loss: 1.910524 LR: 0.00004984 [00:39:15] Epoch: 1 Batch: 4489/38378 
(11.70%) Loss: 1.905073 LR: 0.00004984 [00:39:17] Epoch: 1 Batch: 4490/38378 (11.70%) Loss: 1.813487 LR: 0.00004984 [00:39:18] Epoch: 1 Batch: 4491/38378 (11.70%) Loss: 2.129090 LR: 0.00004984 [00:39:20] Epoch: 1 Batch: 4492/38378 (11.70%) Loss: 2.039773 LR: 0.00004984 [00:39:22] Epoch: 1 Batch: 4493/38378 (11.71%) Loss: 1.987774 LR: 0.00004984 [00:39:23] Epoch: 1 Batch: 4494/38378 (11.71%) Loss: 2.131857 LR: 0.00004984 [00:39:25] Epoch: 1 Batch: 4495/38378 (11.71%) Loss: 2.114208 LR: 0.00004984 [00:39:27] Epoch: 1 Batch: 4496/38378 (11.72%) Loss: 2.351780 LR: 0.00004984 [00:39:29] Epoch: 1 Batch: 4497/38378 (11.72%) Loss: 1.893758 LR: 0.00004984 [00:39:30] Epoch: 1 Batch: 4498/38378 (11.72%) Loss: 2.069962 LR: 0.00004984 [00:39:32] Epoch: 1 Batch: 4499/38378 (11.72%) Loss: 2.241955 LR: 0.00004984 [00:39:34] >> Evaluating batch 0 [00:39:35] >> Evaluating batch 1 [00:39:36] >> Evaluating batch 2 [00:39:36] >> Evaluating batch 3 [00:39:37] >> Evaluating batch 4 [00:39:38] >> Evaluating batch 5 [00:39:39] >> Evaluating batch 6 [00:39:40] >> Evaluating batch 7 [00:39:41] >> Evaluating batch 8 [00:39:42] >> Evaluating batch 9 [00:39:43] >> Evaluating batch 10 [00:39:44] >> Evaluating batch 11 [00:39:45] >> Evaluating batch 12 [00:39:46] >> Evaluating batch 13 [00:39:47] >> Evaluating batch 14 [00:39:48] >> Evaluating batch 15 [00:39:49] >> Evaluating batch 16 [00:39:50] Epoch: 1 Step: 4500/38378 Evaluation: [00:39:50] Avg Loss Since Last Eval: 2.0378 Val Loss: 2.1503 Validation loss delta: -0.0065 Perplexity: 8.5870 LR: 0.00004984 [00:39:54] >> Checkpoint saved: epoch1_step4500, size: 0.1702 GB [00:39:54] Epoch: 1 Batch: 4500/38378 (11.73%) Loss: 2.070434 LR: 0.00004984 [00:39:55] Epoch: 1 Batch: 4501/38378 (11.73%) Loss: 1.881939 LR: 0.00004984 [00:39:57] Epoch: 1 Batch: 4502/38378 (11.73%) Loss: 2.132876 LR: 0.00004984 [00:39:59] Epoch: 1 Batch: 4503/38378 (11.73%) Loss: 2.214985 LR: 0.00004984 [00:40:00] Epoch: 1 Batch: 4504/38378 (11.74%) Loss: 2.082613 LR: 0.00004984 [00:40:02] Epoch: 1 Batch: 4505/38378 (11.74%) Loss: 2.052266 LR: 0.00004984 [00:40:04] Epoch: 1 Batch: 4506/38378 (11.74%) Loss: 2.004916 LR: 0.00004984 [00:40:06] Epoch: 1 Batch: 4507/38378 (11.74%) Loss: 2.032346 LR: 0.00004984 [00:40:07] Epoch: 1 Batch: 4508/38378 (11.75%) Loss: 2.117329 LR: 0.00004984 [00:40:09] Epoch: 1 Batch: 4509/38378 (11.75%) Loss: 1.967452 LR: 0.00004984 [00:40:11] Epoch: 1 Batch: 4510/38378 (11.75%) Loss: 2.459560 LR: 0.00004984 [00:40:12] Epoch: 1 Batch: 4511/38378 (11.75%) Loss: 2.252164 LR: 0.00004984 [00:40:14] Epoch: 1 Batch: 4512/38378 (11.76%) Loss: 2.038841 LR: 0.00004984 [00:40:17] Epoch: 1 Batch: 4513/38378 (11.76%) Loss: 2.280729 LR: 0.00004984 [00:40:18] Epoch: 1 Batch: 4514/38378 (11.76%) Loss: 1.851694 LR: 0.00004984 [00:40:20] Epoch: 1 Batch: 4515/38378 (11.76%) Loss: 1.997438 LR: 0.00004984 [00:40:22] Epoch: 1 Batch: 4516/38378 (11.77%) Loss: 2.095534 LR: 0.00004984 [00:40:23] Epoch: 1 Batch: 4517/38378 (11.77%) Loss: 1.864828 LR: 0.00004984 [00:40:25] Epoch: 1 Batch: 4518/38378 (11.77%) Loss: 2.432832 LR: 0.00004984 [00:40:27] Epoch: 1 Batch: 4519/38378 (11.77%) Loss: 2.348974 LR: 0.00004984 [00:40:29] Epoch: 1 Batch: 4520/38378 (11.78%) Loss: 2.057460 LR: 0.00004984 [00:40:34] >> Cleaned up old temp checkpoint: epoch1_step4191 [00:40:34] >> Temp checkpoint saved: epoch1_step4521, size: 0.1702 GB [00:40:34] Epoch: 1 Batch: 4521/38378 (11.78%) Loss: 2.235962 LR: 0.00004984 [00:40:36] Epoch: 1 Batch: 4522/38378 (11.78%) Loss: 1.995107 LR: 0.00004984 [00:40:37] Epoch: 1 Batch:
4523/38378 (11.79%) Loss: 2.027255 LR: 0.00004984 [00:40:39] Epoch: 1 Batch: 4524/38378 (11.79%) Loss: 2.317662 LR: 0.00004984 [00:40:41] Epoch: 1 Batch: 4525/38378 (11.79%) Loss: 2.229203 LR: 0.00004984 [00:40:43] Epoch: 1 Batch: 4526/38378 (11.79%) Loss: 2.022394 LR: 0.00004983 [00:40:44] Epoch: 1 Batch: 4527/38378 (11.80%) Loss: 1.816421 LR: 0.00004983 [00:40:46] Epoch: 1 Batch: 4528/38378 (11.80%) Loss: 1.695474 LR: 0.00004983 [00:40:48] Epoch: 1 Batch: 4529/38378 (11.80%) Loss: 2.213921 LR: 0.00004983 [00:40:49] Epoch: 1 Batch: 4530/38378 (11.80%) Loss: 1.955865 LR: 0.00004983 [00:40:51] Epoch: 1 Batch: 4531/38378 (11.81%) Loss: 2.227113 LR: 0.00004983 [00:40:53] Epoch: 1 Batch: 4532/38378 (11.81%) Loss: 2.069185 LR: 0.00004983 [00:40:55] Epoch: 1 Batch: 4533/38378 (11.81%) Loss: 2.218678 LR: 0.00004983 [00:40:56] Epoch: 1 Batch: 4534/38378 (11.81%) Loss: 2.043429 LR: 0.00004983 [00:40:58] Epoch: 1 Batch: 4535/38378 (11.82%) Loss: 1.957109 LR: 0.00004983 [00:41:00] Epoch: 1 Batch: 4536/38378 (11.82%) Loss: 1.971985 LR: 0.00004983 [00:41:02] Epoch: 1 Batch: 4537/38378 (11.82%) Loss: 2.176418 LR: 0.00004983 [00:41:03] Epoch: 1 Batch: 4538/38378 (11.82%) Loss: 2.176540 LR: 0.00004983 [00:41:05] Epoch: 1 Batch: 4539/38378 (11.83%) Loss: 1.834062 LR: 0.00004983 [00:41:07] Epoch: 1 Batch: 4540/38378 (11.83%) Loss: 1.967368 LR: 0.00004983 [00:41:08] Epoch: 1 Batch: 4541/38378 (11.83%) Loss: 2.015087 LR: 0.00004983 [00:41:10] Epoch: 1 Batch: 4542/38378 (11.83%) Loss: 2.038439 LR: 0.00004983 [00:41:12] Epoch: 1 Batch: 4543/38378 (11.84%) Loss: 1.970870 LR: 0.00004983 [00:41:14] Epoch: 1 Batch: 4544/38378 (11.84%) Loss: 2.070424 LR: 0.00004983 [00:41:15] Epoch: 1 Batch: 4545/38378 (11.84%) Loss: 2.081442 LR: 0.00004983 [00:41:17] Epoch: 1 Batch: 4546/38378 (11.85%) Loss: 2.076233 LR: 0.00004983 [00:41:19] Epoch: 1 Batch: 4547/38378 (11.85%) Loss: 1.960405 LR: 0.00004983 [00:41:20] Epoch: 1 Batch: 4548/38378 (11.85%) Loss: 1.766622 LR: 0.00004983 [00:41:22] Epoch: 1 Batch: 4549/38378 (11.85%) Loss: 2.065095 LR: 0.00004983 [00:41:24] Epoch: 1 Batch: 4550/38378 (11.86%) Loss: 1.835579 LR: 0.00004983 [00:41:26] Epoch: 1 Batch: 4551/38378 (11.86%) Loss: 2.021425 LR: 0.00004983 [00:41:27] Epoch: 1 Batch: 4552/38378 (11.86%) Loss: 2.041445 LR: 0.00004983 [00:41:29] Epoch: 1 Batch: 4553/38378 (11.86%) Loss: 1.972127 LR: 0.00004983 [00:41:35] >> Cleaned up old temp checkpoint: epoch1_step4224 [00:41:35] >> Temp checkpoint saved: epoch1_step4554, size: 0.1702 GB [00:41:35] Epoch: 1 Batch: 4554/38378 (11.87%) Loss: 2.010917 LR: 0.00004983 [00:41:36] Epoch: 1 Batch: 4555/38378 (11.87%) Loss: 1.740127 LR: 0.00004983 [00:41:38] Epoch: 1 Batch: 4556/38378 (11.87%) Loss: 2.263415 LR: 0.00004983 [00:41:40] Epoch: 1 Batch: 4557/38378 (11.87%) Loss: 2.147346 LR: 0.00004983 [00:41:41] Epoch: 1 Batch: 4558/38378 (11.88%) Loss: 1.958144 LR: 0.00004983 [00:41:43] Epoch: 1 Batch: 4559/38378 (11.88%) Loss: 2.028508 LR: 0.00004983 [00:41:45] Epoch: 1 Batch: 4560/38378 (11.88%) Loss: 1.927357 LR: 0.00004983 [00:41:47] Epoch: 1 Batch: 4561/38378 (11.88%) Loss: 2.212876 LR: 0.00004983 [00:41:48] Epoch: 1 Batch: 4562/38378 (11.89%) Loss: 2.160640 LR: 0.00004983 [00:41:50] Epoch: 1 Batch: 4563/38378 (11.89%) Loss: 2.207965 LR: 0.00004983 [00:41:52] Epoch: 1 Batch: 4564/38378 (11.89%) Loss: 2.247812 LR: 0.00004983 [00:41:53] Epoch: 1 Batch: 4565/38378 (11.89%) Loss: 1.955693 LR: 0.00004983 [00:41:55] Epoch: 1 Batch: 4566/38378 (11.90%) Loss: 2.056520 LR: 0.00004983 [00:41:57] Epoch: 1 Batch: 4567/38378 (11.90%) Loss: 2.030792 
LR: 0.00004983 [00:41:59] Epoch: 1 Batch: 4568/38378 (11.90%) Loss: 2.285382 LR: 0.00004982 [00:42:00] Epoch: 1 Batch: 4569/38378 (11.91%) Loss: 1.965538 LR: 0.00004982 [00:42:02] Epoch: 1 Batch: 4570/38378 (11.91%) Loss: 2.177406 LR: 0.00004982 [00:42:04] Epoch: 1 Batch: 4571/38378 (11.91%) Loss: 2.243164 LR: 0.00004982 [00:42:05] Epoch: 1 Batch: 4572/38378 (11.91%) Loss: 2.136081 LR: 0.00004982 [00:42:07] Epoch: 1 Batch: 4573/38378 (11.92%) Loss: 2.197769 LR: 0.00004982 [00:42:09] Epoch: 1 Batch: 4574/38378 (11.92%) Loss: 1.847170 LR: 0.00004982 [00:42:11] Epoch: 1 Batch: 4575/38378 (11.92%) Loss: 2.093056 LR: 0.00004982 [00:42:12] Epoch: 1 Batch: 4576/38378 (11.92%) Loss: 2.386444 LR: 0.00004982 [00:42:14] Epoch: 1 Batch: 4577/38378 (11.93%) Loss: 2.106261 LR: 0.00004982 [00:42:16] Epoch: 1 Batch: 4578/38378 (11.93%) Loss: 2.004311 LR: 0.00004982 [00:42:17] Epoch: 1 Batch: 4579/38378 (11.93%) Loss: 2.144683 LR: 0.00004982 [00:42:19] Epoch: 1 Batch: 4580/38378 (11.93%) Loss: 2.104072 LR: 0.00004982 [00:42:21] Epoch: 1 Batch: 4581/38378 (11.94%) Loss: 1.991325 LR: 0.00004982 [00:42:23] Epoch: 1 Batch: 4582/38378 (11.94%) Loss: 2.193348 LR: 0.00004982 [00:42:24] Epoch: 1 Batch: 4583/38378 (11.94%) Loss: 2.005103 LR: 0.00004982 [00:42:26] Epoch: 1 Batch: 4584/38378 (11.94%) Loss: 2.170678 LR: 0.00004982 [00:42:28] Epoch: 1 Batch: 4585/38378 (11.95%) Loss: 2.101341 LR: 0.00004982 [00:42:30] Epoch: 1 Batch: 4586/38378 (11.95%) Loss: 2.148072 LR: 0.00004982 [00:42:35] >> Cleaned up old temp checkpoint: epoch1_step4257 [00:42:35] >> Temp checkpoint saved: epoch1_step4587, size: 0.1702 GB [00:42:35] Epoch: 1 Batch: 4587/38378 (11.95%) Loss: 2.107281 LR: 0.00004982 [00:42:37] Epoch: 1 Batch: 4588/38378 (11.95%) Loss: 1.977161 LR: 0.00004982 [00:42:39] Epoch: 1 Batch: 4589/38378 (11.96%) Loss: 2.163453 LR: 0.00004982 [00:42:40] Epoch: 1 Batch: 4590/38378 (11.96%) Loss: 1.928914 LR: 0.00004982 [00:42:42] Epoch: 1 Batch: 4591/38378 (11.96%) Loss: 2.009293 LR: 0.00004982 [00:42:44] Epoch: 1 Batch: 4592/38378 (11.97%) Loss: 2.173346 LR: 0.00004982 [00:42:45] Epoch: 1 Batch: 4593/38378 (11.97%) Loss: 2.097334 LR: 0.00004982 [00:42:47] Epoch: 1 Batch: 4594/38378 (11.97%) Loss: 2.051588 LR: 0.00004982 [00:42:49] Epoch: 1 Batch: 4595/38378 (11.97%) Loss: 2.369077 LR: 0.00004982 [00:42:50] Epoch: 1 Batch: 4596/38378 (11.98%) Loss: 2.395177 LR: 0.00004982 [00:42:52] Epoch: 1 Batch: 4597/38378 (11.98%) Loss: 1.874042 LR: 0.00004982 [00:42:54] Epoch: 1 Batch: 4598/38378 (11.98%) Loss: 2.234525 LR: 0.00004982 [00:42:56] Epoch: 1 Batch: 4599/38378 (11.98%) Loss: 2.101777 LR: 0.00004982 [00:42:57] Epoch: 1 Batch: 4600/38378 (11.99%) Loss: 1.990791 LR: 0.00004982 [00:42:59] Epoch: 1 Batch: 4601/38378 (11.99%) Loss: 1.936930 LR: 0.00004982 [00:43:01] Epoch: 1 Batch: 4602/38378 (11.99%) Loss: 2.098371 LR: 0.00004982 [00:43:03] Epoch: 1 Batch: 4603/38378 (11.99%) Loss: 2.154568 LR: 0.00004982 [00:43:04] Epoch: 1 Batch: 4604/38378 (12.00%) Loss: 1.971428 LR: 0.00004982 [00:43:06] Epoch: 1 Batch: 4605/38378 (12.00%) Loss: 1.912180 LR: 0.00004982 [00:43:08] Epoch: 1 Batch: 4606/38378 (12.00%) Loss: 2.120921 LR: 0.00004982 [00:43:09] Epoch: 1 Batch: 4607/38378 (12.00%) Loss: 2.184255 LR: 0.00004982 [00:43:11] Epoch: 1 Batch: 4608/38378 (12.01%) Loss: 1.977211 LR: 0.00004982 [00:43:13] Epoch: 1 Batch: 4609/38378 (12.01%) Loss: 1.872607 LR: 0.00004982 [00:43:14] Epoch: 1 Batch: 4610/38378 (12.01%) Loss: 1.883629 LR: 0.00004981 [00:43:16] Epoch: 1 Batch: 4611/38378 (12.01%) Loss: 2.018373 LR: 0.00004981 [00:43:18] Epoch: 1 
Batch: 4612/38378 (12.02%) Loss: 1.713331 LR: 0.00004981 [00:43:20] Epoch: 1 Batch: 4613/38378 (12.02%) Loss: 2.127102 LR: 0.00004981 [00:43:21] Epoch: 1 Batch: 4614/38378 (12.02%) Loss: 2.183779 LR: 0.00004981 [00:43:23] Epoch: 1 Batch: 4615/38378 (12.03%) Loss: 1.826312 LR: 0.00004981 [00:43:25] Epoch: 1 Batch: 4616/38378 (12.03%) Loss: 2.233393 LR: 0.00004981 [00:43:27] Epoch: 1 Batch: 4617/38378 (12.03%) Loss: 2.058935 LR: 0.00004981 [00:43:28] Epoch: 1 Batch: 4618/38378 (12.03%) Loss: 2.122322 LR: 0.00004981 [00:43:30] Epoch: 1 Batch: 4619/38378 (12.04%) Loss: 2.048433 LR: 0.00004981 [00:43:36] >> Cleaned up old temp checkpoint: epoch1_step4290 [00:43:36] >> Temp checkpoint saved: epoch1_step4620, size: 0.1702 GB [00:43:36] Epoch: 1 Batch: 4620/38378 (12.04%) Loss: 2.122298 LR: 0.00004981 [00:43:38] Epoch: 1 Batch: 4621/38378 (12.04%) Loss: 2.228492 LR: 0.00004981 [00:43:39] Epoch: 1 Batch: 4622/38378 (12.04%) Loss: 2.138320 LR: 0.00004981 [00:43:41] Epoch: 1 Batch: 4623/38378 (12.05%) Loss: 1.998316 LR: 0.00004981 [00:43:43] Epoch: 1 Batch: 4624/38378 (12.05%) Loss: 2.070194 LR: 0.00004981 [00:43:44] Epoch: 1 Batch: 4625/38378 (12.05%) Loss: 2.108939 LR: 0.00004981 [00:43:46] Epoch: 1 Batch: 4626/38378 (12.05%) Loss: 1.918696 LR: 0.00004981 [00:43:48] Epoch: 1 Batch: 4627/38378 (12.06%) Loss: 2.143961 LR: 0.00004981 [00:43:49] Epoch: 1 Batch: 4628/38378 (12.06%) Loss: 2.223374 LR: 0.00004981 [00:43:51] Epoch: 1 Batch: 4629/38378 (12.06%) Loss: 2.178454 LR: 0.00004981 [00:43:53] Epoch: 1 Batch: 4630/38378 (12.06%) Loss: 2.356283 LR: 0.00004981 [00:43:55] Epoch: 1 Batch: 4631/38378 (12.07%) Loss: 1.875424 LR: 0.00004981 [00:43:56] Epoch: 1 Batch: 4632/38378 (12.07%) Loss: 2.258763 LR: 0.00004981 [00:43:58] Epoch: 1 Batch: 4633/38378 (12.07%) Loss: 1.852929 LR: 0.00004981 [00:44:00] Epoch: 1 Batch: 4634/38378 (12.07%) Loss: 2.100655 LR: 0.00004981 [00:44:01] Epoch: 1 Batch: 4635/38378 (12.08%) Loss: 2.088116 LR: 0.00004981 [00:44:03] Epoch: 1 Batch: 4636/38378 (12.08%) Loss: 2.095180 LR: 0.00004981 [00:44:05] Epoch: 1 Batch: 4637/38378 (12.08%) Loss: 2.417222 LR: 0.00004981 [00:44:07] Epoch: 1 Batch: 4638/38378 (12.09%) Loss: 2.080600 LR: 0.00004981 [00:44:08] Epoch: 1 Batch: 4639/38378 (12.09%) Loss: 2.532008 LR: 0.00004981 [00:44:10] Epoch: 1 Batch: 4640/38378 (12.09%) Loss: 2.199915 LR: 0.00004981 [00:44:12] Epoch: 1 Batch: 4641/38378 (12.09%) Loss: 2.101186 LR: 0.00004981 [00:44:13] Epoch: 1 Batch: 4642/38378 (12.10%) Loss: 2.190157 LR: 0.00004981 [00:44:15] Epoch: 1 Batch: 4643/38378 (12.10%) Loss: 2.090321 LR: 0.00004981 [00:44:17] Epoch: 1 Batch: 4644/38378 (12.10%) Loss: 2.107137 LR: 0.00004981 [00:44:19] Epoch: 1 Batch: 4645/38378 (12.10%) Loss: 2.029093 LR: 0.00004981 [00:44:20] Epoch: 1 Batch: 4646/38378 (12.11%) Loss: 1.991805 LR: 0.00004981 [00:44:22] Epoch: 1 Batch: 4647/38378 (12.11%) Loss: 2.175107 LR: 0.00004981 [00:44:24] Epoch: 1 Batch: 4648/38378 (12.11%) Loss: 2.023410 LR: 0.00004981 [00:44:25] Epoch: 1 Batch: 4649/38378 (12.11%) Loss: 2.151998 LR: 0.00004981 [00:44:27] Epoch: 1 Batch: 4650/38378 (12.12%) Loss: 1.914152 LR: 0.00004981 [00:44:29] Epoch: 1 Batch: 4651/38378 (12.12%) Loss: 1.973235 LR: 0.00004981 [00:44:31] Epoch: 1 Batch: 4652/38378 (12.12%) Loss: 2.040840 LR: 0.00004980 [00:44:36] >> Cleaned up old temp checkpoint: epoch1_step4323 [00:44:36] >> Temp checkpoint saved: epoch1_step4653, size: 0.1702 GB [00:44:36] Epoch: 1 Batch: 4653/38378 (12.12%) Loss: 1.834999 LR: 0.00004980 [00:44:38] Epoch: 1 Batch: 4654/38378 (12.13%) Loss: 1.953386 LR: 0.00004980 
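Every 500 batches (eval_frequency: 500) the loop pauses for 17 validation batches (the 100-example validation set at val_batch_size 6) and prints a summary, as at steps 4000 and 4500 above. The reported numbers are tied together by two standard identities: perplexity is the exponential of the mean validation loss, and the delta is the change from the previous evaluation's validation loss. At step 4000 the delta equals the raw validation loss and the running train-loss average reads an implausibly low 0.1911, consistent with those accumulators having just been reset. A small check under those standard definitions, with illustrative variable names:

```python
import math

# How the evaluation-block numbers relate, assuming the standard definitions
# (the training script itself is not shown; variable names are illustrative).
prev_val_loss = 2.1567   # Val Loss at step 4000
val_loss = 2.1503        # Val Loss at step 4500

perplexity = math.exp(val_loss)       # exp of mean validation cross-entropy
delta = val_loss - prev_val_loss      # change since the previous evaluation

print(f"Perplexity: {perplexity:.4f}")  # 8.5873 vs. 8.5870 in the log --
                                        # the logged losses are rounded
print(f"Delta: {delta:.4f}")            # -0.0064 vs. -0.0065 in the log
```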
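The LR column decays very slowly through this span (0.00004994 at batch 3931 down to 0.00004980 by batch 4654) because the run is long past its 439-step warmup and still early on the cosine curve toward lr_floor. The printed values are reproduced by a standard linear-warmup-plus-cosine-decay schedule stepped once per optimizer step, i.e. every accum_steps = 7 batches; the sketch below is a reconstruction under that assumption, not the training script itself, and the warmup shape in particular is unverified here:

```python
import math

# Reconstruction of the LR column, assuming: linear warmup for WARMUP_STEPS
# optimizer steps, then cosine decay from lr = 5e-05 down to lr_floor = 1e-05
# over the epoch, with the scheduler advancing once per optimizer step.
LR_PEAK, LR_FLOOR = 5e-5, 1e-5
ACCUM_STEPS = 7
TOTAL_STEPS = 38378 // ACCUM_STEPS            # ~5482 optimizer steps in the epoch
WARMUP_STEPS = 439                            # warmup_steps from the run config

def lr_at_batch(batch: int) -> float:
    step = batch // ACCUM_STEPS               # completed optimizer steps
    if step < WARMUP_STEPS:
        return LR_PEAK * step / WARMUP_STEPS  # assumed linear warmup
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LR_FLOOR + (LR_PEAK - LR_FLOOR) * 0.5 * (1 + math.cos(math.pi * progress))

print(f"{lr_at_batch(4000):.8f}")  # 0.00004993 -- matches the log at batch 4000
print(f"{lr_at_batch(4357):.8f}")  # 0.00004987 -- matches the log at batch 4357
```

Note that 439 is about 0.08 x 5482 optimizer steps, consistent with warmup_ratio: 0.08 in the run configuration.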
[00:44:40] Epoch: 1 Batch: 4655/38378 (12.13%) Loss: 1.905205 LR: 0.00004980 [00:44:41] Epoch: 1 Batch: 4656/38378 (12.13%) Loss: 1.800173 LR: 0.00004980 [00:44:43] Epoch: 1 Batch: 4657/38378 (12.13%) Loss: 2.178008 LR: 0.00004980 [00:44:45] Epoch: 1 Batch: 4658/38378 (12.14%) Loss: 2.310735 LR: 0.00004980 [00:44:46] Epoch: 1 Batch: 4659/38378 (12.14%) Loss: 2.168745 LR: 0.00004980 [00:44:48] Epoch: 1 Batch: 4660/38378 (12.14%) Loss: 2.142912 LR: 0.00004980 [00:44:50] Epoch: 1 Batch: 4661/38378 (12.14%) Loss: 1.880079 LR: 0.00004980 [00:44:51] Epoch: 1 Batch: 4662/38378 (12.15%) Loss: 2.316683 LR: 0.00004980 [00:44:53] Epoch: 1 Batch: 4663/38378 (12.15%) Loss: 1.906141 LR: 0.00004980 [00:44:55] Epoch: 1 Batch: 4664/38378 (12.15%) Loss: 2.038142 LR: 0.00004980 [00:44:57] Epoch: 1 Batch: 4665/38378 (12.16%) Loss: 2.109677 LR: 0.00004980 [00:44:58] Epoch: 1 Batch: 4666/38378 (12.16%) Loss: 1.780924 LR: 0.00004980 [00:45:00] Epoch: 1 Batch: 4667/38378 (12.16%) Loss: 2.165264 LR: 0.00004980 [00:45:02] Epoch: 1 Batch: 4668/38378 (12.16%) Loss: 1.982119 LR: 0.00004980 [00:45:03] Epoch: 1 Batch: 4669/38378 (12.17%) Loss: 1.964497 LR: 0.00004980 [00:45:05] Epoch: 1 Batch: 4670/38378 (12.17%) Loss: 2.256456 LR: 0.00004980 [00:45:07] Epoch: 1 Batch: 4671/38378 (12.17%) Loss: 1.907297 LR: 0.00004980 [00:45:09] Epoch: 1 Batch: 4672/38378 (12.17%) Loss: 2.100260 LR: 0.00004980 [00:45:10] Epoch: 1 Batch: 4673/38378 (12.18%) Loss: 1.897476 LR: 0.00004980 [00:45:12] Epoch: 1 Batch: 4674/38378 (12.18%) Loss: 2.233237 LR: 0.00004980 [00:45:14] Epoch: 1 Batch: 4675/38378 (12.18%) Loss: 2.370881 LR: 0.00004980 [00:45:15] Epoch: 1 Batch: 4676/38378 (12.18%) Loss: 1.915416 LR: 0.00004980 [00:45:17] Epoch: 1 Batch: 4677/38378 (12.19%) Loss: 1.917716 LR: 0.00004980 [00:45:19] Epoch: 1 Batch: 4678/38378 (12.19%) Loss: 2.040777 LR: 0.00004980 [00:45:20] Epoch: 1 Batch: 4679/38378 (12.19%) Loss: 2.132517 LR: 0.00004980 [00:45:22] Epoch: 1 Batch: 4680/38378 (12.19%) Loss: 2.137515 LR: 0.00004980 [00:45:24] Epoch: 1 Batch: 4681/38378 (12.20%) Loss: 2.153800 LR: 0.00004980 [00:45:26] Epoch: 1 Batch: 4682/38378 (12.20%) Loss: 2.032082 LR: 0.00004980 [00:45:27] Epoch: 1 Batch: 4683/38378 (12.20%) Loss: 2.113070 LR: 0.00004980 [00:45:29] Epoch: 1 Batch: 4684/38378 (12.20%) Loss: 1.862704 LR: 0.00004980 [00:45:31] Epoch: 1 Batch: 4685/38378 (12.21%) Loss: 2.183102 LR: 0.00004980 [00:45:36] >> Cleaned up old temp checkpoint: epoch1_step4356 [00:45:36] >> Temp checkpoint saved: epoch1_step4686, size: 0.1702 GB [00:45:36] Epoch: 1 Batch: 4686/38378 (12.21%) Loss: 2.214778 LR: 0.00004980 [00:45:38] Epoch: 1 Batch: 4687/38378 (12.21%) Loss: 2.061861 LR: 0.00004980 [00:45:40] Epoch: 1 Batch: 4688/38378 (12.22%) Loss: 2.175989 LR: 0.00004980 [00:45:41] Epoch: 1 Batch: 4689/38378 (12.22%) Loss: 2.217680 LR: 0.00004980 [00:45:43] Epoch: 1 Batch: 4690/38378 (12.22%) Loss: 1.748635 LR: 0.00004980 [00:45:45] Epoch: 1 Batch: 4691/38378 (12.22%) Loss: 1.963620 LR: 0.00004980 [00:45:46] Epoch: 1 Batch: 4692/38378 (12.23%) Loss: 2.026498 LR: 0.00004980 [00:45:48] Epoch: 1 Batch: 4693/38378 (12.23%) Loss: 2.079992 LR: 0.00004980 [00:45:50] Epoch: 1 Batch: 4694/38378 (12.23%) Loss: 1.924025 LR: 0.00004979 [00:45:51] Epoch: 1 Batch: 4695/38378 (12.23%) Loss: 1.978351 LR: 0.00004979 [00:45:53] Epoch: 1 Batch: 4696/38378 (12.24%) Loss: 2.443846 LR: 0.00004979 [00:45:55] Epoch: 1 Batch: 4697/38378 (12.24%) Loss: 1.875061 LR: 0.00004979 [00:45:57] Epoch: 1 Batch: 4698/38378 (12.24%) Loss: 1.707608 LR: 0.00004979 [00:45:58] Epoch: 1 Batch: 
4699/38378 (12.24%) Loss: 1.749591 LR: 0.00004979 [00:46:00] Epoch: 1 Batch: 4700/38378 (12.25%) Loss: 2.061278 LR: 0.00004979 [00:46:02] Epoch: 1 Batch: 4701/38378 (12.25%) Loss: 1.888207 LR: 0.00004979 [00:46:03] Epoch: 1 Batch: 4702/38378 (12.25%) Loss: 2.286964 LR: 0.00004979 [00:46:05] Epoch: 1 Batch: 4703/38378 (12.25%) Loss: 2.322436 LR: 0.00004979 [00:46:07] Epoch: 1 Batch: 4704/38378 (12.26%) Loss: 1.914719 LR: 0.00004979 [00:46:09] Epoch: 1 Batch: 4705/38378 (12.26%) Loss: 2.151086 LR: 0.00004979 [00:46:10] Epoch: 1 Batch: 4706/38378 (12.26%) Loss: 2.165530 LR: 0.00004979 [00:46:12] Epoch: 1 Batch: 4707/38378 (12.26%) Loss: 1.817740 LR: 0.00004979 [00:46:14] Epoch: 1 Batch: 4708/38378 (12.27%) Loss: 2.047826 LR: 0.00004979 [00:46:15] Epoch: 1 Batch: 4709/38378 (12.27%) Loss: 1.866875 LR: 0.00004979 [00:46:17] Epoch: 1 Batch: 4710/38378 (12.27%) Loss: 1.998134 LR: 0.00004979 [00:46:19] Epoch: 1 Batch: 4711/38378 (12.28%) Loss: 2.169641 LR: 0.00004979 [00:46:21] Epoch: 1 Batch: 4712/38378 (12.28%) Loss: 1.899395 LR: 0.00004979 [00:46:22] Epoch: 1 Batch: 4713/38378 (12.28%) Loss: 2.082416 LR: 0.00004979 [00:46:24] Epoch: 1 Batch: 4714/38378 (12.28%) Loss: 2.042946 LR: 0.00004979 [00:46:26] Epoch: 1 Batch: 4715/38378 (12.29%) Loss: 2.072518 LR: 0.00004979 [00:46:27] Epoch: 1 Batch: 4716/38378 (12.29%) Loss: 2.054198 LR: 0.00004979 [00:46:29] Epoch: 1 Batch: 4717/38378 (12.29%) Loss: 2.114684 LR: 0.00004979 [00:46:31] Epoch: 1 Batch: 4718/38378 (12.29%) Loss: 2.532222 LR: 0.00004979 [00:46:37] >> Cleaned up old temp checkpoint: epoch1_step4389 [00:46:37] >> Temp checkpoint saved: epoch1_step4719, size: 0.1702 GB [00:46:37] Epoch: 1 Batch: 4719/38378 (12.30%) Loss: 2.022624 LR: 0.00004979 [00:46:38] Epoch: 1 Batch: 4720/38378 (12.30%) Loss: 2.034916 LR: 0.00004979 [00:46:40] Epoch: 1 Batch: 4721/38378 (12.30%) Loss: 2.188519 LR: 0.00004979 [00:46:42] Epoch: 1 Batch: 4722/38378 (12.30%) Loss: 1.860512 LR: 0.00004979 [00:46:43] Epoch: 1 Batch: 4723/38378 (12.31%) Loss: 2.080614 LR: 0.00004979 [00:46:45] Epoch: 1 Batch: 4724/38378 (12.31%) Loss: 2.496634 LR: 0.00004979 [00:46:46] Epoch: 1 Batch: 4725/38378 (12.31%) Loss: 1.936229 LR: 0.00004979 [00:46:48] Epoch: 1 Batch: 4726/38378 (12.31%) Loss: 1.928122 LR: 0.00004979 [00:46:50] Epoch: 1 Batch: 4727/38378 (12.32%) Loss: 2.218178 LR: 0.00004979 [00:46:52] Epoch: 1 Batch: 4728/38378 (12.32%) Loss: 2.118633 LR: 0.00004979 [00:46:53] Epoch: 1 Batch: 4729/38378 (12.32%) Loss: 1.997210 LR: 0.00004978 [00:46:55] Epoch: 1 Batch: 4730/38378 (12.32%) Loss: 2.129689 LR: 0.00004978 [00:46:57] Epoch: 1 Batch: 4731/38378 (12.33%) Loss: 2.107217 LR: 0.00004978 [00:46:58] Epoch: 1 Batch: 4732/38378 (12.33%) Loss: 2.243840 LR: 0.00004978 [00:47:00] Epoch: 1 Batch: 4733/38378 (12.33%) Loss: 1.955986 LR: 0.00004978 [00:47:02] Epoch: 1 Batch: 4734/38378 (12.34%) Loss: 1.936823 LR: 0.00004978 [00:47:03] Epoch: 1 Batch: 4735/38378 (12.34%) Loss: 2.124618 LR: 0.00004978 [00:47:05] Epoch: 1 Batch: 4736/38378 (12.34%) Loss: 1.871942 LR: 0.00004978 [00:47:07] Epoch: 1 Batch: 4737/38378 (12.34%) Loss: 1.926138 LR: 0.00004978 [00:47:09] Epoch: 1 Batch: 4738/38378 (12.35%) Loss: 2.174020 LR: 0.00004978 [00:47:10] Epoch: 1 Batch: 4739/38378 (12.35%) Loss: 2.093034 LR: 0.00004978 [00:47:12] Epoch: 1 Batch: 4740/38378 (12.35%) Loss: 1.952988 LR: 0.00004978 [00:47:14] Epoch: 1 Batch: 4741/38378 (12.35%) Loss: 2.022856 LR: 0.00004978 [00:47:15] Epoch: 1 Batch: 4742/38378 (12.36%) Loss: 2.041429 LR: 0.00004978 [00:47:17] Epoch: 1 Batch: 4743/38378 (12.36%) Loss: 2.075953 
LR: 0.00004978 [00:47:19] Epoch: 1 Batch: 4744/38378 (12.36%) Loss: 2.006096 LR: 0.00004978 [00:47:21] Epoch: 1 Batch: 4745/38378 (12.36%) Loss: 2.081679 LR: 0.00004978 [00:47:22] Epoch: 1 Batch: 4746/38378 (12.37%) Loss: 1.964832 LR: 0.00004978 [00:47:24] Epoch: 1 Batch: 4747/38378 (12.37%) Loss: 2.076305 LR: 0.00004978 [00:47:26] Epoch: 1 Batch: 4748/38378 (12.37%) Loss: 2.092150 LR: 0.00004978 [00:47:28] Epoch: 1 Batch: 4749/38378 (12.37%) Loss: 2.064009 LR: 0.00004978 [00:47:29] Epoch: 1 Batch: 4750/38378 (12.38%) Loss: 2.089690 LR: 0.00004978 [00:47:31] Epoch: 1 Batch: 4751/38378 (12.38%) Loss: 1.946348 LR: 0.00004978 [00:47:36] >> Cleaned up old temp checkpoint: epoch1_step4422 [00:47:37] >> Temp checkpoint saved: epoch1_step4752, size: 0.1702 GB [00:47:37] Epoch: 1 Batch: 4752/38378 (12.38%) Loss: 2.296526 LR: 0.00004978 [00:47:38] Epoch: 1 Batch: 4753/38378 (12.38%) Loss: 2.284325 LR: 0.00004978 [00:47:40] Epoch: 1 Batch: 4754/38378 (12.39%) Loss: 1.840384 LR: 0.00004978 [00:47:42] Epoch: 1 Batch: 4755/38378 (12.39%) Loss: 2.184779 LR: 0.00004978 [00:47:43] Epoch: 1 Batch: 4756/38378 (12.39%) Loss: 2.129848 LR: 0.00004978 [00:47:45] Epoch: 1 Batch: 4757/38378 (12.40%) Loss: 2.140034 LR: 0.00004978 [00:47:47] Epoch: 1 Batch: 4758/38378 (12.40%) Loss: 2.139942 LR: 0.00004978 [00:47:48] Epoch: 1 Batch: 4759/38378 (12.40%) Loss: 1.800304 LR: 0.00004978 [00:47:50] Epoch: 1 Batch: 4760/38378 (12.40%) Loss: 2.430086 LR: 0.00004978 [00:47:52] Epoch: 1 Batch: 4761/38378 (12.41%) Loss: 1.839972 LR: 0.00004978 [00:47:53] Epoch: 1 Batch: 4762/38378 (12.41%) Loss: 2.172222 LR: 0.00004978 [00:47:55] Epoch: 1 Batch: 4763/38378 (12.41%) Loss: 2.065644 LR: 0.00004978 [00:47:57] Epoch: 1 Batch: 4764/38378 (12.41%) Loss: 1.704207 LR: 0.00004978 [00:47:59] Epoch: 1 Batch: 4765/38378 (12.42%) Loss: 1.898186 LR: 0.00004978 [00:48:00] Epoch: 1 Batch: 4766/38378 (12.42%) Loss: 2.160420 LR: 0.00004978 [00:48:02] Epoch: 1 Batch: 4767/38378 (12.42%) Loss: 2.024711 LR: 0.00004978 [00:48:04] Epoch: 1 Batch: 4768/38378 (12.42%) Loss: 1.923483 LR: 0.00004978 [00:48:05] Epoch: 1 Batch: 4769/38378 (12.43%) Loss: 2.131946 LR: 0.00004978 [00:48:07] Epoch: 1 Batch: 4770/38378 (12.43%) Loss: 2.076715 LR: 0.00004978 [00:48:09] Epoch: 1 Batch: 4771/38378 (12.43%) Loss: 1.882964 LR: 0.00004977 [00:48:11] Epoch: 1 Batch: 4772/38378 (12.43%) Loss: 2.387631 LR: 0.00004977 [00:48:12] Epoch: 1 Batch: 4773/38378 (12.44%) Loss: 2.012633 LR: 0.00004977 [00:48:14] Epoch: 1 Batch: 4774/38378 (12.44%) Loss: 2.012357 LR: 0.00004977 [00:48:16] Epoch: 1 Batch: 4775/38378 (12.44%) Loss: 2.288540 LR: 0.00004977 [00:48:17] Epoch: 1 Batch: 4776/38378 (12.44%) Loss: 2.045963 LR: 0.00004977 [00:48:19] Epoch: 1 Batch: 4777/38378 (12.45%) Loss: 2.264065 LR: 0.00004977 [00:48:21] Epoch: 1 Batch: 4778/38378 (12.45%) Loss: 1.942488 LR: 0.00004977 [00:48:22] Epoch: 1 Batch: 4779/38378 (12.45%) Loss: 2.111010 LR: 0.00004977 [00:48:24] Epoch: 1 Batch: 4780/38378 (12.46%) Loss: 2.105346 LR: 0.00004977 [00:48:26] Epoch: 1 Batch: 4781/38378 (12.46%) Loss: 2.143708 LR: 0.00004977 [00:48:28] Epoch: 1 Batch: 4782/38378 (12.46%) Loss: 2.048092 LR: 0.00004977 [00:48:29] Epoch: 1 Batch: 4783/38378 (12.46%) Loss: 1.939137 LR: 0.00004977 [00:48:31] Epoch: 1 Batch: 4784/38378 (12.47%) Loss: 2.062071 LR: 0.00004977 [00:48:37] >> Cleaned up old temp checkpoint: epoch1_step4455 [00:48:37] >> Temp checkpoint saved: epoch1_step4785, size: 0.1702 GB [00:48:37] Epoch: 1 Batch: 4785/38378 (12.47%) Loss: 2.157162 LR: 0.00004977 [00:48:39] Epoch: 1 Batch: 4786/38378 
(12.47%) Loss: 1.923687 LR: 0.00004977 [00:48:40] Epoch: 1 Batch: 4787/38378 (12.47%) Loss: 2.169578 LR: 0.00004977 [00:48:42] Epoch: 1 Batch: 4788/38378 (12.48%) Loss: 2.147032 LR: 0.00004977 [00:48:44] Epoch: 1 Batch: 4789/38378 (12.48%) Loss: 1.957973 LR: 0.00004977 [00:48:45] Epoch: 1 Batch: 4790/38378 (12.48%) Loss: 1.976629 LR: 0.00004977 [00:48:47] Epoch: 1 Batch: 4791/38378 (12.48%) Loss: 2.343859 LR: 0.00004977 [00:48:49] Epoch: 1 Batch: 4792/38378 (12.49%) Loss: 2.313522 LR: 0.00004977 [00:48:50] Epoch: 1 Batch: 4793/38378 (12.49%) Loss: 2.110001 LR: 0.00004977 [00:48:52] Epoch: 1 Batch: 4794/38378 (12.49%) Loss: 1.908735 LR: 0.00004977 [00:48:54] Epoch: 1 Batch: 4795/38378 (12.49%) Loss: 1.816482 LR: 0.00004977 [00:48:55] Epoch: 1 Batch: 4796/38378 (12.50%) Loss: 1.868999 LR: 0.00004977 [00:48:57] Epoch: 1 Batch: 4797/38378 (12.50%) Loss: 2.287471 LR: 0.00004977 [00:48:59] Epoch: 1 Batch: 4798/38378 (12.50%) Loss: 1.777799 LR: 0.00004977 [00:49:01] Epoch: 1 Batch: 4799/38378 (12.50%) Loss: 2.064521 LR: 0.00004977 [00:49:02] Epoch: 1 Batch: 4800/38378 (12.51%) Loss: 2.126427 LR: 0.00004977 [00:49:04] Epoch: 1 Batch: 4801/38378 (12.51%) Loss: 2.190462 LR: 0.00004977 [00:49:06] Epoch: 1 Batch: 4802/38378 (12.51%) Loss: 2.260711 LR: 0.00004977 [00:49:07] Epoch: 1 Batch: 4803/38378 (12.51%) Loss: 1.916076 LR: 0.00004977 [00:49:09] Epoch: 1 Batch: 4804/38378 (12.52%) Loss: 2.008036 LR: 0.00004977 [00:49:11] Epoch: 1 Batch: 4805/38378 (12.52%) Loss: 2.027388 LR: 0.00004977 [00:49:13] Epoch: 1 Batch: 4806/38378 (12.52%) Loss: 2.152166 LR: 0.00004976 [00:49:14] Epoch: 1 Batch: 4807/38378 (12.53%) Loss: 2.322752 LR: 0.00004976 [00:49:16] Epoch: 1 Batch: 4808/38378 (12.53%) Loss: 1.786327 LR: 0.00004976 [00:49:18] Epoch: 1 Batch: 4809/38378 (12.53%) Loss: 1.895355 LR: 0.00004976 [00:49:19] Epoch: 1 Batch: 4810/38378 (12.53%) Loss: 2.356525 LR: 0.00004976 [00:49:21] Epoch: 1 Batch: 4811/38378 (12.54%) Loss: 2.121958 LR: 0.00004976 [00:49:23] Epoch: 1 Batch: 4812/38378 (12.54%) Loss: 2.007089 LR: 0.00004976 [00:49:25] Epoch: 1 Batch: 4813/38378 (12.54%) Loss: 1.956484 LR: 0.00004976 [00:49:26] Epoch: 1 Batch: 4814/38378 (12.54%) Loss: 2.023003 LR: 0.00004976 [00:49:28] Epoch: 1 Batch: 4815/38378 (12.55%) Loss: 1.920226 LR: 0.00004976 [00:49:30] Epoch: 1 Batch: 4816/38378 (12.55%) Loss: 1.868517 LR: 0.00004976 [00:49:32] Epoch: 1 Batch: 4817/38378 (12.55%) Loss: 1.859007 LR: 0.00004976 [00:49:37] >> Cleaned up old temp checkpoint: epoch1_step4488 [00:49:37] >> Temp checkpoint saved: epoch1_step4818, size: 0.1702 GB [00:49:37] Epoch: 1 Batch: 4818/38378 (12.55%) Loss: 2.058723 LR: 0.00004976 [00:49:39] Epoch: 1 Batch: 4819/38378 (12.56%) Loss: 1.934783 LR: 0.00004976 [00:49:41] Epoch: 1 Batch: 4820/38378 (12.56%) Loss: 2.255737 LR: 0.00004976 [00:49:42] Epoch: 1 Batch: 4821/38378 (12.56%) Loss: 2.090794 LR: 0.00004976 [00:49:44] Epoch: 1 Batch: 4822/38378 (12.56%) Loss: 1.863737 LR: 0.00004976 [00:49:46] Epoch: 1 Batch: 4823/38378 (12.57%) Loss: 2.013553 LR: 0.00004976 [00:49:47] Epoch: 1 Batch: 4824/38378 (12.57%) Loss: 1.950961 LR: 0.00004976 [00:49:49] Epoch: 1 Batch: 4825/38378 (12.57%) Loss: 2.178843 LR: 0.00004976 [00:49:51] Epoch: 1 Batch: 4826/38378 (12.57%) Loss: 2.069467 LR: 0.00004976 [00:49:52] Epoch: 1 Batch: 4827/38378 (12.58%) Loss: 2.103805 LR: 0.00004976 [00:49:54] Epoch: 1 Batch: 4828/38378 (12.58%) Loss: 2.165799 LR: 0.00004976 [00:49:56] Epoch: 1 Batch: 4829/38378 (12.58%) Loss: 1.826341 LR: 0.00004976 [00:49:57] Epoch: 1 Batch: 4830/38378 (12.59%) Loss: 2.405940 LR: 
0.00004976 [00:49:59] Epoch: 1 Batch: 4831/38378 (12.59%) Loss: 2.105283 LR: 0.00004976 [00:50:01] Epoch: 1 Batch: 4832/38378 (12.59%) Loss: 2.235296 LR: 0.00004976 [00:50:02] Epoch: 1 Batch: 4833/38378 (12.59%) Loss: 2.002461 LR: 0.00004976 [00:50:04] Epoch: 1 Batch: 4834/38378 (12.60%) Loss: 2.029455 LR: 0.00004976 [00:50:06] Epoch: 1 Batch: 4835/38378 (12.60%) Loss: 2.374368 LR: 0.00004976 [00:50:07] Epoch: 1 Batch: 4836/38378 (12.60%) Loss: 2.051934 LR: 0.00004976 [00:50:09] Epoch: 1 Batch: 4837/38378 (12.60%) Loss: 1.833718 LR: 0.00004976 [00:50:11] Epoch: 1 Batch: 4838/38378 (12.61%) Loss: 2.203160 LR: 0.00004976 [00:50:13] Epoch: 1 Batch: 4839/38378 (12.61%) Loss: 2.094539 LR: 0.00004976 [00:50:14] Epoch: 1 Batch: 4840/38378 (12.61%) Loss: 1.947897 LR: 0.00004976 [00:50:16] Epoch: 1 Batch: 4841/38378 (12.61%) Loss: 2.059309 LR: 0.00004975 [00:50:18] Epoch: 1 Batch: 4842/38378 (12.62%) Loss: 2.228027 LR: 0.00004975 [00:50:19] Epoch: 1 Batch: 4843/38378 (12.62%) Loss: 1.885186 LR: 0.00004975 [00:50:21] Epoch: 1 Batch: 4844/38378 (12.62%) Loss: 1.904515 LR: 0.00004975 [00:50:23] Epoch: 1 Batch: 4845/38378 (12.62%) Loss: 1.929056 LR: 0.00004975 [00:50:24] Epoch: 1 Batch: 4846/38378 (12.63%) Loss: 2.072643 LR: 0.00004975 [00:50:26] Epoch: 1 Batch: 4847/38378 (12.63%) Loss: 2.260052 LR: 0.00004975 [00:50:28] Epoch: 1 Batch: 4848/38378 (12.63%) Loss: 2.102242 LR: 0.00004975 [00:50:29] Epoch: 1 Batch: 4849/38378 (12.63%) Loss: 2.126989 LR: 0.00004975 [00:50:31] Epoch: 1 Batch: 4850/38378 (12.64%) Loss: 2.296492 LR: 0.00004975 [00:50:37] >> Cleaned up old temp checkpoint: epoch1_step4521 [00:50:37] >> Temp checkpoint saved: epoch1_step4851, size: 0.1702 GB [00:50:37] Epoch: 1 Batch: 4851/38378 (12.64%) Loss: 1.950151 LR: 0.00004975 [00:50:39] Epoch: 1 Batch: 4852/38378 (12.64%) Loss: 1.799632 LR: 0.00004975 [00:50:40] Epoch: 1 Batch: 4853/38378 (12.65%) Loss: 2.047917 LR: 0.00004975 [00:50:42] Epoch: 1 Batch: 4854/38378 (12.65%) Loss: 1.738240 LR: 0.00004975 [00:50:44] Epoch: 1 Batch: 4855/38378 (12.65%) Loss: 2.196216 LR: 0.00004975 [00:50:45] Epoch: 1 Batch: 4856/38378 (12.65%) Loss: 1.909480 LR: 0.00004975 [00:50:47] Epoch: 1 Batch: 4857/38378 (12.66%) Loss: 2.062992 LR: 0.00004975 [00:50:49] Epoch: 1 Batch: 4858/38378 (12.66%) Loss: 1.913254 LR: 0.00004975 [00:50:50] Epoch: 1 Batch: 4859/38378 (12.66%) Loss: 2.123627 LR: 0.00004975 [00:50:52] Epoch: 1 Batch: 4860/38378 (12.66%) Loss: 1.933706 LR: 0.00004975 [00:50:54] Epoch: 1 Batch: 4861/38378 (12.67%) Loss: 1.913769 LR: 0.00004975 [00:50:56] Epoch: 1 Batch: 4862/38378 (12.67%) Loss: 2.040353 LR: 0.00004975 [00:50:57] Epoch: 1 Batch: 4863/38378 (12.67%) Loss: 1.915283 LR: 0.00004975 [00:50:59] Epoch: 1 Batch: 4864/38378 (12.67%) Loss: 2.357000 LR: 0.00004975 [00:51:01] Epoch: 1 Batch: 4865/38378 (12.68%) Loss: 2.127616 LR: 0.00004975 [00:51:02] Epoch: 1 Batch: 4866/38378 (12.68%) Loss: 1.948076 LR: 0.00004975 [00:51:04] Epoch: 1 Batch: 4867/38378 (12.68%) Loss: 2.333989 LR: 0.00004975 [00:51:06] Epoch: 1 Batch: 4868/38378 (12.68%) Loss: 1.792589 LR: 0.00004975 [00:51:08] Epoch: 1 Batch: 4869/38378 (12.69%) Loss: 2.078964 LR: 0.00004975 [00:51:09] Epoch: 1 Batch: 4870/38378 (12.69%) Loss: 1.809301 LR: 0.00004975 [00:51:11] Epoch: 1 Batch: 4871/38378 (12.69%) Loss: 2.090938 LR: 0.00004975 [00:51:13] Epoch: 1 Batch: 4872/38378 (12.69%) Loss: 1.987533 LR: 0.00004975 [00:51:14] Epoch: 1 Batch: 4873/38378 (12.70%) Loss: 2.152270 LR: 0.00004975 [00:51:16] Epoch: 1 Batch: 4874/38378 (12.70%) Loss: 2.092277 LR: 0.00004975 [00:51:17] Epoch: 1 
Batch: 4875/38378 (12.70%) Loss: 2.072474 LR: 0.00004975 [00:51:19] Epoch: 1 Batch: 4876/38378 (12.71%) Loss: 2.035483 LR: 0.00004974 [00:51:21] Epoch: 1 Batch: 4877/38378 (12.71%) Loss: 2.338224 LR: 0.00004974 [00:51:23] Epoch: 1 Batch: 4878/38378 (12.71%) Loss: 1.879883 LR: 0.00004974 [00:51:24] Epoch: 1 Batch: 4879/38378 (12.71%) Loss: 1.850033 LR: 0.00004974 [00:51:26] Epoch: 1 Batch: 4880/38378 (12.72%) Loss: 2.295108 LR: 0.00004974 [00:51:28] Epoch: 1 Batch: 4881/38378 (12.72%) Loss: 1.933998 LR: 0.00004974 [00:51:30] Epoch: 1 Batch: 4882/38378 (12.72%) Loss: 1.624529 LR: 0.00004974 [00:51:31] Epoch: 1 Batch: 4883/38378 (12.72%) Loss: 1.972017 LR: 0.00004974 [00:51:37] >> Cleaned up old temp checkpoint: epoch1_step4554 [00:51:37] >> Temp checkpoint saved: epoch1_step4884, size: 0.1702 GB [00:51:37] Epoch: 1 Batch: 4884/38378 (12.73%) Loss: 1.918211 LR: 0.00004974 [00:51:38] Epoch: 1 Batch: 4885/38378 (12.73%) Loss: 1.882912 LR: 0.00004974 [00:51:40] Epoch: 1 Batch: 4886/38378 (12.73%) Loss: 2.196185 LR: 0.00004974 [00:51:42] Epoch: 1 Batch: 4887/38378 (12.73%) Loss: 1.950956 LR: 0.00004974 [00:51:43] Epoch: 1 Batch: 4888/38378 (12.74%) Loss: 2.346004 LR: 0.00004974 [00:51:45] Epoch: 1 Batch: 4889/38378 (12.74%) Loss: 2.143733 LR: 0.00004974 [00:51:47] Epoch: 1 Batch: 4890/38378 (12.74%) Loss: 2.124403 LR: 0.00004974 [00:51:49] Epoch: 1 Batch: 4891/38378 (12.74%) Loss: 2.130190 LR: 0.00004974 [00:51:50] Epoch: 1 Batch: 4892/38378 (12.75%) Loss: 1.783804 LR: 0.00004974 [00:51:52] Epoch: 1 Batch: 4893/38378 (12.75%) Loss: 1.758174 LR: 0.00004974 [00:51:54] Epoch: 1 Batch: 4894/38378 (12.75%) Loss: 2.204987 LR: 0.00004974 [00:51:55] Epoch: 1 Batch: 4895/38378 (12.75%) Loss: 1.956018 LR: 0.00004974 [00:51:57] Epoch: 1 Batch: 4896/38378 (12.76%) Loss: 2.012085 LR: 0.00004974 [00:51:59] Epoch: 1 Batch: 4897/38378 (12.76%) Loss: 1.948844 LR: 0.00004974 [00:52:00] Epoch: 1 Batch: 4898/38378 (12.76%) Loss: 1.795234 LR: 0.00004974 [00:52:02] Epoch: 1 Batch: 4899/38378 (12.77%) Loss: 2.339830 LR: 0.00004974 [00:52:04] Epoch: 1 Batch: 4900/38378 (12.77%) Loss: 2.230311 LR: 0.00004974 [00:52:06] Epoch: 1 Batch: 4901/38378 (12.77%) Loss: 1.732650 LR: 0.00004974 [00:52:07] Epoch: 1 Batch: 4902/38378 (12.77%) Loss: 1.809012 LR: 0.00004974 [00:52:09] Epoch: 1 Batch: 4903/38378 (12.78%) Loss: 1.978222 LR: 0.00004974 [00:52:11] Epoch: 1 Batch: 4904/38378 (12.78%) Loss: 2.086674 LR: 0.00004974 [00:52:13] Epoch: 1 Batch: 4905/38378 (12.78%) Loss: 1.963109 LR: 0.00004974 [00:52:14] Epoch: 1 Batch: 4906/38378 (12.78%) Loss: 1.846334 LR: 0.00004974 [00:52:16] Epoch: 1 Batch: 4907/38378 (12.79%) Loss: 1.962060 LR: 0.00004974 [00:52:18] Epoch: 1 Batch: 4908/38378 (12.79%) Loss: 2.083555 LR: 0.00004974 [00:52:19] Epoch: 1 Batch: 4909/38378 (12.79%) Loss: 1.906203 LR: 0.00004974 [00:52:21] Epoch: 1 Batch: 4910/38378 (12.79%) Loss: 2.003086 LR: 0.00004974 [00:52:23] Epoch: 1 Batch: 4911/38378 (12.80%) Loss: 1.844440 LR: 0.00004973 [00:52:24] Epoch: 1 Batch: 4912/38378 (12.80%) Loss: 2.012722 LR: 0.00004973 [00:52:26] Epoch: 1 Batch: 4913/38378 (12.80%) Loss: 1.996265 LR: 0.00004973 [00:52:28] Epoch: 1 Batch: 4914/38378 (12.80%) Loss: 1.992287 LR: 0.00004973 [00:52:30] Epoch: 1 Batch: 4915/38378 (12.81%) Loss: 1.872170 LR: 0.00004973 [00:52:31] Epoch: 1 Batch: 4916/38378 (12.81%) Loss: 1.995432 LR: 0.00004973 [00:52:37] >> Cleaned up old temp checkpoint: epoch1_step4587 [00:52:37] >> Temp checkpoint saved: epoch1_step4917, size: 0.1702 GB [00:52:37] Epoch: 1 Batch: 4917/38378 (12.81%) Loss: 2.304372 LR: 0.00004973 
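Note on the checkpoint cadence in the log above: a temp checkpoint lands every 33 batches, and each save is paired with the deletion of the temp checkpoint from 330 batches earlier (epoch1_step4587 is cleaned up as epoch1_step4917 is written), which implies a keep-the-last-10 rotation. Below is a minimal sketch of that rotation, assuming the ~0.17 GB snapshots are LoRA-adapter-only saves via PEFT's save_pretrained; the class and names are illustrative, not the trainer's actual code.

```python
import os
import shutil
from collections import deque

class TempCheckpointRotator:
    """Rotate temp checkpoints, keeping only the N most recent.

    N=10 is inferred from the constant 330-batch gap between each
    'Temp checkpoint saved' line and its paired 'Cleaned up' line.
    """

    def __init__(self, output_dir, keep_last=10):
        self.output_dir = output_dir
        self.keep_last = keep_last
        self.live = deque()  # oldest -> newest temp checkpoint paths

    def save(self, model, epoch, step):
        # Evict the oldest snapshot first, matching the log's message order.
        if len(self.live) >= self.keep_last:
            old = self.live.popleft()
            shutil.rmtree(old, ignore_errors=True)
            print(f">> Cleaned up old temp checkpoint: {os.path.basename(old)}")
        path = os.path.join(self.output_dir, f"epoch{epoch}_step{step}")
        model.save_pretrained(path)  # adapter weights only -> ~0.17 GB on disk
        self.live.append(path)
        print(f">> Temp checkpoint saved: {os.path.basename(path)}")
        return path
```

The permanent checkpoints written at every 500th step (e.g. epoch1_step5000 below) show no paired cleanup line, so they presumably sit outside this rotation.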
[00:52:39] Epoch: 1 Batch: 4918/38378 (12.81%) Loss: 1.922926 LR: 0.00004973 [00:52:40] Epoch: 1 Batch: 4919/38378 (12.82%) Loss: 2.251523 LR: 0.00004973 [00:52:42] Epoch: 1 Batch: 4920/38378 (12.82%) Loss: 1.894608 LR: 0.00004973 [00:52:44] Epoch: 1 Batch: 4921/38378 (12.82%) Loss: 1.962688 LR: 0.00004973 [00:52:45] Epoch: 1 Batch: 4922/38378 (12.83%) Loss: 2.127648 LR: 0.00004973 [00:52:47] Epoch: 1 Batch: 4923/38378 (12.83%) Loss: 2.162582 LR: 0.00004973 [00:52:49] Epoch: 1 Batch: 4924/38378 (12.83%) Loss: 2.027888 LR: 0.00004973 [00:52:51] Epoch: 1 Batch: 4925/38378 (12.83%) Loss: 2.252507 LR: 0.00004973 [00:52:52] Epoch: 1 Batch: 4926/38378 (12.84%) Loss: 1.955570 LR: 0.00004973 [00:52:54] Epoch: 1 Batch: 4927/38378 (12.84%) Loss: 1.931421 LR: 0.00004973 [00:52:56] Epoch: 1 Batch: 4928/38378 (12.84%) Loss: 2.053542 LR: 0.00004973 [00:52:57] Epoch: 1 Batch: 4929/38378 (12.84%) Loss: 1.927570 LR: 0.00004973 [00:52:59] Epoch: 1 Batch: 4930/38378 (12.85%) Loss: 1.959551 LR: 0.00004973 [00:53:01] Epoch: 1 Batch: 4931/38378 (12.85%) Loss: 1.882183 LR: 0.00004973 [00:53:03] Epoch: 1 Batch: 4932/38378 (12.85%) Loss: 1.792406 LR: 0.00004973 [00:53:04] Epoch: 1 Batch: 4933/38378 (12.85%) Loss: 2.085823 LR: 0.00004973 [00:53:06] Epoch: 1 Batch: 4934/38378 (12.86%) Loss: 2.424347 LR: 0.00004973 [00:53:08] Epoch: 1 Batch: 4935/38378 (12.86%) Loss: 1.732540 LR: 0.00004973 [00:53:09] Epoch: 1 Batch: 4936/38378 (12.86%) Loss: 1.817635 LR: 0.00004973 [00:53:11] Epoch: 1 Batch: 4937/38378 (12.86%) Loss: 2.084664 LR: 0.00004973 [00:53:13] Epoch: 1 Batch: 4938/38378 (12.87%) Loss: 2.066056 LR: 0.00004973 [00:53:15] Epoch: 1 Batch: 4939/38378 (12.87%) Loss: 2.077584 LR: 0.00004973 [00:53:16] Epoch: 1 Batch: 4940/38378 (12.87%) Loss: 2.074690 LR: 0.00004973 [00:53:18] Epoch: 1 Batch: 4941/38378 (12.87%) Loss: 2.264418 LR: 0.00004973 [00:53:20] Epoch: 1 Batch: 4942/38378 (12.88%) Loss: 2.012218 LR: 0.00004973 [00:53:22] Epoch: 1 Batch: 4943/38378 (12.88%) Loss: 2.204513 LR: 0.00004973 [00:53:23] Epoch: 1 Batch: 4944/38378 (12.88%) Loss: 2.046588 LR: 0.00004973 [00:53:25] Epoch: 1 Batch: 4945/38378 (12.88%) Loss: 2.248729 LR: 0.00004973 [00:53:27] Epoch: 1 Batch: 4946/38378 (12.89%) Loss: 2.210101 LR: 0.00004972 [00:53:28] Epoch: 1 Batch: 4947/38378 (12.89%) Loss: 1.924088 LR: 0.00004972 [00:53:30] Epoch: 1 Batch: 4948/38378 (12.89%) Loss: 2.131400 LR: 0.00004972 [00:53:32] Epoch: 1 Batch: 4949/38378 (12.90%) Loss: 1.825895 LR: 0.00004972 [00:53:37] >> Cleaned up old temp checkpoint: epoch1_step4620 [00:53:37] >> Temp checkpoint saved: epoch1_step4950, size: 0.1702 GB [00:53:37] Epoch: 1 Batch: 4950/38378 (12.90%) Loss: 1.963036 LR: 0.00004972 [00:53:39] Epoch: 1 Batch: 4951/38378 (12.90%) Loss: 1.815051 LR: 0.00004972 [00:53:41] Epoch: 1 Batch: 4952/38378 (12.90%) Loss: 2.119629 LR: 0.00004972 [00:53:42] Epoch: 1 Batch: 4953/38378 (12.91%) Loss: 2.269242 LR: 0.00004972 [00:53:44] Epoch: 1 Batch: 4954/38378 (12.91%) Loss: 2.039976 LR: 0.00004972 [00:53:46] Epoch: 1 Batch: 4955/38378 (12.91%) Loss: 2.090694 LR: 0.00004972 [00:53:48] Epoch: 1 Batch: 4956/38378 (12.91%) Loss: 2.054755 LR: 0.00004972 [00:53:49] Epoch: 1 Batch: 4957/38378 (12.92%) Loss: 1.865907 LR: 0.00004972 [00:53:51] Epoch: 1 Batch: 4958/38378 (12.92%) Loss: 1.840482 LR: 0.00004972 [00:53:53] Epoch: 1 Batch: 4959/38378 (12.92%) Loss: 1.916909 LR: 0.00004972 [00:53:54] Epoch: 1 Batch: 4960/38378 (12.92%) Loss: 2.035165 LR: 0.00004972 [00:53:56] Epoch: 1 Batch: 4961/38378 (12.93%) Loss: 2.204831 LR: 0.00004972 [00:53:58] Epoch: 1 Batch: 
4962/38378 (12.93%) Loss: 2.165701 LR: 0.00004972 [00:54:00] Epoch: 1 Batch: 4963/38378 (12.93%) Loss: 2.253197 LR: 0.00004972 [00:54:01] Epoch: 1 Batch: 4964/38378 (12.93%) Loss: 2.223191 LR: 0.00004972 [00:54:03] Epoch: 1 Batch: 4965/38378 (12.94%) Loss: 1.903635 LR: 0.00004972 [00:54:05] Epoch: 1 Batch: 4966/38378 (12.94%) Loss: 1.944886 LR: 0.00004972 [00:54:06] Epoch: 1 Batch: 4967/38378 (12.94%) Loss: 2.221573 LR: 0.00004972 [00:54:08] Epoch: 1 Batch: 4968/38378 (12.94%) Loss: 1.973347 LR: 0.00004972 [00:54:10] Epoch: 1 Batch: 4969/38378 (12.95%) Loss: 2.130461 LR: 0.00004972 [00:54:12] Epoch: 1 Batch: 4970/38378 (12.95%) Loss: 1.918488 LR: 0.00004972 [00:54:13] Epoch: 1 Batch: 4971/38378 (12.95%) Loss: 2.240085 LR: 0.00004972 [00:54:15] Epoch: 1 Batch: 4972/38378 (12.96%) Loss: 2.066537 LR: 0.00004972 [00:54:17] Epoch: 1 Batch: 4973/38378 (12.96%) Loss: 1.947742 LR: 0.00004972 [00:54:18] Epoch: 1 Batch: 4974/38378 (12.96%) Loss: 1.949132 LR: 0.00004972 [00:54:20] Epoch: 1 Batch: 4975/38378 (12.96%) Loss: 2.112357 LR: 0.00004972 [00:54:22] Epoch: 1 Batch: 4976/38378 (12.97%) Loss: 1.913219 LR: 0.00004972 [00:54:24] Epoch: 1 Batch: 4977/38378 (12.97%) Loss: 2.044577 LR: 0.00004972 [00:54:25] Epoch: 1 Batch: 4978/38378 (12.97%) Loss: 2.392039 LR: 0.00004972 [00:54:27] Epoch: 1 Batch: 4979/38378 (12.97%) Loss: 2.292930 LR: 0.00004972 [00:54:29] Epoch: 1 Batch: 4980/38378 (12.98%) Loss: 2.337845 LR: 0.00004972 [00:54:31] Epoch: 1 Batch: 4981/38378 (12.98%) Loss: 2.258076 LR: 0.00004971 [00:54:32] Epoch: 1 Batch: 4982/38378 (12.98%) Loss: 1.879782 LR: 0.00004971 [00:54:38] >> Cleaned up old temp checkpoint: epoch1_step4653 [00:54:38] >> Temp checkpoint saved: epoch1_step4983, size: 0.1702 GB [00:54:38] Epoch: 1 Batch: 4983/38378 (12.98%) Loss: 2.094370 LR: 0.00004971 [00:54:39] Epoch: 1 Batch: 4984/38378 (12.99%) Loss: 2.038706 LR: 0.00004971 [00:54:41] Epoch: 1 Batch: 4985/38378 (12.99%) Loss: 1.877685 LR: 0.00004971 [00:54:43] Epoch: 1 Batch: 4986/38378 (12.99%) Loss: 2.243435 LR: 0.00004971 [00:54:44] Epoch: 1 Batch: 4987/38378 (12.99%) Loss: 1.945118 LR: 0.00004971 [00:54:46] Epoch: 1 Batch: 4988/38378 (13.00%) Loss: 2.254821 LR: 0.00004971 [00:54:48] Epoch: 1 Batch: 4989/38378 (13.00%) Loss: 2.200989 LR: 0.00004971 [00:54:50] Epoch: 1 Batch: 4990/38378 (13.00%) Loss: 2.071805 LR: 0.00004971 [00:54:51] Epoch: 1 Batch: 4991/38378 (13.00%) Loss: 2.034578 LR: 0.00004971 [00:54:53] Epoch: 1 Batch: 4992/38378 (13.01%) Loss: 1.702497 LR: 0.00004971 [00:54:55] Epoch: 1 Batch: 4993/38378 (13.01%) Loss: 2.311357 LR: 0.00004971 [00:54:56] Epoch: 1 Batch: 4994/38378 (13.01%) Loss: 2.051034 LR: 0.00004971 [00:54:58] Epoch: 1 Batch: 4995/38378 (13.02%) Loss: 1.997980 LR: 0.00004971 [00:55:00] Epoch: 1 Batch: 4996/38378 (13.02%) Loss: 1.801209 LR: 0.00004971 [00:55:01] Epoch: 1 Batch: 4997/38378 (13.02%) Loss: 2.065601 LR: 0.00004971 [00:55:03] Epoch: 1 Batch: 4998/38378 (13.02%) Loss: 2.072617 LR: 0.00004971 [00:55:05] Epoch: 1 Batch: 4999/38378 (13.03%) Loss: 2.198611 LR: 0.00004971 [00:55:07] >> Evaluating batch 0 [00:55:08] >> Evaluating batch 1 [00:55:08] >> Evaluating batch 2 [00:55:09] >> Evaluating batch 3 [00:55:10] >> Evaluating batch 4 [00:55:11] >> Evaluating batch 5 [00:55:12] >> Evaluating batch 6 [00:55:13] >> Evaluating batch 7 [00:55:14] >> Evaluating batch 8 [00:55:15] >> Evaluating batch 9 [00:55:16] >> Evaluating batch 10 [00:55:17] >> Evaluating batch 11 [00:55:18] >> Evaluating batch 12 [00:55:19] >> Evaluating batch 13 [00:55:20] >> Evaluating batch 14 [00:55:21] >> 
Evaluating batch 15 [00:55:22] >> Evaluating batch 16 [00:55:22] Epoch: 1 Step: 5000/38378 Evaluation: [00:55:22] Avg Loss Since Last Eval: 2.0586 Val Loss: 2.1513 Validation loss delta: 0.0010 Perplexity: 8.5957 LR: 0.00004971 [00:55:27] >> Checkpoint saved: epoch1_step5000, size: 0.1702 GB [00:55:27] Epoch: 1 Batch: 5000/38378 (13.03%) Loss: 2.014356 LR: 0.00004971 [00:55:28] Epoch: 1 Batch: 5001/38378 (13.03%) Loss: 1.718513 LR: 0.00004971 [00:55:30] Epoch: 1 Batch: 5002/38378 (13.03%) Loss: 1.757577 LR: 0.00004971 [00:55:32] Epoch: 1 Batch: 5003/38378 (13.04%) Loss: 1.729069 LR: 0.00004971 [00:55:33] Epoch: 1 Batch: 5004/38378 (13.04%) Loss: 2.201573 LR: 0.00004971 [00:55:35] Epoch: 1 Batch: 5005/38378 (13.04%) Loss: 1.879069 LR: 0.00004971 [00:55:37] Epoch: 1 Batch: 5006/38378 (13.04%) Loss: 2.059279 LR: 0.00004971 [00:55:38] Epoch: 1 Batch: 5007/38378 (13.05%) Loss: 1.839095 LR: 0.00004971 [00:55:40] Epoch: 1 Batch: 5008/38378 (13.05%) Loss: 2.056367 LR: 0.00004971 [00:55:42] Epoch: 1 Batch: 5009/38378 (13.05%) Loss: 2.137700 LR: 0.00004971 [00:55:43] Epoch: 1 Batch: 5010/38378 (13.05%) Loss: 2.415473 LR: 0.00004971 [00:55:45] Epoch: 1 Batch: 5011/38378 (13.06%) Loss: 1.880968 LR: 0.00004971 [00:55:47] Epoch: 1 Batch: 5012/38378 (13.06%) Loss: 1.970077 LR: 0.00004971 [00:55:48] Epoch: 1 Batch: 5013/38378 (13.06%) Loss: 1.986072 LR: 0.00004971 [00:55:50] Epoch: 1 Batch: 5014/38378 (13.06%) Loss: 2.193713 LR: 0.00004971 [00:55:52] Epoch: 1 Batch: 5015/38378 (13.07%) Loss: 2.013559 LR: 0.00004971 [00:55:57] >> Cleaned up old temp checkpoint: epoch1_step4686 [00:55:57] >> Temp checkpoint saved: epoch1_step5016, size: 0.1702 GB [00:55:57] Epoch: 1 Batch: 5016/38378 (13.07%) Loss: 2.037997 LR: 0.00004970 [00:55:59] Epoch: 1 Batch: 5017/38378 (13.07%) Loss: 2.112754 LR: 0.00004970 [00:56:01] Epoch: 1 Batch: 5018/38378 (13.08%) Loss: 2.199662 LR: 0.00004970 [00:56:03] Epoch: 1 Batch: 5019/38378 (13.08%) Loss: 2.054731 LR: 0.00004970 [00:56:04] Epoch: 1 Batch: 5020/38378 (13.08%) Loss: 1.893900 LR: 0.00004970 [00:56:06] Epoch: 1 Batch: 5021/38378 (13.08%) Loss: 1.925575 LR: 0.00004970 [00:56:08] Epoch: 1 Batch: 5022/38378 (13.09%) Loss: 1.773102 LR: 0.00004970 [00:56:10] Epoch: 1 Batch: 5023/38378 (13.09%) Loss: 1.861257 LR: 0.00004970 [00:56:11] Epoch: 1 Batch: 5024/38378 (13.09%) Loss: 2.214829 LR: 0.00004970 [00:56:13] Epoch: 1 Batch: 5025/38378 (13.09%) Loss: 2.260908 LR: 0.00004970 [00:56:15] Epoch: 1 Batch: 5026/38378 (13.10%) Loss: 2.126524 LR: 0.00004970 [00:56:16] Epoch: 1 Batch: 5027/38378 (13.10%) Loss: 1.851691 LR: 0.00004970 [00:56:18] Epoch: 1 Batch: 5028/38378 (13.10%) Loss: 1.898825 LR: 0.00004970 [00:56:20] Epoch: 1 Batch: 5029/38378 (13.10%) Loss: 2.175165 LR: 0.00004970 [00:56:21] Epoch: 1 Batch: 5030/38378 (13.11%) Loss: 1.971264 LR: 0.00004970 [00:56:23] Epoch: 1 Batch: 5031/38378 (13.11%) Loss: 2.075548 LR: 0.00004970 [00:56:25] Epoch: 1 Batch: 5032/38378 (13.11%) Loss: 1.984416 LR: 0.00004970 [00:56:27] Epoch: 1 Batch: 5033/38378 (13.11%) Loss: 2.121924 LR: 0.00004970 [00:56:28] Epoch: 1 Batch: 5034/38378 (13.12%) Loss: 2.036361 LR: 0.00004970 [00:56:30] Epoch: 1 Batch: 5035/38378 (13.12%) Loss: 2.109339 LR: 0.00004970 [00:56:32] Epoch: 1 Batch: 5036/38378 (13.12%) Loss: 1.946767 LR: 0.00004970 [00:56:34] Epoch: 1 Batch: 5037/38378 (13.12%) Loss: 2.091914 LR: 0.00004970 [00:56:35] Epoch: 1 Batch: 5038/38378 (13.13%) Loss: 1.907534 LR: 0.00004970 [00:56:37] Epoch: 1 Batch: 5039/38378 (13.13%) Loss: 1.989829 LR: 0.00004970 [00:56:39] Epoch: 1 Batch: 5040/38378 (13.13%)
Loss: 2.453308 LR: 0.00004970 [00:56:40] Epoch: 1 Batch: 5041/38378 (13.14%) Loss: 2.074690 LR: 0.00004970 [00:56:42] Epoch: 1 Batch: 5042/38378 (13.14%) Loss: 2.065414 LR: 0.00004970 [00:56:44] Epoch: 1 Batch: 5043/38378 (13.14%) Loss: 2.068321 LR: 0.00004970 [00:56:45] Epoch: 1 Batch: 5044/38378 (13.14%) Loss: 2.021986 LR: 0.00004969 [00:56:47] Epoch: 1 Batch: 5045/38378 (13.15%) Loss: 2.115630 LR: 0.00004969 [00:56:49] Epoch: 1 Batch: 5046/38378 (13.15%) Loss: 1.900452 LR: 0.00004969 [00:56:51] Epoch: 1 Batch: 5047/38378 (13.15%) Loss: 1.846389 LR: 0.00004969 [00:56:52] Epoch: 1 Batch: 5048/38378 (13.15%) Loss: 1.675684 LR: 0.00004969 [00:56:58] >> Cleaned up old temp checkpoint: epoch1_step4719 [00:56:58] >> Temp checkpoint saved: epoch1_step5049, size: 0.1702 GB [00:56:58] Epoch: 1 Batch: 5049/38378 (13.16%) Loss: 1.982566 LR: 0.00004969 [00:57:00] Epoch: 1 Batch: 5050/38378 (13.16%) Loss: 2.199967 LR: 0.00004969 [00:57:01] Epoch: 1 Batch: 5051/38378 (13.16%) Loss: 1.921771 LR: 0.00004969 [00:57:03] Epoch: 1 Batch: 5052/38378 (13.16%) Loss: 1.974283 LR: 0.00004969 [00:57:05] Epoch: 1 Batch: 5053/38378 (13.17%) Loss: 1.973044 LR: 0.00004969 [00:57:06] Epoch: 1 Batch: 5054/38378 (13.17%) Loss: 2.012420 LR: 0.00004969 [00:57:08] Epoch: 1 Batch: 5055/38378 (13.17%) Loss: 1.823497 LR: 0.00004969 [00:57:10] Epoch: 1 Batch: 5056/38378 (13.17%) Loss: 1.870089 LR: 0.00004969 [00:57:12] Epoch: 1 Batch: 5057/38378 (13.18%) Loss: 2.182071 LR: 0.00004969 [00:57:13] Epoch: 1 Batch: 5058/38378 (13.18%) Loss: 2.008246 LR: 0.00004969 [00:57:15] Epoch: 1 Batch: 5059/38378 (13.18%) Loss: 2.102038 LR: 0.00004969 [00:57:17] Epoch: 1 Batch: 5060/38378 (13.18%) Loss: 2.130366 LR: 0.00004969 [00:57:18] Epoch: 1 Batch: 5061/38378 (13.19%) Loss: 1.899871 LR: 0.00004969 [00:57:20] Epoch: 1 Batch: 5062/38378 (13.19%) Loss: 1.994394 LR: 0.00004969 [00:57:22] Epoch: 1 Batch: 5063/38378 (13.19%) Loss: 2.267294 LR: 0.00004969 [00:57:24] Epoch: 1 Batch: 5064/38378 (13.20%) Loss: 2.187387 LR: 0.00004969 [00:57:25] Epoch: 1 Batch: 5065/38378 (13.20%) Loss: 1.794463 LR: 0.00004969 [00:57:27] Epoch: 1 Batch: 5066/38378 (13.20%) Loss: 2.023021 LR: 0.00004969 [00:57:29] Epoch: 1 Batch: 5067/38378 (13.20%) Loss: 2.095496 LR: 0.00004969 [00:57:30] Epoch: 1 Batch: 5068/38378 (13.21%) Loss: 2.409549 LR: 0.00004969 [00:57:32] Epoch: 1 Batch: 5069/38378 (13.21%) Loss: 2.092090 LR: 0.00004969 [00:57:34] Epoch: 1 Batch: 5070/38378 (13.21%) Loss: 2.167885 LR: 0.00004969 [00:57:36] Epoch: 1 Batch: 5071/38378 (13.21%) Loss: 2.184850 LR: 0.00004969 [00:57:37] Epoch: 1 Batch: 5072/38378 (13.22%) Loss: 2.121003 LR: 0.00004969 [00:57:39] Epoch: 1 Batch: 5073/38378 (13.22%) Loss: 1.988276 LR: 0.00004969 [00:57:41] Epoch: 1 Batch: 5074/38378 (13.22%) Loss: 1.937128 LR: 0.00004969 [00:57:42] Epoch: 1 Batch: 5075/38378 (13.22%) Loss: 2.111315 LR: 0.00004969 [00:57:44] Epoch: 1 Batch: 5076/38378 (13.23%) Loss: 2.094282 LR: 0.00004969 [00:57:46] Epoch: 1 Batch: 5077/38378 (13.23%) Loss: 2.035639 LR: 0.00004969 [00:57:47] Epoch: 1 Batch: 5078/38378 (13.23%) Loss: 1.934189 LR: 0.00004969 [00:57:49] Epoch: 1 Batch: 5079/38378 (13.23%) Loss: 1.915860 LR: 0.00004968 [00:57:51] Epoch: 1 Batch: 5080/38378 (13.24%) Loss: 1.997064 LR: 0.00004968 [00:57:53] Epoch: 1 Batch: 5081/38378 (13.24%) Loss: 2.484647 LR: 0.00004968 [00:57:58] >> Cleaned up old temp checkpoint: epoch1_step4752 [00:57:58] >> Temp checkpoint saved: epoch1_step5082, size: 0.1702 GB [00:57:58] Epoch: 1 Batch: 5082/38378 (13.24%) Loss: 1.967983 LR: 0.00004968 [00:58:00] Epoch: 1 Batch: 
5083/38378 (13.24%) Loss: 2.088020 LR: 0.00004968 [00:58:02] Epoch: 1 Batch: 5084/38378 (13.25%) Loss: 2.142425 LR: 0.00004968 [00:58:03] Epoch: 1 Batch: 5085/38378 (13.25%) Loss: 1.899951 LR: 0.00004968 [00:58:05] Epoch: 1 Batch: 5086/38378 (13.25%) Loss: 1.937239 LR: 0.00004968 [00:58:07] Epoch: 1 Batch: 5087/38378 (13.25%) Loss: 2.200383 LR: 0.00004968 [00:58:09] Epoch: 1 Batch: 5088/38378 (13.26%) Loss: 2.094956 LR: 0.00004968 [00:58:10] Epoch: 1 Batch: 5089/38378 (13.26%) Loss: 2.005933 LR: 0.00004968 [00:58:12] Epoch: 1 Batch: 5090/38378 (13.26%) Loss: 2.097271 LR: 0.00004968 [00:58:14] Epoch: 1 Batch: 5091/38378 (13.27%) Loss: 2.157191 LR: 0.00004968 [00:58:15] Epoch: 1 Batch: 5092/38378 (13.27%) Loss: 1.765028 LR: 0.00004968 [00:58:17] Epoch: 1 Batch: 5093/38378 (13.27%) Loss: 2.251379 LR: 0.00004968 [00:58:19] Epoch: 1 Batch: 5094/38378 (13.27%) Loss: 2.223057 LR: 0.00004968 [00:58:21] Epoch: 1 Batch: 5095/38378 (13.28%) Loss: 1.941772 LR: 0.00004968 [00:58:22] Epoch: 1 Batch: 5096/38378 (13.28%) Loss: 2.271359 LR: 0.00004968 [00:58:24] Epoch: 1 Batch: 5097/38378 (13.28%) Loss: 1.889384 LR: 0.00004968 [00:58:26] Epoch: 1 Batch: 5098/38378 (13.28%) Loss: 2.488349 LR: 0.00004968 [00:58:27] Epoch: 1 Batch: 5099/38378 (13.29%) Loss: 2.229466 LR: 0.00004968 [00:58:29] Epoch: 1 Batch: 5100/38378 (13.29%) Loss: 1.939515 LR: 0.00004968 [00:58:31] Epoch: 1 Batch: 5101/38378 (13.29%) Loss: 2.299156 LR: 0.00004968 [00:58:33] Epoch: 1 Batch: 5102/38378 (13.29%) Loss: 1.894840 LR: 0.00004968 [00:58:34] Epoch: 1 Batch: 5103/38378 (13.30%) Loss: 2.122424 LR: 0.00004968 [00:58:36] Epoch: 1 Batch: 5104/38378 (13.30%) Loss: 2.034110 LR: 0.00004968 [00:58:38] Epoch: 1 Batch: 5105/38378 (13.30%) Loss: 2.042831 LR: 0.00004968 [00:58:39] Epoch: 1 Batch: 5106/38378 (13.30%) Loss: 2.222114 LR: 0.00004968 [00:58:41] Epoch: 1 Batch: 5107/38378 (13.31%) Loss: 1.893978 LR: 0.00004967 [00:58:43] Epoch: 1 Batch: 5108/38378 (13.31%) Loss: 2.143454 LR: 0.00004967 [00:58:45] Epoch: 1 Batch: 5109/38378 (13.31%) Loss: 2.158057 LR: 0.00004967 [00:58:46] Epoch: 1 Batch: 5110/38378 (13.31%) Loss: 1.803718 LR: 0.00004967 [00:58:48] Epoch: 1 Batch: 5111/38378 (13.32%) Loss: 2.176700 LR: 0.00004967 [00:58:50] Epoch: 1 Batch: 5112/38378 (13.32%) Loss: 1.936818 LR: 0.00004967 [00:58:52] Epoch: 1 Batch: 5113/38378 (13.32%) Loss: 2.079281 LR: 0.00004967 [00:58:53] Epoch: 1 Batch: 5114/38378 (13.33%) Loss: 1.842349 LR: 0.00004967 [00:58:59] >> Cleaned up old temp checkpoint: epoch1_step4785 [00:58:59] >> Temp checkpoint saved: epoch1_step5115, size: 0.1702 GB [00:58:59] Epoch: 1 Batch: 5115/38378 (13.33%) Loss: 2.302540 LR: 0.00004967 [00:59:01] Epoch: 1 Batch: 5116/38378 (13.33%) Loss: 2.046425 LR: 0.00004967 [00:59:02] Epoch: 1 Batch: 5117/38378 (13.33%) Loss: 2.163092 LR: 0.00004967 [00:59:04] Epoch: 1 Batch: 5118/38378 (13.34%) Loss: 2.206788 LR: 0.00004967 [00:59:06] Epoch: 1 Batch: 5119/38378 (13.34%) Loss: 1.998895 LR: 0.00004967 [00:59:07] Epoch: 1 Batch: 5120/38378 (13.34%) Loss: 2.250984 LR: 0.00004967 [00:59:09] Epoch: 1 Batch: 5121/38378 (13.34%) Loss: 2.064012 LR: 0.00004967 [00:59:11] Epoch: 1 Batch: 5122/38378 (13.35%) Loss: 2.022837 LR: 0.00004967 [00:59:12] Epoch: 1 Batch: 5123/38378 (13.35%) Loss: 1.799514 LR: 0.00004967 [00:59:14] Epoch: 1 Batch: 5124/38378 (13.35%) Loss: 2.071236 LR: 0.00004967 [00:59:16] Epoch: 1 Batch: 5125/38378 (13.35%) Loss: 1.933082 LR: 0.00004967 [00:59:17] Epoch: 1 Batch: 5126/38378 (13.36%) Loss: 2.189298 LR: 0.00004967 [00:59:19] Epoch: 1 Batch: 5127/38378 (13.36%) Loss: 2.019474 
LR: 0.00004967 [00:59:21] Epoch: 1 Batch: 5128/38378 (13.36%) Loss: 2.160422 LR: 0.00004967 [00:59:23] Epoch: 1 Batch: 5129/38378 (13.36%) Loss: 1.959389 LR: 0.00004967 [00:59:24] Epoch: 1 Batch: 5130/38378 (13.37%) Loss: 1.724982 LR: 0.00004967 [00:59:26] Epoch: 1 Batch: 5131/38378 (13.37%) Loss: 2.162336 LR: 0.00004967 [00:59:28] Epoch: 1 Batch: 5132/38378 (13.37%) Loss: 1.898405 LR: 0.00004967 [00:59:29] Epoch: 1 Batch: 5133/38378 (13.37%) Loss: 1.767077 LR: 0.00004967 [00:59:31] Epoch: 1 Batch: 5134/38378 (13.38%) Loss: 1.745381 LR: 0.00004967 [00:59:33] Epoch: 1 Batch: 5135/38378 (13.38%) Loss: 2.106006 LR: 0.00004967 [00:59:35] Epoch: 1 Batch: 5136/38378 (13.38%) Loss: 1.891136 LR: 0.00004967 [00:59:36] Epoch: 1 Batch: 5137/38378 (13.39%) Loss: 1.975339 LR: 0.00004967 [00:59:38] Epoch: 1 Batch: 5138/38378 (13.39%) Loss: 2.229166 LR: 0.00004967 [00:59:40] Epoch: 1 Batch: 5139/38378 (13.39%) Loss: 2.324724 LR: 0.00004967 [00:59:41] Epoch: 1 Batch: 5140/38378 (13.39%) Loss: 1.935659 LR: 0.00004967 [00:59:43] Epoch: 1 Batch: 5141/38378 (13.40%) Loss: 2.002692 LR: 0.00004967 [00:59:45] Epoch: 1 Batch: 5142/38378 (13.40%) Loss: 2.002889 LR: 0.00004966 [00:59:47] Epoch: 1 Batch: 5143/38378 (13.40%) Loss: 1.829497 LR: 0.00004966 [00:59:48] Epoch: 1 Batch: 5144/38378 (13.40%) Loss: 2.038789 LR: 0.00004966 [00:59:50] Epoch: 1 Batch: 5145/38378 (13.41%) Loss: 2.019301 LR: 0.00004966 [00:59:52] Epoch: 1 Batch: 5146/38378 (13.41%) Loss: 1.628020 LR: 0.00004966 [00:59:54] Epoch: 1 Batch: 5147/38378 (13.41%) Loss: 2.254559 LR: 0.00004966 [00:59:59] >> Cleaned up old temp checkpoint: epoch1_step4818 [00:59:59] >> Temp checkpoint saved: epoch1_step5148, size: 0.1702 GB [00:59:59] Epoch: 1 Batch: 5148/38378 (13.41%) Loss: 1.757470 LR: 0.00004966 [01:00:01] Epoch: 1 Batch: 5149/38378 (13.42%) Loss: 1.851289 LR: 0.00004966 [01:00:02] Epoch: 1 Batch: 5150/38378 (13.42%) Loss: 1.940275 LR: 0.00004966 [01:00:04] Epoch: 1 Batch: 5151/38378 (13.42%) Loss: 1.997426 LR: 0.00004966 [01:00:06] Epoch: 1 Batch: 5152/38378 (13.42%) Loss: 2.073054 LR: 0.00004966 [01:00:08] Epoch: 1 Batch: 5153/38378 (13.43%) Loss: 1.768161 LR: 0.00004966 [01:00:09] Epoch: 1 Batch: 5154/38378 (13.43%) Loss: 1.917329 LR: 0.00004966 [01:00:11] Epoch: 1 Batch: 5155/38378 (13.43%) Loss: 2.148782 LR: 0.00004966 [01:00:13] Epoch: 1 Batch: 5156/38378 (13.43%) Loss: 1.660130 LR: 0.00004966 [01:00:14] Epoch: 1 Batch: 5157/38378 (13.44%) Loss: 2.098531 LR: 0.00004966 [01:00:16] Epoch: 1 Batch: 5158/38378 (13.44%) Loss: 2.145216 LR: 0.00004966 [01:00:18] Epoch: 1 Batch: 5159/38378 (13.44%) Loss: 1.904931 LR: 0.00004966 [01:00:19] Epoch: 1 Batch: 5160/38378 (13.45%) Loss: 2.107263 LR: 0.00004966 [01:00:21] Epoch: 1 Batch: 5161/38378 (13.45%) Loss: 2.209747 LR: 0.00004966 [01:00:23] Epoch: 1 Batch: 5162/38378 (13.45%) Loss: 2.032433 LR: 0.00004966 [01:00:25] Epoch: 1 Batch: 5163/38378 (13.45%) Loss: 2.188580 LR: 0.00004966 [01:00:26] Epoch: 1 Batch: 5164/38378 (13.46%) Loss: 2.180898 LR: 0.00004966 [01:00:28] Epoch: 1 Batch: 5165/38378 (13.46%) Loss: 2.209518 LR: 0.00004966 [01:00:30] Epoch: 1 Batch: 5166/38378 (13.46%) Loss: 1.562123 LR: 0.00004966 [01:00:32] Epoch: 1 Batch: 5167/38378 (13.46%) Loss: 2.178917 LR: 0.00004966 [01:00:33] Epoch: 1 Batch: 5168/38378 (13.47%) Loss: 1.929192 LR: 0.00004966 [01:00:35] Epoch: 1 Batch: 5169/38378 (13.47%) Loss: 2.086150 LR: 0.00004966 [01:00:37] Epoch: 1 Batch: 5170/38378 (13.47%) Loss: 1.982952 LR: 0.00004965 [01:00:38] Epoch: 1 Batch: 5171/38378 (13.47%) Loss: 1.975965 LR: 0.00004965 [01:00:40] Epoch: 1 
Batch: 5172/38378 (13.48%) Loss: 1.933397 LR: 0.00004965 [01:00:42] Epoch: 1 Batch: 5173/38378 (13.48%) Loss: 2.055906 LR: 0.00004965 [01:00:44] Epoch: 1 Batch: 5174/38378 (13.48%) Loss: 2.074585 LR: 0.00004965 [01:00:45] Epoch: 1 Batch: 5175/38378 (13.48%) Loss: 2.210596 LR: 0.00004965 [01:00:47] Epoch: 1 Batch: 5176/38378 (13.49%) Loss: 1.936505 LR: 0.00004965 [01:00:49] Epoch: 1 Batch: 5177/38378 (13.49%) Loss: 2.072573 LR: 0.00004965 [01:00:50] Epoch: 1 Batch: 5178/38378 (13.49%) Loss: 1.881355 LR: 0.00004965 [01:00:52] Epoch: 1 Batch: 5179/38378 (13.49%) Loss: 2.001360 LR: 0.00004965 [01:00:54] Epoch: 1 Batch: 5180/38378 (13.50%) Loss: 2.100689 LR: 0.00004965 [01:00:59] >> Cleaned up old temp checkpoint: epoch1_step4851 [01:00:59] >> Temp checkpoint saved: epoch1_step5181, size: 0.1702 GB [01:00:59] Epoch: 1 Batch: 5181/38378 (13.50%) Loss: 1.917050 LR: 0.00004965 [01:01:01] Epoch: 1 Batch: 5182/38378 (13.50%) Loss: 1.889355 LR: 0.00004965 [01:01:03] Epoch: 1 Batch: 5183/38378 (13.51%) Loss: 2.009071 LR: 0.00004965 [01:01:04] Epoch: 1 Batch: 5184/38378 (13.51%) Loss: 1.929952 LR: 0.00004965 [01:01:06] Epoch: 1 Batch: 5185/38378 (13.51%) Loss: 2.159705 LR: 0.00004965 [01:01:08] Epoch: 1 Batch: 5186/38378 (13.51%) Loss: 2.167853 LR: 0.00004965 [01:01:10] Epoch: 1 Batch: 5187/38378 (13.52%) Loss: 2.187889 LR: 0.00004965 [01:01:11] Epoch: 1 Batch: 5188/38378 (13.52%) Loss: 2.059660 LR: 0.00004965 [01:01:13] Epoch: 1 Batch: 5189/38378 (13.52%) Loss: 1.763733 LR: 0.00004965 [01:01:14] Epoch: 1 Batch: 5190/38378 (13.52%) Loss: 1.837900 LR: 0.00004965 [01:01:16] Epoch: 1 Batch: 5191/38378 (13.53%) Loss: 1.681289 LR: 0.00004965 [01:01:18] Epoch: 1 Batch: 5192/38378 (13.53%) Loss: 2.011882 LR: 0.00004965 [01:01:20] Epoch: 1 Batch: 5193/38378 (13.53%) Loss: 1.955222 LR: 0.00004965 [01:01:21] Epoch: 1 Batch: 5194/38378 (13.53%) Loss: 1.996799 LR: 0.00004965 [01:01:23] Epoch: 1 Batch: 5195/38378 (13.54%) Loss: 2.161945 LR: 0.00004965 [01:01:25] Epoch: 1 Batch: 5196/38378 (13.54%) Loss: 1.971661 LR: 0.00004965 [01:01:26] Epoch: 1 Batch: 5197/38378 (13.54%) Loss: 1.864347 LR: 0.00004965 [01:01:28] Epoch: 1 Batch: 5198/38378 (13.54%) Loss: 1.918077 LR: 0.00004964 [01:01:30] Epoch: 1 Batch: 5199/38378 (13.55%) Loss: 2.240896 LR: 0.00004964 [01:01:31] Epoch: 1 Batch: 5200/38378 (13.55%) Loss: 2.064426 LR: 0.00004964 [01:01:33] Epoch: 1 Batch: 5201/38378 (13.55%) Loss: 2.078244 LR: 0.00004964 [01:01:35] Epoch: 1 Batch: 5202/38378 (13.55%) Loss: 2.007316 LR: 0.00004964 [01:01:36] Epoch: 1 Batch: 5203/38378 (13.56%) Loss: 2.118230 LR: 0.00004964 [01:01:38] Epoch: 1 Batch: 5204/38378 (13.56%) Loss: 1.908486 LR: 0.00004964 [01:01:40] Epoch: 1 Batch: 5205/38378 (13.56%) Loss: 2.217383 LR: 0.00004964 [01:01:42] Epoch: 1 Batch: 5206/38378 (13.57%) Loss: 2.133290 LR: 0.00004964 [01:01:43] Epoch: 1 Batch: 5207/38378 (13.57%) Loss: 1.944490 LR: 0.00004964 [01:01:45] Epoch: 1 Batch: 5208/38378 (13.57%) Loss: 1.933812 LR: 0.00004964 [01:01:47] Epoch: 1 Batch: 5209/38378 (13.57%) Loss: 2.229230 LR: 0.00004964 [01:01:48] Epoch: 1 Batch: 5210/38378 (13.58%) Loss: 2.075545 LR: 0.00004964 [01:01:50] Epoch: 1 Batch: 5211/38378 (13.58%) Loss: 2.065793 LR: 0.00004964 [01:01:52] Epoch: 1 Batch: 5212/38378 (13.58%) Loss: 1.925335 LR: 0.00004964 [01:01:54] Epoch: 1 Batch: 5213/38378 (13.58%) Loss: 1.892716 LR: 0.00004964 [01:01:59] >> Cleaned up old temp checkpoint: epoch1_step4884 [01:01:59] >> Temp checkpoint saved: epoch1_step5214, size: 0.1702 GB [01:01:59] Epoch: 1 Batch: 5214/38378 (13.59%) Loss: 2.069077 LR: 0.00004964 
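Note on the evaluation summaries (see the step-5000 block above): the printed perplexity is the exponential of the validation loss, the delta is the change from the previous evaluation's validation loss, and "Avg Loss Since Last Eval" averages the per-batch training losses logged since the last evaluation (500 batches at this cadence). A small sketch of those reductions, illustrative rather than the script's actual code:

```python
import math

def eval_summary(train_losses, val_loss, prev_val_loss):
    """Recompute the quantities printed in the periodic evaluation lines.

    train_losses: per-batch training losses since the previous eval
    (500 of them at this eval cadence).
    """
    return {
        "avg_loss_since_last_eval": sum(train_losses) / len(train_losses),
        "val_loss": val_loss,
        "val_loss_delta": val_loss - prev_val_loss,
        "perplexity": math.exp(val_loss),  # cross-entropy in nats -> ppl
    }

# Step-5000 values above: math.exp(2.1513) ~= 8.596, matching the logged
# 8.5957 up to rounding, and the +0.0010 delta implies the previous
# validation loss was ~2.1503.
```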
[01:02:01] Epoch: 1 Batch: 5215/38378 (13.59%) Loss: 1.870763 LR: 0.00004964 [01:02:03] Epoch: 1 Batch: 5216/38378 (13.59%) Loss: 2.177942 LR: 0.00004964 [01:02:05] Epoch: 1 Batch: 5217/38378 (13.59%) Loss: 1.812364 LR: 0.00004964 [01:02:06] Epoch: 1 Batch: 5218/38378 (13.60%) Loss: 1.831014 LR: 0.00004964 [01:02:08] Epoch: 1 Batch: 5219/38378 (13.60%) Loss: 2.226837 LR: 0.00004964 [01:02:10] Epoch: 1 Batch: 5220/38378 (13.60%) Loss: 1.922960 LR: 0.00004964 [01:02:11] Epoch: 1 Batch: 5221/38378 (13.60%) Loss: 2.295044 LR: 0.00004964 [01:02:13] Epoch: 1 Batch: 5222/38378 (13.61%) Loss: 2.018672 LR: 0.00004964 [01:02:15] Epoch: 1 Batch: 5223/38378 (13.61%) Loss: 1.947313 LR: 0.00004964 [01:02:16] Epoch: 1 Batch: 5224/38378 (13.61%) Loss: 1.964415 LR: 0.00004964 [01:02:18] Epoch: 1 Batch: 5225/38378 (13.61%) Loss: 1.888612 LR: 0.00004964 [01:02:20] Epoch: 1 Batch: 5226/38378 (13.62%) Loss: 1.844212 LR: 0.00004964 [01:02:22] Epoch: 1 Batch: 5227/38378 (13.62%) Loss: 2.229791 LR: 0.00004964 [01:02:23] Epoch: 1 Batch: 5228/38378 (13.62%) Loss: 1.978431 LR: 0.00004964 [01:02:25] Epoch: 1 Batch: 5229/38378 (13.62%) Loss: 2.127693 LR: 0.00004964 [01:02:27] Epoch: 1 Batch: 5230/38378 (13.63%) Loss: 1.751077 LR: 0.00004964 [01:02:29] Epoch: 1 Batch: 5231/38378 (13.63%) Loss: 1.780081 LR: 0.00004964 [01:02:30] Epoch: 1 Batch: 5232/38378 (13.63%) Loss: 2.223754 LR: 0.00004964 [01:02:32] Epoch: 1 Batch: 5233/38378 (13.64%) Loss: 2.038002 LR: 0.00004963 [01:02:34] Epoch: 1 Batch: 5234/38378 (13.64%) Loss: 1.900055 LR: 0.00004963 [01:02:35] Epoch: 1 Batch: 5235/38378 (13.64%) Loss: 1.935893 LR: 0.00004963 [01:02:37] Epoch: 1 Batch: 5236/38378 (13.64%) Loss: 1.896518 LR: 0.00004963 [01:02:39] Epoch: 1 Batch: 5237/38378 (13.65%) Loss: 2.043605 LR: 0.00004963 [01:02:41] Epoch: 1 Batch: 5238/38378 (13.65%) Loss: 1.988619 LR: 0.00004963 [01:02:42] Epoch: 1 Batch: 5239/38378 (13.65%) Loss: 2.106696 LR: 0.00004963 [01:02:44] Epoch: 1 Batch: 5240/38378 (13.65%) Loss: 2.169425 LR: 0.00004963 [01:02:46] Epoch: 1 Batch: 5241/38378 (13.66%) Loss: 1.970232 LR: 0.00004963 [01:02:47] Epoch: 1 Batch: 5242/38378 (13.66%) Loss: 2.256925 LR: 0.00004963 [01:02:49] Epoch: 1 Batch: 5243/38378 (13.66%) Loss: 2.065327 LR: 0.00004963 [01:02:51] Epoch: 1 Batch: 5244/38378 (13.66%) Loss: 2.196327 LR: 0.00004963 [01:02:53] Epoch: 1 Batch: 5245/38378 (13.67%) Loss: 1.825002 LR: 0.00004963 [01:02:54] Epoch: 1 Batch: 5246/38378 (13.67%) Loss: 2.094812 LR: 0.00004963 [01:03:00] >> Cleaned up old temp checkpoint: epoch1_step4917 [01:03:00] >> Temp checkpoint saved: epoch1_step5247, size: 0.1702 GB [01:03:00] Epoch: 1 Batch: 5247/38378 (13.67%) Loss: 2.062808 LR: 0.00004963 [01:03:01] Epoch: 1 Batch: 5248/38378 (13.67%) Loss: 2.001784 LR: 0.00004963 [01:03:03] Epoch: 1 Batch: 5249/38378 (13.68%) Loss: 2.196062 LR: 0.00004963 [01:03:05] Epoch: 1 Batch: 5250/38378 (13.68%) Loss: 2.217685 LR: 0.00004963 [01:03:06] Epoch: 1 Batch: 5251/38378 (13.68%) Loss: 2.063996 LR: 0.00004963 [01:03:08] Epoch: 1 Batch: 5252/38378 (13.68%) Loss: 1.771300 LR: 0.00004963 [01:03:10] Epoch: 1 Batch: 5253/38378 (13.69%) Loss: 1.989539 LR: 0.00004963 [01:03:12] Epoch: 1 Batch: 5254/38378 (13.69%) Loss: 2.020596 LR: 0.00004963 [01:03:13] Epoch: 1 Batch: 5255/38378 (13.69%) Loss: 1.945777 LR: 0.00004963 [01:03:15] Epoch: 1 Batch: 5256/38378 (13.70%) Loss: 1.794502 LR: 0.00004963 [01:03:17] Epoch: 1 Batch: 5257/38378 (13.70%) Loss: 1.905171 LR: 0.00004963 [01:03:18] Epoch: 1 Batch: 5258/38378 (13.70%) Loss: 1.797072 LR: 0.00004963 [01:03:20] Epoch: 1 Batch: 
5259/38378 (13.70%) Loss: 1.945821 LR: 0.00004963 [01:03:22] Epoch: 1 Batch: 5260/38378 (13.71%) Loss: 1.854635 LR: 0.00004963 [01:03:24] Epoch: 1 Batch: 5261/38378 (13.71%) Loss: 2.080437 LR: 0.00004962 [01:03:25] Epoch: 1 Batch: 5262/38378 (13.71%) Loss: 2.259488 LR: 0.00004962 [01:03:27] Epoch: 1 Batch: 5263/38378 (13.71%) Loss: 1.898400 LR: 0.00004962 [01:03:29] Epoch: 1 Batch: 5264/38378 (13.72%) Loss: 2.109076 LR: 0.00004962 [01:03:30] Epoch: 1 Batch: 5265/38378 (13.72%) Loss: 2.331333 LR: 0.00004962 [01:03:32] Epoch: 1 Batch: 5266/38378 (13.72%) Loss: 1.941643 LR: 0.00004962 [01:03:34] Epoch: 1 Batch: 5267/38378 (13.72%) Loss: 2.217087 LR: 0.00004962 [01:03:36] Epoch: 1 Batch: 5268/38378 (13.73%) Loss: 2.175626 LR: 0.00004962 [01:03:37] Epoch: 1 Batch: 5269/38378 (13.73%) Loss: 2.151291 LR: 0.00004962 [01:03:39] Epoch: 1 Batch: 5270/38378 (13.73%) Loss: 2.155598 LR: 0.00004962 [01:03:41] Epoch: 1 Batch: 5271/38378 (13.73%) Loss: 2.163633 LR: 0.00004962 [01:03:42] Epoch: 1 Batch: 5272/38378 (13.74%) Loss: 1.833352 LR: 0.00004962 [01:03:44] Epoch: 1 Batch: 5273/38378 (13.74%) Loss: 2.144623 LR: 0.00004962 [01:03:46] Epoch: 1 Batch: 5274/38378 (13.74%) Loss: 1.778933 LR: 0.00004962 [01:03:48] Epoch: 1 Batch: 5275/38378 (13.74%) Loss: 1.834425 LR: 0.00004962 [01:03:49] Epoch: 1 Batch: 5276/38378 (13.75%) Loss: 1.815044 LR: 0.00004962 [01:03:51] Epoch: 1 Batch: 5277/38378 (13.75%) Loss: 2.082620 LR: 0.00004962 [01:03:53] Epoch: 1 Batch: 5278/38378 (13.75%) Loss: 2.075174 LR: 0.00004962 [01:03:54] Epoch: 1 Batch: 5279/38378 (13.76%) Loss: 2.289674 LR: 0.00004962 [01:04:00] >> Cleaned up old temp checkpoint: epoch1_step4950 [01:04:00] >> Temp checkpoint saved: epoch1_step5280, size: 0.1702 GB [01:04:00] Epoch: 1 Batch: 5280/38378 (13.76%) Loss: 1.716468 LR: 0.00004962 [01:04:02] Epoch: 1 Batch: 5281/38378 (13.76%) Loss: 2.241341 LR: 0.00004962 [01:04:04] Epoch: 1 Batch: 5282/38378 (13.76%) Loss: 1.987349 LR: 0.00004962 [01:04:05] Epoch: 1 Batch: 5283/38378 (13.77%) Loss: 2.271190 LR: 0.00004962 [01:04:07] Epoch: 1 Batch: 5284/38378 (13.77%) Loss: 2.106761 LR: 0.00004962 [01:04:09] Epoch: 1 Batch: 5285/38378 (13.77%) Loss: 1.990499 LR: 0.00004962 [01:04:10] Epoch: 1 Batch: 5286/38378 (13.77%) Loss: 2.021762 LR: 0.00004962 [01:04:12] Epoch: 1 Batch: 5287/38378 (13.78%) Loss: 1.970420 LR: 0.00004962 [01:04:14] Epoch: 1 Batch: 5288/38378 (13.78%) Loss: 2.170508 LR: 0.00004962 [01:04:15] Epoch: 1 Batch: 5289/38378 (13.78%) Loss: 2.226257 LR: 0.00004961 [01:04:17] Epoch: 1 Batch: 5290/38378 (13.78%) Loss: 2.292582 LR: 0.00004961 [01:04:19] Epoch: 1 Batch: 5291/38378 (13.79%) Loss: 2.138058 LR: 0.00004961 [01:04:20] Epoch: 1 Batch: 5292/38378 (13.79%) Loss: 1.940163 LR: 0.00004961 [01:04:22] Epoch: 1 Batch: 5293/38378 (13.79%) Loss: 2.070838 LR: 0.00004961 [01:04:24] Epoch: 1 Batch: 5294/38378 (13.79%) Loss: 1.951433 LR: 0.00004961 [01:04:26] Epoch: 1 Batch: 5295/38378 (13.80%) Loss: 1.920214 LR: 0.00004961 [01:04:27] Epoch: 1 Batch: 5296/38378 (13.80%) Loss: 1.928773 LR: 0.00004961 [01:04:29] Epoch: 1 Batch: 5297/38378 (13.80%) Loss: 2.386750 LR: 0.00004961 [01:04:31] Epoch: 1 Batch: 5298/38378 (13.80%) Loss: 1.897525 LR: 0.00004961 [01:04:32] Epoch: 1 Batch: 5299/38378 (13.81%) Loss: 1.986910 LR: 0.00004961 [01:04:34] Epoch: 1 Batch: 5300/38378 (13.81%) Loss: 1.873762 LR: 0.00004961 [01:04:36] Epoch: 1 Batch: 5301/38378 (13.81%) Loss: 1.925346 LR: 0.00004961 [01:04:37] Epoch: 1 Batch: 5302/38378 (13.82%) Loss: 2.448441 LR: 0.00004961 [01:04:39] Epoch: 1 Batch: 5303/38378 (13.82%) Loss: 1.811696 
LR: 0.00004961 [01:04:41] Epoch: 1 Batch: 5304/38378 (13.82%) Loss: 2.001545 LR: 0.00004961 [01:04:42] Epoch: 1 Batch: 5305/38378 (13.82%) Loss: 2.130124 LR: 0.00004961 [01:04:44] Epoch: 1 Batch: 5306/38378 (13.83%) Loss: 1.969219 LR: 0.00004961 [01:04:46] Epoch: 1 Batch: 5307/38378 (13.83%) Loss: 2.037022 LR: 0.00004961 [01:04:48] Epoch: 1 Batch: 5308/38378 (13.83%) Loss: 1.750853 LR: 0.00004961 [01:04:49] Epoch: 1 Batch: 5309/38378 (13.83%) Loss: 1.829623 LR: 0.00004961 [01:04:51] Epoch: 1 Batch: 5310/38378 (13.84%) Loss: 1.852449 LR: 0.00004961 [01:04:53] Epoch: 1 Batch: 5311/38378 (13.84%) Loss: 2.008468 LR: 0.00004961 [01:04:54] Epoch: 1 Batch: 5312/38378 (13.84%) Loss: 1.902927 LR: 0.00004961 [01:05:00] >> Cleaned up old temp checkpoint: epoch1_step4983 [01:05:00] >> Temp checkpoint saved: epoch1_step5313, size: 0.1702 GB [01:05:00] Epoch: 1 Batch: 5313/38378 (13.84%) Loss: 1.804783 LR: 0.00004961 [01:05:02] Epoch: 1 Batch: 5314/38378 (13.85%) Loss: 2.313500 LR: 0.00004961 [01:05:03] Epoch: 1 Batch: 5315/38378 (13.85%) Loss: 1.830461 LR: 0.00004961 [01:05:05] Epoch: 1 Batch: 5316/38378 (13.85%) Loss: 2.190423 LR: 0.00004961 [01:05:07] Epoch: 1 Batch: 5317/38378 (13.85%) Loss: 2.173262 LR: 0.00004960 [01:05:08] Epoch: 1 Batch: 5318/38378 (13.86%) Loss: 2.158887 LR: 0.00004960 [01:05:10] Epoch: 1 Batch: 5319/38378 (13.86%) Loss: 1.933068 LR: 0.00004960 [01:05:12] Epoch: 1 Batch: 5320/38378 (13.86%) Loss: 2.012646 LR: 0.00004960 [01:05:13] Epoch: 1 Batch: 5321/38378 (13.86%) Loss: 1.645858 LR: 0.00004960 [01:05:15] Epoch: 1 Batch: 5322/38378 (13.87%) Loss: 1.892446 LR: 0.00004960 [01:05:17] Epoch: 1 Batch: 5323/38378 (13.87%) Loss: 2.150795 LR: 0.00004960 [01:05:19] Epoch: 1 Batch: 5324/38378 (13.87%) Loss: 2.362930 LR: 0.00004960 [01:05:20] Epoch: 1 Batch: 5325/38378 (13.88%) Loss: 2.179144 LR: 0.00004960 [01:05:22] Epoch: 1 Batch: 5326/38378 (13.88%) Loss: 1.794431 LR: 0.00004960 [01:05:24] Epoch: 1 Batch: 5327/38378 (13.88%) Loss: 1.891260 LR: 0.00004960 [01:05:25] Epoch: 1 Batch: 5328/38378 (13.88%) Loss: 2.027938 LR: 0.00004960 [01:05:27] Epoch: 1 Batch: 5329/38378 (13.89%) Loss: 2.175924 LR: 0.00004960 [01:05:29] Epoch: 1 Batch: 5330/38378 (13.89%) Loss: 1.975005 LR: 0.00004960 [01:05:31] Epoch: 1 Batch: 5331/38378 (13.89%) Loss: 1.762090 LR: 0.00004960 [01:05:32] Epoch: 1 Batch: 5332/38378 (13.89%) Loss: 1.754476 LR: 0.00004960 [01:05:34] Epoch: 1 Batch: 5333/38378 (13.90%) Loss: 2.030319 LR: 0.00004960 [01:05:36] Epoch: 1 Batch: 5334/38378 (13.90%) Loss: 1.991653 LR: 0.00004960 [01:05:38] Epoch: 1 Batch: 5335/38378 (13.90%) Loss: 2.245056 LR: 0.00004960 [01:05:39] Epoch: 1 Batch: 5336/38378 (13.90%) Loss: 1.875564 LR: 0.00004960 [01:05:41] Epoch: 1 Batch: 5337/38378 (13.91%) Loss: 1.945618 LR: 0.00004960 [01:05:43] Epoch: 1 Batch: 5338/38378 (13.91%) Loss: 2.161370 LR: 0.00004960 [01:05:44] Epoch: 1 Batch: 5339/38378 (13.91%) Loss: 1.958106 LR: 0.00004960 [01:05:46] Epoch: 1 Batch: 5340/38378 (13.91%) Loss: 2.029338 LR: 0.00004960 [01:05:48] Epoch: 1 Batch: 5341/38378 (13.92%) Loss: 1.740734 LR: 0.00004960 [01:05:50] Epoch: 1 Batch: 5342/38378 (13.92%) Loss: 2.052108 LR: 0.00004960 [01:05:51] Epoch: 1 Batch: 5343/38378 (13.92%) Loss: 2.079945 LR: 0.00004960 [01:05:53] Epoch: 1 Batch: 5344/38378 (13.92%) Loss: 2.145521 LR: 0.00004960 [01:05:55] Epoch: 1 Batch: 5345/38378 (13.93%) Loss: 1.829422 LR: 0.00004959 [01:06:00] >> Cleaned up old temp checkpoint: epoch1_step5016 [01:06:00] >> Temp checkpoint saved: epoch1_step5346, size: 0.1702 GB [01:06:00] Epoch: 1 Batch: 5346/38378 
(13.93%) Loss: 1.819862 LR: 0.00004959 [01:06:02] Epoch: 1 Batch: 5347/38378 (13.93%) Loss: 2.068049 LR: 0.00004959 [01:06:04] Epoch: 1 Batch: 5348/38378 (13.94%) Loss: 1.972680 LR: 0.00004959 [01:06:05] Epoch: 1 Batch: 5349/38378 (13.94%) Loss: 2.058950 LR: 0.00004959 [01:06:07] Epoch: 1 Batch: 5350/38378 (13.94%) Loss: 2.040587 LR: 0.00004959 [01:06:09] Epoch: 1 Batch: 5351/38378 (13.94%) Loss: 2.012839 LR: 0.00004959 [01:06:10] Epoch: 1 Batch: 5352/38378 (13.95%) Loss: 1.975562 LR: 0.00004959 [01:06:12] Epoch: 1 Batch: 5353/38378 (13.95%) Loss: 2.130652 LR: 0.00004959 [01:06:14] Epoch: 1 Batch: 5354/38378 (13.95%) Loss: 1.921179 LR: 0.00004959 [01:06:16] Epoch: 1 Batch: 5355/38378 (13.95%) Loss: 2.072465 LR: 0.00004959 [01:06:17] Epoch: 1 Batch: 5356/38378 (13.96%) Loss: 1.689819 LR: 0.00004959 [01:06:19] Epoch: 1 Batch: 5357/38378 (13.96%) Loss: 2.155798 LR: 0.00004959 [01:06:21] Epoch: 1 Batch: 5358/38378 (13.96%) Loss: 2.073030 LR: 0.00004959 [01:06:22] Epoch: 1 Batch: 5359/38378 (13.96%) Loss: 1.896284 LR: 0.00004959 [01:06:24] Epoch: 1 Batch: 5360/38378 (13.97%) Loss: 2.198050 LR: 0.00004959 [01:06:26] Epoch: 1 Batch: 5361/38378 (13.97%) Loss: 2.016085 LR: 0.00004959 [01:06:28] Epoch: 1 Batch: 5362/38378 (13.97%) Loss: 1.715124 LR: 0.00004959 [01:06:29] Epoch: 1 Batch: 5363/38378 (13.97%) Loss: 2.275536 LR: 0.00004959 [01:06:31] Epoch: 1 Batch: 5364/38378 (13.98%) Loss: 2.079530 LR: 0.00004959 [01:06:33] Epoch: 1 Batch: 5365/38378 (13.98%) Loss: 2.198180 LR: 0.00004959 [01:06:34] Epoch: 1 Batch: 5366/38378 (13.98%) Loss: 1.916486 LR: 0.00004959 [01:06:36] Epoch: 1 Batch: 5367/38378 (13.98%) Loss: 2.063087 LR: 0.00004959 [01:06:38] Epoch: 1 Batch: 5368/38378 (13.99%) Loss: 1.757161 LR: 0.00004959 [01:06:39] Epoch: 1 Batch: 5369/38378 (13.99%) Loss: 1.988632 LR: 0.00004959 [01:06:41] Epoch: 1 Batch: 5370/38378 (13.99%) Loss: 2.289060 LR: 0.00004959 [01:06:43] Epoch: 1 Batch: 5371/38378 (13.99%) Loss: 1.806296 LR: 0.00004959 [01:06:45] Epoch: 1 Batch: 5372/38378 (14.00%) Loss: 2.427435 LR: 0.00004959 [01:06:46] Epoch: 1 Batch: 5373/38378 (14.00%) Loss: 1.906016 LR: 0.00004958 [01:06:48] Epoch: 1 Batch: 5374/38378 (14.00%) Loss: 1.603056 LR: 0.00004958 [01:06:50] Epoch: 1 Batch: 5375/38378 (14.01%) Loss: 1.888769 LR: 0.00004958 [01:06:52] Epoch: 1 Batch: 5376/38378 (14.01%) Loss: 2.220194 LR: 0.00004958 [01:06:53] Epoch: 1 Batch: 5377/38378 (14.01%) Loss: 2.139665 LR: 0.00004958 [01:06:55] Epoch: 1 Batch: 5378/38378 (14.01%) Loss: 2.043507 LR: 0.00004958 [01:07:00] >> Cleaned up old temp checkpoint: epoch1_step5049 [01:07:01] >> Temp checkpoint saved: epoch1_step5379, size: 0.1702 GB [01:07:01] Epoch: 1 Batch: 5379/38378 (14.02%) Loss: 1.911013 LR: 0.00004958 [01:07:02] Epoch: 1 Batch: 5380/38378 (14.02%) Loss: 2.105254 LR: 0.00004958 [01:07:04] Epoch: 1 Batch: 5381/38378 (14.02%) Loss: 2.140257 LR: 0.00004958 [01:07:06] Epoch: 1 Batch: 5382/38378 (14.02%) Loss: 2.165188 LR: 0.00004958 [01:07:07] Epoch: 1 Batch: 5383/38378 (14.03%) Loss: 2.013776 LR: 0.00004958 [01:07:09] Epoch: 1 Batch: 5384/38378 (14.03%) Loss: 1.952430 LR: 0.00004958 [01:07:11] Epoch: 1 Batch: 5385/38378 (14.03%) Loss: 2.328531 LR: 0.00004958 [01:07:12] Epoch: 1 Batch: 5386/38378 (14.03%) Loss: 2.112321 LR: 0.00004958 [01:07:14] Epoch: 1 Batch: 5387/38378 (14.04%) Loss: 2.229117 LR: 0.00004958 [01:07:16] Epoch: 1 Batch: 5388/38378 (14.04%) Loss: 1.998434 LR: 0.00004958 [01:07:17] Epoch: 1 Batch: 5389/38378 (14.04%) Loss: 2.189659 LR: 0.00004958 [01:07:19] Epoch: 1 Batch: 5390/38378 (14.04%) Loss: 1.882194 LR: 
0.00004958 [01:07:21] Epoch: 1 Batch: 5391/38378 (14.05%) Loss: 2.054069 LR: 0.00004958 [01:07:23] Epoch: 1 Batch: 5392/38378 (14.05%) Loss: 1.999932 LR: 0.00004958 [01:07:24] Epoch: 1 Batch: 5393/38378 (14.05%) Loss: 2.285015 LR: 0.00004958 [01:07:26] Epoch: 1 Batch: 5394/38378 (14.05%) Loss: 1.785553 LR: 0.00004958 [01:07:28] Epoch: 1 Batch: 5395/38378 (14.06%) Loss: 2.000108 LR: 0.00004958 [01:07:29] Epoch: 1 Batch: 5396/38378 (14.06%) Loss: 2.317250 LR: 0.00004958 [01:07:31] Epoch: 1 Batch: 5397/38378 (14.06%) Loss: 1.898568 LR: 0.00004958 [01:07:33] Epoch: 1 Batch: 5398/38378 (14.07%) Loss: 1.960312 LR: 0.00004958 [01:07:34] Epoch: 1 Batch: 5399/38378 (14.07%) Loss: 2.152464 LR: 0.00004958 [01:07:36] Epoch: 1 Batch: 5400/38378 (14.07%) Loss: 2.106624 LR: 0.00004958 [01:07:38] Epoch: 1 Batch: 5401/38378 (14.07%) Loss: 2.083543 LR: 0.00004957 [01:07:40] Epoch: 1 Batch: 5402/38378 (14.08%) Loss: 1.899257 LR: 0.00004957 [01:07:41] Epoch: 1 Batch: 5403/38378 (14.08%) Loss: 2.139449 LR: 0.00004957 [01:07:43] Epoch: 1 Batch: 5404/38378 (14.08%) Loss: 2.331406 LR: 0.00004957 [01:07:45] Epoch: 1 Batch: 5405/38378 (14.08%) Loss: 2.153540 LR: 0.00004957 [01:07:46] Epoch: 1 Batch: 5406/38378 (14.09%) Loss: 2.249365 LR: 0.00004957 [01:07:48] Epoch: 1 Batch: 5407/38378 (14.09%) Loss: 2.022091 LR: 0.00004957 [01:07:50] Epoch: 1 Batch: 5408/38378 (14.09%) Loss: 1.956810 LR: 0.00004957 [01:07:52] Epoch: 1 Batch: 5409/38378 (14.09%) Loss: 1.817010 LR: 0.00004957 [01:07:53] Epoch: 1 Batch: 5410/38378 (14.10%) Loss: 2.103464 LR: 0.00004957 [01:07:55] Epoch: 1 Batch: 5411/38378 (14.10%) Loss: 2.091732 LR: 0.00004957 [01:08:00] >> Cleaned up old temp checkpoint: epoch1_step5082 [01:08:01] >> Temp checkpoint saved: epoch1_step5412, size: 0.1702 GB [01:08:01] Epoch: 1 Batch: 5412/38378 (14.10%) Loss: 1.945551 LR: 0.00004957 [01:08:02] Epoch: 1 Batch: 5413/38378 (14.10%) Loss: 2.057142 LR: 0.00004957 [01:08:04] Epoch: 1 Batch: 5414/38378 (14.11%) Loss: 2.165389 LR: 0.00004957 [01:08:06] Epoch: 1 Batch: 5415/38378 (14.11%) Loss: 2.247954 LR: 0.00004957 [01:08:07] Epoch: 1 Batch: 5416/38378 (14.11%) Loss: 2.044979 LR: 0.00004957 [01:08:09] Epoch: 1 Batch: 5417/38378 (14.11%) Loss: 1.956754 LR: 0.00004957 [01:08:11] Epoch: 1 Batch: 5418/38378 (14.12%) Loss: 2.169722 LR: 0.00004957 [01:08:12] Epoch: 1 Batch: 5419/38378 (14.12%) Loss: 1.881148 LR: 0.00004957 [01:08:14] Epoch: 1 Batch: 5420/38378 (14.12%) Loss: 2.184538 LR: 0.00004957 [01:08:16] Epoch: 1 Batch: 5421/38378 (14.13%) Loss: 1.961103 LR: 0.00004957 [01:08:18] Epoch: 1 Batch: 5422/38378 (14.13%) Loss: 2.013631 LR: 0.00004957 [01:08:19] Epoch: 1 Batch: 5423/38378 (14.13%) Loss: 1.919629 LR: 0.00004957 [01:08:21] Epoch: 1 Batch: 5424/38378 (14.13%) Loss: 1.869737 LR: 0.00004957 [01:08:23] Epoch: 1 Batch: 5425/38378 (14.14%) Loss: 1.976950 LR: 0.00004957 [01:08:24] Epoch: 1 Batch: 5426/38378 (14.14%) Loss: 2.149016 LR: 0.00004957 [01:08:26] Epoch: 1 Batch: 5427/38378 (14.14%) Loss: 2.068887 LR: 0.00004957 [01:08:28] Epoch: 1 Batch: 5428/38378 (14.14%) Loss: 2.338788 LR: 0.00004957 [01:08:29] Epoch: 1 Batch: 5429/38378 (14.15%) Loss: 1.864176 LR: 0.00004956 [01:08:31] Epoch: 1 Batch: 5430/38378 (14.15%) Loss: 1.929023 LR: 0.00004956 [01:08:33] Epoch: 1 Batch: 5431/38378 (14.15%) Loss: 1.857557 LR: 0.00004956 [01:08:35] Epoch: 1 Batch: 5432/38378 (14.15%) Loss: 1.949665 LR: 0.00004956 [01:08:36] Epoch: 1 Batch: 5433/38378 (14.16%) Loss: 1.814592 LR: 0.00004956 [01:08:38] Epoch: 1 Batch: 5434/38378 (14.16%) Loss: 1.828944 LR: 0.00004956 [01:08:40] Epoch: 1 
Batch: 5435/38378 (14.16%) Loss: 2.024586 LR: 0.00004956 [01:08:41] Epoch: 1 Batch: 5436/38378 (14.16%) Loss: 1.888087 LR: 0.00004956 [01:08:43] Epoch: 1 Batch: 5437/38378 (14.17%) Loss: 2.355776 LR: 0.00004956 [01:08:45] Epoch: 1 Batch: 5438/38378 (14.17%) Loss: 2.316802 LR: 0.00004956 [01:08:47] Epoch: 1 Batch: 5439/38378 (14.17%) Loss: 2.155732 LR: 0.00004956 [01:08:48] Epoch: 1 Batch: 5440/38378 (14.17%) Loss: 2.038552 LR: 0.00004956 [01:08:50] Epoch: 1 Batch: 5441/38378 (14.18%) Loss: 1.885662 LR: 0.00004956 [01:08:52] Epoch: 1 Batch: 5442/38378 (14.18%) Loss: 2.220757 LR: 0.00004956 [01:08:53] Epoch: 1 Batch: 5443/38378 (14.18%) Loss: 2.124257 LR: 0.00004956 [01:08:55] Epoch: 1 Batch: 5444/38378 (14.19%) Loss: 2.007061 LR: 0.00004956 [01:09:01] >> Cleaned up old temp checkpoint: epoch1_step5115 [01:09:01] >> Temp checkpoint saved: epoch1_step5445, size: 0.1702 GB [01:09:01] Epoch: 1 Batch: 5445/38378 (14.19%) Loss: 1.853730 LR: 0.00004956 [01:09:03] Epoch: 1 Batch: 5446/38378 (14.19%) Loss: 2.056552 LR: 0.00004956 [01:09:04] Epoch: 1 Batch: 5447/38378 (14.19%) Loss: 1.895824 LR: 0.00004956 [01:09:06] Epoch: 1 Batch: 5448/38378 (14.20%) Loss: 1.929462 LR: 0.00004956 [01:09:08] Epoch: 1 Batch: 5449/38378 (14.20%) Loss: 2.027690 LR: 0.00004956 [01:09:10] Epoch: 1 Batch: 5450/38378 (14.20%) Loss: 1.940276 LR: 0.00004956 [01:09:11] Epoch: 1 Batch: 5451/38378 (14.20%) Loss: 1.932914 LR: 0.00004956 [01:09:13] Epoch: 1 Batch: 5452/38378 (14.21%) Loss: 1.746073 LR: 0.00004956 [01:09:15] Epoch: 1 Batch: 5453/38378 (14.21%) Loss: 1.680498 LR: 0.00004956 [01:09:16] Epoch: 1 Batch: 5454/38378 (14.21%) Loss: 2.163077 LR: 0.00004956 [01:09:18] Epoch: 1 Batch: 5455/38378 (14.21%) Loss: 2.185470 LR: 0.00004956 [01:09:20] Epoch: 1 Batch: 5456/38378 (14.22%) Loss: 1.958431 LR: 0.00004956 [01:09:21] Epoch: 1 Batch: 5457/38378 (14.22%) Loss: 1.878170 LR: 0.00004955 [01:09:23] Epoch: 1 Batch: 5458/38378 (14.22%) Loss: 1.922723 LR: 0.00004955 [01:09:25] Epoch: 1 Batch: 5459/38378 (14.22%) Loss: 2.286853 LR: 0.00004955 [01:09:27] Epoch: 1 Batch: 5460/38378 (14.23%) Loss: 2.252505 LR: 0.00004955 [01:09:28] Epoch: 1 Batch: 5461/38378 (14.23%) Loss: 1.858819 LR: 0.00004955 [01:09:30] Epoch: 1 Batch: 5462/38378 (14.23%) Loss: 2.130759 LR: 0.00004955 [01:09:32] Epoch: 1 Batch: 5463/38378 (14.23%) Loss: 1.983292 LR: 0.00004955 [01:09:33] Epoch: 1 Batch: 5464/38378 (14.24%) Loss: 1.812167 LR: 0.00004955 [01:09:35] Epoch: 1 Batch: 5465/38378 (14.24%) Loss: 2.115699 LR: 0.00004955 [01:09:37] Epoch: 1 Batch: 5466/38378 (14.24%) Loss: 1.953719 LR: 0.00004955 [01:09:39] Epoch: 1 Batch: 5467/38378 (14.25%) Loss: 2.038461 LR: 0.00004955 [01:09:40] Epoch: 1 Batch: 5468/38378 (14.25%) Loss: 2.107009 LR: 0.00004955 [01:09:42] Epoch: 1 Batch: 5469/38378 (14.25%) Loss: 1.767473 LR: 0.00004955 [01:09:44] Epoch: 1 Batch: 5470/38378 (14.25%) Loss: 1.876109 LR: 0.00004955 [01:09:46] Epoch: 1 Batch: 5471/38378 (14.26%) Loss: 2.272206 LR: 0.00004955 [01:09:47] Epoch: 1 Batch: 5472/38378 (14.26%) Loss: 2.098433 LR: 0.00004955 [01:09:49] Epoch: 1 Batch: 5473/38378 (14.26%) Loss: 1.987693 LR: 0.00004955 [01:09:51] Epoch: 1 Batch: 5474/38378 (14.26%) Loss: 2.048927 LR: 0.00004955 [01:09:52] Epoch: 1 Batch: 5475/38378 (14.27%) Loss: 2.107104 LR: 0.00004955 [01:09:54] Epoch: 1 Batch: 5476/38378 (14.27%) Loss: 2.033817 LR: 0.00004955 [01:09:56] Epoch: 1 Batch: 5477/38378 (14.27%) Loss: 2.248459 LR: 0.00004955 [01:10:02] >> Cleaned up old temp checkpoint: epoch1_step5148 [01:10:02] >> Temp checkpoint saved: epoch1_step5478, size: 0.1702 GB 
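Note on the learning-rate column: it has crept from 0.00004979 near batch 4700 down to 0.00004956 here, consistent with cosine decay toward a floor after warmup. The sketch below reproduces the logged values under stated assumptions (one optimizer step per 7 batches, hence ~5483 total optimizer steps for 38378 batches; peak LR 5e-5; floor 1e-5; 439 warmup steps); it is a reconstruction consistent with the log, not the trainer's actual scheduler.

```python
import math

def lr_at(opt_step, total_steps=5483, warmup=439, peak=5e-5, floor=1e-5):
    """Cosine decay toward an LR floor after linear warmup (assumed shape)."""
    if opt_step < warmup:
        return peak * opt_step / warmup
    progress = (opt_step - warmup) / (total_steps - warmup)
    return floor + (peak - floor) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Batch 5445 above corresponds to optimizer step ~777 (5445 // 7):
# lr_at(777) ~= 4.956e-05, matching the logged LR 0.00004956.
```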
[01:10:02] Epoch: 1 Batch: 5478/38378 (14.27%) Loss: 2.130816 LR: 0.00004955 [01:10:03] Epoch: 1 Batch: 5479/38378 (14.28%) Loss: 2.084036 LR: 0.00004955 [01:10:05] Epoch: 1 Batch: 5480/38378 (14.28%) Loss: 2.314504 LR: 0.00004955 [01:10:07] Epoch: 1 Batch: 5481/38378 (14.28%) Loss: 1.950452 LR: 0.00004955 [01:10:08] Epoch: 1 Batch: 5482/38378 (14.28%) Loss: 1.910868 LR: 0.00004955 [01:10:10] Epoch: 1 Batch: 5483/38378 (14.29%) Loss: 2.252058 LR: 0.00004955 [01:10:12] Epoch: 1 Batch: 5484/38378 (14.29%) Loss: 2.116154 LR: 0.00004955 [01:10:14] Epoch: 1 Batch: 5485/38378 (14.29%) Loss: 2.109612 LR: 0.00004954 [01:10:15] Epoch: 1 Batch: 5486/38378 (14.29%) Loss: 2.052201 LR: 0.00004954 [01:10:17] Epoch: 1 Batch: 5487/38378 (14.30%) Loss: 2.298438 LR: 0.00004954 [01:10:19] Epoch: 1 Batch: 5488/38378 (14.30%) Loss: 2.059344 LR: 0.00004954 [01:10:20] Epoch: 1 Batch: 5489/38378 (14.30%) Loss: 1.837942 LR: 0.00004954 [01:10:22] Epoch: 1 Batch: 5490/38378 (14.31%) Loss: 1.806293 LR: 0.00004954 [01:10:24] Epoch: 1 Batch: 5491/38378 (14.31%) Loss: 2.334202 LR: 0.00004954 [01:10:26] Epoch: 1 Batch: 5492/38378 (14.31%) Loss: 2.152220 LR: 0.00004954 [01:10:27] Epoch: 1 Batch: 5493/38378 (14.31%) Loss: 1.966660 LR: 0.00004954 [01:10:29] Epoch: 1 Batch: 5494/38378 (14.32%) Loss: 1.912211 LR: 0.00004954 [01:10:31] Epoch: 1 Batch: 5495/38378 (14.32%) Loss: 1.902996 LR: 0.00004954 [01:10:32] Epoch: 1 Batch: 5496/38378 (14.32%) Loss: 1.994143 LR: 0.00004954 [01:10:34] Epoch: 1 Batch: 5497/38378 (14.32%) Loss: 1.767845 LR: 0.00004954 [01:10:36] Epoch: 1 Batch: 5498/38378 (14.33%) Loss: 1.912839 LR: 0.00004954 [01:10:38] Epoch: 1 Batch: 5499/38378 (14.33%) Loss: 1.998232 LR: 0.00004954 [01:10:39] >> Evaluating batch 0 [01:10:40] >> Evaluating batch 1 [01:10:41] >> Evaluating batch 2 [01:10:42] >> Evaluating batch 3 [01:10:43] >> Evaluating batch 4 [01:10:44] >> Evaluating batch 5 [01:10:45] >> Evaluating batch 6 [01:10:46] >> Evaluating batch 7 [01:10:47] >> Evaluating batch 8 [01:10:48] >> Evaluating batch 9 [01:10:49] >> Evaluating batch 10 [01:10:50] >> Evaluating batch 11 [01:10:51] >> Evaluating batch 12 [01:10:52] >> Evaluating batch 13 [01:10:53] >> Evaluating batch 14 [01:10:54] >> Evaluating batch 15 [01:10:55] >> Evaluating batch 16 [01:10:55] Epoch: 1 Step: 5500/38378 Evaluation: [01:10:55] Avg Loss Since Last Eval: 2.0267 Val Loss: 2.1402 Validation loss delta: -0.0110 Perplexity: 8.5015 LR: 0.00004954 [01:10:59] >> Checkpoint saved: epoch1_step5500, size: 0.1702 GB [01:10:59] Epoch: 1 Batch: 5500/38378 (14.33%) Loss: 1.977355 LR: 0.00004954 [01:11:01] Epoch: 1 Batch: 5501/38378 (14.33%) Loss: 1.978239 LR: 0.00004954 [01:11:03] Epoch: 1 Batch: 5502/38378 (14.34%) Loss: 2.088221 LR: 0.00004954 [01:11:04] Epoch: 1 Batch: 5503/38378 (14.34%) Loss: 2.218324 LR: 0.00004954 [01:11:06] Epoch: 1 Batch: 5504/38378 (14.34%) Loss: 2.267159 LR: 0.00004954 [01:11:08] Epoch: 1 Batch: 5505/38378 (14.34%) Loss: 2.371107 LR: 0.00004954 [01:11:09] Epoch: 1 Batch: 5506/38378 (14.35%) Loss: 1.982504 LR: 0.00004953 [01:11:11] Epoch: 1 Batch: 5507/38378 (14.35%) Loss: 2.073754 LR: 0.00004953 [01:11:13] Epoch: 1 Batch: 5508/38378 (14.35%) Loss: 2.006752 LR: 0.00004953 [01:11:14] Epoch: 1 Batch: 5509/38378 (14.35%) Loss: 2.249062 LR: 0.00004953 [01:11:16] Epoch: 1 Batch: 5510/38378 (14.36%) Loss: 2.066826 LR: 0.00004953 [01:11:22] >> Cleaned up old temp checkpoint: epoch1_step5181 [01:11:22] >> Temp checkpoint saved: epoch1_step5511, size: 0.1702 GB [01:11:22] Epoch: 1 Batch: 5511/38378 (14.36%) Loss: 1.729951 LR:
0.00004953 [01:11:23] Epoch: 1 Batch: 5512/38378 (14.36%) Loss: 1.915005 LR: 0.00004953 [01:11:25] Epoch: 1 Batch: 5513/38378 (14.37%) Loss: 2.054513 LR: 0.00004953 [01:11:27] Epoch: 1 Batch: 5514/38378 (14.37%) Loss: 2.201534 LR: 0.00004953 [01:11:29] Epoch: 1 Batch: 5515/38378 (14.37%) Loss: 2.141383 LR: 0.00004953 [01:11:30] Epoch: 1 Batch: 5516/38378 (14.37%) Loss: 1.901169 LR: 0.00004953 [01:11:32] Epoch: 1 Batch: 5517/38378 (14.38%) Loss: 2.063041 LR: 0.00004953 [01:11:34] Epoch: 1 Batch: 5518/38378 (14.38%) Loss: 2.051059 LR: 0.00004953 [01:11:35] Epoch: 1 Batch: 5519/38378 (14.38%) Loss: 1.959933 LR: 0.00004953 [01:11:37] Epoch: 1 Batch: 5520/38378 (14.38%) Loss: 1.978554 LR: 0.00004953 [01:11:39] Epoch: 1 Batch: 5521/38378 (14.39%) Loss: 2.099207 LR: 0.00004953 [01:11:41] Epoch: 1 Batch: 5522/38378 (14.39%) Loss: 2.081979 LR: 0.00004953 [01:11:42] Epoch: 1 Batch: 5523/38378 (14.39%) Loss: 2.075440 LR: 0.00004953 [01:11:44] Epoch: 1 Batch: 5524/38378 (14.39%) Loss: 2.025357 LR: 0.00004953 [01:11:46] Epoch: 1 Batch: 5525/38378 (14.40%) Loss: 2.257065 LR: 0.00004953 [01:11:48] Epoch: 1 Batch: 5526/38378 (14.40%) Loss: 1.822152 LR: 0.00004953 [01:11:49] Epoch: 1 Batch: 5527/38378 (14.40%) Loss: 2.133061 LR: 0.00004953 [01:11:51] Epoch: 1 Batch: 5528/38378 (14.40%) Loss: 1.892268 LR: 0.00004953 [01:11:53] Epoch: 1 Batch: 5529/38378 (14.41%) Loss: 1.711875 LR: 0.00004953 [01:11:54] Epoch: 1 Batch: 5530/38378 (14.41%) Loss: 2.117686 LR: 0.00004953 [01:11:56] Epoch: 1 Batch: 5531/38378 (14.41%) Loss: 1.997665 LR: 0.00004953 [01:11:58] Epoch: 1 Batch: 5532/38378 (14.41%) Loss: 2.224498 LR: 0.00004953 [01:12:00] Epoch: 1 Batch: 5533/38378 (14.42%) Loss: 1.736269 LR: 0.00004953 [01:12:01] Epoch: 1 Batch: 5534/38378 (14.42%) Loss: 2.109783 LR: 0.00004952 [01:12:03] Epoch: 1 Batch: 5535/38378 (14.42%) Loss: 2.138518 LR: 0.00004952 [01:12:05] Epoch: 1 Batch: 5536/38378 (14.42%) Loss: 2.142685 LR: 0.00004952 [01:12:07] Epoch: 1 Batch: 5537/38378 (14.43%) Loss: 1.927379 LR: 0.00004952 [01:12:08] Epoch: 1 Batch: 5538/38378 (14.43%) Loss: 2.334906 LR: 0.00004952 [01:12:10] Epoch: 1 Batch: 5539/38378 (14.43%) Loss: 2.116695 LR: 0.00004952 [01:12:12] Epoch: 1 Batch: 5540/38378 (14.44%) Loss: 1.988108 LR: 0.00004952 [01:12:13] Epoch: 1 Batch: 5541/38378 (14.44%) Loss: 1.988153 LR: 0.00004952 [01:12:15] Epoch: 1 Batch: 5542/38378 (14.44%) Loss: 1.665435 LR: 0.00004952 [01:12:17] Epoch: 1 Batch: 5543/38378 (14.44%) Loss: 1.969626 LR: 0.00004952 [01:12:22] >> Cleaned up old temp checkpoint: epoch1_step5214 [01:12:22] >> Temp checkpoint saved: epoch1_step5544, size: 0.1702 GB [01:12:22] Epoch: 1 Batch: 5544/38378 (14.45%) Loss: 2.058544 LR: 0.00004952 [01:12:24] Epoch: 1 Batch: 5545/38378 (14.45%) Loss: 2.202483 LR: 0.00004952 [01:12:26] Epoch: 1 Batch: 5546/38378 (14.45%) Loss: 1.893694 LR: 0.00004952 [01:12:27] Epoch: 1 Batch: 5547/38378 (14.45%) Loss: 1.581784 LR: 0.00004952 [01:12:29] Epoch: 1 Batch: 5548/38378 (14.46%) Loss: 1.979799 LR: 0.00004952 [01:12:31] Epoch: 1 Batch: 5549/38378 (14.46%) Loss: 2.106153 LR: 0.00004952 [01:12:32] Epoch: 1 Batch: 5550/38378 (14.46%) Loss: 1.979008 LR: 0.00004952 [01:12:34] Epoch: 1 Batch: 5551/38378 (14.46%) Loss: 2.142535 LR: 0.00004952 [01:12:36] Epoch: 1 Batch: 5552/38378 (14.47%) Loss: 2.253607 LR: 0.00004952 [01:12:38] Epoch: 1 Batch: 5553/38378 (14.47%) Loss: 2.078433 LR: 0.00004952 [01:12:39] Epoch: 1 Batch: 5554/38378 (14.47%) Loss: 2.163398 LR: 0.00004952 [01:12:41] Epoch: 1 Batch: 5555/38378 (14.47%) Loss: 1.656374 LR: 0.00004952 [01:12:43] Epoch: 1 
Batch: 5556/38378 (14.48%) Loss: 2.281512 LR: 0.00004952 [01:12:44] Epoch: 1 Batch: 5557/38378 (14.48%) Loss: 1.982845 LR: 0.00004952 [01:12:46] Epoch: 1 Batch: 5558/38378 (14.48%) Loss: 2.089610 LR: 0.00004952 [01:12:48] Epoch: 1 Batch: 5559/38378 (14.48%) Loss: 1.997407 LR: 0.00004952 [01:12:50] Epoch: 1 Batch: 5560/38378 (14.49%) Loss: 1.986408 LR: 0.00004952 [01:12:51] Epoch: 1 Batch: 5561/38378 (14.49%) Loss: 1.715275 LR: 0.00004952 [01:12:53] Epoch: 1 Batch: 5562/38378 (14.49%) Loss: 2.512363 LR: 0.00004951 [01:12:55] Epoch: 1 Batch: 5563/38378 (14.50%) Loss: 1.984991 LR: 0.00004951 [01:12:56] Epoch: 1 Batch: 5564/38378 (14.50%) Loss: 1.928516 LR: 0.00004951 [01:12:58] Epoch: 1 Batch: 5565/38378 (14.50%) Loss: 1.968556 LR: 0.00004951 [01:13:00] Epoch: 1 Batch: 5566/38378 (14.50%) Loss: 1.845404 LR: 0.00004951 [01:13:02] Epoch: 1 Batch: 5567/38378 (14.51%) Loss: 2.117705 LR: 0.00004951 [01:13:03] Epoch: 1 Batch: 5568/38378 (14.51%) Loss: 1.835259 LR: 0.00004951 [01:13:05] Epoch: 1 Batch: 5569/38378 (14.51%) Loss: 1.901751 LR: 0.00004951 [01:13:07] Epoch: 1 Batch: 5570/38378 (14.51%) Loss: 2.021241 LR: 0.00004951 [01:13:09] Epoch: 1 Batch: 5571/38378 (14.52%) Loss: 2.035525 LR: 0.00004951 [01:13:10] Epoch: 1 Batch: 5572/38378 (14.52%) Loss: 1.929320 LR: 0.00004951 [01:13:12] Epoch: 1 Batch: 5573/38378 (14.52%) Loss: 1.789789 LR: 0.00004951 [01:13:14] Epoch: 1 Batch: 5574/38378 (14.52%) Loss: 2.398961 LR: 0.00004951 [01:13:15] Epoch: 1 Batch: 5575/38378 (14.53%) Loss: 2.147902 LR: 0.00004951 [01:13:17] Epoch: 1 Batch: 5576/38378 (14.53%) Loss: 2.202415 LR: 0.00004951 [01:13:23] >> Cleaned up old temp checkpoint: epoch1_step5247 [01:13:23] >> Temp checkpoint saved: epoch1_step5577, size: 0.1702 GB [01:13:23] Epoch: 1 Batch: 5577/38378 (14.53%) Loss: 2.039439 LR: 0.00004951 [01:13:24] Epoch: 1 Batch: 5578/38378 (14.53%) Loss: 1.727499 LR: 0.00004951 [01:13:26] Epoch: 1 Batch: 5579/38378 (14.54%) Loss: 2.235772 LR: 0.00004951 [01:13:28] Epoch: 1 Batch: 5580/38378 (14.54%) Loss: 1.924780 LR: 0.00004951 [01:13:30] Epoch: 1 Batch: 5581/38378 (14.54%) Loss: 1.985894 LR: 0.00004951 [01:13:31] Epoch: 1 Batch: 5582/38378 (14.54%) Loss: 1.966790 LR: 0.00004951 [01:13:33] Epoch: 1 Batch: 5583/38378 (14.55%) Loss: 1.939465 LR: 0.00004950 [01:13:35] Epoch: 1 Batch: 5584/38378 (14.55%) Loss: 2.072089 LR: 0.00004950 [01:13:36] Epoch: 1 Batch: 5585/38378 (14.55%) Loss: 1.965266 LR: 0.00004950 [01:13:38] Epoch: 1 Batch: 5586/38378 (14.56%) Loss: 1.913079 LR: 0.00004950 [01:13:40] Epoch: 1 Batch: 5587/38378 (14.56%) Loss: 1.686635 LR: 0.00004950 [01:13:41] Epoch: 1 Batch: 5588/38378 (14.56%) Loss: 2.158391 LR: 0.00004950 [01:13:43] Epoch: 1 Batch: 5589/38378 (14.56%) Loss: 1.773269 LR: 0.00004950 [01:13:45] Epoch: 1 Batch: 5590/38378 (14.57%) Loss: 1.938957 LR: 0.00004950 [01:13:47] Epoch: 1 Batch: 5591/38378 (14.57%) Loss: 2.044909 LR: 0.00004950 [01:13:48] Epoch: 1 Batch: 5592/38378 (14.57%) Loss: 2.139741 LR: 0.00004950 [01:13:50] Epoch: 1 Batch: 5593/38378 (14.57%) Loss: 1.816593 LR: 0.00004950 [01:13:52] Epoch: 1 Batch: 5594/38378 (14.58%) Loss: 2.040699 LR: 0.00004950 [01:13:54] Epoch: 1 Batch: 5595/38378 (14.58%) Loss: 2.216782 LR: 0.00004950 [01:13:55] Epoch: 1 Batch: 5596/38378 (14.58%) Loss: 2.009439 LR: 0.00004950 [01:13:57] Epoch: 1 Batch: 5597/38378 (14.58%) Loss: 2.103401 LR: 0.00004950 [01:13:59] Epoch: 1 Batch: 5598/38378 (14.59%) Loss: 1.909915 LR: 0.00004950 [01:14:00] Epoch: 1 Batch: 5599/38378 (14.59%) Loss: 2.055641 LR: 0.00004950 [01:14:02] Epoch: 1 Batch: 5600/38378 (14.59%) Loss: 
2.251878 LR: 0.00004950 [01:14:04] Epoch: 1 Batch: 5601/38378 (14.59%) Loss: 1.995253 LR: 0.00004950 [01:14:06] Epoch: 1 Batch: 5602/38378 (14.60%) Loss: 2.030371 LR: 0.00004950 [01:14:07] Epoch: 1 Batch: 5603/38378 (14.60%) Loss: 2.098455 LR: 0.00004950 [01:14:09] Epoch: 1 Batch: 5604/38378 (14.60%) Loss: 2.035079 LR: 0.00004950 [01:14:11] Epoch: 1 Batch: 5605/38378 (14.60%) Loss: 2.181922 LR: 0.00004950 [01:14:12] Epoch: 1 Batch: 5606/38378 (14.61%) Loss: 2.051709 LR: 0.00004950 [01:14:14] Epoch: 1 Batch: 5607/38378 (14.61%) Loss: 1.920125 LR: 0.00004950 [01:14:16] Epoch: 1 Batch: 5608/38378 (14.61%) Loss: 1.925571 LR: 0.00004950 [01:14:18] Epoch: 1 Batch: 5609/38378 (14.62%) Loss: 2.157840 LR: 0.00004950 [01:14:23] >> Cleaned up old temp checkpoint: epoch1_step5280 [01:14:23] >> Temp checkpoint saved: epoch1_step5610, size: 0.1702 GB [01:14:23] Epoch: 1 Batch: 5610/38378 (14.62%) Loss: 1.781237 LR: 0.00004950 [01:14:25] Epoch: 1 Batch: 5611/38378 (14.62%) Loss: 1.910165 LR: 0.00004949 [01:14:27] Epoch: 1 Batch: 5612/38378 (14.62%) Loss: 2.288680 LR: 0.00004949 [01:14:28] Epoch: 1 Batch: 5613/38378 (14.63%) Loss: 2.320060 LR: 0.00004949 [01:14:30] Epoch: 1 Batch: 5614/38378 (14.63%) Loss: 1.928941 LR: 0.00004949 [01:14:32] Epoch: 1 Batch: 5615/38378 (14.63%) Loss: 1.961804 LR: 0.00004949 [01:14:33] Epoch: 1 Batch: 5616/38378 (14.63%) Loss: 1.738712 LR: 0.00004949 [01:14:35] Epoch: 1 Batch: 5617/38378 (14.64%) Loss: 1.959316 LR: 0.00004949 [01:14:37] Epoch: 1 Batch: 5618/38378 (14.64%) Loss: 1.970023 LR: 0.00004949 [01:14:39] Epoch: 1 Batch: 5619/38378 (14.64%) Loss: 2.006603 LR: 0.00004949 [01:14:40] Epoch: 1 Batch: 5620/38378 (14.64%) Loss: 1.981919 LR: 0.00004949 [01:14:42] Epoch: 1 Batch: 5621/38378 (14.65%) Loss: 1.946826 LR: 0.00004949 [01:14:44] Epoch: 1 Batch: 5622/38378 (14.65%) Loss: 2.107531 LR: 0.00004949 [01:14:45] Epoch: 1 Batch: 5623/38378 (14.65%) Loss: 1.898181 LR: 0.00004949 [01:14:47] Epoch: 1 Batch: 5624/38378 (14.65%) Loss: 2.082769 LR: 0.00004949 [01:14:49] Epoch: 1 Batch: 5625/38378 (14.66%) Loss: 2.136478 LR: 0.00004949 [01:14:51] Epoch: 1 Batch: 5626/38378 (14.66%) Loss: 2.154365 LR: 0.00004949 [01:14:52] Epoch: 1 Batch: 5627/38378 (14.66%) Loss: 1.927717 LR: 0.00004949 [01:14:54] Epoch: 1 Batch: 5628/38378 (14.66%) Loss: 2.104450 LR: 0.00004949 [01:14:56] Epoch: 1 Batch: 5629/38378 (14.67%) Loss: 2.450682 LR: 0.00004949 [01:14:57] Epoch: 1 Batch: 5630/38378 (14.67%) Loss: 1.625857 LR: 0.00004949 [01:14:59] Epoch: 1 Batch: 5631/38378 (14.67%) Loss: 2.104634 LR: 0.00004949 [01:15:01] Epoch: 1 Batch: 5632/38378 (14.68%) Loss: 2.043217 LR: 0.00004949 [01:15:03] Epoch: 1 Batch: 5633/38378 (14.68%) Loss: 2.357828 LR: 0.00004949 [01:15:04] Epoch: 1 Batch: 5634/38378 (14.68%) Loss: 2.022066 LR: 0.00004949 [01:15:06] Epoch: 1 Batch: 5635/38378 (14.68%) Loss: 1.703110 LR: 0.00004949 [01:15:08] Epoch: 1 Batch: 5636/38378 (14.69%) Loss: 1.985441 LR: 0.00004949 [01:15:10] Epoch: 1 Batch: 5637/38378 (14.69%) Loss: 2.227655 LR: 0.00004949 [01:15:11] Epoch: 1 Batch: 5638/38378 (14.69%) Loss: 2.267305 LR: 0.00004949 [01:15:13] Epoch: 1 Batch: 5639/38378 (14.69%) Loss: 2.065802 LR: 0.00004948 [01:15:15] Epoch: 1 Batch: 5640/38378 (14.70%) Loss: 2.020289 LR: 0.00004948 [01:15:16] Epoch: 1 Batch: 5641/38378 (14.70%) Loss: 2.477007 LR: 0.00004948 [01:15:18] Epoch: 1 Batch: 5642/38378 (14.70%) Loss: 1.942980 LR: 0.00004948 [01:15:24] >> Cleaned up old temp checkpoint: epoch1_step5313 [01:15:24] >> Temp checkpoint saved: epoch1_step5643, size: 0.1702 GB [01:15:24] Epoch: 1 Batch: 
5643/38378 (14.70%) Loss: 1.926834 LR: 0.00004948 [01:15:25] Epoch: 1 Batch: 5644/38378 (14.71%) Loss: 2.102071 LR: 0.00004948 [01:15:27] Epoch: 1 Batch: 5645/38378 (14.71%) Loss: 1.894063 LR: 0.00004948 [01:15:29] Epoch: 1 Batch: 5646/38378 (14.71%) Loss: 2.086302 LR: 0.00004948 [01:15:30] Epoch: 1 Batch: 5647/38378 (14.71%) Loss: 1.816932 LR: 0.00004948 [01:15:32] Epoch: 1 Batch: 5648/38378 (14.72%) Loss: 1.804371 LR: 0.00004948 [01:15:34] Epoch: 1 Batch: 5649/38378 (14.72%) Loss: 2.057684 LR: 0.00004948 [01:15:35] Epoch: 1 Batch: 5650/38378 (14.72%) Loss: 1.758757 LR: 0.00004948 [01:15:37] Epoch: 1 Batch: 5651/38378 (14.72%) Loss: 2.019595 LR: 0.00004948 [01:15:39] Epoch: 1 Batch: 5652/38378 (14.73%) Loss: 2.143749 LR: 0.00004948 [01:15:41] Epoch: 1 Batch: 5653/38378 (14.73%) Loss: 1.851195 LR: 0.00004948 [01:15:42] Epoch: 1 Batch: 5654/38378 (14.73%) Loss: 2.045108 LR: 0.00004948 [01:15:44] Epoch: 1 Batch: 5655/38378 (14.74%) Loss: 1.821493 LR: 0.00004948 [01:15:46] Epoch: 1 Batch: 5656/38378 (14.74%) Loss: 1.952172 LR: 0.00004948 [01:15:47] Epoch: 1 Batch: 5657/38378 (14.74%) Loss: 1.928399 LR: 0.00004948 [01:15:49] Epoch: 1 Batch: 5658/38378 (14.74%) Loss: 1.806237 LR: 0.00004948 [01:15:51] Epoch: 1 Batch: 5659/38378 (14.75%) Loss: 1.948931 LR: 0.00004948 [01:15:52] Epoch: 1 Batch: 5660/38378 (14.75%) Loss: 2.224009 LR: 0.00004947 [01:15:54] Epoch: 1 Batch: 5661/38378 (14.75%) Loss: 2.056186 LR: 0.00004947 [01:15:56] Epoch: 1 Batch: 5662/38378 (14.75%) Loss: 2.124335 LR: 0.00004947 [01:15:58] Epoch: 1 Batch: 5663/38378 (14.76%) Loss: 1.735664 LR: 0.00004947 [01:15:59] Epoch: 1 Batch: 5664/38378 (14.76%) Loss: 1.934559 LR: 0.00004947 [01:16:01] Epoch: 1 Batch: 5665/38378 (14.76%) Loss: 2.236310 LR: 0.00004947 [01:16:03] Epoch: 1 Batch: 5666/38378 (14.76%) Loss: 1.964606 LR: 0.00004947 [01:16:04] Epoch: 1 Batch: 5667/38378 (14.77%) Loss: 2.029847 LR: 0.00004947 [01:16:06] Epoch: 1 Batch: 5668/38378 (14.77%) Loss: 2.198106 LR: 0.00004947 [01:16:08] Epoch: 1 Batch: 5669/38378 (14.77%) Loss: 2.311516 LR: 0.00004947 [01:16:10] Epoch: 1 Batch: 5670/38378 (14.77%) Loss: 1.970953 LR: 0.00004947 [01:16:11] Epoch: 1 Batch: 5671/38378 (14.78%) Loss: 2.292971 LR: 0.00004947 [01:16:13] Epoch: 1 Batch: 5672/38378 (14.78%) Loss: 2.152456 LR: 0.00004947 [01:16:15] Epoch: 1 Batch: 5673/38378 (14.78%) Loss: 1.701068 LR: 0.00004947 [01:16:16] Epoch: 1 Batch: 5674/38378 (14.78%) Loss: 1.721403 LR: 0.00004947 [01:16:18] Epoch: 1 Batch: 5675/38378 (14.79%) Loss: 1.972209 LR: 0.00004947 [01:16:24] >> Cleaned up old temp checkpoint: epoch1_step5346 [01:16:24] >> Temp checkpoint saved: epoch1_step5676, size: 0.1702 GB [01:16:24] Epoch: 1 Batch: 5676/38378 (14.79%) Loss: 2.086488 LR: 0.00004947 [01:16:25] Epoch: 1 Batch: 5677/38378 (14.79%) Loss: 1.889044 LR: 0.00004947 [01:16:27] Epoch: 1 Batch: 5678/38378 (14.79%) Loss: 2.371778 LR: 0.00004947 [01:16:29] Epoch: 1 Batch: 5679/38378 (14.80%) Loss: 1.949101 LR: 0.00004947 [01:16:30] Epoch: 1 Batch: 5680/38378 (14.80%) Loss: 2.031364 LR: 0.00004947 [01:16:32] Epoch: 1 Batch: 5681/38378 (14.80%) Loss: 1.980626 LR: 0.00004947 [01:16:34] Epoch: 1 Batch: 5682/38378 (14.81%) Loss: 1.991806 LR: 0.00004947 [01:16:36] Epoch: 1 Batch: 5683/38378 (14.81%) Loss: 2.100978 LR: 0.00004947 [01:16:37] Epoch: 1 Batch: 5684/38378 (14.81%) Loss: 1.772540 LR: 0.00004947 [01:16:39] Epoch: 1 Batch: 5685/38378 (14.81%) Loss: 1.956221 LR: 0.00004947 [01:16:41] Epoch: 1 Batch: 5686/38378 (14.82%) Loss: 1.736816 LR: 0.00004947 [01:16:42] Epoch: 1 Batch: 5687/38378 (14.82%) Loss: 1.960188 
LR: 0.00004947 [01:16:44] Epoch: 1 Batch: 5688/38378 (14.82%) Loss: 1.729544 LR: 0.00004946 [01:16:46] Epoch: 1 Batch: 5689/38378 (14.82%) Loss: 1.847575 LR: 0.00004946 [01:16:48] Epoch: 1 Batch: 5690/38378 (14.83%) Loss: 1.841851 LR: 0.00004946 [01:16:49] Epoch: 1 Batch: 5691/38378 (14.83%) Loss: 1.845056 LR: 0.00004946 [01:16:51] Epoch: 1 Batch: 5692/38378 (14.83%) Loss: 2.080137 LR: 0.00004946 [01:16:53] Epoch: 1 Batch: 5693/38378 (14.83%) Loss: 2.037311 LR: 0.00004946 [01:16:54] Epoch: 1 Batch: 5694/38378 (14.84%) Loss: 2.244881 LR: 0.00004946 [01:16:56] Epoch: 1 Batch: 5695/38378 (14.84%) Loss: 2.229821 LR: 0.00004946 [01:16:58] Epoch: 1 Batch: 5696/38378 (14.84%) Loss: 2.096900 LR: 0.00004946 [01:17:00] Epoch: 1 Batch: 5697/38378 (14.84%) Loss: 2.291136 LR: 0.00004946 [01:17:01] Epoch: 1 Batch: 5698/38378 (14.85%) Loss: 2.055040 LR: 0.00004946 [01:17:03] Epoch: 1 Batch: 5699/38378 (14.85%) Loss: 2.234567 LR: 0.00004946 [01:17:05] Epoch: 1 Batch: 5700/38378 (14.85%) Loss: 2.172896 LR: 0.00004946 [01:17:06] Epoch: 1 Batch: 5701/38378 (14.85%) Loss: 1.977515 LR: 0.00004946 [01:17:08] Epoch: 1 Batch: 5702/38378 (14.86%) Loss: 2.174000 LR: 0.00004946 [01:17:10] Epoch: 1 Batch: 5703/38378 (14.86%) Loss: 2.070157 LR: 0.00004946 [01:17:12] Epoch: 1 Batch: 5704/38378 (14.86%) Loss: 2.321576 LR: 0.00004946 [01:17:13] Epoch: 1 Batch: 5705/38378 (14.87%) Loss: 1.939365 LR: 0.00004946 [01:17:15] Epoch: 1 Batch: 5706/38378 (14.87%) Loss: 1.987766 LR: 0.00004946 [01:17:17] Epoch: 1 Batch: 5707/38378 (14.87%) Loss: 2.069349 LR: 0.00004946 [01:17:18] Epoch: 1 Batch: 5708/38378 (14.87%) Loss: 2.019765 LR: 0.00004946 [01:17:24] >> Cleaned up old temp checkpoint: epoch1_step5379 [01:17:24] >> Temp checkpoint saved: epoch1_step5709, size: 0.1702 GB [01:17:24] Epoch: 1 Batch: 5709/38378 (14.88%) Loss: 1.940136 LR: 0.00004945 [01:17:26] Epoch: 1 Batch: 5710/38378 (14.88%) Loss: 2.073976 LR: 0.00004945 [01:17:27] Epoch: 1 Batch: 5711/38378 (14.88%) Loss: 1.934713 LR: 0.00004945 [01:17:29] Epoch: 1 Batch: 5712/38378 (14.88%) Loss: 1.734922 LR: 0.00004945 [01:17:31] Epoch: 1 Batch: 5713/38378 (14.89%) Loss: 1.777874 LR: 0.00004945 [01:17:33] Epoch: 1 Batch: 5714/38378 (14.89%) Loss: 1.737153 LR: 0.00004945 [01:17:34] Epoch: 1 Batch: 5715/38378 (14.89%) Loss: 2.153291 LR: 0.00004945 [01:17:36] Epoch: 1 Batch: 5716/38378 (14.89%) Loss: 1.849960 LR: 0.00004945 [01:17:38] Epoch: 1 Batch: 5717/38378 (14.90%) Loss: 1.969453 LR: 0.00004945 [01:17:39] Epoch: 1 Batch: 5718/38378 (14.90%) Loss: 2.020235 LR: 0.00004945 [01:17:41] Epoch: 1 Batch: 5719/38378 (14.90%) Loss: 1.999668 LR: 0.00004945 [01:17:43] Epoch: 1 Batch: 5720/38378 (14.90%) Loss: 2.147713 LR: 0.00004945 [01:17:44] Epoch: 1 Batch: 5721/38378 (14.91%) Loss: 2.123923 LR: 0.00004945 [01:17:46] Epoch: 1 Batch: 5722/38378 (14.91%) Loss: 2.227221 LR: 0.00004945 [01:17:48] Epoch: 1 Batch: 5723/38378 (14.91%) Loss: 2.061365 LR: 0.00004945 [01:17:50] Epoch: 1 Batch: 5724/38378 (14.91%) Loss: 2.166404 LR: 0.00004945 [01:17:51] Epoch: 1 Batch: 5725/38378 (14.92%) Loss: 2.127981 LR: 0.00004945 [01:17:53] Epoch: 1 Batch: 5726/38378 (14.92%) Loss: 1.851582 LR: 0.00004945 [01:17:55] Epoch: 1 Batch: 5727/38378 (14.92%) Loss: 2.050506 LR: 0.00004945 [01:17:56] Epoch: 1 Batch: 5728/38378 (14.93%) Loss: 2.218149 LR: 0.00004945 [01:17:58] Epoch: 1 Batch: 5729/38378 (14.93%) Loss: 1.767740 LR: 0.00004945 [01:18:00] Epoch: 1 Batch: 5730/38378 (14.93%) Loss: 1.946719 LR: 0.00004945 [01:18:02] Epoch: 1 Batch: 5731/38378 (14.93%) Loss: 1.980213 LR: 0.00004945 [01:18:03] Epoch: 1 
Batch: 5732/38378 (14.94%) Loss: 2.113613 LR: 0.00004945 [01:18:05] Epoch: 1 Batch: 5733/38378 (14.94%) Loss: 1.973254 LR: 0.00004945 [01:18:07] Epoch: 1 Batch: 5734/38378 (14.94%) Loss: 2.242348 LR: 0.00004945 [01:18:08] Epoch: 1 Batch: 5735/38378 (14.94%) Loss: 1.989548 LR: 0.00004945 [01:18:10] Epoch: 1 Batch: 5736/38378 (14.95%) Loss: 1.941666 LR: 0.00004945 [01:18:12] Epoch: 1 Batch: 5737/38378 (14.95%) Loss: 2.138519 LR: 0.00004944 [01:18:14] Epoch: 1 Batch: 5738/38378 (14.95%) Loss: 2.084923 LR: 0.00004944 [01:18:15] Epoch: 1 Batch: 5739/38378 (14.95%) Loss: 1.920979 LR: 0.00004944 [01:18:17] Epoch: 1 Batch: 5740/38378 (14.96%) Loss: 1.970261 LR: 0.00004944 [01:18:19] Epoch: 1 Batch: 5741/38378 (14.96%) Loss: 1.939063 LR: 0.00004944 [02:39:57] 2025-08-12 [02:39:58] Tesla T4 [02:39:58] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [02:39:58] CPU usage: 67.1%, RAM usage: 27.4% [02:39:58] Running with the following configuration: [02:39:58] model_name: NousResearch/Hermes-3-Llama-3.1-8B [02:39:58] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B [02:39:58] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview [02:39:58] train_path: /content/drive/MyDrive/data/none.csv [02:39:58] checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step7656 [02:39:58] lr: 5e-05 [02:39:58] lr_floor: 1e-05 [02:39:58] epochs: 1 [02:39:58] batch_size: 5 [02:39:59] accum_steps: 7 [02:39:59] val_batch_size: 6 [02:39:59] max_val_size: 100 [02:39:59] max_length: 150 [02:39:59] save_temp_frequency: 33 [02:39:59] save_frequency: 500 [02:39:59] eval_frequency: 500 [02:39:59] save_pattern: y [02:39:59] quantization: y [02:39:59] quantization_bits: 4 [02:39:59] lora: y [02:39:59] frozen_lora_path: None [02:39:59] lora_rank: 16 [02:39:59] lora_alpha: 32 [02:39:59] lora_dropout: 0.08 
[02:39:59] optimizer_weight_decay: 0.0 [02:39:59] warmup_type: cosine [02:39:59] warmup_ratio: 0.08 [02:39:59] warmup_steps: 439 [02:39:59] shuffle: y [02:39:59] csv_column: text [02:39:59] new_run: n [02:39:59] label_smoothing: 0.05 [02:39:59] SEED: 1 [02:39:59] Using device: cuda [02:39:59] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step7656 [02:41:35] Embeddings shape after: torch.Size([128256, 4096]) [02:41:45] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step7656 [02:41:45] Trainable LoRA 'default': [02:41:45] task_type: CAUSAL_LM [02:41:45] peft_type: PeftType.LORA [02:41:45] auto_mapping: None [02:41:45] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B [02:41:45] revision: None [02:41:45] inference_mode: False [02:41:45] r: 16 [02:41:45] target_modules: {'k_proj', 'v_proj', 'q_proj', 'o_proj'} [02:41:45] exclude_modules: None [02:41:45] lora_alpha: 32 [02:41:45] lora_dropout: 0.08 [02:41:45] fan_in_fan_out: False [02:41:45] bias: none [02:41:45] use_rslora: True [02:41:45] modules_to_save: None [02:41:45] init_lora_weights: True [02:41:45] layers_to_transform: None [02:41:45] layers_pattern: None [02:41:45] rank_pattern: {} [02:41:45] alpha_pattern: {} [02:41:45] megatron_config: None [02:41:45] megatron_core: megatron.core [02:41:45] trainable_token_indices: None [02:41:45] loftq_config: {} [02:41:45] eva_config: None [02:41:45] corda_config: None [02:41:45] use_dora: False [02:41:45] use_qalora: False [02:41:45] qalora_group_size: 16 [02:41:45] layer_replication: None [02:41:45] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [02:41:45] lora_bias: False [02:41:45] target_parameters: None [02:41:45] _custom_modules: None [02:41:45] Embeddings shape after: torch.Size([128256, 4096]) [02:41:55] Resumed from epoch 1, step 7657, file 1 [02:41:55] Starting from CSV file... [02:42:01] Splitting data into chunks of 11000... [02:42:01] Using 7 processes across 18 chunks [02:42:02] Using saved train/val split from checkpoint. [02:42:02] Resuming scheduler with warmup steps: 438, total steps: 5482 [02:42:02] Initializing scheduler with cosine schedule with warmup, warmup steps 439, total steps: 5482 [02:42:02] Train/Val split: 191887 train, 100 val samples. 
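The scheduler numbers here follow from the config: 38378 batches at accum_steps 7 is 38378 // 7 = 5482 optimizer steps, and warmup_ratio 0.08 of that rounds to the logged 439 warmup steps (the "Resuming" line's 438 looks like a floor-vs-round off-by-one). The LR column also decays toward the configured lr_floor of 1e-05 rather than zero: at optimizer step 785 (around batch 5500), 1e-05 + 4e-05 * 0.5 * (1 + cos(pi * (785 - 439) / (5482 - 439))) is about 4.954e-05, matching the printed 0.00004954. The script itself is not shown, so the following LambdaLR sketch is only a reconstruction under those assumptions:

import math
import torch

lr_max, lr_floor = 5e-5, 1e-5             # lr / lr_floor from the config dump
total_steps = 38378 // 7                  # 5482 optimizer steps (accum_steps=7)
warmup_steps = round(0.08 * total_steps)  # 439 = warmup_ratio * total_steps

def lr_lambda(step: int) -> float:
    # Linear warmup, then cosine decay toward lr_floor instead of zero.
    if step < warmup_steps:
        return step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return (lr_floor + (lr_max - lr_floor) * cosine) / lr_max

# Dummy parameter so the sketch runs standalone; the log's optimizer is PagedAdamW.
optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=lr_max)
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)

Note that lr_lambda(0) is 0.0, which is consistent with the "Epoch 1 learning rate: 0.0" printed just below, right after the scheduler is rebuilt on resume and before it has been fast-forwarded to the saved step.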
[02:42:12] Model: PeftModelForCausalLM [02:42:12] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.0", "use_cache": true, "vocab_size": 128256 } [02:42:12] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [02:42:12] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 5e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [02:42:12] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [02:42:12] Scheduler: [02:42:12] Training on 191887 training samples, 100 validation samples [02:42:12] Average tokens per sample: 141.99 [02:42:12] Estimated epoch time: ~646.78 min [02:42:12] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Active memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 332173 MiB | 326190 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7616 MiB | 7616 MiB | 7616 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1259 MiB | 5879 MiB | 333261 MiB | 332002 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | GPU reserved segments | 186 | 186 | 186 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 33 | 37 | 12954 | 12921 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | 
|---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [02:42:12] Restoring shuffle indices from training state for epoch 1 [02:42:12] CPU usage: 49.3%, RAM usage: 41.4% [02:42:13] Epoch 1 learning rate: 0.0 [02:42:13] Starting epoch 1 [02:42:41] Batch 7657: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [02:42:42] Epoch: 1 Batch: 7657/38378 (19.95%) Loss: 2.279839 LR: 0.00000000 [02:42:44] Epoch: 1 Batch: 7658/38378 (19.95%) Loss: 2.200092 LR: 0.00000000 [02:42:46] Epoch: 1 Batch: 7659/38378 (19.96%) Loss: 1.931966 LR: 0.00000000 [02:42:47] Epoch: 1 Batch: 7660/38378 (19.96%) Loss: 2.368343 LR: 0.00000000 [02:42:49] Epoch: 1 Batch: 7661/38378 (19.96%) Loss: 2.128850 LR: 0.00000000 [02:42:51] Epoch: 1 Batch: 7662/38378 (19.96%) Loss: 1.942849 LR: 0.00000000 [02:42:53] Epoch: 1 Batch: 7663/38378 (19.97%) Loss: 1.790564 LR: 0.00004836 [02:42:54] Epoch: 1 Batch: 7664/38378 (19.97%) Loss: 2.171243 LR: 0.00004836 [02:42:56] Epoch: 1 Batch: 7665/38378 (19.97%) Loss: 2.022177 LR: 0.00004836 [02:42:58] Epoch: 1 Batch: 7666/38378 (19.97%) Loss: 2.302803 LR: 0.00004836 [02:43:00] Epoch: 1 Batch: 7667/38378 (19.98%) Loss: 2.157929 LR: 0.00004836 [02:43:02] Epoch: 1 Batch: 7668/38378 (19.98%) Loss: 1.991353 LR: 0.00004836 [02:43:03] Epoch: 1 Batch: 7669/38378 (19.98%) Loss: 1.856156 LR: 0.00004836 [02:43:05] Epoch: 1 Batch: 7670/38378 (19.99%) Loss: 2.050696 LR: 0.00004836 [02:43:07] Epoch: 1 Batch: 7671/38378 (19.99%) Loss: 2.196747 LR: 0.00004836 [02:43:09] Epoch: 1 Batch: 7672/38378 (19.99%) Loss: 2.278774 LR: 0.00004836 [02:43:11] Epoch: 1 Batch: 7673/38378 (19.99%) Loss: 2.029981 LR: 0.00004836 [02:43:13] Epoch: 1 Batch: 7674/38378 (20.00%) Loss: 1.926431 LR: 0.00004836 [02:43:14] Epoch: 1 Batch: 7675/38378 (20.00%) Loss: 1.959365 LR: 0.00004836 [02:43:16] Epoch: 1 Batch: 7676/38378 (20.00%) Loss: 1.975612 LR: 0.00004836 [02:43:18] Epoch: 1 Batch: 7677/38378 (20.00%) Loss: 2.201211 LR: 0.00004835 [02:43:20] Epoch: 1 Batch: 7678/38378 (20.01%) Loss: 2.016473 LR: 0.00004835 [02:43:22] Epoch: 1 Batch: 7679/38378 (20.01%) Loss: 1.858086 LR: 0.00004835 [02:43:24] Epoch: 1 Batch: 7680/38378 (20.01%) Loss: 2.092333 LR: 0.00004835 [02:43:26] Epoch: 1 Batch: 7681/38378 (20.01%) Loss: 1.989362 LR: 0.00004835 [02:43:28] Epoch: 1 Batch: 7682/38378 (20.02%) Loss: 1.844403 LR: 0.00004835 [02:43:30] Epoch: 1 Batch: 7683/38378 (20.02%) Loss: 2.107212 LR: 0.00004835 [02:43:32] Epoch: 1 Batch: 7684/38378 (20.02%) Loss: 1.946116 LR: 0.00004835 [02:43:34] Epoch: 1 Batch: 7685/38378 (20.02%) Loss: 2.190180 LR: 0.00004835 [02:43:36] Epoch: 1 Batch: 7686/38378 (20.03%) Loss: 1.993916 LR: 0.00004835 [02:43:37] Epoch: 1 Batch: 7687/38378 (20.03%) Loss: 2.217124 LR: 0.00004835 [02:43:39] Epoch: 1 Batch: 7688/38378 (20.03%) Loss: 1.866453 LR: 0.00004835 [02:43:46] >> Cleaned up old temp checkpoint: epoch1_step4224 [02:43:47] >> Cleaned up old temp checkpoint: epoch1_step4191 [02:43:47] >> Cleaned up old temp checkpoint: epoch1_step4158 [02:43:47] >> Temp checkpoint saved: epoch1_step7689, size: 0.1702 GB [02:43:47] Epoch: 1 Batch: 7689/38378 (20.03%) Loss: 1.947765 LR: 0.00004835 [02:43:49] Epoch: 1 Batch: 7690/38378 (20.04%) Loss: 2.161984 LR: 0.00004835 [02:43:51] Epoch: 1 Batch: 7691/38378 (20.04%) Loss: 2.005873 LR: 0.00004834 [02:43:52] Epoch: 1 Batch: 7692/38378 (20.04%) Loss: 2.278562 LR: 0.00004834 [02:43:54] Epoch: 1 Batch: 7693/38378 
(20.05%) Loss: 1.965345 LR: 0.00004834 [02:43:56] Epoch: 1 Batch: 7694/38378 (20.05%) Loss: 1.874763 LR: 0.00004834 [02:43:58] Epoch: 1 Batch: 7695/38378 (20.05%) Loss: 2.075504 LR: 0.00004834 [02:43:59] Epoch: 1 Batch: 7696/38378 (20.05%) Loss: 2.121562 LR: 0.00004834 [02:44:01] Epoch: 1 Batch: 7697/38378 (20.06%) Loss: 1.946980 LR: 0.00004834 [02:44:03] Epoch: 1 Batch: 7698/38378 (20.06%) Loss: 1.976197 LR: 0.00004834 [02:44:05] Epoch: 1 Batch: 7699/38378 (20.06%) Loss: 1.856675 LR: 0.00004834 [02:44:07] Epoch: 1 Batch: 7700/38378 (20.06%) Loss: 2.203666 LR: 0.00004834 [02:44:08] Epoch: 1 Batch: 7701/38378 (20.07%) Loss: 2.162302 LR: 0.00004834 [02:44:10] Epoch: 1 Batch: 7702/38378 (20.07%) Loss: 1.778212 LR: 0.00004834 [02:44:12] Epoch: 1 Batch: 7703/38378 (20.07%) Loss: 1.744386 LR: 0.00004834 [02:44:13] Epoch: 1 Batch: 7704/38378 (20.07%) Loss: 2.771662 LR: 0.00004834 [02:44:15] Epoch: 1 Batch: 7705/38378 (20.08%) Loss: 2.067090 LR: 0.00004833 [02:44:17] Epoch: 1 Batch: 7706/38378 (20.08%) Loss: 1.969250 LR: 0.00004833 [02:44:19] Epoch: 1 Batch: 7707/38378 (20.08%) Loss: 1.918763 LR: 0.00004833 [02:44:20] Epoch: 1 Batch: 7708/38378 (20.08%) Loss: 1.937641 LR: 0.00004833 [02:44:22] Epoch: 1 Batch: 7709/38378 (20.09%) Loss: 1.871756 LR: 0.00004833 [02:44:24] Epoch: 1 Batch: 7710/38378 (20.09%) Loss: 1.770394 LR: 0.00004833 [02:44:26] Epoch: 1 Batch: 7711/38378 (20.09%) Loss: 2.016642 LR: 0.00004833 [02:44:28] Epoch: 1 Batch: 7712/38378 (20.09%) Loss: 1.928336 LR: 0.00004833 [02:44:30] Epoch: 1 Batch: 7713/38378 (20.10%) Loss: 1.820101 LR: 0.00004833 [02:44:31] Epoch: 1 Batch: 7714/38378 (20.10%) Loss: 1.967673 LR: 0.00004833 [02:44:33] Epoch: 1 Batch: 7715/38378 (20.10%) Loss: 2.097526 LR: 0.00004833 [02:44:35] Epoch: 1 Batch: 7716/38378 (20.11%) Loss: 1.917682 LR: 0.00004833 [02:44:37] Epoch: 1 Batch: 7717/38378 (20.11%) Loss: 2.069004 LR: 0.00004833 [02:44:39] Epoch: 1 Batch: 7718/38378 (20.11%) Loss: 2.032358 LR: 0.00004833 [02:44:41] Epoch: 1 Batch: 7719/38378 (20.11%) Loss: 2.066600 LR: 0.00004832 [02:44:42] Epoch: 1 Batch: 7720/38378 (20.12%) Loss: 2.009333 LR: 0.00004832 [02:44:44] Epoch: 1 Batch: 7721/38378 (20.12%) Loss: 2.052599 LR: 0.00004832 [02:44:51] >> Cleaned up old temp checkpoint: epoch1_step4257 [02:44:51] >> Temp checkpoint saved: epoch1_step7722, size: 0.1702 GB [02:44:51] Epoch: 1 Batch: 7722/38378 (20.12%) Loss: 2.046376 LR: 0.00004832 [02:44:53] Epoch: 1 Batch: 7723/38378 (20.12%) Loss: 2.241497 LR: 0.00004832 [02:44:54] Epoch: 1 Batch: 7724/38378 (20.13%) Loss: 1.844923 LR: 0.00004832 [02:44:56] Epoch: 1 Batch: 7725/38378 (20.13%) Loss: 1.929844 LR: 0.00004832 [02:44:58] Epoch: 1 Batch: 7726/38378 (20.13%) Loss: 2.020353 LR: 0.00004832 [02:45:00] Epoch: 1 Batch: 7727/38378 (20.13%) Loss: 1.701778 LR: 0.00004832 [02:45:02] Epoch: 1 Batch: 7728/38378 (20.14%) Loss: 1.986983 LR: 0.00004832 [02:45:03] Epoch: 1 Batch: 7729/38378 (20.14%) Loss: 2.074876 LR: 0.00004832 [02:45:05] Epoch: 1 Batch: 7730/38378 (20.14%) Loss: 1.979867 LR: 0.00004832 [02:45:07] Epoch: 1 Batch: 7731/38378 (20.14%) Loss: 2.241323 LR: 0.00004832 [02:45:09] Epoch: 1 Batch: 7732/38378 (20.15%) Loss: 2.343879 LR: 0.00004832 [02:45:11] Epoch: 1 Batch: 7733/38378 (20.15%) Loss: 1.992258 LR: 0.00004831 [02:45:12] Epoch: 1 Batch: 7734/38378 (20.15%) Loss: 2.215482 LR: 0.00004831 [02:45:14] Epoch: 1 Batch: 7735/38378 (20.15%) Loss: 2.025219 LR: 0.00004831 [02:45:16] Epoch: 1 Batch: 7736/38378 (20.16%) Loss: 2.365318 LR: 0.00004831 [02:45:18] Epoch: 1 Batch: 7737/38378 (20.16%) Loss: 1.943261 LR: 
0.00004831 [02:45:20] Epoch: 1 Batch: 7738/38378 (20.16%) Loss: 2.028336 LR: 0.00004831 [02:45:22] Epoch: 1 Batch: 7739/38378 (20.17%) Loss: 2.191431 LR: 0.00004831 [02:45:23] Epoch: 1 Batch: 7740/38378 (20.17%) Loss: 1.980772 LR: 0.00004831 [02:45:25] Epoch: 1 Batch: 7741/38378 (20.17%) Loss: 2.098001 LR: 0.00004831 [02:45:27] Epoch: 1 Batch: 7742/38378 (20.17%) Loss: 1.824117 LR: 0.00004831 [02:45:29] Epoch: 1 Batch: 7743/38378 (20.18%) Loss: 1.925163 LR: 0.00004831 [02:45:31] Epoch: 1 Batch: 7744/38378 (20.18%) Loss: 2.291680 LR: 0.00004831 [02:45:33] Epoch: 1 Batch: 7745/38378 (20.18%) Loss: 1.865411 LR: 0.00004831 [02:45:34] Epoch: 1 Batch: 7746/38378 (20.18%) Loss: 2.102697 LR: 0.00004831 [02:45:36] Epoch: 1 Batch: 7747/38378 (20.19%) Loss: 2.317083 LR: 0.00004830 [02:45:38] Epoch: 1 Batch: 7748/38378 (20.19%) Loss: 1.971925 LR: 0.00004830 [02:45:40] Epoch: 1 Batch: 7749/38378 (20.19%) Loss: 2.120839 LR: 0.00004830 [02:45:42] Epoch: 1 Batch: 7750/38378 (20.19%) Loss: 2.042975 LR: 0.00004830 [02:45:43] Epoch: 1 Batch: 7751/38378 (20.20%) Loss: 2.127020 LR: 0.00004830 [02:45:45] Epoch: 1 Batch: 7752/38378 (20.20%) Loss: 2.111148 LR: 0.00004830 [02:45:47] Epoch: 1 Batch: 7753/38378 (20.20%) Loss: 1.979473 LR: 0.00004830 [02:45:49] Epoch: 1 Batch: 7754/38378 (20.20%) Loss: 1.845352 LR: 0.00004830 [02:45:55] >> Cleaned up old temp checkpoint: epoch1_step4290 [02:45:55] >> Temp checkpoint saved: epoch1_step7755, size: 0.1702 GB [02:45:55] Epoch: 1 Batch: 7755/38378 (20.21%) Loss: 1.962475 LR: 0.00004830 [02:45:57] Epoch: 1 Batch: 7756/38378 (20.21%) Loss: 2.031070 LR: 0.00004830 [02:45:59] Epoch: 1 Batch: 7757/38378 (20.21%) Loss: 1.916147 LR: 0.00004830 [02:46:01] Epoch: 1 Batch: 7758/38378 (20.21%) Loss: 1.688788 LR: 0.00004830 [02:46:02] Epoch: 1 Batch: 7759/38378 (20.22%) Loss: 1.911380 LR: 0.00004830 [02:46:04] Epoch: 1 Batch: 7760/38378 (20.22%) Loss: 2.084385 LR: 0.00004830 [02:46:06] Epoch: 1 Batch: 7761/38378 (20.22%) Loss: 2.199245 LR: 0.00004829 [02:46:08] Epoch: 1 Batch: 7762/38378 (20.23%) Loss: 2.094602 LR: 0.00004829 [02:46:10] Epoch: 1 Batch: 7763/38378 (20.23%) Loss: 1.826198 LR: 0.00004829 [02:46:12] Epoch: 1 Batch: 7764/38378 (20.23%) Loss: 1.922884 LR: 0.00004829 [02:46:13] Epoch: 1 Batch: 7765/38378 (20.23%) Loss: 2.063935 LR: 0.00004829 [02:46:15] Epoch: 1 Batch: 7766/38378 (20.24%) Loss: 2.178767 LR: 0.00004829 [02:46:17] Epoch: 1 Batch: 7767/38378 (20.24%) Loss: 2.073499 LR: 0.00004829 [02:46:19] Epoch: 1 Batch: 7768/38378 (20.24%) Loss: 1.855149 LR: 0.00004829 [02:46:21] Epoch: 1 Batch: 7769/38378 (20.24%) Loss: 1.811892 LR: 0.00004829 [02:46:23] Epoch: 1 Batch: 7770/38378 (20.25%) Loss: 2.036003 LR: 0.00004829 [02:46:24] Epoch: 1 Batch: 7771/38378 (20.25%) Loss: 2.234199 LR: 0.00004829 [02:46:26] Epoch: 1 Batch: 7772/38378 (20.25%) Loss: 2.235971 LR: 0.00004829 [02:46:28] Epoch: 1 Batch: 7773/38378 (20.25%) Loss: 2.056884 LR: 0.00004829 [02:46:30] Epoch: 1 Batch: 7774/38378 (20.26%) Loss: 2.011361 LR: 0.00004829 [02:46:32] Epoch: 1 Batch: 7775/38378 (20.26%) Loss: 2.243298 LR: 0.00004828 [02:46:33] Epoch: 1 Batch: 7776/38378 (20.26%) Loss: 2.169891 LR: 0.00004828 [02:46:35] Epoch: 1 Batch: 7777/38378 (20.26%) Loss: 2.036465 LR: 0.00004828 [02:46:37] Epoch: 1 Batch: 7778/38378 (20.27%) Loss: 2.150438 LR: 0.00004828 [02:46:39] Epoch: 1 Batch: 7779/38378 (20.27%) Loss: 2.083549 LR: 0.00004828 [02:46:41] Epoch: 1 Batch: 7780/38378 (20.27%) Loss: 2.269890 LR: 0.00004828 [02:46:42] Epoch: 1 Batch: 7781/38378 (20.27%) Loss: 2.072986 LR: 0.00004828 [02:46:44] Epoch: 1 
Batch: 7782/38378 (20.28%) Loss: 2.023790 LR: 0.00004828 [02:46:46] Epoch: 1 Batch: 7783/38378 (20.28%) Loss: 1.862268 LR: 0.00004828 [02:46:48] Epoch: 1 Batch: 7784/38378 (20.28%) Loss: 1.926476 LR: 0.00004828 [02:46:50] Epoch: 1 Batch: 7785/38378 (20.29%) Loss: 2.041685 LR: 0.00004828 [02:46:52] Epoch: 1 Batch: 7786/38378 (20.29%) Loss: 1.866477 LR: 0.00004828 [02:46:53] Epoch: 1 Batch: 7787/38378 (20.29%) Loss: 2.004151 LR: 0.00004828 [02:47:00] >> Cleaned up old temp checkpoint: epoch1_step4323 [02:47:00] >> Temp checkpoint saved: epoch1_step7788, size: 0.1702 GB [02:47:00] Epoch: 1 Batch: 7788/38378 (20.29%) Loss: 1.877562 LR: 0.00004828 [02:47:02] Epoch: 1 Batch: 7789/38378 (20.30%) Loss: 1.827030 LR: 0.00004827 [02:47:03] Epoch: 1 Batch: 7790/38378 (20.30%) Loss: 1.890854 LR: 0.00004827 [02:47:05] Epoch: 1 Batch: 7791/38378 (20.30%) Loss: 1.963534 LR: 0.00004827 [02:47:07] Epoch: 1 Batch: 7792/38378 (20.30%) Loss: 2.025708 LR: 0.00004827 [02:47:09] Epoch: 1 Batch: 7793/38378 (20.31%) Loss: 1.888261 LR: 0.00004827 [02:47:10] Epoch: 1 Batch: 7794/38378 (20.31%) Loss: 2.060962 LR: 0.00004827 [02:47:12] Epoch: 1 Batch: 7795/38378 (20.31%) Loss: 1.744882 LR: 0.00004827 [02:47:14] Epoch: 1 Batch: 7796/38378 (20.31%) Loss: 2.348481 LR: 0.00004827 [02:47:16] Epoch: 1 Batch: 7797/38378 (20.32%) Loss: 2.295023 LR: 0.00004827 [02:47:18] Epoch: 1 Batch: 7798/38378 (20.32%) Loss: 2.117446 LR: 0.00004827 [02:47:20] Epoch: 1 Batch: 7799/38378 (20.32%) Loss: 2.004434 LR: 0.00004827 [02:47:21] Epoch: 1 Batch: 7800/38378 (20.32%) Loss: 1.700328 LR: 0.00004827 [02:47:23] Epoch: 1 Batch: 7801/38378 (20.33%) Loss: 1.963206 LR: 0.00004827 [02:47:25] Epoch: 1 Batch: 7802/38378 (20.33%) Loss: 2.158362 LR: 0.00004827 [02:47:27] Epoch: 1 Batch: 7803/38378 (20.33%) Loss: 2.078999 LR: 0.00004826 [02:47:29] Epoch: 1 Batch: 7804/38378 (20.33%) Loss: 2.025779 LR: 0.00004826 [02:47:31] Epoch: 1 Batch: 7805/38378 (20.34%) Loss: 1.882740 LR: 0.00004826 [02:47:32] Epoch: 1 Batch: 7806/38378 (20.34%) Loss: 2.003368 LR: 0.00004826 [02:47:34] Epoch: 1 Batch: 7807/38378 (20.34%) Loss: 2.162062 LR: 0.00004826 [02:47:36] Epoch: 1 Batch: 7808/38378 (20.34%) Loss: 2.009810 LR: 0.00004826 [02:47:38] Epoch: 1 Batch: 7809/38378 (20.35%) Loss: 1.896341 LR: 0.00004826 [02:47:40] Epoch: 1 Batch: 7810/38378 (20.35%) Loss: 1.945916 LR: 0.00004826 [02:47:42] Epoch: 1 Batch: 7811/38378 (20.35%) Loss: 1.896764 LR: 0.00004826 [02:47:43] Epoch: 1 Batch: 7812/38378 (20.36%) Loss: 1.952393 LR: 0.00004826 [02:47:45] Epoch: 1 Batch: 7813/38378 (20.36%) Loss: 2.402016 LR: 0.00004826 [02:47:47] Epoch: 1 Batch: 7814/38378 (20.36%) Loss: 2.037831 LR: 0.00004826 [02:47:49] Epoch: 1 Batch: 7815/38378 (20.36%) Loss: 2.036178 LR: 0.00004826 [02:47:51] Epoch: 1 Batch: 7816/38378 (20.37%) Loss: 2.213655 LR: 0.00004826 [02:47:53] Epoch: 1 Batch: 7817/38378 (20.37%) Loss: 2.171077 LR: 0.00004825 [02:47:55] Epoch: 1 Batch: 7818/38378 (20.37%) Loss: 2.044367 LR: 0.00004825 [02:47:56] Epoch: 1 Batch: 7819/38378 (20.37%) Loss: 2.107384 LR: 0.00004825 [02:47:58] Epoch: 1 Batch: 7820/38378 (20.38%) Loss: 1.899033 LR: 0.00004825 [02:48:05] >> Cleaned up old temp checkpoint: epoch1_step4356 [02:48:05] >> Temp checkpoint saved: epoch1_step7821, size: 0.1702 GB [02:48:05] Epoch: 1 Batch: 7821/38378 (20.38%) Loss: 1.902879 LR: 0.00004825 [02:48:07] Epoch: 1 Batch: 7822/38378 (20.38%) Loss: 1.992112 LR: 0.00004825 [02:48:08] Epoch: 1 Batch: 7823/38378 (20.38%) Loss: 1.860241 LR: 0.00004825 [02:48:10] Epoch: 1 Batch: 7824/38378 (20.39%) Loss: 1.721268 LR: 0.00004825 
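Throughout these lines the LR column holds flat for roughly seven consecutive batches before ticking down a notch, which is what accum_steps: 7 implies: with batch_size 5, gradients are accumulated over seven batches (an effective batch of 35) and the optimizer and scheduler advance once per window. A self-contained toy sketch of such a loop follows; the model, loader, and names are stand-ins, since the real script is not part of this log, and a scheduler.step() would sit next to optimizer.step().

import torch

# Toy stand-ins for the model and data loader, which the log does not show.
model = torch.nn.Linear(4, 2)
loader = [(torch.randn(5, 4), torch.randint(0, 2, (5,))) for _ in range(21)]
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.05)  # as configured
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

accum_steps = 7  # logged config: 7 batches of 5 -> effective batch of 35
optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = loss_fn(model(x), y)
    (loss / accum_steps).backward()   # scale so the summed grads average out
    if (i + 1) % accum_steps == 0:
        optimizer.step()              # one optimizer (and scheduler) step per
        optimizer.zero_grad()         # window, hence the flat LR plateaus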
[02:48:12] Epoch: 1 Batch: 7825/38378 (20.39%) Loss: 2.308289 LR: 0.00004825 [02:48:14] Epoch: 1 Batch: 7826/38378 (20.39%) Loss: 2.094235 LR: 0.00004825 [02:48:15] Epoch: 1 Batch: 7827/38378 (20.39%) Loss: 1.860471 LR: 0.00004825 [02:48:17] Epoch: 1 Batch: 7828/38378 (20.40%) Loss: 1.922462 LR: 0.00004825 [02:48:19] Epoch: 1 Batch: 7829/38378 (20.40%) Loss: 2.116393 LR: 0.00004825 [02:48:21] Epoch: 1 Batch: 7830/38378 (20.40%) Loss: 2.087054 LR: 0.00004825 [02:48:23] Epoch: 1 Batch: 7831/38378 (20.40%) Loss: 1.753931 LR: 0.00004824 [02:48:25] Epoch: 1 Batch: 7832/38378 (20.41%) Loss: 1.686618 LR: 0.00004824 [02:48:26] Epoch: 1 Batch: 7833/38378 (20.41%) Loss: 1.879802 LR: 0.00004824 [02:48:28] Epoch: 1 Batch: 7834/38378 (20.41%) Loss: 2.077358 LR: 0.00004824 [02:48:30] Epoch: 1 Batch: 7835/38378 (20.42%) Loss: 2.151112 LR: 0.00004824 [02:48:32] Epoch: 1 Batch: 7836/38378 (20.42%) Loss: 2.061481 LR: 0.00004824 [02:48:34] Epoch: 1 Batch: 7837/38378 (20.42%) Loss: 2.212515 LR: 0.00004824 [02:48:36] Epoch: 1 Batch: 7838/38378 (20.42%) Loss: 2.159324 LR: 0.00004824 [02:48:37] Epoch: 1 Batch: 7839/38378 (20.43%) Loss: 2.274742 LR: 0.00004824 [02:48:39] Epoch: 1 Batch: 7840/38378 (20.43%) Loss: 1.870944 LR: 0.00004824 [02:48:41] Epoch: 1 Batch: 7841/38378 (20.43%) Loss: 1.919613 LR: 0.00004824 [02:48:43] Epoch: 1 Batch: 7842/38378 (20.43%) Loss: 2.036011 LR: 0.00004824 [02:48:45] Epoch: 1 Batch: 7843/38378 (20.44%) Loss: 2.351251 LR: 0.00004824 [02:48:47] Epoch: 1 Batch: 7844/38378 (20.44%) Loss: 2.177514 LR: 0.00004824 [02:48:48] Epoch: 1 Batch: 7845/38378 (20.44%) Loss: 1.839619 LR: 0.00004823 [02:48:50] Epoch: 1 Batch: 7846/38378 (20.44%) Loss: 2.066897 LR: 0.00004823 [02:48:52] Epoch: 1 Batch: 7847/38378 (20.45%) Loss: 2.027008 LR: 0.00004823 [02:48:54] Epoch: 1 Batch: 7848/38378 (20.45%) Loss: 2.034527 LR: 0.00004823 [02:48:56] Epoch: 1 Batch: 7849/38378 (20.45%) Loss: 1.987746 LR: 0.00004823 [02:48:57] Epoch: 1 Batch: 7850/38378 (20.45%) Loss: 1.856069 LR: 0.00004823 [02:48:59] Epoch: 1 Batch: 7851/38378 (20.46%) Loss: 2.135460 LR: 0.00004823 [02:49:01] Epoch: 1 Batch: 7852/38378 (20.46%) Loss: 1.905659 LR: 0.00004823 [02:49:03] Epoch: 1 Batch: 7853/38378 (20.46%) Loss: 1.849123 LR: 0.00004823 [02:49:09] >> Cleaned up old temp checkpoint: epoch1_step4389 [02:49:09] >> Temp checkpoint saved: epoch1_step7854, size: 0.1702 GB [02:49:09] Epoch: 1 Batch: 7854/38378 (20.46%) Loss: 2.231301 LR: 0.00004823 [02:49:11] Epoch: 1 Batch: 7855/38378 (20.47%) Loss: 2.061322 LR: 0.00004823 [02:49:13] Epoch: 1 Batch: 7856/38378 (20.47%) Loss: 2.131053 LR: 0.00004823 [02:49:15] Epoch: 1 Batch: 7857/38378 (20.47%) Loss: 2.423076 LR: 0.00004823 [02:49:16] Epoch: 1 Batch: 7858/38378 (20.48%) Loss: 2.020836 LR: 0.00004823 [02:49:18] Epoch: 1 Batch: 7859/38378 (20.48%) Loss: 2.147482 LR: 0.00004822 [02:49:20] Epoch: 1 Batch: 7860/38378 (20.48%) Loss: 1.893150 LR: 0.00004822 [02:49:22] Epoch: 1 Batch: 7861/38378 (20.48%) Loss: 1.961560 LR: 0.00004822 [02:49:24] Epoch: 1 Batch: 7862/38378 (20.49%) Loss: 2.112576 LR: 0.00004822 [02:49:25] Epoch: 1 Batch: 7863/38378 (20.49%) Loss: 1.891842 LR: 0.00004822 [02:49:27] Epoch: 1 Batch: 7864/38378 (20.49%) Loss: 1.689456 LR: 0.00004822 [02:49:29] Epoch: 1 Batch: 7865/38378 (20.49%) Loss: 1.758470 LR: 0.00004822 [02:49:31] Epoch: 1 Batch: 7866/38378 (20.50%) Loss: 1.998850 LR: 0.00004822 [02:49:33] Epoch: 1 Batch: 7867/38378 (20.50%) Loss: 1.734300 LR: 0.00004822 [02:49:34] Epoch: 1 Batch: 7868/38378 (20.50%) Loss: 2.164288 LR: 0.00004822 [02:49:36] Epoch: 1 Batch: 
7869/38378 (20.50%) Loss: 1.991787 LR: 0.00004822 [02:49:38] Epoch: 1 Batch: 7870/38378 (20.51%) Loss: 1.948953 LR: 0.00004822 [02:49:40] Epoch: 1 Batch: 7871/38378 (20.51%) Loss: 2.196657 LR: 0.00004822 [02:49:42] Epoch: 1 Batch: 7872/38378 (20.51%) Loss: 2.004462 LR: 0.00004822 [02:49:44] Epoch: 1 Batch: 7873/38378 (20.51%) Loss: 2.105367 LR: 0.00004821 [02:49:45] Epoch: 1 Batch: 7874/38378 (20.52%) Loss: 1.773513 LR: 0.00004821 [02:49:47] Epoch: 1 Batch: 7875/38378 (20.52%) Loss: 1.979963 LR: 0.00004821 [02:49:49] Epoch: 1 Batch: 7876/38378 (20.52%) Loss: 2.329052 LR: 0.00004821 [02:49:51] Epoch: 1 Batch: 7877/38378 (20.52%) Loss: 1.951219 LR: 0.00004821 [02:49:53] Epoch: 1 Batch: 7878/38378 (20.53%) Loss: 2.061749 LR: 0.00004821 [02:49:55] Epoch: 1 Batch: 7879/38378 (20.53%) Loss: 2.244781 LR: 0.00004821 [02:49:57] Epoch: 1 Batch: 7880/38378 (20.53%) Loss: 2.217209 LR: 0.00004821 [02:49:58] Epoch: 1 Batch: 7881/38378 (20.54%) Loss: 1.786519 LR: 0.00004821 [02:50:00] Epoch: 1 Batch: 7882/38378 (20.54%) Loss: 2.119271 LR: 0.00004821 [02:50:03] Epoch: 1 Batch: 7883/38378 (20.54%) Loss: 2.064533 LR: 0.00004821 [02:50:04] Epoch: 1 Batch: 7884/38378 (20.54%) Loss: 2.101726 LR: 0.00004821 [02:50:06] Epoch: 1 Batch: 7885/38378 (20.55%) Loss: 2.036859 LR: 0.00004821 [02:50:08] Epoch: 1 Batch: 7886/38378 (20.55%) Loss: 2.055555 LR: 0.00004821 [02:50:14] >> Cleaned up old temp checkpoint: epoch1_step4422 [02:50:14] >> Temp checkpoint saved: epoch1_step7887, size: 0.1702 GB [02:50:14] Epoch: 1 Batch: 7887/38378 (20.55%) Loss: 2.180634 LR: 0.00004820 [02:50:16] Epoch: 1 Batch: 7888/38378 (20.55%) Loss: 1.894246 LR: 0.00004820 [02:50:18] Epoch: 1 Batch: 7889/38378 (20.56%) Loss: 1.755736 LR: 0.00004820 [02:50:20] Epoch: 1 Batch: 7890/38378 (20.56%) Loss: 1.985015 LR: 0.00004820 [02:50:22] Epoch: 1 Batch: 7891/38378 (20.56%) Loss: 2.013196 LR: 0.00004820 [02:50:23] Epoch: 1 Batch: 7892/38378 (20.56%) Loss: 2.132719 LR: 0.00004820 [02:50:25] Epoch: 1 Batch: 7893/38378 (20.57%) Loss: 1.491899 LR: 0.00004820 [02:50:27] Epoch: 1 Batch: 7894/38378 (20.57%) Loss: 1.654181 LR: 0.00004820 [02:50:29] Epoch: 1 Batch: 7895/38378 (20.57%) Loss: 1.884698 LR: 0.00004820 [02:50:30] Epoch: 1 Batch: 7896/38378 (20.57%) Loss: 2.191949 LR: 0.00004820 [02:50:32] Epoch: 1 Batch: 7897/38378 (20.58%) Loss: 1.926924 LR: 0.00004820 [02:50:34] Epoch: 1 Batch: 7898/38378 (20.58%) Loss: 1.856888 LR: 0.00004820 [02:50:36] Epoch: 1 Batch: 7899/38378 (20.58%) Loss: 2.166034 LR: 0.00004820 [02:50:38] Epoch: 1 Batch: 7900/38378 (20.58%) Loss: 2.040722 LR: 0.00004820 [02:50:40] Epoch: 1 Batch: 7901/38378 (20.59%) Loss: 1.980067 LR: 0.00004819 [02:50:41] Epoch: 1 Batch: 7902/38378 (20.59%) Loss: 1.961213 LR: 0.00004819 [02:50:43] Epoch: 1 Batch: 7903/38378 (20.59%) Loss: 2.298577 LR: 0.00004819 [02:50:45] Epoch: 1 Batch: 7904/38378 (20.60%) Loss: 1.874646 LR: 0.00004819 [02:50:47] Epoch: 1 Batch: 7905/38378 (20.60%) Loss: 2.095804 LR: 0.00004819 [02:50:49] Epoch: 1 Batch: 7906/38378 (20.60%) Loss: 1.858909 LR: 0.00004819 [02:50:51] Epoch: 1 Batch: 7907/38378 (20.60%) Loss: 2.186397 LR: 0.00004819 [02:50:53] Epoch: 1 Batch: 7908/38378 (20.61%) Loss: 1.998580 LR: 0.00004819 [02:50:54] Epoch: 1 Batch: 7909/38378 (20.61%) Loss: 2.001812 LR: 0.00004819 [02:50:56] Epoch: 1 Batch: 7910/38378 (20.61%) Loss: 2.328959 LR: 0.00004819 [02:50:58] Epoch: 1 Batch: 7911/38378 (20.61%) Loss: 2.136782 LR: 0.00004819 [02:51:00] Epoch: 1 Batch: 7912/38378 (20.62%) Loss: 1.977580 LR: 0.00004819 [02:51:02] Epoch: 1 Batch: 7913/38378 (20.62%) Loss: 2.158888 
LR: 0.00004819 [02:51:04] Epoch: 1 Batch: 7914/38378 (20.62%) Loss: 1.647700 LR: 0.00004819 [02:51:05] Epoch: 1 Batch: 7915/38378 (20.62%) Loss: 1.941090 LR: 0.00004818 [02:51:07] Epoch: 1 Batch: 7916/38378 (20.63%) Loss: 1.789460 LR: 0.00004818 [02:51:09] Epoch: 1 Batch: 7917/38378 (20.63%) Loss: 1.988927 LR: 0.00004818 [02:51:11] Epoch: 1 Batch: 7918/38378 (20.63%) Loss: 1.946123 LR: 0.00004818 [02:51:13] Epoch: 1 Batch: 7919/38378 (20.63%) Loss: 1.711469 LR: 0.00004818 [02:51:20] >> Cleaned up old temp checkpoint: epoch1_step4653 [02:51:20] >> Cleaned up old temp checkpoint: epoch1_step4620 [02:51:21] >> Cleaned up old temp checkpoint: epoch1_step4587 [02:51:21] >> Cleaned up old temp checkpoint: epoch1_step4554 [02:51:22] >> Cleaned up old temp checkpoint: epoch1_step4521 [02:51:22] >> Cleaned up old temp checkpoint: epoch1_step4488 [02:51:22] >> Cleaned up old temp checkpoint: epoch1_step4455 [02:51:22] >> Temp checkpoint saved: epoch1_step7920, size: 0.1702 GB [02:51:22] Epoch: 1 Batch: 7920/38378 (20.64%) Loss: 2.105002 LR: 0.00004818 [02:51:24] Epoch: 1 Batch: 7921/38378 (20.64%) Loss: 1.774501 LR: 0.00004818 [02:51:26] Epoch: 1 Batch: 7922/38378 (20.64%) Loss: 1.849257 LR: 0.00004818 [02:51:27] Epoch: 1 Batch: 7923/38378 (20.64%) Loss: 2.016015 LR: 0.00004818 [02:51:29] Epoch: 1 Batch: 7924/38378 (20.65%) Loss: 1.887791 LR: 0.00004818 [02:51:31] Epoch: 1 Batch: 7925/38378 (20.65%) Loss: 1.701907 LR: 0.00004818 [02:51:33] Epoch: 1 Batch: 7926/38378 (20.65%) Loss: 1.901340 LR: 0.00004818 [02:51:35] Epoch: 1 Batch: 7927/38378 (20.66%) Loss: 2.236007 LR: 0.00004818 [02:51:36] Epoch: 1 Batch: 7928/38378 (20.66%) Loss: 1.727237 LR: 0.00004818 [02:51:38] Epoch: 1 Batch: 7929/38378 (20.66%) Loss: 2.217573 LR: 0.00004817 [02:51:40] Epoch: 1 Batch: 7930/38378 (20.66%) Loss: 2.333663 LR: 0.00004817 [02:51:42] Epoch: 1 Batch: 7931/38378 (20.67%) Loss: 1.950191 LR: 0.00004817 [02:51:44] Epoch: 1 Batch: 7932/38378 (20.67%) Loss: 1.832566 LR: 0.00004817 [02:51:46] Epoch: 1 Batch: 7933/38378 (20.67%) Loss: 2.252072 LR: 0.00004817 [02:51:47] Epoch: 1 Batch: 7934/38378 (20.67%) Loss: 2.146582 LR: 0.00004817 [02:51:49] Epoch: 1 Batch: 7935/38378 (20.68%) Loss: 2.188895 LR: 0.00004817 [02:51:51] Epoch: 1 Batch: 7936/38378 (20.68%) Loss: 1.899095 LR: 0.00004817 [02:51:53] Epoch: 1 Batch: 7937/38378 (20.68%) Loss: 1.989581 LR: 0.00004817 [02:51:55] Epoch: 1 Batch: 7938/38378 (20.68%) Loss: 2.001612 LR: 0.00004817 [02:51:57] Epoch: 1 Batch: 7939/38378 (20.69%) Loss: 2.168845 LR: 0.00004817 [02:51:59] Epoch: 1 Batch: 7940/38378 (20.69%) Loss: 1.811673 LR: 0.00004817 [02:52:01] Epoch: 1 Batch: 7941/38378 (20.69%) Loss: 2.331052 LR: 0.00004817 [02:52:02] Epoch: 1 Batch: 7942/38378 (20.69%) Loss: 1.842121 LR: 0.00004817 [02:52:04] Epoch: 1 Batch: 7943/38378 (20.70%) Loss: 2.136697 LR: 0.00004816 [02:52:06] Epoch: 1 Batch: 7944/38378 (20.70%) Loss: 1.952896 LR: 0.00004816 [02:52:08] Epoch: 1 Batch: 7945/38378 (20.70%) Loss: 2.275040 LR: 0.00004816 [02:52:09] Epoch: 1 Batch: 7946/38378 (20.70%) Loss: 2.144568 LR: 0.00004816 [02:52:11] Epoch: 1 Batch: 7947/38378 (20.71%) Loss: 2.186820 LR: 0.00004816 [02:52:13] Epoch: 1 Batch: 7948/38378 (20.71%) Loss: 1.633857 LR: 0.00004816 [02:52:15] Epoch: 1 Batch: 7949/38378 (20.71%) Loss: 2.093174 LR: 0.00004816 [02:52:17] Epoch: 1 Batch: 7950/38378 (20.71%) Loss: 1.927283 LR: 0.00004815 [02:52:18] Epoch: 1 Batch: 7951/38378 (20.72%) Loss: 1.875328 LR: 0.00004815 [02:52:20] Epoch: 1 Batch: 7952/38378 (20.72%) Loss: 1.860049 LR: 0.00004815 [02:52:27] >> Cleaned up old 
temp checkpoint: epoch1_step4851 [02:52:28] >> Cleaned up old temp checkpoint: epoch1_step4818 [02:52:29] >> Cleaned up old temp checkpoint: epoch1_step4785 [02:52:29] >> Cleaned up old temp checkpoint: epoch1_step4752 [02:52:29] >> Cleaned up old temp checkpoint: epoch1_step4719 [02:52:30] >> Cleaned up old temp checkpoint: epoch1_step4686 [02:52:30] >> Temp checkpoint saved: epoch1_step7953, size: 0.1702 GB [02:52:30] Epoch: 1 Batch: 7953/38378 (20.72%) Loss: 2.310371 LR: 0.00004815 [02:52:31] Epoch: 1 Batch: 7954/38378 (20.73%) Loss: 1.940253 LR: 0.00004815 [02:52:33] Epoch: 1 Batch: 7955/38378 (20.73%) Loss: 1.933596 LR: 0.00004815 [02:52:35] Epoch: 1 Batch: 7956/38378 (20.73%) Loss: 1.821873 LR: 0.00004815 [02:52:37] Epoch: 1 Batch: 7957/38378 (20.73%) Loss: 1.958018 LR: 0.00004815 [02:52:38] Epoch: 1 Batch: 7958/38378 (20.74%) Loss: 2.239073 LR: 0.00004815 [02:52:40] Epoch: 1 Batch: 7959/38378 (20.74%) Loss: 2.258297 LR: 0.00004815 [02:52:42] Epoch: 1 Batch: 7960/38378 (20.74%) Loss: 2.124160 LR: 0.00004815 [02:52:44] Epoch: 1 Batch: 7961/38378 (20.74%) Loss: 2.171405 LR: 0.00004815 [02:52:46] Epoch: 1 Batch: 7962/38378 (20.75%) Loss: 2.025572 LR: 0.00004815 [02:52:47] Epoch: 1 Batch: 7963/38378 (20.75%) Loss: 1.907111 LR: 0.00004815 [02:52:49] Epoch: 1 Batch: 7964/38378 (20.75%) Loss: 2.199614 LR: 0.00004814 [02:52:51] Epoch: 1 Batch: 7965/38378 (20.75%) Loss: 2.021102 LR: 0.00004814 [02:52:53] Epoch: 1 Batch: 7966/38378 (20.76%) Loss: 2.041867 LR: 0.00004814 [02:52:55] Epoch: 1 Batch: 7967/38378 (20.76%) Loss: 1.951577 LR: 0.00004814 [02:52:57] Epoch: 1 Batch: 7968/38378 (20.76%) Loss: 1.673662 LR: 0.00004814 [02:52:59] Epoch: 1 Batch: 7969/38378 (20.76%) Loss: 2.127041 LR: 0.00004814 [02:53:00] Epoch: 1 Batch: 7970/38378 (20.77%) Loss: 1.927790 LR: 0.00004814 [02:53:02] Epoch: 1 Batch: 7971/38378 (20.77%) Loss: 2.050301 LR: 0.00004814 [02:53:04] Epoch: 1 Batch: 7972/38378 (20.77%) Loss: 1.882482 LR: 0.00004814 [02:53:06] Epoch: 1 Batch: 7973/38378 (20.77%) Loss: 2.170589 LR: 0.00004814 [02:53:08] Epoch: 1 Batch: 7974/38378 (20.78%) Loss: 1.935107 LR: 0.00004814 [02:53:10] Epoch: 1 Batch: 7975/38378 (20.78%) Loss: 1.651857 LR: 0.00004814 [02:53:12] Epoch: 1 Batch: 7976/38378 (20.78%) Loss: 2.023328 LR: 0.00004814 [02:53:13] Epoch: 1 Batch: 7977/38378 (20.79%) Loss: 2.174708 LR: 0.00004814 [02:53:15] Epoch: 1 Batch: 7978/38378 (20.79%) Loss: 1.980964 LR: 0.00004813 [02:53:17] Epoch: 1 Batch: 7979/38378 (20.79%) Loss: 2.277690 LR: 0.00004813 [02:53:19] Epoch: 1 Batch: 7980/38378 (20.79%) Loss: 1.932092 LR: 0.00004813 [02:53:20] Epoch: 1 Batch: 7981/38378 (20.80%) Loss: 2.095986 LR: 0.00004813 [02:53:22] Epoch: 1 Batch: 7982/38378 (20.80%) Loss: 1.704691 LR: 0.00004813 [02:53:24] Epoch: 1 Batch: 7983/38378 (20.80%) Loss: 2.136888 LR: 0.00004813 [02:53:26] Epoch: 1 Batch: 7984/38378 (20.80%) Loss: 1.876969 LR: 0.00004813 [02:53:28] Epoch: 1 Batch: 7985/38378 (20.81%) Loss: 1.939946 LR: 0.00004813 [02:53:34] >> Cleaned up old temp checkpoint: epoch1_step7656 [02:53:34] >> Temp checkpoint saved: epoch1_step7986, size: 0.1702 GB [02:53:34] Epoch: 1 Batch: 7986/38378 (20.81%) Loss: 2.215143 LR: 0.00004813 [02:53:36] Epoch: 1 Batch: 7987/38378 (20.81%) Loss: 1.705265 LR: 0.00004813 [02:55:04] 2025-08-12 [02:55:04] Tesla T4 [02:55:04] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | 
|===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [02:55:04] CPU usage: 91.1%, RAM usage: 27.8% [02:55:04] Running with the following configuration: [02:55:04] model_name: NousResearch/Hermes-3-Llama-3.1-8B [02:55:04] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B [02:55:04] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview [02:55:04] train_path: /content/drive/MyDrive/data/none.csv [02:55:04] checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step7986 [02:55:04] lr: 5e-05 [02:55:04] lr_floor: 1e-05 [02:55:04] epochs: 1 [02:55:04] batch_size: 5 [02:55:04] accum_steps: 7 [02:55:04] val_batch_size: 6 [02:55:04] max_val_size: 100 [02:55:04] max_length: 150 [02:55:04] save_temp_frequency: 200 [02:55:04] save_frequency: 500 [02:55:04] eval_frequency: 500 [02:55:04] save_pattern: y [02:55:04] quantization: y [02:55:04] quantization_bits: 4 [02:55:04] lora: y [02:55:04] frozen_lora_path: None [02:55:04] lora_rank: 16 [02:55:04] lora_alpha: 32 [02:55:04] lora_dropout: 0.08 [02:55:04] optimizer_weight_decay: 0.0 [02:55:04] warmup_type: cosine [02:55:04] warmup_ratio: 0.08 [02:55:04] warmup_steps: 439 [02:55:04] shuffle: y [02:55:04] csv_column: text [02:55:04] new_run: n [02:55:04] label_smoothing: 0.05 [02:55:04] SEED: 1 [02:55:04] Using device: cuda [02:55:04] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step7986 [02:56:34] Embeddings shape after: torch.Size([128256, 4096]) [02:56:34] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step7986 [02:56:34] Trainable LoRA 'default': [02:56:34] task_type: CAUSAL_LM [02:56:34] peft_type: PeftType.LORA [02:56:34] auto_mapping: None [02:56:34] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B [02:56:34] revision: None [02:56:35] inference_mode: False [02:56:35] r: 16 [02:56:35] target_modules: {'q_proj', 'k_proj', 'v_proj', 'o_proj'} [02:56:35] exclude_modules: None [02:56:35] lora_alpha: 32 [02:56:35] lora_dropout: 0.08 [02:56:35] fan_in_fan_out: 
False [02:56:35] bias: none [02:56:35] use_rslora: True [02:56:35] modules_to_save: None [02:56:35] init_lora_weights: True [02:56:35] layers_to_transform: None [02:56:35] layers_pattern: None [02:56:35] rank_pattern: {} [02:56:35] alpha_pattern: {} [02:56:35] megatron_config: None [02:56:35] megatron_core: megatron.core [02:56:35] trainable_token_indices: None [02:56:35] loftq_config: {} [02:56:35] eva_config: None [02:56:35] corda_config: None [02:56:35] use_dora: False [02:56:35] use_qalora: False [02:56:35] qalora_group_size: 16 [02:56:35] layer_replication: None [02:56:35] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [02:56:35] lora_bias: False [02:56:35] target_parameters: None [02:56:35] _custom_modules: None [02:56:35] Embeddings shape after: torch.Size([128256, 4096]) [02:56:36] Resumed from epoch 1, step 7987, file 1 [02:56:36] Starting from CSV file... [02:56:37] Splitting data into chunks of 11000... [02:56:37] Using 7 processes across 18 chunks [02:56:37] Using saved train/val split from checkpoint. [02:56:37] Resuming scheduler with warmup steps: 438, total steps: 5482 [02:56:37] Initializing scheduler with cosine schedule with warmup, warmup steps 439, total steps: 5482 [02:56:37] Train/Val split: 191887 train, 100 val samples. [02:56:48] Model: PeftModelForCausalLM [02:56:48] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.0", "use_cache": true, "vocab_size": 128256 } [02:56:48] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [02:56:48] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 5e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [02:56:48] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [02:56:48] Scheduler: [02:56:48] Training on 191887 training samples, 100 validation samples [02:56:48] Average tokens per sample: 141.99 [02:56:48] Estimated epoch time: ~686.92 min [02:56:48] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5988 MiB | 7009 MiB 
| 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Active memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 332173 MiB | 326190 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7616 MiB | 7616 MiB | 7616 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1259 MiB | 5879 MiB | 333261 MiB | 332002 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | GPU reserved segments | 186 | 186 | 186 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 33 | 37 | 12954 | 12921 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [02:56:48] Restoring shuffle indices from training state for epoch 1 [02:56:48] CPU usage: 56.8%, RAM usage: 40.5% [02:56:49] Epoch 1 learning rate: 0.0 [02:56:49] Starting epoch 1 [02:57:17] Batch 7987: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [02:57:19] Epoch: 1 Batch: 7987/38378 (20.81%) Loss: 1.707219 LR: 0.00000000 [02:57:21] Epoch: 1 Batch: 7988/38378 (20.81%) Loss: 2.078534 LR: 0.00000000 [02:57:23] Epoch: 1 Batch: 7989/38378 (20.82%) Loss: 2.188701 LR: 0.00000000 [02:57:25] Epoch: 1 Batch: 7990/38378 (20.82%) Loss: 2.067998 LR: 0.00000000 [02:57:27] Epoch: 1 Batch: 7991/38378 (20.82%) Loss: 1.982541 LR: 0.00000000 [02:57:29] Epoch: 1 Batch: 7992/38378 (20.82%) Loss: 1.936272 LR: 0.00000000 [02:57:31] Epoch: 1 Batch: 7993/38378 (20.83%) Loss: 2.104191 LR: 0.00004812 [02:57:33] Epoch: 1 Batch: 7994/38378 (20.83%) Loss: 1.817523 LR: 0.00004812 [02:57:35] Epoch: 1 Batch: 7995/38378 (20.83%) Loss: 1.853921 LR: 0.00004812 [02:57:37] Epoch: 1 Batch: 7996/38378 (20.83%) Loss: 1.792269 LR: 0.00004812 [02:57:39] Epoch: 1 Batch: 7997/38378 (20.84%) Loss: 2.171779 LR: 0.00004812 [02:57:41] Epoch: 1 Batch: 7998/38378 (20.84%) Loss: 1.882930 LR: 0.00004812 [02:57:43] Epoch: 1 Batch: 7999/38378 (20.84%) Loss: 2.090516 LR: 0.00004812 [02:57:45] >> Evaluating batch 0 [02:57:46] >> Evaluating batch 1 [02:57:47] >> Evaluating batch 2 [02:57:48] >> Evaluating batch 3 [02:57:49] >> Evaluating batch 4 [02:57:50] >> Evaluating batch 5 [02:57:51] >> Evaluating batch 6 [02:57:52] >> Evaluating batch 7 [02:57:53] >> Evaluating batch 8 [02:57:54] >> Evaluating batch 9 [02:57:55] >> Evaluating batch 10 [02:57:56] >> Evaluating batch 11 [02:57:57] >> Evaluating batch 12 [02:57:58] >> Evaluating batch 13 [02:57:59] >> Evaluating batch 14 [02:58:00] >> Evaluating batch 15 [02:58:01] >> Evaluating batch 16 [02:58:02] Epoch: 1 Step: 8000/38378 Evaluation: [02:58:02] Avg Loss Since Last Eval: 0.0034 Val Loss: 2.1255 Validation loss delta: 2.1255 Perplexity: 8.3771 LR: 0.00004812 [02:58:06] >> Cleaned up old temp checkpoint: epoch1_step7689 [02:58:06] >> Temp
checkpoint saved: epoch1_step8000, size: 0.1702 GB [02:58:10] >> Checkpoint saved: epoch1_step8000, size: 0.1702 GB [02:58:10] Epoch: 1 Batch: 8000/38378 (20.85%) Loss: 1.686536 LR: 0.00004812 [02:58:12] Epoch: 1 Batch: 8001/38378 (20.85%) Loss: 2.008306 LR: 0.00004812 [02:58:14] Epoch: 1 Batch: 8002/38378 (20.85%) Loss: 1.748453 LR: 0.00004812 [02:58:16] Epoch: 1 Batch: 8003/38378 (20.85%) Loss: 2.110169 LR: 0.00004812 [02:58:17] Epoch: 1 Batch: 8004/38378 (20.86%) Loss: 2.123830 LR: 0.00004812 [02:58:19] Epoch: 1 Batch: 8005/38378 (20.86%) Loss: 1.858136 LR: 0.00004812 [02:58:21] Epoch: 1 Batch: 8006/38378 (20.86%) Loss: 2.169140 LR: 0.00004812 [02:58:23] Epoch: 1 Batch: 8007/38378 (20.86%) Loss: 1.939992 LR: 0.00004811 [02:58:24] Epoch: 1 Batch: 8008/38378 (20.87%) Loss: 1.934220 LR: 0.00004811 [02:58:26] Epoch: 1 Batch: 8009/38378 (20.87%) Loss: 2.057169 LR: 0.00004811 [02:58:28] Epoch: 1 Batch: 8010/38378 (20.87%) Loss: 1.969541 LR: 0.00004811 [02:58:30] Epoch: 1 Batch: 8011/38378 (20.87%) Loss: 2.005899 LR: 0.00004811 [02:58:31] Epoch: 1 Batch: 8012/38378 (20.88%) Loss: 2.148997 LR: 0.00004811 [02:58:33] Epoch: 1 Batch: 8013/38378 (20.88%) Loss: 2.151880 LR: 0.00004811 [02:58:35] Epoch: 1 Batch: 8014/38378 (20.88%) Loss: 2.019382 LR: 0.00004811 [02:58:37] Epoch: 1 Batch: 8015/38378 (20.88%) Loss: 1.871612 LR: 0.00004811 [02:58:39] Epoch: 1 Batch: 8016/38378 (20.89%) Loss: 2.024364 LR: 0.00004811 [02:58:40] Epoch: 1 Batch: 8017/38378 (20.89%) Loss: 2.132738 LR: 0.00004811 [02:58:42] Epoch: 1 Batch: 8018/38378 (20.89%) Loss: 2.014958 LR: 0.00004811 [02:58:44] Epoch: 1 Batch: 8019/38378 (20.89%) Loss: 1.905372 LR: 0.00004811 [02:58:46] Epoch: 1 Batch: 8020/38378 (20.90%) Loss: 2.125328 LR: 0.00004811 [02:58:48] Epoch: 1 Batch: 8021/38378 (20.90%) Loss: 1.955070 LR: 0.00004810 [02:58:50] Epoch: 1 Batch: 8022/38378 (20.90%) Loss: 1.800622 LR: 0.00004810 [02:58:52] Epoch: 1 Batch: 8023/38378 (20.91%) Loss: 1.847535 LR: 0.00004810 [02:58:54] Epoch: 1 Batch: 8024/38378 (20.91%) Loss: 2.413548 LR: 0.00004810 [02:58:56] Epoch: 1 Batch: 8025/38378 (20.91%) Loss: 2.186883 LR: 0.00004810 [02:58:57] Epoch: 1 Batch: 8026/38378 (20.91%) Loss: 2.394751 LR: 0.00004810 [02:58:59] Epoch: 1 Batch: 8027/38378 (20.92%) Loss: 2.035432 LR: 0.00004810 [02:59:01] Epoch: 1 Batch: 8028/38378 (20.92%) Loss: 2.217789 LR: 0.00004810 [02:59:03] Epoch: 1 Batch: 8029/38378 (20.92%) Loss: 1.893898 LR: 0.00004810 [02:59:05] Epoch: 1 Batch: 8030/38378 (20.92%) Loss: 2.288666 LR: 0.00004810 [02:59:07] Epoch: 1 Batch: 8031/38378 (20.93%) Loss: 2.237590 LR: 0.00004810 [02:59:09] Epoch: 1 Batch: 8032/38378 (20.93%) Loss: 2.082520 LR: 0.00004810 [02:59:10] Epoch: 1 Batch: 8033/38378 (20.93%) Loss: 1.937890 LR: 0.00004810 [02:59:12] Epoch: 1 Batch: 8034/38378 (20.93%) Loss: 2.193989 LR: 0.00004810 [02:59:14] Epoch: 1 Batch: 8035/38378 (20.94%) Loss: 2.212941 LR: 0.00004809 [02:59:16] Epoch: 1 Batch: 8036/38378 (20.94%) Loss: 1.889497 LR: 0.00004809 [02:59:18] Epoch: 1 Batch: 8037/38378 (20.94%) Loss: 2.093536 LR: 0.00004809 [02:59:19] Epoch: 1 Batch: 8038/38378 (20.94%) Loss: 2.209449 LR: 0.00004809 [02:59:21] Epoch: 1 Batch: 8039/38378 (20.95%) Loss: 1.740424 LR: 0.00004809 [02:59:23] Epoch: 1 Batch: 8040/38378 (20.95%) Loss: 1.778903 LR: 0.00004809 [02:59:25] Epoch: 1 Batch: 8041/38378 (20.95%) Loss: 1.752310 LR: 0.00004809 [02:59:27] Epoch: 1 Batch: 8042/38378 (20.95%) Loss: 2.065710 LR: 0.00004809 [02:59:28] Epoch: 1 Batch: 8043/38378 (20.96%) Loss: 1.976806 LR: 0.00004809 [02:59:30] Epoch: 1 Batch: 8044/38378 (20.96%) 
Loss: 2.159395 LR: 0.00004809 [02:59:32] Epoch: 1 Batch: 8045/38378 (20.96%) Loss: 1.833137 LR: 0.00004809 [02:59:34] Epoch: 1 Batch: 8046/38378 (20.97%) Loss: 2.121296 LR: 0.00004809 [02:59:36] Epoch: 1 Batch: 8047/38378 (20.97%) Loss: 2.302634 LR: 0.00004809 [02:59:37] Epoch: 1 Batch: 8048/38378 (20.97%) Loss: 1.783223 LR: 0.00004809 [02:59:39] Epoch: 1 Batch: 8049/38378 (20.97%) Loss: 1.845905 LR: 0.00004808 [02:59:41] Epoch: 1 Batch: 8050/38378 (20.98%) Loss: 1.870067 LR: 0.00004808 [02:59:43] Epoch: 1 Batch: 8051/38378 (20.98%) Loss: 2.127215 LR: 0.00004808 [02:59:45] Epoch: 1 Batch: 8052/38378 (20.98%) Loss: 2.033572 LR: 0.00004808 [02:59:46] Epoch: 1 Batch: 8053/38378 (20.98%) Loss: 1.982444 LR: 0.00004808 [02:59:48] Epoch: 1 Batch: 8054/38378 (20.99%) Loss: 2.272452 LR: 0.00004808 [02:59:50] Epoch: 1 Batch: 8055/38378 (20.99%) Loss: 2.040801 LR: 0.00004808 [02:59:52] Epoch: 1 Batch: 8056/38378 (20.99%) Loss: 2.004933 LR: 0.00004808 [02:59:54] Epoch: 1 Batch: 8057/38378 (20.99%) Loss: 2.225465 LR: 0.00004808 [02:59:55] Epoch: 1 Batch: 8058/38378 (21.00%) Loss: 2.225056 LR: 0.00004808 [02:59:57] Epoch: 1 Batch: 8059/38378 (21.00%) Loss: 1.844380 LR: 0.00004808 [02:59:59] Epoch: 1 Batch: 8060/38378 (21.00%) Loss: 2.170405 LR: 0.00004808 [03:00:01] Epoch: 1 Batch: 8061/38378 (21.00%) Loss: 2.129645 LR: 0.00004808 [03:00:03] Epoch: 1 Batch: 8062/38378 (21.01%) Loss: 2.044746 LR: 0.00004808 [03:00:05] Epoch: 1 Batch: 8063/38378 (21.01%) Loss: 2.058914 LR: 0.00004807 [03:00:06] Epoch: 1 Batch: 8064/38378 (21.01%) Loss: 2.231777 LR: 0.00004807 [03:00:08] Epoch: 1 Batch: 8065/38378 (21.01%) Loss: 1.995242 LR: 0.00004807 [03:00:10] Epoch: 1 Batch: 8066/38378 (21.02%) Loss: 2.037953 LR: 0.00004807 [03:00:12] Epoch: 1 Batch: 8067/38378 (21.02%) Loss: 1.900391 LR: 0.00004807 [03:00:14] Epoch: 1 Batch: 8068/38378 (21.02%) Loss: 2.045904 LR: 0.00004807 [03:00:16] Epoch: 1 Batch: 8069/38378 (21.03%) Loss: 2.295369 LR: 0.00004807 [03:00:17] Epoch: 1 Batch: 8070/38378 (21.03%) Loss: 1.740472 LR: 0.00004806 [03:00:19] Epoch: 1 Batch: 8071/38378 (21.03%) Loss: 2.281579 LR: 0.00004806 [03:00:21] Epoch: 1 Batch: 8072/38378 (21.03%) Loss: 2.051681 LR: 0.00004806 [03:00:23] Epoch: 1 Batch: 8073/38378 (21.04%) Loss: 2.167869 LR: 0.00004806 [03:00:25] Epoch: 1 Batch: 8074/38378 (21.04%) Loss: 2.235588 LR: 0.00004806 [03:00:26] Epoch: 1 Batch: 8075/38378 (21.04%) Loss: 1.844115 LR: 0.00004806 [03:00:28] Epoch: 1 Batch: 8076/38378 (21.04%) Loss: 2.103257 LR: 0.00004806 [03:00:30] Epoch: 1 Batch: 8077/38378 (21.05%) Loss: 1.943989 LR: 0.00004806 [03:00:32] Epoch: 1 Batch: 8078/38378 (21.05%) Loss: 2.087058 LR: 0.00004806 [03:00:34] Epoch: 1 Batch: 8079/38378 (21.05%) Loss: 1.962083 LR: 0.00004806 [03:00:36] Epoch: 1 Batch: 8080/38378 (21.05%) Loss: 1.666144 LR: 0.00004806 [03:00:37] Epoch: 1 Batch: 8081/38378 (21.06%) Loss: 1.862550 LR: 0.00004806 [03:00:39] Epoch: 1 Batch: 8082/38378 (21.06%) Loss: 2.091953 LR: 0.00004806 [03:00:41] Epoch: 1 Batch: 8083/38378 (21.06%) Loss: 2.316795 LR: 0.00004806 [03:00:43] Epoch: 1 Batch: 8084/38378 (21.06%) Loss: 2.160309 LR: 0.00004805 [03:00:45] Epoch: 1 Batch: 8085/38378 (21.07%) Loss: 1.888204 LR: 0.00004805 [03:00:46] Epoch: 1 Batch: 8086/38378 (21.07%) Loss: 2.008882 LR: 0.00004805 [03:00:48] Epoch: 1 Batch: 8087/38378 (21.07%) Loss: 2.271172 LR: 0.00004805 [03:00:50] Epoch: 1 Batch: 8088/38378 (21.07%) Loss: 2.119469 LR: 0.00004805 [03:00:52] Epoch: 1 Batch: 8089/38378 (21.08%) Loss: 2.086283 LR: 0.00004805 [03:00:54] Epoch: 1 Batch: 8090/38378 (21.08%) Loss: 
2.012098 LR: 0.00004805 [03:00:55] Epoch: 1 Batch: 8091/38378 (21.08%) Loss: 1.886191 LR: 0.00004805 [03:00:57] Epoch: 1 Batch: 8092/38378 (21.08%) Loss: 2.134174 LR: 0.00004805 [03:00:59] Epoch: 1 Batch: 8093/38378 (21.09%) Loss: 2.171127 LR: 0.00004805 [03:01:01] Epoch: 1 Batch: 8094/38378 (21.09%) Loss: 1.777152 LR: 0.00004805 [03:01:03] Epoch: 1 Batch: 8095/38378 (21.09%) Loss: 1.742320 LR: 0.00004805 [03:01:04] Epoch: 1 Batch: 8096/38378 (21.10%) Loss: 1.843654 LR: 0.00004805 [03:01:06] Epoch: 1 Batch: 8097/38378 (21.10%) Loss: 1.945837 LR: 0.00004805 [03:01:08] Epoch: 1 Batch: 8098/38378 (21.10%) Loss: 2.356480 LR: 0.00004804 [03:01:10] Epoch: 1 Batch: 8099/38378 (21.10%) Loss: 1.681739 LR: 0.00004804 [03:01:12] Epoch: 1 Batch: 8100/38378 (21.11%) Loss: 2.025586 LR: 0.00004804 [03:01:14] Epoch: 1 Batch: 8101/38378 (21.11%) Loss: 2.124032 LR: 0.00004804 [03:01:15] Epoch: 1 Batch: 8102/38378 (21.11%) Loss: 1.950509 LR: 0.00004804 [03:01:17] Epoch: 1 Batch: 8103/38378 (21.11%) Loss: 2.271297 LR: 0.00004804 [03:01:19] Epoch: 1 Batch: 8104/38378 (21.12%) Loss: 2.026312 LR: 0.00004804 [03:01:21] Epoch: 1 Batch: 8105/38378 (21.12%) Loss: 2.214646 LR: 0.00004804 [03:01:23] Epoch: 1 Batch: 8106/38378 (21.12%) Loss: 1.970896 LR: 0.00004804 [03:01:24] Epoch: 1 Batch: 8107/38378 (21.12%) Loss: 1.821710 LR: 0.00004804 [03:01:26] Epoch: 1 Batch: 8108/38378 (21.13%) Loss: 2.277453 LR: 0.00004804 [03:01:28] Epoch: 1 Batch: 8109/38378 (21.13%) Loss: 1.950187 LR: 0.00004804 [03:01:30] Epoch: 1 Batch: 8110/38378 (21.13%) Loss: 1.943561 LR: 0.00004804 [03:01:32] Epoch: 1 Batch: 8111/38378 (21.13%) Loss: 1.971852 LR: 0.00004804 [03:01:33] Epoch: 1 Batch: 8112/38378 (21.14%) Loss: 1.970296 LR: 0.00004803 [03:01:35] Epoch: 1 Batch: 8113/38378 (21.14%) Loss: 1.733088 LR: 0.00004803 [03:01:37] Epoch: 1 Batch: 8114/38378 (21.14%) Loss: 2.043008 LR: 0.00004803 [03:01:39] Epoch: 1 Batch: 8115/38378 (21.14%) Loss: 2.063622 LR: 0.00004803 [03:01:41] Epoch: 1 Batch: 8116/38378 (21.15%) Loss: 1.975174 LR: 0.00004803 [03:01:42] Epoch: 1 Batch: 8117/38378 (21.15%) Loss: 2.081450 LR: 0.00004803 [03:01:44] Epoch: 1 Batch: 8118/38378 (21.15%) Loss: 1.960690 LR: 0.00004803 [03:01:46] Epoch: 1 Batch: 8119/38378 (21.16%) Loss: 2.155098 LR: 0.00004803 [03:01:48] Epoch: 1 Batch: 8120/38378 (21.16%) Loss: 2.274953 LR: 0.00004803 [03:01:50] Epoch: 1 Batch: 8121/38378 (21.16%) Loss: 1.866052 LR: 0.00004803 [03:01:51] Epoch: 1 Batch: 8122/38378 (21.16%) Loss: 2.217664 LR: 0.00004803 [03:01:53] Epoch: 1 Batch: 8123/38378 (21.17%) Loss: 2.038324 LR: 0.00004803 [03:01:55] Epoch: 1 Batch: 8124/38378 (21.17%) Loss: 1.724896 LR: 0.00004803 [03:01:57] Epoch: 1 Batch: 8125/38378 (21.17%) Loss: 1.845020 LR: 0.00004803 [03:01:59] Epoch: 1 Batch: 8126/38378 (21.17%) Loss: 2.592887 LR: 0.00004802 [03:02:00] Epoch: 1 Batch: 8127/38378 (21.18%) Loss: 1.907865 LR: 0.00004802 [03:02:02] Epoch: 1 Batch: 8128/38378 (21.18%) Loss: 2.336214 LR: 0.00004802 [03:02:04] Epoch: 1 Batch: 8129/38378 (21.18%) Loss: 2.097243 LR: 0.00004802 [03:02:06] Epoch: 1 Batch: 8130/38378 (21.18%) Loss: 1.918989 LR: 0.00004802 [03:02:08] Epoch: 1 Batch: 8131/38378 (21.19%) Loss: 2.129736 LR: 0.00004802 [03:02:10] Epoch: 1 Batch: 8132/38378 (21.19%) Loss: 1.904976 LR: 0.00004802 [03:02:11] Epoch: 1 Batch: 8133/38378 (21.19%) Loss: 1.721111 LR: 0.00004802 [03:02:13] Epoch: 1 Batch: 8134/38378 (21.19%) Loss: 2.031356 LR: 0.00004802 [03:02:15] Epoch: 1 Batch: 8135/38378 (21.20%) Loss: 1.735791 LR: 0.00004802 [03:02:17] Epoch: 1 Batch: 8136/38378 (21.20%) Loss: 1.847022 LR: 
0.00004802 [03:02:19] Epoch: 1 Batch: 8137/38378 (21.20%) Loss: 2.102488 LR: 0.00004802 [03:02:20] Epoch: 1 Batch: 8138/38378 (21.20%) Loss: 2.177984 LR: 0.00004802 [03:02:22] Epoch: 1 Batch: 8139/38378 (21.21%) Loss: 1.790614 LR: 0.00004802 [03:02:24] Epoch: 1 Batch: 8140/38378 (21.21%) Loss: 1.816302 LR: 0.00004801 [03:02:26] Epoch: 1 Batch: 8141/38378 (21.21%) Loss: 1.999409 LR: 0.00004801 [03:02:28] Epoch: 1 Batch: 8142/38378 (21.22%) Loss: 1.883851 LR: 0.00004801 [03:02:29] Epoch: 1 Batch: 8143/38378 (21.22%) Loss: 2.105308 LR: 0.00004801 [03:02:31] Epoch: 1 Batch: 8144/38378 (21.22%) Loss: 2.100747 LR: 0.00004801 [03:02:33] Epoch: 1 Batch: 8145/38378 (21.22%) Loss: 2.170545 LR: 0.00004801 [03:02:35] Epoch: 1 Batch: 8146/38378 (21.23%) Loss: 2.068018 LR: 0.00004801 [03:02:37] Epoch: 1 Batch: 8147/38378 (21.23%) Loss: 1.869875 LR: 0.00004801 [03:02:39] Epoch: 1 Batch: 8148/38378 (21.23%) Loss: 1.964799 LR: 0.00004801 [03:02:40] Epoch: 1 Batch: 8149/38378 (21.23%) Loss: 1.915840 LR: 0.00004801 [03:02:42] Epoch: 1 Batch: 8150/38378 (21.24%) Loss: 2.351452 LR: 0.00004801 [03:02:44] Epoch: 1 Batch: 8151/38378 (21.24%) Loss: 1.730702 LR: 0.00004801 [03:02:46] Epoch: 1 Batch: 8152/38378 (21.24%) Loss: 2.262251 LR: 0.00004801 [03:02:47] Epoch: 1 Batch: 8153/38378 (21.24%) Loss: 2.007354 LR: 0.00004801 [03:02:49] Epoch: 1 Batch: 8154/38378 (21.25%) Loss: 1.758919 LR: 0.00004800 [03:02:51] Epoch: 1 Batch: 8155/38378 (21.25%) Loss: 2.060567 LR: 0.00004800 [03:02:53] Epoch: 1 Batch: 8156/38378 (21.25%) Loss: 1.959012 LR: 0.00004800 [03:02:55] Epoch: 1 Batch: 8157/38378 (21.25%) Loss: 1.777851 LR: 0.00004800 [03:02:56] Epoch: 1 Batch: 8158/38378 (21.26%) Loss: 2.022889 LR: 0.00004800 [03:02:58] Epoch: 1 Batch: 8159/38378 (21.26%) Loss: 1.864591 LR: 0.00004800 [03:03:00] Epoch: 1 Batch: 8160/38378 (21.26%) Loss: 2.096520 LR: 0.00004800 [03:03:02] Epoch: 1 Batch: 8161/38378 (21.26%) Loss: 2.086354 LR: 0.00004799 [03:03:04] Epoch: 1 Batch: 8162/38378 (21.27%) Loss: 1.941894 LR: 0.00004799 [03:03:06] Epoch: 1 Batch: 8163/38378 (21.27%) Loss: 2.045301 LR: 0.00004799 [03:03:07] Epoch: 1 Batch: 8164/38378 (21.27%) Loss: 2.195432 LR: 0.00004799 [03:03:09] Epoch: 1 Batch: 8165/38378 (21.28%) Loss: 1.752969 LR: 0.00004799 [03:03:11] Epoch: 1 Batch: 8166/38378 (21.28%) Loss: 2.297909 LR: 0.00004799 [03:03:13] Epoch: 1 Batch: 8167/38378 (21.28%) Loss: 1.977972 LR: 0.00004799 [03:03:15] Epoch: 1 Batch: 8168/38378 (21.28%) Loss: 1.857060 LR: 0.00004799 [03:03:16] Epoch: 1 Batch: 8169/38378 (21.29%) Loss: 1.948543 LR: 0.00004799 [03:03:18] Epoch: 1 Batch: 8170/38378 (21.29%) Loss: 2.073386 LR: 0.00004799 [03:03:20] Epoch: 1 Batch: 8171/38378 (21.29%) Loss: 2.071375 LR: 0.00004799 [03:03:22] Epoch: 1 Batch: 8172/38378 (21.29%) Loss: 2.259946 LR: 0.00004799 [03:03:24] Epoch: 1 Batch: 8173/38378 (21.30%) Loss: 1.827685 LR: 0.00004799 [03:03:26] Epoch: 1 Batch: 8174/38378 (21.30%) Loss: 1.933662 LR: 0.00004799 [03:03:27] Epoch: 1 Batch: 8175/38378 (21.30%) Loss: 1.937818 LR: 0.00004798 [03:03:29] Epoch: 1 Batch: 8176/38378 (21.30%) Loss: 1.777645 LR: 0.00004798 [03:03:31] Epoch: 1 Batch: 8177/38378 (21.31%) Loss: 2.095098 LR: 0.00004798 [03:03:33] Epoch: 1 Batch: 8178/38378 (21.31%) Loss: 1.817652 LR: 0.00004798 [03:03:35] Epoch: 1 Batch: 8179/38378 (21.31%) Loss: 2.021966 LR: 0.00004798 [03:03:36] Epoch: 1 Batch: 8180/38378 (21.31%) Loss: 1.972002 LR: 0.00004798 [03:03:38] Epoch: 1 Batch: 8181/38378 (21.32%) Loss: 1.952094 LR: 0.00004798 [03:03:40] Epoch: 1 Batch: 8182/38378 (21.32%) Loss: 1.979371 LR: 0.00004798 
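The LR column in this stretch creeps down from 0.00004819 to 0.00004798 and only ticks every several batches, which is consistent with the resume messages above: a cosine schedule with warmup (439 warmup steps, 5482 total steps) that advances once per optimizer step, i.e. once every accum_steps = 7 micro-batches. The six LR: 0.00000000 entries right after the resume fit the same picture: the restored optimizer reports LR 0 until the first full accumulation window completes and the scheduler takes its first step. As a sanity check, here is a minimal sketch of such a schedule, assuming plain linear warmup followed by cosine decay from lr = 5e-05 to lr_floor = 1e-05 (the run's exact step indexing is not shown, so the last digit can differ):

```python
import math

def lr_at(step: int, peak: float = 5e-5, floor: float = 1e-5,
          warmup: int = 439, total: int = 5482) -> float:
    # Sketch of a cosine-with-warmup schedule: linear warmup to `peak`,
    # then cosine decay down to `floor`. `step` counts optimizer steps,
    # i.e. micro-batches divided by accum_steps.
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)
    return floor + (peak - floor) * 0.5 * (1.0 + math.cos(math.pi * progress))

# Batch 8000 of 38378 at accum_steps=7 is roughly optimizer step 1143:
print(f"{lr_at(1143):.8f}")  # ~0.0000481, vs. the 0.00004812 logged at step 8000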
[03:03:42] Epoch: 1 Batch: 8183/38378 (21.32%) Loss: 1.883924 LR: 0.00004798 [03:03:44] Epoch: 1 Batch: 8184/38378 (21.32%) Loss: 1.916568 LR: 0.00004798 [03:03:46] Epoch: 1 Batch: 8185/38378 (21.33%) Loss: 1.827311 LR: 0.00004798 [03:03:47] Epoch: 1 Batch: 8186/38378 (21.33%) Loss: 1.870582 LR: 0.00004798 [03:03:49] Epoch: 1 Batch: 8187/38378 (21.33%) Loss: 2.196567 LR: 0.00004798 [03:03:51] Epoch: 1 Batch: 8188/38378 (21.34%) Loss: 2.191387 LR: 0.00004798 [03:03:53] Epoch: 1 Batch: 8189/38378 (21.34%) Loss: 1.922961 LR: 0.00004797 [03:03:55] Epoch: 1 Batch: 8190/38378 (21.34%) Loss: 2.103819 LR: 0.00004797 [03:03:56] Epoch: 1 Batch: 8191/38378 (21.34%) Loss: 2.086903 LR: 0.00004797 [03:03:58] Epoch: 1 Batch: 8192/38378 (21.35%) Loss: 2.111332 LR: 0.00004797 [03:04:00] Epoch: 1 Batch: 8193/38378 (21.35%) Loss: 1.957002 LR: 0.00004797 [03:04:02] Epoch: 1 Batch: 8194/38378 (21.35%) Loss: 2.168357 LR: 0.00004797 [03:04:04] Epoch: 1 Batch: 8195/38378 (21.35%) Loss: 2.091088 LR: 0.00004797 [03:04:05] Epoch: 1 Batch: 8196/38378 (21.36%) Loss: 2.296849 LR: 0.00004797 [03:04:07] Epoch: 1 Batch: 8197/38378 (21.36%) Loss: 2.327231 LR: 0.00004797 [03:04:09] Epoch: 1 Batch: 8198/38378 (21.36%) Loss: 2.181507 LR: 0.00004797 [03:04:11] Epoch: 1 Batch: 8199/38378 (21.36%) Loss: 2.086902 LR: 0.00004797 [03:04:17] >> Cleaned up old temp checkpoint: epoch1_step7722 [03:04:17] >> Temp checkpoint saved: epoch1_step8200, size: 0.1702 GB [03:04:17] Epoch: 1 Batch: 8200/38378 (21.37%) Loss: 1.905944 LR: 0.00004797 [03:04:19] Epoch: 1 Batch: 8201/38378 (21.37%) Loss: 2.045184 LR: 0.00004797 [03:04:20] Epoch: 1 Batch: 8202/38378 (21.37%) Loss: 2.121726 LR: 0.00004797 [03:04:22] Epoch: 1 Batch: 8203/38378 (21.37%) Loss: 2.148170 LR: 0.00004796 [03:04:24] Epoch: 1 Batch: 8204/38378 (21.38%) Loss: 1.874371 LR: 0.00004796 [03:04:26] Epoch: 1 Batch: 8205/38378 (21.38%) Loss: 1.981748 LR: 0.00004796 [03:04:27] Epoch: 1 Batch: 8206/38378 (21.38%) Loss: 1.932165 LR: 0.00004796 [03:04:29] Epoch: 1 Batch: 8207/38378 (21.38%) Loss: 1.995138 LR: 0.00004796 [03:04:31] Epoch: 1 Batch: 8208/38378 (21.39%) Loss: 2.211462 LR: 0.00004796 [03:04:33] Epoch: 1 Batch: 8209/38378 (21.39%) Loss: 1.944670 LR: 0.00004796 [03:04:35] Epoch: 1 Batch: 8210/38378 (21.39%) Loss: 1.775072 LR: 0.00004796 [03:04:36] Epoch: 1 Batch: 8211/38378 (21.40%) Loss: 2.155430 LR: 0.00004796 [03:04:38] Epoch: 1 Batch: 8212/38378 (21.40%) Loss: 1.997672 LR: 0.00004796 [03:04:40] Epoch: 1 Batch: 8213/38378 (21.40%) Loss: 1.940266 LR: 0.00004796 [03:04:42] Epoch: 1 Batch: 8214/38378 (21.40%) Loss: 2.008967 LR: 0.00004796 [03:04:44] Epoch: 1 Batch: 8215/38378 (21.41%) Loss: 2.033118 LR: 0.00004796 [03:04:46] Epoch: 1 Batch: 8216/38378 (21.41%) Loss: 2.213782 LR: 0.00004796 [03:04:47] Epoch: 1 Batch: 8217/38378 (21.41%) Loss: 1.997963 LR: 0.00004795 [03:04:49] Epoch: 1 Batch: 8218/38378 (21.41%) Loss: 1.980145 LR: 0.00004795 [03:04:51] Epoch: 1 Batch: 8219/38378 (21.42%) Loss: 1.926024 LR: 0.00004795 [03:04:53] Epoch: 1 Batch: 8220/38378 (21.42%) Loss: 1.988806 LR: 0.00004795 [03:04:55] Epoch: 1 Batch: 8221/38378 (21.42%) Loss: 2.046656 LR: 0.00004795 [03:04:57] Epoch: 1 Batch: 8222/38378 (21.42%) Loss: 1.749111 LR: 0.00004795 [03:04:58] Epoch: 1 Batch: 8223/38378 (21.43%) Loss: 2.020314 LR: 0.00004795 [03:05:00] Epoch: 1 Batch: 8224/38378 (21.43%) Loss: 2.039068 LR: 0.00004795 [03:05:02] Epoch: 1 Batch: 8225/38378 (21.43%) Loss: 2.005919 LR: 0.00004795 [03:05:04] Epoch: 1 Batch: 8226/38378 (21.43%) Loss: 2.017470 LR: 0.00004795 [03:05:06] Epoch: 1 Batch: 
8227/38378 (21.44%) Loss: 1.835509 LR: 0.00004795 [03:05:08] Epoch: 1 Batch: 8228/38378 (21.44%) Loss: 1.955480 LR: 0.00004795 [03:05:10] Epoch: 1 Batch: 8229/38378 (21.44%) Loss: 2.088947 LR: 0.00004795 [03:05:12] Epoch: 1 Batch: 8230/38378 (21.44%) Loss: 1.756257 LR: 0.00004795 [03:05:14] Epoch: 1 Batch: 8231/38378 (21.45%) Loss: 2.013090 LR: 0.00004794 [03:05:16] Epoch: 1 Batch: 8232/38378 (21.45%) Loss: 1.855327 LR: 0.00004794 [03:05:17] Epoch: 1 Batch: 8233/38378 (21.45%) Loss: 1.964470 LR: 0.00004794 [03:05:19] Epoch: 1 Batch: 8234/38378 (21.46%) Loss: 2.156899 LR: 0.00004794 [03:05:21] Epoch: 1 Batch: 8235/38378 (21.46%) Loss: 1.972999 LR: 0.00004794 [03:05:23] Epoch: 1 Batch: 8236/38378 (21.46%) Loss: 1.962523 LR: 0.00004794 [03:05:25] Epoch: 1 Batch: 8237/38378 (21.46%) Loss: 2.285248 LR: 0.00004794 [03:05:26] Epoch: 1 Batch: 8238/38378 (21.47%) Loss: 1.990021 LR: 0.00004793 [03:05:28] Epoch: 1 Batch: 8239/38378 (21.47%) Loss: 1.805527 LR: 0.00004793 [03:05:30] Epoch: 1 Batch: 8240/38378 (21.47%) Loss: 1.955262 LR: 0.00004793 [03:05:32] Epoch: 1 Batch: 8241/38378 (21.47%) Loss: 2.297494 LR: 0.00004793 [03:05:34] Epoch: 1 Batch: 8242/38378 (21.48%) Loss: 1.902373 LR: 0.00004793 [03:05:36] Epoch: 1 Batch: 8243/38378 (21.48%) Loss: 2.175416 LR: 0.00004793 [03:05:37] Epoch: 1 Batch: 8244/38378 (21.48%) Loss: 1.965202 LR: 0.00004793 [03:05:39] Epoch: 1 Batch: 8245/38378 (21.48%) Loss: 2.072131 LR: 0.00004793 [03:05:41] Epoch: 1 Batch: 8246/38378 (21.49%) Loss: 2.147903 LR: 0.00004793 [03:05:43] Epoch: 1 Batch: 8247/38378 (21.49%) Loss: 2.162387 LR: 0.00004793 [03:05:45] Epoch: 1 Batch: 8248/38378 (21.49%) Loss: 2.022049 LR: 0.00004793 [03:05:46] Epoch: 1 Batch: 8249/38378 (21.49%) Loss: 2.182920 LR: 0.00004793 [03:05:48] Epoch: 1 Batch: 8250/38378 (21.50%) Loss: 1.703808 LR: 0.00004793 [03:05:50] Epoch: 1 Batch: 8251/38378 (21.50%) Loss: 1.831593 LR: 0.00004793 [03:05:52] Epoch: 1 Batch: 8252/38378 (21.50%) Loss: 2.335700 LR: 0.00004792 [03:05:54] Epoch: 1 Batch: 8253/38378 (21.50%) Loss: 2.064381 LR: 0.00004792 [03:05:55] Epoch: 1 Batch: 8254/38378 (21.51%) Loss: 2.343077 LR: 0.00004792 [03:05:57] Epoch: 1 Batch: 8255/38378 (21.51%) Loss: 2.086383 LR: 0.00004792 [03:05:59] Epoch: 1 Batch: 8256/38378 (21.51%) Loss: 2.346537 LR: 0.00004792 [03:06:01] Epoch: 1 Batch: 8257/38378 (21.51%) Loss: 2.038818 LR: 0.00004792 [03:06:03] Epoch: 1 Batch: 8258/38378 (21.52%) Loss: 2.001679 LR: 0.00004792 [03:06:04] Epoch: 1 Batch: 8259/38378 (21.52%) Loss: 2.042239 LR: 0.00004792 [03:06:06] Epoch: 1 Batch: 8260/38378 (21.52%) Loss: 1.816480 LR: 0.00004792 [03:06:08] Epoch: 1 Batch: 8261/38378 (21.53%) Loss: 2.137854 LR: 0.00004792 [03:06:10] Epoch: 1 Batch: 8262/38378 (21.53%) Loss: 2.011523 LR: 0.00004792 [03:06:11] Epoch: 1 Batch: 8263/38378 (21.53%) Loss: 2.245397 LR: 0.00004792 [03:06:13] Epoch: 1 Batch: 8264/38378 (21.53%) Loss: 2.123482 LR: 0.00004792 [03:06:15] Epoch: 1 Batch: 8265/38378 (21.54%) Loss: 1.951261 LR: 0.00004792 [03:06:17] Epoch: 1 Batch: 8266/38378 (21.54%) Loss: 1.982517 LR: 0.00004791 [03:06:19] Epoch: 1 Batch: 8267/38378 (21.54%) Loss: 2.080817 LR: 0.00004791 [03:06:21] Epoch: 1 Batch: 8268/38378 (21.54%) Loss: 1.999257 LR: 0.00004791 [03:06:22] Epoch: 1 Batch: 8269/38378 (21.55%) Loss: 2.045434 LR: 0.00004791 [03:06:24] Epoch: 1 Batch: 8270/38378 (21.55%) Loss: 1.817004 LR: 0.00004791 [03:06:26] Epoch: 1 Batch: 8271/38378 (21.55%) Loss: 2.193616 LR: 0.00004791 [03:06:28] Epoch: 1 Batch: 8272/38378 (21.55%) Loss: 1.827715 LR: 0.00004791 [03:06:30] Epoch: 1 Batch: 8273/38378 
(21.56%) Loss: 2.250510 LR: 0.00004791 [03:06:31] Epoch: 1 Batch: 8274/38378 (21.56%) Loss: 1.994824 LR: 0.00004791 [03:06:33] Epoch: 1 Batch: 8275/38378 (21.56%) Loss: 1.988048 LR: 0.00004791 [03:06:35] Epoch: 1 Batch: 8276/38378 (21.56%) Loss: 1.962107 LR: 0.00004791 [03:06:37] Epoch: 1 Batch: 8277/38378 (21.57%) Loss: 1.977101 LR: 0.00004791 [03:06:39] Epoch: 1 Batch: 8278/38378 (21.57%) Loss: 1.966710 LR: 0.00004791 [03:06:41] Epoch: 1 Batch: 8279/38378 (21.57%) Loss: 2.161539 LR: 0.00004791 [03:06:42] Epoch: 1 Batch: 8280/38378 (21.57%) Loss: 2.034915 LR: 0.00004790 [03:06:44] Epoch: 1 Batch: 8281/38378 (21.58%) Loss: 2.284038 LR: 0.00004790 [03:06:46] Epoch: 1 Batch: 8282/38378 (21.58%) Loss: 1.975187 LR: 0.00004790 [03:06:48] Epoch: 1 Batch: 8283/38378 (21.58%) Loss: 1.860868 LR: 0.00004790 [03:06:50] Epoch: 1 Batch: 8284/38378 (21.59%) Loss: 1.716512 LR: 0.00004790 [03:06:51] Epoch: 1 Batch: 8285/38378 (21.59%) Loss: 1.894977 LR: 0.00004790 [03:06:53] Epoch: 1 Batch: 8286/38378 (21.59%) Loss: 2.097499 LR: 0.00004790 [03:06:55] Epoch: 1 Batch: 8287/38378 (21.59%) Loss: 1.978143 LR: 0.00004790 [03:06:57] Epoch: 1 Batch: 8288/38378 (21.60%) Loss: 2.002612 LR: 0.00004790 [03:06:59] Epoch: 1 Batch: 8289/38378 (21.60%) Loss: 2.339187 LR: 0.00004790 [03:07:01] Epoch: 1 Batch: 8290/38378 (21.60%) Loss: 2.009681 LR: 0.00004790 [03:07:02] Epoch: 1 Batch: 8291/38378 (21.60%) Loss: 1.972662 LR: 0.00004790 [03:07:04] Epoch: 1 Batch: 8292/38378 (21.61%) Loss: 1.931794 LR: 0.00004790 [03:07:06] Epoch: 1 Batch: 8293/38378 (21.61%) Loss: 1.691731 LR: 0.00004790 [03:07:08] Epoch: 1 Batch: 8294/38378 (21.61%) Loss: 2.448151 LR: 0.00004789 [03:07:10] Epoch: 1 Batch: 8295/38378 (21.61%) Loss: 2.202015 LR: 0.00004789 [03:07:11] Epoch: 1 Batch: 8296/38378 (21.62%) Loss: 1.869607 LR: 0.00004789 [03:07:13] Epoch: 1 Batch: 8297/38378 (21.62%) Loss: 2.056697 LR: 0.00004789 [03:07:15] Epoch: 1 Batch: 8298/38378 (21.62%) Loss: 2.095424 LR: 0.00004789 [03:07:17] Epoch: 1 Batch: 8299/38378 (21.62%) Loss: 2.083735 LR: 0.00004789 [03:07:19] Epoch: 1 Batch: 8300/38378 (21.63%) Loss: 1.851310 LR: 0.00004789 [03:07:20] Epoch: 1 Batch: 8301/38378 (21.63%) Loss: 1.848316 LR: 0.00004788 [03:07:22] Epoch: 1 Batch: 8302/38378 (21.63%) Loss: 2.093216 LR: 0.00004788 [03:07:24] Epoch: 1 Batch: 8303/38378 (21.63%) Loss: 2.083670 LR: 0.00004788 [03:07:26] Epoch: 1 Batch: 8304/38378 (21.64%) Loss: 2.075788 LR: 0.00004788 [03:07:28] Epoch: 1 Batch: 8305/38378 (21.64%) Loss: 2.202710 LR: 0.00004788 [03:07:29] Epoch: 1 Batch: 8306/38378 (21.64%) Loss: 2.134157 LR: 0.00004788 [03:07:31] Epoch: 1 Batch: 8307/38378 (21.65%) Loss: 1.904704 LR: 0.00004788 [03:07:33] Epoch: 1 Batch: 8308/38378 (21.65%) Loss: 2.064005 LR: 0.00004788 [03:07:35] Epoch: 1 Batch: 8309/38378 (21.65%) Loss: 2.312336 LR: 0.00004788 [03:07:37] Epoch: 1 Batch: 8310/38378 (21.65%) Loss: 1.754493 LR: 0.00004788 [03:07:39] Epoch: 1 Batch: 8311/38378 (21.66%) Loss: 1.620817 LR: 0.00004788 [03:07:40] Epoch: 1 Batch: 8312/38378 (21.66%) Loss: 1.943750 LR: 0.00004788 [03:07:42] Epoch: 1 Batch: 8313/38378 (21.66%) Loss: 2.088125 LR: 0.00004788 [03:07:44] Epoch: 1 Batch: 8314/38378 (21.66%) Loss: 1.968967 LR: 0.00004788 [03:07:46] Epoch: 1 Batch: 8315/38378 (21.67%) Loss: 1.842138 LR: 0.00004787 [03:07:48] Epoch: 1 Batch: 8316/38378 (21.67%) Loss: 2.068822 LR: 0.00004787 [03:07:50] Epoch: 1 Batch: 8317/38378 (21.67%) Loss: 2.183222 LR: 0.00004787 [03:07:51] Epoch: 1 Batch: 8318/38378 (21.67%) Loss: 2.152869 LR: 0.00004787 [03:07:53] Epoch: 1 Batch: 8319/38378 (21.68%) 
Loss: 2.046955 LR: 0.00004787 [03:07:55] Epoch: 1 Batch: 8320/38378 (21.68%) Loss: 1.919553 LR: 0.00004787 [03:07:57] Epoch: 1 Batch: 8321/38378 (21.68%) Loss: 2.102647 LR: 0.00004787 [03:07:59] Epoch: 1 Batch: 8322/38378 (21.68%) Loss: 2.122226 LR: 0.00004787 [03:08:00] Epoch: 1 Batch: 8323/38378 (21.69%) Loss: 2.111784 LR: 0.00004787 [03:08:02] Epoch: 1 Batch: 8324/38378 (21.69%) Loss: 2.247846 LR: 0.00004787 [03:08:04] Epoch: 1 Batch: 8325/38378 (21.69%) Loss: 1.835209 LR: 0.00004787 [03:08:06] Epoch: 1 Batch: 8326/38378 (21.69%) Loss: 1.886048 LR: 0.00004787 [03:08:08] Epoch: 1 Batch: 8327/38378 (21.70%) Loss: 1.989193 LR: 0.00004787 [03:08:09] Epoch: 1 Batch: 8328/38378 (21.70%) Loss: 2.360810 LR: 0.00004787 [03:08:11] Epoch: 1 Batch: 8329/38378 (21.70%) Loss: 1.883485 LR: 0.00004786 [03:08:13] Epoch: 1 Batch: 8330/38378 (21.71%) Loss: 1.871117 LR: 0.00004786 [03:08:15] Epoch: 1 Batch: 8331/38378 (21.71%) Loss: 2.292341 LR: 0.00004786 [03:08:17] Epoch: 1 Batch: 8332/38378 (21.71%) Loss: 1.814601 LR: 0.00004786 [03:08:18] Epoch: 1 Batch: 8333/38378 (21.71%) Loss: 1.934506 LR: 0.00004786 [03:08:20] Epoch: 1 Batch: 8334/38378 (21.72%) Loss: 2.037055 LR: 0.00004786 [03:08:22] Epoch: 1 Batch: 8335/38378 (21.72%) Loss: 1.817032 LR: 0.00004786 [03:08:24] Epoch: 1 Batch: 8336/38378 (21.72%) Loss: 1.779308 LR: 0.00004786 [03:08:26] Epoch: 1 Batch: 8337/38378 (21.72%) Loss: 2.046139 LR: 0.00004786 [03:08:28] Epoch: 1 Batch: 8338/38378 (21.73%) Loss: 2.175580 LR: 0.00004786 [03:08:29] Epoch: 1 Batch: 8339/38378 (21.73%) Loss: 2.134446 LR: 0.00004786 [03:08:31] Epoch: 1 Batch: 8340/38378 (21.73%) Loss: 1.904938 LR: 0.00004786 [03:08:33] Epoch: 1 Batch: 8341/38378 (21.73%) Loss: 1.685404 LR: 0.00004786 [03:08:35] Epoch: 1 Batch: 8342/38378 (21.74%) Loss: 1.925023 LR: 0.00004786 [03:08:36] Epoch: 1 Batch: 8343/38378 (21.74%) Loss: 2.094517 LR: 0.00004785 [03:08:38] Epoch: 1 Batch: 8344/38378 (21.74%) Loss: 2.030531 LR: 0.00004785 [03:08:40] Epoch: 1 Batch: 8345/38378 (21.74%) Loss: 1.702441 LR: 0.00004785 [03:08:42] Epoch: 1 Batch: 8346/38378 (21.75%) Loss: 2.132737 LR: 0.00004785 [03:08:44] Epoch: 1 Batch: 8347/38378 (21.75%) Loss: 2.074157 LR: 0.00004785 [03:08:45] Epoch: 1 Batch: 8348/38378 (21.75%) Loss: 1.917263 LR: 0.00004785 [03:08:47] Epoch: 1 Batch: 8349/38378 (21.75%) Loss: 2.158683 LR: 0.00004785 [03:08:49] Epoch: 1 Batch: 8350/38378 (21.76%) Loss: 1.980731 LR: 0.00004785 [03:08:51] Epoch: 1 Batch: 8351/38378 (21.76%) Loss: 2.238470 LR: 0.00004785 [03:08:53] Epoch: 1 Batch: 8352/38378 (21.76%) Loss: 1.993449 LR: 0.00004785 [03:08:55] Epoch: 1 Batch: 8353/38378 (21.77%) Loss: 1.862625 LR: 0.00004785 [03:08:56] Epoch: 1 Batch: 8354/38378 (21.77%) Loss: 2.081552 LR: 0.00004785 [03:08:58] Epoch: 1 Batch: 8355/38378 (21.77%) Loss: 1.994331 LR: 0.00004785 [03:09:00] Epoch: 1 Batch: 8356/38378 (21.77%) Loss: 2.336659 LR: 0.00004785 [03:09:01] Epoch: 1 Batch: 8357/38378 (21.78%) Loss: 2.285078 LR: 0.00004784 [03:09:03] Epoch: 1 Batch: 8358/38378 (21.78%) Loss: 2.020790 LR: 0.00004784 [03:09:05] Epoch: 1 Batch: 8359/38378 (21.78%) Loss: 2.168011 LR: 0.00004784 [03:09:07] Epoch: 1 Batch: 8360/38378 (21.78%) Loss: 1.765059 LR: 0.00004784 [03:09:09] Epoch: 1 Batch: 8361/38378 (21.79%) Loss: 1.994056 LR: 0.00004784 [03:09:11] Epoch: 1 Batch: 8362/38378 (21.79%) Loss: 1.795471 LR: 0.00004784 [03:09:12] Epoch: 1 Batch: 8363/38378 (21.79%) Loss: 1.916368 LR: 0.00004784 [03:09:14] Epoch: 1 Batch: 8364/38378 (21.79%) Loss: 1.830927 LR: 0.00004783 [03:09:16] Epoch: 1 Batch: 8365/38378 (21.80%) Loss: 
2.224215 LR: 0.00004783 [03:09:18] Epoch: 1 Batch: 8366/38378 (21.80%) Loss: 1.988513 LR: 0.00004783 [03:09:19] Epoch: 1 Batch: 8367/38378 (21.80%) Loss: 2.000482 LR: 0.00004783 [03:09:21] Epoch: 1 Batch: 8368/38378 (21.80%) Loss: 1.919643 LR: 0.00004783 [03:09:23] Epoch: 1 Batch: 8369/38378 (21.81%) Loss: 1.698256 LR: 0.00004783 [03:09:25] Epoch: 1 Batch: 8370/38378 (21.81%) Loss: 2.212921 LR: 0.00004783 [03:09:27] Epoch: 1 Batch: 8371/38378 (21.81%) Loss: 2.008948 LR: 0.00004783 [03:09:29] Epoch: 1 Batch: 8372/38378 (21.81%) Loss: 1.960526 LR: 0.00004783 [03:09:30] Epoch: 1 Batch: 8373/38378 (21.82%) Loss: 2.061747 LR: 0.00004783 [03:09:32] Epoch: 1 Batch: 8374/38378 (21.82%) Loss: 1.790391 LR: 0.00004783 [03:09:34] Epoch: 1 Batch: 8375/38378 (21.82%) Loss: 2.444751 LR: 0.00004783 [03:09:36] Epoch: 1 Batch: 8376/38378 (21.83%) Loss: 2.050276 LR: 0.00004783 [03:09:38] Epoch: 1 Batch: 8377/38378 (21.83%) Loss: 1.961354 LR: 0.00004783 [03:09:40] Epoch: 1 Batch: 8378/38378 (21.83%) Loss: 1.986438 LR: 0.00004782 [03:09:41] Epoch: 1 Batch: 8379/38378 (21.83%) Loss: 1.801977 LR: 0.00004782 [03:09:43] Epoch: 1 Batch: 8380/38378 (21.84%) Loss: 1.919773 LR: 0.00004782 [03:09:45] Epoch: 1 Batch: 8381/38378 (21.84%) Loss: 1.875082 LR: 0.00004782 [03:09:47] Epoch: 1 Batch: 8382/38378 (21.84%) Loss: 1.841200 LR: 0.00004782 [03:09:49] Epoch: 1 Batch: 8383/38378 (21.84%) Loss: 1.820345 LR: 0.00004782 [03:09:50] Epoch: 1 Batch: 8384/38378 (21.85%) Loss: 1.788565 LR: 0.00004782 [03:09:52] Epoch: 1 Batch: 8385/38378 (21.85%) Loss: 2.140554 LR: 0.00004782 [03:09:54] Epoch: 1 Batch: 8386/38378 (21.85%) Loss: 1.907433 LR: 0.00004782 [03:09:56] Epoch: 1 Batch: 8387/38378 (21.85%) Loss: 2.193382 LR: 0.00004782 [03:09:58] Epoch: 1 Batch: 8388/38378 (21.86%) Loss: 2.032856 LR: 0.00004782 [03:10:00] Epoch: 1 Batch: 8389/38378 (21.86%) Loss: 1.647669 LR: 0.00004782 [03:10:01] Epoch: 1 Batch: 8390/38378 (21.86%) Loss: 1.989116 LR: 0.00004782 [03:10:03] Epoch: 1 Batch: 8391/38378 (21.86%) Loss: 2.042062 LR: 0.00004782 [03:10:05] Epoch: 1 Batch: 8392/38378 (21.87%) Loss: 2.170011 LR: 0.00004781 [03:10:07] Epoch: 1 Batch: 8393/38378 (21.87%) Loss: 1.852944 LR: 0.00004781 [03:10:09] Epoch: 1 Batch: 8394/38378 (21.87%) Loss: 2.236347 LR: 0.00004781 [03:10:10] Epoch: 1 Batch: 8395/38378 (21.87%) Loss: 1.868130 LR: 0.00004781 [03:10:12] Epoch: 1 Batch: 8396/38378 (21.88%) Loss: 2.030403 LR: 0.00004781 [03:10:14] Epoch: 1 Batch: 8397/38378 (21.88%) Loss: 1.902255 LR: 0.00004781 [03:10:16] Epoch: 1 Batch: 8398/38378 (21.88%) Loss: 2.157067 LR: 0.00004781 [03:10:18] Epoch: 1 Batch: 8399/38378 (21.88%) Loss: 1.652098 LR: 0.00004781 [03:10:24] >> Cleaned up old temp checkpoint: epoch1_step7755 [03:10:24] >> Temp checkpoint saved: epoch1_step8400, size: 0.1702 GB [03:10:24] Epoch: 1 Batch: 8400/38378 (21.89%) Loss: 2.000503 LR: 0.00004781 [03:10:25] Epoch: 1 Batch: 8401/38378 (21.89%) Loss: 1.661097 LR: 0.00004781 [03:10:27] Epoch: 1 Batch: 8402/38378 (21.89%) Loss: 2.189321 LR: 0.00004781 [03:10:29] Epoch: 1 Batch: 8403/38378 (21.90%) Loss: 2.113332 LR: 0.00004781 [03:10:31] Epoch: 1 Batch: 8404/38378 (21.90%) Loss: 1.949175 LR: 0.00004781 [03:10:33] Epoch: 1 Batch: 8405/38378 (21.90%) Loss: 1.641855 LR: 0.00004781 [03:10:34] Epoch: 1 Batch: 8406/38378 (21.90%) Loss: 1.869258 LR: 0.00004780 [03:10:36] Epoch: 1 Batch: 8407/38378 (21.91%) Loss: 2.031563 LR: 0.00004780 [03:10:38] Epoch: 1 Batch: 8408/38378 (21.91%) Loss: 1.946014 LR: 0.00004780 [03:10:40] Epoch: 1 Batch: 8409/38378 (21.91%) Loss: 1.942540 LR: 0.00004780 [03:10:42] 
Epoch: 1 Batch: 8410/38378 (21.91%) Loss: 1.949252 LR: 0.00004780 [03:10:44] Epoch: 1 Batch: 8411/38378 (21.92%) Loss: 2.103549 LR: 0.00004780 [03:10:45] Epoch: 1 Batch: 8412/38378 (21.92%) Loss: 2.300362 LR: 0.00004780 [03:10:47] Epoch: 1 Batch: 8413/38378 (21.92%) Loss: 1.643677 LR: 0.00004779 [03:10:49] Epoch: 1 Batch: 8414/38378 (21.92%) Loss: 1.778654 LR: 0.00004779 [03:10:51] Epoch: 1 Batch: 8415/38378 (21.93%) Loss: 2.138963 LR: 0.00004779 [03:10:53] Epoch: 1 Batch: 8416/38378 (21.93%) Loss: 1.913583 LR: 0.00004779 [03:10:55] Epoch: 1 Batch: 8417/38378 (21.93%) Loss: 1.776061 LR: 0.00004779 [03:10:56] Epoch: 1 Batch: 8418/38378 (21.93%) Loss: 1.890731 LR: 0.00004779 [03:10:58] Epoch: 1 Batch: 8419/38378 (21.94%) Loss: 1.917440 LR: 0.00004779 [03:11:00] Epoch: 1 Batch: 8420/38378 (21.94%) Loss: 1.871977 LR: 0.00004779 [03:11:02] Epoch: 1 Batch: 8421/38378 (21.94%) Loss: 2.028303 LR: 0.00004779 [03:11:04] Epoch: 1 Batch: 8422/38378 (21.94%) Loss: 2.175402 LR: 0.00004779 [03:11:06] Epoch: 1 Batch: 8423/38378 (21.95%) Loss: 1.796144 LR: 0.00004779 [03:11:07] Epoch: 1 Batch: 8424/38378 (21.95%) Loss: 1.698666 LR: 0.00004779 [03:11:09] Epoch: 1 Batch: 8425/38378 (21.95%) Loss: 1.835672 LR: 0.00004779 [03:11:11] Epoch: 1 Batch: 8426/38378 (21.96%) Loss: 2.282600 LR: 0.00004779 [03:11:13] Epoch: 1 Batch: 8427/38378 (21.96%) Loss: 1.955519 LR: 0.00004778 [03:11:15] Epoch: 1 Batch: 8428/38378 (21.96%) Loss: 2.141587 LR: 0.00004778 [03:11:16] Epoch: 1 Batch: 8429/38378 (21.96%) Loss: 1.851147 LR: 0.00004778 [03:11:18] Epoch: 1 Batch: 8430/38378 (21.97%) Loss: 2.124201 LR: 0.00004778 [03:11:20] Epoch: 1 Batch: 8431/38378 (21.97%) Loss: 1.962720 LR: 0.00004778 [03:11:22] Epoch: 1 Batch: 8432/38378 (21.97%) Loss: 2.113667 LR: 0.00004778 [03:11:24] Epoch: 1 Batch: 8433/38378 (21.97%) Loss: 2.459851 LR: 0.00004778 [03:11:26] Epoch: 1 Batch: 8434/38378 (21.98%) Loss: 2.112270 LR: 0.00004778 [03:11:27] Epoch: 1 Batch: 8435/38378 (21.98%) Loss: 1.820575 LR: 0.00004778 [03:11:29] Epoch: 1 Batch: 8436/38378 (21.98%) Loss: 1.870984 LR: 0.00004778 [03:11:31] Epoch: 1 Batch: 8437/38378 (21.98%) Loss: 2.104205 LR: 0.00004778 [03:11:33] Epoch: 1 Batch: 8438/38378 (21.99%) Loss: 2.112763 LR: 0.00004778 [03:11:35] Epoch: 1 Batch: 8439/38378 (21.99%) Loss: 2.018190 LR: 0.00004778 [03:11:36] Epoch: 1 Batch: 8440/38378 (21.99%) Loss: 1.893036 LR: 0.00004778 [03:11:38] Epoch: 1 Batch: 8441/38378 (21.99%) Loss: 2.145040 LR: 0.00004777 [03:11:40] Epoch: 1 Batch: 8442/38378 (22.00%) Loss: 2.066367 LR: 0.00004777 [03:11:42] Epoch: 1 Batch: 8443/38378 (22.00%) Loss: 1.897894 LR: 0.00004777 [03:11:44] Epoch: 1 Batch: 8444/38378 (22.00%) Loss: 2.273404 LR: 0.00004777 [03:11:45] Epoch: 1 Batch: 8445/38378 (22.00%) Loss: 2.177021 LR: 0.00004777 [03:11:47] Epoch: 1 Batch: 8446/38378 (22.01%) Loss: 2.084403 LR: 0.00004777 [03:11:49] Epoch: 1 Batch: 8447/38378 (22.01%) Loss: 1.969411 LR: 0.00004777 [03:11:51] Epoch: 1 Batch: 8448/38378 (22.01%) Loss: 1.981872 LR: 0.00004777 [03:11:53] Epoch: 1 Batch: 8449/38378 (22.02%) Loss: 2.078520 LR: 0.00004777 [03:11:55] Epoch: 1 Batch: 8450/38378 (22.02%) Loss: 2.213328 LR: 0.00004777 [03:11:56] Epoch: 1 Batch: 8451/38378 (22.02%) Loss: 1.888661 LR: 0.00004777 [03:11:58] Epoch: 1 Batch: 8452/38378 (22.02%) Loss: 2.004865 LR: 0.00004777 [03:12:00] Epoch: 1 Batch: 8453/38378 (22.03%) Loss: 1.985460 LR: 0.00004777 [03:12:02] Epoch: 1 Batch: 8454/38378 (22.03%) Loss: 1.890639 LR: 0.00004777 [03:12:04] Epoch: 1 Batch: 8455/38378 (22.03%) Loss: 2.057177 LR: 0.00004776 [03:12:06] Epoch: 1 
Batch: 8456/38378 (22.03%) Loss: 2.199497 LR: 0.00004776 [03:12:07] Epoch: 1 Batch: 8457/38378 (22.04%) Loss: 2.089747 LR: 0.00004776 [03:12:09] Epoch: 1 Batch: 8458/38378 (22.04%) Loss: 1.667341 LR: 0.00004776 [03:12:11] Epoch: 1 Batch: 8459/38378 (22.04%) Loss: 2.117139 LR: 0.00004776 [03:12:13] Epoch: 1 Batch: 8460/38378 (22.04%) Loss: 2.141109 LR: 0.00004776 [03:12:15] Epoch: 1 Batch: 8461/38378 (22.05%) Loss: 1.966746 LR: 0.00004776 [03:12:16] Epoch: 1 Batch: 8462/38378 (22.05%) Loss: 1.929034 LR: 0.00004775 [03:12:18] Epoch: 1 Batch: 8463/38378 (22.05%) Loss: 2.341074 LR: 0.00004775 [03:12:20] Epoch: 1 Batch: 8464/38378 (22.05%) Loss: 1.964143 LR: 0.00004775 [03:12:22] Epoch: 1 Batch: 8465/38378 (22.06%) Loss: 2.011792 LR: 0.00004775 [03:12:24] Epoch: 1 Batch: 8466/38378 (22.06%) Loss: 1.819196 LR: 0.00004775 [03:12:25] Epoch: 1 Batch: 8467/38378 (22.06%) Loss: 2.101501 LR: 0.00004775 [03:12:27] Epoch: 1 Batch: 8468/38378 (22.06%) Loss: 2.427652 LR: 0.00004775 [03:12:29] Epoch: 1 Batch: 8469/38378 (22.07%) Loss: 2.115909 LR: 0.00004775 [03:12:31] Epoch: 1 Batch: 8470/38378 (22.07%) Loss: 1.754575 LR: 0.00004775 [03:12:33] Epoch: 1 Batch: 8471/38378 (22.07%) Loss: 2.298028 LR: 0.00004775 [03:12:34] Epoch: 1 Batch: 8472/38378 (22.08%) Loss: 2.348003 LR: 0.00004775 [03:12:36] Epoch: 1 Batch: 8473/38378 (22.08%) Loss: 1.897339 LR: 0.00004775 [03:12:38] Epoch: 1 Batch: 8474/38378 (22.08%) Loss: 1.921705 LR: 0.00004775 [03:12:40] Epoch: 1 Batch: 8475/38378 (22.08%) Loss: 1.891410 LR: 0.00004775 [03:12:42] Epoch: 1 Batch: 8476/38378 (22.09%) Loss: 2.406947 LR: 0.00004774 [03:12:43] Epoch: 1 Batch: 8477/38378 (22.09%) Loss: 1.887209 LR: 0.00004774 [03:12:45] Epoch: 1 Batch: 8478/38378 (22.09%) Loss: 2.126408 LR: 0.00004774 [03:12:47] Epoch: 1 Batch: 8479/38378 (22.09%) Loss: 1.609110 LR: 0.00004774 [03:12:49] Epoch: 1 Batch: 8480/38378 (22.10%) Loss: 1.953901 LR: 0.00004774 [03:12:51] Epoch: 1 Batch: 8481/38378 (22.10%) Loss: 1.820273 LR: 0.00004774 [03:12:52] Epoch: 1 Batch: 8482/38378 (22.10%) Loss: 2.125658 LR: 0.00004774 [03:12:54] Epoch: 1 Batch: 8483/38378 (22.10%) Loss: 1.975016 LR: 0.00004774 [03:12:56] Epoch: 1 Batch: 8484/38378 (22.11%) Loss: 1.915442 LR: 0.00004774 [03:12:58] Epoch: 1 Batch: 8485/38378 (22.11%) Loss: 1.983678 LR: 0.00004774 [03:13:00] Epoch: 1 Batch: 8486/38378 (22.11%) Loss: 2.203883 LR: 0.00004774 [03:13:02] Epoch: 1 Batch: 8487/38378 (22.11%) Loss: 1.840296 LR: 0.00004774 [03:13:03] Epoch: 1 Batch: 8488/38378 (22.12%) Loss: 2.159626 LR: 0.00004774 [03:13:05] Epoch: 1 Batch: 8489/38378 (22.12%) Loss: 2.097397 LR: 0.00004774 [03:13:07] Epoch: 1 Batch: 8490/38378 (22.12%) Loss: 1.942394 LR: 0.00004773 [03:13:09] Epoch: 1 Batch: 8491/38378 (22.12%) Loss: 2.332338 LR: 0.00004773 [03:13:11] Epoch: 1 Batch: 8492/38378 (22.13%) Loss: 1.895753 LR: 0.00004773 [03:13:13] Epoch: 1 Batch: 8493/38378 (22.13%) Loss: 2.018970 LR: 0.00004773 [03:13:14] Epoch: 1 Batch: 8494/38378 (22.13%) Loss: 1.828760 LR: 0.00004773 [03:13:16] Epoch: 1 Batch: 8495/38378 (22.14%) Loss: 1.741158 LR: 0.00004773 [03:13:18] Epoch: 1 Batch: 8496/38378 (22.14%) Loss: 1.890747 LR: 0.00004773 [03:13:20] Epoch: 1 Batch: 8497/38378 (22.14%) Loss: 1.778926 LR: 0.00004773 [03:13:22] Epoch: 1 Batch: 8498/38378 (22.14%) Loss: 1.874176 LR: 0.00004773 [03:13:23] Epoch: 1 Batch: 8499/38378 (22.15%) Loss: 2.311595 LR: 0.00004773 [03:13:25] >> Evaluating batch 0 [03:13:26] >> Evaluating batch 1 [03:13:27] >> Evaluating batch 2 [03:13:28] >> Evaluating batch 3 [03:13:29] >> Evaluating batch 4 [03:13:30] >> 
Evaluating batch 5 [03:13:31] >> Evaluating batch 6 [03:13:32] >> Evaluating batch 7 [03:13:33] >> Evaluating batch 8 [03:13:34] >> Evaluating batch 9 [03:13:35] >> Evaluating batch 10 [03:13:36] >> Evaluating batch 11 [03:13:37] >> Evaluating batch 12 [03:13:38] >> Evaluating batch 13 [03:13:39] >> Evaluating batch 14 [03:13:40] >> Evaluating batch 15 [03:13:41] >> Evaluating batch 16 [03:13:42] Epoch: 1 Step: 8500/38378 Evaluation: [03:13:42] Avg Loss Since Last Eval: 2.0154 Val Loss: 2.1200 Validation loss delta: -0.0055 Perplexity: 8.3311 LR: 0.00004773 [03:13:46] >> Checkpoint saved: epoch1_step8500, size: 0.1702 GB [03:13:46] Epoch: 1 Batch: 8500/38378 (22.15%) Loss: 1.995334 LR: 0.00004773 [03:13:48] Epoch: 1 Batch: 8501/38378 (22.15%) Loss: 2.084596 LR: 0.00004773 [03:13:50] Epoch: 1 Batch: 8502/38378 (22.15%) Loss: 1.994964 LR: 0.00004773 [03:13:52] Epoch: 1 Batch: 8503/38378 (22.16%) Loss: 1.761729 LR: 0.00004773 [03:13:53] Epoch: 1 Batch: 8504/38378 (22.16%) Loss: 1.876814 LR: 0.00004772 [03:13:55] Epoch: 1 Batch: 8505/38378 (22.16%) Loss: 1.964409 LR: 0.00004772 [03:13:57] Epoch: 1 Batch: 8506/38378 (22.16%) Loss: 1.907187 LR: 0.00004772 [03:13:59] Epoch: 1 Batch: 8507/38378 (22.17%) Loss: 1.743956 LR: 0.00004772 [03:14:01] Epoch: 1 Batch: 8508/38378 (22.17%) Loss: 1.993391 LR: 0.00004772 [03:14:02] Epoch: 1 Batch: 8509/38378 (22.17%) Loss: 2.286578 LR: 0.00004772 [03:14:04] Epoch: 1 Batch: 8510/38378 (22.17%) Loss: 2.102301 LR: 0.00004772 [03:14:06] Epoch: 1 Batch: 8511/38378 (22.18%) Loss: 2.302190 LR: 0.00004771 [03:14:08] Epoch: 1 Batch: 8512/38378 (22.18%) Loss: 2.048070 LR: 0.00004771 [03:14:10] Epoch: 1 Batch: 8513/38378 (22.18%) Loss: 2.093873 LR: 0.00004771 [03:14:12] Epoch: 1 Batch: 8514/38378 (22.18%) Loss: 2.072031 LR: 0.00004771 [03:14:14] Epoch: 1 Batch: 8515/38378 (22.19%) Loss: 2.160517 LR: 0.00004771 [03:14:15] Epoch: 1 Batch: 8516/38378 (22.19%) Loss: 2.016633 LR: 0.00004771 [03:14:17] Epoch: 1 Batch: 8517/38378 (22.19%) Loss: 1.932083 LR: 0.00004771 [03:14:19] Epoch: 1 Batch: 8518/38378 (22.20%) Loss: 2.069827 LR: 0.00004771 [03:14:21] Epoch: 1 Batch: 8519/38378 (22.20%) Loss: 1.715482 LR: 0.00004771 [03:14:23] Epoch: 1 Batch: 8520/38378 (22.20%) Loss: 1.800007 LR: 0.00004771 [03:14:25] Epoch: 1 Batch: 8521/38378 (22.20%) Loss: 2.102190 LR: 0.00004771 [03:14:26] Epoch: 1 Batch: 8522/38378 (22.21%) Loss: 2.060934 LR: 0.00004771 [03:14:28] Epoch: 1 Batch: 8523/38378 (22.21%) Loss: 2.181480 LR: 0.00004771 [03:14:30] Epoch: 1 Batch: 8524/38378 (22.21%) Loss: 1.830884 LR: 0.00004771 [03:14:32] Epoch: 1 Batch: 8525/38378 (22.21%) Loss: 1.841083 LR: 0.00004770 [03:14:34] Epoch: 1 Batch: 8526/38378 (22.22%) Loss: 1.940845 LR: 0.00004770 [03:14:36] Epoch: 1 Batch: 8527/38378 (22.22%) Loss: 1.891014 LR: 0.00004770 [03:14:37] Epoch: 1 Batch: 8528/38378 (22.22%) Loss: 1.931740 LR: 0.00004770 [03:14:39] Epoch: 1 Batch: 8529/38378 (22.22%) Loss: 2.107671 LR: 0.00004770 [03:14:41] Epoch: 1 Batch: 8530/38378 (22.23%) Loss: 2.150391 LR: 0.00004770 [03:14:43] Epoch: 1 Batch: 8531/38378 (22.23%) Loss: 2.197390 LR: 0.00004770 [03:14:45] Epoch: 1 Batch: 8532/38378 (22.23%) Loss: 1.807903 LR: 0.00004770 [03:14:46] Epoch: 1 Batch: 8533/38378 (22.23%) Loss: 1.910434 LR: 0.00004770 [03:14:48] Epoch: 1 Batch: 8534/38378 (22.24%) Loss: 2.127063 LR: 0.00004770 [03:14:50] Epoch: 1 Batch: 8535/38378 (22.24%) Loss: 2.052518 LR: 0.00004770 [03:14:52] Epoch: 1 Batch: 8536/38378 (22.24%) Loss: 2.168076 LR: 0.00004770 [03:14:54] Epoch: 1 Batch: 8537/38378 (22.24%) Loss: 1.994446 LR:
0.00004770 [03:14:55] Epoch: 1 Batch: 8538/38378 (22.25%) Loss: 2.055612 LR: 0.00004770 [03:14:57] Epoch: 1 Batch: 8539/38378 (22.25%) Loss: 1.900748 LR: 0.00004769 [03:14:59] Epoch: 1 Batch: 8540/38378 (22.25%) Loss: 1.908839 LR: 0.00004769 [03:15:01] Epoch: 1 Batch: 8541/38378 (22.25%) Loss: 2.022840 LR: 0.00004769 [03:15:03] Epoch: 1 Batch: 8542/38378 (22.26%) Loss: 2.155651 LR: 0.00004769 [03:15:04] Epoch: 1 Batch: 8543/38378 (22.26%) Loss: 2.073529 LR: 0.00004769 [03:15:06] Epoch: 1 Batch: 8544/38378 (22.26%) Loss: 2.147232 LR: 0.00004769 [03:15:08] Epoch: 1 Batch: 8545/38378 (22.27%) Loss: 1.995788 LR: 0.00004769 [03:15:10] Epoch: 1 Batch: 8546/38378 (22.27%) Loss: 1.955246 LR: 0.00004769 [03:15:12] Epoch: 1 Batch: 8547/38378 (22.27%) Loss: 2.135294 LR: 0.00004769 [03:15:14] Epoch: 1 Batch: 8548/38378 (22.27%) Loss: 2.036642 LR: 0.00004769 [03:15:15] Epoch: 1 Batch: 8549/38378 (22.28%) Loss: 1.946110 LR: 0.00004769 [03:15:17] Epoch: 1 Batch: 8550/38378 (22.28%) Loss: 2.002348 LR: 0.00004769 [03:15:19] Epoch: 1 Batch: 8551/38378 (22.28%) Loss: 1.684548 LR: 0.00004769 [03:15:21] Epoch: 1 Batch: 8552/38378 (22.28%) Loss: 2.019252 LR: 0.00004769 [03:15:23] Epoch: 1 Batch: 8553/38378 (22.29%) Loss: 2.056675 LR: 0.00004768 [03:15:24] Epoch: 1 Batch: 8554/38378 (22.29%) Loss: 1.893888 LR: 0.00004768 [03:15:26] Epoch: 1 Batch: 8555/38378 (22.29%) Loss: 1.791684 LR: 0.00004768 [03:15:28] Epoch: 1 Batch: 8556/38378 (22.29%) Loss: 2.134743 LR: 0.00004768 [03:15:30] Epoch: 1 Batch: 8557/38378 (22.30%) Loss: 1.946015 LR: 0.00004768 [03:15:32] Epoch: 1 Batch: 8558/38378 (22.30%) Loss: 2.227021 LR: 0.00004768 [03:15:33] Epoch: 1 Batch: 8559/38378 (22.30%) Loss: 1.916670 LR: 0.00004768 [03:15:35] Epoch: 1 Batch: 8560/38378 (22.30%) Loss: 1.970124 LR: 0.00004767 [03:15:37] Epoch: 1 Batch: 8561/38378 (22.31%) Loss: 2.262810 LR: 0.00004767 [03:15:39] Epoch: 1 Batch: 8562/38378 (22.31%) Loss: 2.249086 LR: 0.00004767 [03:15:41] Epoch: 1 Batch: 8563/38378 (22.31%) Loss: 2.092540 LR: 0.00004767 [03:15:43] Epoch: 1 Batch: 8564/38378 (22.31%) Loss: 2.341392 LR: 0.00004767 [03:15:44] Epoch: 1 Batch: 8565/38378 (22.32%) Loss: 1.928578 LR: 0.00004767 [03:15:46] Epoch: 1 Batch: 8566/38378 (22.32%) Loss: 1.854232 LR: 0.00004767 [03:15:48] Epoch: 1 Batch: 8567/38378 (22.32%) Loss: 1.657388 LR: 0.00004767 [03:15:50] Epoch: 1 Batch: 8568/38378 (22.33%) Loss: 2.023746 LR: 0.00004767 [03:15:52] Epoch: 1 Batch: 8569/38378 (22.33%) Loss: 1.879796 LR: 0.00004767 [03:15:53] Epoch: 1 Batch: 8570/38378 (22.33%) Loss: 2.027297 LR: 0.00004767 [03:15:55] Epoch: 1 Batch: 8571/38378 (22.33%) Loss: 1.779654 LR: 0.00004767 [03:15:57] Epoch: 1 Batch: 8572/38378 (22.34%) Loss: 1.962526 LR: 0.00004767 [03:15:59] Epoch: 1 Batch: 8573/38378 (22.34%) Loss: 2.376302 LR: 0.00004767 [03:16:01] Epoch: 1 Batch: 8574/38378 (22.34%) Loss: 2.296014 LR: 0.00004766 [03:16:03] Epoch: 1 Batch: 8575/38378 (22.34%) Loss: 1.950207 LR: 0.00004766 [03:16:04] Epoch: 1 Batch: 8576/38378 (22.35%) Loss: 2.234977 LR: 0.00004766 [03:16:06] Epoch: 1 Batch: 8577/38378 (22.35%) Loss: 2.264228 LR: 0.00004766 [03:16:08] Epoch: 1 Batch: 8578/38378 (22.35%) Loss: 1.924372 LR: 0.00004766 [03:16:10] Epoch: 1 Batch: 8579/38378 (22.35%) Loss: 1.649318 LR: 0.00004766 [03:16:12] Epoch: 1 Batch: 8580/38378 (22.36%) Loss: 1.949658 LR: 0.00004766 [03:16:13] Epoch: 1 Batch: 8581/38378 (22.36%) Loss: 2.143515 LR: 0.00004766 [03:16:15] Epoch: 1 Batch: 8582/38378 (22.36%) Loss: 2.149496 LR: 0.00004766 [03:16:17] Epoch: 1 Batch: 8583/38378 (22.36%) Loss: 2.155423 LR: 0.00004766 
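Both evaluations in this section follow the same recipe: the 100 validation samples run as 17 batches of val_batch_size = 6, Val Loss is the mean batch loss, Perplexity is its exponential (exp(2.1255) ≈ 8.3771 at step 8000, exp(2.1200) ≈ 8.3311 at step 8500), and the delta is the change from the previous evaluation (2.1200 - 2.1255 = -0.0055; right after the resume there is no previous value, so the delta equals the val loss itself). A minimal sketch of that eval step, with hypothetical names (`evaluate` and a Hugging Face-style model whose forward returns `.loss` are assumptions, not the run's actual code):

```python
import math
import torch

@torch.no_grad()
def evaluate(model, val_loader, prev_val_loss=None):
    # Run every validation batch, average the per-batch loss, and report
    # perplexity as exp(val_loss), matching the numbers in the log.
    model.eval()
    losses = []
    for i, batch in enumerate(val_loader):
        print(f">> Evaluating batch {i}")
        losses.append(model(**batch).loss.item())
    model.train()
    val_loss = sum(losses) / len(losses)
    delta = val_loss - prev_val_loss if prev_val_loss is not None else val_loss
    return val_loss, delta, math.exp(val_loss)  # exp(2.1200) ≈ 8.3311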
[03:16:19] Epoch: 1 Batch: 8584/38378 (22.37%) Loss: 1.864259 LR: 0.00004766 [03:16:20] Epoch: 1 Batch: 8585/38378 (22.37%) Loss: 2.028039 LR: 0.00004766 [03:16:22] Epoch: 1 Batch: 8586/38378 (22.37%) Loss: 2.283403 LR: 0.00004766 [03:16:24] Epoch: 1 Batch: 8587/38378 (22.37%) Loss: 1.976132 LR: 0.00004766 [03:16:26] Epoch: 1 Batch: 8588/38378 (22.38%) Loss: 1.818585 LR: 0.00004765 [03:16:28] Epoch: 1 Batch: 8589/38378 (22.38%) Loss: 2.075837 LR: 0.00004765 [03:16:30] Epoch: 1 Batch: 8590/38378 (22.38%) Loss: 1.780525 LR: 0.00004765 [03:16:31] Epoch: 1 Batch: 8591/38378 (22.39%) Loss: 1.949432 LR: 0.00004765 [03:16:33] Epoch: 1 Batch: 8592/38378 (22.39%) Loss: 2.002723 LR: 0.00004765 [03:16:35] Epoch: 1 Batch: 8593/38378 (22.39%) Loss: 1.914204 LR: 0.00004765 [03:16:37] Epoch: 1 Batch: 8594/38378 (22.39%) Loss: 2.229866 LR: 0.00004765 [03:16:39] Epoch: 1 Batch: 8595/38378 (22.40%) Loss: 1.894504 LR: 0.00004764 [03:16:40] Epoch: 1 Batch: 8596/38378 (22.40%) Loss: 2.210873 LR: 0.00004764 [03:16:42] Epoch: 1 Batch: 8597/38378 (22.40%) Loss: 1.994613 LR: 0.00004764 [03:16:44] Epoch: 1 Batch: 8598/38378 (22.40%) Loss: 2.125347 LR: 0.00004764 [03:16:46] Epoch: 1 Batch: 8599/38378 (22.41%) Loss: 1.915160 LR: 0.00004764 [03:16:52] >> Cleaned up old temp checkpoint: epoch1_step7788 [03:16:52] >> Temp checkpoint saved: epoch1_step8600, size: 0.1702 GB [03:16:52] Epoch: 1 Batch: 8600/38378 (22.41%) Loss: 2.001736 LR: 0.00004764 [03:16:53] Epoch: 1 Batch: 8601/38378 (22.41%) Loss: 2.072408 LR: 0.00004764 [03:16:55] Epoch: 1 Batch: 8602/38378 (22.41%) Loss: 2.333053 LR: 0.00004764 [03:16:57] Epoch: 1 Batch: 8603/38378 (22.42%) Loss: 2.021613 LR: 0.00004764 [03:16:59] Epoch: 1 Batch: 8604/38378 (22.42%) Loss: 2.281033 LR: 0.00004764 [03:17:01] Epoch: 1 Batch: 8605/38378 (22.42%) Loss: 1.948945 LR: 0.00004764 [03:17:02] Epoch: 1 Batch: 8606/38378 (22.42%) Loss: 2.229289 LR: 0.00004764 [03:17:04] Epoch: 1 Batch: 8607/38378 (22.43%) Loss: 1.840835 LR: 0.00004764 [03:17:06] Epoch: 1 Batch: 8608/38378 (22.43%) Loss: 2.238033 LR: 0.00004764 [03:17:08] Epoch: 1 Batch: 8609/38378 (22.43%) Loss: 1.887048 LR: 0.00004763 [03:17:10] Epoch: 1 Batch: 8610/38378 (22.43%) Loss: 1.959313 LR: 0.00004763 [03:17:12] Epoch: 1 Batch: 8611/38378 (22.44%) Loss: 2.318200 LR: 0.00004763 [03:17:13] Epoch: 1 Batch: 8612/38378 (22.44%) Loss: 2.032145 LR: 0.00004763 [03:17:15] Epoch: 1 Batch: 8613/38378 (22.44%) Loss: 2.189077 LR: 0.00004763 [03:17:17] Epoch: 1 Batch: 8614/38378 (22.45%) Loss: 1.954774 LR: 0.00004763 [03:17:19] Epoch: 1 Batch: 8615/38378 (22.45%) Loss: 2.243237 LR: 0.00004763 [03:17:21] Epoch: 1 Batch: 8616/38378 (22.45%) Loss: 1.909718 LR: 0.00004763 [03:17:23] Epoch: 1 Batch: 8617/38378 (22.45%) Loss: 2.102046 LR: 0.00004763 [03:17:24] Epoch: 1 Batch: 8618/38378 (22.46%) Loss: 1.712692 LR: 0.00004763 [03:17:26] Epoch: 1 Batch: 8619/38378 (22.46%) Loss: 1.730050 LR: 0.00004763 [03:17:28] Epoch: 1 Batch: 8620/38378 (22.46%) Loss: 2.384350 LR: 0.00004763 [03:17:30] Epoch: 1 Batch: 8621/38378 (22.46%) Loss: 1.959595 LR: 0.00004763 [03:17:32] Epoch: 1 Batch: 8622/38378 (22.47%) Loss: 1.711236 LR: 0.00004763 [03:17:34] Epoch: 1 Batch: 8623/38378 (22.47%) Loss: 2.154005 LR: 0.00004762 [03:17:35] Epoch: 1 Batch: 8624/38378 (22.47%) Loss: 1.950876 LR: 0.00004762 [03:17:37] Epoch: 1 Batch: 8625/38378 (22.47%) Loss: 2.080679 LR: 0.00004762 [03:17:39] Epoch: 1 Batch: 8626/38378 (22.48%) Loss: 2.182483 LR: 0.00004762 [03:17:41] Epoch: 1 Batch: 8627/38378 (22.48%) Loss: 2.231094 LR: 0.00004762 [03:17:43] Epoch: 1 Batch: 
8628/38378 (22.48%) Loss: 2.066381 LR: 0.00004762 [03:17:44] Epoch: 1 Batch: 8629/38378 (22.48%) Loss: 2.017373 LR: 0.00004762 [03:17:46] Epoch: 1 Batch: 8630/38378 (22.49%) Loss: 2.079048 LR: 0.00004761 [03:17:48] Epoch: 1 Batch: 8631/38378 (22.49%) Loss: 1.718363 LR: 0.00004761 [03:17:50] Epoch: 1 Batch: 8632/38378 (22.49%) Loss: 2.003346 LR: 0.00004761 [03:17:52] Epoch: 1 Batch: 8633/38378 (22.49%) Loss: 1.930657 LR: 0.00004761 [03:17:53] Epoch: 1 Batch: 8634/38378 (22.50%) Loss: 1.836403 LR: 0.00004761 [03:17:55] Epoch: 1 Batch: 8635/38378 (22.50%) Loss: 1.866896 LR: 0.00004761 [03:17:57] Epoch: 1 Batch: 8636/38378 (22.50%) Loss: 2.183445 LR: 0.00004761 [03:17:59] Epoch: 1 Batch: 8637/38378 (22.51%) Loss: 1.888120 LR: 0.00004761 [03:18:01] Epoch: 1 Batch: 8638/38378 (22.51%) Loss: 1.753765 LR: 0.00004761 [03:18:03] Epoch: 1 Batch: 8639/38378 (22.51%) Loss: 1.871319 LR: 0.00004761 [03:18:04] Epoch: 1 Batch: 8640/38378 (22.51%) Loss: 1.935270 LR: 0.00004761 [03:18:06] Epoch: 1 Batch: 8641/38378 (22.52%) Loss: 2.194718 LR: 0.00004761 [03:18:08] Epoch: 1 Batch: 8642/38378 (22.52%) Loss: 2.026928 LR: 0.00004761 [03:18:10] Epoch: 1 Batch: 8643/38378 (22.52%) Loss: 1.996885 LR: 0.00004761 [03:18:12] Epoch: 1 Batch: 8644/38378 (22.52%) Loss: 1.971765 LR: 0.00004760 [03:18:13] Epoch: 1 Batch: 8645/38378 (22.53%) Loss: 1.877265 LR: 0.00004760 [03:18:15] Epoch: 1 Batch: 8646/38378 (22.53%) Loss: 2.292495 LR: 0.00004760 [03:18:17] Epoch: 1 Batch: 8647/38378 (22.53%) Loss: 2.166906 LR: 0.00004760 [03:18:19] Epoch: 1 Batch: 8648/38378 (22.53%) Loss: 2.006453 LR: 0.00004760 [03:18:21] Epoch: 1 Batch: 8649/38378 (22.54%) Loss: 2.061064 LR: 0.00004760 [03:18:22] Epoch: 1 Batch: 8650/38378 (22.54%) Loss: 1.830851 LR: 0.00004760 [03:18:24] Epoch: 1 Batch: 8651/38378 (22.54%) Loss: 1.672869 LR: 0.00004760 [03:18:26] Epoch: 1 Batch: 8652/38378 (22.54%) Loss: 2.051566 LR: 0.00004760 [03:18:28] Epoch: 1 Batch: 8653/38378 (22.55%) Loss: 1.788891 LR: 0.00004760 [03:18:30] Epoch: 1 Batch: 8654/38378 (22.55%) Loss: 1.789030 LR: 0.00004760 [03:18:32] Epoch: 1 Batch: 8655/38378 (22.55%) Loss: 2.014768 LR: 0.00004760 [03:18:33] Epoch: 1 Batch: 8656/38378 (22.55%) Loss: 1.895341 LR: 0.00004760 [03:18:35] Epoch: 1 Batch: 8657/38378 (22.56%) Loss: 2.044992 LR: 0.00004760 [03:18:37] Epoch: 1 Batch: 8658/38378 (22.56%) Loss: 2.133060 LR: 0.00004759 [03:18:39] Epoch: 1 Batch: 8659/38378 (22.56%) Loss: 2.143849 LR: 0.00004759 [03:18:41] Epoch: 1 Batch: 8660/38378 (22.57%) Loss: 2.045904 LR: 0.00004759 [03:18:42] Epoch: 1 Batch: 8661/38378 (22.57%) Loss: 2.028126 LR: 0.00004759 [03:18:44] Epoch: 1 Batch: 8662/38378 (22.57%) Loss: 1.896478 LR: 0.00004759 [03:18:46] Epoch: 1 Batch: 8663/38378 (22.57%) Loss: 2.332339 LR: 0.00004759 [03:18:48] Epoch: 1 Batch: 8664/38378 (22.58%) Loss: 1.937010 LR: 0.00004759 [03:18:50] Epoch: 1 Batch: 8665/38378 (22.58%) Loss: 2.140057 LR: 0.00004759 [03:18:51] Epoch: 1 Batch: 8666/38378 (22.58%) Loss: 2.142118 LR: 0.00004759 [03:18:53] Epoch: 1 Batch: 8667/38378 (22.58%) Loss: 2.081564 LR: 0.00004759 [03:18:55] Epoch: 1 Batch: 8668/38378 (22.59%) Loss: 2.176376 LR: 0.00004759 [03:18:57] Epoch: 1 Batch: 8669/38378 (22.59%) Loss: 1.913454 LR: 0.00004759 [03:18:59] Epoch: 1 Batch: 8670/38378 (22.59%) Loss: 1.925469 LR: 0.00004759 [03:19:01] Epoch: 1 Batch: 8671/38378 (22.59%) Loss: 2.301352 LR: 0.00004759 [03:19:02] Epoch: 1 Batch: 8672/38378 (22.60%) Loss: 1.927902 LR: 0.00004758 [03:19:04] Epoch: 1 Batch: 8673/38378 (22.60%) Loss: 2.025926 LR: 0.00004758 [03:19:06] Epoch: 1 Batch: 8674/38378 
(22.60%) Loss: 2.031440 LR: 0.00004758 [03:19:08] Epoch: 1 Batch: 8675/38378 (22.60%) Loss: 1.686026 LR: 0.00004758 [03:19:10] Epoch: 1 Batch: 8676/38378 (22.61%) Loss: 1.696853 LR: 0.00004758 [03:19:11] Epoch: 1 Batch: 8677/38378 (22.61%) Loss: 2.028439 LR: 0.00004758 [03:19:13] Epoch: 1 Batch: 8678/38378 (22.61%) Loss: 2.139505 LR: 0.00004758 [03:19:15] Epoch: 1 Batch: 8679/38378 (22.61%) Loss: 2.278124 LR: 0.00004757 [03:19:17] Epoch: 1 Batch: 8680/38378 (22.62%) Loss: 1.902090 LR: 0.00004757 [03:19:19] Epoch: 1 Batch: 8681/38378 (22.62%) Loss: 2.120574 LR: 0.00004757 [03:19:20] Epoch: 1 Batch: 8682/38378 (22.62%) Loss: 2.049294 LR: 0.00004757 [03:19:22] Epoch: 1 Batch: 8683/38378 (22.62%) Loss: 1.775980 LR: 0.00004757 [03:19:24] Epoch: 1 Batch: 8684/38378 (22.63%) Loss: 2.213257 LR: 0.00004757 [03:19:26] Epoch: 1 Batch: 8685/38378 (22.63%) Loss: 1.654033 LR: 0.00004757 [03:19:28] Epoch: 1 Batch: 8686/38378 (22.63%) Loss: 1.710364 LR: 0.00004757 [03:19:30] Epoch: 1 Batch: 8687/38378 (22.64%) Loss: 1.950726 LR: 0.00004757 [03:19:31] Epoch: 1 Batch: 8688/38378 (22.64%) Loss: 1.970679 LR: 0.00004757 [03:19:33] Epoch: 1 Batch: 8689/38378 (22.64%) Loss: 1.982458 LR: 0.00004757 [03:19:35] Epoch: 1 Batch: 8690/38378 (22.64%) Loss: 1.905606 LR: 0.00004757 [03:19:37] Epoch: 1 Batch: 8691/38378 (22.65%) Loss: 2.169738 LR: 0.00004757 [03:19:39] Epoch: 1 Batch: 8692/38378 (22.65%) Loss: 2.209201 LR: 0.00004757 [03:19:41] Epoch: 1 Batch: 8693/38378 (22.65%) Loss: 1.956914 LR: 0.00004756 [03:19:42] Epoch: 1 Batch: 8694/38378 (22.65%) Loss: 1.828718 LR: 0.00004756 [03:19:44] Epoch: 1 Batch: 8695/38378 (22.66%) Loss: 2.302418 LR: 0.00004756 [03:19:46] Epoch: 1 Batch: 8696/38378 (22.66%) Loss: 2.269065 LR: 0.00004756 [03:19:48] Epoch: 1 Batch: 8697/38378 (22.66%) Loss: 1.822239 LR: 0.00004756 [03:19:50] Epoch: 1 Batch: 8698/38378 (22.66%) Loss: 2.085698 LR: 0.00004756 [03:19:52] Epoch: 1 Batch: 8699/38378 (22.67%) Loss: 2.279650 LR: 0.00004756 [03:19:53] Epoch: 1 Batch: 8700/38378 (22.67%) Loss: 2.002047 LR: 0.00004756 [03:19:55] Epoch: 1 Batch: 8701/38378 (22.67%) Loss: 2.032129 LR: 0.00004756 [03:19:57] Epoch: 1 Batch: 8702/38378 (22.67%) Loss: 2.032751 LR: 0.00004756 [03:19:59] Epoch: 1 Batch: 8703/38378 (22.68%) Loss: 2.127845 LR: 0.00004756 [03:20:01] Epoch: 1 Batch: 8704/38378 (22.68%) Loss: 2.298009 LR: 0.00004756 [03:20:02] Epoch: 1 Batch: 8705/38378 (22.68%) Loss: 2.214371 LR: 0.00004756 [03:20:04] Epoch: 1 Batch: 8706/38378 (22.68%) Loss: 1.936682 LR: 0.00004756 [03:20:06] Epoch: 1 Batch: 8707/38378 (22.69%) Loss: 2.113154 LR: 0.00004755 [03:20:08] Epoch: 1 Batch: 8708/38378 (22.69%) Loss: 2.030934 LR: 0.00004755 [03:20:10] Epoch: 1 Batch: 8709/38378 (22.69%) Loss: 2.111720 LR: 0.00004755 [03:20:12] Epoch: 1 Batch: 8710/38378 (22.70%) Loss: 2.046757 LR: 0.00004755 [03:20:13] Epoch: 1 Batch: 8711/38378 (22.70%) Loss: 2.027663 LR: 0.00004755 [03:20:15] Epoch: 1 Batch: 8712/38378 (22.70%) Loss: 2.053302 LR: 0.00004755 [03:20:17] Epoch: 1 Batch: 8713/38378 (22.70%) Loss: 2.093789 LR: 0.00004755 [03:20:19] Epoch: 1 Batch: 8714/38378 (22.71%) Loss: 2.143325 LR: 0.00004754 [03:20:21] Epoch: 1 Batch: 8715/38378 (22.71%) Loss: 2.101476 LR: 0.00004754 [03:20:22] Epoch: 1 Batch: 8716/38378 (22.71%) Loss: 2.307654 LR: 0.00004754 [03:20:24] Epoch: 1 Batch: 8717/38378 (22.71%) Loss: 1.686311 LR: 0.00004754 [03:20:26] Epoch: 1 Batch: 8718/38378 (22.72%) Loss: 2.082305 LR: 0.00004754 [03:20:28] Epoch: 1 Batch: 8719/38378 (22.72%) Loss: 1.949171 LR: 0.00004754 [03:20:30] Epoch: 1 Batch: 8720/38378 (22.72%) 
Loss: 2.350704 LR: 0.00004754 [03:20:31] Epoch: 1 Batch: 8721/38378 (22.72%) Loss: 2.156203 LR: 0.00004754 [03:20:33] Epoch: 1 Batch: 8722/38378 (22.73%) Loss: 2.236906 LR: 0.00004754 [03:20:35] Epoch: 1 Batch: 8723/38378 (22.73%) Loss: 1.945807 LR: 0.00004754 [03:20:37] Epoch: 1 Batch: 8724/38378 (22.73%) Loss: 2.106501 LR: 0.00004754 [03:20:39] Epoch: 1 Batch: 8725/38378 (22.73%) Loss: 1.784138 LR: 0.00004754 [03:20:40] Epoch: 1 Batch: 8726/38378 (22.74%) Loss: 2.046939 LR: 0.00004754 [03:20:42] Epoch: 1 Batch: 8727/38378 (22.74%) Loss: 1.721466 LR: 0.00004754 [03:20:44] Epoch: 1 Batch: 8728/38378 (22.74%) Loss: 1.904638 LR: 0.00004753 [03:20:46] Epoch: 1 Batch: 8729/38378 (22.74%) Loss: 1.875475 LR: 0.00004753 [03:20:48] Epoch: 1 Batch: 8730/38378 (22.75%) Loss: 1.731823 LR: 0.00004753 [03:20:50] Epoch: 1 Batch: 8731/38378 (22.75%) Loss: 2.232054 LR: 0.00004753 [03:20:51] Epoch: 1 Batch: 8732/38378 (22.75%) Loss: 2.089412 LR: 0.00004753 [03:20:53] Epoch: 1 Batch: 8733/38378 (22.76%) Loss: 1.762655 LR: 0.00004753 [03:20:55] Epoch: 1 Batch: 8734/38378 (22.76%) Loss: 1.842425 LR: 0.00004753 [03:20:57] Epoch: 1 Batch: 8735/38378 (22.76%) Loss: 1.904517 LR: 0.00004753 [03:20:59] Epoch: 1 Batch: 8736/38378 (22.76%) Loss: 2.131936 LR: 0.00004753 [03:21:00] Epoch: 1 Batch: 8737/38378 (22.77%) Loss: 2.143731 LR: 0.00004753 [03:21:02] Epoch: 1 Batch: 8738/38378 (22.77%) Loss: 1.823126 LR: 0.00004753 [03:21:04] Epoch: 1 Batch: 8739/38378 (22.77%) Loss: 1.925990 LR: 0.00004753 [03:21:06] Epoch: 1 Batch: 8740/38378 (22.77%) Loss: 2.120471 LR: 0.00004753 [03:21:08] Epoch: 1 Batch: 8741/38378 (22.78%) Loss: 2.104345 LR: 0.00004753 [03:21:10] Epoch: 1 Batch: 8742/38378 (22.78%) Loss: 2.108690 LR: 0.00004752 [03:21:11] Epoch: 1 Batch: 8743/38378 (22.78%) Loss: 2.115558 LR: 0.00004752 [03:21:13] Epoch: 1 Batch: 8744/38378 (22.78%) Loss: 1.787086 LR: 0.00004752 [03:21:15] Epoch: 1 Batch: 8745/38378 (22.79%) Loss: 1.896527 LR: 0.00004752 [03:21:17] Epoch: 1 Batch: 8746/38378 (22.79%) Loss: 2.189379 LR: 0.00004752 [03:21:19] Epoch: 1 Batch: 8747/38378 (22.79%) Loss: 1.805947 LR: 0.00004752 [03:21:20] Epoch: 1 Batch: 8748/38378 (22.79%) Loss: 1.862388 LR: 0.00004752 [03:21:22] Epoch: 1 Batch: 8749/38378 (22.80%) Loss: 2.159337 LR: 0.00004751 [03:21:24] Epoch: 1 Batch: 8750/38378 (22.80%) Loss: 2.038173 LR: 0.00004751 [03:21:26] Epoch: 1 Batch: 8751/38378 (22.80%) Loss: 1.735138 LR: 0.00004751 [03:21:28] Epoch: 1 Batch: 8752/38378 (22.80%) Loss: 1.952540 LR: 0.00004751 [03:21:29] Epoch: 1 Batch: 8753/38378 (22.81%) Loss: 2.106268 LR: 0.00004751 [03:21:31] Epoch: 1 Batch: 8754/38378 (22.81%) Loss: 1.807743 LR: 0.00004751 [03:21:33] Epoch: 1 Batch: 8755/38378 (22.81%) Loss: 1.892464 LR: 0.00004751 [03:21:35] Epoch: 1 Batch: 8756/38378 (22.82%) Loss: 1.952630 LR: 0.00004751 [03:21:37] Epoch: 1 Batch: 8757/38378 (22.82%) Loss: 1.820390 LR: 0.00004751 [03:21:39] Epoch: 1 Batch: 8758/38378 (22.82%) Loss: 2.135293 LR: 0.00004751 [03:21:40] Epoch: 1 Batch: 8759/38378 (22.82%) Loss: 2.080506 LR: 0.00004751 [03:21:42] Epoch: 1 Batch: 8760/38378 (22.83%) Loss: 1.934732 LR: 0.00004751 [03:21:44] Epoch: 1 Batch: 8761/38378 (22.83%) Loss: 2.015195 LR: 0.00004751 [03:21:46] Epoch: 1 Batch: 8762/38378 (22.83%) Loss: 1.896310 LR: 0.00004751 [03:21:48] Epoch: 1 Batch: 8763/38378 (22.83%) Loss: 2.250939 LR: 0.00004750 [03:21:49] Epoch: 1 Batch: 8764/38378 (22.84%) Loss: 1.704272 LR: 0.00004750 [03:21:51] Epoch: 1 Batch: 8765/38378 (22.84%) Loss: 1.705493 LR: 0.00004750 [03:21:53] Epoch: 1 Batch: 8766/38378 (22.84%) Loss: 
1.983360 LR: 0.00004750 [03:21:55] Epoch: 1 Batch: 8767/38378 (22.84%) Loss: 2.078374 LR: 0.00004750 [03:21:57] Epoch: 1 Batch: 8768/38378 (22.85%) Loss: 2.003416 LR: 0.00004750 [03:21:58] Epoch: 1 Batch: 8769/38378 (22.85%) Loss: 2.185097 LR: 0.00004750 [03:22:00] Epoch: 1 Batch: 8770/38378 (22.85%) Loss: 2.457951 LR: 0.00004750 [03:22:02] Epoch: 1 Batch: 8771/38378 (22.85%) Loss: 1.935084 LR: 0.00004750 [03:22:04] Epoch: 1 Batch: 8772/38378 (22.86%) Loss: 2.153690 LR: 0.00004750 [03:22:06] Epoch: 1 Batch: 8773/38378 (22.86%) Loss: 2.018250 LR: 0.00004750 [03:22:07] Epoch: 1 Batch: 8774/38378 (22.86%) Loss: 2.283141 LR: 0.00004750 [03:22:09] Epoch: 1 Batch: 8775/38378 (22.86%) Loss: 1.917776 LR: 0.00004750 [03:22:11] Epoch: 1 Batch: 8776/38378 (22.87%) Loss: 2.053624 LR: 0.00004750 [03:22:13] Epoch: 1 Batch: 8777/38378 (22.87%) Loss: 2.057859 LR: 0.00004749 [03:22:15] Epoch: 1 Batch: 8778/38378 (22.87%) Loss: 1.949018 LR: 0.00004749 [03:22:17] Epoch: 1 Batch: 8779/38378 (22.88%) Loss: 2.074105 LR: 0.00004749 [03:22:18] Epoch: 1 Batch: 8780/38378 (22.88%) Loss: 1.842592 LR: 0.00004749 [03:22:20] Epoch: 1 Batch: 8781/38378 (22.88%) Loss: 2.047801 LR: 0.00004749 [03:22:22] Epoch: 1 Batch: 8782/38378 (22.88%) Loss: 2.070447 LR: 0.00004749 [03:22:24] Epoch: 1 Batch: 8783/38378 (22.89%) Loss: 2.086492 LR: 0.00004749 [03:22:26] Epoch: 1 Batch: 8784/38378 (22.89%) Loss: 1.980670 LR: 0.00004748 [03:22:27] Epoch: 1 Batch: 8785/38378 (22.89%) Loss: 2.132327 LR: 0.00004748 [03:22:29] Epoch: 1 Batch: 8786/38378 (22.89%) Loss: 2.271658 LR: 0.00004748 [03:22:31] Epoch: 1 Batch: 8787/38378 (22.90%) Loss: 1.882832 LR: 0.00004748 [03:22:33] Epoch: 1 Batch: 8788/38378 (22.90%) Loss: 1.920425 LR: 0.00004748 [03:22:35] Epoch: 1 Batch: 8789/38378 (22.90%) Loss: 1.951075 LR: 0.00004748 [03:22:37] Epoch: 1 Batch: 8790/38378 (22.90%) Loss: 1.998548 LR: 0.00004748 [03:22:38] Epoch: 1 Batch: 8791/38378 (22.91%) Loss: 2.268683 LR: 0.00004748 [03:22:40] Epoch: 1 Batch: 8792/38378 (22.91%) Loss: 1.921460 LR: 0.00004748 [03:22:42] Epoch: 1 Batch: 8793/38378 (22.91%) Loss: 1.840277 LR: 0.00004748 [03:22:44] Epoch: 1 Batch: 8794/38378 (22.91%) Loss: 2.157662 LR: 0.00004748 [03:22:46] Epoch: 1 Batch: 8795/38378 (22.92%) Loss: 1.986957 LR: 0.00004748 [03:22:47] Epoch: 1 Batch: 8796/38378 (22.92%) Loss: 1.839571 LR: 0.00004748 [03:22:49] Epoch: 1 Batch: 8797/38378 (22.92%) Loss: 2.008726 LR: 0.00004748 [03:22:51] Epoch: 1 Batch: 8798/38378 (22.92%) Loss: 2.149051 LR: 0.00004747 [03:22:53] Epoch: 1 Batch: 8799/38378 (22.93%) Loss: 2.004284 LR: 0.00004747 [03:22:59] >> Cleaned up old temp checkpoint: epoch1_step7821 [03:22:59] >> Temp checkpoint saved: epoch1_step8800, size: 0.1702 GB [03:22:59] Epoch: 1 Batch: 8800/38378 (22.93%) Loss: 1.910388 LR: 0.00004747 [03:23:01] Epoch: 1 Batch: 8801/38378 (22.93%) Loss: 1.881146 LR: 0.00004747 [03:23:02] Epoch: 1 Batch: 8802/38378 (22.94%) Loss: 1.843739 LR: 0.00004747 [03:23:04] Epoch: 1 Batch: 8803/38378 (22.94%) Loss: 1.978833 LR: 0.00004747 [03:23:06] Epoch: 1 Batch: 8804/38378 (22.94%) Loss: 2.222628 LR: 0.00004747 [03:23:08] Epoch: 1 Batch: 8805/38378 (22.94%) Loss: 2.033759 LR: 0.00004747 [03:23:10] Epoch: 1 Batch: 8806/38378 (22.95%) Loss: 2.276226 LR: 0.00004747 [03:23:11] Epoch: 1 Batch: 8807/38378 (22.95%) Loss: 1.932605 LR: 0.00004747 [03:23:13] Epoch: 1 Batch: 8808/38378 (22.95%) Loss: 1.888209 LR: 0.00004747 [03:23:15] Epoch: 1 Batch: 8809/38378 (22.95%) Loss: 2.132505 LR: 0.00004747 [03:23:17] Epoch: 1 Batch: 8810/38378 (22.96%) Loss: 1.935065 LR: 0.00004747 [03:23:19] 
Epoch: 1 Batch: 8811/38378 (22.96%) Loss: 2.053693 LR: 0.00004747 [03:23:20] Epoch: 1 Batch: 8812/38378 (22.96%) Loss: 2.138246 LR: 0.00004746 [03:23:22] Epoch: 1 Batch: 8813/38378 (22.96%) Loss: 1.974136 LR: 0.00004746 [03:23:24] Epoch: 1 Batch: 8814/38378 (22.97%) Loss: 2.050647 LR: 0.00004746 [03:23:26] Epoch: 1 Batch: 8815/38378 (22.97%) Loss: 1.926124 LR: 0.00004746 [03:23:28] Epoch: 1 Batch: 8816/38378 (22.97%) Loss: 2.175773 LR: 0.00004746 [03:23:30] Epoch: 1 Batch: 8817/38378 (22.97%) Loss: 2.235950 LR: 0.00004746 [03:23:32] Epoch: 1 Batch: 8818/38378 (22.98%) Loss: 1.912362 LR: 0.00004746 [03:23:33] Epoch: 1 Batch: 8819/38378 (22.98%) Loss: 1.936644 LR: 0.00004745 [03:23:35] Epoch: 1 Batch: 8820/38378 (22.98%) Loss: 2.153690 LR: 0.00004745 [03:23:37] Epoch: 1 Batch: 8821/38378 (22.98%) Loss: 1.738404 LR: 0.00004745 [03:23:39] Epoch: 1 Batch: 8822/38378 (22.99%) Loss: 1.996770 LR: 0.00004745 [03:23:41] Epoch: 1 Batch: 8823/38378 (22.99%) Loss: 1.834217 LR: 0.00004745 [03:23:43] Epoch: 1 Batch: 8824/38378 (22.99%) Loss: 1.742725 LR: 0.00004745 [03:23:44] Epoch: 1 Batch: 8825/38378 (22.99%) Loss: 2.026824 LR: 0.00004745 [03:23:46] Epoch: 1 Batch: 8826/38378 (23.00%) Loss: 2.001445 LR: 0.00004745 [03:23:48] Epoch: 1 Batch: 8827/38378 (23.00%) Loss: 1.472426 LR: 0.00004745 [03:23:50] Epoch: 1 Batch: 8828/38378 (23.00%) Loss: 1.874247 LR: 0.00004745 [03:23:52] Epoch: 1 Batch: 8829/38378 (23.01%) Loss: 1.955604 LR: 0.00004745 [03:23:54] Epoch: 1 Batch: 8830/38378 (23.01%) Loss: 2.004473 LR: 0.00004745 [03:23:55] Epoch: 1 Batch: 8831/38378 (23.01%) Loss: 2.078773 LR: 0.00004745 [03:23:57] Epoch: 1 Batch: 8832/38378 (23.01%) Loss: 2.005597 LR: 0.00004745 [03:23:59] Epoch: 1 Batch: 8833/38378 (23.02%) Loss: 1.738026 LR: 0.00004744 [03:24:01] Epoch: 1 Batch: 8834/38378 (23.02%) Loss: 2.102031 LR: 0.00004744 [03:24:03] Epoch: 1 Batch: 8835/38378 (23.02%) Loss: 2.284072 LR: 0.00004744 [03:24:05] Epoch: 1 Batch: 8836/38378 (23.02%) Loss: 2.011840 LR: 0.00004744 [03:24:06] Epoch: 1 Batch: 8837/38378 (23.03%) Loss: 2.410889 LR: 0.00004744 [03:24:08] Epoch: 1 Batch: 8838/38378 (23.03%) Loss: 1.912201 LR: 0.00004744 [03:24:10] Epoch: 1 Batch: 8839/38378 (23.03%) Loss: 1.673693 LR: 0.00004744 [03:24:12] Epoch: 1 Batch: 8840/38378 (23.03%) Loss: 2.082856 LR: 0.00004743 [03:24:13] Epoch: 1 Batch: 8841/38378 (23.04%) Loss: 2.375144 LR: 0.00004743 [03:24:15] Epoch: 1 Batch: 8842/38378 (23.04%) Loss: 1.772572 LR: 0.00004743 [03:24:17] Epoch: 1 Batch: 8843/38378 (23.04%) Loss: 2.213844 LR: 0.00004743 [03:24:19] Epoch: 1 Batch: 8844/38378 (23.04%) Loss: 2.164256 LR: 0.00004743 [03:24:21] Epoch: 1 Batch: 8845/38378 (23.05%) Loss: 2.272927 LR: 0.00004743 [03:24:22] Epoch: 1 Batch: 8846/38378 (23.05%) Loss: 2.013160 LR: 0.00004743 [03:24:24] Epoch: 1 Batch: 8847/38378 (23.05%) Loss: 1.853538 LR: 0.00004743 [03:24:26] Epoch: 1 Batch: 8848/38378 (23.05%) Loss: 2.048810 LR: 0.00004743 [03:24:28] Epoch: 1 Batch: 8849/38378 (23.06%) Loss: 1.783992 LR: 0.00004743 [03:24:30] Epoch: 1 Batch: 8850/38378 (23.06%) Loss: 1.859985 LR: 0.00004743 [03:24:31] Epoch: 1 Batch: 8851/38378 (23.06%) Loss: 2.035185 LR: 0.00004743 [03:24:33] Epoch: 1 Batch: 8852/38378 (23.07%) Loss: 1.990909 LR: 0.00004743 [03:24:35] Epoch: 1 Batch: 8853/38378 (23.07%) Loss: 2.016727 LR: 0.00004743 [03:24:37] Epoch: 1 Batch: 8854/38378 (23.07%) Loss: 1.943828 LR: 0.00004742 [03:24:39] Epoch: 1 Batch: 8855/38378 (23.07%) Loss: 2.107544 LR: 0.00004742 [03:24:40] Epoch: 1 Batch: 8856/38378 (23.08%) Loss: 2.043343 LR: 0.00004742 [03:24:42] Epoch: 1 
Batch: 8857/38378 (23.08%) Loss: 2.174233 LR: 0.00004742 [03:24:44] Epoch: 1 Batch: 8858/38378 (23.08%) Loss: 1.877971 LR: 0.00004742 [03:24:46] Epoch: 1 Batch: 8859/38378 (23.08%) Loss: 2.129687 LR: 0.00004742 [03:24:48] Epoch: 1 Batch: 8860/38378 (23.09%) Loss: 2.065465 LR: 0.00004742 [03:24:50] Epoch: 1 Batch: 8861/38378 (23.09%) Loss: 2.063010 LR: 0.00004742 [03:24:51] Epoch: 1 Batch: 8862/38378 (23.09%) Loss: 1.884736 LR: 0.00004742 [03:24:53] Epoch: 1 Batch: 8863/38378 (23.09%) Loss: 2.062076 LR: 0.00004742 [03:24:55] Epoch: 1 Batch: 8864/38378 (23.10%) Loss: 1.890201 LR: 0.00004742 [03:24:57] Epoch: 1 Batch: 8865/38378 (23.10%) Loss: 2.132063 LR: 0.00004742 [03:24:59] Epoch: 1 Batch: 8866/38378 (23.10%) Loss: 1.954474 LR: 0.00004742 [03:25:00] Epoch: 1 Batch: 8867/38378 (23.10%) Loss: 2.386152 LR: 0.00004742 [03:25:02] Epoch: 1 Batch: 8868/38378 (23.11%) Loss: 1.733114 LR: 0.00004741 [03:25:04] Epoch: 1 Batch: 8869/38378 (23.11%) Loss: 1.895670 LR: 0.00004741 [03:25:06] Epoch: 1 Batch: 8870/38378 (23.11%) Loss: 1.835846 LR: 0.00004741 [03:25:08] Epoch: 1 Batch: 8871/38378 (23.11%) Loss: 1.709631 LR: 0.00004741 [03:25:10] Epoch: 1 Batch: 8872/38378 (23.12%) Loss: 1.879787 LR: 0.00004741 [03:25:11] Epoch: 1 Batch: 8873/38378 (23.12%) Loss: 1.933324 LR: 0.00004741 [03:25:13] Epoch: 1 Batch: 8874/38378 (23.12%) Loss: 1.957975 LR: 0.00004741 [03:25:15] Epoch: 1 Batch: 8875/38378 (23.13%) Loss: 2.143047 LR: 0.00004740 [03:25:17] Epoch: 1 Batch: 8876/38378 (23.13%) Loss: 1.979751 LR: 0.00004740 [03:25:19] Epoch: 1 Batch: 8877/38378 (23.13%) Loss: 1.687283 LR: 0.00004740 [03:25:20] Epoch: 1 Batch: 8878/38378 (23.13%) Loss: 2.074564 LR: 0.00004740 [03:25:22] Epoch: 1 Batch: 8879/38378 (23.14%) Loss: 1.810762 LR: 0.00004740 [03:25:24] Epoch: 1 Batch: 8880/38378 (23.14%) Loss: 1.858242 LR: 0.00004740 [03:25:26] Epoch: 1 Batch: 8881/38378 (23.14%) Loss: 1.702003 LR: 0.00004740 [03:25:28] Epoch: 1 Batch: 8882/38378 (23.14%) Loss: 2.103293 LR: 0.00004740 [03:25:30] Epoch: 1 Batch: 8883/38378 (23.15%) Loss: 1.960972 LR: 0.00004740 [03:25:31] Epoch: 1 Batch: 8884/38378 (23.15%) Loss: 1.971506 LR: 0.00004740 [03:25:33] Epoch: 1 Batch: 8885/38378 (23.15%) Loss: 1.652198 LR: 0.00004740 [03:25:35] Epoch: 1 Batch: 8886/38378 (23.15%) Loss: 1.829023 LR: 0.00004740 [03:25:37] Epoch: 1 Batch: 8887/38378 (23.16%) Loss: 2.166625 LR: 0.00004740 [03:25:39] Epoch: 1 Batch: 8888/38378 (23.16%) Loss: 2.087285 LR: 0.00004740 [03:25:41] Epoch: 1 Batch: 8889/38378 (23.16%) Loss: 2.297620 LR: 0.00004739 [03:25:42] Epoch: 1 Batch: 8890/38378 (23.16%) Loss: 1.753732 LR: 0.00004739 [03:25:44] Epoch: 1 Batch: 8891/38378 (23.17%) Loss: 2.304194 LR: 0.00004739 [03:25:46] Epoch: 1 Batch: 8892/38378 (23.17%) Loss: 2.023731 LR: 0.00004739 [03:25:48] Epoch: 1 Batch: 8893/38378 (23.17%) Loss: 1.885469 LR: 0.00004739 [03:25:50] Epoch: 1 Batch: 8894/38378 (23.17%) Loss: 1.702207 LR: 0.00004739 [03:25:51] Epoch: 1 Batch: 8895/38378 (23.18%) Loss: 2.000852 LR: 0.00004739 [03:25:53] Epoch: 1 Batch: 8896/38378 (23.18%) Loss: 2.314128 LR: 0.00004739 [03:25:55] Epoch: 1 Batch: 8897/38378 (23.18%) Loss: 1.931459 LR: 0.00004739 [03:25:57] Epoch: 1 Batch: 8898/38378 (23.19%) Loss: 1.925254 LR: 0.00004739 [03:25:59] Epoch: 1 Batch: 8899/38378 (23.19%) Loss: 2.096242 LR: 0.00004739 [03:26:01] Epoch: 1 Batch: 8900/38378 (23.19%) Loss: 2.090897 LR: 0.00004739 [03:26:02] Epoch: 1 Batch: 8901/38378 (23.19%) Loss: 2.078765 LR: 0.00004739 [03:26:04] Epoch: 1 Batch: 8902/38378 (23.20%) Loss: 1.884718 LR: 0.00004739 [03:26:06] Epoch: 1 Batch: 
8903/38378 (23.20%) Loss: 2.139952 LR: 0.00004738 [03:26:08] Epoch: 1 Batch: 8904/38378 (23.20%) Loss: 1.986803 LR: 0.00004738 [03:26:09] Epoch: 1 Batch: 8905/38378 (23.20%) Loss: 1.908209 LR: 0.00004738 [03:26:11] Epoch: 1 Batch: 8906/38378 (23.21%) Loss: 1.773505 LR: 0.00004738 [03:26:13] Epoch: 1 Batch: 8907/38378 (23.21%) Loss: 2.027258 LR: 0.00004738 [03:26:15] Epoch: 1 Batch: 8908/38378 (23.21%) Loss: 2.069901 LR: 0.00004738 [03:26:17] Epoch: 1 Batch: 8909/38378 (23.21%) Loss: 2.182594 LR: 0.00004738 [03:26:19] Epoch: 1 Batch: 8910/38378 (23.22%) Loss: 1.855205 LR: 0.00004737 [03:26:20] Epoch: 1 Batch: 8911/38378 (23.22%) Loss: 1.969663 LR: 0.00004737 [03:26:22] Epoch: 1 Batch: 8912/38378 (23.22%) Loss: 2.196601 LR: 0.00004737 [03:26:24] Epoch: 1 Batch: 8913/38378 (23.22%) Loss: 2.011024 LR: 0.00004737 [03:26:26] Epoch: 1 Batch: 8914/38378 (23.23%) Loss: 1.966759 LR: 0.00004737 [03:26:28] Epoch: 1 Batch: 8915/38378 (23.23%) Loss: 1.978872 LR: 0.00004737 [03:26:30] Epoch: 1 Batch: 8916/38378 (23.23%) Loss: 2.092842 LR: 0.00004737 [03:26:31] Epoch: 1 Batch: 8917/38378 (23.23%) Loss: 2.130796 LR: 0.00004737 [03:26:33] Epoch: 1 Batch: 8918/38378 (23.24%) Loss: 1.870755 LR: 0.00004737 [03:26:35] Epoch: 1 Batch: 8919/38378 (23.24%) Loss: 1.655622 LR: 0.00004737 [03:26:37] Epoch: 1 Batch: 8920/38378 (23.24%) Loss: 2.026241 LR: 0.00004737 [03:26:39] Epoch: 1 Batch: 8921/38378 (23.25%) Loss: 1.970565 LR: 0.00004737 [03:26:40] Epoch: 1 Batch: 8922/38378 (23.25%) Loss: 2.198351 LR: 0.00004737 [03:26:42] Epoch: 1 Batch: 8923/38378 (23.25%) Loss: 1.910072 LR: 0.00004737 [03:26:44] Epoch: 1 Batch: 8924/38378 (23.25%) Loss: 2.052690 LR: 0.00004736 [03:26:46] Epoch: 1 Batch: 8925/38378 (23.26%) Loss: 2.137886 LR: 0.00004736 [03:26:48] Epoch: 1 Batch: 8926/38378 (23.26%) Loss: 2.155351 LR: 0.00004736 [03:26:50] Epoch: 1 Batch: 8927/38378 (23.26%) Loss: 1.755800 LR: 0.00004736 [03:26:51] Epoch: 1 Batch: 8928/38378 (23.26%) Loss: 2.122845 LR: 0.00004736 [03:26:53] Epoch: 1 Batch: 8929/38378 (23.27%) Loss: 2.432230 LR: 0.00004736 [03:26:55] Epoch: 1 Batch: 8930/38378 (23.27%) Loss: 1.896150 LR: 0.00004736 [03:26:57] Epoch: 1 Batch: 8931/38378 (23.27%) Loss: 2.050861 LR: 0.00004735 [03:26:59] Epoch: 1 Batch: 8932/38378 (23.27%) Loss: 2.095916 LR: 0.00004735 [03:27:01] Epoch: 1 Batch: 8933/38378 (23.28%) Loss: 2.177011 LR: 0.00004735 [03:27:02] Epoch: 1 Batch: 8934/38378 (23.28%) Loss: 1.932103 LR: 0.00004735 [03:27:04] Epoch: 1 Batch: 8935/38378 (23.28%) Loss: 1.977991 LR: 0.00004735 [03:27:06] Epoch: 1 Batch: 8936/38378 (23.28%) Loss: 2.071947 LR: 0.00004735 [03:27:08] Epoch: 1 Batch: 8937/38378 (23.29%) Loss: 2.153493 LR: 0.00004735 [03:27:10] Epoch: 1 Batch: 8938/38378 (23.29%) Loss: 2.193616 LR: 0.00004735 [03:27:12] Epoch: 1 Batch: 8939/38378 (23.29%) Loss: 2.000483 LR: 0.00004735 [03:27:13] Epoch: 1 Batch: 8940/38378 (23.29%) Loss: 2.179614 LR: 0.00004735 [03:27:15] Epoch: 1 Batch: 8941/38378 (23.30%) Loss: 2.095540 LR: 0.00004735 [03:27:17] Epoch: 1 Batch: 8942/38378 (23.30%) Loss: 2.029519 LR: 0.00004735 [03:27:19] Epoch: 1 Batch: 8943/38378 (23.30%) Loss: 1.748118 LR: 0.00004735 [03:27:21] Epoch: 1 Batch: 8944/38378 (23.31%) Loss: 2.108144 LR: 0.00004735 [03:27:23] Epoch: 1 Batch: 8945/38378 (23.31%) Loss: 2.074549 LR: 0.00004734 [03:27:24] Epoch: 1 Batch: 8946/38378 (23.31%) Loss: 1.886203 LR: 0.00004734 [03:27:26] Epoch: 1 Batch: 8947/38378 (23.31%) Loss: 1.991799 LR: 0.00004734 [03:27:28] Epoch: 1 Batch: 8948/38378 (23.32%) Loss: 2.192107 LR: 0.00004734 [03:27:30] Epoch: 1 Batch: 8949/38378 
(23.32%) Loss: 1.913267 LR: 0.00004734 [03:27:32] Epoch: 1 Batch: 8950/38378 (23.32%) Loss: 1.939264 LR: 0.00004734 [03:27:33] Epoch: 1 Batch: 8951/38378 (23.32%) Loss: 2.207378 LR: 0.00004734 [03:27:35] Epoch: 1 Batch: 8952/38378 (23.33%) Loss: 2.179459 LR: 0.00004734 [03:27:37] Epoch: 1 Batch: 8953/38378 (23.33%) Loss: 2.207419 LR: 0.00004734 [03:27:39] Epoch: 1 Batch: 8954/38378 (23.33%) Loss: 2.007946 LR: 0.00004734 [03:27:41] Epoch: 1 Batch: 8955/38378 (23.33%) Loss: 2.082830 LR: 0.00004734 [03:27:42] Epoch: 1 Batch: 8956/38378 (23.34%) Loss: 1.826406 LR: 0.00004734 [03:27:44] Epoch: 1 Batch: 8957/38378 (23.34%) Loss: 2.171031 LR: 0.00004734 [03:27:46] Epoch: 1 Batch: 8958/38378 (23.34%) Loss: 2.031775 LR: 0.00004734 [03:27:48] Epoch: 1 Batch: 8959/38378 (23.34%) Loss: 2.046030 LR: 0.00004733 [03:27:50] Epoch: 1 Batch: 8960/38378 (23.35%) Loss: 2.029275 LR: 0.00004733 [03:27:51] Epoch: 1 Batch: 8961/38378 (23.35%) Loss: 1.932921 LR: 0.00004733 [03:27:53] Epoch: 1 Batch: 8962/38378 (23.35%) Loss: 2.029431 LR: 0.00004733 [03:27:55] Epoch: 1 Batch: 8963/38378 (23.35%) Loss: 1.817625 LR: 0.00004733 [03:27:57] Epoch: 1 Batch: 8964/38378 (23.36%) Loss: 1.811968 LR: 0.00004733 [03:27:59] Epoch: 1 Batch: 8965/38378 (23.36%) Loss: 1.885018 LR: 0.00004733 [03:28:00] Epoch: 1 Batch: 8966/38378 (23.36%) Loss: 2.112296 LR: 0.00004732 [03:28:02] Epoch: 1 Batch: 8967/38378 (23.36%) Loss: 2.074311 LR: 0.00004732 [03:28:04] Epoch: 1 Batch: 8968/38378 (23.37%) Loss: 1.953914 LR: 0.00004732 [03:28:06] Epoch: 1 Batch: 8969/38378 (23.37%) Loss: 2.434816 LR: 0.00004732 [03:28:08] Epoch: 1 Batch: 8970/38378 (23.37%) Loss: 2.007475 LR: 0.00004732 [03:28:09] Epoch: 1 Batch: 8971/38378 (23.38%) Loss: 2.082678 LR: 0.00004732 [03:28:11] Epoch: 1 Batch: 8972/38378 (23.38%) Loss: 1.781805 LR: 0.00004732 [03:28:13] Epoch: 1 Batch: 8973/38378 (23.38%) Loss: 2.179554 LR: 0.00004732 [03:28:15] Epoch: 1 Batch: 8974/38378 (23.38%) Loss: 2.109743 LR: 0.00004732 [03:28:17] Epoch: 1 Batch: 8975/38378 (23.39%) Loss: 1.729612 LR: 0.00004732 [03:28:19] Epoch: 1 Batch: 8976/38378 (23.39%) Loss: 2.148314 LR: 0.00004732 [03:28:20] Epoch: 1 Batch: 8977/38378 (23.39%) Loss: 2.195863 LR: 0.00004732 [03:28:22] Epoch: 1 Batch: 8978/38378 (23.39%) Loss: 1.901170 LR: 0.00004732 [03:28:24] Epoch: 1 Batch: 8979/38378 (23.40%) Loss: 1.888220 LR: 0.00004732 [03:28:26] Epoch: 1 Batch: 8980/38378 (23.40%) Loss: 2.126223 LR: 0.00004731 [03:28:28] Epoch: 1 Batch: 8981/38378 (23.40%) Loss: 2.029117 LR: 0.00004731 [03:28:29] Epoch: 1 Batch: 8982/38378 (23.40%) Loss: 2.086574 LR: 0.00004731 [03:28:31] Epoch: 1 Batch: 8983/38378 (23.41%) Loss: 1.834385 LR: 0.00004731 [03:28:33] Epoch: 1 Batch: 8984/38378 (23.41%) Loss: 2.109307 LR: 0.00004731 [03:28:35] Epoch: 1 Batch: 8985/38378 (23.41%) Loss: 1.939941 LR: 0.00004731 [03:28:37] Epoch: 1 Batch: 8986/38378 (23.41%) Loss: 2.112655 LR: 0.00004731 [03:28:39] Epoch: 1 Batch: 8987/38378 (23.42%) Loss: 1.931526 LR: 0.00004730 [03:28:40] Epoch: 1 Batch: 8988/38378 (23.42%) Loss: 2.032926 LR: 0.00004730 [03:28:42] Epoch: 1 Batch: 8989/38378 (23.42%) Loss: 1.843328 LR: 0.00004730 [03:28:44] Epoch: 1 Batch: 8990/38378 (23.42%) Loss: 2.134673 LR: 0.00004730 [03:28:46] Epoch: 1 Batch: 8991/38378 (23.43%) Loss: 1.817296 LR: 0.00004730 [03:28:48] Epoch: 1 Batch: 8992/38378 (23.43%) Loss: 2.106817 LR: 0.00004730 [03:28:49] Epoch: 1 Batch: 8993/38378 (23.43%) Loss: 2.021741 LR: 0.00004730 [03:28:51] Epoch: 1 Batch: 8994/38378 (23.44%) Loss: 2.113780 LR: 0.00004730 [03:28:53] Epoch: 1 Batch: 8995/38378 (23.44%) 
Loss: 2.170046 LR: 0.00004730 [03:28:55] Epoch: 1 Batch: 8996/38378 (23.44%) Loss: 2.165618 LR: 0.00004730 [03:28:57] Epoch: 1 Batch: 8997/38378 (23.44%) Loss: 2.214233 LR: 0.00004730 [03:28:59] Epoch: 1 Batch: 8998/38378 (23.45%) Loss: 2.182542 LR: 0.00004730 [03:29:00] Epoch: 1 Batch: 8999/38378 (23.45%) Loss: 2.309780 LR: 0.00004730 [03:29:02] >> Evaluating batch 0 [03:29:03] >> Evaluating batch 1 [03:29:04] >> Evaluating batch 2 [03:29:05] >> Evaluating batch 3 [03:29:06] >> Evaluating batch 4 [03:29:07] >> Evaluating batch 5 [03:29:08] >> Evaluating batch 6 [03:29:09] >> Evaluating batch 7 [03:29:10] >> Evaluating batch 8 [03:29:11] >> Evaluating batch 9 [03:29:12] >> Evaluating batch 10 [03:29:13] >> Evaluating batch 11 [03:29:14] >> Evaluating batch 12 [03:29:15] >> Evaluating batch 13 [03:29:16] >> Evaluating batch 14 [03:29:17] >> Evaluating batch 15 [03:29:18] >> Evaluating batch 16 [03:29:19] Epoch: 1 Step: 9000/38378 Evaluation: [03:29:19] Avg Loss Since Last Eval: 2.0145 Val Loss: 2.1167 Validation loss delta: -0.0033 Perplexity: 8.3040 LR: 0.00004730 [03:29:23] >> Cleaned up old temp checkpoint: epoch1_step7854 [03:29:23] >> Temp checkpoint saved: epoch1_step9000, size: 0.1702 GB [03:29:27] >> Checkpoint saved: epoch1_step9000, size: 0.1702 GB [03:29:27] Epoch: 1 Batch: 9000/38378 (23.45%) Loss: 2.022630 LR: 0.00004730 [03:29:29] Epoch: 1 Batch: 9001/38378 (23.45%) Loss: 2.406335 LR: 0.00004729 [03:29:31] Epoch: 1 Batch: 9002/38378 (23.46%) Loss: 1.947535 LR: 0.00004729 [03:29:33] Epoch: 1 Batch: 9003/38378 (23.46%) Loss: 1.873526 LR: 0.00004729 [03:29:35] Epoch: 1 Batch: 9004/38378 (23.46%) Loss: 1.960947 LR: 0.00004729 [03:29:36] Epoch: 1 Batch: 9005/38378 (23.46%) Loss: 1.823244 LR: 0.00004729 [03:29:38] Epoch: 1 Batch: 9006/38378 (23.47%) Loss: 2.076504 LR: 0.00004729 [03:29:40] Epoch: 1 Batch: 9007/38378 (23.47%) Loss: 1.886681 LR: 0.00004729 [03:29:42] Epoch: 1 Batch: 9008/38378 (23.47%) Loss: 1.930404 LR: 0.00004729 [03:29:44] Epoch: 1 Batch: 9009/38378 (23.47%) Loss: 1.981671 LR: 0.00004729 [03:29:45] Epoch: 1 Batch: 9010/38378 (23.48%) Loss: 1.929001 LR: 0.00004729 [03:29:47] Epoch: 1 Batch: 9011/38378 (23.48%) Loss: 1.964540 LR: 0.00004729 [03:29:49] Epoch: 1 Batch: 9012/38378 (23.48%) Loss: 2.415701 LR: 0.00004729 [03:29:51] Epoch: 1 Batch: 9013/38378 (23.48%) Loss: 1.973019 LR: 0.00004729 [03:29:53] Epoch: 1 Batch: 9014/38378 (23.49%) Loss: 2.227771 LR: 0.00004729 [03:29:55] Epoch: 1 Batch: 9015/38378 (23.49%) Loss: 2.227099 LR: 0.00004728 [03:29:57] Epoch: 1 Batch: 9016/38378 (23.49%) Loss: 1.858810 LR: 0.00004728 [03:29:59] Epoch: 1 Batch: 9017/38378 (23.50%) Loss: 2.293938 LR: 0.00004728 [03:30:00] Epoch: 1 Batch: 9018/38378 (23.50%) Loss: 1.724603 LR: 0.00004728 [03:30:02] Epoch: 1 Batch: 9019/38378 (23.50%) Loss: 1.902087 LR: 0.00004728 [03:30:04] Epoch: 1 Batch: 9020/38378 (23.50%) Loss: 1.822939 LR: 0.00004728 [03:30:06] Epoch: 1 Batch: 9021/38378 (23.51%) Loss: 1.864133 LR: 0.00004728 [03:30:08] Epoch: 1 Batch: 9022/38378 (23.51%) Loss: 1.948688 LR: 0.00004727 [03:30:10] Epoch: 1 Batch: 9023/38378 (23.51%) Loss: 1.989001 LR: 0.00004727 [03:30:11] Epoch: 1 Batch: 9024/38378 (23.51%) Loss: 1.946019 LR: 0.00004727 [03:30:13] Epoch: 1 Batch: 9025/38378 (23.52%) Loss: 2.394369 LR: 0.00004727 [03:30:15] Epoch: 1 Batch: 9026/38378 (23.52%) Loss: 1.900811 LR: 0.00004727 [03:30:17] Epoch: 1 Batch: 9027/38378 (23.52%) Loss: 1.812286 LR: 0.00004727 [03:30:19] Epoch: 1 Batch: 9028/38378 (23.52%) Loss: 1.864725 LR: 0.00004727 [03:30:21] Epoch: 1 Batch: 9029/38378 (23.53%) Loss: 2.151371 LR: 0.00004727
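Note: in the step-9000 evaluation above, the reported perplexity is simply the exponential of the validation loss (exp(2.1167) ≈ 8.304; the last digit differs only because the logged loss is rounded to four decimals), and "Avg Loss Since Last Eval" is evidently the mean training loss over the 500 batches since the previous evaluation. A minimal sketch of an evaluation pass producing these fields, assuming Hugging Face-style model outputs where out.loss is the mean cross-entropy when labels are passed; names are illustrative, not the script's actual code:

    import math
    import torch

    @torch.no_grad()
    def evaluate(model, val_loader):
        # Mean cross-entropy over the validation batches; perplexity = exp(mean).
        model.eval()
        losses = []
        for i, batch in enumerate(val_loader):
            print(f">> Evaluating batch {i}")
            out = model(**batch)  # batch includes labels -> out.loss
            losses.append(out.loss.item())
        model.train()
        avg = sum(losses) / len(losses)
        return avg, math.exp(avg)

    # Consistency check against the logged summary line:
    print(math.exp(2.1167))  # ~8.3037, vs. the logged Perplexity: 8.3040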
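Note: step 9000 also shows the two save paths side by side: the rotating temp checkpoint (the ">> Cleaned up old temp checkpoint" / ">> Temp checkpoint saved" pairs that recur every 200 batches in this section) and a permanent ">> Checkpoint saved" written at the 500-batch eval boundary. At 0.1702 GB apiece these artifacts are far smaller than the full 8B base model, consistent with saving only the LoRA adapter plus training state. A sketch of the rotation pattern, assuming PEFT's save_pretrained and a hypothetical keep-list; the eviction policy here is a guess, not the script's actual code:

    import os
    import shutil
    from collections import deque

    temp_checkpoints = deque()  # oldest -> newest temp checkpoint dirs

    def save_temp_checkpoint(model, out_dir, epoch, step, keep=25):
        # Evict the oldest temp checkpoint, then save the new adapter dir,
        # printing in the same order as the log above.
        if len(temp_checkpoints) >= keep:
            old = temp_checkpoints.popleft()
            shutil.rmtree(old, ignore_errors=True)
            print(f">> Cleaned up old temp checkpoint: {os.path.basename(old)}")
        path = os.path.join(out_dir, f"epoch{epoch}_step{step}")
        model.save_pretrained(path)  # PEFT model: writes only the adapter weights
        temp_checkpoints.append(path)
        print(f">> Temp checkpoint saved: {os.path.basename(path)}")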
[03:30:22] Epoch: 1 Batch: 9030/38378 (23.53%) Loss: 1.952918 LR: 0.00004727 [03:30:24] Epoch: 1 Batch: 9031/38378 (23.53%) Loss: 1.826541 LR: 0.00004727 [03:30:26] Epoch: 1 Batch: 9032/38378 (23.53%) Loss: 2.138244 LR: 0.00004727 [03:30:28] Epoch: 1 Batch: 9033/38378 (23.54%) Loss: 1.912668 LR: 0.00004727 [03:30:30] Epoch: 1 Batch: 9034/38378 (23.54%) Loss: 2.046931 LR: 0.00004727 [03:30:31] Epoch: 1 Batch: 9035/38378 (23.54%) Loss: 1.729904 LR: 0.00004727 [03:30:33] Epoch: 1 Batch: 9036/38378 (23.54%) Loss: 2.010792 LR: 0.00004726 [03:30:35] Epoch: 1 Batch: 9037/38378 (23.55%) Loss: 2.056139 LR: 0.00004726 [03:30:37] Epoch: 1 Batch: 9038/38378 (23.55%) Loss: 1.877592 LR: 0.00004726 [03:30:39] Epoch: 1 Batch: 9039/38378 (23.55%) Loss: 2.022431 LR: 0.00004726 [03:30:40] Epoch: 1 Batch: 9040/38378 (23.56%) Loss: 2.063795 LR: 0.00004726 [03:30:42] Epoch: 1 Batch: 9041/38378 (23.56%) Loss: 1.801237 LR: 0.00004726 [03:30:44] Epoch: 1 Batch: 9042/38378 (23.56%) Loss: 1.761474 LR: 0.00004726 [03:30:46] Epoch: 1 Batch: 9043/38378 (23.56%) Loss: 2.161245 LR: 0.00004725 [03:30:47] Epoch: 1 Batch: 9044/38378 (23.57%) Loss: 1.861888 LR: 0.00004725 [03:30:49] Epoch: 1 Batch: 9045/38378 (23.57%) Loss: 1.830380 LR: 0.00004725 [03:30:51] Epoch: 1 Batch: 9046/38378 (23.57%) Loss: 2.075063 LR: 0.00004725 [03:30:53] Epoch: 1 Batch: 9047/38378 (23.57%) Loss: 1.964865 LR: 0.00004725 [03:30:55] Epoch: 1 Batch: 9048/38378 (23.58%) Loss: 1.756666 LR: 0.00004725 [03:30:56] Epoch: 1 Batch: 9049/38378 (23.58%) Loss: 2.091845 LR: 0.00004725 [03:30:58] Epoch: 1 Batch: 9050/38378 (23.58%) Loss: 1.971485 LR: 0.00004725 [03:31:00] Epoch: 1 Batch: 9051/38378 (23.58%) Loss: 2.153795 LR: 0.00004725 [03:31:02] Epoch: 1 Batch: 9052/38378 (23.59%) Loss: 1.967069 LR: 0.00004725 [03:31:04] Epoch: 1 Batch: 9053/38378 (23.59%) Loss: 1.901645 LR: 0.00004725 [03:31:06] Epoch: 1 Batch: 9054/38378 (23.59%) Loss: 2.050969 LR: 0.00004725 [03:31:07] Epoch: 1 Batch: 9055/38378 (23.59%) Loss: 2.095690 LR: 0.00004725 [03:31:09] Epoch: 1 Batch: 9056/38378 (23.60%) Loss: 1.942748 LR: 0.00004725 [03:31:11] Epoch: 1 Batch: 9057/38378 (23.60%) Loss: 1.974414 LR: 0.00004724 [03:31:13] Epoch: 1 Batch: 9058/38378 (23.60%) Loss: 1.995272 LR: 0.00004724 [03:31:15] Epoch: 1 Batch: 9059/38378 (23.60%) Loss: 1.886101 LR: 0.00004724 [03:31:17] Epoch: 1 Batch: 9060/38378 (23.61%) Loss: 2.134061 LR: 0.00004724 [03:31:18] Epoch: 1 Batch: 9061/38378 (23.61%) Loss: 2.056536 LR: 0.00004724 [03:31:20] Epoch: 1 Batch: 9062/38378 (23.61%) Loss: 2.112317 LR: 0.00004724 [03:31:22] Epoch: 1 Batch: 9063/38378 (23.62%) Loss: 2.303201 LR: 0.00004724 [03:31:24] Epoch: 1 Batch: 9064/38378 (23.62%) Loss: 2.091416 LR: 0.00004724 [03:31:26] Epoch: 1 Batch: 9065/38378 (23.62%) Loss: 1.996232 LR: 0.00004724 [03:31:28] Epoch: 1 Batch: 9066/38378 (23.62%) Loss: 1.835080 LR: 0.00004724 [03:31:29] Epoch: 1 Batch: 9067/38378 (23.63%) Loss: 1.895785 LR: 0.00004724 [03:31:31] Epoch: 1 Batch: 9068/38378 (23.63%) Loss: 1.865061 LR: 0.00004724 [03:31:33] Epoch: 1 Batch: 9069/38378 (23.63%) Loss: 1.948158 LR: 0.00004724 [03:31:35] Epoch: 1 Batch: 9070/38378 (23.63%) Loss: 1.999003 LR: 0.00004724 [03:31:37] Epoch: 1 Batch: 9071/38378 (23.64%) Loss: 2.058129 LR: 0.00004723 [03:31:39] Epoch: 1 Batch: 9072/38378 (23.64%) Loss: 1.796205 LR: 0.00004723 [03:31:40] Epoch: 1 Batch: 9073/38378 (23.64%) Loss: 2.116453 LR: 0.00004723 [03:31:42] Epoch: 1 Batch: 9074/38378 (23.64%) Loss: 2.068969 LR: 0.00004723 [03:31:44] Epoch: 1 Batch: 9075/38378 (23.65%)
Loss: 2.123958 LR: 0.00004723 [03:31:46] Epoch: 1 Batch: 9076/38378 (23.65%) Loss: 2.218745 LR: 0.00004723 [03:31:48] Epoch: 1 Batch: 9077/38378 (23.65%) Loss: 1.879081 LR: 0.00004723 [03:31:49] Epoch: 1 Batch: 9078/38378 (23.65%) Loss: 2.120997 LR: 0.00004722 [03:31:51] Epoch: 1 Batch: 9079/38378 (23.66%) Loss: 1.862514 LR: 0.00004722 [03:31:53] Epoch: 1 Batch: 9080/38378 (23.66%) Loss: 2.048383 LR: 0.00004722 [03:31:55] Epoch: 1 Batch: 9081/38378 (23.66%) Loss: 2.108473 LR: 0.00004722 [03:31:57] Epoch: 1 Batch: 9082/38378 (23.66%) Loss: 2.276480 LR: 0.00004722 [03:31:58] Epoch: 1 Batch: 9083/38378 (23.67%) Loss: 2.365653 LR: 0.00004722 [03:32:00] Epoch: 1 Batch: 9084/38378 (23.67%) Loss: 2.004320 LR: 0.00004722 [03:32:02] Epoch: 1 Batch: 9085/38378 (23.67%) Loss: 2.039874 LR: 0.00004722 [03:32:04] Epoch: 1 Batch: 9086/38378 (23.68%) Loss: 1.796968 LR: 0.00004722 [03:32:05] Epoch: 1 Batch: 9087/38378 (23.68%) Loss: 1.929780 LR: 0.00004722 [03:32:07] Epoch: 1 Batch: 9088/38378 (23.68%) Loss: 2.030037 LR: 0.00004722 [03:32:09] Epoch: 1 Batch: 9089/38378 (23.68%) Loss: 2.043820 LR: 0.00004722 [03:32:11] Epoch: 1 Batch: 9090/38378 (23.69%) Loss: 2.501511 LR: 0.00004722 [03:32:13] Epoch: 1 Batch: 9091/38378 (23.69%) Loss: 1.908047 LR: 0.00004722 [03:32:14] Epoch: 1 Batch: 9092/38378 (23.69%) Loss: 1.849903 LR: 0.00004721 [03:32:16] Epoch: 1 Batch: 9093/38378 (23.69%) Loss: 2.152168 LR: 0.00004721 [03:32:18] Epoch: 1 Batch: 9094/38378 (23.70%) Loss: 2.369039 LR: 0.00004721 [03:32:20] Epoch: 1 Batch: 9095/38378 (23.70%) Loss: 1.855758 LR: 0.00004721 [03:32:22] Epoch: 1 Batch: 9096/38378 (23.70%) Loss: 2.038777 LR: 0.00004721 [03:32:23] Epoch: 1 Batch: 9097/38378 (23.70%) Loss: 1.831292 LR: 0.00004721 [03:32:25] Epoch: 1 Batch: 9098/38378 (23.71%) Loss: 1.862730 LR: 0.00004721 [03:32:27] Epoch: 1 Batch: 9099/38378 (23.71%) Loss: 1.827043 LR: 0.00004720 [03:32:29] Epoch: 1 Batch: 9100/38378 (23.71%) Loss: 2.244331 LR: 0.00004720 [03:32:31] Epoch: 1 Batch: 9101/38378 (23.71%) Loss: 2.041650 LR: 0.00004720 [03:32:32] Epoch: 1 Batch: 9102/38378 (23.72%) Loss: 1.952482 LR: 0.00004720 [03:32:34] Epoch: 1 Batch: 9103/38378 (23.72%) Loss: 1.987373 LR: 0.00004720 [03:32:36] Epoch: 1 Batch: 9104/38378 (23.72%) Loss: 1.854040 LR: 0.00004720 [03:32:38] Epoch: 1 Batch: 9105/38378 (23.72%) Loss: 2.157838 LR: 0.00004720 [03:32:40] Epoch: 1 Batch: 9106/38378 (23.73%) Loss: 1.809104 LR: 0.00004720 [03:32:42] Epoch: 1 Batch: 9107/38378 (23.73%) Loss: 1.790284 LR: 0.00004720 [03:32:43] Epoch: 1 Batch: 9108/38378 (23.73%) Loss: 1.863706 LR: 0.00004720 [03:32:45] Epoch: 1 Batch: 9109/38378 (23.73%) Loss: 2.179959 LR: 0.00004720 [03:32:47] Epoch: 1 Batch: 9110/38378 (23.74%) Loss: 2.130286 LR: 0.00004720 [03:32:49] Epoch: 1 Batch: 9111/38378 (23.74%) Loss: 1.963722 LR: 0.00004720 [03:32:50] Epoch: 1 Batch: 9112/38378 (23.74%) Loss: 1.838330 LR: 0.00004720 [03:32:52] Epoch: 1 Batch: 9113/38378 (23.75%) Loss: 2.144217 LR: 0.00004719 [03:32:54] Epoch: 1 Batch: 9114/38378 (23.75%) Loss: 1.837892 LR: 0.00004719 [03:32:56] Epoch: 1 Batch: 9115/38378 (23.75%) Loss: 2.102596 LR: 0.00004719 [03:32:58] Epoch: 1 Batch: 9116/38378 (23.75%) Loss: 1.802810 LR: 0.00004719 [03:32:59] Epoch: 1 Batch: 9117/38378 (23.76%) Loss: 2.117810 LR: 0.00004719 [03:33:01] Epoch: 1 Batch: 9118/38378 (23.76%) Loss: 1.944652 LR: 0.00004719 [03:33:03] Epoch: 1 Batch: 9119/38378 (23.76%) Loss: 2.033571 LR: 0.00004719 [03:33:05] Epoch: 1 Batch: 9120/38378 (23.76%) Loss: 2.072142 LR: 0.00004719 [03:33:07] Epoch: 1 Batch: 9121/38378 (23.77%) Loss: 
2.052119 LR: 0.00004719 [03:33:09] Epoch: 1 Batch: 9122/38378 (23.77%) Loss: 2.117492 LR: 0.00004719 [03:33:10] Epoch: 1 Batch: 9123/38378 (23.77%) Loss: 2.085336 LR: 0.00004719 [03:33:12] Epoch: 1 Batch: 9124/38378 (23.77%) Loss: 1.860775 LR: 0.00004719 [03:33:14] Epoch: 1 Batch: 9125/38378 (23.78%) Loss: 1.963777 LR: 0.00004719 [03:33:16] Epoch: 1 Batch: 9126/38378 (23.78%) Loss: 2.122074 LR: 0.00004719 [03:33:18] Epoch: 1 Batch: 9127/38378 (23.78%) Loss: 2.130671 LR: 0.00004718 [03:33:19] Epoch: 1 Batch: 9128/38378 (23.78%) Loss: 2.174767 LR: 0.00004718 [03:33:21] Epoch: 1 Batch: 9129/38378 (23.79%) Loss: 2.199102 LR: 0.00004718 [03:33:23] Epoch: 1 Batch: 9130/38378 (23.79%) Loss: 1.872220 LR: 0.00004718 [03:33:25] Epoch: 1 Batch: 9131/38378 (23.79%) Loss: 1.939815 LR: 0.00004718 [03:33:27] Epoch: 1 Batch: 9132/38378 (23.79%) Loss: 1.855543 LR: 0.00004718 [03:33:29] Epoch: 1 Batch: 9133/38378 (23.80%) Loss: 1.708971 LR: 0.00004718 [03:33:30] Epoch: 1 Batch: 9134/38378 (23.80%) Loss: 1.941435 LR: 0.00004717 [03:33:32] Epoch: 1 Batch: 9135/38378 (23.80%) Loss: 2.181653 LR: 0.00004717 [03:33:34] Epoch: 1 Batch: 9136/38378 (23.81%) Loss: 2.149953 LR: 0.00004717 [03:33:36] Epoch: 1 Batch: 9137/38378 (23.81%) Loss: 1.866861 LR: 0.00004717 [03:33:38] Epoch: 1 Batch: 9138/38378 (23.81%) Loss: 1.876149 LR: 0.00004717 [03:33:39] Epoch: 1 Batch: 9139/38378 (23.81%) Loss: 2.261644 LR: 0.00004717 [03:33:41] Epoch: 1 Batch: 9140/38378 (23.82%) Loss: 2.005490 LR: 0.00004717 [03:33:43] Epoch: 1 Batch: 9141/38378 (23.82%) Loss: 2.091052 LR: 0.00004717 [03:33:45] Epoch: 1 Batch: 9142/38378 (23.82%) Loss: 1.917919 LR: 0.00004717 [03:33:47] Epoch: 1 Batch: 9143/38378 (23.82%) Loss: 2.101658 LR: 0.00004717 [03:33:49] Epoch: 1 Batch: 9144/38378 (23.83%) Loss: 1.978027 LR: 0.00004717 [03:33:50] Epoch: 1 Batch: 9145/38378 (23.83%) Loss: 1.658592 LR: 0.00004717 [03:33:52] Epoch: 1 Batch: 9146/38378 (23.83%) Loss: 2.129060 LR: 0.00004717 [03:33:54] Epoch: 1 Batch: 9147/38378 (23.83%) Loss: 1.873514 LR: 0.00004717 [03:33:56] Epoch: 1 Batch: 9148/38378 (23.84%) Loss: 2.097378 LR: 0.00004716 [03:33:58] Epoch: 1 Batch: 9149/38378 (23.84%) Loss: 1.903289 LR: 0.00004716 [03:34:00] Epoch: 1 Batch: 9150/38378 (23.84%) Loss: 2.131827 LR: 0.00004716 [03:34:01] Epoch: 1 Batch: 9151/38378 (23.84%) Loss: 1.978116 LR: 0.00004716 [03:34:03] Epoch: 1 Batch: 9152/38378 (23.85%) Loss: 1.805921 LR: 0.00004716 [03:34:05] Epoch: 1 Batch: 9153/38378 (23.85%) Loss: 2.032259 LR: 0.00004716 [03:34:07] Epoch: 1 Batch: 9154/38378 (23.85%) Loss: 2.140058 LR: 0.00004716 [03:34:09] Epoch: 1 Batch: 9155/38378 (23.85%) Loss: 1.907258 LR: 0.00004715 [03:34:11] Epoch: 1 Batch: 9156/38378 (23.86%) Loss: 1.895732 LR: 0.00004715 [03:34:12] Epoch: 1 Batch: 9157/38378 (23.86%) Loss: 1.838090 LR: 0.00004715 [03:34:14] Epoch: 1 Batch: 9158/38378 (23.86%) Loss: 2.210084 LR: 0.00004715 [03:34:16] Epoch: 1 Batch: 9159/38378 (23.87%) Loss: 1.670849 LR: 0.00004715 [03:34:18] Epoch: 1 Batch: 9160/38378 (23.87%) Loss: 1.913740 LR: 0.00004715 [03:34:20] Epoch: 1 Batch: 9161/38378 (23.87%) Loss: 2.113215 LR: 0.00004715 [03:34:21] Epoch: 1 Batch: 9162/38378 (23.87%) Loss: 1.922526 LR: 0.00004715 [03:34:23] Epoch: 1 Batch: 9163/38378 (23.88%) Loss: 1.947631 LR: 0.00004715 [03:34:25] Epoch: 1 Batch: 9164/38378 (23.88%) Loss: 1.963700 LR: 0.00004715 [03:34:27] Epoch: 1 Batch: 9165/38378 (23.88%) Loss: 2.084625 LR: 0.00004715 [03:34:29] Epoch: 1 Batch: 9166/38378 (23.88%) Loss: 1.770611 LR: 0.00004715 [03:34:31] Epoch: 1 Batch: 9167/38378 (23.89%) Loss: 2.056065 LR: 
0.00004715 [03:34:32] Epoch: 1 Batch: 9168/38378 (23.89%) Loss: 2.156921 LR: 0.00004715 [03:34:34] Epoch: 1 Batch: 9169/38378 (23.89%) Loss: 2.097175 LR: 0.00004714 [03:34:36] Epoch: 1 Batch: 9170/38378 (23.89%) Loss: 2.078178 LR: 0.00004714 [03:34:38] Epoch: 1 Batch: 9171/38378 (23.90%) Loss: 1.917287 LR: 0.00004714 [03:34:40] Epoch: 1 Batch: 9172/38378 (23.90%) Loss: 1.956711 LR: 0.00004714 [03:34:42] Epoch: 1 Batch: 9173/38378 (23.90%) Loss: 1.565984 LR: 0.00004714 [03:34:43] Epoch: 1 Batch: 9174/38378 (23.90%) Loss: 2.101231 LR: 0.00004714 [03:34:45] Epoch: 1 Batch: 9175/38378 (23.91%) Loss: 2.117578 LR: 0.00004714 [03:34:47] Epoch: 1 Batch: 9176/38378 (23.91%) Loss: 1.913106 LR: 0.00004713 [03:34:49] Epoch: 1 Batch: 9177/38378 (23.91%) Loss: 1.847651 LR: 0.00004713 [03:34:50] Epoch: 1 Batch: 9178/38378 (23.91%) Loss: 1.821926 LR: 0.00004713 [03:34:52] Epoch: 1 Batch: 9179/38378 (23.92%) Loss: 2.054031 LR: 0.00004713 [03:34:54] Epoch: 1 Batch: 9180/38378 (23.92%) Loss: 1.955319 LR: 0.00004713 [03:34:56] Epoch: 1 Batch: 9181/38378 (23.92%) Loss: 2.059680 LR: 0.00004713 [03:34:58] Epoch: 1 Batch: 9182/38378 (23.93%) Loss: 1.888946 LR: 0.00004713 [03:35:00] Epoch: 1 Batch: 9183/38378 (23.93%) Loss: 2.193974 LR: 0.00004713 [03:35:01] Epoch: 1 Batch: 9184/38378 (23.93%) Loss: 2.220543 LR: 0.00004713 [03:35:03] Epoch: 1 Batch: 9185/38378 (23.93%) Loss: 1.997503 LR: 0.00004713 [03:35:05] Epoch: 1 Batch: 9186/38378 (23.94%) Loss: 1.838337 LR: 0.00004713 [03:35:07] Epoch: 1 Batch: 9187/38378 (23.94%) Loss: 1.952705 LR: 0.00004713 [03:35:09] Epoch: 1 Batch: 9188/38378 (23.94%) Loss: 2.418995 LR: 0.00004713 [03:35:11] Epoch: 1 Batch: 9189/38378 (23.94%) Loss: 1.808274 LR: 0.00004713 [03:35:12] Epoch: 1 Batch: 9190/38378 (23.95%) Loss: 2.031067 LR: 0.00004712 [03:35:14] Epoch: 1 Batch: 9191/38378 (23.95%) Loss: 2.273835 LR: 0.00004712 [03:35:16] Epoch: 1 Batch: 9192/38378 (23.95%) Loss: 2.171609 LR: 0.00004712 [03:35:18] Epoch: 1 Batch: 9193/38378 (23.95%) Loss: 2.059387 LR: 0.00004712 [03:35:20] Epoch: 1 Batch: 9194/38378 (23.96%) Loss: 2.081965 LR: 0.00004712 [03:35:21] Epoch: 1 Batch: 9195/38378 (23.96%) Loss: 1.935137 LR: 0.00004712 [03:35:23] Epoch: 1 Batch: 9196/38378 (23.96%) Loss: 2.303317 LR: 0.00004712 [03:35:25] Epoch: 1 Batch: 9197/38378 (23.96%) Loss: 1.855594 LR: 0.00004711 [03:35:27] Epoch: 1 Batch: 9198/38378 (23.97%) Loss: 2.379554 LR: 0.00004711 [03:35:29] Epoch: 1 Batch: 9199/38378 (23.97%) Loss: 1.914021 LR: 0.00004711 [03:35:35] >> Cleaned up old temp checkpoint: epoch1_step7887 [03:35:35] >> Temp checkpoint saved: epoch1_step9200, size: 0.1702 GB [03:35:35] Epoch: 1 Batch: 9200/38378 (23.97%) Loss: 1.924488 LR: 0.00004711 [03:35:37] Epoch: 1 Batch: 9201/38378 (23.97%) Loss: 2.049100 LR: 0.00004711 [03:35:38] Epoch: 1 Batch: 9202/38378 (23.98%) Loss: 2.258688 LR: 0.00004711 [03:35:40] Epoch: 1 Batch: 9203/38378 (23.98%) Loss: 2.119177 LR: 0.00004711 [03:35:42] Epoch: 1 Batch: 9204/38378 (23.98%) Loss: 1.781741 LR: 0.00004711 [03:35:44] Epoch: 1 Batch: 9205/38378 (23.99%) Loss: 1.811275 LR: 0.00004711 [03:35:46] Epoch: 1 Batch: 9206/38378 (23.99%) Loss: 1.814199 LR: 0.00004711 [03:35:47] Epoch: 1 Batch: 9207/38378 (23.99%) Loss: 1.877836 LR: 0.00004711 [03:35:49] Epoch: 1 Batch: 9208/38378 (23.99%) Loss: 2.017676 LR: 0.00004711 [03:35:51] Epoch: 1 Batch: 9209/38378 (24.00%) Loss: 2.079310 LR: 0.00004711 [03:35:53] Epoch: 1 Batch: 9210/38378 (24.00%) Loss: 1.857607 LR: 0.00004711 [03:35:55] Epoch: 1 Batch: 9211/38378 (24.00%) Loss: 1.955441 LR: 0.00004710 [03:35:57] Epoch: 1 
Batch: 9212/38378 (24.00%) Loss: 2.211822 LR: 0.00004710 [03:35:58] Epoch: 1 Batch: 9213/38378 (24.01%) Loss: 2.152778 LR: 0.00004710 [03:36:00] Epoch: 1 Batch: 9214/38378 (24.01%) Loss: 2.085364 LR: 0.00004710 [03:36:02] Epoch: 1 Batch: 9215/38378 (24.01%) Loss: 1.877905 LR: 0.00004710 [03:36:04] Epoch: 1 Batch: 9216/38378 (24.01%) Loss: 2.189696 LR: 0.00004710 [03:36:06] Epoch: 1 Batch: 9217/38378 (24.02%) Loss: 1.845435 LR: 0.00004710 [03:36:08] Epoch: 1 Batch: 9218/38378 (24.02%) Loss: 2.057133 LR: 0.00004710 [03:36:09] Epoch: 1 Batch: 9219/38378 (24.02%) Loss: 2.139803 LR: 0.00004710 [03:36:11] Epoch: 1 Batch: 9220/38378 (24.02%) Loss: 1.877408 LR: 0.00004710 [03:36:13] Epoch: 1 Batch: 9221/38378 (24.03%) Loss: 1.657570 LR: 0.00004710 [03:36:15] Epoch: 1 Batch: 9222/38378 (24.03%) Loss: 2.100004 LR: 0.00004710 [03:36:17] Epoch: 1 Batch: 9223/38378 (24.03%) Loss: 1.719867 LR: 0.00004710 [03:36:19] Epoch: 1 Batch: 9224/38378 (24.03%) Loss: 1.912030 LR: 0.00004710 [03:36:20] Epoch: 1 Batch: 9225/38378 (24.04%) Loss: 2.195365 LR: 0.00004709 [03:36:22] Epoch: 1 Batch: 9226/38378 (24.04%) Loss: 1.902583 LR: 0.00004709 [03:36:24] Epoch: 1 Batch: 9227/38378 (24.04%) Loss: 2.103639 LR: 0.00004709 [03:36:26] Epoch: 1 Batch: 9228/38378 (24.05%) Loss: 1.949658 LR: 0.00004709 [03:36:28] Epoch: 1 Batch: 9229/38378 (24.05%) Loss: 2.160896 LR: 0.00004709 [03:36:30] Epoch: 1 Batch: 9230/38378 (24.05%) Loss: 2.153387 LR: 0.00004709 [03:36:32] Epoch: 1 Batch: 9231/38378 (24.05%) Loss: 2.135776 LR: 0.00004709 [03:36:33] Epoch: 1 Batch: 9232/38378 (24.06%) Loss: 1.914277 LR: 0.00004708 [03:36:35] Epoch: 1 Batch: 9233/38378 (24.06%) Loss: 2.041172 LR: 0.00004708 [03:36:37] Epoch: 1 Batch: 9234/38378 (24.06%) Loss: 1.846971 LR: 0.00004708 [03:36:39] Epoch: 1 Batch: 9235/38378 (24.06%) Loss: 2.163164 LR: 0.00004708 [03:36:41] Epoch: 1 Batch: 9236/38378 (24.07%) Loss: 1.870713 LR: 0.00004708 [03:36:42] Epoch: 1 Batch: 9237/38378 (24.07%) Loss: 1.978842 LR: 0.00004708 [03:36:44] Epoch: 1 Batch: 9238/38378 (24.07%) Loss: 2.035817 LR: 0.00004708 [03:36:46] Epoch: 1 Batch: 9239/38378 (24.07%) Loss: 2.068093 LR: 0.00004708 [03:36:48] Epoch: 1 Batch: 9240/38378 (24.08%) Loss: 1.922789 LR: 0.00004708 [03:36:50] Epoch: 1 Batch: 9241/38378 (24.08%) Loss: 2.093138 LR: 0.00004708 [03:36:51] Epoch: 1 Batch: 9242/38378 (24.08%) Loss: 2.021227 LR: 0.00004708 [03:36:53] Epoch: 1 Batch: 9243/38378 (24.08%) Loss: 2.025205 LR: 0.00004708 [03:36:55] Epoch: 1 Batch: 9244/38378 (24.09%) Loss: 2.330823 LR: 0.00004708 [03:36:57] Epoch: 1 Batch: 9245/38378 (24.09%) Loss: 1.823304 LR: 0.00004708 [03:36:58] Epoch: 1 Batch: 9246/38378 (24.09%) Loss: 1.778993 LR: 0.00004707 [03:37:00] Epoch: 1 Batch: 9247/38378 (24.09%) Loss: 2.035558 LR: 0.00004707 [03:37:02] Epoch: 1 Batch: 9248/38378 (24.10%) Loss: 1.942320 LR: 0.00004707 [03:37:04] Epoch: 1 Batch: 9249/38378 (24.10%) Loss: 2.212362 LR: 0.00004707 [03:37:06] Epoch: 1 Batch: 9250/38378 (24.10%) Loss: 1.931740 LR: 0.00004707 [03:37:07] Epoch: 1 Batch: 9251/38378 (24.10%) Loss: 1.836466 LR: 0.00004707 [03:37:09] Epoch: 1 Batch: 9252/38378 (24.11%) Loss: 1.740608 LR: 0.00004707 [03:37:11] Epoch: 1 Batch: 9253/38378 (24.11%) Loss: 1.675931 LR: 0.00004706 [03:37:13] Epoch: 1 Batch: 9254/38378 (24.11%) Loss: 1.729617 LR: 0.00004706 [03:37:15] Epoch: 1 Batch: 9255/38378 (24.12%) Loss: 1.734875 LR: 0.00004706 [03:37:16] Epoch: 1 Batch: 9256/38378 (24.12%) Loss: 2.294031 LR: 0.00004706 [03:37:18] Epoch: 1 Batch: 9257/38378 (24.12%) Loss: 1.747419 LR: 0.00004706 [03:37:20] Epoch: 1 Batch: 
9258/38378 (24.12%) Loss: 2.028474 LR: 0.00004706 [03:37:22] Epoch: 1 Batch: 9259/38378 (24.13%) Loss: 1.888286 LR: 0.00004706 [03:37:24] Epoch: 1 Batch: 9260/38378 (24.13%) Loss: 2.163175 LR: 0.00004706 [03:37:25] Epoch: 1 Batch: 9261/38378 (24.13%) Loss: 2.029185 LR: 0.00004706 [03:37:27] Epoch: 1 Batch: 9262/38378 (24.13%) Loss: 2.004286 LR: 0.00004706 [03:37:29] Epoch: 1 Batch: 9263/38378 (24.14%) Loss: 1.940898 LR: 0.00004706 [03:37:31] Epoch: 1 Batch: 9264/38378 (24.14%) Loss: 1.787493 LR: 0.00004706 [03:37:33] Epoch: 1 Batch: 9265/38378 (24.14%) Loss: 2.217890 LR: 0.00004706 [03:37:35] Epoch: 1 Batch: 9266/38378 (24.14%) Loss: 1.874783 LR: 0.00004706 [03:37:36] Epoch: 1 Batch: 9267/38378 (24.15%) Loss: 2.140322 LR: 0.00004705 [03:37:38] Epoch: 1 Batch: 9268/38378 (24.15%) Loss: 1.935340 LR: 0.00004705 [03:37:40] Epoch: 1 Batch: 9269/38378 (24.15%) Loss: 1.826662 LR: 0.00004705 [03:37:42] Epoch: 1 Batch: 9270/38378 (24.15%) Loss: 1.846721 LR: 0.00004705 [03:37:44] Epoch: 1 Batch: 9271/38378 (24.16%) Loss: 2.129752 LR: 0.00004705 [03:37:46] Epoch: 1 Batch: 9272/38378 (24.16%) Loss: 2.090233 LR: 0.00004705 [03:37:47] Epoch: 1 Batch: 9273/38378 (24.16%) Loss: 1.966987 LR: 0.00004705 [03:37:49] Epoch: 1 Batch: 9274/38378 (24.16%) Loss: 1.896914 LR: 0.00004704 [03:37:51] Epoch: 1 Batch: 9275/38378 (24.17%) Loss: 1.973822 LR: 0.00004704 [03:37:53] Epoch: 1 Batch: 9276/38378 (24.17%) Loss: 1.791646 LR: 0.00004704 [03:37:55] Epoch: 1 Batch: 9277/38378 (24.17%) Loss: 2.126816 LR: 0.00004704 [03:37:56] Epoch: 1 Batch: 9278/38378 (24.18%) Loss: 2.230977 LR: 0.00004704 [03:37:58] Epoch: 1 Batch: 9279/38378 (24.18%) Loss: 2.134949 LR: 0.00004704 [03:38:00] Epoch: 1 Batch: 9280/38378 (24.18%) Loss: 1.887068 LR: 0.00004704 [03:38:02] Epoch: 1 Batch: 9281/38378 (24.18%) Loss: 1.959369 LR: 0.00004704 [03:38:04] Epoch: 1 Batch: 9282/38378 (24.19%) Loss: 1.944015 LR: 0.00004704 [03:38:06] Epoch: 1 Batch: 9283/38378 (24.19%) Loss: 2.243762 LR: 0.00004704 [03:38:07] Epoch: 1 Batch: 9284/38378 (24.19%) Loss: 2.169785 LR: 0.00004704 [03:38:09] Epoch: 1 Batch: 9285/38378 (24.19%) Loss: 1.654997 LR: 0.00004704 [03:38:11] Epoch: 1 Batch: 9286/38378 (24.20%) Loss: 2.036184 LR: 0.00004704 [03:38:13] Epoch: 1 Batch: 9287/38378 (24.20%) Loss: 1.897310 LR: 0.00004704 [03:38:15] Epoch: 1 Batch: 9288/38378 (24.20%) Loss: 2.191240 LR: 0.00004703 [03:38:16] Epoch: 1 Batch: 9289/38378 (24.20%) Loss: 1.815630 LR: 0.00004703 [03:38:18] Epoch: 1 Batch: 9290/38378 (24.21%) Loss: 2.182502 LR: 0.00004703 [03:38:20] Epoch: 1 Batch: 9291/38378 (24.21%) Loss: 1.854906 LR: 0.00004703 [03:38:22] Epoch: 1 Batch: 9292/38378 (24.21%) Loss: 2.038037 LR: 0.00004703 [03:38:24] Epoch: 1 Batch: 9293/38378 (24.21%) Loss: 1.955106 LR: 0.00004703 [03:38:25] Epoch: 1 Batch: 9294/38378 (24.22%) Loss: 1.801815 LR: 0.00004703 [03:38:27] Epoch: 1 Batch: 9295/38378 (24.22%) Loss: 2.124083 LR: 0.00004702 [03:38:29] Epoch: 1 Batch: 9296/38378 (24.22%) Loss: 1.966577 LR: 0.00004702 [03:38:31] Epoch: 1 Batch: 9297/38378 (24.22%) Loss: 1.799548 LR: 0.00004702 [03:38:33] Epoch: 1 Batch: 9298/38378 (24.23%) Loss: 2.349067 LR: 0.00004702 [03:38:34] Epoch: 1 Batch: 9299/38378 (24.23%) Loss: 2.000731 LR: 0.00004702 [03:38:36] Epoch: 1 Batch: 9300/38378 (24.23%) Loss: 1.986765 LR: 0.00004702 [03:38:38] Epoch: 1 Batch: 9301/38378 (24.24%) Loss: 1.861080 LR: 0.00004702 [03:38:40] Epoch: 1 Batch: 9302/38378 (24.24%) Loss: 1.972998 LR: 0.00004702 [03:38:42] Epoch: 1 Batch: 9303/38378 (24.24%) Loss: 1.804051 LR: 0.00004702 [03:38:44] Epoch: 1 Batch: 9304/38378 
(24.24%) Loss: 2.166611 LR: 0.00004702 [03:38:45] Epoch: 1 Batch: 9305/38378 (24.25%) Loss: 1.333106 LR: 0.00004702 [03:38:47] Epoch: 1 Batch: 9306/38378 (24.25%) Loss: 2.003067 LR: 0.00004702 [03:38:49] Epoch: 1 Batch: 9307/38378 (24.25%) Loss: 1.904734 LR: 0.00004702 [03:38:51] Epoch: 1 Batch: 9308/38378 (24.25%) Loss: 1.959138 LR: 0.00004702 [03:38:53] Epoch: 1 Batch: 9309/38378 (24.26%) Loss: 1.952844 LR: 0.00004701 [03:38:54] Epoch: 1 Batch: 9310/38378 (24.26%) Loss: 1.962507 LR: 0.00004701 [03:38:56] Epoch: 1 Batch: 9311/38378 (24.26%) Loss: 2.008141 LR: 0.00004701 [03:38:58] Epoch: 1 Batch: 9312/38378 (24.26%) Loss: 2.100358 LR: 0.00004701 [03:39:00] Epoch: 1 Batch: 9313/38378 (24.27%) Loss: 1.849859 LR: 0.00004701 [03:39:02] Epoch: 1 Batch: 9314/38378 (24.27%) Loss: 1.793861 LR: 0.00004701 [03:39:04] Epoch: 1 Batch: 9315/38378 (24.27%) Loss: 2.213338 LR: 0.00004701 [03:39:05] Epoch: 1 Batch: 9316/38378 (24.27%) Loss: 2.017798 LR: 0.00004700 [03:39:07] Epoch: 1 Batch: 9317/38378 (24.28%) Loss: 1.960959 LR: 0.00004700 [03:39:09] Epoch: 1 Batch: 9318/38378 (24.28%) Loss: 2.110336 LR: 0.00004700 [03:39:11] Epoch: 1 Batch: 9319/38378 (24.28%) Loss: 1.740627 LR: 0.00004700 [03:39:13] Epoch: 1 Batch: 9320/38378 (24.28%) Loss: 2.046599 LR: 0.00004700 [03:39:15] Epoch: 1 Batch: 9321/38378 (24.29%) Loss: 2.177114 LR: 0.00004700 [03:39:16] Epoch: 1 Batch: 9322/38378 (24.29%) Loss: 1.874116 LR: 0.00004700 [03:39:18] Epoch: 1 Batch: 9323/38378 (24.29%) Loss: 2.045886 LR: 0.00004700 [03:39:20] Epoch: 1 Batch: 9324/38378 (24.30%) Loss: 2.017733 LR: 0.00004700 [03:39:22] Epoch: 1 Batch: 9325/38378 (24.30%) Loss: 1.900532 LR: 0.00004700 [03:39:24] Epoch: 1 Batch: 9326/38378 (24.30%) Loss: 1.882755 LR: 0.00004700 [03:39:26] Epoch: 1 Batch: 9327/38378 (24.30%) Loss: 1.954517 LR: 0.00004700 [03:39:27] Epoch: 1 Batch: 9328/38378 (24.31%) Loss: 2.319405 LR: 0.00004700 [03:39:29] Epoch: 1 Batch: 9329/38378 (24.31%) Loss: 2.144419 LR: 0.00004700 [03:39:31] Epoch: 1 Batch: 9330/38378 (24.31%) Loss: 1.771830 LR: 0.00004699 [03:39:33] Epoch: 1 Batch: 9331/38378 (24.31%) Loss: 2.029032 LR: 0.00004699 [03:39:35] Epoch: 1 Batch: 9332/38378 (24.32%) Loss: 2.252746 LR: 0.00004699 [03:39:36] Epoch: 1 Batch: 9333/38378 (24.32%) Loss: 1.948323 LR: 0.00004699 [03:39:38] Epoch: 1 Batch: 9334/38378 (24.32%) Loss: 1.812333 LR: 0.00004699 [03:39:40] Epoch: 1 Batch: 9335/38378 (24.32%) Loss: 2.006802 LR: 0.00004699 [03:39:42] Epoch: 1 Batch: 9336/38378 (24.33%) Loss: 1.929848 LR: 0.00004699 [03:39:44] Epoch: 1 Batch: 9337/38378 (24.33%) Loss: 1.903523 LR: 0.00004698 [03:39:45] Epoch: 1 Batch: 9338/38378 (24.33%) Loss: 1.825499 LR: 0.00004698 [03:39:47] Epoch: 1 Batch: 9339/38378 (24.33%) Loss: 2.097974 LR: 0.00004698 [03:39:49] Epoch: 1 Batch: 9340/38378 (24.34%) Loss: 1.999614 LR: 0.00004698 [03:39:51] Epoch: 1 Batch: 9341/38378 (24.34%) Loss: 2.056019 LR: 0.00004698 [03:39:53] Epoch: 1 Batch: 9342/38378 (24.34%) Loss: 2.314937 LR: 0.00004698 [03:39:55] Epoch: 1 Batch: 9343/38378 (24.34%) Loss: 1.976218 LR: 0.00004698 [03:39:56] Epoch: 1 Batch: 9344/38378 (24.35%) Loss: 2.013579 LR: 0.00004698 [03:39:58] Epoch: 1 Batch: 9345/38378 (24.35%) Loss: 2.185159 LR: 0.00004698 [03:40:00] Epoch: 1 Batch: 9346/38378 (24.35%) Loss: 2.186000 LR: 0.00004698 [03:40:02] Epoch: 1 Batch: 9347/38378 (24.36%) Loss: 1.961609 LR: 0.00004698 [03:40:04] Epoch: 1 Batch: 9348/38378 (24.36%) Loss: 2.039341 LR: 0.00004698 [03:40:05] Epoch: 1 Batch: 9349/38378 (24.36%) Loss: 2.479078 LR: 0.00004698 [03:40:07] Epoch: 1 Batch: 9350/38378 (24.36%) 
Loss: 1.934542 LR: 0.00004698 [03:40:09] Epoch: 1 Batch: 9351/38378 (24.37%) Loss: 2.021327 LR: 0.00004697 [03:40:11] Epoch: 1 Batch: 9352/38378 (24.37%) Loss: 1.910714 LR: 0.00004697 [03:40:12] Epoch: 1 Batch: 9353/38378 (24.37%) Loss: 1.951018 LR: 0.00004697 [03:40:14] Epoch: 1 Batch: 9354/38378 (24.37%) Loss: 1.996546 LR: 0.00004697 [03:40:16] Epoch: 1 Batch: 9355/38378 (24.38%) Loss: 2.385044 LR: 0.00004697 [03:40:18] Epoch: 1 Batch: 9356/38378 (24.38%) Loss: 1.789272 LR: 0.00004697 [03:40:20] Epoch: 1 Batch: 9357/38378 (24.38%) Loss: 2.214319 LR: 0.00004697 [03:40:21] Epoch: 1 Batch: 9358/38378 (24.38%) Loss: 1.988006 LR: 0.00004696 [03:40:23] Epoch: 1 Batch: 9359/38378 (24.39%) Loss: 1.992482 LR: 0.00004696 [03:40:25] Epoch: 1 Batch: 9360/38378 (24.39%) Loss: 2.126733 LR: 0.00004696 [03:40:27] Epoch: 1 Batch: 9361/38378 (24.39%) Loss: 1.822051 LR: 0.00004696 [03:40:29] Epoch: 1 Batch: 9362/38378 (24.39%) Loss: 2.107751 LR: 0.00004696 [03:40:30] Epoch: 1 Batch: 9363/38378 (24.40%) Loss: 1.997035 LR: 0.00004696 [03:40:32] Epoch: 1 Batch: 9364/38378 (24.40%) Loss: 2.184773 LR: 0.00004696 [03:40:34] Epoch: 1 Batch: 9365/38378 (24.40%) Loss: 2.271525 LR: 0.00004696 [03:40:36] Epoch: 1 Batch: 9366/38378 (24.40%) Loss: 1.800302 LR: 0.00004696 [03:40:38] Epoch: 1 Batch: 9367/38378 (24.41%) Loss: 1.953322 LR: 0.00004696 [03:40:39] Epoch: 1 Batch: 9368/38378 (24.41%) Loss: 1.720931 LR: 0.00004696 [03:40:41] Epoch: 1 Batch: 9369/38378 (24.41%) Loss: 1.912436 LR: 0.00004696 [03:40:43] Epoch: 1 Batch: 9370/38378 (24.42%) Loss: 1.884130 LR: 0.00004696 [03:40:45] Epoch: 1 Batch: 9371/38378 (24.42%) Loss: 2.269929 LR: 0.00004696 [03:40:47] Epoch: 1 Batch: 9372/38378 (24.42%) Loss: 2.134584 LR: 0.00004695 [03:40:49] Epoch: 1 Batch: 9373/38378 (24.42%) Loss: 2.125122 LR: 0.00004695 [03:40:50] Epoch: 1 Batch: 9374/38378 (24.43%) Loss: 1.884197 LR: 0.00004695 [03:40:52] Epoch: 1 Batch: 9375/38378 (24.43%) Loss: 2.103548 LR: 0.00004695 [03:40:54] Epoch: 1 Batch: 9376/38378 (24.43%) Loss: 1.918392 LR: 0.00004695 [03:40:56] Epoch: 1 Batch: 9377/38378 (24.43%) Loss: 2.509785 LR: 0.00004695 [03:40:57] Epoch: 1 Batch: 9378/38378 (24.44%) Loss: 2.011983 LR: 0.00004695 [03:40:59] Epoch: 1 Batch: 9379/38378 (24.44%) Loss: 2.089715 LR: 0.00004694 [03:41:01] Epoch: 1 Batch: 9380/38378 (24.44%) Loss: 1.831910 LR: 0.00004694 [03:41:03] Epoch: 1 Batch: 9381/38378 (24.44%) Loss: 2.179665 LR: 0.00004694 [03:41:05] Epoch: 1 Batch: 9382/38378 (24.45%) Loss: 1.936359 LR: 0.00004694 [03:41:06] Epoch: 1 Batch: 9383/38378 (24.45%) Loss: 2.025154 LR: 0.00004694 [03:41:08] Epoch: 1 Batch: 9384/38378 (24.45%) Loss: 2.015620 LR: 0.00004694 [03:41:10] Epoch: 1 Batch: 9385/38378 (24.45%) Loss: 1.940558 LR: 0.00004694 [03:41:12] Epoch: 1 Batch: 9386/38378 (24.46%) Loss: 1.895806 LR: 0.00004694 [03:41:14] Epoch: 1 Batch: 9387/38378 (24.46%) Loss: 2.048493 LR: 0.00004694 [03:41:16] Epoch: 1 Batch: 9388/38378 (24.46%) Loss: 2.012376 LR: 0.00004694 [03:41:17] Epoch: 1 Batch: 9389/38378 (24.46%) Loss: 1.648322 LR: 0.00004694 [03:41:19] Epoch: 1 Batch: 9390/38378 (24.47%) Loss: 2.028505 LR: 0.00004694 [03:41:21] Epoch: 1 Batch: 9391/38378 (24.47%) Loss: 2.082756 LR: 0.00004694 [03:41:23] Epoch: 1 Batch: 9392/38378 (24.47%) Loss: 1.755790 LR: 0.00004694 [03:41:25] Epoch: 1 Batch: 9393/38378 (24.47%) Loss: 2.054532 LR: 0.00004693 [03:41:27] Epoch: 1 Batch: 9394/38378 (24.48%) Loss: 2.091036 LR: 0.00004693 [03:41:28] Epoch: 1 Batch: 9395/38378 (24.48%) Loss: 2.002744 LR: 0.00004693 [03:41:30] Epoch: 1 Batch: 9396/38378 (24.48%) Loss: 
1.953247 LR: 0.00004693 [03:41:32] Epoch: 1 Batch: 9397/38378 (24.49%) Loss: 1.783574 LR: 0.00004693 [03:41:34] Epoch: 1 Batch: 9398/38378 (24.49%) Loss: 2.081086 LR: 0.00004693 [03:41:36] Epoch: 1 Batch: 9399/38378 (24.49%) Loss: 2.042548 LR: 0.00004693 [03:41:42] >> Cleaned up old temp checkpoint: epoch1_step7920 [03:41:42] >> Temp checkpoint saved: epoch1_step9400, size: 0.1702 GB [03:41:42] Epoch: 1 Batch: 9400/38378 (24.49%) Loss: 1.854210 LR: 0.00004692 [03:41:43] Epoch: 1 Batch: 9401/38378 (24.50%) Loss: 2.078813 LR: 0.00004692 [03:41:45] Epoch: 1 Batch: 9402/38378 (24.50%) Loss: 2.117598 LR: 0.00004692 [03:41:47] Epoch: 1 Batch: 9403/38378 (24.50%) Loss: 2.049081 LR: 0.00004692 [03:41:49] Epoch: 1 Batch: 9404/38378 (24.50%) Loss: 1.748989 LR: 0.00004692 [03:41:51] Epoch: 1 Batch: 9405/38378 (24.51%) Loss: 1.956632 LR: 0.00004692 [03:41:52] Epoch: 1 Batch: 9406/38378 (24.51%) Loss: 2.168120 LR: 0.00004692 [03:41:54] Epoch: 1 Batch: 9407/38378 (24.51%) Loss: 2.064944 LR: 0.00004692 [03:41:56] Epoch: 1 Batch: 9408/38378 (24.51%) Loss: 1.695845 LR: 0.00004692 [03:41:58] Epoch: 1 Batch: 9409/38378 (24.52%) Loss: 1.774383 LR: 0.00004692 [03:42:00] Epoch: 1 Batch: 9410/38378 (24.52%) Loss: 2.120658 LR: 0.00004692 [03:42:01] Epoch: 1 Batch: 9411/38378 (24.52%) Loss: 2.279830 LR: 0.00004692 [03:42:03] Epoch: 1 Batch: 9412/38378 (24.52%) Loss: 1.763329 LR: 0.00004692 [03:42:05] Epoch: 1 Batch: 9413/38378 (24.53%) Loss: 1.748884 LR: 0.00004692 [03:42:07] Epoch: 1 Batch: 9414/38378 (24.53%) Loss: 1.755971 LR: 0.00004691 [03:42:09] Epoch: 1 Batch: 9415/38378 (24.53%) Loss: 1.800819 LR: 0.00004691 [03:42:11] Epoch: 1 Batch: 9416/38378 (24.53%) Loss: 2.011318 LR: 0.00004691 [03:42:12] Epoch: 1 Batch: 9417/38378 (24.54%) Loss: 1.777598 LR: 0.00004691 [03:42:14] Epoch: 1 Batch: 9418/38378 (24.54%) Loss: 1.872319 LR: 0.00004691 [03:42:16] Epoch: 1 Batch: 9419/38378 (24.54%) Loss: 2.008355 LR: 0.00004691 [03:42:18] Epoch: 1 Batch: 9420/38378 (24.55%) Loss: 2.199481 LR: 0.00004691 [03:42:20] Epoch: 1 Batch: 9421/38378 (24.55%) Loss: 1.769303 LR: 0.00004690 [03:42:22] Epoch: 1 Batch: 9422/38378 (24.55%) Loss: 2.189212 LR: 0.00004690 [03:42:23] Epoch: 1 Batch: 9423/38378 (24.55%) Loss: 1.792142 LR: 0.00004690 [03:42:25] Epoch: 1 Batch: 9424/38378 (24.56%) Loss: 1.855767 LR: 0.00004690 [03:42:27] Epoch: 1 Batch: 9425/38378 (24.56%) Loss: 1.943227 LR: 0.00004690 [03:42:29] Epoch: 1 Batch: 9426/38378 (24.56%) Loss: 2.149127 LR: 0.00004690 [03:42:31] Epoch: 1 Batch: 9427/38378 (24.56%) Loss: 1.988283 LR: 0.00004690 [03:42:33] Epoch: 1 Batch: 9428/38378 (24.57%) Loss: 1.889620 LR: 0.00004690 [03:42:34] Epoch: 1 Batch: 9429/38378 (24.57%) Loss: 1.961084 LR: 0.00004690 [03:42:36] Epoch: 1 Batch: 9430/38378 (24.57%) Loss: 2.025715 LR: 0.00004690 [03:42:38] Epoch: 1 Batch: 9431/38378 (24.57%) Loss: 2.037203 LR: 0.00004690 [03:42:40] Epoch: 1 Batch: 9432/38378 (24.58%) Loss: 2.045723 LR: 0.00004690 [03:42:42] Epoch: 1 Batch: 9433/38378 (24.58%) Loss: 2.080145 LR: 0.00004690 [03:42:43] Epoch: 1 Batch: 9434/38378 (24.58%) Loss: 2.069897 LR: 0.00004690 [03:42:45] Epoch: 1 Batch: 9435/38378 (24.58%) Loss: 2.214781 LR: 0.00004689 [03:42:47] Epoch: 1 Batch: 9436/38378 (24.59%) Loss: 2.401160 LR: 0.00004689 [03:42:49] Epoch: 1 Batch: 9437/38378 (24.59%) Loss: 1.940175 LR: 0.00004689 [03:42:51] Epoch: 1 Batch: 9438/38378 (24.59%) Loss: 2.149864 LR: 0.00004689 [03:42:53] Epoch: 1 Batch: 9439/38378 (24.59%) Loss: 2.142200 LR: 0.00004689 [03:42:54] Epoch: 1 Batch: 9440/38378 (24.60%) Loss: 1.880942 LR: 0.00004689 [03:42:56] 
Epoch: 1 Batch: 9441/38378 (24.60%) Loss: 2.109797 LR: 0.00004689 [03:42:58] Epoch: 1 Batch: 9442/38378 (24.60%) Loss: 2.088143 LR: 0.00004688 [03:43:00] Epoch: 1 Batch: 9443/38378 (24.61%) Loss: 1.787108 LR: 0.00004688 [03:43:02] Epoch: 1 Batch: 9444/38378 (24.61%) Loss: 1.949236 LR: 0.00004688 [03:43:03] Epoch: 1 Batch: 9445/38378 (24.61%) Loss: 2.018074 LR: 0.00004688 [03:43:05] Epoch: 1 Batch: 9446/38378 (24.61%) Loss: 1.836638 LR: 0.00004688 [03:43:07] Epoch: 1 Batch: 9447/38378 (24.62%) Loss: 1.741730 LR: 0.00004688 [03:43:09] Epoch: 1 Batch: 9448/38378 (24.62%) Loss: 1.901615 LR: 0.00004688 [03:43:11] Epoch: 1 Batch: 9449/38378 (24.62%) Loss: 2.207136 LR: 0.00004688 [03:43:13] Epoch: 1 Batch: 9450/38378 (24.62%) Loss: 2.071441 LR: 0.00004688 [03:43:14] Epoch: 1 Batch: 9451/38378 (24.63%) Loss: 2.099325 LR: 0.00004688 [03:43:16] Epoch: 1 Batch: 9452/38378 (24.63%) Loss: 1.832772 LR: 0.00004688 [03:43:18] Epoch: 1 Batch: 9453/38378 (24.63%) Loss: 2.238715 LR: 0.00004688 [03:43:20] Epoch: 1 Batch: 9454/38378 (24.63%) Loss: 2.278389 LR: 0.00004688 [03:43:22] Epoch: 1 Batch: 9455/38378 (24.64%) Loss: 2.271536 LR: 0.00004688 [03:43:23] Epoch: 1 Batch: 9456/38378 (24.64%) Loss: 1.970615 LR: 0.00004687 [03:43:25] Epoch: 1 Batch: 9457/38378 (24.64%) Loss: 1.944601 LR: 0.00004687 [03:43:27] Epoch: 1 Batch: 9458/38378 (24.64%) Loss: 1.817497 LR: 0.00004687 [03:43:29] Epoch: 1 Batch: 9459/38378 (24.65%) Loss: 2.246702 LR: 0.00004687 [03:43:31] Epoch: 1 Batch: 9460/38378 (24.65%) Loss: 2.015997 LR: 0.00004687 [03:43:32] Epoch: 1 Batch: 9461/38378 (24.65%) Loss: 2.469110 LR: 0.00004687 [03:43:34] Epoch: 1 Batch: 9462/38378 (24.65%) Loss: 1.776531 LR: 0.00004687 [03:43:36] Epoch: 1 Batch: 9463/38378 (24.66%) Loss: 1.795169 LR: 0.00004686 [03:43:38] Epoch: 1 Batch: 9464/38378 (24.66%) Loss: 1.952488 LR: 0.00004686 [03:43:40] Epoch: 1 Batch: 9465/38378 (24.66%) Loss: 1.958284 LR: 0.00004686 [03:43:42] Epoch: 1 Batch: 9466/38378 (24.67%) Loss: 1.830424 LR: 0.00004686 [03:43:43] Epoch: 1 Batch: 9467/38378 (24.67%) Loss: 2.064728 LR: 0.00004686 [03:43:45] Epoch: 1 Batch: 9468/38378 (24.67%) Loss: 2.152834 LR: 0.00004686 [03:43:47] Epoch: 1 Batch: 9469/38378 (24.67%) Loss: 1.788657 LR: 0.00004686 [03:43:49] Epoch: 1 Batch: 9470/38378 (24.68%) Loss: 1.997308 LR: 0.00004686 [03:43:51] Epoch: 1 Batch: 9471/38378 (24.68%) Loss: 2.240062 LR: 0.00004686 [03:43:53] Epoch: 1 Batch: 9472/38378 (24.68%) Loss: 1.724711 LR: 0.00004686 [03:43:54] Epoch: 1 Batch: 9473/38378 (24.68%) Loss: 1.796827 LR: 0.00004686 [03:43:56] Epoch: 1 Batch: 9474/38378 (24.69%) Loss: 2.026116 LR: 0.00004686 [03:43:58] Epoch: 1 Batch: 9475/38378 (24.69%) Loss: 2.083722 LR: 0.00004686 [03:44:00] Epoch: 1 Batch: 9476/38378 (24.69%) Loss: 1.918852 LR: 0.00004686 [03:44:02] Epoch: 1 Batch: 9477/38378 (24.69%) Loss: 2.019364 LR: 0.00004685 [03:44:04] Epoch: 1 Batch: 9478/38378 (24.70%) Loss: 1.817547 LR: 0.00004685 [03:44:05] Epoch: 1 Batch: 9479/38378 (24.70%) Loss: 2.028699 LR: 0.00004685 [03:44:07] Epoch: 1 Batch: 9480/38378 (24.70%) Loss: 2.173977 LR: 0.00004685 [03:44:09] Epoch: 1 Batch: 9481/38378 (24.70%) Loss: 1.948057 LR: 0.00004685 [03:44:11] Epoch: 1 Batch: 9482/38378 (24.71%) Loss: 1.989633 LR: 0.00004685 [03:44:13] Epoch: 1 Batch: 9483/38378 (24.71%) Loss: 2.211272 LR: 0.00004685 [03:44:15] Epoch: 1 Batch: 9484/38378 (24.71%) Loss: 2.071044 LR: 0.00004684 [03:44:16] Epoch: 1 Batch: 9485/38378 (24.71%) Loss: 1.930968 LR: 0.00004684 [03:44:18] Epoch: 1 Batch: 9486/38378 (24.72%) Loss: 2.240852 LR: 0.00004684 [03:44:20] Epoch: 1 
Batch: 9487/38378 (24.72%) Loss: 1.879903 LR: 0.00004684 [03:44:22] Epoch: 1 Batch: 9488/38378 (24.72%) Loss: 1.816054 LR: 0.00004684 [03:44:23] Epoch: 1 Batch: 9489/38378 (24.73%) Loss: 2.163644 LR: 0.00004684 [03:44:25] Epoch: 1 Batch: 9490/38378 (24.73%) Loss: 2.067294 LR: 0.00004684 [03:44:27] Epoch: 1 Batch: 9491/38378 (24.73%) Loss: 1.938173 LR: 0.00004684 [03:44:29] Epoch: 1 Batch: 9492/38378 (24.73%) Loss: 2.417358 LR: 0.00004684 [03:44:31] Epoch: 1 Batch: 9493/38378 (24.74%) Loss: 2.253709 LR: 0.00004684 [03:44:32] Epoch: 1 Batch: 9494/38378 (24.74%) Loss: 2.175321 LR: 0.00004684 [03:44:34] Epoch: 1 Batch: 9495/38378 (24.74%) Loss: 1.983510 LR: 0.00004684 [03:44:36] Epoch: 1 Batch: 9496/38378 (24.74%) Loss: 1.935474 LR: 0.00004684 [03:44:38] Epoch: 1 Batch: 9497/38378 (24.75%) Loss: 1.762043 LR: 0.00004684 [03:44:40] Epoch: 1 Batch: 9498/38378 (24.75%) Loss: 2.027140 LR: 0.00004683 [03:44:41] Epoch: 1 Batch: 9499/38378 (24.75%) Loss: 2.158940 LR: 0.00004683 [03:44:43] >> Evaluating batch 0 [03:44:44] >> Evaluating batch 1 [03:44:45] >> Evaluating batch 2 [03:44:46] >> Evaluating batch 3 [03:44:47] >> Evaluating batch 4 [03:44:48] >> Evaluating batch 5 [03:44:49] >> Evaluating batch 6 [03:44:50] >> Evaluating batch 7 [03:44:51] >> Evaluating batch 8 [03:44:52] >> Evaluating batch 9 [03:44:53] >> Evaluating batch 10 [03:44:54] >> Evaluating batch 11 [03:44:55] >> Evaluating batch 12 [03:44:56] >> Evaluating batch 13 [03:44:57] >> Evaluating batch 14 [03:44:58] >> Evaluating batch 15 [03:44:59] >> Evaluating batch 16 [03:45:00] Epoch: 1 Step: 9500/38378 Evaluation: [03:45:00] Avg Loss Since Last Eval: 2.0006 Val Loss: 2.1375 Validation loss delta: 0.0208 Perplexity: 8.4785 LR: 0.00004683 [03:45:04] >> Checkpoint saved: epoch1_step9500, size: 0.1702 GB [03:45:04] Epoch: 1 Batch: 9500/38378 (24.75%) Loss: 2.334390 LR: 0.00004683 [03:45:06] Epoch: 1 Batch: 9501/38378 (24.76%) Loss: 2.213674 LR: 0.00004683 [03:45:08] Epoch: 1 Batch: 9502/38378 (24.76%) Loss: 2.075346 LR: 0.00004683 [03:45:10] Epoch: 1 Batch: 9503/38378 (24.76%) Loss: 1.780990 LR: 0.00004683 [03:45:11] Epoch: 1 Batch: 9504/38378 (24.76%) Loss: 2.093056 LR: 0.00004683 [03:45:13] Epoch: 1 Batch: 9505/38378 (24.77%) Loss: 1.961994 LR: 0.00004682 [03:45:15] Epoch: 1 Batch: 9506/38378 (24.77%) Loss: 2.146558 LR: 0.00004682 [03:45:17] Epoch: 1 Batch: 9507/38378 (24.77%) Loss: 2.033780 LR: 0.00004682 [03:45:19] Epoch: 1 Batch: 9508/38378 (24.77%) Loss: 1.915588 LR: 0.00004682 [03:45:20] Epoch: 1 Batch: 9509/38378 (24.78%) Loss: 1.824688 LR: 0.00004682 [03:45:22] Epoch: 1 Batch: 9510/38378 (24.78%) Loss: 2.042005 LR: 0.00004682 [03:45:24] Epoch: 1 Batch: 9511/38378 (24.78%) Loss: 1.945111 LR: 0.00004682 [03:45:26] Epoch: 1 Batch: 9512/38378 (24.79%) Loss: 1.988561 LR: 0.00004682 [03:45:28] Epoch: 1 Batch: 9513/38378 (24.79%) Loss: 2.202442 LR: 0.00004682 [03:45:30] Epoch: 1 Batch: 9514/38378 (24.79%) Loss: 1.891542 LR: 0.00004682 [03:45:31] Epoch: 1 Batch: 9515/38378 (24.79%) Loss: 1.950625 LR: 0.00004682 [03:45:33] Epoch: 1 Batch: 9516/38378 (24.80%) Loss: 2.165046 LR: 0.00004682 [03:45:35] Epoch: 1 Batch: 9517/38378 (24.80%) Loss: 1.849011 LR: 0.00004682 [03:45:37] Epoch: 1 Batch: 9518/38378 (24.80%) Loss: 1.985897 LR: 0.00004682 [03:45:39] Epoch: 1 Batch: 9519/38378 (24.80%) Loss: 1.978360 LR: 0.00004681 [03:45:41] Epoch: 1 Batch: 9520/38378 (24.81%) Loss: 1.917884 LR: 0.00004681 [03:45:42] Epoch: 1 Batch: 9521/38378 (24.81%) Loss: 1.885475 LR: 0.00004681 [03:45:44] Epoch: 1 Batch: 9522/38378 (24.81%) Loss: 2.197337 LR: 
0.00004681 [03:45:46] Epoch: 1 Batch: 9523/38378 (24.81%) Loss: 1.817958 LR: 0.00004681 [03:45:48] Epoch: 1 Batch: 9524/38378 (24.82%) Loss: 2.252729 LR: 0.00004681 [03:45:50] Epoch: 1 Batch: 9525/38378 (24.82%) Loss: 2.163118 LR: 0.00004681 [03:45:51] Epoch: 1 Batch: 9526/38378 (24.82%) Loss: 1.864224 LR: 0.00004680 [03:45:53] Epoch: 1 Batch: 9527/38378 (24.82%) Loss: 2.059297 LR: 0.00004680 [03:45:55] Epoch: 1 Batch: 9528/38378 (24.83%) Loss: 2.102667 LR: 0.00004680 [03:45:57] Epoch: 1 Batch: 9529/38378 (24.83%) Loss: 1.701247 LR: 0.00004680 [03:45:59] Epoch: 1 Batch: 9530/38378 (24.83%) Loss: 2.137001 LR: 0.00004680 [03:46:00] Epoch: 1 Batch: 9531/38378 (24.83%) Loss: 1.997295 LR: 0.00004680 [03:46:02] Epoch: 1 Batch: 9532/38378 (24.84%) Loss: 1.888281 LR: 0.00004680 [03:46:04] Epoch: 1 Batch: 9533/38378 (24.84%) Loss: 2.272129 LR: 0.00004680 [03:46:06] Epoch: 1 Batch: 9534/38378 (24.84%) Loss: 2.320078 LR: 0.00004680 [03:46:08] Epoch: 1 Batch: 9535/38378 (24.84%) Loss: 2.116017 LR: 0.00004680 [03:46:10] Epoch: 1 Batch: 9536/38378 (24.85%) Loss: 1.638931 LR: 0.00004680 [03:46:11] Epoch: 1 Batch: 9537/38378 (24.85%) Loss: 1.803356 LR: 0.00004680 [03:46:13] Epoch: 1 Batch: 9538/38378 (24.85%) Loss: 1.943571 LR: 0.00004680 [03:46:15] Epoch: 1 Batch: 9539/38378 (24.86%) Loss: 2.087474 LR: 0.00004680 [03:46:17] Epoch: 1 Batch: 9540/38378 (24.86%) Loss: 1.734611 LR: 0.00004679 [03:46:19] Epoch: 1 Batch: 9541/38378 (24.86%) Loss: 2.020048 LR: 0.00004679 [03:46:20] Epoch: 1 Batch: 9542/38378 (24.86%) Loss: 2.134796 LR: 0.00004679 [03:46:22] Epoch: 1 Batch: 9543/38378 (24.87%) Loss: 2.240090 LR: 0.00004679 [03:46:24] Epoch: 1 Batch: 9544/38378 (24.87%) Loss: 2.181534 LR: 0.00004679 [03:46:26] Epoch: 1 Batch: 9545/38378 (24.87%) Loss: 1.589216 LR: 0.00004679 [03:46:28] Epoch: 1 Batch: 9546/38378 (24.87%) Loss: 1.963018 LR: 0.00004679 [03:46:29] Epoch: 1 Batch: 9547/38378 (24.88%) Loss: 2.272533 LR: 0.00004678 [03:46:31] Epoch: 1 Batch: 9548/38378 (24.88%) Loss: 2.100308 LR: 0.00004678 [03:46:33] Epoch: 1 Batch: 9549/38378 (24.88%) Loss: 2.115584 LR: 0.00004678 [03:46:35] Epoch: 1 Batch: 9550/38378 (24.88%) Loss: 1.813974 LR: 0.00004678 [03:46:37] Epoch: 1 Batch: 9551/38378 (24.89%) Loss: 1.972536 LR: 0.00004678 [03:46:38] Epoch: 1 Batch: 9552/38378 (24.89%) Loss: 1.954621 LR: 0.00004678 [03:46:40] Epoch: 1 Batch: 9553/38378 (24.89%) Loss: 2.212619 LR: 0.00004678 [03:46:42] Epoch: 1 Batch: 9554/38378 (24.89%) Loss: 2.199035 LR: 0.00004678 [03:46:44] Epoch: 1 Batch: 9555/38378 (24.90%) Loss: 1.904681 LR: 0.00004678 [03:46:46] Epoch: 1 Batch: 9556/38378 (24.90%) Loss: 2.084526 LR: 0.00004678 [03:46:48] Epoch: 1 Batch: 9557/38378 (24.90%) Loss: 2.160925 LR: 0.00004678 [03:46:49] Epoch: 1 Batch: 9558/38378 (24.90%) Loss: 2.027193 LR: 0.00004678 [03:46:51] Epoch: 1 Batch: 9559/38378 (24.91%) Loss: 2.018216 LR: 0.00004678 [03:46:53] Epoch: 1 Batch: 9560/38378 (24.91%) Loss: 1.963789 LR: 0.00004678 [03:46:55] Epoch: 1 Batch: 9561/38378 (24.91%) Loss: 1.968155 LR: 0.00004677 [03:46:57] Epoch: 1 Batch: 9562/38378 (24.92%) Loss: 2.148865 LR: 0.00004677 [03:46:59] Epoch: 1 Batch: 9563/38378 (24.92%) Loss: 1.771235 LR: 0.00004677 [03:47:00] Epoch: 1 Batch: 9564/38378 (24.92%) Loss: 2.091806 LR: 0.00004677 [03:47:02] Epoch: 1 Batch: 9565/38378 (24.92%) Loss: 2.125596 LR: 0.00004677 [03:47:04] Epoch: 1 Batch: 9566/38378 (24.93%) Loss: 1.782764 LR: 0.00004677 [03:47:06] Epoch: 1 Batch: 9567/38378 (24.93%) Loss: 2.059231 LR: 0.00004677 [03:47:08] Epoch: 1 Batch: 9568/38378 (24.93%) Loss: 1.959029 LR: 0.00004676 
[03:47:10] Epoch: 1 Batch: 9569/38378 (24.93%) Loss: 1.794892 LR: 0.00004676 [03:47:11] Epoch: 1 Batch: 9570/38378 (24.94%) Loss: 2.058843 LR: 0.00004676 [03:47:13] Epoch: 1 Batch: 9571/38378 (24.94%) Loss: 1.866887 LR: 0.00004676 [03:47:15] Epoch: 1 Batch: 9572/38378 (24.94%) Loss: 1.899321 LR: 0.00004676 [03:47:17] Epoch: 1 Batch: 9573/38378 (24.94%) Loss: 1.986459 LR: 0.00004676 [03:47:19] Epoch: 1 Batch: 9574/38378 (24.95%) Loss: 1.870167 LR: 0.00004676 [03:47:21] Epoch: 1 Batch: 9575/38378 (24.95%) Loss: 1.774990 LR: 0.00004676 [03:47:22] Epoch: 1 Batch: 9576/38378 (24.95%) Loss: 2.016445 LR: 0.00004676 [03:47:24] Epoch: 1 Batch: 9577/38378 (24.95%) Loss: 1.890440 LR: 0.00004676 [03:47:26] Epoch: 1 Batch: 9578/38378 (24.96%) Loss: 2.110390 LR: 0.00004676 [03:47:28] Epoch: 1 Batch: 9579/38378 (24.96%) Loss: 2.025342 LR: 0.00004676 [03:47:30] Epoch: 1 Batch: 9580/38378 (24.96%) Loss: 2.242459 LR: 0.00004676 [03:47:32] Epoch: 1 Batch: 9581/38378 (24.96%) Loss: 2.233757 LR: 0.00004676 [03:47:33] Epoch: 1 Batch: 9582/38378 (24.97%) Loss: 1.704761 LR: 0.00004675 [03:47:35] Epoch: 1 Batch: 9583/38378 (24.97%) Loss: 1.814240 LR: 0.00004675 [03:47:37] Epoch: 1 Batch: 9584/38378 (24.97%) Loss: 2.003537 LR: 0.00004675 [03:47:39] Epoch: 1 Batch: 9585/38378 (24.98%) Loss: 2.401984 LR: 0.00004675 [03:47:41] Epoch: 1 Batch: 9586/38378 (24.98%) Loss: 2.330114 LR: 0.00004675 [03:47:42] Epoch: 1 Batch: 9587/38378 (24.98%) Loss: 1.814199 LR: 0.00004675 [03:47:44] Epoch: 1 Batch: 9588/38378 (24.98%) Loss: 1.803513 LR: 0.00004675 [03:47:46] Epoch: 1 Batch: 9589/38378 (24.99%) Loss: 2.176948 LR: 0.00004674 [03:47:48] Epoch: 1 Batch: 9590/38378 (24.99%) Loss: 1.925369 LR: 0.00004674 [03:47:50] Epoch: 1 Batch: 9591/38378 (24.99%) Loss: 2.087675 LR: 0.00004674 [03:47:52] Epoch: 1 Batch: 9592/38378 (24.99%) Loss: 1.908881 LR: 0.00004674 [03:47:53] Epoch: 1 Batch: 9593/38378 (25.00%) Loss: 2.079805 LR: 0.00004674 [03:47:55] Epoch: 1 Batch: 9594/38378 (25.00%) Loss: 1.906627 LR: 0.00004674 [03:47:57] Epoch: 1 Batch: 9595/38378 (25.00%) Loss: 2.027252 LR: 0.00004674 [03:47:59] Epoch: 1 Batch: 9596/38378 (25.00%) Loss: 2.098606 LR: 0.00004674 [03:48:01] Epoch: 1 Batch: 9597/38378 (25.01%) Loss: 1.842442 LR: 0.00004674 [03:48:02] Epoch: 1 Batch: 9598/38378 (25.01%) Loss: 2.027441 LR: 0.00004674 [03:48:04] Epoch: 1 Batch: 9599/38378 (25.01%) Loss: 2.359993 LR: 0.00004674 [03:48:11] >> Cleaned up old temp checkpoint: epoch1_step7953 [03:48:11] >> Temp checkpoint saved: epoch1_step9600, size: 0.1702 GB [03:48:11] Epoch: 1 Batch: 9600/38378 (25.01%) Loss: 2.007325 LR: 0.00004674 [03:48:12] Epoch: 1 Batch: 9601/38378 (25.02%) Loss: 2.117451 LR: 0.00004674 [03:48:14] Epoch: 1 Batch: 9602/38378 (25.02%) Loss: 2.157679 LR: 0.00004674 [03:48:16] Epoch: 1 Batch: 9603/38378 (25.02%) Loss: 2.174812 LR: 0.00004673 [03:48:18] Epoch: 1 Batch: 9604/38378 (25.02%) Loss: 1.784111 LR: 0.00004673 [03:48:19] Epoch: 1 Batch: 9605/38378 (25.03%) Loss: 1.907146 LR: 0.00004673 [03:48:21] Epoch: 1 Batch: 9606/38378 (25.03%) Loss: 1.948658 LR: 0.00004673 [03:48:23] Epoch: 1 Batch: 9607/38378 (25.03%) Loss: 1.864098 LR: 0.00004673 [03:48:25] Epoch: 1 Batch: 9608/38378 (25.04%) Loss: 2.232354 LR: 0.00004673 [03:48:27] Epoch: 1 Batch: 9609/38378 (25.04%) Loss: 1.895289 LR: 0.00004673 [03:48:29] Epoch: 1 Batch: 9610/38378 (25.04%) Loss: 1.866651 LR: 0.00004672 [03:48:30] Epoch: 1 Batch: 9611/38378 (25.04%) Loss: 1.951607 LR: 0.00004672 [03:48:32] Epoch: 1 Batch: 9612/38378 (25.05%) Loss: 2.123422 LR: 0.00004672 [03:48:34] Epoch: 1 Batch: 
9613/38378 (25.05%) Loss: 1.846882 LR: 0.00004672 [03:48:36] Epoch: 1 Batch: 9614/38378 (25.05%) Loss: 1.579348 LR: 0.00004672 [03:48:38] Epoch: 1 Batch: 9615/38378 (25.05%) Loss: 2.149368 LR: 0.00004672 [03:48:39] Epoch: 1 Batch: 9616/38378 (25.06%) Loss: 1.982863 LR: 0.00004672 [03:48:41] Epoch: 1 Batch: 9617/38378 (25.06%) Loss: 2.055669 LR: 0.00004672 [03:48:43] Epoch: 1 Batch: 9618/38378 (25.06%) Loss: 1.872289 LR: 0.00004672 [03:48:45] Epoch: 1 Batch: 9619/38378 (25.06%) Loss: 1.990785 LR: 0.00004672 [03:48:47] Epoch: 1 Batch: 9620/38378 (25.07%) Loss: 2.122347 LR: 0.00004672 [03:48:49] Epoch: 1 Batch: 9621/38378 (25.07%) Loss: 1.998959 LR: 0.00004672 [03:48:50] Epoch: 1 Batch: 9622/38378 (25.07%) Loss: 1.965562 LR: 0.00004672 [03:48:52] Epoch: 1 Batch: 9623/38378 (25.07%) Loss: 1.720041 LR: 0.00004672 [03:48:54] Epoch: 1 Batch: 9624/38378 (25.08%) Loss: 1.779526 LR: 0.00004671 [03:48:56] Epoch: 1 Batch: 9625/38378 (25.08%) Loss: 1.795766 LR: 0.00004671 [03:48:58] Epoch: 1 Batch: 9626/38378 (25.08%) Loss: 2.106318 LR: 0.00004671 [03:48:59] Epoch: 1 Batch: 9627/38378 (25.08%) Loss: 2.001509 LR: 0.00004671 [03:49:01] Epoch: 1 Batch: 9628/38378 (25.09%) Loss: 2.243628 LR: 0.00004671 [03:49:03] Epoch: 1 Batch: 9629/38378 (25.09%) Loss: 2.122576 LR: 0.00004671 [03:49:05] Epoch: 1 Batch: 9630/38378 (25.09%) Loss: 2.092660 LR: 0.00004671 [03:49:07] Epoch: 1 Batch: 9631/38378 (25.10%) Loss: 2.025012 LR: 0.00004670 [03:49:08] Epoch: 1 Batch: 9632/38378 (25.10%) Loss: 2.011413 LR: 0.00004670 [03:49:10] Epoch: 1 Batch: 9633/38378 (25.10%) Loss: 2.092734 LR: 0.00004670 [03:49:12] Epoch: 1 Batch: 9634/38378 (25.10%) Loss: 1.817397 LR: 0.00004670 [03:49:14] Epoch: 1 Batch: 9635/38378 (25.11%) Loss: 2.154519 LR: 0.00004670 [03:49:16] Epoch: 1 Batch: 9636/38378 (25.11%) Loss: 2.086692 LR: 0.00004670 [03:49:18] Epoch: 1 Batch: 9637/38378 (25.11%) Loss: 1.766176 LR: 0.00004670 [03:49:19] Epoch: 1 Batch: 9638/38378 (25.11%) Loss: 1.995083 LR: 0.00004670 [03:49:21] Epoch: 1 Batch: 9639/38378 (25.12%) Loss: 2.014270 LR: 0.00004670 [03:49:23] Epoch: 1 Batch: 9640/38378 (25.12%) Loss: 2.223600 LR: 0.00004670 [03:49:25] Epoch: 1 Batch: 9641/38378 (25.12%) Loss: 1.875875 LR: 0.00004670 [03:49:27] Epoch: 1 Batch: 9642/38378 (25.12%) Loss: 1.762475 LR: 0.00004670 [03:49:28] Epoch: 1 Batch: 9643/38378 (25.13%) Loss: 2.008028 LR: 0.00004670 [03:49:30] Epoch: 1 Batch: 9644/38378 (25.13%) Loss: 2.006030 LR: 0.00004670 [03:49:32] Epoch: 1 Batch: 9645/38378 (25.13%) Loss: 2.089276 LR: 0.00004669 [03:49:34] Epoch: 1 Batch: 9646/38378 (25.13%) Loss: 1.884442 LR: 0.00004669 [03:49:36] Epoch: 1 Batch: 9647/38378 (25.14%) Loss: 1.726710 LR: 0.00004669 [03:49:37] Epoch: 1 Batch: 9648/38378 (25.14%) Loss: 1.931065 LR: 0.00004669 [03:49:39] Epoch: 1 Batch: 9649/38378 (25.14%) Loss: 1.872182 LR: 0.00004669 [03:49:41] Epoch: 1 Batch: 9650/38378 (25.14%) Loss: 2.002011 LR: 0.00004669 [03:49:43] Epoch: 1 Batch: 9651/38378 (25.15%) Loss: 2.026274 LR: 0.00004669 [03:49:45] Epoch: 1 Batch: 9652/38378 (25.15%) Loss: 1.983878 LR: 0.00004668 [03:49:46] Epoch: 1 Batch: 9653/38378 (25.15%) Loss: 1.897920 LR: 0.00004668 [03:49:48] Epoch: 1 Batch: 9654/38378 (25.16%) Loss: 2.019270 LR: 0.00004668 [03:49:50] Epoch: 1 Batch: 9655/38378 (25.16%) Loss: 1.912647 LR: 0.00004668 [03:49:52] Epoch: 1 Batch: 9656/38378 (25.16%) Loss: 1.938333 LR: 0.00004668 [03:49:54] Epoch: 1 Batch: 9657/38378 (25.16%) Loss: 2.122874 LR: 0.00004668 [03:49:56] Epoch: 1 Batch: 9658/38378 (25.17%) Loss: 1.891961 LR: 0.00004668 [03:49:57] Epoch: 1 Batch: 9659/38378 
(25.17%) Loss: 1.871649 LR: 0.00004667 [03:49:59] Epoch: 1 Batch: 9660/38378 (25.17%) Loss: 1.999687 LR: 0.00004667 [03:50:01] Epoch: 1 Batch: 9661/38378 (25.17%) Loss: 1.824862 LR: 0.00004667 [03:50:03] Epoch: 1 Batch: 9662/38378 (25.18%) Loss: 2.027690 LR: 0.00004667 [03:50:05] Epoch: 1 Batch: 9663/38378 (25.18%) Loss: 1.830722 LR: 0.00004667 [03:50:07] Epoch: 1 Batch: 9664/38378 (25.18%) Loss: 1.860024 LR: 0.00004667 [03:50:08] Epoch: 1 Batch: 9665/38378 (25.18%) Loss: 1.903766 LR: 0.00004667 [03:50:10] Epoch: 1 Batch: 9666/38378 (25.19%) Loss: 1.987714 LR: 0.00004667 [03:50:12] Epoch: 1 Batch: 9667/38378 (25.19%) Loss: 1.795762 LR: 0.00004667 [03:50:14] Epoch: 1 Batch: 9668/38378 (25.19%) Loss: 2.122075 LR: 0.00004667 [03:50:16] Epoch: 1 Batch: 9669/38378 (25.19%) Loss: 1.832701 LR: 0.00004667 [03:50:18] Epoch: 1 Batch: 9670/38378 (25.20%) Loss: 2.044252 LR: 0.00004667 [03:50:19] Epoch: 1 Batch: 9671/38378 (25.20%) Loss: 1.921826 LR: 0.00004667 [03:50:21] Epoch: 1 Batch: 9672/38378 (25.20%) Loss: 2.128116 LR: 0.00004667 [03:50:23] Epoch: 1 Batch: 9673/38378 (25.20%) Loss: 2.145595 LR: 0.00004666 [03:50:25] Epoch: 1 Batch: 9674/38378 (25.21%) Loss: 1.716457 LR: 0.00004666 [03:50:27] Epoch: 1 Batch: 9675/38378 (25.21%) Loss: 2.334355 LR: 0.00004666 [03:50:29] Epoch: 1 Batch: 9676/38378 (25.21%) Loss: 1.922137 LR: 0.00004666 [03:50:30] Epoch: 1 Batch: 9677/38378 (25.21%) Loss: 2.283133 LR: 0.00004666 [03:50:32] Epoch: 1 Batch: 9678/38378 (25.22%) Loss: 2.110949 LR: 0.00004666 [03:50:34] Epoch: 1 Batch: 9679/38378 (25.22%) Loss: 1.923910 LR: 0.00004666 [03:50:36] Epoch: 1 Batch: 9680/38378 (25.22%) Loss: 2.151995 LR: 0.00004665 [03:50:38] Epoch: 1 Batch: 9681/38378 (25.23%) Loss: 1.925553 LR: 0.00004665 [03:50:40] Epoch: 1 Batch: 9682/38378 (25.23%) Loss: 1.926429 LR: 0.00004665 [03:50:41] Epoch: 1 Batch: 9683/38378 (25.23%) Loss: 2.066507 LR: 0.00004665 [03:50:43] Epoch: 1 Batch: 9684/38378 (25.23%) Loss: 1.994409 LR: 0.00004665 [03:50:45] Epoch: 1 Batch: 9685/38378 (25.24%) Loss: 1.740086 LR: 0.00004665 [03:50:47] Epoch: 1 Batch: 9686/38378 (25.24%) Loss: 1.961175 LR: 0.00004665 [03:50:49] Epoch: 1 Batch: 9687/38378 (25.24%) Loss: 2.051810 LR: 0.00004665 [03:50:51] Epoch: 1 Batch: 9688/38378 (25.24%) Loss: 1.957307 LR: 0.00004665 [03:50:52] Epoch: 1 Batch: 9689/38378 (25.25%) Loss: 1.858456 LR: 0.00004665 [03:50:54] Epoch: 1 Batch: 9690/38378 (25.25%) Loss: 2.157585 LR: 0.00004665 [03:50:56] Epoch: 1 Batch: 9691/38378 (25.25%) Loss: 2.223922 LR: 0.00004665 [03:50:58] Epoch: 1 Batch: 9692/38378 (25.25%) Loss: 1.997486 LR: 0.00004665 [03:51:00] Epoch: 1 Batch: 9693/38378 (25.26%) Loss: 1.738508 LR: 0.00004665 [03:51:02] Epoch: 1 Batch: 9694/38378 (25.26%) Loss: 1.846093 LR: 0.00004664 [03:51:03] Epoch: 1 Batch: 9695/38378 (25.26%) Loss: 1.625219 LR: 0.00004664 [03:51:05] Epoch: 1 Batch: 9696/38378 (25.26%) Loss: 2.270227 LR: 0.00004664 [03:51:07] Epoch: 1 Batch: 9697/38378 (25.27%) Loss: 2.075181 LR: 0.00004664 [03:51:09] Epoch: 1 Batch: 9698/38378 (25.27%) Loss: 2.009235 LR: 0.00004664 [03:51:10] Epoch: 1 Batch: 9699/38378 (25.27%) Loss: 1.961425 LR: 0.00004664 [03:51:12] Epoch: 1 Batch: 9700/38378 (25.27%) Loss: 1.851077 LR: 0.00004664 [03:51:14] Epoch: 1 Batch: 9701/38378 (25.28%) Loss: 2.169434 LR: 0.00004663 [03:51:16] Epoch: 1 Batch: 9702/38378 (25.28%) Loss: 2.145984 LR: 0.00004663 [03:51:18] Epoch: 1 Batch: 9703/38378 (25.28%) Loss: 2.000685 LR: 0.00004663 [03:51:20] Epoch: 1 Batch: 9704/38378 (25.29%) Loss: 2.132921 LR: 0.00004663 [03:51:21] Epoch: 1 Batch: 9705/38378 (25.29%) 
Loss: 2.142439 LR: 0.00004663 [03:51:23] Epoch: 1 Batch: 9706/38378 (25.29%) Loss: 2.117642 LR: 0.00004663 [03:51:25] Epoch: 1 Batch: 9707/38378 (25.29%) Loss: 1.638785 LR: 0.00004663 [03:51:27] Epoch: 1 Batch: 9708/38378 (25.30%) Loss: 1.974461 LR: 0.00004663 [03:51:29] Epoch: 1 Batch: 9709/38378 (25.30%) Loss: 1.926222 LR: 0.00004663 [03:51:31] Epoch: 1 Batch: 9710/38378 (25.30%) Loss: 2.063652 LR: 0.00004663 [03:51:32] Epoch: 1 Batch: 9711/38378 (25.30%) Loss: 2.019779 LR: 0.00004663 [03:51:34] Epoch: 1 Batch: 9712/38378 (25.31%) Loss: 1.938119 LR: 0.00004663 [03:51:36] Epoch: 1 Batch: 9713/38378 (25.31%) Loss: 2.129525 LR: 0.00004663 [03:51:38] Epoch: 1 Batch: 9714/38378 (25.31%) Loss: 2.024796 LR: 0.00004663 [03:51:40] Epoch: 1 Batch: 9715/38378 (25.31%) Loss: 2.152037 LR: 0.00004662 [03:51:42] Epoch: 1 Batch: 9716/38378 (25.32%) Loss: 1.977836 LR: 0.00004662 [03:51:43] Epoch: 1 Batch: 9717/38378 (25.32%) Loss: 1.744152 LR: 0.00004662 [03:51:45] Epoch: 1 Batch: 9718/38378 (25.32%) Loss: 1.756595 LR: 0.00004662 [03:51:47] Epoch: 1 Batch: 9719/38378 (25.32%) Loss: 1.999081 LR: 0.00004662 [03:51:49] Epoch: 1 Batch: 9720/38378 (25.33%) Loss: 2.559202 LR: 0.00004662 [03:51:51] Epoch: 1 Batch: 9721/38378 (25.33%) Loss: 1.632002 LR: 0.00004662 [03:51:52] Epoch: 1 Batch: 9722/38378 (25.33%) Loss: 2.435557 LR: 0.00004661 [03:51:54] Epoch: 1 Batch: 9723/38378 (25.33%) Loss: 2.119335 LR: 0.00004661 [03:51:56] Epoch: 1 Batch: 9724/38378 (25.34%) Loss: 1.894244 LR: 0.00004661 [03:51:58] Epoch: 1 Batch: 9725/38378 (25.34%) Loss: 2.450293 LR: 0.00004661 [03:52:00] Epoch: 1 Batch: 9726/38378 (25.34%) Loss: 1.909549 LR: 0.00004661 [03:52:02] Epoch: 1 Batch: 9727/38378 (25.35%) Loss: 1.947946 LR: 0.00004661 [03:52:03] Epoch: 1 Batch: 9728/38378 (25.35%) Loss: 1.918529 LR: 0.00004661 [03:52:05] Epoch: 1 Batch: 9729/38378 (25.35%) Loss: 2.133081 LR: 0.00004661 [03:52:07] Epoch: 1 Batch: 9730/38378 (25.35%) Loss: 1.778026 LR: 0.00004661 [03:52:09] Epoch: 1 Batch: 9731/38378 (25.36%) Loss: 2.199778 LR: 0.00004661 [03:52:11] Epoch: 1 Batch: 9732/38378 (25.36%) Loss: 2.052404 LR: 0.00004661 [03:52:13] Epoch: 1 Batch: 9733/38378 (25.36%) Loss: 1.850572 LR: 0.00004661 [03:52:14] Epoch: 1 Batch: 9734/38378 (25.36%) Loss: 1.868837 LR: 0.00004661 [03:52:16] Epoch: 1 Batch: 9735/38378 (25.37%) Loss: 1.989961 LR: 0.00004661 [03:52:18] Epoch: 1 Batch: 9736/38378 (25.37%) Loss: 1.746057 LR: 0.00004660 [03:52:20] Epoch: 1 Batch: 9737/38378 (25.37%) Loss: 1.785782 LR: 0.00004660 [03:52:22] Epoch: 1 Batch: 9738/38378 (25.37%) Loss: 1.683676 LR: 0.00004660 [03:52:24] Epoch: 1 Batch: 9739/38378 (25.38%) Loss: 2.023303 LR: 0.00004660 [03:52:25] Epoch: 1 Batch: 9740/38378 (25.38%) Loss: 1.975506 LR: 0.00004660 [03:52:27] Epoch: 1 Batch: 9741/38378 (25.38%) Loss: 2.052599 LR: 0.00004660 [03:52:29] Epoch: 1 Batch: 9742/38378 (25.38%) Loss: 1.826241 LR: 0.00004660 [03:52:31] Epoch: 1 Batch: 9743/38378 (25.39%) Loss: 1.541751 LR: 0.00004659 [03:52:33] Epoch: 1 Batch: 9744/38378 (25.39%) Loss: 2.155082 LR: 0.00004659 [03:52:34] Epoch: 1 Batch: 9745/38378 (25.39%) Loss: 1.957544 LR: 0.00004659 [03:52:36] Epoch: 1 Batch: 9746/38378 (25.39%) Loss: 2.166008 LR: 0.00004659 [03:52:38] Epoch: 1 Batch: 9747/38378 (25.40%) Loss: 2.314962 LR: 0.00004659 [03:52:40] Epoch: 1 Batch: 9748/38378 (25.40%) Loss: 1.995021 LR: 0.00004659 [03:52:41] Epoch: 1 Batch: 9749/38378 (25.40%) Loss: 2.223433 LR: 0.00004659 [03:52:43] Epoch: 1 Batch: 9750/38378 (25.41%) Loss: 1.695757 LR: 0.00004658 [03:52:45] Epoch: 1 Batch: 9751/38378 (25.41%) Loss: 
1.972253 LR: 0.00004658 [03:52:47] Epoch: 1 Batch: 9752/38378 (25.41%) Loss: 1.924197 LR: 0.00004658 [03:52:49] Epoch: 1 Batch: 9753/38378 (25.41%) Loss: 1.502376 LR: 0.00004658 [03:52:50] Epoch: 1 Batch: 9754/38378 (25.42%) Loss: 2.012576 LR: 0.00004658 [03:52:52] Epoch: 1 Batch: 9755/38378 (25.42%) Loss: 1.944808 LR: 0.00004658 [03:52:54] Epoch: 1 Batch: 9756/38378 (25.42%) Loss: 1.866870 LR: 0.00004658 [03:52:56] Epoch: 1 Batch: 9757/38378 (25.42%) Loss: 2.343163 LR: 0.00004658 [03:52:58] Epoch: 1 Batch: 9758/38378 (25.43%) Loss: 2.078026 LR: 0.00004658 [03:53:00] Epoch: 1 Batch: 9759/38378 (25.43%) Loss: 2.190246 LR: 0.00004658 [03:53:01] Epoch: 1 Batch: 9760/38378 (25.43%) Loss: 2.359272 LR: 0.00004658 [03:53:03] Epoch: 1 Batch: 9761/38378 (25.43%) Loss: 2.050840 LR: 0.00004658 [03:53:05] Epoch: 1 Batch: 9762/38378 (25.44%) Loss: 1.995708 LR: 0.00004658 [03:53:07] Epoch: 1 Batch: 9763/38378 (25.44%) Loss: 1.878640 LR: 0.00004658 [03:53:09] Epoch: 1 Batch: 9764/38378 (25.44%) Loss: 1.991296 LR: 0.00004657 [03:53:11] Epoch: 1 Batch: 9765/38378 (25.44%) Loss: 1.940980 LR: 0.00004657 [03:53:12] Epoch: 1 Batch: 9766/38378 (25.45%) Loss: 1.800258 LR: 0.00004657 [03:53:14] Epoch: 1 Batch: 9767/38378 (25.45%) Loss: 2.373830 LR: 0.00004657 [03:53:16] Epoch: 1 Batch: 9768/38378 (25.45%) Loss: 1.803761 LR: 0.00004657 [03:53:18] Epoch: 1 Batch: 9769/38378 (25.45%) Loss: 2.178313 LR: 0.00004657 [03:53:20] Epoch: 1 Batch: 9770/38378 (25.46%) Loss: 1.771465 LR: 0.00004657 [03:53:22] Epoch: 1 Batch: 9771/38378 (25.46%) Loss: 2.083705 LR: 0.00004656 [03:53:23] Epoch: 1 Batch: 9772/38378 (25.46%) Loss: 1.734591 LR: 0.00004656 [03:53:25] Epoch: 1 Batch: 9773/38378 (25.47%) Loss: 1.841603 LR: 0.00004656 [03:53:27] Epoch: 1 Batch: 9774/38378 (25.47%) Loss: 2.031302 LR: 0.00004656 [03:53:29] Epoch: 1 Batch: 9775/38378 (25.47%) Loss: 1.826100 LR: 0.00004656 [03:53:31] Epoch: 1 Batch: 9776/38378 (25.47%) Loss: 1.810139 LR: 0.00004656 [03:53:32] Epoch: 1 Batch: 9777/38378 (25.48%) Loss: 2.032620 LR: 0.00004656 [03:53:34] Epoch: 1 Batch: 9778/38378 (25.48%) Loss: 2.070941 LR: 0.00004656 [03:53:36] Epoch: 1 Batch: 9779/38378 (25.48%) Loss: 1.899342 LR: 0.00004656 [03:53:38] Epoch: 1 Batch: 9780/38378 (25.48%) Loss: 2.255046 LR: 0.00004656 [03:53:40] Epoch: 1 Batch: 9781/38378 (25.49%) Loss: 2.291331 LR: 0.00004656 [03:53:42] Epoch: 1 Batch: 9782/38378 (25.49%) Loss: 1.920640 LR: 0.00004656 [03:53:43] Epoch: 1 Batch: 9783/38378 (25.49%) Loss: 2.070855 LR: 0.00004656 [03:53:45] Epoch: 1 Batch: 9784/38378 (25.49%) Loss: 1.836239 LR: 0.00004656 [03:53:47] Epoch: 1 Batch: 9785/38378 (25.50%) Loss: 2.062707 LR: 0.00004655 [03:53:49] Epoch: 1 Batch: 9786/38378 (25.50%) Loss: 2.305186 LR: 0.00004655 [03:53:51] Epoch: 1 Batch: 9787/38378 (25.50%) Loss: 1.884267 LR: 0.00004655 [03:53:53] Epoch: 1 Batch: 9788/38378 (25.50%) Loss: 2.159806 LR: 0.00004655 [03:53:54] Epoch: 1 Batch: 9789/38378 (25.51%) Loss: 2.044000 LR: 0.00004655 [03:53:56] Epoch: 1 Batch: 9790/38378 (25.51%) Loss: 2.039302 LR: 0.00004655 [03:53:58] Epoch: 1 Batch: 9791/38378 (25.51%) Loss: 2.151173 LR: 0.00004655 [03:54:00] Epoch: 1 Batch: 9792/38378 (25.51%) Loss: 2.274321 LR: 0.00004654 [03:54:02] Epoch: 1 Batch: 9793/38378 (25.52%) Loss: 2.125990 LR: 0.00004654 [03:54:03] Epoch: 1 Batch: 9794/38378 (25.52%) Loss: 1.803141 LR: 0.00004654 [03:54:05] Epoch: 1 Batch: 9795/38378 (25.52%) Loss: 2.153043 LR: 0.00004654 [03:54:07] Epoch: 1 Batch: 9796/38378 (25.53%) Loss: 1.922039 LR: 0.00004654 [03:54:09] Epoch: 1 Batch: 9797/38378 (25.53%) Loss: 2.038520 LR: 
0.00004654 [03:54:11] Epoch: 1 Batch: 9798/38378 (25.53%) Loss: 1.796431 LR: 0.00004654 [03:54:13] Epoch: 1 Batch: 9799/38378 (25.53%) Loss: 2.106550 LR: 0.00004654 [03:54:19] >> Cleaned up old temp checkpoint: epoch1_step7986 [03:54:19] >> Temp checkpoint saved: epoch1_step9800, size: 0.1702 GB [03:54:19] Epoch: 1 Batch: 9800/38378 (25.54%) Loss: 1.984695 LR: 0.00004654 [03:54:20] Epoch: 1 Batch: 9801/38378 (25.54%) Loss: 2.255949 LR: 0.00004654 [03:54:22] Epoch: 1 Batch: 9802/38378 (25.54%) Loss: 1.825070 LR: 0.00004654 [03:54:24] Epoch: 1 Batch: 9803/38378 (25.54%) Loss: 1.852217 LR: 0.00004654 [03:54:26] Epoch: 1 Batch: 9804/38378 (25.55%) Loss: 2.126453 LR: 0.00004654 [03:54:28] Epoch: 1 Batch: 9805/38378 (25.55%) Loss: 1.886048 LR: 0.00004654 [03:54:29] Epoch: 1 Batch: 9806/38378 (25.55%) Loss: 2.191350 LR: 0.00004653 [03:54:31] Epoch: 1 Batch: 9807/38378 (25.55%) Loss: 2.387366 LR: 0.00004653 [03:54:33] Epoch: 1 Batch: 9808/38378 (25.56%) Loss: 1.745471 LR: 0.00004653 [03:54:35] Epoch: 1 Batch: 9809/38378 (25.56%) Loss: 1.932756 LR: 0.00004653 [03:54:37] Epoch: 1 Batch: 9810/38378 (25.56%) Loss: 2.080037 LR: 0.00004653 [03:54:39] Epoch: 1 Batch: 9811/38378 (25.56%) Loss: 2.228781 LR: 0.00004653 [03:54:40] Epoch: 1 Batch: 9812/38378 (25.57%) Loss: 2.085218 LR: 0.00004653 [03:54:42] Epoch: 1 Batch: 9813/38378 (25.57%) Loss: 2.170784 LR: 0.00004652 [03:54:44] Epoch: 1 Batch: 9814/38378 (25.57%) Loss: 2.064950 LR: 0.00004652 [03:54:46] Epoch: 1 Batch: 9815/38378 (25.57%) Loss: 1.937239 LR: 0.00004652 [03:54:48] Epoch: 1 Batch: 9816/38378 (25.58%) Loss: 1.950819 LR: 0.00004652 [03:54:50] Epoch: 1 Batch: 9817/38378 (25.58%) Loss: 2.042729 LR: 0.00004652 [03:54:51] Epoch: 1 Batch: 9818/38378 (25.58%) Loss: 1.987321 LR: 0.00004652 [03:54:53] Epoch: 1 Batch: 9819/38378 (25.58%) Loss: 2.052424 LR: 0.00004652 [03:54:55] Epoch: 1 Batch: 9820/38378 (25.59%) Loss: 1.990961 LR: 0.00004651 [03:54:57] Epoch: 1 Batch: 9821/38378 (25.59%) Loss: 2.187625 LR: 0.00004651 [03:54:59] Epoch: 1 Batch: 9822/38378 (25.59%) Loss: 2.076311 LR: 0.00004651 [03:55:01] Epoch: 1 Batch: 9823/38378 (25.60%) Loss: 1.703889 LR: 0.00004651 [03:55:02] Epoch: 1 Batch: 9824/38378 (25.60%) Loss: 1.679397 LR: 0.00004651 [03:55:04] Epoch: 1 Batch: 9825/38378 (25.60%) Loss: 2.102667 LR: 0.00004651 [03:55:06] Epoch: 1 Batch: 9826/38378 (25.60%) Loss: 2.043635 LR: 0.00004651 [03:55:08] Epoch: 1 Batch: 9827/38378 (25.61%) Loss: 2.114476 LR: 0.00004651 [03:55:10] Epoch: 1 Batch: 9828/38378 (25.61%) Loss: 1.899555 LR: 0.00004651 [03:55:12] Epoch: 1 Batch: 9829/38378 (25.61%) Loss: 1.803290 LR: 0.00004651 [03:55:13] Epoch: 1 Batch: 9830/38378 (25.61%) Loss: 2.133252 LR: 0.00004651 [03:55:15] Epoch: 1 Batch: 9831/38378 (25.62%) Loss: 2.282306 LR: 0.00004651 [03:55:17] Epoch: 1 Batch: 9832/38378 (25.62%) Loss: 1.769023 LR: 0.00004651 [03:55:19] Epoch: 1 Batch: 9833/38378 (25.62%) Loss: 2.015946 LR: 0.00004651 [03:55:21] Epoch: 1 Batch: 9834/38378 (25.62%) Loss: 2.323207 LR: 0.00004650 [03:55:22] Epoch: 1 Batch: 9835/38378 (25.63%) Loss: 1.981412 LR: 0.00004650 [03:55:24] Epoch: 1 Batch: 9836/38378 (25.63%) Loss: 1.977583 LR: 0.00004650 [03:55:26] Epoch: 1 Batch: 9837/38378 (25.63%) Loss: 2.280615 LR: 0.00004650 [03:55:28] Epoch: 1 Batch: 9838/38378 (25.63%) Loss: 2.226203 LR: 0.00004650 [03:55:30] Epoch: 1 Batch: 9839/38378 (25.64%) Loss: 2.131194 LR: 0.00004650 [03:55:31] Epoch: 1 Batch: 9840/38378 (25.64%) Loss: 1.996772 LR: 0.00004650 [03:55:33] Epoch: 1 Batch: 9841/38378 (25.64%) Loss: 2.022247 LR: 0.00004649 [03:55:35] Epoch: 1 
Batch: 9842/38378 (25.64%) Loss: 1.918151 LR: 0.00004649 [03:55:37] Epoch: 1 Batch: 9843/38378 (25.65%) Loss: 2.137034 LR: 0.00004649 [03:55:39] Epoch: 1 Batch: 9844/38378 (25.65%) Loss: 2.347473 LR: 0.00004649 [03:55:40] Epoch: 1 Batch: 9845/38378 (25.65%) Loss: 1.671202 LR: 0.00004649 [03:55:42] Epoch: 1 Batch: 9846/38378 (25.66%) Loss: 2.032831 LR: 0.00004649 [03:55:44] Epoch: 1 Batch: 9847/38378 (25.66%) Loss: 1.837801 LR: 0.00004649 [03:55:46] Epoch: 1 Batch: 9848/38378 (25.66%) Loss: 1.988118 LR: 0.00004649 [03:55:48] Epoch: 1 Batch: 9849/38378 (25.66%) Loss: 2.076422 LR: 0.00004649 [03:55:50] Epoch: 1 Batch: 9850/38378 (25.67%) Loss: 2.225384 LR: 0.00004649 [03:55:51] Epoch: 1 Batch: 9851/38378 (25.67%) Loss: 2.022997 LR: 0.00004649 [03:55:53] Epoch: 1 Batch: 9852/38378 (25.67%) Loss: 1.792089 LR: 0.00004649 [03:55:55] Epoch: 1 Batch: 9853/38378 (25.67%) Loss: 2.022955 LR: 0.00004649 [03:55:57] Epoch: 1 Batch: 9854/38378 (25.68%) Loss: 1.908012 LR: 0.00004649 [03:55:59] Epoch: 1 Batch: 9855/38378 (25.68%) Loss: 1.795263 LR: 0.00004648 [03:56:00] Epoch: 1 Batch: 9856/38378 (25.68%) Loss: 1.744388 LR: 0.00004648 [03:56:02] Epoch: 1 Batch: 9857/38378 (25.68%) Loss: 2.050083 LR: 0.00004648 [03:56:04] Epoch: 1 Batch: 9858/38378 (25.69%) Loss: 1.843134 LR: 0.00004648 [03:56:06] Epoch: 1 Batch: 9859/38378 (25.69%) Loss: 1.756084 LR: 0.00004648 [03:56:08] Epoch: 1 Batch: 9860/38378 (25.69%) Loss: 1.819795 LR: 0.00004648 [03:56:10] Epoch: 1 Batch: 9861/38378 (25.69%) Loss: 2.163643 LR: 0.00004648 [03:56:11] Epoch: 1 Batch: 9862/38378 (25.70%) Loss: 1.967793 LR: 0.00004647 [03:56:13] Epoch: 1 Batch: 9863/38378 (25.70%) Loss: 1.908922 LR: 0.00004647 [03:56:15] Epoch: 1 Batch: 9864/38378 (25.70%) Loss: 2.215624 LR: 0.00004647 [03:56:17] Epoch: 1 Batch: 9865/38378 (25.70%) Loss: 1.752208 LR: 0.00004647 [03:56:19] Epoch: 1 Batch: 9866/38378 (25.71%) Loss: 2.217832 LR: 0.00004647 [03:56:20] Epoch: 1 Batch: 9867/38378 (25.71%) Loss: 1.810007 LR: 0.00004647 [03:56:22] Epoch: 1 Batch: 9868/38378 (25.71%) Loss: 2.327444 LR: 0.00004647 [03:56:24] Epoch: 1 Batch: 9869/38378 (25.72%) Loss: 1.871035 LR: 0.00004647 [03:56:26] Epoch: 1 Batch: 9870/38378 (25.72%) Loss: 1.730192 LR: 0.00004647 [03:56:28] Epoch: 1 Batch: 9871/38378 (25.72%) Loss: 1.895558 LR: 0.00004647 [03:56:30] Epoch: 1 Batch: 9872/38378 (25.72%) Loss: 1.825913 LR: 0.00004647 [03:56:31] Epoch: 1 Batch: 9873/38378 (25.73%) Loss: 2.085825 LR: 0.00004647 [03:56:33] Epoch: 1 Batch: 9874/38378 (25.73%) Loss: 1.857922 LR: 0.00004647 [03:56:35] Epoch: 1 Batch: 9875/38378 (25.73%) Loss: 1.950167 LR: 0.00004647 [03:56:37] Epoch: 1 Batch: 9876/38378 (25.73%) Loss: 1.765145 LR: 0.00004646 [03:56:39] Epoch: 1 Batch: 9877/38378 (25.74%) Loss: 1.860615 LR: 0.00004646 [03:56:41] Epoch: 1 Batch: 9878/38378 (25.74%) Loss: 2.057994 LR: 0.00004646 [03:56:42] Epoch: 1 Batch: 9879/38378 (25.74%) Loss: 1.527588 LR: 0.00004646 [03:56:44] Epoch: 1 Batch: 9880/38378 (25.74%) Loss: 1.892285 LR: 0.00004646 [03:56:46] Epoch: 1 Batch: 9881/38378 (25.75%) Loss: 2.033911 LR: 0.00004646 [03:56:48] Epoch: 1 Batch: 9882/38378 (25.75%) Loss: 2.074496 LR: 0.00004646 [03:56:50] Epoch: 1 Batch: 9883/38378 (25.75%) Loss: 1.949174 LR: 0.00004645 [03:56:51] Epoch: 1 Batch: 9884/38378 (25.75%) Loss: 1.691090 LR: 0.00004645 [03:56:53] Epoch: 1 Batch: 9885/38378 (25.76%) Loss: 2.004467 LR: 0.00004645 [03:56:55] Epoch: 1 Batch: 9886/38378 (25.76%) Loss: 2.052814 LR: 0.00004645 [03:56:57] Epoch: 1 Batch: 9887/38378 (25.76%) Loss: 2.236207 LR: 0.00004645 [03:56:59] Epoch: 1 Batch: 
9888/38378 (25.76%) Loss: 1.894767 LR: 0.00004645 [03:57:00] Epoch: 1 Batch: 9889/38378 (25.77%) Loss: 2.141414 LR: 0.00004645 [03:57:02] Epoch: 1 Batch: 9890/38378 (25.77%) Loss: 2.276990 LR: 0.00004644 [03:57:04] Epoch: 1 Batch: 9891/38378 (25.77%) Loss: 1.829370 LR: 0.00004644 [03:57:06] Epoch: 1 Batch: 9892/38378 (25.78%) Loss: 1.730128 LR: 0.00004644 [03:57:08] Epoch: 1 Batch: 9893/38378 (25.78%) Loss: 2.133551 LR: 0.00004644 [03:57:10] Epoch: 1 Batch: 9894/38378 (25.78%) Loss: 2.048569 LR: 0.00004644 [03:57:11] Epoch: 1 Batch: 9895/38378 (25.78%) Loss: 1.886105 LR: 0.00004644 [03:57:13] Epoch: 1 Batch: 9896/38378 (25.79%) Loss: 1.636790 LR: 0.00004644 [03:57:15] Epoch: 1 Batch: 9897/38378 (25.79%) Loss: 2.235791 LR: 0.00004644 [03:57:17] Epoch: 1 Batch: 9898/38378 (25.79%) Loss: 2.447304 LR: 0.00004644 [03:57:19] Epoch: 1 Batch: 9899/38378 (25.79%) Loss: 2.154236 LR: 0.00004644 [03:57:21] Epoch: 1 Batch: 9900/38378 (25.80%) Loss: 2.014818 LR: 0.00004644 [03:57:22] Epoch: 1 Batch: 9901/38378 (25.80%) Loss: 2.535034 LR: 0.00004644 [03:57:24] Epoch: 1 Batch: 9902/38378 (25.80%) Loss: 1.633047 LR: 0.00004644 [03:57:26] Epoch: 1 Batch: 9903/38378 (25.80%) Loss: 1.702200 LR: 0.00004644 [03:57:28] Epoch: 1 Batch: 9904/38378 (25.81%) Loss: 2.160291 LR: 0.00004643 [03:57:30] Epoch: 1 Batch: 9905/38378 (25.81%) Loss: 1.982940 LR: 0.00004643 [03:57:32] Epoch: 1 Batch: 9906/38378 (25.81%) Loss: 1.957793 LR: 0.00004643 [03:57:33] Epoch: 1 Batch: 9907/38378 (25.81%) Loss: 2.040353 LR: 0.00004643 [03:57:35] Epoch: 1 Batch: 9908/38378 (25.82%) Loss: 2.261699 LR: 0.00004643 [03:57:37] Epoch: 1 Batch: 9909/38378 (25.82%) Loss: 1.865672 LR: 0.00004643 [03:57:39] Epoch: 1 Batch: 9910/38378 (25.82%) Loss: 1.888000 LR: 0.00004643 [03:57:41] Epoch: 1 Batch: 9911/38378 (25.82%) Loss: 1.918551 LR: 0.00004642 [03:57:42] Epoch: 1 Batch: 9912/38378 (25.83%) Loss: 1.770772 LR: 0.00004642 [03:57:44] Epoch: 1 Batch: 9913/38378 (25.83%) Loss: 1.878072 LR: 0.00004642 [03:57:46] Epoch: 1 Batch: 9914/38378 (25.83%) Loss: 2.005922 LR: 0.00004642 [03:57:48] Epoch: 1 Batch: 9915/38378 (25.84%) Loss: 1.610221 LR: 0.00004642 [03:57:50] Epoch: 1 Batch: 9916/38378 (25.84%) Loss: 1.848739 LR: 0.00004642 [03:57:52] Epoch: 1 Batch: 9917/38378 (25.84%) Loss: 2.076270 LR: 0.00004642 [03:57:53] Epoch: 1 Batch: 9918/38378 (25.84%) Loss: 2.264435 LR: 0.00004642 [03:57:55] Epoch: 1 Batch: 9919/38378 (25.85%) Loss: 1.986684 LR: 0.00004642 [03:57:57] Epoch: 1 Batch: 9920/38378 (25.85%) Loss: 1.887674 LR: 0.00004642 [03:57:59] Epoch: 1 Batch: 9921/38378 (25.85%) Loss: 2.039040 LR: 0.00004642 [03:58:01] Epoch: 1 Batch: 9922/38378 (25.85%) Loss: 2.137633 LR: 0.00004642 [03:58:03] Epoch: 1 Batch: 9923/38378 (25.86%) Loss: 2.047436 LR: 0.00004642 [03:58:04] Epoch: 1 Batch: 9924/38378 (25.86%) Loss: 1.973964 LR: 0.00004642 [03:58:06] Epoch: 1 Batch: 9925/38378 (25.86%) Loss: 2.055670 LR: 0.00004641 [03:58:08] Epoch: 1 Batch: 9926/38378 (25.86%) Loss: 1.842709 LR: 0.00004641 [03:58:10] Epoch: 1 Batch: 9927/38378 (25.87%) Loss: 2.083205 LR: 0.00004641 [03:58:12] Epoch: 1 Batch: 9928/38378 (25.87%) Loss: 1.869026 LR: 0.00004641 [03:58:13] Epoch: 1 Batch: 9929/38378 (25.87%) Loss: 1.833418 LR: 0.00004641 [03:58:15] Epoch: 1 Batch: 9930/38378 (25.87%) Loss: 2.069907 LR: 0.00004641 [03:58:17] Epoch: 1 Batch: 9931/38378 (25.88%) Loss: 2.054301 LR: 0.00004641 [03:58:19] Epoch: 1 Batch: 9932/38378 (25.88%) Loss: 1.812065 LR: 0.00004640 [03:58:21] Epoch: 1 Batch: 9933/38378 (25.88%) Loss: 1.967770 LR: 0.00004640 [03:58:22] Epoch: 1 Batch: 9934/38378 
(25.88%) Loss: 1.824050 LR: 0.00004640 [03:58:24] Epoch: 1 Batch: 9935/38378 (25.89%) Loss: 2.251893 LR: 0.00004640 [03:58:26] Epoch: 1 Batch: 9936/38378 (25.89%) Loss: 2.009332 LR: 0.00004640 [03:58:28] Epoch: 1 Batch: 9937/38378 (25.89%) Loss: 1.964569 LR: 0.00004640 [03:58:30] Epoch: 1 Batch: 9938/38378 (25.90%) Loss: 1.789070 LR: 0.00004640 [03:58:32] Epoch: 1 Batch: 9939/38378 (25.90%) Loss: 2.008607 LR: 0.00004639 [03:58:33] Epoch: 1 Batch: 9940/38378 (25.90%) Loss: 1.906156 LR: 0.00004639 [03:58:35] Epoch: 1 Batch: 9941/38378 (25.90%) Loss: 1.843239 LR: 0.00004639 [03:58:37] Epoch: 1 Batch: 9942/38378 (25.91%) Loss: 2.161401 LR: 0.00004639 [03:58:39] Epoch: 1 Batch: 9943/38378 (25.91%) Loss: 2.025804 LR: 0.00004639 [03:58:41] Epoch: 1 Batch: 9944/38378 (25.91%) Loss: 2.213947 LR: 0.00004639 [03:58:43] Epoch: 1 Batch: 9945/38378 (25.91%) Loss: 1.955713 LR: 0.00004639 [03:58:44] Epoch: 1 Batch: 9946/38378 (25.92%) Loss: 2.302667 LR: 0.00004639 [03:58:46] Epoch: 1 Batch: 9947/38378 (25.92%) Loss: 2.079329 LR: 0.00004639 [03:58:48] Epoch: 1 Batch: 9948/38378 (25.92%) Loss: 2.182439 LR: 0.00004639 [03:58:50] Epoch: 1 Batch: 9949/38378 (25.92%) Loss: 2.232221 LR: 0.00004639 [03:58:52] Epoch: 1 Batch: 9950/38378 (25.93%) Loss: 1.978771 LR: 0.00004639 [03:58:53] Epoch: 1 Batch: 9951/38378 (25.93%) Loss: 1.879708 LR: 0.00004639 [03:58:55] Epoch: 1 Batch: 9952/38378 (25.93%) Loss: 1.806601 LR: 0.00004639 [03:58:57] Epoch: 1 Batch: 9953/38378 (25.93%) Loss: 2.238771 LR: 0.00004638 [03:58:59] Epoch: 1 Batch: 9954/38378 (25.94%) Loss: 2.372246 LR: 0.00004638 [03:59:01] Epoch: 1 Batch: 9955/38378 (25.94%) Loss: 2.149771 LR: 0.00004638 [03:59:03] Epoch: 1 Batch: 9956/38378 (25.94%) Loss: 1.810582 LR: 0.00004638 [03:59:04] Epoch: 1 Batch: 9957/38378 (25.94%) Loss: 1.801823 LR: 0.00004638 [03:59:06] Epoch: 1 Batch: 9958/38378 (25.95%) Loss: 2.320636 LR: 0.00004638 [03:59:08] Epoch: 1 Batch: 9959/38378 (25.95%) Loss: 2.470533 LR: 0.00004638 [03:59:10] Epoch: 1 Batch: 9960/38378 (25.95%) Loss: 2.055185 LR: 0.00004637 [03:59:12] Epoch: 1 Batch: 9961/38378 (25.95%) Loss: 2.395270 LR: 0.00004637 [03:59:13] Epoch: 1 Batch: 9962/38378 (25.96%) Loss: 2.085417 LR: 0.00004637 [03:59:15] Epoch: 1 Batch: 9963/38378 (25.96%) Loss: 2.101674 LR: 0.00004637 [03:59:17] Epoch: 1 Batch: 9964/38378 (25.96%) Loss: 2.036698 LR: 0.00004637 [03:59:19] Epoch: 1 Batch: 9965/38378 (25.97%) Loss: 1.938257 LR: 0.00004637 [03:59:21] Epoch: 1 Batch: 9966/38378 (25.97%) Loss: 2.117231 LR: 0.00004637 [03:59:23] Epoch: 1 Batch: 9967/38378 (25.97%) Loss: 1.861233 LR: 0.00004637 [03:59:24] Epoch: 1 Batch: 9968/38378 (25.97%) Loss: 2.064500 LR: 0.00004637 [03:59:26] Epoch: 1 Batch: 9969/38378 (25.98%) Loss: 2.045890 LR: 0.00004637 [03:59:28] Epoch: 1 Batch: 9970/38378 (25.98%) Loss: 1.729218 LR: 0.00004637 [03:59:30] Epoch: 1 Batch: 9971/38378 (25.98%) Loss: 2.197456 LR: 0.00004637 [03:59:32] Epoch: 1 Batch: 9972/38378 (25.98%) Loss: 2.019245 LR: 0.00004637 [03:59:33] Epoch: 1 Batch: 9973/38378 (25.99%) Loss: 1.968275 LR: 0.00004637 [03:59:35] Epoch: 1 Batch: 9974/38378 (25.99%) Loss: 1.944799 LR: 0.00004636 [03:59:37] Epoch: 1 Batch: 9975/38378 (25.99%) Loss: 2.070168 LR: 0.00004636 [03:59:39] Epoch: 1 Batch: 9976/38378 (25.99%) Loss: 1.791415 LR: 0.00004636 [03:59:41] Epoch: 1 Batch: 9977/38378 (26.00%) Loss: 2.020802 LR: 0.00004636 [03:59:42] Epoch: 1 Batch: 9978/38378 (26.00%) Loss: 2.034825 LR: 0.00004636 [03:59:44] Epoch: 1 Batch: 9979/38378 (26.00%) Loss: 1.593609 LR: 0.00004636 [03:59:46] Epoch: 1 Batch: 9980/38378 (26.00%) 
Loss: 2.158105 LR: 0.00004636 [03:59:48] Epoch: 1 Batch: 9981/38378 (26.01%) Loss: 2.182292 LR: 0.00004635 [03:59:50] Epoch: 1 Batch: 9982/38378 (26.01%) Loss: 1.837479 LR: 0.00004635 [03:59:51] Epoch: 1 Batch: 9983/38378 (26.01%) Loss: 1.796456 LR: 0.00004635 [03:59:53] Epoch: 1 Batch: 9984/38378 (26.01%) Loss: 2.124022 LR: 0.00004635 [03:59:55] Epoch: 1 Batch: 9985/38378 (26.02%) Loss: 2.086068 LR: 0.00004635 [03:59:57] Epoch: 1 Batch: 9986/38378 (26.02%) Loss: 1.896826 LR: 0.00004635 [03:59:59] Epoch: 1 Batch: 9987/38378 (26.02%) Loss: 1.984435 LR: 0.00004635 [04:00:01] Epoch: 1 Batch: 9988/38378 (26.03%) Loss: 2.212707 LR: 0.00004634 [04:00:02] Epoch: 1 Batch: 9989/38378 (26.03%) Loss: 2.217299 LR: 0.00004634 [04:00:04] Epoch: 1 Batch: 9990/38378 (26.03%) Loss: 2.108394 LR: 0.00004634 [04:00:06] Epoch: 1 Batch: 9991/38378 (26.03%) Loss: 2.270349 LR: 0.00004634 [04:00:08] Epoch: 1 Batch: 9992/38378 (26.04%) Loss: 2.015316 LR: 0.00004634 [04:00:10] Epoch: 1 Batch: 9993/38378 (26.04%) Loss: 1.933327 LR: 0.00004634 [04:00:12] Epoch: 1 Batch: 9994/38378 (26.04%) Loss: 1.961929 LR: 0.00004634 [04:00:13] Epoch: 1 Batch: 9995/38378 (26.04%) Loss: 2.051817 LR: 0.00004634 [04:00:15] Epoch: 1 Batch: 9996/38378 (26.05%) Loss: 2.093563 LR: 0.00004634 [04:00:17] Epoch: 1 Batch: 9997/38378 (26.05%) Loss: 2.073246 LR: 0.00004634 [04:00:19] Epoch: 1 Batch: 9998/38378 (26.05%) Loss: 2.139817 LR: 0.00004634 [04:00:21] Epoch: 1 Batch: 9999/38378 (26.05%) Loss: 2.187880 LR: 0.00004634 [04:00:23] >> Evaluating batch 0 [04:00:24] >> Evaluating batch 1 [04:00:24] >> Evaluating batch 2 [04:00:25] >> Evaluating batch 3 [04:00:26] >> Evaluating batch 4 [04:00:27] >> Evaluating batch 5 [04:00:28] >> Evaluating batch 6 [04:00:29] >> Evaluating batch 7 [04:00:30] >> Evaluating batch 8 [04:00:31] >> Evaluating batch 9 [04:00:32] >> Evaluating batch 10 [04:00:33] >> Evaluating batch 11 [04:00:34] >> Evaluating batch 12 [04:00:35] >> Evaluating batch 13 [04:00:36] >> Evaluating batch 14 [04:00:38] >> Evaluating batch 15 [04:00:39] >> Evaluating batch 16 [04:00:39] Epoch: 1 Step: 10000/38378 Evaluation: [04:00:39] Avg Loss Since Last Eval: 2.0009 Val Loss: 2.1144 Validation loss delta: -0.0231 Perplexity: 8.2847 LR: 0.00004634 [04:00:43] >> Cleaned up old temp checkpoint: epoch1_step8000 [04:00:43] >> Temp checkpoint saved: epoch1_step10000, size: 0.1702 GB [04:00:48] >> Checkpoint saved: epoch1_step10000, size: 0.1702 GB [04:00:48] Epoch: 1 Batch: 10000/38378 (26.06%) Loss: 1.787469 LR: 0.00004634 [04:00:49] Epoch: 1 Batch: 10001/38378 (26.06%) Loss: 2.080121 LR: 0.00004634 [04:00:51] Epoch: 1 Batch: 10002/38378 (26.06%) Loss: 1.805490 LR: 0.00004633 [04:00:53] Epoch: 1 Batch: 10003/38378 (26.06%) Loss: 1.988428 LR: 0.00004633 [04:00:55] Epoch: 1 Batch: 10004/38378 (26.07%) Loss: 1.756595 LR: 0.00004633 [04:00:56] Epoch: 1 Batch: 10005/38378 (26.07%) Loss: 2.138768 LR: 0.00004633 [04:00:58] Epoch: 1 Batch: 10006/38378 (26.07%) Loss: 2.218801 LR: 0.00004633 [04:01:00] Epoch: 1 Batch: 10007/38378 (26.07%) Loss: 2.215887 LR: 0.00004633 [04:01:02] Epoch: 1 Batch: 10008/38378 (26.08%) Loss: 2.223219 LR: 0.00004633 [04:01:04] Epoch: 1 Batch: 10009/38378 (26.08%) Loss: 1.828655 LR: 0.00004632 [04:01:06] Epoch: 1 Batch: 10010/38378 (26.08%) Loss: 1.920265 LR: 0.00004632 [04:01:07] Epoch: 1 Batch: 10011/38378 (26.09%) Loss: 1.941513 LR: 0.00004632 [04:01:09] Epoch: 1 Batch: 10012/38378 (26.09%) Loss: 2.308464 LR: 0.00004632 [04:01:11] Epoch: 1 Batch: 10013/38378 (26.09%) Loss: 2.085204 LR: 0.00004632 [04:01:13] Epoch: 1 
Batch: 10014/38378 (26.09%) Loss: 2.050099 LR: 0.00004632 [04:01:15] Epoch: 1 Batch: 10015/38378 (26.10%) Loss: 2.098574 LR: 0.00004632 [04:01:17] Epoch: 1 Batch: 10016/38378 (26.10%) Loss: 1.912149 LR: 0.00004632 [04:01:19] Epoch: 1 Batch: 10017/38378 (26.10%) Loss: 2.057008 LR: 0.00004632 [04:01:21] Epoch: 1 Batch: 10018/38378 (26.10%) Loss: 2.163745 LR: 0.00004632 [04:01:23] Epoch: 1 Batch: 10019/38378 (26.11%) Loss: 1.791170 LR: 0.00004632 [04:01:24] Epoch: 1 Batch: 10020/38378 (26.11%) Loss: 2.261635 LR: 0.00004632 [04:01:26] Epoch: 1 Batch: 10021/38378 (26.11%) Loss: 2.061695 LR: 0.00004632 [04:01:28] Epoch: 1 Batch: 10022/38378 (26.11%) Loss: 2.118419 LR: 0.00004632 [04:01:30] Epoch: 1 Batch: 10023/38378 (26.12%) Loss: 1.648465 LR: 0.00004631 [04:01:32] Epoch: 1 Batch: 10024/38378 (26.12%) Loss: 1.928131 LR: 0.00004631 [04:01:34] Epoch: 1 Batch: 10025/38378 (26.12%) Loss: 2.282211 LR: 0.00004631 [04:01:35] Epoch: 1 Batch: 10026/38378 (26.12%) Loss: 2.424745 LR: 0.00004631 [04:01:37] Epoch: 1 Batch: 10027/38378 (26.13%) Loss: 2.099145 LR: 0.00004631 [04:01:39] Epoch: 1 Batch: 10028/38378 (26.13%) Loss: 2.028864 LR: 0.00004631 [04:01:41] Epoch: 1 Batch: 10029/38378 (26.13%) Loss: 1.918000 LR: 0.00004631 [04:01:43] Epoch: 1 Batch: 10030/38378 (26.13%) Loss: 1.994665 LR: 0.00004630 [04:01:44] Epoch: 1 Batch: 10031/38378 (26.14%) Loss: 2.159721 LR: 0.00004630 [04:01:46] Epoch: 1 Batch: 10032/38378 (26.14%) Loss: 1.965564 LR: 0.00004630 [04:01:48] Epoch: 1 Batch: 10033/38378 (26.14%) Loss: 1.965433 LR: 0.00004630 [04:01:50] Epoch: 1 Batch: 10034/38378 (26.15%) Loss: 1.860112 LR: 0.00004630 [04:01:52] Epoch: 1 Batch: 10035/38378 (26.15%) Loss: 2.009096 LR: 0.00004630 [04:01:53] Epoch: 1 Batch: 10036/38378 (26.15%) Loss: 1.960438 LR: 0.00004630 [04:01:55] Epoch: 1 Batch: 10037/38378 (26.15%) Loss: 2.070268 LR: 0.00004629 [04:01:57] Epoch: 1 Batch: 10038/38378 (26.16%) Loss: 1.925712 LR: 0.00004629 [04:01:59] Epoch: 1 Batch: 10039/38378 (26.16%) Loss: 2.133903 LR: 0.00004629 [04:02:00] Epoch: 1 Batch: 10040/38378 (26.16%) Loss: 1.849491 LR: 0.00004629 [04:02:02] Epoch: 1 Batch: 10041/38378 (26.16%) Loss: 2.430716 LR: 0.00004629 [04:02:04] Epoch: 1 Batch: 10042/38378 (26.17%) Loss: 2.154561 LR: 0.00004629 [04:02:06] Epoch: 1 Batch: 10043/38378 (26.17%) Loss: 1.912096 LR: 0.00004629 [04:02:08] Epoch: 1 Batch: 10044/38378 (26.17%) Loss: 2.160287 LR: 0.00004629 [04:02:10] Epoch: 1 Batch: 10045/38378 (26.17%) Loss: 1.952944 LR: 0.00004629 [04:02:11] Epoch: 1 Batch: 10046/38378 (26.18%) Loss: 1.871431 LR: 0.00004629 [04:02:13] Epoch: 1 Batch: 10047/38378 (26.18%) Loss: 2.281508 LR: 0.00004629 [04:02:15] Epoch: 1 Batch: 10048/38378 (26.18%) Loss: 2.172200 LR: 0.00004629 [04:02:17] Epoch: 1 Batch: 10049/38378 (26.18%) Loss: 2.135591 LR: 0.00004629 [04:02:19] Epoch: 1 Batch: 10050/38378 (26.19%) Loss: 2.045250 LR: 0.00004629 [04:02:20] Epoch: 1 Batch: 10051/38378 (26.19%) Loss: 2.097167 LR: 0.00004628 [04:02:22] Epoch: 1 Batch: 10052/38378 (26.19%) Loss: 2.029287 LR: 0.00004628 [04:02:24] Epoch: 1 Batch: 10053/38378 (26.19%) Loss: 2.075479 LR: 0.00004628 [04:02:26] Epoch: 1 Batch: 10054/38378 (26.20%) Loss: 2.308260 LR: 0.00004628 [04:02:28] Epoch: 1 Batch: 10055/38378 (26.20%) Loss: 1.752046 LR: 0.00004628 [04:02:29] Epoch: 1 Batch: 10056/38378 (26.20%) Loss: 1.899346 LR: 0.00004628 [04:02:31] Epoch: 1 Batch: 10057/38378 (26.21%) Loss: 1.837534 LR: 0.00004628 [04:02:33] Epoch: 1 Batch: 10058/38378 (26.21%) Loss: 2.074875 LR: 0.00004627 [04:02:35] Epoch: 1 Batch: 10059/38378 (26.21%) Loss: 1.996054 
LR: 0.00004627 [04:02:37] Epoch: 1 Batch: 10060/38378 (26.21%) Loss: 2.154859 LR: 0.00004627 [04:02:39] Epoch: 1 Batch: 10061/38378 (26.22%) Loss: 1.961843 LR: 0.00004627 [04:02:40] Epoch: 1 Batch: 10062/38378 (26.22%) Loss: 2.141573 LR: 0.00004627 [04:02:42] Epoch: 1 Batch: 10063/38378 (26.22%) Loss: 1.932421 LR: 0.00004627 [04:02:44] Epoch: 1 Batch: 10064/38378 (26.22%) Loss: 2.064615 LR: 0.00004627 [04:02:46] Epoch: 1 Batch: 10065/38378 (26.23%) Loss: 2.037143 LR: 0.00004626 [04:02:48] Epoch: 1 Batch: 10066/38378 (26.23%) Loss: 1.799222 LR: 0.00004626 [04:02:49] Epoch: 1 Batch: 10067/38378 (26.23%) Loss: 1.867523 LR: 0.00004626 [04:02:51] Epoch: 1 Batch: 10068/38378 (26.23%) Loss: 2.178023 LR: 0.00004626 [04:02:53] Epoch: 1 Batch: 10069/38378 (26.24%) Loss: 1.997376 LR: 0.00004626 [04:02:55] Epoch: 1 Batch: 10070/38378 (26.24%) Loss: 2.098215 LR: 0.00004626 [04:02:57] Epoch: 1 Batch: 10071/38378 (26.24%) Loss: 1.952506 LR: 0.00004626 [04:02:59] Epoch: 1 Batch: 10072/38378 (26.24%) Loss: 2.201558 LR: 0.00004626 [04:03:00] Epoch: 1 Batch: 10073/38378 (26.25%) Loss: 1.946430 LR: 0.00004626 [04:03:02] Epoch: 1 Batch: 10074/38378 (26.25%) Loss: 1.991806 LR: 0.00004626 [04:03:04] Epoch: 1 Batch: 10075/38378 (26.25%) Loss: 1.804087 LR: 0.00004626 [04:03:06] Epoch: 1 Batch: 10076/38378 (26.25%) Loss: 1.833003 LR: 0.00004626 [04:03:08] Epoch: 1 Batch: 10077/38378 (26.26%) Loss: 2.020130 LR: 0.00004626 [04:03:09] Epoch: 1 Batch: 10078/38378 (26.26%) Loss: 2.063711 LR: 0.00004626 [04:03:11] Epoch: 1 Batch: 10079/38378 (26.26%) Loss: 1.631156 LR: 0.00004625 [04:03:13] Epoch: 1 Batch: 10080/38378 (26.27%) Loss: 1.897123 LR: 0.00004625 [04:03:15] Epoch: 1 Batch: 10081/38378 (26.27%) Loss: 2.036788 LR: 0.00004625 [04:03:17] Epoch: 1 Batch: 10082/38378 (26.27%) Loss: 2.023436 LR: 0.00004625 [04:03:19] Epoch: 1 Batch: 10083/38378 (26.27%) Loss: 2.173907 LR: 0.00004625 [04:03:20] Epoch: 1 Batch: 10084/38378 (26.28%) Loss: 2.193174 LR: 0.00004625 [04:03:22] Epoch: 1 Batch: 10085/38378 (26.28%) Loss: 1.820272 LR: 0.00004625 [04:03:24] Epoch: 1 Batch: 10086/38378 (26.28%) Loss: 2.004534 LR: 0.00004624 [04:03:26] Epoch: 1 Batch: 10087/38378 (26.28%) Loss: 1.959461 LR: 0.00004624 [04:03:27] Epoch: 1 Batch: 10088/38378 (26.29%) Loss: 2.266957 LR: 0.00004624 [04:03:29] Epoch: 1 Batch: 10089/38378 (26.29%) Loss: 2.208118 LR: 0.00004624 [04:03:31] Epoch: 1 Batch: 10090/38378 (26.29%) Loss: 2.450329 LR: 0.00004624 [04:03:33] Epoch: 1 Batch: 10091/38378 (26.29%) Loss: 2.258820 LR: 0.00004624 [04:03:35] Epoch: 1 Batch: 10092/38378 (26.30%) Loss: 2.020242 LR: 0.00004624 [04:03:37] Epoch: 1 Batch: 10093/38378 (26.30%) Loss: 1.807028 LR: 0.00004624 [04:03:38] Epoch: 1 Batch: 10094/38378 (26.30%) Loss: 1.983063 LR: 0.00004624 [04:03:40] Epoch: 1 Batch: 10095/38378 (26.30%) Loss: 1.922329 LR: 0.00004624 [04:03:42] Epoch: 1 Batch: 10096/38378 (26.31%) Loss: 2.174489 LR: 0.00004624 [04:03:44] Epoch: 1 Batch: 10097/38378 (26.31%) Loss: 1.952657 LR: 0.00004624 [04:03:46] Epoch: 1 Batch: 10098/38378 (26.31%) Loss: 2.042056 LR: 0.00004624 [04:03:47] Epoch: 1 Batch: 10099/38378 (26.31%) Loss: 2.084369 LR: 0.00004624 [04:03:49] Epoch: 1 Batch: 10100/38378 (26.32%) Loss: 2.110110 LR: 0.00004623 [04:03:51] Epoch: 1 Batch: 10101/38378 (26.32%) Loss: 2.354767 LR: 0.00004623 [04:03:53] Epoch: 1 Batch: 10102/38378 (26.32%) Loss: 2.037348 LR: 0.00004623 [04:03:55] Epoch: 1 Batch: 10103/38378 (26.32%) Loss: 2.035642 LR: 0.00004623 [04:03:56] Epoch: 1 Batch: 10104/38378 (26.33%) Loss: 2.240328 LR: 0.00004623 [04:03:58] Epoch: 1 Batch: 
10105/38378 (26.33%) Loss: 2.312918 LR: 0.00004623 [04:04:00] Epoch: 1 Batch: 10106/38378 (26.33%) Loss: 2.205818 LR: 0.00004623 [04:04:02] Epoch: 1 Batch: 10107/38378 (26.34%) Loss: 1.894610 LR: 0.00004622 [04:04:04] Epoch: 1 Batch: 10108/38378 (26.34%) Loss: 1.818871 LR: 0.00004622 [04:04:05] Epoch: 1 Batch: 10109/38378 (26.34%) Loss: 2.031973 LR: 0.00004622 [04:04:07] Epoch: 1 Batch: 10110/38378 (26.34%) Loss: 2.189125 LR: 0.00004622 [04:04:09] Epoch: 1 Batch: 10111/38378 (26.35%) Loss: 1.949501 LR: 0.00004622 [04:04:11] Epoch: 1 Batch: 10112/38378 (26.35%) Loss: 1.671927 LR: 0.00004622 [04:04:13] Epoch: 1 Batch: 10113/38378 (26.35%) Loss: 1.871904 LR: 0.00004622 [04:04:14] Epoch: 1 Batch: 10114/38378 (26.35%) Loss: 1.868408 LR: 0.00004621 [04:04:16] Epoch: 1 Batch: 10115/38378 (26.36%) Loss: 2.133748 LR: 0.00004621 [04:04:18] Epoch: 1 Batch: 10116/38378 (26.36%) Loss: 1.967559 LR: 0.00004621 [04:04:20] Epoch: 1 Batch: 10117/38378 (26.36%) Loss: 2.210897 LR: 0.00004621 [04:04:22] Epoch: 1 Batch: 10118/38378 (26.36%) Loss: 2.228588 LR: 0.00004621 [04:04:23] Epoch: 1 Batch: 10119/38378 (26.37%) Loss: 2.002855 LR: 0.00004621 [04:04:25] Epoch: 1 Batch: 10120/38378 (26.37%) Loss: 1.935039 LR: 0.00004621 [04:04:27] Epoch: 1 Batch: 10121/38378 (26.37%) Loss: 1.724454 LR: 0.00004621 [04:04:29] Epoch: 1 Batch: 10122/38378 (26.37%) Loss: 2.005616 LR: 0.00004621 [04:04:31] Epoch: 1 Batch: 10123/38378 (26.38%) Loss: 2.181191 LR: 0.00004621 [04:04:32] Epoch: 1 Batch: 10124/38378 (26.38%) Loss: 2.065157 LR: 0.00004621 [04:04:34] Epoch: 1 Batch: 10125/38378 (26.38%) Loss: 1.859669 LR: 0.00004621 [04:04:36] Epoch: 1 Batch: 10126/38378 (26.38%) Loss: 2.269369 LR: 0.00004621 [04:04:38] Epoch: 1 Batch: 10127/38378 (26.39%) Loss: 1.967286 LR: 0.00004621 [04:04:40] Epoch: 1 Batch: 10128/38378 (26.39%) Loss: 1.760852 LR: 0.00004620 [04:04:42] Epoch: 1 Batch: 10129/38378 (26.39%) Loss: 2.180921 LR: 0.00004620 [04:04:43] Epoch: 1 Batch: 10130/38378 (26.40%) Loss: 1.865152 LR: 0.00004620 [04:04:45] Epoch: 1 Batch: 10131/38378 (26.40%) Loss: 1.850074 LR: 0.00004620 [04:04:47] Epoch: 1 Batch: 10132/38378 (26.40%) Loss: 1.945346 LR: 0.00004620 [04:04:49] Epoch: 1 Batch: 10133/38378 (26.40%) Loss: 1.827431 LR: 0.00004620 [04:04:51] Epoch: 1 Batch: 10134/38378 (26.41%) Loss: 2.041720 LR: 0.00004620 [04:04:52] Epoch: 1 Batch: 10135/38378 (26.41%) Loss: 2.160608 LR: 0.00004619 [04:04:54] Epoch: 1 Batch: 10136/38378 (26.41%) Loss: 2.097301 LR: 0.00004619 [04:04:56] Epoch: 1 Batch: 10137/38378 (26.41%) Loss: 1.748609 LR: 0.00004619 [04:04:58] Epoch: 1 Batch: 10138/38378 (26.42%) Loss: 1.762600 LR: 0.00004619 [04:05:00] Epoch: 1 Batch: 10139/38378 (26.42%) Loss: 2.035471 LR: 0.00004619 [04:05:02] Epoch: 1 Batch: 10140/38378 (26.42%) Loss: 1.724686 LR: 0.00004619 [04:05:03] Epoch: 1 Batch: 10141/38378 (26.42%) Loss: 2.153426 LR: 0.00004619 [04:05:05] Epoch: 1 Batch: 10142/38378 (26.43%) Loss: 1.599282 LR: 0.00004618 [04:05:07] Epoch: 1 Batch: 10143/38378 (26.43%) Loss: 2.208833 LR: 0.00004618 [04:05:09] Epoch: 1 Batch: 10144/38378 (26.43%) Loss: 1.720362 LR: 0.00004618 [04:05:11] Epoch: 1 Batch: 10145/38378 (26.43%) Loss: 1.956959 LR: 0.00004618 [04:05:12] Epoch: 1 Batch: 10146/38378 (26.44%) Loss: 2.394100 LR: 0.00004618 [04:05:14] Epoch: 1 Batch: 10147/38378 (26.44%) Loss: 2.322309 LR: 0.00004618 [04:05:16] Epoch: 1 Batch: 10148/38378 (26.44%) Loss: 2.245885 LR: 0.00004618 [04:05:18] Epoch: 1 Batch: 10149/38378 (26.44%) Loss: 2.018982 LR: 0.00004618 [04:05:20] Epoch: 1 Batch: 10150/38378 (26.45%) Loss: 2.364718 LR: 
0.00004618 [04:05:21] Epoch: 1 Batch: 10151/38378 (26.45%) Loss: 1.723174 LR: 0.00004618 [04:05:23] Epoch: 1 Batch: 10152/38378 (26.45%) Loss: 2.108822 LR: 0.00004618 [04:05:25] Epoch: 1 Batch: 10153/38378 (26.46%) Loss: 1.945288 LR: 0.00004618 [04:05:27] Epoch: 1 Batch: 10154/38378 (26.46%) Loss: 2.062258 LR: 0.00004618 [04:05:29] Epoch: 1 Batch: 10155/38378 (26.46%) Loss: 2.154566 LR: 0.00004618 [04:05:30] Epoch: 1 Batch: 10156/38378 (26.46%) Loss: 2.230699 LR: 0.00004617 [04:05:32] Epoch: 1 Batch: 10157/38378 (26.47%) Loss: 2.171771 LR: 0.00004617 [04:05:34] Epoch: 1 Batch: 10158/38378 (26.47%) Loss: 1.753684 LR: 0.00004617 [04:05:36] Epoch: 1 Batch: 10159/38378 (26.47%) Loss: 1.759652 LR: 0.00004617 [04:05:38] Epoch: 1 Batch: 10160/38378 (26.47%) Loss: 2.102142 LR: 0.00004617 [04:05:40] Epoch: 1 Batch: 10161/38378 (26.48%) Loss: 1.938723 LR: 0.00004617 [04:05:41] Epoch: 1 Batch: 10162/38378 (26.48%) Loss: 2.015558 LR: 0.00004617 [04:05:43] Epoch: 1 Batch: 10163/38378 (26.48%) Loss: 1.904364 LR: 0.00004616 [04:05:45] Epoch: 1 Batch: 10164/38378 (26.48%) Loss: 1.903070 LR: 0.00004616 [04:05:47] Epoch: 1 Batch: 10165/38378 (26.49%) Loss: 1.862506 LR: 0.00004616 [04:05:49] Epoch: 1 Batch: 10166/38378 (26.49%) Loss: 2.166924 LR: 0.00004616 [04:05:50] Epoch: 1 Batch: 10167/38378 (26.49%) Loss: 1.785182 LR: 0.00004616 [04:05:52] Epoch: 1 Batch: 10168/38378 (26.49%) Loss: 1.908208 LR: 0.00004616 [04:05:54] Epoch: 1 Batch: 10169/38378 (26.50%) Loss: 1.780808 LR: 0.00004616 [04:05:56] Epoch: 1 Batch: 10170/38378 (26.50%) Loss: 1.914504 LR: 0.00004616 [04:05:58] Epoch: 1 Batch: 10171/38378 (26.50%) Loss: 1.931285 LR: 0.00004616 [04:06:00] Epoch: 1 Batch: 10172/38378 (26.50%) Loss: 1.997874 LR: 0.00004616 [04:06:01] Epoch: 1 Batch: 10173/38378 (26.51%) Loss: 1.941471 LR: 0.00004616 [04:06:03] Epoch: 1 Batch: 10174/38378 (26.51%) Loss: 1.816774 LR: 0.00004616 [04:06:05] Epoch: 1 Batch: 10175/38378 (26.51%) Loss: 1.990365 LR: 0.00004616 [04:06:07] Epoch: 1 Batch: 10176/38378 (26.52%) Loss: 2.222850 LR: 0.00004616 [04:06:09] Epoch: 1 Batch: 10177/38378 (26.52%) Loss: 2.031037 LR: 0.00004615 [04:06:11] Epoch: 1 Batch: 10178/38378 (26.52%) Loss: 2.017581 LR: 0.00004615 [04:06:12] Epoch: 1 Batch: 10179/38378 (26.52%) Loss: 2.190814 LR: 0.00004615 [04:06:14] Epoch: 1 Batch: 10180/38378 (26.53%) Loss: 2.028279 LR: 0.00004615 [04:06:16] Epoch: 1 Batch: 10181/38378 (26.53%) Loss: 1.846185 LR: 0.00004615 [04:06:18] Epoch: 1 Batch: 10182/38378 (26.53%) Loss: 1.909859 LR: 0.00004615 [04:06:20] Epoch: 1 Batch: 10183/38378 (26.53%) Loss: 1.845629 LR: 0.00004615 [04:06:22] Epoch: 1 Batch: 10184/38378 (26.54%) Loss: 1.963704 LR: 0.00004614 [04:06:23] Epoch: 1 Batch: 10185/38378 (26.54%) Loss: 1.995903 LR: 0.00004614 [04:06:25] Epoch: 1 Batch: 10186/38378 (26.54%) Loss: 2.474459 LR: 0.00004614 [04:06:27] Epoch: 1 Batch: 10187/38378 (26.54%) Loss: 2.154914 LR: 0.00004614 [04:06:29] Epoch: 1 Batch: 10188/38378 (26.55%) Loss: 2.167427 LR: 0.00004614 [04:06:31] Epoch: 1 Batch: 10189/38378 (26.55%) Loss: 1.871915 LR: 0.00004614 [04:06:32] Epoch: 1 Batch: 10190/38378 (26.55%) Loss: 1.971387 LR: 0.00004614 [04:06:34] Epoch: 1 Batch: 10191/38378 (26.55%) Loss: 2.119719 LR: 0.00004613 [04:06:36] Epoch: 1 Batch: 10192/38378 (26.56%) Loss: 2.157206 LR: 0.00004613 [04:06:38] Epoch: 1 Batch: 10193/38378 (26.56%) Loss: 1.775026 LR: 0.00004613 [04:06:40] Epoch: 1 Batch: 10194/38378 (26.56%) Loss: 2.115146 LR: 0.00004613 [04:06:41] Epoch: 1 Batch: 10195/38378 (26.56%) Loss: 2.223667 LR: 0.00004613 [04:06:43] Epoch: 1 Batch: 
10196/38378 (26.57%) Loss: 2.043443 LR: 0.00004613 [04:06:45] Epoch: 1 Batch: 10197/38378 (26.57%) Loss: 2.073015 LR: 0.00004613 [04:06:47] Epoch: 1 Batch: 10198/38378 (26.57%) Loss: 1.967287 LR: 0.00004613 [04:06:48] Epoch: 1 Batch: 10199/38378 (26.58%) Loss: 2.120140 LR: 0.00004613 [04:06:55] >> Cleaned up old temp checkpoint: epoch1_step8200 [04:06:55] >> Temp checkpoint saved: epoch1_step10200, size: 0.1702 GB [04:06:55] Epoch: 1 Batch: 10200/38378 (26.58%) Loss: 1.806471 LR: 0.00004613 [04:06:56] Epoch: 1 Batch: 10201/38378 (26.58%) Loss: 1.941758 LR: 0.00004613 [04:06:58] Epoch: 1 Batch: 10202/38378 (26.58%) Loss: 2.123550 LR: 0.00004613 [04:07:00] Epoch: 1 Batch: 10203/38378 (26.59%) Loss: 1.815551 LR: 0.00004613 [04:07:02] Epoch: 1 Batch: 10204/38378 (26.59%) Loss: 1.689228 LR: 0.00004613 [04:07:04] Epoch: 1 Batch: 10205/38378 (26.59%) Loss: 2.183515 LR: 0.00004612 [04:07:05] Epoch: 1 Batch: 10206/38378 (26.59%) Loss: 2.065373 LR: 0.00004612 [04:07:07] Epoch: 1 Batch: 10207/38378 (26.60%) Loss: 2.133849 LR: 0.00004612 [04:07:09] Epoch: 1 Batch: 10208/38378 (26.60%) Loss: 2.131194 LR: 0.00004612 [04:07:11] Epoch: 1 Batch: 10209/38378 (26.60%) Loss: 1.884495 LR: 0.00004612 [04:07:13] Epoch: 1 Batch: 10210/38378 (26.60%) Loss: 1.979694 LR: 0.00004612 [04:07:15] Epoch: 1 Batch: 10211/38378 (26.61%) Loss: 1.989306 LR: 0.00004612 [04:07:16] Epoch: 1 Batch: 10212/38378 (26.61%) Loss: 1.993174 LR: 0.00004611 [04:07:18] Epoch: 1 Batch: 10213/38378 (26.61%) Loss: 2.238783 LR: 0.00004611 [04:07:20] Epoch: 1 Batch: 10214/38378 (26.61%) Loss: 2.004342 LR: 0.00004611 [04:07:22] Epoch: 1 Batch: 10215/38378 (26.62%) Loss: 2.130722 LR: 0.00004611 [04:07:24] Epoch: 1 Batch: 10216/38378 (26.62%) Loss: 1.764895 LR: 0.00004611 [04:07:26] Epoch: 1 Batch: 10217/38378 (26.62%) Loss: 1.851352 LR: 0.00004611 [04:07:27] Epoch: 1 Batch: 10218/38378 (26.62%) Loss: 1.819882 LR: 0.00004611 [04:07:29] Epoch: 1 Batch: 10219/38378 (26.63%) Loss: 1.899062 LR: 0.00004610 [04:07:31] Epoch: 1 Batch: 10220/38378 (26.63%) Loss: 1.968014 LR: 0.00004610 [04:07:33] Epoch: 1 Batch: 10221/38378 (26.63%) Loss: 2.108230 LR: 0.00004610 [04:07:35] Epoch: 1 Batch: 10222/38378 (26.64%) Loss: 1.800966 LR: 0.00004610 [04:07:37] Epoch: 1 Batch: 10223/38378 (26.64%) Loss: 1.995010 LR: 0.00004610 [04:07:39] Epoch: 1 Batch: 10224/38378 (26.64%) Loss: 2.037763 LR: 0.00004610 [04:07:40] Epoch: 1 Batch: 10225/38378 (26.64%) Loss: 1.805076 LR: 0.00004610 [04:07:42] Epoch: 1 Batch: 10226/38378 (26.65%) Loss: 1.917245 LR: 0.00004610 [04:07:44] Epoch: 1 Batch: 10227/38378 (26.65%) Loss: 2.064476 LR: 0.00004610 [04:07:46] Epoch: 1 Batch: 10228/38378 (26.65%) Loss: 1.992139 LR: 0.00004610 [04:07:48] Epoch: 1 Batch: 10229/38378 (26.65%) Loss: 1.860983 LR: 0.00004610 [04:07:50] Epoch: 1 Batch: 10230/38378 (26.66%) Loss: 2.157642 LR: 0.00004610 [04:07:51] Epoch: 1 Batch: 10231/38378 (26.66%) Loss: 1.759573 LR: 0.00004610 [04:07:53] Epoch: 1 Batch: 10232/38378 (26.66%) Loss: 2.322516 LR: 0.00004610 [04:07:55] Epoch: 1 Batch: 10233/38378 (26.66%) Loss: 1.817274 LR: 0.00004609 [04:07:57] Epoch: 1 Batch: 10234/38378 (26.67%) Loss: 2.169218 LR: 0.00004609 [04:07:59] Epoch: 1 Batch: 10235/38378 (26.67%) Loss: 2.239540 LR: 0.00004609 [04:08:01] Epoch: 1 Batch: 10236/38378 (26.67%) Loss: 2.285468 LR: 0.00004609 [04:08:02] Epoch: 1 Batch: 10237/38378 (26.67%) Loss: 1.716680 LR: 0.00004609 [04:08:04] Epoch: 1 Batch: 10238/38378 (26.68%) Loss: 2.092363 LR: 0.00004609 [04:08:06] Epoch: 1 Batch: 10239/38378 (26.68%) Loss: 2.144930 LR: 0.00004609 [04:08:08] 
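Note: the ">> Cleaned up old temp checkpoint" / ">> Temp checkpoint saved" pair above (epoch1_step8200 removed as epoch1_step10200 is written) is a rolling temp-checkpoint scheme: at a fixed batch interval the current LoRA adapter is saved under a new epoch{E}_step{S} directory and the oldest temp directory is deleted; the saved-vs-cleaned step numbers suggest roughly ten temp checkpoints are retained at any time. A minimal sketch of that rotation, assuming a PEFT-wrapped model; save_temp_checkpoint, temp_root, and keep_last are hypothetical names, and the real script evidently also persists optimizer, scheduler, and shuffle state, which the 04:24:33 resume later in this log restores:

import os
import shutil

def save_temp_checkpoint(model, temp_root, epoch, step, keep_last=10):
    # For a PeftModel, save_pretrained writes only the LoRA adapter,
    # which is why each checkpoint is ~0.17 GB and not the 8B base model.
    path = os.path.join(temp_root, f"epoch{epoch}_step{step}")
    model.save_pretrained(path)
    # Delete the oldest temp checkpoints beyond the retention window.
    temps = sorted(
        (d for d in os.listdir(temp_root) if d.startswith("epoch")),
        key=lambda d: int(d.split("step")[-1]),  # sort by step number
    )
    for old in temps[:-keep_last]:
        shutil.rmtree(os.path.join(temp_root, old))
        print(f">> Cleaned up old temp checkpoint: {old}")
    size_gb = sum(os.path.getsize(os.path.join(path, f))
                  for f in os.listdir(path)) / 1024**3
    print(f">> Temp checkpoint saved: epoch{epoch}_step{step}, size: {size_gb:.4f} GB")
    return path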
Epoch: 1 Batch: 10240/38378 (26.68%) Loss: 1.944552 LR: 0.00004608 [04:08:10] Epoch: 1 Batch: 10241/38378 (26.68%) Loss: 2.104501 LR: 0.00004608 [04:08:11] Epoch: 1 Batch: 10242/38378 (26.69%) Loss: 2.053567 LR: 0.00004608 [04:08:13] Epoch: 1 Batch: 10243/38378 (26.69%) Loss: 1.900120 LR: 0.00004608 [04:08:15] Epoch: 1 Batch: 10244/38378 (26.69%) Loss: 2.035473 LR: 0.00004608 [04:08:17] Epoch: 1 Batch: 10245/38378 (26.69%) Loss: 1.891128 LR: 0.00004608 [04:08:19] Epoch: 1 Batch: 10246/38378 (26.70%) Loss: 1.959550 LR: 0.00004608 [04:08:20] Epoch: 1 Batch: 10247/38378 (26.70%) Loss: 1.859036 LR: 0.00004607 [04:08:22] Epoch: 1 Batch: 10248/38378 (26.70%) Loss: 2.177996 LR: 0.00004607 [04:08:24] Epoch: 1 Batch: 10249/38378 (26.71%) Loss: 2.186305 LR: 0.00004607 [04:08:26] Epoch: 1 Batch: 10250/38378 (26.71%) Loss: 1.963618 LR: 0.00004607 [04:08:27] Epoch: 1 Batch: 10251/38378 (26.71%) Loss: 2.208262 LR: 0.00004607 [04:08:29] Epoch: 1 Batch: 10252/38378 (26.71%) Loss: 2.052716 LR: 0.00004607 [04:08:31] Epoch: 1 Batch: 10253/38378 (26.72%) Loss: 2.162185 LR: 0.00004607 [04:08:33] Epoch: 1 Batch: 10254/38378 (26.72%) Loss: 1.839076 LR: 0.00004607 [04:08:35] Epoch: 1 Batch: 10255/38378 (26.72%) Loss: 2.131784 LR: 0.00004607 [04:08:36] Epoch: 1 Batch: 10256/38378 (26.72%) Loss: 1.903915 LR: 0.00004607 [04:08:38] Epoch: 1 Batch: 10257/38378 (26.73%) Loss: 2.012946 LR: 0.00004607 [04:08:40] Epoch: 1 Batch: 10258/38378 (26.73%) Loss: 2.060828 LR: 0.00004607 [04:08:42] Epoch: 1 Batch: 10259/38378 (26.73%) Loss: 2.080993 LR: 0.00004607 [04:08:44] Epoch: 1 Batch: 10260/38378 (26.73%) Loss: 1.793428 LR: 0.00004607 [04:08:46] Epoch: 1 Batch: 10261/38378 (26.74%) Loss: 2.054028 LR: 0.00004606 [04:08:47] Epoch: 1 Batch: 10262/38378 (26.74%) Loss: 2.117969 LR: 0.00004606 [04:08:49] Epoch: 1 Batch: 10263/38378 (26.74%) Loss: 2.032524 LR: 0.00004606 [04:08:51] Epoch: 1 Batch: 10264/38378 (26.74%) Loss: 2.167358 LR: 0.00004606 [04:08:53] Epoch: 1 Batch: 10265/38378 (26.75%) Loss: 2.107014 LR: 0.00004606 [04:08:55] Epoch: 1 Batch: 10266/38378 (26.75%) Loss: 1.991382 LR: 0.00004606 [04:08:56] Epoch: 1 Batch: 10267/38378 (26.75%) Loss: 1.501072 LR: 0.00004606 [04:08:58] Epoch: 1 Batch: 10268/38378 (26.75%) Loss: 2.235712 LR: 0.00004605 [04:09:00] Epoch: 1 Batch: 10269/38378 (26.76%) Loss: 2.034016 LR: 0.00004605 [04:09:02] Epoch: 1 Batch: 10270/38378 (26.76%) Loss: 2.059857 LR: 0.00004605 [04:09:04] Epoch: 1 Batch: 10271/38378 (26.76%) Loss: 2.029543 LR: 0.00004605 [04:09:06] Epoch: 1 Batch: 10272/38378 (26.77%) Loss: 2.276239 LR: 0.00004605 [04:09:07] Epoch: 1 Batch: 10273/38378 (26.77%) Loss: 2.023857 LR: 0.00004605 [04:09:09] Epoch: 1 Batch: 10274/38378 (26.77%) Loss: 2.072550 LR: 0.00004605 [04:09:11] Epoch: 1 Batch: 10275/38378 (26.77%) Loss: 1.967254 LR: 0.00004604 [04:09:13] Epoch: 1 Batch: 10276/38378 (26.78%) Loss: 2.255588 LR: 0.00004604 [04:09:15] Epoch: 1 Batch: 10277/38378 (26.78%) Loss: 2.091327 LR: 0.00004604 [04:09:16] Epoch: 1 Batch: 10278/38378 (26.78%) Loss: 1.805558 LR: 0.00004604 [04:09:18] Epoch: 1 Batch: 10279/38378 (26.78%) Loss: 1.908792 LR: 0.00004604 [04:09:20] Epoch: 1 Batch: 10280/38378 (26.79%) Loss: 1.868741 LR: 0.00004604 [04:09:22] Epoch: 1 Batch: 10281/38378 (26.79%) Loss: 1.960735 LR: 0.00004604 [04:09:24] Epoch: 1 Batch: 10282/38378 (26.79%) Loss: 1.706095 LR: 0.00004604 [04:09:26] Epoch: 1 Batch: 10283/38378 (26.79%) Loss: 1.936148 LR: 0.00004604 [04:09:27] Epoch: 1 Batch: 10284/38378 (26.80%) Loss: 2.076582 LR: 0.00004604 [04:09:29] Epoch: 1 Batch: 10285/38378 (26.80%) Loss: 
1.990391 LR: 0.00004604 [04:09:31] Epoch: 1 Batch: 10286/38378 (26.80%) Loss: 1.856547 LR: 0.00004604 [04:09:33] Epoch: 1 Batch: 10287/38378 (26.80%) Loss: 2.083800 LR: 0.00004604 [04:09:35] Epoch: 1 Batch: 10288/38378 (26.81%) Loss: 1.705945 LR: 0.00004604 [04:09:37] Epoch: 1 Batch: 10289/38378 (26.81%) Loss: 1.953500 LR: 0.00004603 [04:09:38] Epoch: 1 Batch: 10290/38378 (26.81%) Loss: 2.313994 LR: 0.00004603 [04:09:40] Epoch: 1 Batch: 10291/38378 (26.81%) Loss: 2.192066 LR: 0.00004603 [04:09:42] Epoch: 1 Batch: 10292/38378 (26.82%) Loss: 2.091915 LR: 0.00004603 [04:09:44] Epoch: 1 Batch: 10293/38378 (26.82%) Loss: 1.790621 LR: 0.00004603 [04:09:46] Epoch: 1 Batch: 10294/38378 (26.82%) Loss: 1.755899 LR: 0.00004603 [04:09:47] Epoch: 1 Batch: 10295/38378 (26.83%) Loss: 1.676206 LR: 0.00004603 [04:09:49] Epoch: 1 Batch: 10296/38378 (26.83%) Loss: 1.944771 LR: 0.00004602 [04:09:51] Epoch: 1 Batch: 10297/38378 (26.83%) Loss: 1.873606 LR: 0.00004602 [04:09:53] Epoch: 1 Batch: 10298/38378 (26.83%) Loss: 1.930361 LR: 0.00004602 [04:09:55] Epoch: 1 Batch: 10299/38378 (26.84%) Loss: 2.201552 LR: 0.00004602 [04:09:56] Epoch: 1 Batch: 10300/38378 (26.84%) Loss: 1.916859 LR: 0.00004602 [04:09:58] Epoch: 1 Batch: 10301/38378 (26.84%) Loss: 2.020654 LR: 0.00004602 [04:10:00] Epoch: 1 Batch: 10302/38378 (26.84%) Loss: 1.920342 LR: 0.00004602 [04:10:02] Epoch: 1 Batch: 10303/38378 (26.85%) Loss: 2.283105 LR: 0.00004601 [04:10:04] Epoch: 1 Batch: 10304/38378 (26.85%) Loss: 2.044371 LR: 0.00004601 [04:10:05] Epoch: 1 Batch: 10305/38378 (26.85%) Loss: 2.009887 LR: 0.00004601 [04:10:07] Epoch: 1 Batch: 10306/38378 (26.85%) Loss: 2.009286 LR: 0.00004601 [04:10:09] Epoch: 1 Batch: 10307/38378 (26.86%) Loss: 1.943626 LR: 0.00004601 [04:10:11] Epoch: 1 Batch: 10308/38378 (26.86%) Loss: 1.729240 LR: 0.00004601 [04:10:13] Epoch: 1 Batch: 10309/38378 (26.86%) Loss: 2.001116 LR: 0.00004601 [04:10:14] Epoch: 1 Batch: 10310/38378 (26.86%) Loss: 1.850164 LR: 0.00004601 [04:10:16] Epoch: 1 Batch: 10311/38378 (26.87%) Loss: 1.748134 LR: 0.00004601 [04:10:18] Epoch: 1 Batch: 10312/38378 (26.87%) Loss: 2.294315 LR: 0.00004601 [04:10:20] Epoch: 1 Batch: 10313/38378 (26.87%) Loss: 1.943611 LR: 0.00004601 [04:10:22] Epoch: 1 Batch: 10314/38378 (26.87%) Loss: 2.005002 LR: 0.00004601 [04:10:24] Epoch: 1 Batch: 10315/38378 (26.88%) Loss: 1.767763 LR: 0.00004601 [04:10:25] Epoch: 1 Batch: 10316/38378 (26.88%) Loss: 1.891569 LR: 0.00004601 [04:10:27] Epoch: 1 Batch: 10317/38378 (26.88%) Loss: 2.017306 LR: 0.00004600 [04:10:29] Epoch: 1 Batch: 10318/38378 (26.89%) Loss: 2.013476 LR: 0.00004600 [04:10:31] Epoch: 1 Batch: 10319/38378 (26.89%) Loss: 1.896694 LR: 0.00004600 [04:10:33] Epoch: 1 Batch: 10320/38378 (26.89%) Loss: 2.068624 LR: 0.00004600 [04:10:34] Epoch: 1 Batch: 10321/38378 (26.89%) Loss: 2.089064 LR: 0.00004600 [04:10:36] Epoch: 1 Batch: 10322/38378 (26.90%) Loss: 1.884260 LR: 0.00004600 [04:10:38] Epoch: 1 Batch: 10323/38378 (26.90%) Loss: 1.994137 LR: 0.00004600 [04:10:40] Epoch: 1 Batch: 10324/38378 (26.90%) Loss: 2.067861 LR: 0.00004599 [04:10:42] Epoch: 1 Batch: 10325/38378 (26.90%) Loss: 1.600635 LR: 0.00004599 [04:10:44] Epoch: 1 Batch: 10326/38378 (26.91%) Loss: 1.752390 LR: 0.00004599 [04:10:45] Epoch: 1 Batch: 10327/38378 (26.91%) Loss: 1.874911 LR: 0.00004599 [04:10:47] Epoch: 1 Batch: 10328/38378 (26.91%) Loss: 2.171140 LR: 0.00004599 [04:10:49] Epoch: 1 Batch: 10329/38378 (26.91%) Loss: 1.918843 LR: 0.00004599 [04:10:51] Epoch: 1 Batch: 10330/38378 (26.92%) Loss: 2.107302 LR: 0.00004599 [04:10:53] Epoch: 1 
Batch: 10331/38378 (26.92%) Loss: 1.812537 LR: 0.00004598 [04:10:55] Epoch: 1 Batch: 10332/38378 (26.92%) Loss: 2.225427 LR: 0.00004598 [04:10:56] Epoch: 1 Batch: 10333/38378 (26.92%) Loss: 1.831983 LR: 0.00004598 [04:10:58] Epoch: 1 Batch: 10334/38378 (26.93%) Loss: 1.998510 LR: 0.00004598 [04:11:00] Epoch: 1 Batch: 10335/38378 (26.93%) Loss: 1.974996 LR: 0.00004598 [04:11:02] Epoch: 1 Batch: 10336/38378 (26.93%) Loss: 1.884161 LR: 0.00004598 [04:11:04] Epoch: 1 Batch: 10337/38378 (26.93%) Loss: 2.039936 LR: 0.00004598 [04:11:05] Epoch: 1 Batch: 10338/38378 (26.94%) Loss: 2.177691 LR: 0.00004598 [04:11:07] Epoch: 1 Batch: 10339/38378 (26.94%) Loss: 2.013855 LR: 0.00004598 [04:11:09] Epoch: 1 Batch: 10340/38378 (26.94%) Loss: 2.051358 LR: 0.00004598 [04:11:11] Epoch: 1 Batch: 10341/38378 (26.95%) Loss: 1.958020 LR: 0.00004598 [04:11:13] Epoch: 1 Batch: 10342/38378 (26.95%) Loss: 1.634267 LR: 0.00004598 [04:11:15] Epoch: 1 Batch: 10343/38378 (26.95%) Loss: 2.088063 LR: 0.00004598 [04:11:16] Epoch: 1 Batch: 10344/38378 (26.95%) Loss: 1.897043 LR: 0.00004598 [04:11:18] Epoch: 1 Batch: 10345/38378 (26.96%) Loss: 1.996751 LR: 0.00004597 [04:11:20] Epoch: 1 Batch: 10346/38378 (26.96%) Loss: 2.085182 LR: 0.00004597 [04:11:22] Epoch: 1 Batch: 10347/38378 (26.96%) Loss: 2.098017 LR: 0.00004597 [04:11:24] Epoch: 1 Batch: 10348/38378 (26.96%) Loss: 1.919711 LR: 0.00004597 [04:11:25] Epoch: 1 Batch: 10349/38378 (26.97%) Loss: 2.091734 LR: 0.00004597 [04:11:27] Epoch: 1 Batch: 10350/38378 (26.97%) Loss: 1.982766 LR: 0.00004597 [04:11:29] Epoch: 1 Batch: 10351/38378 (26.97%) Loss: 1.974493 LR: 0.00004597 [04:11:31] Epoch: 1 Batch: 10352/38378 (26.97%) Loss: 1.955423 LR: 0.00004596 [04:11:33] Epoch: 1 Batch: 10353/38378 (26.98%) Loss: 1.848359 LR: 0.00004596 [04:11:34] Epoch: 1 Batch: 10354/38378 (26.98%) Loss: 2.064606 LR: 0.00004596 [04:11:36] Epoch: 1 Batch: 10355/38378 (26.98%) Loss: 2.061454 LR: 0.00004596 [04:11:38] Epoch: 1 Batch: 10356/38378 (26.98%) Loss: 1.947181 LR: 0.00004596 [04:11:40] Epoch: 1 Batch: 10357/38378 (26.99%) Loss: 1.778235 LR: 0.00004596 [04:11:42] Epoch: 1 Batch: 10358/38378 (26.99%) Loss: 2.042944 LR: 0.00004596 [04:11:44] Epoch: 1 Batch: 10359/38378 (26.99%) Loss: 2.045018 LR: 0.00004595 [04:11:45] Epoch: 1 Batch: 10360/38378 (26.99%) Loss: 1.893408 LR: 0.00004595 [04:11:47] Epoch: 1 Batch: 10361/38378 (27.00%) Loss: 1.932929 LR: 0.00004595 [04:11:49] Epoch: 1 Batch: 10362/38378 (27.00%) Loss: 1.980645 LR: 0.00004595 [04:11:51] Epoch: 1 Batch: 10363/38378 (27.00%) Loss: 2.168235 LR: 0.00004595 [04:11:53] Epoch: 1 Batch: 10364/38378 (27.01%) Loss: 2.255298 LR: 0.00004595 [04:11:54] Epoch: 1 Batch: 10365/38378 (27.01%) Loss: 1.972163 LR: 0.00004595 [04:11:56] Epoch: 1 Batch: 10366/38378 (27.01%) Loss: 2.084646 LR: 0.00004595 [04:11:58] Epoch: 1 Batch: 10367/38378 (27.01%) Loss: 2.046229 LR: 0.00004595 [04:12:00] Epoch: 1 Batch: 10368/38378 (27.02%) Loss: 1.862239 LR: 0.00004595 [04:12:02] Epoch: 1 Batch: 10369/38378 (27.02%) Loss: 2.124658 LR: 0.00004595 [04:12:04] Epoch: 1 Batch: 10370/38378 (27.02%) Loss: 1.884963 LR: 0.00004595 [04:12:05] Epoch: 1 Batch: 10371/38378 (27.02%) Loss: 1.914478 LR: 0.00004595 [04:12:07] Epoch: 1 Batch: 10372/38378 (27.03%) Loss: 2.314108 LR: 0.00004595 [04:12:09] Epoch: 1 Batch: 10373/38378 (27.03%) Loss: 1.713359 LR: 0.00004594 [04:12:11] Epoch: 1 Batch: 10374/38378 (27.03%) Loss: 2.003057 LR: 0.00004594 [04:12:13] Epoch: 1 Batch: 10375/38378 (27.03%) Loss: 2.317890 LR: 0.00004594 [04:12:15] Epoch: 1 Batch: 10376/38378 (27.04%) Loss: 1.703373 
LR: 0.00004594 [04:12:16] Epoch: 1 Batch: 10377/38378 (27.04%) Loss: 2.030322 LR: 0.00004594 [04:12:18] Epoch: 1 Batch: 10378/38378 (27.04%) Loss: 2.178579 LR: 0.00004594 [04:12:20] Epoch: 1 Batch: 10379/38378 (27.04%) Loss: 1.990036 LR: 0.00004594 [04:12:22] Epoch: 1 Batch: 10380/38378 (27.05%) Loss: 2.018633 LR: 0.00004593 [04:12:24] Epoch: 1 Batch: 10381/38378 (27.05%) Loss: 1.874318 LR: 0.00004593 [04:12:25] Epoch: 1 Batch: 10382/38378 (27.05%) Loss: 2.086491 LR: 0.00004593 [04:12:27] Epoch: 1 Batch: 10383/38378 (27.05%) Loss: 1.993498 LR: 0.00004593 [04:12:29] Epoch: 1 Batch: 10384/38378 (27.06%) Loss: 1.985831 LR: 0.00004593 [04:12:31] Epoch: 1 Batch: 10385/38378 (27.06%) Loss: 1.898395 LR: 0.00004593 [04:12:33] Epoch: 1 Batch: 10386/38378 (27.06%) Loss: 2.247682 LR: 0.00004593 [04:12:34] Epoch: 1 Batch: 10387/38378 (27.06%) Loss: 2.100321 LR: 0.00004592 [04:12:36] Epoch: 1 Batch: 10388/38378 (27.07%) Loss: 2.275005 LR: 0.00004592 [04:12:38] Epoch: 1 Batch: 10389/38378 (27.07%) Loss: 2.332880 LR: 0.00004592 [04:12:40] Epoch: 1 Batch: 10390/38378 (27.07%) Loss: 1.818262 LR: 0.00004592 [04:12:42] Epoch: 1 Batch: 10391/38378 (27.08%) Loss: 2.047369 LR: 0.00004592 [04:12:44] Epoch: 1 Batch: 10392/38378 (27.08%) Loss: 2.351765 LR: 0.00004592 [04:12:45] Epoch: 1 Batch: 10393/38378 (27.08%) Loss: 2.359341 LR: 0.00004592 [04:12:47] Epoch: 1 Batch: 10394/38378 (27.08%) Loss: 2.205192 LR: 0.00004592 [04:12:49] Epoch: 1 Batch: 10395/38378 (27.09%) Loss: 2.018120 LR: 0.00004592 [04:12:51] Epoch: 1 Batch: 10396/38378 (27.09%) Loss: 1.955380 LR: 0.00004592 [04:12:53] Epoch: 1 Batch: 10397/38378 (27.09%) Loss: 1.850649 LR: 0.00004592 [04:12:54] Epoch: 1 Batch: 10398/38378 (27.09%) Loss: 2.048072 LR: 0.00004592 [04:12:56] Epoch: 1 Batch: 10399/38378 (27.10%) Loss: 1.944117 LR: 0.00004592 [04:13:02] >> Cleaned up old temp checkpoint: epoch1_step8400 [04:13:02] >> Temp checkpoint saved: epoch1_step10400, size: 0.1702 GB [04:13:02] Epoch: 1 Batch: 10400/38378 (27.10%) Loss: 1.924390 LR: 0.00004592 [04:13:04] Epoch: 1 Batch: 10401/38378 (27.10%) Loss: 1.877722 LR: 0.00004591 [04:13:06] Epoch: 1 Batch: 10402/38378 (27.10%) Loss: 1.815459 LR: 0.00004591 [04:13:08] Epoch: 1 Batch: 10403/38378 (27.11%) Loss: 2.214222 LR: 0.00004591 [04:13:10] Epoch: 1 Batch: 10404/38378 (27.11%) Loss: 2.167792 LR: 0.00004591 [04:13:11] Epoch: 1 Batch: 10405/38378 (27.11%) Loss: 1.513661 LR: 0.00004591 [04:13:13] Epoch: 1 Batch: 10406/38378 (27.11%) Loss: 1.969011 LR: 0.00004591 [04:13:15] Epoch: 1 Batch: 10407/38378 (27.12%) Loss: 2.061868 LR: 0.00004591 [04:13:17] Epoch: 1 Batch: 10408/38378 (27.12%) Loss: 1.861908 LR: 0.00004590 [04:13:19] Epoch: 1 Batch: 10409/38378 (27.12%) Loss: 1.968517 LR: 0.00004590 [04:13:20] Epoch: 1 Batch: 10410/38378 (27.12%) Loss: 2.113432 LR: 0.00004590 [04:13:22] Epoch: 1 Batch: 10411/38378 (27.13%) Loss: 1.999525 LR: 0.00004590 [04:13:24] Epoch: 1 Batch: 10412/38378 (27.13%) Loss: 1.803482 LR: 0.00004590 [04:13:26] Epoch: 1 Batch: 10413/38378 (27.13%) Loss: 2.210159 LR: 0.00004590 [04:13:28] Epoch: 1 Batch: 10414/38378 (27.14%) Loss: 2.036120 LR: 0.00004590 [04:13:30] Epoch: 1 Batch: 10415/38378 (27.14%) Loss: 2.195702 LR: 0.00004589 [04:13:31] Epoch: 1 Batch: 10416/38378 (27.14%) Loss: 2.133349 LR: 0.00004589 [04:13:33] Epoch: 1 Batch: 10417/38378 (27.14%) Loss: 1.884917 LR: 0.00004589 [04:13:35] Epoch: 1 Batch: 10418/38378 (27.15%) Loss: 1.922318 LR: 0.00004589 [04:13:37] Epoch: 1 Batch: 10419/38378 (27.15%) Loss: 1.984806 LR: 0.00004589 [04:13:39] Epoch: 1 Batch: 10420/38378 (27.15%) 
Loss: 1.905290 LR: 0.00004589 [04:13:40] Epoch: 1 Batch: 10421/38378 (27.15%) Loss: 1.956610 LR: 0.00004589 [04:13:42] Epoch: 1 Batch: 10422/38378 (27.16%) Loss: 2.138107 LR: 0.00004589 [04:13:44] Epoch: 1 Batch: 10423/38378 (27.16%) Loss: 1.720280 LR: 0.00004589 [04:13:46] Epoch: 1 Batch: 10424/38378 (27.16%) Loss: 1.985312 LR: 0.00004589 [04:13:48] Epoch: 1 Batch: 10425/38378 (27.16%) Loss: 2.020316 LR: 0.00004589 [04:13:50] Epoch: 1 Batch: 10426/38378 (27.17%) Loss: 1.879706 LR: 0.00004589 [04:13:51] Epoch: 1 Batch: 10427/38378 (27.17%) Loss: 2.071302 LR: 0.00004589 [04:13:53] Epoch: 1 Batch: 10428/38378 (27.17%) Loss: 1.882604 LR: 0.00004589 [04:13:55] Epoch: 1 Batch: 10429/38378 (27.17%) Loss: 1.767954 LR: 0.00004588 [04:13:57] Epoch: 1 Batch: 10430/38378 (27.18%) Loss: 1.776591 LR: 0.00004588 [04:13:59] Epoch: 1 Batch: 10431/38378 (27.18%) Loss: 1.968201 LR: 0.00004588 [04:14:01] Epoch: 1 Batch: 10432/38378 (27.18%) Loss: 1.948715 LR: 0.00004588 [04:14:02] Epoch: 1 Batch: 10433/38378 (27.18%) Loss: 1.775470 LR: 0.00004588 [04:14:04] Epoch: 1 Batch: 10434/38378 (27.19%) Loss: 1.906985 LR: 0.00004588 [04:14:06] Epoch: 1 Batch: 10435/38378 (27.19%) Loss: 1.893189 LR: 0.00004588 [04:14:08] Epoch: 1 Batch: 10436/38378 (27.19%) Loss: 1.986797 LR: 0.00004587 [04:14:10] Epoch: 1 Batch: 10437/38378 (27.20%) Loss: 2.134367 LR: 0.00004587 [04:14:11] Epoch: 1 Batch: 10438/38378 (27.20%) Loss: 2.196340 LR: 0.00004587 [04:14:13] Epoch: 1 Batch: 10439/38378 (27.20%) Loss: 2.134259 LR: 0.00004587 [04:14:15] Epoch: 1 Batch: 10440/38378 (27.20%) Loss: 2.150287 LR: 0.00004587 [04:14:17] Epoch: 1 Batch: 10441/38378 (27.21%) Loss: 1.979233 LR: 0.00004587 [04:14:18] Epoch: 1 Batch: 10442/38378 (27.21%) Loss: 2.043277 LR: 0.00004587 [04:14:20] Epoch: 1 Batch: 10443/38378 (27.21%) Loss: 2.144840 LR: 0.00004586 [04:14:22] Epoch: 1 Batch: 10444/38378 (27.21%) Loss: 1.730943 LR: 0.00004586 [04:14:24] Epoch: 1 Batch: 10445/38378 (27.22%) Loss: 1.818369 LR: 0.00004586 [04:14:26] Epoch: 1 Batch: 10446/38378 (27.22%) Loss: 2.134776 LR: 0.00004586 [04:14:27] Epoch: 1 Batch: 10447/38378 (27.22%) Loss: 2.120450 LR: 0.00004586 [04:14:29] Epoch: 1 Batch: 10448/38378 (27.22%) Loss: 2.186803 LR: 0.00004586 [04:14:31] Epoch: 1 Batch: 10449/38378 (27.23%) Loss: 2.202352 LR: 0.00004586 [04:14:33] Epoch: 1 Batch: 10450/38378 (27.23%) Loss: 2.312178 LR: 0.00004586 [04:14:35] Epoch: 1 Batch: 10451/38378 (27.23%) Loss: 2.093230 LR: 0.00004586 [04:14:36] Epoch: 1 Batch: 10452/38378 (27.23%) Loss: 1.877905 LR: 0.00004586 [04:14:38] Epoch: 1 Batch: 10453/38378 (27.24%) Loss: 1.992820 LR: 0.00004586 [04:14:40] Epoch: 1 Batch: 10454/38378 (27.24%) Loss: 2.079411 LR: 0.00004586 [04:14:42] Epoch: 1 Batch: 10455/38378 (27.24%) Loss: 1.864340 LR: 0.00004586 [04:14:44] Epoch: 1 Batch: 10456/38378 (27.24%) Loss: 2.058783 LR: 0.00004586 [04:14:46] Epoch: 1 Batch: 10457/38378 (27.25%) Loss: 2.127670 LR: 0.00004585 [04:14:47] Epoch: 1 Batch: 10458/38378 (27.25%) Loss: 1.934426 LR: 0.00004585 [04:14:49] Epoch: 1 Batch: 10459/38378 (27.25%) Loss: 2.116485 LR: 0.00004585 [04:14:51] Epoch: 1 Batch: 10460/38378 (27.26%) Loss: 2.030585 LR: 0.00004585 [04:14:53] Epoch: 1 Batch: 10461/38378 (27.26%) Loss: 1.830844 LR: 0.00004585 [04:14:55] Epoch: 1 Batch: 10462/38378 (27.26%) Loss: 1.757207 LR: 0.00004585 [04:14:56] Epoch: 1 Batch: 10463/38378 (27.26%) Loss: 1.875613 LR: 0.00004585 [04:14:58] Epoch: 1 Batch: 10464/38378 (27.27%) Loss: 2.168111 LR: 0.00004584 [04:15:00] Epoch: 1 Batch: 10465/38378 (27.27%) Loss: 2.078919 LR: 0.00004584 [04:15:02] 
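Note: every progress entry in this log follows one fixed shape: wall-clock timestamp, batch index out of 38378 (191887 training samples at batch_size 5 gives 38378 batches), percent complete to two decimals, the per-batch training loss, and the learning rate printed to eight decimal places. A sketch of that formatting (log_batch is a hypothetical name; scheduler is any torch.optim LR scheduler):

from datetime import datetime

def log_batch(epoch, batch_idx, total_batches, loss, scheduler):
    ts = datetime.now().strftime("%H:%M:%S")
    lr = scheduler.get_last_lr()[0]  # LR applied at the most recent optimizer step
    print(f"[{ts}] Epoch: {epoch} Batch: {batch_idx}/{total_batches} "
          f"({100 * batch_idx / total_batches:.2f}%) "
          f"Loss: {loss:.6f} LR: {lr:.8f}")

# log_batch(1, 10060, 38378, 2.154859, scheduler) would print, e.g.:
# [04:02:37] Epoch: 1 Batch: 10060/38378 (26.21%) Loss: 2.154859 LR: 0.00004627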
Epoch: 1 Batch: 10466/38378 (27.27%) Loss: 1.844605 LR: 0.00004584 [04:15:04] Epoch: 1 Batch: 10467/38378 (27.27%) Loss: 2.043216 LR: 0.00004584 [04:15:06] Epoch: 1 Batch: 10468/38378 (27.28%) Loss: 2.071216 LR: 0.00004584 [04:15:07] Epoch: 1 Batch: 10469/38378 (27.28%) Loss: 2.014406 LR: 0.00004584 [04:15:09] Epoch: 1 Batch: 10470/38378 (27.28%) Loss: 2.148127 LR: 0.00004584 [04:15:11] Epoch: 1 Batch: 10471/38378 (27.28%) Loss: 1.894766 LR: 0.00004583 [04:15:13] Epoch: 1 Batch: 10472/38378 (27.29%) Loss: 2.092049 LR: 0.00004583 [04:15:15] Epoch: 1 Batch: 10473/38378 (27.29%) Loss: 2.072696 LR: 0.00004583 [04:15:16] Epoch: 1 Batch: 10474/38378 (27.29%) Loss: 2.047865 LR: 0.00004583 [04:15:18] Epoch: 1 Batch: 10475/38378 (27.29%) Loss: 2.094930 LR: 0.00004583 [04:15:20] Epoch: 1 Batch: 10476/38378 (27.30%) Loss: 1.984827 LR: 0.00004583 [04:15:22] Epoch: 1 Batch: 10477/38378 (27.30%) Loss: 1.893005 LR: 0.00004583 [04:15:24] Epoch: 1 Batch: 10478/38378 (27.30%) Loss: 1.873878 LR: 0.00004583 [04:15:26] Epoch: 1 Batch: 10479/38378 (27.30%) Loss: 1.885429 LR: 0.00004583 [04:15:27] Epoch: 1 Batch: 10480/38378 (27.31%) Loss: 2.335483 LR: 0.00004583 [04:15:29] Epoch: 1 Batch: 10481/38378 (27.31%) Loss: 1.858471 LR: 0.00004583 [04:15:31] Epoch: 1 Batch: 10482/38378 (27.31%) Loss: 1.821010 LR: 0.00004583 [04:15:33] Epoch: 1 Batch: 10483/38378 (27.32%) Loss: 2.139455 LR: 0.00004583 [04:15:35] Epoch: 1 Batch: 10484/38378 (27.32%) Loss: 2.131841 LR: 0.00004583 [04:15:37] Epoch: 1 Batch: 10485/38378 (27.32%) Loss: 1.950914 LR: 0.00004582 [04:15:38] Epoch: 1 Batch: 10486/38378 (27.32%) Loss: 1.786409 LR: 0.00004582 [04:15:40] Epoch: 1 Batch: 10487/38378 (27.33%) Loss: 1.952599 LR: 0.00004582 [04:15:42] Epoch: 1 Batch: 10488/38378 (27.33%) Loss: 1.986137 LR: 0.00004582 [04:15:44] Epoch: 1 Batch: 10489/38378 (27.33%) Loss: 2.001888 LR: 0.00004582 [04:15:46] Epoch: 1 Batch: 10490/38378 (27.33%) Loss: 1.898696 LR: 0.00004582 [04:15:47] Epoch: 1 Batch: 10491/38378 (27.34%) Loss: 1.829930 LR: 0.00004582 [04:15:49] Epoch: 1 Batch: 10492/38378 (27.34%) Loss: 2.036837 LR: 0.00004581 [04:15:51] Epoch: 1 Batch: 10493/38378 (27.34%) Loss: 2.103071 LR: 0.00004581 [04:15:53] Epoch: 1 Batch: 10494/38378 (27.34%) Loss: 1.956396 LR: 0.00004581 [04:15:55] Epoch: 1 Batch: 10495/38378 (27.35%) Loss: 1.990301 LR: 0.00004581 [04:15:56] Epoch: 1 Batch: 10496/38378 (27.35%) Loss: 1.607194 LR: 0.00004581 [04:15:58] Epoch: 1 Batch: 10497/38378 (27.35%) Loss: 2.160250 LR: 0.00004581 [04:16:00] Epoch: 1 Batch: 10498/38378 (27.35%) Loss: 2.049344 LR: 0.00004581 [04:16:02] Epoch: 1 Batch: 10499/38378 (27.36%) Loss: 2.212015 LR: 0.00004580 [04:16:04] >> Evaluating batch 0 [04:16:05] >> Evaluating batch 1 [04:16:06] >> Evaluating batch 2 [04:16:07] >> Evaluating batch 3 [04:16:08] >> Evaluating batch 4 [04:16:09] >> Evaluating batch 5 [04:16:10] >> Evaluating batch 6 [04:16:11] >> Evaluating batch 7 [04:16:12] >> Evaluating batch 8 [04:16:13] >> Evaluating batch 9 [04:16:14] >> Evaluating batch 10 [04:16:15] >> Evaluating batch 11 [04:16:16] >> Evaluating batch 12 [04:16:17] >> Evaluating batch 13 [04:16:18] >> Evaluating batch 14 [04:16:19] >> Evaluating batch 15 [04:16:20] >> Evaluating batch 16 [04:16:21] Epoch: 1 Step: 10500/38378 Evaluation: [04:16:21] Avg Loss Since Last Eval: 2.0071 Val Loss: 2.1129 Validation loss delta: -0.0015 Perplexity: 8.2724 LR: 0.00004580 [04:16:26] >> Checkpoint saved: epoch1_step10500, size: 0.1702 GB [04:16:26] Epoch: 1 Batch: 10500/38378 (27.36%) Loss: 1.920366 LR: 0.00004580 [04:16:27] Epoch: 1
Batch: 10501/38378 (27.36%) Loss: 1.951355 LR: 0.00004580 [04:16:29] Epoch: 1 Batch: 10502/38378 (27.36%) Loss: 1.950682 LR: 0.00004580 [04:16:31] Epoch: 1 Batch: 10503/38378 (27.37%) Loss: 1.944482 LR: 0.00004580 [04:16:33] Epoch: 1 Batch: 10504/38378 (27.37%) Loss: 2.107131 LR: 0.00004580 [04:16:35] Epoch: 1 Batch: 10505/38378 (27.37%) Loss: 2.057603 LR: 0.00004580 [04:16:37] Epoch: 1 Batch: 10506/38378 (27.38%) Loss: 2.117314 LR: 0.00004580 [04:16:38] Epoch: 1 Batch: 10507/38378 (27.38%) Loss: 2.163083 LR: 0.00004580 [04:16:40] Epoch: 1 Batch: 10508/38378 (27.38%) Loss: 1.787866 LR: 0.00004580 [04:16:42] Epoch: 1 Batch: 10509/38378 (27.38%) Loss: 1.877926 LR: 0.00004580 [04:16:44] Epoch: 1 Batch: 10510/38378 (27.39%) Loss: 1.814360 LR: 0.00004580 [04:16:46] Epoch: 1 Batch: 10511/38378 (27.39%) Loss: 1.838599 LR: 0.00004580 [04:16:48] Epoch: 1 Batch: 10512/38378 (27.39%) Loss: 1.987654 LR: 0.00004580 [04:16:49] Epoch: 1 Batch: 10513/38378 (27.39%) Loss: 1.936296 LR: 0.00004579 [04:16:51] Epoch: 1 Batch: 10514/38378 (27.40%) Loss: 1.837660 LR: 0.00004579 [04:16:53] Epoch: 1 Batch: 10515/38378 (27.40%) Loss: 2.014110 LR: 0.00004579 [04:16:55] Epoch: 1 Batch: 10516/38378 (27.40%) Loss: 2.027225 LR: 0.00004579 [04:16:57] Epoch: 1 Batch: 10517/38378 (27.40%) Loss: 1.878309 LR: 0.00004579 [04:16:59] Epoch: 1 Batch: 10518/38378 (27.41%) Loss: 1.778077 LR: 0.00004579 [04:17:00] Epoch: 1 Batch: 10519/38378 (27.41%) Loss: 2.081650 LR: 0.00004579 [04:17:02] Epoch: 1 Batch: 10520/38378 (27.41%) Loss: 1.949157 LR: 0.00004578 [04:17:04] Epoch: 1 Batch: 10521/38378 (27.41%) Loss: 1.734000 LR: 0.00004578 [04:17:06] Epoch: 1 Batch: 10522/38378 (27.42%) Loss: 1.871448 LR: 0.00004578 [04:17:07] Epoch: 1 Batch: 10523/38378 (27.42%) Loss: 2.207636 LR: 0.00004578 [04:17:09] Epoch: 1 Batch: 10524/38378 (27.42%) Loss: 2.007124 LR: 0.00004578 [04:17:11] Epoch: 1 Batch: 10525/38378 (27.42%) Loss: 2.063137 LR: 0.00004578 [04:17:13] Epoch: 1 Batch: 10526/38378 (27.43%) Loss: 2.135763 LR: 0.00004578 [04:17:14] Epoch: 1 Batch: 10527/38378 (27.43%) Loss: 2.286052 LR: 0.00004577 [04:17:16] Epoch: 1 Batch: 10528/38378 (27.43%) Loss: 2.317340 LR: 0.00004577 [04:17:18] Epoch: 1 Batch: 10529/38378 (27.43%) Loss: 1.993216 LR: 0.00004577 [04:17:20] Epoch: 1 Batch: 10530/38378 (27.44%) Loss: 2.149664 LR: 0.00004577 [04:17:22] Epoch: 1 Batch: 10531/38378 (27.44%) Loss: 2.387444 LR: 0.00004577 [04:17:24] Epoch: 1 Batch: 10532/38378 (27.44%) Loss: 2.050059 LR: 0.00004577 [04:17:25] Epoch: 1 Batch: 10533/38378 (27.45%) Loss: 1.712730 LR: 0.00004577 [04:17:27] Epoch: 1 Batch: 10534/38378 (27.45%) Loss: 2.072389 LR: 0.00004577 [04:17:29] Epoch: 1 Batch: 10535/38378 (27.45%) Loss: 2.205467 LR: 0.00004577 [04:17:31] Epoch: 1 Batch: 10536/38378 (27.45%) Loss: 1.884312 LR: 0.00004577 [04:17:33] Epoch: 1 Batch: 10537/38378 (27.46%) Loss: 2.109833 LR: 0.00004577 [04:17:34] Epoch: 1 Batch: 10538/38378 (27.46%) Loss: 2.241773 LR: 0.00004577 [04:17:36] Epoch: 1 Batch: 10539/38378 (27.46%) Loss: 1.972441 LR: 0.00004577 [04:17:38] Epoch: 1 Batch: 10540/38378 (27.46%) Loss: 2.235503 LR: 0.00004577 [04:17:40] Epoch: 1 Batch: 10541/38378 (27.47%) Loss: 2.041943 LR: 0.00004576 [04:17:41] Epoch: 1 Batch: 10542/38378 (27.47%) Loss: 1.878918 LR: 0.00004576 [04:17:43] Epoch: 1 Batch: 10543/38378 (27.47%) Loss: 1.732469 LR: 0.00004576 [04:17:45] Epoch: 1 Batch: 10544/38378 (27.47%) Loss: 1.897875 LR: 0.00004576 [04:17:47] Epoch: 1 Batch: 10545/38378 (27.48%) Loss: 1.965025 LR: 0.00004576 [04:17:49] Epoch: 1 Batch: 10546/38378 (27.48%) Loss: 2.297006 
LR: 0.00004576 [04:17:50] Epoch: 1 Batch: 10547/38378 (27.48%) Loss: 1.735539 LR: 0.00004576 [04:17:52] Epoch: 1 Batch: 10548/38378 (27.48%) Loss: 2.182311 LR: 0.00004575 [04:17:54] Epoch: 1 Batch: 10549/38378 (27.49%) Loss: 1.851198 LR: 0.00004575 [04:17:56] Epoch: 1 Batch: 10550/38378 (27.49%) Loss: 2.013133 LR: 0.00004575 [04:17:58] Epoch: 1 Batch: 10551/38378 (27.49%) Loss: 1.835912 LR: 0.00004575 [04:18:00] Epoch: 1 Batch: 10552/38378 (27.49%) Loss: 2.280264 LR: 0.00004575 [04:18:01] Epoch: 1 Batch: 10553/38378 (27.50%) Loss: 1.900826 LR: 0.00004575 [04:18:03] Epoch: 1 Batch: 10554/38378 (27.50%) Loss: 1.979487 LR: 0.00004575 [04:18:05] Epoch: 1 Batch: 10555/38378 (27.50%) Loss: 1.983309 LR: 0.00004574 [04:18:07] Epoch: 1 Batch: 10556/38378 (27.51%) Loss: 1.895916 LR: 0.00004574 [04:18:09] Epoch: 1 Batch: 10557/38378 (27.51%) Loss: 2.018012 LR: 0.00004574 [04:18:10] Epoch: 1 Batch: 10558/38378 (27.51%) Loss: 1.874438 LR: 0.00004574 [04:18:12] Epoch: 1 Batch: 10559/38378 (27.51%) Loss: 2.121560 LR: 0.00004574 [04:18:14] Epoch: 1 Batch: 10560/38378 (27.52%) Loss: 1.825558 LR: 0.00004574 [04:18:16] Epoch: 1 Batch: 10561/38378 (27.52%) Loss: 1.965747 LR: 0.00004574 [04:18:18] Epoch: 1 Batch: 10562/38378 (27.52%) Loss: 2.124398 LR: 0.00004573 [04:18:20] Epoch: 1 Batch: 10563/38378 (27.52%) Loss: 1.909192 LR: 0.00004573 [04:18:21] Epoch: 1 Batch: 10564/38378 (27.53%) Loss: 1.953874 LR: 0.00004573 [04:18:23] Epoch: 1 Batch: 10565/38378 (27.53%) Loss: 1.847171 LR: 0.00004573 [04:18:25] Epoch: 1 Batch: 10566/38378 (27.53%) Loss: 1.937280 LR: 0.00004573 [04:18:27] Epoch: 1 Batch: 10567/38378 (27.53%) Loss: 2.096254 LR: 0.00004573 [04:18:29] Epoch: 1 Batch: 10568/38378 (27.54%) Loss: 1.967311 LR: 0.00004573 [04:18:30] Epoch: 1 Batch: 10569/38378 (27.54%) Loss: 2.218564 LR: 0.00004573 [04:18:32] Epoch: 1 Batch: 10570/38378 (27.54%) Loss: 2.049486 LR: 0.00004573 [04:18:34] Epoch: 1 Batch: 10571/38378 (27.54%) Loss: 2.076021 LR: 0.00004573 [04:18:36] Epoch: 1 Batch: 10572/38378 (27.55%) Loss: 2.087473 LR: 0.00004573 [04:18:38] Epoch: 1 Batch: 10573/38378 (27.55%) Loss: 1.726582 LR: 0.00004573 [04:18:39] Epoch: 1 Batch: 10574/38378 (27.55%) Loss: 1.842426 LR: 0.00004573 [04:18:41] Epoch: 1 Batch: 10575/38378 (27.55%) Loss: 1.840457 LR: 0.00004573 [04:18:43] Epoch: 1 Batch: 10576/38378 (27.56%) Loss: 1.914377 LR: 0.00004572 [04:18:45] Epoch: 1 Batch: 10577/38378 (27.56%) Loss: 1.886741 LR: 0.00004572 [04:18:47] Epoch: 1 Batch: 10578/38378 (27.56%) Loss: 2.136955 LR: 0.00004572 [04:18:48] Epoch: 1 Batch: 10579/38378 (27.57%) Loss: 2.501349 LR: 0.00004572 [04:18:50] Epoch: 1 Batch: 10580/38378 (27.57%) Loss: 1.983028 LR: 0.00004572 [04:18:52] Epoch: 1 Batch: 10581/38378 (27.57%) Loss: 1.903992 LR: 0.00004572 [04:18:54] Epoch: 1 Batch: 10582/38378 (27.57%) Loss: 2.102289 LR: 0.00004572 [04:18:56] Epoch: 1 Batch: 10583/38378 (27.58%) Loss: 2.127456 LR: 0.00004571 [04:18:58] Epoch: 1 Batch: 10584/38378 (27.58%) Loss: 1.947141 LR: 0.00004571 [04:18:59] Epoch: 1 Batch: 10585/38378 (27.58%) Loss: 1.878915 LR: 0.00004571 [04:19:01] Epoch: 1 Batch: 10586/38378 (27.58%) Loss: 1.998630 LR: 0.00004571 [04:19:03] Epoch: 1 Batch: 10587/38378 (27.59%) Loss: 1.679491 LR: 0.00004571 [04:19:05] Epoch: 1 Batch: 10588/38378 (27.59%) Loss: 1.878598 LR: 0.00004571 [04:19:07] Epoch: 1 Batch: 10589/38378 (27.59%) Loss: 2.415026 LR: 0.00004571 [04:19:09] Epoch: 1 Batch: 10590/38378 (27.59%) Loss: 1.995798 LR: 0.00004570 [04:19:10] Epoch: 1 Batch: 10591/38378 (27.60%) Loss: 2.052814 LR: 0.00004570 [04:19:12] Epoch: 1 Batch: 
10592/38378 (27.60%) Loss: 2.157985 LR: 0.00004570 [04:19:14] Epoch: 1 Batch: 10593/38378 (27.60%) Loss: 2.093993 LR: 0.00004570 [04:19:16] Epoch: 1 Batch: 10594/38378 (27.60%) Loss: 1.920844 LR: 0.00004570 [04:19:18] Epoch: 1 Batch: 10595/38378 (27.61%) Loss: 1.805081 LR: 0.00004570 [04:19:19] Epoch: 1 Batch: 10596/38378 (27.61%) Loss: 1.987298 LR: 0.00004570 [04:19:21] Epoch: 1 Batch: 10597/38378 (27.61%) Loss: 1.906420 LR: 0.00004570 [04:19:23] Epoch: 1 Batch: 10598/38378 (27.61%) Loss: 2.137435 LR: 0.00004570 [04:19:25] Epoch: 1 Batch: 10599/38378 (27.62%) Loss: 1.997297 LR: 0.00004570 [04:19:31] >> Cleaned up old temp checkpoint: epoch1_step8600 [04:19:31] >> Temp checkpoint saved: epoch1_step10600, size: 0.1702 GB [04:19:31] Epoch: 1 Batch: 10600/38378 (27.62%) Loss: 1.909372 LR: 0.00004570 [04:19:33] Epoch: 1 Batch: 10601/38378 (27.62%) Loss: 2.221926 LR: 0.00004570 [04:19:35] Epoch: 1 Batch: 10602/38378 (27.63%) Loss: 1.967513 LR: 0.00004570 [04:19:36] Epoch: 1 Batch: 10603/38378 (27.63%) Loss: 1.761651 LR: 0.00004570 [04:19:38] Epoch: 1 Batch: 10604/38378 (27.63%) Loss: 1.783521 LR: 0.00004569 [04:19:40] Epoch: 1 Batch: 10605/38378 (27.63%) Loss: 1.889897 LR: 0.00004569 [04:19:42] Epoch: 1 Batch: 10606/38378 (27.64%) Loss: 2.193228 LR: 0.00004569 [04:19:44] Epoch: 1 Batch: 10607/38378 (27.64%) Loss: 1.702720 LR: 0.00004569 [04:19:45] Epoch: 1 Batch: 10608/38378 (27.64%) Loss: 2.001634 LR: 0.00004569 [04:19:47] Epoch: 1 Batch: 10609/38378 (27.64%) Loss: 1.960563 LR: 0.00004569 [04:19:49] Epoch: 1 Batch: 10610/38378 (27.65%) Loss: 2.069382 LR: 0.00004569 [04:19:51] Epoch: 1 Batch: 10611/38378 (27.65%) Loss: 2.149808 LR: 0.00004568 [04:19:53] Epoch: 1 Batch: 10612/38378 (27.65%) Loss: 1.877520 LR: 0.00004568 [04:19:54] Epoch: 1 Batch: 10613/38378 (27.65%) Loss: 1.967469 LR: 0.00004568 [04:19:56] Epoch: 1 Batch: 10614/38378 (27.66%) Loss: 1.928375 LR: 0.00004568 [04:19:58] Epoch: 1 Batch: 10615/38378 (27.66%) Loss: 2.164744 LR: 0.00004568 [04:20:00] Epoch: 1 Batch: 10616/38378 (27.66%) Loss: 2.061193 LR: 0.00004568 [04:20:02] Epoch: 1 Batch: 10617/38378 (27.66%) Loss: 1.909353 LR: 0.00004568 [04:20:04] Epoch: 1 Batch: 10618/38378 (27.67%) Loss: 1.820832 LR: 0.00004567 [04:20:06] Epoch: 1 Batch: 10619/38378 (27.67%) Loss: 1.967273 LR: 0.00004567 [04:20:07] Epoch: 1 Batch: 10620/38378 (27.67%) Loss: 2.094443 LR: 0.00004567 [04:20:09] Epoch: 1 Batch: 10621/38378 (27.67%) Loss: 2.292390 LR: 0.00004567 [04:20:11] Epoch: 1 Batch: 10622/38378 (27.68%) Loss: 2.029427 LR: 0.00004567 [04:20:13] Epoch: 1 Batch: 10623/38378 (27.68%) Loss: 1.936688 LR: 0.00004567 [04:20:15] Epoch: 1 Batch: 10624/38378 (27.68%) Loss: 1.711059 LR: 0.00004567 [04:20:17] Epoch: 1 Batch: 10625/38378 (27.69%) Loss: 1.907114 LR: 0.00004566 [04:20:18] Epoch: 1 Batch: 10626/38378 (27.69%) Loss: 2.054103 LR: 0.00004566 [04:20:20] Epoch: 1 Batch: 10627/38378 (27.69%) Loss: 1.951521 LR: 0.00004566 [04:20:22] Epoch: 1 Batch: 10628/38378 (27.69%) Loss: 1.828709 LR: 0.00004566 [04:20:24] Epoch: 1 Batch: 10629/38378 (27.70%) Loss: 1.990233 LR: 0.00004566 [04:20:25] Epoch: 1 Batch: 10630/38378 (27.70%) Loss: 2.080250 LR: 0.00004566 [04:20:27] Epoch: 1 Batch: 10631/38378 (27.70%) Loss: 2.221188 LR: 0.00004566 [04:20:29] Epoch: 1 Batch: 10632/38378 (27.70%) Loss: 1.864145 LR: 0.00004566 [04:20:31] Epoch: 1 Batch: 10633/38378 (27.71%) Loss: 1.800894 LR: 0.00004566 [04:20:33] Epoch: 1 Batch: 10634/38378 (27.71%) Loss: 2.269503 LR: 0.00004566 [04:20:34] Epoch: 1 Batch: 10635/38378 (27.71%) Loss: 1.975035 LR: 0.00004566 [04:20:36] 
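Note: the step-10500 evaluation above runs 17 validation batches (max_val_size 100 at val_batch_size 6 gives batches 0 through 16) and reports perplexity as exp of the mean validation loss: exp(2.1129) ≈ 8.2724. The "Validation loss delta" of -0.0015 is the change against the previous eval (implying a prior val loss of 2.1144), and "Avg Loss Since Last Eval" presumably averages the training losses of the 500 batches in between. A simplified sketch of such a loop (evaluate is a hypothetical name; the real script masks padding and applies label smoothing 0.05, both omitted here):

import math
import torch

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    total, n = 0.0, 0
    for i, batch in enumerate(val_loader):  # 100 samples / 6 per batch -> batches 0..16
        print(f">> Evaluating batch {i}")
        out = model(input_ids=batch["input_ids"],
                    attention_mask=batch["attention_mask"],
                    labels=batch["input_ids"])  # causal-LM loss on the inputs
        total += out.loss.item()
        n += 1
    model.train()
    val_loss = total / n
    return val_loss, math.exp(val_loss)  # perplexity: exp(2.1129) ~= 8.2724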
Epoch: 1 Batch: 10636/38378 (27.71%) Loss: 1.905688 LR: 0.00004566 [04:20:38] Epoch: 1 Batch: 10637/38378 (27.72%) Loss: 1.855292 LR: 0.00004566 [04:20:40] Epoch: 1 Batch: 10638/38378 (27.72%) Loss: 2.385361 LR: 0.00004566 [04:20:42] Epoch: 1 Batch: 10639/38378 (27.72%) Loss: 2.012213 LR: 0.00004565 [04:20:43] Epoch: 1 Batch: 10640/38378 (27.72%) Loss: 1.885821 LR: 0.00004565 [04:20:45] Epoch: 1 Batch: 10641/38378 (27.73%) Loss: 2.170024 LR: 0.00004565 [04:20:47] Epoch: 1 Batch: 10642/38378 (27.73%) Loss: 2.031122 LR: 0.00004565 [04:20:49] Epoch: 1 Batch: 10643/38378 (27.73%) Loss: 1.708873 LR: 0.00004565 [04:24:33] 2025-08-12 [04:24:33] Tesla T4 [04:24:33] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [04:24:33] CPU usage: 41.5%, RAM usage: 13.9% [04:24:33] Running with the following configuration: [04:24:33] model_name: NousResearch/Hermes-3-Llama-3.1-8B [04:24:33] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B [04:24:33] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview [04:24:33] train_path: /content/drive/MyDrive/data/none.csv [04:24:33] checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step10600 [04:24:33] lr: 5e-05 [04:24:33] lr_floor: 1e-05 [04:24:33] epochs: 1 [04:24:33] batch_size: 5 [04:24:33] accum_steps: 7 [04:24:33] val_batch_size: 6 [04:24:33] max_val_size: 100 [04:24:33] max_length: 150 [04:24:33] save_temp_frequency: 100 [04:24:33] save_frequency: 500 [04:24:33] eval_frequency: 500 [04:24:33] save_pattern: y [04:24:33] quantization: y [04:24:33] quantization_bits: 4 [04:24:33] lora: y [04:24:33] frozen_lora_path: None [04:24:33] lora_rank: 16 [04:24:33] lora_alpha: 32 [04:24:33] lora_dropout: 0.08 [04:24:33] optimizer_weight_decay: 0.0 [04:24:33] warmup_type: cosine [04:24:33] warmup_ratio: 0.08 [04:24:33] warmup_steps: 439 [04:24:33] 
shuffle: y [04:24:33] csv_column: text [04:24:33] new_run: n [04:24:33] label_smoothing: 0.05 [04:24:33] SEED: 1 [04:24:33] Using device: cuda [04:24:34] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step10600 [04:26:15] Embeddings shape after: torch.Size([128256, 4096]) [04:26:16] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step10600 [04:26:16] Trainable LoRA 'default': [04:26:16] task_type: CAUSAL_LM [04:26:16] peft_type: PeftType.LORA [04:26:16] auto_mapping: None [04:26:16] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B [04:26:16] revision: None [04:26:16] inference_mode: False [04:26:16] r: 16 [04:26:16] target_modules: {'q_proj', 'o_proj', 'v_proj', 'k_proj'} [04:26:16] exclude_modules: None [04:26:16] lora_alpha: 32 [04:26:16] lora_dropout: 0.08 [04:26:16] fan_in_fan_out: False [04:26:16] bias: none [04:26:16] use_rslora: True [04:26:16] modules_to_save: None [04:26:16] init_lora_weights: True [04:26:16] layers_to_transform: None [04:26:16] layers_pattern: None [04:26:16] rank_pattern: {} [04:26:16] alpha_pattern: {} [04:26:16] megatron_config: None [04:26:16] megatron_core: megatron.core [04:26:16] trainable_token_indices: None [04:26:16] loftq_config: {} [04:26:16] eva_config: None [04:26:16] corda_config: None [04:26:16] use_dora: False [04:26:16] use_qalora: False [04:26:16] qalora_group_size: 16 [04:26:16] layer_replication: None [04:26:16] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [04:26:16] lora_bias: False [04:26:16] target_parameters: None [04:26:16] _custom_modules: None [04:26:16] Embeddings shape after: torch.Size([128256, 4096]) [04:26:17] Resumed from epoch 1, step 10601, file 1 [04:26:17] Starting from CSV file... [04:26:18] Splitting data into chunks of 11000... [04:26:18] Using 7 processes across 18 chunks [04:26:19] Using saved train/val split from checkpoint. [04:26:19] Resuming scheduler with warmup steps: 438, total steps: 5482 [04:26:19] Initializing scheduler with cosine schedule with warmup, warmup steps 439, total steps: 5482 [04:26:19] Train/Val split: 191887 train, 100 val samples. 
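Note: the scheduler messages just above pin down the schedule arithmetic: 38378 batches at accum_steps 7 come to 5482 optimizer steps per epoch, and warmup_ratio 0.08 of 5482 rounds to the configured 439 warmup steps (the resume message's "438" versus the init message's "439" looks like an off-by-one in the script's own logging). Given lr 5e-05 and lr_floor 1e-05, the cosine presumably decays from the peak down to the floor rather than to zero. A sketch of such a schedule as a LambdaLR, under those assumptions (cosine_with_floor is a hypothetical name; the script may use a library implementation instead):

import math
from torch.optim.lr_scheduler import LambdaLR

def cosine_with_floor(optimizer, warmup_steps=439, total_steps=5482,
                      peak_lr=5e-05, lr_floor=1e-05):
    floor = lr_floor / peak_lr  # 0.2 of the peak
    def factor(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)  # linear warmup: 0 -> 1
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))  # 1 -> 0
        return floor + (1.0 - floor) * cosine  # decay from 1 down to floor
    return LambdaLR(optimizer, factor)

Because the scheduler advances once per optimizer step rather than once per batch, the resumed run below prints LR 0.00000000 for batches 10601 through 10606: the rebuilt scheduler sits at its step-0 warmup value until the first gradient-accumulation boundary, after which the restored rate (0.00004569) reappears at batch 10607.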
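Note: the model config dump that follows embeds the bitsandbytes quantization_config this run loads under: 4-bit NF4 weights with double quantization, float16 compute, and lm_head left unquantized. Loading the base model that way looks roughly like the sketch below (an illustration, not the script's actual loading code; device_map="auto" is an assumption):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # "load_in_4bit": true
    bnb_4bit_quant_type="nf4",             # "bnb_4bit_quant_type": "nf4"
    bnb_4bit_use_double_quant=True,        # "bnb_4bit_use_double_quant": true
    bnb_4bit_compute_dtype=torch.float16,  # "bnb_4bit_compute_dtype": "float16"
    llm_int8_skip_modules=["lm_head"],     # keep the LM head unquantized
)
model = AutoModelForCausalLM.from_pretrained(
    "NousResearch/Hermes-3-Llama-3.1-8B",
    quantization_config=bnb_config,
    device_map="auto",  # assumption: actual placement not shown in the log
)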
[04:26:28] Model: PeftModelForCausalLM [04:26:28] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.0", "use_cache": true, "vocab_size": 128256 } [04:26:28] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [04:26:28] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 5e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [04:26:28] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [04:26:28] Scheduler: [04:26:28] Training on 191887 training samples, 100 validation samples [04:26:28] Average tokens per sample: 141.99 [04:26:28] Estimated epoch time: ~584.89 min [04:26:28] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Active memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 332173 MiB | 326190 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7616 MiB | 7616 MiB | 7616 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1259 MiB | 5879 MiB | 333261 MiB | 332002 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | GPU reserved segments | 186 | 186 | 186 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 33 | 37 | 12954 | 12921 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | 
|---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [04:26:28] Restoring shuffle indices from training state for epoch 1 [04:26:28] CPU usage: 45.3%, RAM usage: 28.3% [04:26:29] Epoch 1 learning rate: 0.0 [04:26:29] Starting epoch 1 [04:27:05] Batch 10601: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [04:27:07] Epoch: 1 Batch: 10601/38378 (27.62%) Loss: 2.223089 LR: 0.00000000 [04:27:09] Epoch: 1 Batch: 10602/38378 (27.63%) Loss: 1.969958 LR: 0.00000000 [04:27:10] Epoch: 1 Batch: 10603/38378 (27.63%) Loss: 1.761000 LR: 0.00000000 [04:27:12] Epoch: 1 Batch: 10604/38378 (27.63%) Loss: 1.784588 LR: 0.00000000 [04:27:13] Epoch: 1 Batch: 10605/38378 (27.63%) Loss: 1.895041 LR: 0.00000000 [04:27:15] Epoch: 1 Batch: 10606/38378 (27.64%) Loss: 2.197046 LR: 0.00000000 [04:27:16] Epoch: 1 Batch: 10607/38378 (27.64%) Loss: 1.704328 LR: 0.00004569 [04:27:18] Epoch: 1 Batch: 10608/38378 (27.64%) Loss: 2.000993 LR: 0.00004569 [04:27:20] Epoch: 1 Batch: 10609/38378 (27.64%) Loss: 1.957377 LR: 0.00004569 [04:27:21] Epoch: 1 Batch: 10610/38378 (27.65%) Loss: 2.069538 LR: 0.00004569 [04:27:23] Epoch: 1 Batch: 10611/38378 (27.65%) Loss: 2.149582 LR: 0.00004569 [04:27:24] Epoch: 1 Batch: 10612/38378 (27.65%) Loss: 1.877160 LR: 0.00004569 [04:27:26] Epoch: 1 Batch: 10613/38378 (27.65%) Loss: 1.968197 LR: 0.00004569 [04:27:28] Epoch: 1 Batch: 10614/38378 (27.66%) Loss: 1.925551 LR: 0.00004568 [04:27:29] Epoch: 1 Batch: 10615/38378 (27.66%) Loss: 2.168220 LR: 0.00004568 [04:27:31] Epoch: 1 Batch: 10616/38378 (27.66%) Loss: 2.067800 LR: 0.00004568 [04:27:32] Epoch: 1 Batch: 10617/38378 (27.66%) Loss: 1.909999 LR: 0.00004568 [04:27:34] Epoch: 1 Batch: 10618/38378 (27.67%) Loss: 1.820437 LR: 0.00004568 [04:27:36] Epoch: 1 Batch: 10619/38378 (27.67%) Loss: 1.966379 LR: 0.00004568 [04:27:37] Epoch: 1 Batch: 10620/38378 (27.67%) Loss: 2.094368 LR: 0.00004568 [04:27:39] Epoch: 1 Batch: 10621/38378 (27.67%) Loss: 2.292704 LR: 0.00004567 [04:27:41] Epoch: 1 Batch: 10622/38378 (27.68%) Loss: 2.043954 LR: 0.00004567 [04:27:42] Epoch: 1 Batch: 10623/38378 (27.68%) Loss: 1.939826 LR: 0.00004567 [04:27:44] Epoch: 1 Batch: 10624/38378 (27.68%) Loss: 1.723382 LR: 0.00004567 [04:27:45] Epoch: 1 Batch: 10625/38378 (27.69%) Loss: 1.904157 LR: 0.00004567 [04:27:47] Epoch: 1 Batch: 10626/38378 (27.69%) Loss: 2.061491 LR: 0.00004567 [04:27:49] Epoch: 1 Batch: 10627/38378 (27.69%) Loss: 1.960651 LR: 0.00004567 [04:27:51] Epoch: 1 Batch: 10628/38378 (27.69%) Loss: 1.836240 LR: 0.00004566 [04:27:52] Epoch: 1 Batch: 10629/38378 (27.70%) Loss: 1.993587 LR: 0.00004566 [04:27:54] Epoch: 1 Batch: 10630/38378 (27.70%) Loss: 2.079757 LR: 0.00004566 [04:27:55] Epoch: 1 Batch: 10631/38378 (27.70%) Loss: 2.218522 LR: 0.00004566 [04:27:57] Epoch: 1 Batch: 10632/38378 (27.70%) Loss: 1.863237 LR: 0.00004566 [04:27:59] Epoch: 1 Batch: 10633/38378 (27.71%) Loss: 1.801179 LR: 0.00004566 [04:28:00] Epoch: 1 Batch: 10634/38378 (27.71%) Loss: 2.269645 LR: 0.00004566 [04:28:02] Epoch: 1 Batch: 10635/38378 (27.71%) Loss: 1.977999 LR: 0.00004566 [04:28:04] Epoch: 1 Batch: 10636/38378 (27.71%) Loss: 1.903700 LR: 0.00004566 [04:28:06] Epoch: 1 Batch: 10637/38378 (27.72%) Loss: 1.852725 LR: 0.00004566 [04:28:07] Epoch: 1 Batch: 10638/38378 (27.72%) Loss: 2.383553 LR: 0.00004566 [04:28:09] Epoch: 1 Batch: 10639/38378 (27.72%) Loss: 2.009324 LR: 0.00004566 [04:28:11] Epoch: 1 Batch: 
10640/38378 (27.72%) Loss: 1.884002 LR: 0.00004566 [04:28:13] Epoch: 1 Batch: 10641/38378 (27.73%) Loss: 2.174232 LR: 0.00004566 [04:28:15] Epoch: 1 Batch: 10642/38378 (27.73%) Loss: 2.035105 LR: 0.00004565 [04:28:16] Epoch: 1 Batch: 10643/38378 (27.73%) Loss: 1.709102 LR: 0.00004565 [04:28:18] Epoch: 1 Batch: 10644/38378 (27.73%) Loss: 2.150301 LR: 0.00004565 [04:28:20] Epoch: 1 Batch: 10645/38378 (27.74%) Loss: 1.840176 LR: 0.00004565 [04:28:22] Epoch: 1 Batch: 10646/38378 (27.74%) Loss: 1.716522 LR: 0.00004565 [04:28:24] Epoch: 1 Batch: 10647/38378 (27.74%) Loss: 1.564602 LR: 0.00004565 [04:28:26] Epoch: 1 Batch: 10648/38378 (27.75%) Loss: 2.211375 LR: 0.00004565 [04:28:28] Epoch: 1 Batch: 10649/38378 (27.75%) Loss: 1.844014 LR: 0.00004564 [04:28:29] Epoch: 1 Batch: 10650/38378 (27.75%) Loss: 2.278008 LR: 0.00004564 [04:28:31] Epoch: 1 Batch: 10651/38378 (27.75%) Loss: 2.125452 LR: 0.00004564 [04:28:33] Epoch: 1 Batch: 10652/38378 (27.76%) Loss: 1.777509 LR: 0.00004564 [04:28:35] Epoch: 1 Batch: 10653/38378 (27.76%) Loss: 2.277589 LR: 0.00004564 [04:28:37] Epoch: 1 Batch: 10654/38378 (27.76%) Loss: 1.986299 LR: 0.00004564 [04:28:39] Epoch: 1 Batch: 10655/38378 (27.76%) Loss: 2.057026 LR: 0.00004564 [04:28:41] Epoch: 1 Batch: 10656/38378 (27.77%) Loss: 1.924182 LR: 0.00004563 [04:28:43] Epoch: 1 Batch: 10657/38378 (27.77%) Loss: 2.152681 LR: 0.00004563 [04:28:44] Epoch: 1 Batch: 10658/38378 (27.77%) Loss: 1.944242 LR: 0.00004563 [04:28:46] Epoch: 1 Batch: 10659/38378 (27.77%) Loss: 1.711190 LR: 0.00004563 [04:28:48] Epoch: 1 Batch: 10660/38378 (27.78%) Loss: 2.253204 LR: 0.00004563 [04:28:50] Epoch: 1 Batch: 10661/38378 (27.78%) Loss: 1.848699 LR: 0.00004563 [04:28:52] Epoch: 1 Batch: 10662/38378 (27.78%) Loss: 1.850938 LR: 0.00004563 [04:28:53] Epoch: 1 Batch: 10663/38378 (27.78%) Loss: 2.169410 LR: 0.00004563 [04:28:55] Epoch: 1 Batch: 10664/38378 (27.79%) Loss: 2.293253 LR: 0.00004563 [04:28:57] Epoch: 1 Batch: 10665/38378 (27.79%) Loss: 2.117518 LR: 0.00004563 [04:28:59] Epoch: 1 Batch: 10666/38378 (27.79%) Loss: 1.981093 LR: 0.00004563 [04:29:00] Epoch: 1 Batch: 10667/38378 (27.79%) Loss: 1.957993 LR: 0.00004563 [04:29:02] Epoch: 1 Batch: 10668/38378 (27.80%) Loss: 2.047533 LR: 0.00004563 [04:29:04] Epoch: 1 Batch: 10669/38378 (27.80%) Loss: 1.912400 LR: 0.00004563 [04:29:06] Epoch: 1 Batch: 10670/38378 (27.80%) Loss: 1.860818 LR: 0.00004562 [04:29:08] Epoch: 1 Batch: 10671/38378 (27.80%) Loss: 2.001743 LR: 0.00004562 [04:29:09] Epoch: 1 Batch: 10672/38378 (27.81%) Loss: 2.040824 LR: 0.00004562 [04:29:11] Epoch: 1 Batch: 10673/38378 (27.81%) Loss: 2.049336 LR: 0.00004562 [04:29:13] Epoch: 1 Batch: 10674/38378 (27.81%) Loss: 2.247488 LR: 0.00004562 [04:29:15] Epoch: 1 Batch: 10675/38378 (27.82%) Loss: 1.961020 LR: 0.00004562 [04:29:16] Epoch: 1 Batch: 10676/38378 (27.82%) Loss: 2.691478 LR: 0.00004562 [04:29:18] Epoch: 1 Batch: 10677/38378 (27.82%) Loss: 1.899328 LR: 0.00004561 [04:29:20] Epoch: 1 Batch: 10678/38378 (27.82%) Loss: 2.413175 LR: 0.00004561 [04:29:22] Epoch: 1 Batch: 10679/38378 (27.83%) Loss: 2.248569 LR: 0.00004561 [04:29:23] Epoch: 1 Batch: 10680/38378 (27.83%) Loss: 2.034903 LR: 0.00004561 [04:29:25] Epoch: 1 Batch: 10681/38378 (27.83%) Loss: 1.759094 LR: 0.00004561 [04:29:27] Epoch: 1 Batch: 10682/38378 (27.83%) Loss: 2.028972 LR: 0.00004561 [04:29:29] Epoch: 1 Batch: 10683/38378 (27.84%) Loss: 2.181344 LR: 0.00004561 [04:29:30] Epoch: 1 Batch: 10684/38378 (27.84%) Loss: 1.841881 LR: 0.00004560 [04:29:32] Epoch: 1 Batch: 10685/38378 (27.84%) Loss: 1.780001 LR: 
0.00004560 [04:29:34] Epoch: 1 Batch: 10686/38378 (27.84%) Loss: 2.014908 LR: 0.00004560 [04:29:36] Epoch: 1 Batch: 10687/38378 (27.85%) Loss: 2.251900 LR: 0.00004560 [04:29:38] Epoch: 1 Batch: 10688/38378 (27.85%) Loss: 1.873316 LR: 0.00004560 [04:29:39] Epoch: 1 Batch: 10689/38378 (27.85%) Loss: 1.891794 LR: 0.00004560 [04:29:41] Epoch: 1 Batch: 10690/38378 (27.85%) Loss: 2.231860 LR: 0.00004560 [04:29:43] Epoch: 1 Batch: 10691/38378 (27.86%) Loss: 1.868428 LR: 0.00004559 [04:29:45] Epoch: 1 Batch: 10692/38378 (27.86%) Loss: 2.144686 LR: 0.00004559 [04:29:47] Epoch: 1 Batch: 10693/38378 (27.86%) Loss: 2.104770 LR: 0.00004559 [04:29:48] Epoch: 1 Batch: 10694/38378 (27.86%) Loss: 2.266313 LR: 0.00004559 [04:29:50] Epoch: 1 Batch: 10695/38378 (27.87%) Loss: 2.054454 LR: 0.00004559 [04:29:52] Epoch: 1 Batch: 10696/38378 (27.87%) Loss: 1.820274 LR: 0.00004559 [04:29:54] Epoch: 1 Batch: 10697/38378 (27.87%) Loss: 1.811525 LR: 0.00004559 [04:29:56] Epoch: 1 Batch: 10698/38378 (27.88%) Loss: 2.122303 LR: 0.00004559 [04:29:58] Epoch: 1 Batch: 10699/38378 (27.88%) Loss: 2.251631 LR: 0.00004559 [04:30:04] >> Cleaned up old temp checkpoint: epoch1_step8800 [04:30:04] >> Temp checkpoint saved: epoch1_step10700, size: 0.1702 GB [04:30:04] Epoch: 1 Batch: 10700/38378 (27.88%) Loss: 2.215653 LR: 0.00004559 [04:30:06] Epoch: 1 Batch: 10701/38378 (27.88%) Loss: 1.999649 LR: 0.00004559 [04:30:08] Epoch: 1 Batch: 10702/38378 (27.89%) Loss: 2.026795 LR: 0.00004559 [04:30:10] Epoch: 1 Batch: 10703/38378 (27.89%) Loss: 2.043397 LR: 0.00004559 [04:30:11] Epoch: 1 Batch: 10704/38378 (27.89%) Loss: 1.978901 LR: 0.00004559 [04:30:13] Epoch: 1 Batch: 10705/38378 (27.89%) Loss: 1.900115 LR: 0.00004558 [04:30:15] Epoch: 1 Batch: 10706/38378 (27.90%) Loss: 2.205195 LR: 0.00004558 [04:30:17] Epoch: 1 Batch: 10707/38378 (27.90%) Loss: 2.079425 LR: 0.00004558 [04:30:19] Epoch: 1 Batch: 10708/38378 (27.90%) Loss: 1.906977 LR: 0.00004558 [04:30:20] Epoch: 1 Batch: 10709/38378 (27.90%) Loss: 2.084344 LR: 0.00004558 [04:30:22] Epoch: 1 Batch: 10710/38378 (27.91%) Loss: 1.761330 LR: 0.00004558 [04:30:24] Epoch: 1 Batch: 10711/38378 (27.91%) Loss: 1.781310 LR: 0.00004558 [04:30:26] Epoch: 1 Batch: 10712/38378 (27.91%) Loss: 1.929323 LR: 0.00004557 [04:30:28] Epoch: 1 Batch: 10713/38378 (27.91%) Loss: 2.072757 LR: 0.00004557 [04:30:29] Epoch: 1 Batch: 10714/38378 (27.92%) Loss: 2.004372 LR: 0.00004557 [04:30:31] Epoch: 1 Batch: 10715/38378 (27.92%) Loss: 2.131880 LR: 0.00004557 [04:30:33] Epoch: 1 Batch: 10716/38378 (27.92%) Loss: 2.200589 LR: 0.00004557 [04:30:35] Epoch: 1 Batch: 10717/38378 (27.92%) Loss: 1.834826 LR: 0.00004557 [04:30:37] Epoch: 1 Batch: 10718/38378 (27.93%) Loss: 1.880394 LR: 0.00004557 [04:30:39] Epoch: 1 Batch: 10719/38378 (27.93%) Loss: 2.022375 LR: 0.00004556 [04:30:40] Epoch: 1 Batch: 10720/38378 (27.93%) Loss: 1.623513 LR: 0.00004556 [04:30:42] Epoch: 1 Batch: 10721/38378 (27.94%) Loss: 2.122257 LR: 0.00004556 [04:30:44] Epoch: 1 Batch: 10722/38378 (27.94%) Loss: 1.923754 LR: 0.00004556 [04:30:46] Epoch: 1 Batch: 10723/38378 (27.94%) Loss: 2.303446 LR: 0.00004556 [04:30:48] Epoch: 1 Batch: 10724/38378 (27.94%) Loss: 1.819935 LR: 0.00004556 [04:30:50] Epoch: 1 Batch: 10725/38378 (27.95%) Loss: 2.313043 LR: 0.00004556 [04:30:51] Epoch: 1 Batch: 10726/38378 (27.95%) Loss: 2.327678 LR: 0.00004556 [04:30:53] Epoch: 1 Batch: 10727/38378 (27.95%) Loss: 2.075560 LR: 0.00004556 [04:30:55] Epoch: 1 Batch: 10728/38378 (27.95%) Loss: 2.088103 LR: 0.00004556 [04:30:57] Epoch: 1 Batch: 10729/38378 (27.96%) Loss: 
1.859343 LR: 0.00004556 [04:30:59] Epoch: 1 Batch: 10730/38378 (27.96%) Loss: 1.927793 LR: 0.00004556 [04:31:00] Epoch: 1 Batch: 10731/38378 (27.96%) Loss: 1.969144 LR: 0.00004556 [04:31:02] Epoch: 1 Batch: 10732/38378 (27.96%) Loss: 2.100349 LR: 0.00004556 [04:31:04] Epoch: 1 Batch: 10733/38378 (27.97%) Loss: 2.024287 LR: 0.00004555 [04:31:06] Epoch: 1 Batch: 10734/38378 (27.97%) Loss: 2.055397 LR: 0.00004555 [04:31:08] Epoch: 1 Batch: 10735/38378 (27.97%) Loss: 2.003108 LR: 0.00004555 [04:31:09] Epoch: 1 Batch: 10736/38378 (27.97%) Loss: 2.061287 LR: 0.00004555 [04:31:11] Epoch: 1 Batch: 10737/38378 (27.98%) Loss: 1.939777 LR: 0.00004555 [04:31:13] Epoch: 1 Batch: 10738/38378 (27.98%) Loss: 2.274066 LR: 0.00004555 [04:31:15] Epoch: 1 Batch: 10739/38378 (27.98%) Loss: 2.072858 LR: 0.00004555 [04:31:17] Epoch: 1 Batch: 10740/38378 (27.98%) Loss: 1.684630 LR: 0.00004554 [04:31:18] Epoch: 1 Batch: 10741/38378 (27.99%) Loss: 1.796899 LR: 0.00004554 [04:31:20] Epoch: 1 Batch: 10742/38378 (27.99%) Loss: 1.625272 LR: 0.00004554 [04:31:22] Epoch: 1 Batch: 10743/38378 (27.99%) Loss: 2.375532 LR: 0.00004554 [04:31:24] Epoch: 1 Batch: 10744/38378 (28.00%) Loss: 2.150653 LR: 0.00004554 [04:31:26] Epoch: 1 Batch: 10745/38378 (28.00%) Loss: 2.127051 LR: 0.00004554 [04:31:27] Epoch: 1 Batch: 10746/38378 (28.00%) Loss: 1.961257 LR: 0.00004554 [04:31:29] Epoch: 1 Batch: 10747/38378 (28.00%) Loss: 1.903830 LR: 0.00004553 [04:31:31] Epoch: 1 Batch: 10748/38378 (28.01%) Loss: 1.772891 LR: 0.00004553 [04:31:33] Epoch: 1 Batch: 10749/38378 (28.01%) Loss: 1.835633 LR: 0.00004553 [04:31:35] Epoch: 1 Batch: 10750/38378 (28.01%) Loss: 2.124185 LR: 0.00004553 [04:31:37] Epoch: 1 Batch: 10751/38378 (28.01%) Loss: 2.005511 LR: 0.00004553 [04:31:38] Epoch: 1 Batch: 10752/38378 (28.02%) Loss: 2.114487 LR: 0.00004553 [04:31:40] Epoch: 1 Batch: 10753/38378 (28.02%) Loss: 1.999027 LR: 0.00004553 [04:31:42] Epoch: 1 Batch: 10754/38378 (28.02%) Loss: 1.943499 LR: 0.00004552 [04:31:44] Epoch: 1 Batch: 10755/38378 (28.02%) Loss: 2.042965 LR: 0.00004552 [04:31:46] Epoch: 1 Batch: 10756/38378 (28.03%) Loss: 2.131337 LR: 0.00004552 [04:31:47] Epoch: 1 Batch: 10757/38378 (28.03%) Loss: 2.164202 LR: 0.00004552 [04:31:49] Epoch: 1 Batch: 10758/38378 (28.03%) Loss: 2.080067 LR: 0.00004552 [04:31:51] Epoch: 1 Batch: 10759/38378 (28.03%) Loss: 2.152890 LR: 0.00004552 [04:31:53] Epoch: 1 Batch: 10760/38378 (28.04%) Loss: 1.830539 LR: 0.00004552 [04:31:54] Epoch: 1 Batch: 10761/38378 (28.04%) Loss: 2.073145 LR: 0.00004552 [04:31:56] Epoch: 1 Batch: 10762/38378 (28.04%) Loss: 1.731518 LR: 0.00004552 [04:31:58] Epoch: 1 Batch: 10763/38378 (28.04%) Loss: 1.732909 LR: 0.00004552 [04:32:00] Epoch: 1 Batch: 10764/38378 (28.05%) Loss: 2.097022 LR: 0.00004552 [04:32:02] Epoch: 1 Batch: 10765/38378 (28.05%) Loss: 2.182674 LR: 0.00004552 [04:32:03] Epoch: 1 Batch: 10766/38378 (28.05%) Loss: 1.865981 LR: 0.00004552 [04:32:05] Epoch: 1 Batch: 10767/38378 (28.06%) Loss: 1.727686 LR: 0.00004552 [04:32:07] Epoch: 1 Batch: 10768/38378 (28.06%) Loss: 2.165188 LR: 0.00004551 [04:32:09] Epoch: 1 Batch: 10769/38378 (28.06%) Loss: 1.812077 LR: 0.00004551 [04:32:11] Epoch: 1 Batch: 10770/38378 (28.06%) Loss: 2.025732 LR: 0.00004551 [04:32:12] Epoch: 1 Batch: 10771/38378 (28.07%) Loss: 1.999504 LR: 0.00004551 [04:32:14] Epoch: 1 Batch: 10772/38378 (28.07%) Loss: 2.056843 LR: 0.00004551 [04:32:16] Epoch: 1 Batch: 10773/38378 (28.07%) Loss: 2.058927 LR: 0.00004551 [04:32:18] Epoch: 1 Batch: 10774/38378 (28.07%) Loss: 2.073040 LR: 0.00004551 [04:32:20] Epoch: 1 
Batch: 10775/38378 (28.08%) Loss: 2.037997 LR: 0.00004550 [04:32:22] Epoch: 1 Batch: 10776/38378 (28.08%) Loss: 1.838144 LR: 0.00004550 [04:32:23] Epoch: 1 Batch: 10777/38378 (28.08%) Loss: 2.014061 LR: 0.00004550 [04:32:25] Epoch: 1 Batch: 10778/38378 (28.08%) Loss: 1.927953 LR: 0.00004550 [04:32:27] Epoch: 1 Batch: 10779/38378 (28.09%) Loss: 1.940662 LR: 0.00004550 [04:32:29] Epoch: 1 Batch: 10780/38378 (28.09%) Loss: 1.859072 LR: 0.00004550 [04:32:31] Epoch: 1 Batch: 10781/38378 (28.09%) Loss: 2.046896 LR: 0.00004550 [04:32:33] Epoch: 1 Batch: 10782/38378 (28.09%) Loss: 2.354966 LR: 0.00004549 [04:32:34] Epoch: 1 Batch: 10783/38378 (28.10%) Loss: 1.696209 LR: 0.00004549 [04:32:36] Epoch: 1 Batch: 10784/38378 (28.10%) Loss: 2.062575 LR: 0.00004549 [04:32:38] Epoch: 1 Batch: 10785/38378 (28.10%) Loss: 2.075513 LR: 0.00004549 [04:32:40] Epoch: 1 Batch: 10786/38378 (28.10%) Loss: 2.146026 LR: 0.00004549 [04:32:42] Epoch: 1 Batch: 10787/38378 (28.11%) Loss: 1.727004 LR: 0.00004549 [04:32:43] Epoch: 1 Batch: 10788/38378 (28.11%) Loss: 2.198904 LR: 0.00004549 [04:32:45] Epoch: 1 Batch: 10789/38378 (28.11%) Loss: 1.713833 LR: 0.00004549 [04:32:47] Epoch: 1 Batch: 10790/38378 (28.12%) Loss: 1.786474 LR: 0.00004549 [04:32:49] Epoch: 1 Batch: 10791/38378 (28.12%) Loss: 1.991996 LR: 0.00004549 [04:32:51] Epoch: 1 Batch: 10792/38378 (28.12%) Loss: 2.172888 LR: 0.00004549 [04:32:52] Epoch: 1 Batch: 10793/38378 (28.12%) Loss: 1.688208 LR: 0.00004549 [04:32:54] Epoch: 1 Batch: 10794/38378 (28.13%) Loss: 1.754474 LR: 0.00004549 [04:32:56] Epoch: 1 Batch: 10795/38378 (28.13%) Loss: 1.826911 LR: 0.00004549 [04:32:58] Epoch: 1 Batch: 10796/38378 (28.13%) Loss: 2.129895 LR: 0.00004548 [04:33:00] Epoch: 1 Batch: 10797/38378 (28.13%) Loss: 1.865245 LR: 0.00004548 [04:33:02] Epoch: 1 Batch: 10798/38378 (28.14%) Loss: 1.963141 LR: 0.00004548 [04:33:03] Epoch: 1 Batch: 10799/38378 (28.14%) Loss: 2.187189 LR: 0.00004548 [04:33:10] >> Cleaned up old temp checkpoint: epoch1_step9000 [04:33:10] >> Temp checkpoint saved: epoch1_step10800, size: 0.1702 GB [04:33:10] Epoch: 1 Batch: 10800/38378 (28.14%) Loss: 1.756781 LR: 0.00004548 [04:33:11] Epoch: 1 Batch: 10801/38378 (28.14%) Loss: 1.923856 LR: 0.00004548 [04:33:13] Epoch: 1 Batch: 10802/38378 (28.15%) Loss: 1.865147 LR: 0.00004548 [04:33:15] Epoch: 1 Batch: 10803/38378 (28.15%) Loss: 1.989577 LR: 0.00004547 [04:33:17] Epoch: 1 Batch: 10804/38378 (28.15%) Loss: 2.003279 LR: 0.00004547 [04:33:19] Epoch: 1 Batch: 10805/38378 (28.15%) Loss: 1.942789 LR: 0.00004547 [04:33:20] Epoch: 1 Batch: 10806/38378 (28.16%) Loss: 1.778238 LR: 0.00004547 [04:33:22] Epoch: 1 Batch: 10807/38378 (28.16%) Loss: 1.959439 LR: 0.00004547 [04:33:24] Epoch: 1 Batch: 10808/38378 (28.16%) Loss: 1.840816 LR: 0.00004547 [04:33:26] Epoch: 1 Batch: 10809/38378 (28.16%) Loss: 1.805431 LR: 0.00004547 [04:33:28] Epoch: 1 Batch: 10810/38378 (28.17%) Loss: 1.756344 LR: 0.00004546 [04:33:29] Epoch: 1 Batch: 10811/38378 (28.17%) Loss: 1.784342 LR: 0.00004546 [04:33:31] Epoch: 1 Batch: 10812/38378 (28.17%) Loss: 2.216217 LR: 0.00004546 [04:33:33] Epoch: 1 Batch: 10813/38378 (28.17%) Loss: 1.760028 LR: 0.00004546 [04:33:35] Epoch: 1 Batch: 10814/38378 (28.18%) Loss: 2.153324 LR: 0.00004546 [04:33:37] Epoch: 1 Batch: 10815/38378 (28.18%) Loss: 1.961676 LR: 0.00004546 [04:33:39] Epoch: 1 Batch: 10816/38378 (28.18%) Loss: 2.287419 LR: 0.00004546 [04:33:40] Epoch: 1 Batch: 10817/38378 (28.19%) Loss: 2.155663 LR: 0.00004545 [04:33:42] Epoch: 1 Batch: 10818/38378 (28.19%) Loss: 2.122951 LR: 0.00004545 
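A note on the checkpoint lines above: every 100 batches the script saves a small temp checkpoint and deletes a much older one, e.g. epoch1_step10800 is saved while epoch1_step9000 is removed, and the 0.1702 GB size is far smaller than the 8B base model, consistent with an adapter-only (LoRA) save. The deleted step advances by 200 per save while new saves advance by 100, so the exact retention rule is not recoverable from the log; the sketch below is the simplest fixed-window rotation consistent with this output. The function name, the RETAIN constant, and the PEFT save_pretrained call are assumptions, not the script's actual code.

```python
import os
import shutil
from collections import deque

RETAIN = 18          # assumed window size; the log's true retention policy is unclear
_temp_paths = deque()

def save_temp_checkpoint(model, output_dir, epoch, step):
    """Hypothetical rolling temp-checkpoint save matching the log's '>>' lines."""
    path = os.path.join(output_dir, f"epoch{epoch}_step{step}")
    model.save_pretrained(path)  # a PeftModel writes only the adapter weights, not the base model
    _temp_paths.append(path)
    if len(_temp_paths) > RETAIN:  # rotate: drop the oldest temp checkpoint
        old = _temp_paths.popleft()
        shutil.rmtree(old)
        print(f">> Cleaned up old temp checkpoint: {os.path.basename(old)}")
    size_gb = sum(os.path.getsize(os.path.join(root, f))
                  for root, _, files in os.walk(path) for f in files) / 1024**3
    print(f">> Temp checkpoint saved: {os.path.basename(path)}, size: {size_gb:.4f} GB")
```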
[04:33:44] Epoch: 1 Batch: 10819/38378 (28.19%) Loss: 2.102880 LR: 0.00004545 [04:33:46] Epoch: 1 Batch: 10820/38378 (28.19%) Loss: 1.798635 LR: 0.00004545 [04:33:48] Epoch: 1 Batch: 10821/38378 (28.20%) Loss: 2.116366 LR: 0.00004545 [04:33:50] Epoch: 1 Batch: 10822/38378 (28.20%) Loss: 2.323159 LR: 0.00004545 [04:33:51] Epoch: 1 Batch: 10823/38378 (28.20%) Loss: 2.093443 LR: 0.00004545 [04:33:53] Epoch: 1 Batch: 10824/38378 (28.20%) Loss: 2.100391 LR: 0.00004545 [04:33:55] Epoch: 1 Batch: 10825/38378 (28.21%) Loss: 2.017188 LR: 0.00004545 [04:33:57] Epoch: 1 Batch: 10826/38378 (28.21%) Loss: 2.121881 LR: 0.00004545 [04:33:59] Epoch: 1 Batch: 10827/38378 (28.21%) Loss: 1.818769 LR: 0.00004545 [04:34:01] Epoch: 1 Batch: 10828/38378 (28.21%) Loss: 1.875507 LR: 0.00004545 [04:34:02] Epoch: 1 Batch: 10829/38378 (28.22%) Loss: 1.754229 LR: 0.00004545 [04:34:04] Epoch: 1 Batch: 10830/38378 (28.22%) Loss: 2.296078 LR: 0.00004545 [04:34:06] Epoch: 1 Batch: 10831/38378 (28.22%) Loss: 1.956523 LR: 0.00004544 [04:34:08] Epoch: 1 Batch: 10832/38378 (28.22%) Loss: 2.147970 LR: 0.00004544 [04:34:10] Epoch: 1 Batch: 10833/38378 (28.23%) Loss: 1.941650 LR: 0.00004544 [04:34:11] Epoch: 1 Batch: 10834/38378 (28.23%) Loss: 1.883108 LR: 0.00004544 [04:34:13] Epoch: 1 Batch: 10835/38378 (28.23%) Loss: 2.172405 LR: 0.00004544 [04:34:15] Epoch: 1 Batch: 10836/38378 (28.23%) Loss: 2.335866 LR: 0.00004544 [04:34:17] Epoch: 1 Batch: 10837/38378 (28.24%) Loss: 2.125579 LR: 0.00004544 [04:34:19] Epoch: 1 Batch: 10838/38378 (28.24%) Loss: 1.815593 LR: 0.00004543 [04:34:21] Epoch: 1 Batch: 10839/38378 (28.24%) Loss: 2.196525 LR: 0.00004543 [04:34:22] Epoch: 1 Batch: 10840/38378 (28.25%) Loss: 1.817860 LR: 0.00004543 [04:34:24] Epoch: 1 Batch: 10841/38378 (28.25%) Loss: 1.911860 LR: 0.00004543 [04:34:26] Epoch: 1 Batch: 10842/38378 (28.25%) Loss: 2.195315 LR: 0.00004543 [04:34:28] Epoch: 1 Batch: 10843/38378 (28.25%) Loss: 2.131778 LR: 0.00004543 [04:34:30] Epoch: 1 Batch: 10844/38378 (28.26%) Loss: 1.978851 LR: 0.00004543 [04:34:31] Epoch: 1 Batch: 10845/38378 (28.26%) Loss: 2.004491 LR: 0.00004542 [04:34:33] Epoch: 1 Batch: 10846/38378 (28.26%) Loss: 1.845895 LR: 0.00004542 [04:34:35] Epoch: 1 Batch: 10847/38378 (28.26%) Loss: 1.933135 LR: 0.00004542 [04:34:37] Epoch: 1 Batch: 10848/38378 (28.27%) Loss: 1.861240 LR: 0.00004542 [04:34:39] Epoch: 1 Batch: 10849/38378 (28.27%) Loss: 1.980507 LR: 0.00004542 [04:34:41] Epoch: 1 Batch: 10850/38378 (28.27%) Loss: 1.769656 LR: 0.00004542 [04:34:43] Epoch: 1 Batch: 10851/38378 (28.27%) Loss: 1.794040 LR: 0.00004542 [04:34:45] Epoch: 1 Batch: 10852/38378 (28.28%) Loss: 2.409227 LR: 0.00004541 [04:34:46] Epoch: 1 Batch: 10853/38378 (28.28%) Loss: 2.124468 LR: 0.00004541 [04:34:48] Epoch: 1 Batch: 10854/38378 (28.28%) Loss: 2.318832 LR: 0.00004541 [04:34:50] Epoch: 1 Batch: 10855/38378 (28.28%) Loss: 2.304666 LR: 0.00004541 [04:34:52] Epoch: 1 Batch: 10856/38378 (28.29%) Loss: 1.846732 LR: 0.00004541 [04:34:53] Epoch: 1 Batch: 10857/38378 (28.29%) Loss: 2.003015 LR: 0.00004541 [04:34:55] Epoch: 1 Batch: 10858/38378 (28.29%) Loss: 1.722425 LR: 0.00004541 [04:34:57] Epoch: 1 Batch: 10859/38378 (28.29%) Loss: 2.280276 LR: 0.00004541 [04:34:59] Epoch: 1 Batch: 10860/38378 (28.30%) Loss: 2.046573 LR: 0.00004541 [04:35:01] Epoch: 1 Batch: 10861/38378 (28.30%) Loss: 2.162262 LR: 0.00004541 [04:35:03] Epoch: 1 Batch: 10862/38378 (28.30%) Loss: 2.106942 LR: 0.00004541 [04:35:04] Epoch: 1 Batch: 10863/38378 (28.31%) Loss: 2.162852 LR: 0.00004541 [04:35:06] Epoch: 1 Batch: 10864/38378 
(28.31%) Loss: 2.099532 LR: 0.00004541 [04:35:08] Epoch: 1 Batch: 10865/38378 (28.31%) Loss: 1.947149 LR: 0.00004541 [04:35:10] Epoch: 1 Batch: 10866/38378 (28.31%) Loss: 1.878577 LR: 0.00004540 [04:35:12] Epoch: 1 Batch: 10867/38378 (28.32%) Loss: 2.101059 LR: 0.00004540 [04:35:13] Epoch: 1 Batch: 10868/38378 (28.32%) Loss: 1.897439 LR: 0.00004540 [04:35:15] Epoch: 1 Batch: 10869/38378 (28.32%) Loss: 1.706188 LR: 0.00004540 [04:35:17] Epoch: 1 Batch: 10870/38378 (28.32%) Loss: 1.748418 LR: 0.00004540 [04:35:19] Epoch: 1 Batch: 10871/38378 (28.33%) Loss: 2.040262 LR: 0.00004540 [04:35:21] Epoch: 1 Batch: 10872/38378 (28.33%) Loss: 1.853219 LR: 0.00004540 [04:35:23] Epoch: 1 Batch: 10873/38378 (28.33%) Loss: 2.384363 LR: 0.00004539 [04:35:24] Epoch: 1 Batch: 10874/38378 (28.33%) Loss: 2.122882 LR: 0.00004539 [04:35:26] Epoch: 1 Batch: 10875/38378 (28.34%) Loss: 1.691590 LR: 0.00004539 [04:35:28] Epoch: 1 Batch: 10876/38378 (28.34%) Loss: 1.977499 LR: 0.00004539 [04:35:30] Epoch: 1 Batch: 10877/38378 (28.34%) Loss: 1.732623 LR: 0.00004539 [04:35:32] Epoch: 1 Batch: 10878/38378 (28.34%) Loss: 2.236433 LR: 0.00004539 [04:35:34] Epoch: 1 Batch: 10879/38378 (28.35%) Loss: 2.100875 LR: 0.00004539 [04:35:35] Epoch: 1 Batch: 10880/38378 (28.35%) Loss: 2.134829 LR: 0.00004538 [04:35:37] Epoch: 1 Batch: 10881/38378 (28.35%) Loss: 2.012547 LR: 0.00004538 [04:35:39] Epoch: 1 Batch: 10882/38378 (28.35%) Loss: 1.676571 LR: 0.00004538 [04:35:41] Epoch: 1 Batch: 10883/38378 (28.36%) Loss: 1.842099 LR: 0.00004538 [04:35:43] Epoch: 1 Batch: 10884/38378 (28.36%) Loss: 1.839354 LR: 0.00004538 [04:35:44] Epoch: 1 Batch: 10885/38378 (28.36%) Loss: 2.282016 LR: 0.00004538 [04:35:46] Epoch: 1 Batch: 10886/38378 (28.37%) Loss: 2.009964 LR: 0.00004538 [04:35:48] Epoch: 1 Batch: 10887/38378 (28.37%) Loss: 2.060713 LR: 0.00004537 [04:35:50] Epoch: 1 Batch: 10888/38378 (28.37%) Loss: 1.939433 LR: 0.00004537 [04:35:51] Epoch: 1 Batch: 10889/38378 (28.37%) Loss: 1.878699 LR: 0.00004537 [04:35:53] Epoch: 1 Batch: 10890/38378 (28.38%) Loss: 2.111608 LR: 0.00004537 [04:35:55] Epoch: 1 Batch: 10891/38378 (28.38%) Loss: 1.902231 LR: 0.00004537 [04:35:57] Epoch: 1 Batch: 10892/38378 (28.38%) Loss: 2.133624 LR: 0.00004537 [04:35:59] Epoch: 1 Batch: 10893/38378 (28.38%) Loss: 1.953162 LR: 0.00004537 [04:36:01] Epoch: 1 Batch: 10894/38378 (28.39%) Loss: 2.285864 LR: 0.00004537 [04:36:02] Epoch: 1 Batch: 10895/38378 (28.39%) Loss: 1.857207 LR: 0.00004537 [04:36:04] Epoch: 1 Batch: 10896/38378 (28.39%) Loss: 1.854724 LR: 0.00004537 [04:36:06] Epoch: 1 Batch: 10897/38378 (28.39%) Loss: 1.836905 LR: 0.00004537 [04:36:08] Epoch: 1 Batch: 10898/38378 (28.40%) Loss: 2.043229 LR: 0.00004537 [04:36:10] Epoch: 1 Batch: 10899/38378 (28.40%) Loss: 2.133729 LR: 0.00004537 [04:36:16] >> Cleaned up old temp checkpoint: epoch1_step9200 [04:36:16] >> Temp checkpoint saved: epoch1_step10900, size: 0.1702 GB [04:36:16] Epoch: 1 Batch: 10900/38378 (28.40%) Loss: 2.077722 LR: 0.00004537 [04:36:18] Epoch: 1 Batch: 10901/38378 (28.40%) Loss: 1.903918 LR: 0.00004536 [04:36:19] Epoch: 1 Batch: 10902/38378 (28.41%) Loss: 2.009094 LR: 0.00004536 [04:36:21] Epoch: 1 Batch: 10903/38378 (28.41%) Loss: 2.234791 LR: 0.00004536 [04:36:23] Epoch: 1 Batch: 10904/38378 (28.41%) Loss: 1.510425 LR: 0.00004536 [04:36:25] Epoch: 1 Batch: 10905/38378 (28.41%) Loss: 2.076042 LR: 0.00004536 [04:36:27] Epoch: 1 Batch: 10906/38378 (28.42%) Loss: 1.909536 LR: 0.00004536 [04:36:28] Epoch: 1 Batch: 10907/38378 (28.42%) Loss: 2.013001 LR: 0.00004536 [04:36:30] Epoch: 1 Batch: 
10908/38378 (28.42%) Loss: 2.003044 LR: 0.00004535 [04:36:32] Epoch: 1 Batch: 10909/38378 (28.43%) Loss: 1.795929 LR: 0.00004535 [04:36:34] Epoch: 1 Batch: 10910/38378 (28.43%) Loss: 2.050716 LR: 0.00004535 [04:36:36] Epoch: 1 Batch: 10911/38378 (28.43%) Loss: 2.002397 LR: 0.00004535 [04:36:37] Epoch: 1 Batch: 10912/38378 (28.43%) Loss: 1.741102 LR: 0.00004535 [04:36:39] Epoch: 1 Batch: 10913/38378 (28.44%) Loss: 2.024601 LR: 0.00004535 [04:36:41] Epoch: 1 Batch: 10914/38378 (28.44%) Loss: 1.878193 LR: 0.00004535 [04:36:43] Epoch: 1 Batch: 10915/38378 (28.44%) Loss: 2.005230 LR: 0.00004534 [04:36:45] Epoch: 1 Batch: 10916/38378 (28.44%) Loss: 2.155629 LR: 0.00004534 [04:36:47] Epoch: 1 Batch: 10917/38378 (28.45%) Loss: 1.855796 LR: 0.00004534 [04:36:48] Epoch: 1 Batch: 10918/38378 (28.45%) Loss: 2.123617 LR: 0.00004534 [04:36:50] Epoch: 1 Batch: 10919/38378 (28.45%) Loss: 2.438756 LR: 0.00004534 [04:36:52] Epoch: 1 Batch: 10920/38378 (28.45%) Loss: 2.094358 LR: 0.00004534 [04:36:54] Epoch: 1 Batch: 10921/38378 (28.46%) Loss: 1.892924 LR: 0.00004534 [04:36:56] Epoch: 1 Batch: 10922/38378 (28.46%) Loss: 2.084634 LR: 0.00004533 [04:36:58] Epoch: 1 Batch: 10923/38378 (28.46%) Loss: 1.908073 LR: 0.00004533 [04:36:59] Epoch: 1 Batch: 10924/38378 (28.46%) Loss: 1.944661 LR: 0.00004533 [04:37:01] Epoch: 1 Batch: 10925/38378 (28.47%) Loss: 2.184660 LR: 0.00004533 [04:37:03] Epoch: 1 Batch: 10926/38378 (28.47%) Loss: 1.888356 LR: 0.00004533 [04:37:05] Epoch: 1 Batch: 10927/38378 (28.47%) Loss: 2.047700 LR: 0.00004533 [04:37:07] Epoch: 1 Batch: 10928/38378 (28.47%) Loss: 2.001870 LR: 0.00004533 [04:37:08] Epoch: 1 Batch: 10929/38378 (28.48%) Loss: 1.981171 LR: 0.00004533 [04:37:10] Epoch: 1 Batch: 10930/38378 (28.48%) Loss: 2.128405 LR: 0.00004533 [04:37:12] Epoch: 1 Batch: 10931/38378 (28.48%) Loss: 1.780712 LR: 0.00004533 [04:37:14] Epoch: 1 Batch: 10932/38378 (28.49%) Loss: 1.897723 LR: 0.00004533 [04:37:16] Epoch: 1 Batch: 10933/38378 (28.49%) Loss: 1.909742 LR: 0.00004533 [04:37:17] Epoch: 1 Batch: 10934/38378 (28.49%) Loss: 1.942873 LR: 0.00004533 [04:37:19] Epoch: 1 Batch: 10935/38378 (28.49%) Loss: 2.032419 LR: 0.00004533 [04:37:21] Epoch: 1 Batch: 10936/38378 (28.50%) Loss: 1.995891 LR: 0.00004532 [04:37:23] Epoch: 1 Batch: 10937/38378 (28.50%) Loss: 1.983553 LR: 0.00004532 [04:37:24] Epoch: 1 Batch: 10938/38378 (28.50%) Loss: 2.240669 LR: 0.00004532 [04:37:26] Epoch: 1 Batch: 10939/38378 (28.50%) Loss: 1.977876 LR: 0.00004532 [04:37:28] Epoch: 1 Batch: 10940/38378 (28.51%) Loss: 2.056421 LR: 0.00004532 [04:37:30] Epoch: 1 Batch: 10941/38378 (28.51%) Loss: 1.969945 LR: 0.00004532 [04:37:32] Epoch: 1 Batch: 10942/38378 (28.51%) Loss: 1.953721 LR: 0.00004532 [04:37:33] Epoch: 1 Batch: 10943/38378 (28.51%) Loss: 2.169201 LR: 0.00004531 [04:37:35] Epoch: 1 Batch: 10944/38378 (28.52%) Loss: 1.915119 LR: 0.00004531 [04:37:37] Epoch: 1 Batch: 10945/38378 (28.52%) Loss: 1.703659 LR: 0.00004531 [04:37:39] Epoch: 1 Batch: 10946/38378 (28.52%) Loss: 2.100257 LR: 0.00004531 [04:37:41] Epoch: 1 Batch: 10947/38378 (28.52%) Loss: 2.244923 LR: 0.00004531 [04:37:42] Epoch: 1 Batch: 10948/38378 (28.53%) Loss: 2.000950 LR: 0.00004531 [04:37:44] Epoch: 1 Batch: 10949/38378 (28.53%) Loss: 2.139436 LR: 0.00004531 [04:37:46] Epoch: 1 Batch: 10950/38378 (28.53%) Loss: 2.141486 LR: 0.00004530 [04:37:48] Epoch: 1 Batch: 10951/38378 (28.53%) Loss: 1.821760 LR: 0.00004530 [04:37:50] Epoch: 1 Batch: 10952/38378 (28.54%) Loss: 1.862558 LR: 0.00004530 [04:37:51] Epoch: 1 Batch: 10953/38378 (28.54%) Loss: 2.058548 LR: 
0.00004530 [04:37:53] Epoch: 1 Batch: 10954/38378 (28.54%) Loss: 1.745513 LR: 0.00004530 [04:37:55] Epoch: 1 Batch: 10955/38378 (28.54%) Loss: 2.288185 LR: 0.00004530 [04:37:57] Epoch: 1 Batch: 10956/38378 (28.55%) Loss: 1.914178 LR: 0.00004530 [04:37:58] Epoch: 1 Batch: 10957/38378 (28.55%) Loss: 1.714108 LR: 0.00004529 [04:38:00] Epoch: 1 Batch: 10958/38378 (28.55%) Loss: 1.711036 LR: 0.00004529 [04:38:02] Epoch: 1 Batch: 10959/38378 (28.56%) Loss: 1.919610 LR: 0.00004529 [04:38:04] Epoch: 1 Batch: 10960/38378 (28.56%) Loss: 1.965809 LR: 0.00004529 [04:38:06] Epoch: 1 Batch: 10961/38378 (28.56%) Loss: 1.888347 LR: 0.00004529 [04:38:08] Epoch: 1 Batch: 10962/38378 (28.56%) Loss: 2.153063 LR: 0.00004529 [04:38:09] Epoch: 1 Batch: 10963/38378 (28.57%) Loss: 1.947273 LR: 0.00004529 [04:38:11] Epoch: 1 Batch: 10964/38378 (28.57%) Loss: 2.103535 LR: 0.00004529 [04:38:13] Epoch: 1 Batch: 10965/38378 (28.57%) Loss: 1.899281 LR: 0.00004529 [04:38:15] Epoch: 1 Batch: 10966/38378 (28.57%) Loss: 1.965386 LR: 0.00004529 [04:38:17] Epoch: 1 Batch: 10967/38378 (28.58%) Loss: 1.761065 LR: 0.00004529 [04:38:18] Epoch: 1 Batch: 10968/38378 (28.58%) Loss: 1.934237 LR: 0.00004529 [04:38:20] Epoch: 1 Batch: 10969/38378 (28.58%) Loss: 1.815348 LR: 0.00004529 [04:38:22] Epoch: 1 Batch: 10970/38378 (28.58%) Loss: 2.131972 LR: 0.00004529 [04:38:24] Epoch: 1 Batch: 10971/38378 (28.59%) Loss: 1.822037 LR: 0.00004528 [04:38:26] Epoch: 1 Batch: 10972/38378 (28.59%) Loss: 2.268066 LR: 0.00004528 [04:38:28] Epoch: 1 Batch: 10973/38378 (28.59%) Loss: 1.945579 LR: 0.00004528 [04:38:29] Epoch: 1 Batch: 10974/38378 (28.59%) Loss: 1.862416 LR: 0.00004528 [04:38:31] Epoch: 1 Batch: 10975/38378 (28.60%) Loss: 1.977346 LR: 0.00004528 [04:38:33] Epoch: 1 Batch: 10976/38378 (28.60%) Loss: 1.987091 LR: 0.00004528 [04:38:35] Epoch: 1 Batch: 10977/38378 (28.60%) Loss: 2.266568 LR: 0.00004528 [04:38:37] Epoch: 1 Batch: 10978/38378 (28.60%) Loss: 1.587825 LR: 0.00004527 [04:38:38] Epoch: 1 Batch: 10979/38378 (28.61%) Loss: 2.319386 LR: 0.00004527 [04:38:40] Epoch: 1 Batch: 10980/38378 (28.61%) Loss: 2.139579 LR: 0.00004527 [04:38:42] Epoch: 1 Batch: 10981/38378 (28.61%) Loss: 1.939067 LR: 0.00004527 [04:38:44] Epoch: 1 Batch: 10982/38378 (28.62%) Loss: 1.821788 LR: 0.00004527 [04:38:46] Epoch: 1 Batch: 10983/38378 (28.62%) Loss: 2.111634 LR: 0.00004527 [04:38:47] Epoch: 1 Batch: 10984/38378 (28.62%) Loss: 2.301510 LR: 0.00004527 [04:38:49] Epoch: 1 Batch: 10985/38378 (28.62%) Loss: 2.011672 LR: 0.00004526 [04:38:51] Epoch: 1 Batch: 10986/38378 (28.63%) Loss: 1.878857 LR: 0.00004526 [04:38:53] Epoch: 1 Batch: 10987/38378 (28.63%) Loss: 1.927991 LR: 0.00004526 [04:38:55] Epoch: 1 Batch: 10988/38378 (28.63%) Loss: 2.098392 LR: 0.00004526 [04:38:57] Epoch: 1 Batch: 10989/38378 (28.63%) Loss: 2.184886 LR: 0.00004526 [04:38:58] Epoch: 1 Batch: 10990/38378 (28.64%) Loss: 1.939067 LR: 0.00004526 [04:39:00] Epoch: 1 Batch: 10991/38378 (28.64%) Loss: 1.971691 LR: 0.00004526 [04:39:02] Epoch: 1 Batch: 10992/38378 (28.64%) Loss: 2.121071 LR: 0.00004525 [04:39:04] Epoch: 1 Batch: 10993/38378 (28.64%) Loss: 1.732317 LR: 0.00004525 [04:39:06] Epoch: 1 Batch: 10994/38378 (28.65%) Loss: 2.232495 LR: 0.00004525 [04:39:07] Epoch: 1 Batch: 10995/38378 (28.65%) Loss: 2.074658 LR: 0.00004525 [04:39:09] Epoch: 1 Batch: 10996/38378 (28.65%) Loss: 2.039598 LR: 0.00004525 [04:39:11] Epoch: 1 Batch: 10997/38378 (28.65%) Loss: 1.923406 LR: 0.00004525 [04:39:13] Epoch: 1 Batch: 10998/38378 (28.66%) Loss: 2.018338 LR: 0.00004525 [04:39:15] Epoch: 1 Batch: 
10999/38378 (28.66%) Loss: 2.152633 LR: 0.00004525
[04:39:17] >> Evaluating batch 0 [04:39:18] >> Evaluating batch 1 [04:39:19] >> Evaluating batch 2 [04:39:20] >> Evaluating batch 3 [04:39:21] >> Evaluating batch 4 [04:39:22] >> Evaluating batch 5 [04:39:23] >> Evaluating batch 6 [04:39:24] >> Evaluating batch 7 [04:39:25] >> Evaluating batch 8 [04:39:26] >> Evaluating batch 9 [04:39:27] >> Evaluating batch 10 [04:39:28] >> Evaluating batch 11 [04:39:29] >> Evaluating batch 12 [04:39:30] >> Evaluating batch 13 [04:39:31] >> Evaluating batch 14 [04:39:32] >> Evaluating batch 15 [04:39:33] >> Evaluating batch 16
[04:39:34] Epoch: 1 Step: 11000/38378 Evaluation:
[04:39:34] Avg Loss Since Last Eval: 0.0727 Val Loss: 2.1128 Validation loss delta: 2.1128 Perplexity: 8.2713 LR: 0.00004525
[04:39:38] >> Cleaned up old temp checkpoint: epoch1_step9400
[04:39:38] >> Temp checkpoint saved: epoch1_step11000, size: 0.1702 GB
[04:39:42] >> Checkpoint saved: epoch1_step11000, size: 0.1702 GB
[04:39:42] Epoch: 1 Batch: 11000/38378 (28.66%) Loss: 2.174906 LR: 0.00004525 [04:39:44] Epoch: 1 Batch: 11001/38378 (28.66%) Loss: 2.020166 LR: 0.00004525 [04:39:46] Epoch: 1 Batch: 11002/38378 (28.67%) Loss: 1.796986 LR: 0.00004525 [04:39:48] Epoch: 1 Batch: 11003/38378 (28.67%) Loss: 2.022302 LR: 0.00004525 [04:39:49] Epoch: 1 Batch: 11004/38378 (28.67%) Loss: 2.046053 LR: 0.00004525 [04:39:51] Epoch: 1 Batch: 11005/38378 (28.68%) Loss: 1.863793 LR: 0.00004525 [04:39:53] Epoch: 1 Batch: 11006/38378 (28.68%) Loss: 2.185234 LR: 0.00004524 [04:39:55] Epoch: 1 Batch: 11007/38378 (28.68%) Loss: 1.985107 LR: 0.00004524 [04:39:57] Epoch: 1 Batch: 11008/38378 (28.68%) Loss: 2.185098 LR: 0.00004524 [04:39:59] Epoch: 1 Batch: 11009/38378 (28.69%) Loss: 2.072021 LR: 0.00004524 [04:40:00] Epoch: 1 Batch: 11010/38378 (28.69%) Loss: 1.890948 LR: 0.00004524 [04:40:02] Epoch: 1 Batch: 11011/38378 (28.69%) Loss: 1.952632 LR: 0.00004524 [04:40:04] Epoch: 1 Batch: 11012/38378 (28.69%) Loss: 2.181680 LR: 0.00004524 [04:40:06] Epoch: 1 Batch: 11013/38378 (28.70%) Loss: 1.955177 LR: 0.00004523 [04:40:08] Epoch: 1 Batch: 11014/38378 (28.70%) Loss: 2.041339 LR: 0.00004523 [04:40:10] Epoch: 1 Batch: 11015/38378 (28.70%) Loss: 1.966967 LR: 0.00004523 [04:40:12] Epoch: 1 Batch: 11016/38378 (28.70%) Loss: 1.993811 LR: 0.00004523 [04:40:14] Epoch: 1 Batch: 11017/38378 (28.71%) Loss: 2.300664 LR: 0.00004523 [04:40:15] Epoch: 1 Batch: 11018/38378 (28.71%) Loss: 1.995612 LR: 0.00004523 [04:40:17] Epoch: 1 Batch: 11019/38378 (28.71%) Loss: 2.079785 LR: 0.00004523 [04:40:19] Epoch: 1 Batch: 11020/38378 (28.71%) Loss: 1.629744 LR: 0.00004522 [04:40:21] Epoch: 1 Batch: 11021/38378 (28.72%) Loss: 2.046931 LR: 0.00004522 [04:40:23] Epoch: 1 Batch: 11022/38378 (28.72%) Loss: 2.231313 LR: 0.00004522 [04:40:25] Epoch: 1 Batch: 11023/38378 (28.72%) Loss: 2.108710 LR: 0.00004522 [04:40:27] Epoch: 1 Batch: 11024/38378 (28.72%) Loss: 2.128283 LR: 0.00004522 [04:40:28] Epoch: 1 Batch: 11025/38378 (28.73%) Loss: 1.959584 LR: 0.00004522 [04:40:30] Epoch: 1 Batch: 11026/38378 (28.73%) Loss: 2.110540 LR: 0.00004522 [04:40:32] Epoch: 1 Batch: 11027/38378 (28.73%) Loss: 1.972137 LR: 0.00004521 [04:40:34] Epoch: 1 Batch: 11028/38378 (28.74%) Loss: 1.950256 LR: 0.00004521 [04:40:36] Epoch: 1 Batch: 11029/38378 (28.74%) Loss: 1.838900 LR: 0.00004521 [04:40:38] Epoch: 1 Batch: 11030/38378 (28.74%) Loss: 1.954544 LR: 0.00004521 [04:40:39] Epoch: 1 Batch: 11031/38378 (28.74%) Loss: 2.209736 LR: 0.00004521 [04:40:41] Epoch: 1 Batch: 11032/38378 (28.75%) Loss: 1.724927 LR: 0.00004521
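In the step-11000 evaluation above, the Perplexity field is the exponential of Val Loss (exp(2.1128) ≈ 8.2713), and Validation loss delta is the change in validation loss since the previous evaluation; at the first logged eval there is no predecessor, which is why the delta equals the loss itself. A quick check using numbers taken from this log:

```python
import math

# Perplexity as logged is exp(mean validation loss).
print(math.exp(2.1128))  # ~8.2713, matching "Perplexity: 8.2713" at step 11000
print(math.exp(2.1085))  # ~8.236, matching the step-11500 eval's "Perplexity: 8.2363"

# "Validation loss delta" is the difference between consecutive eval losses;
# the log's -0.0042 at step 11500 is computed from the unrounded losses.
print(2.1085 - 2.1128)   # -0.0043 from the rounded values shown here
```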
[04:40:43] Epoch: 1 Batch: 11033/38378 (28.75%) Loss: 2.182104 LR: 0.00004521 [04:40:45] Epoch: 1 Batch: 11034/38378 (28.75%) Loss: 2.063014 LR: 0.00004521 [04:40:47] Epoch: 1 Batch: 11035/38378 (28.75%) Loss: 1.578407 LR: 0.00004521 [04:40:48] Epoch: 1 Batch: 11036/38378 (28.76%) Loss: 1.847591 LR: 0.00004521 [04:40:50] Epoch: 1 Batch: 11037/38378 (28.76%) Loss: 1.981021 LR: 0.00004521 [04:40:52] Epoch: 1 Batch: 11038/38378 (28.76%) Loss: 2.217388 LR: 0.00004521 [04:40:54] Epoch: 1 Batch: 11039/38378 (28.76%) Loss: 2.017372 LR: 0.00004521 [04:40:56] Epoch: 1 Batch: 11040/38378 (28.77%) Loss: 1.930387 LR: 0.00004521 [04:40:57] Epoch: 1 Batch: 11041/38378 (28.77%) Loss: 1.980851 LR: 0.00004520 [04:40:59] Epoch: 1 Batch: 11042/38378 (28.77%) Loss: 1.902871 LR: 0.00004520 [04:41:01] Epoch: 1 Batch: 11043/38378 (28.77%) Loss: 2.006164 LR: 0.00004520 [04:41:03] Epoch: 1 Batch: 11044/38378 (28.78%) Loss: 2.028517 LR: 0.00004520 [04:41:04] Epoch: 1 Batch: 11045/38378 (28.78%) Loss: 1.659364 LR: 0.00004520 [04:41:06] Epoch: 1 Batch: 11046/38378 (28.78%) Loss: 2.090887 LR: 0.00004520 [04:41:08] Epoch: 1 Batch: 11047/38378 (28.78%) Loss: 2.040465 LR: 0.00004520 [04:41:10] Epoch: 1 Batch: 11048/38378 (28.79%) Loss: 1.908064 LR: 0.00004519 [04:41:12] Epoch: 1 Batch: 11049/38378 (28.79%) Loss: 2.133365 LR: 0.00004519 [04:41:14] Epoch: 1 Batch: 11050/38378 (28.79%) Loss: 2.195353 LR: 0.00004519 [04:41:15] Epoch: 1 Batch: 11051/38378 (28.80%) Loss: 1.956774 LR: 0.00004519 [04:41:17] Epoch: 1 Batch: 11052/38378 (28.80%) Loss: 1.913120 LR: 0.00004519 [04:41:19] Epoch: 1 Batch: 11053/38378 (28.80%) Loss: 2.042956 LR: 0.00004519 [04:41:21] Epoch: 1 Batch: 11054/38378 (28.80%) Loss: 1.511101 LR: 0.00004519 [04:41:23] Epoch: 1 Batch: 11055/38378 (28.81%) Loss: 2.141649 LR: 0.00004518 [04:41:24] Epoch: 1 Batch: 11056/38378 (28.81%) Loss: 2.024657 LR: 0.00004518 [04:41:26] Epoch: 1 Batch: 11057/38378 (28.81%) Loss: 2.207556 LR: 0.00004518 [04:41:28] Epoch: 1 Batch: 11058/38378 (28.81%) Loss: 1.919351 LR: 0.00004518 [04:41:30] Epoch: 1 Batch: 11059/38378 (28.82%) Loss: 2.158364 LR: 0.00004518 [04:41:32] Epoch: 1 Batch: 11060/38378 (28.82%) Loss: 1.984481 LR: 0.00004518 [04:41:33] Epoch: 1 Batch: 11061/38378 (28.82%) Loss: 1.928614 LR: 0.00004518 [04:41:35] Epoch: 1 Batch: 11062/38378 (28.82%) Loss: 1.963573 LR: 0.00004517 [04:41:37] Epoch: 1 Batch: 11063/38378 (28.83%) Loss: 1.802738 LR: 0.00004517 [04:41:39] Epoch: 1 Batch: 11064/38378 (28.83%) Loss: 2.003264 LR: 0.00004517 [04:41:41] Epoch: 1 Batch: 11065/38378 (28.83%) Loss: 2.335901 LR: 0.00004517 [04:41:43] Epoch: 1 Batch: 11066/38378 (28.83%) Loss: 1.908911 LR: 0.00004517 [04:41:44] Epoch: 1 Batch: 11067/38378 (28.84%) Loss: 2.200695 LR: 0.00004517 [04:41:46] Epoch: 1 Batch: 11068/38378 (28.84%) Loss: 2.082600 LR: 0.00004517 [04:41:48] Epoch: 1 Batch: 11069/38378 (28.84%) Loss: 2.051932 LR: 0.00004516 [04:41:50] Epoch: 1 Batch: 11070/38378 (28.84%) Loss: 1.953924 LR: 0.00004516 [04:41:52] Epoch: 1 Batch: 11071/38378 (28.85%) Loss: 2.266454 LR: 0.00004516 [04:41:54] Epoch: 1 Batch: 11072/38378 (28.85%) Loss: 1.810168 LR: 0.00004516 [04:41:55] Epoch: 1 Batch: 11073/38378 (28.85%) Loss: 2.046915 LR: 0.00004516 [04:41:57] Epoch: 1 Batch: 11074/38378 (28.86%) Loss: 1.912727 LR: 0.00004516 [04:41:59] Epoch: 1 Batch: 11075/38378 (28.86%) Loss: 2.052575 LR: 0.00004516 [04:42:01] Epoch: 1 Batch: 11076/38378 (28.86%) Loss: 2.089960 LR: 0.00004516 [04:42:03] Epoch: 1 Batch: 11077/38378 (28.86%) Loss: 1.815332 LR: 0.00004516 [04:42:04] Epoch: 1 Batch: 
11078/38378 (28.87%) Loss: 1.997076 LR: 0.00004516 [04:42:06] Epoch: 1 Batch: 11079/38378 (28.87%) Loss: 1.997221 LR: 0.00004516 [04:42:08] Epoch: 1 Batch: 11080/38378 (28.87%) Loss: 1.911113 LR: 0.00004516 [04:42:10] Epoch: 1 Batch: 11081/38378 (28.87%) Loss: 2.054034 LR: 0.00004516 [04:42:12] Epoch: 1 Batch: 11082/38378 (28.88%) Loss: 2.028438 LR: 0.00004516 [04:42:14] Epoch: 1 Batch: 11083/38378 (28.88%) Loss: 2.122522 LR: 0.00004515 [04:42:15] Epoch: 1 Batch: 11084/38378 (28.88%) Loss: 1.983898 LR: 0.00004515 [04:42:17] Epoch: 1 Batch: 11085/38378 (28.88%) Loss: 2.216756 LR: 0.00004515 [04:42:19] Epoch: 1 Batch: 11086/38378 (28.89%) Loss: 1.846231 LR: 0.00004515 [04:42:21] Epoch: 1 Batch: 11087/38378 (28.89%) Loss: 1.899538 LR: 0.00004515 [04:42:22] Epoch: 1 Batch: 11088/38378 (28.89%) Loss: 2.205058 LR: 0.00004515 [04:42:24] Epoch: 1 Batch: 11089/38378 (28.89%) Loss: 2.245658 LR: 0.00004515 [04:42:26] Epoch: 1 Batch: 11090/38378 (28.90%) Loss: 2.278641 LR: 0.00004514 [04:42:28] Epoch: 1 Batch: 11091/38378 (28.90%) Loss: 2.028502 LR: 0.00004514 [04:42:30] Epoch: 1 Batch: 11092/38378 (28.90%) Loss: 2.019415 LR: 0.00004514 [04:42:31] Epoch: 1 Batch: 11093/38378 (28.90%) Loss: 1.887129 LR: 0.00004514 [04:42:33] Epoch: 1 Batch: 11094/38378 (28.91%) Loss: 1.930984 LR: 0.00004514 [04:42:35] Epoch: 1 Batch: 11095/38378 (28.91%) Loss: 1.784726 LR: 0.00004514 [04:42:37] Epoch: 1 Batch: 11096/38378 (28.91%) Loss: 1.849385 LR: 0.00004514 [04:42:39] Epoch: 1 Batch: 11097/38378 (28.92%) Loss: 2.168661 LR: 0.00004513 [04:42:41] Epoch: 1 Batch: 11098/38378 (28.92%) Loss: 1.943768 LR: 0.00004513 [04:42:42] Epoch: 1 Batch: 11099/38378 (28.92%) Loss: 2.014961 LR: 0.00004513 [04:42:48] >> Cleaned up old temp checkpoint: epoch1_step9600 [04:42:49] >> Temp checkpoint saved: epoch1_step11100, size: 0.1702 GB [04:42:49] Epoch: 1 Batch: 11100/38378 (28.92%) Loss: 1.889601 LR: 0.00004513 [04:42:50] Epoch: 1 Batch: 11101/38378 (28.93%) Loss: 1.871853 LR: 0.00004513 [04:42:52] Epoch: 1 Batch: 11102/38378 (28.93%) Loss: 2.177116 LR: 0.00004513 [04:42:54] Epoch: 1 Batch: 11103/38378 (28.93%) Loss: 1.715973 LR: 0.00004513 [04:42:56] Epoch: 1 Batch: 11104/38378 (28.93%) Loss: 1.624780 LR: 0.00004512 [04:42:57] Epoch: 1 Batch: 11105/38378 (28.94%) Loss: 2.040741 LR: 0.00004512 [04:42:59] Epoch: 1 Batch: 11106/38378 (28.94%) Loss: 2.187209 LR: 0.00004512 [04:43:01] Epoch: 1 Batch: 11107/38378 (28.94%) Loss: 2.112456 LR: 0.00004512 [04:43:03] Epoch: 1 Batch: 11108/38378 (28.94%) Loss: 1.866624 LR: 0.00004512 [04:43:05] Epoch: 1 Batch: 11109/38378 (28.95%) Loss: 1.803974 LR: 0.00004512 [04:43:07] Epoch: 1 Batch: 11110/38378 (28.95%) Loss: 1.802784 LR: 0.00004512 [04:43:08] Epoch: 1 Batch: 11111/38378 (28.95%) Loss: 1.993898 LR: 0.00004512 [04:43:10] Epoch: 1 Batch: 11112/38378 (28.95%) Loss: 2.265051 LR: 0.00004512 [04:43:12] Epoch: 1 Batch: 11113/38378 (28.96%) Loss: 2.019148 LR: 0.00004512 [04:43:14] Epoch: 1 Batch: 11114/38378 (28.96%) Loss: 2.021825 LR: 0.00004512 [04:43:16] Epoch: 1 Batch: 11115/38378 (28.96%) Loss: 1.921160 LR: 0.00004512 [04:43:18] Epoch: 1 Batch: 11116/38378 (28.96%) Loss: 2.206277 LR: 0.00004512 [04:43:19] Epoch: 1 Batch: 11117/38378 (28.97%) Loss: 2.324144 LR: 0.00004512 [04:43:21] Epoch: 1 Batch: 11118/38378 (28.97%) Loss: 2.221150 LR: 0.00004511 [04:43:23] Epoch: 1 Batch: 11119/38378 (28.97%) Loss: 1.846747 LR: 0.00004511 [04:43:25] Epoch: 1 Batch: 11120/38378 (28.97%) Loss: 1.792355 LR: 0.00004511 [04:43:27] Epoch: 1 Batch: 11121/38378 (28.98%) Loss: 1.718191 LR: 0.00004511 [04:43:29] 
Epoch: 1 Batch: 11122/38378 (28.98%) Loss: 1.906555 LR: 0.00004511 [04:43:30] Epoch: 1 Batch: 11123/38378 (28.98%) Loss: 1.825115 LR: 0.00004511 [04:43:32] Epoch: 1 Batch: 11124/38378 (28.99%) Loss: 2.122017 LR: 0.00004511 [04:43:34] Epoch: 1 Batch: 11125/38378 (28.99%) Loss: 2.112741 LR: 0.00004510 [04:43:36] Epoch: 1 Batch: 11126/38378 (28.99%) Loss: 1.676858 LR: 0.00004510 [04:43:38] Epoch: 1 Batch: 11127/38378 (28.99%) Loss: 2.024251 LR: 0.00004510 [04:43:39] Epoch: 1 Batch: 11128/38378 (29.00%) Loss: 2.223293 LR: 0.00004510 [04:43:41] Epoch: 1 Batch: 11129/38378 (29.00%) Loss: 2.193522 LR: 0.00004510 [04:43:43] Epoch: 1 Batch: 11130/38378 (29.00%) Loss: 1.845294 LR: 0.00004510 [04:43:45] Epoch: 1 Batch: 11131/38378 (29.00%) Loss: 1.752337 LR: 0.00004510 [04:43:47] Epoch: 1 Batch: 11132/38378 (29.01%) Loss: 2.028218 LR: 0.00004509 [04:43:48] Epoch: 1 Batch: 11133/38378 (29.01%) Loss: 1.714913 LR: 0.00004509 [04:43:50] Epoch: 1 Batch: 11134/38378 (29.01%) Loss: 1.854469 LR: 0.00004509 [04:43:52] Epoch: 1 Batch: 11135/38378 (29.01%) Loss: 1.862110 LR: 0.00004509 [04:43:54] Epoch: 1 Batch: 11136/38378 (29.02%) Loss: 2.056289 LR: 0.00004509 [04:43:56] Epoch: 1 Batch: 11137/38378 (29.02%) Loss: 1.943473 LR: 0.00004509 [04:43:57] Epoch: 1 Batch: 11138/38378 (29.02%) Loss: 2.170294 LR: 0.00004509 [04:43:59] Epoch: 1 Batch: 11139/38378 (29.02%) Loss: 1.942118 LR: 0.00004508 [04:44:01] Epoch: 1 Batch: 11140/38378 (29.03%) Loss: 1.904762 LR: 0.00004508 [04:44:03] Epoch: 1 Batch: 11141/38378 (29.03%) Loss: 2.270910 LR: 0.00004508 [04:44:05] Epoch: 1 Batch: 11142/38378 (29.03%) Loss: 2.025910 LR: 0.00004508 [04:44:06] Epoch: 1 Batch: 11143/38378 (29.03%) Loss: 1.842002 LR: 0.00004508 [04:44:08] Epoch: 1 Batch: 11144/38378 (29.04%) Loss: 1.997115 LR: 0.00004508 [04:44:10] Epoch: 1 Batch: 11145/38378 (29.04%) Loss: 1.967699 LR: 0.00004508 [04:44:12] Epoch: 1 Batch: 11146/38378 (29.04%) Loss: 2.074711 LR: 0.00004508 [04:44:14] Epoch: 1 Batch: 11147/38378 (29.05%) Loss: 1.994292 LR: 0.00004508 [04:44:15] Epoch: 1 Batch: 11148/38378 (29.05%) Loss: 1.801456 LR: 0.00004508 [04:44:17] Epoch: 1 Batch: 11149/38378 (29.05%) Loss: 1.870099 LR: 0.00004508 [04:44:19] Epoch: 1 Batch: 11150/38378 (29.05%) Loss: 1.847231 LR: 0.00004508 [04:44:21] Epoch: 1 Batch: 11151/38378 (29.06%) Loss: 1.765889 LR: 0.00004508 [04:44:23] Epoch: 1 Batch: 11152/38378 (29.06%) Loss: 1.852488 LR: 0.00004508 [04:44:24] Epoch: 1 Batch: 11153/38378 (29.06%) Loss: 1.981815 LR: 0.00004507 [04:44:26] Epoch: 1 Batch: 11154/38378 (29.06%) Loss: 2.486608 LR: 0.00004507 [04:44:28] Epoch: 1 Batch: 11155/38378 (29.07%) Loss: 2.138361 LR: 0.00004507 [04:44:29] Epoch: 1 Batch: 11156/38378 (29.07%) Loss: 1.915300 LR: 0.00004507 [04:44:31] Epoch: 1 Batch: 11157/38378 (29.07%) Loss: 2.211436 LR: 0.00004507 [04:44:33] Epoch: 1 Batch: 11158/38378 (29.07%) Loss: 1.658258 LR: 0.00004507 [04:44:35] Epoch: 1 Batch: 11159/38378 (29.08%) Loss: 2.285368 LR: 0.00004507 [04:44:37] Epoch: 1 Batch: 11160/38378 (29.08%) Loss: 1.998415 LR: 0.00004506 [04:44:39] Epoch: 1 Batch: 11161/38378 (29.08%) Loss: 2.257386 LR: 0.00004506 [04:44:40] Epoch: 1 Batch: 11162/38378 (29.08%) Loss: 1.828226 LR: 0.00004506 [04:44:42] Epoch: 1 Batch: 11163/38378 (29.09%) Loss: 1.802060 LR: 0.00004506 [04:44:44] Epoch: 1 Batch: 11164/38378 (29.09%) Loss: 1.780710 LR: 0.00004506 [04:44:46] Epoch: 1 Batch: 11165/38378 (29.09%) Loss: 1.958022 LR: 0.00004506 [04:44:48] Epoch: 1 Batch: 11166/38378 (29.09%) Loss: 1.899267 LR: 0.00004506 [04:44:49] Epoch: 1 Batch: 11167/38378 (29.10%) Loss: 
2.066028 LR: 0.00004505 [04:44:51] Epoch: 1 Batch: 11168/38378 (29.10%) Loss: 1.944581 LR: 0.00004505 [04:44:53] Epoch: 1 Batch: 11169/38378 (29.10%) Loss: 2.118492 LR: 0.00004505 [04:44:55] Epoch: 1 Batch: 11170/38378 (29.11%) Loss: 1.891813 LR: 0.00004505 [04:44:56] Epoch: 1 Batch: 11171/38378 (29.11%) Loss: 1.911475 LR: 0.00004505 [04:44:58] Epoch: 1 Batch: 11172/38378 (29.11%) Loss: 1.724477 LR: 0.00004505 [04:45:00] Epoch: 1 Batch: 11173/38378 (29.11%) Loss: 2.034009 LR: 0.00004505 [04:45:02] Epoch: 1 Batch: 11174/38378 (29.12%) Loss: 1.835110 LR: 0.00004504 [04:45:04] Epoch: 1 Batch: 11175/38378 (29.12%) Loss: 2.261705 LR: 0.00004504 [04:45:06] Epoch: 1 Batch: 11176/38378 (29.12%) Loss: 2.098816 LR: 0.00004504 [04:45:07] Epoch: 1 Batch: 11177/38378 (29.12%) Loss: 2.115546 LR: 0.00004504 [04:45:09] Epoch: 1 Batch: 11178/38378 (29.13%) Loss: 2.080701 LR: 0.00004504 [04:45:11] Epoch: 1 Batch: 11179/38378 (29.13%) Loss: 1.998222 LR: 0.00004504 [04:45:13] Epoch: 1 Batch: 11180/38378 (29.13%) Loss: 2.092068 LR: 0.00004504 [04:45:15] Epoch: 1 Batch: 11181/38378 (29.13%) Loss: 2.348287 LR: 0.00004503 [04:45:16] Epoch: 1 Batch: 11182/38378 (29.14%) Loss: 1.788761 LR: 0.00004503 [04:45:18] Epoch: 1 Batch: 11183/38378 (29.14%) Loss: 2.051554 LR: 0.00004503 [04:45:20] Epoch: 1 Batch: 11184/38378 (29.14%) Loss: 2.318583 LR: 0.00004503 [04:45:22] Epoch: 1 Batch: 11185/38378 (29.14%) Loss: 1.896739 LR: 0.00004503 [04:45:24] Epoch: 1 Batch: 11186/38378 (29.15%) Loss: 1.790765 LR: 0.00004503 [04:45:25] Epoch: 1 Batch: 11187/38378 (29.15%) Loss: 1.946387 LR: 0.00004503 [04:45:27] Epoch: 1 Batch: 11188/38378 (29.15%) Loss: 2.105281 LR: 0.00004503 [04:45:29] Epoch: 1 Batch: 11189/38378 (29.15%) Loss: 1.979953 LR: 0.00004503 [04:45:31] Epoch: 1 Batch: 11190/38378 (29.16%) Loss: 1.991165 LR: 0.00004503 [04:45:33] Epoch: 1 Batch: 11191/38378 (29.16%) Loss: 1.999294 LR: 0.00004503 [04:45:35] Epoch: 1 Batch: 11192/38378 (29.16%) Loss: 1.864440 LR: 0.00004503 [04:45:36] Epoch: 1 Batch: 11193/38378 (29.17%) Loss: 1.959905 LR: 0.00004503 [04:45:38] Epoch: 1 Batch: 11194/38378 (29.17%) Loss: 1.865288 LR: 0.00004503 [04:45:40] Epoch: 1 Batch: 11195/38378 (29.17%) Loss: 1.941193 LR: 0.00004502 [04:45:42] Epoch: 1 Batch: 11196/38378 (29.17%) Loss: 1.999669 LR: 0.00004502 [04:45:44] Epoch: 1 Batch: 11197/38378 (29.18%) Loss: 1.778390 LR: 0.00004502 [04:45:45] Epoch: 1 Batch: 11198/38378 (29.18%) Loss: 2.165558 LR: 0.00004502 [04:45:47] Epoch: 1 Batch: 11199/38378 (29.18%) Loss: 1.693284 LR: 0.00004502 [04:45:53] >> Cleaned up old temp checkpoint: epoch1_step9800 [04:45:53] >> Temp checkpoint saved: epoch1_step11200, size: 0.1702 GB [04:45:53] Epoch: 1 Batch: 11200/38378 (29.18%) Loss: 2.184376 LR: 0.00004502 [04:45:55] Epoch: 1 Batch: 11201/38378 (29.19%) Loss: 1.896840 LR: 0.00004502 [04:45:57] Epoch: 1 Batch: 11202/38378 (29.19%) Loss: 1.973236 LR: 0.00004501 [04:45:59] Epoch: 1 Batch: 11203/38378 (29.19%) Loss: 2.057072 LR: 0.00004501 [04:46:00] Epoch: 1 Batch: 11204/38378 (29.19%) Loss: 1.922331 LR: 0.00004501 [04:46:02] Epoch: 1 Batch: 11205/38378 (29.20%) Loss: 1.851018 LR: 0.00004501 [04:46:04] Epoch: 1 Batch: 11206/38378 (29.20%) Loss: 1.798468 LR: 0.00004501 [04:46:06] Epoch: 1 Batch: 11207/38378 (29.20%) Loss: 2.359436 LR: 0.00004501 [04:46:08] Epoch: 1 Batch: 11208/38378 (29.20%) Loss: 2.011555 LR: 0.00004501 [04:46:10] Epoch: 1 Batch: 11209/38378 (29.21%) Loss: 2.336864 LR: 0.00004500 [04:46:11] Epoch: 1 Batch: 11210/38378 (29.21%) Loss: 2.019753 LR: 0.00004500 [04:46:13] Epoch: 1 Batch: 11211/38378 
(29.21%) Loss: 2.083151 LR: 0.00004500 [04:46:15] Epoch: 1 Batch: 11212/38378 (29.21%) Loss: 1.768294 LR: 0.00004500 [04:46:17] Epoch: 1 Batch: 11213/38378 (29.22%) Loss: 1.896361 LR: 0.00004500 [04:46:19] Epoch: 1 Batch: 11214/38378 (29.22%) Loss: 2.306327 LR: 0.00004500 [04:46:20] Epoch: 1 Batch: 11215/38378 (29.22%) Loss: 2.006748 LR: 0.00004500 [04:46:22] Epoch: 1 Batch: 11216/38378 (29.23%) Loss: 2.065486 LR: 0.00004499 [04:46:24] Epoch: 1 Batch: 11217/38378 (29.23%) Loss: 1.934965 LR: 0.00004499 [04:46:26] Epoch: 1 Batch: 11218/38378 (29.23%) Loss: 1.792967 LR: 0.00004499 [04:46:28] Epoch: 1 Batch: 11219/38378 (29.23%) Loss: 2.078615 LR: 0.00004499 [04:46:30] Epoch: 1 Batch: 11220/38378 (29.24%) Loss: 1.994571 LR: 0.00004499 [04:46:31] Epoch: 1 Batch: 11221/38378 (29.24%) Loss: 1.872356 LR: 0.00004499 [04:46:33] Epoch: 1 Batch: 11222/38378 (29.24%) Loss: 1.868151 LR: 0.00004499 [04:46:35] Epoch: 1 Batch: 11223/38378 (29.24%) Loss: 2.507131 LR: 0.00004498 [04:46:37] Epoch: 1 Batch: 11224/38378 (29.25%) Loss: 2.095558 LR: 0.00004498 [04:46:39] Epoch: 1 Batch: 11225/38378 (29.25%) Loss: 1.954479 LR: 0.00004498 [04:46:41] Epoch: 1 Batch: 11226/38378 (29.25%) Loss: 1.957539 LR: 0.00004498 [04:46:43] Epoch: 1 Batch: 11227/38378 (29.25%) Loss: 1.769883 LR: 0.00004498 [04:46:44] Epoch: 1 Batch: 11228/38378 (29.26%) Loss: 1.920227 LR: 0.00004498 [04:46:46] Epoch: 1 Batch: 11229/38378 (29.26%) Loss: 2.016807 LR: 0.00004498 [04:46:48] Epoch: 1 Batch: 11230/38378 (29.26%) Loss: 1.786622 LR: 0.00004498 [04:46:50] Epoch: 1 Batch: 11231/38378 (29.26%) Loss: 1.827622 LR: 0.00004498 [04:46:52] Epoch: 1 Batch: 11232/38378 (29.27%) Loss: 1.747607 LR: 0.00004498 [04:46:53] Epoch: 1 Batch: 11233/38378 (29.27%) Loss: 2.046749 LR: 0.00004498 [04:46:55] Epoch: 1 Batch: 11234/38378 (29.27%) Loss: 1.767756 LR: 0.00004498 [04:46:57] Epoch: 1 Batch: 11235/38378 (29.27%) Loss: 2.045485 LR: 0.00004498 [04:46:59] Epoch: 1 Batch: 11236/38378 (29.28%) Loss: 2.109749 LR: 0.00004498 [04:47:01] Epoch: 1 Batch: 11237/38378 (29.28%) Loss: 1.923301 LR: 0.00004497 [04:47:03] Epoch: 1 Batch: 11238/38378 (29.28%) Loss: 2.134276 LR: 0.00004497 [04:47:04] Epoch: 1 Batch: 11239/38378 (29.29%) Loss: 2.031230 LR: 0.00004497 [04:47:06] Epoch: 1 Batch: 11240/38378 (29.29%) Loss: 1.922970 LR: 0.00004497 [04:47:08] Epoch: 1 Batch: 11241/38378 (29.29%) Loss: 2.135774 LR: 0.00004497 [04:47:10] Epoch: 1 Batch: 11242/38378 (29.29%) Loss: 2.059282 LR: 0.00004497 [04:47:11] Epoch: 1 Batch: 11243/38378 (29.30%) Loss: 2.026691 LR: 0.00004497 [04:47:13] Epoch: 1 Batch: 11244/38378 (29.30%) Loss: 2.236313 LR: 0.00004496 [04:47:15] Epoch: 1 Batch: 11245/38378 (29.30%) Loss: 2.018555 LR: 0.00004496 [04:47:17] Epoch: 1 Batch: 11246/38378 (29.30%) Loss: 1.765023 LR: 0.00004496 [04:47:19] Epoch: 1 Batch: 11247/38378 (29.31%) Loss: 1.853291 LR: 0.00004496 [04:47:20] Epoch: 1 Batch: 11248/38378 (29.31%) Loss: 2.057438 LR: 0.00004496 [04:47:22] Epoch: 1 Batch: 11249/38378 (29.31%) Loss: 1.961257 LR: 0.00004496 [04:47:24] Epoch: 1 Batch: 11250/38378 (29.31%) Loss: 1.924082 LR: 0.00004496 [04:47:26] Epoch: 1 Batch: 11251/38378 (29.32%) Loss: 1.575449 LR: 0.00004495 [04:47:28] Epoch: 1 Batch: 11252/38378 (29.32%) Loss: 1.871308 LR: 0.00004495 [04:47:29] Epoch: 1 Batch: 11253/38378 (29.32%) Loss: 1.978413 LR: 0.00004495 [04:47:31] Epoch: 1 Batch: 11254/38378 (29.32%) Loss: 2.158601 LR: 0.00004495 [04:47:33] Epoch: 1 Batch: 11255/38378 (29.33%) Loss: 1.903791 LR: 0.00004495 [04:47:35] Epoch: 1 Batch: 11256/38378 (29.33%) Loss: 2.077929 LR: 0.00004495 
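The LR column drifts down very slowly through this span, from 0.00004566 at batch 10640 to 0.00004495 here, a bit over 1e-9 per batch, and it holds constant for runs of several batches before ticking down, consistent with the schedule stepping once per optimizer update rather than per batch. That slow, nearly linear drift is what the middle of a cosine decay toward a floor looks like. The run's exact schedule parameters are not recoverable from the log alone, so the sketch below uses placeholder values and will not reproduce the logged numbers exactly:

```python
import math

def cosine_lr(step, max_lr=5e-5, floor_lr=1e-5, warmup_steps=400, total_steps=38378):
    """Cosine decay to a floor after linear warmup; all arguments are assumed values."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps  # linear warmup phase
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return floor_lr + 0.5 * (max_lr - floor_lr) * (1 + math.cos(math.pi * progress))

# Around 29% of the run the curve is nearly linear, like the drift in the LR column.
print(cosine_lr(11000), cosine_lr(11500))
```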
[04:47:37] Epoch: 1 Batch: 11257/38378 (29.33%) Loss: 1.737569 LR: 0.00004495 [04:47:38] Epoch: 1 Batch: 11258/38378 (29.33%) Loss: 1.871662 LR: 0.00004494 [04:47:40] Epoch: 1 Batch: 11259/38378 (29.34%) Loss: 1.933571 LR: 0.00004494 [04:47:42] Epoch: 1 Batch: 11260/38378 (29.34%) Loss: 2.085270 LR: 0.00004494 [04:47:44] Epoch: 1 Batch: 11261/38378 (29.34%) Loss: 1.977281 LR: 0.00004494 [04:47:46] Epoch: 1 Batch: 11262/38378 (29.34%) Loss: 1.975281 LR: 0.00004494 [04:47:47] Epoch: 1 Batch: 11263/38378 (29.35%) Loss: 1.742806 LR: 0.00004494 [04:47:49] Epoch: 1 Batch: 11264/38378 (29.35%) Loss: 2.148059 LR: 0.00004494 [04:47:51] Epoch: 1 Batch: 11265/38378 (29.35%) Loss: 1.890183 LR: 0.00004494 [04:47:53] Epoch: 1 Batch: 11266/38378 (29.36%) Loss: 1.693635 LR: 0.00004494 [04:47:55] Epoch: 1 Batch: 11267/38378 (29.36%) Loss: 2.247982 LR: 0.00004494 [04:47:57] Epoch: 1 Batch: 11268/38378 (29.36%) Loss: 1.843875 LR: 0.00004494 [04:47:58] Epoch: 1 Batch: 11269/38378 (29.36%) Loss: 1.700894 LR: 0.00004494 [04:48:00] Epoch: 1 Batch: 11270/38378 (29.37%) Loss: 1.971350 LR: 0.00004494 [04:48:02] Epoch: 1 Batch: 11271/38378 (29.37%) Loss: 2.146259 LR: 0.00004494 [04:48:04] Epoch: 1 Batch: 11272/38378 (29.37%) Loss: 2.201580 LR: 0.00004493 [04:48:06] Epoch: 1 Batch: 11273/38378 (29.37%) Loss: 1.896176 LR: 0.00004493 [04:48:07] Epoch: 1 Batch: 11274/38378 (29.38%) Loss: 2.033486 LR: 0.00004493 [04:48:09] Epoch: 1 Batch: 11275/38378 (29.38%) Loss: 2.191892 LR: 0.00004493 [04:48:11] Epoch: 1 Batch: 11276/38378 (29.38%) Loss: 2.090381 LR: 0.00004493 [04:48:13] Epoch: 1 Batch: 11277/38378 (29.38%) Loss: 1.953594 LR: 0.00004493 [04:48:15] Epoch: 1 Batch: 11278/38378 (29.39%) Loss: 2.036361 LR: 0.00004493 [04:48:16] Epoch: 1 Batch: 11279/38378 (29.39%) Loss: 2.324868 LR: 0.00004492 [04:48:18] Epoch: 1 Batch: 11280/38378 (29.39%) Loss: 2.229543 LR: 0.00004492 [04:48:20] Epoch: 1 Batch: 11281/38378 (29.39%) Loss: 1.810736 LR: 0.00004492 [04:48:22] Epoch: 1 Batch: 11282/38378 (29.40%) Loss: 1.935212 LR: 0.00004492 [04:48:24] Epoch: 1 Batch: 11283/38378 (29.40%) Loss: 2.221744 LR: 0.00004492 [04:48:26] Epoch: 1 Batch: 11284/38378 (29.40%) Loss: 1.838974 LR: 0.00004492 [04:48:27] Epoch: 1 Batch: 11285/38378 (29.40%) Loss: 2.290731 LR: 0.00004492 [04:48:29] Epoch: 1 Batch: 11286/38378 (29.41%) Loss: 1.801353 LR: 0.00004491 [04:48:31] Epoch: 1 Batch: 11287/38378 (29.41%) Loss: 2.049397 LR: 0.00004491 [04:48:33] Epoch: 1 Batch: 11288/38378 (29.41%) Loss: 1.971835 LR: 0.00004491 [04:48:35] Epoch: 1 Batch: 11289/38378 (29.42%) Loss: 1.884371 LR: 0.00004491 [04:48:37] Epoch: 1 Batch: 11290/38378 (29.42%) Loss: 1.908892 LR: 0.00004491 [04:48:38] Epoch: 1 Batch: 11291/38378 (29.42%) Loss: 1.965167 LR: 0.00004491 [04:48:40] Epoch: 1 Batch: 11292/38378 (29.42%) Loss: 1.752321 LR: 0.00004491 [04:48:42] Epoch: 1 Batch: 11293/38378 (29.43%) Loss: 1.800513 LR: 0.00004490 [04:48:44] Epoch: 1 Batch: 11294/38378 (29.43%) Loss: 2.071554 LR: 0.00004490 [04:48:46] Epoch: 1 Batch: 11295/38378 (29.43%) Loss: 2.002298 LR: 0.00004490 [04:48:48] Epoch: 1 Batch: 11296/38378 (29.43%) Loss: 1.769203 LR: 0.00004490 [04:48:49] Epoch: 1 Batch: 11297/38378 (29.44%) Loss: 2.227477 LR: 0.00004490 [04:48:51] Epoch: 1 Batch: 11298/38378 (29.44%) Loss: 2.059594 LR: 0.00004490 [04:48:53] Epoch: 1 Batch: 11299/38378 (29.44%) Loss: 2.096157 LR: 0.00004490 [04:48:59] >> Cleaned up old temp checkpoint: epoch1_step10000 [04:48:59] >> Temp checkpoint saved: epoch1_step11300, size: 0.1702 GB [04:48:59] Epoch: 1 Batch: 11300/38378 (29.44%) Loss: 2.225298 LR: 
0.00004489 [04:49:01] Epoch: 1 Batch: 11301/38378 (29.45%) Loss: 1.977475 LR: 0.00004489 [04:49:03] Epoch: 1 Batch: 11302/38378 (29.45%) Loss: 2.143869 LR: 0.00004489 [04:49:04] Epoch: 1 Batch: 11303/38378 (29.45%) Loss: 2.221218 LR: 0.00004489 [04:49:06] Epoch: 1 Batch: 11304/38378 (29.45%) Loss: 2.252809 LR: 0.00004489 [04:49:08] Epoch: 1 Batch: 11305/38378 (29.46%) Loss: 2.104841 LR: 0.00004489 [04:49:10] Epoch: 1 Batch: 11306/38378 (29.46%) Loss: 2.148064 LR: 0.00004489 [04:49:12] Epoch: 1 Batch: 11307/38378 (29.46%) Loss: 1.769749 LR: 0.00004489 [04:49:13] Epoch: 1 Batch: 11308/38378 (29.46%) Loss: 2.238701 LR: 0.00004489 [04:49:15] Epoch: 1 Batch: 11309/38378 (29.47%) Loss: 2.001847 LR: 0.00004489 [04:49:17] Epoch: 1 Batch: 11310/38378 (29.47%) Loss: 2.022028 LR: 0.00004489 [04:49:19] Epoch: 1 Batch: 11311/38378 (29.47%) Loss: 1.941760 LR: 0.00004489 [04:49:21] Epoch: 1 Batch: 11312/38378 (29.48%) Loss: 1.934844 LR: 0.00004489 [04:49:22] Epoch: 1 Batch: 11313/38378 (29.48%) Loss: 1.873126 LR: 0.00004489 [04:49:24] Epoch: 1 Batch: 11314/38378 (29.48%) Loss: 2.045126 LR: 0.00004488 [04:49:26] Epoch: 1 Batch: 11315/38378 (29.48%) Loss: 1.993307 LR: 0.00004488 [04:49:28] Epoch: 1 Batch: 11316/38378 (29.49%) Loss: 2.058062 LR: 0.00004488 [04:49:30] Epoch: 1 Batch: 11317/38378 (29.49%) Loss: 1.835294 LR: 0.00004488 [04:49:32] Epoch: 1 Batch: 11318/38378 (29.49%) Loss: 1.764918 LR: 0.00004488 [04:49:33] Epoch: 1 Batch: 11319/38378 (29.49%) Loss: 1.929772 LR: 0.00004488 [04:49:35] Epoch: 1 Batch: 11320/38378 (29.50%) Loss: 1.913255 LR: 0.00004488 [04:49:37] Epoch: 1 Batch: 11321/38378 (29.50%) Loss: 1.894351 LR: 0.00004487 [04:49:39] Epoch: 1 Batch: 11322/38378 (29.50%) Loss: 1.841590 LR: 0.00004487 [04:49:41] Epoch: 1 Batch: 11323/38378 (29.50%) Loss: 1.961521 LR: 0.00004487 [04:49:42] Epoch: 1 Batch: 11324/38378 (29.51%) Loss: 1.681467 LR: 0.00004487 [04:49:44] Epoch: 1 Batch: 11325/38378 (29.51%) Loss: 2.305100 LR: 0.00004487 [04:49:46] Epoch: 1 Batch: 11326/38378 (29.51%) Loss: 2.001129 LR: 0.00004487 [04:49:48] Epoch: 1 Batch: 11327/38378 (29.51%) Loss: 1.900039 LR: 0.00004487 [04:49:50] Epoch: 1 Batch: 11328/38378 (29.52%) Loss: 2.030700 LR: 0.00004486 [04:49:52] Epoch: 1 Batch: 11329/38378 (29.52%) Loss: 1.990338 LR: 0.00004486 [04:49:53] Epoch: 1 Batch: 11330/38378 (29.52%) Loss: 2.238939 LR: 0.00004486 [04:49:55] Epoch: 1 Batch: 11331/38378 (29.52%) Loss: 2.116067 LR: 0.00004486 [04:49:57] Epoch: 1 Batch: 11332/38378 (29.53%) Loss: 2.101171 LR: 0.00004486 [04:49:59] Epoch: 1 Batch: 11333/38378 (29.53%) Loss: 1.857990 LR: 0.00004486 [04:50:01] Epoch: 1 Batch: 11334/38378 (29.53%) Loss: 2.061959 LR: 0.00004486 [04:50:02] Epoch: 1 Batch: 11335/38378 (29.54%) Loss: 1.948217 LR: 0.00004485 [04:50:04] Epoch: 1 Batch: 11336/38378 (29.54%) Loss: 2.009365 LR: 0.00004485 [04:50:06] Epoch: 1 Batch: 11337/38378 (29.54%) Loss: 2.051650 LR: 0.00004485 [04:50:08] Epoch: 1 Batch: 11338/38378 (29.54%) Loss: 2.039364 LR: 0.00004485 [04:50:10] Epoch: 1 Batch: 11339/38378 (29.55%) Loss: 2.010065 LR: 0.00004485 [04:50:11] Epoch: 1 Batch: 11340/38378 (29.55%) Loss: 1.857397 LR: 0.00004485 [04:50:13] Epoch: 1 Batch: 11341/38378 (29.55%) Loss: 2.205501 LR: 0.00004485 [04:50:15] Epoch: 1 Batch: 11342/38378 (29.55%) Loss: 1.721957 LR: 0.00004484 [04:50:17] Epoch: 1 Batch: 11343/38378 (29.56%) Loss: 2.195068 LR: 0.00004484 [04:50:19] Epoch: 1 Batch: 11344/38378 (29.56%) Loss: 2.050568 LR: 0.00004484 [04:50:20] Epoch: 1 Batch: 11345/38378 (29.56%) Loss: 1.759090 LR: 0.00004484 [04:50:22] Epoch: 1 Batch: 
11346/38378 (29.56%) Loss: 1.934346 LR: 0.00004484 [04:50:24] Epoch: 1 Batch: 11347/38378 (29.57%) Loss: 2.035136 LR: 0.00004484 [04:50:26] Epoch: 1 Batch: 11348/38378 (29.57%) Loss: 1.845507 LR: 0.00004484 [04:50:28] Epoch: 1 Batch: 11349/38378 (29.57%) Loss: 2.352113 LR: 0.00004484 [04:50:29] Epoch: 1 Batch: 11350/38378 (29.57%) Loss: 1.936177 LR: 0.00004484 [04:50:31] Epoch: 1 Batch: 11351/38378 (29.58%) Loss: 2.237672 LR: 0.00004484 [04:50:33] Epoch: 1 Batch: 11352/38378 (29.58%) Loss: 1.887175 LR: 0.00004484 [04:50:35] Epoch: 1 Batch: 11353/38378 (29.58%) Loss: 1.969030 LR: 0.00004484 [04:50:37] Epoch: 1 Batch: 11354/38378 (29.58%) Loss: 1.852456 LR: 0.00004484 [04:50:38] Epoch: 1 Batch: 11355/38378 (29.59%) Loss: 1.982347 LR: 0.00004484 [04:50:40] Epoch: 1 Batch: 11356/38378 (29.59%) Loss: 1.883713 LR: 0.00004483 [04:50:42] Epoch: 1 Batch: 11357/38378 (29.59%) Loss: 1.912082 LR: 0.00004483 [04:50:44] Epoch: 1 Batch: 11358/38378 (29.60%) Loss: 1.959767 LR: 0.00004483 [04:50:46] Epoch: 1 Batch: 11359/38378 (29.60%) Loss: 2.000479 LR: 0.00004483 [04:50:48] Epoch: 1 Batch: 11360/38378 (29.60%) Loss: 1.983152 LR: 0.00004483 [04:50:49] Epoch: 1 Batch: 11361/38378 (29.60%) Loss: 1.781639 LR: 0.00004483 [04:50:51] Epoch: 1 Batch: 11362/38378 (29.61%) Loss: 2.376374 LR: 0.00004483 [04:50:53] Epoch: 1 Batch: 11363/38378 (29.61%) Loss: 1.773822 LR: 0.00004482 [04:50:55] Epoch: 1 Batch: 11364/38378 (29.61%) Loss: 1.920466 LR: 0.00004482 [04:50:57] Epoch: 1 Batch: 11365/38378 (29.61%) Loss: 2.092187 LR: 0.00004482 [04:50:59] Epoch: 1 Batch: 11366/38378 (29.62%) Loss: 2.417500 LR: 0.00004482 [04:51:00] Epoch: 1 Batch: 11367/38378 (29.62%) Loss: 1.794413 LR: 0.00004482 [04:51:02] Epoch: 1 Batch: 11368/38378 (29.62%) Loss: 2.243563 LR: 0.00004482 [04:51:04] Epoch: 1 Batch: 11369/38378 (29.62%) Loss: 1.969418 LR: 0.00004482 [04:51:06] Epoch: 1 Batch: 11370/38378 (29.63%) Loss: 1.750369 LR: 0.00004481 [04:51:08] Epoch: 1 Batch: 11371/38378 (29.63%) Loss: 1.838813 LR: 0.00004481 [04:51:10] Epoch: 1 Batch: 11372/38378 (29.63%) Loss: 2.046696 LR: 0.00004481 [04:51:11] Epoch: 1 Batch: 11373/38378 (29.63%) Loss: 1.912720 LR: 0.00004481 [04:51:13] Epoch: 1 Batch: 11374/38378 (29.64%) Loss: 2.337198 LR: 0.00004481 [04:51:15] Epoch: 1 Batch: 11375/38378 (29.64%) Loss: 2.110298 LR: 0.00004481 [04:51:17] Epoch: 1 Batch: 11376/38378 (29.64%) Loss: 1.891698 LR: 0.00004481 [04:51:19] Epoch: 1 Batch: 11377/38378 (29.64%) Loss: 1.848754 LR: 0.00004480 [04:51:21] Epoch: 1 Batch: 11378/38378 (29.65%) Loss: 1.939450 LR: 0.00004480 [04:51:22] Epoch: 1 Batch: 11379/38378 (29.65%) Loss: 1.767565 LR: 0.00004480 [04:51:24] Epoch: 1 Batch: 11380/38378 (29.65%) Loss: 2.089465 LR: 0.00004480 [04:51:26] Epoch: 1 Batch: 11381/38378 (29.66%) Loss: 2.210090 LR: 0.00004480 [04:51:28] Epoch: 1 Batch: 11382/38378 (29.66%) Loss: 2.473781 LR: 0.00004480 [04:51:30] Epoch: 1 Batch: 11383/38378 (29.66%) Loss: 1.838557 LR: 0.00004480 [04:51:31] Epoch: 1 Batch: 11384/38378 (29.66%) Loss: 1.969883 LR: 0.00004479 [04:51:33] Epoch: 1 Batch: 11385/38378 (29.67%) Loss: 1.761035 LR: 0.00004479 [04:51:35] Epoch: 1 Batch: 11386/38378 (29.67%) Loss: 2.053845 LR: 0.00004479 [04:51:37] Epoch: 1 Batch: 11387/38378 (29.67%) Loss: 2.003353 LR: 0.00004479 [04:51:39] Epoch: 1 Batch: 11388/38378 (29.67%) Loss: 1.941875 LR: 0.00004479 [04:51:41] Epoch: 1 Batch: 11389/38378 (29.68%) Loss: 1.992371 LR: 0.00004479 [04:51:42] Epoch: 1 Batch: 11390/38378 (29.68%) Loss: 1.783024 LR: 0.00004479 [04:51:44] Epoch: 1 Batch: 11391/38378 (29.68%) Loss: 1.862634 LR: 
0.00004479 [04:51:46] Epoch: 1 Batch: 11392/38378 (29.68%) Loss: 2.199442 LR: 0.00004479 [04:51:48] Epoch: 1 Batch: 11393/38378 (29.69%) Loss: 2.356360 LR: 0.00004479 [04:51:50] Epoch: 1 Batch: 11394/38378 (29.69%) Loss: 2.163617 LR: 0.00004479 [04:51:51] Epoch: 1 Batch: 11395/38378 (29.69%) Loss: 1.981883 LR: 0.00004479 [04:51:53] Epoch: 1 Batch: 11396/38378 (29.69%) Loss: 1.991077 LR: 0.00004479 [04:51:55] Epoch: 1 Batch: 11397/38378 (29.70%) Loss: 1.999778 LR: 0.00004479 [04:51:57] Epoch: 1 Batch: 11398/38378 (29.70%) Loss: 1.785602 LR: 0.00004478 [04:51:59] Epoch: 1 Batch: 11399/38378 (29.70%) Loss: 1.989616 LR: 0.00004478 [04:52:05] >> Cleaned up old temp checkpoint: epoch1_step10200 [04:52:05] >> Temp checkpoint saved: epoch1_step11400, size: 0.1702 GB [04:52:05] Epoch: 1 Batch: 11400/38378 (29.70%) Loss: 1.951318 LR: 0.00004478 [04:52:07] Epoch: 1 Batch: 11401/38378 (29.71%) Loss: 2.121742 LR: 0.00004478 [04:52:08] Epoch: 1 Batch: 11402/38378 (29.71%) Loss: 2.146710 LR: 0.00004478 [04:52:10] Epoch: 1 Batch: 11403/38378 (29.71%) Loss: 1.931729 LR: 0.00004478 [04:52:12] Epoch: 1 Batch: 11404/38378 (29.71%) Loss: 1.890844 LR: 0.00004478 [04:52:14] Epoch: 1 Batch: 11405/38378 (29.72%) Loss: 1.935974 LR: 0.00004477 [04:52:15] Epoch: 1 Batch: 11406/38378 (29.72%) Loss: 2.135721 LR: 0.00004477 [04:52:17] Epoch: 1 Batch: 11407/38378 (29.72%) Loss: 1.865294 LR: 0.00004477 [04:52:19] Epoch: 1 Batch: 11408/38378 (29.73%) Loss: 2.151669 LR: 0.00004477 [04:52:21] Epoch: 1 Batch: 11409/38378 (29.73%) Loss: 1.716630 LR: 0.00004477 [04:52:23] Epoch: 1 Batch: 11410/38378 (29.73%) Loss: 2.191932 LR: 0.00004477 [04:52:25] Epoch: 1 Batch: 11411/38378 (29.73%) Loss: 1.857214 LR: 0.00004477 [04:52:26] Epoch: 1 Batch: 11412/38378 (29.74%) Loss: 2.040390 LR: 0.00004476 [04:52:28] Epoch: 1 Batch: 11413/38378 (29.74%) Loss: 2.080357 LR: 0.00004476 [04:52:30] Epoch: 1 Batch: 11414/38378 (29.74%) Loss: 2.071606 LR: 0.00004476 [04:52:32] Epoch: 1 Batch: 11415/38378 (29.74%) Loss: 2.153327 LR: 0.00004476 [04:52:34] Epoch: 1 Batch: 11416/38378 (29.75%) Loss: 2.148752 LR: 0.00004476 [04:52:36] Epoch: 1 Batch: 11417/38378 (29.75%) Loss: 2.193048 LR: 0.00004476 [04:52:38] Epoch: 1 Batch: 11418/38378 (29.75%) Loss: 1.790634 LR: 0.00004476 [04:52:39] Epoch: 1 Batch: 11419/38378 (29.75%) Loss: 2.251047 LR: 0.00004475 [04:52:41] Epoch: 1 Batch: 11420/38378 (29.76%) Loss: 2.155799 LR: 0.00004475 [04:52:43] Epoch: 1 Batch: 11421/38378 (29.76%) Loss: 1.802927 LR: 0.00004475 [04:52:45] Epoch: 1 Batch: 11422/38378 (29.76%) Loss: 1.959521 LR: 0.00004475 [04:52:47] Epoch: 1 Batch: 11423/38378 (29.76%) Loss: 2.165535 LR: 0.00004475 [04:52:48] Epoch: 1 Batch: 11424/38378 (29.77%) Loss: 1.742899 LR: 0.00004475 [04:52:50] Epoch: 1 Batch: 11425/38378 (29.77%) Loss: 2.124980 LR: 0.00004475 [04:52:52] Epoch: 1 Batch: 11426/38378 (29.77%) Loss: 2.052035 LR: 0.00004474 [04:52:54] Epoch: 1 Batch: 11427/38378 (29.77%) Loss: 2.210299 LR: 0.00004474 [04:52:56] Epoch: 1 Batch: 11428/38378 (29.78%) Loss: 1.814276 LR: 0.00004474 [04:52:58] Epoch: 1 Batch: 11429/38378 (29.78%) Loss: 2.078919 LR: 0.00004474 [04:52:59] Epoch: 1 Batch: 11430/38378 (29.78%) Loss: 2.184421 LR: 0.00004474 [04:53:01] Epoch: 1 Batch: 11431/38378 (29.79%) Loss: 1.790305 LR: 0.00004474 [04:53:03] Epoch: 1 Batch: 11432/38378 (29.79%) Loss: 1.962905 LR: 0.00004474 [04:53:05] Epoch: 1 Batch: 11433/38378 (29.79%) Loss: 1.741565 LR: 0.00004473 [04:53:06] Epoch: 1 Batch: 11434/38378 (29.79%) Loss: 2.183784 LR: 0.00004473 [04:53:08] Epoch: 1 Batch: 11435/38378 (29.80%) Loss: 
1.949869 LR: 0.00004473 [04:53:10] Epoch: 1 Batch: 11436/38378 (29.80%) Loss: 1.961224 LR: 0.00004473 [04:53:12] Epoch: 1 Batch: 11437/38378 (29.80%) Loss: 2.044248 LR: 0.00004473 [04:53:14] Epoch: 1 Batch: 11438/38378 (29.80%) Loss: 2.099345 LR: 0.00004473 [04:53:15] Epoch: 1 Batch: 11439/38378 (29.81%) Loss: 1.994163 LR: 0.00004473 [04:53:17] Epoch: 1 Batch: 11440/38378 (29.81%) Loss: 2.101351 LR: 0.00004473 [04:53:19] Epoch: 1 Batch: 11441/38378 (29.81%) Loss: 1.987179 LR: 0.00004473 [04:53:21] Epoch: 1 Batch: 11442/38378 (29.81%) Loss: 2.222500 LR: 0.00004473 [04:53:23] Epoch: 1 Batch: 11443/38378 (29.82%) Loss: 1.763257 LR: 0.00004473 [04:53:24] Epoch: 1 Batch: 11444/38378 (29.82%) Loss: 1.886707 LR: 0.00004473 [04:53:26] Epoch: 1 Batch: 11445/38378 (29.82%) Loss: 2.157727 LR: 0.00004473 [04:53:28] Epoch: 1 Batch: 11446/38378 (29.82%) Loss: 2.173677 LR: 0.00004473 [04:53:30] Epoch: 1 Batch: 11447/38378 (29.83%) Loss: 1.918796 LR: 0.00004472 [04:53:32] Epoch: 1 Batch: 11448/38378 (29.83%) Loss: 2.099858 LR: 0.00004472 [04:53:33] Epoch: 1 Batch: 11449/38378 (29.83%) Loss: 1.905834 LR: 0.00004472 [04:53:35] Epoch: 1 Batch: 11450/38378 (29.83%) Loss: 2.150176 LR: 0.00004472 [04:53:37] Epoch: 1 Batch: 11451/38378 (29.84%) Loss: 2.036411 LR: 0.00004472 [04:53:39] Epoch: 1 Batch: 11452/38378 (29.84%) Loss: 2.048972 LR: 0.00004472 [04:53:41] Epoch: 1 Batch: 11453/38378 (29.84%) Loss: 1.924755 LR: 0.00004472 [04:53:43] Epoch: 1 Batch: 11454/38378 (29.85%) Loss: 1.758272 LR: 0.00004471 [04:53:44] Epoch: 1 Batch: 11455/38378 (29.85%) Loss: 2.255157 LR: 0.00004471 [04:53:46] Epoch: 1 Batch: 11456/38378 (29.85%) Loss: 2.229823 LR: 0.00004471 [04:53:48] Epoch: 1 Batch: 11457/38378 (29.85%) Loss: 2.345565 LR: 0.00004471 [04:53:50] Epoch: 1 Batch: 11458/38378 (29.86%) Loss: 1.987796 LR: 0.00004471 [04:53:52] Epoch: 1 Batch: 11459/38378 (29.86%) Loss: 1.973680 LR: 0.00004471 [04:53:53] Epoch: 1 Batch: 11460/38378 (29.86%) Loss: 1.971297 LR: 0.00004471 [04:53:55] Epoch: 1 Batch: 11461/38378 (29.86%) Loss: 1.936365 LR: 0.00004470 [04:53:57] Epoch: 1 Batch: 11462/38378 (29.87%) Loss: 2.481089 LR: 0.00004470 [04:53:59] Epoch: 1 Batch: 11463/38378 (29.87%) Loss: 2.057058 LR: 0.00004470 [04:54:01] Epoch: 1 Batch: 11464/38378 (29.87%) Loss: 1.983630 LR: 0.00004470 [04:54:02] Epoch: 1 Batch: 11465/38378 (29.87%) Loss: 2.115111 LR: 0.00004470 [04:54:04] Epoch: 1 Batch: 11466/38378 (29.88%) Loss: 2.160908 LR: 0.00004470 [04:54:06] Epoch: 1 Batch: 11467/38378 (29.88%) Loss: 1.878218 LR: 0.00004470 [04:54:08] Epoch: 1 Batch: 11468/38378 (29.88%) Loss: 2.004966 LR: 0.00004469 [04:54:10] Epoch: 1 Batch: 11469/38378 (29.88%) Loss: 1.821823 LR: 0.00004469 [04:54:12] Epoch: 1 Batch: 11470/38378 (29.89%) Loss: 1.910521 LR: 0.00004469 [04:54:13] Epoch: 1 Batch: 11471/38378 (29.89%) Loss: 2.323449 LR: 0.00004469 [04:54:15] Epoch: 1 Batch: 11472/38378 (29.89%) Loss: 1.999254 LR: 0.00004469 [04:54:17] Epoch: 1 Batch: 11473/38378 (29.89%) Loss: 1.989103 LR: 0.00004469 [04:54:19] Epoch: 1 Batch: 11474/38378 (29.90%) Loss: 2.143398 LR: 0.00004469 [04:54:21] Epoch: 1 Batch: 11475/38378 (29.90%) Loss: 1.994376 LR: 0.00004468 [04:54:22] Epoch: 1 Batch: 11476/38378 (29.90%) Loss: 1.963411 LR: 0.00004468 [04:54:24] Epoch: 1 Batch: 11477/38378 (29.91%) Loss: 1.985763 LR: 0.00004468 [04:54:26] Epoch: 1 Batch: 11478/38378 (29.91%) Loss: 1.752858 LR: 0.00004468 [04:54:28] Epoch: 1 Batch: 11479/38378 (29.91%) Loss: 2.133915 LR: 0.00004468 [04:54:30] Epoch: 1 Batch: 11480/38378 (29.91%) Loss: 1.879266 LR: 0.00004468 [04:54:31] Epoch: 1 
Batch: 11481/38378 (29.92%) Loss: 1.861118 LR: 0.00004468 [04:54:33] Epoch: 1 Batch: 11482/38378 (29.92%) Loss: 2.121920 LR: 0.00004468 [04:54:35] Epoch: 1 Batch: 11483/38378 (29.92%) Loss: 2.029787 LR: 0.00004468 [04:54:37] Epoch: 1 Batch: 11484/38378 (29.92%) Loss: 2.084710 LR: 0.00004468 [04:54:39] Epoch: 1 Batch: 11485/38378 (29.93%) Loss: 1.644493 LR: 0.00004468 [04:54:41] Epoch: 1 Batch: 11486/38378 (29.93%) Loss: 1.965889 LR: 0.00004468 [04:54:42] Epoch: 1 Batch: 11487/38378 (29.93%) Loss: 1.888628 LR: 0.00004468 [04:54:44] Epoch: 1 Batch: 11488/38378 (29.93%) Loss: 2.110315 LR: 0.00004468 [04:54:46] Epoch: 1 Batch: 11489/38378 (29.94%) Loss: 1.982128 LR: 0.00004467 [04:54:48] Epoch: 1 Batch: 11490/38378 (29.94%) Loss: 2.151216 LR: 0.00004467 [04:54:50] Epoch: 1 Batch: 11491/38378 (29.94%) Loss: 2.096279 LR: 0.00004467 [04:54:51] Epoch: 1 Batch: 11492/38378 (29.94%) Loss: 2.008268 LR: 0.00004467 [04:54:53] Epoch: 1 Batch: 11493/38378 (29.95%) Loss: 1.801496 LR: 0.00004467 [04:54:55] Epoch: 1 Batch: 11494/38378 (29.95%) Loss: 2.209768 LR: 0.00004467 [04:54:57] Epoch: 1 Batch: 11495/38378 (29.95%) Loss: 1.929123 LR: 0.00004467 [04:54:59] Epoch: 1 Batch: 11496/38378 (29.95%) Loss: 1.895121 LR: 0.00004466 [04:55:00] Epoch: 1 Batch: 11497/38378 (29.96%) Loss: 2.035752 LR: 0.00004466 [04:55:02] Epoch: 1 Batch: 11498/38378 (29.96%) Loss: 1.887603 LR: 0.00004466 [04:55:04] Epoch: 1 Batch: 11499/38378 (29.96%) Loss: 2.034011 LR: 0.00004466 [04:55:06] >> Evaluating batch 0 [04:55:07] >> Evaluating batch 1 [04:55:08] >> Evaluating batch 2 [04:55:09] >> Evaluating batch 3 [04:55:10] >> Evaluating batch 4 [04:55:11] >> Evaluating batch 5 [04:55:12] >> Evaluating batch 6 [04:55:13] >> Evaluating batch 7 [04:55:14] >> Evaluating batch 8 [04:55:15] >> Evaluating batch 9 [04:55:16] >> Evaluating batch 10 [04:55:17] >> Evaluating batch 11 [04:55:18] >> Evaluating batch 12 [04:55:19] >> Evaluating batch 13 [04:55:20] >> Evaluating batch 14 [04:55:21] >> Evaluating batch 15 [04:55:22] >> Evaluating batch 16 [04:55:23] Epoch: 1 Step: 11500/38378 Evaluation: [04:55:23] Avg Loss Since Last Eval: 1.9981 Val Loss: 2.1085 Validation loss delta: -0.0042 Perplexity: 8.2363 LR: 0.00004466 [04:55:27] >> Cleaned up old temp checkpoint: epoch1_step10400 [04:55:27] >> Temp checkpoint saved: epoch1_step11500, size: 0.1702 GB [04:55:31] >> Checkpoint saved: epoch1_step11500, size: 0.1702 GB [04:55:31] Epoch: 1 Batch: 11500/38378 (29.97%) Loss: 1.948955 LR: 0.00004466 [04:55:33] Epoch: 1 Batch: 11501/38378 (29.97%) Loss: 1.883398 LR: 0.00004466 [04:55:35] Epoch: 1 Batch: 11502/38378 (29.97%) Loss: 2.173896 LR: 0.00004466 [04:55:37] Epoch: 1 Batch: 11503/38378 (29.97%) Loss: 1.849763 LR: 0.00004465 [04:55:38] Epoch: 1 Batch: 11504/38378 (29.98%) Loss: 2.423519 LR: 0.00004465 [04:55:40] Epoch: 1 Batch: 11505/38378 (29.98%) Loss: 1.866342 LR: 0.00004465 [04:55:42] Epoch: 1 Batch: 11506/38378 (29.98%) Loss: 1.875698 LR: 0.00004465 [04:55:43] Epoch: 1 Batch: 11507/38378 (29.98%) Loss: 2.172577 LR: 0.00004465 [04:55:45] Epoch: 1 Batch: 11508/38378 (29.99%) Loss: 2.098986 LR: 0.00004465 [04:55:47] Epoch: 1 Batch: 11509/38378 (29.99%) Loss: 1.949705 LR: 0.00004465 [04:55:49] Epoch: 1 Batch: 11510/38378 (29.99%) Loss: 2.068558 LR: 0.00004464 [04:55:51] Epoch: 1 Batch: 11511/38378 (29.99%) Loss: 1.727116 LR: 0.00004464 [04:55:52] Epoch: 1 Batch: 11512/38378 (30.00%) Loss: 2.402944 LR: 0.00004464 [04:55:54] Epoch: 1 Batch: 11513/38378 (30.00%) Loss: 1.910940 LR: 0.00004464 [04:55:56] Epoch: 1 Batch: 11514/38378 (30.00%) Loss:
2.101334 LR: 0.00004464 [04:55:58] Epoch: 1 Batch: 11515/38378 (30.00%) Loss: 2.216985 LR: 0.00004464 [04:56:00] Epoch: 1 Batch: 11516/38378 (30.01%) Loss: 1.827977 LR: 0.00004464 [04:56:02] Epoch: 1 Batch: 11517/38378 (30.01%) Loss: 1.761559 LR: 0.00004463 [04:56:04] Epoch: 1 Batch: 11518/38378 (30.01%) Loss: 1.865615 LR: 0.00004463 [04:56:06] Epoch: 1 Batch: 11519/38378 (30.01%) Loss: 1.866039 LR: 0.00004463 [04:56:07] Epoch: 1 Batch: 11520/38378 (30.02%) Loss: 1.861825 LR: 0.00004463 [04:56:09] Epoch: 1 Batch: 11521/38378 (30.02%) Loss: 2.169432 LR: 0.00004463 [04:56:11] Epoch: 1 Batch: 11522/38378 (30.02%) Loss: 1.864078 LR: 0.00004463 [04:56:13] Epoch: 1 Batch: 11523/38378 (30.03%) Loss: 2.025096 LR: 0.00004463 [04:56:15] Epoch: 1 Batch: 11524/38378 (30.03%) Loss: 1.913737 LR: 0.00004462 [04:56:16] Epoch: 1 Batch: 11525/38378 (30.03%) Loss: 1.959530 LR: 0.00004462 [04:56:18] Epoch: 1 Batch: 11526/38378 (30.03%) Loss: 2.030679 LR: 0.00004462 [04:56:20] Epoch: 1 Batch: 11527/38378 (30.04%) Loss: 1.964856 LR: 0.00004462 [04:56:22] Epoch: 1 Batch: 11528/38378 (30.04%) Loss: 2.054618 LR: 0.00004462 [04:56:24] Epoch: 1 Batch: 11529/38378 (30.04%) Loss: 1.933062 LR: 0.00004462 [04:56:25] Epoch: 1 Batch: 11530/38378 (30.04%) Loss: 1.952242 LR: 0.00004462 [04:56:27] Epoch: 1 Batch: 11531/38378 (30.05%) Loss: 2.190073 LR: 0.00004462 [04:56:29] Epoch: 1 Batch: 11532/38378 (30.05%) Loss: 2.000066 LR: 0.00004462 [04:56:31] Epoch: 1 Batch: 11533/38378 (30.05%) Loss: 1.733221 LR: 0.00004462 [04:56:33] Epoch: 1 Batch: 11534/38378 (30.05%) Loss: 2.065878 LR: 0.00004462 [04:56:35] Epoch: 1 Batch: 11535/38378 (30.06%) Loss: 1.966156 LR: 0.00004462 [04:56:36] Epoch: 1 Batch: 11536/38378 (30.06%) Loss: 1.889576 LR: 0.00004462 [04:56:38] Epoch: 1 Batch: 11537/38378 (30.06%) Loss: 2.037215 LR: 0.00004462 [04:56:40] Epoch: 1 Batch: 11538/38378 (30.06%) Loss: 1.921322 LR: 0.00004461 [04:56:42] Epoch: 1 Batch: 11539/38378 (30.07%) Loss: 1.849024 LR: 0.00004461 [04:56:44] Epoch: 1 Batch: 11540/38378 (30.07%) Loss: 1.628726 LR: 0.00004461 [04:56:45] Epoch: 1 Batch: 11541/38378 (30.07%) Loss: 2.329019 LR: 0.00004461 [04:56:47] Epoch: 1 Batch: 11542/38378 (30.07%) Loss: 1.795775 LR: 0.00004461 [04:56:49] Epoch: 1 Batch: 11543/38378 (30.08%) Loss: 2.218691 LR: 0.00004461 [04:56:51] Epoch: 1 Batch: 11544/38378 (30.08%) Loss: 1.588861 LR: 0.00004461 [04:56:53] Epoch: 1 Batch: 11545/38378 (30.08%) Loss: 2.315389 LR: 0.00004460 [04:56:54] Epoch: 1 Batch: 11546/38378 (30.08%) Loss: 1.772566 LR: 0.00004460 [04:56:56] Epoch: 1 Batch: 11547/38378 (30.09%) Loss: 1.834104 LR: 0.00004460 [04:56:58] Epoch: 1 Batch: 11548/38378 (30.09%) Loss: 1.764326 LR: 0.00004460 [04:57:00] Epoch: 1 Batch: 11549/38378 (30.09%) Loss: 1.762440 LR: 0.00004460 [04:57:02] Epoch: 1 Batch: 11550/38378 (30.10%) Loss: 2.067158 LR: 0.00004460 [04:57:03] Epoch: 1 Batch: 11551/38378 (30.10%) Loss: 1.933496 LR: 0.00004460 [04:57:05] Epoch: 1 Batch: 11552/38378 (30.10%) Loss: 1.956532 LR: 0.00004459 [04:57:07] Epoch: 1 Batch: 11553/38378 (30.10%) Loss: 1.796546 LR: 0.00004459 [04:57:09] Epoch: 1 Batch: 11554/38378 (30.11%) Loss: 2.071321 LR: 0.00004459 [04:57:11] Epoch: 1 Batch: 11555/38378 (30.11%) Loss: 2.045249 LR: 0.00004459 [04:57:12] Epoch: 1 Batch: 11556/38378 (30.11%) Loss: 2.202828 LR: 0.00004459 [04:57:14] Epoch: 1 Batch: 11557/38378 (30.11%) Loss: 2.117436 LR: 0.00004459 [04:57:16] Epoch: 1 Batch: 11558/38378 (30.12%) Loss: 1.802780 LR: 0.00004459 [04:57:18] Epoch: 1 Batch: 11559/38378 (30.12%) Loss: 1.752658 LR: 0.00004458 [04:57:20] Epoch: 1 
Batch: 11560/38378 (30.12%) Loss: 2.306454 LR: 0.00004458 [04:57:22] Epoch: 1 Batch: 11561/38378 (30.12%) Loss: 2.235195 LR: 0.00004458 [04:57:23] Epoch: 1 Batch: 11562/38378 (30.13%) Loss: 2.092474 LR: 0.00004458 [04:57:25] Epoch: 1 Batch: 11563/38378 (30.13%) Loss: 1.951500 LR: 0.00004458 [04:57:27] Epoch: 1 Batch: 11564/38378 (30.13%) Loss: 1.931247 LR: 0.00004458 [04:57:29] Epoch: 1 Batch: 11565/38378 (30.13%) Loss: 2.269389 LR: 0.00004458 [04:57:31] Epoch: 1 Batch: 11566/38378 (30.14%) Loss: 2.160623 LR: 0.00004457 [04:57:33] Epoch: 1 Batch: 11567/38378 (30.14%) Loss: 1.716418 LR: 0.00004457 [04:57:34] Epoch: 1 Batch: 11568/38378 (30.14%) Loss: 2.001514 LR: 0.00004457 [04:57:36] Epoch: 1 Batch: 11569/38378 (30.14%) Loss: 2.046955 LR: 0.00004457 [04:57:38] Epoch: 1 Batch: 11570/38378 (30.15%) Loss: 1.712204 LR: 0.00004457 [04:57:40] Epoch: 1 Batch: 11571/38378 (30.15%) Loss: 1.859469 LR: 0.00004457 [04:57:42] Epoch: 1 Batch: 11572/38378 (30.15%) Loss: 1.963666 LR: 0.00004457 [04:57:43] Epoch: 1 Batch: 11573/38378 (30.16%) Loss: 2.329182 LR: 0.00004457 [04:57:45] Epoch: 1 Batch: 11574/38378 (30.16%) Loss: 2.331878 LR: 0.00004457 [04:57:47] Epoch: 1 Batch: 11575/38378 (30.16%) Loss: 2.041706 LR: 0.00004457 [04:57:49] Epoch: 1 Batch: 11576/38378 (30.16%) Loss: 1.757247 LR: 0.00004457 [04:57:51] Epoch: 1 Batch: 11577/38378 (30.17%) Loss: 2.011062 LR: 0.00004457 [04:57:53] Epoch: 1 Batch: 11578/38378 (30.17%) Loss: 1.910385 LR: 0.00004457 [04:57:54] Epoch: 1 Batch: 11579/38378 (30.17%) Loss: 2.013531 LR: 0.00004457 [04:57:56] Epoch: 1 Batch: 11580/38378 (30.17%) Loss: 1.920085 LR: 0.00004456 [04:57:58] Epoch: 1 Batch: 11581/38378 (30.18%) Loss: 1.784972 LR: 0.00004456 [04:58:00] Epoch: 1 Batch: 11582/38378 (30.18%) Loss: 1.919157 LR: 0.00004456 [04:58:02] Epoch: 1 Batch: 11583/38378 (30.18%) Loss: 2.009622 LR: 0.00004456 [04:58:03] Epoch: 1 Batch: 11584/38378 (30.18%) Loss: 1.994243 LR: 0.00004456 [04:58:05] Epoch: 1 Batch: 11585/38378 (30.19%) Loss: 1.857997 LR: 0.00004456 [04:58:07] Epoch: 1 Batch: 11586/38378 (30.19%) Loss: 1.929991 LR: 0.00004456 [04:58:09] Epoch: 1 Batch: 11587/38378 (30.19%) Loss: 1.983997 LR: 0.00004455 [04:58:11] Epoch: 1 Batch: 11588/38378 (30.19%) Loss: 1.986314 LR: 0.00004455 [04:58:12] Epoch: 1 Batch: 11589/38378 (30.20%) Loss: 2.018373 LR: 0.00004455 [04:58:14] Epoch: 1 Batch: 11590/38378 (30.20%) Loss: 1.861141 LR: 0.00004455 [04:58:16] Epoch: 1 Batch: 11591/38378 (30.20%) Loss: 1.956135 LR: 0.00004455 [04:58:18] Epoch: 1 Batch: 11592/38378 (30.20%) Loss: 2.310229 LR: 0.00004455 [04:58:20] Epoch: 1 Batch: 11593/38378 (30.21%) Loss: 2.117678 LR: 0.00004455 [04:58:21] Epoch: 1 Batch: 11594/38378 (30.21%) Loss: 1.677484 LR: 0.00004454 [04:58:23] Epoch: 1 Batch: 11595/38378 (30.21%) Loss: 2.007138 LR: 0.00004454 [04:58:25] Epoch: 1 Batch: 11596/38378 (30.22%) Loss: 2.341022 LR: 0.00004454 [04:58:27] Epoch: 1 Batch: 11597/38378 (30.22%) Loss: 1.859484 LR: 0.00004454 [04:58:29] Epoch: 1 Batch: 11598/38378 (30.22%) Loss: 1.959572 LR: 0.00004454 [04:58:30] Epoch: 1 Batch: 11599/38378 (30.22%) Loss: 1.986822 LR: 0.00004454 [04:58:37] >> Cleaned up old temp checkpoint: epoch1_step10600 [04:58:37] >> Deleted old temp checkpoint zip: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step10600.zip [04:58:37] >> Temp checkpoint saved: epoch1_step11600, size: 0.1702 GB [04:58:37] Epoch: 1 Batch: 11600/38378 (30.23%) Loss: 2.063007 LR: 0.00004454 [04:58:39] Epoch: 1 Batch: 11601/38378 (30.23%) Loss: 2.211144 LR: 0.00004453 [04:58:40] Epoch: 1 Batch: 11602/38378 
(30.23%) Loss: 1.931343 LR: 0.00004453 [04:58:42] Epoch: 1 Batch: 11603/38378 (30.23%) Loss: 2.152757 LR: 0.00004453 [04:58:44] Epoch: 1 Batch: 11604/38378 (30.24%) Loss: 1.868549 LR: 0.00004453 [04:58:46] Epoch: 1 Batch: 11605/38378 (30.24%) Loss: 1.826070 LR: 0.00004453 [04:58:47] Epoch: 1 Batch: 11606/38378 (30.24%) Loss: 2.154822 LR: 0.00004453 [04:58:49] Epoch: 1 Batch: 11607/38378 (30.24%) Loss: 1.849347 LR: 0.00004453 [04:58:51] Epoch: 1 Batch: 11608/38378 (30.25%) Loss: 2.254358 LR: 0.00004452 [04:58:53] Epoch: 1 Batch: 11609/38378 (30.25%) Loss: 1.923345 LR: 0.00004452 [04:58:55] Epoch: 1 Batch: 11610/38378 (30.25%) Loss: 2.115042 LR: 0.00004452 [04:58:57] Epoch: 1 Batch: 11611/38378 (30.25%) Loss: 1.947948 LR: 0.00004452 [04:58:58] Epoch: 1 Batch: 11612/38378 (30.26%) Loss: 1.965749 LR: 0.00004452 [04:59:00] Epoch: 1 Batch: 11613/38378 (30.26%) Loss: 1.706522 LR: 0.00004452 [04:59:02] Epoch: 1 Batch: 11614/38378 (30.26%) Loss: 1.859543 LR: 0.00004452 [04:59:04] Epoch: 1 Batch: 11615/38378 (30.26%) Loss: 1.959893 LR: 0.00004451 [04:59:06] Epoch: 1 Batch: 11616/38378 (30.27%) Loss: 1.864534 LR: 0.00004451 [04:59:08] Epoch: 1 Batch: 11617/38378 (30.27%) Loss: 1.533174 LR: 0.00004451 [04:59:09] Epoch: 1 Batch: 11618/38378 (30.27%) Loss: 2.160504 LR: 0.00004451 [04:59:11] Epoch: 1 Batch: 11619/38378 (30.28%) Loss: 2.098374 LR: 0.00004451 [04:59:13] Epoch: 1 Batch: 11620/38378 (30.28%) Loss: 2.118183 LR: 0.00004451 [04:59:15] Epoch: 1 Batch: 11621/38378 (30.28%) Loss: 2.208989 LR: 0.00004451 [04:59:17] Epoch: 1 Batch: 11622/38378 (30.28%) Loss: 1.840802 LR: 0.00004451 [04:59:18] Epoch: 1 Batch: 11623/38378 (30.29%) Loss: 1.892675 LR: 0.00004451 [04:59:20] Epoch: 1 Batch: 11624/38378 (30.29%) Loss: 2.123357 LR: 0.00004451 [04:59:22] Epoch: 1 Batch: 11625/38378 (30.29%) Loss: 2.088713 LR: 0.00004451 [04:59:24] Epoch: 1 Batch: 11626/38378 (30.29%) Loss: 1.840989 LR: 0.00004451 [04:59:26] Epoch: 1 Batch: 11627/38378 (30.30%) Loss: 2.423406 LR: 0.00004451 [04:59:28] Epoch: 1 Batch: 11628/38378 (30.30%) Loss: 1.980468 LR: 0.00004451 [04:59:29] Epoch: 1 Batch: 11629/38378 (30.30%) Loss: 2.340794 LR: 0.00004450 [04:59:31] Epoch: 1 Batch: 11630/38378 (30.30%) Loss: 1.670903 LR: 0.00004450 [04:59:33] Epoch: 1 Batch: 11631/38378 (30.31%) Loss: 1.920688 LR: 0.00004450 [04:59:35] Epoch: 1 Batch: 11632/38378 (30.31%) Loss: 2.044761 LR: 0.00004450 [04:59:37] Epoch: 1 Batch: 11633/38378 (30.31%) Loss: 1.961344 LR: 0.00004450 [04:59:38] Epoch: 1 Batch: 11634/38378 (30.31%) Loss: 1.896186 LR: 0.00004450 [04:59:40] Epoch: 1 Batch: 11635/38378 (30.32%) Loss: 2.035325 LR: 0.00004450 [04:59:42] Epoch: 1 Batch: 11636/38378 (30.32%) Loss: 1.712392 LR: 0.00004449 [04:59:44] Epoch: 1 Batch: 11637/38378 (30.32%) Loss: 2.071315 LR: 0.00004449 [04:59:46] Epoch: 1 Batch: 11638/38378 (30.32%) Loss: 2.010972 LR: 0.00004449 [04:59:47] Epoch: 1 Batch: 11639/38378 (30.33%) Loss: 2.069018 LR: 0.00004449 [04:59:49] Epoch: 1 Batch: 11640/38378 (30.33%) Loss: 2.027285 LR: 0.00004449 [04:59:51] Epoch: 1 Batch: 11641/38378 (30.33%) Loss: 1.945326 LR: 0.00004449 [04:59:53] Epoch: 1 Batch: 11642/38378 (30.34%) Loss: 1.983109 LR: 0.00004449 [04:59:55] Epoch: 1 Batch: 11643/38378 (30.34%) Loss: 2.150182 LR: 0.00004448 [04:59:56] Epoch: 1 Batch: 11644/38378 (30.34%) Loss: 2.072503 LR: 0.00004448 [04:59:58] Epoch: 1 Batch: 11645/38378 (30.34%) Loss: 1.841610 LR: 0.00004448 [05:00:00] Epoch: 1 Batch: 11646/38378 (30.35%) Loss: 1.842233 LR: 0.00004448 [05:00:02] Epoch: 1 Batch: 11647/38378 (30.35%) Loss: 1.982225 LR: 0.00004448 
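
Note on the checkpoint lines above: every 100 batches a small (~0.17 GB) temp checkpoint is written, and the temp checkpoint from roughly 1,000 batches earlier is removed together with its zipped copy under the temp/ directory on Drive, so only a rolling window of recent saves is kept. The training script itself is not part of this log; the sketch below reconstructs that rotation under assumed names (TEMP_DIR, KEEP_LAST, and rotate_temp_checkpoints are all hypothetical).

import os
import shutil

TEMP_DIR = "/content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp"  # path as printed in the log
KEEP_LAST = 10  # assumed retention: ~1,000 batches of history at 100-batch spacing

def rotate_temp_checkpoints(epoch: int) -> None:
    """Delete the oldest temp checkpoint dirs (and their zips), keeping KEEP_LAST."""
    prefix = f"epoch{epoch}_step"
    ckpts = sorted(
        (d for d in os.listdir(TEMP_DIR)
         if d.startswith(prefix) and os.path.isdir(os.path.join(TEMP_DIR, d))),
        key=lambda d: int(d[len(prefix):]),
    )
    for old in ckpts[:-KEEP_LAST]:
        shutil.rmtree(os.path.join(TEMP_DIR, old), ignore_errors=True)
        zip_path = os.path.join(TEMP_DIR, old) + ".zip"
        if os.path.exists(zip_path):
            os.remove(zip_path)  # matches the ">> Deleted old temp checkpoint zip" lines
        print(f">> Cleaned up old temp checkpoint: {old}")
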
[05:00:04] Epoch: 1 Batch: 11648/38378 (30.35%) Loss: 2.277162 LR: 0.00004448 [05:00:05] Epoch: 1 Batch: 11649/38378 (30.35%) Loss: 2.090498 LR: 0.00004448 [05:00:07] Epoch: 1 Batch: 11650/38378 (30.36%) Loss: 2.282704 LR: 0.00004447 [05:00:09] Epoch: 1 Batch: 11651/38378 (30.36%) Loss: 2.333936 LR: 0.00004447 [05:00:10] Epoch: 1 Batch: 11652/38378 (30.36%) Loss: 2.185520 LR: 0.00004447 [05:00:12] Epoch: 1 Batch: 11653/38378 (30.36%) Loss: 1.936118 LR: 0.00004447 [05:00:14] Epoch: 1 Batch: 11654/38378 (30.37%) Loss: 1.967798 LR: 0.00004447 [05:00:16] Epoch: 1 Batch: 11655/38378 (30.37%) Loss: 2.194254 LR: 0.00004447 [05:00:18] Epoch: 1 Batch: 11656/38378 (30.37%) Loss: 1.853122 LR: 0.00004447 [05:00:19] Epoch: 1 Batch: 11657/38378 (30.37%) Loss: 1.846666 LR: 0.00004446 [05:00:21] Epoch: 1 Batch: 11658/38378 (30.38%) Loss: 2.335295 LR: 0.00004446 [05:00:23] Epoch: 1 Batch: 11659/38378 (30.38%) Loss: 1.830021 LR: 0.00004446 [05:00:25] Epoch: 1 Batch: 11660/38378 (30.38%) Loss: 1.866744 LR: 0.00004446 [05:00:27] Epoch: 1 Batch: 11661/38378 (30.38%) Loss: 1.968200 LR: 0.00004446 [05:00:28] Epoch: 1 Batch: 11662/38378 (30.39%) Loss: 1.928710 LR: 0.00004446 [05:00:30] Epoch: 1 Batch: 11663/38378 (30.39%) Loss: 1.926613 LR: 0.00004446 [05:00:32] Epoch: 1 Batch: 11664/38378 (30.39%) Loss: 1.949685 LR: 0.00004445 [05:00:34] Epoch: 1 Batch: 11665/38378 (30.40%) Loss: 1.840273 LR: 0.00004445 [05:00:36] Epoch: 1 Batch: 11666/38378 (30.40%) Loss: 2.083225 LR: 0.00004445 [05:00:37] Epoch: 1 Batch: 11667/38378 (30.40%) Loss: 1.963443 LR: 0.00004445 [05:00:39] Epoch: 1 Batch: 11668/38378 (30.40%) Loss: 2.321289 LR: 0.00004445 [05:00:41] Epoch: 1 Batch: 11669/38378 (30.41%) Loss: 2.049219 LR: 0.00004445 [05:00:43] Epoch: 1 Batch: 11670/38378 (30.41%) Loss: 2.143170 LR: 0.00004445 [05:00:45] Epoch: 1 Batch: 11671/38378 (30.41%) Loss: 2.001472 LR: 0.00004444 [05:00:46] Epoch: 1 Batch: 11672/38378 (30.41%) Loss: 2.333686 LR: 0.00004444 [05:00:48] Epoch: 1 Batch: 11673/38378 (30.42%) Loss: 2.290992 LR: 0.00004444 [05:00:50] Epoch: 1 Batch: 11674/38378 (30.42%) Loss: 2.029214 LR: 0.00004444 [05:00:52] Epoch: 1 Batch: 11675/38378 (30.42%) Loss: 1.798892 LR: 0.00004444 [05:00:54] Epoch: 1 Batch: 11676/38378 (30.42%) Loss: 1.774915 LR: 0.00004444 [05:00:56] Epoch: 1 Batch: 11677/38378 (30.43%) Loss: 2.019782 LR: 0.00004444 [05:00:57] Epoch: 1 Batch: 11678/38378 (30.43%) Loss: 1.935067 LR: 0.00004444 [05:00:59] Epoch: 1 Batch: 11679/38378 (30.43%) Loss: 1.911553 LR: 0.00004444 [05:01:01] Epoch: 1 Batch: 11680/38378 (30.43%) Loss: 1.938908 LR: 0.00004444 [05:01:03] Epoch: 1 Batch: 11681/38378 (30.44%) Loss: 1.921310 LR: 0.00004444 [05:01:05] Epoch: 1 Batch: 11682/38378 (30.44%) Loss: 1.852941 LR: 0.00004444 [05:01:06] Epoch: 1 Batch: 11683/38378 (30.44%) Loss: 1.954090 LR: 0.00004444 [05:01:08] Epoch: 1 Batch: 11684/38378 (30.44%) Loss: 1.835235 LR: 0.00004444 [05:01:10] Epoch: 1 Batch: 11685/38378 (30.45%) Loss: 1.862635 LR: 0.00004443 [05:01:12] Epoch: 1 Batch: 11686/38378 (30.45%) Loss: 2.075659 LR: 0.00004443 [05:01:14] Epoch: 1 Batch: 11687/38378 (30.45%) Loss: 1.805037 LR: 0.00004443 [05:01:16] Epoch: 1 Batch: 11688/38378 (30.45%) Loss: 2.152834 LR: 0.00004443 [05:01:17] Epoch: 1 Batch: 11689/38378 (30.46%) Loss: 1.921087 LR: 0.00004443 [05:01:19] Epoch: 1 Batch: 11690/38378 (30.46%) Loss: 1.957070 LR: 0.00004443 [05:01:21] Epoch: 1 Batch: 11691/38378 (30.46%) Loss: 2.089368 LR: 0.00004443 [05:01:23] Epoch: 1 Batch: 11692/38378 (30.47%) Loss: 1.803326 LR: 0.00004442 [05:01:25] Epoch: 1 Batch: 11693/38378 
(30.47%) Loss: 1.968797 LR: 0.00004442 [05:01:26] Epoch: 1 Batch: 11694/38378 (30.47%) Loss: 2.030890 LR: 0.00004442 [05:01:28] Epoch: 1 Batch: 11695/38378 (30.47%) Loss: 2.038758 LR: 0.00004442 [05:01:30] Epoch: 1 Batch: 11696/38378 (30.48%) Loss: 2.088329 LR: 0.00004442 [05:01:31] Epoch: 1 Batch: 11697/38378 (30.48%) Loss: 1.808807 LR: 0.00004442 [05:01:33] Epoch: 1 Batch: 11698/38378 (30.48%) Loss: 1.965375 LR: 0.00004442 [05:01:35] Epoch: 1 Batch: 11699/38378 (30.48%) Loss: 1.994721 LR: 0.00004441 [05:01:41] >> Cleaned up old temp checkpoint: epoch1_step10700 [05:01:41] >> Temp checkpoint saved: epoch1_step11700, size: 0.1702 GB [05:01:41] Epoch: 1 Batch: 11700/38378 (30.49%) Loss: 2.105813 LR: 0.00004441 [05:01:43] Epoch: 1 Batch: 11701/38378 (30.49%) Loss: 1.815262 LR: 0.00004441 [05:01:45] Epoch: 1 Batch: 11702/38378 (30.49%) Loss: 1.837236 LR: 0.00004441 [05:01:47] Epoch: 1 Batch: 11703/38378 (30.49%) Loss: 2.186430 LR: 0.00004441 [05:01:48] Epoch: 1 Batch: 11704/38378 (30.50%) Loss: 2.117984 LR: 0.00004441 [05:01:50] Epoch: 1 Batch: 11705/38378 (30.50%) Loss: 1.988098 LR: 0.00004441 [05:01:52] Epoch: 1 Batch: 11706/38378 (30.50%) Loss: 1.999977 LR: 0.00004440 [05:01:54] Epoch: 1 Batch: 11707/38378 (30.50%) Loss: 2.017248 LR: 0.00004440 [05:01:56] Epoch: 1 Batch: 11708/38378 (30.51%) Loss: 2.287342 LR: 0.00004440 [05:01:58] Epoch: 1 Batch: 11709/38378 (30.51%) Loss: 2.166212 LR: 0.00004440 [05:01:59] Epoch: 1 Batch: 11710/38378 (30.51%) Loss: 1.793780 LR: 0.00004440 [05:02:01] Epoch: 1 Batch: 11711/38378 (30.51%) Loss: 1.852949 LR: 0.00004440 [05:02:03] Epoch: 1 Batch: 11712/38378 (30.52%) Loss: 2.099337 LR: 0.00004440 [05:02:05] Epoch: 1 Batch: 11713/38378 (30.52%) Loss: 2.065964 LR: 0.00004439 [05:02:07] Epoch: 1 Batch: 11714/38378 (30.52%) Loss: 1.909776 LR: 0.00004439 [05:02:09] Epoch: 1 Batch: 11715/38378 (30.53%) Loss: 2.032983 LR: 0.00004439 [05:02:10] Epoch: 1 Batch: 11716/38378 (30.53%) Loss: 1.520394 LR: 0.00004439 [05:02:12] Epoch: 1 Batch: 11717/38378 (30.53%) Loss: 2.338616 LR: 0.00004439 [05:02:14] Epoch: 1 Batch: 11718/38378 (30.53%) Loss: 1.867661 LR: 0.00004439 [05:02:16] Epoch: 1 Batch: 11719/38378 (30.54%) Loss: 1.782065 LR: 0.00004439 [05:02:18] Epoch: 1 Batch: 11720/38378 (30.54%) Loss: 1.879843 LR: 0.00004438 [05:02:20] Epoch: 1 Batch: 11721/38378 (30.54%) Loss: 2.049062 LR: 0.00004438 [05:02:21] Epoch: 1 Batch: 11722/38378 (30.54%) Loss: 2.099484 LR: 0.00004438 [05:02:23] Epoch: 1 Batch: 11723/38378 (30.55%) Loss: 1.849415 LR: 0.00004438 [05:02:25] Epoch: 1 Batch: 11724/38378 (30.55%) Loss: 1.764393 LR: 0.00004438 [05:02:27] Epoch: 1 Batch: 11725/38378 (30.55%) Loss: 1.958573 LR: 0.00004438 [05:02:29] Epoch: 1 Batch: 11726/38378 (30.55%) Loss: 1.918679 LR: 0.00004438 [05:02:31] Epoch: 1 Batch: 11727/38378 (30.56%) Loss: 2.069365 LR: 0.00004438 [05:02:32] Epoch: 1 Batch: 11728/38378 (30.56%) Loss: 2.108470 LR: 0.00004438 [05:02:34] Epoch: 1 Batch: 11729/38378 (30.56%) Loss: 1.991368 LR: 0.00004438 [05:02:36] Epoch: 1 Batch: 11730/38378 (30.56%) Loss: 2.154122 LR: 0.00004438 [05:02:38] Epoch: 1 Batch: 11731/38378 (30.57%) Loss: 2.033962 LR: 0.00004438 [05:02:40] Epoch: 1 Batch: 11732/38378 (30.57%) Loss: 1.676417 LR: 0.00004438 [05:02:41] Epoch: 1 Batch: 11733/38378 (30.57%) Loss: 1.916758 LR: 0.00004438 [05:02:43] Epoch: 1 Batch: 11734/38378 (30.57%) Loss: 1.909925 LR: 0.00004437 [05:02:45] Epoch: 1 Batch: 11735/38378 (30.58%) Loss: 1.776679 LR: 0.00004437 [05:02:47] Epoch: 1 Batch: 11736/38378 (30.58%) Loss: 1.801270 LR: 0.00004437 [05:02:49] Epoch: 1 Batch: 
11737/38378 (30.58%) Loss: 1.916838 LR: 0.00004437 [05:02:50] Epoch: 1 Batch: 11738/38378 (30.59%) Loss: 1.816507 LR: 0.00004437 [05:02:52] Epoch: 1 Batch: 11739/38378 (30.59%) Loss: 1.742006 LR: 0.00004437 [05:02:54] Epoch: 1 Batch: 11740/38378 (30.59%) Loss: 1.589682 LR: 0.00004437 [05:02:56] Epoch: 1 Batch: 11741/38378 (30.59%) Loss: 1.564774 LR: 0.00004436 [05:02:58] Epoch: 1 Batch: 11742/38378 (30.60%) Loss: 2.060347 LR: 0.00004436 [05:02:59] Epoch: 1 Batch: 11743/38378 (30.60%) Loss: 2.132597 LR: 0.00004436 [05:03:01] Epoch: 1 Batch: 11744/38378 (30.60%) Loss: 2.136504 LR: 0.00004436 [05:03:03] Epoch: 1 Batch: 11745/38378 (30.60%) Loss: 2.012783 LR: 0.00004436 [05:03:05] Epoch: 1 Batch: 11746/38378 (30.61%) Loss: 1.842370 LR: 0.00004436 [05:03:07] Epoch: 1 Batch: 11747/38378 (30.61%) Loss: 2.018536 LR: 0.00004436 [05:03:08] Epoch: 1 Batch: 11748/38378 (30.61%) Loss: 2.018021 LR: 0.00004435 [05:03:10] Epoch: 1 Batch: 11749/38378 (30.61%) Loss: 1.975503 LR: 0.00004435 [05:03:12] Epoch: 1 Batch: 11750/38378 (30.62%) Loss: 1.895544 LR: 0.00004435 [05:03:14] Epoch: 1 Batch: 11751/38378 (30.62%) Loss: 1.817742 LR: 0.00004435 [05:03:16] Epoch: 1 Batch: 11752/38378 (30.62%) Loss: 2.231245 LR: 0.00004435 [05:03:17] Epoch: 1 Batch: 11753/38378 (30.62%) Loss: 2.160124 LR: 0.00004435 [05:03:19] Epoch: 1 Batch: 11754/38378 (30.63%) Loss: 1.666297 LR: 0.00004435 [05:03:21] Epoch: 1 Batch: 11755/38378 (30.63%) Loss: 1.728140 LR: 0.00004434 [05:03:23] Epoch: 1 Batch: 11756/38378 (30.63%) Loss: 2.201134 LR: 0.00004434 [05:03:24] Epoch: 1 Batch: 11757/38378 (30.63%) Loss: 2.020347 LR: 0.00004434 [05:03:26] Epoch: 1 Batch: 11758/38378 (30.64%) Loss: 2.258912 LR: 0.00004434 [05:03:28] Epoch: 1 Batch: 11759/38378 (30.64%) Loss: 1.814107 LR: 0.00004434 [05:03:30] Epoch: 1 Batch: 11760/38378 (30.64%) Loss: 1.789111 LR: 0.00004434 [05:03:32] Epoch: 1 Batch: 11761/38378 (30.65%) Loss: 1.922742 LR: 0.00004434 [05:03:33] Epoch: 1 Batch: 11762/38378 (30.65%) Loss: 1.925533 LR: 0.00004433 [05:03:35] Epoch: 1 Batch: 11763/38378 (30.65%) Loss: 1.866517 LR: 0.00004433 [05:03:37] Epoch: 1 Batch: 11764/38378 (30.65%) Loss: 1.806706 LR: 0.00004433 [05:03:39] Epoch: 1 Batch: 11765/38378 (30.66%) Loss: 1.906257 LR: 0.00004433 [05:03:41] Epoch: 1 Batch: 11766/38378 (30.66%) Loss: 1.907665 LR: 0.00004433 [05:03:42] Epoch: 1 Batch: 11767/38378 (30.66%) Loss: 2.031062 LR: 0.00004433 [05:03:44] Epoch: 1 Batch: 11768/38378 (30.66%) Loss: 1.869808 LR: 0.00004433 [05:03:46] Epoch: 1 Batch: 11769/38378 (30.67%) Loss: 1.921578 LR: 0.00004432 [05:03:48] Epoch: 1 Batch: 11770/38378 (30.67%) Loss: 1.935169 LR: 0.00004432 [05:03:50] Epoch: 1 Batch: 11771/38378 (30.67%) Loss: 1.968423 LR: 0.00004432 [05:03:52] Epoch: 1 Batch: 11772/38378 (30.67%) Loss: 1.977427 LR: 0.00004432 [05:03:53] Epoch: 1 Batch: 11773/38378 (30.68%) Loss: 1.616505 LR: 0.00004432 [05:03:55] Epoch: 1 Batch: 11774/38378 (30.68%) Loss: 1.936668 LR: 0.00004432 [05:03:57] Epoch: 1 Batch: 11775/38378 (30.68%) Loss: 2.048638 LR: 0.00004432 [05:03:59] Epoch: 1 Batch: 11776/38378 (30.68%) Loss: 1.807999 LR: 0.00004432 [05:04:01] Epoch: 1 Batch: 11777/38378 (30.69%) Loss: 2.015400 LR: 0.00004432 [05:04:02] Epoch: 1 Batch: 11778/38378 (30.69%) Loss: 2.125698 LR: 0.00004432 [05:04:04] Epoch: 1 Batch: 11779/38378 (30.69%) Loss: 2.112405 LR: 0.00004432 [05:04:06] Epoch: 1 Batch: 11780/38378 (30.69%) Loss: 2.108213 LR: 0.00004432 [05:04:08] Epoch: 1 Batch: 11781/38378 (30.70%) Loss: 2.041627 LR: 0.00004432 [05:04:10] Epoch: 1 Batch: 11782/38378 (30.70%) Loss: 1.751301 LR: 
0.00004432 [05:04:12] Epoch: 1 Batch: 11783/38378 (30.70%) Loss: 2.050149 LR: 0.00004431 [05:04:13] Epoch: 1 Batch: 11784/38378 (30.71%) Loss: 2.268837 LR: 0.00004431 [05:04:15] Epoch: 1 Batch: 11785/38378 (30.71%) Loss: 2.298742 LR: 0.00004431 [05:04:17] Epoch: 1 Batch: 11786/38378 (30.71%) Loss: 1.821775 LR: 0.00004431 [05:04:19] Epoch: 1 Batch: 11787/38378 (30.71%) Loss: 2.010039 LR: 0.00004431 [05:04:21] Epoch: 1 Batch: 11788/38378 (30.72%) Loss: 2.217145 LR: 0.00004431 [05:04:22] Epoch: 1 Batch: 11789/38378 (30.72%) Loss: 1.956344 LR: 0.00004431 [05:04:24] Epoch: 1 Batch: 11790/38378 (30.72%) Loss: 1.835845 LR: 0.00004430 [05:04:26] Epoch: 1 Batch: 11791/38378 (30.72%) Loss: 2.065043 LR: 0.00004430 [05:04:28] Epoch: 1 Batch: 11792/38378 (30.73%) Loss: 2.008930 LR: 0.00004430 [05:04:30] Epoch: 1 Batch: 11793/38378 (30.73%) Loss: 2.296351 LR: 0.00004430 [05:04:32] Epoch: 1 Batch: 11794/38378 (30.73%) Loss: 2.219252 LR: 0.00004430 [05:04:33] Epoch: 1 Batch: 11795/38378 (30.73%) Loss: 1.759891 LR: 0.00004430 [05:04:35] Epoch: 1 Batch: 11796/38378 (30.74%) Loss: 1.907271 LR: 0.00004430 [05:04:37] Epoch: 1 Batch: 11797/38378 (30.74%) Loss: 1.832157 LR: 0.00004429 [05:04:39] Epoch: 1 Batch: 11798/38378 (30.74%) Loss: 1.844962 LR: 0.00004429 [05:04:41] Epoch: 1 Batch: 11799/38378 (30.74%) Loss: 2.163055 LR: 0.00004429 [05:04:47] >> Cleaned up old temp checkpoint: epoch1_step10800 [05:04:47] >> Temp checkpoint saved: epoch1_step11800, size: 0.1702 GB [05:04:47] Epoch: 1 Batch: 11800/38378 (30.75%) Loss: 2.204120 LR: 0.00004429 [05:04:48] Epoch: 1 Batch: 11801/38378 (30.75%) Loss: 1.866919 LR: 0.00004429 [05:04:50] Epoch: 1 Batch: 11802/38378 (30.75%) Loss: 2.042518 LR: 0.00004429 [05:04:52] Epoch: 1 Batch: 11803/38378 (30.75%) Loss: 2.038138 LR: 0.00004429 [05:04:54] Epoch: 1 Batch: 11804/38378 (30.76%) Loss: 2.048381 LR: 0.00004428 [05:04:56] Epoch: 1 Batch: 11805/38378 (30.76%) Loss: 2.054173 LR: 0.00004428 [05:04:57] Epoch: 1 Batch: 11806/38378 (30.76%) Loss: 2.102525 LR: 0.00004428 [05:04:59] Epoch: 1 Batch: 11807/38378 (30.77%) Loss: 2.009118 LR: 0.00004428 [05:05:01] Epoch: 1 Batch: 11808/38378 (30.77%) Loss: 2.209581 LR: 0.00004428 [05:05:03] Epoch: 1 Batch: 11809/38378 (30.77%) Loss: 2.292787 LR: 0.00004428 [05:05:05] Epoch: 1 Batch: 11810/38378 (30.77%) Loss: 1.861594 LR: 0.00004428 [05:05:07] Epoch: 1 Batch: 11811/38378 (30.78%) Loss: 2.115141 LR: 0.00004427 [05:05:08] Epoch: 1 Batch: 11812/38378 (30.78%) Loss: 1.827169 LR: 0.00004427 [05:05:10] Epoch: 1 Batch: 11813/38378 (30.78%) Loss: 2.007973 LR: 0.00004427 [05:05:12] Epoch: 1 Batch: 11814/38378 (30.78%) Loss: 2.111253 LR: 0.00004427 [05:05:14] Epoch: 1 Batch: 11815/38378 (30.79%) Loss: 1.807545 LR: 0.00004427 [05:05:16] Epoch: 1 Batch: 11816/38378 (30.79%) Loss: 1.960423 LR: 0.00004427 [05:05:17] Epoch: 1 Batch: 11817/38378 (30.79%) Loss: 2.280352 LR: 0.00004427 [05:05:19] Epoch: 1 Batch: 11818/38378 (30.79%) Loss: 2.039189 LR: 0.00004426 [05:05:21] Epoch: 1 Batch: 11819/38378 (30.80%) Loss: 2.070787 LR: 0.00004426 [05:05:23] Epoch: 1 Batch: 11820/38378 (30.80%) Loss: 2.003550 LR: 0.00004426 [05:05:25] Epoch: 1 Batch: 11821/38378 (30.80%) Loss: 2.074294 LR: 0.00004426 [05:05:27] Epoch: 1 Batch: 11822/38378 (30.80%) Loss: 1.995953 LR: 0.00004426 [05:05:28] Epoch: 1 Batch: 11823/38378 (30.81%) Loss: 2.134269 LR: 0.00004426 [05:05:30] Epoch: 1 Batch: 11824/38378 (30.81%) Loss: 2.102843 LR: 0.00004426 [05:05:32] Epoch: 1 Batch: 11825/38378 (30.81%) Loss: 2.067273 LR: 0.00004425 [05:05:34] Epoch: 1 Batch: 11826/38378 (30.81%) Loss: 
1.862863 LR: 0.00004425 [05:05:36] Epoch: 1 Batch: 11827/38378 (30.82%) Loss: 2.140809 LR: 0.00004425 [05:05:38] Epoch: 1 Batch: 11828/38378 (30.82%) Loss: 1.817180 LR: 0.00004425 [05:05:39] Epoch: 1 Batch: 11829/38378 (30.82%) Loss: 1.828140 LR: 0.00004425 [05:05:41] Epoch: 1 Batch: 11830/38378 (30.82%) Loss: 1.961698 LR: 0.00004425 [05:05:43] Epoch: 1 Batch: 11831/38378 (30.83%) Loss: 1.895237 LR: 0.00004425 [05:05:45] Epoch: 1 Batch: 11832/38378 (30.83%) Loss: 2.160777 LR: 0.00004425 [05:05:47] Epoch: 1 Batch: 11833/38378 (30.83%) Loss: 2.227946 LR: 0.00004425 [05:05:49] Epoch: 1 Batch: 11834/38378 (30.84%) Loss: 2.168916 LR: 0.00004425 [05:05:50] Epoch: 1 Batch: 11835/38378 (30.84%) Loss: 2.134222 LR: 0.00004425 [05:05:52] Epoch: 1 Batch: 11836/38378 (30.84%) Loss: 2.288351 LR: 0.00004425 [05:05:54] Epoch: 1 Batch: 11837/38378 (30.84%) Loss: 2.106228 LR: 0.00004425 [05:05:56] Epoch: 1 Batch: 11838/38378 (30.85%) Loss: 2.065130 LR: 0.00004425 [05:05:58] Epoch: 1 Batch: 11839/38378 (30.85%) Loss: 1.912633 LR: 0.00004424 [05:05:59] Epoch: 1 Batch: 11840/38378 (30.85%) Loss: 1.939887 LR: 0.00004424 [05:06:01] Epoch: 1 Batch: 11841/38378 (30.85%) Loss: 1.924163 LR: 0.00004424 [05:06:03] Epoch: 1 Batch: 11842/38378 (30.86%) Loss: 2.176286 LR: 0.00004424 [05:06:05] Epoch: 1 Batch: 11843/38378 (30.86%) Loss: 1.840183 LR: 0.00004424 [05:06:07] Epoch: 1 Batch: 11844/38378 (30.86%) Loss: 1.984822 LR: 0.00004424 [05:06:08] Epoch: 1 Batch: 11845/38378 (30.86%) Loss: 2.406434 LR: 0.00004424 [05:06:10] Epoch: 1 Batch: 11846/38378 (30.87%) Loss: 2.014205 LR: 0.00004423 [05:06:12] Epoch: 1 Batch: 11847/38378 (30.87%) Loss: 2.122869 LR: 0.00004423 [05:06:14] Epoch: 1 Batch: 11848/38378 (30.87%) Loss: 2.142292 LR: 0.00004423 [05:06:16] Epoch: 1 Batch: 11849/38378 (30.87%) Loss: 1.895917 LR: 0.00004423 [05:06:17] Epoch: 1 Batch: 11850/38378 (30.88%) Loss: 1.920091 LR: 0.00004423 [05:06:19] Epoch: 1 Batch: 11851/38378 (30.88%) Loss: 2.061452 LR: 0.00004423 [05:06:21] Epoch: 1 Batch: 11852/38378 (30.88%) Loss: 1.820735 LR: 0.00004423 [05:06:23] Epoch: 1 Batch: 11853/38378 (30.88%) Loss: 1.940086 LR: 0.00004422 [05:06:25] Epoch: 1 Batch: 11854/38378 (30.89%) Loss: 1.974977 LR: 0.00004422 [05:06:27] Epoch: 1 Batch: 11855/38378 (30.89%) Loss: 1.920514 LR: 0.00004422 [05:06:28] Epoch: 1 Batch: 11856/38378 (30.89%) Loss: 2.078328 LR: 0.00004422 [05:06:30] Epoch: 1 Batch: 11857/38378 (30.90%) Loss: 2.063147 LR: 0.00004422 [05:06:32] Epoch: 1 Batch: 11858/38378 (30.90%) Loss: 1.934753 LR: 0.00004422 [05:06:34] Epoch: 1 Batch: 11859/38378 (30.90%) Loss: 1.790273 LR: 0.00004422 [05:06:36] Epoch: 1 Batch: 11860/38378 (30.90%) Loss: 1.869922 LR: 0.00004421 [05:06:37] Epoch: 1 Batch: 11861/38378 (30.91%) Loss: 1.828231 LR: 0.00004421 [05:06:39] Epoch: 1 Batch: 11862/38378 (30.91%) Loss: 2.210664 LR: 0.00004421 [05:06:41] Epoch: 1 Batch: 11863/38378 (30.91%) Loss: 2.087591 LR: 0.00004421 [05:06:43] Epoch: 1 Batch: 11864/38378 (30.91%) Loss: 2.020143 LR: 0.00004421 [05:06:45] Epoch: 1 Batch: 11865/38378 (30.92%) Loss: 2.193394 LR: 0.00004421 [05:06:46] Epoch: 1 Batch: 11866/38378 (30.92%) Loss: 1.866073 LR: 0.00004421 [05:06:48] Epoch: 1 Batch: 11867/38378 (30.92%) Loss: 2.014706 LR: 0.00004420 [05:06:50] Epoch: 1 Batch: 11868/38378 (30.92%) Loss: 2.181805 LR: 0.00004420 [05:06:52] Epoch: 1 Batch: 11869/38378 (30.93%) Loss: 1.942848 LR: 0.00004420 [05:06:54] Epoch: 1 Batch: 11870/38378 (30.93%) Loss: 2.103471 LR: 0.00004420 [05:06:56] Epoch: 1 Batch: 11871/38378 (30.93%) Loss: 2.021926 LR: 0.00004420 [05:06:57] Epoch: 1 
Batch: 11872/38378 (30.93%) Loss: 1.845657 LR: 0.00004420 [05:06:59] Epoch: 1 Batch: 11873/38378 (30.94%) Loss: 1.832814 LR: 0.00004420 [05:07:01] Epoch: 1 Batch: 11874/38378 (30.94%) Loss: 1.960347 LR: 0.00004419 [05:07:03] Epoch: 1 Batch: 11875/38378 (30.94%) Loss: 1.619869 LR: 0.00004419 [05:07:05] Epoch: 1 Batch: 11876/38378 (30.94%) Loss: 2.049212 LR: 0.00004419 [05:07:07] Epoch: 1 Batch: 11877/38378 (30.95%) Loss: 1.905433 LR: 0.00004419 [05:07:09] Epoch: 1 Batch: 11878/38378 (30.95%) Loss: 2.058832 LR: 0.00004419 [05:07:11] Epoch: 1 Batch: 11879/38378 (30.95%) Loss: 2.146295 LR: 0.00004419 [05:07:12] Epoch: 1 Batch: 11880/38378 (30.96%) Loss: 1.832810 LR: 0.00004419 [05:07:14] Epoch: 1 Batch: 11881/38378 (30.96%) Loss: 1.992932 LR: 0.00004418 [05:07:16] Epoch: 1 Batch: 11882/38378 (30.96%) Loss: 1.987316 LR: 0.00004418 [05:07:18] Epoch: 1 Batch: 11883/38378 (30.96%) Loss: 2.140971 LR: 0.00004418 [05:07:20] Epoch: 1 Batch: 11884/38378 (30.97%) Loss: 1.822239 LR: 0.00004418 [05:07:22] Epoch: 1 Batch: 11885/38378 (30.97%) Loss: 2.017561 LR: 0.00004418 [05:07:23] Epoch: 1 Batch: 11886/38378 (30.97%) Loss: 2.382759 LR: 0.00004418 [05:07:25] Epoch: 1 Batch: 11887/38378 (30.97%) Loss: 2.020827 LR: 0.00004418 [05:07:27] Epoch: 1 Batch: 11888/38378 (30.98%) Loss: 2.050354 LR: 0.00004418 [05:07:29] Epoch: 1 Batch: 11889/38378 (30.98%) Loss: 2.018245 LR: 0.00004418 [05:07:31] Epoch: 1 Batch: 11890/38378 (30.98%) Loss: 1.769567 LR: 0.00004418 [05:07:33] Epoch: 1 Batch: 11891/38378 (30.98%) Loss: 2.318694 LR: 0.00004418 [05:07:34] Epoch: 1 Batch: 11892/38378 (30.99%) Loss: 1.861883 LR: 0.00004418 [05:07:36] Epoch: 1 Batch: 11893/38378 (30.99%) Loss: 2.354582 LR: 0.00004418 [05:07:38] Epoch: 1 Batch: 11894/38378 (30.99%) Loss: 1.964995 LR: 0.00004418 [05:07:40] Epoch: 1 Batch: 11895/38378 (30.99%) Loss: 2.121198 LR: 0.00004417 [05:07:42] Epoch: 1 Batch: 11896/38378 (31.00%) Loss: 1.995147 LR: 0.00004417 [05:07:43] Epoch: 1 Batch: 11897/38378 (31.00%) Loss: 2.042321 LR: 0.00004417 [05:07:45] Epoch: 1 Batch: 11898/38378 (31.00%) Loss: 1.879637 LR: 0.00004417 [05:07:47] Epoch: 1 Batch: 11899/38378 (31.00%) Loss: 2.045294 LR: 0.00004417 [05:07:53] >> Cleaned up old temp checkpoint: epoch1_step10900 [05:07:53] >> Temp checkpoint saved: epoch1_step11900, size: 0.1702 GB [05:07:53] Epoch: 1 Batch: 11900/38378 (31.01%) Loss: 2.069543 LR: 0.00004417 [05:07:55] Epoch: 1 Batch: 11901/38378 (31.01%) Loss: 1.949639 LR: 0.00004417 [05:07:57] Epoch: 1 Batch: 11902/38378 (31.01%) Loss: 1.900697 LR: 0.00004416 [05:07:59] Epoch: 1 Batch: 11903/38378 (31.02%) Loss: 2.026139 LR: 0.00004416 [05:08:00] Epoch: 1 Batch: 11904/38378 (31.02%) Loss: 1.963815 LR: 0.00004416 [05:08:02] Epoch: 1 Batch: 11905/38378 (31.02%) Loss: 2.061264 LR: 0.00004416 [05:08:04] Epoch: 1 Batch: 11906/38378 (31.02%) Loss: 2.182341 LR: 0.00004416 [05:08:06] Epoch: 1 Batch: 11907/38378 (31.03%) Loss: 2.120150 LR: 0.00004416 [05:08:08] Epoch: 1 Batch: 11908/38378 (31.03%) Loss: 2.371940 LR: 0.00004416 [05:08:10] Epoch: 1 Batch: 11909/38378 (31.03%) Loss: 1.881064 LR: 0.00004415 [05:08:11] Epoch: 1 Batch: 11910/38378 (31.03%) Loss: 2.143912 LR: 0.00004415 [05:08:13] Epoch: 1 Batch: 11911/38378 (31.04%) Loss: 2.226220 LR: 0.00004415 [05:08:15] Epoch: 1 Batch: 11912/38378 (31.04%) Loss: 2.209293 LR: 0.00004415 [05:08:17] Epoch: 1 Batch: 11913/38378 (31.04%) Loss: 2.107514 LR: 0.00004415 [05:08:19] Epoch: 1 Batch: 11914/38378 (31.04%) Loss: 1.899898 LR: 0.00004415 [05:08:20] Epoch: 1 Batch: 11915/38378 (31.05%) Loss: 1.963350 LR: 0.00004415 
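
Note on the LR column through this stretch: it falls by about 1e-8 per batch (0.00004473 near batch 11435 down to 0.00004415 by batch 11915), holding steady for several batches between ticks, a shape consistent with a smooth cosine-style decay printed at fixed precision. A generic sketch of such a schedule follows; every parameter here is a stand-in, not a value read from this excerpt.

import math

def cosine_lr(step: int, lr_max: float, lr_floor: float,
              warmup_steps: int, total_steps: int) -> float:
    """Linear warmup, then cosine decay from lr_max down to lr_floor (generic sketch)."""
    if step < warmup_steps:
        return lr_max * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return lr_floor + 0.5 * (lr_max - lr_floor) * (1.0 + math.cos(math.pi * progress))
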
[05:08:22] Epoch: 1 Batch: 11916/38378 (31.05%) Loss: 1.873603 LR: 0.00004414 [05:08:24] Epoch: 1 Batch: 11917/38378 (31.05%) Loss: 1.635140 LR: 0.00004414 [05:08:26] Epoch: 1 Batch: 11918/38378 (31.05%) Loss: 2.016055 LR: 0.00004414 [05:08:28] Epoch: 1 Batch: 11919/38378 (31.06%) Loss: 2.103798 LR: 0.00004414 [05:08:30] Epoch: 1 Batch: 11920/38378 (31.06%) Loss: 1.902949 LR: 0.00004414 [05:08:31] Epoch: 1 Batch: 11921/38378 (31.06%) Loss: 2.204264 LR: 0.00004414 [05:08:33] Epoch: 1 Batch: 11922/38378 (31.06%) Loss: 1.960532 LR: 0.00004414 [05:08:35] Epoch: 1 Batch: 11923/38378 (31.07%) Loss: 2.073684 LR: 0.00004413 [05:08:37] Epoch: 1 Batch: 11924/38378 (31.07%) Loss: 2.084132 LR: 0.00004413 [05:08:38] Epoch: 1 Batch: 11925/38378 (31.07%) Loss: 1.987643 LR: 0.00004413 [05:08:40] Epoch: 1 Batch: 11926/38378 (31.08%) Loss: 2.412119 LR: 0.00004413 [05:08:42] Epoch: 1 Batch: 11927/38378 (31.08%) Loss: 1.721764 LR: 0.00004413 [05:08:44] Epoch: 1 Batch: 11928/38378 (31.08%) Loss: 2.134800 LR: 0.00004413 [05:08:45] Epoch: 1 Batch: 11929/38378 (31.08%) Loss: 2.084443 LR: 0.00004413 [05:08:47] Epoch: 1 Batch: 11930/38378 (31.09%) Loss: 1.829003 LR: 0.00004412 [05:08:49] Epoch: 1 Batch: 11931/38378 (31.09%) Loss: 1.785618 LR: 0.00004412 [05:08:51] Epoch: 1 Batch: 11932/38378 (31.09%) Loss: 1.934066 LR: 0.00004412 [05:08:53] Epoch: 1 Batch: 11933/38378 (31.09%) Loss: 2.038156 LR: 0.00004412 [05:08:54] Epoch: 1 Batch: 11934/38378 (31.10%) Loss: 1.580063 LR: 0.00004412 [05:08:56] Epoch: 1 Batch: 11935/38378 (31.10%) Loss: 2.049441 LR: 0.00004412 [05:08:58] Epoch: 1 Batch: 11936/38378 (31.10%) Loss: 2.134331 LR: 0.00004412 [05:09:00] Epoch: 1 Batch: 11937/38378 (31.10%) Loss: 2.060289 LR: 0.00004411 [05:09:02] Epoch: 1 Batch: 11938/38378 (31.11%) Loss: 2.097190 LR: 0.00004411 [05:09:04] Epoch: 1 Batch: 11939/38378 (31.11%) Loss: 2.233111 LR: 0.00004411 [05:09:05] Epoch: 1 Batch: 11940/38378 (31.11%) Loss: 1.941742 LR: 0.00004411 [05:09:07] Epoch: 1 Batch: 11941/38378 (31.11%) Loss: 2.195282 LR: 0.00004411 [05:09:09] Epoch: 1 Batch: 11942/38378 (31.12%) Loss: 1.854291 LR: 0.00004411 [05:09:11] Epoch: 1 Batch: 11943/38378 (31.12%) Loss: 1.952884 LR: 0.00004411 [05:09:13] Epoch: 1 Batch: 11944/38378 (31.12%) Loss: 1.782236 LR: 0.00004410 [05:09:14] Epoch: 1 Batch: 11945/38378 (31.12%) Loss: 2.172006 LR: 0.00004410 [05:09:16] Epoch: 1 Batch: 11946/38378 (31.13%) Loss: 2.220444 LR: 0.00004410 [05:09:18] Epoch: 1 Batch: 11947/38378 (31.13%) Loss: 2.241233 LR: 0.00004410 [05:09:20] Epoch: 1 Batch: 11948/38378 (31.13%) Loss: 2.028348 LR: 0.00004410 [05:09:22] Epoch: 1 Batch: 11949/38378 (31.14%) Loss: 2.008491 LR: 0.00004410 [05:09:23] Epoch: 1 Batch: 11950/38378 (31.14%) Loss: 2.186824 LR: 0.00004410 [05:09:25] Epoch: 1 Batch: 11951/38378 (31.14%) Loss: 2.052493 LR: 0.00004410 [05:09:27] Epoch: 1 Batch: 11952/38378 (31.14%) Loss: 2.216157 LR: 0.00004410 [05:09:29] Epoch: 1 Batch: 11953/38378 (31.15%) Loss: 1.809587 LR: 0.00004410 [05:09:31] Epoch: 1 Batch: 11954/38378 (31.15%) Loss: 1.904667 LR: 0.00004410 [05:09:32] Epoch: 1 Batch: 11955/38378 (31.15%) Loss: 1.580501 LR: 0.00004410 [05:09:34] Epoch: 1 Batch: 11956/38378 (31.15%) Loss: 2.141566 LR: 0.00004410 [05:09:36] Epoch: 1 Batch: 11957/38378 (31.16%) Loss: 2.197514 LR: 0.00004410 [05:09:38] Epoch: 1 Batch: 11958/38378 (31.16%) Loss: 2.047429 LR: 0.00004409 [05:09:40] Epoch: 1 Batch: 11959/38378 (31.16%) Loss: 2.172543 LR: 0.00004409 [05:09:41] Epoch: 1 Batch: 11960/38378 (31.16%) Loss: 1.907673 LR: 0.00004409 [05:09:43] Epoch: 1 Batch: 11961/38378 
(31.17%) Loss: 2.049029 LR: 0.00004409 [05:09:45] Epoch: 1 Batch: 11962/38378 (31.17%) Loss: 1.931568 LR: 0.00004409 [05:09:47] Epoch: 1 Batch: 11963/38378 (31.17%) Loss: 2.151119 LR: 0.00004409 [05:09:49] Epoch: 1 Batch: 11964/38378 (31.17%) Loss: 2.113839 LR: 0.00004409 [05:09:51] Epoch: 1 Batch: 11965/38378 (31.18%) Loss: 1.840092 LR: 0.00004408 [05:09:52] Epoch: 1 Batch: 11966/38378 (31.18%) Loss: 1.725379 LR: 0.00004408 [05:09:54] Epoch: 1 Batch: 11967/38378 (31.18%) Loss: 1.982408 LR: 0.00004408 [05:09:56] Epoch: 1 Batch: 11968/38378 (31.18%) Loss: 1.921280 LR: 0.00004408 [05:09:58] Epoch: 1 Batch: 11969/38378 (31.19%) Loss: 2.270618 LR: 0.00004408 [05:10:00] Epoch: 1 Batch: 11970/38378 (31.19%) Loss: 2.147893 LR: 0.00004408 [05:10:01] Epoch: 1 Batch: 11971/38378 (31.19%) Loss: 2.123952 LR: 0.00004408 [05:10:03] Epoch: 1 Batch: 11972/38378 (31.19%) Loss: 1.707603 LR: 0.00004407 [05:10:05] Epoch: 1 Batch: 11973/38378 (31.20%) Loss: 1.975235 LR: 0.00004407 [05:10:07] Epoch: 1 Batch: 11974/38378 (31.20%) Loss: 2.086658 LR: 0.00004407 [05:10:09] Epoch: 1 Batch: 11975/38378 (31.20%) Loss: 2.045553 LR: 0.00004407 [05:10:11] Epoch: 1 Batch: 11976/38378 (31.21%) Loss: 1.788473 LR: 0.00004407 [05:10:12] Epoch: 1 Batch: 11977/38378 (31.21%) Loss: 1.907490 LR: 0.00004407 [05:10:14] Epoch: 1 Batch: 11978/38378 (31.21%) Loss: 1.972914 LR: 0.00004407 [05:10:16] Epoch: 1 Batch: 11979/38378 (31.21%) Loss: 1.846739 LR: 0.00004406 [05:10:18] Epoch: 1 Batch: 11980/38378 (31.22%) Loss: 1.867905 LR: 0.00004406 [05:10:20] Epoch: 1 Batch: 11981/38378 (31.22%) Loss: 1.992080 LR: 0.00004406 [05:10:21] Epoch: 1 Batch: 11982/38378 (31.22%) Loss: 1.909951 LR: 0.00004406 [05:10:23] Epoch: 1 Batch: 11983/38378 (31.22%) Loss: 1.965988 LR: 0.00004406 [05:10:25] Epoch: 1 Batch: 11984/38378 (31.23%) Loss: 2.128244 LR: 0.00004406 [05:10:27] Epoch: 1 Batch: 11985/38378 (31.23%) Loss: 1.936763 LR: 0.00004406 [05:10:29] Epoch: 1 Batch: 11986/38378 (31.23%) Loss: 1.924918 LR: 0.00004405 [05:10:31] Epoch: 1 Batch: 11987/38378 (31.23%) Loss: 1.850901 LR: 0.00004405 [05:10:32] Epoch: 1 Batch: 11988/38378 (31.24%) Loss: 2.146663 LR: 0.00004405 [05:10:34] Epoch: 1 Batch: 11989/38378 (31.24%) Loss: 2.350337 LR: 0.00004405 [05:10:36] Epoch: 1 Batch: 11990/38378 (31.24%) Loss: 1.827909 LR: 0.00004405 [05:10:38] Epoch: 1 Batch: 11991/38378 (31.24%) Loss: 2.055452 LR: 0.00004405 [05:10:40] Epoch: 1 Batch: 11992/38378 (31.25%) Loss: 2.114909 LR: 0.00004405 [05:10:42] Epoch: 1 Batch: 11993/38378 (31.25%) Loss: 1.842170 LR: 0.00004404 [05:10:43] Epoch: 1 Batch: 11994/38378 (31.25%) Loss: 2.122059 LR: 0.00004404 [05:10:45] Epoch: 1 Batch: 11995/38378 (31.25%) Loss: 1.848369 LR: 0.00004404 [05:10:47] Epoch: 1 Batch: 11996/38378 (31.26%) Loss: 2.152650 LR: 0.00004404 [05:10:49] Epoch: 1 Batch: 11997/38378 (31.26%) Loss: 1.754121 LR: 0.00004404 [05:10:51] Epoch: 1 Batch: 11998/38378 (31.26%) Loss: 1.730735 LR: 0.00004404 [05:10:52] Epoch: 1 Batch: 11999/38378 (31.27%) Loss: 1.883859 LR: 0.00004404 [05:10:54] >> Evaluating batch 0 [05:10:55] >> Evaluating batch 1 [05:10:56] >> Evaluating batch 2 [05:10:57] >> Evaluating batch 3 [05:10:58] >> Evaluating batch 4 [05:10:59] >> Evaluating batch 5 [05:11:00] >> Evaluating batch 6 [05:11:01] >> Evaluating batch 7 [05:11:02] >> Evaluating batch 8 [05:11:03] >> Evaluating batch 9 [05:11:04] >> Evaluating batch 10 [05:11:05] >> Evaluating batch 11 [05:11:06] >> Evaluating batch 12 [05:11:07] >> Evaluating batch 13 [05:11:08] >> Evaluating batch 14 [05:11:09] >> Evaluating batch 15 [05:11:10] >> 
Evaluating batch 16 [05:11:11] Epoch: 1 Step: 12000/38378 Evaluation: [05:11:11] Avg Loss Since Last Eval: 1.9943 Val Loss: 2.1058 Validation loss delta: -0.0027 Perplexity: 8.2137 LR: 0.00004403 [05:11:15] >> Cleaned up old temp checkpoint: epoch1_step11000 [05:11:15] >> Temp checkpoint saved: epoch1_step12000, size: 0.1702 GB [05:11:19] >> Checkpoint saved: epoch1_step12000, size: 0.1702 GB [05:11:19] Epoch: 1 Batch: 12000/38378 (31.27%) Loss: 1.791434 LR: 0.00004403 [05:11:21] Epoch: 1 Batch: 12001/38378 (31.27%) Loss: 1.781737 LR: 0.00004403 [05:11:23] Epoch: 1 Batch: 12002/38378 (31.27%) Loss: 1.960440 LR: 0.00004403 [05:11:25] Epoch: 1 Batch: 12003/38378 (31.28%) Loss: 1.819992 LR: 0.00004403 [05:11:26] Epoch: 1 Batch: 12004/38378 (31.28%) Loss: 2.121078 LR: 0.00004403 [05:11:28] Epoch: 1 Batch: 12005/38378 (31.28%) Loss: 1.947011 LR: 0.00004403 [05:11:30] Epoch: 1 Batch: 12006/38378 (31.28%) Loss: 1.926874 LR: 0.00004403 [05:11:32] Epoch: 1 Batch: 12007/38378 (31.29%) Loss: 2.242002 LR: 0.00004402 [05:11:34] Epoch: 1 Batch: 12008/38378 (31.29%) Loss: 1.930101 LR: 0.00004402 [05:11:35] Epoch: 1 Batch: 12009/38378 (31.29%) Loss: 1.962685 LR: 0.00004402 [05:11:37] Epoch: 1 Batch: 12010/38378 (31.29%) Loss: 2.316341 LR: 0.00004402 [05:11:39] Epoch: 1 Batch: 12011/38378 (31.30%) Loss: 1.956787 LR: 0.00004402 [05:11:41] Epoch: 1 Batch: 12012/38378 (31.30%) Loss: 2.041625 LR: 0.00004402 [05:11:43] Epoch: 1 Batch: 12013/38378 (31.30%) Loss: 2.115699 LR: 0.00004402 [05:11:45] Epoch: 1 Batch: 12014/38378 (31.30%) Loss: 2.107552 LR: 0.00004402 [05:11:47] Epoch: 1 Batch: 12015/38378 (31.31%) Loss: 2.355864 LR: 0.00004402 [05:11:48] Epoch: 1 Batch: 12016/38378 (31.31%) Loss: 1.760317 LR: 0.00004402 [05:11:50] Epoch: 1 Batch: 12017/38378 (31.31%) Loss: 1.867715 LR: 0.00004402 [05:11:52] Epoch: 1 Batch: 12018/38378 (31.31%) Loss: 2.053079 LR: 0.00004402 [05:11:54] Epoch: 1 Batch: 12019/38378 (31.32%) Loss: 1.866456 LR: 0.00004402 [05:11:56] Epoch: 1 Batch: 12020/38378 (31.32%) Loss: 2.161433 LR: 0.00004402 [05:11:58] Epoch: 1 Batch: 12021/38378 (31.32%) Loss: 1.931949 LR: 0.00004401 [05:12:00] Epoch: 1 Batch: 12022/38378 (31.33%) Loss: 1.878392 LR: 0.00004401 [05:12:01] Epoch: 1 Batch: 12023/38378 (31.33%) Loss: 1.742640 LR: 0.00004401 [05:12:03] Epoch: 1 Batch: 12024/38378 (31.33%) Loss: 1.819642 LR: 0.00004401 [05:12:05] Epoch: 1 Batch: 12025/38378 (31.33%) Loss: 2.261468 LR: 0.00004401 [05:12:07] Epoch: 1 Batch: 12026/38378 (31.34%) Loss: 1.886078 LR: 0.00004401 [05:12:09] Epoch: 1 Batch: 12027/38378 (31.34%) Loss: 1.961549 LR: 0.00004401 [05:12:11] Epoch: 1 Batch: 12028/38378 (31.34%) Loss: 1.927198 LR: 0.00004400 [05:12:12] Epoch: 1 Batch: 12029/38378 (31.34%) Loss: 1.990342 LR: 0.00004400 [05:12:14] Epoch: 1 Batch: 12030/38378 (31.35%) Loss: 1.940174 LR: 0.00004400 [05:12:16] Epoch: 1 Batch: 12031/38378 (31.35%) Loss: 2.074130 LR: 0.00004400 [05:12:18] Epoch: 1 Batch: 12032/38378 (31.35%) Loss: 2.021276 LR: 0.00004400 [05:12:20] Epoch: 1 Batch: 12033/38378 (31.35%) Loss: 2.150915 LR: 0.00004400 [05:12:21] Epoch: 1 Batch: 12034/38378 (31.36%) Loss: 1.966366 LR: 0.00004400 [05:12:23] Epoch: 1 Batch: 12035/38378 (31.36%) Loss: 1.992848 LR: 0.00004399 [05:12:25] Epoch: 1 Batch: 12036/38378 (31.36%) Loss: 2.127515 LR: 0.00004399 [05:12:27] Epoch: 1 Batch: 12037/38378 (31.36%) Loss: 2.079955 LR: 0.00004399 [05:12:28] Epoch: 1 Batch: 12038/38378 (31.37%) Loss: 1.590835 LR: 0.00004399 [05:12:30] Epoch: 1 Batch: 12039/38378 (31.37%) Loss: 2.414788 LR: 0.00004399 [05:12:32] Epoch: 1 Batch:
12040/38378 (31.37%) Loss: 1.831546 LR: 0.00004399 [05:12:34] Epoch: 1 Batch: 12041/38378 (31.37%) Loss: 2.073987 LR: 0.00004399 [05:12:35] Epoch: 1 Batch: 12042/38378 (31.38%) Loss: 2.086526 LR: 0.00004398 [05:12:37] Epoch: 1 Batch: 12043/38378 (31.38%) Loss: 1.842431 LR: 0.00004398 [05:12:39] Epoch: 1 Batch: 12044/38378 (31.38%) Loss: 2.111396 LR: 0.00004398 [05:12:41] Epoch: 1 Batch: 12045/38378 (31.39%) Loss: 1.804128 LR: 0.00004398 [05:12:43] Epoch: 1 Batch: 12046/38378 (31.39%) Loss: 2.255889 LR: 0.00004398 [05:12:44] Epoch: 1 Batch: 12047/38378 (31.39%) Loss: 1.979248 LR: 0.00004398 [05:12:46] Epoch: 1 Batch: 12048/38378 (31.39%) Loss: 2.192077 LR: 0.00004398 [05:12:48] Epoch: 1 Batch: 12049/38378 (31.40%) Loss: 1.811488 LR: 0.00004397 [05:12:50] Epoch: 1 Batch: 12050/38378 (31.40%) Loss: 1.940724 LR: 0.00004397 [05:12:52] Epoch: 1 Batch: 12051/38378 (31.40%) Loss: 2.271248 LR: 0.00004397 [05:12:53] Epoch: 1 Batch: 12052/38378 (31.40%) Loss: 2.033685 LR: 0.00004397 [05:12:55] Epoch: 1 Batch: 12053/38378 (31.41%) Loss: 2.225204 LR: 0.00004397 [05:12:57] Epoch: 1 Batch: 12054/38378 (31.41%) Loss: 2.011871 LR: 0.00004397 [05:12:59] Epoch: 1 Batch: 12055/38378 (31.41%) Loss: 1.841891 LR: 0.00004397 [05:13:01] Epoch: 1 Batch: 12056/38378 (31.41%) Loss: 1.732756 LR: 0.00004396 [05:13:02] Epoch: 1 Batch: 12057/38378 (31.42%) Loss: 2.006181 LR: 0.00004396 [05:13:04] Epoch: 1 Batch: 12058/38378 (31.42%) Loss: 1.893346 LR: 0.00004396 [05:13:06] Epoch: 1 Batch: 12059/38378 (31.42%) Loss: 1.971758 LR: 0.00004396 [05:13:08] Epoch: 1 Batch: 12060/38378 (31.42%) Loss: 2.383702 LR: 0.00004396 [05:13:10] Epoch: 1 Batch: 12061/38378 (31.43%) Loss: 1.929614 LR: 0.00004396 [05:13:11] Epoch: 1 Batch: 12062/38378 (31.43%) Loss: 1.848964 LR: 0.00004396 [05:13:13] Epoch: 1 Batch: 12063/38378 (31.43%) Loss: 1.852582 LR: 0.00004395 [05:13:15] Epoch: 1 Batch: 12064/38378 (31.43%) Loss: 1.973766 LR: 0.00004395 [05:13:17] Epoch: 1 Batch: 12065/38378 (31.44%) Loss: 2.038484 LR: 0.00004395 [05:13:19] Epoch: 1 Batch: 12066/38378 (31.44%) Loss: 2.309019 LR: 0.00004395 [05:13:20] Epoch: 1 Batch: 12067/38378 (31.44%) Loss: 2.064737 LR: 0.00004395 [05:13:22] Epoch: 1 Batch: 12068/38378 (31.45%) Loss: 2.332461 LR: 0.00004395 [05:13:24] Epoch: 1 Batch: 12069/38378 (31.45%) Loss: 1.921077 LR: 0.00004395 [05:13:26] Epoch: 1 Batch: 12070/38378 (31.45%) Loss: 2.086100 LR: 0.00004394 [05:13:28] Epoch: 1 Batch: 12071/38378 (31.45%) Loss: 2.208064 LR: 0.00004394 [05:13:30] Epoch: 1 Batch: 12072/38378 (31.46%) Loss: 2.019003 LR: 0.00004394 [05:13:31] Epoch: 1 Batch: 12073/38378 (31.46%) Loss: 2.190462 LR: 0.00004394 [05:13:33] Epoch: 1 Batch: 12074/38378 (31.46%) Loss: 2.110478 LR: 0.00004394 [05:13:35] Epoch: 1 Batch: 12075/38378 (31.46%) Loss: 1.817851 LR: 0.00004394 [05:13:37] Epoch: 1 Batch: 12076/38378 (31.47%) Loss: 2.079973 LR: 0.00004394 [05:13:39] Epoch: 1 Batch: 12077/38378 (31.47%) Loss: 2.161837 LR: 0.00004394 [05:13:41] Epoch: 1 Batch: 12078/38378 (31.47%) Loss: 2.101977 LR: 0.00004394 [05:13:42] Epoch: 1 Batch: 12079/38378 (31.47%) Loss: 1.897062 LR: 0.00004394 [05:13:44] Epoch: 1 Batch: 12080/38378 (31.48%) Loss: 1.968110 LR: 0.00004394 [05:13:46] Epoch: 1 Batch: 12081/38378 (31.48%) Loss: 2.179661 LR: 0.00004394 [05:13:48] Epoch: 1 Batch: 12082/38378 (31.48%) Loss: 2.069724 LR: 0.00004394 [05:13:50] Epoch: 1 Batch: 12083/38378 (31.48%) Loss: 2.218483 LR: 0.00004394 [05:13:51] Epoch: 1 Batch: 12084/38378 (31.49%) Loss: 1.813509 LR: 0.00004393 [05:13:53] Epoch: 1 Batch: 12085/38378 (31.49%) Loss: 1.895688 LR: 
0.00004393 [05:13:55] Epoch: 1 Batch: 12086/38378 (31.49%) Loss: 1.953041 LR: 0.00004393 [05:13:57] Epoch: 1 Batch: 12087/38378 (31.49%) Loss: 1.914030 LR: 0.00004393 [05:13:59] Epoch: 1 Batch: 12088/38378 (31.50%) Loss: 1.916330 LR: 0.00004393 [05:14:00] Epoch: 1 Batch: 12089/38378 (31.50%) Loss: 2.107705 LR: 0.00004393 [05:14:02] Epoch: 1 Batch: 12090/38378 (31.50%) Loss: 1.936033 LR: 0.00004393 [05:14:04] Epoch: 1 Batch: 12091/38378 (31.51%) Loss: 1.780372 LR: 0.00004392 [05:14:06] Epoch: 1 Batch: 12092/38378 (31.51%) Loss: 2.216229 LR: 0.00004392 [05:14:08] Epoch: 1 Batch: 12093/38378 (31.51%) Loss: 2.405192 LR: 0.00004392 [05:14:09] Epoch: 1 Batch: 12094/38378 (31.51%) Loss: 2.167510 LR: 0.00004392 [05:14:11] Epoch: 1 Batch: 12095/38378 (31.52%) Loss: 1.729024 LR: 0.00004392 [05:14:13] Epoch: 1 Batch: 12096/38378 (31.52%) Loss: 2.128612 LR: 0.00004392 [05:14:15] Epoch: 1 Batch: 12097/38378 (31.52%) Loss: 1.918056 LR: 0.00004392 [05:14:17] Epoch: 1 Batch: 12098/38378 (31.52%) Loss: 1.849672 LR: 0.00004391 [05:14:19] Epoch: 1 Batch: 12099/38378 (31.53%) Loss: 2.273434 LR: 0.00004391 [05:14:25] >> Cleaned up old temp checkpoint: epoch1_step11100 [05:14:25] >> Temp checkpoint saved: epoch1_step12100, size: 0.1702 GB [05:14:25] Epoch: 1 Batch: 12100/38378 (31.53%) Loss: 1.884856 LR: 0.00004391 [05:14:27] Epoch: 1 Batch: 12101/38378 (31.53%) Loss: 2.092158 LR: 0.00004391 [05:14:28] Epoch: 1 Batch: 12102/38378 (31.53%) Loss: 1.929952 LR: 0.00004391 [05:14:30] Epoch: 1 Batch: 12103/38378 (31.54%) Loss: 1.939745 LR: 0.00004391 [05:14:32] Epoch: 1 Batch: 12104/38378 (31.54%) Loss: 2.305527 LR: 0.00004391 [05:14:34] Epoch: 1 Batch: 12105/38378 (31.54%) Loss: 1.696580 LR: 0.00004390 [05:14:36] Epoch: 1 Batch: 12106/38378 (31.54%) Loss: 2.291732 LR: 0.00004390 [05:14:37] Epoch: 1 Batch: 12107/38378 (31.55%) Loss: 1.980599 LR: 0.00004390 [05:14:39] Epoch: 1 Batch: 12108/38378 (31.55%) Loss: 2.060978 LR: 0.00004390 [05:14:41] Epoch: 1 Batch: 12109/38378 (31.55%) Loss: 1.879049 LR: 0.00004390 [05:14:43] Epoch: 1 Batch: 12110/38378 (31.55%) Loss: 2.198860 LR: 0.00004390 [05:14:45] Epoch: 1 Batch: 12111/38378 (31.56%) Loss: 2.052633 LR: 0.00004390 [05:14:46] Epoch: 1 Batch: 12112/38378 (31.56%) Loss: 2.212084 LR: 0.00004389 [05:14:48] Epoch: 1 Batch: 12113/38378 (31.56%) Loss: 1.804014 LR: 0.00004389 [05:14:50] Epoch: 1 Batch: 12114/38378 (31.56%) Loss: 2.065512 LR: 0.00004389 [05:14:52] Epoch: 1 Batch: 12115/38378 (31.57%) Loss: 2.134235 LR: 0.00004389 [05:14:54] Epoch: 1 Batch: 12116/38378 (31.57%) Loss: 1.898964 LR: 0.00004389 [05:14:56] Epoch: 1 Batch: 12117/38378 (31.57%) Loss: 1.896461 LR: 0.00004389 [05:14:57] Epoch: 1 Batch: 12118/38378 (31.58%) Loss: 2.081874 LR: 0.00004389 [05:14:59] Epoch: 1 Batch: 12119/38378 (31.58%) Loss: 2.079839 LR: 0.00004388 [05:15:01] Epoch: 1 Batch: 12120/38378 (31.58%) Loss: 1.818967 LR: 0.00004388 [05:15:03] Epoch: 1 Batch: 12121/38378 (31.58%) Loss: 2.243310 LR: 0.00004388 [05:15:05] Epoch: 1 Batch: 12122/38378 (31.59%) Loss: 1.913928 LR: 0.00004388 [05:15:07] Epoch: 1 Batch: 12123/38378 (31.59%) Loss: 1.739178 LR: 0.00004388 [05:15:08] Epoch: 1 Batch: 12124/38378 (31.59%) Loss: 2.153920 LR: 0.00004388 [05:15:10] Epoch: 1 Batch: 12125/38378 (31.59%) Loss: 2.064757 LR: 0.00004388 [05:15:12] Epoch: 1 Batch: 12126/38378 (31.60%) Loss: 2.176688 LR: 0.00004387 [05:15:14] Epoch: 1 Batch: 12127/38378 (31.60%) Loss: 2.050175 LR: 0.00004387 [05:15:16] Epoch: 1 Batch: 12128/38378 (31.60%) Loss: 2.020490 LR: 0.00004387 [05:15:17] Epoch: 1 Batch: 12129/38378 (31.60%) Loss: 
1.969374 LR: 0.00004387 [05:15:19] Epoch: 1 Batch: 12130/38378 (31.61%) Loss: 1.869683 LR: 0.00004387 [05:15:21] Epoch: 1 Batch: 12131/38378 (31.61%) Loss: 2.145775 LR: 0.00004387 [05:15:23] Epoch: 1 Batch: 12132/38378 (31.61%) Loss: 1.626756 LR: 0.00004387 [05:15:25] Epoch: 1 Batch: 12133/38378 (31.61%) Loss: 1.889442 LR: 0.00004386 [05:15:26] Epoch: 1 Batch: 12134/38378 (31.62%) Loss: 1.989768 LR: 0.00004386 [05:15:28] Epoch: 1 Batch: 12135/38378 (31.62%) Loss: 2.057600 LR: 0.00004386 [05:15:30] Epoch: 1 Batch: 12136/38378 (31.62%) Loss: 1.845122 LR: 0.00004386 [05:15:32] Epoch: 1 Batch: 12137/38378 (31.62%) Loss: 1.765642 LR: 0.00004386 [05:15:34] Epoch: 1 Batch: 12138/38378 (31.63%) Loss: 1.854523 LR: 0.00004386 [05:15:36] Epoch: 1 Batch: 12139/38378 (31.63%) Loss: 2.013038 LR: 0.00004386 [05:15:37] Epoch: 1 Batch: 12140/38378 (31.63%) Loss: 2.163667 LR: 0.00004386 [05:15:39] Epoch: 1 Batch: 12141/38378 (31.64%) Loss: 2.283928 LR: 0.00004386 [05:15:41] Epoch: 1 Batch: 12142/38378 (31.64%) Loss: 2.114181 LR: 0.00004386 [05:15:43] Epoch: 1 Batch: 12143/38378 (31.64%) Loss: 1.906937 LR: 0.00004386 [05:15:45] Epoch: 1 Batch: 12144/38378 (31.64%) Loss: 2.068449 LR: 0.00004386 [05:15:46] Epoch: 1 Batch: 12145/38378 (31.65%) Loss: 1.844597 LR: 0.00004386 [05:15:48] Epoch: 1 Batch: 12146/38378 (31.65%) Loss: 1.996884 LR: 0.00004386 [05:15:50] Epoch: 1 Batch: 12147/38378 (31.65%) Loss: 2.028602 LR: 0.00004385 [05:15:52] Epoch: 1 Batch: 12148/38378 (31.65%) Loss: 1.992551 LR: 0.00004385 [05:15:54] Epoch: 1 Batch: 12149/38378 (31.66%) Loss: 1.861817 LR: 0.00004385 [05:15:55] Epoch: 1 Batch: 12150/38378 (31.66%) Loss: 1.983564 LR: 0.00004385 [05:15:57] Epoch: 1 Batch: 12151/38378 (31.66%) Loss: 2.157783 LR: 0.00004385 [05:15:59] Epoch: 1 Batch: 12152/38378 (31.66%) Loss: 1.958545 LR: 0.00004385 [05:16:01] Epoch: 1 Batch: 12153/38378 (31.67%) Loss: 2.056596 LR: 0.00004385 [05:16:03] Epoch: 1 Batch: 12154/38378 (31.67%) Loss: 2.176941 LR: 0.00004384 [05:16:04] Epoch: 1 Batch: 12155/38378 (31.67%) Loss: 1.954977 LR: 0.00004384 [05:16:06] Epoch: 1 Batch: 12156/38378 (31.67%) Loss: 1.784627 LR: 0.00004384 [05:16:08] Epoch: 1 Batch: 12157/38378 (31.68%) Loss: 1.876926 LR: 0.00004384 [05:16:10] Epoch: 1 Batch: 12158/38378 (31.68%) Loss: 2.215370 LR: 0.00004384 [05:16:12] Epoch: 1 Batch: 12159/38378 (31.68%) Loss: 2.040754 LR: 0.00004384 [05:16:13] Epoch: 1 Batch: 12160/38378 (31.68%) Loss: 1.683824 LR: 0.00004384 [05:16:15] Epoch: 1 Batch: 12161/38378 (31.69%) Loss: 2.009882 LR: 0.00004383 [05:16:17] Epoch: 1 Batch: 12162/38378 (31.69%) Loss: 2.305569 LR: 0.00004383 [05:16:19] Epoch: 1 Batch: 12163/38378 (31.69%) Loss: 2.124645 LR: 0.00004383 [05:16:21] Epoch: 1 Batch: 12164/38378 (31.70%) Loss: 1.766515 LR: 0.00004383 [05:16:23] Epoch: 1 Batch: 12165/38378 (31.70%) Loss: 1.866429 LR: 0.00004383 [05:16:24] Epoch: 1 Batch: 12166/38378 (31.70%) Loss: 1.808388 LR: 0.00004383 [05:16:26] Epoch: 1 Batch: 12167/38378 (31.70%) Loss: 1.885113 LR: 0.00004383 [05:16:28] Epoch: 1 Batch: 12168/38378 (31.71%) Loss: 2.030340 LR: 0.00004382 [05:16:30] Epoch: 1 Batch: 12169/38378 (31.71%) Loss: 2.113393 LR: 0.00004382 [05:16:32] Epoch: 1 Batch: 12170/38378 (31.71%) Loss: 1.791507 LR: 0.00004382 [05:16:33] Epoch: 1 Batch: 12171/38378 (31.71%) Loss: 2.203997 LR: 0.00004382 [05:16:35] Epoch: 1 Batch: 12172/38378 (31.72%) Loss: 1.802671 LR: 0.00004382 [05:16:37] Epoch: 1 Batch: 12173/38378 (31.72%) Loss: 1.861005 LR: 0.00004382 [05:16:39] Epoch: 1 Batch: 12174/38378 (31.72%) Loss: 2.157822 LR: 0.00004382 [05:16:41] Epoch: 1 
Batch: 12175/38378 (31.72%) Loss: 2.279574 LR: 0.00004381 [05:16:43] Epoch: 1 Batch: 12176/38378 (31.73%) Loss: 1.834906 LR: 0.00004381 [05:16:44] Epoch: 1 Batch: 12177/38378 (31.73%) Loss: 2.094611 LR: 0.00004381 [05:16:46] Epoch: 1 Batch: 12178/38378 (31.73%) Loss: 1.980551 LR: 0.00004381 [05:16:48] Epoch: 1 Batch: 12179/38378 (31.73%) Loss: 2.156528 LR: 0.00004381 [05:16:50] Epoch: 1 Batch: 12180/38378 (31.74%) Loss: 2.168126 LR: 0.00004381 [05:16:52] Epoch: 1 Batch: 12181/38378 (31.74%) Loss: 1.972483 LR: 0.00004381 [05:16:54] Epoch: 1 Batch: 12182/38378 (31.74%) Loss: 2.185251 LR: 0.00004380 [05:16:55] Epoch: 1 Batch: 12183/38378 (31.74%) Loss: 1.738476 LR: 0.00004380 [05:16:57] Epoch: 1 Batch: 12184/38378 (31.75%) Loss: 2.072214 LR: 0.00004380 [05:16:59] Epoch: 1 Batch: 12185/38378 (31.75%) Loss: 1.625641 LR: 0.00004380 [05:17:01] Epoch: 1 Batch: 12186/38378 (31.75%) Loss: 2.181026 LR: 0.00004380 [05:17:03] Epoch: 1 Batch: 12187/38378 (31.76%) Loss: 1.860287 LR: 0.00004380 [05:17:04] Epoch: 1 Batch: 12188/38378 (31.76%) Loss: 2.228248 LR: 0.00004380 [05:17:06] Epoch: 1 Batch: 12189/38378 (31.76%) Loss: 2.148637 LR: 0.00004379 [05:17:08] Epoch: 1 Batch: 12190/38378 (31.76%) Loss: 1.979260 LR: 0.00004379 [05:17:10] Epoch: 1 Batch: 12191/38378 (31.77%) Loss: 2.154781 LR: 0.00004379 [05:17:12] Epoch: 1 Batch: 12192/38378 (31.77%) Loss: 2.277178 LR: 0.00004379 [05:17:13] Epoch: 1 Batch: 12193/38378 (31.77%) Loss: 2.120206 LR: 0.00004379 [05:17:15] Epoch: 1 Batch: 12194/38378 (31.77%) Loss: 2.259636 LR: 0.00004379 [05:17:17] Epoch: 1 Batch: 12195/38378 (31.78%) Loss: 2.039780 LR: 0.00004379 [05:17:19] Epoch: 1 Batch: 12196/38378 (31.78%) Loss: 2.066031 LR: 0.00004378 [05:17:21] Epoch: 1 Batch: 12197/38378 (31.78%) Loss: 2.123971 LR: 0.00004378 [05:17:22] Epoch: 1 Batch: 12198/38378 (31.78%) Loss: 1.953175 LR: 0.00004378 [05:17:24] Epoch: 1 Batch: 12199/38378 (31.79%) Loss: 2.029061 LR: 0.00004378 [05:17:30] >> Cleaned up old temp checkpoint: epoch1_step11200 [05:17:30] >> Temp checkpoint saved: epoch1_step12200, size: 0.1702 GB [05:17:30] Epoch: 1 Batch: 12200/38378 (31.79%) Loss: 2.062846 LR: 0.00004378 [05:17:32] Epoch: 1 Batch: 12201/38378 (31.79%) Loss: 2.004211 LR: 0.00004378 [05:17:34] Epoch: 1 Batch: 12202/38378 (31.79%) Loss: 1.964122 LR: 0.00004378 [05:17:36] Epoch: 1 Batch: 12203/38378 (31.80%) Loss: 2.020055 LR: 0.00004377 [05:17:37] Epoch: 1 Batch: 12204/38378 (31.80%) Loss: 1.943110 LR: 0.00004377 [05:17:39] Epoch: 1 Batch: 12205/38378 (31.80%) Loss: 2.056023 LR: 0.00004377 [05:17:41] Epoch: 1 Batch: 12206/38378 (31.80%) Loss: 1.760903 LR: 0.00004377 [05:17:43] Epoch: 1 Batch: 12207/38378 (31.81%) Loss: 2.188727 LR: 0.00004377 [05:17:45] Epoch: 1 Batch: 12208/38378 (31.81%) Loss: 1.764297 LR: 0.00004377 [05:17:46] Epoch: 1 Batch: 12209/38378 (31.81%) Loss: 1.997919 LR: 0.00004377 [05:17:48] Epoch: 1 Batch: 12210/38378 (31.82%) Loss: 1.655475 LR: 0.00004377 [05:17:50] Epoch: 1 Batch: 12211/38378 (31.82%) Loss: 2.447255 LR: 0.00004377 [05:17:52] Epoch: 1 Batch: 12212/38378 (31.82%) Loss: 1.915176 LR: 0.00004377 [05:17:54] Epoch: 1 Batch: 12213/38378 (31.82%) Loss: 2.109216 LR: 0.00004377 [05:17:55] Epoch: 1 Batch: 12214/38378 (31.83%) Loss: 2.124915 LR: 0.00004377 [05:17:57] Epoch: 1 Batch: 12215/38378 (31.83%) Loss: 1.998139 LR: 0.00004377 [05:17:59] Epoch: 1 Batch: 12216/38378 (31.83%) Loss: 1.949497 LR: 0.00004377 [05:18:01] Epoch: 1 Batch: 12217/38378 (31.83%) Loss: 2.153333 LR: 0.00004376 [05:18:03] Epoch: 1 Batch: 12218/38378 (31.84%) Loss: 2.113615 LR: 0.00004376 
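
The two evaluation summaries in this excerpt are internally consistent: the step-12000 report shows Val Loss 2.1058 against 2.1085 at step 11500, matching the printed delta of -0.0027, and the reported perplexity is simply the exponential of the validation loss, exp(2.1058) ≈ 8.2137 and exp(2.1085) ≈ 8.236 (the small residual at step 11500 comes from the loss being rounded to four decimals). A quick check, not taken from the training script:

import math

# Compare exp(val_loss) with the perplexities logged in the evaluation summaries above.
for val_loss, logged_ppl in [(2.1085, 8.2363), (2.1058, 8.2137)]:
    print(f"exp({val_loss}) = {math.exp(val_loss):.4f} (logged: {logged_ppl})")
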
[05:18:05] Epoch: 1 Batch: 12219/38378 (31.84%) Loss: 2.091295 LR: 0.00004376 [05:18:06] Epoch: 1 Batch: 12220/38378 (31.84%) Loss: 2.094933 LR: 0.00004376 [05:18:08] Epoch: 1 Batch: 12221/38378 (31.84%) Loss: 1.936688 LR: 0.00004376 [05:18:10] Epoch: 1 Batch: 12222/38378 (31.85%) Loss: 2.002077 LR: 0.00004376 [05:18:12] Epoch: 1 Batch: 12223/38378 (31.85%) Loss: 2.296031 LR: 0.00004376 [05:18:14] Epoch: 1 Batch: 12224/38378 (31.85%) Loss: 2.113791 LR: 0.00004375 [05:18:16] Epoch: 1 Batch: 12225/38378 (31.85%) Loss: 1.668298 LR: 0.00004375 [05:18:17] Epoch: 1 Batch: 12226/38378 (31.86%) Loss: 2.040432 LR: 0.00004375 [05:18:19] Epoch: 1 Batch: 12227/38378 (31.86%) Loss: 1.935697 LR: 0.00004375 [05:18:21] Epoch: 1 Batch: 12228/38378 (31.86%) Loss: 2.308139 LR: 0.00004375 [05:18:23] Epoch: 1 Batch: 12229/38378 (31.86%) Loss: 2.045200 LR: 0.00004375 [05:18:25] Epoch: 1 Batch: 12230/38378 (31.87%) Loss: 2.002863 LR: 0.00004375 [05:18:26] Epoch: 1 Batch: 12231/38378 (31.87%) Loss: 1.849611 LR: 0.00004374 [05:18:28] Epoch: 1 Batch: 12232/38378 (31.87%) Loss: 1.902037 LR: 0.00004374 [05:18:30] Epoch: 1 Batch: 12233/38378 (31.88%) Loss: 1.771637 LR: 0.00004374 [05:18:32] Epoch: 1 Batch: 12234/38378 (31.88%) Loss: 2.146053 LR: 0.00004374 [05:18:34] Epoch: 1 Batch: 12235/38378 (31.88%) Loss: 2.148502 LR: 0.00004374 [05:18:36] Epoch: 1 Batch: 12236/38378 (31.88%) Loss: 2.132402 LR: 0.00004374 [05:18:37] Epoch: 1 Batch: 12237/38378 (31.89%) Loss: 1.997552 LR: 0.00004374 [05:18:39] Epoch: 1 Batch: 12238/38378 (31.89%) Loss: 2.072918 LR: 0.00004373 [05:18:41] Epoch: 1 Batch: 12239/38378 (31.89%) Loss: 1.729553 LR: 0.00004373 [05:18:43] Epoch: 1 Batch: 12240/38378 (31.89%) Loss: 2.119332 LR: 0.00004373 [05:18:45] Epoch: 1 Batch: 12241/38378 (31.90%) Loss: 1.900432 LR: 0.00004373 [05:18:46] Epoch: 1 Batch: 12242/38378 (31.90%) Loss: 2.122901 LR: 0.00004373 [05:18:48] Epoch: 1 Batch: 12243/38378 (31.90%) Loss: 2.009678 LR: 0.00004373 [05:18:50] Epoch: 1 Batch: 12244/38378 (31.90%) Loss: 1.786478 LR: 0.00004373 [05:18:52] Epoch: 1 Batch: 12245/38378 (31.91%) Loss: 2.034317 LR: 0.00004372 [05:18:54] Epoch: 1 Batch: 12246/38378 (31.91%) Loss: 1.960828 LR: 0.00004372 [05:18:55] Epoch: 1 Batch: 12247/38378 (31.91%) Loss: 2.317157 LR: 0.00004372 [05:18:57] Epoch: 1 Batch: 12248/38378 (31.91%) Loss: 1.986635 LR: 0.00004372 [05:18:59] Epoch: 1 Batch: 12249/38378 (31.92%) Loss: 2.190874 LR: 0.00004372 [05:19:01] Epoch: 1 Batch: 12250/38378 (31.92%) Loss: 2.106755 LR: 0.00004372 [05:19:02] Epoch: 1 Batch: 12251/38378 (31.92%) Loss: 1.853218 LR: 0.00004372 [05:19:04] Epoch: 1 Batch: 12252/38378 (31.92%) Loss: 2.110418 LR: 0.00004371 [05:19:06] Epoch: 1 Batch: 12253/38378 (31.93%) Loss: 2.013710 LR: 0.00004371 [05:19:08] Epoch: 1 Batch: 12254/38378 (31.93%) Loss: 1.824331 LR: 0.00004371 [05:19:10] Epoch: 1 Batch: 12255/38378 (31.93%) Loss: 1.843964 LR: 0.00004371 [05:19:11] Epoch: 1 Batch: 12256/38378 (31.93%) Loss: 2.070307 LR: 0.00004371 [05:19:13] Epoch: 1 Batch: 12257/38378 (31.94%) Loss: 1.610954 LR: 0.00004371 [05:19:15] Epoch: 1 Batch: 12258/38378 (31.94%) Loss: 2.275452 LR: 0.00004371 [05:19:17] Epoch: 1 Batch: 12259/38378 (31.94%) Loss: 1.852249 LR: 0.00004370 [05:19:19] Epoch: 1 Batch: 12260/38378 (31.95%) Loss: 2.435603 LR: 0.00004370 [05:19:20] Epoch: 1 Batch: 12261/38378 (31.95%) Loss: 2.166855 LR: 0.00004370 [05:19:22] Epoch: 1 Batch: 12262/38378 (31.95%) Loss: 2.141924 LR: 0.00004370 [05:19:24] Epoch: 1 Batch: 12263/38378 (31.95%) Loss: 2.042361 LR: 0.00004370 [05:19:26] Epoch: 1 Batch: 12264/38378 
(31.96%) Loss: 1.853474 LR: 0.00004370 [05:19:28] Epoch: 1 Batch: 12265/38378 (31.96%) Loss: 2.158285 LR: 0.00004370 [05:19:29] Epoch: 1 Batch: 12266/38378 (31.96%) Loss: 2.058359 LR: 0.00004369 [05:19:31] Epoch: 1 Batch: 12267/38378 (31.96%) Loss: 1.846199 LR: 0.00004369 [05:19:33] Epoch: 1 Batch: 12268/38378 (31.97%) Loss: 2.120624 LR: 0.00004369 [05:19:35] Epoch: 1 Batch: 12269/38378 (31.97%) Loss: 2.096767 LR: 0.00004369 [05:19:37] Epoch: 1 Batch: 12270/38378 (31.97%) Loss: 1.957342 LR: 0.00004369 [05:19:39] Epoch: 1 Batch: 12271/38378 (31.97%) Loss: 2.062350 LR: 0.00004369 [05:19:40] Epoch: 1 Batch: 12272/38378 (31.98%) Loss: 1.918037 LR: 0.00004369 [05:19:42] Epoch: 1 Batch: 12273/38378 (31.98%) Loss: 2.327722 LR: 0.00004368 [05:19:44] Epoch: 1 Batch: 12274/38378 (31.98%) Loss: 1.919617 LR: 0.00004368 [05:19:46] Epoch: 1 Batch: 12275/38378 (31.98%) Loss: 1.927967 LR: 0.00004368 [05:19:47] Epoch: 1 Batch: 12276/38378 (31.99%) Loss: 2.375907 LR: 0.00004368 [05:19:49] Epoch: 1 Batch: 12277/38378 (31.99%) Loss: 1.967379 LR: 0.00004368 [05:19:51] Epoch: 1 Batch: 12278/38378 (31.99%) Loss: 2.050709 LR: 0.00004368 [05:19:53] Epoch: 1 Batch: 12279/38378 (31.99%) Loss: 1.637303 LR: 0.00004368 [05:19:55] Epoch: 1 Batch: 12280/38378 (32.00%) Loss: 1.949004 LR: 0.00004367 [05:19:56] Epoch: 1 Batch: 12281/38378 (32.00%) Loss: 2.031947 LR: 0.00004367 [05:19:58] Epoch: 1 Batch: 12282/38378 (32.00%) Loss: 1.809831 LR: 0.00004367 [05:20:00] Epoch: 1 Batch: 12283/38378 (32.01%) Loss: 2.015952 LR: 0.00004367 [05:20:02] Epoch: 1 Batch: 12284/38378 (32.01%) Loss: 2.205187 LR: 0.00004367 [05:20:03] Epoch: 1 Batch: 12285/38378 (32.01%) Loss: 1.688195 LR: 0.00004367 [05:20:05] Epoch: 1 Batch: 12286/38378 (32.01%) Loss: 2.155423 LR: 0.00004367 [05:20:07] Epoch: 1 Batch: 12287/38378 (32.02%) Loss: 2.202345 LR: 0.00004367 [05:20:09] Epoch: 1 Batch: 12288/38378 (32.02%) Loss: 1.783881 LR: 0.00004367 [05:20:11] Epoch: 1 Batch: 12289/38378 (32.02%) Loss: 1.906960 LR: 0.00004367 [05:20:13] Epoch: 1 Batch: 12290/38378 (32.02%) Loss: 1.882772 LR: 0.00004367 [05:20:14] Epoch: 1 Batch: 12291/38378 (32.03%) Loss: 2.214571 LR: 0.00004367 [05:20:16] Epoch: 1 Batch: 12292/38378 (32.03%) Loss: 1.892631 LR: 0.00004367 [05:20:18] Epoch: 1 Batch: 12293/38378 (32.03%) Loss: 2.257297 LR: 0.00004367 [05:20:20] Epoch: 1 Batch: 12294/38378 (32.03%) Loss: 2.141119 LR: 0.00004366 [05:20:22] Epoch: 1 Batch: 12295/38378 (32.04%) Loss: 1.909805 LR: 0.00004366 [05:20:23] Epoch: 1 Batch: 12296/38378 (32.04%) Loss: 2.117818 LR: 0.00004366 [05:20:25] Epoch: 1 Batch: 12297/38378 (32.04%) Loss: 1.870207 LR: 0.00004366 [05:20:27] Epoch: 1 Batch: 12298/38378 (32.04%) Loss: 1.963944 LR: 0.00004366 [05:20:29] Epoch: 1 Batch: 12299/38378 (32.05%) Loss: 1.904690 LR: 0.00004366 [05:20:35] >> Cleaned up old temp checkpoint: epoch1_step11300 [05:20:35] >> Temp checkpoint saved: epoch1_step12300, size: 0.1702 GB [05:20:35] Epoch: 1 Batch: 12300/38378 (32.05%) Loss: 1.959435 LR: 0.00004366 [05:20:37] Epoch: 1 Batch: 12301/38378 (32.05%) Loss: 2.218076 LR: 0.00004365 [05:20:39] Epoch: 1 Batch: 12302/38378 (32.05%) Loss: 1.937176 LR: 0.00004365 [05:20:40] Epoch: 1 Batch: 12303/38378 (32.06%) Loss: 2.094373 LR: 0.00004365 [05:20:42] Epoch: 1 Batch: 12304/38378 (32.06%) Loss: 2.093828 LR: 0.00004365 [05:20:44] Epoch: 1 Batch: 12305/38378 (32.06%) Loss: 2.058388 LR: 0.00004365 [05:20:46] Epoch: 1 Batch: 12306/38378 (32.07%) Loss: 1.992536 LR: 0.00004365 [05:20:48] Epoch: 1 Batch: 12307/38378 (32.07%) Loss: 2.103603 LR: 0.00004365 [05:20:50] Epoch: 1 Batch: 
12308/38378 (32.07%) Loss: 2.173274 LR: 0.00004364 [05:20:51] Epoch: 1 Batch: 12309/38378 (32.07%) Loss: 2.217428 LR: 0.00004364 [05:20:53] Epoch: 1 Batch: 12310/38378 (32.08%) Loss: 2.030618 LR: 0.00004364 [05:20:55] Epoch: 1 Batch: 12311/38378 (32.08%) Loss: 2.128920 LR: 0.00004364 [05:20:57] Epoch: 1 Batch: 12312/38378 (32.08%) Loss: 2.159483 LR: 0.00004364 [05:20:59] Epoch: 1 Batch: 12313/38378 (32.08%) Loss: 2.181360 LR: 0.00004364 [05:21:00] Epoch: 1 Batch: 12314/38378 (32.09%) Loss: 1.981001 LR: 0.00004364 [05:21:02] Epoch: 1 Batch: 12315/38378 (32.09%) Loss: 2.161718 LR: 0.00004363 [05:21:04] Epoch: 1 Batch: 12316/38378 (32.09%) Loss: 1.773004 LR: 0.00004363 [05:21:06] Epoch: 1 Batch: 12317/38378 (32.09%) Loss: 2.109433 LR: 0.00004363 [05:21:08] Epoch: 1 Batch: 12318/38378 (32.10%) Loss: 2.083652 LR: 0.00004363 [05:21:09] Epoch: 1 Batch: 12319/38378 (32.10%) Loss: 2.087727 LR: 0.00004363 [05:21:11] Epoch: 1 Batch: 12320/38378 (32.10%) Loss: 2.273841 LR: 0.00004363 [05:21:13] Epoch: 1 Batch: 12321/38378 (32.10%) Loss: 2.367253 LR: 0.00004363 [05:21:15] Epoch: 1 Batch: 12322/38378 (32.11%) Loss: 2.080187 LR: 0.00004362 [05:21:17] Epoch: 1 Batch: 12323/38378 (32.11%) Loss: 1.891174 LR: 0.00004362 [05:21:19] Epoch: 1 Batch: 12324/38378 (32.11%) Loss: 1.919132 LR: 0.00004362 [05:21:20] Epoch: 1 Batch: 12325/38378 (32.11%) Loss: 2.230711 LR: 0.00004362 [05:21:22] Epoch: 1 Batch: 12326/38378 (32.12%) Loss: 2.028748 LR: 0.00004362 [05:21:24] Epoch: 1 Batch: 12327/38378 (32.12%) Loss: 1.988536 LR: 0.00004362 [05:21:26] Epoch: 1 Batch: 12328/38378 (32.12%) Loss: 1.841469 LR: 0.00004362 [05:21:28] Epoch: 1 Batch: 12329/38378 (32.13%) Loss: 1.872429 LR: 0.00004361 [05:21:30] Epoch: 1 Batch: 12330/38378 (32.13%) Loss: 1.799243 LR: 0.00004361 [05:21:31] Epoch: 1 Batch: 12331/38378 (32.13%) Loss: 1.828331 LR: 0.00004361 [05:21:33] Epoch: 1 Batch: 12332/38378 (32.13%) Loss: 1.868400 LR: 0.00004361 [05:21:35] Epoch: 1 Batch: 12333/38378 (32.14%) Loss: 2.005418 LR: 0.00004361 [05:21:37] Epoch: 1 Batch: 12334/38378 (32.14%) Loss: 2.344656 LR: 0.00004361 [05:21:39] Epoch: 1 Batch: 12335/38378 (32.14%) Loss: 1.933562 LR: 0.00004361 [05:21:40] Epoch: 1 Batch: 12336/38378 (32.14%) Loss: 1.893331 LR: 0.00004360 [05:21:42] Epoch: 1 Batch: 12337/38378 (32.15%) Loss: 1.932212 LR: 0.00004360 [05:21:44] Epoch: 1 Batch: 12338/38378 (32.15%) Loss: 1.996081 LR: 0.00004360 [05:21:46] Epoch: 1 Batch: 12339/38378 (32.15%) Loss: 1.993714 LR: 0.00004360 [05:21:48] Epoch: 1 Batch: 12340/38378 (32.15%) Loss: 1.852590 LR: 0.00004360 [05:21:50] Epoch: 1 Batch: 12341/38378 (32.16%) Loss: 1.819623 LR: 0.00004360 [05:21:51] Epoch: 1 Batch: 12342/38378 (32.16%) Loss: 1.913445 LR: 0.00004360 [05:21:53] Epoch: 1 Batch: 12343/38378 (32.16%) Loss: 2.267877 LR: 0.00004359 [05:21:55] Epoch: 1 Batch: 12344/38378 (32.16%) Loss: 2.310690 LR: 0.00004359 [05:21:57] Epoch: 1 Batch: 12345/38378 (32.17%) Loss: 2.076809 LR: 0.00004359 [05:21:59] Epoch: 1 Batch: 12346/38378 (32.17%) Loss: 1.977662 LR: 0.00004359 [05:22:00] Epoch: 1 Batch: 12347/38378 (32.17%) Loss: 1.912337 LR: 0.00004359 [05:22:02] Epoch: 1 Batch: 12348/38378 (32.17%) Loss: 2.251390 LR: 0.00004359 [05:22:04] Epoch: 1 Batch: 12349/38378 (32.18%) Loss: 2.147209 LR: 0.00004359 [05:22:06] Epoch: 1 Batch: 12350/38378 (32.18%) Loss: 1.782866 LR: 0.00004358 [05:22:08] Epoch: 1 Batch: 12351/38378 (32.18%) Loss: 1.686670 LR: 0.00004358 [05:22:10] Epoch: 1 Batch: 12352/38378 (32.19%) Loss: 2.145810 LR: 0.00004358 [05:22:11] Epoch: 1 Batch: 12353/38378 (32.19%) Loss: 1.861975 LR: 
0.00004358 [05:22:13] Epoch: 1 Batch: 12354/38378 (32.19%) Loss: 2.094336 LR: 0.00004358 [05:22:15] Epoch: 1 Batch: 12355/38378 (32.19%) Loss: 2.095832 LR: 0.00004358 [05:22:17] Epoch: 1 Batch: 12356/38378 (32.20%) Loss: 2.147679 LR: 0.00004358 [05:22:19] Epoch: 1 Batch: 12357/38378 (32.20%) Loss: 1.945527 LR: 0.00004357 [05:22:20] Epoch: 1 Batch: 12358/38378 (32.20%) Loss: 1.913463 LR: 0.00004357 [05:22:22] Epoch: 1 Batch: 12359/38378 (32.20%) Loss: 2.082439 LR: 0.00004357 [05:22:24] Epoch: 1 Batch: 12360/38378 (32.21%) Loss: 2.081272 LR: 0.00004357 [05:22:26] Epoch: 1 Batch: 12361/38378 (32.21%) Loss: 1.852376 LR: 0.00004357 [05:22:27] Epoch: 1 Batch: 12362/38378 (32.21%) Loss: 1.765427 LR: 0.00004357 [05:22:29] Epoch: 1 Batch: 12363/38378 (32.21%) Loss: 1.781414 LR: 0.00004357 [05:22:31] Epoch: 1 Batch: 12364/38378 (32.22%) Loss: 1.908844 LR: 0.00004356 [05:22:33] Epoch: 1 Batch: 12365/38378 (32.22%) Loss: 2.140239 LR: 0.00004356 [05:22:35] Epoch: 1 Batch: 12366/38378 (32.22%) Loss: 1.974543 LR: 0.00004356 [05:22:36] Epoch: 1 Batch: 12367/38378 (32.22%) Loss: 1.657084 LR: 0.00004356 [05:22:38] Epoch: 1 Batch: 12368/38378 (32.23%) Loss: 1.905848 LR: 0.00004356 [05:22:40] Epoch: 1 Batch: 12369/38378 (32.23%) Loss: 2.008963 LR: 0.00004356 [05:22:42] Epoch: 1 Batch: 12370/38378 (32.23%) Loss: 1.908604 LR: 0.00004356 [05:22:44] Epoch: 1 Batch: 12371/38378 (32.23%) Loss: 2.219605 LR: 0.00004356 [05:22:45] Epoch: 1 Batch: 12372/38378 (32.24%) Loss: 1.791923 LR: 0.00004356 [05:22:47] Epoch: 1 Batch: 12373/38378 (32.24%) Loss: 1.829593 LR: 0.00004356 [05:22:49] Epoch: 1 Batch: 12374/38378 (32.24%) Loss: 1.949767 LR: 0.00004356 [05:22:51] Epoch: 1 Batch: 12375/38378 (32.25%) Loss: 2.206906 LR: 0.00004356 [05:22:53] Epoch: 1 Batch: 12376/38378 (32.25%) Loss: 1.917423 LR: 0.00004356 [05:22:54] Epoch: 1 Batch: 12377/38378 (32.25%) Loss: 1.659167 LR: 0.00004356 [05:22:56] Epoch: 1 Batch: 12378/38378 (32.25%) Loss: 2.065572 LR: 0.00004355 [05:22:58] Epoch: 1 Batch: 12379/38378 (32.26%) Loss: 2.130323 LR: 0.00004355 [05:23:00] Epoch: 1 Batch: 12380/38378 (32.26%) Loss: 1.740262 LR: 0.00004355 [05:23:02] Epoch: 1 Batch: 12381/38378 (32.26%) Loss: 2.010221 LR: 0.00004355 [05:23:04] Epoch: 1 Batch: 12382/38378 (32.26%) Loss: 1.944853 LR: 0.00004355 [05:23:05] Epoch: 1 Batch: 12383/38378 (32.27%) Loss: 1.830502 LR: 0.00004355 [05:23:07] Epoch: 1 Batch: 12384/38378 (32.27%) Loss: 1.570355 LR: 0.00004355 [05:23:09] Epoch: 1 Batch: 12385/38378 (32.27%) Loss: 1.763964 LR: 0.00004354 [05:23:11] Epoch: 1 Batch: 12386/38378 (32.27%) Loss: 1.886849 LR: 0.00004354 [05:23:13] Epoch: 1 Batch: 12387/38378 (32.28%) Loss: 1.884430 LR: 0.00004354 [05:23:14] Epoch: 1 Batch: 12388/38378 (32.28%) Loss: 1.930461 LR: 0.00004354 [05:23:16] Epoch: 1 Batch: 12389/38378 (32.28%) Loss: 1.907679 LR: 0.00004354 [05:23:18] Epoch: 1 Batch: 12390/38378 (32.28%) Loss: 2.001097 LR: 0.00004354 [05:23:20] Epoch: 1 Batch: 12391/38378 (32.29%) Loss: 2.242992 LR: 0.00004354 [05:23:22] Epoch: 1 Batch: 12392/38378 (32.29%) Loss: 2.046430 LR: 0.00004353 [05:23:24] Epoch: 1 Batch: 12393/38378 (32.29%) Loss: 1.647086 LR: 0.00004353 [05:23:25] Epoch: 1 Batch: 12394/38378 (32.29%) Loss: 2.147971 LR: 0.00004353 [05:23:27] Epoch: 1 Batch: 12395/38378 (32.30%) Loss: 1.847170 LR: 0.00004353 [05:23:29] Epoch: 1 Batch: 12396/38378 (32.30%) Loss: 1.995661 LR: 0.00004353 [05:23:31] Epoch: 1 Batch: 12397/38378 (32.30%) Loss: 1.792042 LR: 0.00004353 [05:23:33] Epoch: 1 Batch: 12398/38378 (32.30%) Loss: 1.853422 LR: 0.00004353 [05:23:34] Epoch: 1 Batch: 
12399/38378 (32.31%) Loss: 1.982117 LR: 0.00004352 [05:23:40] >> Cleaned up old temp checkpoint: epoch1_step11400 [05:23:40] >> Temp checkpoint saved: epoch1_step12400, size: 0.1702 GB [05:23:40] Epoch: 1 Batch: 12400/38378 (32.31%) Loss: 1.883268 LR: 0.00004352 [05:23:42] Epoch: 1 Batch: 12401/38378 (32.31%) Loss: 1.799314 LR: 0.00004352 [05:23:44] Epoch: 1 Batch: 12402/38378 (32.32%) Loss: 1.894152 LR: 0.00004352 [05:23:46] Epoch: 1 Batch: 12403/38378 (32.32%) Loss: 1.768637 LR: 0.00004352 [05:23:48] Epoch: 1 Batch: 12404/38378 (32.32%) Loss: 2.055270 LR: 0.00004352 [05:23:49] Epoch: 1 Batch: 12405/38378 (32.32%) Loss: 1.835065 LR: 0.00004352 [05:23:51] Epoch: 1 Batch: 12406/38378 (32.33%) Loss: 2.107022 LR: 0.00004351 [05:23:53] Epoch: 1 Batch: 12407/38378 (32.33%) Loss: 2.359379 LR: 0.00004351 [05:23:55] Epoch: 1 Batch: 12408/38378 (32.33%) Loss: 2.088342 LR: 0.00004351 [05:23:57] Epoch: 1 Batch: 12409/38378 (32.33%) Loss: 1.657326 LR: 0.00004351 [05:23:58] Epoch: 1 Batch: 12410/38378 (32.34%) Loss: 2.142455 LR: 0.00004351 [05:24:00] Epoch: 1 Batch: 12411/38378 (32.34%) Loss: 1.731149 LR: 0.00004351 [05:24:02] Epoch: 1 Batch: 12412/38378 (32.34%) Loss: 1.861871 LR: 0.00004351 [05:24:04] Epoch: 1 Batch: 12413/38378 (32.34%) Loss: 2.091754 LR: 0.00004350 [05:24:06] Epoch: 1 Batch: 12414/38378 (32.35%) Loss: 2.118168 LR: 0.00004350 [05:24:08] Epoch: 1 Batch: 12415/38378 (32.35%) Loss: 1.805532 LR: 0.00004350 [05:24:09] Epoch: 1 Batch: 12416/38378 (32.35%) Loss: 2.239060 LR: 0.00004350 [05:24:11] Epoch: 1 Batch: 12417/38378 (32.35%) Loss: 2.028822 LR: 0.00004350 [05:24:13] Epoch: 1 Batch: 12418/38378 (32.36%) Loss: 2.246211 LR: 0.00004350 [05:24:15] Epoch: 1 Batch: 12419/38378 (32.36%) Loss: 1.801758 LR: 0.00004350 [05:24:17] Epoch: 1 Batch: 12420/38378 (32.36%) Loss: 2.149733 LR: 0.00004349 [05:24:19] Epoch: 1 Batch: 12421/38378 (32.36%) Loss: 2.209253 LR: 0.00004349 [05:24:20] Epoch: 1 Batch: 12422/38378 (32.37%) Loss: 1.990953 LR: 0.00004349 [05:24:22] Epoch: 1 Batch: 12423/38378 (32.37%) Loss: 1.911384 LR: 0.00004349 [05:24:24] Epoch: 1 Batch: 12424/38378 (32.37%) Loss: 1.977892 LR: 0.00004349 [05:24:26] Epoch: 1 Batch: 12425/38378 (32.38%) Loss: 1.573256 LR: 0.00004349 [05:24:28] Epoch: 1 Batch: 12426/38378 (32.38%) Loss: 1.952577 LR: 0.00004349 [05:24:30] Epoch: 1 Batch: 12427/38378 (32.38%) Loss: 2.078716 LR: 0.00004348 [05:24:31] Epoch: 1 Batch: 12428/38378 (32.38%) Loss: 2.101523 LR: 0.00004348 [05:24:33] Epoch: 1 Batch: 12429/38378 (32.39%) Loss: 1.985366 LR: 0.00004348 [05:24:35] Epoch: 1 Batch: 12430/38378 (32.39%) Loss: 1.714145 LR: 0.00004348 [05:24:37] Epoch: 1 Batch: 12431/38378 (32.39%) Loss: 2.304075 LR: 0.00004348 [05:24:39] Epoch: 1 Batch: 12432/38378 (32.39%) Loss: 2.114254 LR: 0.00004348 [05:24:40] Epoch: 1 Batch: 12433/38378 (32.40%) Loss: 1.843191 LR: 0.00004348 [05:24:42] Epoch: 1 Batch: 12434/38378 (32.40%) Loss: 1.984522 LR: 0.00004347 [05:24:44] Epoch: 1 Batch: 12435/38378 (32.40%) Loss: 2.072131 LR: 0.00004347 [05:24:46] Epoch: 1 Batch: 12436/38378 (32.40%) Loss: 2.002471 LR: 0.00004347 [05:24:48] Epoch: 1 Batch: 12437/38378 (32.41%) Loss: 1.819316 LR: 0.00004347 [05:24:50] Epoch: 1 Batch: 12438/38378 (32.41%) Loss: 2.304608 LR: 0.00004347 [05:24:51] Epoch: 1 Batch: 12439/38378 (32.41%) Loss: 2.028094 LR: 0.00004347 [05:24:53] Epoch: 1 Batch: 12440/38378 (32.41%) Loss: 2.081252 LR: 0.00004347 [05:24:55] Epoch: 1 Batch: 12441/38378 (32.42%) Loss: 1.945842 LR: 0.00004346 [05:24:57] Epoch: 1 Batch: 12442/38378 (32.42%) Loss: 1.943098 LR: 0.00004346 [05:24:58] 
Epoch: 1 Batch: 12443/38378 (32.42%) Loss: 1.954833 LR: 0.00004346 [05:25:00] Epoch: 1 Batch: 12444/38378 (32.42%) Loss: 2.038344 LR: 0.00004346 [05:25:02] Epoch: 1 Batch: 12445/38378 (32.43%) Loss: 2.012692 LR: 0.00004346 [05:25:04] Epoch: 1 Batch: 12446/38378 (32.43%) Loss: 1.892423 LR: 0.00004346 [05:25:06] Epoch: 1 Batch: 12447/38378 (32.43%) Loss: 1.838835 LR: 0.00004346 [05:25:08] Epoch: 1 Batch: 12448/38378 (32.44%) Loss: 2.098710 LR: 0.00004345 [05:25:09] Epoch: 1 Batch: 12449/38378 (32.44%) Loss: 2.128773 LR: 0.00004345 [05:25:11] Epoch: 1 Batch: 12450/38378 (32.44%) Loss: 2.003636 LR: 0.00004345 [05:25:13] Epoch: 1 Batch: 12451/38378 (32.44%) Loss: 1.668659 LR: 0.00004345 [05:25:15] Epoch: 1 Batch: 12452/38378 (32.45%) Loss: 1.630385 LR: 0.00004345 [05:25:17] Epoch: 1 Batch: 12453/38378 (32.45%) Loss: 1.999998 LR: 0.00004345 [05:25:18] Epoch: 1 Batch: 12454/38378 (32.45%) Loss: 1.798405 LR: 0.00004345 [05:25:20] Epoch: 1 Batch: 12455/38378 (32.45%) Loss: 1.879808 LR: 0.00004345 [05:25:22] Epoch: 1 Batch: 12456/38378 (32.46%) Loss: 2.006081 LR: 0.00004345 [05:25:24] Epoch: 1 Batch: 12457/38378 (32.46%) Loss: 1.959572 LR: 0.00004345 [05:25:26] Epoch: 1 Batch: 12458/38378 (32.46%) Loss: 2.127502 LR: 0.00004345 [05:25:27] Epoch: 1 Batch: 12459/38378 (32.46%) Loss: 1.916291 LR: 0.00004345 [05:25:29] Epoch: 1 Batch: 12460/38378 (32.47%) Loss: 2.032852 LR: 0.00004345 [05:25:31] Epoch: 1 Batch: 12461/38378 (32.47%) Loss: 1.934688 LR: 0.00004345 [05:25:33] Epoch: 1 Batch: 12462/38378 (32.47%) Loss: 1.939899 LR: 0.00004344 [05:25:35] Epoch: 1 Batch: 12463/38378 (32.47%) Loss: 2.238133 LR: 0.00004344 [05:25:37] Epoch: 1 Batch: 12464/38378 (32.48%) Loss: 1.851322 LR: 0.00004344 [05:25:38] Epoch: 1 Batch: 12465/38378 (32.48%) Loss: 1.835338 LR: 0.00004344 [05:25:40] Epoch: 1 Batch: 12466/38378 (32.48%) Loss: 2.082305 LR: 0.00004344 [05:25:42] Epoch: 1 Batch: 12467/38378 (32.48%) Loss: 1.921950 LR: 0.00004344 [05:25:44] Epoch: 1 Batch: 12468/38378 (32.49%) Loss: 2.218992 LR: 0.00004344 [05:25:46] Epoch: 1 Batch: 12469/38378 (32.49%) Loss: 1.753723 LR: 0.00004343 [05:25:47] Epoch: 1 Batch: 12470/38378 (32.49%) Loss: 2.459086 LR: 0.00004343 [05:25:49] Epoch: 1 Batch: 12471/38378 (32.50%) Loss: 2.028970 LR: 0.00004343 [05:25:51] Epoch: 1 Batch: 12472/38378 (32.50%) Loss: 1.989437 LR: 0.00004343 [05:25:53] Epoch: 1 Batch: 12473/38378 (32.50%) Loss: 2.049585 LR: 0.00004343 [05:25:54] Epoch: 1 Batch: 12474/38378 (32.50%) Loss: 2.047327 LR: 0.00004343 [05:25:56] Epoch: 1 Batch: 12475/38378 (32.51%) Loss: 2.084907 LR: 0.00004343 [05:25:58] Epoch: 1 Batch: 12476/38378 (32.51%) Loss: 1.890689 LR: 0.00004342 [05:26:00] Epoch: 1 Batch: 12477/38378 (32.51%) Loss: 2.225478 LR: 0.00004342 [05:26:02] Epoch: 1 Batch: 12478/38378 (32.51%) Loss: 1.805984 LR: 0.00004342 [05:26:04] Epoch: 1 Batch: 12479/38378 (32.52%) Loss: 2.114858 LR: 0.00004342 [05:26:05] Epoch: 1 Batch: 12480/38378 (32.52%) Loss: 1.934749 LR: 0.00004342 [05:26:07] Epoch: 1 Batch: 12481/38378 (32.52%) Loss: 1.974154 LR: 0.00004342 [05:26:09] Epoch: 1 Batch: 12482/38378 (32.52%) Loss: 2.033949 LR: 0.00004342 [05:26:11] Epoch: 1 Batch: 12483/38378 (32.53%) Loss: 1.983857 LR: 0.00004341 [05:26:13] Epoch: 1 Batch: 12484/38378 (32.53%) Loss: 2.375930 LR: 0.00004341 [05:26:14] Epoch: 1 Batch: 12485/38378 (32.53%) Loss: 1.915492 LR: 0.00004341 [05:26:16] Epoch: 1 Batch: 12486/38378 (32.53%) Loss: 1.807539 LR: 0.00004341 [05:26:18] Epoch: 1 Batch: 12487/38378 (32.54%) Loss: 1.795169 LR: 0.00004341 [05:26:20] Epoch: 1 Batch: 12488/38378 (32.54%) Loss: 
1.963593 LR: 0.00004341 [05:26:22] Epoch: 1 Batch: 12489/38378 (32.54%) Loss: 2.302647 LR: 0.00004341 [05:26:23] Epoch: 1 Batch: 12490/38378 (32.54%) Loss: 2.029721 LR: 0.00004340 [05:26:25] Epoch: 1 Batch: 12491/38378 (32.55%) Loss: 2.080234 LR: 0.00004340 [05:26:27] Epoch: 1 Batch: 12492/38378 (32.55%) Loss: 1.716294 LR: 0.00004340 [05:26:29] Epoch: 1 Batch: 12493/38378 (32.55%) Loss: 2.042802 LR: 0.00004340 [05:26:31] Epoch: 1 Batch: 12494/38378 (32.56%) Loss: 1.863109 LR: 0.00004340 [05:26:32] Epoch: 1 Batch: 12495/38378 (32.56%) Loss: 2.034577 LR: 0.00004340 [05:26:34] Epoch: 1 Batch: 12496/38378 (32.56%) Loss: 2.227431 LR: 0.00004340 [05:26:36] Epoch: 1 Batch: 12497/38378 (32.56%) Loss: 1.946185 LR: 0.00004339 [05:26:38] Epoch: 1 Batch: 12498/38378 (32.57%) Loss: 1.915259 LR: 0.00004339 [05:26:40] Epoch: 1 Batch: 12499/38378 (32.57%) Loss: 1.764160 LR: 0.00004339 [05:26:41] >> Evaluating batch 0 [05:26:42] >> Evaluating batch 1 [05:26:43] >> Evaluating batch 2 [05:26:44] >> Evaluating batch 3 [05:26:45] >> Evaluating batch 4 [05:26:46] >> Evaluating batch 5 [05:26:47] >> Evaluating batch 6 [05:26:48] >> Evaluating batch 7 [05:26:49] >> Evaluating batch 8 [05:26:50] >> Evaluating batch 9 [05:26:51] >> Evaluating batch 10 [05:26:52] >> Evaluating batch 11 [05:26:53] >> Evaluating batch 12 [05:26:54] >> Evaluating batch 13 [05:26:55] >> Evaluating batch 14 [05:26:56] >> Evaluating batch 15 [05:26:57] >> Evaluating batch 16 [05:26:58] Epoch: 1 Step: 12500/38378 Evaluation: [05:26:58] Avg Loss Since Last Eval: 2.0035 Val Loss: 2.1041 Validation loss delta: -0.0017 Perplexity: 8.1998 LR: 0.00004339 [05:27:02] >> Cleaned up old temp checkpoint: epoch1_step11500 [05:27:02] >> Temp checkpoint saved: epoch1_step12500, size: 0.1702 GB [05:27:06] >> Checkpoint saved: epoch1_step12500, size: 0.1702 GB [05:27:06] Epoch: 1 Batch: 12500/38378 (32.57%) Loss: 1.991239 LR: 0.00004339 [05:27:08] Epoch: 1 Batch: 12501/38378 (32.57%) Loss: 2.064735 LR: 0.00004339 [05:27:10] Epoch: 1 Batch: 12502/38378 (32.58%) Loss: 2.060699 LR: 0.00004339 [05:27:12] Epoch: 1 Batch: 12503/38378 (32.58%) Loss: 1.905621 LR: 0.00004339 [05:27:13] Epoch: 1 Batch: 12504/38378 (32.58%) Loss: 2.304635 LR: 0.00004338 [05:27:15] Epoch: 1 Batch: 12505/38378 (32.58%) Loss: 1.939079 LR: 0.00004338 [05:27:17] Epoch: 1 Batch: 12506/38378 (32.59%) Loss: 2.097610 LR: 0.00004338 [05:27:19] Epoch: 1 Batch: 12507/38378 (32.59%) Loss: 1.737985 LR: 0.00004338 [05:27:21] Epoch: 1 Batch: 12508/38378 (32.59%) Loss: 1.987575 LR: 0.00004338 [05:27:22] Epoch: 1 Batch: 12509/38378 (32.59%) Loss: 2.104150 LR: 0.00004338 [05:27:24] Epoch: 1 Batch: 12510/38378 (32.60%) Loss: 2.159462 LR: 0.00004338 [05:27:26] Epoch: 1 Batch: 12511/38378 (32.60%) Loss: 1.682075 LR: 0.00004337 [05:27:28] Epoch: 1 Batch: 12512/38378 (32.60%) Loss: 1.873797 LR: 0.00004337 [05:27:30] Epoch: 1 Batch: 12513/38378 (32.60%) Loss: 1.592288 LR: 0.00004337 [05:27:32] Epoch: 1 Batch: 12514/38378 (32.61%) Loss: 1.892790 LR: 0.00004337 [05:27:33] Epoch: 1 Batch: 12515/38378 (32.61%) Loss: 1.805660 LR: 0.00004337 [05:27:35] Epoch: 1 Batch: 12516/38378 (32.61%) Loss: 1.705950 LR: 0.00004337 [05:27:37] Epoch: 1 Batch: 12517/38378 (32.62%) Loss: 2.117811 LR: 0.00004337 [05:27:39] Epoch: 1 Batch: 12518/38378 (32.62%) Loss: 2.133795 LR: 0.00004336 [05:27:41] Epoch: 1 Batch: 12519/38378 (32.62%) Loss: 2.110880 LR: 0.00004336 [05:27:43] Epoch: 1 Batch: 12520/38378 (32.62%) Loss: 2.126414 LR: 0.00004336 [05:27:45] Epoch: 1 Batch: 12521/38378 (32.63%) Loss: 1.966740 LR: 0.00004336
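
The evaluation block above runs 17 validation batches and then prints summary statistics. The printed numbers are internally consistent: perplexity is exp of the validation loss, and "Validation loss delta" is the change since the previous eval. A quick arithmetic check against the step-12500 eval above and the step-13000 eval further below (tiny last-digit mismatches come from the losses being printed rounded to four decimals):

    import math

    # Perplexity = exp(mean validation cross-entropy loss)
    print(math.exp(2.1041))  # 8.1997..., vs the printed 8.1998 at step 12500
    print(math.exp(2.1063))  # 8.2177..., vs the printed 8.2176 at step 13000

    # "Validation loss delta" = current val loss - previous val loss
    print(round(2.1063 - 2.1041, 4))  # 0.0022, as printed at step 13000
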
[05:27:46] Epoch: 1 Batch: 12522/38378 (32.63%) Loss: 2.289321 LR: 0.00004336 [05:27:48] Epoch: 1 Batch: 12523/38378 (32.63%) Loss: 1.837956 LR: 0.00004336 [05:27:50] Epoch: 1 Batch: 12524/38378 (32.63%) Loss: 2.174764 LR: 0.00004336 [05:27:52] Epoch: 1 Batch: 12525/38378 (32.64%) Loss: 1.870435 LR: 0.00004335 [05:27:54] Epoch: 1 Batch: 12526/38378 (32.64%) Loss: 1.949694 LR: 0.00004335 [05:27:56] Epoch: 1 Batch: 12527/38378 (32.64%) Loss: 2.022763 LR: 0.00004335 [05:27:57] Epoch: 1 Batch: 12528/38378 (32.64%) Loss: 2.307664 LR: 0.00004335 [05:27:59] Epoch: 1 Batch: 12529/38378 (32.65%) Loss: 2.020528 LR: 0.00004335 [05:28:01] Epoch: 1 Batch: 12530/38378 (32.65%) Loss: 2.002713 LR: 0.00004335 [05:28:03] Epoch: 1 Batch: 12531/38378 (32.65%) Loss: 1.952532 LR: 0.00004335 [05:28:05] Epoch: 1 Batch: 12532/38378 (32.65%) Loss: 2.147504 LR: 0.00004334 [05:28:06] Epoch: 1 Batch: 12533/38378 (32.66%) Loss: 1.935486 LR: 0.00004334 [05:28:08] Epoch: 1 Batch: 12534/38378 (32.66%) Loss: 2.166965 LR: 0.00004334 [05:28:10] Epoch: 1 Batch: 12535/38378 (32.66%) Loss: 1.752695 LR: 0.00004334 [05:28:12] Epoch: 1 Batch: 12536/38378 (32.66%) Loss: 2.183397 LR: 0.00004334 [05:28:14] Epoch: 1 Batch: 12537/38378 (32.67%) Loss: 2.199473 LR: 0.00004334 [05:28:15] Epoch: 1 Batch: 12538/38378 (32.67%) Loss: 1.951771 LR: 0.00004334 [05:28:17] Epoch: 1 Batch: 12539/38378 (32.67%) Loss: 1.806903 LR: 0.00004333 [05:28:19] Epoch: 1 Batch: 12540/38378 (32.67%) Loss: 2.080066 LR: 0.00004333 [05:28:21] Epoch: 1 Batch: 12541/38378 (32.68%) Loss: 1.948420 LR: 0.00004333 [05:28:23] Epoch: 1 Batch: 12542/38378 (32.68%) Loss: 1.810369 LR: 0.00004333 [05:28:24] Epoch: 1 Batch: 12543/38378 (32.68%) Loss: 2.089108 LR: 0.00004333 [05:28:26] Epoch: 1 Batch: 12544/38378 (32.69%) Loss: 1.557699 LR: 0.00004333 [05:28:28] Epoch: 1 Batch: 12545/38378 (32.69%) Loss: 1.964722 LR: 0.00004333 [05:28:30] Epoch: 1 Batch: 12546/38378 (32.69%) Loss: 2.031756 LR: 0.00004333 [05:28:31] Epoch: 1 Batch: 12547/38378 (32.69%) Loss: 1.823960 LR: 0.00004333 [05:28:33] Epoch: 1 Batch: 12548/38378 (32.70%) Loss: 2.035984 LR: 0.00004333 [05:28:35] Epoch: 1 Batch: 12549/38378 (32.70%) Loss: 1.964218 LR: 0.00004333 [05:28:37] Epoch: 1 Batch: 12550/38378 (32.70%) Loss: 2.107492 LR: 0.00004333 [05:28:39] Epoch: 1 Batch: 12551/38378 (32.70%) Loss: 2.170182 LR: 0.00004333 [05:28:40] Epoch: 1 Batch: 12552/38378 (32.71%) Loss: 2.093199 LR: 0.00004333 [05:28:42] Epoch: 1 Batch: 12553/38378 (32.71%) Loss: 2.090325 LR: 0.00004332 [05:28:44] Epoch: 1 Batch: 12554/38378 (32.71%) Loss: 2.070082 LR: 0.00004332 [05:28:46] Epoch: 1 Batch: 12555/38378 (32.71%) Loss: 1.884537 LR: 0.00004332 [05:28:48] Epoch: 1 Batch: 12556/38378 (32.72%) Loss: 2.473371 LR: 0.00004332 [05:28:50] Epoch: 1 Batch: 12557/38378 (32.72%) Loss: 1.899168 LR: 0.00004332 [05:28:51] Epoch: 1 Batch: 12558/38378 (32.72%) Loss: 2.052628 LR: 0.00004332 [05:28:53] Epoch: 1 Batch: 12559/38378 (32.72%) Loss: 2.180401 LR: 0.00004332 [05:28:55] Epoch: 1 Batch: 12560/38378 (32.73%) Loss: 2.156700 LR: 0.00004331 [05:28:57] Epoch: 1 Batch: 12561/38378 (32.73%) Loss: 2.129172 LR: 0.00004331 [05:28:59] Epoch: 1 Batch: 12562/38378 (32.73%) Loss: 1.790809 LR: 0.00004331 [05:29:00] Epoch: 1 Batch: 12563/38378 (32.73%) Loss: 1.962072 LR: 0.00004331 [05:29:02] Epoch: 1 Batch: 12564/38378 (32.74%) Loss: 1.922906 LR: 0.00004331 [05:29:04] Epoch: 1 Batch: 12565/38378 (32.74%) Loss: 2.035179 LR: 0.00004331 [05:29:06] Epoch: 1 Batch: 12566/38378 (32.74%) Loss: 2.106061 LR: 0.00004331 [05:29:08] Epoch: 1 Batch: 12567/38378 
(32.75%) Loss: 2.475627 LR: 0.00004330 [05:29:10] Epoch: 1 Batch: 12568/38378 (32.75%) Loss: 2.088617 LR: 0.00004330 [05:29:11] Epoch: 1 Batch: 12569/38378 (32.75%) Loss: 1.872884 LR: 0.00004330 [05:29:13] Epoch: 1 Batch: 12570/38378 (32.75%) Loss: 2.373793 LR: 0.00004330 [05:29:15] Epoch: 1 Batch: 12571/38378 (32.76%) Loss: 2.147692 LR: 0.00004330 [05:29:17] Epoch: 1 Batch: 12572/38378 (32.76%) Loss: 2.044887 LR: 0.00004330 [05:29:19] Epoch: 1 Batch: 12573/38378 (32.76%) Loss: 2.257491 LR: 0.00004330 [05:29:21] Epoch: 1 Batch: 12574/38378 (32.76%) Loss: 1.961132 LR: 0.00004329 [05:29:22] Epoch: 1 Batch: 12575/38378 (32.77%) Loss: 2.227080 LR: 0.00004329 [05:29:24] Epoch: 1 Batch: 12576/38378 (32.77%) Loss: 2.088747 LR: 0.00004329 [05:29:26] Epoch: 1 Batch: 12577/38378 (32.77%) Loss: 2.096282 LR: 0.00004329 [05:29:28] Epoch: 1 Batch: 12578/38378 (32.77%) Loss: 1.838005 LR: 0.00004329 [05:29:30] Epoch: 1 Batch: 12579/38378 (32.78%) Loss: 1.966617 LR: 0.00004329 [05:29:31] Epoch: 1 Batch: 12580/38378 (32.78%) Loss: 2.101049 LR: 0.00004329 [05:29:33] Epoch: 1 Batch: 12581/38378 (32.78%) Loss: 1.988027 LR: 0.00004328 [05:29:35] Epoch: 1 Batch: 12582/38378 (32.78%) Loss: 2.248285 LR: 0.00004328 [05:29:37] Epoch: 1 Batch: 12583/38378 (32.79%) Loss: 2.027244 LR: 0.00004328 [05:29:39] Epoch: 1 Batch: 12584/38378 (32.79%) Loss: 1.624502 LR: 0.00004328 [05:29:41] Epoch: 1 Batch: 12585/38378 (32.79%) Loss: 1.853872 LR: 0.00004328 [05:29:42] Epoch: 1 Batch: 12586/38378 (32.79%) Loss: 1.918122 LR: 0.00004328 [05:29:44] Epoch: 1 Batch: 12587/38378 (32.80%) Loss: 2.034979 LR: 0.00004328 [05:29:46] Epoch: 1 Batch: 12588/38378 (32.80%) Loss: 2.065543 LR: 0.00004327 [05:29:48] Epoch: 1 Batch: 12589/38378 (32.80%) Loss: 2.080855 LR: 0.00004327 [05:29:49] Epoch: 1 Batch: 12590/38378 (32.81%) Loss: 2.117172 LR: 0.00004327 [05:29:51] Epoch: 1 Batch: 12591/38378 (32.81%) Loss: 2.098180 LR: 0.00004327 [05:29:53] Epoch: 1 Batch: 12592/38378 (32.81%) Loss: 2.178429 LR: 0.00004327 [05:29:55] Epoch: 1 Batch: 12593/38378 (32.81%) Loss: 2.140217 LR: 0.00004327 [05:29:57] Epoch: 1 Batch: 12594/38378 (32.82%) Loss: 2.239366 LR: 0.00004327 [05:29:59] Epoch: 1 Batch: 12595/38378 (32.82%) Loss: 2.071242 LR: 0.00004326 [05:30:00] Epoch: 1 Batch: 12596/38378 (32.82%) Loss: 1.749909 LR: 0.00004326 [05:30:02] Epoch: 1 Batch: 12597/38378 (32.82%) Loss: 1.913414 LR: 0.00004326 [05:30:04] Epoch: 1 Batch: 12598/38378 (32.83%) Loss: 1.882949 LR: 0.00004326 [05:30:06] Epoch: 1 Batch: 12599/38378 (32.83%) Loss: 2.018853 LR: 0.00004326 [05:30:12] >> Cleaned up old temp checkpoint: epoch1_step11600 [05:30:12] >> Temp checkpoint saved: epoch1_step12600, size: 0.1702 GB [05:30:12] Epoch: 1 Batch: 12600/38378 (32.83%) Loss: 2.026177 LR: 0.00004326 [05:30:13] Epoch: 1 Batch: 12601/38378 (32.83%) Loss: 2.221314 LR: 0.00004326 [05:30:15] Epoch: 1 Batch: 12602/38378 (32.84%) Loss: 1.828446 LR: 0.00004325 [05:30:17] Epoch: 1 Batch: 12603/38378 (32.84%) Loss: 1.951125 LR: 0.00004325 [05:30:19] Epoch: 1 Batch: 12604/38378 (32.84%) Loss: 2.092114 LR: 0.00004325 [05:30:21] Epoch: 1 Batch: 12605/38378 (32.84%) Loss: 2.159427 LR: 0.00004325 [05:30:22] Epoch: 1 Batch: 12606/38378 (32.85%) Loss: 1.944318 LR: 0.00004325 [05:30:24] Epoch: 1 Batch: 12607/38378 (32.85%) Loss: 1.836684 LR: 0.00004325 [05:30:26] Epoch: 1 Batch: 12608/38378 (32.85%) Loss: 1.897457 LR: 0.00004325 [05:30:28] Epoch: 1 Batch: 12609/38378 (32.85%) Loss: 1.925593 LR: 0.00004324 [05:30:30] Epoch: 1 Batch: 12610/38378 (32.86%) Loss: 1.742363 LR: 0.00004324 [05:30:32] Epoch: 1 Batch: 
12611/38378 (32.86%) Loss: 1.939952 LR: 0.00004324 [05:30:33] Epoch: 1 Batch: 12612/38378 (32.86%) Loss: 1.911343 LR: 0.00004324 [05:30:35] Epoch: 1 Batch: 12613/38378 (32.87%) Loss: 1.968996 LR: 0.00004324 [05:30:37] Epoch: 1 Batch: 12614/38378 (32.87%) Loss: 1.740018 LR: 0.00004324 [05:30:39] Epoch: 1 Batch: 12615/38378 (32.87%) Loss: 1.994958 LR: 0.00004324 [05:30:41] Epoch: 1 Batch: 12616/38378 (32.87%) Loss: 2.241837 LR: 0.00004323 [05:30:43] Epoch: 1 Batch: 12617/38378 (32.88%) Loss: 2.291443 LR: 0.00004323 [05:30:44] Epoch: 1 Batch: 12618/38378 (32.88%) Loss: 1.849432 LR: 0.00004323 [05:30:46] Epoch: 1 Batch: 12619/38378 (32.88%) Loss: 1.956507 LR: 0.00004323 [05:30:48] Epoch: 1 Batch: 12620/38378 (32.88%) Loss: 2.379190 LR: 0.00004323 [05:30:50] Epoch: 1 Batch: 12621/38378 (32.89%) Loss: 1.837520 LR: 0.00004323 [05:30:52] Epoch: 1 Batch: 12622/38378 (32.89%) Loss: 1.760755 LR: 0.00004323 [05:30:54] Epoch: 1 Batch: 12623/38378 (32.89%) Loss: 2.026296 LR: 0.00004322 [05:30:55] Epoch: 1 Batch: 12624/38378 (32.89%) Loss: 2.121697 LR: 0.00004322 [05:30:57] Epoch: 1 Batch: 12625/38378 (32.90%) Loss: 1.865763 LR: 0.00004322 [05:30:59] Epoch: 1 Batch: 12626/38378 (32.90%) Loss: 1.868941 LR: 0.00004322 [05:31:01] Epoch: 1 Batch: 12627/38378 (32.90%) Loss: 2.159031 LR: 0.00004322 [05:31:03] Epoch: 1 Batch: 12628/38378 (32.90%) Loss: 1.912013 LR: 0.00004322 [05:31:05] Epoch: 1 Batch: 12629/38378 (32.91%) Loss: 1.989366 LR: 0.00004322 [05:31:07] Epoch: 1 Batch: 12630/38378 (32.91%) Loss: 1.901838 LR: 0.00004321 [05:31:09] Epoch: 1 Batch: 12631/38378 (32.91%) Loss: 1.842262 LR: 0.00004321 [05:31:10] Epoch: 1 Batch: 12632/38378 (32.91%) Loss: 2.035881 LR: 0.00004321 [05:31:12] Epoch: 1 Batch: 12633/38378 (32.92%) Loss: 2.221473 LR: 0.00004321 [05:31:14] Epoch: 1 Batch: 12634/38378 (32.92%) Loss: 1.820068 LR: 0.00004321 [05:31:16] Epoch: 1 Batch: 12635/38378 (32.92%) Loss: 2.207670 LR: 0.00004321 [05:31:18] Epoch: 1 Batch: 12636/38378 (32.93%) Loss: 1.694284 LR: 0.00004321 [05:31:19] Epoch: 1 Batch: 12637/38378 (32.93%) Loss: 1.908246 LR: 0.00004320 [05:31:21] Epoch: 1 Batch: 12638/38378 (32.93%) Loss: 1.728074 LR: 0.00004320 [05:31:23] Epoch: 1 Batch: 12639/38378 (32.93%) Loss: 1.884312 LR: 0.00004320 [05:31:25] Epoch: 1 Batch: 12640/38378 (32.94%) Loss: 1.948774 LR: 0.00004320 [05:31:27] Epoch: 1 Batch: 12641/38378 (32.94%) Loss: 2.027419 LR: 0.00004320 [05:31:28] Epoch: 1 Batch: 12642/38378 (32.94%) Loss: 2.085319 LR: 0.00004320 [05:31:30] Epoch: 1 Batch: 12643/38378 (32.94%) Loss: 2.087522 LR: 0.00004320 [05:31:32] Epoch: 1 Batch: 12644/38378 (32.95%) Loss: 2.157210 LR: 0.00004319 [05:31:34] Epoch: 1 Batch: 12645/38378 (32.95%) Loss: 1.649080 LR: 0.00004319 [05:31:35] Epoch: 1 Batch: 12646/38378 (32.95%) Loss: 2.171523 LR: 0.00004319 [05:31:37] Epoch: 1 Batch: 12647/38378 (32.95%) Loss: 1.977506 LR: 0.00004319 [05:31:39] Epoch: 1 Batch: 12648/38378 (32.96%) Loss: 2.087809 LR: 0.00004319 [05:31:41] Epoch: 1 Batch: 12649/38378 (32.96%) Loss: 2.048177 LR: 0.00004319 [05:31:43] Epoch: 1 Batch: 12650/38378 (32.96%) Loss: 2.115990 LR: 0.00004319 [05:31:45] Epoch: 1 Batch: 12651/38378 (32.96%) Loss: 1.807735 LR: 0.00004319 [05:31:46] Epoch: 1 Batch: 12652/38378 (32.97%) Loss: 2.114959 LR: 0.00004319 [05:31:48] Epoch: 1 Batch: 12653/38378 (32.97%) Loss: 2.121686 LR: 0.00004319 [05:31:50] Epoch: 1 Batch: 12654/38378 (32.97%) Loss: 2.059336 LR: 0.00004319 [05:31:52] Epoch: 1 Batch: 12655/38378 (32.97%) Loss: 1.981027 LR: 0.00004319 [05:31:54] Epoch: 1 Batch: 12656/38378 (32.98%) Loss: 1.946622 LR: 
0.00004319 [05:31:55] Epoch: 1 Batch: 12657/38378 (32.98%) Loss: 1.700121 LR: 0.00004319 [05:31:57] Epoch: 1 Batch: 12658/38378 (32.98%) Loss: 2.214173 LR: 0.00004318 [05:31:59] Epoch: 1 Batch: 12659/38378 (32.99%) Loss: 1.713417 LR: 0.00004318 [05:32:01] Epoch: 1 Batch: 12660/38378 (32.99%) Loss: 2.196970 LR: 0.00004318 [05:32:03] Epoch: 1 Batch: 12661/38378 (32.99%) Loss: 2.042654 LR: 0.00004318 [05:32:04] Epoch: 1 Batch: 12662/38378 (32.99%) Loss: 2.114296 LR: 0.00004318 [05:32:06] Epoch: 1 Batch: 12663/38378 (33.00%) Loss: 1.920267 LR: 0.00004318 [05:32:08] Epoch: 1 Batch: 12664/38378 (33.00%) Loss: 2.192348 LR: 0.00004318 [05:32:10] Epoch: 1 Batch: 12665/38378 (33.00%) Loss: 2.049108 LR: 0.00004317 [05:32:12] Epoch: 1 Batch: 12666/38378 (33.00%) Loss: 2.077224 LR: 0.00004317 [05:32:13] Epoch: 1 Batch: 12667/38378 (33.01%) Loss: 1.951174 LR: 0.00004317 [05:32:15] Epoch: 1 Batch: 12668/38378 (33.01%) Loss: 2.003502 LR: 0.00004317 [05:32:17] Epoch: 1 Batch: 12669/38378 (33.01%) Loss: 2.158018 LR: 0.00004317 [05:32:19] Epoch: 1 Batch: 12670/38378 (33.01%) Loss: 2.076625 LR: 0.00004317 [05:32:21] Epoch: 1 Batch: 12671/38378 (33.02%) Loss: 2.078633 LR: 0.00004317 [05:32:22] Epoch: 1 Batch: 12672/38378 (33.02%) Loss: 2.061679 LR: 0.00004316 [05:32:24] Epoch: 1 Batch: 12673/38378 (33.02%) Loss: 2.092013 LR: 0.00004316 [05:32:26] Epoch: 1 Batch: 12674/38378 (33.02%) Loss: 1.976016 LR: 0.00004316 [05:32:28] Epoch: 1 Batch: 12675/38378 (33.03%) Loss: 1.987225 LR: 0.00004316 [05:32:30] Epoch: 1 Batch: 12676/38378 (33.03%) Loss: 1.699550 LR: 0.00004316 [05:32:31] Epoch: 1 Batch: 12677/38378 (33.03%) Loss: 2.183875 LR: 0.00004316 [05:32:33] Epoch: 1 Batch: 12678/38378 (33.03%) Loss: 1.929354 LR: 0.00004316 [05:32:35] Epoch: 1 Batch: 12679/38378 (33.04%) Loss: 2.101493 LR: 0.00004315 [05:32:37] Epoch: 1 Batch: 12680/38378 (33.04%) Loss: 2.105596 LR: 0.00004315 [05:32:39] Epoch: 1 Batch: 12681/38378 (33.04%) Loss: 2.130618 LR: 0.00004315 [05:32:40] Epoch: 1 Batch: 12682/38378 (33.04%) Loss: 1.764877 LR: 0.00004315 [05:32:42] Epoch: 1 Batch: 12683/38378 (33.05%) Loss: 2.001164 LR: 0.00004315 [05:32:44] Epoch: 1 Batch: 12684/38378 (33.05%) Loss: 1.939194 LR: 0.00004315 [05:32:46] Epoch: 1 Batch: 12685/38378 (33.05%) Loss: 1.857585 LR: 0.00004315 [05:32:48] Epoch: 1 Batch: 12686/38378 (33.06%) Loss: 1.863273 LR: 0.00004314 [05:32:50] Epoch: 1 Batch: 12687/38378 (33.06%) Loss: 1.764386 LR: 0.00004314 [05:32:51] Epoch: 1 Batch: 12688/38378 (33.06%) Loss: 1.842674 LR: 0.00004314 [05:32:53] Epoch: 1 Batch: 12689/38378 (33.06%) Loss: 1.881393 LR: 0.00004314 [05:32:55] Epoch: 1 Batch: 12690/38378 (33.07%) Loss: 1.804658 LR: 0.00004314 [05:32:57] Epoch: 1 Batch: 12691/38378 (33.07%) Loss: 2.066009 LR: 0.00004314 [05:32:59] Epoch: 1 Batch: 12692/38378 (33.07%) Loss: 1.965091 LR: 0.00004314 [05:33:00] Epoch: 1 Batch: 12693/38378 (33.07%) Loss: 2.090695 LR: 0.00004313 [05:33:02] Epoch: 1 Batch: 12694/38378 (33.08%) Loss: 1.888288 LR: 0.00004313 [05:33:04] Epoch: 1 Batch: 12695/38378 (33.08%) Loss: 2.017371 LR: 0.00004313 [05:33:06] Epoch: 1 Batch: 12696/38378 (33.08%) Loss: 2.044198 LR: 0.00004313 [05:33:08] Epoch: 1 Batch: 12697/38378 (33.08%) Loss: 2.017777 LR: 0.00004313 [05:33:10] Epoch: 1 Batch: 12698/38378 (33.09%) Loss: 2.094762 LR: 0.00004313 [05:33:11] Epoch: 1 Batch: 12699/38378 (33.09%) Loss: 1.966835 LR: 0.00004313 [05:33:17] >> Cleaned up old temp checkpoint: epoch1_step11700 [05:33:17] >> Temp checkpoint saved: epoch1_step12700, size: 0.1702 GB [05:33:17] Epoch: 1 Batch: 12700/38378 (33.09%) Loss: 
2.191144 LR: 0.00004312 [05:33:19] Epoch: 1 Batch: 12701/38378 (33.09%) Loss: 2.025458 LR: 0.00004312 [05:33:21] Epoch: 1 Batch: 12702/38378 (33.10%) Loss: 1.984511 LR: 0.00004312 [05:33:23] Epoch: 1 Batch: 12703/38378 (33.10%) Loss: 1.987742 LR: 0.00004312 [05:33:24] Epoch: 1 Batch: 12704/38378 (33.10%) Loss: 2.268859 LR: 0.00004312 [05:33:26] Epoch: 1 Batch: 12705/38378 (33.10%) Loss: 1.789995 LR: 0.00004312 [05:33:28] Epoch: 1 Batch: 12706/38378 (33.11%) Loss: 2.187613 LR: 0.00004312 [05:33:30] Epoch: 1 Batch: 12707/38378 (33.11%) Loss: 1.975371 LR: 0.00004311 [05:33:32] Epoch: 1 Batch: 12708/38378 (33.11%) Loss: 1.885731 LR: 0.00004311 [05:33:33] Epoch: 1 Batch: 12709/38378 (33.12%) Loss: 2.206920 LR: 0.00004311 [05:33:35] Epoch: 1 Batch: 12710/38378 (33.12%) Loss: 2.025048 LR: 0.00004311 [05:33:37] Epoch: 1 Batch: 12711/38378 (33.12%) Loss: 1.899931 LR: 0.00004311 [05:33:39] Epoch: 1 Batch: 12712/38378 (33.12%) Loss: 2.058014 LR: 0.00004311 [05:33:41] Epoch: 1 Batch: 12713/38378 (33.13%) Loss: 1.844223 LR: 0.00004311 [05:33:43] Epoch: 1 Batch: 12714/38378 (33.13%) Loss: 2.407271 LR: 0.00004310 [05:33:44] Epoch: 1 Batch: 12715/38378 (33.13%) Loss: 2.077880 LR: 0.00004310 [05:33:46] Epoch: 1 Batch: 12716/38378 (33.13%) Loss: 1.961259 LR: 0.00004310 [05:33:48] Epoch: 1 Batch: 12717/38378 (33.14%) Loss: 1.974212 LR: 0.00004310 [05:33:50] Epoch: 1 Batch: 12718/38378 (33.14%) Loss: 2.044319 LR: 0.00004310 [05:33:52] Epoch: 1 Batch: 12719/38378 (33.14%) Loss: 1.865428 LR: 0.00004310 [05:33:54] Epoch: 1 Batch: 12720/38378 (33.14%) Loss: 2.040797 LR: 0.00004310 [05:33:55] Epoch: 1 Batch: 12721/38378 (33.15%) Loss: 2.052384 LR: 0.00004309 [05:33:57] Epoch: 1 Batch: 12722/38378 (33.15%) Loss: 2.034810 LR: 0.00004309 [05:33:59] Epoch: 1 Batch: 12723/38378 (33.15%) Loss: 1.726452 LR: 0.00004309 [05:34:01] Epoch: 1 Batch: 12724/38378 (33.15%) Loss: 1.962440 LR: 0.00004309 [05:34:03] Epoch: 1 Batch: 12725/38378 (33.16%) Loss: 1.728635 LR: 0.00004309 [05:34:05] Epoch: 1 Batch: 12726/38378 (33.16%) Loss: 2.015323 LR: 0.00004309 [05:34:06] Epoch: 1 Batch: 12727/38378 (33.16%) Loss: 2.234163 LR: 0.00004309 [05:34:08] Epoch: 1 Batch: 12728/38378 (33.16%) Loss: 2.172541 LR: 0.00004308 [05:34:10] Epoch: 1 Batch: 12729/38378 (33.17%) Loss: 1.992743 LR: 0.00004308 [05:34:12] Epoch: 1 Batch: 12730/38378 (33.17%) Loss: 1.717849 LR: 0.00004308 [05:34:14] Epoch: 1 Batch: 12731/38378 (33.17%) Loss: 2.175636 LR: 0.00004308 [05:34:15] Epoch: 1 Batch: 12732/38378 (33.18%) Loss: 1.964004 LR: 0.00004308 [05:34:17] Epoch: 1 Batch: 12733/38378 (33.18%) Loss: 2.024407 LR: 0.00004308 [05:34:19] Epoch: 1 Batch: 12734/38378 (33.18%) Loss: 1.981591 LR: 0.00004308 [05:34:21] Epoch: 1 Batch: 12735/38378 (33.18%) Loss: 2.003430 LR: 0.00004307 [05:34:23] Epoch: 1 Batch: 12736/38378 (33.19%) Loss: 2.149473 LR: 0.00004307 [05:34:24] Epoch: 1 Batch: 12737/38378 (33.19%) Loss: 2.051206 LR: 0.00004307 [05:34:26] Epoch: 1 Batch: 12738/38378 (33.19%) Loss: 2.150404 LR: 0.00004307 [05:34:28] Epoch: 1 Batch: 12739/38378 (33.19%) Loss: 2.071676 LR: 0.00004307 [05:34:30] Epoch: 1 Batch: 12740/38378 (33.20%) Loss: 1.733171 LR: 0.00004307 [05:34:32] Epoch: 1 Batch: 12741/38378 (33.20%) Loss: 2.322521 LR: 0.00004307 [05:34:34] Epoch: 1 Batch: 12742/38378 (33.20%) Loss: 2.057618 LR: 0.00004306 [05:34:35] Epoch: 1 Batch: 12743/38378 (33.20%) Loss: 2.031811 LR: 0.00004306 [05:34:37] Epoch: 1 Batch: 12744/38378 (33.21%) Loss: 2.191610 LR: 0.00004306 [05:34:39] Epoch: 1 Batch: 12745/38378 (33.21%) Loss: 1.661536 LR: 0.00004306 [05:34:41] Epoch: 1 
Batch: 12746/38378 (33.21%) Loss: 1.893873 LR: 0.00004306 [05:34:43] Epoch: 1 Batch: 12747/38378 (33.21%) Loss: 1.831596 LR: 0.00004306 [05:34:44] Epoch: 1 Batch: 12748/38378 (33.22%) Loss: 1.955288 LR: 0.00004306 [05:34:46] Epoch: 1 Batch: 12749/38378 (33.22%) Loss: 2.138541 LR: 0.00004305 [05:34:48] Epoch: 1 Batch: 12750/38378 (33.22%) Loss: 1.610408 LR: 0.00004305 [05:34:50] Epoch: 1 Batch: 12751/38378 (33.22%) Loss: 2.059554 LR: 0.00004305 [05:34:52] Epoch: 1 Batch: 12752/38378 (33.23%) Loss: 2.040603 LR: 0.00004305 [05:34:53] Epoch: 1 Batch: 12753/38378 (33.23%) Loss: 2.046406 LR: 0.00004305 [05:34:55] Epoch: 1 Batch: 12754/38378 (33.23%) Loss: 1.922044 LR: 0.00004305 [05:34:57] Epoch: 1 Batch: 12755/38378 (33.24%) Loss: 2.152988 LR: 0.00004305 [05:34:59] Epoch: 1 Batch: 12756/38378 (33.24%) Loss: 1.950876 LR: 0.00004304 [05:35:00] Epoch: 1 Batch: 12757/38378 (33.24%) Loss: 1.857501 LR: 0.00004304 [05:35:02] Epoch: 1 Batch: 12758/38378 (33.24%) Loss: 1.911129 LR: 0.00004304 [05:35:04] Epoch: 1 Batch: 12759/38378 (33.25%) Loss: 2.077919 LR: 0.00004304 [05:35:06] Epoch: 1 Batch: 12760/38378 (33.25%) Loss: 2.062258 LR: 0.00004304 [05:35:08] Epoch: 1 Batch: 12761/38378 (33.25%) Loss: 1.921015 LR: 0.00004304 [05:35:09] Epoch: 1 Batch: 12762/38378 (33.25%) Loss: 2.049939 LR: 0.00004304 [05:35:11] Epoch: 1 Batch: 12763/38378 (33.26%) Loss: 2.033183 LR: 0.00004303 [05:35:13] Epoch: 1 Batch: 12764/38378 (33.26%) Loss: 2.117196 LR: 0.00004303 [05:35:15] Epoch: 1 Batch: 12765/38378 (33.26%) Loss: 2.123358 LR: 0.00004303 [05:35:17] Epoch: 1 Batch: 12766/38378 (33.26%) Loss: 2.085819 LR: 0.00004303 [05:35:18] Epoch: 1 Batch: 12767/38378 (33.27%) Loss: 2.249185 LR: 0.00004303 [05:35:20] Epoch: 1 Batch: 12768/38378 (33.27%) Loss: 1.964159 LR: 0.00004303 [05:35:22] Epoch: 1 Batch: 12769/38378 (33.27%) Loss: 1.925700 LR: 0.00004303 [05:35:24] Epoch: 1 Batch: 12770/38378 (33.27%) Loss: 1.999961 LR: 0.00004303 [05:35:26] Epoch: 1 Batch: 12771/38378 (33.28%) Loss: 2.195561 LR: 0.00004303 [05:35:28] Epoch: 1 Batch: 12772/38378 (33.28%) Loss: 1.967049 LR: 0.00004303 [05:35:29] Epoch: 1 Batch: 12773/38378 (33.28%) Loss: 1.765101 LR: 0.00004303 [05:35:31] Epoch: 1 Batch: 12774/38378 (33.28%) Loss: 1.843899 LR: 0.00004303 [05:35:33] Epoch: 1 Batch: 12775/38378 (33.29%) Loss: 1.986012 LR: 0.00004303 [05:35:35] Epoch: 1 Batch: 12776/38378 (33.29%) Loss: 2.039969 LR: 0.00004303 [05:35:37] Epoch: 1 Batch: 12777/38378 (33.29%) Loss: 2.143873 LR: 0.00004302 [05:35:39] Epoch: 1 Batch: 12778/38378 (33.30%) Loss: 1.824240 LR: 0.00004302 [05:35:40] Epoch: 1 Batch: 12779/38378 (33.30%) Loss: 2.104168 LR: 0.00004302 [05:35:42] Epoch: 1 Batch: 12780/38378 (33.30%) Loss: 2.045856 LR: 0.00004302 [05:35:44] Epoch: 1 Batch: 12781/38378 (33.30%) Loss: 2.050540 LR: 0.00004302 [05:35:46] Epoch: 1 Batch: 12782/38378 (33.31%) Loss: 1.765194 LR: 0.00004302 [05:35:48] Epoch: 1 Batch: 12783/38378 (33.31%) Loss: 1.956440 LR: 0.00004302 [05:35:49] Epoch: 1 Batch: 12784/38378 (33.31%) Loss: 1.505169 LR: 0.00004301 [05:35:51] Epoch: 1 Batch: 12785/38378 (33.31%) Loss: 1.995146 LR: 0.00004301 [05:35:53] Epoch: 1 Batch: 12786/38378 (33.32%) Loss: 1.968706 LR: 0.00004301 [05:35:55] Epoch: 1 Batch: 12787/38378 (33.32%) Loss: 1.820467 LR: 0.00004301 [05:35:57] Epoch: 1 Batch: 12788/38378 (33.32%) Loss: 1.700028 LR: 0.00004301 [05:35:58] Epoch: 1 Batch: 12789/38378 (33.32%) Loss: 1.999261 LR: 0.00004301 [05:36:00] Epoch: 1 Batch: 12790/38378 (33.33%) Loss: 2.007062 LR: 0.00004301 [05:36:02] Epoch: 1 Batch: 12791/38378 (33.33%) Loss: 1.920316 
LR: 0.00004300 [05:36:04] Epoch: 1 Batch: 12792/38378 (33.33%) Loss: 1.983962 LR: 0.00004300 [05:36:06] Epoch: 1 Batch: 12793/38378 (33.33%) Loss: 2.127086 LR: 0.00004300 [05:36:08] Epoch: 1 Batch: 12794/38378 (33.34%) Loss: 1.887056 LR: 0.00004300 [05:36:09] Epoch: 1 Batch: 12795/38378 (33.34%) Loss: 1.915983 LR: 0.00004300 [05:36:11] Epoch: 1 Batch: 12796/38378 (33.34%) Loss: 2.167278 LR: 0.00004300 [05:36:13] Epoch: 1 Batch: 12797/38378 (33.34%) Loss: 2.032081 LR: 0.00004300 [05:36:15] Epoch: 1 Batch: 12798/38378 (33.35%) Loss: 1.940478 LR: 0.00004299 [05:36:17] Epoch: 1 Batch: 12799/38378 (33.35%) Loss: 1.882353 LR: 0.00004299 [05:36:23] >> Cleaned up old temp checkpoint: epoch1_step11800 [05:36:23] >> Temp checkpoint saved: epoch1_step12800, size: 0.1702 GB [05:36:23] Epoch: 1 Batch: 12800/38378 (33.35%) Loss: 2.122954 LR: 0.00004299 [05:36:24] Epoch: 1 Batch: 12801/38378 (33.36%) Loss: 1.962108 LR: 0.00004299 [05:36:26] Epoch: 1 Batch: 12802/38378 (33.36%) Loss: 2.180520 LR: 0.00004299 [05:36:28] Epoch: 1 Batch: 12803/38378 (33.36%) Loss: 2.163942 LR: 0.00004299 [05:36:30] Epoch: 1 Batch: 12804/38378 (33.36%) Loss: 2.125569 LR: 0.00004299 [05:36:32] Epoch: 1 Batch: 12805/38378 (33.37%) Loss: 2.064341 LR: 0.00004298 [05:36:33] Epoch: 1 Batch: 12806/38378 (33.37%) Loss: 2.354636 LR: 0.00004298 [05:36:35] Epoch: 1 Batch: 12807/38378 (33.37%) Loss: 1.856818 LR: 0.00004298 [05:36:37] Epoch: 1 Batch: 12808/38378 (33.37%) Loss: 2.271771 LR: 0.00004298 [05:36:39] Epoch: 1 Batch: 12809/38378 (33.38%) Loss: 2.165137 LR: 0.00004298 [05:36:41] Epoch: 1 Batch: 12810/38378 (33.38%) Loss: 2.060035 LR: 0.00004298 [05:36:42] Epoch: 1 Batch: 12811/38378 (33.38%) Loss: 2.079594 LR: 0.00004298 [05:36:44] Epoch: 1 Batch: 12812/38378 (33.38%) Loss: 1.820403 LR: 0.00004297 [05:36:46] Epoch: 1 Batch: 12813/38378 (33.39%) Loss: 2.029559 LR: 0.00004297 [05:36:48] Epoch: 1 Batch: 12814/38378 (33.39%) Loss: 2.023443 LR: 0.00004297 [05:36:50] Epoch: 1 Batch: 12815/38378 (33.39%) Loss: 2.093742 LR: 0.00004297 [05:36:52] Epoch: 1 Batch: 12816/38378 (33.39%) Loss: 2.086204 LR: 0.00004297 [05:36:53] Epoch: 1 Batch: 12817/38378 (33.40%) Loss: 1.940987 LR: 0.00004297 [05:36:55] Epoch: 1 Batch: 12818/38378 (33.40%) Loss: 2.180547 LR: 0.00004297 [05:36:57] Epoch: 1 Batch: 12819/38378 (33.40%) Loss: 2.258896 LR: 0.00004296 [05:36:59] Epoch: 1 Batch: 12820/38378 (33.40%) Loss: 2.219034 LR: 0.00004296 [05:37:01] Epoch: 1 Batch: 12821/38378 (33.41%) Loss: 1.680753 LR: 0.00004296 [05:37:03] Epoch: 1 Batch: 12822/38378 (33.41%) Loss: 1.918839 LR: 0.00004296 [05:37:04] Epoch: 1 Batch: 12823/38378 (33.41%) Loss: 2.019493 LR: 0.00004296 [05:37:06] Epoch: 1 Batch: 12824/38378 (33.41%) Loss: 2.412176 LR: 0.00004296 [05:37:08] Epoch: 1 Batch: 12825/38378 (33.42%) Loss: 1.900395 LR: 0.00004296 [05:37:10] Epoch: 1 Batch: 12826/38378 (33.42%) Loss: 2.381239 LR: 0.00004295 [05:37:12] Epoch: 1 Batch: 12827/38378 (33.42%) Loss: 1.814585 LR: 0.00004295 [05:37:13] Epoch: 1 Batch: 12828/38378 (33.43%) Loss: 1.760233 LR: 0.00004295 [05:37:15] Epoch: 1 Batch: 12829/38378 (33.43%) Loss: 1.863988 LR: 0.00004295 [05:37:17] Epoch: 1 Batch: 12830/38378 (33.43%) Loss: 1.532926 LR: 0.00004295 [05:37:19] Epoch: 1 Batch: 12831/38378 (33.43%) Loss: 2.010251 LR: 0.00004295 [05:37:21] Epoch: 1 Batch: 12832/38378 (33.44%) Loss: 2.065846 LR: 0.00004295 [05:37:23] Epoch: 1 Batch: 12833/38378 (33.44%) Loss: 1.904088 LR: 0.00004294 [05:37:24] Epoch: 1 Batch: 12834/38378 (33.44%) Loss: 2.094692 LR: 0.00004294 [05:37:26] Epoch: 1 Batch: 12835/38378 (33.44%) 
Loss: 2.173428 LR: 0.00004294 [05:37:28] Epoch: 1 Batch: 12836/38378 (33.45%) Loss: 1.841932 LR: 0.00004294 [05:37:30] Epoch: 1 Batch: 12837/38378 (33.45%) Loss: 2.147803 LR: 0.00004294 [05:37:32] Epoch: 1 Batch: 12838/38378 (33.45%) Loss: 2.211935 LR: 0.00004294 [05:37:33] Epoch: 1 Batch: 12839/38378 (33.45%) Loss: 2.193129 LR: 0.00004294 [05:37:35] Epoch: 1 Batch: 12840/38378 (33.46%) Loss: 2.290139 LR: 0.00004293 [05:37:37] Epoch: 1 Batch: 12841/38378 (33.46%) Loss: 1.958849 LR: 0.00004293 [05:37:39] Epoch: 1 Batch: 12842/38378 (33.46%) Loss: 2.093093 LR: 0.00004293 [05:37:41] Epoch: 1 Batch: 12843/38378 (33.46%) Loss: 2.144213 LR: 0.00004293 [05:37:43] Epoch: 1 Batch: 12844/38378 (33.47%) Loss: 2.121068 LR: 0.00004293 [05:37:44] Epoch: 1 Batch: 12845/38378 (33.47%) Loss: 2.130536 LR: 0.00004293 [05:37:46] Epoch: 1 Batch: 12846/38378 (33.47%) Loss: 2.042430 LR: 0.00004293 [05:37:48] Epoch: 1 Batch: 12847/38378 (33.47%) Loss: 1.855235 LR: 0.00004292 [05:37:50] Epoch: 1 Batch: 12848/38378 (33.48%) Loss: 2.095638 LR: 0.00004292 [05:37:52] Epoch: 1 Batch: 12849/38378 (33.48%) Loss: 2.200028 LR: 0.00004292 [05:37:53] Epoch: 1 Batch: 12850/38378 (33.48%) Loss: 1.908898 LR: 0.00004292 [05:37:55] Epoch: 1 Batch: 12851/38378 (33.49%) Loss: 2.028561 LR: 0.00004292 [05:37:57] Epoch: 1 Batch: 12852/38378 (33.49%) Loss: 1.890039 LR: 0.00004292 [05:37:59] Epoch: 1 Batch: 12853/38378 (33.49%) Loss: 1.883829 LR: 0.00004292 [05:38:01] Epoch: 1 Batch: 12854/38378 (33.49%) Loss: 2.113655 LR: 0.00004291 [05:38:02] Epoch: 1 Batch: 12855/38378 (33.50%) Loss: 2.236683 LR: 0.00004291 [05:38:04] Epoch: 1 Batch: 12856/38378 (33.50%) Loss: 2.009511 LR: 0.00004291 [05:38:06] Epoch: 1 Batch: 12857/38378 (33.50%) Loss: 1.878061 LR: 0.00004291 [05:38:08] Epoch: 1 Batch: 12858/38378 (33.50%) Loss: 2.125229 LR: 0.00004291 [05:38:10] Epoch: 1 Batch: 12859/38378 (33.51%) Loss: 1.897334 LR: 0.00004291 [05:38:11] Epoch: 1 Batch: 12860/38378 (33.51%) Loss: 2.084513 LR: 0.00004291 [05:38:13] Epoch: 1 Batch: 12861/38378 (33.51%) Loss: 1.940598 LR: 0.00004290 [05:38:15] Epoch: 1 Batch: 12862/38378 (33.51%) Loss: 2.262042 LR: 0.00004290 [05:38:17] Epoch: 1 Batch: 12863/38378 (33.52%) Loss: 2.299261 LR: 0.00004290 [05:38:19] Epoch: 1 Batch: 12864/38378 (33.52%) Loss: 1.744135 LR: 0.00004290 [05:38:20] Epoch: 1 Batch: 12865/38378 (33.52%) Loss: 2.369913 LR: 0.00004290 [05:38:22] Epoch: 1 Batch: 12866/38378 (33.52%) Loss: 2.113072 LR: 0.00004290 [05:38:24] Epoch: 1 Batch: 12867/38378 (33.53%) Loss: 2.032818 LR: 0.00004290 [05:38:26] Epoch: 1 Batch: 12868/38378 (33.53%) Loss: 2.180835 LR: 0.00004289 [05:38:28] Epoch: 1 Batch: 12869/38378 (33.53%) Loss: 2.527470 LR: 0.00004289 [05:38:29] Epoch: 1 Batch: 12870/38378 (33.53%) Loss: 2.046157 LR: 0.00004289 [05:38:31] Epoch: 1 Batch: 12871/38378 (33.54%) Loss: 2.179647 LR: 0.00004289 [05:38:33] Epoch: 1 Batch: 12872/38378 (33.54%) Loss: 1.945701 LR: 0.00004289 [05:38:35] Epoch: 1 Batch: 12873/38378 (33.54%) Loss: 1.990889 LR: 0.00004289 [05:38:37] Epoch: 1 Batch: 12874/38378 (33.55%) Loss: 2.316546 LR: 0.00004289 [05:38:38] Epoch: 1 Batch: 12875/38378 (33.55%) Loss: 1.879144 LR: 0.00004288 [05:38:40] Epoch: 1 Batch: 12876/38378 (33.55%) Loss: 2.357434 LR: 0.00004288 [05:38:42] Epoch: 1 Batch: 12877/38378 (33.55%) Loss: 1.810179 LR: 0.00004288 [05:38:44] Epoch: 1 Batch: 12878/38378 (33.56%) Loss: 1.994356 LR: 0.00004288 [05:38:46] Epoch: 1 Batch: 12879/38378 (33.56%) Loss: 2.002550 LR: 0.00004288 [05:38:48] Epoch: 1 Batch: 12880/38378 (33.56%) Loss: 2.005446 LR: 0.00004288 [05:38:49] 
Epoch: 1 Batch: 12881/38378 (33.56%) Loss: 1.859056 LR: 0.00004288 [05:38:51] Epoch: 1 Batch: 12882/38378 (33.57%) Loss: 1.863627 LR: 0.00004287 [05:38:53] Epoch: 1 Batch: 12883/38378 (33.57%) Loss: 1.791205 LR: 0.00004287 [05:38:55] Epoch: 1 Batch: 12884/38378 (33.57%) Loss: 2.127324 LR: 0.00004287 [05:38:57] Epoch: 1 Batch: 12885/38378 (33.57%) Loss: 2.140053 LR: 0.00004287 [05:38:58] Epoch: 1 Batch: 12886/38378 (33.58%) Loss: 2.216991 LR: 0.00004287 [05:39:00] Epoch: 1 Batch: 12887/38378 (33.58%) Loss: 1.976587 LR: 0.00004287 [05:39:02] Epoch: 1 Batch: 12888/38378 (33.58%) Loss: 2.124816 LR: 0.00004287 [05:39:04] Epoch: 1 Batch: 12889/38378 (33.58%) Loss: 1.812375 LR: 0.00004286 [05:39:05] Epoch: 1 Batch: 12890/38378 (33.59%) Loss: 2.048164 LR: 0.00004286 [05:39:07] Epoch: 1 Batch: 12891/38378 (33.59%) Loss: 2.049675 LR: 0.00004286 [05:39:09] Epoch: 1 Batch: 12892/38378 (33.59%) Loss: 2.049620 LR: 0.00004286 [05:39:11] Epoch: 1 Batch: 12893/38378 (33.59%) Loss: 2.047817 LR: 0.00004286 [05:39:13] Epoch: 1 Batch: 12894/38378 (33.60%) Loss: 2.257551 LR: 0.00004286 [05:39:14] Epoch: 1 Batch: 12895/38378 (33.60%) Loss: 1.976438 LR: 0.00004286 [05:39:16] Epoch: 1 Batch: 12896/38378 (33.60%) Loss: 1.990891 LR: 0.00004285 [05:39:18] Epoch: 1 Batch: 12897/38378 (33.61%) Loss: 2.291413 LR: 0.00004285 [05:39:20] Epoch: 1 Batch: 12898/38378 (33.61%) Loss: 2.007978 LR: 0.00004285 [05:39:22] Epoch: 1 Batch: 12899/38378 (33.61%) Loss: 1.998008 LR: 0.00004285 [05:39:28] >> Cleaned up old temp checkpoint: epoch1_step11900 [05:39:28] >> Temp checkpoint saved: epoch1_step12900, size: 0.1702 GB [05:39:28] Epoch: 1 Batch: 12900/38378 (33.61%) Loss: 2.037416 LR: 0.00004285 [05:39:29] Epoch: 1 Batch: 12901/38378 (33.62%) Loss: 1.850854 LR: 0.00004285 [05:39:31] Epoch: 1 Batch: 12902/38378 (33.62%) Loss: 1.767163 LR: 0.00004285 [05:39:33] Epoch: 1 Batch: 12903/38378 (33.62%) Loss: 2.179210 LR: 0.00004284 [05:39:35] Epoch: 1 Batch: 12904/38378 (33.62%) Loss: 2.060891 LR: 0.00004284 [05:39:37] Epoch: 1 Batch: 12905/38378 (33.63%) Loss: 2.342829 LR: 0.00004284 [05:39:38] Epoch: 1 Batch: 12906/38378 (33.63%) Loss: 1.725702 LR: 0.00004284 [05:39:40] Epoch: 1 Batch: 12907/38378 (33.63%) Loss: 1.799347 LR: 0.00004284 [05:39:42] Epoch: 1 Batch: 12908/38378 (33.63%) Loss: 2.015419 LR: 0.00004284 [05:39:44] Epoch: 1 Batch: 12909/38378 (33.64%) Loss: 1.842577 LR: 0.00004284 [05:39:46] Epoch: 1 Batch: 12910/38378 (33.64%) Loss: 2.253414 LR: 0.00004284 [05:39:47] Epoch: 1 Batch: 12911/38378 (33.64%) Loss: 1.861283 LR: 0.00004284 [05:39:49] Epoch: 1 Batch: 12912/38378 (33.64%) Loss: 2.090620 LR: 0.00004284 [05:39:51] Epoch: 1 Batch: 12913/38378 (33.65%) Loss: 2.027748 LR: 0.00004284 [05:39:53] Epoch: 1 Batch: 12914/38378 (33.65%) Loss: 1.784148 LR: 0.00004284 [05:39:55] Epoch: 1 Batch: 12915/38378 (33.65%) Loss: 2.323980 LR: 0.00004284 [05:39:57] Epoch: 1 Batch: 12916/38378 (33.65%) Loss: 1.932860 LR: 0.00004284 [05:39:58] Epoch: 1 Batch: 12917/38378 (33.66%) Loss: 1.976355 LR: 0.00004283 [05:40:00] Epoch: 1 Batch: 12918/38378 (33.66%) Loss: 2.016249 LR: 0.00004283 [05:40:02] Epoch: 1 Batch: 12919/38378 (33.66%) Loss: 1.718019 LR: 0.00004283 [05:40:04] Epoch: 1 Batch: 12920/38378 (33.67%) Loss: 1.856219 LR: 0.00004283 [05:40:06] Epoch: 1 Batch: 12921/38378 (33.67%) Loss: 1.994786 LR: 0.00004283 [05:40:08] Epoch: 1 Batch: 12922/38378 (33.67%) Loss: 2.282214 LR: 0.00004283 [05:40:09] Epoch: 1 Batch: 12923/38378 (33.67%) Loss: 1.951053 LR: 0.00004283 [05:40:11] Epoch: 1 Batch: 12924/38378 (33.68%) Loss: 2.011711 LR: 0.00004282 
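
Across this stretch the LR column creeps down from 0.00004387 (batch ~12130) to 0.00004285 (batch 12900), changing only every several batches. That behavior is consistent with a warmup-then-cosine schedule that advances once per gradient-accumulation cycle (7 batches) rather than per batch, decaying from the 5e-5 peak toward the 1e-5 floor after the run's 439 warmup steps. A sketch under those assumptions; the names are illustrative, not the script's actual code, and the fit is only good to about the last printed digit:

    import math

    def scheduled_lr(batch, peak=5e-5, floor=1e-5, warmup_steps=439,
                     total_batches=38378, accum_steps=7):
        # Hypothetical reconstruction of the schedule behind the LR column:
        # linear warmup, then cosine decay toward the floor, stepped once
        # per optimizer (gradient-accumulation) step.
        step = batch // accum_steps
        total_steps = total_batches // accum_steps
        if step < warmup_steps:
            return peak * step / warmup_steps
        progress = (step - warmup_steps) / (total_steps - warmup_steps)
        return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

    print(f"{scheduled_lr(12900):.8f}")  # ~0.0000428, vs 0.00004285 in the log
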
[05:40:13] Epoch: 1 Batch: 12925/38378 (33.68%) Loss: 1.863290 LR: 0.00004282 [05:40:15] Epoch: 1 Batch: 12926/38378 (33.68%) Loss: 1.676115 LR: 0.00004282 [05:40:17] Epoch: 1 Batch: 12927/38378 (33.68%) Loss: 2.213558 LR: 0.00004282 [05:40:19] Epoch: 1 Batch: 12928/38378 (33.69%) Loss: 1.879564 LR: 0.00004282 [05:40:20] Epoch: 1 Batch: 12929/38378 (33.69%) Loss: 2.080908 LR: 0.00004282 [05:40:22] Epoch: 1 Batch: 12930/38378 (33.69%) Loss: 1.850427 LR: 0.00004282 [05:40:24] Epoch: 1 Batch: 12931/38378 (33.69%) Loss: 1.856007 LR: 0.00004281 [05:40:26] Epoch: 1 Batch: 12932/38378 (33.70%) Loss: 2.048914 LR: 0.00004281 [05:40:28] Epoch: 1 Batch: 12933/38378 (33.70%) Loss: 2.233345 LR: 0.00004281 [05:40:29] Epoch: 1 Batch: 12934/38378 (33.70%) Loss: 1.983170 LR: 0.00004281 [05:40:31] Epoch: 1 Batch: 12935/38378 (33.70%) Loss: 1.835452 LR: 0.00004281 [05:40:33] Epoch: 1 Batch: 12936/38378 (33.71%) Loss: 1.981958 LR: 0.00004281 [05:40:35] Epoch: 1 Batch: 12937/38378 (33.71%) Loss: 1.905221 LR: 0.00004281 [05:40:37] Epoch: 1 Batch: 12938/38378 (33.71%) Loss: 1.858973 LR: 0.00004280 [05:40:38] Epoch: 1 Batch: 12939/38378 (33.71%) Loss: 2.202056 LR: 0.00004280 [05:40:40] Epoch: 1 Batch: 12940/38378 (33.72%) Loss: 2.050872 LR: 0.00004280 [05:40:42] Epoch: 1 Batch: 12941/38378 (33.72%) Loss: 2.127094 LR: 0.00004280 [05:40:44] Epoch: 1 Batch: 12942/38378 (33.72%) Loss: 1.984748 LR: 0.00004280 [05:40:46] Epoch: 1 Batch: 12943/38378 (33.73%) Loss: 2.060833 LR: 0.00004280 [05:40:47] Epoch: 1 Batch: 12944/38378 (33.73%) Loss: 2.309531 LR: 0.00004280 [05:40:49] Epoch: 1 Batch: 12945/38378 (33.73%) Loss: 2.135656 LR: 0.00004279 [05:40:51] Epoch: 1 Batch: 12946/38378 (33.73%) Loss: 1.701918 LR: 0.00004279 [05:40:53] Epoch: 1 Batch: 12947/38378 (33.74%) Loss: 1.741495 LR: 0.00004279 [05:40:55] Epoch: 1 Batch: 12948/38378 (33.74%) Loss: 2.267742 LR: 0.00004279 [05:40:56] Epoch: 1 Batch: 12949/38378 (33.74%) Loss: 1.583023 LR: 0.00004279 [05:40:58] Epoch: 1 Batch: 12950/38378 (33.74%) Loss: 1.968609 LR: 0.00004279 [05:41:00] Epoch: 1 Batch: 12951/38378 (33.75%) Loss: 2.245116 LR: 0.00004279 [05:41:02] Epoch: 1 Batch: 12952/38378 (33.75%) Loss: 1.900995 LR: 0.00004278 [05:41:04] Epoch: 1 Batch: 12953/38378 (33.75%) Loss: 1.968702 LR: 0.00004278 [05:41:06] Epoch: 1 Batch: 12954/38378 (33.75%) Loss: 2.123539 LR: 0.00004278 [05:41:07] Epoch: 1 Batch: 12955/38378 (33.76%) Loss: 2.475922 LR: 0.00004278 [05:41:09] Epoch: 1 Batch: 12956/38378 (33.76%) Loss: 1.962475 LR: 0.00004278 [05:41:11] Epoch: 1 Batch: 12957/38378 (33.76%) Loss: 2.116965 LR: 0.00004278 [05:41:13] Epoch: 1 Batch: 12958/38378 (33.76%) Loss: 1.825992 LR: 0.00004278 [05:41:14] Epoch: 1 Batch: 12959/38378 (33.77%) Loss: 1.476532 LR: 0.00004277 [05:41:16] Epoch: 1 Batch: 12960/38378 (33.77%) Loss: 2.128913 LR: 0.00004277 [05:41:18] Epoch: 1 Batch: 12961/38378 (33.77%) Loss: 2.200882 LR: 0.00004277 [05:41:20] Epoch: 1 Batch: 12962/38378 (33.77%) Loss: 1.783694 LR: 0.00004277 [05:41:22] Epoch: 1 Batch: 12963/38378 (33.78%) Loss: 1.976771 LR: 0.00004277 [05:41:24] Epoch: 1 Batch: 12964/38378 (33.78%) Loss: 2.123845 LR: 0.00004277 [05:41:25] Epoch: 1 Batch: 12965/38378 (33.78%) Loss: 1.779671 LR: 0.00004277 [05:41:27] Epoch: 1 Batch: 12966/38378 (33.78%) Loss: 2.209486 LR: 0.00004276 [05:41:29] Epoch: 1 Batch: 12967/38378 (33.79%) Loss: 2.049807 LR: 0.00004276 [05:41:31] Epoch: 1 Batch: 12968/38378 (33.79%) Loss: 2.290458 LR: 0.00004276 [05:41:32] Epoch: 1 Batch: 12969/38378 (33.79%) Loss: 2.039003 LR: 0.00004276 [05:41:34] Epoch: 1 Batch: 12970/38378 
(33.80%) Loss: 1.855017 LR: 0.00004276 [05:41:36] Epoch: 1 Batch: 12971/38378 (33.80%) Loss: 2.061133 LR: 0.00004276 [05:41:38] Epoch: 1 Batch: 12972/38378 (33.80%) Loss: 2.026227 LR: 0.00004276 [05:41:40] Epoch: 1 Batch: 12973/38378 (33.80%) Loss: 1.996670 LR: 0.00004275 [05:41:42] Epoch: 1 Batch: 12974/38378 (33.81%) Loss: 1.933434 LR: 0.00004275 [05:41:43] Epoch: 1 Batch: 12975/38378 (33.81%) Loss: 2.019561 LR: 0.00004275 [05:41:45] Epoch: 1 Batch: 12976/38378 (33.81%) Loss: 1.817465 LR: 0.00004275 [05:41:47] Epoch: 1 Batch: 12977/38378 (33.81%) Loss: 2.333608 LR: 0.00004275 [05:41:49] Epoch: 1 Batch: 12978/38378 (33.82%) Loss: 1.909539 LR: 0.00004275 [05:41:51] Epoch: 1 Batch: 12979/38378 (33.82%) Loss: 1.778833 LR: 0.00004275 [05:41:52] Epoch: 1 Batch: 12980/38378 (33.82%) Loss: 2.034575 LR: 0.00004274 [05:41:54] Epoch: 1 Batch: 12981/38378 (33.82%) Loss: 1.859682 LR: 0.00004274 [05:41:56] Epoch: 1 Batch: 12982/38378 (33.83%) Loss: 2.212250 LR: 0.00004274 [05:41:58] Epoch: 1 Batch: 12983/38378 (33.83%) Loss: 1.650774 LR: 0.00004274 [05:41:59] Epoch: 1 Batch: 12984/38378 (33.83%) Loss: 2.044337 LR: 0.00004274 [05:42:01] Epoch: 1 Batch: 12985/38378 (33.83%) Loss: 2.215264 LR: 0.00004274 [05:42:03] Epoch: 1 Batch: 12986/38378 (33.84%) Loss: 2.035365 LR: 0.00004274 [05:42:05] Epoch: 1 Batch: 12987/38378 (33.84%) Loss: 1.864688 LR: 0.00004273 [05:42:07] Epoch: 1 Batch: 12988/38378 (33.84%) Loss: 1.910842 LR: 0.00004273 [05:42:08] Epoch: 1 Batch: 12989/38378 (33.84%) Loss: 1.867209 LR: 0.00004273 [05:42:10] Epoch: 1 Batch: 12990/38378 (33.85%) Loss: 2.034223 LR: 0.00004273 [05:42:12] Epoch: 1 Batch: 12991/38378 (33.85%) Loss: 2.141270 LR: 0.00004273 [05:42:14] Epoch: 1 Batch: 12992/38378 (33.85%) Loss: 1.796413 LR: 0.00004273 [05:42:16] Epoch: 1 Batch: 12993/38378 (33.86%) Loss: 2.075444 LR: 0.00004273 [05:42:18] Epoch: 1 Batch: 12994/38378 (33.86%) Loss: 1.941440 LR: 0.00004272 [05:42:19] Epoch: 1 Batch: 12995/38378 (33.86%) Loss: 2.141688 LR: 0.00004272 [05:42:21] Epoch: 1 Batch: 12996/38378 (33.86%) Loss: 2.270176 LR: 0.00004272 [05:42:23] Epoch: 1 Batch: 12997/38378 (33.87%) Loss: 1.810075 LR: 0.00004272 [05:42:25] Epoch: 1 Batch: 12998/38378 (33.87%) Loss: 2.033224 LR: 0.00004272 [05:42:26] Epoch: 1 Batch: 12999/38378 (33.87%) Loss: 1.900832 LR: 0.00004272 [05:42:28] >> Evaluating batch 0 [05:42:29] >> Evaluating batch 1 [05:42:30] >> Evaluating batch 2 [05:42:31] >> Evaluating batch 3 [05:42:32] >> Evaluating batch 4 [05:42:33] >> Evaluating batch 5 [05:42:34] >> Evaluating batch 6 [05:42:35] >> Evaluating batch 7 [05:42:36] >> Evaluating batch 8 [05:42:37] >> Evaluating batch 9 [05:42:38] >> Evaluating batch 10 [05:42:39] >> Evaluating batch 11 [05:42:40] >> Evaluating batch 12 [05:42:41] >> Evaluating batch 13 [05:42:42] >> Evaluating batch 14 [05:42:43] >> Evaluating batch 15 [05:42:44] >> Evaluating batch 16 [05:42:45] Epoch: 1 Step: 13000/38378 Evaluation: [05:42:45] Avg Loss Since Last Eval: 2.0102 Val Loss: 2.1063 Validation loss delta: 0.0022 Perplexity: 8.2176 LR: 0.00004272 [05:42:49] >> Cleaned up old temp checkpoint: epoch1_step12000 [05:42:49] >> Temp checkpoint saved: epoch1_step13000, size: 0.1702 GB [05:42:53] >> Checkpoint saved: epoch1_step13000, size: 0.1702 GB [05:42:53] Epoch: 1 Batch: 13000/38378 (33.87%) Loss: 1.835498 LR: 0.00004272 [05:42:55] Epoch: 1 Batch: 13001/38378 (33.88%) Loss: 1.832216 LR: 0.00004271 [05:42:57] Epoch: 1 Batch: 13002/38378 (33.88%) Loss: 1.998981 LR: 0.00004271 [05:42:59] Epoch: 1 Batch: 13003/38378 (33.88%) Loss: 1.749942 LR:
0.00004271 [05:43:00] Epoch: 1 Batch: 13004/38378 (33.88%) Loss: 1.984645 LR: 0.00004271 [05:43:02] Epoch: 1 Batch: 13005/38378 (33.89%) Loss: 1.853748 LR: 0.00004271 [05:43:04] Epoch: 1 Batch: 13006/38378 (33.89%) Loss: 1.849010 LR: 0.00004271 [05:43:06] Epoch: 1 Batch: 13007/38378 (33.89%) Loss: 2.086833 LR: 0.00004271 [05:43:08] Epoch: 1 Batch: 13008/38378 (33.89%) Loss: 1.965604 LR: 0.00004270 [05:43:09] Epoch: 1 Batch: 13009/38378 (33.90%) Loss: 1.884939 LR: 0.00004270 [05:43:11] Epoch: 1 Batch: 13010/38378 (33.90%) Loss: 1.909553 LR: 0.00004270 [05:43:13] Epoch: 1 Batch: 13011/38378 (33.90%) Loss: 1.739510 LR: 0.00004270 [05:43:15] Epoch: 1 Batch: 13012/38378 (33.90%) Loss: 1.822270 LR: 0.00004270 [05:43:17] Epoch: 1 Batch: 13013/38378 (33.91%) Loss: 2.227877 LR: 0.00004270 [05:43:19] Epoch: 1 Batch: 13014/38378 (33.91%) Loss: 2.152402 LR: 0.00004270 [05:43:21] Epoch: 1 Batch: 13015/38378 (33.91%) Loss: 1.719260 LR: 0.00004269 [05:43:22] Epoch: 1 Batch: 13016/38378 (33.92%) Loss: 1.986770 LR: 0.00004269 [05:43:24] Epoch: 1 Batch: 13017/38378 (33.92%) Loss: 1.803310 LR: 0.00004269 [05:43:26] Epoch: 1 Batch: 13018/38378 (33.92%) Loss: 2.062287 LR: 0.00004269 [05:43:28] Epoch: 1 Batch: 13019/38378 (33.92%) Loss: 1.640156 LR: 0.00004269 [05:43:30] Epoch: 1 Batch: 13020/38378 (33.93%) Loss: 2.077279 LR: 0.00004269 [05:43:32] Epoch: 1 Batch: 13021/38378 (33.93%) Loss: 1.744384 LR: 0.00004269 [05:43:34] Epoch: 1 Batch: 13022/38378 (33.93%) Loss: 2.111489 LR: 0.00004268 [05:43:35] Epoch: 1 Batch: 13023/38378 (33.93%) Loss: 2.029477 LR: 0.00004268 [05:43:37] Epoch: 1 Batch: 13024/38378 (33.94%) Loss: 2.027216 LR: 0.00004268 [05:43:39] Epoch: 1 Batch: 13025/38378 (33.94%) Loss: 2.097284 LR: 0.00004268 [05:43:41] Epoch: 1 Batch: 13026/38378 (33.94%) Loss: 2.215845 LR: 0.00004268 [05:43:43] Epoch: 1 Batch: 13027/38378 (33.94%) Loss: 1.855262 LR: 0.00004268 [05:43:44] Epoch: 1 Batch: 13028/38378 (33.95%) Loss: 1.999833 LR: 0.00004268 [05:43:46] Epoch: 1 Batch: 13029/38378 (33.95%) Loss: 2.005496 LR: 0.00004267 [05:43:48] Epoch: 1 Batch: 13030/38378 (33.95%) Loss: 2.030754 LR: 0.00004267 [05:43:50] Epoch: 1 Batch: 13031/38378 (33.95%) Loss: 1.888929 LR: 0.00004267 [05:43:52] Epoch: 1 Batch: 13032/38378 (33.96%) Loss: 1.834360 LR: 0.00004267 [05:43:54] Epoch: 1 Batch: 13033/38378 (33.96%) Loss: 1.919141 LR: 0.00004267 [05:43:55] Epoch: 1 Batch: 13034/38378 (33.96%) Loss: 1.950292 LR: 0.00004267 [05:43:57] Epoch: 1 Batch: 13035/38378 (33.96%) Loss: 2.050697 LR: 0.00004267 [05:43:59] Epoch: 1 Batch: 13036/38378 (33.97%) Loss: 1.940403 LR: 0.00004266 [05:44:01] Epoch: 1 Batch: 13037/38378 (33.97%) Loss: 2.111595 LR: 0.00004266 [05:44:02] Epoch: 1 Batch: 13038/38378 (33.97%) Loss: 1.763976 LR: 0.00004266 [05:44:04] Epoch: 1 Batch: 13039/38378 (33.98%) Loss: 1.769884 LR: 0.00004266 [05:44:06] Epoch: 1 Batch: 13040/38378 (33.98%) Loss: 2.158786 LR: 0.00004266 [05:44:08] Epoch: 1 Batch: 13041/38378 (33.98%) Loss: 1.642797 LR: 0.00004266 [05:44:10] Epoch: 1 Batch: 13042/38378 (33.98%) Loss: 1.911810 LR: 0.00004266 [05:44:11] Epoch: 1 Batch: 13043/38378 (33.99%) Loss: 2.181163 LR: 0.00004265 [05:44:13] Epoch: 1 Batch: 13044/38378 (33.99%) Loss: 2.070155 LR: 0.00004265 [05:44:15] Epoch: 1 Batch: 13045/38378 (33.99%) Loss: 1.968288 LR: 0.00004265 [05:44:17] Epoch: 1 Batch: 13046/38378 (33.99%) Loss: 2.058956 LR: 0.00004265 [05:44:19] Epoch: 1 Batch: 13047/38378 (34.00%) Loss: 1.869220 LR: 0.00004265 [05:44:20] Epoch: 1 Batch: 13048/38378 (34.00%) Loss: 2.148631 LR: 0.00004265 [05:44:22] Epoch: 1 Batch: 
13049/38378 (34.00%) Loss: 2.082159 LR: 0.00004265 [05:44:24] Epoch: 1 Batch: 13050/38378 (34.00%) Loss: 2.041231 LR: 0.00004264 [05:44:26] Epoch: 1 Batch: 13051/38378 (34.01%) Loss: 2.138621 LR: 0.00004264 [05:44:28] Epoch: 1 Batch: 13052/38378 (34.01%) Loss: 2.149018 LR: 0.00004264 [05:44:29] Epoch: 1 Batch: 13053/38378 (34.01%) Loss: 2.049479 LR: 0.00004264 [05:44:31] Epoch: 1 Batch: 13054/38378 (34.01%) Loss: 2.028013 LR: 0.00004264 [05:44:33] Epoch: 1 Batch: 13055/38378 (34.02%) Loss: 1.957628 LR: 0.00004264 [05:44:35] Epoch: 1 Batch: 13056/38378 (34.02%) Loss: 1.936740 LR: 0.00004264 [05:44:37] Epoch: 1 Batch: 13057/38378 (34.02%) Loss: 2.191483 LR: 0.00004263 [05:44:38] Epoch: 1 Batch: 13058/38378 (34.02%) Loss: 1.922431 LR: 0.00004263 [05:44:40] Epoch: 1 Batch: 13059/38378 (34.03%) Loss: 1.986980 LR: 0.00004263 [05:44:42] Epoch: 1 Batch: 13060/38378 (34.03%) Loss: 2.086160 LR: 0.00004263 [05:44:44] Epoch: 1 Batch: 13061/38378 (34.03%) Loss: 2.072303 LR: 0.00004263 [05:44:46] Epoch: 1 Batch: 13062/38378 (34.04%) Loss: 1.959458 LR: 0.00004263 [05:44:47] Epoch: 1 Batch: 13063/38378 (34.04%) Loss: 2.361362 LR: 0.00004263 [05:44:49] Epoch: 1 Batch: 13064/38378 (34.04%) Loss: 1.930554 LR: 0.00004262 [05:44:51] Epoch: 1 Batch: 13065/38378 (34.04%) Loss: 1.946173 LR: 0.00004262 [05:44:53] Epoch: 1 Batch: 13066/38378 (34.05%) Loss: 1.983998 LR: 0.00004262 [05:44:55] Epoch: 1 Batch: 13067/38378 (34.05%) Loss: 1.817589 LR: 0.00004262 [05:44:57] Epoch: 1 Batch: 13068/38378 (34.05%) Loss: 2.005868 LR: 0.00004262 [05:44:58] Epoch: 1 Batch: 13069/38378 (34.05%) Loss: 2.200675 LR: 0.00004262 [05:45:00] Epoch: 1 Batch: 13070/38378 (34.06%) Loss: 2.003413 LR: 0.00004262 [05:45:02] Epoch: 1 Batch: 13071/38378 (34.06%) Loss: 1.922446 LR: 0.00004261 [05:45:04] Epoch: 1 Batch: 13072/38378 (34.06%) Loss: 2.318260 LR: 0.00004261 [05:45:06] Epoch: 1 Batch: 13073/38378 (34.06%) Loss: 2.130422 LR: 0.00004261 [05:45:07] Epoch: 1 Batch: 13074/38378 (34.07%) Loss: 2.040680 LR: 0.00004261 [05:45:09] Epoch: 1 Batch: 13075/38378 (34.07%) Loss: 2.125859 LR: 0.00004261 [05:45:11] Epoch: 1 Batch: 13076/38378 (34.07%) Loss: 2.058406 LR: 0.00004261 [05:45:13] Epoch: 1 Batch: 13077/38378 (34.07%) Loss: 2.049999 LR: 0.00004261 [05:45:15] Epoch: 1 Batch: 13078/38378 (34.08%) Loss: 1.991202 LR: 0.00004260 [05:45:17] Epoch: 1 Batch: 13079/38378 (34.08%) Loss: 1.855305 LR: 0.00004260 [05:45:18] Epoch: 1 Batch: 13080/38378 (34.08%) Loss: 2.138676 LR: 0.00004260 [05:45:20] Epoch: 1 Batch: 13081/38378 (34.08%) Loss: 2.152064 LR: 0.00004260 [05:45:22] Epoch: 1 Batch: 13082/38378 (34.09%) Loss: 1.919128 LR: 0.00004260 [05:45:24] Epoch: 1 Batch: 13083/38378 (34.09%) Loss: 1.862850 LR: 0.00004260 [05:45:26] Epoch: 1 Batch: 13084/38378 (34.09%) Loss: 1.895793 LR: 0.00004260 [05:45:27] Epoch: 1 Batch: 13085/38378 (34.10%) Loss: 1.778964 LR: 0.00004259 [05:45:29] Epoch: 1 Batch: 13086/38378 (34.10%) Loss: 1.926393 LR: 0.00004259 [05:45:31] Epoch: 1 Batch: 13087/38378 (34.10%) Loss: 2.001998 LR: 0.00004259 [05:45:33] Epoch: 1 Batch: 13088/38378 (34.10%) Loss: 1.971563 LR: 0.00004259 [05:45:35] Epoch: 1 Batch: 13089/38378 (34.11%) Loss: 2.099659 LR: 0.00004259 [05:45:37] Epoch: 1 Batch: 13090/38378 (34.11%) Loss: 2.147163 LR: 0.00004259 [05:45:38] Epoch: 1 Batch: 13091/38378 (34.11%) Loss: 2.001324 LR: 0.00004259 [05:45:40] Epoch: 1 Batch: 13092/38378 (34.11%) Loss: 2.281204 LR: 0.00004258 [05:45:42] Epoch: 1 Batch: 13093/38378 (34.12%) Loss: 2.140101 LR: 0.00004258 [05:45:44] Epoch: 1 Batch: 13094/38378 (34.12%) Loss: 1.933332 LR: 
0.00004258 [05:45:45] Epoch: 1 Batch: 13095/38378 (34.12%) Loss: 1.871189 LR: 0.00004258 [05:45:47] Epoch: 1 Batch: 13096/38378 (34.12%) Loss: 1.726926 LR: 0.00004258 [05:45:49] Epoch: 1 Batch: 13097/38378 (34.13%) Loss: 1.987501 LR: 0.00004258 [05:45:51] Epoch: 1 Batch: 13098/38378 (34.13%) Loss: 2.628274 LR: 0.00004258 [05:45:53] Epoch: 1 Batch: 13099/38378 (34.13%) Loss: 1.951495 LR: 0.00004258 [05:45:59] >> Cleaned up old temp checkpoint: epoch1_step12100 [05:45:59] >> Temp checkpoint saved: epoch1_step13100, size: 0.1702 GB [05:45:59] Epoch: 1 Batch: 13100/38378 (34.13%) Loss: 1.983302 LR: 0.00004258 [05:46:00] Epoch: 1 Batch: 13101/38378 (34.14%) Loss: 1.998825 LR: 0.00004258 [05:46:02] Epoch: 1 Batch: 13102/38378 (34.14%) Loss: 1.707277 LR: 0.00004258 [05:46:04] Epoch: 1 Batch: 13103/38378 (34.14%) Loss: 1.872849 LR: 0.00004258 [05:46:06] Epoch: 1 Batch: 13104/38378 (34.14%) Loss: 2.176412 LR: 0.00004258 [05:46:08] Epoch: 1 Batch: 13105/38378 (34.15%) Loss: 1.911239 LR: 0.00004258 [05:46:10] Epoch: 1 Batch: 13106/38378 (34.15%) Loss: 1.885378 LR: 0.00004257 [05:46:11] Epoch: 1 Batch: 13107/38378 (34.15%) Loss: 2.121088 LR: 0.00004257 [05:46:13] Epoch: 1 Batch: 13108/38378 (34.15%) Loss: 2.362378 LR: 0.00004257 [05:46:15] Epoch: 1 Batch: 13109/38378 (34.16%) Loss: 2.173657 LR: 0.00004257 [05:46:17] Epoch: 1 Batch: 13110/38378 (34.16%) Loss: 1.933452 LR: 0.00004257 [05:46:19] Epoch: 1 Batch: 13111/38378 (34.16%) Loss: 2.118683 LR: 0.00004257 [05:46:21] Epoch: 1 Batch: 13112/38378 (34.17%) Loss: 2.012311 LR: 0.00004257 [05:46:22] Epoch: 1 Batch: 13113/38378 (34.17%) Loss: 2.245366 LR: 0.00004256 [05:46:24] Epoch: 1 Batch: 13114/38378 (34.17%) Loss: 2.210930 LR: 0.00004256 [05:46:26] Epoch: 1 Batch: 13115/38378 (34.17%) Loss: 2.024521 LR: 0.00004256 [05:46:28] Epoch: 1 Batch: 13116/38378 (34.18%) Loss: 2.154032 LR: 0.00004256 [05:46:30] Epoch: 1 Batch: 13117/38378 (34.18%) Loss: 2.049502 LR: 0.00004256 [05:46:32] Epoch: 1 Batch: 13118/38378 (34.18%) Loss: 1.937661 LR: 0.00004256 [05:46:33] Epoch: 1 Batch: 13119/38378 (34.18%) Loss: 2.172588 LR: 0.00004256 [05:46:35] Epoch: 1 Batch: 13120/38378 (34.19%) Loss: 1.955335 LR: 0.00004255 [05:46:37] Epoch: 1 Batch: 13121/38378 (34.19%) Loss: 1.950321 LR: 0.00004255 [05:46:39] Epoch: 1 Batch: 13122/38378 (34.19%) Loss: 1.979441 LR: 0.00004255 [05:46:41] Epoch: 1 Batch: 13123/38378 (34.19%) Loss: 1.909245 LR: 0.00004255 [05:46:43] Epoch: 1 Batch: 13124/38378 (34.20%) Loss: 2.030633 LR: 0.00004255 [05:46:44] Epoch: 1 Batch: 13125/38378 (34.20%) Loss: 1.829985 LR: 0.00004255 [05:46:46] Epoch: 1 Batch: 13126/38378 (34.20%) Loss: 1.991010 LR: 0.00004255 [05:46:48] Epoch: 1 Batch: 13127/38378 (34.20%) Loss: 2.160527 LR: 0.00004254 [05:46:50] Epoch: 1 Batch: 13128/38378 (34.21%) Loss: 2.170580 LR: 0.00004254 [05:46:52] Epoch: 1 Batch: 13129/38378 (34.21%) Loss: 2.002055 LR: 0.00004254 [05:46:53] Epoch: 1 Batch: 13130/38378 (34.21%) Loss: 1.875975 LR: 0.00004254 [05:46:55] Epoch: 1 Batch: 13131/38378 (34.21%) Loss: 1.889332 LR: 0.00004254 [05:46:57] Epoch: 1 Batch: 13132/38378 (34.22%) Loss: 2.100429 LR: 0.00004254 [05:46:59] Epoch: 1 Batch: 13133/38378 (34.22%) Loss: 1.726952 LR: 0.00004254 [05:47:01] Epoch: 1 Batch: 13134/38378 (34.22%) Loss: 1.993638 LR: 0.00004253 [05:47:03] Epoch: 1 Batch: 13135/38378 (34.23%) Loss: 1.887817 LR: 0.00004253 [05:47:04] Epoch: 1 Batch: 13136/38378 (34.23%) Loss: 1.672557 LR: 0.00004253 [05:47:06] Epoch: 1 Batch: 13137/38378 (34.23%) Loss: 1.871205 LR: 0.00004253 [05:47:08] Epoch: 1 Batch: 13138/38378 (34.23%) Loss: 
2.078743 LR: 0.00004253 [05:47:10] Epoch: 1 Batch: 13139/38378 (34.24%) Loss: 1.713944 LR: 0.00004253 [05:47:12] Epoch: 1 Batch: 13140/38378 (34.24%) Loss: 2.116251 LR: 0.00004253 [05:47:13] Epoch: 1 Batch: 13141/38378 (34.24%) Loss: 2.208701 LR: 0.00004252 [05:47:15] Epoch: 1 Batch: 13142/38378 (34.24%) Loss: 1.958206 LR: 0.00004252 [05:47:17] Epoch: 1 Batch: 13143/38378 (34.25%) Loss: 1.691364 LR: 0.00004252 [05:47:19] Epoch: 1 Batch: 13144/38378 (34.25%) Loss: 1.900515 LR: 0.00004252 [05:47:21] Epoch: 1 Batch: 13145/38378 (34.25%) Loss: 2.057109 LR: 0.00004252 [05:47:22] Epoch: 1 Batch: 13146/38378 (34.25%) Loss: 1.955305 LR: 0.00004252 [05:47:24] Epoch: 1 Batch: 13147/38378 (34.26%) Loss: 1.986012 LR: 0.00004252 [05:47:26] Epoch: 1 Batch: 13148/38378 (34.26%) Loss: 2.107869 LR: 0.00004251 [05:47:28] Epoch: 1 Batch: 13149/38378 (34.26%) Loss: 1.626263 LR: 0.00004251 [05:47:30] Epoch: 1 Batch: 13150/38378 (34.26%) Loss: 1.959767 LR: 0.00004251 [05:47:31] Epoch: 1 Batch: 13151/38378 (34.27%) Loss: 1.921284 LR: 0.00004251 [05:47:33] Epoch: 1 Batch: 13152/38378 (34.27%) Loss: 1.791373 LR: 0.00004251 [05:47:35] Epoch: 1 Batch: 13153/38378 (34.27%) Loss: 1.907134 LR: 0.00004251 [05:47:37] Epoch: 1 Batch: 13154/38378 (34.27%) Loss: 1.684662 LR: 0.00004251 [05:47:39] Epoch: 1 Batch: 13155/38378 (34.28%) Loss: 1.866345 LR: 0.00004250 [05:47:40] Epoch: 1 Batch: 13156/38378 (34.28%) Loss: 1.858812 LR: 0.00004250 [05:47:42] Epoch: 1 Batch: 13157/38378 (34.28%) Loss: 1.947849 LR: 0.00004250 [05:47:44] Epoch: 1 Batch: 13158/38378 (34.29%) Loss: 2.138609 LR: 0.00004250 [05:47:46] Epoch: 1 Batch: 13159/38378 (34.29%) Loss: 1.849747 LR: 0.00004250 [05:47:48] Epoch: 1 Batch: 13160/38378 (34.29%) Loss: 1.688174 LR: 0.00004250 [05:47:49] Epoch: 1 Batch: 13161/38378 (34.29%) Loss: 2.129961 LR: 0.00004250 [05:47:51] Epoch: 1 Batch: 13162/38378 (34.30%) Loss: 2.375204 LR: 0.00004249 [05:47:53] Epoch: 1 Batch: 13163/38378 (34.30%) Loss: 1.959734 LR: 0.00004249 [05:47:55] Epoch: 1 Batch: 13164/38378 (34.30%) Loss: 2.078999 LR: 0.00004249 [05:47:57] Epoch: 1 Batch: 13165/38378 (34.30%) Loss: 2.302633 LR: 0.00004249 [05:47:58] Epoch: 1 Batch: 13166/38378 (34.31%) Loss: 2.067385 LR: 0.00004249 [05:48:00] Epoch: 1 Batch: 13167/38378 (34.31%) Loss: 1.877386 LR: 0.00004249 [05:48:02] Epoch: 1 Batch: 13168/38378 (34.31%) Loss: 2.060139 LR: 0.00004249 [05:48:04] Epoch: 1 Batch: 13169/38378 (34.31%) Loss: 2.660238 LR: 0.00004248 [05:48:06] Epoch: 1 Batch: 13170/38378 (34.32%) Loss: 2.177090 LR: 0.00004248 [05:48:07] Epoch: 1 Batch: 13171/38378 (34.32%) Loss: 2.053263 LR: 0.00004248 [05:48:09] Epoch: 1 Batch: 13172/38378 (34.32%) Loss: 2.294924 LR: 0.00004248 [05:48:11] Epoch: 1 Batch: 13173/38378 (34.32%) Loss: 1.996299 LR: 0.00004248 [05:48:13] Epoch: 1 Batch: 13174/38378 (34.33%) Loss: 2.274633 LR: 0.00004248 [05:48:15] Epoch: 1 Batch: 13175/38378 (34.33%) Loss: 1.992234 LR: 0.00004248 [05:48:16] Epoch: 1 Batch: 13176/38378 (34.33%) Loss: 2.008749 LR: 0.00004247 [05:48:18] Epoch: 1 Batch: 13177/38378 (34.33%) Loss: 1.934093 LR: 0.00004247 [05:48:20] Epoch: 1 Batch: 13178/38378 (34.34%) Loss: 1.854028 LR: 0.00004247 [05:48:22] Epoch: 1 Batch: 13179/38378 (34.34%) Loss: 2.015068 LR: 0.00004247 [05:48:24] Epoch: 1 Batch: 13180/38378 (34.34%) Loss: 2.241898 LR: 0.00004247 [05:48:26] Epoch: 1 Batch: 13181/38378 (34.35%) Loss: 1.611296 LR: 0.00004247 [05:48:27] Epoch: 1 Batch: 13182/38378 (34.35%) Loss: 1.997587 LR: 0.00004247 [05:48:29] Epoch: 1 Batch: 13183/38378 (34.35%) Loss: 1.983091 LR: 0.00004246 [05:48:31] Epoch: 1 
Batch: 13184/38378 (34.35%) Loss: 1.985121 LR: 0.00004246 [05:48:33] Epoch: 1 Batch: 13185/38378 (34.36%) Loss: 1.991649 LR: 0.00004246 [05:48:35] Epoch: 1 Batch: 13186/38378 (34.36%) Loss: 2.012565 LR: 0.00004246 [05:48:36] Epoch: 1 Batch: 13187/38378 (34.36%) Loss: 2.153886 LR: 0.00004246 [05:48:38] Epoch: 1 Batch: 13188/38378 (34.36%) Loss: 1.908426 LR: 0.00004246 [05:48:40] Epoch: 1 Batch: 13189/38378 (34.37%) Loss: 1.804510 LR: 0.00004246 [05:48:42] Epoch: 1 Batch: 13190/38378 (34.37%) Loss: 1.972363 LR: 0.00004245 [05:48:44] Epoch: 1 Batch: 13191/38378 (34.37%) Loss: 2.246060 LR: 0.00004245 [05:48:46] Epoch: 1 Batch: 13192/38378 (34.37%) Loss: 1.842425 LR: 0.00004245 [05:48:47] Epoch: 1 Batch: 13193/38378 (34.38%) Loss: 1.983286 LR: 0.00004245 [05:48:49] Epoch: 1 Batch: 13194/38378 (34.38%) Loss: 2.026939 LR: 0.00004245 [05:48:51] Epoch: 1 Batch: 13195/38378 (34.38%) Loss: 2.105064 LR: 0.00004245 [05:48:53] Epoch: 1 Batch: 13196/38378 (34.38%) Loss: 2.002143 LR: 0.00004245 [05:48:55] Epoch: 1 Batch: 13197/38378 (34.39%) Loss: 1.822451 LR: 0.00004244 [05:48:56] Epoch: 1 Batch: 13198/38378 (34.39%) Loss: 2.244338 LR: 0.00004244 [05:48:58] Epoch: 1 Batch: 13199/38378 (34.39%) Loss: 1.763859 LR: 0.00004244 [05:49:04] >> Cleaned up old temp checkpoint: epoch1_step12200 [05:49:04] >> Temp checkpoint saved: epoch1_step13200, size: 0.1702 GB [05:49:04] Epoch: 1 Batch: 13200/38378 (34.39%) Loss: 2.100701 LR: 0.00004244 [05:49:06] Epoch: 1 Batch: 13201/38378 (34.40%) Loss: 1.785719 LR: 0.00004244 [05:49:08] Epoch: 1 Batch: 13202/38378 (34.40%) Loss: 1.935935 LR: 0.00004244 [05:49:10] Epoch: 1 Batch: 13203/38378 (34.40%) Loss: 2.117636 LR: 0.00004244 [05:49:11] Epoch: 1 Batch: 13204/38378 (34.41%) Loss: 1.947877 LR: 0.00004243 [05:49:13] Epoch: 1 Batch: 13205/38378 (34.41%) Loss: 1.870759 LR: 0.00004243 [05:49:15] Epoch: 1 Batch: 13206/38378 (34.41%) Loss: 2.170008 LR: 0.00004243 [05:49:17] Epoch: 1 Batch: 13207/38378 (34.41%) Loss: 2.054411 LR: 0.00004243 [05:49:18] Epoch: 1 Batch: 13208/38378 (34.42%) Loss: 2.274306 LR: 0.00004243 [05:49:20] Epoch: 1 Batch: 13209/38378 (34.42%) Loss: 1.869896 LR: 0.00004243 [05:49:22] Epoch: 1 Batch: 13210/38378 (34.42%) Loss: 1.909627 LR: 0.00004243 [05:49:24] Epoch: 1 Batch: 13211/38378 (34.42%) Loss: 1.851303 LR: 0.00004242 [05:49:26] Epoch: 1 Batch: 13212/38378 (34.43%) Loss: 1.793344 LR: 0.00004242 [05:49:28] Epoch: 1 Batch: 13213/38378 (34.43%) Loss: 2.052407 LR: 0.00004242 [05:49:29] Epoch: 1 Batch: 13214/38378 (34.43%) Loss: 1.734568 LR: 0.00004242 [05:49:31] Epoch: 1 Batch: 13215/38378 (34.43%) Loss: 1.785359 LR: 0.00004242 [05:49:33] Epoch: 1 Batch: 13216/38378 (34.44%) Loss: 2.025215 LR: 0.00004242 [05:49:35] Epoch: 1 Batch: 13217/38378 (34.44%) Loss: 2.047751 LR: 0.00004242 [05:49:37] Epoch: 1 Batch: 13218/38378 (34.44%) Loss: 1.698653 LR: 0.00004241 [05:49:38] Epoch: 1 Batch: 13219/38378 (34.44%) Loss: 2.143253 LR: 0.00004241 [05:49:40] Epoch: 1 Batch: 13220/38378 (34.45%) Loss: 2.177462 LR: 0.00004241 [05:49:42] Epoch: 1 Batch: 13221/38378 (34.45%) Loss: 1.964672 LR: 0.00004241 [05:49:44] Epoch: 1 Batch: 13222/38378 (34.45%) Loss: 1.929907 LR: 0.00004241 [05:49:46] Epoch: 1 Batch: 13223/38378 (34.45%) Loss: 1.833694 LR: 0.00004241 [05:49:47] Epoch: 1 Batch: 13224/38378 (34.46%) Loss: 2.016912 LR: 0.00004241 [05:49:49] Epoch: 1 Batch: 13225/38378 (34.46%) Loss: 1.958197 LR: 0.00004240 [05:49:51] Epoch: 1 Batch: 13226/38378 (34.46%) Loss: 1.988841 LR: 0.00004240 [05:49:53] Epoch: 1 Batch: 13227/38378 (34.47%) Loss: 2.003938 LR: 0.00004240 
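Several numbers in this log can be cross-checked from the configuration alone: warmup_steps, the slowly decaying LR column, the perplexity reported at each evaluation, and the 17 evaluation batches. A few arithmetic sanity checks, assuming linear warmup followed by cosine decay from lr down to lr_floor (the script's actual "warmup_type: cosine" bookkeeping may differ by a step or two):

import math

accum_steps = 7
opt_steps = 38378 // accum_steps           # ~5482 optimizer steps in the epoch
print(round(0.08 * opt_steps))             # 439 -> matches the logged warmup_steps

# Cosine decay from lr to lr_floor after warmup reproduces the LR column:
lr, lr_floor, warmup = 5e-5, 1e-5, 439
step = 13200 // accum_steps                # batch 13200 is roughly optimizer step 1885
progress = (step - warmup) / (opt_steps - warmup)
print(lr_floor + (lr - lr_floor) * 0.5 * (1 + math.cos(math.pi * progress)))
                                           # ~4.242e-05 vs the logged 0.00004244

# The reported perplexity is exp(validation loss):
print(math.exp(2.1063), math.exp(2.1000))  # ~8.218 and ~8.166, matching the
                                           # logged 8.2176 and 8.1658 to rounding

# 17 eval batches per pass: ceil(max_val_size / val_batch_size) = ceil(100 / 6)
print(math.ceil(100 / 6))                  # 17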
[05:49:55] Epoch: 1 Batch: 13228/38378 (34.47%) Loss: 2.005895 LR: 0.00004240 [05:49:57] Epoch: 1 Batch: 13229/38378 (34.47%) Loss: 1.900933 LR: 0.00004240 [05:49:58] Epoch: 1 Batch: 13230/38378 (34.47%) Loss: 2.228399 LR: 0.00004240 [05:50:00] Epoch: 1 Batch: 13231/38378 (34.48%) Loss: 1.774501 LR: 0.00004240 [05:50:02] Epoch: 1 Batch: 13232/38378 (34.48%) Loss: 2.069670 LR: 0.00004239 [05:50:04] Epoch: 1 Batch: 13233/38378 (34.48%) Loss: 1.917722 LR: 0.00004239 [05:50:06] Epoch: 1 Batch: 13234/38378 (34.48%) Loss: 2.197013 LR: 0.00004239 [05:50:08] Epoch: 1 Batch: 13235/38378 (34.49%) Loss: 1.749062 LR: 0.00004239 [05:50:09] Epoch: 1 Batch: 13236/38378 (34.49%) Loss: 1.852439 LR: 0.00004239 [05:50:11] Epoch: 1 Batch: 13237/38378 (34.49%) Loss: 1.688143 LR: 0.00004239 [05:50:13] Epoch: 1 Batch: 13238/38378 (34.49%) Loss: 2.153611 LR: 0.00004239 [05:50:15] Epoch: 1 Batch: 13239/38378 (34.50%) Loss: 2.309537 LR: 0.00004238 [05:50:16] Epoch: 1 Batch: 13240/38378 (34.50%) Loss: 2.077926 LR: 0.00004238 [05:50:18] Epoch: 1 Batch: 13241/38378 (34.50%) Loss: 2.352467 LR: 0.00004238 [05:50:20] Epoch: 1 Batch: 13242/38378 (34.50%) Loss: 1.790173 LR: 0.00004238 [05:50:22] Epoch: 1 Batch: 13243/38378 (34.51%) Loss: 1.892629 LR: 0.00004238 [05:50:24] Epoch: 1 Batch: 13244/38378 (34.51%) Loss: 2.185123 LR: 0.00004238 [05:50:25] Epoch: 1 Batch: 13245/38378 (34.51%) Loss: 1.866223 LR: 0.00004238 [05:50:27] Epoch: 1 Batch: 13246/38378 (34.51%) Loss: 2.075740 LR: 0.00004237 [05:50:29] Epoch: 1 Batch: 13247/38378 (34.52%) Loss: 1.835478 LR: 0.00004237 [05:50:31] Epoch: 1 Batch: 13248/38378 (34.52%) Loss: 1.736452 LR: 0.00004237 [05:50:33] Epoch: 1 Batch: 13249/38378 (34.52%) Loss: 2.077447 LR: 0.00004237 [05:50:34] Epoch: 1 Batch: 13250/38378 (34.52%) Loss: 2.078123 LR: 0.00004237 [05:50:36] Epoch: 1 Batch: 13251/38378 (34.53%) Loss: 2.119994 LR: 0.00004237 [05:50:38] Epoch: 1 Batch: 13252/38378 (34.53%) Loss: 1.857483 LR: 0.00004237 [05:50:40] Epoch: 1 Batch: 13253/38378 (34.53%) Loss: 2.225085 LR: 0.00004236 [05:50:42] Epoch: 1 Batch: 13254/38378 (34.54%) Loss: 2.138425 LR: 0.00004236 [05:50:43] Epoch: 1 Batch: 13255/38378 (34.54%) Loss: 1.958605 LR: 0.00004236 [05:50:45] Epoch: 1 Batch: 13256/38378 (34.54%) Loss: 2.281644 LR: 0.00004236 [05:50:47] Epoch: 1 Batch: 13257/38378 (34.54%) Loss: 2.183703 LR: 0.00004236 [05:50:49] Epoch: 1 Batch: 13258/38378 (34.55%) Loss: 2.035292 LR: 0.00004236 [05:50:51] Epoch: 1 Batch: 13259/38378 (34.55%) Loss: 2.244840 LR: 0.00004236 [05:50:52] Epoch: 1 Batch: 13260/38378 (34.55%) Loss: 2.107781 LR: 0.00004235 [05:50:54] Epoch: 1 Batch: 13261/38378 (34.55%) Loss: 2.039489 LR: 0.00004235 [05:50:56] Epoch: 1 Batch: 13262/38378 (34.56%) Loss: 2.001051 LR: 0.00004235 [05:50:58] Epoch: 1 Batch: 13263/38378 (34.56%) Loss: 1.827658 LR: 0.00004235 [05:51:00] Epoch: 1 Batch: 13264/38378 (34.56%) Loss: 1.865356 LR: 0.00004235 [05:51:01] Epoch: 1 Batch: 13265/38378 (34.56%) Loss: 1.823131 LR: 0.00004235 [05:51:03] Epoch: 1 Batch: 13266/38378 (34.57%) Loss: 1.951315 LR: 0.00004235 [05:51:05] Epoch: 1 Batch: 13267/38378 (34.57%) Loss: 2.006791 LR: 0.00004234 [05:51:07] Epoch: 1 Batch: 13268/38378 (34.57%) Loss: 2.199476 LR: 0.00004234 [05:51:08] Epoch: 1 Batch: 13269/38378 (34.57%) Loss: 2.477911 LR: 0.00004234 [05:51:10] Epoch: 1 Batch: 13270/38378 (34.58%) Loss: 2.500845 LR: 0.00004234 [05:51:12] Epoch: 1 Batch: 13271/38378 (34.58%) Loss: 2.058877 LR: 0.00004234 [05:51:14] Epoch: 1 Batch: 13272/38378 (34.58%) Loss: 2.004149 LR: 0.00004234 [05:51:16] Epoch: 1 Batch: 13273/38378 
(34.58%) Loss: 1.941544 LR: 0.00004234 [05:51:17] Epoch: 1 Batch: 13274/38378 (34.59%) Loss: 2.128744 LR: 0.00004233 [05:51:19] Epoch: 1 Batch: 13275/38378 (34.59%) Loss: 1.964374 LR: 0.00004233 [05:51:21] Epoch: 1 Batch: 13276/38378 (34.59%) Loss: 1.830986 LR: 0.00004233 [05:51:23] Epoch: 1 Batch: 13277/38378 (34.60%) Loss: 2.257016 LR: 0.00004233 [05:51:25] Epoch: 1 Batch: 13278/38378 (34.60%) Loss: 1.992571 LR: 0.00004233 [05:51:26] Epoch: 1 Batch: 13279/38378 (34.60%) Loss: 2.021426 LR: 0.00004233 [05:51:28] Epoch: 1 Batch: 13280/38378 (34.60%) Loss: 2.018185 LR: 0.00004233 [05:51:30] Epoch: 1 Batch: 13281/38378 (34.61%) Loss: 2.144431 LR: 0.00004232 [05:51:32] Epoch: 1 Batch: 13282/38378 (34.61%) Loss: 1.859821 LR: 0.00004232 [05:51:34] Epoch: 1 Batch: 13283/38378 (34.61%) Loss: 1.624555 LR: 0.00004232 [05:51:35] Epoch: 1 Batch: 13284/38378 (34.61%) Loss: 2.109526 LR: 0.00004232 [05:51:37] Epoch: 1 Batch: 13285/38378 (34.62%) Loss: 2.143557 LR: 0.00004232 [05:51:39] Epoch: 1 Batch: 13286/38378 (34.62%) Loss: 2.069052 LR: 0.00004232 [05:51:41] Epoch: 1 Batch: 13287/38378 (34.62%) Loss: 2.231863 LR: 0.00004232 [05:51:43] Epoch: 1 Batch: 13288/38378 (34.62%) Loss: 2.064797 LR: 0.00004231 [05:51:45] Epoch: 1 Batch: 13289/38378 (34.63%) Loss: 2.145540 LR: 0.00004231 [05:51:46] Epoch: 1 Batch: 13290/38378 (34.63%) Loss: 1.739478 LR: 0.00004231 [05:51:48] Epoch: 1 Batch: 13291/38378 (34.63%) Loss: 2.068380 LR: 0.00004231 [05:51:50] Epoch: 1 Batch: 13292/38378 (34.63%) Loss: 1.824146 LR: 0.00004231 [05:51:52] Epoch: 1 Batch: 13293/38378 (34.64%) Loss: 1.993745 LR: 0.00004231 [05:51:54] Epoch: 1 Batch: 13294/38378 (34.64%) Loss: 1.845356 LR: 0.00004231 [05:51:55] Epoch: 1 Batch: 13295/38378 (34.64%) Loss: 1.640989 LR: 0.00004230 [05:51:57] Epoch: 1 Batch: 13296/38378 (34.64%) Loss: 1.937721 LR: 0.00004230 [05:51:59] Epoch: 1 Batch: 13297/38378 (34.65%) Loss: 1.559004 LR: 0.00004230 [05:52:01] Epoch: 1 Batch: 13298/38378 (34.65%) Loss: 1.826569 LR: 0.00004230 [05:52:03] Epoch: 1 Batch: 13299/38378 (34.65%) Loss: 1.883792 LR: 0.00004230 [05:52:09] >> Cleaned up old temp checkpoint: epoch1_step12300 [05:52:09] >> Temp checkpoint saved: epoch1_step13300, size: 0.1702 GB [05:52:09] Epoch: 1 Batch: 13300/38378 (34.66%) Loss: 1.958937 LR: 0.00004230 [05:52:10] Epoch: 1 Batch: 13301/38378 (34.66%) Loss: 1.776631 LR: 0.00004230 [05:52:12] Epoch: 1 Batch: 13302/38378 (34.66%) Loss: 1.887965 LR: 0.00004229 [05:52:14] Epoch: 1 Batch: 13303/38378 (34.66%) Loss: 2.017972 LR: 0.00004229 [05:52:16] Epoch: 1 Batch: 13304/38378 (34.67%) Loss: 2.041599 LR: 0.00004229 [05:52:17] Epoch: 1 Batch: 13305/38378 (34.67%) Loss: 2.037861 LR: 0.00004229 [05:52:19] Epoch: 1 Batch: 13306/38378 (34.67%) Loss: 1.667552 LR: 0.00004229 [05:52:21] Epoch: 1 Batch: 13307/38378 (34.67%) Loss: 1.656669 LR: 0.00004229 [05:52:23] Epoch: 1 Batch: 13308/38378 (34.68%) Loss: 2.475036 LR: 0.00004229 [05:52:25] Epoch: 1 Batch: 13309/38378 (34.68%) Loss: 2.273911 LR: 0.00004228 [05:52:26] Epoch: 1 Batch: 13310/38378 (34.68%) Loss: 2.314971 LR: 0.00004228 [05:52:28] Epoch: 1 Batch: 13311/38378 (34.68%) Loss: 2.071062 LR: 0.00004228 [05:52:30] Epoch: 1 Batch: 13312/38378 (34.69%) Loss: 2.130526 LR: 0.00004228 [05:52:32] Epoch: 1 Batch: 13313/38378 (34.69%) Loss: 2.241836 LR: 0.00004228 [05:52:33] Epoch: 1 Batch: 13314/38378 (34.69%) Loss: 2.115318 LR: 0.00004228 [05:52:35] Epoch: 1 Batch: 13315/38378 (34.69%) Loss: 2.092147 LR: 0.00004228 [05:52:37] Epoch: 1 Batch: 13316/38378 (34.70%) Loss: 2.040467 LR: 0.00004227 [05:52:39] Epoch: 1 Batch: 
13317/38378 (34.70%) Loss: 2.090346 LR: 0.00004227 [05:52:41] Epoch: 1 Batch: 13318/38378 (34.70%) Loss: 2.040852 LR: 0.00004227 [05:52:42] Epoch: 1 Batch: 13319/38378 (34.70%) Loss: 2.151524 LR: 0.00004227 [05:52:44] Epoch: 1 Batch: 13320/38378 (34.71%) Loss: 2.059393 LR: 0.00004227 [05:52:46] Epoch: 1 Batch: 13321/38378 (34.71%) Loss: 1.983850 LR: 0.00004227 [05:52:47] Epoch: 1 Batch: 13322/38378 (34.71%) Loss: 2.299680 LR: 0.00004227 [05:52:49] Epoch: 1 Batch: 13323/38378 (34.72%) Loss: 2.116120 LR: 0.00004226 [05:52:51] Epoch: 1 Batch: 13324/38378 (34.72%) Loss: 2.252522 LR: 0.00004226 [05:52:53] Epoch: 1 Batch: 13325/38378 (34.72%) Loss: 2.144498 LR: 0.00004226 [05:52:55] Epoch: 1 Batch: 13326/38378 (34.72%) Loss: 2.426538 LR: 0.00004226 [05:52:57] Epoch: 1 Batch: 13327/38378 (34.73%) Loss: 1.984011 LR: 0.00004226 [05:52:58] Epoch: 1 Batch: 13328/38378 (34.73%) Loss: 1.803259 LR: 0.00004226 [05:53:00] Epoch: 1 Batch: 13329/38378 (34.73%) Loss: 2.258005 LR: 0.00004226 [05:53:02] Epoch: 1 Batch: 13330/38378 (34.73%) Loss: 2.009589 LR: 0.00004225 [05:53:04] Epoch: 1 Batch: 13331/38378 (34.74%) Loss: 2.331980 LR: 0.00004225 [05:53:06] Epoch: 1 Batch: 13332/38378 (34.74%) Loss: 1.973386 LR: 0.00004225 [05:53:08] Epoch: 1 Batch: 13333/38378 (34.74%) Loss: 2.004312 LR: 0.00004225 [05:53:09] Epoch: 1 Batch: 13334/38378 (34.74%) Loss: 1.941419 LR: 0.00004225 [05:53:11] Epoch: 1 Batch: 13335/38378 (34.75%) Loss: 1.937633 LR: 0.00004225 [05:53:13] Epoch: 1 Batch: 13336/38378 (34.75%) Loss: 2.316354 LR: 0.00004225 [05:53:15] Epoch: 1 Batch: 13337/38378 (34.75%) Loss: 2.157553 LR: 0.00004224 [05:53:17] Epoch: 1 Batch: 13338/38378 (34.75%) Loss: 1.859371 LR: 0.00004224 [05:53:18] Epoch: 1 Batch: 13339/38378 (34.76%) Loss: 2.026022 LR: 0.00004224 [05:53:20] Epoch: 1 Batch: 13340/38378 (34.76%) Loss: 1.883020 LR: 0.00004224 [05:53:22] Epoch: 1 Batch: 13341/38378 (34.76%) Loss: 1.876857 LR: 0.00004224 [05:53:24] Epoch: 1 Batch: 13342/38378 (34.76%) Loss: 1.861072 LR: 0.00004224 [05:53:26] Epoch: 1 Batch: 13343/38378 (34.77%) Loss: 1.968924 LR: 0.00004224 [05:53:27] Epoch: 1 Batch: 13344/38378 (34.77%) Loss: 1.795382 LR: 0.00004223 [05:53:29] Epoch: 1 Batch: 13345/38378 (34.77%) Loss: 2.137556 LR: 0.00004223 [05:53:31] Epoch: 1 Batch: 13346/38378 (34.78%) Loss: 2.126272 LR: 0.00004223 [05:53:33] Epoch: 1 Batch: 13347/38378 (34.78%) Loss: 1.844713 LR: 0.00004223 [05:53:35] Epoch: 1 Batch: 13348/38378 (34.78%) Loss: 2.082508 LR: 0.00004223 [05:53:36] Epoch: 1 Batch: 13349/38378 (34.78%) Loss: 2.067078 LR: 0.00004223 [05:53:38] Epoch: 1 Batch: 13350/38378 (34.79%) Loss: 2.165937 LR: 0.00004223 [05:53:40] Epoch: 1 Batch: 13351/38378 (34.79%) Loss: 2.081246 LR: 0.00004222 [05:53:42] Epoch: 1 Batch: 13352/38378 (34.79%) Loss: 1.906749 LR: 0.00004222 [05:53:44] Epoch: 1 Batch: 13353/38378 (34.79%) Loss: 1.992194 LR: 0.00004222 [05:53:46] Epoch: 1 Batch: 13354/38378 (34.80%) Loss: 1.801311 LR: 0.00004222 [05:53:47] Epoch: 1 Batch: 13355/38378 (34.80%) Loss: 1.848636 LR: 0.00004222 [05:53:49] Epoch: 1 Batch: 13356/38378 (34.80%) Loss: 1.821385 LR: 0.00004222 [05:53:51] Epoch: 1 Batch: 13357/38378 (34.80%) Loss: 2.166560 LR: 0.00004222 [05:53:53] Epoch: 1 Batch: 13358/38378 (34.81%) Loss: 2.255017 LR: 0.00004221 [05:53:55] Epoch: 1 Batch: 13359/38378 (34.81%) Loss: 2.082255 LR: 0.00004221 [05:53:56] Epoch: 1 Batch: 13360/38378 (34.81%) Loss: 2.035023 LR: 0.00004221 [05:53:58] Epoch: 1 Batch: 13361/38378 (34.81%) Loss: 1.823528 LR: 0.00004221 [05:54:00] Epoch: 1 Batch: 13362/38378 (34.82%) Loss: 1.926049 LR: 
0.00004221 [05:54:02] Epoch: 1 Batch: 13363/38378 (34.82%) Loss: 2.131242 LR: 0.00004221 [05:54:04] Epoch: 1 Batch: 13364/38378 (34.82%) Loss: 2.098452 LR: 0.00004221 [05:54:05] Epoch: 1 Batch: 13365/38378 (34.82%) Loss: 1.941026 LR: 0.00004220 [05:54:07] Epoch: 1 Batch: 13366/38378 (34.83%) Loss: 1.880842 LR: 0.00004220 [05:54:09] Epoch: 1 Batch: 13367/38378 (34.83%) Loss: 1.894473 LR: 0.00004220 [05:54:11] Epoch: 1 Batch: 13368/38378 (34.83%) Loss: 1.948073 LR: 0.00004220 [05:54:13] Epoch: 1 Batch: 13369/38378 (34.84%) Loss: 2.073094 LR: 0.00004220 [05:54:14] Epoch: 1 Batch: 13370/38378 (34.84%) Loss: 2.029494 LR: 0.00004220 [05:54:16] Epoch: 1 Batch: 13371/38378 (34.84%) Loss: 2.064772 LR: 0.00004220 [05:54:18] Epoch: 1 Batch: 13372/38378 (34.84%) Loss: 1.934225 LR: 0.00004219 [05:54:20] Epoch: 1 Batch: 13373/38378 (34.85%) Loss: 2.126993 LR: 0.00004219 [05:54:22] Epoch: 1 Batch: 13374/38378 (34.85%) Loss: 2.274814 LR: 0.00004219 [05:54:23] Epoch: 1 Batch: 13375/38378 (34.85%) Loss: 1.965786 LR: 0.00004219 [05:54:25] Epoch: 1 Batch: 13376/38378 (34.85%) Loss: 2.016643 LR: 0.00004219 [05:54:27] Epoch: 1 Batch: 13377/38378 (34.86%) Loss: 2.064282 LR: 0.00004219 [05:54:29] Epoch: 1 Batch: 13378/38378 (34.86%) Loss: 1.920384 LR: 0.00004219 [05:54:31] Epoch: 1 Batch: 13379/38378 (34.86%) Loss: 2.221951 LR: 0.00004218 [05:54:33] Epoch: 1 Batch: 13380/38378 (34.86%) Loss: 1.944376 LR: 0.00004218 [05:54:34] Epoch: 1 Batch: 13381/38378 (34.87%) Loss: 2.203170 LR: 0.00004218 [05:54:36] Epoch: 1 Batch: 13382/38378 (34.87%) Loss: 2.018742 LR: 0.00004218 [05:54:38] Epoch: 1 Batch: 13383/38378 (34.87%) Loss: 1.968689 LR: 0.00004218 [05:54:40] Epoch: 1 Batch: 13384/38378 (34.87%) Loss: 1.768889 LR: 0.00004218 [05:54:42] Epoch: 1 Batch: 13385/38378 (34.88%) Loss: 1.945511 LR: 0.00004218 [05:54:43] Epoch: 1 Batch: 13386/38378 (34.88%) Loss: 2.151773 LR: 0.00004217 [05:54:45] Epoch: 1 Batch: 13387/38378 (34.88%) Loss: 1.858023 LR: 0.00004217 [05:54:47] Epoch: 1 Batch: 13388/38378 (34.88%) Loss: 1.717441 LR: 0.00004217 [05:54:49] Epoch: 1 Batch: 13389/38378 (34.89%) Loss: 2.159208 LR: 0.00004217 [05:54:51] Epoch: 1 Batch: 13390/38378 (34.89%) Loss: 1.774509 LR: 0.00004217 [05:54:52] Epoch: 1 Batch: 13391/38378 (34.89%) Loss: 1.913283 LR: 0.00004217 [05:54:54] Epoch: 1 Batch: 13392/38378 (34.89%) Loss: 1.717629 LR: 0.00004217 [05:54:56] Epoch: 1 Batch: 13393/38378 (34.90%) Loss: 1.769715 LR: 0.00004216 [05:54:58] Epoch: 1 Batch: 13394/38378 (34.90%) Loss: 2.207746 LR: 0.00004216 [05:55:00] Epoch: 1 Batch: 13395/38378 (34.90%) Loss: 1.949498 LR: 0.00004216 [05:55:02] Epoch: 1 Batch: 13396/38378 (34.91%) Loss: 1.844916 LR: 0.00004216 [05:55:03] Epoch: 1 Batch: 13397/38378 (34.91%) Loss: 1.621118 LR: 0.00004216 [05:55:05] Epoch: 1 Batch: 13398/38378 (34.91%) Loss: 1.798484 LR: 0.00004216 [05:55:07] Epoch: 1 Batch: 13399/38378 (34.91%) Loss: 2.249074 LR: 0.00004216 [05:55:13] >> Cleaned up old temp checkpoint: epoch1_step12400 [05:55:13] >> Temp checkpoint saved: epoch1_step13400, size: 0.1702 GB [05:55:13] Epoch: 1 Batch: 13400/38378 (34.92%) Loss: 2.321751 LR: 0.00004215 [05:55:15] Epoch: 1 Batch: 13401/38378 (34.92%) Loss: 1.824436 LR: 0.00004215 [05:55:17] Epoch: 1 Batch: 13402/38378 (34.92%) Loss: 1.976130 LR: 0.00004215 [05:55:19] Epoch: 1 Batch: 13403/38378 (34.92%) Loss: 2.107480 LR: 0.00004215 [05:55:20] Epoch: 1 Batch: 13404/38378 (34.93%) Loss: 1.973302 LR: 0.00004215 [05:55:22] Epoch: 1 Batch: 13405/38378 (34.93%) Loss: 2.180979 LR: 0.00004215 [05:55:24] Epoch: 1 Batch: 13406/38378 (34.93%) Loss: 
2.062109 LR: 0.00004215 [05:55:26] Epoch: 1 Batch: 13407/38378 (34.93%) Loss: 1.934734 LR: 0.00004214 [05:55:28] Epoch: 1 Batch: 13408/38378 (34.94%) Loss: 2.226950 LR: 0.00004214 [05:55:30] Epoch: 1 Batch: 13409/38378 (34.94%) Loss: 1.942640 LR: 0.00004214 [05:55:31] Epoch: 1 Batch: 13410/38378 (34.94%) Loss: 1.940580 LR: 0.00004214 [05:55:33] Epoch: 1 Batch: 13411/38378 (34.94%) Loss: 1.744333 LR: 0.00004214 [05:55:35] Epoch: 1 Batch: 13412/38378 (34.95%) Loss: 2.140712 LR: 0.00004214 [05:55:37] Epoch: 1 Batch: 13413/38378 (34.95%) Loss: 1.893976 LR: 0.00004214 [05:55:39] Epoch: 1 Batch: 13414/38378 (34.95%) Loss: 1.833066 LR: 0.00004213 [05:55:40] Epoch: 1 Batch: 13415/38378 (34.95%) Loss: 1.919125 LR: 0.00004213 [05:55:42] Epoch: 1 Batch: 13416/38378 (34.96%) Loss: 1.911059 LR: 0.00004213 [05:55:44] Epoch: 1 Batch: 13417/38378 (34.96%) Loss: 2.095885 LR: 0.00004213 [05:55:46] Epoch: 1 Batch: 13418/38378 (34.96%) Loss: 2.101081 LR: 0.00004213 [05:55:48] Epoch: 1 Batch: 13419/38378 (34.97%) Loss: 2.036294 LR: 0.00004213 [05:55:50] Epoch: 1 Batch: 13420/38378 (34.97%) Loss: 1.882244 LR: 0.00004213 [05:55:51] Epoch: 1 Batch: 13421/38378 (34.97%) Loss: 2.075955 LR: 0.00004212 [05:55:53] Epoch: 1 Batch: 13422/38378 (34.97%) Loss: 1.874871 LR: 0.00004212 [05:55:55] Epoch: 1 Batch: 13423/38378 (34.98%) Loss: 1.853147 LR: 0.00004212 [05:55:57] Epoch: 1 Batch: 13424/38378 (34.98%) Loss: 2.216888 LR: 0.00004212 [05:55:59] Epoch: 1 Batch: 13425/38378 (34.98%) Loss: 1.632614 LR: 0.00004212 [05:56:00] Epoch: 1 Batch: 13426/38378 (34.98%) Loss: 1.792318 LR: 0.00004212 [05:56:02] Epoch: 1 Batch: 13427/38378 (34.99%) Loss: 2.218405 LR: 0.00004212 [05:56:04] Epoch: 1 Batch: 13428/38378 (34.99%) Loss: 1.665950 LR: 0.00004211 [05:56:06] Epoch: 1 Batch: 13429/38378 (34.99%) Loss: 2.136984 LR: 0.00004211 [05:56:08] Epoch: 1 Batch: 13430/38378 (34.99%) Loss: 1.935186 LR: 0.00004211 [05:56:10] Epoch: 1 Batch: 13431/38378 (35.00%) Loss: 1.998514 LR: 0.00004211 [05:56:11] Epoch: 1 Batch: 13432/38378 (35.00%) Loss: 2.003148 LR: 0.00004211 [05:56:13] Epoch: 1 Batch: 13433/38378 (35.00%) Loss: 2.268252 LR: 0.00004211 [05:56:15] Epoch: 1 Batch: 13434/38378 (35.00%) Loss: 1.714669 LR: 0.00004211 [05:56:17] Epoch: 1 Batch: 13435/38378 (35.01%) Loss: 2.057007 LR: 0.00004210 [05:56:19] Epoch: 1 Batch: 13436/38378 (35.01%) Loss: 1.816217 LR: 0.00004210 [05:56:20] Epoch: 1 Batch: 13437/38378 (35.01%) Loss: 2.061946 LR: 0.00004210 [05:56:22] Epoch: 1 Batch: 13438/38378 (35.01%) Loss: 2.018429 LR: 0.00004210 [05:56:24] Epoch: 1 Batch: 13439/38378 (35.02%) Loss: 1.799547 LR: 0.00004210 [05:56:26] Epoch: 1 Batch: 13440/38378 (35.02%) Loss: 2.271762 LR: 0.00004210 [05:56:28] Epoch: 1 Batch: 13441/38378 (35.02%) Loss: 2.017634 LR: 0.00004210 [05:56:29] Epoch: 1 Batch: 13442/38378 (35.03%) Loss: 2.142142 LR: 0.00004209 [05:56:31] Epoch: 1 Batch: 13443/38378 (35.03%) Loss: 1.814926 LR: 0.00004209 [05:56:33] Epoch: 1 Batch: 13444/38378 (35.03%) Loss: 2.190048 LR: 0.00004209 [05:56:35] Epoch: 1 Batch: 13445/38378 (35.03%) Loss: 1.963174 LR: 0.00004209 [05:56:37] Epoch: 1 Batch: 13446/38378 (35.04%) Loss: 2.024485 LR: 0.00004209 [05:56:38] Epoch: 1 Batch: 13447/38378 (35.04%) Loss: 2.220762 LR: 0.00004209 [05:56:40] Epoch: 1 Batch: 13448/38378 (35.04%) Loss: 1.803303 LR: 0.00004209 [05:56:42] Epoch: 1 Batch: 13449/38378 (35.04%) Loss: 2.137239 LR: 0.00004208 [05:56:43] Epoch: 1 Batch: 13450/38378 (35.05%) Loss: 2.042994 LR: 0.00004208 [05:56:45] Epoch: 1 Batch: 13451/38378 (35.05%) Loss: 1.630613 LR: 0.00004208 [05:56:47] Epoch: 1 
Batch: 13452/38378 (35.05%) Loss: 1.944621 LR: 0.00004208 [05:56:49] Epoch: 1 Batch: 13453/38378 (35.05%) Loss: 1.893719 LR: 0.00004208 [05:56:51] Epoch: 1 Batch: 13454/38378 (35.06%) Loss: 2.145149 LR: 0.00004208 [05:56:52] Epoch: 1 Batch: 13455/38378 (35.06%) Loss: 1.962401 LR: 0.00004208 [05:56:54] Epoch: 1 Batch: 13456/38378 (35.06%) Loss: 2.425120 LR: 0.00004207 [05:56:56] Epoch: 1 Batch: 13457/38378 (35.06%) Loss: 2.216562 LR: 0.00004207 [05:56:58] Epoch: 1 Batch: 13458/38378 (35.07%) Loss: 2.129645 LR: 0.00004207 [05:57:00] Epoch: 1 Batch: 13459/38378 (35.07%) Loss: 1.827770 LR: 0.00004207 [05:57:01] Epoch: 1 Batch: 13460/38378 (35.07%) Loss: 1.866120 LR: 0.00004207 [05:57:03] Epoch: 1 Batch: 13461/38378 (35.07%) Loss: 2.028347 LR: 0.00004207 [05:57:05] Epoch: 1 Batch: 13462/38378 (35.08%) Loss: 2.099698 LR: 0.00004207 [05:57:07] Epoch: 1 Batch: 13463/38378 (35.08%) Loss: 1.672220 LR: 0.00004206 [05:57:09] Epoch: 1 Batch: 13464/38378 (35.08%) Loss: 2.187922 LR: 0.00004206 [05:57:10] Epoch: 1 Batch: 13465/38378 (35.09%) Loss: 2.129605 LR: 0.00004206 [05:57:12] Epoch: 1 Batch: 13466/38378 (35.09%) Loss: 2.073114 LR: 0.00004206 [05:57:14] Epoch: 1 Batch: 13467/38378 (35.09%) Loss: 1.814349 LR: 0.00004206 [05:57:16] Epoch: 1 Batch: 13468/38378 (35.09%) Loss: 1.957675 LR: 0.00004206 [05:57:17] Epoch: 1 Batch: 13469/38378 (35.10%) Loss: 1.958274 LR: 0.00004206 [05:57:19] Epoch: 1 Batch: 13470/38378 (35.10%) Loss: 2.025383 LR: 0.00004206 [05:57:21] Epoch: 1 Batch: 13471/38378 (35.10%) Loss: 1.703118 LR: 0.00004206 [05:57:23] Epoch: 1 Batch: 13472/38378 (35.10%) Loss: 1.997204 LR: 0.00004206 [05:57:25] Epoch: 1 Batch: 13473/38378 (35.11%) Loss: 2.300962 LR: 0.00004206 [05:57:26] Epoch: 1 Batch: 13474/38378 (35.11%) Loss: 1.929263 LR: 0.00004206 [05:57:28] Epoch: 1 Batch: 13475/38378 (35.11%) Loss: 2.219242 LR: 0.00004206 [05:57:30] Epoch: 1 Batch: 13476/38378 (35.11%) Loss: 1.970456 LR: 0.00004206 [05:57:32] Epoch: 1 Batch: 13477/38378 (35.12%) Loss: 2.100895 LR: 0.00004205 [05:57:34] Epoch: 1 Batch: 13478/38378 (35.12%) Loss: 1.959790 LR: 0.00004205 [05:57:35] Epoch: 1 Batch: 13479/38378 (35.12%) Loss: 2.267162 LR: 0.00004205 [05:57:37] Epoch: 1 Batch: 13480/38378 (35.12%) Loss: 2.234771 LR: 0.00004205 [05:57:39] Epoch: 1 Batch: 13481/38378 (35.13%) Loss: 1.927246 LR: 0.00004205 [05:57:41] Epoch: 1 Batch: 13482/38378 (35.13%) Loss: 2.082996 LR: 0.00004205 [05:57:42] Epoch: 1 Batch: 13483/38378 (35.13%) Loss: 2.160925 LR: 0.00004205 [05:57:44] Epoch: 1 Batch: 13484/38378 (35.13%) Loss: 1.826861 LR: 0.00004204 [05:57:46] Epoch: 1 Batch: 13485/38378 (35.14%) Loss: 2.073616 LR: 0.00004204 [05:57:48] Epoch: 1 Batch: 13486/38378 (35.14%) Loss: 2.368035 LR: 0.00004204 [05:57:49] Epoch: 1 Batch: 13487/38378 (35.14%) Loss: 1.795648 LR: 0.00004204 [05:57:51] Epoch: 1 Batch: 13488/38378 (35.15%) Loss: 2.248913 LR: 0.00004204 [05:57:53] Epoch: 1 Batch: 13489/38378 (35.15%) Loss: 2.053063 LR: 0.00004204 [05:57:55] Epoch: 1 Batch: 13490/38378 (35.15%) Loss: 2.192280 LR: 0.00004204 [05:57:57] Epoch: 1 Batch: 13491/38378 (35.15%) Loss: 2.182549 LR: 0.00004203 [05:57:59] Epoch: 1 Batch: 13492/38378 (35.16%) Loss: 1.887847 LR: 0.00004203 [05:58:00] Epoch: 1 Batch: 13493/38378 (35.16%) Loss: 1.989728 LR: 0.00004203 [05:58:02] Epoch: 1 Batch: 13494/38378 (35.16%) Loss: 2.030199 LR: 0.00004203 [05:58:04] Epoch: 1 Batch: 13495/38378 (35.16%) Loss: 2.117436 LR: 0.00004203 [05:58:06] Epoch: 1 Batch: 13496/38378 (35.17%) Loss: 1.748180 LR: 0.00004203 [05:58:08] Epoch: 1 Batch: 13497/38378 (35.17%) Loss: 1.986119 
LR: 0.00004203 [05:58:10] Epoch: 1 Batch: 13498/38378 (35.17%) Loss: 2.104659 LR: 0.00004202 [05:58:11] Epoch: 1 Batch: 13499/38378 (35.17%) Loss: 2.130208 LR: 0.00004202 [05:58:13] >> Evaluating batch 0 [05:58:14] >> Evaluating batch 1 [05:58:15] >> Evaluating batch 2 [05:58:16] >> Evaluating batch 3 [05:58:17] >> Evaluating batch 4 [05:58:18] >> Evaluating batch 5 [05:58:19] >> Evaluating batch 6 [05:58:20] >> Evaluating batch 7 [05:58:21] >> Evaluating batch 8 [05:58:22] >> Evaluating batch 9 [05:58:23] >> Evaluating batch 10 [05:58:24] >> Evaluating batch 11 [05:58:25] >> Evaluating batch 12 [05:58:26] >> Evaluating batch 13 [05:58:27] >> Evaluating batch 14 [05:58:28] >> Evaluating batch 15 [05:58:29] >> Evaluating batch 16 [05:58:30] Epoch: 1 Step: 13500/38378 Evaluation: [05:58:30] Avg Loss Since Last Eval: 2.0042 Val Loss: 2.1000 Validation loss delta: -0.0063 Perplexity: 8.1658 LR: 0.00004202 [05:58:34] >> Cleaned up old temp checkpoint: epoch1_step12500 [05:58:34] >> Temp checkpoint saved: epoch1_step13500, size: 0.1702 GB [05:58:38] >> Checkpoint saved: epoch1_step13500, size: 0.1702 GB [05:58:38] Epoch: 1 Batch: 13500/38378 (35.18%) Loss: 2.031862 LR: 0.00004202 [05:58:40] Epoch: 1 Batch: 13501/38378 (35.18%) Loss: 2.040909 LR: 0.00004202 [05:58:42] Epoch: 1 Batch: 13502/38378 (35.18%) Loss: 1.860996 LR: 0.00004202 [05:58:43] Epoch: 1 Batch: 13503/38378 (35.18%) Loss: 1.974870 LR: 0.00004202 [05:58:45] Epoch: 1 Batch: 13504/38378 (35.19%) Loss: 2.046797 LR: 0.00004202 [05:58:47] Epoch: 1 Batch: 13505/38378 (35.19%) Loss: 2.052102 LR: 0.00004201 [05:58:49] Epoch: 1 Batch: 13506/38378 (35.19%) Loss: 1.898687 LR: 0.00004201 [05:58:51] Epoch: 1 Batch: 13507/38378 (35.19%) Loss: 2.003723 LR: 0.00004201 [05:58:53] Epoch: 1 Batch: 13508/38378 (35.20%) Loss: 1.954631 LR: 0.00004201 [05:58:54] Epoch: 1 Batch: 13509/38378 (35.20%) Loss: 2.107064 LR: 0.00004201 [05:58:56] Epoch: 1 Batch: 13510/38378 (35.20%) Loss: 2.193435 LR: 0.00004201 [05:58:58] Epoch: 1 Batch: 13511/38378 (35.21%) Loss: 2.029717 LR: 0.00004201 [08:18:33] 2025-08-12 [08:18:34] Tesla T4 [08:18:34] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0
| 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [08:18:34] CPU usage: 94.5%, RAM usage: 26.2% [08:18:34] Running with the following configuration: [08:18:34] model_name: NousResearch/Hermes-3-Llama-3.1-8B [08:18:34] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B [08:18:34] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview [08:18:34] train_path: /content/drive/MyDrive/data/none.csv [08:18:34] checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500 [08:18:34] lr: 5e-05 [08:18:34] lr_floor: 1e-05 [08:18:34] epochs: 1 [08:18:34] batch_size: 5 [08:18:34] accum_steps: 7 [08:18:34] val_batch_size: 6 [08:18:34] max_val_size: 100 [08:18:34] max_length: 150 [08:18:34] save_temp_frequency: 100 [08:18:34] save_frequency: 500 [08:18:34] eval_frequency: 500 [08:18:34] save_pattern: y [08:18:34] quantization: y [08:18:34] quantization_bits: 4 [08:18:34] lora: y [08:18:34] frozen_lora_path: None [08:18:34] lora_rank: 16 [08:18:34] lora_alpha: 32 [08:18:34] lora_dropout: 0.08 [08:18:34] optimizer_weight_decay: 0.0 [08:18:34] warmup_type: cosine [08:18:34] warmup_ratio: 0.08 [08:18:34] warmup_steps: 439 [08:18:34] shuffle: y [08:18:34] csv_column: text [08:18:34] new_run: n [08:18:34] label_smoothing: 0.05 [08:18:34] SEED: 1 [08:18:34] Using device: cuda [08:18:35] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500 [08:20:03] Embeddings shape after: torch.Size([128256, 4096]) [08:20:09] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500 [08:20:09] Trainable LoRA 'default': [08:20:09] task_type: CAUSAL_LM [08:20:09] peft_type: PeftType.LORA [08:20:09] auto_mapping: None [08:20:09] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B [08:20:09] revision: None [08:20:09] inference_mode: False [08:20:09] r: 16 [08:20:09] target_modules: {'v_proj', 'o_proj', 'q_proj', 'k_proj'} [08:20:09] exclude_modules: None [08:20:09] lora_alpha: 32 [08:20:09] lora_dropout: 0.08 [08:20:09] fan_in_fan_out: False [08:20:09] bias: none [08:20:09] use_rslora: True [08:20:09] modules_to_save: None [08:20:10] init_lora_weights: True [08:20:10] layers_to_transform: None [08:20:10] layers_pattern: None [08:20:10] rank_pattern: {} [08:20:10] alpha_pattern: {} [08:20:10] megatron_config: None [08:20:10] megatron_core: megatron.core [08:20:10] trainable_token_indices: None [08:20:10] loftq_config: {} [08:20:10] eva_config: None [08:20:10] corda_config: None [08:20:10] use_dora: False [08:20:10] use_qalora: False [08:20:10] qalora_group_size: 16 [08:20:10] layer_replication: None [08:20:10] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [08:20:10] lora_bias: False [08:20:10] target_parameters: None [08:20:10] _custom_modules: None [08:20:10] Embeddings shape after: torch.Size([128256, 4096]) [08:20:10] Warning: No training state found at /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500/training_state.pt [08:20:10] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] 
TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.1.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.2.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.o_proj.lora_A.default.weight - 
shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.3.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.4.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.5.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.k_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight - shape: torch.Size([1024, 16]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.o_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.6.self_attn.o_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight - shape: torch.Size([4096, 16]) [08:20:10] TRAINING: base_model.model.model.layers.7.self_attn.k_proj.lora_A.default.weight - shape: torch.Size([16, 4096]) [08:20:10] TRAINING: 
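The per-parameter listing above, and the totals just below, are what the common PEFT idiom of filtering named_parameters() by requires_grad prints. A minimal sketch of that idiom, assuming 'model' is the PEFT-wrapped model and 'log' stands in for the script's timestamped logger (the actual training script is not shown in this log):

    def report_trainable(model, log=print):
        total, trainable = 0, 0
        for name, param in model.named_parameters():
            total += param.numel()
            if param.requires_grad:
                trainable += param.numel()
                log(f"TRAINING: {name} - shape: {param.shape}")
        log(f"Total Parameters: {total:,}")
        log(f"Trainable Parameters: {trainable:,}")
        # 13,631,488 / 4,554,231,808 * 100 ~= 0.2993, matching the log below
        log(f"Trainable %: {100 * trainable / total:.4f}%")

The trainable count is exactly 32 layers x (four 16x4096 lora_A matrices, plus 4096x16 lora_B for q_proj/o_proj and 1024x16 lora_B for k_proj/v_proj) = 13,631,488, and the 4,554,231,808 total is consistent with 4-bit packed base weights reporting roughly half of the nominal 8B parameter count to numel().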
[08:20:11] Total Parameters: 4,554,231,808
[08:20:11] Trainable Parameters: 13,631,488
[08:20:11] Trainable %: 0.2993%
[08:20:11] [... all 256 trainable LoRA tensors (layers 0-31, q/k/v/o_proj, lora_A and lora_B) report dtype=torch.float32, device=cuda:0; the individual entries run from 08:20:11 to 08:20:12 ...]
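Every one of the 256 adapter tensors reports torch.float32 on cuda:0 even though the base model loads in 4-bit, which matches the standard k-bit LoRA setup. A sketch consistent with the configuration values logged above (the NF4 quant type and fp16 compute dtype are assumptions; the log only records quantization_bits: 4):

    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

    # 4-bit base weights; quant type and compute dtype are assumed here
    bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                             bnb_4bit_compute_dtype=torch.float16)
    base = AutoModelForCausalLM.from_pretrained(
        "NousResearch/Hermes-3-Llama-3.1-8B",
        quantization_config=bnb, device_map={"": 0})
    base = prepare_model_for_kbit_training(base)  # keeps trainables in fp32
    lora = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32,
                      lora_dropout=0.08, use_rslora=True,
                      target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])
    model = get_peft_model(base, lora)  # adapters: torch.float32 on cuda:0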
[08:20:12] Warning: Training state not found, starting from epoch 1
[08:20:12] Starting from CSV file...
[08:20:16] Splitting data into chunks of 11000...
[08:20:16] Using 7 processes across 18 chunks
[08:22:03] 2025-08-12
[08:22:03] Tesla T4
[08:22:03] PyTorch CUDA memory summary, device ID 0: CUDA OOMs: 0 | cudaMalloc retries: 0; every usage, allocation, and segment metric reads 0 [... full table elided; same form as the summary at the top of the log ...]
[08:22:03] CPU usage: 97.9%, RAM usage: 23.4%
[08:22:03] Running with the following configuration: [... identical to the configuration at the top of the log except checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500 and save_temp_frequency: 100 ...]
[08:22:03] Using device: cuda
[08:22:03] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500
[08:23:35] Embeddings shape after: torch.Size([128256, 4096])
[08:23:38] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500
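Resuming evidently reloads the adapter directory as a trainable PEFT model. A minimal sketch, assuming 'base' is the quantized base model prepared as in the earlier sketch (is_trainable=True is what keeps inference_mode: False in the dump below):

    from peft import PeftModel

    CKPT = "/content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500"
    # base: the 4-bit quantized model, prepared as in the earlier sketch
    model = PeftModel.from_pretrained(base, CKPT, is_trainable=True)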
[08:23:38] Trainable LoRA 'default': [... field dump matches the LoRA configuration at the top of the log: r: 16, lora_alpha: 32, lora_dropout: 0.08, target_modules: {'k_proj', 'q_proj', 'v_proj', 'o_proj'}, use_rslora: True, bias: none, inference_mode: False; remaining fields elided ...]
[08:23:39] Embeddings shape after: torch.Size([128256, 4096])
[08:23:39] Warning: No training state found at /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13500/training_state.pt
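Both runs warn that training_state.pt is missing, so the adapter weights resume but optimizer/scheduler progress restarts from epoch 1. A sketch of the save/load pattern the warnings point to (only the file name comes from the log; the dictionary layout is assumed):

    import os
    import torch

    def save_training_state(path, optimizer, scheduler, epoch, step):
        torch.save({"optimizer": optimizer.state_dict(),
                    "scheduler": scheduler.state_dict(),
                    "epoch": epoch, "step": step},
                   os.path.join(path, "training_state.pt"))

    def load_training_state(path, optimizer, scheduler):
        state_file = os.path.join(path, "training_state.pt")
        if not os.path.exists(state_file):
            print(f"Warning: No training state found at {state_file}")
            return 1, 0  # fall back to epoch 1, step 0
        state = torch.load(state_file)
        optimizer.load_state_dict(state["optimizer"])
        scheduler.load_state_dict(state["scheduler"])
        return state["epoch"], state["step"]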
[08:23:40] Total Parameters: 4,554,231,808
[08:23:40] Trainable Parameters: 13,631,488
[08:23:40] Trainable %: 0.2993%
[08:23:40] base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[08:23:40] base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[08:23:40] base_model.model.model.layers.0.self_attn.k_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[08:23:40] base_model.model.model.layers.0.self_attn.k_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[08:23:40] base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[08:23:40] base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[08:23:40] base_model.model.model.layers.0.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0
[08:23:40] base_model.model.model.layers.0.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0
[... identical dtype/device entries repeat for every LoRA weight in layers 1 through 31: all report dtype=torch.float32, device=cuda:0 ...]
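The parameter totals are easy to verify from the shapes above. Each of the 32 layers contributes 2 × (16·4096 + 4096·16) parameters for q_proj/o_proj plus 2 × (16·4096 + 1024·16) for k_proj/v_proj, i.e. 425,984 per layer, and 32 × 425,984 = 13,631,488 trainable parameters overall. The ~4.55 B total (rather than ~8 B) is most likely because bitsandbytes 4-bit linear layers report packed weights at half their logical element count. A minimal counting helper, assuming the model object from the earlier sketch (the function name is hypothetical):

def print_parameter_summary(model):
    # Reproduces the Total/Trainable lines in the log above.
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    print(f"Total Parameters: {total:,}")
    print(f"Trainable Parameters: {trainable:,}")
    print(f"Trainable %: {100.0 * trainable / total:.4f}%")

# Sanity check against the log:
# per layer = 2*(16*4096 + 4096*16) + 2*(16*4096 + 1024*16) = 425_984
# 32 layers -> 425_984 * 32 = 13_631_488 trainable parameters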
base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [08:23:42] base_model.model.model.layers.31.self_attn.o_proj.lora_A.default.weight: dtype=torch.float32, device=cuda:0 [08:23:42] base_model.model.model.layers.31.self_attn.o_proj.lora_B.default.weight: dtype=torch.float32, device=cuda:0 [08:23:42] Warning: Training state not found, starting from epoch 1 [08:23:42] Starting from CSV file... [08:23:42] Splitting data into chunks of 11000... [08:23:42] Using 7 processes across 18 chunks [08:24:48] 2025-08-12 [08:24:48] Tesla T4 [08:24:48] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Active memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Requested memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | GPU reserved memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 0 B | 0 B | 0 B | 0 B | |---------------------------------------------------------------------------| | Allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Active allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | GPU reserved segments | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | |---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [08:24:48] CPU usage: 97.5%, RAM usage: 24.9% [08:24:48] Running with the following configuration: [08:24:48] model_name: NousResearch/Hermes-3-Llama-3.1-8B [08:24:48] tokenizer: NousResearch/Hermes-3-Llama-3.1-8B [08:24:48] output_dir: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview [08:24:48] train_path: /content/drive/MyDrive/data/none.csv [08:24:48] checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13400 [08:24:48] lr: 5e-05 [08:24:48] lr_floor: 1e-05 [08:24:48] epochs: 1 [08:24:48] batch_size: 5 [08:24:48] accum_steps: 7 [08:24:48] val_batch_size: 6 [08:24:48] max_val_size: 100 [08:24:48] max_length: 150 [08:24:48] save_temp_frequency: 100 [08:24:48] save_frequency: 500 [08:24:48] eval_frequency: 500 [08:24:48] save_pattern: y [08:24:48] quantization: y [08:24:48] quantization_bits: 4 [08:24:48] lora: y [08:24:48] frozen_lora_path: None [08:24:48] lora_rank: 16 [08:24:48] lora_alpha: 32 [08:24:48] lora_dropout: 0.08 [08:24:48] optimizer_weight_decay: 0.0 [08:24:48] warmup_type: cosine [08:24:48] warmup_ratio: 0.08 [08:24:48] warmup_steps: 439 [08:24:48] shuffle: y [08:24:48] csv_column: text [08:24:48] new_run: n 
[08:24:48] label_smoothing: 0.05 [08:24:48] SEED: 1 [08:24:48] Using device: cuda [08:24:48] Resuming from temp checkpoint: /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13400 [08:26:22] Embeddings shape after: torch.Size([128256, 4096]) [08:26:28] Loaded trainable LoRA adapter from /content/drive/MyDrive/llm/Discord-Micae-8B-Preview/temp/epoch1_step13400 [08:26:28] Trainable LoRA 'default': [08:26:28] task_type: CAUSAL_LM [08:26:28] peft_type: PeftType.LORA [08:26:28] auto_mapping: None [08:26:28] base_model_name_or_path: NousResearch/Hermes-3-Llama-3.1-8B [08:26:28] revision: None [08:26:28] inference_mode: False [08:26:28] r: 16 [08:26:28] target_modules: {'q_proj', 'k_proj', 'o_proj', 'v_proj'} [08:26:28] exclude_modules: None [08:26:28] lora_alpha: 32 [08:26:28] lora_dropout: 0.08 [08:26:29] fan_in_fan_out: False [08:26:29] bias: none [08:26:29] use_rslora: True [08:26:29] modules_to_save: None [08:26:29] init_lora_weights: True [08:26:29] layers_to_transform: None [08:26:29] layers_pattern: None [08:26:29] rank_pattern: {} [08:26:29] alpha_pattern: {} [08:26:29] megatron_config: None [08:26:29] megatron_core: megatron.core [08:26:29] trainable_token_indices: None [08:26:29] loftq_config: {} [08:26:29] eva_config: None [08:26:29] corda_config: None [08:26:29] use_dora: False [08:26:29] use_qalora: False [08:26:29] qalora_group_size: 16 [08:26:29] layer_replication: None [08:26:29] runtime_config: LoraRuntimeConfig(ephemeral_gpu_offload=False) [08:26:29] lora_bias: False [08:26:29] target_parameters: None [08:26:29] _custom_modules: None [08:26:29] Embeddings shape after: torch.Size([128256, 4096]) [08:26:43] Resumed from epoch 1, step 13401, file 1 [08:26:43] Starting from CSV file... [08:26:45] Splitting data into chunks of 11000... [08:26:45] Using 7 processes across 18 chunks [08:26:46] Using saved train/val split from checkpoint. [08:26:46] Resuming scheduler with warmup steps: 438, total steps: 5482 [08:26:46] Initializing scheduler with cosine schedule with warmup, warmup steps 439, total steps: 5482 [08:26:46] Train/Val split: 191887 train, 100 val samples. 
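A note on the scheduler lines above: the numbers are internally consistent. 38378 batches at accum_steps 7 gives 38378 // 7 = 5482 optimizer steps, and warmup_ratio 0.08 of 5482 rounds to 439 (the "438" in the resume message versus "439" at init looks like an off-by-one between the two code paths, not a different schedule). The sketch below is a minimal reconstruction assuming a standard cosine-with-warmup shape that decays from lr down to lr_floor; the training script itself is not shown in this log, so the exact formula is an assumption, but it lands within rounding of the learning rates printed after the resume.

import math

# Values echoed in the configuration dump above.
lr_max, lr_floor = 5e-05, 1e-05
total_batches, accum_steps = 38378, 7
total_steps = total_batches // accum_steps      # 5482 optimizer steps per epoch
warmup_steps = round(0.08 * total_steps)        # 439, matching warmup_ratio 0.08

def cosine_lr(step: int) -> float:
    """Assumed shape: linear warmup, then cosine decay from lr_max to lr_floor."""
    if step < warmup_steps:
        return lr_max * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return lr_floor + 0.5 * (lr_max - lr_floor) * (1.0 + math.cos(math.pi * progress))

# Batch 13407 triggers the first optimizer step after the resume (13407 // 7 = 1915).
print(f"{cosine_lr(13407 // accum_steps):.8f}")  # ~0.00004212 vs. 0.00004214 in the log

This also explains the LR of 0.0 printed for batches 13401 through 13406 below: with accum_steps 7, no optimizer step fires until batch 13407, so the freshly resumed param groups report a zero learning rate until the scheduler takes its first step.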
[08:26:56] Model: PeftModelForCausalLM [08:26:56] Model config: LlamaConfig { "architectures": [ "LlamaForCausalLM" ], "attention_bias": false, "attention_dropout": 0.0, "bos_token_id": 128000, "eos_token_id": 128040, "head_dim": 128, "hidden_act": "silu", "hidden_size": 4096, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 131072, "mlp_bias": false, "model_type": "llama", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "pretraining_tp": 1, "quantization_config": { "_load_in_4bit": true, "_load_in_8bit": false, "bnb_4bit_compute_dtype": "float16", "bnb_4bit_quant_storage": "uint8", "bnb_4bit_quant_type": "nf4", "bnb_4bit_use_double_quant": true, "llm_int8_enable_fp32_cpu_offload": false, "llm_int8_has_fp16_weight": false, "llm_int8_skip_modules": [ "lm_head" ], "llm_int8_threshold": 6.0, "load_in_4bit": true, "load_in_8bit": false, "quant_method": "bitsandbytes" }, "rms_norm_eps": 1e-05, "rope_scaling": { "factor": 8.0, "high_freq_factor": 4.0, "low_freq_factor": 1.0, "original_max_position_embeddings": 8192, "rope_type": "llama3" }, "rope_theta": 500000.0, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.55.0", "use_cache": true, "vocab_size": 128256 } [08:26:56] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [08:26:56] Optimizer: PagedAdamW ( Parameter Group 0 alpha: 0.0 betas: (0.9, 0.95) eps: 1e-08 initial_lr: 5e-05 lr: 0.0 t_alpha: None t_beta3: None weight_decay: 0.0 ) [08:26:56] Optimizer params: lr=5e-05, weight_decay=0.0, accum_steps=7 [08:26:56] Scheduler: [08:26:56] Training on 191887 training samples, 100 validation samples [08:26:56] Average tokens per sample: 141.99 [08:26:56] Estimated epoch time: ~619.38 min [08:26:56] |===========================================================================| | PyTorch CUDA memory summary, device ID 0 | |---------------------------------------------------------------------------| | CUDA OOMs: 0 | cudaMalloc retries: 0 | |===========================================================================| | Metric | Cur Usage | Peak Usage | Tot Alloc | Tot Freed | |---------------------------------------------------------------------------| | Allocated memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Active memory | 5988 MiB | 7009 MiB | 332915 MiB | 326926 MiB | |---------------------------------------------------------------------------| | Requested memory | 5983 MiB | 7000 MiB | 332173 MiB | 326190 MiB | |---------------------------------------------------------------------------| | GPU reserved memory | 7616 MiB | 7616 MiB | 7616 MiB | 0 B | |---------------------------------------------------------------------------| | Non-releasable memory | 1259 MiB | 5879 MiB | 333261 MiB | 332002 MiB | |---------------------------------------------------------------------------| | Allocations | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | Active allocs | 2762 | 2840 | 33883 | 31121 | |---------------------------------------------------------------------------| | GPU reserved segments | 186 | 186 | 186 | 0 | |---------------------------------------------------------------------------| | Non-releasable allocs | 33 | 37 | 12954 | 12921 | |---------------------------------------------------------------------------| | Oversize allocations | 0 | 0 | 0 | 0 | 
|---------------------------------------------------------------------------| | Oversize GPU segments | 0 | 0 | 0 | 0 | |===========================================================================| [08:26:56] Restoring shuffle indices from training state for epoch 1 [08:26:56] CPU usage: 55.7%, RAM usage: 40.5% [08:26:56] Epoch 1 learning rate: 0.0 [08:26:56] Starting epoch 1 [08:27:43] Batch 13401: input_ids shape torch.Size([5, 150]), attention_mask shape torch.Size([5, 150]) [08:27:44] Epoch: 1 Batch: 13401/38378 (34.92%) Loss: 1.823305 LR: 0.00000000 [08:27:46] Epoch: 1 Batch: 13402/38378 (34.92%) Loss: 1.973170 LR: 0.00000000 [08:27:48] Epoch: 1 Batch: 13403/38378 (34.92%) Loss: 2.111477 LR: 0.00000000 [08:27:49] Epoch: 1 Batch: 13404/38378 (34.93%) Loss: 1.971268 LR: 0.00000000 [08:27:51] Epoch: 1 Batch: 13405/38378 (34.93%) Loss: 2.181599 LR: 0.00000000 [08:27:52] Epoch: 1 Batch: 13406/38378 (34.93%) Loss: 2.064401 LR: 0.00000000 [08:27:54] Epoch: 1 Batch: 13407/38378 (34.93%) Loss: 1.932939 LR: 0.00004214 [08:27:56] Epoch: 1 Batch: 13408/38378 (34.94%) Loss: 2.228264 LR: 0.00004214 [08:27:57] Epoch: 1 Batch: 13409/38378 (34.94%) Loss: 1.942464 LR: 0.00004214 [08:27:59] Epoch: 1 Batch: 13410/38378 (34.94%) Loss: 1.939871 LR: 0.00004214 [08:28:00] Epoch: 1 Batch: 13411/38378 (34.94%) Loss: 1.747177 LR: 0.00004214 [08:28:02] Epoch: 1 Batch: 13412/38378 (34.95%) Loss: 2.143284 LR: 0.00004214 [08:28:04] Epoch: 1 Batch: 13413/38378 (34.95%) Loss: 1.895698 LR: 0.00004214 [08:28:05] Epoch: 1 Batch: 13414/38378 (34.95%) Loss: 1.836198 LR: 0.00004213 [08:28:07] Epoch: 1 Batch: 13415/38378 (34.95%) Loss: 1.918679 LR: 0.00004213 [08:28:08] Epoch: 1 Batch: 13416/38378 (34.96%) Loss: 1.914530 LR: 0.00004213 [08:28:10] Epoch: 1 Batch: 13417/38378 (34.96%) Loss: 2.097434 LR: 0.00004213 [08:28:12] Epoch: 1 Batch: 13418/38378 (34.96%) Loss: 2.098905 LR: 0.00004213 [08:28:13] Epoch: 1 Batch: 13419/38378 (34.97%) Loss: 2.036600 LR: 0.00004213 [08:28:15] Epoch: 1 Batch: 13420/38378 (34.97%) Loss: 1.879557 LR: 0.00004213 [08:28:16] Epoch: 1 Batch: 13421/38378 (34.97%) Loss: 2.073641 LR: 0.00004212 [08:28:18] Epoch: 1 Batch: 13422/38378 (34.97%) Loss: 1.875441 LR: 0.00004212 [08:28:20] Epoch: 1 Batch: 13423/38378 (34.98%) Loss: 1.851702 LR: 0.00004212 [08:28:21] Epoch: 1 Batch: 13424/38378 (34.98%) Loss: 2.214441 LR: 0.00004212 [08:28:23] Epoch: 1 Batch: 13425/38378 (34.98%) Loss: 1.630403 LR: 0.00004212 [08:28:25] Epoch: 1 Batch: 13426/38378 (34.98%) Loss: 1.795471 LR: 0.00004212 [08:28:26] Epoch: 1 Batch: 13427/38378 (34.99%) Loss: 2.221330 LR: 0.00004212 [08:28:28] Epoch: 1 Batch: 13428/38378 (34.99%) Loss: 1.666821 LR: 0.00004211 [08:28:30] Epoch: 1 Batch: 13429/38378 (34.99%) Loss: 2.136977 LR: 0.00004211 [08:28:31] Epoch: 1 Batch: 13430/38378 (34.99%) Loss: 1.930993 LR: 0.00004211 [08:28:33] Epoch: 1 Batch: 13431/38378 (35.00%) Loss: 1.999146 LR: 0.00004211 [08:28:35] Epoch: 1 Batch: 13432/38378 (35.00%) Loss: 2.003677 LR: 0.00004211 [08:28:36] Epoch: 1 Batch: 13433/38378 (35.00%) Loss: 2.269207 LR: 0.00004211 [08:28:38] Epoch: 1 Batch: 13434/38378 (35.00%) Loss: 1.716201 LR: 0.00004211 [08:28:40] Epoch: 1 Batch: 13435/38378 (35.01%) Loss: 2.061193 LR: 0.00004210 [08:28:41] Epoch: 1 Batch: 13436/38378 (35.01%) Loss: 1.815125 LR: 0.00004210 [08:28:43] Epoch: 1 Batch: 13437/38378 (35.01%) Loss: 2.064722 LR: 0.00004210 [08:28:45] Epoch: 1 Batch: 13438/38378 (35.01%) Loss: 2.016171 LR: 0.00004210 [08:28:47] Epoch: 1 Batch: 13439/38378 (35.02%) Loss: 1.800730 LR: 0.00004210 [08:28:48] Epoch: 1 Batch: 
13440/38378 (35.02%) Loss: 2.270462 LR: 0.00004210 [08:28:50] Epoch: 1 Batch: 13441/38378 (35.02%) Loss: 2.017740 LR: 0.00004210 [08:28:52] Epoch: 1 Batch: 13442/38378 (35.03%) Loss: 2.144624 LR: 0.00004209 [08:28:54] Epoch: 1 Batch: 13443/38378 (35.03%) Loss: 1.814462 LR: 0.00004209 [08:28:55] Epoch: 1 Batch: 13444/38378 (35.03%) Loss: 2.187500 LR: 0.00004209 [08:28:57] Epoch: 1 Batch: 13445/38378 (35.03%) Loss: 1.962030 LR: 0.00004209 [08:28:59] Epoch: 1 Batch: 13446/38378 (35.04%) Loss: 2.023931 LR: 0.00004209 [08:29:00] Epoch: 1 Batch: 13447/38378 (35.04%) Loss: 2.221822 LR: 0.00004209 [08:29:02] Epoch: 1 Batch: 13448/38378 (35.04%) Loss: 1.799438 LR: 0.00004209 [08:29:04] Epoch: 1 Batch: 13449/38378 (35.04%) Loss: 2.134984 LR: 0.00004208 [08:29:05] Epoch: 1 Batch: 13450/38378 (35.05%) Loss: 2.042431 LR: 0.00004208 [08:29:07] Epoch: 1 Batch: 13451/38378 (35.05%) Loss: 1.632847 LR: 0.00004208 [08:29:09] Epoch: 1 Batch: 13452/38378 (35.05%) Loss: 1.944418 LR: 0.00004208 [08:29:10] Epoch: 1 Batch: 13453/38378 (35.05%) Loss: 1.892594 LR: 0.00004208 [08:29:12] Epoch: 1 Batch: 13454/38378 (35.06%) Loss: 2.147931 LR: 0.00004208 [08:29:14] Epoch: 1 Batch: 13455/38378 (35.06%) Loss: 1.959502 LR: 0.00004208 [08:29:15] Epoch: 1 Batch: 13456/38378 (35.06%) Loss: 2.426257 LR: 0.00004207 [08:29:17] Epoch: 1 Batch: 13457/38378 (35.06%) Loss: 2.214834 LR: 0.00004207 [08:29:19] Epoch: 1 Batch: 13458/38378 (35.07%) Loss: 2.127820 LR: 0.00004207 [08:29:20] Epoch: 1 Batch: 13459/38378 (35.07%) Loss: 1.826477 LR: 0.00004207 [08:29:22] Epoch: 1 Batch: 13460/38378 (35.07%) Loss: 1.864570 LR: 0.00004207 [08:29:24] Epoch: 1 Batch: 13461/38378 (35.07%) Loss: 2.030092 LR: 0.00004207 [08:29:25] Epoch: 1 Batch: 13462/38378 (35.08%) Loss: 2.097678 LR: 0.00004207 [08:29:27] Epoch: 1 Batch: 13463/38378 (35.08%) Loss: 1.674179 LR: 0.00004206 [08:29:29] Epoch: 1 Batch: 13464/38378 (35.08%) Loss: 2.185902 LR: 0.00004206 [08:29:30] Epoch: 1 Batch: 13465/38378 (35.09%) Loss: 2.128707 LR: 0.00004206 [08:29:32] Epoch: 1 Batch: 13466/38378 (35.09%) Loss: 2.074733 LR: 0.00004206 [08:29:34] Epoch: 1 Batch: 13467/38378 (35.09%) Loss: 1.815334 LR: 0.00004206 [08:29:35] Epoch: 1 Batch: 13468/38378 (35.09%) Loss: 1.956867 LR: 0.00004206 [08:29:37] Epoch: 1 Batch: 13469/38378 (35.10%) Loss: 1.960581 LR: 0.00004206 [08:29:39] Epoch: 1 Batch: 13470/38378 (35.10%) Loss: 2.026441 LR: 0.00004206 [08:29:40] Epoch: 1 Batch: 13471/38378 (35.10%) Loss: 1.701879 LR: 0.00004206 [08:29:42] Epoch: 1 Batch: 13472/38378 (35.10%) Loss: 1.994706 LR: 0.00004206 [08:29:43] Epoch: 1 Batch: 13473/38378 (35.11%) Loss: 2.300013 LR: 0.00004206 [08:29:45] Epoch: 1 Batch: 13474/38378 (35.11%) Loss: 1.928253 LR: 0.00004206 [08:29:47] Epoch: 1 Batch: 13475/38378 (35.11%) Loss: 2.225573 LR: 0.00004206 [08:29:48] Epoch: 1 Batch: 13476/38378 (35.11%) Loss: 1.972857 LR: 0.00004206 [08:29:50] Epoch: 1 Batch: 13477/38378 (35.12%) Loss: 2.099809 LR: 0.00004205 [08:29:52] Epoch: 1 Batch: 13478/38378 (35.12%) Loss: 1.962304 LR: 0.00004205 [08:29:53] Epoch: 1 Batch: 13479/38378 (35.12%) Loss: 2.266814 LR: 0.00004205 [08:29:55] Epoch: 1 Batch: 13480/38378 (35.12%) Loss: 2.234650 LR: 0.00004205 [08:29:56] Epoch: 1 Batch: 13481/38378 (35.13%) Loss: 1.928932 LR: 0.00004205 [08:29:58] Epoch: 1 Batch: 13482/38378 (35.13%) Loss: 2.080168 LR: 0.00004205 [08:30:00] Epoch: 1 Batch: 13483/38378 (35.13%) Loss: 2.162432 LR: 0.00004205 [08:30:01] Epoch: 1 Batch: 13484/38378 (35.13%) Loss: 1.827424 LR: 0.00004204 [08:30:03] Epoch: 1 Batch: 13485/38378 (35.14%) Loss: 2.074479 LR: 
0.00004204 [08:30:05] Epoch: 1 Batch: 13486/38378 (35.14%) Loss: 2.371328 LR: 0.00004204 [08:30:06] Epoch: 1 Batch: 13487/38378 (35.14%) Loss: 1.795771 LR: 0.00004204 [08:30:08] Epoch: 1 Batch: 13488/38378 (35.15%) Loss: 2.248077 LR: 0.00004204 [08:30:10] Epoch: 1 Batch: 13489/38378 (35.15%) Loss: 2.050601 LR: 0.00004204 [08:30:12] Epoch: 1 Batch: 13490/38378 (35.15%) Loss: 2.188969 LR: 0.00004204 [08:30:13] Epoch: 1 Batch: 13491/38378 (35.15%) Loss: 2.177884 LR: 0.00004203 [08:30:15] Epoch: 1 Batch: 13492/38378 (35.16%) Loss: 1.885526 LR: 0.00004203 [08:30:17] Epoch: 1 Batch: 13493/38378 (35.16%) Loss: 1.991965 LR: 0.00004203 [08:30:18] Epoch: 1 Batch: 13494/38378 (35.16%) Loss: 2.029452 LR: 0.00004203 [08:30:20] Epoch: 1 Batch: 13495/38378 (35.16%) Loss: 2.118225 LR: 0.00004203 [08:30:22] Epoch: 1 Batch: 13496/38378 (35.17%) Loss: 1.743957 LR: 0.00004203 [08:30:24] Epoch: 1 Batch: 13497/38378 (35.17%) Loss: 1.984632 LR: 0.00004203 [08:30:25] Epoch: 1 Batch: 13498/38378 (35.17%) Loss: 2.106600 LR: 0.00004202 [08:30:27] Epoch: 1 Batch: 13499/38378 (35.17%) Loss: 2.130984 LR: 0.00004202 [08:30:29] >> Evaluating batch 0 [08:30:30] >> Evaluating batch 1 [08:30:31] >> Evaluating batch 2 [08:30:32] >> Evaluating batch 3 [08:30:33] >> Evaluating batch 4 [08:30:34] >> Evaluating batch 5 [08:30:34] >> Evaluating batch 6 [08:30:35] >> Evaluating batch 7 [08:30:36] >> Evaluating batch 8 [08:30:37] >> Evaluating batch 9 [08:30:38] >> Evaluating batch 10 [08:30:39] >> Evaluating batch 11 [08:30:40] >> Evaluating batch 12 [08:30:41] >> Evaluating batch 13 [08:30:42] >> Evaluating batch 14 [08:30:43] >> Evaluating batch 15 [08:30:44] >> Evaluating batch 16 [08:30:44] Epoch: 1 Step: 13500/38378 Evaluation: [08:30:44] Avg Loss Since Last Eval: 0.0149 Val Loss: 2.0991 Validation loss delta: 2.0991 Perplexity: 8.1590 LR: 0.00004202 [08:30:49] >> Cleaned up old temp checkpoint: epoch1_step12500 [08:30:49] >> Temp checkpoint saved: epoch1_step13500, size: 0.1702 GB [08:30:52] >> Checkpoint saved: epoch1_step13500, size: 0.1702 GB [08:30:52] Epoch: 1 Batch: 13500/38378 (35.18%) Loss: 2.026105 LR: 0.00004202 [08:30:54] Epoch: 1 Batch: 13501/38378 (35.18%) Loss: 2.046088 LR: 0.00004202 [08:30:56] Epoch: 1 Batch: 13502/38378 (35.18%) Loss: 1.863092 LR: 0.00004202 [08:30:57] Epoch: 1 Batch: 13503/38378 (35.18%) Loss: 1.972470 LR: 0.00004202 [08:30:59] Epoch: 1 Batch: 13504/38378 (35.19%) Loss: 2.048382 LR: 0.00004202 [08:31:01] Epoch: 1 Batch: 13505/38378 (35.19%) Loss: 2.048974 LR: 0.00004201 [08:31:02] Epoch: 1 Batch: 13506/38378 (35.19%) Loss: 1.899383 LR: 0.00004201 [08:31:04] Epoch: 1 Batch: 13507/38378 (35.19%) Loss: 2.000197 LR: 0.00004201 [08:31:06] Epoch: 1 Batch: 13508/38378 (35.20%) Loss: 1.956303 LR: 0.00004201 [08:31:07] Epoch: 1 Batch: 13509/38378 (35.20%) Loss: 2.103176 LR: 0.00004201 [08:31:09] Epoch: 1 Batch: 13510/38378 (35.20%) Loss: 2.191133 LR: 0.00004201 [08:31:11] Epoch: 1 Batch: 13511/38378 (35.21%) Loss: 2.030614 LR: 0.00004201 [08:31:13] Epoch: 1 Batch: 13512/38378 (35.21%) Loss: 1.787579 LR: 0.00004200 [08:31:14] Epoch: 1 Batch: 13513/38378 (35.21%) Loss: 1.791318 LR: 0.00004200 [08:31:16] Epoch: 1 Batch: 13514/38378 (35.21%) Loss: 1.911050 LR: 0.00004200 [08:31:18] Epoch: 1 Batch: 13515/38378 (35.22%) Loss: 1.952232 LR: 0.00004200 [08:31:19] Epoch: 1 Batch: 13516/38378 (35.22%) Loss: 2.175826 LR: 0.00004200 [08:31:21] Epoch: 1 Batch: 13517/38378 (35.22%) Loss: 2.050954 LR: 0.00004200 [08:31:23] Epoch: 1 Batch: 13518/38378 (35.22%) Loss: 2.007764 LR: 0.00004200 [08:31:25] Epoch: 1
Batch: 13519/38378 (35.23%) Loss: 1.990560 LR: 0.00004199 [08:31:26] Epoch: 1 Batch: 13520/38378 (35.23%) Loss: 1.826073 LR: 0.00004199 [08:31:28] Epoch: 1 Batch: 13521/38378 (35.23%) Loss: 1.983090 LR: 0.00004199 [08:31:30] Epoch: 1 Batch: 13522/38378 (35.23%) Loss: 1.784191 LR: 0.00004199 [08:31:32] Epoch: 1 Batch: 13523/38378 (35.24%) Loss: 1.890867 LR: 0.00004199 [08:31:33] Epoch: 1 Batch: 13524/38378 (35.24%) Loss: 2.116099 LR: 0.00004199 [08:31:35] Epoch: 1 Batch: 13525/38378 (35.24%) Loss: 1.921986 LR: 0.00004199 [08:31:37] Epoch: 1 Batch: 13526/38378 (35.24%) Loss: 2.061479 LR: 0.00004198 [08:31:38] Epoch: 1 Batch: 13527/38378 (35.25%) Loss: 1.926387 LR: 0.00004198 [08:31:40] Epoch: 1 Batch: 13528/38378 (35.25%) Loss: 1.977438 LR: 0.00004198 [08:31:42] Epoch: 1 Batch: 13529/38378 (35.25%) Loss: 1.891406 LR: 0.00004198 [08:31:44] Epoch: 1 Batch: 13530/38378 (35.25%) Loss: 1.833229 LR: 0.00004198 [08:31:45] Epoch: 1 Batch: 13531/38378 (35.26%) Loss: 2.286933 LR: 0.00004198 [08:31:47] Epoch: 1 Batch: 13532/38378 (35.26%) Loss: 2.197276 LR: 0.00004198 [08:31:49] Epoch: 1 Batch: 13533/38378 (35.26%) Loss: 2.037172 LR: 0.00004197 [08:31:50] Epoch: 1 Batch: 13534/38378 (35.26%) Loss: 1.912645 LR: 0.00004197 [08:31:52] Epoch: 1 Batch: 13535/38378 (35.27%) Loss: 1.895209 LR: 0.00004197 [08:31:54] Epoch: 1 Batch: 13536/38378 (35.27%) Loss: 2.233853 LR: 0.00004197 [08:31:55] Epoch: 1 Batch: 13537/38378 (35.27%) Loss: 1.955488 LR: 0.00004197 [08:31:57] Epoch: 1 Batch: 13538/38378 (35.28%) Loss: 1.871430 LR: 0.00004197 [08:31:59] Epoch: 1 Batch: 13539/38378 (35.28%) Loss: 2.105490 LR: 0.00004197 [08:32:00] Epoch: 1 Batch: 13540/38378 (35.28%) Loss: 2.115010 LR: 0.00004196 [08:32:02] Epoch: 1 Batch: 13541/38378 (35.28%) Loss: 2.332537 LR: 0.00004196 [08:32:04] Epoch: 1 Batch: 13542/38378 (35.29%) Loss: 1.959323 LR: 0.00004196 [08:32:05] Epoch: 1 Batch: 13543/38378 (35.29%) Loss: 2.008724 LR: 0.00004196 [08:32:07] Epoch: 1 Batch: 13544/38378 (35.29%) Loss: 1.898630 LR: 0.00004196 [08:32:09] Epoch: 1 Batch: 13545/38378 (35.29%) Loss: 1.973556 LR: 0.00004196 [08:32:11] Epoch: 1 Batch: 13546/38378 (35.30%) Loss: 1.951301 LR: 0.00004196 [08:32:12] Epoch: 1 Batch: 13547/38378 (35.30%) Loss: 1.916472 LR: 0.00004195 [08:32:14] Epoch: 1 Batch: 13548/38378 (35.30%) Loss: 2.183761 LR: 0.00004195 [08:32:16] Epoch: 1 Batch: 13549/38378 (35.30%) Loss: 1.998955 LR: 0.00004195 [08:32:17] Epoch: 1 Batch: 13550/38378 (35.31%) Loss: 2.016333 LR: 0.00004195 [08:32:19] Epoch: 1 Batch: 13551/38378 (35.31%) Loss: 1.875191 LR: 0.00004195 [08:32:21] Epoch: 1 Batch: 13552/38378 (35.31%) Loss: 1.860112 LR: 0.00004195 [08:32:22] Epoch: 1 Batch: 13553/38378 (35.31%) Loss: 1.797028 LR: 0.00004195 [08:32:24] Epoch: 1 Batch: 13554/38378 (35.32%) Loss: 2.025301 LR: 0.00004194 [08:32:26] Epoch: 1 Batch: 13555/38378 (35.32%) Loss: 2.094141 LR: 0.00004194 [08:32:27] Epoch: 1 Batch: 13556/38378 (35.32%) Loss: 2.131305 LR: 0.00004194 [08:32:29] Epoch: 1 Batch: 13557/38378 (35.32%) Loss: 2.351773 LR: 0.00004194 [08:32:31] Epoch: 1 Batch: 13558/38378 (35.33%) Loss: 2.168903 LR: 0.00004194 [08:32:32] Epoch: 1 Batch: 13559/38378 (35.33%) Loss: 2.046373 LR: 0.00004194 [08:32:34] Epoch: 1 Batch: 13560/38378 (35.33%) Loss: 2.091880 LR: 0.00004194 [08:32:36] Epoch: 1 Batch: 13561/38378 (35.34%) Loss: 1.881965 LR: 0.00004193 [08:32:37] Epoch: 1 Batch: 13562/38378 (35.34%) Loss: 2.085897 LR: 0.00004193 [08:32:39] Epoch: 1 Batch: 13563/38378 (35.34%) Loss: 1.679760 LR: 0.00004193 [08:32:41] Epoch: 1 Batch: 13564/38378 (35.34%) Loss: 1.996305 
LR: 0.00004193 [08:32:42] Epoch: 1 Batch: 13565/38378 (35.35%) Loss: 2.061675 LR: 0.00004193 [08:32:44] Epoch: 1 Batch: 13566/38378 (35.35%) Loss: 2.159841 LR: 0.00004193 [08:32:46] Epoch: 1 Batch: 13567/38378 (35.35%) Loss: 2.090629 LR: 0.00004193 [08:32:47] Epoch: 1 Batch: 13568/38378 (35.35%) Loss: 1.922171 LR: 0.00004192 [08:32:49] Epoch: 1 Batch: 13569/38378 (35.36%) Loss: 2.106369 LR: 0.00004192 [08:32:51] Epoch: 1 Batch: 13570/38378 (35.36%) Loss: 2.006514 LR: 0.00004192 [08:32:53] Epoch: 1 Batch: 13571/38378 (35.36%) Loss: 1.852803 LR: 0.00004192 [08:32:54] Epoch: 1 Batch: 13572/38378 (35.36%) Loss: 1.971937 LR: 0.00004192 [08:32:56] Epoch: 1 Batch: 13573/38378 (35.37%) Loss: 1.963589 LR: 0.00004192 [08:32:58] Epoch: 1 Batch: 13574/38378 (35.37%) Loss: 2.043308 LR: 0.00004192 [08:32:59] Epoch: 1 Batch: 13575/38378 (35.37%) Loss: 1.797223 LR: 0.00004191 [08:33:01] Epoch: 1 Batch: 13576/38378 (35.37%) Loss: 1.766814 LR: 0.00004191 [08:33:03] Epoch: 1 Batch: 13577/38378 (35.38%) Loss: 2.132911 LR: 0.00004191 [08:33:05] Epoch: 1 Batch: 13578/38378 (35.38%) Loss: 2.135424 LR: 0.00004191 [08:33:06] Epoch: 1 Batch: 13579/38378 (35.38%) Loss: 2.180435 LR: 0.00004191 [08:33:08] Epoch: 1 Batch: 13580/38378 (35.38%) Loss: 2.261368 LR: 0.00004191 [08:33:10] Epoch: 1 Batch: 13581/38378 (35.39%) Loss: 1.990044 LR: 0.00004191 [08:33:12] Epoch: 1 Batch: 13582/38378 (35.39%) Loss: 1.762146 LR: 0.00004190 [08:33:14] Epoch: 1 Batch: 13583/38378 (35.39%) Loss: 1.852746 LR: 0.00004190 [08:33:15] Epoch: 1 Batch: 13584/38378 (35.40%) Loss: 1.813355 LR: 0.00004190 [08:33:17] Epoch: 1 Batch: 13585/38378 (35.40%) Loss: 1.978721 LR: 0.00004190 [08:33:19] Epoch: 1 Batch: 13586/38378 (35.40%) Loss: 2.227324 LR: 0.00004190 [08:33:21] Epoch: 1 Batch: 13587/38378 (35.40%) Loss: 1.963945 LR: 0.00004190 [08:33:22] Epoch: 1 Batch: 13588/38378 (35.41%) Loss: 1.908336 LR: 0.00004190 [08:33:24] Epoch: 1 Batch: 13589/38378 (35.41%) Loss: 2.151762 LR: 0.00004189 [08:33:26] Epoch: 1 Batch: 13590/38378 (35.41%) Loss: 2.108204 LR: 0.00004189 [08:33:27] Epoch: 1 Batch: 13591/38378 (35.41%) Loss: 2.099138 LR: 0.00004189 [08:33:29] Epoch: 1 Batch: 13592/38378 (35.42%) Loss: 2.341417 LR: 0.00004189 [08:33:31] Epoch: 1 Batch: 13593/38378 (35.42%) Loss: 1.890714 LR: 0.00004189 [08:33:32] Epoch: 1 Batch: 13594/38378 (35.42%) Loss: 2.046546 LR: 0.00004189 [08:33:34] Epoch: 1 Batch: 13595/38378 (35.42%) Loss: 1.889247 LR: 0.00004189 [08:33:36] Epoch: 1 Batch: 13596/38378 (35.43%) Loss: 2.076394 LR: 0.00004188 [08:33:38] Epoch: 1 Batch: 13597/38378 (35.43%) Loss: 1.794273 LR: 0.00004188 [08:33:39] Epoch: 1 Batch: 13598/38378 (35.43%) Loss: 2.214153 LR: 0.00004188 [08:33:41] Epoch: 1 Batch: 13599/38378 (35.43%) Loss: 2.315758 LR: 0.00004188 [08:33:47] >> Cleaned up old temp checkpoint: epoch1_step12600 [08:33:47] >> Temp checkpoint saved: epoch1_step13600, size: 0.1702 GB [08:33:47] Epoch: 1 Batch: 13600/38378 (35.44%) Loss: 2.168648 LR: 0.00004188 [08:33:48] Epoch: 1 Batch: 13601/38378 (35.44%) Loss: 2.004672 LR: 0.00004188 [08:33:50] Epoch: 1 Batch: 13602/38378 (35.44%) Loss: 2.136141 LR: 0.00004188 [08:33:52] Epoch: 1 Batch: 13603/38378 (35.44%) Loss: 2.072985 LR: 0.00004187 [08:33:53] Epoch: 1 Batch: 13604/38378 (35.45%) Loss: 1.896328 LR: 0.00004187 [08:33:55] Epoch: 1 Batch: 13605/38378 (35.45%) Loss: 1.965642 LR: 0.00004187 [08:33:57] Epoch: 1 Batch: 13606/38378 (35.45%) Loss: 1.928355 LR: 0.00004187 [08:33:58] Epoch: 1 Batch: 13607/38378 (35.46%) Loss: 1.999311 LR: 0.00004187 [08:34:00] Epoch: 1 Batch: 13608/38378 (35.46%) 
Loss: 2.047571 LR: 0.00004187 [08:34:02] Epoch: 1 Batch: 13609/38378 (35.46%) Loss: 2.099377 LR: 0.00004187 [08:34:03] Epoch: 1 Batch: 13610/38378 (35.46%) Loss: 1.966968 LR: 0.00004186 [08:34:05] Epoch: 1 Batch: 13611/38378 (35.47%) Loss: 2.012236 LR: 0.00004186 [08:34:07] Epoch: 1 Batch: 13612/38378 (35.47%) Loss: 1.824928 LR: 0.00004186 [08:34:08] Epoch: 1 Batch: 13613/38378 (35.47%) Loss: 1.811008 LR: 0.00004186 [08:34:10] Epoch: 1 Batch: 13614/38378 (35.47%) Loss: 2.010626 LR: 0.00004186 [08:34:12] Epoch: 1 Batch: 13615/38378 (35.48%) Loss: 2.146232 LR: 0.00004186 [08:34:14] Epoch: 1 Batch: 13616/38378 (35.48%) Loss: 1.925424 LR: 0.00004186 [08:34:15] Epoch: 1 Batch: 13617/38378 (35.48%) Loss: 1.997639 LR: 0.00004185 [08:34:17] Epoch: 1 Batch: 13618/38378 (35.48%) Loss: 2.330671 LR: 0.00004185 [08:34:19] Epoch: 1 Batch: 13619/38378 (35.49%) Loss: 1.871225 LR: 0.00004185 [08:34:20] Epoch: 1 Batch: 13620/38378 (35.49%) Loss: 1.962549 LR: 0.00004185 [08:34:22] Epoch: 1 Batch: 13621/38378 (35.49%) Loss: 1.988128 LR: 0.00004185 [08:34:24] Epoch: 1 Batch: 13622/38378 (35.49%) Loss: 2.046721 LR: 0.00004185 [08:34:26] Epoch: 1 Batch: 13623/38378 (35.50%) Loss: 2.089454 LR: 0.00004185 [08:34:27] Epoch: 1 Batch: 13624/38378 (35.50%) Loss: 1.871089 LR: 0.00004184 [08:34:29] Epoch: 1 Batch: 13625/38378 (35.50%) Loss: 2.144532 LR: 0.00004184 [08:34:31] Epoch: 1 Batch: 13626/38378 (35.50%) Loss: 1.969122 LR: 0.00004184 [08:34:32] Epoch: 1 Batch: 13627/38378 (35.51%) Loss: 2.157665 LR: 0.00004184 [08:34:34] Epoch: 1 Batch: 13628/38378 (35.51%) Loss: 2.028565 LR: 0.00004184 [08:34:36] Epoch: 1 Batch: 13629/38378 (35.51%) Loss: 2.054593 LR: 0.00004184 [08:34:37] Epoch: 1 Batch: 13630/38378 (35.52%) Loss: 1.692612 LR: 0.00004184 [08:34:39] Epoch: 1 Batch: 13631/38378 (35.52%) Loss: 1.901063 LR: 0.00004183 [08:34:41] Epoch: 1 Batch: 13632/38378 (35.52%) Loss: 1.929331 LR: 0.00004183 [08:34:43] Epoch: 1 Batch: 13633/38378 (35.52%) Loss: 1.781544 LR: 0.00004183 [08:34:44] Epoch: 1 Batch: 13634/38378 (35.53%) Loss: 2.227882 LR: 0.00004183 [08:34:46] Epoch: 1 Batch: 13635/38378 (35.53%) Loss: 1.997690 LR: 0.00004183 [08:34:48] Epoch: 1 Batch: 13636/38378 (35.53%) Loss: 2.029035 LR: 0.00004183 [08:34:49] Epoch: 1 Batch: 13637/38378 (35.53%) Loss: 2.029211 LR: 0.00004183 [08:34:51] Epoch: 1 Batch: 13638/38378 (35.54%) Loss: 1.648706 LR: 0.00004182 [08:34:53] Epoch: 1 Batch: 13639/38378 (35.54%) Loss: 1.741645 LR: 0.00004182 [08:34:54] Epoch: 1 Batch: 13640/38378 (35.54%) Loss: 2.293924 LR: 0.00004182 [08:34:56] Epoch: 1 Batch: 13641/38378 (35.54%) Loss: 1.860720 LR: 0.00004182 [08:34:58] Epoch: 1 Batch: 13642/38378 (35.55%) Loss: 2.046126 LR: 0.00004182 [08:34:59] Epoch: 1 Batch: 13643/38378 (35.55%) Loss: 2.092422 LR: 0.00004182 [08:35:01] Epoch: 1 Batch: 13644/38378 (35.55%) Loss: 1.676536 LR: 0.00004182 [08:35:03] Epoch: 1 Batch: 13645/38378 (35.55%) Loss: 2.167498 LR: 0.00004181 [08:35:04] Epoch: 1 Batch: 13646/38378 (35.56%) Loss: 1.777259 LR: 0.00004181 [08:35:06] Epoch: 1 Batch: 13647/38378 (35.56%) Loss: 1.501377 LR: 0.00004181 [08:35:08] Epoch: 1 Batch: 13648/38378 (35.56%) Loss: 1.757713 LR: 0.00004181 [08:35:09] Epoch: 1 Batch: 13649/38378 (35.56%) Loss: 1.724517 LR: 0.00004181 [08:35:11] Epoch: 1 Batch: 13650/38378 (35.57%) Loss: 1.944316 LR: 0.00004181 [08:35:13] Epoch: 1 Batch: 13651/38378 (35.57%) Loss: 1.872566 LR: 0.00004181 [08:35:15] Epoch: 1 Batch: 13652/38378 (35.57%) Loss: 2.248558 LR: 0.00004179 [08:35:16] Epoch: 1 Batch: 13653/38378 (35.58%) Loss: 1.778665 LR: 0.00004179 [08:35:18] 
Epoch: 1 Batch: 13654/38378 (35.58%) Loss: 1.883718 LR: 0.00004179 [08:35:20] Epoch: 1 Batch: 13655/38378 (35.58%) Loss: 2.089184 LR: 0.00004179 [08:35:21] Epoch: 1 Batch: 13656/38378 (35.58%) Loss: 2.225121 LR: 0.00004179 [08:35:23] Epoch: 1 Batch: 13657/38378 (35.59%) Loss: 2.110290 LR: 0.00004179 [08:35:25] Epoch: 1 Batch: 13658/38378 (35.59%) Loss: 1.843037 LR: 0.00004179 [08:35:26] Epoch: 1 Batch: 13659/38378 (35.59%) Loss: 1.805783 LR: 0.00004178 [08:35:28] Epoch: 1 Batch: 13660/38378 (35.59%) Loss: 2.048436 LR: 0.00004178 [08:35:30] Epoch: 1 Batch: 13661/38378 (35.60%) Loss: 1.907523 LR: 0.00004178 [08:35:31] Epoch: 1 Batch: 13662/38378 (35.60%) Loss: 1.945923 LR: 0.00004178 [08:35:33] Epoch: 1 Batch: 13663/38378 (35.60%) Loss: 1.746149 LR: 0.00004178 [08:35:35] Epoch: 1 Batch: 13664/38378 (35.60%) Loss: 2.060862 LR: 0.00004178 [08:35:37] Epoch: 1 Batch: 13665/38378 (35.61%) Loss: 1.699767 LR: 0.00004178 [08:35:38] Epoch: 1 Batch: 13666/38378 (35.61%) Loss: 2.260308 LR: 0.00004177 [08:35:40] Epoch: 1 Batch: 13667/38378 (35.61%) Loss: 1.997261 LR: 0.00004177 [08:35:42] Epoch: 1 Batch: 13668/38378 (35.61%) Loss: 1.905765 LR: 0.00004177 [08:35:43] Epoch: 1 Batch: 13669/38378 (35.62%) Loss: 1.973047 LR: 0.00004177 [08:35:45] Epoch: 1 Batch: 13670/38378 (35.62%) Loss: 1.857531 LR: 0.00004177 [08:35:47] Epoch: 1 Batch: 13671/38378 (35.62%) Loss: 1.952905 LR: 0.00004177 [08:35:48] Epoch: 1 Batch: 13672/38378 (35.62%) Loss: 2.013723 LR: 0.00004177 [08:35:50] Epoch: 1 Batch: 13673/38378 (35.63%) Loss: 1.748659 LR: 0.00004176 [08:35:52] Epoch: 1 Batch: 13674/38378 (35.63%) Loss: 2.050936 LR: 0.00004176 [08:35:53] Epoch: 1 Batch: 13675/38378 (35.63%) Loss: 1.794789 LR: 0.00004176 [08:35:55] Epoch: 1 Batch: 13676/38378 (35.63%) Loss: 2.113135 LR: 0.00004176 [08:35:57] Epoch: 1 Batch: 13677/38378 (35.64%) Loss: 2.114038 LR: 0.00004176 [08:35:58] Epoch: 1 Batch: 13678/38378 (35.64%) Loss: 1.861193 LR: 0.00004176 [08:36:00] Epoch: 1 Batch: 13679/38378 (35.64%) Loss: 1.844885 LR: 0.00004176 [08:36:02] Epoch: 1 Batch: 13680/38378 (35.65%) Loss: 1.956109 LR: 0.00004175 [08:36:04] Epoch: 1 Batch: 13681/38378 (35.65%) Loss: 2.185095 LR: 0.00004175 [08:36:05] Epoch: 1 Batch: 13682/38378 (35.65%) Loss: 2.327518 LR: 0.00004175 [08:36:07] Epoch: 1 Batch: 13683/38378 (35.65%) Loss: 1.912193 LR: 0.00004175 [08:36:09] Epoch: 1 Batch: 13684/38378 (35.66%) Loss: 2.202680 LR: 0.00004175 [08:36:10] Epoch: 1 Batch: 13685/38378 (35.66%) Loss: 1.772588 LR: 0.00004175 [08:36:12] Epoch: 1 Batch: 13686/38378 (35.66%) Loss: 1.723675 LR: 0.00004175 [08:36:14] Epoch: 1 Batch: 13687/38378 (35.66%) Loss: 1.831293 LR: 0.00004174 [08:36:15] Epoch: 1 Batch: 13688/38378 (35.67%) Loss: 1.719888 LR: 0.00004174 [08:36:17] Epoch: 1 Batch: 13689/38378 (35.67%) Loss: 2.186140 LR: 0.00004174 [08:36:19] Epoch: 1 Batch: 13690/38378 (35.67%) Loss: 2.149682 LR: 0.00004174 [08:36:20] Epoch: 1 Batch: 13691/38378 (35.67%) Loss: 1.776907 LR: 0.00004174 [08:36:22] Epoch: 1 Batch: 13692/38378 (35.68%) Loss: 1.790524 LR: 0.00004174 [08:36:24] Epoch: 1 Batch: 13693/38378 (35.68%) Loss: 1.846475 LR: 0.00004174 [08:36:26] Epoch: 1 Batch: 13694/38378 (35.68%) Loss: 2.268414 LR: 0.00004173 [08:36:27] Epoch: 1 Batch: 13695/38378 (35.68%) Loss: 1.932040 LR: 0.00004173 [08:36:29] Epoch: 1 Batch: 13696/38378 (35.69%) Loss: 1.939123 LR: 0.00004173 [08:36:31] Epoch: 1 Batch: 13697/38378 (35.69%) Loss: 1.915062 LR: 0.00004173 [08:36:32] Epoch: 1 Batch: 13698/38378 (35.69%) Loss: 2.033933 LR: 0.00004173 [08:36:34] Epoch: 1 Batch: 13699/38378 (35.69%) Loss: 
1.784792 LR: 0.00004173 [08:36:40] >> Cleaned up old temp checkpoint: epoch1_step12700 [08:36:40] >> Temp checkpoint saved: epoch1_step13700, size: 0.1702 GB [08:36:40] Epoch: 1 Batch: 13700/38378 (35.70%) Loss: 1.974906 LR: 0.00004173 [08:36:42] Epoch: 1 Batch: 13701/38378 (35.70%) Loss: 1.989622 LR: 0.00004172 [08:36:43] Epoch: 1 Batch: 13702/38378 (35.70%) Loss: 1.871628 LR: 0.00004172 [08:36:45] Epoch: 1 Batch: 13703/38378 (35.71%) Loss: 1.796327 LR: 0.00004172 [08:36:47] Epoch: 1 Batch: 13704/38378 (35.71%) Loss: 2.079175 LR: 0.00004172 [08:36:48] Epoch: 1 Batch: 13705/38378 (35.71%) Loss: 2.198003 LR: 0.00004172 [08:36:50] Epoch: 1 Batch: 13706/38378 (35.71%) Loss: 2.082858 LR: 0.00004172 [08:36:52] Epoch: 1 Batch: 13707/38378 (35.72%) Loss: 2.217517 LR: 0.00004172 [08:36:53] Epoch: 1 Batch: 13708/38378 (35.72%) Loss: 1.965468 LR: 0.00004171 [08:36:55] Epoch: 1 Batch: 13709/38378 (35.72%) Loss: 1.834306 LR: 0.00004171 [08:36:57] Epoch: 1 Batch: 13710/38378 (35.72%) Loss: 1.588392 LR: 0.00004171 [08:36:58] Epoch: 1 Batch: 13711/38378 (35.73%) Loss: 2.086868 LR: 0.00004171 [08:37:00] Epoch: 1 Batch: 13712/38378 (35.73%) Loss: 1.792085 LR: 0.00004171 [08:37:02] Epoch: 1 Batch: 13713/38378 (35.73%) Loss: 1.970858 LR: 0.00004171 [08:37:04] Epoch: 1 Batch: 13714/38378 (35.73%) Loss: 2.049654 LR: 0.00004171 [08:37:05] Epoch: 1 Batch: 13715/38378 (35.74%) Loss: 2.085769 LR: 0.00004170 [08:37:07] Epoch: 1 Batch: 13716/38378 (35.74%) Loss: 2.209095 LR: 0.00004170 [08:37:09] Epoch: 1 Batch: 13717/38378 (35.74%) Loss: 1.895666 LR: 0.00004170 [08:37:10] Epoch: 1 Batch: 13718/38378 (35.74%) Loss: 2.267495 LR: 0.00004170 [08:37:12] Epoch: 1 Batch: 13719/38378 (35.75%) Loss: 2.205687 LR: 0.00004170 [08:37:14] Epoch: 1 Batch: 13720/38378 (35.75%) Loss: 1.837050 LR: 0.00004170 [08:37:16] Epoch: 1 Batch: 13721/38378 (35.75%) Loss: 1.917955 LR: 0.00004170 [08:37:17] Epoch: 1 Batch: 13722/38378 (35.75%) Loss: 1.753419 LR: 0.00004169 [08:37:19] Epoch: 1 Batch: 13723/38378 (35.76%) Loss: 2.029478 LR: 0.00004169 [08:37:21] Epoch: 1 Batch: 13724/38378 (35.76%) Loss: 1.684759 LR: 0.00004169 [08:37:22] Epoch: 1 Batch: 13725/38378 (35.76%) Loss: 1.979390 LR: 0.00004169 [08:37:24] Epoch: 1 Batch: 13726/38378 (35.77%) Loss: 2.268468 LR: 0.00004169 [08:37:26] Epoch: 1 Batch: 13727/38378 (35.77%) Loss: 1.986534 LR: 0.00004169 [08:37:27] Epoch: 1 Batch: 13728/38378 (35.77%) Loss: 2.183375 LR: 0.00004169 [08:37:29] Epoch: 1 Batch: 13729/38378 (35.77%) Loss: 1.799051 LR: 0.00004168 [08:37:31] Epoch: 1 Batch: 13730/38378 (35.78%) Loss: 1.874643 LR: 0.00004168 [08:37:33] Epoch: 1 Batch: 13731/38378 (35.78%) Loss: 2.236968 LR: 0.00004168 [08:37:34] Epoch: 1 Batch: 13732/38378 (35.78%) Loss: 2.265379 LR: 0.00004168 [08:37:36] Epoch: 1 Batch: 13733/38378 (35.78%) Loss: 1.920680 LR: 0.00004168 [08:37:38] Epoch: 1 Batch: 13734/38378 (35.79%) Loss: 1.871254 LR: 0.00004168 [08:37:39] Epoch: 1 Batch: 13735/38378 (35.79%) Loss: 1.787246 LR: 0.00004168 [08:37:41] Epoch: 1 Batch: 13736/38378 (35.79%) Loss: 2.070970 LR: 0.00004167 [08:37:43] Epoch: 1 Batch: 13737/38378 (35.79%) Loss: 2.211232 LR: 0.00004167 [08:37:44] Epoch: 1 Batch: 13738/38378 (35.80%) Loss: 2.330474 LR: 0.00004167 [08:37:46] Epoch: 1 Batch: 13739/38378 (35.80%) Loss: 2.207850 LR: 0.00004167 [08:37:48] Epoch: 1 Batch: 13740/38378 (35.80%) Loss: 2.291424 LR: 0.00004167 [08:37:49] Epoch: 1 Batch: 13741/38378 (35.80%) Loss: 1.828565 LR: 0.00004167 [08:37:51] Epoch: 1 Batch: 13742/38378 (35.81%) Loss: 1.737918 LR: 0.00004167 [08:37:53] Epoch: 1 Batch: 13743/38378 
(35.81%) Loss: 1.984878 LR: 0.00004166 [08:37:54] Epoch: 1 Batch: 13744/38378 (35.81%) Loss: 1.884322 LR: 0.00004166 [08:37:56] Epoch: 1 Batch: 13745/38378 (35.81%) Loss: 2.037178 LR: 0.00004166 [08:37:58] Epoch: 1 Batch: 13746/38378 (35.82%) Loss: 1.765202 LR: 0.00004166 [08:37:59] Epoch: 1 Batch: 13747/38378 (35.82%) Loss: 1.908096 LR: 0.00004166 [08:38:01] Epoch: 1 Batch: 13748/38378 (35.82%) Loss: 2.005162 LR: 0.00004166 [08:38:03] Epoch: 1 Batch: 13749/38378 (35.83%) Loss: 1.919057 LR: 0.00004166 [08:38:04] Epoch: 1 Batch: 13750/38378 (35.83%) Loss: 1.729136 LR: 0.00004165 [08:38:06] Epoch: 1 Batch: 13751/38378 (35.83%) Loss: 1.972150 LR: 0.00004165 [08:38:08] Epoch: 1 Batch: 13752/38378 (35.83%) Loss: 1.843492 LR: 0.00004165 [08:38:09] Epoch: 1 Batch: 13753/38378 (35.84%) Loss: 1.976809 LR: 0.00004165 [08:38:11] Epoch: 1 Batch: 13754/38378 (35.84%) Loss: 1.943017 LR: 0.00004165 [08:38:13] Epoch: 1 Batch: 13755/38378 (35.84%) Loss: 2.091353 LR: 0.00004165 [08:38:14] Epoch: 1 Batch: 13756/38378 (35.84%) Loss: 1.763660 LR: 0.00004165 [08:38:16] Epoch: 1 Batch: 13757/38378 (35.85%) Loss: 1.770639 LR: 0.00004164 [08:38:18] Epoch: 1 Batch: 13758/38378 (35.85%) Loss: 1.982360 LR: 0.00004164 [08:38:19] Epoch: 1 Batch: 13759/38378 (35.85%) Loss: 1.788015 LR: 0.00004164 [08:38:21] Epoch: 1 Batch: 13760/38378 (35.85%) Loss: 2.248281 LR: 0.00004164 [08:38:23] Epoch: 1 Batch: 13761/38378 (35.86%) Loss: 1.841369 LR: 0.00004164 [08:38:25] Epoch: 1 Batch: 13762/38378 (35.86%) Loss: 2.001261 LR: 0.00004164 [08:38:26] Epoch: 1 Batch: 13763/38378 (35.86%) Loss: 1.575923 LR: 0.00004164 [08:38:28] Epoch: 1 Batch: 13764/38378 (35.86%) Loss: 1.824837 LR: 0.00004163 [08:38:30] Epoch: 1 Batch: 13765/38378 (35.87%) Loss: 1.988597 LR: 0.00004163 [08:38:31] Epoch: 1 Batch: 13766/38378 (35.87%) Loss: 1.970726 LR: 0.00004163 [08:38:33] Epoch: 1 Batch: 13767/38378 (35.87%) Loss: 1.702177 LR: 0.00004163 [08:38:35] Epoch: 1 Batch: 13768/38378 (35.87%) Loss: 1.869316 LR: 0.00004163 [08:38:36] Epoch: 1 Batch: 13769/38378 (35.88%) Loss: 1.955210 LR: 0.00004163 [08:38:38] Epoch: 1 Batch: 13770/38378 (35.88%) Loss: 2.006281 LR: 0.00004163 [08:38:40] Epoch: 1 Batch: 13771/38378 (35.88%) Loss: 1.978880 LR: 0.00004162 [08:38:41] Epoch: 1 Batch: 13772/38378 (35.89%) Loss: 2.035146 LR: 0.00004162 [08:38:43] Epoch: 1 Batch: 13773/38378 (35.89%) Loss: 1.909368 LR: 0.00004162 [08:38:45] Epoch: 1 Batch: 13774/38378 (35.89%) Loss: 2.172107 LR: 0.00004162 [08:38:46] Epoch: 1 Batch: 13775/38378 (35.89%) Loss: 1.828251 LR: 0.00004162 [08:38:48] Epoch: 1 Batch: 13776/38378 (35.90%) Loss: 1.815461 LR: 0.00004162 [08:38:50] Epoch: 1 Batch: 13777/38378 (35.90%) Loss: 2.001638 LR: 0.00004162 [08:38:51] Epoch: 1 Batch: 13778/38378 (35.90%) Loss: 1.984785 LR: 0.00004161 [08:38:53] Epoch: 1 Batch: 13779/38378 (35.90%) Loss: 2.099865 LR: 0.00004161 [08:38:55] Epoch: 1 Batch: 13780/38378 (35.91%) Loss: 2.043019 LR: 0.00004161 [08:38:57] Epoch: 1 Batch: 13781/38378 (35.91%) Loss: 2.138680 LR: 0.00004161 [08:38:58] Epoch: 1 Batch: 13782/38378 (35.91%) Loss: 1.877444 LR: 0.00004161 [08:39:00] Epoch: 1 Batch: 13783/38378 (35.91%) Loss: 1.831505 LR: 0.00004161 [08:39:02] Epoch: 1 Batch: 13784/38378 (35.92%) Loss: 1.890624 LR: 0.00004161 [08:39:03] Epoch: 1 Batch: 13785/38378 (35.92%) Loss: 2.185087 LR: 0.00004160 [08:39:05] Epoch: 1 Batch: 13786/38378 (35.92%) Loss: 1.935064 LR: 0.00004160 [08:39:07] Epoch: 1 Batch: 13787/38378 (35.92%) Loss: 2.324308 LR: 0.00004160 [08:39:08] Epoch: 1 Batch: 13788/38378 (35.93%) Loss: 1.997095 LR: 0.00004160 
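A note on the checkpoint lines in this stretch: temp checkpoints arrive every 100 batches (save_temp_frequency: 100), each one deleting the snapshot from 1000 batches earlier, and the constant 0.1702 GB size fits an adapter-plus-optimizer snapshot rather than a full model save. Below is a back-of-the-envelope check, assuming fp32 LoRA weights plus the two PagedAdamW moment buffers per parameter, with hidden size 4096 and the 1024-dim k/v projections implied by the model config above (8 KV heads x head_dim 128).

layers, r, h, kv = 32, 16, 4096, 1024
per_layer = (
    2 * (r * h + h * r)     # q_proj and o_proj: lora_A is (r x 4096), lora_B is (4096 x r)
    + 2 * (r * h + kv * r)  # k_proj and v_proj: lora_B maps into the 1024-dim KV space
)
params = layers * per_layer                              # 13,631,488 trainable parameters
size_gb = params * 4 * 3 / 1024**3                       # weights + exp_avg + exp_avg_sq, fp32
print(f"{params / 1e6:.1f}M params, ~{size_gb:.3f} GB")  # 13.6M params, ~0.152 GB

That accounts for roughly 0.152 GB of the 0.1702 GB; the remainder plausibly holds the scheduler, RNG, and shuffle-index state that the resume earlier in this log restored, though the checkpoint contents are not itemized here.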
[08:39:10] Epoch: 1 Batch: 13789/38378 (35.93%) Loss: 1.799075 LR: 0.00004160 [08:39:12] Epoch: 1 Batch: 13790/38378 (35.93%) Loss: 1.813854 LR: 0.00004160 [08:39:14] Epoch: 1 Batch: 13791/38378 (35.93%) Loss: 1.892067 LR: 0.00004160 [08:39:15] Epoch: 1 Batch: 13792/38378 (35.94%) Loss: 2.077450 LR: 0.00004159 [08:39:17] Epoch: 1 Batch: 13793/38378 (35.94%) Loss: 1.826919 LR: 0.00004159 [08:39:18] Epoch: 1 Batch: 13794/38378 (35.94%) Loss: 1.974730 LR: 0.00004159 [08:39:20] Epoch: 1 Batch: 13795/38378 (35.95%) Loss: 1.788503 LR: 0.00004159 [08:39:22] Epoch: 1 Batch: 13796/38378 (35.95%) Loss: 1.815147 LR: 0.00004159 [08:39:24] Epoch: 1 Batch: 13797/38378 (35.95%) Loss: 2.215457 LR: 0.00004159 [08:39:25] Epoch: 1 Batch: 13798/38378 (35.95%) Loss: 2.139468 LR: 0.00004159 [08:39:27] Epoch: 1 Batch: 13799/38378 (35.96%) Loss: 1.978037 LR: 0.00004158 [08:39:33] >> Cleaned up old temp checkpoint: epoch1_step12800 [08:39:33] >> Temp checkpoint saved: epoch1_step13800, size: 0.1702 GB [08:39:33] Epoch: 1 Batch: 13800/38378 (35.96%) Loss: 1.978843 LR: 0.00004158 [08:39:35] Epoch: 1 Batch: 13801/38378 (35.96%) Loss: 1.918072 LR: 0.00004158 [08:39:36] Epoch: 1 Batch: 13802/38378 (35.96%) Loss: 1.675171 LR: 0.00004158 [08:39:38] Epoch: 1 Batch: 13803/38378 (35.97%) Loss: 2.115195 LR: 0.00004158 [08:39:40] Epoch: 1 Batch: 13804/38378 (35.97%) Loss: 1.812055 LR: 0.00004158 [08:39:41] Epoch: 1 Batch: 13805/38378 (35.97%) Loss: 1.836761 LR: 0.00004158 [08:39:43] Epoch: 1 Batch: 13806/38378 (35.97%) Loss: 2.035710 LR: 0.00004157 [08:39:45] Epoch: 1 Batch: 13807/38378 (35.98%) Loss: 2.168891 LR: 0.00004157 [08:39:46] Epoch: 1 Batch: 13808/38378 (35.98%) Loss: 1.832433 LR: 0.00004157 [08:39:48] Epoch: 1 Batch: 13809/38378 (35.98%) Loss: 1.843000 LR: 0.00004157 [08:39:50] Epoch: 1 Batch: 13810/38378 (35.98%) Loss: 1.960287 LR: 0.00004157 [08:39:51] Epoch: 1 Batch: 13811/38378 (35.99%) Loss: 2.118385 LR: 0.00004157 [08:39:53] Epoch: 1 Batch: 13812/38378 (35.99%) Loss: 2.512629 LR: 0.00004157 [08:39:55] Epoch: 1 Batch: 13813/38378 (35.99%) Loss: 1.738561 LR: 0.00004156 [08:39:57] Epoch: 1 Batch: 13814/38378 (35.99%) Loss: 1.727477 LR: 0.00004156 [08:39:58] Epoch: 1 Batch: 13815/38378 (36.00%) Loss: 2.470381 LR: 0.00004156 [08:40:00] Epoch: 1 Batch: 13816/38378 (36.00%) Loss: 1.887996 LR: 0.00004156 [08:40:02] Epoch: 1 Batch: 13817/38378 (36.00%) Loss: 2.014304 LR: 0.00004156 [08:40:03] Epoch: 1 Batch: 13818/38378 (36.01%) Loss: 1.942704 LR: 0.00004156 [08:40:05] Epoch: 1 Batch: 13819/38378 (36.01%) Loss: 2.018450 LR: 0.00004156 [08:40:07] Epoch: 1 Batch: 13820/38378 (36.01%) Loss: 2.130416 LR: 0.00004155 [08:40:09] Epoch: 1 Batch: 13821/38378 (36.01%) Loss: 1.818989 LR: 0.00004155 [08:40:10] Epoch: 1 Batch: 13822/38378 (36.02%) Loss: 2.233877 LR: 0.00004155 [08:40:12] Epoch: 1 Batch: 13823/38378 (36.02%) Loss: 2.016998 LR: 0.00004155 [08:40:14] Epoch: 1 Batch: 13824/38378 (36.02%) Loss: 2.067579 LR: 0.00004155 [08:40:15] Epoch: 1 Batch: 13825/38378 (36.02%) Loss: 2.026298 LR: 0.00004155 [08:40:17] Epoch: 1 Batch: 13826/38378 (36.03%) Loss: 2.049688 LR: 0.00004155 [08:40:19] Epoch: 1 Batch: 13827/38378 (36.03%) Loss: 1.894372 LR: 0.00004154 [08:40:20] Epoch: 1 Batch: 13828/38378 (36.03%) Loss: 1.891654 LR: 0.00004154 [08:40:22] Epoch: 1 Batch: 13829/38378 (36.03%) Loss: 2.063424 LR: 0.00004154 [08:40:24] Epoch: 1 Batch: 13830/38378 (36.04%) Loss: 1.980404 LR: 0.00004154 [08:40:26] Epoch: 1 Batch: 13831/38378 (36.04%) Loss: 2.076951 LR: 0.00004154 [08:40:27] Epoch: 1 Batch: 13832/38378 (36.04%) Loss: 2.311129 LR: 
0.00004154 [08:40:29] Epoch: 1 Batch: 13833/38378 (36.04%) Loss: 2.031863 LR: 0.00004154 [08:40:31] Epoch: 1 Batch: 13834/38378 (36.05%) Loss: 1.834712 LR: 0.00004153 [08:40:32] Epoch: 1 Batch: 13835/38378 (36.05%) Loss: 1.992630 LR: 0.00004153 [08:40:34] Epoch: 1 Batch: 13836/38378 (36.05%) Loss: 1.932418 LR: 0.00004153 [08:40:36] Epoch: 1 Batch: 13837/38378 (36.05%) Loss: 2.146340 LR: 0.00004153 [08:40:38] Epoch: 1 Batch: 13838/38378 (36.06%) Loss: 1.784794 LR: 0.00004153 [08:40:39] Epoch: 1 Batch: 13839/38378 (36.06%) Loss: 1.825527 LR: 0.00004153 [08:40:41] Epoch: 1 Batch: 13840/38378 (36.06%) Loss: 1.870175 LR: 0.00004153 [08:40:43] Epoch: 1 Batch: 13841/38378 (36.06%) Loss: 2.479045 LR: 0.00004152 [08:40:44] Epoch: 1 Batch: 13842/38378 (36.07%) Loss: 1.797153 LR: 0.00004152 [08:40:46] Epoch: 1 Batch: 13843/38378 (36.07%) Loss: 1.937874 LR: 0.00004152 [08:40:48] Epoch: 1 Batch: 13844/38378 (36.07%) Loss: 2.049892 LR: 0.00004152 [08:40:49] Epoch: 1 Batch: 13845/38378 (36.08%) Loss: 2.018327 LR: 0.00004152 [08:40:51] Epoch: 1 Batch: 13846/38378 (36.08%) Loss: 1.833484 LR: 0.00004152 [08:40:53] Epoch: 1 Batch: 13847/38378 (36.08%) Loss: 2.143467 LR: 0.00004152 [08:40:54] Epoch: 1 Batch: 13848/38378 (36.08%) Loss: 2.334649 LR: 0.00004151 [08:40:56] Epoch: 1 Batch: 13849/38378 (36.09%) Loss: 2.036599 LR: 0.00004151 [08:40:58] Epoch: 1 Batch: 13850/38378 (36.09%) Loss: 2.207376 LR: 0.00004151 [08:40:59] Epoch: 1 Batch: 13851/38378 (36.09%) Loss: 1.886039 LR: 0.00004151 [08:41:01] Epoch: 1 Batch: 13852/38378 (36.09%) Loss: 2.309385 LR: 0.00004151 [08:41:03] Epoch: 1 Batch: 13853/38378 (36.10%) Loss: 1.980243 LR: 0.00004151 [08:41:05] Epoch: 1 Batch: 13854/38378 (36.10%) Loss: 1.659301 LR: 0.00004151 [08:41:06] Epoch: 1 Batch: 13855/38378 (36.10%) Loss: 2.025742 LR: 0.00004150 [08:41:08] Epoch: 1 Batch: 13856/38378 (36.10%) Loss: 1.872552 LR: 0.00004150 [08:41:10] Epoch: 1 Batch: 13857/38378 (36.11%) Loss: 1.681069 LR: 0.00004150 [08:41:11] Epoch: 1 Batch: 13858/38378 (36.11%) Loss: 1.833880 LR: 0.00004150 [08:41:13] Epoch: 1 Batch: 13859/38378 (36.11%) Loss: 2.232260 LR: 0.00004150 [08:41:15] Epoch: 1 Batch: 13860/38378 (36.11%) Loss: 2.114335 LR: 0.00004150 [08:41:16] Epoch: 1 Batch: 13861/38378 (36.12%) Loss: 2.374004 LR: 0.00004150 [08:41:18] Epoch: 1 Batch: 13862/38378 (36.12%) Loss: 1.813974 LR: 0.00004149 [08:41:20] Epoch: 1 Batch: 13863/38378 (36.12%) Loss: 1.885089 LR: 0.00004149 [08:41:21] Epoch: 1 Batch: 13864/38378 (36.12%) Loss: 1.836486 LR: 0.00004149 [08:41:23] Epoch: 1 Batch: 13865/38378 (36.13%) Loss: 1.937238 LR: 0.00004149 [08:41:25] Epoch: 1 Batch: 13866/38378 (36.13%) Loss: 2.274987 LR: 0.00004149 [08:41:27] Epoch: 1 Batch: 13867/38378 (36.13%) Loss: 1.965492 LR: 0.00004149 [08:41:28] Epoch: 1 Batch: 13868/38378 (36.14%) Loss: 2.181498 LR: 0.00004149 [08:41:30] Epoch: 1 Batch: 13869/38378 (36.14%) Loss: 2.078052 LR: 0.00004148 [08:41:32] Epoch: 1 Batch: 13870/38378 (36.14%) Loss: 1.974359 LR: 0.00004148 [08:41:33] Epoch: 1 Batch: 13871/38378 (36.14%) Loss: 1.636475 LR: 0.00004148 [08:41:35] Epoch: 1 Batch: 13872/38378 (36.15%) Loss: 2.046289 LR: 0.00004148 [08:41:37] Epoch: 1 Batch: 13873/38378 (36.15%) Loss: 1.977910 LR: 0.00004148 [08:41:38] Epoch: 1 Batch: 13874/38378 (36.15%) Loss: 2.208661 LR: 0.00004148 [08:41:40] Epoch: 1 Batch: 13875/38378 (36.15%) Loss: 2.082152 LR: 0.00004148 [08:41:42] Epoch: 1 Batch: 13876/38378 (36.16%) Loss: 1.777253 LR: 0.00004147 [08:41:44] Epoch: 1 Batch: 13877/38378 (36.16%) Loss: 1.980883 LR: 0.00004147 [08:41:45] Epoch: 1 Batch: 
13878/38378 (36.16%) Loss: 2.038668 LR: 0.00004147 [08:41:47] Epoch: 1 Batch: 13879/38378 (36.16%) Loss: 1.757108 LR: 0.00004147 [08:41:49] Epoch: 1 Batch: 13880/38378 (36.17%) Loss: 1.909010 LR: 0.00004147 [08:41:50] Epoch: 1 Batch: 13881/38378 (36.17%) Loss: 1.903848 LR: 0.00004147 [08:41:52] Epoch: 1 Batch: 13882/38378 (36.17%) Loss: 1.893456 LR: 0.00004147 [08:41:54] Epoch: 1 Batch: 13883/38378 (36.17%) Loss: 2.158723 LR: 0.00004146 [08:41:55] Epoch: 1 Batch: 13884/38378 (36.18%) Loss: 2.280444 LR: 0.00004146 [08:41:57] Epoch: 1 Batch: 13885/38378 (36.18%) Loss: 2.009437 LR: 0.00004146 [08:41:59] Epoch: 1 Batch: 13886/38378 (36.18%) Loss: 1.847089 LR: 0.00004146 [08:42:01] Epoch: 1 Batch: 13887/38378 (36.18%) Loss: 1.585722 LR: 0.00004146 [08:42:02] Epoch: 1 Batch: 13888/38378 (36.19%) Loss: 1.822095 LR: 0.00004146 [08:42:04] Epoch: 1 Batch: 13889/38378 (36.19%) Loss: 2.086462 LR: 0.00004146 [08:42:06] Epoch: 1 Batch: 13890/38378 (36.19%) Loss: 2.109425 LR: 0.00004145 [08:42:07] Epoch: 1 Batch: 13891/38378 (36.20%) Loss: 1.957005 LR: 0.00004145 [08:42:09] Epoch: 1 Batch: 13892/38378 (36.20%) Loss: 1.871677 LR: 0.00004145 [08:42:11] Epoch: 1 Batch: 13893/38378 (36.20%) Loss: 1.855394 LR: 0.00004145 [08:42:13] Epoch: 1 Batch: 13894/38378 (36.20%) Loss: 2.107599 LR: 0.00004145 [08:42:14] Epoch: 1 Batch: 13895/38378 (36.21%) Loss: 1.858996 LR: 0.00004145 [08:42:16] Epoch: 1 Batch: 13896/38378 (36.21%) Loss: 1.996703 LR: 0.00004145 [08:42:18] Epoch: 1 Batch: 13897/38378 (36.21%) Loss: 2.075826 LR: 0.00004144 [08:42:19] Epoch: 1 Batch: 13898/38378 (36.21%) Loss: 2.117608 LR: 0.00004144 [08:42:21] Epoch: 1 Batch: 13899/38378 (36.22%) Loss: 2.187707 LR: 0.00004144 [08:42:27] >> Cleaned up old temp checkpoint: epoch1_step12900 [08:42:27] >> Temp checkpoint saved: epoch1_step13900, size: 0.1702 GB [08:42:27] Epoch: 1 Batch: 13900/38378 (36.22%) Loss: 2.142956 LR: 0.00004144 [08:42:28] Epoch: 1 Batch: 13901/38378 (36.22%) Loss: 2.042947 LR: 0.00004144 [08:42:30] Epoch: 1 Batch: 13902/38378 (36.22%) Loss: 2.148794 LR: 0.00004144 [08:42:32] Epoch: 1 Batch: 13903/38378 (36.23%) Loss: 1.617146 LR: 0.00004144 [08:42:34] Epoch: 1 Batch: 13904/38378 (36.23%) Loss: 2.010738 LR: 0.00004143 [08:42:35] Epoch: 1 Batch: 13905/38378 (36.23%) Loss: 2.120261 LR: 0.00004143 [08:42:37] Epoch: 1 Batch: 13906/38378 (36.23%) Loss: 1.772989 LR: 0.00004143 [08:42:39] Epoch: 1 Batch: 13907/38378 (36.24%) Loss: 2.079438 LR: 0.00004143 [08:42:40] Epoch: 1 Batch: 13908/38378 (36.24%) Loss: 2.036321 LR: 0.00004143 [08:42:42] Epoch: 1 Batch: 13909/38378 (36.24%) Loss: 2.182023 LR: 0.00004143 [08:42:44] Epoch: 1 Batch: 13910/38378 (36.24%) Loss: 1.845180 LR: 0.00004143 [08:42:45] Epoch: 1 Batch: 13911/38378 (36.25%) Loss: 2.020775 LR: 0.00004142 [08:42:47] Epoch: 1 Batch: 13912/38378 (36.25%) Loss: 2.018779 LR: 0.00004142 [08:42:49] Epoch: 1 Batch: 13913/38378 (36.25%) Loss: 2.085281 LR: 0.00004142 [08:42:51] Epoch: 1 Batch: 13914/38378 (36.26%) Loss: 1.842618 LR: 0.00004142 [08:42:52] Epoch: 1 Batch: 13915/38378 (36.26%) Loss: 1.664447 LR: 0.00004142 [08:42:54] Epoch: 1 Batch: 13916/38378 (36.26%) Loss: 1.914293 LR: 0.00004142 [08:42:56] Epoch: 1 Batch: 13917/38378 (36.26%) Loss: 2.309882 LR: 0.00004142 [08:42:57] Epoch: 1 Batch: 13918/38378 (36.27%) Loss: 1.628904 LR: 0.00004141 [08:42:59] Epoch: 1 Batch: 13919/38378 (36.27%) Loss: 2.250093 LR: 0.00004141 [08:43:01] Epoch: 1 Batch: 13920/38378 (36.27%) Loss: 2.259873 LR: 0.00004141 [08:43:02] Epoch: 1 Batch: 13921/38378 (36.27%) Loss: 1.866320 LR: 0.00004141 [08:43:04] 
Epoch: 1 Batch: 13922/38378 (36.28%) Loss: 1.793548 LR: 0.00004141 [08:43:06] Epoch: 1 Batch: 13923/38378 (36.28%) Loss: 1.852553 LR: 0.00004141 [08:43:08] Epoch: 1 Batch: 13924/38378 (36.28%) Loss: 2.216736 LR: 0.00004141 [08:43:09] Epoch: 1 Batch: 13925/38378 (36.28%) Loss: 1.784413 LR: 0.00004140 [08:43:11] Epoch: 1 Batch: 13926/38378 (36.29%) Loss: 1.873587 LR: 0.00004140 [08:43:13] Epoch: 1 Batch: 13927/38378 (36.29%) Loss: 1.788910 LR: 0.00004140 [08:43:14] Epoch: 1 Batch: 13928/38378 (36.29%) Loss: 2.033169 LR: 0.00004140 [08:43:16] Epoch: 1 Batch: 13929/38378 (36.29%) Loss: 1.560549 LR: 0.00004140 [08:43:18] Epoch: 1 Batch: 13930/38378 (36.30%) Loss: 2.159906 LR: 0.00004140 [08:43:19] Epoch: 1 Batch: 13931/38378 (36.30%) Loss: 1.791573 LR: 0.00004140 [08:43:21] Epoch: 1 Batch: 13932/38378 (36.30%) Loss: 2.092332 LR: 0.00004139 [08:43:23] Epoch: 1 Batch: 13933/38378 (36.30%) Loss: 1.811101 LR: 0.00004139 [08:43:25] Epoch: 1 Batch: 13934/38378 (36.31%) Loss: 1.992879 LR: 0.00004139 [08:43:26] Epoch: 1 Batch: 13935/38378 (36.31%) Loss: 1.982372 LR: 0.00004139 [08:43:28] Epoch: 1 Batch: 13936/38378 (36.31%) Loss: 1.920649 LR: 0.00004139 [08:43:30] Epoch: 1 Batch: 13937/38378 (36.32%) Loss: 2.149621 LR: 0.00004139 [08:43:31] Epoch: 1 Batch: 13938/38378 (36.32%) Loss: 2.017545 LR: 0.00004139 [08:43:33] Epoch: 1 Batch: 13939/38378 (36.32%) Loss: 1.992958 LR: 0.00004138 [08:43:35] Epoch: 1 Batch: 13940/38378 (36.32%) Loss: 2.024426 LR: 0.00004138 [08:43:36] Epoch: 1 Batch: 13941/38378 (36.33%) Loss: 2.035283 LR: 0.00004138 [08:43:38] Epoch: 1 Batch: 13942/38378 (36.33%) Loss: 2.035093 LR: 0.00004138 [08:43:40] Epoch: 1 Batch: 13943/38378 (36.33%) Loss: 2.148914 LR: 0.00004138 [08:43:41] Epoch: 1 Batch: 13944/38378 (36.33%) Loss: 1.895345 LR: 0.00004138 [08:43:43] Epoch: 1 Batch: 13945/38378 (36.34%) Loss: 1.921098 LR: 0.00004138 [08:43:45] Epoch: 1 Batch: 13946/38378 (36.34%) Loss: 2.255328 LR: 0.00004137 [08:43:46] Epoch: 1 Batch: 13947/38378 (36.34%) Loss: 1.828095 LR: 0.00004137 [08:43:48] Epoch: 1 Batch: 13948/38378 (36.34%) Loss: 2.109301 LR: 0.00004137 [08:43:50] Epoch: 1 Batch: 13949/38378 (36.35%) Loss: 2.008461 LR: 0.00004137 [08:43:51] Epoch: 1 Batch: 13950/38378 (36.35%) Loss: 2.139054 LR: 0.00004137 [08:43:53] Epoch: 1 Batch: 13951/38378 (36.35%) Loss: 1.915123 LR: 0.00004137 [08:43:55] Epoch: 1 Batch: 13952/38378 (36.35%) Loss: 2.125697 LR: 0.00004137 [08:43:56] Epoch: 1 Batch: 13953/38378 (36.36%) Loss: 1.759745 LR: 0.00004136 [08:43:58] Epoch: 1 Batch: 13954/38378 (36.36%) Loss: 1.844068 LR: 0.00004136 [08:44:00] Epoch: 1 Batch: 13955/38378 (36.36%) Loss: 2.006053 LR: 0.00004136 [08:44:01] Epoch: 1 Batch: 13956/38378 (36.36%) Loss: 2.129863 LR: 0.00004136 [08:44:03] Epoch: 1 Batch: 13957/38378 (36.37%) Loss: 1.911117 LR: 0.00004136 [08:44:05] Epoch: 1 Batch: 13958/38378 (36.37%) Loss: 1.885653 LR: 0.00004136 [08:44:06] Epoch: 1 Batch: 13959/38378 (36.37%) Loss: 2.040489 LR: 0.00004136 [08:44:08] Epoch: 1 Batch: 13960/38378 (36.38%) Loss: 1.789435 LR: 0.00004135 [08:44:10] Epoch: 1 Batch: 13961/38378 (36.38%) Loss: 1.753379 LR: 0.00004135 [08:44:11] Epoch: 1 Batch: 13962/38378 (36.38%) Loss: 2.027254 LR: 0.00004135 [08:44:13] Epoch: 1 Batch: 13963/38378 (36.38%) Loss: 2.267807 LR: 0.00004135 [08:44:15] Epoch: 1 Batch: 13964/38378 (36.39%) Loss: 1.778797 LR: 0.00004135 [08:44:16] Epoch: 1 Batch: 13965/38378 (36.39%) Loss: 1.807442 LR: 0.00004135 [08:44:18] Epoch: 1 Batch: 13966/38378 (36.39%) Loss: 2.021754 LR: 0.00004135 [08:44:20] Epoch: 1 Batch: 13967/38378 (36.39%) Loss: 
1.973774 LR: 0.00004134 [08:44:21] Epoch: 1 Batch: 13968/38378 (36.40%) Loss: 2.012544 LR: 0.00004134 [08:44:23] Epoch: 1 Batch: 13969/38378 (36.40%) Loss: 1.877351 LR: 0.00004134 [08:44:25] Epoch: 1 Batch: 13970/38378 (36.40%) Loss: 1.984575 LR: 0.00004134 [08:44:27] Epoch: 1 Batch: 13971/38378 (36.40%) Loss: 2.251707 LR: 0.00004134 [08:44:28] Epoch: 1 Batch: 13972/38378 (36.41%) Loss: 1.700197 LR: 0.00004134 [08:44:30] Epoch: 1 Batch: 13973/38378 (36.41%) Loss: 2.050624 LR: 0.00004134 [08:44:32] Epoch: 1 Batch: 13974/38378 (36.41%) Loss: 2.179143 LR: 0.00004133 [08:44:33] Epoch: 1 Batch: 13975/38378 (36.41%) Loss: 1.975712 LR: 0.00004133 [08:44:35] Epoch: 1 Batch: 13976/38378 (36.42%) Loss: 1.867908 LR: 0.00004133 [08:44:37] Epoch: 1 Batch: 13977/38378 (36.42%) Loss: 1.876433 LR: 0.00004133 [08:44:38] Epoch: 1 Batch: 13978/38378 (36.42%) Loss: 2.166717 LR: 0.00004133 [08:44:40] Epoch: 1 Batch: 13979/38378 (36.42%) Loss: 2.080553 LR: 0.00004133 [08:44:42] Epoch: 1 Batch: 13980/38378 (36.43%) Loss: 2.205259 LR: 0.00004133 [08:44:44] Epoch: 1 Batch: 13981/38378 (36.43%) Loss: 1.847815 LR: 0.00004132 [08:44:45] Epoch: 1 Batch: 13982/38378 (36.43%) Loss: 2.025673 LR: 0.00004132 [08:44:47] Epoch: 1 Batch: 13983/38378 (36.43%) Loss: 1.773286 LR: 0.00004132 [08:44:49] Epoch: 1 Batch: 13984/38378 (36.44%) Loss: 2.139699 LR: 0.00004132 [08:44:50] Epoch: 1 Batch: 13985/38378 (36.44%) Loss: 2.035556 LR: 0.00004132 [08:44:52] Epoch: 1 Batch: 13986/38378 (36.44%) Loss: 1.948153 LR: 0.00004132 [08:44:54] Epoch: 1 Batch: 13987/38378 (36.45%) Loss: 1.978310 LR: 0.00004132 [08:44:55] Epoch: 1 Batch: 13988/38378 (36.45%) Loss: 2.163462 LR: 0.00004131 [08:44:57] Epoch: 1 Batch: 13989/38378 (36.45%) Loss: 2.147322 LR: 0.00004131 [08:44:59] Epoch: 1 Batch: 13990/38378 (36.45%) Loss: 2.201340 LR: 0.00004131 [08:45:00] Epoch: 1 Batch: 13991/38378 (36.46%) Loss: 1.866783 LR: 0.00004131 [08:45:02] Epoch: 1 Batch: 13992/38378 (36.46%) Loss: 1.834480 LR: 0.00004131 [08:45:04] Epoch: 1 Batch: 13993/38378 (36.46%) Loss: 1.971116 LR: 0.00004131 [08:45:05] Epoch: 1 Batch: 13994/38378 (36.46%) Loss: 2.194602 LR: 0.00004131 [08:45:07] Epoch: 1 Batch: 13995/38378 (36.47%) Loss: 2.134833 LR: 0.00004130 [08:45:09] Epoch: 1 Batch: 13996/38378 (36.47%) Loss: 2.257122 LR: 0.00004130 [08:45:11] Epoch: 1 Batch: 13997/38378 (36.47%) Loss: 1.960083 LR: 0.00004130 [08:45:12] Epoch: 1 Batch: 13998/38378 (36.47%) Loss: 2.090468 LR: 0.00004130 [08:45:14] Epoch: 1 Batch: 13999/38378 (36.48%) Loss: 1.959733 LR: 0.00004130 [08:45:16] >> Evaluating batch 0 [08:45:17] >> Evaluating batch 1 [08:45:17] >> Evaluating batch 2 [08:45:18] >> Evaluating batch 3 [08:45:19] >> Evaluating batch 4 [08:45:20] >> Evaluating batch 5 [08:45:21] >> Evaluating batch 6 [08:45:22] >> Evaluating batch 7 [08:45:23] >> Evaluating batch 8 [08:45:24] >> Evaluating batch 9 [08:45:25] >> Evaluating batch 10 [08:45:26] >> Evaluating batch 11 [08:45:27] >> Evaluating batch 12 [08:45:28] >> Evaluating batch 13 [08:45:29] >> Evaluating batch 14 [08:45:30] >> Evaluating batch 15 [08:45:31] >> Evaluating batch 16 [08:45:31] Epoch: 1 Step: 14000/38378 Evaluation: [08:45:31] Avg Loss Since Last Eval: 1.9855 Val Loss: 2.1029 Validation loss delta: 0.0037 Perplexity: 8.1895 LR: 0.00004130 [08:45:35] >> Cleaned up old temp checkpoint: epoch1_step13000 [08:45:35] >> Temp checkpoint saved: epoch1_step14000, size: 0.1702 GB [08:45:39] >> Checkpoint saved: epoch1_step14000, size: 0.1702 GB [08:45:39] Epoch: 1 Batch: 14000/38378 (36.48%) Loss: 2.229358 LR: 0.00004130 [08:45:41]
Epoch: 1 Batch: 14001/38378 (36.48%) Loss: 1.940333 LR: 0.00004130 [08:45:42] Epoch: 1 Batch: 14002/38378 (36.48%) Loss: 2.042239 LR: 0.00004129 [08:45:44] Epoch: 1 Batch: 14003/38378 (36.49%) Loss: 1.629476 LR: 0.00004129 [08:45:46] Epoch: 1 Batch: 14004/38378 (36.49%) Loss: 2.145404 LR: 0.00004129 [08:45:47] Epoch: 1 Batch: 14005/38378 (36.49%) Loss: 1.817740 LR: 0.00004129 [08:45:49] Epoch: 1 Batch: 14006/38378 (36.49%) Loss: 2.435265 LR: 0.00004129 [08:45:51] Epoch: 1 Batch: 14007/38378 (36.50%) Loss: 2.070725 LR: 0.00004129 [08:45:53] Epoch: 1 Batch: 14008/38378 (36.50%) Loss: 2.043949 LR: 0.00004129 [08:45:54] Epoch: 1 Batch: 14009/38378 (36.50%) Loss: 2.261795 LR: 0.00004128 [08:45:56] Epoch: 1 Batch: 14010/38378 (36.51%) Loss: 1.968814 LR: 0.00004128 [08:45:58] Epoch: 1 Batch: 14011/38378 (36.51%) Loss: 1.885242 LR: 0.00004128 [08:45:59] Epoch: 1 Batch: 14012/38378 (36.51%) Loss: 1.799105 LR: 0.00004128 [08:46:01] Epoch: 1 Batch: 14013/38378 (36.51%) Loss: 1.991265 LR: 0.00004128 [08:46:03] Epoch: 1 Batch: 14014/38378 (36.52%) Loss: 2.050004 LR: 0.00004128 [08:46:05] Epoch: 1 Batch: 14015/38378 (36.52%) Loss: 2.154674 LR: 0.00004128 [08:46:06] Epoch: 1 Batch: 14016/38378 (36.52%) Loss: 2.139949 LR: 0.00004127 [08:46:08] Epoch: 1 Batch: 14017/38378 (36.52%) Loss: 2.060616 LR: 0.00004127 [08:46:10] Epoch: 1 Batch: 14018/38378 (36.53%) Loss: 2.384308 LR: 0.00004127 [08:46:11] Epoch: 1 Batch: 14019/38378 (36.53%) Loss: 2.158039 LR: 0.00004127 [08:46:13] Epoch: 1 Batch: 14020/38378 (36.53%) Loss: 2.011289 LR: 0.00004127 [08:46:15] Epoch: 1 Batch: 14021/38378 (36.53%) Loss: 1.811864 LR: 0.00004127 [08:46:17] Epoch: 1 Batch: 14022/38378 (36.54%) Loss: 2.035575 LR: 0.00004127 [08:46:18] Epoch: 1 Batch: 14023/38378 (36.54%) Loss: 2.067055 LR: 0.00004126 [08:46:20] Epoch: 1 Batch: 14024/38378 (36.54%) Loss: 2.200787 LR: 0.00004126 [08:46:22] Epoch: 1 Batch: 14025/38378 (36.54%) Loss: 1.875517 LR: 0.00004126 [08:46:23] Epoch: 1 Batch: 14026/38378 (36.55%) Loss: 1.970965 LR: 0.00004126 [08:46:25] Epoch: 1 Batch: 14027/38378 (36.55%) Loss: 1.884115 LR: 0.00004126 [08:46:27] Epoch: 1 Batch: 14028/38378 (36.55%) Loss: 1.931827 LR: 0.00004126 [08:46:29] Epoch: 1 Batch: 14029/38378 (36.55%) Loss: 2.132640 LR: 0.00004126 [08:46:30] Epoch: 1 Batch: 14030/38378 (36.56%) Loss: 2.069100 LR: 0.00004125 [08:46:32] Epoch: 1 Batch: 14031/38378 (36.56%) Loss: 1.783625 LR: 0.00004125 [08:46:34] Epoch: 1 Batch: 14032/38378 (36.56%) Loss: 2.023760 LR: 0.00004125 [08:46:35] Epoch: 1 Batch: 14033/38378 (36.57%) Loss: 2.183824 LR: 0.00004125 [08:46:37] Epoch: 1 Batch: 14034/38378 (36.57%) Loss: 1.937116 LR: 0.00004125 [08:46:39] Epoch: 1 Batch: 14035/38378 (36.57%) Loss: 2.135855 LR: 0.00004125 [08:46:40] Epoch: 1 Batch: 14036/38378 (36.57%) Loss: 2.203950 LR: 0.00004125 [08:46:42] Epoch: 1 Batch: 14037/38378 (36.58%) Loss: 1.982483 LR: 0.00004123 [08:46:44] Epoch: 1 Batch: 14038/38378 (36.58%) Loss: 1.986598 LR: 0.00004123 [08:46:45] Epoch: 1 Batch: 14039/38378 (36.58%) Loss: 2.162602 LR: 0.00004123 [08:46:47] Epoch: 1 Batch: 14040/38378 (36.58%) Loss: 1.946285 LR: 0.00004123 [08:46:49] Epoch: 1 Batch: 14041/38378 (36.59%) Loss: 1.683897 LR: 0.00004123 [08:46:51] Epoch: 1 Batch: 14042/38378 (36.59%) Loss: 1.897848 LR: 0.00004123 [08:46:52] Epoch: 1 Batch: 14043/38378 (36.59%) Loss: 2.019852 LR: 0.00004123 [08:46:54] Epoch: 1 Batch: 14044/38378 (36.59%) Loss: 1.698071 LR: 0.00004122 [08:46:56] Epoch: 1 Batch: 14045/38378 (36.60%) Loss: 2.124586 LR: 0.00004122 [08:46:57] Epoch: 1 Batch: 14046/38378 (36.60%) Loss: 
1.992868 LR: 0.00004122 [08:46:59] Epoch: 1 Batch: 14047/38378 (36.60%) Loss: 2.075391 LR: 0.00004122 [08:47:01] Epoch: 1 Batch: 14048/38378 (36.60%) Loss: 2.111393 LR: 0.00004122 [08:47:02] Epoch: 1 Batch: 14049/38378 (36.61%) Loss: 2.073183 LR: 0.00004122 [08:47:04] Epoch: 1 Batch: 14050/38378 (36.61%) Loss: 2.023930 LR: 0.00004122 [08:47:06] Epoch: 1 Batch: 14051/38378 (36.61%) Loss: 2.232947 LR: 0.00004121 [08:47:07] Epoch: 1 Batch: 14052/38378 (36.61%) Loss: 1.826922 LR: 0.00004121 [08:47:09] Epoch: 1 Batch: 14053/38378 (36.62%) Loss: 2.097935 LR: 0.00004121 [08:47:11] Epoch: 1 Batch: 14054/38378 (36.62%) Loss: 2.265113 LR: 0.00004121 [08:47:12] Epoch: 1 Batch: 14055/38378 (36.62%) Loss: 2.079818 LR: 0.00004121 [08:47:14] Epoch: 1 Batch: 14056/38378 (36.63%) Loss: 1.798482 LR: 0.00004121 [08:47:16] Epoch: 1 Batch: 14057/38378 (36.63%) Loss: 1.864256 LR: 0.00004121 [08:47:17] Epoch: 1 Batch: 14058/38378 (36.63%) Loss: 2.248140 LR: 0.00004120 [08:47:19] Epoch: 1 Batch: 14059/38378 (36.63%) Loss: 2.205120 LR: 0.00004120 [08:47:21] Epoch: 1 Batch: 14060/38378 (36.64%) Loss: 2.031483 LR: 0.00004120 [08:47:22] Epoch: 1 Batch: 14061/38378 (36.64%) Loss: 2.160546 LR: 0.00004120 [08:47:24] Epoch: 1 Batch: 14062/38378 (36.64%) Loss: 1.847163 LR: 0.00004120 [08:47:26] Epoch: 1 Batch: 14063/38378 (36.64%) Loss: 1.974741 LR: 0.00004120 [08:47:27] Epoch: 1 Batch: 14064/38378 (36.65%) Loss: 1.860397 LR: 0.00004120 [08:47:29] Epoch: 1 Batch: 14065/38378 (36.65%) Loss: 1.879417 LR: 0.00004119 [08:47:31] Epoch: 1 Batch: 14066/38378 (36.65%) Loss: 2.074056 LR: 0.00004119 [08:47:33] Epoch: 1 Batch: 14067/38378 (36.65%) Loss: 1.752527 LR: 0.00004119 [08:47:34] Epoch: 1 Batch: 14068/38378 (36.66%) Loss: 2.204764 LR: 0.00004119 [08:47:36] Epoch: 1 Batch: 14069/38378 (36.66%) Loss: 2.158399 LR: 0.00004119 [08:47:38] Epoch: 1 Batch: 14070/38378 (36.66%) Loss: 1.876150 LR: 0.00004119 [08:47:39] Epoch: 1 Batch: 14071/38378 (36.66%) Loss: 1.956169 LR: 0.00004119 [08:47:41] Epoch: 1 Batch: 14072/38378 (36.67%) Loss: 1.811990 LR: 0.00004118 [08:47:43] Epoch: 1 Batch: 14073/38378 (36.67%) Loss: 1.925146 LR: 0.00004118 [08:47:44] Epoch: 1 Batch: 14074/38378 (36.67%) Loss: 2.177038 LR: 0.00004118 [08:47:46] Epoch: 1 Batch: 14075/38378 (36.67%) Loss: 1.780647 LR: 0.00004118 [08:47:48] Epoch: 1 Batch: 14076/38378 (36.68%) Loss: 1.855668 LR: 0.00004118 [08:47:50] Epoch: 1 Batch: 14077/38378 (36.68%) Loss: 1.825013 LR: 0.00004118 [08:47:51] Epoch: 1 Batch: 14078/38378 (36.68%) Loss: 2.112453 LR: 0.00004118 [08:47:53] Epoch: 1 Batch: 14079/38378 (36.69%) Loss: 2.070766 LR: 0.00004117 [08:47:55] Epoch: 1 Batch: 14080/38378 (36.69%) Loss: 1.698166 LR: 0.00004117 [08:47:56] Epoch: 1 Batch: 14081/38378 (36.69%) Loss: 2.057551 LR: 0.00004117 [08:47:58] Epoch: 1 Batch: 14082/38378 (36.69%) Loss: 1.907692 LR: 0.00004117 [08:48:00] Epoch: 1 Batch: 14083/38378 (36.70%) Loss: 1.961269 LR: 0.00004117 [08:48:01] Epoch: 1 Batch: 14084/38378 (36.70%) Loss: 1.744783 LR: 0.00004117 [08:48:03] Epoch: 1 Batch: 14085/38378 (36.70%) Loss: 1.966428 LR: 0.00004117 [08:48:05] Epoch: 1 Batch: 14086/38378 (36.70%) Loss: 1.884864 LR: 0.00004116 [08:48:07] Epoch: 1 Batch: 14087/38378 (36.71%) Loss: 1.863099 LR: 0.00004116 [08:48:08] Epoch: 1 Batch: 14088/38378 (36.71%) Loss: 2.138685 LR: 0.00004116 [08:48:10] Epoch: 1 Batch: 14089/38378 (36.71%) Loss: 2.223822 LR: 0.00004116 [08:48:12] Epoch: 1 Batch: 14090/38378 (36.71%) Loss: 2.036133 LR: 0.00004116 [08:48:13] Epoch: 1 Batch: 14091/38378 (36.72%) Loss: 2.320401 LR: 0.00004116 [08:48:15] Epoch: 1 
Batch: 14092/38378 (36.72%) Loss: 2.050054 LR: 0.00004116 [08:48:17] Epoch: 1 Batch: 14093/38378 (36.72%) Loss: 2.100774 LR: 0.00004115 [08:48:18] Epoch: 1 Batch: 14094/38378 (36.72%) Loss: 1.930816 LR: 0.00004115 [08:48:20] Epoch: 1 Batch: 14095/38378 (36.73%) Loss: 2.144516 LR: 0.00004115 [08:48:22] Epoch: 1 Batch: 14096/38378 (36.73%) Loss: 2.007445 LR: 0.00004115 [08:48:24] Epoch: 1 Batch: 14097/38378 (36.73%) Loss: 1.961317 LR: 0.00004115 [08:48:25] Epoch: 1 Batch: 14098/38378 (36.73%) Loss: 1.844846 LR: 0.00004115 [08:48:27] Epoch: 1 Batch: 14099/38378 (36.74%) Loss: 2.048872 LR: 0.00004115 [08:48:33] >> Cleaned up old temp checkpoint: epoch1_step13100 [08:48:33] >> Temp checkpoint saved: epoch1_step14100, size: 0.1702 GB [08:48:33] Epoch: 1 Batch: 14100/38378 (36.74%) Loss: 2.085460 LR: 0.00004114 [08:48:35] Epoch: 1 Batch: 14101/38378 (36.74%) Loss: 2.166768 LR: 0.00004114 [08:48:36] Epoch: 1 Batch: 14102/38378 (36.75%) Loss: 1.949625 LR: 0.00004114 [08:48:38] Epoch: 1 Batch: 14103/38378 (36.75%) Loss: 1.935690 LR: 0.00004114 [08:48:40] Epoch: 1 Batch: 14104/38378 (36.75%) Loss: 2.181109 LR: 0.00004114 [08:48:41] Epoch: 1 Batch: 14105/38378 (36.75%) Loss: 1.884481 LR: 0.00004114 [08:48:43] Epoch: 1 Batch: 14106/38378 (36.76%) Loss: 1.871389 LR: 0.00004114 [08:48:45] Epoch: 1 Batch: 14107/38378 (36.76%) Loss: 1.994612 LR: 0.00004113 [08:48:46] Epoch: 1 Batch: 14108/38378 (36.76%) Loss: 1.989304 LR: 0.00004113 [08:48:48] Epoch: 1 Batch: 14109/38378 (36.76%) Loss: 2.048593 LR: 0.00004113 [08:48:50] Epoch: 1 Batch: 14110/38378 (36.77%) Loss: 1.886252 LR: 0.00004113 [08:48:51] Epoch: 1 Batch: 14111/38378 (36.77%) Loss: 1.832184 LR: 0.00004113 [08:48:53] Epoch: 1 Batch: 14112/38378 (36.77%) Loss: 2.091200 LR: 0.00004113 [08:48:55] Epoch: 1 Batch: 14113/38378 (36.77%) Loss: 1.877836 LR: 0.00004113 [08:48:57] Epoch: 1 Batch: 14114/38378 (36.78%) Loss: 2.040619 LR: 0.00004112 [08:48:58] Epoch: 1 Batch: 14115/38378 (36.78%) Loss: 2.303033 LR: 0.00004112 [08:49:00] Epoch: 1 Batch: 14116/38378 (36.78%) Loss: 1.888110 LR: 0.00004112 [08:49:02] Epoch: 1 Batch: 14117/38378 (36.78%) Loss: 1.827320 LR: 0.00004112 [08:49:03] Epoch: 1 Batch: 14118/38378 (36.79%) Loss: 1.996740 LR: 0.00004112 [08:49:05] Epoch: 1 Batch: 14119/38378 (36.79%) Loss: 1.751387 LR: 0.00004112 [08:49:07] Epoch: 1 Batch: 14120/38378 (36.79%) Loss: 2.097757 LR: 0.00004112 [08:49:09] Epoch: 1 Batch: 14121/38378 (36.79%) Loss: 1.832786 LR: 0.00004111 [08:49:10] Epoch: 1 Batch: 14122/38378 (36.80%) Loss: 1.854508 LR: 0.00004111 [08:49:12] Epoch: 1 Batch: 14123/38378 (36.80%) Loss: 1.790447 LR: 0.00004111 [08:49:14] Epoch: 1 Batch: 14124/38378 (36.80%) Loss: 1.918465 LR: 0.00004111 [08:49:15] Epoch: 1 Batch: 14125/38378 (36.80%) Loss: 1.960069 LR: 0.00004111 [08:49:17] Epoch: 1 Batch: 14126/38378 (36.81%) Loss: 1.647504 LR: 0.00004111 [08:49:19] Epoch: 1 Batch: 14127/38378 (36.81%) Loss: 1.828959 LR: 0.00004111 [08:49:20] Epoch: 1 Batch: 14128/38378 (36.81%) Loss: 1.783358 LR: 0.00004110 [08:49:22] Epoch: 1 Batch: 14129/38378 (36.82%) Loss: 2.086984 LR: 0.00004110 [08:49:24] Epoch: 1 Batch: 14130/38378 (36.82%) Loss: 2.354092 LR: 0.00004110 [08:49:25] Epoch: 1 Batch: 14131/38378 (36.82%) Loss: 1.954526 LR: 0.00004110 [08:49:27] Epoch: 1 Batch: 14132/38378 (36.82%) Loss: 2.013857 LR: 0.00004110 [08:49:29] Epoch: 1 Batch: 14133/38378 (36.83%) Loss: 1.965841 LR: 0.00004110 [08:49:31] Epoch: 1 Batch: 14134/38378 (36.83%) Loss: 2.000713 LR: 0.00004110 [08:49:32] Epoch: 1 Batch: 14135/38378 (36.83%) Loss: 1.801040 LR: 0.00004109 
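
The temp-checkpoint lines above follow a fixed rotation: a temp checkpoint is written every 100 batches, and the one from 1,000 batches earlier is deleted at the same moment (saving epoch1_step14100 evicts epoch1_step13100), so disk usage stays bounded at a sliding window of roughly ten recent saves. A sketch of that rotation under those assumptions; the helper and its layout are hypothetical, not the trainer's actual code:

    import os, shutil

    TEMP_EVERY = 100     # temp-save interval observed in the log
    KEEP_WINDOW = 1000   # step 14100 evicts step 13100, and so on

    def save_temp_checkpoint(peft_model, out_dir, epoch, step):
        # Evict the temp checkpoint from KEEP_WINDOW steps ago, if present.
        old_name = f"epoch{epoch}_step{step - KEEP_WINDOW}"
        old = os.path.join(out_dir, old_name)
        if os.path.isdir(old):
            shutil.rmtree(old)
            print(f">> Cleaned up old temp checkpoint: {old_name}")
        # Write the new one (LoRA adapter weights only, hence the small size).
        new_name = f"epoch{epoch}_step{step}"
        peft_model.save_pretrained(os.path.join(out_dir, new_name))
        print(f">> Temp checkpoint saved: {new_name}")
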
[08:49:34] Epoch: 1 Batch: 14136/38378 (36.83%) Loss: 2.048661 LR: 0.00004109 [08:49:36] Epoch: 1 Batch: 14137/38378 (36.84%) Loss: 1.791524 LR: 0.00004109 [08:49:37] Epoch: 1 Batch: 14138/38378 (36.84%) Loss: 1.954068 LR: 0.00004109 [08:49:39] Epoch: 1 Batch: 14139/38378 (36.84%) Loss: 2.084322 LR: 0.00004109 [08:49:41] Epoch: 1 Batch: 14140/38378 (36.84%) Loss: 1.866202 LR: 0.00004109 [08:49:42] Epoch: 1 Batch: 14141/38378 (36.85%) Loss: 2.005863 LR: 0.00004109 [08:49:44] Epoch: 1 Batch: 14142/38378 (36.85%) Loss: 1.880453 LR: 0.00004108 [08:49:46] Epoch: 1 Batch: 14143/38378 (36.85%) Loss: 2.069230 LR: 0.00004108 [08:49:47] Epoch: 1 Batch: 14144/38378 (36.85%) Loss: 1.975560 LR: 0.00004108 [08:49:49] Epoch: 1 Batch: 14145/38378 (36.86%) Loss: 2.003236 LR: 0.00004108 [08:49:51] Epoch: 1 Batch: 14146/38378 (36.86%) Loss: 1.868774 LR: 0.00004108 [08:49:53] Epoch: 1 Batch: 14147/38378 (36.86%) Loss: 2.026016 LR: 0.00004108 [08:49:54] Epoch: 1 Batch: 14148/38378 (36.86%) Loss: 1.829159 LR: 0.00004108 [08:49:56] Epoch: 1 Batch: 14149/38378 (36.87%) Loss: 2.122408 LR: 0.00004107 [08:49:58] Epoch: 1 Batch: 14150/38378 (36.87%) Loss: 1.889529 LR: 0.00004107 [08:49:59] Epoch: 1 Batch: 14151/38378 (36.87%) Loss: 1.973846 LR: 0.00004107 [08:50:01] Epoch: 1 Batch: 14152/38378 (36.88%) Loss: 1.984494 LR: 0.00004107 [08:50:03] Epoch: 1 Batch: 14153/38378 (36.88%) Loss: 2.092648 LR: 0.00004107 [08:50:04] Epoch: 1 Batch: 14154/38378 (36.88%) Loss: 1.877442 LR: 0.00004107 [08:50:06] Epoch: 1 Batch: 14155/38378 (36.88%) Loss: 2.078130 LR: 0.00004107 [08:50:08] Epoch: 1 Batch: 14156/38378 (36.89%) Loss: 1.846934 LR: 0.00004106 [08:50:09] Epoch: 1 Batch: 14157/38378 (36.89%) Loss: 1.851514 LR: 0.00004106 [08:50:11] Epoch: 1 Batch: 14158/38378 (36.89%) Loss: 1.913155 LR: 0.00004106 [08:50:13] Epoch: 1 Batch: 14159/38378 (36.89%) Loss: 1.976059 LR: 0.00004106 [08:50:14] Epoch: 1 Batch: 14160/38378 (36.90%) Loss: 2.128029 LR: 0.00004106 [08:50:16] Epoch: 1 Batch: 14161/38378 (36.90%) Loss: 1.671209 LR: 0.00004106 [08:50:18] Epoch: 1 Batch: 14162/38378 (36.90%) Loss: 2.161097 LR: 0.00004106 [08:50:19] Epoch: 1 Batch: 14163/38378 (36.90%) Loss: 1.797539 LR: 0.00004105 [08:50:21] Epoch: 1 Batch: 14164/38378 (36.91%) Loss: 1.864091 LR: 0.00004105 [08:50:23] Epoch: 1 Batch: 14165/38378 (36.91%) Loss: 2.191092 LR: 0.00004105 [08:50:25] Epoch: 1 Batch: 14166/38378 (36.91%) Loss: 1.938625 LR: 0.00004105 [08:50:26] Epoch: 1 Batch: 14167/38378 (36.91%) Loss: 2.493554 LR: 0.00004105 [08:50:28] Epoch: 1 Batch: 14168/38378 (36.92%) Loss: 2.194729 LR: 0.00004105 [08:50:30] Epoch: 1 Batch: 14169/38378 (36.92%) Loss: 2.213271 LR: 0.00004105 [08:50:31] Epoch: 1 Batch: 14170/38378 (36.92%) Loss: 2.110141 LR: 0.00004104 [08:50:33] Epoch: 1 Batch: 14171/38378 (36.92%) Loss: 1.951438 LR: 0.00004104 [08:50:35] Epoch: 1 Batch: 14172/38378 (36.93%) Loss: 2.011771 LR: 0.00004104 [08:50:36] Epoch: 1 Batch: 14173/38378 (36.93%) Loss: 1.937911 LR: 0.00004104 [08:50:38] Epoch: 1 Batch: 14174/38378 (36.93%) Loss: 2.282106 LR: 0.00004104 [08:50:40] Epoch: 1 Batch: 14175/38378 (36.94%) Loss: 1.829971 LR: 0.00004104 [08:50:42] Epoch: 1 Batch: 14176/38378 (36.94%) Loss: 2.002868 LR: 0.00004104 [08:50:43] Epoch: 1 Batch: 14177/38378 (36.94%) Loss: 1.620549 LR: 0.00004103 [08:50:45] Epoch: 1 Batch: 14178/38378 (36.94%) Loss: 1.831737 LR: 0.00004103 [08:50:47] Epoch: 1 Batch: 14179/38378 (36.95%) Loss: 2.226922 LR: 0.00004103 [08:50:48] Epoch: 1 Batch: 14180/38378 (36.95%) Loss: 1.896083 LR: 0.00004103 [08:50:50] Epoch: 1 Batch: 14181/38378 
(36.95%) Loss: 1.986022 LR: 0.00004103 [08:50:52] Epoch: 1 Batch: 14182/38378 (36.95%) Loss: 2.168641 LR: 0.00004103 [08:50:53] Epoch: 1 Batch: 14183/38378 (36.96%) Loss: 1.629353 LR: 0.00004103 [08:50:55] Epoch: 1 Batch: 14184/38378 (36.96%) Loss: 1.930562 LR: 0.00004102 [08:50:57] Epoch: 1 Batch: 14185/38378 (36.96%) Loss: 1.857225 LR: 0.00004102 [08:50:58] Epoch: 1 Batch: 14186/38378 (36.96%) Loss: 2.261322 LR: 0.00004102 [08:51:00] Epoch: 1 Batch: 14187/38378 (36.97%) Loss: 2.072331 LR: 0.00004102 [08:51:02] Epoch: 1 Batch: 14188/38378 (36.97%) Loss: 2.031950 LR: 0.00004102 [08:51:03] Epoch: 1 Batch: 14189/38378 (36.97%) Loss: 1.879074 LR: 0.00004102 [08:51:05] Epoch: 1 Batch: 14190/38378 (36.97%) Loss: 1.887737 LR: 0.00004102 [08:51:07] Epoch: 1 Batch: 14191/38378 (36.98%) Loss: 1.890093 LR: 0.00004101 [08:51:09] Epoch: 1 Batch: 14192/38378 (36.98%) Loss: 1.924646 LR: 0.00004101 [08:51:10] Epoch: 1 Batch: 14193/38378 (36.98%) Loss: 2.275104 LR: 0.00004101 [08:51:12] Epoch: 1 Batch: 14194/38378 (36.98%) Loss: 1.936869 LR: 0.00004101 [08:51:14] Epoch: 1 Batch: 14195/38378 (36.99%) Loss: 1.962010 LR: 0.00004101 [08:51:15] Epoch: 1 Batch: 14196/38378 (36.99%) Loss: 2.074900 LR: 0.00004101 [08:51:17] Epoch: 1 Batch: 14197/38378 (36.99%) Loss: 2.007629 LR: 0.00004101 [08:51:19] Epoch: 1 Batch: 14198/38378 (37.00%) Loss: 1.917617 LR: 0.00004100 [08:51:20] Epoch: 1 Batch: 14199/38378 (37.00%) Loss: 1.941650 LR: 0.00004100 [08:51:27] >> Cleaned up old temp checkpoint: epoch1_step13200 [08:51:27] >> Temp checkpoint saved: epoch1_step14200, size: 0.1702 GB [08:51:27] Epoch: 1 Batch: 14200/38378 (37.00%) Loss: 1.878023 LR: 0.00004100 [08:51:29] Epoch: 1 Batch: 14201/38378 (37.00%) Loss: 1.989844 LR: 0.00004100 [08:51:30] Epoch: 1 Batch: 14202/38378 (37.01%) Loss: 2.181966 LR: 0.00004100 [08:51:32] Epoch: 1 Batch: 14203/38378 (37.01%) Loss: 2.078077 LR: 0.00004100 [08:51:34] Epoch: 1 Batch: 14204/38378 (37.01%) Loss: 2.031514 LR: 0.00004100 [08:51:35] Epoch: 1 Batch: 14205/38378 (37.01%) Loss: 2.135583 LR: 0.00004099 [08:51:37] Epoch: 1 Batch: 14206/38378 (37.02%) Loss: 2.243103 LR: 0.00004099 [08:51:39] Epoch: 1 Batch: 14207/38378 (37.02%) Loss: 1.811368 LR: 0.00004099 [08:51:40] Epoch: 1 Batch: 14208/38378 (37.02%) Loss: 1.927928 LR: 0.00004099 [08:51:42] Epoch: 1 Batch: 14209/38378 (37.02%) Loss: 1.842021 LR: 0.00004099 [08:51:44] Epoch: 1 Batch: 14210/38378 (37.03%) Loss: 1.876462 LR: 0.00004099 [08:51:45] Epoch: 1 Batch: 14211/38378 (37.03%) Loss: 2.109757 LR: 0.00004099 [08:51:47] Epoch: 1 Batch: 14212/38378 (37.03%) Loss: 2.394044 LR: 0.00004098 [08:51:49] Epoch: 1 Batch: 14213/38378 (37.03%) Loss: 2.209350 LR: 0.00004098 [08:51:50] Epoch: 1 Batch: 14214/38378 (37.04%) Loss: 1.750512 LR: 0.00004098 [08:51:52] Epoch: 1 Batch: 14215/38378 (37.04%) Loss: 1.916255 LR: 0.00004098 [08:51:54] Epoch: 1 Batch: 14216/38378 (37.04%) Loss: 2.027502 LR: 0.00004098 [08:51:55] Epoch: 1 Batch: 14217/38378 (37.04%) Loss: 1.761465 LR: 0.00004098 [08:51:57] Epoch: 1 Batch: 14218/38378 (37.05%) Loss: 1.871220 LR: 0.00004098 [08:51:59] Epoch: 1 Batch: 14219/38378 (37.05%) Loss: 1.926607 LR: 0.00004097 [08:52:01] Epoch: 1 Batch: 14220/38378 (37.05%) Loss: 1.969284 LR: 0.00004097 [08:52:02] Epoch: 1 Batch: 14221/38378 (37.06%) Loss: 2.040153 LR: 0.00004097 [08:52:04] Epoch: 1 Batch: 14222/38378 (37.06%) Loss: 1.963535 LR: 0.00004097 [08:52:06] Epoch: 1 Batch: 14223/38378 (37.06%) Loss: 1.740622 LR: 0.00004097 [08:52:07] Epoch: 1 Batch: 14224/38378 (37.06%) Loss: 2.110042 LR: 0.00004097 [08:52:09] Epoch: 1 Batch: 
14225/38378 (37.07%) Loss: 1.875390 LR: 0.00004097 [08:52:11] Epoch: 1 Batch: 14226/38378 (37.07%) Loss: 1.956576 LR: 0.00004095 [08:52:13] Epoch: 1 Batch: 14227/38378 (37.07%) Loss: 1.942684 LR: 0.00004095 [08:52:14] Epoch: 1 Batch: 14228/38378 (37.07%) Loss: 2.011146 LR: 0.00004095 [08:52:16] Epoch: 1 Batch: 14229/38378 (37.08%) Loss: 1.671192 LR: 0.00004095 [08:52:18] Epoch: 1 Batch: 14230/38378 (37.08%) Loss: 2.219209 LR: 0.00004095 [08:52:19] Epoch: 1 Batch: 14231/38378 (37.08%) Loss: 2.043221 LR: 0.00004095 [08:52:21] Epoch: 1 Batch: 14232/38378 (37.08%) Loss: 2.274731 LR: 0.00004095 [08:52:23] Epoch: 1 Batch: 14233/38378 (37.09%) Loss: 2.327107 LR: 0.00004094 [08:52:24] Epoch: 1 Batch: 14234/38378 (37.09%) Loss: 2.040472 LR: 0.00004094 [08:52:26] Epoch: 1 Batch: 14235/38378 (37.09%) Loss: 1.834956 LR: 0.00004094 [08:52:28] Epoch: 1 Batch: 14236/38378 (37.09%) Loss: 2.029092 LR: 0.00004094 [08:52:30] Epoch: 1 Batch: 14237/38378 (37.10%) Loss: 1.986150 LR: 0.00004094 [08:52:31] Epoch: 1 Batch: 14238/38378 (37.10%) Loss: 1.674833 LR: 0.00004094 [08:52:33] Epoch: 1 Batch: 14239/38378 (37.10%) Loss: 2.080991 LR: 0.00004094 [08:52:35] Epoch: 1 Batch: 14240/38378 (37.10%) Loss: 1.984327 LR: 0.00004093 [08:52:36] Epoch: 1 Batch: 14241/38378 (37.11%) Loss: 2.215146 LR: 0.00004093 [08:52:38] Epoch: 1 Batch: 14242/38378 (37.11%) Loss: 2.190696 LR: 0.00004093 [08:52:40] Epoch: 1 Batch: 14243/38378 (37.11%) Loss: 2.043078 LR: 0.00004093 [08:52:41] Epoch: 1 Batch: 14244/38378 (37.12%) Loss: 2.316162 LR: 0.00004093 [08:52:43] Epoch: 1 Batch: 14245/38378 (37.12%) Loss: 1.766795 LR: 0.00004093 [08:52:45] Epoch: 1 Batch: 14246/38378 (37.12%) Loss: 2.209609 LR: 0.00004093 [08:52:46] Epoch: 1 Batch: 14247/38378 (37.12%) Loss: 2.302051 LR: 0.00004092 [08:52:48] Epoch: 1 Batch: 14248/38378 (37.13%) Loss: 2.015039 LR: 0.00004092 [08:52:50] Epoch: 1 Batch: 14249/38378 (37.13%) Loss: 2.040876 LR: 0.00004092 [08:52:52] Epoch: 1 Batch: 14250/38378 (37.13%) Loss: 1.663144 LR: 0.00004092 [08:52:53] Epoch: 1 Batch: 14251/38378 (37.13%) Loss: 1.821430 LR: 0.00004092 [08:52:55] Epoch: 1 Batch: 14252/38378 (37.14%) Loss: 2.036509 LR: 0.00004092 [08:52:57] Epoch: 1 Batch: 14253/38378 (37.14%) Loss: 2.141145 LR: 0.00004092 [08:52:58] Epoch: 1 Batch: 14254/38378 (37.14%) Loss: 2.169177 LR: 0.00004091 [08:53:00] Epoch: 1 Batch: 14255/38378 (37.14%) Loss: 1.981794 LR: 0.00004091 [08:53:02] Epoch: 1 Batch: 14256/38378 (37.15%) Loss: 2.026105 LR: 0.00004091 [08:53:03] Epoch: 1 Batch: 14257/38378 (37.15%) Loss: 1.951088 LR: 0.00004091 [08:53:05] Epoch: 1 Batch: 14258/38378 (37.15%) Loss: 2.055061 LR: 0.00004091 [08:53:07] Epoch: 1 Batch: 14259/38378 (37.15%) Loss: 1.943783 LR: 0.00004091 [08:53:08] Epoch: 1 Batch: 14260/38378 (37.16%) Loss: 1.714245 LR: 0.00004091 [08:53:10] Epoch: 1 Batch: 14261/38378 (37.16%) Loss: 1.946972 LR: 0.00004090 [08:53:12] Epoch: 1 Batch: 14262/38378 (37.16%) Loss: 1.600106 LR: 0.00004090 [08:53:14] Epoch: 1 Batch: 14263/38378 (37.16%) Loss: 2.002338 LR: 0.00004090 [08:53:15] Epoch: 1 Batch: 14264/38378 (37.17%) Loss: 1.861829 LR: 0.00004090 [08:53:17] Epoch: 1 Batch: 14265/38378 (37.17%) Loss: 1.857064 LR: 0.00004090 [08:53:19] Epoch: 1 Batch: 14266/38378 (37.17%) Loss: 1.948882 LR: 0.00004090 [08:53:20] Epoch: 1 Batch: 14267/38378 (37.17%) Loss: 2.047589 LR: 0.00004090 [08:53:22] Epoch: 1 Batch: 14268/38378 (37.18%) Loss: 2.033821 LR: 0.00004089 [08:53:24] Epoch: 1 Batch: 14269/38378 (37.18%) Loss: 1.751467 LR: 0.00004089 [08:53:25] Epoch: 1 Batch: 14270/38378 (37.18%) Loss: 1.995309 LR: 
0.00004089 [08:53:27] Epoch: 1 Batch: 14271/38378 (37.19%) Loss: 1.922445 LR: 0.00004089 [08:53:29] Epoch: 1 Batch: 14272/38378 (37.19%) Loss: 2.116428 LR: 0.00004089 [08:53:30] Epoch: 1 Batch: 14273/38378 (37.19%) Loss: 2.066782 LR: 0.00004089 [08:53:32] Epoch: 1 Batch: 14274/38378 (37.19%) Loss: 1.818668 LR: 0.00004089 [08:53:34] Epoch: 1 Batch: 14275/38378 (37.20%) Loss: 2.013414 LR: 0.00004088 [08:53:36] Epoch: 1 Batch: 14276/38378 (37.20%) Loss: 1.901539 LR: 0.00004088 [08:53:37] Epoch: 1 Batch: 14277/38378 (37.20%) Loss: 1.703852 LR: 0.00004088 [08:53:39] Epoch: 1 Batch: 14278/38378 (37.20%) Loss: 2.025016 LR: 0.00004088 [08:53:41] Epoch: 1 Batch: 14279/38378 (37.21%) Loss: 1.889851 LR: 0.00004088 [08:53:42] Epoch: 1 Batch: 14280/38378 (37.21%) Loss: 2.093717 LR: 0.00004088 [08:53:44] Epoch: 1 Batch: 14281/38378 (37.21%) Loss: 1.849563 LR: 0.00004088 [08:53:46] Epoch: 1 Batch: 14282/38378 (37.21%) Loss: 1.871043 LR: 0.00004087 [08:53:47] Epoch: 1 Batch: 14283/38378 (37.22%) Loss: 2.049833 LR: 0.00004087 [08:53:49] Epoch: 1 Batch: 14284/38378 (37.22%) Loss: 2.034845 LR: 0.00004087 [08:53:51] Epoch: 1 Batch: 14285/38378 (37.22%) Loss: 1.950855 LR: 0.00004087 [08:53:52] Epoch: 1 Batch: 14286/38378 (37.22%) Loss: 2.119799 LR: 0.00004087 [08:53:54] Epoch: 1 Batch: 14287/38378 (37.23%) Loss: 2.150754 LR: 0.00004087 [08:53:56] Epoch: 1 Batch: 14288/38378 (37.23%) Loss: 1.953324 LR: 0.00004087 [08:53:57] Epoch: 1 Batch: 14289/38378 (37.23%) Loss: 2.140833 LR: 0.00004086 [08:53:59] Epoch: 1 Batch: 14290/38378 (37.23%) Loss: 1.851082 LR: 0.00004086 [08:54:01] Epoch: 1 Batch: 14291/38378 (37.24%) Loss: 2.010727 LR: 0.00004086 [08:54:03] Epoch: 1 Batch: 14292/38378 (37.24%) Loss: 2.029198 LR: 0.00004086 [08:54:04] Epoch: 1 Batch: 14293/38378 (37.24%) Loss: 2.252271 LR: 0.00004086 [08:54:06] Epoch: 1 Batch: 14294/38378 (37.25%) Loss: 2.005383 LR: 0.00004086 [08:54:08] Epoch: 1 Batch: 14295/38378 (37.25%) Loss: 2.062914 LR: 0.00004086 [08:54:09] Epoch: 1 Batch: 14296/38378 (37.25%) Loss: 1.941525 LR: 0.00004085 [08:54:11] Epoch: 1 Batch: 14297/38378 (37.25%) Loss: 2.185290 LR: 0.00004085 [08:54:13] Epoch: 1 Batch: 14298/38378 (37.26%) Loss: 2.140220 LR: 0.00004085 [08:54:15] Epoch: 1 Batch: 14299/38378 (37.26%) Loss: 2.104048 LR: 0.00004085 [08:54:21] >> Cleaned up old temp checkpoint: epoch1_step13300 [08:54:21] >> Temp checkpoint saved: epoch1_step14300, size: 0.1702 GB [08:54:21] Epoch: 1 Batch: 14300/38378 (37.26%) Loss: 2.202823 LR: 0.00004085 [08:54:22] Epoch: 1 Batch: 14301/38378 (37.26%) Loss: 2.148599 LR: 0.00004085 [08:54:24] Epoch: 1 Batch: 14302/38378 (37.27%) Loss: 1.952808 LR: 0.00004085 [08:54:26] Epoch: 1 Batch: 14303/38378 (37.27%) Loss: 2.307902 LR: 0.00004084 [08:54:27] Epoch: 1 Batch: 14304/38378 (37.27%) Loss: 1.982285 LR: 0.00004084 [08:54:29] Epoch: 1 Batch: 14305/38378 (37.27%) Loss: 1.845378 LR: 0.00004084 [08:54:31] Epoch: 1 Batch: 14306/38378 (37.28%) Loss: 1.982126 LR: 0.00004084 [08:54:32] Epoch: 1 Batch: 14307/38378 (37.28%) Loss: 1.801139 LR: 0.00004084 [08:54:34] Epoch: 1 Batch: 14308/38378 (37.28%) Loss: 1.800351 LR: 0.00004084 [08:54:36] Epoch: 1 Batch: 14309/38378 (37.28%) Loss: 1.827358 LR: 0.00004084 [08:54:37] Epoch: 1 Batch: 14310/38378 (37.29%) Loss: 1.720057 LR: 0.00004083 [08:54:39] Epoch: 1 Batch: 14311/38378 (37.29%) Loss: 1.808305 LR: 0.00004083 [08:54:41] Epoch: 1 Batch: 14312/38378 (37.29%) Loss: 1.996701 LR: 0.00004083 [08:54:43] Epoch: 1 Batch: 14313/38378 (37.29%) Loss: 1.960272 LR: 0.00004083 [08:54:44] Epoch: 1 Batch: 14314/38378 (37.30%) Loss: 
1.952303 LR: 0.00004083 [08:54:46] Epoch: 1 Batch: 14315/38378 (37.30%) Loss: 1.846044 LR: 0.00004083 [08:54:48] Epoch: 1 Batch: 14316/38378 (37.30%) Loss: 1.909697 LR: 0.00004083 [08:54:49] Epoch: 1 Batch: 14317/38378 (37.31%) Loss: 2.054856 LR: 0.00004082 [08:54:51] Epoch: 1 Batch: 14318/38378 (37.31%) Loss: 2.214071 LR: 0.00004082 [08:54:53] Epoch: 1 Batch: 14319/38378 (37.31%) Loss: 1.993338 LR: 0.00004082 [08:54:55] Epoch: 1 Batch: 14320/38378 (37.31%) Loss: 1.935791 LR: 0.00004082 [08:54:56] Epoch: 1 Batch: 14321/38378 (37.32%) Loss: 2.027514 LR: 0.00004082 [08:54:58] Epoch: 1 Batch: 14322/38378 (37.32%) Loss: 2.113013 LR: 0.00004082 [08:55:00] Epoch: 1 Batch: 14323/38378 (37.32%) Loss: 1.767748 LR: 0.00004082 [08:55:01] Epoch: 1 Batch: 14324/38378 (37.32%) Loss: 1.995890 LR: 0.00004081 [08:55:03] Epoch: 1 Batch: 14325/38378 (37.33%) Loss: 1.924154 LR: 0.00004081 [08:55:05] Epoch: 1 Batch: 14326/38378 (37.33%) Loss: 1.591038 LR: 0.00004081 [08:55:06] Epoch: 1 Batch: 14327/38378 (37.33%) Loss: 2.116831 LR: 0.00004081 [08:55:08] Epoch: 1 Batch: 14328/38378 (37.33%) Loss: 1.954332 LR: 0.00004081 [08:55:10] Epoch: 1 Batch: 14329/38378 (37.34%) Loss: 2.023487 LR: 0.00004081 [08:55:12] Epoch: 1 Batch: 14330/38378 (37.34%) Loss: 1.926249 LR: 0.00004081 [08:55:13] Epoch: 1 Batch: 14331/38378 (37.34%) Loss: 1.955259 LR: 0.00004080 [08:55:15] Epoch: 1 Batch: 14332/38378 (37.34%) Loss: 2.244297 LR: 0.00004080 [08:55:17] Epoch: 1 Batch: 14333/38378 (37.35%) Loss: 1.969109 LR: 0.00004080 [08:55:19] Epoch: 1 Batch: 14334/38378 (37.35%) Loss: 1.859902 LR: 0.00004080 [08:55:21] Epoch: 1 Batch: 14335/38378 (37.35%) Loss: 2.054511 LR: 0.00004080 [08:55:22] Epoch: 1 Batch: 14336/38378 (37.35%) Loss: 2.130859 LR: 0.00004080 [08:55:24] Epoch: 1 Batch: 14337/38378 (37.36%) Loss: 1.787339 LR: 0.00004080 [08:55:26] Epoch: 1 Batch: 14338/38378 (37.36%) Loss: 2.061487 LR: 0.00004079 [08:55:27] Epoch: 1 Batch: 14339/38378 (37.36%) Loss: 1.741220 LR: 0.00004079 [08:55:29] Epoch: 1 Batch: 14340/38378 (37.37%) Loss: 1.830363 LR: 0.00004079 [08:55:31] Epoch: 1 Batch: 14341/38378 (37.37%) Loss: 2.161565 LR: 0.00004079 [08:55:33] Epoch: 1 Batch: 14342/38378 (37.37%) Loss: 1.977891 LR: 0.00004079 [08:55:34] Epoch: 1 Batch: 14343/38378 (37.37%) Loss: 1.655777 LR: 0.00004079 [08:55:36] Epoch: 1 Batch: 14344/38378 (37.38%) Loss: 2.204886 LR: 0.00004079 [08:55:38] Epoch: 1 Batch: 14345/38378 (37.38%) Loss: 1.809073 LR: 0.00004078 [08:55:39] Epoch: 1 Batch: 14346/38378 (37.38%) Loss: 1.953066 LR: 0.00004078 [08:55:41] Epoch: 1 Batch: 14347/38378 (37.38%) Loss: 2.301768 LR: 0.00004078 [08:55:43] Epoch: 1 Batch: 14348/38378 (37.39%) Loss: 2.326260 LR: 0.00004078 [08:55:44] Epoch: 1 Batch: 14349/38378 (37.39%) Loss: 2.127197 LR: 0.00004078 [08:55:46] Epoch: 1 Batch: 14350/38378 (37.39%) Loss: 2.042076 LR: 0.00004078 [08:55:48] Epoch: 1 Batch: 14351/38378 (37.39%) Loss: 1.773384 LR: 0.00004078 [08:55:49] Epoch: 1 Batch: 14352/38378 (37.40%) Loss: 1.926635 LR: 0.00004077 [08:55:51] Epoch: 1 Batch: 14353/38378 (37.40%) Loss: 2.142186 LR: 0.00004077 [08:55:53] Epoch: 1 Batch: 14354/38378 (37.40%) Loss: 2.050251 LR: 0.00004077 [08:55:54] Epoch: 1 Batch: 14355/38378 (37.40%) Loss: 2.150774 LR: 0.00004077 [08:55:56] Epoch: 1 Batch: 14356/38378 (37.41%) Loss: 2.176170 LR: 0.00004077 [08:55:58] Epoch: 1 Batch: 14357/38378 (37.41%) Loss: 1.964546 LR: 0.00004077 [08:56:00] Epoch: 1 Batch: 14358/38378 (37.41%) Loss: 1.783470 LR: 0.00004077 [08:56:01] Epoch: 1 Batch: 14359/38378 (37.41%) Loss: 1.965487 LR: 0.00004076 [08:56:03] Epoch: 1 
Batch: 14360/38378 (37.42%) Loss: 2.122163 LR: 0.00004076 [08:56:05] Epoch: 1 Batch: 14361/38378 (37.42%) Loss: 1.777999 LR: 0.00004076 [08:56:06] Epoch: 1 Batch: 14362/38378 (37.42%) Loss: 1.943825 LR: 0.00004076 [08:56:08] Epoch: 1 Batch: 14363/38378 (37.43%) Loss: 1.916798 LR: 0.00004076 [08:56:10] Epoch: 1 Batch: 14364/38378 (37.43%) Loss: 2.301926 LR: 0.00004076 [08:56:11] Epoch: 1 Batch: 14365/38378 (37.43%) Loss: 2.004620 LR: 0.00004076 [08:56:13] Epoch: 1 Batch: 14366/38378 (37.43%) Loss: 1.816549 LR: 0.00004075 [08:56:15] Epoch: 1 Batch: 14367/38378 (37.44%) Loss: 1.961571 LR: 0.00004075 [08:56:16] Epoch: 1 Batch: 14368/38378 (37.44%) Loss: 2.045745 LR: 0.00004075 [08:56:18] Epoch: 1 Batch: 14369/38378 (37.44%) Loss: 2.035563 LR: 0.00004075 [08:56:20] Epoch: 1 Batch: 14370/38378 (37.44%) Loss: 1.848432 LR: 0.00004075 [08:56:21] Epoch: 1 Batch: 14371/38378 (37.45%) Loss: 1.869498 LR: 0.00004075 [08:56:23] Epoch: 1 Batch: 14372/38378 (37.45%) Loss: 2.053012 LR: 0.00004075 [08:56:25] Epoch: 1 Batch: 14373/38378 (37.45%) Loss: 2.192760 LR: 0.00004074 [08:56:26] Epoch: 1 Batch: 14374/38378 (37.45%) Loss: 1.975218 LR: 0.00004074 [08:56:28] Epoch: 1 Batch: 14375/38378 (37.46%) Loss: 2.207526 LR: 0.00004074 [08:56:30] Epoch: 1 Batch: 14376/38378 (37.46%) Loss: 2.110546 LR: 0.00004074 [08:56:31] Epoch: 1 Batch: 14377/38378 (37.46%) Loss: 2.022631 LR: 0.00004074 [08:56:33] Epoch: 1 Batch: 14378/38378 (37.46%) Loss: 2.081951 LR: 0.00004074 [08:56:35] Epoch: 1 Batch: 14379/38378 (37.47%) Loss: 1.756634 LR: 0.00004074 [08:56:37] Epoch: 1 Batch: 14380/38378 (37.47%) Loss: 1.828269 LR: 0.00004072 [08:56:38] Epoch: 1 Batch: 14381/38378 (37.47%) Loss: 1.738442 LR: 0.00004072 [08:56:40] Epoch: 1 Batch: 14382/38378 (37.47%) Loss: 2.317517 LR: 0.00004072 [08:56:41] Epoch: 1 Batch: 14383/38378 (37.48%) Loss: 1.882653 LR: 0.00004072 [08:56:43] Epoch: 1 Batch: 14384/38378 (37.48%) Loss: 2.146236 LR: 0.00004072 [08:56:45] Epoch: 1 Batch: 14385/38378 (37.48%) Loss: 1.805594 LR: 0.00004072 [08:56:46] Epoch: 1 Batch: 14386/38378 (37.49%) Loss: 2.165928 LR: 0.00004072 [08:56:48] Epoch: 1 Batch: 14387/38378 (37.49%) Loss: 1.655669 LR: 0.00004071 [08:56:50] Epoch: 1 Batch: 14388/38378 (37.49%) Loss: 1.706389 LR: 0.00004071 [08:56:52] Epoch: 1 Batch: 14389/38378 (37.49%) Loss: 1.611230 LR: 0.00004071 [08:56:53] Epoch: 1 Batch: 14390/38378 (37.50%) Loss: 1.772035 LR: 0.00004071 [08:56:55] Epoch: 1 Batch: 14391/38378 (37.50%) Loss: 2.059271 LR: 0.00004071 [08:56:57] Epoch: 1 Batch: 14392/38378 (37.50%) Loss: 1.890518 LR: 0.00004071 [08:56:58] Epoch: 1 Batch: 14393/38378 (37.50%) Loss: 2.002819 LR: 0.00004071 [08:57:00] Epoch: 1 Batch: 14394/38378 (37.51%) Loss: 2.148016 LR: 0.00004070 [08:57:02] Epoch: 1 Batch: 14395/38378 (37.51%) Loss: 1.697415 LR: 0.00004070 [08:57:03] Epoch: 1 Batch: 14396/38378 (37.51%) Loss: 1.867724 LR: 0.00004070 [08:57:05] Epoch: 1 Batch: 14397/38378 (37.51%) Loss: 2.170769 LR: 0.00004070 [08:57:07] Epoch: 1 Batch: 14398/38378 (37.52%) Loss: 1.826872 LR: 0.00004070 [08:57:08] Epoch: 1 Batch: 14399/38378 (37.52%) Loss: 1.793886 LR: 0.00004070 [08:57:14] >> Cleaned up old temp checkpoint: epoch1_step13400 [08:57:14] >> Temp checkpoint saved: epoch1_step14400, size: 0.1702 GB [08:57:14] Epoch: 1 Batch: 14400/38378 (37.52%) Loss: 1.857966 LR: 0.00004070 [08:57:16] Epoch: 1 Batch: 14401/38378 (37.52%) Loss: 2.218565 LR: 0.00004069 [08:57:17] Epoch: 1 Batch: 14402/38378 (37.53%) Loss: 2.241311 LR: 0.00004069 [08:57:19] Epoch: 1 Batch: 14403/38378 (37.53%) Loss: 1.886268 LR: 0.00004069 
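
The LR column decays in runs of seven batches (0.00004134 through batch 13973, 0.00004133 for 13974 to 13980, and so on), which matches a scheduler stepped once per optimizer update at 7 gradient-accumulation steps and printed to eight decimals every batch; the occasional double tick, e.g. 0.00004074 to 0.00004072 at batch 14380 above, is the accumulated cosine decrement crossing two printable increments at once. A cosine-decay-to-floor sketch, with hypothetical wiring, that approximately reproduces the logged values:

    import math

    def cosine_lr(step, total_steps, warmup_steps, peak_lr, floor_lr):
        # Linear warmup to peak_lr, then cosine decay down to floor_lr.
        if step < warmup_steps:
            return peak_lr * step / max(1, warmup_steps)
        t = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return floor_lr + 0.5 * (peak_lr - floor_lr) * (1.0 + math.cos(math.pi * t))

    # Batch 14000 is optimizer step ~2000 at 7 accumulation steps; with this
    # run's peak 5e-5, floor 1e-5 and 439 warmup steps, the sketch gives
    # ~4.13e-5, close to the logged LR: 0.00004130.
    print(cosine_lr(2000, 38378 // 7, 439, 5e-5, 1e-5))
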
[08:57:21] Epoch: 1 Batch: 14404/38378 (37.53%) Loss: 1.993719 LR: 0.00004069 [08:57:22] Epoch: 1 Batch: 14405/38378 (37.53%) Loss: 1.909076 LR: 0.00004069 [08:57:24] Epoch: 1 Batch: 14406/38378 (37.54%) Loss: 2.039978 LR: 0.00004069 [08:57:26] Epoch: 1 Batch: 14407/38378 (37.54%) Loss: 2.014351 LR: 0.00004069 [08:57:28] Epoch: 1 Batch: 14408/38378 (37.54%) Loss: 2.020127 LR: 0.00004068 [08:57:29] Epoch: 1 Batch: 14409/38378 (37.54%) Loss: 1.938463 LR: 0.00004068 [08:57:31] Epoch: 1 Batch: 14410/38378 (37.55%) Loss: 1.844026 LR: 0.00004068 [08:57:33] Epoch: 1 Batch: 14411/38378 (37.55%) Loss: 1.746707 LR: 0.00004068 [08:57:34] Epoch: 1 Batch: 14412/38378 (37.55%) Loss: 1.812948 LR: 0.00004068 [08:57:36] Epoch: 1 Batch: 14413/38378 (37.56%) Loss: 2.237851 LR: 0.00004068 [08:57:38] Epoch: 1 Batch: 14414/38378 (37.56%) Loss: 2.005767 LR: 0.00004068 [08:57:39] Epoch: 1 Batch: 14415/38378 (37.56%) Loss: 1.942677 LR: 0.00004067 [08:57:41] Epoch: 1 Batch: 14416/38378 (37.56%) Loss: 2.086463 LR: 0.00004067 [08:57:43] Epoch: 1 Batch: 14417/38378 (37.57%) Loss: 1.802880 LR: 0.00004067 [08:57:44] Epoch: 1 Batch: 14418/38378 (37.57%) Loss: 2.335187 LR: 0.00004067 [08:57:46] Epoch: 1 Batch: 14419/38378 (37.57%) Loss: 1.994104 LR: 0.00004067 [08:57:48] Epoch: 1 Batch: 14420/38378 (37.57%) Loss: 2.272693 LR: 0.00004067 [08:57:49] Epoch: 1 Batch: 14421/38378 (37.58%) Loss: 2.012164 LR: 0.00004067 [08:57:51] Epoch: 1 Batch: 14422/38378 (37.58%) Loss: 1.796062 LR: 0.00004066 [08:57:53] Epoch: 1 Batch: 14423/38378 (37.58%) Loss: 1.763181 LR: 0.00004066 [08:57:55] Epoch: 1 Batch: 14424/38378 (37.58%) Loss: 1.983904 LR: 0.00004066 [08:57:56] Epoch: 1 Batch: 14425/38378 (37.59%) Loss: 1.920042 LR: 0.00004066 [08:57:58] Epoch: 1 Batch: 14426/38378 (37.59%) Loss: 2.224792 LR: 0.00004066 [08:58:00] Epoch: 1 Batch: 14427/38378 (37.59%) Loss: 1.797784 LR: 0.00004066 [08:58:01] Epoch: 1 Batch: 14428/38378 (37.59%) Loss: 1.714665 LR: 0.00004066 [08:58:03] Epoch: 1 Batch: 14429/38378 (37.60%) Loss: 1.818631 LR: 0.00004065 [08:58:05] Epoch: 1 Batch: 14430/38378 (37.60%) Loss: 1.735356 LR: 0.00004065 [08:58:06] Epoch: 1 Batch: 14431/38378 (37.60%) Loss: 1.875444 LR: 0.00004065 [08:58:08] Epoch: 1 Batch: 14432/38378 (37.60%) Loss: 2.418164 LR: 0.00004065 [08:58:10] Epoch: 1 Batch: 14433/38378 (37.61%) Loss: 1.793324 LR: 0.00004065 [08:58:11] Epoch: 1 Batch: 14434/38378 (37.61%) Loss: 2.014642 LR: 0.00004065 [08:58:13] Epoch: 1 Batch: 14435/38378 (37.61%) Loss: 1.776583 LR: 0.00004065 [08:58:15] Epoch: 1 Batch: 14436/38378 (37.62%) Loss: 2.100743 LR: 0.00004064 [08:58:17] Epoch: 1 Batch: 14437/38378 (37.62%) Loss: 2.122661 LR: 0.00004064 [08:58:18] Epoch: 1 Batch: 14438/38378 (37.62%) Loss: 2.095289 LR: 0.00004064 [08:58:20] Epoch: 1 Batch: 14439/38378 (37.62%) Loss: 1.961793 LR: 0.00004064 [08:58:22] Epoch: 1 Batch: 14440/38378 (37.63%) Loss: 1.922636 LR: 0.00004064 [08:58:23] Epoch: 1 Batch: 14441/38378 (37.63%) Loss: 1.576199 LR: 0.00004064 [08:58:25] Epoch: 1 Batch: 14442/38378 (37.63%) Loss: 2.075839 LR: 0.00004064 [08:58:27] Epoch: 1 Batch: 14443/38378 (37.63%) Loss: 1.756187 LR: 0.00004063 [08:58:28] Epoch: 1 Batch: 14444/38378 (37.64%) Loss: 2.019334 LR: 0.00004063 [08:58:30] Epoch: 1 Batch: 14445/38378 (37.64%) Loss: 1.898395 LR: 0.00004063 [08:58:32] Epoch: 1 Batch: 14446/38378 (37.64%) Loss: 2.305898 LR: 0.00004063 [08:58:33] Epoch: 1 Batch: 14447/38378 (37.64%) Loss: 1.935081 LR: 0.00004063 [08:58:35] Epoch: 1 Batch: 14448/38378 (37.65%) Loss: 2.247668 LR: 0.00004063 [08:58:37] Epoch: 1 Batch: 14449/38378 
(37.65%) Loss: 2.063263 LR: 0.00004063 [08:58:39] Epoch: 1 Batch: 14450/38378 (37.65%) Loss: 2.259341 LR: 0.00004062 [08:58:40] Epoch: 1 Batch: 14451/38378 (37.65%) Loss: 1.966155 LR: 0.00004062 [08:58:42] Epoch: 1 Batch: 14452/38378 (37.66%) Loss: 1.968596 LR: 0.00004062 [08:58:44] Epoch: 1 Batch: 14453/38378 (37.66%) Loss: 1.868605 LR: 0.00004062 [08:58:45] Epoch: 1 Batch: 14454/38378 (37.66%) Loss: 2.176154 LR: 0.00004062 [08:58:47] Epoch: 1 Batch: 14455/38378 (37.66%) Loss: 2.087994 LR: 0.00004062 [08:58:49] Epoch: 1 Batch: 14456/38378 (37.67%) Loss: 1.835609 LR: 0.00004062 [08:58:50] Epoch: 1 Batch: 14457/38378 (37.67%) Loss: 2.195578 LR: 0.00004061 [08:58:52] Epoch: 1 Batch: 14458/38378 (37.67%) Loss: 1.817916 LR: 0.00004061 [08:58:54] Epoch: 1 Batch: 14459/38378 (37.68%) Loss: 2.287206 LR: 0.00004061 [08:58:55] Epoch: 1 Batch: 14460/38378 (37.68%) Loss: 2.000673 LR: 0.00004061 [08:58:57] Epoch: 1 Batch: 14461/38378 (37.68%) Loss: 1.694082 LR: 0.00004061 [08:58:59] Epoch: 1 Batch: 14462/38378 (37.68%) Loss: 2.121542 LR: 0.00004061 [08:59:00] Epoch: 1 Batch: 14463/38378 (37.69%) Loss: 1.930682 LR: 0.00004061 [08:59:02] Epoch: 1 Batch: 14464/38378 (37.69%) Loss: 2.146252 LR: 0.00004060 [08:59:04] Epoch: 1 Batch: 14465/38378 (37.69%) Loss: 1.891358 LR: 0.00004060 [08:59:06] Epoch: 1 Batch: 14466/38378 (37.69%) Loss: 1.986138 LR: 0.00004060 [08:59:07] Epoch: 1 Batch: 14467/38378 (37.70%) Loss: 2.072604 LR: 0.00004060 [08:59:09] Epoch: 1 Batch: 14468/38378 (37.70%) Loss: 2.059858 LR: 0.00004060 [08:59:10] Epoch: 1 Batch: 14469/38378 (37.70%) Loss: 1.899054 LR: 0.00004060 [08:59:12] Epoch: 1 Batch: 14470/38378 (37.70%) Loss: 1.924160 LR: 0.00004060 [08:59:14] Epoch: 1 Batch: 14471/38378 (37.71%) Loss: 1.841091 LR: 0.00004059 [08:59:15] Epoch: 1 Batch: 14472/38378 (37.71%) Loss: 1.717916 LR: 0.00004059 [08:59:17] Epoch: 1 Batch: 14473/38378 (37.71%) Loss: 2.198406 LR: 0.00004059 [08:59:19] Epoch: 1 Batch: 14474/38378 (37.71%) Loss: 1.940631 LR: 0.00004059 [08:59:21] Epoch: 1 Batch: 14475/38378 (37.72%) Loss: 1.815237 LR: 0.00004059 [08:59:22] Epoch: 1 Batch: 14476/38378 (37.72%) Loss: 1.832206 LR: 0.00004059 [08:59:24] Epoch: 1 Batch: 14477/38378 (37.72%) Loss: 1.902150 LR: 0.00004059 [08:59:26] Epoch: 1 Batch: 14478/38378 (37.72%) Loss: 1.942894 LR: 0.00004058 [08:59:27] Epoch: 1 Batch: 14479/38378 (37.73%) Loss: 1.914397 LR: 0.00004058 [08:59:29] Epoch: 1 Batch: 14480/38378 (37.73%) Loss: 1.720182 LR: 0.00004058 [08:59:31] Epoch: 1 Batch: 14481/38378 (37.73%) Loss: 2.039709 LR: 0.00004058 [08:59:32] Epoch: 1 Batch: 14482/38378 (37.74%) Loss: 2.168299 LR: 0.00004058 [08:59:34] Epoch: 1 Batch: 14483/38378 (37.74%) Loss: 1.976187 LR: 0.00004058 [08:59:36] Epoch: 1 Batch: 14484/38378 (37.74%) Loss: 2.090849 LR: 0.00004058 [08:59:37] Epoch: 1 Batch: 14485/38378 (37.74%) Loss: 1.739711 LR: 0.00004057 [08:59:39] Epoch: 1 Batch: 14486/38378 (37.75%) Loss: 2.233333 LR: 0.00004057 [08:59:41] Epoch: 1 Batch: 14487/38378 (37.75%) Loss: 1.829381 LR: 0.00004057 [08:59:43] Epoch: 1 Batch: 14488/38378 (37.75%) Loss: 2.007208 LR: 0.00004057 [08:59:44] Epoch: 1 Batch: 14489/38378 (37.75%) Loss: 2.257852 LR: 0.00004057 [08:59:46] Epoch: 1 Batch: 14490/38378 (37.76%) Loss: 1.931051 LR: 0.00004057 [08:59:48] Epoch: 1 Batch: 14491/38378 (37.76%) Loss: 2.418006 LR: 0.00004057 [08:59:49] Epoch: 1 Batch: 14492/38378 (37.76%) Loss: 1.814704 LR: 0.00004056 [08:59:51] Epoch: 1 Batch: 14493/38378 (37.76%) Loss: 1.969754 LR: 0.00004056 [08:59:53] Epoch: 1 Batch: 14494/38378 (37.77%) Loss: 1.844416 LR: 0.00004056 
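
The checkpoint timestamps also give the training pace: the temp saves at batch 14300 (08:54:21) and batch 14400 (08:57:14) are 173 s apart, about 1.73 s per batch, which puts roughly 11.5 hours of wall time between this point and the end of epoch 1. A back-of-envelope estimate with the values read off the log:

    from datetime import datetime

    t0 = datetime.strptime("08:54:21", "%H:%M:%S")  # temp save, batch 14300
    t1 = datetime.strptime("08:57:14", "%H:%M:%S")  # temp save, batch 14400
    pace = (t1 - t0).total_seconds() / 100          # ~1.73 s/batch
    eta_h = (38378 - 14400) * pace / 3600           # ~11.5 h left in epoch 1
    print(f"{pace:.2f} s/batch, ~{eta_h:.1f} h remaining")
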
[08:59:54] Epoch: 1 Batch: 14495/38378 (37.77%) Loss: 2.090565 LR: 0.00004056 [08:59:56] Epoch: 1 Batch: 14496/38378 (37.77%) Loss: 2.010163 LR: 0.00004056 [08:59:58] Epoch: 1 Batch: 14497/38378 (37.77%) Loss: 1.840321 LR: 0.00004056 [08:59:59] Epoch: 1 Batch: 14498/38378 (37.78%) Loss: 1.888564 LR: 0.00004056 [09:00:01] Epoch: 1 Batch: 14499/38378 (37.78%) Loss: 2.133784 LR: 0.00004055 [09:00:03] >> Evaluating batch 0 [09:00:04] >> Evaluating batch 1 [09:00:05] >> Evaluating batch 2 [09:00:06] >> Evaluating batch 3 [09:00:07] >> Evaluating batch 4 [09:00:07] >> Evaluating batch 5 [09:00:08] >> Evaluating batch 6 [09:00:09] >> Evaluating batch 7 [09:00:10] >> Evaluating batch 8 [09:00:11] >> Evaluating batch 9 [09:00:12] >> Evaluating batch 10 [09:00:13] >> Evaluating batch 11 [09:00:14] >> Evaluating batch 12 [09:00:15] >> Evaluating batch 13 [09:00:16] >> Evaluating batch 14 [09:00:17] >> Evaluating batch 15 [09:00:18] >> Evaluating batch 16 [09:00:19] Epoch: 1 Step: 14500/38378 Evaluation: [09:00:19] Avg Loss Since Last Eval: 1.9865 Val Loss: 2.0976 Validation loss delta: -0.0052 Perplexity: 8.1470 LR: 0.00004055 [09:00:23] >> Cleaned up old temp checkpoint: epoch1_step13500 [09:00:23] >> Temp checkpoint saved: epoch1_step14500, size: 0.1702 GB [09:00:26] >> Checkpoint saved: epoch1_step14500, size: 0.1702 GB [09:00:26] Epoch: 1 Batch: 14500/38378 (37.78%) Loss: 2.077649 LR: 0.00004055 [09:00:28] Epoch: 1 Batch: 14501/38378 (37.78%) Loss: 1.937218 LR: 0.00004055 [09:00:30] Epoch: 1 Batch: 14502/38378 (37.79%) Loss: 1.710662 LR: 0.00004055 [09:00:31] Epoch: 1 Batch: 14503/38378 (37.79%) Loss: 1.754668 LR: 0.00004055 [09:00:33] Epoch: 1 Batch: 14504/38378 (37.79%) Loss: 2.031947 LR: 0.00004055 [09:00:35] Epoch: 1 Batch: 14505/38378 (37.80%) Loss: 2.002779 LR: 0.00004055 [09:00:36] Epoch: 1 Batch: 14506/38378 (37.80%) Loss: 1.998164 LR: 0.00004053 [09:00:38] Epoch: 1 Batch: 14507/38378 (37.80%) Loss: 1.956919 LR: 0.00004053 [09:00:40] Epoch: 1 Batch: 14508/38378 (37.80%) Loss: 1.796239 LR: 0.00004053 [09:00:41] Epoch: 1 Batch: 14509/38378 (37.81%) Loss: 1.737497 LR: 0.00004053 [09:00:43] Epoch: 1 Batch: 14510/38378 (37.81%) Loss: 2.052632 LR: 0.00004053 [09:00:45] Epoch: 1 Batch: 14511/38378 (37.81%) Loss: 1.974737 LR: 0.00004053 [09:00:47] Epoch: 1 Batch: 14512/38378 (37.81%) Loss: 1.847116 LR: 0.00004053 [09:00:48] Epoch: 1 Batch: 14513/38378 (37.82%) Loss: 2.031851 LR: 0.00004052 [09:00:50] Epoch: 1 Batch: 14514/38378 (37.82%) Loss: 1.649762 LR: 0.00004052 [09:00:52] Epoch: 1 Batch: 14515/38378 (37.82%) Loss: 2.258981 LR: 0.00004052 [09:00:54] Epoch: 1 Batch: 14516/38378 (37.82%) Loss: 2.174098 LR: 0.00004052 [09:00:55] Epoch: 1 Batch: 14517/38378 (37.83%) Loss: 2.007590 LR: 0.00004052 [09:00:57] Epoch: 1 Batch: 14518/38378 (37.83%) Loss: 2.148902 LR: 0.00004052 [09:00:59] Epoch: 1 Batch: 14519/38378 (37.83%) Loss: 1.879970 LR: 0.00004052 [09:01:00] Epoch: 1 Batch: 14520/38378 (37.83%) Loss: 2.058881 LR: 0.00004051 [09:01:02] Epoch: 1 Batch: 14521/38378 (37.84%) Loss: 1.888450 LR: 0.00004051 [09:01:04] Epoch: 1 Batch: 14522/38378 (37.84%) Loss: 1.967558 LR: 0.00004051 [09:01:06] Epoch: 1 Batch: 14523/38378 (37.84%) Loss: 2.027711 LR: 0.00004051 [09:01:07] Epoch: 1 Batch: 14524/38378 (37.84%) Loss: 1.768133 LR: 0.00004051 [09:01:09] Epoch: 1 Batch: 14525/38378 (37.85%) Loss: 1.888111 LR: 0.00004051 [09:01:11] Epoch: 1 Batch: 14526/38378 (37.85%) Loss: 1.998325 LR: 0.00004051 [09:01:12] Epoch: 1 Batch: 14527/38378 (37.85%) Loss: 1.957668 LR: 0.00004050 [09:01:14] Epoch: 1 Batch:
14528/38378 (37.86%) Loss: 2.261826 LR: 0.00004050 [09:01:16] Epoch: 1 Batch: 14529/38378 (37.86%) Loss: 2.155173 LR: 0.00004050 [09:01:18] Epoch: 1 Batch: 14530/38378 (37.86%) Loss: 1.777728 LR: 0.00004050 [09:01:19] Epoch: 1 Batch: 14531/38378 (37.86%) Loss: 1.907265 LR: 0.00004050 [09:01:21] Epoch: 1 Batch: 14532/38378 (37.87%) Loss: 1.973141 LR: 0.00004050 [09:01:23] Epoch: 1 Batch: 14533/38378 (37.87%) Loss: 1.990050 LR: 0.00004050 [09:01:24] Epoch: 1 Batch: 14534/38378 (37.87%) Loss: 1.795326 LR: 0.00004049 [09:01:26] Epoch: 1 Batch: 14535/38378 (37.87%) Loss: 2.221948 LR: 0.00004049 [09:01:28] Epoch: 1 Batch: 14536/38378 (37.88%) Loss: 1.956899 LR: 0.00004049 [09:01:29] Epoch: 1 Batch: 14537/38378 (37.88%) Loss: 1.962444 LR: 0.00004049 [09:01:31] Epoch: 1 Batch: 14538/38378 (37.88%) Loss: 2.001519 LR: 0.00004049 [09:01:33] Epoch: 1 Batch: 14539/38378 (37.88%) Loss: 1.819154 LR: 0.00004049 [09:01:34] Epoch: 1 Batch: 14540/38378 (37.89%) Loss: 1.919253 LR: 0.00004049 [09:01:36] Epoch: 1 Batch: 14541/38378 (37.89%) Loss: 2.026200 LR: 0.00004048 [09:01:38] Epoch: 1 Batch: 14542/38378 (37.89%) Loss: 1.997013 LR: 0.00004048 [09:01:39] Epoch: 1 Batch: 14543/38378 (37.89%) Loss: 1.942770 LR: 0.00004048 [09:01:41] Epoch: 1 Batch: 14544/38378 (37.90%) Loss: 1.958537 LR: 0.00004048 [09:01:43] Epoch: 1 Batch: 14545/38378 (37.90%) Loss: 2.256243 LR: 0.00004048 [09:01:44] Epoch: 1 Batch: 14546/38378 (37.90%) Loss: 1.806481 LR: 0.00004048 [09:01:46] Epoch: 1 Batch: 14547/38378 (37.90%) Loss: 2.099163 LR: 0.00004048 [09:01:48] Epoch: 1 Batch: 14548/38378 (37.91%) Loss: 1.815585 LR: 0.00004047 [09:01:50] Epoch: 1 Batch: 14549/38378 (37.91%) Loss: 2.386075 LR: 0.00004047 [09:01:51] Epoch: 1 Batch: 14550/38378 (37.91%) Loss: 1.804871 LR: 0.00004047 [09:01:53] Epoch: 1 Batch: 14551/38378 (37.91%) Loss: 2.003318 LR: 0.00004047 [09:01:55] Epoch: 1 Batch: 14552/38378 (37.92%) Loss: 2.161457 LR: 0.00004047 [09:01:56] Epoch: 1 Batch: 14553/38378 (37.92%) Loss: 2.087500 LR: 0.00004047 [09:01:58] Epoch: 1 Batch: 14554/38378 (37.92%) Loss: 1.690211 LR: 0.00004047 [09:02:00] Epoch: 1 Batch: 14555/38378 (37.93%) Loss: 1.732552 LR: 0.00004046 [09:02:01] Epoch: 1 Batch: 14556/38378 (37.93%) Loss: 1.954910 LR: 0.00004046 [09:02:03] Epoch: 1 Batch: 14557/38378 (37.93%) Loss: 1.867820 LR: 0.00004046 [09:02:05] Epoch: 1 Batch: 14558/38378 (37.93%) Loss: 2.027873 LR: 0.00004046 [09:02:06] Epoch: 1 Batch: 14559/38378 (37.94%) Loss: 2.167745 LR: 0.00004046 [09:02:08] Epoch: 1 Batch: 14560/38378 (37.94%) Loss: 1.841649 LR: 0.00004046 [09:02:10] Epoch: 1 Batch: 14561/38378 (37.94%) Loss: 1.982371 LR: 0.00004046 [09:02:11] Epoch: 1 Batch: 14562/38378 (37.94%) Loss: 2.205670 LR: 0.00004045 [09:02:13] Epoch: 1 Batch: 14563/38378 (37.95%) Loss: 1.907394 LR: 0.00004045 [09:02:15] Epoch: 1 Batch: 14564/38378 (37.95%) Loss: 2.154145 LR: 0.00004045 [09:02:16] Epoch: 1 Batch: 14565/38378 (37.95%) Loss: 2.103556 LR: 0.00004045 [09:02:18] Epoch: 1 Batch: 14566/38378 (37.95%) Loss: 1.914074 LR: 0.00004045 [09:02:20] Epoch: 1 Batch: 14567/38378 (37.96%) Loss: 1.949524 LR: 0.00004045 [09:02:21] Epoch: 1 Batch: 14568/38378 (37.96%) Loss: 2.120600 LR: 0.00004045 [09:02:23] Epoch: 1 Batch: 14569/38378 (37.96%) Loss: 2.006061 LR: 0.00004044 [09:02:25] Epoch: 1 Batch: 14570/38378 (37.96%) Loss: 2.120045 LR: 0.00004044 [09:02:27] Epoch: 1 Batch: 14571/38378 (37.97%) Loss: 2.121382 LR: 0.00004044 [09:02:28] Epoch: 1 Batch: 14572/38378 (37.97%) Loss: 2.117405 LR: 0.00004044 [09:02:30] Epoch: 1 Batch: 14573/38378 (37.97%) Loss: 2.218222 LR: 
0.00004044 [09:02:32] Epoch: 1 Batch: 14574/38378 (37.97%) Loss: 2.015608 LR: 0.00004044 [09:02:33] Epoch: 1 Batch: 14575/38378 (37.98%) Loss: 2.135591 LR: 0.00004044 [09:02:35] Epoch: 1 Batch: 14576/38378 (37.98%) Loss: 2.012451 LR: 0.00004043 [09:02:37] Epoch: 1 Batch: 14577/38378 (37.98%) Loss: 2.024131 LR: 0.00004043 [09:02:38] Epoch: 1 Batch: 14578/38378 (37.99%) Loss: 2.204821 LR: 0.00004043 [09:02:40] Epoch: 1 Batch: 14579/38378 (37.99%) Loss: 1.982951 LR: 0.00004043 [09:02:42] Epoch: 1 Batch: 14580/38378 (37.99%) Loss: 1.994311 LR: 0.00004043 [09:02:43] Epoch: 1 Batch: 14581/38378 (37.99%) Loss: 1.938296 LR: 0.00004043 [09:02:45] Epoch: 1 Batch: 14582/38378 (38.00%) Loss: 2.099901 LR: 0.00004043 [09:02:47] Epoch: 1 Batch: 14583/38378 (38.00%) Loss: 1.754488 LR: 0.00004042 [09:02:48] Epoch: 1 Batch: 14584/38378 (38.00%) Loss: 2.095478 LR: 0.00004042 [09:02:50] Epoch: 1 Batch: 14585/38378 (38.00%) Loss: 1.953059 LR: 0.00004042 [09:02:52] Epoch: 1 Batch: 14586/38378 (38.01%) Loss: 2.121562 LR: 0.00004042 [09:02:54] Epoch: 1 Batch: 14587/38378 (38.01%) Loss: 2.029691 LR: 0.00004042 [09:02:55] Epoch: 1 Batch: 14588/38378 (38.01%) Loss: 1.986880 LR: 0.00004042 [09:02:57] Epoch: 1 Batch: 14589/38378 (38.01%) Loss: 2.084825 LR: 0.00004042 [09:02:59] Epoch: 1 Batch: 14590/38378 (38.02%) Loss: 1.882142 LR: 0.00004041 [09:03:00] Epoch: 1 Batch: 14591/38378 (38.02%) Loss: 1.867550 LR: 0.00004041 [09:03:02] Epoch: 1 Batch: 14592/38378 (38.02%) Loss: 1.938403 LR: 0.00004041 [09:03:04] Epoch: 1 Batch: 14593/38378 (38.02%) Loss: 1.637001 LR: 0.00004041 [09:03:05] Epoch: 1 Batch: 14594/38378 (38.03%) Loss: 1.724495 LR: 0.00004041 [09:03:07] Epoch: 1 Batch: 14595/38378 (38.03%) Loss: 1.888038 LR: 0.00004041 [09:03:09] Epoch: 1 Batch: 14596/38378 (38.03%) Loss: 2.069240 LR: 0.00004041 [09:03:11] Epoch: 1 Batch: 14597/38378 (38.03%) Loss: 1.756023 LR: 0.00004040 [09:03:12] Epoch: 1 Batch: 14598/38378 (38.04%) Loss: 1.942851 LR: 0.00004040 [09:03:14] Epoch: 1 Batch: 14599/38378 (38.04%) Loss: 1.984066 LR: 0.00004040 [09:03:20] >> Cleaned up old temp checkpoint: epoch1_step13600 [09:03:20] >> Temp checkpoint saved: epoch1_step14600, size: 0.1702 GB [09:03:20] Epoch: 1 Batch: 14600/38378 (38.04%) Loss: 2.107461 LR: 0.00004040 [09:03:21] Epoch: 1 Batch: 14601/38378 (38.05%) Loss: 2.192027 LR: 0.00004040 [09:03:23] Epoch: 1 Batch: 14602/38378 (38.05%) Loss: 2.115925 LR: 0.00004040 [09:03:24] Epoch: 1 Batch: 14603/38378 (38.05%) Loss: 2.154989 LR: 0.00004040 [09:03:26] Epoch: 1 Batch: 14604/38378 (38.05%) Loss: 1.922128 LR: 0.00004039 [09:03:28] Epoch: 1 Batch: 14605/38378 (38.06%) Loss: 2.209035 LR: 0.00004039 [09:03:30] Epoch: 1 Batch: 14606/38378 (38.06%) Loss: 1.661954 LR: 0.00004039 [09:03:31] Epoch: 1 Batch: 14607/38378 (38.06%) Loss: 2.210438 LR: 0.00004039 [09:03:33] Epoch: 1 Batch: 14608/38378 (38.06%) Loss: 2.172594 LR: 0.00004039 [09:03:35] Epoch: 1 Batch: 14609/38378 (38.07%) Loss: 2.074316 LR: 0.00004039 [09:03:36] Epoch: 1 Batch: 14610/38378 (38.07%) Loss: 1.788779 LR: 0.00004039 [09:03:38] Epoch: 1 Batch: 14611/38378 (38.07%) Loss: 2.168020 LR: 0.00004038 [09:03:40] Epoch: 1 Batch: 14612/38378 (38.07%) Loss: 1.962879 LR: 0.00004038 [09:03:41] Epoch: 1 Batch: 14613/38378 (38.08%) Loss: 1.903512 LR: 0.00004038 [09:03:43] Epoch: 1 Batch: 14614/38378 (38.08%) Loss: 2.033882 LR: 0.00004038 [09:03:45] Epoch: 1 Batch: 14615/38378 (38.08%) Loss: 2.067969 LR: 0.00004038 [09:03:47] Epoch: 1 Batch: 14616/38378 (38.08%) Loss: 1.852801 LR: 0.00004038 [09:03:48] Epoch: 1 Batch: 14617/38378 (38.09%) Loss: 
2.108028 LR: 0.00004038 [09:03:50] Epoch: 1 Batch: 14618/38378 (38.09%) Loss: 2.110875 LR: 0.00004036 [09:03:52] Epoch: 1 Batch: 14619/38378 (38.09%) Loss: 1.944660 LR: 0.00004036 [09:03:53] Epoch: 1 Batch: 14620/38378 (38.09%) Loss: 2.080363 LR: 0.00004036 [09:03:55] Epoch: 1 Batch: 14621/38378 (38.10%) Loss: 1.940733 LR: 0.00004036 [09:03:57] Epoch: 1 Batch: 14622/38378 (38.10%) Loss: 1.876921 LR: 0.00004036 [09:03:59] Epoch: 1 Batch: 14623/38378 (38.10%) Loss: 1.839813 LR: 0.00004036 [09:04:00] Epoch: 1 Batch: 14624/38378 (38.11%) Loss: 2.012394 LR: 0.00004036 [09:04:02] Epoch: 1 Batch: 14625/38378 (38.11%) Loss: 2.281278 LR: 0.00004035 [09:04:04] Epoch: 1 Batch: 14626/38378 (38.11%) Loss: 2.139730 LR: 0.00004035 [09:04:05] Epoch: 1 Batch: 14627/38378 (38.11%) Loss: 2.186132 LR: 0.00004035 [09:04:07] Epoch: 1 Batch: 14628/38378 (38.12%) Loss: 1.977589 LR: 0.00004035 [09:04:09] Epoch: 1 Batch: 14629/38378 (38.12%) Loss: 1.982386 LR: 0.00004035 [09:04:11] Epoch: 1 Batch: 14630/38378 (38.12%) Loss: 2.095241 LR: 0.00004035 [09:04:12] Epoch: 1 Batch: 14631/38378 (38.12%) Loss: 2.002750 LR: 0.00004035 [09:04:14] Epoch: 1 Batch: 14632/38378 (38.13%) Loss: 2.140681 LR: 0.00004034 [09:04:16] Epoch: 1 Batch: 14633/38378 (38.13%) Loss: 2.086878 LR: 0.00004034 [09:04:17] Epoch: 1 Batch: 14634/38378 (38.13%) Loss: 2.009509 LR: 0.00004034 [09:04:19] Epoch: 1 Batch: 14635/38378 (38.13%) Loss: 1.913489 LR: 0.00004034 [09:04:21] Epoch: 1 Batch: 14636/38378 (38.14%) Loss: 1.849203 LR: 0.00004034 [09:04:22] Epoch: 1 Batch: 14637/38378 (38.14%) Loss: 1.756220 LR: 0.00004034 [09:04:24] Epoch: 1 Batch: 14638/38378 (38.14%) Loss: 1.964098 LR: 0.00004034 [09:04:26] Epoch: 1 Batch: 14639/38378 (38.14%) Loss: 1.992920 LR: 0.00004033 [09:04:28] Epoch: 1 Batch: 14640/38378 (38.15%) Loss: 2.102508 LR: 0.00004033 [09:04:29] Epoch: 1 Batch: 14641/38378 (38.15%) Loss: 2.296534 LR: 0.00004033 [09:04:31] Epoch: 1 Batch: 14642/38378 (38.15%) Loss: 1.641020 LR: 0.00004033 [09:04:33] Epoch: 1 Batch: 14643/38378 (38.15%) Loss: 1.700224 LR: 0.00004033 [09:04:34] Epoch: 1 Batch: 14644/38378 (38.16%) Loss: 2.108592 LR: 0.00004033 [09:04:36] Epoch: 1 Batch: 14645/38378 (38.16%) Loss: 1.788154 LR: 0.00004033 [09:04:38] Epoch: 1 Batch: 14646/38378 (38.16%) Loss: 1.866322 LR: 0.00004032 [09:04:39] Epoch: 1 Batch: 14647/38378 (38.17%) Loss: 2.179607 LR: 0.00004032 [09:04:41] Epoch: 1 Batch: 14648/38378 (38.17%) Loss: 1.945124 LR: 0.00004032 [09:04:43] Epoch: 1 Batch: 14649/38378 (38.17%) Loss: 1.984725 LR: 0.00004032 [09:04:44] Epoch: 1 Batch: 14650/38378 (38.17%) Loss: 1.992837 LR: 0.00004032 [09:04:46] Epoch: 1 Batch: 14651/38378 (38.18%) Loss: 2.084236 LR: 0.00004032 [09:04:48] Epoch: 1 Batch: 14652/38378 (38.18%) Loss: 1.712936 LR: 0.00004032 [09:04:49] Epoch: 1 Batch: 14653/38378 (38.18%) Loss: 1.963642 LR: 0.00004031 [09:04:51] Epoch: 1 Batch: 14654/38378 (38.18%) Loss: 1.908322 LR: 0.00004031 [09:04:53] Epoch: 1 Batch: 14655/38378 (38.19%) Loss: 2.020675 LR: 0.00004031 [09:04:54] Epoch: 1 Batch: 14656/38378 (38.19%) Loss: 1.768259 LR: 0.00004031 [09:04:56] Epoch: 1 Batch: 14657/38378 (38.19%) Loss: 1.928561 LR: 0.00004031 [09:04:58] Epoch: 1 Batch: 14658/38378 (38.19%) Loss: 1.856420 LR: 0.00004031 [09:04:59] Epoch: 1 Batch: 14659/38378 (38.20%) Loss: 2.085433 LR: 0.00004031 [09:05:01] Epoch: 1 Batch: 14660/38378 (38.20%) Loss: 2.168680 LR: 0.00004030 [09:05:03] Epoch: 1 Batch: 14661/38378 (38.20%) Loss: 2.068728 LR: 0.00004030 [09:05:05] Epoch: 1 Batch: 14662/38378 (38.20%) Loss: 2.026294 LR: 0.00004030 [09:05:06] Epoch: 1 
Batch: 14663/38378 (38.21%) Loss: 2.099842 LR: 0.00004030 [09:05:08] Epoch: 1 Batch: 14664/38378 (38.21%) Loss: 1.981096 LR: 0.00004030 [09:05:10] Epoch: 1 Batch: 14665/38378 (38.21%) Loss: 1.856624 LR: 0.00004030 [09:05:11] Epoch: 1 Batch: 14666/38378 (38.21%) Loss: 1.949635 LR: 0.00004030 [09:05:13] Epoch: 1 Batch: 14667/38378 (38.22%) Loss: 2.026553 LR: 0.00004029 [09:05:15] Epoch: 1 Batch: 14668/38378 (38.22%) Loss: 1.883747 LR: 0.00004029 [09:05:16] Epoch: 1 Batch: 14669/38378 (38.22%) Loss: 1.864311 LR: 0.00004029 [09:05:18] Epoch: 1 Batch: 14670/38378 (38.23%) Loss: 2.002241 LR: 0.00004029 [09:05:20] Epoch: 1 Batch: 14671/38378 (38.23%) Loss: 1.825272 LR: 0.00004029 [09:05:21] Epoch: 1 Batch: 14672/38378 (38.23%) Loss: 2.152194 LR: 0.00004029 [09:05:23] Epoch: 1 Batch: 14673/38378 (38.23%) Loss: 2.025621 LR: 0.00004029 [09:05:25] Epoch: 1 Batch: 14674/38378 (38.24%) Loss: 1.924220 LR: 0.00004028 [09:05:26] Epoch: 1 Batch: 14675/38378 (38.24%) Loss: 1.758894 LR: 0.00004028 [09:05:28] Epoch: 1 Batch: 14676/38378 (38.24%) Loss: 1.763358 LR: 0.00004028 [09:05:30] Epoch: 1 Batch: 14677/38378 (38.24%) Loss: 1.886801 LR: 0.00004028 [09:05:31] Epoch: 1 Batch: 14678/38378 (38.25%) Loss: 1.778190 LR: 0.00004028 [09:05:33] Epoch: 1 Batch: 14679/38378 (38.25%) Loss: 1.950870 LR: 0.00004028 [09:05:35] Epoch: 1 Batch: 14680/38378 (38.25%) Loss: 1.928307 LR: 0.00004028 [09:05:37] Epoch: 1 Batch: 14681/38378 (38.25%) Loss: 2.411923 LR: 0.00004027 [09:05:38] Epoch: 1 Batch: 14682/38378 (38.26%) Loss: 2.006035 LR: 0.00004027 [09:05:40] Epoch: 1 Batch: 14683/38378 (38.26%) Loss: 2.018274 LR: 0.00004027 [09:05:41] Epoch: 1 Batch: 14684/38378 (38.26%) Loss: 2.400810 LR: 0.00004027 [09:05:43] Epoch: 1 Batch: 14685/38378 (38.26%) Loss: 1.862584 LR: 0.00004027 [09:05:45] Epoch: 1 Batch: 14686/38378 (38.27%) Loss: 2.221119 LR: 0.00004027 [09:05:46] Epoch: 1 Batch: 14687/38378 (38.27%) Loss: 2.002499 LR: 0.00004027 [09:05:48] Epoch: 1 Batch: 14688/38378 (38.27%) Loss: 1.915398 LR: 0.00004026 [09:05:50] Epoch: 1 Batch: 14689/38378 (38.27%) Loss: 1.985493 LR: 0.00004026 [09:05:51] Epoch: 1 Batch: 14690/38378 (38.28%) Loss: 2.146860 LR: 0.00004026 [09:05:53] Epoch: 1 Batch: 14691/38378 (38.28%) Loss: 1.803572 LR: 0.00004026 [09:05:55] Epoch: 1 Batch: 14692/38378 (38.28%) Loss: 1.793032 LR: 0.00004026 [09:05:57] Epoch: 1 Batch: 14693/38378 (38.28%) Loss: 2.026979 LR: 0.00004026 [09:05:58] Epoch: 1 Batch: 14694/38378 (38.29%) Loss: 1.898963 LR: 0.00004026 [09:06:00] Epoch: 1 Batch: 14695/38378 (38.29%) Loss: 1.854731 LR: 0.00004025 [09:06:02] Epoch: 1 Batch: 14696/38378 (38.29%) Loss: 2.045487 LR: 0.00004025 [09:06:03] Epoch: 1 Batch: 14697/38378 (38.30%) Loss: 2.047436 LR: 0.00004025 [09:06:05] Epoch: 1 Batch: 14698/38378 (38.30%) Loss: 1.937616 LR: 0.00004025 [09:06:07] Epoch: 1 Batch: 14699/38378 (38.30%) Loss: 2.263754 LR: 0.00004025 [09:06:12] >> Cleaned up old temp checkpoint: epoch1_step13700 [09:06:12] >> Temp checkpoint saved: epoch1_step14700, size: 0.1702 GB [09:06:12] Epoch: 1 Batch: 14700/38378 (38.30%) Loss: 1.905801 LR: 0.00004025 [09:06:14] Epoch: 1 Batch: 14701/38378 (38.31%) Loss: 1.973854 LR: 0.00004025 [09:06:16] Epoch: 1 Batch: 14702/38378 (38.31%) Loss: 2.028045 LR: 0.00004024 [09:06:17] Epoch: 1 Batch: 14703/38378 (38.31%) Loss: 2.046337 LR: 0.00004024 [09:06:19] Epoch: 1 Batch: 14704/38378 (38.31%) Loss: 1.736462 LR: 0.00004024 [09:06:21] Epoch: 1 Batch: 14705/38378 (38.32%) Loss: 2.265738 LR: 0.00004024 [09:06:22] Epoch: 1 Batch: 14706/38378 (38.32%) Loss: 1.766732 LR: 0.00004024 
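
In the step-14500 evaluation further above, "Validation loss delta" is simply current minus previous validation loss (2.0976 - 2.1029 = -0.0053, i.e. the logged -0.0052 once the unrounded values are subtracted), and "Avg Loss Since Last Eval" is the mean training loss over the 500 batches since step 14000. A sketch of that bookkeeping, with hypothetical names:

    train_losses = []        # appended after every training batch
    prev_val_loss = 2.1029   # from the step-14000 evaluation

    def eval_summary(val_loss):
        global prev_val_loss
        avg_train = sum(train_losses) / max(1, len(train_losses))
        delta = val_loss - prev_val_loss
        print(f"Avg Loss Since Last Eval: {avg_train:.4f} Val Loss: {val_loss:.4f} "
              f"Validation loss delta: {delta:.4f}")
        prev_val_loss = val_loss
        train_losses.clear()
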
[09:06:24] Epoch: 1 Batch: 14707/38378 (38.32%) Loss: 2.039680 LR: 0.00004024 [09:06:26] Epoch: 1 Batch: 14708/38378 (38.32%) Loss: 2.135211 LR: 0.00004024 [09:06:27] Epoch: 1 Batch: 14709/38378 (38.33%) Loss: 1.821069 LR: 0.00004023 [09:06:29] Epoch: 1 Batch: 14710/38378 (38.33%) Loss: 1.941846 LR: 0.00004023 [09:06:31] Epoch: 1 Batch: 14711/38378 (38.33%) Loss: 2.381259 LR: 0.00004023 [09:06:32] Epoch: 1 Batch: 14712/38378 (38.33%) Loss: 1.919920 LR: 0.00004023 [09:06:34] Epoch: 1 Batch: 14713/38378 (38.34%) Loss: 1.952282 LR: 0.00004023 [09:06:36] Epoch: 1 Batch: 14714/38378 (38.34%) Loss: 2.175158 LR: 0.00004023 [09:06:38] Epoch: 1 Batch: 14715/38378 (38.34%) Loss: 2.241714 LR: 0.00004023 [09:06:39] Epoch: 1 Batch: 14716/38378 (38.34%) Loss: 1.952362 LR: 0.00004022 [09:06:41] Epoch: 1 Batch: 14717/38378 (38.35%) Loss: 1.940992 LR: 0.00004022 [09:06:43] Epoch: 1 Batch: 14718/38378 (38.35%) Loss: 1.859706 LR: 0.00004022 [09:06:44] Epoch: 1 Batch: 14719/38378 (38.35%) Loss: 1.943483 LR: 0.00004022 [09:06:46] Epoch: 1 Batch: 14720/38378 (38.36%) Loss: 2.256970 LR: 0.00004022 [09:06:48] Epoch: 1 Batch: 14721/38378 (38.36%) Loss: 1.996636 LR: 0.00004022 [09:06:50] Epoch: 1 Batch: 14722/38378 (38.36%) Loss: 2.190830 LR: 0.00004022 [09:06:51] Epoch: 1 Batch: 14723/38378 (38.36%) Loss: 1.837754 LR: 0.00004020 [09:06:53] Epoch: 1 Batch: 14724/38378 (38.37%) Loss: 2.216236 LR: 0.00004020 [09:06:55] Epoch: 1 Batch: 14725/38378 (38.37%) Loss: 2.048474 LR: 0.00004020 [09:06:56] Epoch: 1 Batch: 14726/38378 (38.37%) Loss: 2.065479 LR: 0.00004020 [09:06:58] Epoch: 1 Batch: 14727/38378 (38.37%) Loss: 2.050594 LR: 0.00004020 [09:07:00] Epoch: 1 Batch: 14728/38378 (38.38%) Loss: 1.885468 LR: 0.00004020 [09:07:01] Epoch: 1 Batch: 14729/38378 (38.38%) Loss: 2.132154 LR: 0.00004020 [09:07:03] Epoch: 1 Batch: 14730/38378 (38.38%) Loss: 1.888373 LR: 0.00004019 [09:07:05] Epoch: 1 Batch: 14731/38378 (38.38%) Loss: 1.826029 LR: 0.00004019 [09:07:06] Epoch: 1 Batch: 14732/38378 (38.39%) Loss: 2.276585 LR: 0.00004019 [09:07:08] Epoch: 1 Batch: 14733/38378 (38.39%) Loss: 1.990747 LR: 0.00004019 [09:07:10] Epoch: 1 Batch: 14734/38378 (38.39%) Loss: 2.039831 LR: 0.00004019 [09:07:12] Epoch: 1 Batch: 14735/38378 (38.39%) Loss: 1.805807 LR: 0.00004019 [09:07:13] Epoch: 1 Batch: 14736/38378 (38.40%) Loss: 1.976609 LR: 0.00004019 [09:07:15] Epoch: 1 Batch: 14737/38378 (38.40%) Loss: 1.968153 LR: 0.00004018 [09:07:17] Epoch: 1 Batch: 14738/38378 (38.40%) Loss: 1.741449 LR: 0.00004018 [09:07:18] Epoch: 1 Batch: 14739/38378 (38.40%) Loss: 2.074235 LR: 0.00004018 [09:07:20] Epoch: 1 Batch: 14740/38378 (38.41%) Loss: 1.976585 LR: 0.00004018 [09:07:22] Epoch: 1 Batch: 14741/38378 (38.41%) Loss: 2.083776 LR: 0.00004018 [09:07:23] Epoch: 1 Batch: 14742/38378 (38.41%) Loss: 1.891833 LR: 0.00004018 [09:07:25] Epoch: 1 Batch: 14743/38378 (38.42%) Loss: 1.916242 LR: 0.00004018 [09:07:27] Epoch: 1 Batch: 14744/38378 (38.42%) Loss: 1.785023 LR: 0.00004017 [09:07:28] Epoch: 1 Batch: 14745/38378 (38.42%) Loss: 2.049610 LR: 0.00004017 [09:07:30] Epoch: 1 Batch: 14746/38378 (38.42%) Loss: 1.977073 LR: 0.00004017 [09:07:32] Epoch: 1 Batch: 14747/38378 (38.43%) Loss: 1.823968 LR: 0.00004017 [09:07:33] Epoch: 1 Batch: 14748/38378 (38.43%) Loss: 2.398192 LR: 0.00004017 [09:07:35] Epoch: 1 Batch: 14749/38378 (38.43%) Loss: 2.159477 LR: 0.00004017 [09:07:37] Epoch: 1 Batch: 14750/38378 (38.43%) Loss: 1.752344 LR: 0.00004017 [09:07:38] Epoch: 1 Batch: 14751/38378 (38.44%) Loss: 2.044733 LR: 0.00004016 [09:07:40] Epoch: 1 Batch: 14752/38378 
(38.44%) Loss: 2.167455 LR: 0.00004016 [09:07:42] Epoch: 1 Batch: 14753/38378 (38.44%) Loss: 2.217559 LR: 0.00004016 [09:07:44] Epoch: 1 Batch: 14754/38378 (38.44%) Loss: 2.206395 LR: 0.00004016 [09:07:45] Epoch: 1 Batch: 14755/38378 (38.45%) Loss: 1.884173 LR: 0.00004016 [09:07:47] Epoch: 1 Batch: 14756/38378 (38.45%) Loss: 1.918998 LR: 0.00004016 [09:07:49] Epoch: 1 Batch: 14757/38378 (38.45%) Loss: 1.993076 LR: 0.00004016 [09:07:50] Epoch: 1 Batch: 14758/38378 (38.45%) Loss: 1.874525 LR: 0.00004015 [09:07:52] Epoch: 1 Batch: 14759/38378 (38.46%) Loss: 1.880055 LR: 0.00004015 [09:07:54] Epoch: 1 Batch: 14760/38378 (38.46%) Loss: 1.824050 LR: 0.00004015 [09:07:55] Epoch: 1 Batch: 14761/38378 (38.46%) Loss: 1.953521 LR: 0.00004015 [09:07:57] Epoch: 1 Batch: 14762/38378 (38.46%) Loss: 1.836919 LR: 0.00004015 [09:07:59] Epoch: 1 Batch: 14763/38378 (38.47%) Loss: 2.082318 LR: 0.00004015 [09:08:00] Epoch: 1 Batch: 14764/38378 (38.47%) Loss: 1.731963 LR: 0.00004015 [09:08:02] Epoch: 1 Batch: 14765/38378 (38.47%) Loss: 2.138981 LR: 0.00004014 [09:08:04] Epoch: 1 Batch: 14766/38378 (38.48%) Loss: 2.142339 LR: 0.00004014 [09:08:06] Epoch: 1 Batch: 14767/38378 (38.48%) Loss: 2.158898 LR: 0.00004014 [09:08:07] Epoch: 1 Batch: 14768/38378 (38.48%) Loss: 1.737517 LR: 0.00004014 [09:08:09] Epoch: 1 Batch: 14769/38378 (38.48%) Loss: 2.047000 LR: 0.00004014 [09:08:11] Epoch: 1 Batch: 14770/38378 (38.49%) Loss: 1.971644 LR: 0.00004014 [09:08:12] Epoch: 1 Batch: 14771/38378 (38.49%) Loss: 2.308251 LR: 0.00004014 [09:08:14] Epoch: 1 Batch: 14772/38378 (38.49%) Loss: 2.196134 LR: 0.00004013 [09:08:16] Epoch: 1 Batch: 14773/38378 (38.49%) Loss: 1.995301 LR: 0.00004013 [09:08:17] Epoch: 1 Batch: 14774/38378 (38.50%) Loss: 1.897173 LR: 0.00004013 [09:08:19] Epoch: 1 Batch: 14775/38378 (38.50%) Loss: 1.938120 LR: 0.00004013 [09:08:21] Epoch: 1 Batch: 14776/38378 (38.50%) Loss: 2.200788 LR: 0.00004013 [09:08:22] Epoch: 1 Batch: 14777/38378 (38.50%) Loss: 2.222352 LR: 0.00004013 [09:08:24] Epoch: 1 Batch: 14778/38378 (38.51%) Loss: 1.905211 LR: 0.00004013 [09:08:26] Epoch: 1 Batch: 14779/38378 (38.51%) Loss: 1.832900 LR: 0.00004012 [09:08:27] Epoch: 1 Batch: 14780/38378 (38.51%) Loss: 1.921397 LR: 0.00004012 [09:08:29] Epoch: 1 Batch: 14781/38378 (38.51%) Loss: 2.028288 LR: 0.00004012 [09:08:31] Epoch: 1 Batch: 14782/38378 (38.52%) Loss: 1.879460 LR: 0.00004012 [09:08:33] Epoch: 1 Batch: 14783/38378 (38.52%) Loss: 2.067982 LR: 0.00004012 [09:08:34] Epoch: 1 Batch: 14784/38378 (38.52%) Loss: 1.967851 LR: 0.00004012 [09:08:36] Epoch: 1 Batch: 14785/38378 (38.52%) Loss: 2.309528 LR: 0.00004012 [09:08:38] Epoch: 1 Batch: 14786/38378 (38.53%) Loss: 2.240120 LR: 0.00004011 [09:08:39] Epoch: 1 Batch: 14787/38378 (38.53%) Loss: 1.989986 LR: 0.00004011 [09:08:41] Epoch: 1 Batch: 14788/38378 (38.53%) Loss: 2.195728 LR: 0.00004011 [09:08:43] Epoch: 1 Batch: 14789/38378 (38.54%) Loss: 2.302066 LR: 0.00004011 [09:08:44] Epoch: 1 Batch: 14790/38378 (38.54%) Loss: 1.900804 LR: 0.00004011 [09:08:46] Epoch: 1 Batch: 14791/38378 (38.54%) Loss: 2.102210 LR: 0.00004011 [09:08:48] Epoch: 1 Batch: 14792/38378 (38.54%) Loss: 2.167090 LR: 0.00004011 [09:08:50] Epoch: 1 Batch: 14793/38378 (38.55%) Loss: 1.986107 LR: 0.00004010 [09:08:51] Epoch: 1 Batch: 14794/38378 (38.55%) Loss: 1.963548 LR: 0.00004010 [09:08:53] Epoch: 1 Batch: 14795/38378 (38.55%) Loss: 1.872854 LR: 0.00004010 [09:08:55] Epoch: 1 Batch: 14796/38378 (38.55%) Loss: 1.957218 LR: 0.00004010 [09:08:56] Epoch: 1 Batch: 14797/38378 (38.56%) Loss: 2.165660 LR: 0.00004010 
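Across this stretch the LR column decays smoothly and slowly, from 0.00004030 at batch 14663 to 0.00004010 by batch 14797, which looks like a cosine-style decay toward a floor rather than a linear ramp. A sketch of such a schedule; every number below (peak, floor, warmup, total) is an illustrative assumption, not this run's actual configuration:

import math

def cosine_lr(step, peak=5e-5, floor=1e-5, warmup=400, total=5500):
    # Linear warmup to the peak, then cosine decay down to the floor.
    if step < warmup:
        return peak * step / warmup
    progress = (step - warmup) / (total - warmup)  # 0.0 -> 1.0 after warmup
    return floor + (peak - floor) * 0.5 * (1.0 + math.cos(math.pi * progress))

With these made-up numbers, cosine_lr(2100) returns ~4.0e-05, the same ballpark as the values logged here; in a run using gradient accumulation the step argument would count optimizer steps rather than batches.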
[09:08:58] Epoch: 1 Batch: 14798/38378 (38.56%) Loss: 1.998203 LR: 0.00004010 [09:09:00] Epoch: 1 Batch: 14799/38378 (38.56%) Loss: 2.123793 LR: 0.00004010 [09:09:05] >> Cleaned up old temp checkpoint: epoch1_step13800 [09:09:05] >> Temp checkpoint saved: epoch1_step14800, size: 0.1702 GB [09:09:05] Epoch: 1 Batch: 14800/38378 (38.56%) Loss: 1.801398 LR: 0.00004009 [09:09:07] Epoch: 1 Batch: 14801/38378 (38.57%) Loss: 1.649493 LR: 0.00004009 [09:09:09] Epoch: 1 Batch: 14802/38378 (38.57%) Loss: 1.952905 LR: 0.00004009 [09:09:10] Epoch: 1 Batch: 14803/38378 (38.57%) Loss: 2.151158 LR: 0.00004009 [09:09:12] Epoch: 1 Batch: 14804/38378 (38.57%) Loss: 2.017785 LR: 0.00004009 [09:09:14] Epoch: 1 Batch: 14805/38378 (38.58%) Loss: 1.572686 LR: 0.00004009 [09:09:15] Epoch: 1 Batch: 14806/38378 (38.58%) Loss: 1.991232 LR: 0.00004009 [09:09:17] Epoch: 1 Batch: 14807/38378 (38.58%) Loss: 1.765221 LR: 0.00004008 [09:09:19] Epoch: 1 Batch: 14808/38378 (38.58%) Loss: 1.969220 LR: 0.00004008 [09:09:20] Epoch: 1 Batch: 14809/38378 (38.59%) Loss: 2.029487 LR: 0.00004008 [09:09:22] Epoch: 1 Batch: 14810/38378 (38.59%) Loss: 1.888939 LR: 0.00004008 [09:09:24] Epoch: 1 Batch: 14811/38378 (38.59%) Loss: 1.935952 LR: 0.00004008 [09:09:25] Epoch: 1 Batch: 14812/38378 (38.60%) Loss: 2.041296 LR: 0.00004008 [09:09:27] Epoch: 1 Batch: 14813/38378 (38.60%) Loss: 2.295663 LR: 0.00004008 [09:09:29] Epoch: 1 Batch: 14814/38378 (38.60%) Loss: 1.948259 LR: 0.00004006 [09:09:31] Epoch: 1 Batch: 14815/38378 (38.60%) Loss: 2.217445 LR: 0.00004006 [09:09:32] Epoch: 1 Batch: 14816/38378 (38.61%) Loss: 2.056223 LR: 0.00004006 [09:09:34] Epoch: 1 Batch: 14817/38378 (38.61%) Loss: 2.098460 LR: 0.00004006 [09:09:36] Epoch: 1 Batch: 14818/38378 (38.61%) Loss: 2.034055 LR: 0.00004006 [09:09:37] Epoch: 1 Batch: 14819/38378 (38.61%) Loss: 2.169722 LR: 0.00004006 [09:09:39] Epoch: 1 Batch: 14820/38378 (38.62%) Loss: 2.192847 LR: 0.00004006 [09:09:41] Epoch: 1 Batch: 14821/38378 (38.62%) Loss: 2.200372 LR: 0.00004005 [09:09:43] Epoch: 1 Batch: 14822/38378 (38.62%) Loss: 2.260610 LR: 0.00004005 [09:09:44] Epoch: 1 Batch: 14823/38378 (38.62%) Loss: 1.962212 LR: 0.00004005 [09:09:46] Epoch: 1 Batch: 14824/38378 (38.63%) Loss: 2.101617 LR: 0.00004005 [09:09:48] Epoch: 1 Batch: 14825/38378 (38.63%) Loss: 2.091693 LR: 0.00004005 [09:09:49] Epoch: 1 Batch: 14826/38378 (38.63%) Loss: 2.118909 LR: 0.00004005 [09:09:51] Epoch: 1 Batch: 14827/38378 (38.63%) Loss: 1.772300 LR: 0.00004005 [09:09:53] Epoch: 1 Batch: 14828/38378 (38.64%) Loss: 1.703444 LR: 0.00004004 [09:09:54] Epoch: 1 Batch: 14829/38378 (38.64%) Loss: 2.181606 LR: 0.00004004 [09:09:56] Epoch: 1 Batch: 14830/38378 (38.64%) Loss: 2.056917 LR: 0.00004004 [09:09:58] Epoch: 1 Batch: 14831/38378 (38.64%) Loss: 2.108334 LR: 0.00004004 [09:09:59] Epoch: 1 Batch: 14832/38378 (38.65%) Loss: 1.854503 LR: 0.00004004 [09:10:01] Epoch: 1 Batch: 14833/38378 (38.65%) Loss: 1.857272 LR: 0.00004004 [09:10:03] Epoch: 1 Batch: 14834/38378 (38.65%) Loss: 2.097436 LR: 0.00004004 [09:10:05] Epoch: 1 Batch: 14835/38378 (38.65%) Loss: 2.115523 LR: 0.00004003 [09:10:06] Epoch: 1 Batch: 14836/38378 (38.66%) Loss: 2.185187 LR: 0.00004003 [09:10:08] Epoch: 1 Batch: 14837/38378 (38.66%) Loss: 2.235877 LR: 0.00004003 [09:10:10] Epoch: 1 Batch: 14838/38378 (38.66%) Loss: 2.197780 LR: 0.00004003 [09:10:11] Epoch: 1 Batch: 14839/38378 (38.67%) Loss: 2.180712 LR: 0.00004003 [09:10:13] Epoch: 1 Batch: 14840/38378 (38.67%) Loss: 1.919266 LR: 0.00004003 [09:10:15] Epoch: 1 Batch: 14841/38378 (38.67%) Loss: 1.976808 LR: 
0.00004003 [09:10:16] Epoch: 1 Batch: 14842/38378 (38.67%) Loss: 2.199442 LR: 0.00004002 [09:10:18] Epoch: 1 Batch: 14843/38378 (38.68%) Loss: 1.997241 LR: 0.00004002 [09:10:20] Epoch: 1 Batch: 14844/38378 (38.68%) Loss: 2.072623 LR: 0.00004002 [09:10:21] Epoch: 1 Batch: 14845/38378 (38.68%) Loss: 1.928296 LR: 0.00004002 [09:10:23] Epoch: 1 Batch: 14846/38378 (38.68%) Loss: 2.114889 LR: 0.00004002 [09:10:25] Epoch: 1 Batch: 14847/38378 (38.69%) Loss: 1.826025 LR: 0.00004002 [09:10:26] Epoch: 1 Batch: 14848/38378 (38.69%) Loss: 1.996784 LR: 0.00004002 [09:10:28] Epoch: 1 Batch: 14849/38378 (38.69%) Loss: 2.078826 LR: 0.00004001 [09:10:30] Epoch: 1 Batch: 14850/38378 (38.69%) Loss: 1.869160 LR: 0.00004001 [09:10:32] Epoch: 1 Batch: 14851/38378 (38.70%) Loss: 1.877705 LR: 0.00004001 [09:10:33] Epoch: 1 Batch: 14852/38378 (38.70%) Loss: 1.842976 LR: 0.00004001 [09:10:35] Epoch: 1 Batch: 14853/38378 (38.70%) Loss: 1.906138 LR: 0.00004001 [09:10:37] Epoch: 1 Batch: 14854/38378 (38.70%) Loss: 2.190003 LR: 0.00004001 [09:10:38] Epoch: 1 Batch: 14855/38378 (38.71%) Loss: 1.997756 LR: 0.00004001 [09:10:40] Epoch: 1 Batch: 14856/38378 (38.71%) Loss: 2.156371 LR: 0.00004000 [09:10:42] Epoch: 1 Batch: 14857/38378 (38.71%) Loss: 2.149504 LR: 0.00004000 [09:10:43] Epoch: 1 Batch: 14858/38378 (38.71%) Loss: 2.222545 LR: 0.00004000 [09:10:45] Epoch: 1 Batch: 14859/38378 (38.72%) Loss: 2.138174 LR: 0.00004000 [09:10:47] Epoch: 1 Batch: 14860/38378 (38.72%) Loss: 1.905039 LR: 0.00004000 [09:10:48] Epoch: 1 Batch: 14861/38378 (38.72%) Loss: 2.094633 LR: 0.00004000 [09:10:50] Epoch: 1 Batch: 14862/38378 (38.73%) Loss: 1.876864 LR: 0.00004000 [09:10:52] Epoch: 1 Batch: 14863/38378 (38.73%) Loss: 2.152081 LR: 0.00003999 [09:10:53] Epoch: 1 Batch: 14864/38378 (38.73%) Loss: 2.050179 LR: 0.00003999 [09:10:55] Epoch: 1 Batch: 14865/38378 (38.73%) Loss: 2.214298 LR: 0.00003999 [09:10:57] Epoch: 1 Batch: 14866/38378 (38.74%) Loss: 1.956809 LR: 0.00003999 [09:10:59] Epoch: 1 Batch: 14867/38378 (38.74%) Loss: 1.900392 LR: 0.00003999 [09:11:00] Epoch: 1 Batch: 14868/38378 (38.74%) Loss: 1.902617 LR: 0.00003999 [09:11:02] Epoch: 1 Batch: 14869/38378 (38.74%) Loss: 1.818971 LR: 0.00003999 [09:11:04] Epoch: 1 Batch: 14870/38378 (38.75%) Loss: 1.735088 LR: 0.00003998 [09:11:05] Epoch: 1 Batch: 14871/38378 (38.75%) Loss: 1.876160 LR: 0.00003998 [09:11:07] Epoch: 1 Batch: 14872/38378 (38.75%) Loss: 1.779525 LR: 0.00003998 [09:11:09] Epoch: 1 Batch: 14873/38378 (38.75%) Loss: 1.895822 LR: 0.00003998 [09:11:10] Epoch: 1 Batch: 14874/38378 (38.76%) Loss: 2.007559 LR: 0.00003998 [09:11:12] Epoch: 1 Batch: 14875/38378 (38.76%) Loss: 2.010865 LR: 0.00003998 [09:11:14] Epoch: 1 Batch: 14876/38378 (38.76%) Loss: 1.904682 LR: 0.00003998 [09:11:15] Epoch: 1 Batch: 14877/38378 (38.76%) Loss: 2.288474 LR: 0.00003997 [09:11:17] Epoch: 1 Batch: 14878/38378 (38.77%) Loss: 2.111177 LR: 0.00003997 [09:11:19] Epoch: 1 Batch: 14879/38378 (38.77%) Loss: 1.893479 LR: 0.00003997 [09:11:20] Epoch: 1 Batch: 14880/38378 (38.77%) Loss: 1.965617 LR: 0.00003997 [09:11:22] Epoch: 1 Batch: 14881/38378 (38.77%) Loss: 1.900464 LR: 0.00003997 [09:11:24] Epoch: 1 Batch: 14882/38378 (38.78%) Loss: 2.029926 LR: 0.00003997 [09:11:26] Epoch: 1 Batch: 14883/38378 (38.78%) Loss: 2.121914 LR: 0.00003997 [09:11:27] Epoch: 1 Batch: 14884/38378 (38.78%) Loss: 1.861655 LR: 0.00003996 [09:11:29] Epoch: 1 Batch: 14885/38378 (38.79%) Loss: 2.127062 LR: 0.00003996 [09:11:31] Epoch: 1 Batch: 14886/38378 (38.79%) Loss: 1.858080 LR: 0.00003996 [09:11:32] Epoch: 1 Batch: 
14887/38378 (38.79%) Loss: 1.605264 LR: 0.00003996 [09:11:34] Epoch: 1 Batch: 14888/38378 (38.79%) Loss: 1.853471 LR: 0.00003996 [09:11:36] Epoch: 1 Batch: 14889/38378 (38.80%) Loss: 2.233141 LR: 0.00003996 [09:11:37] Epoch: 1 Batch: 14890/38378 (38.80%) Loss: 2.075420 LR: 0.00003996 [09:11:39] Epoch: 1 Batch: 14891/38378 (38.80%) Loss: 1.685607 LR: 0.00003995 [09:11:41] Epoch: 1 Batch: 14892/38378 (38.80%) Loss: 1.929360 LR: 0.00003995 [09:11:42] Epoch: 1 Batch: 14893/38378 (38.81%) Loss: 1.892666 LR: 0.00003995 [09:11:44] Epoch: 1 Batch: 14894/38378 (38.81%) Loss: 1.877080 LR: 0.00003995 [09:11:46] Epoch: 1 Batch: 14895/38378 (38.81%) Loss: 1.886963 LR: 0.00003995 [09:11:47] Epoch: 1 Batch: 14896/38378 (38.81%) Loss: 2.005881 LR: 0.00003995 [09:11:49] Epoch: 1 Batch: 14897/38378 (38.82%) Loss: 1.940922 LR: 0.00003995 [09:11:51] Epoch: 1 Batch: 14898/38378 (38.82%) Loss: 2.061166 LR: 0.00003994 [09:11:53] Epoch: 1 Batch: 14899/38378 (38.82%) Loss: 1.912686 LR: 0.00003994 [09:11:58] >> Cleaned up old temp checkpoint: epoch1_step13900 [09:11:58] >> Temp checkpoint saved: epoch1_step14900, size: 0.1702 GB [09:11:58] Epoch: 1 Batch: 14900/38378 (38.82%) Loss: 1.921675 LR: 0.00003994 [09:12:00] Epoch: 1 Batch: 14901/38378 (38.83%) Loss: 1.903109 LR: 0.00003994 [09:12:02] Epoch: 1 Batch: 14902/38378 (38.83%) Loss: 2.060440 LR: 0.00003994 [09:12:03] Epoch: 1 Batch: 14903/38378 (38.83%) Loss: 1.803604 LR: 0.00003994 [09:12:05] Epoch: 1 Batch: 14904/38378 (38.83%) Loss: 2.093740 LR: 0.00003994 [09:12:07] Epoch: 1 Batch: 14905/38378 (38.84%) Loss: 1.703619 LR: 0.00003992 [09:12:08] Epoch: 1 Batch: 14906/38378 (38.84%) Loss: 2.058153 LR: 0.00003992 [09:12:10] Epoch: 1 Batch: 14907/38378 (38.84%) Loss: 2.139041 LR: 0.00003992 [09:12:12] Epoch: 1 Batch: 14908/38378 (38.85%) Loss: 2.187633 LR: 0.00003992 [09:12:13] Epoch: 1 Batch: 14909/38378 (38.85%) Loss: 1.855989 LR: 0.00003992 [09:12:15] Epoch: 1 Batch: 14910/38378 (38.85%) Loss: 2.096915 LR: 0.00003992 [09:12:17] Epoch: 1 Batch: 14911/38378 (38.85%) Loss: 2.056571 LR: 0.00003992 [09:12:18] Epoch: 1 Batch: 14912/38378 (38.86%) Loss: 1.790642 LR: 0.00003991 [09:12:20] Epoch: 1 Batch: 14913/38378 (38.86%) Loss: 1.872847 LR: 0.00003991 [09:12:22] Epoch: 1 Batch: 14914/38378 (38.86%) Loss: 2.192240 LR: 0.00003991 [09:12:23] Epoch: 1 Batch: 14915/38378 (38.86%) Loss: 2.009238 LR: 0.00003991 [09:12:25] Epoch: 1 Batch: 14916/38378 (38.87%) Loss: 2.111103 LR: 0.00003991 [09:12:27] Epoch: 1 Batch: 14917/38378 (38.87%) Loss: 1.764054 LR: 0.00003991 [09:12:29] Epoch: 1 Batch: 14918/38378 (38.87%) Loss: 2.069961 LR: 0.00003991 [09:12:30] Epoch: 1 Batch: 14919/38378 (38.87%) Loss: 1.876836 LR: 0.00003990 [09:12:32] Epoch: 1 Batch: 14920/38378 (38.88%) Loss: 1.836816 LR: 0.00003990 [09:12:34] Epoch: 1 Batch: 14921/38378 (38.88%) Loss: 2.130472 LR: 0.00003990 [09:12:35] Epoch: 1 Batch: 14922/38378 (38.88%) Loss: 1.934934 LR: 0.00003990 [09:12:37] Epoch: 1 Batch: 14923/38378 (38.88%) Loss: 1.964785 LR: 0.00003990 [09:12:39] Epoch: 1 Batch: 14924/38378 (38.89%) Loss: 1.886216 LR: 0.00003990 [09:12:40] Epoch: 1 Batch: 14925/38378 (38.89%) Loss: 2.032364 LR: 0.00003990 [09:12:42] Epoch: 1 Batch: 14926/38378 (38.89%) Loss: 2.066488 LR: 0.00003989 [09:12:44] Epoch: 1 Batch: 14927/38378 (38.89%) Loss: 2.045656 LR: 0.00003989 [09:12:45] Epoch: 1 Batch: 14928/38378 (38.90%) Loss: 1.966388 LR: 0.00003989 [09:12:47] Epoch: 1 Batch: 14929/38378 (38.90%) Loss: 2.033683 LR: 0.00003989 [09:12:49] Epoch: 1 Batch: 14930/38378 (38.90%) Loss: 1.891504 LR: 0.00003989 [09:12:50] 
Epoch: 1 Batch: 14931/38378 (38.91%) Loss: 1.783832 LR: 0.00003989 [09:12:52] Epoch: 1 Batch: 14932/38378 (38.91%) Loss: 1.912833 LR: 0.00003989 [09:12:54] Epoch: 1 Batch: 14933/38378 (38.91%) Loss: 2.034255 LR: 0.00003988 [09:12:56] Epoch: 1 Batch: 14934/38378 (38.91%) Loss: 1.913726 LR: 0.00003988 [09:12:57] Epoch: 1 Batch: 14935/38378 (38.92%) Loss: 1.908120 LR: 0.00003988 [09:12:59] Epoch: 1 Batch: 14936/38378 (38.92%) Loss: 1.553891 LR: 0.00003988 [09:13:01] Epoch: 1 Batch: 14937/38378 (38.92%) Loss: 1.803054 LR: 0.00003988 [09:13:02] Epoch: 1 Batch: 14938/38378 (38.92%) Loss: 2.190356 LR: 0.00003988 [09:13:04] Epoch: 1 Batch: 14939/38378 (38.93%) Loss: 1.849155 LR: 0.00003988 [09:13:06] Epoch: 1 Batch: 14940/38378 (38.93%) Loss: 1.903850 LR: 0.00003987 [09:13:07] Epoch: 1 Batch: 14941/38378 (38.93%) Loss: 1.840247 LR: 0.00003987 [09:13:09] Epoch: 1 Batch: 14942/38378 (38.93%) Loss: 2.120898 LR: 0.00003987 [09:13:11] Epoch: 1 Batch: 14943/38378 (38.94%) Loss: 2.026332 LR: 0.00003987 [09:13:12] Epoch: 1 Batch: 14944/38378 (38.94%) Loss: 1.908437 LR: 0.00003987 [09:13:14] Epoch: 1 Batch: 14945/38378 (38.94%) Loss: 1.711226 LR: 0.00003987 [09:13:16] Epoch: 1 Batch: 14946/38378 (38.94%) Loss: 1.896037 LR: 0.00003987 [09:13:17] Epoch: 1 Batch: 14947/38378 (38.95%) Loss: 2.127036 LR: 0.00003986 [09:13:19] Epoch: 1 Batch: 14948/38378 (38.95%) Loss: 2.103230 LR: 0.00003986 [09:13:21] Epoch: 1 Batch: 14949/38378 (38.95%) Loss: 2.148409 LR: 0.00003986 [09:13:22] Epoch: 1 Batch: 14950/38378 (38.95%) Loss: 1.812207 LR: 0.00003986 [09:13:24] Epoch: 1 Batch: 14951/38378 (38.96%) Loss: 1.890952 LR: 0.00003986 [09:13:26] Epoch: 1 Batch: 14952/38378 (38.96%) Loss: 1.965902 LR: 0.00003986 [09:13:27] Epoch: 1 Batch: 14953/38378 (38.96%) Loss: 2.249038 LR: 0.00003986 [09:13:29] Epoch: 1 Batch: 14954/38378 (38.97%) Loss: 1.979534 LR: 0.00003985 [09:13:31] Epoch: 1 Batch: 14955/38378 (38.97%) Loss: 1.837294 LR: 0.00003985 [09:13:33] Epoch: 1 Batch: 14956/38378 (38.97%) Loss: 1.832141 LR: 0.00003985 [09:13:34] Epoch: 1 Batch: 14957/38378 (38.97%) Loss: 2.138695 LR: 0.00003985 [09:13:36] Epoch: 1 Batch: 14958/38378 (38.98%) Loss: 2.081150 LR: 0.00003985 [09:13:38] Epoch: 1 Batch: 14959/38378 (38.98%) Loss: 1.969157 LR: 0.00003985 [09:13:39] Epoch: 1 Batch: 14960/38378 (38.98%) Loss: 1.955650 LR: 0.00003985 [09:13:41] Epoch: 1 Batch: 14961/38378 (38.98%) Loss: 2.148532 LR: 0.00003984 [09:13:43] Epoch: 1 Batch: 14962/38378 (38.99%) Loss: 1.999130 LR: 0.00003984 [09:13:44] Epoch: 1 Batch: 14963/38378 (38.99%) Loss: 1.795113 LR: 0.00003984 [09:13:46] Epoch: 1 Batch: 14964/38378 (38.99%) Loss: 1.891032 LR: 0.00003984 [09:13:48] Epoch: 1 Batch: 14965/38378 (38.99%) Loss: 1.924884 LR: 0.00003984 [09:13:49] Epoch: 1 Batch: 14966/38378 (39.00%) Loss: 2.105074 LR: 0.00003984 [09:13:51] Epoch: 1 Batch: 14967/38378 (39.00%) Loss: 2.083204 LR: 0.00003984 [09:13:53] Epoch: 1 Batch: 14968/38378 (39.00%) Loss: 2.087095 LR: 0.00003983 [09:13:55] Epoch: 1 Batch: 14969/38378 (39.00%) Loss: 2.110574 LR: 0.00003983 [09:13:56] Epoch: 1 Batch: 14970/38378 (39.01%) Loss: 1.835804 LR: 0.00003983 [09:13:58] Epoch: 1 Batch: 14971/38378 (39.01%) Loss: 1.857271 LR: 0.00003983 [09:14:00] Epoch: 1 Batch: 14972/38378 (39.01%) Loss: 1.751627 LR: 0.00003983 [09:14:01] Epoch: 1 Batch: 14973/38378 (39.01%) Loss: 2.389892 LR: 0.00003983 [09:14:03] Epoch: 1 Batch: 14974/38378 (39.02%) Loss: 1.921912 LR: 0.00003983 [09:14:05] Epoch: 1 Batch: 14975/38378 (39.02%) Loss: 2.244353 LR: 0.00003982 [09:14:06] Epoch: 1 Batch: 14976/38378 (39.02%) Loss: 
1.700059 LR: 0.00003982 [09:14:08] Epoch: 1 Batch: 14977/38378 (39.02%) Loss: 1.967360 LR: 0.00003982 [09:14:10] Epoch: 1 Batch: 14978/38378 (39.03%) Loss: 1.974006 LR: 0.00003982 [09:14:11] Epoch: 1 Batch: 14979/38378 (39.03%) Loss: 1.997394 LR: 0.00003982 [09:14:13] Epoch: 1 Batch: 14980/38378 (39.03%) Loss: 1.938131 LR: 0.00003982 [09:14:15] Epoch: 1 Batch: 14981/38378 (39.04%) Loss: 2.064930 LR: 0.00003982 [09:14:16] Epoch: 1 Batch: 14982/38378 (39.04%) Loss: 1.984546 LR: 0.00003981 [09:14:18] Epoch: 1 Batch: 14983/38378 (39.04%) Loss: 1.969111 LR: 0.00003981 [09:14:20] Epoch: 1 Batch: 14984/38378 (39.04%) Loss: 2.081448 LR: 0.00003981 [09:14:21] Epoch: 1 Batch: 14985/38378 (39.05%) Loss: 2.049137 LR: 0.00003981 [09:14:23] Epoch: 1 Batch: 14986/38378 (39.05%) Loss: 2.095611 LR: 0.00003981 [09:14:25] Epoch: 1 Batch: 14987/38378 (39.05%) Loss: 1.927632 LR: 0.00003981 [09:14:27] Epoch: 1 Batch: 14988/38378 (39.05%) Loss: 1.913659 LR: 0.00003981 [09:14:28] Epoch: 1 Batch: 14989/38378 (39.06%) Loss: 2.250087 LR: 0.00003979 [09:14:30] Epoch: 1 Batch: 14990/38378 (39.06%) Loss: 1.902648 LR: 0.00003979 [09:14:32] Epoch: 1 Batch: 14991/38378 (39.06%) Loss: 1.935639 LR: 0.00003979 [09:14:33] Epoch: 1 Batch: 14992/38378 (39.06%) Loss: 1.802219 LR: 0.00003979 [09:14:35] Epoch: 1 Batch: 14993/38378 (39.07%) Loss: 1.793909 LR: 0.00003979 [09:14:37] Epoch: 1 Batch: 14994/38378 (39.07%) Loss: 2.127965 LR: 0.00003979 [09:14:38] Epoch: 1 Batch: 14995/38378 (39.07%) Loss: 2.052101 LR: 0.00003979 [09:14:40] Epoch: 1 Batch: 14996/38378 (39.07%) Loss: 1.959250 LR: 0.00003978 [09:14:42] Epoch: 1 Batch: 14997/38378 (39.08%) Loss: 2.132549 LR: 0.00003978 [09:14:43] Epoch: 1 Batch: 14998/38378 (39.08%) Loss: 2.100499 LR: 0.00003978 [09:14:45] Epoch: 1 Batch: 14999/38378 (39.08%) Loss: 2.322639 LR: 0.00003978 [09:14:47] >> Evaluating batch 0 [09:14:48] >> Evaluating batch 1 [09:14:48] >> Evaluating batch 2 [09:14:49] >> Evaluating batch 3 [09:14:50] >> Evaluating batch 4 [09:14:51] >> Evaluating batch 5 [09:14:52] >> Evaluating batch 6 [09:14:53] >> Evaluating batch 7 [09:14:54] >> Evaluating batch 8 [09:14:55] >> Evaluating batch 9 [09:14:56] >> Evaluating batch 10 [09:14:57] >> Evaluating batch 11 [09:14:58] >> Evaluating batch 12 [09:14:59] >> Evaluating batch 13 [09:15:00] >> Evaluating batch 14 [09:15:01] >> Evaluating batch 15 [09:15:02] >> Evaluating batch 16 [09:15:02] Epoch: 1 Step: 15000/38378 Evaluation: [09:15:02] Avg Loss Since Last Eval: 1.9933 Val Loss: 2.0976 Validation loss delta: -0.0000 Perplexity: 8.1466 LR: 0.00003978 [09:15:06] >> Cleaned up old temp checkpoint: epoch1_step14000 [09:15:06] >> Temp checkpoint saved: epoch1_step15000, size: 0.1702 GB [09:15:10] >> Checkpoint saved: epoch1_step15000, size: 0.1702 GB
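The evaluation block at step 15000 reports perplexity as the exponential of the validation loss: exp(2.0976) ≈ 8.1466, matching the logged pair exactly, and the near-zero "Validation loss delta" says the val loss barely moved since the previous eval. "Avg Loss Since Last Eval" is presumably the mean per-batch training loss since that eval. A one-line check of the perplexity relationship:

import math

print(round(math.exp(2.0976), 4))  # 8.1466 -- perplexity from Val Loss 2.0976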