furmaniak committed on
Commit
3d53b59
·
verified ·
1 Parent(s): 261dafd

End of training

Files changed (5)
  1. README.md +1 -1
  2. all_results.json +6 -6
  3. train_results.json +6 -6
  4. trainer_state.json +1425 -11
  5. training_loss.png +0 -0
README.md CHANGED
@@ -16,7 +16,7 @@ should probably proofread and complete it, then remove this comment. -->
16
 
17
  # qwen2.5-32b-openalex
18
 
19
- This model is a fine-tuned version of [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) on an unknown dataset.
20
 
21
  ## Model description
22
 
 
16
 
17
  # qwen2.5-32b-openalex
18
 
19
+ This model is a fine-tuned version of [Qwen/Qwen2.5-32B](https://huggingface.co/Qwen/Qwen2.5-32B) on the openalex dataset.
20
 
21
  ## Model description
22
 
all_results.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
- "epoch": 0.9977164939399262,
3
- "total_flos": 4998655006212096.0,
4
- "train_loss": 1.5558027747651222,
5
- "train_runtime": 142154.3689,
6
- "train_samples_per_second": 0.641,
7
- "train_steps_per_second": 0.002
8
  }
 
1
  {
2
+ "epoch": 0.9991031390134529,
3
+ "total_flos": 7841700554735616.0,
4
+ "train_loss": 0.4275342236729456,
5
+ "train_runtime": 63728.3403,
6
+ "train_samples_per_second": 2.239,
7
+ "train_steps_per_second": 0.009
8
  }
train_results.json CHANGED
@@ -1,8 +1,8 @@
1
  {
2
- "epoch": 0.9977164939399262,
3
- "total_flos": 4998655006212096.0,
4
- "train_loss": 1.5558027747651222,
5
- "train_runtime": 142154.3689,
6
- "train_samples_per_second": 0.641,
7
- "train_steps_per_second": 0.002
8
  }
 
1
  {
2
+ "epoch": 0.9991031390134529,
3
+ "total_flos": 7841700554735616.0,
4
+ "train_loss": 0.4275342236729456,
5
+ "train_runtime": 63728.3403,
6
+ "train_samples_per_second": 2.239,
7
+ "train_steps_per_second": 0.009
8
  }
trainer_state.json CHANGED
@@ -1,9 +1,9 @@
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
- "epoch": 0.9977164939399262,
5
  "eval_steps": 500,
6
- "global_step": 355,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
@@ -2494,17 +2494,1431 @@
2494
  "step": 355
2495
  },
2496
  {
2497
- "epoch": 0.9977164939399262,
2498
- "step": 355,
2499
- "total_flos": 4998655006212096.0,
2500
- "train_loss": 1.5558027747651222,
2501
- "train_runtime": 142154.3689,
2502
- "train_samples_per_second": 0.641,
2503
- "train_steps_per_second": 0.002
2504
  }
2505
  ],
2506
  "logging_steps": 1,
2507
- "max_steps": 355,
2508
  "num_input_tokens_seen": 0,
2509
  "num_train_epochs": 1,
2510
  "save_steps": 100,
@@ -2520,7 +3934,7 @@
2520
  "attributes": {}
2521
  }
2522
  },
2523
- "total_flos": 4998655006212096.0,
2524
  "train_batch_size": 2,
2525
  "trial_name": null,
2526
  "trial_params": null
 
1
  {
2
  "best_metric": null,
3
  "best_model_checkpoint": null,
4
+ "epoch": 0.9991031390134529,
5
  "eval_steps": 500,
6
+ "global_step": 557,
7
  "is_hyper_param_search": false,
8
  "is_local_process_zero": true,
9
  "is_world_process_zero": true,
 
2494
  "step": 355
2495
  },
2496
  {
2497
+ "epoch": 0.6385650224215247,
2498
+ "grad_norm": 0.010235507041215897,
2499
+ "learning_rate": 2e-05,
2500
+ "loss": 1.5054,
2501
+ "step": 356
2502
+ },
2503
+ {
2504
+ "epoch": 0.6403587443946188,
2505
+ "grad_norm": 0.010560178197920322,
2506
+ "learning_rate": 2e-05,
2507
+ "loss": 1.5118,
2508
+ "step": 357
2509
+ },
2510
+ {
2511
+ "epoch": 0.6421524663677131,
2512
+ "grad_norm": 0.010353959165513515,
2513
+ "learning_rate": 2e-05,
2514
+ "loss": 1.5226,
2515
+ "step": 358
2516
+ },
2517
+ {
2518
+ "epoch": 0.6439461883408072,
2519
+ "grad_norm": 0.010382940992712975,
2520
+ "learning_rate": 2e-05,
2521
+ "loss": 1.5078,
2522
+ "step": 359
2523
+ },
2524
+ {
2525
+ "epoch": 0.6457399103139013,
2526
+ "grad_norm": 0.009856803342700005,
2527
+ "learning_rate": 2e-05,
2528
+ "loss": 1.5167,
2529
+ "step": 360
2530
+ },
2531
+ {
2532
+ "epoch": 0.6475336322869956,
2533
+ "grad_norm": 0.010195410810410976,
2534
+ "learning_rate": 2e-05,
2535
+ "loss": 1.5142,
2536
+ "step": 361
2537
+ },
2538
+ {
2539
+ "epoch": 0.6493273542600897,
2540
+ "grad_norm": 0.010302864946424961,
2541
+ "learning_rate": 2e-05,
2542
+ "loss": 1.5136,
2543
+ "step": 362
2544
+ },
2545
+ {
2546
+ "epoch": 0.6511210762331838,
2547
+ "grad_norm": 0.010046405717730522,
2548
+ "learning_rate": 2e-05,
2549
+ "loss": 1.5112,
2550
+ "step": 363
2551
+ },
2552
+ {
2553
+ "epoch": 0.6529147982062781,
2554
+ "grad_norm": 0.010849208571016788,
2555
+ "learning_rate": 2e-05,
2556
+ "loss": 1.5114,
2557
+ "step": 364
2558
+ },
2559
+ {
2560
+ "epoch": 0.6547085201793722,
2561
+ "grad_norm": 0.010421674698591232,
2562
+ "learning_rate": 2e-05,
2563
+ "loss": 1.5173,
2564
+ "step": 365
2565
+ },
2566
+ {
2567
+ "epoch": 0.6565022421524663,
2568
+ "grad_norm": 0.00989589188247919,
2569
+ "learning_rate": 2e-05,
2570
+ "loss": 1.5063,
2571
+ "step": 366
2572
+ },
2573
+ {
2574
+ "epoch": 0.6582959641255606,
2575
+ "grad_norm": 0.010465629398822784,
2576
+ "learning_rate": 2e-05,
2577
+ "loss": 1.5031,
2578
+ "step": 367
2579
+ },
2580
+ {
2581
+ "epoch": 0.6600896860986547,
2582
+ "grad_norm": 0.009964341297745705,
2583
+ "learning_rate": 2e-05,
2584
+ "loss": 1.5207,
2585
+ "step": 368
2586
+ },
2587
+ {
2588
+ "epoch": 0.6618834080717488,
2589
+ "grad_norm": 0.01189314667135477,
2590
+ "learning_rate": 2e-05,
2591
+ "loss": 1.5361,
2592
+ "step": 369
2593
+ },
2594
+ {
2595
+ "epoch": 0.6636771300448431,
2596
+ "grad_norm": 0.01012677513062954,
2597
+ "learning_rate": 2e-05,
2598
+ "loss": 1.5215,
2599
+ "step": 370
2600
+ },
2601
+ {
2602
+ "epoch": 0.6654708520179372,
2603
+ "grad_norm": 0.009877102449536324,
2604
+ "learning_rate": 2e-05,
2605
+ "loss": 1.5262,
2606
+ "step": 371
2607
+ },
2608
+ {
2609
+ "epoch": 0.6672645739910313,
2610
+ "grad_norm": 0.01000463031232357,
2611
+ "learning_rate": 2e-05,
2612
+ "loss": 1.5183,
2613
+ "step": 372
2614
+ },
2615
+ {
2616
+ "epoch": 0.6690582959641256,
2617
+ "grad_norm": 0.010188892483711243,
2618
+ "learning_rate": 2e-05,
2619
+ "loss": 1.5183,
2620
+ "step": 373
2621
+ },
2622
+ {
2623
+ "epoch": 0.6708520179372197,
2624
+ "grad_norm": 0.010129815898835659,
2625
+ "learning_rate": 2e-05,
2626
+ "loss": 1.5245,
2627
+ "step": 374
2628
+ },
2629
+ {
2630
+ "epoch": 0.672645739910314,
2631
+ "grad_norm": 0.010608335956931114,
2632
+ "learning_rate": 2e-05,
2633
+ "loss": 1.5169,
2634
+ "step": 375
2635
+ },
2636
+ {
2637
+ "epoch": 0.6744394618834081,
2638
+ "grad_norm": 0.010223207995295525,
2639
+ "learning_rate": 2e-05,
2640
+ "loss": 1.5185,
2641
+ "step": 376
2642
+ },
2643
+ {
2644
+ "epoch": 0.6762331838565022,
2645
+ "grad_norm": 0.010141369886696339,
2646
+ "learning_rate": 2e-05,
2647
+ "loss": 1.5161,
2648
+ "step": 377
2649
+ },
2650
+ {
2651
+ "epoch": 0.6780269058295965,
2652
+ "grad_norm": 0.01027351152151823,
2653
+ "learning_rate": 2e-05,
2654
+ "loss": 1.5119,
2655
+ "step": 378
2656
+ },
2657
+ {
2658
+ "epoch": 0.6798206278026906,
2659
+ "grad_norm": 0.010362266562879086,
2660
+ "learning_rate": 2e-05,
2661
+ "loss": 1.5126,
2662
+ "step": 379
2663
+ },
2664
+ {
2665
+ "epoch": 0.6816143497757847,
2666
+ "grad_norm": 0.010336722247302532,
2667
+ "learning_rate": 2e-05,
2668
+ "loss": 1.5173,
2669
+ "step": 380
2670
+ },
2671
+ {
2672
+ "epoch": 0.683408071748879,
2673
+ "grad_norm": 0.01007298193871975,
2674
+ "learning_rate": 2e-05,
2675
+ "loss": 1.5075,
2676
+ "step": 381
2677
+ },
2678
+ {
2679
+ "epoch": 0.6852017937219731,
2680
+ "grad_norm": 0.010275410488247871,
2681
+ "learning_rate": 2e-05,
2682
+ "loss": 1.5123,
2683
+ "step": 382
2684
+ },
2685
+ {
2686
+ "epoch": 0.6869955156950672,
2687
+ "grad_norm": 0.010203160345554352,
2688
+ "learning_rate": 2e-05,
2689
+ "loss": 1.5151,
2690
+ "step": 383
2691
+ },
2692
+ {
2693
+ "epoch": 0.6887892376681615,
2694
+ "grad_norm": 0.010127630084753036,
2695
+ "learning_rate": 2e-05,
2696
+ "loss": 1.5211,
2697
+ "step": 384
2698
+ },
2699
+ {
2700
+ "epoch": 0.6905829596412556,
2701
+ "grad_norm": 0.009799284860491753,
2702
+ "learning_rate": 2e-05,
2703
+ "loss": 1.5191,
2704
+ "step": 385
2705
+ },
2706
+ {
2707
+ "epoch": 0.6923766816143497,
2708
+ "grad_norm": 0.01014394499361515,
2709
+ "learning_rate": 2e-05,
2710
+ "loss": 1.5261,
2711
+ "step": 386
2712
+ },
2713
+ {
2714
+ "epoch": 0.694170403587444,
2715
+ "grad_norm": 0.010567774064838886,
2716
+ "learning_rate": 2e-05,
2717
+ "loss": 1.5232,
2718
+ "step": 387
2719
+ },
2720
+ {
2721
+ "epoch": 0.6959641255605381,
2722
+ "grad_norm": 0.010051852092146873,
2723
+ "learning_rate": 2e-05,
2724
+ "loss": 1.5212,
2725
+ "step": 388
2726
+ },
2727
+ {
2728
+ "epoch": 0.6977578475336322,
2729
+ "grad_norm": 0.010241293348371983,
2730
+ "learning_rate": 2e-05,
2731
+ "loss": 1.5094,
2732
+ "step": 389
2733
+ },
2734
+ {
2735
+ "epoch": 0.6995515695067265,
2736
+ "grad_norm": 0.0095717404037714,
2737
+ "learning_rate": 2e-05,
2738
+ "loss": 1.5115,
2739
+ "step": 390
2740
+ },
2741
+ {
2742
+ "epoch": 0.7013452914798206,
2743
+ "grad_norm": 0.00974031537771225,
2744
+ "learning_rate": 2e-05,
2745
+ "loss": 1.5195,
2746
+ "step": 391
2747
+ },
2748
+ {
2749
+ "epoch": 0.7031390134529149,
2750
+ "grad_norm": 0.010140657424926758,
2751
+ "learning_rate": 2e-05,
2752
+ "loss": 1.5048,
2753
+ "step": 392
2754
+ },
2755
+ {
2756
+ "epoch": 0.704932735426009,
2757
+ "grad_norm": 0.010055477730929852,
2758
+ "learning_rate": 2e-05,
2759
+ "loss": 1.5162,
2760
+ "step": 393
2761
+ },
2762
+ {
2763
+ "epoch": 0.7067264573991031,
2764
+ "grad_norm": 0.01005468424409628,
2765
+ "learning_rate": 2e-05,
2766
+ "loss": 1.5258,
2767
+ "step": 394
2768
+ },
2769
+ {
2770
+ "epoch": 0.7085201793721974,
2771
+ "grad_norm": 0.010284669697284698,
2772
+ "learning_rate": 2e-05,
2773
+ "loss": 1.5094,
2774
+ "step": 395
2775
+ },
2776
+ {
2777
+ "epoch": 0.7103139013452915,
2778
+ "grad_norm": 0.010200968012213707,
2779
+ "learning_rate": 2e-05,
2780
+ "loss": 1.5172,
2781
+ "step": 396
2782
+ },
2783
+ {
2784
+ "epoch": 0.7121076233183856,
2785
+ "grad_norm": 0.01015354972332716,
2786
+ "learning_rate": 2e-05,
2787
+ "loss": 1.5117,
2788
+ "step": 397
2789
+ },
2790
+ {
2791
+ "epoch": 0.7139013452914799,
2792
+ "grad_norm": 0.009913373738527298,
2793
+ "learning_rate": 2e-05,
2794
+ "loss": 1.5268,
2795
+ "step": 398
2796
+ },
2797
+ {
2798
+ "epoch": 0.715695067264574,
2799
+ "grad_norm": 0.010287330485880375,
2800
+ "learning_rate": 2e-05,
2801
+ "loss": 1.5211,
2802
+ "step": 399
2803
+ },
2804
+ {
2805
+ "epoch": 0.7174887892376681,
2806
+ "grad_norm": 0.01057345885783434,
2807
+ "learning_rate": 2e-05,
2808
+ "loss": 1.5199,
2809
+ "step": 400
2810
+ },
2811
+ {
2812
+ "epoch": 0.7192825112107624,
2813
+ "grad_norm": 0.010113878175616264,
2814
+ "learning_rate": 2e-05,
2815
+ "loss": 1.5168,
2816
+ "step": 401
2817
+ },
2818
+ {
2819
+ "epoch": 0.7210762331838565,
2820
+ "grad_norm": 0.009940318763256073,
2821
+ "learning_rate": 2e-05,
2822
+ "loss": 1.5175,
2823
+ "step": 402
2824
+ },
2825
+ {
2826
+ "epoch": 0.7228699551569506,
2827
+ "grad_norm": 0.010180394165217876,
2828
+ "learning_rate": 2e-05,
2829
+ "loss": 1.5211,
2830
+ "step": 403
2831
+ },
2832
+ {
2833
+ "epoch": 0.7246636771300449,
2834
+ "grad_norm": 0.00961736124008894,
2835
+ "learning_rate": 2e-05,
2836
+ "loss": 1.5228,
2837
+ "step": 404
2838
+ },
2839
+ {
2840
+ "epoch": 0.726457399103139,
2841
+ "grad_norm": 0.010378845036029816,
2842
+ "learning_rate": 2e-05,
2843
+ "loss": 1.522,
2844
+ "step": 405
2845
+ },
2846
+ {
2847
+ "epoch": 0.7282511210762331,
2848
+ "grad_norm": 0.010189516469836235,
2849
+ "learning_rate": 2e-05,
2850
+ "loss": 1.525,
2851
+ "step": 406
2852
+ },
2853
+ {
2854
+ "epoch": 0.7300448430493274,
2855
+ "grad_norm": 0.010004358366131783,
2856
+ "learning_rate": 2e-05,
2857
+ "loss": 1.5172,
2858
+ "step": 407
2859
+ },
2860
+ {
2861
+ "epoch": 0.7318385650224215,
2862
+ "grad_norm": 0.010387993417680264,
2863
+ "learning_rate": 2e-05,
2864
+ "loss": 1.5246,
2865
+ "step": 408
2866
+ },
2867
+ {
2868
+ "epoch": 0.7336322869955157,
2869
+ "grad_norm": 0.010004810988903046,
2870
+ "learning_rate": 2e-05,
2871
+ "loss": 1.5132,
2872
+ "step": 409
2873
+ },
2874
+ {
2875
+ "epoch": 0.7354260089686099,
2876
+ "grad_norm": 0.009845850057899952,
2877
+ "learning_rate": 2e-05,
2878
+ "loss": 1.5248,
2879
+ "step": 410
2880
+ },
2881
+ {
2882
+ "epoch": 0.737219730941704,
2883
+ "grad_norm": 0.010015097446739674,
2884
+ "learning_rate": 2e-05,
2885
+ "loss": 1.5196,
2886
+ "step": 411
2887
+ },
2888
+ {
2889
+ "epoch": 0.7390134529147983,
2890
+ "grad_norm": 0.009975203312933445,
2891
+ "learning_rate": 2e-05,
2892
+ "loss": 1.5096,
2893
+ "step": 412
2894
+ },
2895
+ {
2896
+ "epoch": 0.7408071748878924,
2897
+ "grad_norm": 0.010078891180455685,
2898
+ "learning_rate": 2e-05,
2899
+ "loss": 1.5162,
2900
+ "step": 413
2901
+ },
2902
+ {
2903
+ "epoch": 0.7426008968609865,
2904
+ "grad_norm": 0.011885426007211208,
2905
+ "learning_rate": 2e-05,
2906
+ "loss": 1.5189,
2907
+ "step": 414
2908
+ },
2909
+ {
2910
+ "epoch": 0.7443946188340808,
2911
+ "grad_norm": 0.009693853557109833,
2912
+ "learning_rate": 2e-05,
2913
+ "loss": 1.5194,
2914
+ "step": 415
2915
+ },
2916
+ {
2917
+ "epoch": 0.7461883408071749,
2918
+ "grad_norm": 0.010337116196751595,
2919
+ "learning_rate": 2e-05,
2920
+ "loss": 1.5191,
2921
+ "step": 416
2922
+ },
2923
+ {
2924
+ "epoch": 0.747982062780269,
2925
+ "grad_norm": 0.00993486400693655,
2926
+ "learning_rate": 2e-05,
2927
+ "loss": 1.5177,
2928
+ "step": 417
2929
+ },
2930
+ {
2931
+ "epoch": 0.7497757847533633,
2932
+ "grad_norm": 0.010143253020942211,
2933
+ "learning_rate": 2e-05,
2934
+ "loss": 1.514,
2935
+ "step": 418
2936
+ },
2937
+ {
2938
+ "epoch": 0.7515695067264574,
2939
+ "grad_norm": 0.010233073495328426,
2940
+ "learning_rate": 2e-05,
2941
+ "loss": 1.5154,
2942
+ "step": 419
2943
+ },
2944
+ {
2945
+ "epoch": 0.7533632286995515,
2946
+ "grad_norm": 0.009982983581721783,
2947
+ "learning_rate": 2e-05,
2948
+ "loss": 1.5223,
2949
+ "step": 420
2950
+ },
2951
+ {
2952
+ "epoch": 0.7551569506726458,
2953
+ "grad_norm": 0.010409766808152199,
2954
+ "learning_rate": 2e-05,
2955
+ "loss": 1.5152,
2956
+ "step": 421
2957
+ },
2958
+ {
2959
+ "epoch": 0.7569506726457399,
2960
+ "grad_norm": 0.0099264495074749,
2961
+ "learning_rate": 2e-05,
2962
+ "loss": 1.5185,
2963
+ "step": 422
2964
+ },
2965
+ {
2966
+ "epoch": 0.758744394618834,
2967
+ "grad_norm": 0.009928545914590359,
2968
+ "learning_rate": 2e-05,
2969
+ "loss": 1.4986,
2970
+ "step": 423
2971
+ },
2972
+ {
2973
+ "epoch": 0.7605381165919283,
2974
+ "grad_norm": 0.009940563701093197,
2975
+ "learning_rate": 2e-05,
2976
+ "loss": 1.5071,
2977
+ "step": 424
2978
+ },
2979
+ {
2980
+ "epoch": 0.7623318385650224,
2981
+ "grad_norm": 0.010767797008156776,
2982
+ "learning_rate": 2e-05,
2983
+ "loss": 1.5006,
2984
+ "step": 425
2985
+ },
2986
+ {
2987
+ "epoch": 0.7641255605381166,
2988
+ "grad_norm": 0.010551121085882187,
2989
+ "learning_rate": 2e-05,
2990
+ "loss": 1.5201,
2991
+ "step": 426
2992
+ },
2993
+ {
2994
+ "epoch": 0.7659192825112108,
2995
+ "grad_norm": 0.010118665173649788,
2996
+ "learning_rate": 2e-05,
2997
+ "loss": 1.5213,
2998
+ "step": 427
2999
+ },
3000
+ {
3001
+ "epoch": 0.7677130044843049,
3002
+ "grad_norm": 0.010247626341879368,
3003
+ "learning_rate": 2e-05,
3004
+ "loss": 1.5178,
3005
+ "step": 428
3006
+ },
3007
+ {
3008
+ "epoch": 0.7695067264573991,
3009
+ "grad_norm": 0.010188435204327106,
3010
+ "learning_rate": 2e-05,
3011
+ "loss": 1.5085,
3012
+ "step": 429
3013
+ },
3014
+ {
3015
+ "epoch": 0.7713004484304933,
3016
+ "grad_norm": 0.010428003035485744,
3017
+ "learning_rate": 2e-05,
3018
+ "loss": 1.5124,
3019
+ "step": 430
3020
+ },
3021
+ {
3022
+ "epoch": 0.7730941704035874,
3023
+ "grad_norm": 0.01012035645544529,
3024
+ "learning_rate": 2e-05,
3025
+ "loss": 1.5299,
3026
+ "step": 431
3027
+ },
3028
+ {
3029
+ "epoch": 0.7748878923766817,
3030
+ "grad_norm": 0.010584665462374687,
3031
+ "learning_rate": 2e-05,
3032
+ "loss": 1.5095,
3033
+ "step": 432
3034
+ },
3035
+ {
3036
+ "epoch": 0.7766816143497758,
3037
+ "grad_norm": 0.009979243390262127,
3038
+ "learning_rate": 2e-05,
3039
+ "loss": 1.5193,
3040
+ "step": 433
3041
+ },
3042
+ {
3043
+ "epoch": 0.7784753363228699,
3044
+ "grad_norm": 0.00958004966378212,
3045
+ "learning_rate": 2e-05,
3046
+ "loss": 1.5214,
3047
+ "step": 434
3048
+ },
3049
+ {
3050
+ "epoch": 0.7802690582959642,
3051
+ "grad_norm": 0.00973733700811863,
3052
+ "learning_rate": 2e-05,
3053
+ "loss": 1.5208,
3054
+ "step": 435
3055
+ },
3056
+ {
3057
+ "epoch": 0.7820627802690583,
3058
+ "grad_norm": 0.010465665720403194,
3059
+ "learning_rate": 2e-05,
3060
+ "loss": 1.5227,
3061
+ "step": 436
3062
+ },
3063
+ {
3064
+ "epoch": 0.7838565022421524,
3065
+ "grad_norm": 0.010098133236169815,
3066
+ "learning_rate": 2e-05,
3067
+ "loss": 1.5248,
3068
+ "step": 437
3069
+ },
3070
+ {
3071
+ "epoch": 0.7856502242152467,
3072
+ "grad_norm": 0.10259313136339188,
3073
+ "learning_rate": 2e-05,
3074
+ "loss": 1.5222,
3075
+ "step": 438
3076
+ },
3077
+ {
3078
+ "epoch": 0.7874439461883408,
3079
+ "grad_norm": 0.01040815282613039,
3080
+ "learning_rate": 2e-05,
3081
+ "loss": 1.5205,
3082
+ "step": 439
3083
+ },
3084
+ {
3085
+ "epoch": 0.7892376681614349,
3086
+ "grad_norm": 0.010325520299375057,
3087
+ "learning_rate": 2e-05,
3088
+ "loss": 1.5189,
3089
+ "step": 440
3090
+ },
3091
+ {
3092
+ "epoch": 0.7910313901345292,
3093
+ "grad_norm": 0.010079775005578995,
3094
+ "learning_rate": 2e-05,
3095
+ "loss": 1.5156,
3096
+ "step": 441
3097
+ },
3098
+ {
3099
+ "epoch": 0.7928251121076233,
3100
+ "grad_norm": 0.010167201980948448,
3101
+ "learning_rate": 2e-05,
3102
+ "loss": 1.5116,
3103
+ "step": 442
3104
+ },
3105
+ {
3106
+ "epoch": 0.7946188340807175,
3107
+ "grad_norm": 0.010806124657392502,
3108
+ "learning_rate": 2e-05,
3109
+ "loss": 1.5153,
3110
+ "step": 443
3111
+ },
3112
+ {
3113
+ "epoch": 0.7964125560538117,
3114
+ "grad_norm": 0.010324080474674702,
3115
+ "learning_rate": 2e-05,
3116
+ "loss": 1.5246,
3117
+ "step": 444
3118
+ },
3119
+ {
3120
+ "epoch": 0.7982062780269058,
3121
+ "grad_norm": 0.010092305950820446,
3122
+ "learning_rate": 2e-05,
3123
+ "loss": 1.5282,
3124
+ "step": 445
3125
+ },
3126
+ {
3127
+ "epoch": 0.8,
3128
+ "grad_norm": 0.01007048413157463,
3129
+ "learning_rate": 2e-05,
3130
+ "loss": 1.5108,
3131
+ "step": 446
3132
+ },
3133
+ {
3134
+ "epoch": 0.8017937219730942,
3135
+ "grad_norm": 0.010184276849031448,
3136
+ "learning_rate": 2e-05,
3137
+ "loss": 1.51,
3138
+ "step": 447
3139
+ },
3140
+ {
3141
+ "epoch": 0.8035874439461883,
3142
+ "grad_norm": 0.010521662421524525,
3143
+ "learning_rate": 2e-05,
3144
+ "loss": 1.5139,
3145
+ "step": 448
3146
+ },
3147
+ {
3148
+ "epoch": 0.8053811659192825,
3149
+ "grad_norm": 0.010600044392049313,
3150
+ "learning_rate": 2e-05,
3151
+ "loss": 1.5091,
3152
+ "step": 449
3153
+ },
3154
+ {
3155
+ "epoch": 0.8071748878923767,
3156
+ "grad_norm": 0.009714100509881973,
3157
+ "learning_rate": 2e-05,
3158
+ "loss": 1.5122,
3159
+ "step": 450
3160
+ },
3161
+ {
3162
+ "epoch": 0.8089686098654708,
3163
+ "grad_norm": 0.010295005515217781,
3164
+ "learning_rate": 2e-05,
3165
+ "loss": 1.52,
3166
+ "step": 451
3167
+ },
3168
+ {
3169
+ "epoch": 0.810762331838565,
3170
+ "grad_norm": 0.010034569539129734,
3171
+ "learning_rate": 2e-05,
3172
+ "loss": 1.5197,
3173
+ "step": 452
3174
+ },
3175
+ {
3176
+ "epoch": 0.8125560538116592,
3177
+ "grad_norm": 0.010086962021887302,
3178
+ "learning_rate": 2e-05,
3179
+ "loss": 1.5117,
3180
+ "step": 453
3181
+ },
3182
+ {
3183
+ "epoch": 0.8143497757847533,
3184
+ "grad_norm": 0.010277335532009602,
3185
+ "learning_rate": 2e-05,
3186
+ "loss": 1.5033,
3187
+ "step": 454
3188
+ },
3189
+ {
3190
+ "epoch": 0.8161434977578476,
3191
+ "grad_norm": 0.010540721006691456,
3192
+ "learning_rate": 2e-05,
3193
+ "loss": 1.5166,
3194
+ "step": 455
3195
+ },
3196
+ {
3197
+ "epoch": 0.8179372197309417,
3198
+ "grad_norm": 0.009755424223840237,
3199
+ "learning_rate": 2e-05,
3200
+ "loss": 1.5149,
3201
+ "step": 456
3202
+ },
3203
+ {
3204
+ "epoch": 0.8197309417040358,
3205
+ "grad_norm": 0.00984253827482462,
3206
+ "learning_rate": 2e-05,
3207
+ "loss": 1.5093,
3208
+ "step": 457
3209
+ },
3210
+ {
3211
+ "epoch": 0.8215246636771301,
3212
+ "grad_norm": 0.009836334735155106,
3213
+ "learning_rate": 2e-05,
3214
+ "loss": 1.5141,
3215
+ "step": 458
3216
+ },
3217
+ {
3218
+ "epoch": 0.8233183856502242,
3219
+ "grad_norm": 0.01032332144677639,
3220
+ "learning_rate": 2e-05,
3221
+ "loss": 1.5241,
3222
+ "step": 459
3223
+ },
3224
+ {
3225
+ "epoch": 0.8251121076233184,
3226
+ "grad_norm": 0.010635129176080227,
3227
+ "learning_rate": 2e-05,
3228
+ "loss": 1.5068,
3229
+ "step": 460
3230
+ },
3231
+ {
3232
+ "epoch": 0.8269058295964126,
3233
+ "grad_norm": 0.009664127603173256,
3234
+ "learning_rate": 2e-05,
3235
+ "loss": 1.5052,
3236
+ "step": 461
3237
+ },
3238
+ {
3239
+ "epoch": 0.8286995515695067,
3240
+ "grad_norm": 0.010554889217019081,
3241
+ "learning_rate": 2e-05,
3242
+ "loss": 1.5071,
3243
+ "step": 462
3244
+ },
3245
+ {
3246
+ "epoch": 0.8304932735426009,
3247
+ "grad_norm": 0.009871057234704494,
3248
+ "learning_rate": 2e-05,
3249
+ "loss": 1.5189,
3250
+ "step": 463
3251
+ },
3252
+ {
3253
+ "epoch": 0.8322869955156951,
3254
+ "grad_norm": 0.010431516915559769,
3255
+ "learning_rate": 2e-05,
3256
+ "loss": 1.5183,
3257
+ "step": 464
3258
+ },
3259
+ {
3260
+ "epoch": 0.8340807174887892,
3261
+ "grad_norm": 0.009860005229711533,
3262
+ "learning_rate": 2e-05,
3263
+ "loss": 1.5213,
3264
+ "step": 465
3265
+ },
3266
+ {
3267
+ "epoch": 0.8358744394618834,
3268
+ "grad_norm": 0.010233579203486443,
3269
+ "learning_rate": 2e-05,
3270
+ "loss": 1.5182,
3271
+ "step": 466
3272
+ },
3273
+ {
3274
+ "epoch": 0.8376681614349776,
3275
+ "grad_norm": 0.010311591438949108,
3276
+ "learning_rate": 2e-05,
3277
+ "loss": 1.5092,
3278
+ "step": 467
3279
+ },
3280
+ {
3281
+ "epoch": 0.8394618834080717,
3282
+ "grad_norm": 0.010733729228377342,
3283
+ "learning_rate": 2e-05,
3284
+ "loss": 1.5186,
3285
+ "step": 468
3286
+ },
3287
+ {
3288
+ "epoch": 0.841255605381166,
3289
+ "grad_norm": 0.009951340965926647,
3290
+ "learning_rate": 2e-05,
3291
+ "loss": 1.5097,
3292
+ "step": 469
3293
+ },
3294
+ {
3295
+ "epoch": 0.8430493273542601,
3296
+ "grad_norm": 0.01003777701407671,
3297
+ "learning_rate": 2e-05,
3298
+ "loss": 1.5173,
3299
+ "step": 470
3300
+ },
3301
+ {
3302
+ "epoch": 0.8448430493273542,
3303
+ "grad_norm": 0.009939250536262989,
3304
+ "learning_rate": 2e-05,
3305
+ "loss": 1.5108,
3306
+ "step": 471
3307
+ },
3308
+ {
3309
+ "epoch": 0.8466367713004485,
3310
+ "grad_norm": 0.009835812263190746,
3311
+ "learning_rate": 2e-05,
3312
+ "loss": 1.5272,
3313
+ "step": 472
3314
+ },
3315
+ {
3316
+ "epoch": 0.8484304932735426,
3317
+ "grad_norm": 0.010321546345949173,
3318
+ "learning_rate": 2e-05,
3319
+ "loss": 1.5193,
3320
+ "step": 473
3321
+ },
3322
+ {
3323
+ "epoch": 0.8502242152466367,
3324
+ "grad_norm": 0.01006554439663887,
3325
+ "learning_rate": 2e-05,
3326
+ "loss": 1.5165,
3327
+ "step": 474
3328
+ },
3329
+ {
3330
+ "epoch": 0.852017937219731,
3331
+ "grad_norm": 0.009972809813916683,
3332
+ "learning_rate": 2e-05,
3333
+ "loss": 1.5228,
3334
+ "step": 475
3335
+ },
3336
+ {
3337
+ "epoch": 0.8538116591928251,
3338
+ "grad_norm": 0.010388972237706184,
3339
+ "learning_rate": 2e-05,
3340
+ "loss": 1.5188,
3341
+ "step": 476
3342
+ },
3343
+ {
3344
+ "epoch": 0.8556053811659193,
3345
+ "grad_norm": 0.010111154057085514,
3346
+ "learning_rate": 2e-05,
3347
+ "loss": 1.5199,
3348
+ "step": 477
3349
+ },
3350
+ {
3351
+ "epoch": 0.8573991031390135,
3352
+ "grad_norm": 0.01029327604919672,
3353
+ "learning_rate": 2e-05,
3354
+ "loss": 1.516,
3355
+ "step": 478
3356
+ },
3357
+ {
3358
+ "epoch": 0.8591928251121076,
3359
+ "grad_norm": 0.010400544852018356,
3360
+ "learning_rate": 2e-05,
3361
+ "loss": 1.5218,
3362
+ "step": 479
3363
+ },
3364
+ {
3365
+ "epoch": 0.8609865470852018,
3366
+ "grad_norm": 0.0099885743111372,
3367
+ "learning_rate": 2e-05,
3368
+ "loss": 1.5155,
3369
+ "step": 480
3370
+ },
3371
+ {
3372
+ "epoch": 0.862780269058296,
3373
+ "grad_norm": 0.010007279925048351,
3374
+ "learning_rate": 2e-05,
3375
+ "loss": 1.5205,
3376
+ "step": 481
3377
+ },
3378
+ {
3379
+ "epoch": 0.8645739910313901,
3380
+ "grad_norm": 0.01053563691675663,
3381
+ "learning_rate": 2e-05,
3382
+ "loss": 1.5019,
3383
+ "step": 482
3384
+ },
3385
+ {
3386
+ "epoch": 0.8663677130044843,
3387
+ "grad_norm": 0.01031608134508133,
3388
+ "learning_rate": 2e-05,
3389
+ "loss": 1.5217,
3390
+ "step": 483
3391
+ },
3392
+ {
3393
+ "epoch": 0.8681614349775785,
3394
+ "grad_norm": 0.010082092136144638,
3395
+ "learning_rate": 2e-05,
3396
+ "loss": 1.5073,
3397
+ "step": 484
3398
+ },
3399
+ {
3400
+ "epoch": 0.8699551569506726,
3401
+ "grad_norm": 0.01012254785746336,
3402
+ "learning_rate": 2e-05,
3403
+ "loss": 1.5101,
3404
+ "step": 485
3405
+ },
3406
+ {
3407
+ "epoch": 0.8717488789237668,
3408
+ "grad_norm": 0.010539901442825794,
3409
+ "learning_rate": 2e-05,
3410
+ "loss": 1.5209,
3411
+ "step": 486
3412
+ },
3413
+ {
3414
+ "epoch": 0.873542600896861,
3415
+ "grad_norm": 0.009883386082947254,
3416
+ "learning_rate": 2e-05,
3417
+ "loss": 1.5275,
3418
+ "step": 487
3419
+ },
3420
+ {
3421
+ "epoch": 0.8753363228699551,
3422
+ "grad_norm": 0.010055874474346638,
3423
+ "learning_rate": 2e-05,
3424
+ "loss": 1.521,
3425
+ "step": 488
3426
+ },
3427
+ {
3428
+ "epoch": 0.8771300448430494,
3429
+ "grad_norm": 0.010441599413752556,
3430
+ "learning_rate": 2e-05,
3431
+ "loss": 1.5253,
3432
+ "step": 489
3433
+ },
3434
+ {
3435
+ "epoch": 0.8789237668161435,
3436
+ "grad_norm": 0.010321282781660557,
3437
+ "learning_rate": 2e-05,
3438
+ "loss": 1.5128,
3439
+ "step": 490
3440
+ },
3441
+ {
3442
+ "epoch": 0.8807174887892377,
3443
+ "grad_norm": 0.010404079221189022,
3444
+ "learning_rate": 2e-05,
3445
+ "loss": 1.5216,
3446
+ "step": 491
3447
+ },
3448
+ {
3449
+ "epoch": 0.8825112107623319,
3450
+ "grad_norm": 0.010680857114493847,
3451
+ "learning_rate": 2e-05,
3452
+ "loss": 1.5102,
3453
+ "step": 492
3454
+ },
3455
+ {
3456
+ "epoch": 0.884304932735426,
3457
+ "grad_norm": 0.009785238653421402,
3458
+ "learning_rate": 2e-05,
3459
+ "loss": 1.5152,
3460
+ "step": 493
3461
+ },
3462
+ {
3463
+ "epoch": 0.8860986547085202,
3464
+ "grad_norm": 0.010622934438288212,
3465
+ "learning_rate": 2e-05,
3466
+ "loss": 1.5134,
3467
+ "step": 494
3468
+ },
3469
+ {
3470
+ "epoch": 0.8878923766816144,
3471
+ "grad_norm": 0.009563595987856388,
3472
+ "learning_rate": 2e-05,
3473
+ "loss": 1.5213,
3474
+ "step": 495
3475
+ },
3476
+ {
3477
+ "epoch": 0.8896860986547085,
3478
+ "grad_norm": 0.009900403209030628,
3479
+ "learning_rate": 2e-05,
3480
+ "loss": 1.5254,
3481
+ "step": 496
3482
+ },
3483
+ {
3484
+ "epoch": 0.8914798206278027,
3485
+ "grad_norm": 0.010441206395626068,
3486
+ "learning_rate": 2e-05,
3487
+ "loss": 1.5042,
3488
+ "step": 497
3489
+ },
3490
+ {
3491
+ "epoch": 0.8932735426008969,
3492
+ "grad_norm": 0.010110273025929928,
3493
+ "learning_rate": 2e-05,
3494
+ "loss": 1.5141,
3495
+ "step": 498
3496
+ },
3497
+ {
3498
+ "epoch": 0.895067264573991,
3499
+ "grad_norm": 0.00976527575403452,
3500
+ "learning_rate": 2e-05,
3501
+ "loss": 1.5189,
3502
+ "step": 499
3503
+ },
3504
+ {
3505
+ "epoch": 0.8968609865470852,
3506
+ "grad_norm": 0.010270185768604279,
3507
+ "learning_rate": 2e-05,
3508
+ "loss": 1.5128,
3509
+ "step": 500
3510
+ },
3511
+ {
3512
+ "epoch": 0.8986547085201794,
3513
+ "grad_norm": 0.010477078147232533,
3514
+ "learning_rate": 2e-05,
3515
+ "loss": 1.5331,
3516
+ "step": 501
3517
+ },
3518
+ {
3519
+ "epoch": 0.9004484304932735,
3520
+ "grad_norm": 0.009786723181605339,
3521
+ "learning_rate": 2e-05,
3522
+ "loss": 1.5143,
3523
+ "step": 502
3524
+ },
3525
+ {
3526
+ "epoch": 0.9022421524663677,
3527
+ "grad_norm": 0.009838691912591457,
3528
+ "learning_rate": 2e-05,
3529
+ "loss": 1.5237,
3530
+ "step": 503
3531
+ },
3532
+ {
3533
+ "epoch": 0.9040358744394619,
3534
+ "grad_norm": 0.010305250994861126,
3535
+ "learning_rate": 2e-05,
3536
+ "loss": 1.5236,
3537
+ "step": 504
3538
+ },
3539
+ {
3540
+ "epoch": 0.905829596412556,
3541
+ "grad_norm": 0.010098317638039589,
3542
+ "learning_rate": 2e-05,
3543
+ "loss": 1.5189,
3544
+ "step": 505
3545
+ },
3546
+ {
3547
+ "epoch": 0.9076233183856502,
3548
+ "grad_norm": 0.010335841216146946,
3549
+ "learning_rate": 2e-05,
3550
+ "loss": 1.519,
3551
+ "step": 506
3552
+ },
3553
+ {
3554
+ "epoch": 0.9094170403587444,
3555
+ "grad_norm": 0.009809168055653572,
3556
+ "learning_rate": 2e-05,
3557
+ "loss": 1.5176,
3558
+ "step": 507
3559
+ },
3560
+ {
3561
+ "epoch": 0.9112107623318386,
3562
+ "grad_norm": 0.01069081760942936,
3563
+ "learning_rate": 2e-05,
3564
+ "loss": 1.5055,
3565
+ "step": 508
3566
+ },
3567
+ {
3568
+ "epoch": 0.9130044843049328,
3569
+ "grad_norm": 0.009927291423082352,
3570
+ "learning_rate": 2e-05,
3571
+ "loss": 1.5224,
3572
+ "step": 509
3573
+ },
3574
+ {
3575
+ "epoch": 0.9147982062780269,
3576
+ "grad_norm": 0.010560589842498302,
3577
+ "learning_rate": 2e-05,
3578
+ "loss": 1.5129,
3579
+ "step": 510
3580
+ },
3581
+ {
3582
+ "epoch": 0.9165919282511211,
3583
+ "grad_norm": 0.010154438205063343,
3584
+ "learning_rate": 2e-05,
3585
+ "loss": 1.52,
3586
+ "step": 511
3587
+ },
3588
+ {
3589
+ "epoch": 0.9183856502242153,
3590
+ "grad_norm": 0.010346156544983387,
3591
+ "learning_rate": 2e-05,
3592
+ "loss": 1.5194,
3593
+ "step": 512
3594
+ },
3595
+ {
3596
+ "epoch": 0.9201793721973094,
3597
+ "grad_norm": 0.010523281060159206,
3598
+ "learning_rate": 2e-05,
3599
+ "loss": 1.5187,
3600
+ "step": 513
3601
+ },
3602
+ {
3603
+ "epoch": 0.9219730941704036,
3604
+ "grad_norm": 0.010443002916872501,
3605
+ "learning_rate": 2e-05,
3606
+ "loss": 1.5059,
3607
+ "step": 514
3608
+ },
3609
+ {
3610
+ "epoch": 0.9237668161434978,
3611
+ "grad_norm": 0.010005362331867218,
3612
+ "learning_rate": 2e-05,
3613
+ "loss": 1.5102,
3614
+ "step": 515
3615
+ },
3616
+ {
3617
+ "epoch": 0.9255605381165919,
3618
+ "grad_norm": 0.010285025462508202,
3619
+ "learning_rate": 2e-05,
3620
+ "loss": 1.5217,
3621
+ "step": 516
3622
+ },
3623
+ {
3624
+ "epoch": 0.9273542600896861,
3625
+ "grad_norm": 0.010401098988950253,
3626
+ "learning_rate": 2e-05,
3627
+ "loss": 1.5243,
3628
+ "step": 517
3629
+ },
3630
+ {
3631
+ "epoch": 0.9291479820627803,
3632
+ "grad_norm": 0.010455128736793995,
3633
+ "learning_rate": 2e-05,
3634
+ "loss": 1.5054,
3635
+ "step": 518
3636
+ },
3637
+ {
3638
+ "epoch": 0.9309417040358744,
3639
+ "grad_norm": 0.00987928081303835,
3640
+ "learning_rate": 2e-05,
3641
+ "loss": 1.5053,
3642
+ "step": 519
3643
+ },
3644
+ {
3645
+ "epoch": 0.9327354260089686,
3646
+ "grad_norm": 0.010212692432105541,
3647
+ "learning_rate": 2e-05,
3648
+ "loss": 1.5151,
3649
+ "step": 520
3650
+ },
3651
+ {
3652
+ "epoch": 0.9345291479820628,
3653
+ "grad_norm": 0.010937588289380074,
3654
+ "learning_rate": 2e-05,
3655
+ "loss": 1.5099,
3656
+ "step": 521
3657
+ },
3658
+ {
3659
+ "epoch": 0.9363228699551569,
3660
+ "grad_norm": 0.010248001664876938,
3661
+ "learning_rate": 2e-05,
3662
+ "loss": 1.5256,
3663
+ "step": 522
3664
+ },
3665
+ {
3666
+ "epoch": 0.9381165919282511,
3667
+ "grad_norm": 0.010430903173983097,
3668
+ "learning_rate": 2e-05,
3669
+ "loss": 1.5056,
3670
+ "step": 523
3671
+ },
3672
+ {
3673
+ "epoch": 0.9399103139013453,
3674
+ "grad_norm": 0.0102499695494771,
3675
+ "learning_rate": 2e-05,
3676
+ "loss": 1.5285,
3677
+ "step": 524
3678
+ },
3679
+ {
3680
+ "epoch": 0.9417040358744395,
3681
+ "grad_norm": 0.010674213990569115,
3682
+ "learning_rate": 2e-05,
3683
+ "loss": 1.5136,
3684
+ "step": 525
3685
+ },
3686
+ {
3687
+ "epoch": 0.9434977578475336,
3688
+ "grad_norm": 0.010732615366578102,
3689
+ "learning_rate": 2e-05,
3690
+ "loss": 1.5119,
3691
+ "step": 526
3692
+ },
3693
+ {
3694
+ "epoch": 0.9452914798206278,
3695
+ "grad_norm": 0.009994648396968842,
3696
+ "learning_rate": 2e-05,
3697
+ "loss": 1.5228,
3698
+ "step": 527
3699
+ },
3700
+ {
3701
+ "epoch": 0.947085201793722,
3702
+ "grad_norm": 0.010234368033707142,
3703
+ "learning_rate": 2e-05,
3704
+ "loss": 1.5258,
3705
+ "step": 528
3706
+ },
3707
+ {
3708
+ "epoch": 0.9488789237668162,
3709
+ "grad_norm": 0.010327205993235111,
3710
+ "learning_rate": 2e-05,
3711
+ "loss": 1.5156,
3712
+ "step": 529
3713
+ },
3714
+ {
3715
+ "epoch": 0.9506726457399103,
3716
+ "grad_norm": 0.009836922399699688,
3717
+ "learning_rate": 2e-05,
3718
+ "loss": 1.5171,
3719
+ "step": 530
3720
+ },
3721
+ {
3722
+ "epoch": 0.9524663677130045,
3723
+ "grad_norm": 0.009962068870663643,
3724
+ "learning_rate": 2e-05,
3725
+ "loss": 1.5125,
3726
+ "step": 531
3727
+ },
3728
+ {
3729
+ "epoch": 0.9542600896860987,
3730
+ "grad_norm": 0.010127882473170757,
3731
+ "learning_rate": 2e-05,
3732
+ "loss": 1.5182,
3733
+ "step": 532
3734
+ },
3735
+ {
3736
+ "epoch": 0.9560538116591928,
3737
+ "grad_norm": 0.010251611471176147,
3738
+ "learning_rate": 2e-05,
3739
+ "loss": 1.5139,
3740
+ "step": 533
3741
+ },
3742
+ {
3743
+ "epoch": 0.957847533632287,
3744
+ "grad_norm": 0.010081682354211807,
3745
+ "learning_rate": 2e-05,
3746
+ "loss": 1.5239,
3747
+ "step": 534
3748
+ },
3749
+ {
3750
+ "epoch": 0.9596412556053812,
3751
+ "grad_norm": 0.010235367342829704,
3752
+ "learning_rate": 2e-05,
3753
+ "loss": 1.5159,
3754
+ "step": 535
3755
+ },
3756
+ {
3757
+ "epoch": 0.9614349775784753,
3758
+ "grad_norm": 0.009694702923297882,
3759
+ "learning_rate": 2e-05,
3760
+ "loss": 1.5174,
3761
+ "step": 536
3762
+ },
3763
+ {
3764
+ "epoch": 0.9632286995515695,
3765
+ "grad_norm": 0.010224996134638786,
3766
+ "learning_rate": 2e-05,
3767
+ "loss": 1.5171,
3768
+ "step": 537
3769
+ },
3770
+ {
3771
+ "epoch": 0.9650224215246637,
3772
+ "grad_norm": 0.010206632316112518,
3773
+ "learning_rate": 2e-05,
3774
+ "loss": 1.5223,
3775
+ "step": 538
3776
+ },
3777
+ {
3778
+ "epoch": 0.9668161434977578,
3779
+ "grad_norm": 0.010011864826083183,
3780
+ "learning_rate": 2e-05,
3781
+ "loss": 1.5282,
3782
+ "step": 539
3783
+ },
3784
+ {
3785
+ "epoch": 0.968609865470852,
3786
+ "grad_norm": 0.010364921763539314,
3787
+ "learning_rate": 2e-05,
3788
+ "loss": 1.5092,
3789
+ "step": 540
3790
+ },
3791
+ {
3792
+ "epoch": 0.9704035874439462,
3793
+ "grad_norm": 0.010109508410096169,
3794
+ "learning_rate": 2e-05,
3795
+ "loss": 1.5068,
3796
+ "step": 541
3797
+ },
3798
+ {
3799
+ "epoch": 0.9721973094170404,
3800
+ "grad_norm": 0.00964987464249134,
3801
+ "learning_rate": 2e-05,
3802
+ "loss": 1.5089,
3803
+ "step": 542
3804
+ },
3805
+ {
3806
+ "epoch": 0.9739910313901345,
3807
+ "grad_norm": 0.010244207456707954,
3808
+ "learning_rate": 2e-05,
3809
+ "loss": 1.5217,
3810
+ "step": 543
3811
+ },
3812
+ {
3813
+ "epoch": 0.9757847533632287,
3814
+ "grad_norm": 0.009797874838113785,
3815
+ "learning_rate": 2e-05,
3816
+ "loss": 1.5143,
3817
+ "step": 544
3818
+ },
3819
+ {
3820
+ "epoch": 0.9775784753363229,
3821
+ "grad_norm": 0.010056640952825546,
3822
+ "learning_rate": 2e-05,
3823
+ "loss": 1.5276,
3824
+ "step": 545
3825
+ },
3826
+ {
3827
+ "epoch": 0.979372197309417,
3828
+ "grad_norm": 0.009898710064589977,
3829
+ "learning_rate": 2e-05,
3830
+ "loss": 1.5222,
3831
+ "step": 546
3832
+ },
3833
+ {
3834
+ "epoch": 0.9811659192825112,
3835
+ "grad_norm": 0.0099082225933671,
3836
+ "learning_rate": 2e-05,
3837
+ "loss": 1.5276,
3838
+ "step": 547
3839
+ },
3840
+ {
3841
+ "epoch": 0.9829596412556054,
3842
+ "grad_norm": 0.01018478162586689,
3843
+ "learning_rate": 2e-05,
3844
+ "loss": 1.5217,
3845
+ "step": 548
3846
+ },
3847
+ {
3848
+ "epoch": 0.9847533632286996,
3849
+ "grad_norm": 0.009828625246882439,
3850
+ "learning_rate": 2e-05,
3851
+ "loss": 1.5194,
3852
+ "step": 549
3853
+ },
3854
+ {
3855
+ "epoch": 0.9865470852017937,
3856
+ "grad_norm": 0.010311014950275421,
3857
+ "learning_rate": 2e-05,
3858
+ "loss": 1.5138,
3859
+ "step": 550
3860
+ },
3861
+ {
3862
+ "epoch": 0.9883408071748879,
3863
+ "grad_norm": 0.010840130038559437,
3864
+ "learning_rate": 2e-05,
3865
+ "loss": 1.5044,
3866
+ "step": 551
3867
+ },
3868
+ {
3869
+ "epoch": 0.9901345291479821,
3870
+ "grad_norm": 0.009595104493200779,
3871
+ "learning_rate": 2e-05,
3872
+ "loss": 1.5165,
3873
+ "step": 552
3874
+ },
3875
+ {
3876
+ "epoch": 0.9919282511210762,
3877
+ "grad_norm": 0.01027593482285738,
3878
+ "learning_rate": 2e-05,
3879
+ "loss": 1.5291,
3880
+ "step": 553
3881
+ },
3882
+ {
3883
+ "epoch": 0.9937219730941704,
3884
+ "grad_norm": 0.010394555516541004,
3885
+ "learning_rate": 2e-05,
3886
+ "loss": 1.5109,
3887
+ "step": 554
3888
+ },
3889
+ {
3890
+ "epoch": 0.9955156950672646,
3891
+ "grad_norm": 0.00996735692024231,
3892
+ "learning_rate": 2e-05,
3893
+ "loss": 1.5212,
3894
+ "step": 555
3895
+ },
3896
+ {
3897
+ "epoch": 0.9973094170403587,
3898
+ "grad_norm": 0.010095257312059402,
3899
+ "learning_rate": 2e-05,
3900
+ "loss": 1.5106,
3901
+ "step": 556
3902
+ },
3903
+ {
3904
+ "epoch": 0.9991031390134529,
3905
+ "grad_norm": 0.01082176435738802,
3906
+ "learning_rate": 2e-05,
3907
+ "loss": 1.5099,
3908
+ "step": 557
3909
+ },
3910
+ {
3911
+ "epoch": 0.9991031390134529,
3912
+ "step": 557,
3913
+ "total_flos": 7841700554735616.0,
3914
+ "train_loss": 0.4275342236729456,
3915
+ "train_runtime": 63728.3403,
3916
+ "train_samples_per_second": 2.239,
3917
+ "train_steps_per_second": 0.009
3918
  }
3919
  ],
3920
  "logging_steps": 1,
3921
+ "max_steps": 557,
3922
  "num_input_tokens_seen": 0,
3923
  "num_train_epochs": 1,
3924
  "save_steps": 100,
 
3934
  "attributes": {}
3935
  }
3936
  },
3937
+ "total_flos": 7841700554735616.0,
3938
  "train_batch_size": 2,
3939
  "trial_name": null,
3940
  "trial_params": null
training_loss.png CHANGED