Training in progress, step 375, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +30 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +4 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +205 -0
last-checkpoint/trainer_state.json +2674 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: katuni4ka/tiny-random-dbrx
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "katuni4ka/tiny-random-dbrx",
+  "bias": "none",
+  "fan_in_fan_out": false,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 16,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 8,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "out_proj",
+    "Wqkv",
+    "layer"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": true
+}
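The LoRA settings above interact: with `"use_rslora": true`, PEFT scales the adapter update by `lora_alpha / sqrt(r)` rather than the default `lora_alpha / r`. The sketch below works that out for the values in this config. It is a standalone illustration, not code from this repository; the helper name `lora_scaling` is hypothetical, and only the three relevant keys are copied from the JSON.

```python
import json
import math

# Subset of the adapter_config.json shown above (only the keys that affect scaling).
adapter_config = json.loads("""
{
  "lora_alpha": 16,
  "r": 8,
  "use_rslora": true
}
""")


def lora_scaling(cfg):
    """Hypothetical helper: scaling factor applied to the low-rank update B @ A."""
    r, alpha = cfg["r"], cfg["lora_alpha"]
    if cfg.get("use_rslora"):
        # Rank-stabilized LoRA: divide by sqrt(r) instead of r.
        return alpha / math.sqrt(r)
    return alpha / r


rslora_scale = lora_scaling(adapter_config)
plain_scale = lora_scaling({**adapter_config, "use_rslora": False})

print(f"rsLoRA scaling: {rslora_scale:.4f}")
print(f"plain LoRA scaling: {plain_scale:.4f}")
```

With `r = 8` and `lora_alpha = 16`, the rank-stabilized factor is roughly 5.66 versus 2.0 for plain LoRA, so the adapter update is weighted almost three times more heavily than the same config would be without `use_rslora`.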

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:0b70d74cf0759d9dadc861254915bfee0208c56e422495d1aa66e6e350f70e89
+size 5752

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,4 @@

+{
+  "<|im_end|>": 100279,
+  "<|im_start|>": 100278
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:63bb4a006a92edc2273523d8eab51a1cd8b63917850554f90125c790b176d44c
+size 15814

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:dd5eadcdf608dd504d06315bd0c0b7a8d711126448227c5d55118f876bc4707a
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:4bf304e342001350c82d6970cec50fb92a4329a84dcb76ae8031bca03ca92aa9
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|pad|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,205 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "100256": {
+      "content": "<||_unused_0_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100257": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100258": {
+      "content": "<|fim_prefix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100259": {
+      "content": "<|fim_middle|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100260": {
+      "content": "<|fim_suffix|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100261": {
+      "content": "<||_unused_1_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100262": {
+      "content": "<||_unused_2_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100263": {
+      "content": "<||_unused_3_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100264": {
+      "content": "<||_unused_4_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100265": {
+      "content": "<||_unused_5_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100266": {
+      "content": "<||_unused_6_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100267": {
+      "content": "<||_unused_7_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100268": {
+      "content": "<||_unused_8_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100269": {
+      "content": "<||_unused_9_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100270": {
+      "content": "<||_unused_10_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100271": {
+      "content": "<||_unused_11_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100272": {
+      "content": "<||_unused_12_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100273": {
+      "content": "<||_unused_13_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100274": {
+      "content": "<||_unused_14_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100275": {
+      "content": "<||_unused_15_||>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100276": {
+      "content": "<|endofprompt|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100277": {
+      "content": "<|pad|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100278": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "100279": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "<|endoftext|>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": true,
+  "eos_token": "<|endoftext|>",
+  "model_max_length": 32768,
+  "pad_token": "<|pad|>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "<|endoftext|>"
+}
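One detail of the tokenizer config above may be worth checking: the `chat_template` emits Llama-3-style markers (`<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`), while `added_tokens_decoder` registers ChatML-style `<|im_start|>` / `<|im_end|>` instead. The sketch below compares the two sets using only the standard library. The template string and the token list are copied from the diff; the variable names are illustrative, and whether the mismatch matters for this tiny random test model is an open question.

```python
import re

# chat_template from tokenizer_config.json above, reproduced as a Python string.
chat_template = (
    "{% if not add_generation_prompt is defined %}"
    "{% set add_generation_prompt = false %}{% endif %}"
    "{% set loop_messages = messages %}{% for message in loop_messages %}"
    "{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\\n\\n'"
    "+ message['content'] | trim + '<|eot_id|>' %}"
    "{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}"
    "{{ content }}{% endfor %}"
    "{% if add_generation_prompt %}"
    "{{ '<|start_header_id|>assistant<|end_header_id|>\\n\\n' }}{% endif %}"
)

# Named special tokens from added_tokens_decoder (the <||_unused_N_||> slots are omitted).
registered = {
    "<|endoftext|>", "<|fim_prefix|>", "<|fim_middle|>", "<|fim_suffix|>",
    "<|endofprompt|>", "<|pad|>", "<|im_start|>", "<|im_end|>",
}

# Literal <|...|> markers that the template writes into the prompt.
template_markers = set(re.findall(r"<\|[a-z_]+\|>", chat_template))

# Markers the template uses that are not registered as special tokens.
unregistered = sorted(template_markers - registered)
print(unregistered)
```

Any marker reported here is not a single special token in this vocabulary, so the tokenizer would presumably split it into ordinary sub-word pieces rather than one ID.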

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,2674 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.06002761270184285,
+  "eval_steps": 375,
+  "global_step": 375,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00016007363387158093,
+      "grad_norm": 5.485477959155105e-05,
+      "learning_rate": 2e-05,
+      "loss": 11.5,
+      "step": 1
+    },
+    {
+      "epoch": 0.00016007363387158093,
+      "eval_loss": 11.5,
+      "eval_runtime": 61.3751,
+      "eval_samples_per_second": 171.438,
+      "eval_steps_per_second": 85.719,
+      "step": 1
+    },
+    {
+      "epoch": 0.00032014726774316185,
+      "grad_norm": 8.732182323001325e-05,
+      "learning_rate": 4e-05,
+      "loss": 11.5,
+      "step": 2
+    },
+    {
+      "epoch": 0.0004802209016147428,
+      "grad_norm": 7.794566045049578e-05,
+      "learning_rate": 6e-05,
+      "loss": 11.5,
+      "step": 3
+    },
+    {
+      "epoch": 0.0006402945354863237,
+      "grad_norm": 0.00010553398169577122,
+      "learning_rate": 8e-05,
+      "loss": 11.5,
+      "step": 4
+    },
+    {
+      "epoch": 0.0008003681693579046,
+      "grad_norm": 9.141851478489116e-05,
+      "learning_rate": 0.0001,
+      "loss": 11.5,
+      "step": 5
+    },
+    {
+      "epoch": 0.0009604418032294856,
+      "grad_norm": 7.036867464194074e-05,
+      "learning_rate": 0.00012,
+      "loss": 11.5,
+      "step": 6
+    },
+    {
+      "epoch": 0.0011205154371010666,
+      "grad_norm": 0.00010718707926571369,
+      "learning_rate": 0.00014,
+      "loss": 11.5,
+      "step": 7
+    },
+    {
+      "epoch": 0.0012805890709726474,
+      "grad_norm": 5.46959818166215e-05,
+      "learning_rate": 0.00016,
+      "loss": 11.5,
+      "step": 8
+    },
+    {
+      "epoch": 0.0014406627048442284,
+      "grad_norm": 6.93291804054752e-05,
+      "learning_rate": 0.00018,
+      "loss": 11.5,
+      "step": 9
+    },
+    {
+      "epoch": 0.0016007363387158093,
+      "grad_norm": 6.610731361433864e-05,
+      "learning_rate": 0.0002,
+      "loss": 11.5,
+      "step": 10
+    },
+    {
+      "epoch": 0.0017608099725873903,
+      "grad_norm": 7.073202141327783e-05,
+      "learning_rate": 0.00019999977772170748,
+      "loss": 11.5,
+      "step": 11
+    },
+    {
+      "epoch": 0.0019208836064589711,
+      "grad_norm": 5.89275878155604e-05,
+      "learning_rate": 0.00019999911088781805,
+      "loss": 11.5,
+      "step": 12
+    },
+    {
+      "epoch": 0.002080957240330552,
+      "grad_norm": 7.123701652744785e-05,
+      "learning_rate": 0.0001999979995012962,
+      "loss": 11.5,
+      "step": 13
+    },
+    {
+      "epoch": 0.002241030874202133,
+      "grad_norm": 7.767384522594512e-05,
+      "learning_rate": 0.00019999644356708261,
+      "loss": 11.5,
+      "step": 14
+    },
+    {
+      "epoch": 0.002401104508073714,
+      "grad_norm": 0.00010461104830028489,
+      "learning_rate": 0.00019999444309209432,
+      "loss": 11.5,
+      "step": 15
+    },
+    {
+      "epoch": 0.002561178141945295,
+      "grad_norm": 0.00013544182002078742,
+      "learning_rate": 0.0001999919980852246,
+      "loss": 11.5,
+      "step": 16
+    },
+    {
+      "epoch": 0.0027212517758168756,
+      "grad_norm": 0.00012140026228735223,
+      "learning_rate": 0.00019998910855734288,
+      "loss": 11.5,
+      "step": 17
+    },
+    {
+      "epoch": 0.002881325409688457,
+      "grad_norm": 0.00015228302800096571,
+      "learning_rate": 0.0001999857745212947,
+      "loss": 11.5,
+      "step": 18
+    },
+    {
+      "epoch": 0.0030413990435600377,
+      "grad_norm": 0.00013085530372336507,
+      "learning_rate": 0.00019998199599190178,
+      "loss": 11.5,
+      "step": 19
+    },
+    {
+      "epoch": 0.0032014726774316185,
+      "grad_norm": 0.00015696501941420138,
+      "learning_rate": 0.0001999777729859618,
+      "loss": 11.5,
+      "step": 20
+    },
+    {
+      "epoch": 0.0033615463113031993,
+      "grad_norm": 0.00011811301374109462,
+      "learning_rate": 0.00019997310552224846,
+      "loss": 11.5,
+      "step": 21
+    },
+    {
+      "epoch": 0.0035216199451747806,
+      "grad_norm": 0.00017093494534492493,
+      "learning_rate": 0.00019996799362151122,
+      "loss": 11.5,
+      "step": 22
+    },
+    {
+      "epoch": 0.0036816935790463614,
+      "grad_norm": 0.00016428295930381864,
+      "learning_rate": 0.00019996243730647538,
+      "loss": 11.5,
+      "step": 23
+    },
+    {
+      "epoch": 0.0038417672129179422,
+      "grad_norm": 0.00017610278155189008,
+      "learning_rate": 0.00019995643660184191,
+      "loss": 11.5,
+      "step": 24
+    },
+    {
+      "epoch": 0.0040018408467895235,
+      "grad_norm": 0.00022790305956732482,
+      "learning_rate": 0.00019994999153428737,
+      "loss": 11.5,
+      "step": 25
+    },
+    {
+      "epoch": 0.004161914480661104,
+      "grad_norm": 0.00025935209123417735,
+      "learning_rate": 0.00019994310213246368,
+      "loss": 11.5,
+      "step": 26
+    },
+    {
+      "epoch": 0.004321988114532685,
+      "grad_norm": 0.0002765641256701201,
+      "learning_rate": 0.00019993576842699816,
+      "loss": 11.5,
+      "step": 27
+    },
+    {
+      "epoch": 0.004482061748404266,
+      "grad_norm": 0.00021294229372870177,
+      "learning_rate": 0.0001999279904504933,
+      "loss": 11.5,
+      "step": 28
+    },
+    {
+      "epoch": 0.004642135382275847,
+      "grad_norm": 0.00019341120787430555,
+      "learning_rate": 0.00019991976823752653,
+      "loss": 11.5,
+      "step": 29
+    },
+    {
+      "epoch": 0.004802209016147428,
+      "grad_norm": 0.0002532599901314825,
+      "learning_rate": 0.00019991110182465032,
+      "loss": 11.5,
+      "step": 30
+    },
+    {
+      "epoch": 0.004962282650019008,
+      "grad_norm": 0.0003308654122520238,
+      "learning_rate": 0.00019990199125039174,
+      "loss": 11.5,
+      "step": 31
+    },
+    {
+      "epoch": 0.00512235628389059,
+      "grad_norm": 0.00029195210663601756,
+      "learning_rate": 0.00019989243655525247,
+      "loss": 11.5,
+      "step": 32
+    },
+    {
+      "epoch": 0.005282429917762171,
+      "grad_norm": 0.00023563031572848558,
+      "learning_rate": 0.00019988243778170853,
+      "loss": 11.5,
+      "step": 33
+    },
+    {
+      "epoch": 0.005442503551633751,
+      "grad_norm": 0.00020833924645558,
+      "learning_rate": 0.0001998719949742101,
+      "loss": 11.5,
+      "step": 34
+    },
+    {
+      "epoch": 0.0056025771855053325,
+      "grad_norm": 0.0004335304838605225,
+      "learning_rate": 0.0001998611081791814,
+      "loss": 11.5,
+      "step": 35
+    },
+    {
+      "epoch": 0.005762650819376914,
+      "grad_norm": 0.0003254961338825524,
+      "learning_rate": 0.00019984977744502038,
+      "loss": 11.5,
+      "step": 36
+    },
+    {
+      "epoch": 0.005922724453248494,
+      "grad_norm": 0.0003397865511942655,
+      "learning_rate": 0.00019983800282209857,
+      "loss": 11.5,
+      "step": 37
+    },
+    {
+      "epoch": 0.006082798087120075,
+      "grad_norm": 0.00024661177303642035,
+      "learning_rate": 0.00019982578436276082,
+      "loss": 11.5,
+      "step": 38
+    },
+    {
+      "epoch": 0.006242871720991656,
+      "grad_norm": 0.00032913414179347456,
+      "learning_rate": 0.00019981312212132512,
+      "loss": 11.5,
+      "step": 39
+    },
+    {
+      "epoch": 0.006402945354863237,
+      "grad_norm": 0.00048708287067711353,
+      "learning_rate": 0.00019980001615408228,
+      "loss": 11.5,
+      "step": 40
+    },
+    {
+      "epoch": 0.006563018988734818,
+      "grad_norm": 0.0004900452331639826,
+      "learning_rate": 0.00019978646651929572,
+      "loss": 11.5,
+      "step": 41
+    },
+    {
+      "epoch": 0.006723092622606399,
+      "grad_norm": 0.00046467658830806613,
+      "learning_rate": 0.00019977247327720128,
+      "loss": 11.5,
+      "step": 42
+    },
+    {
+      "epoch": 0.00688316625647798,
+      "grad_norm": 0.0006093564443290234,
+      "learning_rate": 0.0001997580364900068,
+      "loss": 11.5,
+      "step": 43
+    },
+    {
+      "epoch": 0.007043239890349561,
+      "grad_norm": 0.0009993163403123617,
+      "learning_rate": 0.000199743156221892,
+      "loss": 11.5,
+      "step": 44
+    },
+    {
+      "epoch": 0.007203313524221142,
+      "grad_norm": 0.001258009229786694,
+      "learning_rate": 0.00019972783253900808,
+      "loss": 11.5,
+      "step": 45
+    },
+    {
+      "epoch": 0.007363387158092723,
+      "grad_norm": 0.001445628353394568,
+      "learning_rate": 0.00019971206550947748,
+      "loss": 11.5,
+      "step": 46
+    },
+    {
+      "epoch": 0.007523460791964303,
+      "grad_norm": 0.0013488645199686289,
+      "learning_rate": 0.00019969585520339354,
+      "loss": 11.5,
+      "step": 47
+    },
+    {
+      "epoch": 0.0076835344258358845,
+      "grad_norm": 0.0006975385476835072,
+      "learning_rate": 0.0001996792016928203,
+      "loss": 11.5,
+      "step": 48
+    },
+    {
+      "epoch": 0.007843608059707465,
+      "grad_norm": 0.000834140635561198,
+      "learning_rate": 0.00019966210505179197,
+      "loss": 11.5,
+      "step": 49
+    },
+    {
+      "epoch": 0.008003681693579047,
+      "grad_norm": 0.0004872016725130379,
+      "learning_rate": 0.00019964456535631286,
+      "loss": 11.5,
+      "step": 50
+    },
+    {
+      "epoch": 0.008163755327450627,
+      "grad_norm": 0.00016838940791785717,
+      "learning_rate": 0.0001996265826843568,
+      "loss": 11.5,
+      "step": 51
+    },
+    {
+      "epoch": 0.008323828961322208,
+      "grad_norm": 0.00023144090664573014,
+      "learning_rate": 0.00019960815711586696,
+      "loss": 11.5,
+      "step": 52
+    },
+    {
+      "epoch": 0.00848390259519379,
+      "grad_norm": 0.00019563916430342942,
+      "learning_rate": 0.00019958928873275539,
+      "loss": 11.5,
+      "step": 53
+    },
+    {
+      "epoch": 0.00864397622906537,
+      "grad_norm": 0.00020072546612937003,
+      "learning_rate": 0.00019956997761890277,
+      "loss": 11.5,
+      "step": 54
+    },
+    {
+      "epoch": 0.00880404986293695,
+      "grad_norm": 0.000214951389352791,
+      "learning_rate": 0.00019955022386015792,
+      "loss": 11.5,
+      "step": 55
+    },
+    {
+      "epoch": 0.008964123496808533,
+      "grad_norm": 0.0001576873182784766,
+      "learning_rate": 0.00019953002754433743,
+      "loss": 11.5,
+      "step": 56
+    },
+    {
+      "epoch": 0.009124197130680113,
+      "grad_norm": 0.00016661152767483145,
+      "learning_rate": 0.00019950938876122542,
+      "loss": 11.5,
+      "step": 57
+    },
+    {
+      "epoch": 0.009284270764551694,
+      "grad_norm": 0.00020268915977794677,
+      "learning_rate": 0.00019948830760257291,
+      "loss": 11.5,
+      "step": 58
+    },
+    {
+      "epoch": 0.009444344398423274,
+      "grad_norm": 0.00017972583009395748,
+      "learning_rate": 0.0001994667841620976,
+      "loss": 11.5,
+      "step": 59
+    },
+    {
+      "epoch": 0.009604418032294856,
+      "grad_norm": 0.00018623605137690902,
+      "learning_rate": 0.00019944481853548335,
+      "loss": 11.5,
+      "step": 60
+    },
+    {
+      "epoch": 0.009764491666166436,
+      "grad_norm": 0.00023312588746193796,
+      "learning_rate": 0.00019942241082037982,
+      "loss": 11.5,
+      "step": 61
+    },
+    {
+      "epoch": 0.009924565300038017,
+      "grad_norm": 0.0001705118193058297,
+      "learning_rate": 0.00019939956111640197,
+      "loss": 11.5,
+      "step": 62
+    },
+    {
+      "epoch": 0.010084638933909599,
+      "grad_norm": 0.0002978912671096623,
+      "learning_rate": 0.00019937626952512964,
+      "loss": 11.5,
+      "step": 63
+    },
+    {
+      "epoch": 0.01024471256778118,
+      "grad_norm": 0.00032203004229813814,
+      "learning_rate": 0.0001993525361501072,
+      "loss": 11.5,
+      "step": 64
+    },
+    {
+      "epoch": 0.01040478620165276,
+      "grad_norm": 0.0003070418315473944,
+      "learning_rate": 0.00019932836109684286,
+      "loss": 11.5,
+      "step": 65
+    },
+    {
+      "epoch": 0.010564859835524342,
+      "grad_norm": 0.0004711727669928223,
+      "learning_rate": 0.00019930374447280845,
+      "loss": 11.5,
+      "step": 66
+    },
+    {
+      "epoch": 0.010724933469395922,
+      "grad_norm": 0.0004554349579848349,
+      "learning_rate": 0.00019927868638743875,
+      "loss": 11.5,
+      "step": 67
+    },
+    {
+      "epoch": 0.010885007103267503,
+      "grad_norm": 0.0005644407356157899,
+      "learning_rate": 0.0001992531869521312,
+      "loss": 11.5,
+      "step": 68
+    },
+    {
+      "epoch": 0.011045080737139085,
+      "grad_norm": 0.0006445281323976815,
+      "learning_rate": 0.00019922724628024515,
+      "loss": 11.5,
+      "step": 69
+    },
+    {
+      "epoch": 0.011205154371010665,
+      "grad_norm": 0.00069785414962098,
+      "learning_rate": 0.0001992008644871016,
+      "loss": 11.5,
+      "step": 70
+    },
+    {
+      "epoch": 0.011365228004882245,
+      "grad_norm": 0.0009710849262773991,
+      "learning_rate": 0.00019917404168998256,
+      "loss": 11.5,
+      "step": 71
+    },
+    {
+      "epoch": 0.011525301638753828,
+      "grad_norm": 0.0006939312443137169,
+      "learning_rate": 0.0001991467780081305,
+      "loss": 11.5,
+      "step": 72
+    },
+    {
+      "epoch": 0.011685375272625408,
+      "grad_norm": 0.0008124214364215732,
+      "learning_rate": 0.00019911907356274795,
+      "loss": 11.5,
+      "step": 73
+    },
+    {
+      "epoch": 0.011845448906496988,
+      "grad_norm": 0.0010405608918517828,
+      "learning_rate": 0.00019909092847699683,
+      "loss": 11.5,
+      "step": 74
+    },
+    {
+      "epoch": 0.012005522540368569,
+      "grad_norm": 0.0007668936741538346,
+      "learning_rate": 0.00019906234287599798,
+      "loss": 11.5,
+      "step": 75
+    },
+    {
+      "epoch": 0.01216559617424015,
+      "grad_norm": 0.0009385758312419057,
+      "learning_rate": 0.00019903331688683057,
+      "loss": 11.5,
+      "step": 76
+    },
+    {
+      "epoch": 0.012325669808111731,
+      "grad_norm": 0.0007635114598087966,
+      "learning_rate": 0.00019900385063853154,
+      "loss": 11.5,
+      "step": 77
+    },
+    {
+      "epoch": 0.012485743441983312,
+      "grad_norm": 0.0008907768642529845,
+      "learning_rate": 0.00019897394426209505,
+      "loss": 11.5,
+      "step": 78
+    },
+    {
+      "epoch": 0.012645817075854894,
+      "grad_norm": 0.0007882064091973007,
+      "learning_rate": 0.00019894359789047187,
+      "loss": 11.5,
+      "step": 79
+    },
+    {
+      "epoch": 0.012805890709726474,
+      "grad_norm": 0.0009297032374888659,
+      "learning_rate": 0.00019891281165856873,
+      "loss": 11.5,
+      "step": 80
+    },
+    {
+      "epoch": 0.012965964343598054,
+      "grad_norm": 0.0008925965521484613,
+      "learning_rate": 0.00019888158570324795,
+      "loss": 11.5,
+      "step": 81
+    },
+    {
+      "epoch": 0.013126037977469637,
+      "grad_norm": 0.0011857181088998914,
+      "learning_rate": 0.0001988499201633265,
+      "loss": 11.5,
+      "step": 82
+    },
+    {
+      "epoch": 0.013286111611341217,
+      "grad_norm": 0.0008185437764041126,
+      "learning_rate": 0.00019881781517957562,
+      "loss": 11.5,
+      "step": 83
+    },
+    {
+      "epoch": 0.013446185245212797,
+      "grad_norm": 0.0010364435147494078,
+      "learning_rate": 0.0001987852708947202,
+      "loss": 11.5,
+      "step": 84
+    },
+    {
+      "epoch": 0.01360625887908438,
+      "grad_norm": 0.0013269423507153988,
+      "learning_rate": 0.00019875228745343794,
+      "loss": 11.5,
+      "step": 85
+    },
+    {
+      "epoch": 0.01376633251295596,
+      "grad_norm": 0.0012907687341794372,
+      "learning_rate": 0.0001987188650023589,
+      "loss": 11.5,
+      "step": 86
+    },
+    {
+      "epoch": 0.01392640614682754,
+      "grad_norm": 0.0011142597068101168,
+      "learning_rate": 0.0001986850036900648,
+      "loss": 11.5,
+      "step": 87
+    },
+    {
+      "epoch": 0.014086479780699122,
+      "grad_norm": 0.0012716882629320025,
+      "learning_rate": 0.00019865070366708836,
+      "loss": 11.5,
+      "step": 88
+    },
+    {
+      "epoch": 0.014246553414570703,
+      "grad_norm": 0.0014261571923270822,
+      "learning_rate": 0.00019861596508591255,
+      "loss": 11.5,
+      "step": 89
+    },
+    {
+      "epoch": 0.014406627048442283,
+      "grad_norm": 0.0009479870204813778,
+      "learning_rate": 0.00019858078810097002,
+      "loss": 11.5,
+      "step": 90
+    },
+    {
+      "epoch": 0.014566700682313864,
+      "grad_norm": 0.0018024398013949394,
+      "learning_rate": 0.00019854517286864245,
+      "loss": 11.5,
+      "step": 91
+    },
+    {
+      "epoch": 0.014726774316185446,
+      "grad_norm": 0.0014115448575466871,
+      "learning_rate": 0.0001985091195472596,
+      "loss": 11.5,
+      "step": 92
+    },
+    {
+      "epoch": 0.014886847950057026,
+      "grad_norm": 0.0021201353520154953,
+      "learning_rate": 0.0001984726282970989,
+      "loss": 11.5,
+      "step": 93
+    },
+    {
+      "epoch": 0.015046921583928606,
+      "grad_norm": 0.001981760375201702,
+      "learning_rate": 0.0001984356992803847,
+      "loss": 11.5,
+      "step": 94
+    },
+    {
+      "epoch": 0.015206995217800189,
+      "grad_norm": 0.0031021598260849714,
+      "learning_rate": 0.00019839833266128724,
+      "loss": 11.5,
+      "step": 95
+    },
+    {
+      "epoch": 0.015367068851671769,
+      "grad_norm": 0.004780913703143597,
+      "learning_rate": 0.00019836052860592237,
+      "loss": 11.5,
+      "step": 96
+    },
+    {
+      "epoch": 0.01552714248554335,
+      "grad_norm": 0.0040374197997152805,
+      "learning_rate": 0.0001983222872823505,
+      "loss": 11.5,
+      "step": 97
+    },
+    {
+      "epoch": 0.01568721611941493,
+      "grad_norm": 0.0029658880084753036,
+      "learning_rate": 0.00019828360886057594,
+      "loss": 11.5,
+      "step": 98
+    },
+    {
+      "epoch": 0.01584728975328651,
+      "grad_norm": 0.0017225135816261172,
+      "learning_rate": 0.00019824449351254616,
+      "loss": 11.5,
+      "step": 99
+    },
+    {
+      "epoch": 0.016007363387158094,
+      "grad_norm": 0.001711854711174965,
+      "learning_rate": 0.00019820494141215104,
+      "loss": 11.5,
+      "step": 100
+    },
+    {
+      "epoch": 0.016167437021029674,
+      "grad_norm": 0.0003945448261220008,
+      "learning_rate": 0.000198164952735222,
+      "loss": 11.5,
+      "step": 101
+    },
+    {
+      "epoch": 0.016327510654901255,
+      "grad_norm": 0.0003840984427370131,
+      "learning_rate": 0.00019812452765953135,
+      "loss": 11.5,
+      "step": 102
+    },
+    {
+      "epoch": 0.016487584288772835,
+      "grad_norm": 0.0004888374824076891,
+      "learning_rate": 0.00019808366636479147,
+      "loss": 11.5,
+      "step": 103
+    },
+    {
+      "epoch": 0.016647657922644415,
+      "grad_norm": 0.0005102389259263873,
+      "learning_rate": 0.00019804236903265388,
+      "loss": 11.5,
+      "step": 104
+    },
+    {
+      "epoch": 0.016807731556515996,
+      "grad_norm": 0.0005896842922084033,
+      "learning_rate": 0.00019800063584670863,
+      "loss": 11.5,
+      "step": 105
+    },
+    {
+      "epoch": 0.01696780519038758,
+      "grad_norm": 0.0005505888257175684,
+      "learning_rate": 0.00019795846699248332,
+      "loss": 11.5,
+      "step": 106
+    },
+    {
+      "epoch": 0.01712787882425916,
+      "grad_norm": 0.00038450522697530687,
+      "learning_rate": 0.00019791586265744237,
+      "loss": 11.5,
+      "step": 107
+    },
+    {
+      "epoch": 0.01728795245813074,
+      "grad_norm": 0.000713772140443325,
+      "learning_rate": 0.00019787282303098617,
+      "loss": 11.5,
+      "step": 108
+    },
+    {
+      "epoch": 0.01744802609200232,
+      "grad_norm": 0.0007358138682320714,
+      "learning_rate": 0.0001978293483044502,
+      "loss": 11.5,
+      "step": 109
+    },
+    {
+      "epoch": 0.0176080997258739,
+      "grad_norm": 0.00032157220994122326,
+      "learning_rate": 0.00019778543867110426,
+      "loss": 11.5,
+      "step": 110
+    },
+    {
+      "epoch": 0.01776817335974548,
+      "grad_norm": 0.0005707935779355466,
+      "learning_rate": 0.00019774109432615147,
+      "loss": 11.5,
+      "step": 111
+    },
+    {
+      "epoch": 0.017928246993617065,
+      "grad_norm": 0.00048133719246834517,
+      "learning_rate": 0.00019769631546672756,
+      "loss": 11.5,
+      "step": 112
+    },
+    {
+      "epoch": 0.018088320627488646,
+      "grad_norm": 0.00041901893564499915,
+      "learning_rate": 0.00019765110229189988,
+      "loss": 11.5,
+      "step": 113
+    },
+    {
+      "epoch": 0.018248394261360226,
+      "grad_norm": 0.0005528696929104626,
+      "learning_rate": 0.00019760545500266657,
+      "loss": 11.5,
+      "step": 114
+    },
+    {
+      "epoch": 0.018408467895231807,
+      "grad_norm": 0.001078005414456129,
+      "learning_rate": 0.00019755937380195568,
+      "loss": 11.5,
+      "step": 115
+    },
+    {
+      "epoch": 0.018568541529103387,
+      "grad_norm": 0.001656021922826767,
+      "learning_rate": 0.00019751285889462423,
+      "loss": 11.5,
+      "step": 116
+    },
+    {
+      "epoch": 0.018728615162974967,
+      "grad_norm": 0.0010341198649257421,
+      "learning_rate": 0.0001974659104874573,
+      "loss": 11.5,
+      "step": 117
+    },
+    {
+      "epoch": 0.018888688796846548,
+      "grad_norm": 0.001869532628916204,
+      "learning_rate": 0.0001974185287891671,
+      "loss": 11.5,
+      "step": 118
+    },
+    {
+      "epoch": 0.01904876243071813,
+      "grad_norm": 0.001640613074414432,
+      "learning_rate": 0.0001973707140103921,
+      "loss": 11.5,
+      "step": 119
+    },
+    {
+      "epoch": 0.019208836064589712,
+      "grad_norm": 0.0012787277810275555,
+      "learning_rate": 0.00019732246636369605,
+      "loss": 11.5,
+      "step": 120
+    },
+    {
+      "epoch": 0.019368909698461292,
+      "grad_norm": 0.001795590273104608,
+      "learning_rate": 0.00019727378606356703,
+      "loss": 11.5,
+      "step": 121
+    },
+    {
+      "epoch": 0.019528983332332873,
+      "grad_norm": 0.002011685399338603,
+      "learning_rate": 0.00019722467332641656,
+      "loss": 11.5,
+      "step": 122
+    },
+    {
+      "epoch": 0.019689056966204453,
+      "grad_norm": 0.0022899755276739597,
+      "learning_rate": 0.00019717512837057855,
+      "loss": 11.5,
+      "step": 123
+    },
+    {
+      "epoch": 0.019849130600076034,
+      "grad_norm": 0.0017873673932626843,
+      "learning_rate": 0.0001971251514163083,
+      "loss": 11.5,
+      "step": 124
+    },
+    {
+      "epoch": 0.020009204233947617,
+      "grad_norm": 0.002426765626296401,
+      "learning_rate": 0.0001970747426857817,
+      "loss": 11.5,
+      "step": 125
+    },
+    {
+      "epoch": 0.020169277867819198,
+      "grad_norm": 0.0022798413410782814,
+      "learning_rate": 0.00019702390240309404,
+      "loss": 11.5,
+      "step": 126
+    },
+    {
+      "epoch": 0.020329351501690778,
+      "grad_norm": 0.0030735242180526257,
+      "learning_rate": 0.0001969726307942592,
+      "loss": 11.5,
+      "step": 127
+    },
+    {
+      "epoch": 0.02048942513556236,
+      "grad_norm": 0.001868702471256256,
+      "learning_rate": 0.00019692092808720846,
+      "loss": 11.5,
+      "step": 128
+    },
+    {
+      "epoch": 0.02064949876943394,
+      "grad_norm": 0.00234604743309319,
+      "learning_rate": 0.0001968687945117896,
+      "loss": 11.5,
+      "step": 129
+    },
+    {
+      "epoch": 0.02080957240330552,
+      "grad_norm": 0.0020907274447381496,
+      "learning_rate": 0.00019681623029976588,
+      "loss": 11.5,
+      "step": 130
+    },
+    {
+      "epoch": 0.020969646037177103,
+      "grad_norm": 0.0021262471564114094,
+      "learning_rate": 0.00019676323568481498,
+      "loss": 11.5,
+      "step": 131
+    },
+    {
+      "epoch": 0.021129719671048684,
+      "grad_norm": 0.00310887536033988,
+      "learning_rate": 0.00019670981090252792,
+      "loss": 11.5,
+      "step": 132
+    },
+    {
+      "epoch": 0.021289793304920264,
+      "grad_norm": 0.002246802905574441,
+      "learning_rate": 0.00019665595619040808,
+      "loss": 11.5,
+      "step": 133
+    },
+    {
+      "epoch": 0.021449866938791844,
+      "grad_norm": 0.002843370893970132,
+      "learning_rate": 0.0001966016717878702,
+      "loss": 11.5,
+      "step": 134
+    },
+    {
+      "epoch": 0.021609940572663425,
+      "grad_norm": 0.002506015356630087,
+      "learning_rate": 0.00019654695793623907,
+      "loss": 11.5,
+      "step": 135
+    },
+    {
+      "epoch": 0.021770014206535005,
+      "grad_norm": 0.0026175500825047493,
+      "learning_rate": 0.0001964918148787488,
+      "loss": 11.5,
+      "step": 136
+    },
+    {
+      "epoch": 0.021930087840406585,
+      "grad_norm": 0.002090322319418192,
+      "learning_rate": 0.00019643624286054144,
+      "loss": 11.5,
+      "step": 137
+    },
+    {
+      "epoch": 0.02209016147427817,
+      "grad_norm": 0.0033717560581862926,
+      "learning_rate": 0.00019638024212866606,
+      "loss": 11.5,
+      "step": 138
+    },
+    {
+      "epoch": 0.02225023510814975,
+      "grad_norm": 0.0035848321858793497,
+      "learning_rate": 0.0001963238129320776,
+      "loss": 11.5,
+      "step": 139
+    },
+    {
+      "epoch": 0.02241030874202133,
+      "grad_norm": 0.0018094985280185938,
+      "learning_rate": 0.00019626695552163578,
+      "loss": 11.5,
+      "step": 140
+    },
+    {
+      "epoch": 0.02257038237589291,
+      "grad_norm": 0.0034546051174402237,
+      "learning_rate": 0.00019620967015010395,
+      "loss": 11.5,
+      "step": 141
+    },
+    {
+      "epoch": 0.02273045600976449,
+      "grad_norm": 0.0036798615474253893,
+      "learning_rate": 0.00019615195707214803,
+      "loss": 11.5,
+      "step": 142
+    },
+    {
+      "epoch": 0.02289052964363607,
+      "grad_norm": 0.0025492943823337555,
+      "learning_rate": 0.0001960938165443353,
+      "loss": 11.5,
+      "step": 143
+    },
+    {
+      "epoch": 0.023050603277507655,
+      "grad_norm": 0.004303085617721081,
+      "learning_rate": 0.00019603524882513327,
+      "loss": 11.5,
+      "step": 144
+    },
+    {
+      "epoch": 0.023210676911379235,
+      "grad_norm": 0.0070602428168058395,
+      "learning_rate": 0.0001959762541749086,
+      "loss": 11.5,
+      "step": 145
+    },
+    {
+      "epoch": 0.023370750545250816,
+      "grad_norm": 0.005210757255554199,
+      "learning_rate": 0.00019591683285592593,
+      "loss": 11.5,
+      "step": 146
+    },
+    {
+      "epoch": 0.023530824179122396,
+      "grad_norm": 0.007379893679171801,
+      "learning_rate": 0.00019585698513234663,
+      "loss": 11.5,
+      "step": 147
+    },
+    {
+      "epoch": 0.023690897812993977,
+      "grad_norm": 0.0037022745236754417,
+      "learning_rate": 0.0001957967112702277,
+      "loss": 11.5,
+      "step": 148
+    },
+    {
+      "epoch": 0.023850971446865557,
+      "grad_norm": 0.001964289229363203,
+      "learning_rate": 0.00019573601153752052,
+      "loss": 11.5,
+      "step": 149
+    },
+    {
+      "epoch": 0.024011045080737137,
+      "grad_norm": 0.0038277339190244675,
+      "learning_rate": 0.00019567488620406983,
+      "loss": 11.5,
+      "step": 150
+    },
+    {
+      "epoch": 0.02417111871460872,
+      "grad_norm": 0.001184475957415998,
+      "learning_rate": 0.00019561333554161224,
+      "loss": 11.5,
+      "step": 151
+    },
+    {
+      "epoch": 0.0243311923484803,
+      "grad_norm": 0.0013432531850412488,
+      "learning_rate": 0.0001955513598237753,
+      "loss": 11.5,
+      "step": 152
+    },
+    {
+      "epoch": 0.024491265982351882,
+      "grad_norm": 0.0012706320267170668,
+      "learning_rate": 0.00019548895932607621,
+      "loss": 11.5,
+      "step": 153
+    },
+    {
+      "epoch": 0.024651339616223462,
+      "grad_norm": 0.0011202878085896373,
+      "learning_rate": 0.00019542613432592038,
+      "loss": 11.5,
+      "step": 154
+    },
+    {
+      "epoch": 0.024811413250095043,
+      "grad_norm": 0.0010707036126405,
+      "learning_rate": 0.00019536288510260056,
+      "loss": 11.5,
+      "step": 155
+    },
+    {
+      "epoch": 0.024971486883966623,
+      "grad_norm": 0.0010517543414607644,
+      "learning_rate": 0.00019529921193729534,
+      "loss": 11.5,
+      "step": 156
+    },
+    {
+      "epoch": 0.025131560517838207,
+      "grad_norm": 0.0010216586524620652,
+      "learning_rate": 0.00019523511511306793,
+      "loss": 11.5,
+      "step": 157
+    },
+    {
+      "epoch": 0.025291634151709787,
+      "grad_norm": 0.0010877292370423675,
+      "learning_rate": 0.000195170594914865,
+      "loss": 11.5,
+      "step": 158
+    },
+    {
+      "epoch": 0.025451707785581368,
+      "grad_norm": 0.0007502186927013099,
+      "learning_rate": 0.00019510565162951537,
+      "loss": 11.5,
+      "step": 159
+    },
+    {
+      "epoch": 0.025611781419452948,
+      "grad_norm": 0.0013300442369654775,
+      "learning_rate": 0.00019504028554572864,
+      "loss": 11.5,
+      "step": 160
+    },
+    {
+      "epoch": 0.02577185505332453,
+      "grad_norm": 0.0007744083413854241,
+      "learning_rate": 0.00019497449695409408,
+      "loss": 11.5,
+      "step": 161
+    },
+    {
+      "epoch": 0.02593192868719611,
+      "grad_norm": 0.00122166913934052,
+      "learning_rate": 0.00019490828614707916,
+      "loss": 11.5,
+      "step": 162
+    },
+    {
+      "epoch": 0.026092002321067693,
+      "grad_norm": 0.0014826179249212146,
+      "learning_rate": 0.00019484165341902845,
+      "loss": 11.5,
+      "step": 163
+    },
+    {
+      "epoch": 0.026252075954939273,
+      "grad_norm": 0.001125049777328968,
+      "learning_rate": 0.00019477459906616206,
+      "loss": 11.5,
+      "step": 164
+    },
+    {
+      "epoch": 0.026412149588810854,
+      "grad_norm": 0.0015355943469330668,
+      "learning_rate": 0.00019470712338657458,
+      "loss": 11.5,
+      "step": 165
+    },
+    {
+      "epoch": 0.026572223222682434,
+      "grad_norm": 0.002622431144118309,
+      "learning_rate": 0.0001946392266802336,
+      "loss": 11.5,
+      "step": 166
+    },
+    {
+      "epoch": 0.026732296856554014,
+      "grad_norm": 0.0022032030392438173,
+      "learning_rate": 0.0001945709092489783,
+      "loss": 11.5,
+      "step": 167
+    },
+    {
+      "epoch": 0.026892370490425595,
+      "grad_norm": 0.0028216661885380745,
+      "learning_rate": 0.00019450217139651844,
+      "loss": 11.5,
+      "step": 168
+    },
+    {
+      "epoch": 0.027052444124297175,
+      "grad_norm": 0.003236501244828105,
+      "learning_rate": 0.0001944330134284326,
+      "loss": 11.5,
+      "step": 169
+    },
+    {
+      "epoch": 0.02721251775816876,
+      "grad_norm": 0.004418008495122194,
+      "learning_rate": 0.00019436343565216711,
+      "loss": 11.5,
+      "step": 170
+    },
+    {
+      "epoch": 0.02737259139204034,
+      "grad_norm": 0.0035076425410807133,
+      "learning_rate": 0.00019429343837703455,
+      "loss": 11.5,
+      "step": 171
+    },
+    {
+      "epoch": 0.02753266502591192,
+      "grad_norm": 0.003910847939550877,
+      "learning_rate": 0.0001942230219142124,
+      "loss": 11.5,
+      "step": 172
+    },
+    {
+      "epoch": 0.0276927386597835,
+      "grad_norm": 0.004566975403577089,
+      "learning_rate": 0.0001941521865767417,
+      "loss": 11.5,
+      "step": 173
+    },
+    {
+      "epoch": 0.02785281229365508,
+      "grad_norm": 0.0030329632572829723,
+      "learning_rate": 0.0001940809326795256,
+      "loss": 11.5,
+      "step": 174
+    },
+    {
+      "epoch": 0.02801288592752666,
+      "grad_norm": 0.0040180860087275505,
+      "learning_rate": 0.000194009260539328,
+      "loss": 11.5,
+      "step": 175
+    },
+    {
+      "epoch": 0.028172959561398245,
+      "grad_norm": 0.0033933145459741354,
+      "learning_rate": 0.0001939371704747721,
+      "loss": 11.5,
+      "step": 176
+    },
+    {
+      "epoch": 0.028333033195269825,
+      "grad_norm": 0.004221628420054913,
+      "learning_rate": 0.00019386466280633906,
+      "loss": 11.5,
+      "step": 177
+    },
+    {
+      "epoch": 0.028493106829141406,
+      "grad_norm": 0.0038972035981714725,
+      "learning_rate": 0.00019379173785636646,
+      "loss": 11.5,
+      "step": 178
+    },
+    {
+      "epoch": 0.028653180463012986,
+      "grad_norm": 0.0039198147132992744,
+      "learning_rate": 0.000193718395949047,
+      "loss": 11.5,
+      "step": 179
+    },
+    {
+      "epoch": 0.028813254096884566,
+      "grad_norm": 0.005124758463352919,
+      "learning_rate": 0.00019364463741042694,
+      "loss": 11.5,
+      "step": 180
+    },
+    {
+      "epoch": 0.028973327730756147,
+      "grad_norm": 0.004067589528858662,
+      "learning_rate": 0.00019357046256840473,
+      "loss": 11.5,
+      "step": 181
+    },
+    {
+      "epoch": 0.029133401364627727,
+      "grad_norm": 0.004641098901629448,
+      "learning_rate": 0.00019349587175272948,
+      "loss": 11.5,
+      "step": 182
+    },
+    {
+      "epoch": 0.02929347499849931,
+      "grad_norm": 0.004570235963910818,
+      "learning_rate": 0.0001934208652949996,
+      "loss": 11.5,
+      "step": 183
+    },
+    {
+      "epoch": 0.02945354863237089,
+      "grad_norm": 0.004834281280636787,
+      "learning_rate": 0.00019334544352866127,
+      "loss": 11.5,
+      "step": 184
+    },
+    {
+      "epoch": 0.02961362226624247,
+      "grad_norm": 0.004718173760920763,
+      "learning_rate": 0.00019326960678900688,
+      "loss": 11.5,
+      "step": 185
+    },
+    {
+      "epoch": 0.029773695900114052,
+      "grad_norm": 0.0034208674915134907,
+      "learning_rate": 0.00019319335541317361,
+      "loss": 11.5,
+      "step": 186
+    },
+    {
+      "epoch": 0.029933769533985632,
+      "grad_norm": 0.004345146007835865,
+      "learning_rate": 0.00019311668974014208,
+      "loss": 11.5,
+      "step": 187
+    },
+    {
+      "epoch": 0.030093843167857213,
+      "grad_norm": 0.004935440141707659,
+      "learning_rate": 0.00019303961011073447,
+      "loss": 11.5,
+      "step": 188
+    },
+    {
+      "epoch": 0.030253916801728797,
+      "grad_norm": 0.003570390399545431,
+      "learning_rate": 0.00019296211686761346,
+      "loss": 11.5,
+      "step": 189
+    },
+    {
+      "epoch": 0.030413990435600377,
+      "grad_norm": 0.004747358616441488,
+      "learning_rate": 0.00019288421035528028,
+      "loss": 11.5,
+      "step": 190
+    },
+    {
+      "epoch": 0.030574064069471957,
+      "grad_norm": 0.0070312670432031155,
+      "learning_rate": 0.00019280589092007352,
+      "loss": 11.5,
+      "step": 191
+    },
+    {
+      "epoch": 0.030734137703343538,
+      "grad_norm": 0.006528772413730621,
+      "learning_rate": 0.00019272715891016735,
+      "loss": 11.5,
+      "step": 192
+    },
+    {
+      "epoch": 0.030894211337215118,
+      "grad_norm": 0.004292914178222418,
+      "learning_rate": 0.00019264801467557007,
+      "loss": 11.5,
+      "step": 193
+    },
+    {
+      "epoch": 0.0310542849710867,
+      "grad_norm": 0.003930619452148676,
+      "learning_rate": 0.00019256845856812266,
+      "loss": 11.5,
+      "step": 194
+    },
+    {
+      "epoch": 0.031214358604958282,
+      "grad_norm": 0.009428051300346851,
+      "learning_rate": 0.000192488490941497,
+      "loss": 11.5,
+      "step": 195
+    },
+    {
+      "epoch": 0.03137443223882986,
+      "grad_norm": 0.011329618282616138,
+      "learning_rate": 0.00019240811215119448,
+      "loss": 11.5,
+      "step": 196
+    },
+    {
+      "epoch": 0.03153450587270144,
+      "grad_norm": 0.015292741358280182,
+      "learning_rate": 0.00019232732255454422,
+      "loss": 11.5,
+      "step": 197
+    },
+    {
+      "epoch": 0.03169457950657302,
+      "grad_norm": 0.007595112547278404,
+      "learning_rate": 0.00019224612251070175,
+      "loss": 11.5,
+      "step": 198
+    },
+    {
+      "epoch": 0.031854653140444604,
+      "grad_norm": 0.00519321346655488,
+      "learning_rate": 0.0001921645123806472,
+      "loss": 11.5,
+      "step": 199
+    },
+    {
+      "epoch": 0.03201472677431619,
+      "grad_norm": 0.008084663189947605,
+      "learning_rate": 0.0001920824925271838,
+      "loss": 11.5,
+      "step": 200
+    },
+    {
+      "epoch": 0.032174800408187765,
+      "grad_norm": 0.004125955514609814,
+      "learning_rate": 0.0001920000633149362,
+      "loss": 11.5,
+      "step": 201
+    },
+    {
+      "epoch": 0.03233487404205935,
+      "grad_norm": 0.004124477505683899,
+      "learning_rate": 0.00019191722511034884,
+      "loss": 11.5,
+      "step": 202
+    },
+    {
+      "epoch": 0.032494947675930926,
+      "grad_norm": 0.003988343756645918,
+      "learning_rate": 0.00019183397828168448,
+      "loss": 11.5,
+      "step": 203
+    },
+    {
+      "epoch": 0.03265502130980251,
+      "grad_norm": 0.00596611388027668,
+      "learning_rate": 0.00019175032319902234,
+      "loss": 11.5,
+      "step": 204
+    },
+    {
+      "epoch": 0.03281509494367409,
+      "grad_norm": 0.004642547108232975,
+      "learning_rate": 0.00019166626023425662,
+      "loss": 11.5,
+      "step": 205
+    },
+    {
+      "epoch": 0.03297516857754567,
+      "grad_norm": 0.005087132565677166,
+      "learning_rate": 0.00019158178976109476,
+      "loss": 11.5,
+      "step": 206
+    },
+    {
+      "epoch": 0.033135242211417254,
+      "grad_norm": 0.0064563388004899025,
+      "learning_rate": 0.0001914969121550558,
+      "loss": 11.5,
+      "step": 207
+    },
+    {
+      "epoch": 0.03329531584528883,
+      "grad_norm": 0.005523744970560074,
+      "learning_rate": 0.00019141162779346874,
+      "loss": 11.5,
+      "step": 208
+    },
+    {
+      "epoch": 0.033455389479160415,
+      "grad_norm": 0.005560313351452351,
+      "learning_rate": 0.00019132593705547082,
+      "loss": 11.5,
+      "step": 209
+    },
+    {
+      "epoch": 0.03361546311303199,
+      "grad_norm": 0.004362552426755428,
+      "learning_rate": 0.00019123984032200586,
+      "loss": 11.5,
+      "step": 210
+    },
+    {
+      "epoch": 0.033775536746903576,
+      "grad_norm": 0.005080255679786205,
+      "learning_rate": 0.00019115333797582254,
+      "loss": 11.5,
+      "step": 211
+    },
+    {
+      "epoch": 0.03393561038077516,
+      "grad_norm": 0.0047751725651323795,
+      "learning_rate": 0.00019106643040147278,
+      "loss": 11.5,
+      "step": 212
+    },
+    {
+      "epoch": 0.034095684014646736,
+      "grad_norm": 0.0036601570900529623,
+      "learning_rate": 0.00019097911798530987,
+      "loss": 11.5,
+      "step": 213
+    },
+    {
+      "epoch": 0.03425575764851832,
+      "grad_norm": 0.0042919619008898735,
+      "learning_rate": 0.00019089140111548696,
+      "loss": 11.5,
+      "step": 214
+    },
+    {
+      "epoch": 0.0344158312823899,
+      "grad_norm": 0.004078218713402748,
+      "learning_rate": 0.00019080328018195513,
+      "loss": 11.5,
+      "step": 215
+    },
+    {
+      "epoch": 0.03457590491626148,
+      "grad_norm": 0.0035699764266610146,
+      "learning_rate": 0.0001907147555764618,
+      "loss": 11.5,
+      "step": 216
+    },
+    {
+      "epoch": 0.03473597855013306,
+      "grad_norm": 0.006953587289899588,
+      "learning_rate": 0.00019062582769254895,
+      "loss": 11.5,
+      "step": 217
+    },
+    {
+      "epoch": 0.03489605218400464,
+      "grad_norm": 0.005379633978009224,
+      "learning_rate": 0.00019053649692555135,
+      "loss": 11.5,
+      "step": 218
+    },
+    {
+      "epoch": 0.035056125817876226,
+      "grad_norm": 0.005702970549464226,
+      "learning_rate": 0.00019044676367259476,
+      "loss": 11.5,
+      "step": 219
+    },
+    {
+      "epoch": 0.0352161994517478,
+      "grad_norm": 0.005873862653970718,
+      "learning_rate": 0.00019035662833259432,
+      "loss": 11.5,
+      "step": 220
+    },
+    {
+      "epoch": 0.035376273085619386,
+      "grad_norm": 0.0050443257205188274,
+      "learning_rate": 0.00019026609130625257,
+      "loss": 11.5,
+      "step": 221
+    },
+    {
+      "epoch": 0.03553634671949096,
+      "grad_norm": 0.005148936528712511,
+      "learning_rate": 0.00019017515299605788,
+      "loss": 11.5,
+      "step": 222
+    },
+    {
+      "epoch": 0.03569642035336255,
+      "grad_norm": 0.004865546710789204,
+      "learning_rate": 0.00019008381380628247,
+      "loss": 11.5,
+      "step": 223
+    },
+    {
+      "epoch": 0.03585649398723413,
+      "grad_norm": 0.00710660545155406,
+      "learning_rate": 0.00018999207414298067,
+      "loss": 11.5,
+      "step": 224
+    },
+    {
+      "epoch": 0.03601656762110571,
+      "grad_norm": 0.005359895061701536,
+      "learning_rate": 0.00018989993441398726,
+      "loss": 11.5,
+      "step": 225
+    },
+    {
+      "epoch": 0.03617664125497729,
+      "grad_norm": 0.006631465628743172,
+      "learning_rate": 0.00018980739502891546,
+      "loss": 11.5,
+      "step": 226
+    },
+    {
+      "epoch": 0.03633671488884887,
+      "grad_norm": 0.00704937893897295,
+      "learning_rate": 0.0001897144563991552,
+      "loss": 11.5,
+      "step": 227
+    },
+    {
+      "epoch": 0.03649678852272045,
+      "grad_norm": 0.0053662629798054695,
+      "learning_rate": 0.00018962111893787128,
+      "loss": 11.5,
+      "step": 228
+    },
+    {
+      "epoch": 0.03665686215659203,
+      "grad_norm": 0.005165701732039452,
+      "learning_rate": 0.00018952738306000151,
+      "loss": 11.5,
+      "step": 229
+    },
+    {
+      "epoch": 0.03681693579046361,
+      "grad_norm": 0.005945558659732342,
+      "learning_rate": 0.00018943324918225494,
+      "loss": 11.5,
+      "step": 230
+    },
+    {
+      "epoch": 0.0369770094243352,
+      "grad_norm": 0.006662803236395121,
+      "learning_rate": 0.0001893387177231099,
+      "loss": 11.5,
+      "step": 231
+    },
+    {
+      "epoch": 0.037137083058206774,
+      "grad_norm": 0.005151469726115465,
+      "learning_rate": 0.0001892437891028122,
+      "loss": 11.5,
+      "step": 232
+    },
+    {
+      "epoch": 0.03729715669207836,
+      "grad_norm": 0.007021139841526747,
+      "learning_rate": 0.0001891484637433733,
+      "loss": 11.5,
+      "step": 233
+    },
+    {
+      "epoch": 0.037457230325949935,
+      "grad_norm": 0.006236946675926447,
+      "learning_rate": 0.00018905274206856837,
+      "loss": 11.5,
+      "step": 234
+    },
+    {
+      "epoch": 0.03761730395982152,
+      "grad_norm": 0.006137041375041008,
+      "learning_rate": 0.00018895662450393438,
+      "loss": 11.5,
+      "step": 235
+    },
+    {
+      "epoch": 0.037777377593693096,
+      "grad_norm": 0.005807884503155947,
+      "learning_rate": 0.00018886011147676833,
+      "loss": 11.5,
+      "step": 236
+    },
+    {
+      "epoch": 0.03793745122756468,
+      "grad_norm": 0.006715100258588791,
+      "learning_rate": 0.00018876320341612522,
+      "loss": 11.5,
+      "step": 237
+    },
+    {
+      "epoch": 0.03809752486143626,
+      "grad_norm": 0.007929547689855099,
+      "learning_rate": 0.00018866590075281624,
+      "loss": 11.5,
+      "step": 238
+    },
+    {
+      "epoch": 0.03825759849530784,
+      "grad_norm": 0.007534688338637352,
+      "learning_rate": 0.00018856820391940674,
+      "loss": 11.5,
+      "step": 239
+    },
+    {
+      "epoch": 0.038417672129179424,
+      "grad_norm": 0.00604398176074028,
+      "learning_rate": 0.00018847011335021449,
+      "loss": 11.5,
+      "step": 240
+    },
+    {
+      "epoch": 0.038577745763051,
+      "grad_norm": 0.008669275790452957,
+      "learning_rate": 0.00018837162948130752,
+      "loss": 11.5,
+      "step": 241
+    },
+    {
+      "epoch": 0.038737819396922585,
+      "grad_norm": 0.004927826579660177,
+      "learning_rate": 0.00018827275275050233,
+      "loss": 11.5,
+      "step": 242
+    },
+    {
+      "epoch": 0.03889789303079417,
+      "grad_norm": 0.008499566465616226,
+      "learning_rate": 0.00018817348359736203,
+      "loss": 11.5,
+      "step": 243
+    },
+    {
+      "epoch": 0.039057966664665746,
+      "grad_norm": 0.014102022163569927,
+      "learning_rate": 0.00018807382246319412,
+      "loss": 11.5,
+      "step": 244
+    },
+    {
+      "epoch": 0.03921804029853733,
+      "grad_norm": 0.016383271664381027,
+      "learning_rate": 0.00018797376979104872,
+      "loss": 11.5,
+      "step": 245
+    },
+    {
+      "epoch": 0.039378113932408906,
+      "grad_norm": 0.01808208040893078,
+      "learning_rate": 0.00018787332602571662,
+      "loss": 11.5,
+      "step": 246
+    },
+    {
+      "epoch": 0.03953818756628049,
+      "grad_norm": 0.01708586886525154,
+      "learning_rate": 0.00018777249161372713,
+      "loss": 11.5,
+      "step": 247
+    },
+    {
+      "epoch": 0.03969826120015207,
+      "grad_norm": 0.02052490971982479,
+      "learning_rate": 0.00018767126700334634,
+      "loss": 11.5,
+      "step": 248
+    },
+    {
+      "epoch": 0.03985833483402365,
+      "grad_norm": 0.017746634781360626,
+      "learning_rate": 0.0001875696526445749,
+      "loss": 11.5,
+      "step": 249
+    },
+    {
+      "epoch": 0.040018408467895235,
+      "grad_norm": 0.019202718511223793,
+      "learning_rate": 0.0001874676489891461,
+      "loss": 11.5,
+      "step": 250
+    },
+    {
+      "epoch": 0.04017848210176681,
+      "grad_norm": 0.010096721351146698,
+      "learning_rate": 0.00018736525649052394,
+      "loss": 11.5,
+      "step": 251
+    },
+    {
+      "epoch": 0.040338555735638396,
+      "grad_norm": 0.01723094843327999,
+      "learning_rate": 0.00018726247560390099,
+      "loss": 11.5,
+      "step": 252
+    },
+    {
+      "epoch": 0.04049862936950997,
+      "grad_norm": 0.011470257304608822,
+      "learning_rate": 0.00018715930678619644,
+      "loss": 11.5,
+      "step": 253
+    },
+    {
+      "epoch": 0.040658703003381556,
+      "grad_norm": 0.014084495604038239,
+      "learning_rate": 0.00018705575049605413,
+      "loss": 11.5,
+      "step": 254
+    },
+    {
+      "epoch": 0.04081877663725313,
+      "grad_norm": 0.009606949985027313,
+      "learning_rate": 0.00018695180719384029,
+      "loss": 11.5,
+      "step": 255
+    },
+    {
+      "epoch": 0.04097885027112472,
+      "grad_norm": 0.013065108098089695,
+      "learning_rate": 0.00018684747734164177,
+      "loss": 11.5,
+      "step": 256
+    },
+    {
+      "epoch": 0.0411389239049963,
+      "grad_norm": 0.007634848356246948,
+      "learning_rate": 0.00018674276140326376,
+      "loss": 11.5,
+      "step": 257
+    },
+    {
+      "epoch": 0.04129899753886788,
+      "grad_norm": 0.009482149966061115,
+      "learning_rate": 0.00018663765984422786,
+      "loss": 11.5,
+      "step": 258
+    },
+    {
+      "epoch": 0.04145907117273946,
+      "grad_norm": 0.007441854570060968,
+      "learning_rate": 0.00018653217313177004,
+      "loss": 11.5,
+      "step": 259
+    },
+    {
+      "epoch": 0.04161914480661104,
+      "grad_norm": 0.006992577575147152,
+      "learning_rate": 0.00018642630173483832,
+      "loss": 11.5,
+      "step": 260
+    },
+    {
+      "epoch": 0.04177921844048262,
+      "grad_norm": 0.0063720578327775,
+      "learning_rate": 0.00018632004612409103,
+      "loss": 11.5,
+      "step": 261
+    },
+    {
+      "epoch": 0.041939292074354206,
+      "grad_norm": 0.004319782834500074,
+      "learning_rate": 0.00018621340677189453,
+      "loss": 11.5,
+      "step": 262
+    },
+    {
+      "epoch": 0.04209936570822578,
+      "grad_norm": 0.00529808783903718,
+      "learning_rate": 0.00018610638415232097,
+      "loss": 11.5,
+      "step": 263
+    },
+    {
+      "epoch": 0.04225943934209737,
+      "grad_norm": 0.005276544950902462,
+      "learning_rate": 0.00018599897874114652,
+      "loss": 11.5,
+      "step": 264
+    },
+    {
+      "epoch": 0.042419512975968944,
+      "grad_norm": 0.004047976806759834,
+      "learning_rate": 0.00018589119101584898,
+      "loss": 11.5,
+      "step": 265
+    },
+    {
+      "epoch": 0.04257958660984053,
+      "grad_norm": 0.004963539075106382,
+      "learning_rate": 0.00018578302145560584,
+      "loss": 11.5,
+      "step": 266
+    },
+    {
+      "epoch": 0.042739660243712105,
+      "grad_norm": 0.004974601324647665,
+      "learning_rate": 0.00018567447054129195,
+      "loss": 11.5,
+      "step": 267
+    },
+    {
+      "epoch": 0.04289973387758369,
+      "grad_norm": 0.005424274131655693,
+      "learning_rate": 0.00018556553875547754,
+      "loss": 11.5,
+      "step": 268
+    },
+    {
+      "epoch": 0.04305980751145527,
+      "grad_norm": 0.004362328443676233,
+      "learning_rate": 0.00018545622658242607,
+      "loss": 11.5,
+      "step": 269
+    },
+    {
+      "epoch": 0.04321988114532685,
+      "grad_norm": 0.004392134957015514,
+      "learning_rate": 0.00018534653450809197,
+      "loss": 11.5,
+      "step": 270
+    },
+    {
+      "epoch": 0.04337995477919843,
+      "grad_norm": 0.004988868720829487,
+      "learning_rate": 0.00018523646302011867,
+      "loss": 11.5,
+      "step": 271
+    },
+    {
+      "epoch": 0.04354002841307001,
+      "grad_norm": 0.0036948062479496002,
+      "learning_rate": 0.00018512601260783606,
+      "loss": 11.5,
+      "step": 272
+    },
+    {
+      "epoch": 0.043700102046941594,
+      "grad_norm": 0.005431415978819132,
+      "learning_rate": 0.00018501518376225887,
+      "loss": 11.5,
+      "step": 273
+    },
+    {
+      "epoch": 0.04386017568081317,
+      "grad_norm": 0.005191940348595381,
+      "learning_rate": 0.00018490397697608395,
+      "loss": 11.5,
+      "step": 274
+    },
+    {
+      "epoch": 0.044020249314684755,
+      "grad_norm": 0.005340060219168663,
+      "learning_rate": 0.0001847923927436884,
+      "loss": 11.5,
+      "step": 275
+    },
+    {
+      "epoch": 0.04418032294855634,
+      "grad_norm": 0.005495226942002773,
+      "learning_rate": 0.00018468043156112728,
+      "loss": 11.5,
+      "step": 276
+    },
+    {
+      "epoch": 0.044340396582427916,
+      "grad_norm": 0.0054380279034376144,
+      "learning_rate": 0.0001845680939261314,
+      "loss": 11.5,
+      "step": 277
+    },
+    {
+      "epoch": 0.0445004702162995,
+      "grad_norm": 0.005373974796384573,
+      "learning_rate": 0.00018445538033810515,
+      "loss": 11.5,
+      "step": 278
+    },
+    {
+      "epoch": 0.044660543850171076,
+      "grad_norm": 0.006492419634014368,
+      "learning_rate": 0.00018434229129812418,
+      "loss": 11.5,
+      "step": 279
+    },
+    {
+      "epoch": 0.04482061748404266,
+      "grad_norm": 0.004978380165994167,
+      "learning_rate": 0.0001842288273089332,
+      "loss": 11.5,
+      "step": 280
+    },
+    {
+      "epoch": 0.04498069111791424,
+      "grad_norm": 0.005132167134433985,
+      "learning_rate": 0.00018411498887494396,
+      "loss": 11.5,
+      "step": 281
+    },
+    {
+      "epoch": 0.04514076475178582,
+      "grad_norm": 0.004873708356171846,
+      "learning_rate": 0.00018400077650223263,
+      "loss": 11.5,
+      "step": 282
+    },
+    {
+      "epoch": 0.045300838385657405,
+      "grad_norm": 0.004690905567258596,
+      "learning_rate": 0.0001838861906985379,
+      "loss": 11.5,
+      "step": 283
+    },
+    {
+      "epoch": 0.04546091201952898,
+      "grad_norm": 0.006050188094377518,
+      "learning_rate": 0.00018377123197325842,
+      "loss": 11.5,
+      "step": 284
+    },
+    {
+      "epoch": 0.045620985653400566,
+      "grad_norm": 0.008120089769363403,
+      "learning_rate": 0.00018365590083745085,
+      "loss": 11.5,
+      "step": 285
+    },
+    {
+      "epoch": 0.04578105928727214,
+      "grad_norm": 0.005599613301455975,
+      "learning_rate": 0.00018354019780382735,
+      "loss": 11.5,
+      "step": 286
+    },
+    {
+      "epoch": 0.045941132921143726,
+      "grad_norm": 0.006047028582543135,
+      "learning_rate": 0.0001834241233867533,
+      "loss": 11.5,
+      "step": 287
+    },
+    {
+      "epoch": 0.04610120655501531,
+      "grad_norm": 0.005496839992702007,
+      "learning_rate": 0.00018330767810224524,
+      "loss": 11.5,
+      "step": 288
+    },
+    {
+      "epoch": 0.04626128018888689,
+      "grad_norm": 0.004045095294713974,
+      "learning_rate": 0.0001831908624679683,
+      "loss": 11.5,
+      "step": 289
+    },
+    {
+      "epoch": 0.04642135382275847,
+      "grad_norm": 0.0066123465076088905,
+      "learning_rate": 0.0001830736770032341,
+      "loss": 11.5,
+      "step": 290
+    },
+    {
+      "epoch": 0.04658142745663005,
+      "grad_norm": 0.004854083526879549,
+      "learning_rate": 0.0001829561222289984,
+      "loss": 11.5,
+      "step": 291
+    },
+    {
+      "epoch": 0.04674150109050163,
+      "grad_norm": 0.004592921119183302,
+      "learning_rate": 0.00018283819866785853,
+      "loss": 11.5,
+      "step": 292
+    },
+    {
+      "epoch": 0.04690157472437321,
+      "grad_norm": 0.021113738417625427,
+      "learning_rate": 0.0001827199068440516,
+      "loss": 11.5,
+      "step": 293
+    },
+    {
+      "epoch": 0.04706164835824479,
+      "grad_norm": 0.026811130344867706,
+      "learning_rate": 0.00018260124728345162,
+      "loss": 11.5,
+      "step": 294
+    },
+    {
+      "epoch": 0.047221721992116376,
+      "grad_norm": 0.011840801686048508,
+      "learning_rate": 0.00018248222051356754,
+      "loss": 11.5,
+      "step": 295
+    },
+    {
+      "epoch": 0.04738179562598795,
+      "grad_norm": 0.00733939791098237,
+      "learning_rate": 0.00018236282706354063,
+      "loss": 11.5,
+      "step": 296
+    },
+    {
+      "epoch": 0.04754186925985954,
+      "grad_norm": 0.013145462609827518,
+      "learning_rate": 0.00018224306746414238,
+      "loss": 11.5,
+      "step": 297
+    },
+    {
+      "epoch": 0.047701942893731114,
+      "grad_norm": 0.009584108367562294,
+      "learning_rate": 0.00018212294224777197,
+      "loss": 11.5,
+      "step": 298
+    },
+    {
+      "epoch": 0.0478620165276027,
+      "grad_norm": 0.011472940444946289,
+      "learning_rate": 0.00018200245194845399,
+      "loss": 11.5,
+      "step": 299
+    },
+    {
+      "epoch": 0.048022090161474275,
+      "grad_norm": 0.01373722031712532,
+      "learning_rate": 0.00018188159710183594,
+      "loss": 11.5,
+      "step": 300
+    },
+    {
+      "epoch": 0.04818216379534586,
+      "grad_norm": 0.007853199727833271,
+      "learning_rate": 0.000181760378245186,
+      "loss": 11.5,
+      "step": 301
+    },
+    {
+      "epoch": 0.04834223742921744,
+      "grad_norm": 0.006789433304220438,
+      "learning_rate": 0.00018163879591739067,
+      "loss": 11.5,
+      "step": 302
+    },
+    {
+      "epoch": 0.04850231106308902,
+      "grad_norm": 0.00708062294870615,
+      "learning_rate": 0.0001815168506589521,
+      "loss": 11.5,
+      "step": 303
+    },
+    {
+      "epoch": 0.0486623846969606,
+      "grad_norm": 0.003183502471074462,
+      "learning_rate": 0.000181394543011986,
+      "loss": 11.5,
+      "step": 304
+    },
+    {
+      "epoch": 0.04882245833083218,
+      "grad_norm": 0.005664783995598555,
+      "learning_rate": 0.00018127187352021907,
+      "loss": 11.5,
+      "step": 305
+    },
+    {
+      "epoch": 0.048982531964703764,
+      "grad_norm": 0.004790788050740957,
+      "learning_rate": 0.0001811488427289866,
+      "loss": 11.5,
+      "step": 306
+    },
+    {
+      "epoch": 0.04914260559857535,
+      "grad_norm": 0.005017767194658518,
+      "learning_rate": 0.00018102545118523007,
+      "loss": 11.5,
+      "step": 307
+    },
+    {
+      "epoch": 0.049302679232446925,
+      "grad_norm": 0.004967286717146635,
+      "learning_rate": 0.00018090169943749476,
+      "loss": 11.5,
+      "step": 308
+    },
+    {
+      "epoch": 0.04946275286631851,
+      "grad_norm": 0.004688838496804237,
+      "learning_rate": 0.00018077758803592718,
+      "loss": 11.5,
+      "step": 309
+    },
+    {
+      "epoch": 0.049622826500190086,
+      "grad_norm": 0.003855897579342127,
+      "learning_rate": 0.00018065311753227273,
+      "loss": 11.5,
+      "step": 310
+    },
+    {
+      "epoch": 0.04978290013406167,
+      "grad_norm": 0.003533479291945696,
+      "learning_rate": 0.0001805282884798732,
+      "loss": 11.5,
+      "step": 311
+    },
+    {
+      "epoch": 0.049942973767933246,
+      "grad_norm": 0.003911127801984549,
+      "learning_rate": 0.00018040310143366446,
+      "loss": 11.5,
+      "step": 312
+    },
+    {
+      "epoch": 0.05010304740180483,
+      "grad_norm": 0.0077298590913414955,
+      "learning_rate": 0.00018027755695017368,
+      "loss": 11.5,
+      "step": 313
+    },
+    {
+      "epoch": 0.050263121035676414,
+      "grad_norm": 0.002832760103046894,
+      "learning_rate": 0.00018015165558751717,
+      "loss": 11.5,
+      "step": 314
+    },
+    {
+      "epoch": 0.05042319466954799,
+      "grad_norm": 0.004375755321234465,
+      "learning_rate": 0.00018002539790539773,
+      "loss": 11.5,
+      "step": 315
+    },
+    {
+      "epoch": 0.050583268303419575,
+      "grad_norm": 0.0037666945718228817,
+      "learning_rate": 0.00017989878446510215,
+      "loss": 11.5,
+      "step": 316
+    },
+    {
+      "epoch": 0.05074334193729115,
+      "grad_norm": 0.0037504660431295633,
+      "learning_rate": 0.00017977181582949888,
+      "loss": 11.5,
+      "step": 317
+    },
+    {
+      "epoch": 0.050903415571162736,
+      "grad_norm": 0.003293354529887438,
+      "learning_rate": 0.0001796444925630353,
+      "loss": 11.5,
+      "step": 318
+    },
+    {
+      "epoch": 0.05106348920503431,
+      "grad_norm": 0.003078571753576398,
+      "learning_rate": 0.00017951681523173542,
+      "loss": 11.5,
+      "step": 319
+    },
+    {
+      "epoch": 0.051223562838905896,
+      "grad_norm": 0.004428179934620857,
+      "learning_rate": 0.0001793887844031972,
+      "loss": 11.5,
+      "step": 320
+    },
+    {
+      "epoch": 0.05138363647277748,
+      "grad_norm": 0.00518577266484499,
+      "learning_rate": 0.00017926040064659014,
+      "loss": 11.5,
+      "step": 321
+    },
+    {
+      "epoch": 0.05154371010664906,
+      "grad_norm": 0.0041250321082770824,
+      "learning_rate": 0.0001791316645326526,
+      "loss": 11.5,
+      "step": 322
+    },
+    {
+      "epoch": 0.05170378374052064,
+      "grad_norm": 0.00472582271322608,
+      "learning_rate": 0.00017900257663368963,
+      "loss": 11.5,
+      "step": 323
+    },
+    {
+      "epoch": 0.05186385737439222,
+      "grad_norm": 0.004885179456323385,
+      "learning_rate": 0.0001788731375235698,
+      "loss": 11.5,
+      "step": 324
+    },
+    {
+      "epoch": 0.0520239310082638,
+      "grad_norm": 0.004233427811414003,
+      "learning_rate": 0.00017874334777772327,
+      "loss": 11.5,
+      "step": 325
+    },
+    {
+      "epoch": 0.052184004642135386,
+      "grad_norm": 0.0075990804471075535,
+      "learning_rate": 0.00017861320797313892,
+      "loss": 11.5,
+      "step": 326
+    },
+    {
+      "epoch": 0.05234407827600696,
+      "grad_norm": 0.004590803291648626,
+      "learning_rate": 0.0001784827186883618,
+      "loss": 11.5,
+      "step": 327
+    },
+    {
+      "epoch": 0.052504151909878546,
+      "grad_norm": 0.007067880593240261,
+      "learning_rate": 0.00017835188050349064,
+      "loss": 11.5,
+      "step": 328
+    },
+    {
+      "epoch": 0.05266422554375012,
+      "grad_norm": 0.00449848547577858,
+      "learning_rate": 0.00017822069400017516,
+      "loss": 11.5,
+      "step": 329
+    },
+    {
+      "epoch": 0.05282429917762171,
+      "grad_norm": 0.003897727932780981,
+      "learning_rate": 0.00017808915976161362,
+      "loss": 11.5,
+      "step": 330
+    },
+    {
+      "epoch": 0.052984372811493284,
+      "grad_norm": 0.003709547221660614,
+      "learning_rate": 0.00017795727837255015,
+      "loss": 11.5,
+      "step": 331
+    },
+    {
+      "epoch": 0.05314444644536487,
+      "grad_norm": 0.0030794816557317972,
+      "learning_rate": 0.00017782505041927216,
+      "loss": 11.5,
+      "step": 332
+    },
+    {
+      "epoch": 0.05330452007923645,
+      "grad_norm": 0.0032050926238298416,
+      "learning_rate": 0.00017769247648960774,
+      "loss": 11.5,
+      "step": 333
+    },
+    {
+      "epoch": 0.05346459371310803,
+      "grad_norm": 0.0033614016138017178,
+      "learning_rate": 0.00017755955717292296,
+      "loss": 11.5,
+      "step": 334
+    },
+    {
+      "epoch": 0.05362466734697961,
+      "grad_norm": 0.003943106159567833,
+      "learning_rate": 0.00017742629306011944,
+      "loss": 11.5,
+      "step": 335
+    },
+    {
+      "epoch": 0.05378474098085119,
+      "grad_norm": 0.006896649021655321,
+      "learning_rate": 0.00017729268474363154,
+      "loss": 11.5,
+      "step": 336
+    },
+    {
+      "epoch": 0.05394481461472277,
+      "grad_norm": 0.004238173831254244,
+      "learning_rate": 0.0001771587328174239,
+      "loss": 11.5,
+      "step": 337
+    },
+    {
+      "epoch": 0.05410488824859435,
+      "grad_norm": 0.0035634618252515793,
+      "learning_rate": 0.0001770244378769885,
+      "loss": 11.5,
+      "step": 338
+    },
+    {
+      "epoch": 0.054264961882465934,
+      "grad_norm": 0.002922303741797805,
+      "learning_rate": 0.0001768898005193425,
+      "loss": 11.5,
+      "step": 339
+    },
+    {
+      "epoch": 0.05442503551633752,
+      "grad_norm": 0.0035376015584915876,
+      "learning_rate": 0.000176754821343025,
+      "loss": 11.5,
+      "step": 340
+    },
+    {
+      "epoch": 0.054585109150209095,
+      "grad_norm": 0.0034501487389206886,
+      "learning_rate": 0.0001766195009480949,
+      "loss": 11.5,
+      "step": 341
+    },
+    {
+      "epoch": 0.05474518278408068,
+      "grad_norm": 0.012458324432373047,
+      "learning_rate": 0.0001764838399361279,
+      "loss": 11.5,
+      "step": 342
+    },
+    {
+      "epoch": 0.054905256417952256,
+      "grad_norm": 0.0368187241256237,
+      "learning_rate": 0.00017634783891021393,
+      "loss": 11.5,
+      "step": 343
+    },
+    {
+      "epoch": 0.05506533005182384,
+      "grad_norm": 0.01612844504415989,
+      "learning_rate": 0.00017621149847495458,
+      "loss": 11.5,
+      "step": 344
+    },
+    {
+      "epoch": 0.05522540368569542,
+      "grad_norm": 0.006253810133785009,
+      "learning_rate": 0.00017607481923646016,
+      "loss": 11.5,
+      "step": 345
+    },
+    {
+      "epoch": 0.055385477319567,
+      "grad_norm": 0.008016599342226982,
+      "learning_rate": 0.0001759378018023473,
+      "loss": 11.5,
+      "step": 346
+    },
+    {
+      "epoch": 0.055545550953438584,
+      "grad_norm": 0.008556563407182693,
+      "learning_rate": 0.00017580044678173592,
+      "loss": 11.5,
+      "step": 347
+    },
+    {
+      "epoch": 0.05570562458731016,
+      "grad_norm": 0.007470616605132818,
+      "learning_rate": 0.00017566275478524693,
+      "loss": 11.5,
+      "step": 348
+    },
+    {
+      "epoch": 0.055865698221181745,
+      "grad_norm": 0.013478902168571949,
+      "learning_rate": 0.0001755247264249991,
+      "loss": 11.5,
+      "step": 349
+    },
+    {
+      "epoch": 0.05602577185505332,
+      "grad_norm": 0.011351917870342731,
+      "learning_rate": 0.0001753863623146066,
+      "loss": 11.5,
+      "step": 350
+    },
+    {
+      "epoch": 0.056185845488924906,
+      "grad_norm": 0.0046243141405284405,
+      "learning_rate": 0.00017524766306917618,
+      "loss": 11.5,
+      "step": 351
+    },
+    {
+      "epoch": 0.05634591912279649,
+      "grad_norm": 0.0038260514847934246,
+      "learning_rate": 0.0001751086293053045,
+      "loss": 11.5,
+      "step": 352
+    },
+    {
+      "epoch": 0.056505992756668066,
+      "grad_norm": 0.0032652895897626877,
+      "learning_rate": 0.0001749692616410753,
+      "loss": 11.5,
+      "step": 353
+    },
+    {
+      "epoch": 0.05666606639053965,
+      "grad_norm": 0.00571425911039114,
+      "learning_rate": 0.00017482956069605668,
+      "loss": 11.5,
+      "step": 354
+    },
+    {
+      "epoch": 0.05682614002441123,
+      "grad_norm": 0.005943847354501486,
+      "learning_rate": 0.00017468952709129846,
+      "loss": 11.5,
+      "step": 355
+    },
+    {
+      "epoch": 0.05698621365828281,
+      "grad_norm": 0.00332586164586246,
+      "learning_rate": 0.00017454916144932922,
+      "loss": 11.5,
+      "step": 356
+    },
+    {
+      "epoch": 0.05714628729215439,
+      "grad_norm": 0.002929111709818244,
+      "learning_rate": 0.0001744084643941536,
+      "loss": 11.5,
+      "step": 357
+    },
+    {
+      "epoch": 0.05730636092602597,
+      "grad_norm": 0.00693755317479372,
+      "learning_rate": 0.00017426743655124974,
+      "loss": 11.5,
+      "step": 358
+    },
+    {
+      "epoch": 0.057466434559897556,
+      "grad_norm": 0.004010320641100407,
+      "learning_rate": 0.0001741260785475661,
+      "loss": 11.5,
+      "step": 359
+    },
+    {
+      "epoch": 0.05762650819376913,
+      "grad_norm": 0.00563431903719902,
+      "learning_rate": 0.00017398439101151905,
+      "loss": 11.5,
+      "step": 360
+    },
+    {
+      "epoch": 0.057786581827640716,
+      "grad_norm": 0.004954980686306953,
+      "learning_rate": 0.00017384237457298987,
+      "loss": 11.5,
+      "step": 361
+    },
+    {
+      "epoch": 0.05794665546151229,
+      "grad_norm": 0.0028225225396454334,
+      "learning_rate": 0.00017370002986332193,
+      "loss": 11.5,
+      "step": 362
+    },
+    {
+      "epoch": 0.05810672909538388,
+      "grad_norm": 0.0028268226888030767,
+      "learning_rate": 0.00017355735751531807,
+      "loss": 11.5,
+      "step": 363
+    },
+    {
+      "epoch": 0.058266802729255454,
+      "grad_norm": 0.0043122912757098675,
+      "learning_rate": 0.00017341435816323756,
+      "loss": 11.5,
+      "step": 364
+    },
+    {
+      "epoch": 0.05842687636312704,
+      "grad_norm": 0.004459218587726355,
+      "learning_rate": 0.00017327103244279348,
+      "loss": 11.5,
+      "step": 365
+    },
+    {
+      "epoch": 0.05858694999699862,
+      "grad_norm": 0.0035306205973029137,
+      "learning_rate": 0.00017312738099114973,
+      "loss": 11.5,
+      "step": 366
+    },
+    {
+      "epoch": 0.0587470236308702,
+      "grad_norm": 0.0034746327437460423,
+      "learning_rate": 0.00017298340444691835,
+      "loss": 11.5,
+      "step": 367
+    },
+    {
+      "epoch": 0.05890709726474178,
+      "grad_norm": 0.0036737604532390833,
+      "learning_rate": 0.00017283910345015647,
+      "loss": 11.5,
+      "step": 368
+    },
+    {
+      "epoch": 0.05906717089861336,
+      "grad_norm": 0.002505925018340349,
+      "learning_rate": 0.0001726944786423637,
+      "loss": 11.5,
+      "step": 369
+    },
+    {
+      "epoch": 0.05922724453248494,
+      "grad_norm": 0.0037393190432339907,
+      "learning_rate": 0.00017254953066647913,
+      "loss": 11.5,
+      "step": 370
+    },
+    {
+      "epoch": 0.05938731816635653,
+      "grad_norm": 0.005481667350977659,
+      "learning_rate": 0.00017240426016687863,
+      "loss": 11.5,
+      "step": 371
+    },
+    {
+      "epoch": 0.059547391800228104,
+      "grad_norm": 0.0024753627367317677,
+      "learning_rate": 0.00017225866778937165,
+      "loss": 11.5,
+      "step": 372
+    },
+    {
+      "epoch": 0.05970746543409969,
+      "grad_norm": 0.0036691934801638126,
+      "learning_rate": 0.00017211275418119876,
+      "loss": 11.5,
+      "step": 373
+    },
+    {
+      "epoch": 0.059867539067971265,
+      "grad_norm": 0.006581971887499094,
+      "learning_rate": 0.0001719665199910285,
+      "loss": 11.5,
+      "step": 374
+    },
+    {
+      "epoch": 0.06002761270184285,
+      "grad_norm": 0.0032361126504838467,
+      "learning_rate": 0.00017181996586895454,
+      "loss": 11.5,
+      "step": 375
+    },
+    {
+      "epoch": 0.06002761270184285,
+      "eval_loss": 11.5,
+      "eval_runtime": 60.2134,
+      "eval_samples_per_second": 174.745,
+      "eval_steps_per_second": 87.373,
+      "step": 375
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 1500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 375,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 59818442588160.0,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8d57c9a175c297e95af3f2ca276784ac7a66686f1d659d9388be66b1b61db445
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render. See raw diff