Training in progress, epoch 0, checkpoint

Files changed (13)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +35 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +30 -0
last-checkpoint/tokenizer.json +0 -0
last-checkpoint/tokenizer_config.json +31 -0
last-checkpoint/trainer_state.json +1531 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: facebook/opt-350m
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,35 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "facebook/opt-350m",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.15,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 32,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "out_proj",
+    "k_proj",
+    "project_out",
+    "fc1",
+    "v_proj",
+    "project_in",
+    "q_proj",
+    "fc2"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
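With `r` = 32 and `lora_alpha` = 32 (and `use_rslora` false), the LoRA update is scaled by α/r = 1.0, and every targeted linear layer with `in` inputs and `out` outputs gains r·(in + out) trainable weights (the A and B factors). The sketch below tallies that for the eight `target_modules`. The layer sizes are assumptions taken from the published `facebook/opt-350m` configuration (hidden size 1024, FFN size 4096, 24 decoder layers, and a 512-wide embedding projection, which is why `project_in`/`project_out` exist); they are not recorded in this commit.

```python
# Sketch: trainable LoRA parameters implied by adapter_config.json.
# ASSUMPTION: facebook/opt-350m sizes (hidden_size=1024, ffn_dim=4096,
# word_embed_proj_dim=512, 24 decoder layers); not stated in this diff.
HIDDEN, FFN, EMBED_PROJ, LAYERS = 1024, 4096, 512, 24
R, ALPHA = 32, 32  # "r" and "lora_alpha" from adapter_config.json

# (in_features, out_features) of each targeted module inside one decoder layer
PER_LAYER = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "out_proj": (HIDDEN, HIDDEN),
    "fc1": (HIDDEN, FFN),
    "fc2": (FFN, HIDDEN),
}
# Targeted modules that appear once, outside the decoder layers
ONCE = {
    "project_in": (EMBED_PROJ, HIDDEN),
    "project_out": (HIDDEN, EMBED_PROJ),
}


def lora_params(in_f, out_f, r=R):
    # LoRA adds A (r x in_f) and B (out_f x r); bias="none" adds nothing else.
    return r * (in_f + out_f)


per_layer = sum(lora_params(i, o) for i, o in PER_LAYER.values())
total = LAYERS * per_layer + sum(lora_params(i, o) for i, o in ONCE.values())
scaling = ALPHA / R  # standard LoRA scaling because use_rslora is false

print(f"per decoder layer: {per_layer:,}")
print(f"total trainable:   {total:,}")
print(f"scaling alpha/r:   {scaling}")
```

Under these assumptions each decoder layer contributes 589,824 LoRA weights and the whole adapter about 14.25 M, a small fraction of the frozen base model.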

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e1842d0289d059335f06c135c958642873f37de5d5021f03eebf92bef687067b
+size 57056672

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render.

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6f5fd3b168b7c1eeddc546440219d9d89aed35465f552ae829bc133dbb3c5b22
+size 114282170
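The LFS pointers record only object sizes. As a rough sanity check, not something the commit itself states, those sizes are consistent with fp32 LoRA weights plus a small safetensors header, and with an AdamW-style optimizer state holding two moment buffers per trainable parameter. The parameter count (14,254,080, i.e. r·(in + out) summed over the target modules of an OPT-350m-sized model), the fp32 dtype, and the optimizer type are all assumptions here.

```python
# Sketch: compare LFS object sizes with the expected fp32 tensor payloads.
# ASSUMPTIONS: 14,254,080 adapter parameters stored as fp32, and an
# AdamW-style optimizer keeping two fp32 moments (exp_avg, exp_avg_sq) each.
N_PARAMS = 14_254_080
BYTES_FP32 = 4

ADAPTER_FILE = 57_056_672     # "size" of adapter_model.safetensors
OPTIMIZER_FILE = 114_282_170  # "size" of optimizer.pt

adapter_payload = N_PARAMS * BYTES_FP32   # raw weight bytes
optimizer_payload = 2 * adapter_payload   # two moment buffers per weight

adapter_overhead = ADAPTER_FILE - adapter_payload
optimizer_overhead = OPTIMIZER_FILE - optimizer_payload

print("adapter overhead bytes:  ", adapter_overhead)
print("optimizer overhead bytes:", optimizer_overhead)
```

Both remainders come out small relative to the payload (roughly 40 KB and 250 KB), which is plausible for tensor-name headers and pickled optimizer metadata; a much larger gap might point to a different dtype or extra saved modules.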

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e4b6f9d779f6b65dd634a06a7b96c8fd5e2eb21a5dc80ff3ee8c2e546b7a31f3
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:891cad020bf7bee78efa739dc10e1e4315e34b096ed70226b38590ec81d7d418
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,30 @@

+{
+  "bos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<pad>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": true,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

The diff for this file is too large to render.

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,31 @@

+{
+  "add_bos_token": true,
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "1": {
+      "content": "<pad>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": true,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "bos_token": "</s>",
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "errors": "replace",
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "<pad>",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "</s>"
+}
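The `chat_template` in this file emits Llama-3-style role headers (`<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`), yet the only added special tokens registered for this GPT-2 BPE tokenizer are `<pad>` and `</s>`, so those markers will likely be split into several ordinary sub-word tokens. To see exactly what string the template produces, here is a hand-written Python transcription of it. It is an illustration only: `transformers` renders the real template with Jinja2 (for example via `tokenizer.apply_chat_template`), and `render_chat` is a hypothetical helper.

```python
# Sketch: hand-written Python equivalent of the Jinja "chat_template" above.
# Not the transformers code path; useful only to inspect the rendered string.
BOS_TOKEN = "</s>"  # "bos_token" in tokenizer_config.json


def render_chat(messages, add_generation_prompt=False, bos_token=BOS_TOKEN):
    parts = []
    for i, message in enumerate(messages):
        content = (
            "<|start_header_id|>" + message["role"] + "<|end_header_id|>\n\n"
            + message["content"].strip()  # Jinja `trim` applies to content only
            + "<|eot_id|>"
        )
        if i == 0:
            content = bos_token + content  # BOS is prepended to the first message only
        parts.append(content)
    if add_generation_prompt:
        # Open an assistant turn for the model to complete
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = render_chat(
    [{"role": "user", "content": "  Hello!  "}],
    add_generation_prompt=True,
)
print(repr(prompt))
```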

last-checkpoint/trainer_state.json ADDED
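The `learning_rate` entries in `log_history` are consistent with a linear warmup to 1e-4 over the first 100 optimizer steps, followed by a cosine decay that would reach zero at step 1500 (the `global_step` of this checkpoint). The sketch below reproduces a few logged values under those settings. The peak rate, warmup length, and total steps are inferred from the log rather than read from `training_args.bin`, and `lr_at` is a hypothetical helper that mirrors the formula of `transformers.get_cosine_schedule_with_warmup`.

```python
import math

# Sketch: reproduce the logged "learning_rate" values.
# ASSUMPTIONS (inferred from log_history, not stated in this diff):
# peak LR 1e-4, 100 linear warmup steps, cosine decay to 0 at step 1500.
PEAK_LR, WARMUP, TOTAL = 1e-4, 100, 1500


def lr_at(step):
    if step < WARMUP:
        return PEAK_LR * step / WARMUP  # linear warmup from 0
    progress = (step - WARMUP) / (TOTAL - WARMUP)  # 0.0 -> 1.0 after warmup
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# (step, logged learning_rate) pairs copied from log_history
logged = [
    (7, 7.000000000000001e-06),
    (98, 9.8e-05),
    (105, 9.999685283773504e-05),
    (798, 5.022439872194629e-05),
    (938, 3.4762721753638995e-05),
]
for step, value in logged:
    print(step, f"{lr_at(step):.9e}", f"{value:.9e}")
```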

	@@ -0,0 +1,1531 @@

+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 0.661011347361463,
+  "eval_steps": 500,
+  "global_step": 1500,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0030847196210201607,
+      "grad_norm": 2.416346311569214,
+      "learning_rate": 7.000000000000001e-06,
+      "loss": 23.5635,
+      "step": 7
+    },
+    {
+      "epoch": 0.0061694392420403215,
+      "grad_norm": 2.179997682571411,
+      "learning_rate": 1.4000000000000001e-05,
+      "loss": 24.2796,
+      "step": 14
+    },
+    {
+      "epoch": 0.009254158863060483,
+      "grad_norm": 2.3815741539001465,
+      "learning_rate": 2.1e-05,
+      "loss": 23.4777,
+      "step": 21
+    },
+    {
+      "epoch": 0.012338878484080643,
+      "grad_norm": 2.2959744930267334,
+      "learning_rate": 2.8000000000000003e-05,
+      "loss": 22.7741,
+      "step": 28
+    },
+    {
+      "epoch": 0.015423598105100805,
+      "grad_norm": 2.0814883708953857,
+      "learning_rate": 3.5e-05,
+      "loss": 23.312,
+      "step": 35
+    },
+    {
+      "epoch": 0.018508317726120965,
+      "grad_norm": 2.2548601627349854,
+      "learning_rate": 4.2e-05,
+      "loss": 23.1693,
+      "step": 42
+    },
+    {
+      "epoch": 0.021593037347141127,
+      "grad_norm": 2.2283570766448975,
+      "learning_rate": 4.9e-05,
+      "loss": 23.0576,
+      "step": 49
+    },
+    {
+      "epoch": 0.024677756968161286,
+      "grad_norm": 2.230828046798706,
+      "learning_rate": 5.6000000000000006e-05,
+      "loss": 22.4088,
+      "step": 56
+    },
+    {
+      "epoch": 0.027762476589181448,
+      "grad_norm": 2.333780288696289,
+      "learning_rate": 6.3e-05,
+      "loss": 22.3488,
+      "step": 63
+    },
+    {
+      "epoch": 0.03084719621020161,
+      "grad_norm": 2.1342034339904785,
+      "learning_rate": 7e-05,
+      "loss": 21.5944,
+      "step": 70
+    },
+    {
+      "epoch": 0.03393191583122177,
+      "grad_norm": 2.360670328140259,
+      "learning_rate": 7.7e-05,
+      "loss": 21.6597,
+      "step": 77
+    },
+    {
+      "epoch": 0.03701663545224193,
+      "grad_norm": 1.9946174621582031,
+      "learning_rate": 8.4e-05,
+      "loss": 22.4954,
+      "step": 84
+    },
+    {
+      "epoch": 0.04010135507326209,
+      "grad_norm": 2.1843321323394775,
+      "learning_rate": 9.1e-05,
+      "loss": 22.3022,
+      "step": 91
+    },
+    {
+      "epoch": 0.043186074694282255,
+      "grad_norm": 2.3150522708892822,
+      "learning_rate": 9.8e-05,
+      "loss": 21.7739,
+      "step": 98
+    },
+    {
+      "epoch": 0.04627079431530241,
+      "grad_norm": 2.0341029167175293,
+      "learning_rate": 9.999685283773504e-05,
+      "loss": 21.6292,
+      "step": 105
+    },
+    {
+      "epoch": 0.04935551393632257,
+      "grad_norm": 2.019920825958252,
+      "learning_rate": 9.998187325055106e-05,
+      "loss": 21.5738,
+      "step": 112
+    },
+    {
+      "epoch": 0.05244023355734274,
+      "grad_norm": 2.205152988433838,
+      "learning_rate": 9.995456138403733e-05,
+      "loss": 21.4902,
+      "step": 119
+    },
+    {
+      "epoch": 0.055524953178362896,
+      "grad_norm": 2.240128755569458,
+      "learning_rate": 9.991492397698826e-05,
+      "loss": 21.6133,
+      "step": 126
+    },
+    {
+      "epoch": 0.058609672799383054,
+      "grad_norm": 2.140727996826172,
+      "learning_rate": 9.986297080934089e-05,
+      "loss": 22.1083,
+      "step": 133
+    },
+    {
+      "epoch": 0.06169439242040322,
+      "grad_norm": 2.304687261581421,
+      "learning_rate": 9.979871469976196e-05,
+      "loss": 21.2837,
+      "step": 140
+    },
+    {
+      "epoch": 0.06477911204142338,
+      "grad_norm": 2.40910267829895,
+      "learning_rate": 9.972217150248503e-05,
+      "loss": 21.3895,
+      "step": 147
+    },
+    {
+      "epoch": 0.06786383166244354,
+      "grad_norm": 2.2335853576660156,
+      "learning_rate": 9.963336010339868e-05,
+      "loss": 21.8322,
+      "step": 154
+    },
+    {
+      "epoch": 0.0709485512834637,
+      "grad_norm": 2.457430124282837,
+      "learning_rate": 9.953230241538674e-05,
+      "loss": 21.0453,
+      "step": 161
+    },
+    {
+      "epoch": 0.07403327090448386,
+      "grad_norm": 2.061185836791992,
+      "learning_rate": 9.941902337292155e-05,
+      "loss": 21.4088,
+      "step": 168
+    },
+    {
+      "epoch": 0.07711799052550403,
+      "grad_norm": 2.324650287628174,
+      "learning_rate": 9.92935509259118e-05,
+      "loss": 21.6713,
+      "step": 175
+    },
+    {
+      "epoch": 0.08020271014652418,
+      "grad_norm": 2.3613855838775635,
+      "learning_rate": 9.915591603280631e-05,
+      "loss": 21.3864,
+      "step": 182
+    },
+    {
+      "epoch": 0.08328742976754434,
+      "grad_norm": 2.1573920249938965,
+      "learning_rate": 9.900615265295552e-05,
+      "loss": 21.516,
+      "step": 189
+    },
+    {
+      "epoch": 0.08637214938856451,
+      "grad_norm": 2.366680145263672,
+      "learning_rate": 9.884429773823239e-05,
+      "loss": 20.9945,
+      "step": 196
+    },
+    {
+      "epoch": 0.08945686900958466,
+      "grad_norm": 2.237717866897583,
+      "learning_rate": 9.867039122391527e-05,
+      "loss": 20.8505,
+      "step": 203
+    },
+    {
+      "epoch": 0.09254158863060483,
+      "grad_norm": 2.182887554168701,
+      "learning_rate": 9.848447601883435e-05,
+      "loss": 20.8463,
+      "step": 210
+    },
+    {
+      "epoch": 0.09562630825162499,
+      "grad_norm": 2.437626361846924,
+      "learning_rate": 9.828659799478456e-05,
+      "loss": 20.9356,
+      "step": 217
+    },
+    {
+      "epoch": 0.09871102787264514,
+      "grad_norm": 2.4780032634735107,
+      "learning_rate": 9.807680597520746e-05,
+      "loss": 20.7796,
+      "step": 224
+    },
+    {
+      "epoch": 0.10179574749366531,
+      "grad_norm": 2.405442237854004,
+      "learning_rate": 9.785515172314463e-05,
+      "loss": 20.7798,
+      "step": 231
+    },
+    {
+      "epoch": 0.10488046711468547,
+      "grad_norm": 2.3394062519073486,
+      "learning_rate": 9.762168992846614e-05,
+      "loss": 20.091,
+      "step": 238
+    },
+    {
+      "epoch": 0.10796518673570563,
+      "grad_norm": 2.3465561866760254,
+      "learning_rate": 9.737647819437645e-05,
+      "loss": 20.8049,
+      "step": 245
+    },
+    {
+      "epoch": 0.11104990635672579,
+      "grad_norm": 2.3978774547576904,
+      "learning_rate": 9.711957702320175e-05,
+      "loss": 20.3028,
+      "step": 252
+    },
+    {
+      "epoch": 0.11413462597774596,
+      "grad_norm": 2.622391939163208,
+      "learning_rate": 9.685104980146193e-05,
+      "loss": 21.3249,
+      "step": 259
+    },
+    {
+      "epoch": 0.11721934559876611,
+      "grad_norm": 2.322021484375,
+      "learning_rate": 9.657096278423093e-05,
+      "loss": 20.3684,
+      "step": 266
+    },
+    {
+      "epoch": 0.12030406521978627,
+      "grad_norm": 2.2268433570861816,
+      "learning_rate": 9.627938507878917e-05,
+      "loss": 20.1157,
+      "step": 273
+    },
+    {
+      "epoch": 0.12338878484080644,
+      "grad_norm": 2.6939849853515625,
+      "learning_rate": 9.597638862757255e-05,
+      "loss": 20.1678,
+      "step": 280
+    },
+    {
+      "epoch": 0.1264735044618266,
+      "grad_norm": 2.622375965118408,
+      "learning_rate": 9.566204819042152e-05,
+      "loss": 20.8389,
+      "step": 287
+    },
+    {
+      "epoch": 0.12955822408284676,
+      "grad_norm": 2.5094549655914307,
+      "learning_rate": 9.533644132613541e-05,
+      "loss": 20.0785,
+      "step": 294
+    },
+    {
+      "epoch": 0.13264294370386692,
+      "grad_norm": 2.5023655891418457,
+      "learning_rate": 9.49996483733358e-05,
+      "loss": 20.3365,
+      "step": 301
+    },
+    {
+      "epoch": 0.1357276633248871,
+      "grad_norm": 2.475635290145874,
+      "learning_rate": 9.465175243064428e-05,
+      "loss": 20.9147,
+      "step": 308
+    },
+    {
+      "epoch": 0.13881238294590723,
+      "grad_norm": 2.356755256652832,
+      "learning_rate": 9.4292839336179e-05,
+      "loss": 20.6846,
+      "step": 315
+    },
+    {
+      "epoch": 0.1418971025669274,
+      "grad_norm": 2.688347339630127,
+      "learning_rate": 9.39229976463755e-05,
+      "loss": 20.7805,
+      "step": 322
+    },
+    {
+      "epoch": 0.14498182218794756,
+      "grad_norm": 2.5185210704803467,
+      "learning_rate": 9.354231861413668e-05,
+      "loss": 20.0641,
+      "step": 329
+    },
+    {
+      "epoch": 0.14806654180896772,
+      "grad_norm": 2.506746530532837,
+      "learning_rate": 9.315089616631752e-05,
+      "loss": 20.2469,
+      "step": 336
+    },
+    {
+      "epoch": 0.1511512614299879,
+      "grad_norm": 2.3669211864471436,
+      "learning_rate": 9.274882688055005e-05,
+      "loss": 20.7752,
+      "step": 343
+    },
+    {
+      "epoch": 0.15423598105100805,
+      "grad_norm": 2.7276268005371094,
+      "learning_rate": 9.233620996141421e-05,
+      "loss": 20.7967,
+      "step": 350
+    },
+    {
+      "epoch": 0.1573207006720282,
+      "grad_norm": 2.8304104804992676,
+      "learning_rate": 9.191314721596072e-05,
+      "loss": 21.2756,
+      "step": 357
+    },
+    {
+      "epoch": 0.16040542029304836,
+      "grad_norm": 2.6918582916259766,
+      "learning_rate": 9.147974302859157e-05,
+      "loss": 21.0171,
+      "step": 364
+    },
+    {
+      "epoch": 0.16349013991406852,
+      "grad_norm": 2.7030653953552246,
+      "learning_rate": 9.103610433530483e-05,
+      "loss": 20.3249,
+      "step": 371
+    },
+    {
+      "epoch": 0.1665748595350887,
+      "grad_norm": 2.528362512588501,
+      "learning_rate": 9.058234059730976e-05,
+      "loss": 20.0512,
+      "step": 378
+    },
+    {
+      "epoch": 0.16965957915610885,
+      "grad_norm": 2.951495885848999,
+      "learning_rate": 9.01185637740189e-05,
+      "loss": 21.1341,
+      "step": 385
+    },
+    {
+      "epoch": 0.17274429877712902,
+      "grad_norm": 2.648630142211914,
+      "learning_rate": 8.964488829542377e-05,
+      "loss": 20.804,
+      "step": 392
+    },
+    {
+      "epoch": 0.17582901839814916,
+      "grad_norm": 2.99994158744812,
+      "learning_rate": 8.916143103386093e-05,
+      "loss": 20.2452,
+      "step": 399
+    },
+    {
+      "epoch": 0.17891373801916932,
+      "grad_norm": 2.698529005050659,
+      "learning_rate": 8.866831127517557e-05,
+      "loss": 20.4509,
+      "step": 406
+    },
+    {
+      "epoch": 0.1819984576401895,
+      "grad_norm": 2.649247646331787,
+      "learning_rate": 8.81656506892894e-05,
+      "loss": 19.8498,
+      "step": 413
+    },
+    {
+      "epoch": 0.18508317726120965,
+      "grad_norm": 2.640721321105957,
+      "learning_rate": 8.765357330018056e-05,
+      "loss": 21.0144,
+      "step": 420
+    },
+    {
+      "epoch": 0.18816789688222982,
+      "grad_norm": 2.5582149028778076,
+      "learning_rate": 8.71322054552824e-05,
+      "loss": 20.4449,
+      "step": 427
+    },
+    {
+      "epoch": 0.19125261650324998,
+      "grad_norm": 2.586487054824829,
+      "learning_rate": 8.660167579430927e-05,
+      "loss": 20.0445,
+      "step": 434
+    },
+    {
+      "epoch": 0.19433733612427012,
+      "grad_norm": 3.220254421234131,
+      "learning_rate": 8.606211521751652e-05,
+      "loss": 20.7414,
+      "step": 441
+    },
+    {
+      "epoch": 0.1974220557452903,
+      "grad_norm": 2.6512460708618164,
+      "learning_rate": 8.551365685340285e-05,
+      "loss": 20.0391,
+      "step": 448
+    },
+    {
+      "epoch": 0.20050677536631045,
+      "grad_norm": 2.9190361499786377,
+      "learning_rate": 8.495643602586287e-05,
+      "loss": 19.989,
+      "step": 455
+    },
+    {
+      "epoch": 0.20359149498733062,
+      "grad_norm": 2.770519256591797,
+      "learning_rate": 8.439059022079789e-05,
+      "loss": 19.9756,
+      "step": 462
+    },
+    {
+      "epoch": 0.20667621460835078,
+      "grad_norm": 2.722433567047119,
+      "learning_rate": 8.381625905219339e-05,
+      "loss": 20.0968,
+      "step": 469
+    },
+    {
+      "epoch": 0.20976093422937095,
+      "grad_norm": 2.667473793029785,
+      "learning_rate": 8.32335842276713e-05,
+      "loss": 19.2765,
+      "step": 476
+    },
+    {
+      "epoch": 0.2128456538503911,
+      "grad_norm": 2.8417649269104004,
+      "learning_rate": 8.264270951352581e-05,
+      "loss": 20.6839,
+      "step": 483
+    },
+    {
+      "epoch": 0.21593037347141125,
+      "grad_norm": 2.5752272605895996,
+      "learning_rate": 8.20437806992512e-05,
+      "loss": 19.9479,
+      "step": 490
+    },
+    {
+      "epoch": 0.21901509309243142,
+      "grad_norm": 2.998474597930908,
+      "learning_rate": 8.143694556157046e-05,
+      "loss": 20.2107,
+      "step": 497
+    },
+    {
+      "epoch": 0.22209981271345158,
+      "grad_norm": 2.778343677520752,
+      "learning_rate": 8.082235382797349e-05,
+      "loss": 19.7211,
+      "step": 504
+    },
+    {
+      "epoch": 0.22518453233447175,
+      "grad_norm": 2.9170329570770264,
+      "learning_rate": 8.020015713977427e-05,
+      "loss": 19.6575,
+      "step": 511
+    },
+    {
+      "epoch": 0.22826925195549191,
+      "grad_norm": 3.0816757678985596,
+      "learning_rate": 7.957050901469545e-05,
+      "loss": 20.2297,
+      "step": 518
+    },
+    {
+      "epoch": 0.23135397157651205,
+      "grad_norm": 2.636806011199951,
+      "learning_rate": 7.89335648089903e-05,
+      "loss": 20.7546,
+      "step": 525
+    },
+    {
+      "epoch": 0.23443869119753222,
+      "grad_norm": 3.121128559112549,
+      "learning_rate": 7.828948167911074e-05,
+      "loss": 20.1973,
+      "step": 532
+    },
+    {
+      "epoch": 0.23752341081855238,
+      "grad_norm": 2.9973998069763184,
+      "learning_rate": 7.763841854293145e-05,
+      "loss": 20.3803,
+      "step": 539
+    },
+    {
+      "epoch": 0.24060813043957255,
+      "grad_norm": 3.005627393722534,
+      "learning_rate": 7.698053604053922e-05,
+      "loss": 20.769,
+      "step": 546
+    },
+    {
+      "epoch": 0.24369285006059271,
+      "grad_norm": 2.5811495780944824,
+      "learning_rate": 7.631599649459744e-05,
+      "loss": 19.2179,
+      "step": 553
+    },
+    {
+      "epoch": 0.24677756968161288,
+      "grad_norm": 2.6861519813537598,
+      "learning_rate": 7.564496387029532e-05,
+      "loss": 20.4639,
+      "step": 560
+    },
+    {
+      "epoch": 0.24986228930263302,
+      "grad_norm": 3.1090645790100098,
+      "learning_rate": 7.496760373489202e-05,
+      "loss": 20.6762,
+      "step": 567
+    },
+    {
+      "epoch": 0.2529470089236532,
+      "grad_norm": 3.1814725399017334,
+      "learning_rate": 7.428408321686541e-05,
+      "loss": 20.213,
+      "step": 574
+    },
+    {
+      "epoch": 0.2560317285446734,
+      "grad_norm": 3.1218533515930176,
+      "learning_rate": 7.35945709646756e-05,
+      "loss": 20.5288,
+      "step": 581
+    },
+    {
+      "epoch": 0.2591164481656935,
+      "grad_norm": 3.204477548599243,
+      "learning_rate": 7.289923710515339e-05,
+      "loss": 19.7391,
+      "step": 588
+    },
+    {
+      "epoch": 0.26220116778671365,
+      "grad_norm": 2.90537428855896,
+      "learning_rate": 7.219825320152411e-05,
+      "loss": 20.0142,
+      "step": 595
+    },
+    {
+      "epoch": 0.26528588740773384,
+      "grad_norm": 2.678103446960449,
+      "learning_rate": 7.149179221107694e-05,
+      "loss": 20.6346,
+      "step": 602
+    },
+    {
+      "epoch": 0.268370607028754,
+      "grad_norm": 3.1011834144592285,
+      "learning_rate": 7.078002844249032e-05,
+      "loss": 20.1728,
+      "step": 609
+    },
+    {
+      "epoch": 0.2714553266497742,
+      "grad_norm": 2.723069429397583,
+      "learning_rate": 7.006313751282372e-05,
+      "loss": 19.8334,
+      "step": 616
+    },
+    {
+      "epoch": 0.2745400462707943,
+      "grad_norm": 3.0159788131713867,
+      "learning_rate": 6.934129630418701e-05,
+      "loss": 19.4592,
+      "step": 623
+    },
+    {
+      "epoch": 0.27762476589181445,
+      "grad_norm": 2.8916268348693848,
+      "learning_rate": 6.861468292009727e-05,
+      "loss": 19.0861,
+      "step": 630
+    },
+    {
+      "epoch": 0.28070948551283464,
+      "grad_norm": 2.950063943862915,
+      "learning_rate": 6.788347664153447e-05,
+      "loss": 20.0214,
+      "step": 637
+    },
+    {
+      "epoch": 0.2837942051338548,
+      "grad_norm": 2.842519760131836,
+      "learning_rate": 6.714785788270658e-05,
+      "loss": 19.056,
+      "step": 644
+    },
+    {
+      "epoch": 0.286878924754875,
+      "grad_norm": 2.8973655700683594,
+      "learning_rate": 6.640800814653503e-05,
+      "loss": 19.6187,
+      "step": 651
+    },
+    {
+      "epoch": 0.2899636443758951,
+      "grad_norm": 2.975876808166504,
+      "learning_rate": 6.566410997987163e-05,
+      "loss": 19.8787,
+      "step": 658
+    },
+    {
+      "epoch": 0.2930483639969153,
+      "grad_norm": 3.0167558193206787,
+      "learning_rate": 6.49163469284578e-05,
+      "loss": 20.5267,
+      "step": 665
+    },
+    {
+      "epoch": 0.29613308361793544,
+      "grad_norm": 3.1982202529907227,
+      "learning_rate": 6.416490349163748e-05,
+      "loss": 20.2634,
+      "step": 672
+    },
+    {
+      "epoch": 0.2992178032389556,
+      "grad_norm": 2.9147167205810547,
+      "learning_rate": 6.340996507683458e-05,
+      "loss": 19.7286,
+      "step": 679
+    },
+    {
+      "epoch": 0.3023025228599758,
+      "grad_norm": 3.111374616622925,
+      "learning_rate": 6.265171795380659e-05,
+      "loss": 20.043,
+      "step": 686
+    },
+    {
+      "epoch": 0.3053872424809959,
+      "grad_norm": 2.8042960166931152,
+      "learning_rate": 6.189034920868522e-05,
+      "loss": 19.6608,
+      "step": 693
+    },
+    {
+      "epoch": 0.3084719621020161,
+      "grad_norm": 2.809349536895752,
+      "learning_rate": 6.112604669781572e-05,
+      "loss": 19.8869,
+      "step": 700
+    },
+    {
+      "epoch": 0.31155668172303624,
+      "grad_norm": 2.867894172668457,
+      "learning_rate": 6.0358999001406156e-05,
+      "loss": 20.4228,
+      "step": 707
+    },
+    {
+      "epoch": 0.3146414013440564,
+      "grad_norm": 3.0357677936553955,
+      "learning_rate": 5.9589395376998e-05,
+      "loss": 18.8881,
+      "step": 714
+    },
+    {
+      "epoch": 0.3177261209650766,
+      "grad_norm": 3.256911516189575,
+      "learning_rate": 5.8817425712769794e-05,
+      "loss": 19.7992,
+      "step": 721
+    },
+    {
+      "epoch": 0.3208108405860967,
+      "grad_norm": 3.1102960109710693,
+      "learning_rate": 5.804328048068492e-05,
+      "loss": 19.3251,
+      "step": 728
+    },
+    {
+      "epoch": 0.3238955602071169,
+      "grad_norm": 3.086207866668701,
+      "learning_rate": 5.7267150689495644e-05,
+      "loss": 20.0696,
+      "step": 735
+    },
+    {
+      "epoch": 0.32698027982813704,
+      "grad_norm": 3.2523818016052246,
+      "learning_rate": 5.648922783761443e-05,
+      "loss": 20.3309,
+      "step": 742
+    },
+    {
+      "epoch": 0.33006499944915724,
+      "grad_norm": 2.8286468982696533,
+      "learning_rate": 5.570970386586469e-05,
+      "loss": 19.3409,
+      "step": 749
+    },
+    {
+      "epoch": 0.3331497190701774,
+      "grad_norm": 3.046022415161133,
+      "learning_rate": 5.492877111012218e-05,
+      "loss": 19.3339,
+      "step": 756
+    },
+    {
+      "epoch": 0.3362344386911975,
+      "grad_norm": 2.988107204437256,
+      "learning_rate": 5.414662225385903e-05,
+      "loss": 19.6734,
+      "step": 763
+    },
+    {
+      "epoch": 0.3393191583122177,
+      "grad_norm": 2.9048714637756348,
+      "learning_rate": 5.336345028060199e-05,
+      "loss": 19.4514,
+      "step": 770
+    },
+    {
+      "epoch": 0.34240387793323784,
+      "grad_norm": 3.073075294494629,
+      "learning_rate": 5.257944842631658e-05,
+      "loss": 19.525,
+      "step": 777
+    },
+    {
+      "epoch": 0.34548859755425804,
+      "grad_norm": 2.7497434616088867,
+      "learning_rate": 5.179481013172912e-05,
+      "loss": 19.2898,
+      "step": 784
+    },
+    {
+      "epoch": 0.3485733171752782,
+      "grad_norm": 2.99352765083313,
+      "learning_rate": 5.100972899459796e-05,
+      "loss": 19.632,
+      "step": 791
+    },
+    {
+      "epoch": 0.3516580367962983,
+      "grad_norm": 2.820322275161743,
+      "learning_rate": 5.022439872194629e-05,
+      "loss": 19.9252,
+      "step": 798
+    },
+    {
+      "epoch": 0.3547427564173185,
+      "grad_norm": 3.083574056625366,
+      "learning_rate": 4.943901308226771e-05,
+      "loss": 20.1958,
+      "step": 805
+    },
+    {
+      "epoch": 0.35782747603833864,
+      "grad_norm": 2.9774820804595947,
+      "learning_rate": 4.865376585771687e-05,
+      "loss": 19.4867,
+      "step": 812
+    },
+    {
+      "epoch": 0.36091219565935884,
+      "grad_norm": 3.1358230113983154,
+      "learning_rate": 4.7868850796296495e-05,
+      "loss": 19.4092,
+      "step": 819
+    },
+    {
+      "epoch": 0.363996915280379,
+      "grad_norm": 2.8246376514434814,
+      "learning_rate": 4.708446156405307e-05,
+      "loss": 19.7939,
+      "step": 826
+    },
+    {
+      "epoch": 0.36708163490139917,
+      "grad_norm": 2.7796778678894043,
+      "learning_rate": 4.630079169729257e-05,
+      "loss": 19.0637,
+      "step": 833
+    },
+    {
+      "epoch": 0.3701663545224193,
+      "grad_norm": 2.9270405769348145,
+      "learning_rate": 4.551803455482833e-05,
+      "loss": 19.7355,
+      "step": 840
+    },
+    {
+      "epoch": 0.37325107414343944,
+      "grad_norm": 3.2176382541656494,
+      "learning_rate": 4.473638327027259e-05,
+      "loss": 20.4546,
+      "step": 847
+    },
+    {
+      "epoch": 0.37633579376445964,
+      "grad_norm": 3.1589975357055664,
+      "learning_rate": 4.395603070438373e-05,
+      "loss": 20.7552,
+      "step": 854
+    },
+    {
+      "epoch": 0.3794205133854798,
+      "grad_norm": 2.9611129760742188,
+      "learning_rate": 4.31771693974807e-05,
+      "loss": 19.5708,
+      "step": 861
+    },
+    {
+      "epoch": 0.38250523300649997,
+      "grad_norm": 2.9310545921325684,
+      "learning_rate": 4.239999152193664e-05,
+      "loss": 19.0704,
+      "step": 868
+    },
+    {
+      "epoch": 0.3855899526275201,
+      "grad_norm": 2.7281360626220703,
+      "learning_rate": 4.162468883476319e-05,
+      "loss": 19.631,
+      "step": 875
+    },
+    {
+      "epoch": 0.38867467224854024,
+      "grad_norm": 2.9932737350463867,
+      "learning_rate": 4.085145263029726e-05,
+      "loss": 20.2681,
+      "step": 882
+    },
+    {
+      "epoch": 0.39175939186956044,
+      "grad_norm": 3.23228120803833,
+      "learning_rate": 4.008047369300218e-05,
+      "loss": 19.3911,
+      "step": 889
+    },
+    {
+      "epoch": 0.3948441114905806,
+      "grad_norm": 3.382631540298462,
+      "learning_rate": 3.9311942250394276e-05,
+      "loss": 19.8889,
+      "step": 896
+    },
+    {
+      "epoch": 0.39792883111160077,
+      "grad_norm": 3.813472032546997,
+      "learning_rate": 3.8546047926107256e-05,
+      "loss": 19.2962,
+      "step": 903
+    },
+    {
+      "epoch": 0.4010135507326209,
+      "grad_norm": 3.2448301315307617,
+      "learning_rate": 3.778297969310529e-05,
+      "loss": 19.6148,
+      "step": 910
+    },
+    {
+      "epoch": 0.4040982703536411,
+      "grad_norm": 3.1098744869232178,
+      "learning_rate": 3.7022925827056884e-05,
+      "loss": 19.6414,
+      "step": 917
+    },
+    {
+      "epoch": 0.40718298997466124,
+      "grad_norm": 3.254763603210449,
+      "learning_rate": 3.62660738598805e-05,
+      "loss": 19.6441,
+      "step": 924
+    },
+    {
+      "epoch": 0.4102677095956814,
+      "grad_norm": 2.825139284133911,
+      "learning_rate": 3.551261053347404e-05,
+      "loss": 19.0871,
+      "step": 931
+    },
+    {
+      "epoch": 0.41335242921670157,
+      "grad_norm": 3.137845039367676,
+      "learning_rate": 3.4762721753638995e-05,
+      "loss": 21.1325,
+      "step": 938
+    },
+    {
+      "epoch": 0.4164371488377217,
+      "grad_norm": 2.8734850883483887,
+      "learning_rate": 3.401659254421094e-05,
+      "loss": 20.395,
+      "step": 945
+    },
+    {
+      "epoch": 0.4195218684587419,
+      "grad_norm": 3.046984910964966,
+      "learning_rate": 3.3274407001407735e-05,
+      "loss": 20.0349,
+      "step": 952
+    },
+    {
+      "epoch": 0.42260658807976204,
+      "grad_norm": 3.024266242980957,
+      "learning_rate": 3.2536348248406534e-05,
+      "loss": 19.3565,
+      "step": 959
+    },
+    {
+      "epoch": 0.4256913077007822,
+      "grad_norm": 2.8583052158355713,
+      "learning_rate": 3.1802598390160784e-05,
+      "loss": 19.5142,
+      "step": 966
+    },
+    {
+      "epoch": 0.42877602732180237,
+      "grad_norm": 2.9113707542419434,
+      "learning_rate": 3.107333846846872e-05,
+      "loss": 18.874,
+      "step": 973
+    },
+    {
+      "epoch": 0.4318607469428225,
+      "grad_norm": 2.8726882934570312,
+      "learning_rate": 3.0348748417303823e-05,
+      "loss": 18.9628,
+      "step": 980
+    },
+    {
+      "epoch": 0.4349454665638427,
+      "grad_norm": 2.905684471130371,
+      "learning_rate": 2.9629007018418985e-05,
+      "loss": 19.7288,
+      "step": 987
+    },
+    {
+      "epoch": 0.43803018618486284,
+      "grad_norm": 2.931853771209717,
+      "learning_rate": 2.8914291857234636e-05,
+      "loss": 19.6123,
+      "step": 994
+    },
+    {
+      "epoch": 0.44111490580588303,
+      "grad_norm": 2.875977039337158,
+      "learning_rate": 2.8204779279022276e-05,
+      "loss": 19.1103,
+      "step": 1001
+    },
+    {
+      "epoch": 0.44419962542690317,
+      "grad_norm": 3.1230177879333496,
+      "learning_rate": 2.7500644345393943e-05,
+      "loss": 19.3083,
+      "step": 1008
+    },
+    {
+      "epoch": 0.4472843450479233,
+      "grad_norm": 2.8177175521850586,
+      "learning_rate": 2.68020607911083e-05,
+      "loss": 19.2991,
+      "step": 1015
+    },
+    {
+      "epoch": 0.4503690646689435,
+      "grad_norm": 2.941176652908325,
+      "learning_rate": 2.610920098120424e-05,
+      "loss": 19.0886,
+      "step": 1022
+    },
+    {
+      "epoch": 0.45345378428996364,
+      "grad_norm": 3.299826145172119,
+      "learning_rate": 2.5422235868472345e-05,
+      "loss": 19.3967,
+      "step": 1029
+    },
+    {
+      "epoch": 0.45653850391098383,
+      "grad_norm": 3.285066604614258,
+      "learning_rate": 2.4741334951274947e-05,
+      "loss": 19.9651,
+      "step": 1036
+    },
+    {
+      "epoch": 0.45962322353200397,
+      "grad_norm": 2.9992995262145996,
+      "learning_rate": 2.40666662317248e-05,
+      "loss": 19.4277,
+      "step": 1043
+    },
+    {
+      "epoch": 0.4627079431530241,
+      "grad_norm": 2.8765876293182373,
+      "learning_rate": 2.3398396174233178e-05,
+      "loss": 19.9635,
+      "step": 1050
+    },
+    {
+      "epoch": 0.4657926627740443,
+      "grad_norm": 2.8303496837615967,
+      "learning_rate": 2.2736689664437217e-05,
+      "loss": 19.0895,
+      "step": 1057
+    },
+    {
+      "epoch": 0.46887738239506443,
+      "grad_norm": 2.9009594917297363,
+      "learning_rate": 2.2081709968516866e-05,
+      "loss": 19.371,
+      "step": 1064
+    },
+    {
+      "epoch": 0.47196210201608463,
+      "grad_norm": 2.944718837738037,
+      "learning_rate": 2.1433618692911467e-05,
+      "loss": 19.6877,
+      "step": 1071
+    },
+    {
+      "epoch": 0.47504682163710477,
+      "grad_norm": 3.013751983642578,
+      "learning_rate": 2.0792575744445653e-05,
+      "loss": 20.4626,
+      "step": 1078
+    },
+    {
+      "epoch": 0.4781315412581249,
+      "grad_norm": 2.8704001903533936,
+      "learning_rate": 2.015873929087482e-05,
+      "loss": 19.0005,
+      "step": 1085
+    },
+    {
+      "epoch": 0.4812162608791451,
+      "grad_norm": 3.1911540031433105,
+      "learning_rate": 1.95322657218596e-05,
+      "loss": 19.5359,
+      "step": 1092
+    },
+    {
+      "epoch": 0.48430098050016523,
+      "grad_norm": 3.5429601669311523,
+      "learning_rate": 1.8913309610379015e-05,
+      "loss": 19.3831,
+      "step": 1099
+    },
+    {
+      "epoch": 0.48738570012118543,
+      "grad_norm": 2.8143036365509033,
+      "learning_rate": 1.8302023674591935e-05,
+      "loss": 19.3294,
+      "step": 1106
+    },
+    {
+      "epoch": 0.49047041974220557,
+      "grad_norm": 2.7880630493164062,
+      "learning_rate": 1.7698558740156135e-05,
+      "loss": 19.48,
+      "step": 1113
+    },
+    {
+      "epoch": 0.49355513936322576,
+      "grad_norm": 2.929053544998169,
+      "learning_rate": 1.7103063703014372e-05,
+      "loss": 19.0396,
+      "step": 1120
+    },
+    {
+      "epoch": 0.4966398589842459,
+      "grad_norm": 2.9036405086517334,
+      "learning_rate": 1.6515685492656467e-05,
+      "loss": 20.3919,
+      "step": 1127
+    },
+    {
+      "epoch": 0.49972457860526603,
+      "grad_norm": 3.1344425678253174,
+      "learning_rate": 1.59365690358667e-05,
+      "loss": 19.4341,
+      "step": 1134
+    },
+    {
+      "epoch": 0.5028092982262862,
+      "grad_norm": 2.934563636779785,
+      "learning_rate": 1.5365857220965275e-05,
+      "loss": 18.8088,
+      "step": 1141
+    },
+    {
+      "epoch": 0.5058940178473064,
+      "grad_norm": 3.0883371829986572,
+      "learning_rate": 1.4803690862552755e-05,
+      "loss": 19.3404,
+      "step": 1148
+    },
+    {
+      "epoch": 0.5089787374683266,
+      "grad_norm": 2.808397054672241,
+      "learning_rate": 1.4250208666766235e-05,
+      "loss": 19.0109,
+      "step": 1155
+    },
+    {
+      "epoch": 0.5120634570893468,
+      "grad_norm": 2.759035587310791,
+      "learning_rate": 1.3705547197055584e-05,
+      "loss": 19.0286,
+      "step": 1162
+    },
+    {
+      "epoch": 0.5151481767103668,
+      "grad_norm": 2.8846778869628906,
+      "learning_rate": 1.3169840840488501e-05,
+      "loss": 19.7108,
+      "step": 1169
+    },
+    {
+      "epoch": 0.518232896331387,
+      "grad_norm": 2.950453519821167,
+      "learning_rate": 1.2643221774592518e-05,
+      "loss": 19.3757,
+      "step": 1176
+    },
+    {
+      "epoch": 0.5213176159524072,
+      "grad_norm": 2.7792983055114746,
+      "learning_rate": 1.2125819934742188e-05,
+      "loss": 19.4143,
+      "step": 1183
+    },
+    {
+      "epoch": 0.5244023355734273,
+      "grad_norm": 2.9369263648986816,
+      "learning_rate": 1.1617762982099446e-05,
+      "loss": 19.3769,
+      "step": 1190
+    },
+    {
+      "epoch": 0.5274870551944475,
+      "grad_norm": 3.277841091156006,
+      "learning_rate": 1.1119176272115128e-05,
+      "loss": 19.0001,
+      "step": 1197
+    },
+    {
+      "epoch": 0.5305717748154677,
+      "grad_norm": 3.007042169570923,
+      "learning_rate": 1.0630182823599399e-05,
+      "loss": 19.3827,
+      "step": 1204
+    },
+    {
+      "epoch": 0.5336564944364878,
+      "grad_norm": 2.832447052001953,
+      "learning_rate": 1.0150903288368741e-05,
+      "loss": 19.1139,
+      "step": 1211
+    },
+    {
+      "epoch": 0.536741214057508,
+      "grad_norm": 3.4219977855682373,
+      "learning_rate": 9.681455921476839e-06,
+      "loss": 18.8895,
+      "step": 1218
+    },
+    {
+      "epoch": 0.5398259336785282,
+      "grad_norm": 3.0190682411193848,
+      "learning_rate": 9.221956552036992e-06,
+      "loss": 18.9905,
+      "step": 1225
+    },
+    {
+      "epoch": 0.5429106532995484,
+      "grad_norm": 2.9112884998321533,
+      "learning_rate": 8.772518554642973e-06,
+      "loss": 18.9397,
+      "step": 1232
+    },
+    {
+      "epoch": 0.5459953729205684,
+      "grad_norm": 3.5647706985473633,
+      "learning_rate": 8.333252821395526e-06,
+      "loss": 19.9344,
+      "step": 1239
+    },
+    {
+      "epoch": 0.5490800925415886,
+      "grad_norm": 2.910604953765869,
+      "learning_rate": 7.904267734541498e-06,
+      "loss": 19.9231,
+      "step": 1246
+    },
+    {
+      "epoch": 0.5521648121626088,
+      "grad_norm": 3.173757791519165,
+      "learning_rate": 7.485669139732004e-06,
+      "loss": 19.2161,
+      "step": 1253
+    },
+    {
+      "epoch": 0.5552495317836289,
+      "grad_norm": 3.1094110012054443,
+      "learning_rate": 7.077560319906695e-06,
+      "loss": 19.7072,
+      "step": 1260
+    },
+    {
+      "epoch": 0.5583342514046491,
+      "grad_norm": 2.935187339782715,
+      "learning_rate": 6.680041969810203e-06,
+      "loss": 19.9481,
+      "step": 1267
+    },
+    {
+      "epoch": 0.5614189710256693,
+      "grad_norm": 2.720287799835205,
+      "learning_rate": 6.293212171147206e-06,
+      "loss": 19.1016,
+      "step": 1274
+    },
+    {
+      "epoch": 0.5645036906466895,
+      "grad_norm": 3.204946994781494,
+      "learning_rate": 5.917166368382277e-06,
+      "loss": 19.2001,
+      "step": 1281
+    },
+    {
+      "epoch": 0.5675884102677096,
+      "grad_norm": 2.843729019165039,
+      "learning_rate": 5.5519973451903405e-06,
+      "loss": 19.4673,
+      "step": 1288
+    },
+    {
+      "epoch": 0.5706731298887298,
+      "grad_norm": 3.0020055770874023,
+      "learning_rate": 5.197795201563743e-06,
+      "loss": 19.4574,
+      "step": 1295
+    },
+    {
+      "epoch": 0.57375784950975,
+      "grad_norm": 2.837780714035034,
+      "learning_rate": 4.8546473315813856e-06,
+      "loss": 19.7657,
+      "step": 1302
+    },
+    {
+      "epoch": 0.57684256913077,
+      "grad_norm": 2.8947691917419434,
+      "learning_rate": 4.522638401845547e-06,
+      "loss": 18.9782,
+      "step": 1309
+    },
+    {
+      "epoch": 0.5799272887517902,
+      "grad_norm": 3.3117258548736572,
+      "learning_rate": 4.2018503305916775e-06,
+      "loss": 19.4101,
+      "step": 1316
+    },
+    {
+      "epoch": 0.5830120083728104,
+      "grad_norm": 2.7846477031707764,
+      "learning_rate": 3.892362267476313e-06,
+      "loss": 19.7598,
+      "step": 1323
+    },
+    {
+      "epoch": 0.5860967279938306,
+      "grad_norm": 3.0101373195648193,
+      "learning_rate": 3.5942505740480582e-06,
+      "loss": 18.8845,
+      "step": 1330
+    },
+    {
+      "epoch": 0.5891814476148507,
+      "grad_norm": 2.84769868850708,
+      "learning_rate": 3.3075888049065196e-06,
+      "loss": 20.0114,
+      "step": 1337
+    },
+    {
+      "epoch": 0.5922661672358709,
+      "grad_norm": 2.7414329051971436,
+      "learning_rate": 3.03244768955383e-06,
+      "loss": 19.705,
+      "step": 1344
+    },
+    {
+      "epoch": 0.5953508868568911,
+      "grad_norm": 3.127094030380249,
+      "learning_rate": 2.7688951149431595e-06,
+      "loss": 19.0974,
+      "step": 1351
+    },
+    {
+      "epoch": 0.5984356064779112,
+      "grad_norm": 2.868130922317505,
+      "learning_rate": 2.5169961087286974e-06,
+      "loss": 19.2643,
+      "step": 1358
+    },
+    {
+      "epoch": 0.6015203260989314,
+      "grad_norm": 3.1405887603759766,
+      "learning_rate": 2.276812823220964e-06,
+      "loss": 19.2294,
+      "step": 1365
+    },
+    {
+      "epoch": 0.6046050457199516,
+      "grad_norm": 2.8604586124420166,
+      "learning_rate": 2.048404520051722e-06,
+      "loss": 19.2823,
+      "step": 1372
+    },
+    {
+      "epoch": 0.6076897653409716,
+      "grad_norm": 3.2261486053466797,
+      "learning_rate": 1.8318275555520237e-06,
+      "loss": 18.288,
+      "step": 1379
+    },
+    {
+      "epoch": 0.6107744849619918,
+      "grad_norm": 2.863901138305664,
+      "learning_rate": 1.6271353668471655e-06,
+      "loss": 19.3459,
+      "step": 1386
+    },
+    {
+      "epoch": 0.613859204583012,
+      "grad_norm": 3.077561378479004,
+      "learning_rate": 1.4343784586718311e-06,
+      "loss": 19.0298,
+      "step": 1393
+    },
+    {
+      "epoch": 0.6169439242040322,
+      "grad_norm": 2.8641955852508545,
+      "learning_rate": 1.2536043909088191e-06,
+      "loss": 19.392,
+      "step": 1400
+    },
+    {
+      "epoch": 0.6200286438250523,
+      "grad_norm": 3.216136932373047,
+      "learning_rate": 1.0848577668543802e-06,
+      "loss": 19.494,
+      "step": 1407
+    },
+    {
+      "epoch": 0.6231133634460725,
+      "grad_norm": 3.0624561309814453,
+      "learning_rate": 9.281802222129765e-07,
+      "loss": 19.6816,
+      "step": 1414
+    },
+    {
+      "epoch": 0.6261980830670927,
+      "grad_norm": 2.73820424079895,
+      "learning_rate": 7.836104148243484e-07,
+      "loss": 19.5773,
+      "step": 1421
+    },
+    {
+      "epoch": 0.6292828026881128,
+      "grad_norm": 3.053277015686035,
+      "learning_rate": 6.511840151252169e-07,
+      "loss": 19.6097,
+      "step": 1428
+    },
+    {
+      "epoch": 0.632367522309133,
+      "grad_norm": 3.1816012859344482,
+      "learning_rate": 5.309336973481683e-07,
+      "loss": 19.8133,
+      "step": 1435
+    },
+    {
+      "epoch": 0.6354522419301531,
+      "grad_norm": 2.8629932403564453,
+      "learning_rate": 4.228891314597694e-07,
+      "loss": 19.1801,
+      "step": 1442
+    },
+    {
+      "epoch": 0.6385369615511733,
+      "grad_norm": 2.9186813831329346,
+      "learning_rate": 3.2707697583995167e-07,
+      "loss": 19.1621,
+      "step": 1449
+    },
+    {
+      "epoch": 0.6416216811721934,
+      "grad_norm": 2.8842525482177734,
+      "learning_rate": 2.4352087070443895e-07,
+      "loss": 19.103,
+      "step": 1456
+    },
+    {
+      "epoch": 0.6447064007932136,
+      "grad_norm": 3.0548243522644043,
+      "learning_rate": 1.7224143227190236e-07,
+      "loss": 18.9624,
+      "step": 1463
+    },
+    {
+      "epoch": 0.6477911204142338,
+      "grad_norm": 3.0398287773132324,
+      "learning_rate": 1.132562476771959e-07,
+      "loss": 19.3204,
+      "step": 1470
+    },
+    {
+      "epoch": 0.6508758400352539,
+      "grad_norm": 3.19580078125,
+      "learning_rate": 6.657987063200533e-08,
+      "loss": 20.1019,
+      "step": 1477
+    },
+    {
+      "epoch": 0.6539605596562741,
+      "grad_norm": 2.927455186843872,
+      "learning_rate": 3.2223817833931805e-08,
+      "loss": 19.7744,
+      "step": 1484
+    },
+    {
+      "epoch": 0.6570452792772943,
+      "grad_norm": 3.276757001876831,
+      "learning_rate": 1.019656612492592e-08,
+      "loss": 19.2601,
+      "step": 1491
+    },
+    {
+      "epoch": 0.6601299988983145,
+      "grad_norm": 2.9033164978027344,
+      "learning_rate": 5.035503997385949e-10,
+      "loss": 19.2759,
+      "step": 1498
+    }
+  ],
+  "logging_steps": 7,
+  "max_steps": 1500,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 500,
+  "stateful_callbacks": {
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": true
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 8.41564354070446e+16,
+  "train_batch_size": 2,
+  "trial_name": null,
+  "trial_params": null
+}
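The `learning_rate` values in `log_history` above appear consistent with a linear-warmup plus cosine-decay schedule, the shape produced by, for example, `transformers.get_cosine_schedule_with_warmup`. The fit uses a peak rate of 1e-4 and 100 warmup steps over the 1,500 `max_steps`. Both of those numbers are inferred from the logged values rather than read from `training_args.bin`, so treat them as assumptions. The sketch below recomputes the schedule and compares it against three logged entries:

```python
import math

# ASSUMPTION: peak LR and warmup length are inferred from the logged values,
# not read from training_args.bin. MAX_STEPS is "max_steps" from trainer_state.json.
PEAK_LR = 1e-4
WARMUP_STEPS = 100
MAX_STEPS = 1500


def cosine_with_warmup(step: int) -> float:
    """Learning rate at `step` under linear warmup followed by half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to PEAK_LR during warmup.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# (step, learning_rate) pairs copied from log_history above.
LOGGED = [
    (791, 5.100972899459796e-05),
    (1127, 1.6515685492656467e-05),
    (1498, 5.035503997385949e-10),
]

for step, logged_lr in LOGGED:
    predicted = cosine_with_warmup(step)
    rel_err = abs(predicted - logged_lr) / logged_lr
    print(f"step {step}: logged {logged_lr:.6e}, predicted {predicted:.6e}, rel. error {rel_err:.1e}")
```

Near-zero relative errors suggest the assumed peak rate and warmup length are right. If the errors are large, check the actual values stored in `training_args.bin`.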

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:eb94623d4d22e506c6c6bc5f600b2f39d2e121ab770b094b9bf66db575c87cd4
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render.