Model save


Files changed (6)

README.md +48 -180
adapter_model.safetensors +1 -1
all_results.json +8 -0
runs/May02_16-56-01_ip-172-31-69-60.ec2.internal/events.out.tfevents.1714669018.ip-172-31-69-60.ec2.internal.15901.0 +2 -2
train_results.json +8 -0
trainer_state.json +990 -0

README.md CHANGED

@@ -1,199 +1,67 @@
 ---
-library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
-## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
+license: llama2
+library_name: peft
+tags:
+- trl
+- sft
+- generated_from_trainer
+base_model: meta-llama/Llama-2-7b-hf
+model-index:
+- name: llama2-20p-POE
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# llama2-20p-POE
+This model is a fine-tuned version of [meta-llama/Llama-2-7b-hf](https://huggingface.co/meta-llama/Llama-2-7b-hf) on the None dataset.
+It achieves the following results on the evaluation set:
+- Loss: nan
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0002
+- train_batch_size: 4
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- gradient_accumulation_steps: 2
+- total_train_batch_size: 32
+- total_eval_batch_size: 4
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1
+### Training results
+| Training Loss | Epoch | Step | Validation Loss |
+|:-------------:|:-----:|:----:|:---------------:|
+| 0.7591        | 1.0   | 675  | nan             |
+### Framework versions
+- PEFT 0.7.1
+- Transformers 4.39.0.dev0
+- Pytorch 2.2.2+cu121
+- Datasets 2.14.6
+- Tokenizers 0.15.2
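The README lists a peak learning rate of 0.0002, a cosine scheduler and a warmup ratio of 0.1 over a single epoch. The sketch below rebuilds that schedule under two assumptions. First, it uses the linear-warmup, half-cosine shape of transformers' `get_cosine_schedule_with_warmup`. Second, it rounds the warmup up to ceil(0.1 × 675) = 68 steps. Under those assumptions it reproduces the `learning_rate` values logged in this commit's trainer_state.json. `lr_at`, `PEAK_LR` and the other names are illustrative, not library API.

```python
import math

# Values copied from the README hyperparameters and trainer_state.json of this commit.
PEAK_LR = 2e-4       # learning_rate
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio
MAX_STEPS = 675      # max_steps / global_step for one epoch

# Assumption: warmup steps are rounded up, as transformers' TrainingArguments does.
WARMUP_STEPS = math.ceil(WARMUP_RATIO * MAX_STEPS)  # 68


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# A few (step, learning_rate) pairs taken from log_history, printed next to the sketch's value.
logged = {1: 2.9411764705882355e-06, 70: 0.00019999464266898484,
          670: 3.3481749271768726e-08, 675: 0.0}
for step, lr in logged.items():
    print(step, lr, lr_at(step))
```

The logged rate peaks just after step 68 and reaches exactly 0.0 at step 675. That matches a single half-cosine cycle ending on the last optimizer step.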

adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:10540e27a1ce26a5685bf6df50cc3ddceaf3acd7563f763984d0d8cd1dfab48a
+oid sha256:40ddb079850c03f5d4532ebe864139d49dc71b1f17faba60ce25118cda620628
 size 60089544
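Both LFS pointers in this commit follow the Git LFS v1 pointer format:

- a `version` line;
- an `oid` line holding the SHA-256 of the real file contents;
- a `size` line in bytes.

The adapter's oid changed while its size stayed at 60,089,544 bytes. That is consistent with new weights being written into tensors of unchanged shape. The sketch below computes such a pointer locally, for example to check a downloaded `adapter_model.safetensors` against the oid above. The function names are illustrative.

```python
import hashlib
import io
from typing import BinaryIO


def lfs_pointer_from_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Build a Git LFS v1 pointer for the bytes read from `stream`.

    The data is hashed in chunks so large checkpoints need not fit in memory.
    """
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{digest.hexdigest()}\n"
        f"size {size}\n"
    )


def lfs_pointer(data: bytes) -> str:
    """Convenience wrapper for an in-memory payload."""
    return lfs_pointer_from_stream(io.BytesIO(data))
```

For a file on disk, pass the open binary handle: `with open("adapter_model.safetensors", "rb") as f: print(lfs_pointer_from_stream(f))`.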

all_results.json ADDED

@@ -0,0 +1,8 @@
+{
+    "epoch": 1.0,
+    "train_loss": 0.7647893634548893,
+    "train_runtime": 21987.6737,
+    "train_samples": 21594,
+    "train_samples_per_second": 0.982,
+    "train_steps_per_second": 0.031
+}
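Two sets of numbers here are consistent with each other:

- The README's `total_train_batch_size: 32` and the 675 optimizer steps follow from the other hyperparameters.
- The throughput figures in all_results.json follow from `train_samples` and `train_runtime`.

The check below relies on two assumptions. Each GPU sees `train_batch_size` samples per micro-batch, and a partial final batch still counts as a step. The variable names are illustrative.

```python
import math

# Figures copied from the README hyperparameters and all_results.json.
per_device_train_batch_size = 4   # train_batch_size
num_devices = 4
gradient_accumulation_steps = 2
train_samples = 21594
train_runtime_s = 21987.6737

# Samples consumed per optimizer step across all GPUs.
total_train_batch_size = per_device_train_batch_size * num_devices * gradient_accumulation_steps

# One epoch; assumption: the last partial batch is kept, so round up.
steps_per_epoch = math.ceil(train_samples / total_train_batch_size)

samples_per_second = round(train_samples / train_runtime_s, 3)
steps_per_second = round(steps_per_epoch / train_runtime_s, 3)
runtime_hours = train_runtime_s / 3600

print(total_train_batch_size, steps_per_epoch, samples_per_second, steps_per_second,
      f"{runtime_hours:.1f} h")
```

This gives 32 samples per step, 675 steps and about 6.1 hours of training. The rounded rates (0.982 samples/s, 0.031 steps/s) match the logged values.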

runs/May02_16-56-01_ip-172-31-69-60.ec2.internal/events.out.tfevents.1714669018.ip-172-31-69-60.ec2.internal.15901.0 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:3f20c4c61a1678f453c6c1cabea27bee1f8d43f13934f5fd8f04920c42e6e2ff
-size 25919
+oid sha256:523ff3552f4df64ec145dc96092a80447a2e0ab3271e29491abca7852efd8734
+size 33929
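The TensorBoard event file name appears to follow the pattern `events.out.tfevents.<unix time>.<hostname>.<pid>.<counter>`. The parser below is a sketch built on that assumption; `parse_event_filename` is a hypothetical helper, not a TensorBoard API. The hostname itself contains dots, so the pid and counter are split off from the right.

```python
from datetime import datetime, timezone


def parse_event_filename(path: str):
    """Split a TensorBoard event file path into (UTC time, hostname, pid, counter).

    Assumes the layout events.out.tfevents.<unix_time>.<hostname>.<pid>.<counter>.
    """
    name = path.rsplit("/", 1)[-1]  # drop the run directory, if any
    prefix = "events.out.tfevents."
    if not name.startswith(prefix):
        raise ValueError(f"not an event file: {name}")
    ts, rest = name[len(prefix):].split(".", 1)
    hostname, pid, counter = rest.rsplit(".", 2)
    return datetime.fromtimestamp(int(ts), tz=timezone.utc), hostname, int(pid), int(counter)


path = ("runs/May02_16-56-01_ip-172-31-69-60.ec2.internal/"
        "events.out.tfevents.1714669018.ip-172-31-69-60.ec2.internal.15901.0")
print(parse_event_filename(path))
```

Applied to this commit's path, the sketch decodes the timestamp as 2024-05-02 16:56:58 UTC. That is 57 seconds after the `May02_16-56-01` run-directory name, assuming the directory name was taken from the machine's clock in UTC.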

train_results.json ADDED

@@ -0,0 +1,8 @@
+{
+    "epoch": 1.0,
+    "train_loss": 0.7647893634548893,
+    "train_runtime": 21987.6737,
+    "train_samples": 21594,
+    "train_samples_per_second": 0.982,
+    "train_steps_per_second": 0.031
+}

trainer_state.json ADDED

@@ -0,0 +1,990 @@
+{
+  "best_metric": null,
+  "best_model_checkpoint": null,
+  "epoch": 1.0,
+  "eval_steps": 500,
+  "global_step": 675,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.0,
+      "grad_norm": 0.028420851102407323,
+      "learning_rate": 2.9411764705882355e-06,
+      "loss": 0.8769,
+      "step": 1
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.02533437087421799,
+      "learning_rate": 1.4705882352941177e-05,
+      "loss": 0.863,
+      "step": 5
+    },
+    {
+      "epoch": 0.01,
+      "grad_norm": 0.029598127358419903,
+      "learning_rate": 2.9411764705882354e-05,
+      "loss": 0.8899,
+      "step": 10
+    },
+    {
+      "epoch": 0.02,
+      "grad_norm": 0.04418645965251518,
+      "learning_rate": 4.411764705882353e-05,
+      "loss": 0.8643,
+      "step": 15
+    },
+    {
+      "epoch": 0.03,
+      "grad_norm": 0.08391069341850195,
+      "learning_rate": 5.882352941176471e-05,
+      "loss": 0.8164,
+      "step": 20
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.0848430702291638,
+      "learning_rate": 7.352941176470589e-05,
+      "loss": 0.838,
+      "step": 25
+    },
+    {
+      "epoch": 0.04,
+      "grad_norm": 0.07340453634110397,
+      "learning_rate": 8.823529411764706e-05,
+      "loss": 0.8371,
+      "step": 30
+    },
+    {
+      "epoch": 0.05,
+      "grad_norm": 0.060721401358425686,
+      "learning_rate": 0.00010294117647058823,
+      "loss": 0.7967,
+      "step": 35
+    },
+    {
+      "epoch": 0.06,
+      "grad_norm": 0.06406699741542068,
+      "learning_rate": 0.00011764705882352942,
+      "loss": 0.7952,
+      "step": 40
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.0702367088144725,
+      "learning_rate": 0.0001323529411764706,
+      "loss": 0.7785,
+      "step": 45
+    },
+    {
+      "epoch": 0.07,
+      "grad_norm": 0.05482720735795007,
+      "learning_rate": 0.00014705882352941178,
+      "loss": 0.7855,
+      "step": 50
+    },
+    {
+      "epoch": 0.08,
+      "grad_norm": 0.061311996254463264,
+      "learning_rate": 0.00016176470588235295,
+      "loss": 0.7836,
+      "step": 55
+    },
+    {
+      "epoch": 0.09,
+      "grad_norm": 0.057997545698835314,
+      "learning_rate": 0.00017647058823529413,
+      "loss": 0.7932,
+      "step": 60
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.04913048738654215,
+      "learning_rate": 0.0001911764705882353,
+      "loss": 0.7746,
+      "step": 65
+    },
+    {
+      "epoch": 0.1,
+      "grad_norm": 0.051276736158220364,
+      "learning_rate": 0.00019999464266898484,
+      "loss": 0.7464,
+      "step": 70
+    },
+    {
+      "epoch": 0.11,
+      "grad_norm": 0.04995736942327917,
+      "learning_rate": 0.00019993437928712978,
+      "loss": 0.7248,
+      "step": 75
+    },
+    {
+      "epoch": 0.12,
+      "grad_norm": 0.045680340036933116,
+      "learning_rate": 0.0001998071963486563,
+      "loss": 0.7855,
+      "step": 80
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.053344671279534676,
+      "learning_rate": 0.00019961317901970953,
+      "loss": 0.7396,
+      "step": 85
+    },
+    {
+      "epoch": 0.13,
+      "grad_norm": 0.05347208673468481,
+      "learning_rate": 0.0001993524572210807,
+      "loss": 0.7623,
+      "step": 90
+    },
+    {
+      "epoch": 0.14,
+      "grad_norm": 0.044577349788325,
+      "learning_rate": 0.00019902520554120772,
+      "loss": 0.7595,
+      "step": 95
+    },
+    {
+      "epoch": 0.15,
+      "grad_norm": 0.04633173399608109,
+      "learning_rate": 0.00019863164311926433,
+      "loss": 0.7759,
+      "step": 100
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.044125485710741395,
+      "learning_rate": 0.00019817203349841738,
+      "loss": 0.7858,
+      "step": 105
+    },
+    {
+      "epoch": 0.16,
+      "grad_norm": 0.041739410620092,
+      "learning_rate": 0.00019764668444934854,
+      "loss": 0.7682,
+      "step": 110
+    },
+    {
+      "epoch": 0.17,
+      "grad_norm": 0.046548199607491646,
+      "learning_rate": 0.0001970559477641606,
+      "loss": 0.7442,
+      "step": 115
+    },
+    {
+      "epoch": 0.18,
+      "grad_norm": 0.04287304549804181,
+      "learning_rate": 0.0001964002190208052,
+      "loss": 0.7966,
+      "step": 120
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.04461470993270133,
+      "learning_rate": 0.00019567993731818984,
+      "loss": 0.7678,
+      "step": 125
+    },
+    {
+      "epoch": 0.19,
+      "grad_norm": 0.039705545161659035,
+      "learning_rate": 0.00019489558498214196,
+      "loss": 0.7345,
+      "step": 130
+    },
+    {
+      "epoch": 0.2,
+      "grad_norm": 0.03330898777260923,
+      "learning_rate": 0.00019404768724242666,
+      "loss": 0.7714,
+      "step": 135
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.046594949440474036,
+      "learning_rate": 0.00019313681188103457,
+      "loss": 0.7757,
+      "step": 140
+    },
+    {
+      "epoch": 0.21,
+      "grad_norm": 0.04928885679685318,
+      "learning_rate": 0.000192163568851975,
+      "loss": 0.8217,
+      "step": 145
+    },
+    {
+      "epoch": 0.22,
+      "grad_norm": 0.04466472170425771,
+      "learning_rate": 0.00019112860987282958,
+      "loss": 0.7356,
+      "step": 150
+    },
+    {
+      "epoch": 0.23,
+      "grad_norm": 0.05078422739553919,
+      "learning_rate": 0.0001900326279883392,
+      "loss": 0.7262,
+      "step": 155
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 0.042393502583193486,
+      "learning_rate": 0.00018887635710631716,
+      "loss": 0.791,
+      "step": 160
+    },
+    {
+      "epoch": 0.24,
+      "grad_norm": 0.04096339138017866,
+      "learning_rate": 0.00018766057150619865,
+      "loss": 0.7621,
+      "step": 165
+    },
+    {
+      "epoch": 0.25,
+      "grad_norm": 0.04894911531750606,
+      "learning_rate": 0.00018638608532055634,
+      "loss": 0.714,
+      "step": 170
+    },
+    {
+      "epoch": 0.26,
+      "grad_norm": 0.04424627496496155,
+      "learning_rate": 0.00018505375198992857,
+      "loss": 0.7445,
+      "step": 175
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 0.05064439962306937,
+      "learning_rate": 0.00018366446369132578,
+      "loss": 0.7502,
+      "step": 180
+    },
+    {
+      "epoch": 0.27,
+      "grad_norm": 0.05185726609274544,
+      "learning_rate": 0.00018221915074079762,
+      "loss": 0.7423,
+      "step": 185
+    },
+    {
+      "epoch": 0.28,
+      "grad_norm": 0.049634018260632524,
+      "learning_rate": 0.00018071878097046065,
+      "loss": 0.7853,
+      "step": 190
+    },
+    {
+      "epoch": 0.29,
+      "grad_norm": 0.03718521878894617,
+      "learning_rate": 0.00017916435908040413,
+      "loss": 0.7575,
+      "step": 195
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.054866103106943676,
+      "learning_rate": 0.00017755692596590778,
+      "loss": 0.7604,
+      "step": 200
+    },
+    {
+      "epoch": 0.3,
+      "grad_norm": 0.040017034968621745,
+      "learning_rate": 0.00017589755802042186,
+      "loss": 0.7818,
+      "step": 205
+    },
+    {
+      "epoch": 0.31,
+      "grad_norm": 0.03964997679073274,
+      "learning_rate": 0.00017418736641477636,
+      "loss": 0.7464,
+      "step": 210
+    },
+    {
+      "epoch": 0.32,
+      "grad_norm": 0.051157610923925706,
+      "learning_rate": 0.0001724274963531022,
+      "loss": 0.7534,
+      "step": 215
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 0.04692776206415383,
+      "learning_rate": 0.00017061912630596252,
+      "loss": 0.7862,
+      "step": 220
+    },
+    {
+      "epoch": 0.33,
+      "grad_norm": 0.04009778793971981,
+      "learning_rate": 0.00016876346722120747,
+      "loss": 0.7619,
+      "step": 225
+    },
+    {
+      "epoch": 0.34,
+      "grad_norm": 0.037858593687305236,
+      "learning_rate": 0.00016686176171308126,
+      "loss": 0.7822,
+      "step": 230
+    },
+    {
+      "epoch": 0.35,
+      "grad_norm": 0.03514517489146636,
+      "learning_rate": 0.0001649152832301241,
+      "loss": 0.7475,
+      "step": 235
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 0.043964485334365984,
+      "learning_rate": 0.00016292533520242662,
+      "loss": 0.775,
+      "step": 240
+    },
+    {
+      "epoch": 0.36,
+      "grad_norm": 0.06121852032774167,
+      "learning_rate": 0.00016089325016880736,
+      "loss": 0.7501,
+      "step": 245
+    },
+    {
+      "epoch": 0.37,
+      "grad_norm": 0.050365919886299945,
+      "learning_rate": 0.0001588203888844982,
+      "loss": 0.7498,
+      "step": 250
+    },
+    {
+      "epoch": 0.38,
+      "grad_norm": 0.04601818654614394,
+      "learning_rate": 0.00015670813940993502,
+      "loss": 0.7942,
+      "step": 255
+    },
+    {
+      "epoch": 0.39,
+      "grad_norm": 0.049451503331748733,
+      "learning_rate": 0.00015455791618126404,
+      "loss": 0.7232,
+      "step": 260
+    },
+    {
+      "epoch": 0.39,
+      "grad_norm": 0.049704756401709786,
+      "learning_rate": 0.00015237115906318563,
+      "loss": 0.7474,
+      "step": 265
+    },
+    {
+      "epoch": 0.4,
+      "grad_norm": 0.043536870823896914,
+      "learning_rate": 0.0001501493323847707,
+      "loss": 0.7074,
+      "step": 270
+    },
+    {
+      "epoch": 0.41,
+      "grad_norm": 0.052592436248192806,
+      "learning_rate": 0.00014789392395889468,
+      "loss": 0.7675,
+      "step": 275
+    },
+    {
+      "epoch": 0.41,
+      "grad_norm": 0.04641887287230184,
+      "learning_rate": 0.00014560644408594602,
+      "loss": 0.7884,
+      "step": 280
+    },
+    {
+      "epoch": 0.42,
+      "grad_norm": 0.03778851947891411,
+      "learning_rate": 0.0001432884245424761,
+      "loss": 0.7364,
+      "step": 285
+    },
+    {
+      "epoch": 0.43,
+      "grad_norm": 0.04383653972641628,
+      "learning_rate": 0.00014094141755546815,
+      "loss": 0.7633,
+      "step": 290
+    },
+    {
+      "epoch": 0.44,
+      "grad_norm": 0.04958511831355967,
+      "learning_rate": 0.00013856699476291176,
+      "loss": 0.7254,
+      "step": 295
+    },
+    {
+      "epoch": 0.44,
+      "grad_norm": 0.047545677145208354,
+      "learning_rate": 0.000136166746161379,
+      "loss": 0.7389,
+      "step": 300
+    },
+    {
+      "epoch": 0.45,
+      "grad_norm": 0.049601892362158714,
+      "learning_rate": 0.00013374227904130724,
+      "loss": 0.7298,
+      "step": 305
+    },
+    {
+      "epoch": 0.46,
+      "grad_norm": 0.039019104385466755,
+      "learning_rate": 0.00013129521691070107,
+      "loss": 0.7372,
+      "step": 310
+    },
+    {
+      "epoch": 0.47,
+      "grad_norm": 0.04324309132836927,
+      "learning_rate": 0.00012882719840797473,
+      "loss": 0.7586,
+      "step": 315
+    },
+    {
+      "epoch": 0.47,
+      "grad_norm": 0.04511381039375704,
+      "learning_rate": 0.0001263398762046623,
+      "loss": 0.782,
+      "step": 320
+    },
+    {
+      "epoch": 0.48,
+      "grad_norm": 0.037065274891104005,
+      "learning_rate": 0.00012383491589873123,
+      "loss": 0.73,
+      "step": 325
+    },
+    {
+      "epoch": 0.49,
+      "grad_norm": 0.04379387246767109,
+      "learning_rate": 0.0001213139948992394,
+      "loss": 0.7602,
+      "step": 330
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 0.04868933738312991,
+      "learning_rate": 0.0001187788013030837,
+      "loss": 0.7467,
+      "step": 335
+    },
+    {
+      "epoch": 0.5,
+      "grad_norm": 0.04841745199938846,
+      "learning_rate": 0.00011623103276459086,
+      "loss": 0.7862,
+      "step": 340
+    },
+    {
+      "epoch": 0.51,
+      "grad_norm": 0.046825753709491394,
+      "learning_rate": 0.00011367239535870913,
+      "loss": 0.7523,
+      "step": 345
+    },
+    {
+      "epoch": 0.52,
+      "grad_norm": 0.05204047537850423,
+      "learning_rate": 0.00011110460243856052,
+      "loss": 0.721,
+      "step": 350
+    },
+    {
+      "epoch": 0.53,
+      "grad_norm": 0.04835371095843328,
+      "learning_rate": 0.0001085293734881197,
+      "loss": 0.8165,
+      "step": 355
+    },
+    {
+      "epoch": 0.53,
+      "grad_norm": 0.046503091391954944,
+      "learning_rate": 0.00010594843297078737,
+      "loss": 0.7151,
+      "step": 360
+    },
+    {
+      "epoch": 0.54,
+      "grad_norm": 0.05401067419624875,
+      "learning_rate": 0.00010336350917462925,
+      "loss": 0.7623,
+      "step": 365
+    },
+    {
+      "epoch": 0.55,
+      "grad_norm": 0.046238914587313926,
+      "learning_rate": 0.00010077633305505403,
+      "loss": 0.7952,
+      "step": 370
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.04067724976292184,
+      "learning_rate": 9.818863707570475e-05,
+      "loss": 0.7509,
+      "step": 375
+    },
+    {
+      "epoch": 0.56,
+      "grad_norm": 0.041537242387637924,
+      "learning_rate": 9.560215404834095e-05,
+      "loss": 0.7121,
+      "step": 380
+    },
+    {
+      "epoch": 0.57,
+      "grad_norm": 0.042556692843415594,
+      "learning_rate": 9.30186159724869e-05,
+      "loss": 0.7708,
+      "step": 385
+    },
+    {
+      "epoch": 0.58,
+      "grad_norm": 0.04846970178587132,
+      "learning_rate": 9.043975287562441e-05,
+      "loss": 0.7258,
+      "step": 390
+    },
+    {
+      "epoch": 0.59,
+      "grad_norm": 0.062274515472467,
+      "learning_rate": 8.786729165470584e-05,
+      "loss": 0.7698,
+      "step": 395
+    },
+    {
+      "epoch": 0.59,
+      "grad_norm": 0.05082087012341951,
+      "learning_rate": 8.530295491976337e-05,
+      "loss": 0.7717,
+      "step": 400
+    },
+    {
+      "epoch": 0.6,
+      "grad_norm": 0.04831882202604005,
+      "learning_rate": 8.274845984038916e-05,
+      "loss": 0.7679,
+      "step": 405
+    },
+    {
+      "epoch": 0.61,
+      "grad_norm": 0.051680028901517745,
+      "learning_rate": 8.020551699585842e-05,
+      "loss": 0.7265,
+      "step": 410
+    },
+    {
+      "epoch": 0.61,
+      "grad_norm": 0.051175873784970766,
+      "learning_rate": 7.76758292296659e-05,
+      "loss": 0.7696,
+      "step": 415
+    },
+    {
+      "epoch": 0.62,
+      "grad_norm": 0.06738625933727418,
+      "learning_rate": 7.516109050924201e-05,
+      "loss": 0.7781,
+      "step": 420
+    },
+    {
+      "epoch": 0.63,
+      "grad_norm": 0.05117489109997125,
+      "learning_rate": 7.266298479161318e-05,
+      "loss": 0.771,
+      "step": 425
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 0.04211785915218291,
+      "learning_rate": 7.01831848957653e-05,
+      "loss": 0.7368,
+      "step": 430
+    },
+    {
+      "epoch": 0.64,
+      "grad_norm": 0.049459639807936356,
+      "learning_rate": 6.772335138246548e-05,
+      "loss": 0.7234,
+      "step": 435
+    },
+    {
+      "epoch": 0.65,
+      "grad_norm": 0.05981206570373218,
+      "learning_rate": 6.528513144229255e-05,
+      "loss": 0.7063,
+      "step": 440
+    },
+    {
+      "epoch": 0.66,
+      "grad_norm": 0.042858944283255956,
+      "learning_rate": 6.287015779262064e-05,
+      "loss": 0.7178,
+      "step": 445
+    },
+    {
+      "epoch": 0.67,
+      "grad_norm": 0.051605890455677136,
+      "learning_rate": 6.048004758429451e-05,
+      "loss": 0.8009,
+      "step": 450
+    },
+    {
+      "epoch": 0.67,
+      "grad_norm": 0.04839278716779415,
+      "learning_rate": 5.8116401318728667e-05,
+      "loss": 0.7969,
+      "step": 455
+    },
+    {
+      "epoch": 0.68,
+      "grad_norm": 0.053679686497026036,
+      "learning_rate": 5.578080177615575e-05,
+      "loss": 0.7744,
+      "step": 460
+    },
+    {
+      "epoch": 0.69,
+      "grad_norm": 0.04886589382975143,
+      "learning_rate": 5.3474812955741404e-05,
+      "loss": 0.782,
+      "step": 465
+    },
+    {
+      "epoch": 0.7,
+      "grad_norm": 0.05872104081042249,
+      "learning_rate": 5.119997902827584e-05,
+      "loss": 0.7684,
+      "step": 470
+    },
+    {
+      "epoch": 0.7,
+      "grad_norm": 0.04971188432483476,
+      "learning_rate": 4.895782330214291e-05,
+      "loss": 0.8219,
+      "step": 475
+    },
+    {
+      "epoch": 0.71,
+      "grad_norm": 0.05137448628344103,
+      "learning_rate": 4.674984720325961e-05,
+      "loss": 0.7654,
+      "step": 480
+    },
+    {
+      "epoch": 0.72,
+      "grad_norm": 0.04686654449943715,
+      "learning_rate": 4.4577529269668874e-05,
+      "loss": 0.7774,
+      "step": 485
+    },
+    {
+      "epoch": 0.73,
+      "grad_norm": 0.055295260668893974,
+      "learning_rate": 4.244232416145839e-05,
+      "loss": 0.7245,
+      "step": 490
+    },
+    {
+      "epoch": 0.73,
+      "grad_norm": 0.05389337215866635,
+      "learning_rate": 4.0345661686669745e-05,
+      "loss": 0.8061,
+      "step": 495
+    },
+    {
+      "epoch": 0.74,
+      "grad_norm": 0.05870235717745641,
+      "learning_rate": 3.828894584384867e-05,
+      "loss": 0.8031,
+      "step": 500
+    },
+    {
+      "epoch": 0.75,
+      "grad_norm": 0.05571889022670846,
+      "learning_rate": 3.62735538818787e-05,
+      "loss": 0.7614,
+      "step": 505
+    },
+    {
+      "epoch": 0.76,
+      "grad_norm": 0.04492834518434723,
+      "learning_rate": 3.43008353777269e-05,
+      "loss": 0.7331,
+      "step": 510
+    },
+    {
+      "epoch": 0.76,
+      "grad_norm": 0.05343462218042988,
+      "learning_rate": 3.237211133272004e-05,
+      "loss": 0.7355,
+      "step": 515
+    },
+    {
+      "epoch": 0.77,
+      "grad_norm": 0.047988801277496954,
+      "learning_rate": 3.0488673287955882e-05,
+      "loss": 0.7237,
+      "step": 520
+    },
+    {
+      "epoch": 0.78,
+      "grad_norm": 0.055886245680751935,
+      "learning_rate": 2.8651782459442176e-05,
+      "loss": 0.7426,
+      "step": 525
+    },
+    {
+      "epoch": 0.79,
+      "grad_norm": 0.04696112028105608,
+      "learning_rate": 2.686266889354211e-05,
+      "loss": 0.7487,
+      "step": 530
+    },
+    {
+      "epoch": 0.79,
+      "grad_norm": 0.04555764819319834,
+      "learning_rate": 2.5122530643292275e-05,
+      "loss": 0.7344,
+      "step": 535
+    },
+    {
+      "epoch": 0.8,
+      "grad_norm": 0.05199303289710418,
+      "learning_rate": 2.3432532966144527e-05,
+      "loss": 0.7604,
+      "step": 540
+    },
+    {
+      "epoch": 0.81,
+      "grad_norm": 0.05146492861699787,
+      "learning_rate": 2.1793807543668853e-05,
+      "loss": 0.7383,
+      "step": 545
+    },
+    {
+      "epoch": 0.81,
+      "grad_norm": 0.05548374949115557,
+      "learning_rate": 2.0207451723739633e-05,
+      "loss": 0.7565,
+      "step": 550
+    },
+    {
+      "epoch": 0.82,
+      "grad_norm": 0.04718878121287069,
+      "learning_rate": 1.8674527785713247e-05,
+      "loss": 0.7889,
+      "step": 555
+    },
+    {
+      "epoch": 0.83,
+      "grad_norm": 0.07223231382050395,
+      "learning_rate": 1.7196062229088604e-05,
+      "loss": 0.7734,
+      "step": 560
+    },
+    {
+      "epoch": 0.84,
+      "grad_norm": 0.051918878796507154,
+      "learning_rate": 1.577304508612717e-05,
+      "loss": 0.7697,
+      "step": 565
+    },
+    {
+      "epoch": 0.84,
+      "grad_norm": 0.05143371773283759,
+      "learning_rate": 1.4406429258892762e-05,
+      "loss": 0.7591,
+      "step": 570
+    },
+    {
+      "epoch": 0.85,
+      "grad_norm": 0.06242991163796485,
+      "learning_rate": 1.3097129881154934e-05,
+      "loss": 0.7888,
+      "step": 575
+    },
+    {
+      "epoch": 0.86,
+      "grad_norm": 0.05577486269105434,
+      "learning_rate": 1.1846023705583442e-05,
+      "loss": 0.7503,
+      "step": 580
+    },
+    {
+      "epoch": 0.87,
+      "grad_norm": 0.05386359623343792,
+      "learning_rate": 1.065394851664394e-05,
+      "loss": 0.7306,
+      "step": 585
+    },
+    {
+      "epoch": 0.87,
+      "grad_norm": 0.06740139242925512,
+      "learning_rate": 9.521702569588198e-06,
+      "loss": 0.7748,
+      "step": 590
+    },
+    {
+      "epoch": 0.88,
+      "grad_norm": 0.05952304608258396,
+      "learning_rate": 8.450044055914497e-06,
+      "loss": 0.6941,
+      "step": 595
+    },
+    {
+      "epoch": 0.89,
+      "grad_norm": 0.06052761239891292,
+      "learning_rate": 7.439690595656013e-06,
+      "loss": 0.7775,
+      "step": 600
+    },
+    {
+      "epoch": 0.9,
+      "grad_norm": 0.05646485799009386,
+      "learning_rate": 6.4913187568374164e-06,
+      "loss": 0.7941,
+      "step": 605
+    },
+    {
+      "epoch": 0.9,
+      "grad_norm": 0.051627777900137006,
+      "learning_rate": 5.605563602421149e-06,
+      "loss": 0.7621,
+      "step": 610
+    },
+    {
+      "epoch": 0.91,
+      "grad_norm": 0.05672370227304409,
+      "learning_rate": 4.783018265047179e-06,
+      "loss": 0.7598,
+      "step": 615
+    },
+    {
+      "epoch": 0.92,
+      "grad_norm": 0.04806888785096822,
+      "learning_rate": 4.024233549850509e-06,
+      "loss": 0.7585,
+      "step": 620
+    },
+    {
+      "epoch": 0.93,
+      "grad_norm": 0.054256686487143276,
+      "learning_rate": 3.329717565622825e-06,
+      "loss": 0.7766,
+      "step": 625
+    },
+    {
+      "epoch": 0.93,
+      "grad_norm": 0.04081148671596208,
+      "learning_rate": 2.699935384565111e-06,
+      "loss": 0.7324,
+      "step": 630
+    },
+    {
+      "epoch": 0.94,
+      "grad_norm": 0.052421185625722275,
+      "learning_rate": 2.1353087308590314e-06,
+      "loss": 0.7933,
+      "step": 635
+    },
+    {
+      "epoch": 0.95,
+      "grad_norm": 0.05392813215656955,
+      "learning_rate": 1.6362156982656084e-06,
+      "loss": 0.7896,
+      "step": 640
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 0.052109378844003566,
+      "learning_rate": 1.2029904969404482e-06,
+      "loss": 0.7633,
+      "step": 645
+    },
+    {
+      "epoch": 0.96,
+      "grad_norm": 0.05562712893622207,
+      "learning_rate": 8.359232296349162e-07,
+      "loss": 0.7664,
+      "step": 650
+    },
+    {
+      "epoch": 0.97,
+      "grad_norm": 0.05328976304172556,
+      "learning_rate": 5.352596974332436e-07,
+      "loss": 0.7658,
+      "step": 655
+    },
+    {
+      "epoch": 0.98,
+      "grad_norm": 0.05351883682294467,
+      "learning_rate": 3.0120123515540164e-07,
+      "loss": 0.7871,
+      "step": 660
+    },
+    {
+      "epoch": 0.99,
+      "grad_norm": 0.05900811011725127,
+      "learning_rate": 1.3390457653639222e-07,
+      "loss": 0.7749,
+      "step": 665
+    },
+    {
+      "epoch": 0.99,
+      "grad_norm": 0.06175235611664889,
+      "learning_rate": 3.3481749271768726e-08,
+      "loss": 0.7353,
+      "step": 670
+    },
+    {
+      "epoch": 1.0,
+      "grad_norm": 0.05208882374001457,
+      "learning_rate": 0.0,
+      "loss": 0.7591,
+      "step": 675
+    },
+    {
+      "epoch": 1.0,
+      "eval_loss": NaN,
+      "eval_runtime": 2998.9455,
+      "eval_samples_per_second": 0.77,
+      "eval_steps_per_second": 0.193,
+      "step": 675
+    },
+    {
+      "epoch": 1.0,
+      "step": 675,
+      "total_flos": 2.235287773328179e+16,
+      "train_loss": 0.7647893634548893,
+      "train_runtime": 21987.6737,
+      "train_samples_per_second": 0.982,
+      "train_steps_per_second": 0.031
+    }
+  ],
+  "logging_steps": 5,
+  "max_steps": 675,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 1,
+  "save_steps": 100,
+  "total_flos": 2.235287773328179e+16,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
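The README's `Loss: nan` comes from the single evaluation entry in `log_history`, where `eval_loss` is serialized as the non-standard JSON token `NaN`. Every logged training loss and grad norm is finite, as is the final `train_loss` of 0.7648, so the NaN is confined to the evaluation pass. Python's `json` module accepts `NaN` by default, so a short scan is enough to locate non-finite metrics in a trainer_state.json. `find_nan_entries` is an illustrative helper, not a transformers API.

```python
import json
import math


def find_nan_entries(trainer_state_text: str):
    """Return (step, key) pairs in log_history whose float value is NaN.

    json.loads parses the bare NaN token (as in "eval_loss": NaN) to float('nan').
    """
    state = json.loads(trainer_state_text)
    hits = []
    for entry in state.get("log_history", []):
        for key, value in entry.items():
            if isinstance(value, float) and math.isnan(value):
                hits.append((entry.get("step"), key))
    return hits
```

Run on this commit's file, for example `with open("trainer_state.json") as f: print(find_nan_entries(f.read()))`, the only hit is `(675, 'eval_loss')`.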